Open Source GraphQL CDN

We recently announced that WunderGraph is now fully open source. Today we'd like to explain how you can use our API Developer Platform to add Edge-level caching to your GraphQL APIs without locking yourself into a specific vendor.

GraphQL caching at the Edge layer should be vendor-agnostic.

Services like Akamai and GraphCDN/Stellate offer their own solutions to this problem. We will compare different approaches and their trade-offs.

Why were dedicated CDN solutions built for GraphQL?

A good question to start with: why were dedicated CDN solutions built for GraphQL in the first place?

Most GraphQL implementations ignore how the web works

The problem with most GraphQL implementations is that they don't actually use the “platform” they run on. By platform I mean the web, or more specifically HTTP and REST constraints.

The web has a lot to offer if you use it correctly. If read requests (Queries) used the GET verb, in combination with Cache-Control and ETag headers, browsers, CDNs, cache servers like Varnish, Nginx and many other tools could handle caching out of the box. You wouldn't really need a service that understands GraphQL.
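To illustrate, here is a minimal TypeScript sketch (with an invented endpoint and header values) of what a cacheable GraphQL Query over GET could look like:

// A GraphQL Query sent via HTTP GET; the endpoint and header values below
// are illustrative assumptions, not the behavior of any specific server.
const query = "query { profile(id: 1) { id name friendsCount } }";

const res = await fetch(
    "https://api.example.com/graphql?query=" + encodeURIComponent(query),
    { method: "GET" }
);

// A caching-aware origin could respond with standard headers such as:
//   Cache-Control: public, max-age=60, stale-while-revalidate=120
//   ETag: "33a64df5"
// Browsers, CDNs, Nginx and Varnish all know how to cache and revalidate
// such a response without understanding GraphQL at all.
console.log(res.headers.get("cache-control"), res.headers.get("etag"));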

However, the reality is that most GraphQL APIs force you to send requests via HTTP POST. Cache-Control and ETag headers make no sense in this case, since every participant on the web assumes a POST request is trying to “manipulate” something and therefore won't cache it.

So edge-level caching solutions were built that rewrite HTTP POST to GET and invalidate in very clever ways. By analyzing Queries and Mutations, they are able to populate caches and invalidate objects as Mutations pass through the system.

Limitations and Issues of GraphQL CDNs/Edge Caches

A GraphQL CDN creates a secondary source of truth

As discussed in Fielding's layered system constraint, intermediaries can be used to implement a shared cache to reduce request latency.

The problem I see with Edge GraphQL “smart” caches is that they look at GraphQL operations to figure out what to cache and what to invalidate. Some GraphQL CDN implementations even allow you to use their API to invalidate objects.

What sounds very clever creates a huge problem: you create a second source of truth. Before using a GraphQL CDN, all you had to do was implement your resolvers. Now you also need to think about how to invalidate the CDN.

Even worse, you are now programming against a single provider and its implementation, binding your application to a single service provider. You can't easily switch from one GraphQL CDN provider to another. Unlike REST/HTTP APIs, there is no standard for how to cache GraphQL APIs, so each implementation will be different.

Another facet of the secondary source of truth problem: imagine we put a GraphQL CDN in front of GitHub's GraphQL API to cache the issues of a repository. If we rely on smart cache invalidation via mutations, we will not be able to update the cache when a user bypasses our CDN and uses the GitHub API directly.

A GraphQL CDN only really works if 100% of the traffic goes through it.

Your GraphQL Edge cache won't work on localhost or in your CI/CD

Testing is a huge part of software development, and we definitely want a great developer experience when creating our APIs. Using a cloud-only caching solution means we can't test it locally or in our continuous integration systems.

You'll develop on your local machine without a CDN, only to find out that something behaves strangely when you use a CDN in production.

If we used standardized caching directives, we could use any cache server such as Nginx or Varnish in our CI/CD pipeline to test our APIs.

GraphQL-native CDNs create a vendor lock-in problem

As mentioned earlier, there is no specification for how GraphQL caching should work. Each implementation is different, so you are always locked into a specific vendor. They could go bankrupt, be acquired, discontinue their service, or change their pricing model.

Either way, it is a risk that needs to be managed.

A GraphQL CDN can introduce cross-origin requests

Depending on your setup, adding a GraphQL CDN to your architecture may mean that your applications running in the browser have to make cross-origin requests. That is, when your application runs on example.com and your GraphQL CDN runs on example.cdn.com, the browser will always make an additional preflight request.

It's possible to cache the preflight request, but that still means we have to make at least one additional request when the page is initially loaded.

A GraphQL CDN may not work well for authenticated requests with server-side rendering (SSR)

Let's say your application requires your users to be logged in, but you still want to be able to apply caching.

Additionally, you would like to implement server-side rendering (SSR). In an ideal scenario, your users log in via your authentication server, which sets a cookie on your main domain so that they are logged in on all subdomains. If your CDN is running on a different domain, server-side rendering of the page will not be possible, because the browser will not send the user's cookie to the CDN domain.

Luckily, some providers offer custom domains, so cookie-based authentication can still work for you.

Invalidating deeply nested GraphQL operations may not work

Here is an example of a GraphQL operation where cache invalidation is easy.

mutation UpdateProfile {
    updateProfile(update: {id: 1, name: "Jannik"}) {
        id
        name
        friendsCount
    }
}

If you previously requested a user with id 1, the cache can invalidate all objects with that id, including the response to the following query:

query {
    profile(id: 1){
        id
        name
        friendsCount
    }
}

Let's make it more complicated and query all of your friends:

query {
    me {
       friends {
            id
            name
        }
    }
}

Now you made friends at the last conference. Let's add them:

mutation AddFriend {
    addFriends(where: {user:{hasConnection: {conferences: {id: {eq: 7}}}}}){
        id
        name
    }
}

We have now added everyone who attended the conference with id 7 as a friend. If we query all your friends again, we will not get all the data back, because the cache cannot know that the addFriends mutation invalidated the cached response of the friends query.

At this point, you'll have to start adding response tagging, surrogate keys, or parsing of GraphQL return types to make your invalidation strategy smarter.

We'll come back to this topic after looking at the benefits.

Benefits of using a GraphQL CDN/Edge Cache

Where there are costs, there are also benefits!

Using a GraphQL CDN means you don't have to change much in your application. In an ideal scenario, you simply change the GraphQL server URL and everything should work.

You also don't have to deploy or configure any tools. You are buying a ready-to-use service that just works.

Despite the previously discussed issues, a GraphQL CDN can still improve the performance of many applications.

When not to use a GraphQL CDN

As discussed in the section on cache invalidation, creating a smart CDN service that knows nothing about your GraphQL backend is actually very difficult. The problem is that the backend is the source of truth, but does not share all of this information with the cache.

In fact, if you run into problems where invalidation becomes difficult, you might want to implement caching at another level, inside resolvers, or even one level down, at the entity level, using the DataLoader pattern.
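As a rough sketch of what entity-level caching with the DataLoader pattern can look like (using the dataloader npm package; the User type and the loadUsersByIds helper are made up for illustration):

import DataLoader from "dataloader";

type User = { id: number; name: string };

// Hypothetical data access helper standing in for a real database call.
async function loadUsersByIds(ids: readonly number[]): Promise<User[]> {
    return ids.map((id) => ({ id, name: `user-${id}` }));
}

// Batches and caches entity lookups per request, so resolvers that ask for
// the same user several times only hit the database once.
const userLoader = new DataLoader<number, User | null>(async (ids) => {
    const users = await loadUsersByIds(ids);
    // DataLoader expects results in the same order as the requested keys.
    return ids.map((id) => users.find((u) => u.id === id) ?? null);
});

// Inside a resolver:
const user = await userLoader.load(1);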

Additionally, if your data is expected to change frequently, it may not make sense for you to cache at the Edge level.

Another big, underappreciated disadvantage is that distributed cache invalidation is eventually consistent. This means that for a short period after a mutation you will serve stale content. If you don't account for this in your architecture, it can break your clients' business logic:

Client (US) -> Query all posts to fill the cache
Client (US) -> Run mutation to update the post with ID:1
System fires a web-hook to a third-party service in FRA
Client (FRA) -> Query all posts -> stale content

This is not specific to GraphQL, but with HTTP caching the semantics are better understood. In general, don't use a GraphQL CDN if you need read-after-write consistency.

If your requirement is to maintain read-after-write consistency, there are several ways to solve the problem.

One is to not cache at the Edge level, but rather at the application level. In this case, you're trading latency for consistency, which can be a good trade.

Another way to solve the problem is to distribute state across the Edge and shard data based on location. This sharding model is not always possible, but if the shared state is only used by groups of users in one location, this solution can work very well.

One example of this is Cloudflare Workers + Durable Objects, which gives you a simple key-value store that is stored in a specific location, meaning all users close to that location can have a consistent state at low latency.
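As a minimal sketch of that idea (assuming Cloudflare Workers with the types from @cloudflare/workers-types; the class and key names are invented):

// A Durable Object pins its state to a single location, so all nearby users
// read and write the same, consistent value.
export class RegionState {
    constructor(private state: DurableObjectState) {}

    async fetch(request: Request): Promise<Response> {
        // Requests to one Durable Object instance are processed sequentially,
        // which keeps reads and writes consistent for this shard.
        const current = (await this.state.storage.get<number>("counter")) ?? 0;
        await this.state.storage.put("counter", current + 1);
        return new Response(String(current + 1));
    }
}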

When does a GraphQL Edge cache make the most sense?

If you have one GraphQL API, and that API is the only API your frontend talks to, you have complete ownership of that API, and no traffic bypasses your cache, then a CDN like this might actually make sense.

Otherwise, I doubt you'll get the results you're expecting, at least not without the additional costs discussed earlier.

WunderGraph – an alternative approach to GraphQL Edge caching without vendor lock-in

We've discussed the pros and cons of GraphQL CDNs; now I'd like to suggest a different approach that allows you to remain vendor independent.

When it comes to solving problems, sometimes it's smart to be stupid. I think a cache can be a lot smarter when it's dumb and plays by the rules of the web, and that's exactly what we're doing with WunderGraph.

Our solution is open source and it can be deployed anywhere.

How does it work?

I've been working with GraphQL for many years now, and I've realized something: once an application that uses a GraphQL API is deployed, I've never seen it change its GraphQL operations at runtime.

What applications do at runtime is change variables, but operations usually remain static.

So what we've done is create a GraphQL to JSON-RPC compiler that treats GraphQL operations as “Prepared Statements”, a term you've probably heard of when using a database.

When you first prepare an SQL statement, you send it to the database and get back a handle to “execute” it later. Subsequent requests can then execute the statement by simply sending the handle and the variables. This makes execution much faster.
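For comparison, this is roughly what a prepared statement looks like with the node-postgres (pg) client; the statement name and table are just examples:

import { Client } from "pg";

const client = new Client(); // connection settings come from the environment
await client.connect();

// The first call with a given "name" prepares the statement on the server;
// subsequent calls with the same name only send the handle plus the values.
const result = await client.query({
    name: "profile-by-id",
    text: "SELECT id, name FROM profiles WHERE id = $1",
    values: [1],
});

console.log(result.rows);
await client.end();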

Since we don't change GraphQL operations at runtime, we can actually do this “compile” step during development.

At the same time, we replace HTTP POST with HTTP GET for Queries, sending the variables as a query parameter.

By sending GraphQL requests via JSON-RPC with HTTP GET, we automatically enable the use of Cache-Control and ETags headers.
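To make this concrete, here is a sketch of what such a request could look like from the client's point of view. The path and the wg_variables parameter name are assumptions for illustration, not the documented wire format:

// Hypothetical compiled operation "ProfileByID", called via JSON-RPC over GET.
const variables = { id: 1 };

const res = await fetch(
    "https://api.example.com/operations/ProfileByID?wg_variables=" +
        encodeURIComponent(JSON.stringify(variables)),
    {
        method: "GET",
        // Revalidation: send the ETag remembered from a previous response.
        headers: { "If-None-Match": '"33a64df5"' },
    }
);

// 200 with a fresh body, or 304 Not Modified if the cached response is still valid.
console.log(res.status);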

And that's the magic of the WunderGraph caching story.

We are not building a “smart” cache. We don't cache objects, and we don't need to build complex invalidation logic. We simply cache the response of unique URLs.

For each operation, you can define whether it should be cached and for how long. If the response is likely to change frequently, you can also set the cache time to 0 but set “stale-while-revalidate” to a positive number. The client then automatically revalidates with the origin using the ETag, and the server either sends an updated response or a 304 Not Modified if nothing changed.
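As a sketch of what such a per-operation configuration could look like in wundergraph.operations.ts (the option names below are from memory and may differ between versions; treat them as assumptions and check the WunderGraph docs):

import { configureWunderGraphOperations } from "@wundergraph/sdk";

export default configureWunderGraphOperations({
    operations: {
        defaultConfig: {
            authentication: { required: false },
        },
        queries: (config) => ({
            ...config,
            caching: {
                enable: true,
                public: true,
                maxAge: 0,                // always revalidate ...
                staleWhileRevalidate: 60, // ... but serve stale for up to 60s while doing so
            },
        }),
        mutations: (config) => ({ ...config }),
        subscriptions: (config) => ({ ...config }),
        custom: {},
    },
});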

This is a deliberately stupid approach. If the cached value has expired or is stale, we ask the origin whether it has an update. Depending on the configuration, this may create a few additional requests to the origin, but we don't create a second source of truth, and we don't deal with the complexity of managing cache invalidation tags for nested objects.

This is similar to using Change Data Capture (CDC) as a source for subscriptions versus simple polling. CDC can be extremely complex to get right, while simple server-side polling works just fine most of the time. It's simple and reliable.

We haven't really invented much here; it's all standard caching behavior, and services like Cloudflare, Fastly, Varnish or Nginx support it out of the box.

There is a standard for how all participants on the web handle Cache-Control and ETag headers; we simply implemented that standard.

By removing GraphQL from the equation, we made it web compatible.

If you're building tools for the web, you have to respect the way the web is built, otherwise you'll create more problems than you solve.

Additional benefits of WunderGraph's approach to GraphQL caching

It's not just CDNs and servers that understand Cache-Control and ETag headers. Browsers also automatically cache and invalidate your responses without adding a single line of code.

Additionally, since we removed GraphQL from the runtime, we automatically reduced the attack surface of our application. If our frontend doesn't modify GraphQL queries at runtime, why provide an API that allows it?

Limitations of the JSON-RPC caching approach

One limitation of this approach is that we can no longer use regular GraphQL clients, since they expect us to send GraphQL operations via HTTP POST and subscriptions via WebSockets.

This is not a problem for new projects, but it can be a hindrance for existing applications. By the way, we have an internal RFC to add a compatibility mode allowing, for example, Apollo or urql clients to work with WunderGraph via an adapter. If you are interested, please let us know.

However, using plain JSON-RPC directly would be very inconvenient. That's why we don't just compile GraphQL to JSON-RPC, but also generate fully type-safe clients.

One such client-side integration is the NextJS package, which makes it very convenient to use WunderGraph with NextJS, including server-side rendering, authentication, file uploads, etc.
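A hypothetical example of what a page using such a generated client could look like (the import path, hook shape, and operation name are assumptions for illustration):

import { useQuery, withWunderGraph } from "../components/generated/nextjs";

const ProfilePage = () => {
    // Fully typed hook generated from a "ProfileByID" operation defined at
    // build time; no GraphQL document is sent at runtime.
    const { data, error } = useQuery({
        operationName: "ProfileByID",
        input: { id: 1 },
    });

    if (error) return <div>Something went wrong</div>;
    if (!data) return <div>Loading...</div>;
    return <div>{data.profile?.name}</div>;
};

// Wraps the page so server-side rendering and authentication work out of the box.
export default withWunderGraph(ProfilePage);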

Another limitation, for now, is that you must host WunderGraph yourself. We're working on a hosted serverless solution, but for now you'll have to deploy and run it yourself.

While this may not be very convenient, it also has an advantage: WunderGraph is licensed under the Apache 2.0 license, so you can run it anywhere.

How can you deploy WunderGraph?

Now that we've discussed the pros and cons of WunderGraph's approach to caching, let's look at how we can deploy WunderGraph to achieve good caching results.

First, you don't need to deploy WunderGraph globally. It can run close to your origin, for example next to the (micro-)services you would like to use. In this scenario, you run WunderGraph as a gateway in front of your other services.

The architecture for this scenario looks like this:

Client -> WunderGraph Server -> Origin

Deploying WunderGraph with Nginx or Varnish as an additional caching layer

If you deploy WunderGraph close to the origin, it will automatically add the necessary Cache-Control & ETag headers. This setup may already be sufficient for your scenario.

However, in some scenarios you will want to add another layer of cache servers. This could be, for example, a cluster of Nginx or Varnish servers hosted in front of your WunderGraph server.

The updated architecture for this scenario:

Client -> Cache Server -> WunderGraph Server -> Origin

Deploying WunderGraph with Cloudflare or Fastly as Edge Cache

Depending on where your users are located, a centralized caching layer may not be sufficient for your scenario.

In this scenario, you can use services like Cloudflare (Workers) or Fastly to add a globally distributed Edge cache/CDN.

The important thing to note here is that you are not locked into a specific solution. As mentioned earlier, we use standardized Cache-Control directives that are supported by all cache servers, so you can swap one vendor for another at any time.

Updated architecture:

Client -> CDN -> WunderGraph Server -> Origin

Deploy WunderGraph directly to the Edge using fly.io

Another option is to deploy WunderGraph directly on the Edge, eliminating the need for an additional caching layer. Services like fly.io allow you to deploy containers as close to your users as possible.

The architecture for this scenario looks like this:

Client -> WunderGraph Server (on the Edge) -> Origin

Deploying a service at the Edge is not always beneficial

There is one important point that I would like to highlight and it applies to all the solutions mentioned in this post.

If we deploy the workload to the Edge, the service will be very close to the users, which is generally a good thing.

At the same time, depending on where on the “Edge” we process the request, each roundtrip to your origin server(s) can add anywhere from ~0 to 300ms of latency. If we need to make multiple roundtrips to retrieve data for a single client request, this delay accumulates: three sequential roundtrips at 100ms each already add 300ms.

As you can see, having logic on the Edge is not always beneficial for overall performance.

There is one more thing! This solution works for GraphQL, REST, gRPC and other protocols!

You heard correctly. WunderGraph is not limited to GraphQL origins. We support a wide range of upstream protocols such as REST, gRPC (coming soon!), and databases such as PostgreSQL, MySQL, PlanetScale, etc.

WunderGraph integrates all your services and creates a “Virtual Graph,” a GraphQL schema representing all your services.

The caching layer described above works not only for GraphQL origins, but for every service you have added to your Virtual Graph.

This means you can apply a single layer of authentication, authorization, and of course caching to all of your APIs.

In our experience, it's rare that you connect the frontend to a single GraphQL server. Instead, you'll probably be connecting it to a lot of services, which is what we're trying to simplify, and caching is part of that story.

Conclusion

We have discussed various options for improving application performance using different caching systems. There are ready-to-use solutions that provide simplicity but also lock you into specific vendors.

With WunderGraph we are trying to offer an open source alternative that is based on standards implemented by many tools and vendors so that you can choose the best solution for your situation.

If you're looking to add caching to your stack, we'd love to talk! Join us on Discord or reach out on Twitter.
