Lessons learned from building a reliable educational portal
Building Kontent.ai Learn was a journey, and everything was new when we started. We made mistakes and learned from them.
In this post, I go over the technical side of the journey with a focus on increasing performance. Want a TL;DR version? Cache and CDN, use them well.
Jan Cerman
Published on Nov 28, 2022
Our starting point
We built our own educational portal from scratch. You can find our motivation to go custom and the details and decisions behind the implementation. You can find the educational portal at Kontent.ai Learn.
In the early phases, our priority was time to market. We made sure we could easily add new content, make it findable, and make it easy to read for anyone. Performance came a close second. We later realized some things weren't as predictable as we'd like, such as page load times, the archnemesis of user experience.
If it takes time to get, cache it
This may sound obvious in retrospect, but we didn't use any caching when we started. Why bother? Delivery API caches successful responses for you. You just want to get your content. The problem is that even a cached response on the API's side might not be immediate. Not when you repeatedly fetch several responses at once.
The cost of network overhead
The web app needed to make around six API requests to build a single page. For example, it needed to fetch items for navigation, footer, UI messages, site-wide redirects, and article contents. Depending on the complexity of the request, the first responses from the API could take anywhere from 100ms (for simple single-item queries) to 1000ms (for complex multi-item queries).
When repeated, these API requests take around 50ms. However, the web app itself might wait longer. There's always a slight network overhead for every request. And the numbers add up. Even with six quick API requests, you might end up waiting for about 0.5s.
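To see how the numbers add up, here is a rough back-of-the-envelope sketch. It assumes the six requests run one after another and uses a hypothetical 35ms of network overhead per request. That overhead value is an illustration, not a measurement from our app.

```python
def sequential_wait_ms(api_times_ms, overhead_ms):
    """Total wait when requests run one after another."""
    return sum(t + overhead_ms for t in api_times_ms)


# Six repeated (cached) API responses at roughly 50 ms each, as in the text.
cached_api_times = [50] * 6

# Hypothetical per-request network overhead: an assumed value, not a measurement.
ASSUMED_OVERHEAD_MS = 35

total = sequential_wait_ms(cached_api_times, ASSUMED_OVERHEAD_MS)
print(f"{total} ms")  # 510 ms
```

With these assumptions, six quick requests already cost about half a second before the browser renders anything.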
It doesn't look like much, but remember that displaying the content in the browser takes time as well. This is sometimes referred to as rendering time or time to interactive. In combination, the added delay leads to a tolerable yet subpar experience.
To optimize or not to optimize?
What's a developer to do? We faced a choice between two options.
- Rewrite the existing logic so that the web app requires fewer API requests to build a page.
- Make fewer API requests in general while keeping the existing logic as-is.
They say premature optimization is the root of all evil. To avoid that evil, we chose to improve what we had instead of taking it apart.
We decided to make fewer API requests by caching the API responses for each of the requests. If the web app needed a certain part of content to build a page, it could just look into its own memory. No need for network requests.
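The idea can be sketched as a small in-memory cache that only calls the API on a miss. This is a minimal illustration, not our production code: `ResponseCache`, `get_or_fetch`, and `fetch_navigation` are hypothetical names, and the fetch function stands in for a real Delivery API call.

```python
from typing import Any, Callable, Dict


class ResponseCache:
    """Keeps API responses in memory, keyed by a request identifier."""

    def __init__(self) -> None:
        self._entries: Dict[str, Any] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[], Any]) -> Any:
        # Only hit the network when the response is not in memory yet.
        if key not in self._entries:
            self._entries[key] = fetch()
        return self._entries[key]

    def invalidate(self, key: str) -> None:
        # Forget a response so the next lookup fetches fresh content.
        self._entries.pop(key, None)


calls = []


def fetch_navigation() -> dict:
    # Stand-in for a real API request; records each call for illustration.
    calls.append("navigation")
    return {"items": ["Tutorials", "Reference"]}


cache = ResponseCache()
first = cache.get_or_fetch("navigation", fetch_navigation)
second = cache.get_or_fetch("navigation", fetch_navigation)
print(len(calls))  # 1
```

The second lookup is served from memory, so the stand-in API function runs only once.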
The perks and perditions of caching
The web app caches responses from Delivery API for as long as necessary. If content creators make changes in Kontent.ai, the web app invalidates certain responses in its cache and gets fresh content. But how does the web app know which cached responses to invalidate?
Cache invalidation is fun. It gives you choices to make. For example:
- Rebuild the whole site like some static site generators do.
- Rebuild specific parts of the site.
- Rebuild regularly every few minutes.
- Rebuild only when something changes.
Your choices depend on whether you want to make it easy for yourself or for your content creators. We put the authoring experience of content creators first. After all, we're building a whole system for content creators. These people expect to see their content live a short while after publishing.
We went with rebuilding specific parts of the website when something changed. This meant reacting to webhooks. Long story short, the more structured your content model, the more fun you'll have verifying whether your cache invalidation works as intended.
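One way to invalidate only the affected parts is to record which content items each cached response depends on. The sketch below assumes a simplified, hypothetical webhook payload with a `changed_items` list of codenames. It is not the actual Kontent.ai webhook format, and `TaggedCache` and `handle_webhook` are illustrative names.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Set


class TaggedCache:
    """Cache that remembers which content items each entry depends on."""

    def __init__(self) -> None:
        self._entries: Dict[str, Any] = {}
        self._keys_by_codename: Dict[str, Set[str]] = defaultdict(set)

    def put(self, key: str, value: Any, depends_on: Iterable[str]) -> None:
        self._entries[key] = value
        for codename in depends_on:
            self._keys_by_codename[codename].add(key)

    def get(self, key: str) -> Any:
        return self._entries.get(key)

    def invalidate_for(self, changed_codenames: Iterable[str]) -> Set[str]:
        """Drop every entry that depends on a changed item; return dropped keys."""
        removed: Set[str] = set()
        for codename in changed_codenames:
            for key in self._keys_by_codename.pop(codename, set()):
                if key in self._entries:
                    del self._entries[key]
                    removed.add(key)
        return removed


def handle_webhook(cache: TaggedCache, payload: dict) -> Set[str]:
    # Hypothetical payload shape: {"changed_items": ["codename", ...]}.
    return cache.invalidate_for(payload.get("changed_items", []))


cache = TaggedCache()
cache.put("page:/tutorials/intro", "<html>intro</html>",
          depends_on=["intro_article", "navigation"])
cache.put("page:/tutorials/setup", "<html>setup</html>",
          depends_on=["setup_article", "navigation"])

removed = handle_webhook(cache, {"changed_items": ["intro_article"]})
print(removed)  # {'page:/tutorials/intro'}
```

Changing one article drops only the page built from it, while a change to a shared item such as navigation would drop every page that depends on it. That fan-out is exactly what makes a structured content model harder to verify.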
If bots bring you down, hide behind a CDN
Adding a basic caching mechanism helped increase the overall performance. The web app cached the API responses, so each unique API request was made only once. But the pages were still built anew for every request. Although the page building happened in a split second, it taxed the web app each time and didn't scale.
Even a tiny denial of service (DoS) attack by modern standards could bring our web app to a halt. We're talking 10,000 requests over the course of five minutes. Nothing dramatic. Still, it's enough to cause a brief downtime.
To make up for this lack of protection, we hid the web app behind a content delivery network (CDN) provided by Fastly. Every request went through the CDN before it reached the web app. Although a CDN adds a layer of complexity (more on that below), it's also a layer of protection that enables you to scale stress-free.
While the CDN might not be a silver bullet for everything, it ensures that your web app knows only about unique requests. If a request is repeated, the CDN has a response cached from the first time and returns the cached response. Less work for the web app, more speed for visitors.
If cache isn't used enough, investigate
With the CDN in place, things got a lot more stable. Just in case, we looked at the CDN logs for the first few weeks and found that the response times weren't stable. Turns out the CDN wouldn't allow certain pages in its cache. This meant our cache hit ratio oscillated and wasn't predictable. It was time to investigate.
Let the right one in
Solving problems is about asking the right questions. The question here was, "What kind of responses are allowed to enter the CDN's cache?" The answer? The responses that look the same for all visitors. It's a key concept to realize. It led us to look at the responses from the web app and their headers, and to discover a bug.
The problem arises if you (unknowingly and repeatedly) set the cookie on every page visit. Whenever the cookie is set, the response from the web app comes back with the set-cookie header. This makes the response uncacheable.
If responses set cookies, they cannot enter the cache. Otherwise, the CDN would serve that response to everyone and everyone would get the same set of cookies. That would be a problem on several levels. The CDN is smart and doesn't allow that.
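A minimal sketch of the idea: only set the cookie when the visitor doesn't have it yet, so that repeat responses carry no set-cookie header and stay shareable. The cookie name `visitor_prefs` and both helper functions are hypothetical, and this is one possible fix rather than a description of what we shipped.

```python
from typing import Dict, List, Tuple

Header = Tuple[str, str]


def is_shareable(response_headers: List[Header]) -> bool:
    """Treat any response that sets a cookie as visitor-specific."""
    return all(name.lower() != "set-cookie" for name, _ in response_headers)


def build_headers(request_cookies: Dict[str, str],
                  cookie_name: str, cookie_value: str) -> List[Header]:
    headers: List[Header] = [("Content-Type", "text/html; charset=utf-8")]
    # Set the cookie only if the visitor doesn't already send it back.
    if cookie_name not in request_cookies:
        headers.append(("Set-Cookie", f"{cookie_name}={cookie_value}; Path=/"))
    return headers


first_visit = build_headers({}, "visitor_prefs", "default")
returning = build_headers({"visitor_prefs": "default"}, "visitor_prefs", "default")

print(is_shareable(first_visit))  # False
print(is_shareable(returning))    # True
```

Checking response headers this way is also a quick method for spotting pages that can never enter a shared cache.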
Careful with cache policies
Cache policies and invalidation need to be smart as well. We had four levels of cache to work with.
- Delivery API – Caches successful API requests.
- Web app – Caches responses from the Delivery API.
- CDN – Caches responses from the web app.
- Browser – Caches responses from the CDN.
From these four, we can control the behavior of the web app and CDN. But it's not necessary to do different things at both levels. We decided to keep the main cache configuration in just the web app. It's easier to manage.
It took a bit of trial and error, following best practices on HTTP cache, and checking whether the changes made the difference we wanted. It was a formative experience that led to exploring how deep the cache-control rabbit hole goes. For example, did you know you could set different cache policies for the CDN and for the browser itself? All it takes is understanding that header's directives.
With this exercise, we were able to satisfy a core requirement we had: make the latest content changes available to every visitor quickly and efficiently.
We made sure the content from the web app was stored for different amounts of time in different places:
- In the web app, content was stored until a webhook said it was time to go. Once the time came, the web app also ensured the content was gone from the CDN, too.
- In the CDN, content was stored for up to a week. But only popular content stayed for so long because the CDN tries to be efficient. It might remove content from its cache if it doesn't see any demand for it.
- In browsers, content was stored for a few minutes. Then the browser should check again with the CDN.
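The browser and CDN lifetimes above can be expressed in a single cache-control header: the standard max-age directive applies to browsers, while s-maxage overrides it for shared caches such as a CDN. The sketch below assumes "a few minutes" means five minutes. The exact values and the helper name are illustrative, not our actual configuration.

```python
# Assumed values: "a few minutes" is taken as 5 minutes for illustration.
BROWSER_TTL_SECONDS = 5 * 60
# "Up to a week" in the CDN.
CDN_TTL_SECONDS = 7 * 24 * 60 * 60


def cache_control(browser_ttl: int, cdn_ttl: int) -> str:
    """Build a Cache-Control value with separate browser and shared-cache TTLs."""
    # max-age: how long browsers may reuse the response.
    # s-maxage: overrides max-age for shared caches such as a CDN.
    return f"public, max-age={browser_ttl}, s-maxage={cdn_ttl}"


header_value = cache_control(BROWSER_TTL_SECONDS, CDN_TTL_SECONDS)
print(header_value)  # public, max-age=300, s-maxage=604800
```

Keeping the browser lifetime short matters because a webhook can purge the web app and the CDN, but nothing can reach into a visitor's browser cache.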
Once we finished measuring performance and optimizing the invisible details, we started adding more content to our project and extending the portal. We added e-learning courses, API documentation for REST APIs, code samples. All in one place called Kontent.ai. To this day, this consolidation of content means we can easily find what we need, reuse it, and make changes quickly. And you can too.