How Kontent.ai handles disaster recovery
Things can go wrong anywhere—even in headless CMSs. How can you be sure that your website will be always-on if you use a SaaS?
Published on Feb 10, 2021
In this article, I will explain what a disaster recovery plan is, who is responsible for it, and why it’s a crucial part of any SaaS service.
Kontent.ai is a multi-tenant solution, so a single resource failure can cause an outage for many customers. We are well aware of these consequences, and they were the primary motivation for developing a solid disaster recovery plan that keeps both our business and yours running.
To create an effective disaster recovery plan, you first need to establish a team and define every member’s responsibilities. At Kontent.ai, the team looks like this:
With the right group of people, you can start thinking about questions such as: What assets do we own? What may happen to them? What should be restored first?
Let’s take a closer look at each of these questions.
What assets do we own? The answer should define the most critical data. In our case, they include:
Therefore, we know which resources need to be restored in a disaster recovery event:
It’s crucial to review the list regularly. Any implementation change may need to be reflected in the disaster recovery plan. This applies strongly to Kontent.ai, as we release new functionality every two weeks.
Once the asset inventory is completed, you move on to the risk assessment. For each resource, identify potential threats and analyze them before they occur. Always ask the same question: “What’s the worst-case scenario that our business might have to deal with?”
Our list contains, among others, the following very common scenarios:
Once you know what assets you have and what may happen to them, the next reasonable question is: “What should be restored first?”
It all depends on the asset’s criticality. After speaking to each data owner, you need to determine how critical each resource and the data stored on it really are to your business. Based on its RTO and RPO, each resource then gets its own set of disaster recovery controls.
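There are two important metrics defining your business continuity and disaster recovery strategy: the Recovery Time Objective (RTO), which is the maximum acceptable time to restore a service after a disruption, and the Recovery Point Objective (RPO), which is the maximum acceptable amount of data loss, measured as a period of time.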
In an ideal world, the RTO and RPO should both be as short as possible. In reality, you need to take your assets and prioritize the recovery according to their criticality and your budget.
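The prioritization described above can be sketched in a few lines of Python. This is only an illustration, not our actual tooling: the asset names, the 1-to-3 criticality scale, and the RTO and RPO values assigned to each asset are all hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Asset:
    name: str
    criticality: int   # hypothetical scale: 1 = most critical
    rto: timedelta     # maximum acceptable time to restore the asset
    rpo: timedelta     # maximum acceptable window of lost data


def recovery_order(assets):
    """Most critical first; ties are broken by tighter RTO, then tighter RPO."""
    return sorted(assets, key=lambda a: (a.criticality, a.rto, a.rpo))


# Illustrative inventory; every name and value here is an assumption.
inventory = [
    Asset("audit-logs", 3, timedelta(hours=48), timedelta(hours=24)),
    Asset("content-database", 1, timedelta(hours=12), timedelta(0)),
    Asset("media-storage", 2, timedelta(hours=12), timedelta(hours=1)),
]

for asset in recovery_order(inventory):
    print(asset.name)
```

Sorting on a tuple keeps the rule explicit: criticality decides first, and the objectives only break ties between equally critical assets.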
When it comes to Kontent.ai, the RPO equals zero minutes. We achieve this through incremental backups, which continuously capture and store all changes made since the last backup. In the event of a disaster, data can therefore be recovered up to the last recorded change, which minimizes data loss and disruption. In addition, thanks to regular disaster recovery drills, we have reduced our RTO to 12 hours. The RTO represents the maximum downtime; most incidents are resolved much faster and are documented on our status page. This strategy reflects our commitment to data protection and operational resilience.
Just as pilots, doctors, and firefighters need regular training, a good disaster recovery plan needs to be regularly tested and verified. Every year, we take the most critical assets from the asset inventory explained above and the most likely threats, and see how quickly we can recover from the disaster. We simulate:
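To make the two objectives concrete, here is a minimal sketch of how the outcome of an incident or a drill could be compared against them. The RPO of zero and the RTO of 12 hours come from the text above; the function name, its parameters, and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta

RPO = timedelta(0)         # zero minutes, as stated above
RTO = timedelta(hours=12)  # 12 hours, as stated above


def evaluate_incident(last_recoverable_change, outage_start, service_restored):
    """Compare the measured data loss and downtime against the objectives."""
    # Data loss is the gap between the last change that could be recovered
    # and the moment the outage began; it cannot be negative.
    data_loss = max(outage_start - last_recoverable_change, timedelta(0))
    downtime = service_restored - outage_start
    return {
        "data_loss": data_loss,
        "downtime": downtime,
        "rpo_met": data_loss <= RPO,
        "rto_met": downtime <= RTO,
    }


# Hypothetical drill: everything up to the outage was recovered,
# and service came back three and a half hours later.
result = evaluate_incident(
    last_recoverable_change=datetime(2021, 2, 10, 9, 0),
    outage_start=datetime(2021, 2, 10, 9, 0),
    service_restored=datetime(2021, 2, 10, 12, 30),
)
print(result["rpo_met"], result["rto_met"])
```

With an RPO of zero, any gap between the last recoverable change and the start of the outage counts as a miss, which is why continuous capture of changes matters.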
But it’s not all about hard skills. Successful disaster recovery is also about communication, cooperation with other teams, and making the right decisions under pressure. These soft skills are an integral part of any disaster recovery plan and an important component of our employee training.
Being a SaaS vendor is a huge responsibility: we are responsible for keeping all services running and for minimizing the impact of a disaster to an acceptable level. All the described activities help us continuously provide a great service and protect all our clients from unwanted disruptions.