Migrating content from a monolith CMS to a headless CMS
When working with CMSs (Content Management Systems) of any size, you should always think ahead and consider how you would manage migrating content from one CMS to another.
Ilesh Mistry, published on Mar 19, 2023
In this blog post, I’ll cover the basics you always need to consider when moving from monolith to headless CMS. First, let’s clarify what different types of CMSs there are.
Different CMS shapes and sizes
CMSs come in different shapes and sizes. They range from hard-coded, developer-only CMSs to CMSs where you only work in a graphical user interface and everything else, like the design, presentation, and hosting, is handled for you.
Pricing differs too: some CMSs are subscription based and others have fixed-cost plans. Some also take care of all the hosting and infrastructure for you. There are many more differences than the ones I’ve mentioned, and I’m not going to list them all here.
This blog post is not going to cover those differences, but I do want to outline the broad categories that CMSs are currently grouped into.
Different types of CMSs
There are many different types of CMSs available, and they can be broadly classified into the following categories:
- Monolith CMS: A monolith CMS is a traditional, all-in-one CMS that includes both the content management and presentation layers in a single, integrated system. Monolith CMSs are typically self-contained and provide all the tools and features needed to manage and publish content, as well as to design and build the front end of a website or application.
A few examples of this type of CMS are Kentico Xperience, Adobe Experience Manager (AEM), Sitecore, and Ektron.
- Headless CMS: A headless CMS is a content management system that decouples the content management and presentation layers. This means that the content is managed and stored in a separate system and can be accessed and consumed by any front-end application or platform via APIs. Headless CMSs are often more flexible and scalable than monolith CMSs but require more development expertise to implement. They are often used in composable architectures and allow you to use best of breed solutions.
A few examples of this type of CMS are Kontent.ai, Contentstack, Contentful, and Storyblok.
- Decoupled CMS: A decoupled CMS is like a headless CMS but includes some limited presentation capabilities. This means that the content can be managed and stored in a separate system, but the CMS also includes tools for designing and building the front end of a website or application. Decoupled CMSs offer more flexibility than monolith CMSs, but less than headless CMSs.
A few examples of this type of CMS are WordPress, Drupal, and Optimizely.
- Hybrid CMS: A hybrid CMS is a combination of a monolith and headless CMS. This means that the CMS includes both content management and presentation capabilities, but also allows the content to be accessed and consumed by external systems and applications via APIs. Hybrid CMSs offer a balance between the flexibility and scalability of headless CMSs and the ease of use and simplicity of monolith CMSs. Hybrid CMSs have some drawbacks: because they combine multiple systems and architectures, they can be more complex, more costly, and harder to maintain, and they may not be as flexible as headless CMSs.
A few examples of this type of CMS are Zesty.io, Agility CMS, Core dna, and Crafter CMS.
Overall, the type of CMS you choose will depend on your specific needs and requirements. Each type of CMS has its own strengths and weaknesses, and the right choice for your project will depend on factors such as your technical expertise, the complexity of your project, and your budget.
In my experience, enterprise clients who wanted to grow and needed to scale their content operations relied heavily on a headless approach. For them, it proved to be the approach that was future-proof and flexible, and it formed an essential part of their composable architecture, allowing them to choose the products that best met their requirements.
Content migration consideration steps
Content migration from a monolith CMS to a headless CMS can be a complex process, depending on the amount and complexity of your content. Core steps in the migration process include:
- Planning: Before starting the migration, it is important to plan out the process and identify any potential challenges or obstacles. This may include identifying the content that needs to be migrated, determining the format and structure of the content, and setting up a schedule for the migration.
- Exporting content: The next step is to export the content from the monolith CMS. This may involve using built-in export features or writing custom scripts to extract the content. It is important to ensure that the content is exported in a format that can be easily imported into the headless CMS.
- Transforming data: Once the content has been exported, it may need to be transformed to fit the format and structure expected by the headless CMS. This may involve cleaning up the content, splitting it into smaller pieces, or combining it with other data.
- Importing content: The final step is to import the transformed content into the headless CMS. This may involve using built-in import features or writing custom scripts to load the content into the CMS. It is important to carefully test the imported content to ensure that it has been migrated successfully.
Overall, content migration from a monolith CMS to a headless CMS can be a complex process, but careful planning, content strategy and execution can help ensure a successful migration.
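The export, transform, and import steps above can be sketched as a small pipeline. This is only an illustration under assumptions: the record fields (`Title`, `Body`, `external_id`) and the stand-in functions are hypothetical, and a real migration would replace them with calls to the source and target CMSs.

```python
from typing import Callable, Iterable


def run_migration(
    export: Callable[[], Iterable[dict]],
    transform: Callable[[dict], dict],
    load: Callable[[dict], None],
) -> dict:
    """Run export -> transform -> import and collect failures instead of stopping."""
    summary = {"imported": 0, "failed": []}
    for record in export():
        try:
            load(transform(record))
            summary["imported"] += 1
        except Exception as exc:  # keep going; failures are reported at the end
            summary["failed"].append({"id": record.get("id"), "error": str(exc)})
    return summary


# Hypothetical stand-ins for the real steps
def export_from_monolith():
    yield {"id": 1, "Title": " Home ", "Body": "<p>Welcome</p>"}
    yield {"id": 2, "Title": None, "Body": "<p>About</p>"}  # broken record


def to_headless_shape(record):
    return {
        "external_id": str(record["id"]),
        "title": record["Title"].strip(),
        "body": record["Body"],
    }


imported = []  # stands in for the target CMS
summary = run_migration(export_from_monolith, to_headless_shape, imported.append)
print(summary["imported"], "imported,", len(summary["failed"]), "failed")
```

Collecting failures rather than raising on the first one keeps a long run going and leaves a list of items to fix and retry.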
Let’s investigate the individual steps in a little more detail.
There are several things to consider when moving from a monolithic CMS to a headless CMS. This can be a challenging task, as it involves transferring all the content, including rich text, images, videos, and other media, from the old CMS to the new one.
You will need to consider how you will migrate your existing content from your monolithic CMS to your new headless CMS. This will likely involve exporting your content from your existing CMS, and then importing it into your new headless CMS. You will also need to consider how you will handle any custom data types or data structures that are specific to your existing CMS.
Additionally, you will need to think about how you will integrate your new headless CMS with the rest of your technology stack. This will likely involve building new APIs or integration points to connect your headless CMS to your front-end applications and other systems.
You will also need to plan for content or assets that require secure access, such as those only visible to authenticated users on the website. Consider how you will handle this separately from content that doesn’t require secure access.
Content migration is one of the major considerations when moving from one platform to another. Careful planning and preparation would need to take place to articulate what methods and tools are available to help with this process. There are also other important factors to consider like time, budget, and resource availability.
You might find that the existing monolith CMS has plugins and features that aided the content editor and that will not be available in the new headless CMS. It is wise to switch off such plugins so that their output doesn’t appear in the exported content.
Deciding what to keep and what to get rid of
It’s important to understand that there is no magic lift-and-shift functionality for this process; everything needs to be looked at. Key decisions need to be made on what content is kept and what content is no longer needed or could be created again in the new CMS.
You will need to identify the content you want to migrate—this may include pages, posts, images, videos, and other types of content. Placing the different types of content into categories and having some form of organized architecture will help with the process. Also, an up-to-date IA (Information Architecture) can help to list out the pages of content that need to be considered for migration or the pages that do not.
Example of grouping types of content
As each content migration has its differences, an example of such category grouping that could be used to segregate content could look like this:
- Content which is no longer needed and therefore can be ignored
- Structured content (identify the different types), e.g., articles, blog posts, products, etc.
- Rich content types (that have fewer structures, e.g., full page rich text content)
- Landing pages and related content pages, which are generally built up from other content
- One-off simple basic pages such as legal pages and contact us pages
- Local video content, if it is not hosted by a provider like YouTube, Vimeo, etc.
- Images, if they are not hosted by a DAM (Digital Asset Management tool)
- Content in page builders and widgets
- Personalization and variation content
- Custom tables, custom modules, and custom components
- Imported content from external sources, e.g., PIM (Product Information Management), CRM (Customer Relationship Management), etc.
- E-commerce content that is not imported from PIM
Another consideration is whether each category needs to cover localized content. If it does, that adds another level of complexity, as you will need to handle the content and its translations for every content category.
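To get a feel for the size of each group, a content inventory can be tallied per category and locale. The sketch below assumes a hypothetical mapping from source content types to migration groups and a hypothetical `type`/`locale` shape for inventory entries; neither comes from a specific CMS.

```python
from collections import Counter

# Hypothetical mapping from source content types to migration groups
GROUP_BY_TYPE = {
    "article": "structured",
    "blog_post": "structured",
    "product": "structured",
    "landing_page": "composed_page",
    "legal_page": "simple_page",
    "form": "needs_third_party",
}


def summarise_inventory(items):
    """Count items per (migration group, locale); unknown types become 'unclassified'."""
    counts = Counter()
    for item in items:
        group = GROUP_BY_TYPE.get(item["type"], "unclassified")
        counts[(group, item.get("locale", "default"))] += 1
    return counts


inventory = [
    {"type": "article", "locale": "en"},
    {"type": "article", "locale": "fr"},
    {"type": "blog_post", "locale": "en"},
    {"type": "widget_zone", "locale": "en"},
]
counts = summarise_inventory(inventory)
for (group, locale), total in sorted(counts.items()):
    print(group, locale, total)
```

Anything that lands in the "unclassified" group is a prompt to revisit the categorization before estimating effort.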
It’s also essential to look at how content is versioned in the monolith CMS. I recommend migrating only published content, as the destination CMS will have no version history from the old CMS. Therefore, ensure all content that needs to be migrated is in the correct published state.
You may have personalized content and content variations within the CMS. When moving to a headless CMS, it’s important to understand that features and functionality you utilized in the monolith CMS may not be available, and you will need to consider third-party offerings to help bridge this gap. A strategy for this should be factored in so the content can be utilized in the new CMS, which could put it into a “need to apply after migration” category.
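A simple check on publish state can flag items that need attention before the export. The `status` field and its values are assumptions for illustration; the real names depend on the source CMS.

```python
def split_by_publish_state(items):
    """Separate published items from everything else (drafts, archived, and so on)."""
    ready, needs_review = [], []
    for item in items:
        target = ready if item.get("status") == "published" else needs_review
        target.append(item)
    return ready, needs_review


exported = [
    {"id": "home", "status": "published"},
    {"id": "spring-sale", "status": "draft"},
    {"id": "old-news", "status": "archived"},
]
ready, needs_review = split_by_publish_state(exported)
print("Needs review:", [item["id"] for item in needs_review])
```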
Form and custom table data
A key aspect of headless CMSs is that they specialize in content management and integrate with specialist third-party vendors. So you will need to consider best-of-breed tools to handle features like form capture or data you stored in tabular format, and how to integrate them with the new CMS. Put a plan in place for this type of data.
Page builder and widget data
Many traditional CMSs have page builder and widget capabilities that aid the content editor; however, they are very specific to the CMS they are on. The data stored for them can sit in complex data tables and potentially hard-to-retrieve areas. It’s key to think about how you would extract this data. In some scenarios it’s probably not worth it, and it is easier to recreate the content in the new CMS within the new structures, as these features may not be available or similar in the new CMS.
It’s important to perform due diligence on different types of third-party vendors you will need to consider when using the new CMS. You may need to set up subscription plans and recruit the right resource and skills for the different integrations. Thinking about how this will be set up in the new CMS and how everything will work together will be important in the planning process.
Once you’ve identified the different types of content and planned the categorization and grouping, it will be important to collate the content for it to be processed either manually or via some form of automation tool.
Let’s investigate things that you would need to consider regarding exporting content from the existing CMS in our next step.
Exporting content from a monolithic CMS typically involves writing scripts or code that interact with the CMS’s database and API to retrieve the desired content and format it for export. The specifics of the process will depend on the CMS being used and the desired format of the exported content.
It is important to verify that you have the necessary permissions and access to export the content and ensure a backup of the content is completed before attempting to export content.
CMS export tools
Most CMSs provide tools for exporting content, such as XML or CSV exports, which you can use to export the content you have identified for migration. Start by identifying what the CMS provides; once you know this, you can define a strategy to export the content. The approach will vary depending on the tools available in the existing CMS, the complexity and categorization of the content, and how the content is structured within the existing CMS.
One thing I’ve learned is that having an export feature doesn’t automatically mean everything is okay; what matters more is what the export produces and how you can use it. For instance, I’ve come across an image export feature that, instead of exporting the CMS image (which was not in a DAM), provided only a reference identifier and other information for it, which was not what I required.
Custom export scripts
If the monolith CMS doesn’t provide the necessary tools to assist with exporting, you may need to write custom scripts to help with this process. Use the planning phase to decide which content segments to target with scripts and which could be handled quicker either manually or using some form of scraping mechanism. The quantity of content dictates the approach: the more content there is, the more a scripted export is required.
Content in the monolith CMS may still be edited by the existing team during the migration. You may need to consider a content lockdown, or use the last modified date or the ability to export only changes, and factor this into your solution when gathering the exported content.
Once you have export processes in place and the content is successfully stored somewhere, you will almost certainly need to cleanse, adjust, and rectify this content to suit a successful import process. Let’s investigate the next step in this journey after you have exported data.
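As a rough illustration of a custom export script, the sketch below reads published pages from a database and writes them as JSON Lines. The `pages` table and its columns are invented for the example, and an in-memory SQLite database stands in for the real CMS database, whose schema will look different.

```python
import io
import json
import sqlite3


def export_pages(connection, out):
    """Write each published page as one JSON object per line; return the count."""
    connection.row_factory = sqlite3.Row
    rows = connection.execute(
        "SELECT id, title, body, modified FROM pages WHERE published = 1 ORDER BY id"
    )
    count = 0
    for row in rows:
        out.write(json.dumps(dict(row)) + "\n")
        count += 1
    return count


# Hypothetical source data standing in for the monolith CMS database
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE pages (id INTEGER, title TEXT, body TEXT, modified TEXT, published INTEGER)"
)
db.executemany(
    "INSERT INTO pages VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Home", "<p>Hi</p>", "2023-01-10", 1),
        (2, "Draft", "<p>WIP</p>", "2023-02-01", 0),
    ],
)

buffer = io.StringIO()  # a real script would write to a file instead
exported_count = export_pages(db, buffer)
print(exported_count, "page(s) exported")
```

One JSON object per line keeps the export easy to stream, inspect, and split into batches later.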
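If a full content lockdown is not possible, one option is a delta export based on the last modified date. The sketch assumes each exported item carries a hypothetical `modified` field holding an ISO 8601 timestamp with a UTC offset.

```python
from datetime import datetime, timezone


def changed_since(items, last_export_at):
    """Return the items modified after the previous export run."""
    return [
        item
        for item in items
        if datetime.fromisoformat(item["modified"]) > last_export_at
    ]


last_export_at = datetime(2023, 3, 1, tzinfo=timezone.utc)
items = [
    {"id": "a", "modified": "2023-02-20T09:00:00+00:00"},
    {"id": "b", "modified": "2023-03-05T14:30:00+00:00"},
]
delta = changed_since(items, last_export_at)
print([item["id"] for item in delta])
```

Recording the time of each export run gives the next run its cut-off, so only items edited in between need to be exported again.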
Once you have exported the content from the monolith CMS, the data could be in various formats, depending on the export solutions used. You will need to clean and prepare it for import into the headless CMS. This may include removing any duplicates, fixing any formatting issues, and ensuring that the content is in a format that can be easily imported into the headless CMS.
You may find that some images are not optimized or are of poor quality, and in some cases you must decide whether they should be imported into the new CMS at all. In those situations, it’s worth having some basic image requirements to help you filter out the good from the bad.
Incorporating naming conventions for images and documents will always help, as you don’t want to bring bad asset naming habits into the new CMS!
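A naming convention is easier to enforce with a small helper. The convention used here (lower case, ASCII only, hyphen separated) is just one possible choice, not a requirement of any particular CMS.

```python
import re
import unicodedata
from pathlib import PurePosixPath


def normalise_asset_name(filename):
    """Return a lower-case, ASCII-only, hyphen-separated name; the extension is kept."""
    path = PurePosixPath(filename)
    # Decompose accented characters, then drop anything that is not ASCII
    stem = unicodedata.normalize("NFKD", path.stem).encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into a single hyphen
    stem = re.sub(r"[^a-zA-Z0-9]+", "-", stem).strip("-").lower()
    return f"{stem or 'asset'}{path.suffix.lower()}"


print(normalise_asset_name("IMG_0042 (Final) COPY.JPG"))
print(normalise_asset_name("Café Menu.PDF"))
```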
Any revisions and duplications would need to be removed to avoid clutter in the new CMS.
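If the export contains several revisions of the same item, one way to reduce it is to keep only the newest revision per item. The `id` and `version` fields are assumed for the example.

```python
def latest_revisions(items):
    """Keep one entry per source id: the one with the highest version number."""
    latest = {}
    for item in items:
        current = latest.get(item["id"])
        if current is None or item["version"] > current["version"]:
            latest[item["id"]] = item
    return list(latest.values())


revisions = [
    {"id": "about", "version": 1, "title": "About"},
    {"id": "about", "version": 3, "title": "About us"},
    {"id": "about", "version": 2, "title": "About (old)"},
    {"id": "contact", "version": 1, "title": "Contact"},
]
cleaned = latest_revisions(revisions)
print([(item["id"], item["version"]) for item in cleaned])
```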
Simplifying rich text content
It’s important to realize, and people sometimes forget this, that monolith and headless CMSs are not like for like, and you can’t just lift and shift content without running into issues. One reason is that each CMS has its own rules and requirements. An example could be the type of rich text editor it uses: if the content you wish to import contains forbidden elements, the CMS may reject the value during the import process, causing errors and delays.
Another important aspect of headless CMSs is that their focus is on serving content to multiple channels. For this reason, you need to avoid styling markup and presentation specifics, as they would get stripped out. If everything sits within one big rich text field with lots of styling and nested components, it will be tricky to dissect it and get it into the format of the new CMS. My recommendation when working with rich text elements is to go minimalistic and remove any unwanted, unrecognized, or disallowed tags before importing the content into the new CMS.
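A minimal sketch of this kind of clean-up, using only the Python standard library, is shown below. The allow-list of tags and attributes is an assumption; replace it with the elements that the rich text editor of your target CMS actually accepts. Disallowed tags are removed but their text is kept, while `script` and `style` elements are dropped together with their contents.

```python
from html import escape
from html.parser import HTMLParser

# Assumed allow-list; adjust to what the target CMS accepts
ALLOWED_TAGS = {"p", "h2", "h3", "ul", "ol", "li", "strong", "em", "a", "br"}
ALLOWED_ATTRS = {"a": {"href"}}
DROP_WITH_CONTENT = {"script", "style"}
VOID_TAGS = {"br"}


class RichTextCleaner(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in ALLOWED_TAGS:
            allowed = ALLOWED_ATTRS.get(tag, set())
            kept = [(k, v) for k, v in attrs if k in allowed and v is not None]
            attr_text = "".join(f' {k}="{escape(v)}"' for k, v in kept)
            self.parts.append(f"<{tag}{attr_text}>")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(escape(data, quote=False))


def clean_rich_text(html):
    """Return the markup with only allow-listed tags and attributes left in place."""
    cleaner = RichTextCleaner()
    cleaner.feed(html)
    cleaner.close()
    return "".join(cleaner.parts)


source = '<div class="hero"><p style="color:red">Hello <b>world</b></p><script>alert(1)</script></div>'
print(clean_rich_text(source))
```

An allow-list is used rather than a block-list so that any element the script has never seen before is removed by default instead of slipping through to the import.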
Formats of data fields
You may also need to look at the format of the different data fields from the new CMS and map them via transformation scripts to meet the new requirements. There could be instances where the text, date, and number fields could be in slightly different data formats, and this can cause a lot of pain if it is not sorted out before the import process.
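Field conversions like these are usually small functions applied per field. In the sketch below, the field names, the day/month/year source date format, and the comma thousands separator are all assumptions about a hypothetical export.

```python
from datetime import datetime
from decimal import Decimal


def to_iso_date(value, source_format="%d/%m/%Y"):
    """Convert a date string in the assumed source format to ISO 8601 (YYYY-MM-DD)."""
    return datetime.strptime(value.strip(), source_format).date().isoformat()


def to_number(value):
    """Parse numbers such as '1,299.50' into a Decimal, dropping thousands separators."""
    return Decimal(value.replace(",", "").strip())


# Hypothetical field names mapped to their converters
FIELD_CONVERTERS = {"publish_date": to_iso_date, "price": to_number}


def convert_fields(record):
    """Apply a converter where one is registered; pass other fields through unchanged."""
    return {
        key: FIELD_CONVERTERS[key](value) if key in FIELD_CONVERTERS else value
        for key, value in record.items()
    }


converted = convert_fields(
    {"title": "Lamp", "publish_date": "19/03/2023", "price": "1,299.50"}
)
print(converted)
```

A value that does not match the expected format raises an error here, which is usually preferable to importing a silently wrong date or number.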
Script tags in content
Monolith CMSs often allow script tags to be included in content. Plan to remove these, along with anything else that might disrupt the import process or cause it to fail in the new CMS.
Once the content is in a good state, the next step is to build or use an existing import process to port the content over into the new CMS.
Finally, you need to import the content into the headless CMS. Most headless CMSs provide tools for importing content, such as APIs (Application Programming Interfaces) or CSV imports. You can use these tools to import the content into the headless CMS and make it available for use in your applications.
Importing content into a headless CMS example
If you can map the content correctly for the new CMS, then there are potentially a few ways to import the content into a headless CMS like Kontent.ai.
You can also check out my blog post Adventures in Content Migration, which talks about the pros and cons of a situation I had to deal with in the past. You may select a different option from what I went with depending on your needs.
Import tool options within a headless CMS
Depending on the headless CMS you are using, there may already be tools that allow for content importing, whether through a CLI (Command Line Interface) or a bulk import tool. If the mapping of structured content types works using this approach, then you are on to a winner. If not, there will be plenty of ways to use the headless CMS APIs to create an import process.
When looking to import content it is important to avoid duplication and try to enforce reusability where possible, especially if you want to take advantage of using a headless CMS.
There could also be a mapping exercise that goes through the cleaned content and maps it to the type it needs to go into in the new CMS. Content will need to be in a structured format if you want to avoid starting over again.
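One way to capture the outcome of such a mapping exercise is a declarative table from source types and fields to target types and fields. All names below (`NewsPage`, `Headline`, `article`, and so on) are hypothetical, and the output shape is illustrative rather than the payload format of any specific headless CMS.

```python
# Hypothetical mapping: source type -> (target type, {source field: target field})
TYPE_MAP = {
    "NewsPage": (
        "article",
        {"Headline": "title", "BodyHtml": "body", "Teaser": "summary"},
    ),
}


def map_item(source):
    """Translate one cleaned source item into the structure of the target type."""
    target_type, field_map = TYPE_MAP[source["type"]]
    missing = [name for name in field_map if name not in source["fields"]]
    if missing:
        raise ValueError(f"{source['id']}: missing source fields {missing}")
    return {
        "type": target_type,
        "external_id": source["id"],
        "elements": {dst: source["fields"][src] for src, dst in field_map.items()},
    }


mapped = map_item(
    {
        "id": "news-42",
        "type": "NewsPage",
        "fields": {"Headline": "We moved", "BodyHtml": "<p>New office</p>", "Teaser": "Big news"},
    }
)
print(mapped)
```

Keeping the source identifier on the mapped item makes it possible to update the same item on a later run instead of creating a duplicate.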
Look into the headless CMS APIs when importing images and documents
When it comes to importing images and documents, it’s essential to investigate any APIs that the new CMS provides. Headless CMSs generally have an API you can use to perform the import process for assets, so it would be worth looking into this. Whatever you do when importing assets, it’s vital to set up some form of organization with the CMS asset/document library to keep things neat, tidy, and easy to find within it. One of the worst things to find when using a new CMS is that all the imported assets are sitting in a single folder.
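Deciding on the folder for each asset before the import can be as simple as a lookup on file type plus the site section it belongs to. The folder names and the `section` argument are assumptions for illustration; how folders are actually created depends on the asset API of the CMS you use.

```python
from pathlib import PurePosixPath

# Assumed folder layout, keyed by file extension
FOLDER_BY_EXTENSION = {
    ".jpg": "images",
    ".jpeg": "images",
    ".png": "images",
    ".svg": "images",
    ".pdf": "documents",
    ".docx": "documents",
    ".mp4": "video",
}


def target_folder(filename, section):
    """Return a folder path such as 'documents/products' for an asset."""
    extension = PurePosixPath(filename).suffix.lower()
    kind = FOLDER_BY_EXTENSION.get(extension, "other")
    return f"{kind}/{section}"


print(target_folder("brochure.PDF", "products"))
print(target_folder("team-photo.jpg", "about"))
```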
Testing your import process
Being able to run the import process on a testing environment in the new CMS is highly recommended. Speaking from experience, the import process never works the first time around, and you are bound to get teething issues where tweaks need to be made.
The ability to update already-imported content is highly beneficial too, as you don’t want to be waiting for long imports, especially for small changes. It’s highly recommended to build in clear error messaging and error handling so that the import process doesn’t disrupt other functionality or break halfway through, and so that you can quickly resolve issues and reimport the items that failed. A batched approach with clearly visible status updates to see progress is recommended too.
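A batched import with per-item error handling might look like the sketch below. The `import_one` callable is a placeholder for whatever call creates or updates an item in the target CMS; `flaky_import` only simulates a temporary failure so that the retry can be shown.

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of up to `size` items."""
    iterator = iter(iterable)
    while batch := list(islice(iterator, size)):
        yield batch


def import_in_batches(items, import_one, batch_size=50, report=print):
    """Import items batch by batch; return (item, error) pairs for the failures."""
    failed = []
    done = 0
    total = len(items)
    for batch in batched(items, batch_size):
        for item in batch:
            try:
                import_one(item)
            except Exception as exc:  # one bad item must not stop the whole run
                failed.append((item, str(exc)))
        done += len(batch)
        report(f"{done}/{total} processed, {len(failed)} failed so far")
    return failed


# Simulated importer: item 3 fails on the first attempt only
attempts = {}


def flaky_import(item):
    attempts[item["id"]] = attempts.get(item["id"], 0) + 1
    if item["id"] == 3 and attempts[3] == 1:
        raise RuntimeError("timeout")


items = [{"id": i} for i in range(1, 6)]
messages = []
failed = import_in_batches(items, flaky_import, batch_size=2, report=messages.append)
# Reimport only the items that failed
still_failed = import_in_batches(
    [item for item, _ in failed], flaky_import, report=messages.append
)
print("\n".join(messages))
```

Returning the failed items with their error messages means a fix can be followed by a short rerun over just those items rather than a full import.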
Content that can’t be imported
There will be content that either can’t be imported or needs further action. This can range from forms and custom table data to personalization and content that is imported from elsewhere.
This type of content would need to be categorized in the planning and research stage, and then managed with a plan of execution.
An example of this could be personalized content. Dealing with this requires incorporating the researched and agreed upon third-party vendor to perform personalization within the new CMS, based on an established content strategy. The same would be the case for forms and capturing form data.
Another example is content that was imported into the monolith CMS from an external source; this integration would need to be configured for the new CMS as well.
Even with all the planning and automation that you have in place, you will find a group of content that needs to be entered manually. It’s essential to have a contingency plan for this situation, including potential solutions to speed up this process and avoid as much manual content entry as possible. When doing this, you would need to categorize the manual content into groups.
Depending on the quantity and complexity of the groups, I would recommend building a simple capture form to collate content from the content entry team so that a tool could be built to import the content into the complex structures of the target CMS. The reason for this is that the content entry team might not be too worried at the early stages about how it is configured in the new CMS and is more concerned about getting the data into the CMS in the first instance.
Examples of such data collection techniques can range from adding data into CSVs, spreadsheets, documents, and easy-to-use custom forms. Once you have this data you can then import it into the new CMS by creating custom CMS API scripts, therefore avoiding manual content entry into the CMS, especially for more complex situations.
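As an example of this approach, the sketch below turns rows collected in a CSV file into structured items ready to be sent to a CMS API. The column names (`title`, `tags`), the semicolon separator for tags, and the output shape are assumptions for illustration.

```python
import csv
import io


def rows_to_items(csv_text, content_type):
    """Turn rows collected from the content entry team into structured items."""
    reader = csv.DictReader(io.StringIO(csv_text))
    items = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header row
        title = (row.get("title") or "").strip()
        if not title:
            raise ValueError(f"line {line_no}: title is required")
        tags = [tag.strip() for tag in (row.get("tags") or "").split(";") if tag.strip()]
        items.append({"type": content_type, "elements": {"title": title, "tags": tags}})
    return items


collected = "title,tags\nOpening hours,store; info\nReturns policy,\n"
items_from_csv = rows_to_items(collected, "simple_page")
print(items_from_csv)
```

Validating each row and reporting its line number gives the content entry team quick feedback on what to correct in the spreadsheet.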
At a high level, this blog post has shown that there are many factors to consider, and I would say there isn’t “one tool that works for all” when it comes to content migration. There will also be some content that needs to be entered manually and additional configuration steps to complete.
Migrating content from a monolith CMS to a headless CMS requires careful planning and attention to detail. It is important to follow the steps outlined above to ensure a successful migration and avoid any potential issues.
Overall, moving from a monolithic CMS to a headless CMS can be a complex and challenging process, but it can also provide many benefits, such as increased flexibility, scalability, and the ability to deliver content to a wide range of clients. It is important to carefully evaluate your specific needs and requirements and to plan your migration carefully to ensure a smooth and successful transition.
As you can imagine, there will always be differences, whether in the content, the structured data, the assets, or even the versions of the CMS. The key takeaway is to ensure everything is planned out and estimated, and that the right team resources are scheduled in, so the content migration from a monolith CMS to a headless one can be executed as efficiently as possible.
For more tips and tricks on how to best handle migrated content into a headless CMS like Kontent.ai, listen to Brian McKeiver's podcast Kontent Rocks 16 - Talking about Content Migration.