Content Migration with an API That Fits Like a Glove
An API's design dramatically shapes how you write the rest of your code. The Content Management API lets you import existing content and much more. Let's see how this API's design influences your code.
Evaluating an API from a developer’s point of view is like choosing trekking shoes. Since they are going to stick around for a pretty damn long time, they had better be comfortable, or you are asking for some severe blisters. If you work with Kentico Cloud, you may be familiar with the Delivery API, which is used to pull content into your app. The Content Management API (or CM API for short), on the other hand, can modify the source data in Kentico Cloud directly instead of only delivering it. There are many cases in which you may need the CM API, such as updating content automatically or migrating content between two systems. Today, we’re going to focus on importing existing content from your old CMS into Kentico Cloud. Migrating existing content can cause unwanted friction in your content production flow and delay the whole project. You can read more about the business side of things in my previous blog post on how to gain the edge when importing existing content.
A Few Words on Content Migration
Imagine you’re migrating existing content for a huge project. There are a lot of content items and assets, and everything is densely interlinked. Generally speaking, you can divide content migration into these four high-level steps:
- Defining the content model in the new CMS
- Collecting and revising available content
- Reassembling old content according to the new content model
- Migrating content
Now, the toughest nut to crack is the model reassembly. When transforming content into the new model, you need to make sure all links work correctly after the import. Content can be massively interlinked, and links can appear within rich text as well as within structured content.
Let’s get to the heart of this issue and find a solution. There are two things you need to take care of to import links successfully. First, you need to know the ID of the linked content or asset so that you can use it as a reference. Second, most systems require referenced objects to exist, which dictates the order in which you import objects: when you import content with links, you need to make sure the referenced objects have already been imported.
In the rest of this article, I'll show you how to optimize the import process and solve the issues above by using the CM API. Since the CM API offers you much more than I'll be able to show you in this short piece, you can visit our developer hub to find detailed information in the CM API reference.
Reassembling Relationships Before Migration
So, let’s tackle the issues one by one. The first challenge is figuring out the ID when transforming links to the new model. Most systems insist on being in control of what the ID looks like, because they need to ensure its uniqueness and internal shape. Before the migration, you only know the old system ID. During model transformation, however, you need to know the new system ID to assemble all relationships. Naturally, this is challenging when the system doesn't give you any control over how the ID is created.
Most migration tools I’ve seen compensate for this issue with an ID mapping dictionary. They store pairs of object IDs, mapping the old system ID to the new system ID. However, if the API gives you more control over ID creation, there’s a more elegant and practical solution.
One of the neatest features of the CM API is that you can create and update content items and assets using an external ID, for example, the ID from your old CMS. Although Kentico Cloud still creates a new ID automatically, you’re able to use the external ID to retrieve, link, or modify content.
So, let’s see the CM API in action. The following request creates a content item in Kentico Cloud and explicitly specifies its external ID. If you execute the request again, Kentico Cloud finds the content item with the given external ID and updates it. Therefore, you don’t need to store any additional mapping tables when reassembling the content graph to the Kentico Cloud model.
curl --request PUT \
  --url https://manage.kenticocloud.com/projects/0aa7de3e-6e10-47fc-879e-1b70471cd8df/items/external-id/59713 \
  --header 'Authorization: Bearer <YOUR_API_KEY>' \
  --header 'Content-type: application/json' \
  --data '
{
  "name": "Using CM API is so straightforward",
  "type": {
    "codename": "article"
  },
  "sitemap_locations": []
}'
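If your import script is written in Python rather than shell, the same upsert could be sketched roughly as follows. The URL shape and request body are taken from the curl example above; the function name `build_upsert_item_request` and its parameters are made up for illustration, and the request is only built, not sent.

```python
import json
import urllib.parse
import urllib.request

# Base URL copied from the curl example; everything else here is illustrative.
BASE_URL = "https://manage.kenticocloud.com/projects"


def build_upsert_item_request(project_id, api_key, external_id, name, type_codename):
    """Build (but do not send) a PUT request that upserts an item by external ID."""
    # Escape the external ID in case the old CMS uses characters unsafe in URLs.
    safe_id = urllib.parse.quote(str(external_id), safe="")
    url = f"{BASE_URL}/{project_id}/items/external-id/{safe_id}"
    body = {
        "name": name,
        "type": {"codename": type_codename},
        "sitemap_locations": [],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-type": "application/json",
        },
    )


request = build_upsert_item_request(
    project_id="0aa7de3e-6e10-47fc-879e-1b70471cd8df",
    api_key="<YOUR_API_KEY>",
    external_id="59713",
    name="Using CM API is so straightforward",
    type_codename="article",
)
# Sending is left out on purpose; urllib.request.urlopen(request) would execute it.
```

Because the old ID is part of the URL, running the script twice targets the same content item, so the script needs no lookup table of its own.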
Ordering Imported Content
You might think an import script simply scans through the old system, scrapes all content, and puts it into the new system. In practice, you will probably end up trying to link content that is only going to be imported later in the run. Most systems require content and assets to be present at the time you operate on them, for example when you link them. You might try to figure out an import order that avoids the problem. However, linking to nonexistent objects is hardly avoidable, because your content might contain circular references. With related articles, this situation is all too typical.
One way to solve linking nonexistent content is to execute the import in multiple phases. First, you import all content and assets without relationships. Then, you reread all existing content, this time reconstructing the relationships.
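To make the multi-phase approach concrete, here is a minimal sketch against an in-memory stand-in for a strict system. `StrictStore` and `two_phase_import` are hypothetical names, not part of any real CMS client; the store simply refuses links to objects it hasn't seen yet.

```python
class StrictStore:
    """Stand-in for a system that rejects links to objects it does not know yet."""

    def __init__(self):
        self.items = {}

    def save(self, item_id, links):
        missing = [link for link in links if link not in self.items]
        if missing:
            raise KeyError(f"{item_id} links to missing items: {missing}")
        self.items[item_id] = list(links)


def two_phase_import(source, store):
    """Import every item without links first, then add the links in a second pass."""
    for item_id in source:  # phase 1: create bare items
        store.save(item_id, links=[])
    for item_id, links in source.items():  # phase 2: reconstruct relationships
        store.save(item_id, links=links)


# Two articles that list each other as related: no single-pass order works.
old_content = {"5116": ["8723"], "8723": ["5116"]}
store = StrictStore()
two_phase_import(old_content, store)
```

The price of this approach is that every item is written twice and the script has to read the source content two times.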
Luckily, an import can be more straightforward when the system doesn't require the objects to exist during the import. The CM API has this covered. Content items and assets don’t need to exist before you reference them in links. Instead, Kentico Cloud creates a link to the nonexistent content item or asset, and the link starts working automatically once you import the dependency. Therefore, you can import the existing content in any order, and in the end, all relationships and links work.
Let me demonstrate it using articles. Here’s how you can import an article and all links within it by specifying the external IDs of linked content in rich text and modular content.
curl --request PUT \
  --url https://manage.kenticocloud.com/projects/0aa7de3e-6e10-47fc-879e-1b70471cd8df/items/external-id/5116/variants/00000000-0000-0000-0000-000000000000 \
  --header 'Authorization: Bearer <YOUR_API_KEY>' \
  --header 'Content-type: application/json' \
  --data '
{
  "elements": {
    "body": "<p>You can import <a data-item-external-id=\"8723\">related article</a> in one go.</p>",
    "related_articles": [
      {
        "external_id": "8723"
      }
    ]
  }
}'
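In a real migration, the body above would be generated from the old markup. The following sketch assumes, purely for illustration, that the old CMS renders links as `<a href="/articles/<id>">`; the pattern `OLD_LINK` and the function `to_variant_payload` are hypothetical, while the `data-item-external-id` attribute and the `external_id` property come from the request above.

```python
import json
import re

# Hypothetical shape of links in the old CMS: <a href="/articles/<id>">
OLD_LINK = re.compile(r'<a href="/articles/(\d+)">')


def to_variant_payload(old_body_html, related_ids):
    """Rewrite old links to external-ID links and build the variant body."""
    body = OLD_LINK.sub(r'<a data-item-external-id="\1">', old_body_html)
    return {
        "elements": {
            "body": body,
            "related_articles": [{"external_id": rid} for rid in related_ids],
        }
    }


payload = to_variant_payload(
    '<p>You can import <a href="/articles/8723">related article</a> in one go.</p>',
    related_ids=["8723"],
)
print(json.dumps(payload, indent=2))
```

Note that the transformation only needs the old IDs. It never has to ask Kentico Cloud which ID the linked article received, or whether that article has been imported yet.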
Leaving No Room for Mistakes
From what I’ve written so far, the CM API is clearly a tool for administrators and developers who are well aware of how Kentico Cloud works. It imposes very few restrictions; as you’ve seen, it even lets you create references to nonexistent content. However, there’s a way to remain confident: you can check for errors after the import. The CM API provides you with granular error messages and an endpoint for testing the validity of your project. Here’s the kind of error response you get for an invalid rich text value:
{
  "request_id": "8000549e-0002-af00-b63f-84710c7967bb",
  "error_code": 5,
  "message": "The provided request body is invalid. See the 'validation_errors' attribute for more information and specify a valid JSON object.",
  "validation_errors": [
    {
      "message": "Invalid rich text value. Expected an opening tag of one of the supported elements, but got a text. Please see https://developer.kenticocloud.com/reference#content-management-api-rich-text-element.",
      "path": "elements.body",
      "line": 1,
      "position": 1
    }
  ]
}
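An import script can turn such a response into readable log lines. Here is a small sketch that relies only on the fields visible in the response above; the function name `describe_error` is made up, the sample messages are trimmed for brevity, and the fallback for a missing `validation_errors` attribute is an assumption on my side.

```python
import json

# A shortened copy of the error response above; messages are trimmed for brevity.
SAMPLE_RESPONSE = """
{
  "request_id": "8000549e-0002-af00-b63f-84710c7967bb",
  "error_code": 5,
  "message": "The provided request body is invalid.",
  "validation_errors": [
    {
      "message": "Invalid rich text value.",
      "path": "elements.body",
      "line": 1,
      "position": 1
    }
  ]
}
"""


def describe_error(response_text):
    """Turn a CM API error body into human-readable log lines."""
    error = json.loads(response_text)
    lines = [f"Error {error['error_code']}: {error['message']}"]
    # Assumption: some errors may carry no validation details, so default to none.
    for detail in error.get("validation_errors", []):
        location = f"{detail['path']} (line {detail['line']}, position {detail['position']})"
        lines.append(f"  {location}: {detail['message']}")
    return lines


for line in describe_error(SAMPLE_RESPONSE):
    print(line)
```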
Calling the validation endpoint scans all content types, content items, and assets in your project and returns every issue it finds. Most importantly, it reports all references to nonexistent objects, but it also warns you about content items that violate content type limitations, such as the content types allowed in modular content elements. A sample report:
{
  "project": {
    "id": "0aa7de3e-6e10-47fc-879e-1b70471cd8df",
    "name": "Sandbox"
  },
  "variant_issues": [
    {
      "item": {
        "id": "b48284b4-705e-5370-b30a-f7b7a240e00b",
        "name": "Using CM API is so straightforward",
        "codename": "using_cm_api_is_so_straightforward"
      },
      "language": {
        "id": "00000000-0000-0000-0000-000000000000",
        "name": "English",
        "codename": "default"
      },
      "issues": [
        {
          "element": {
            "id": "1b78fa7a-08b4-4ef7-abd7-79da5c47864e",
            "name": "Body",
            "codename": "body"
          },
          "messages": [
            "Element 'Body' contains a content link referencing a non-existent content item 7abbcbb8-2ac2-509d-b028-49cf8e299793."
          ]
        },
        {
          "element": {
            "id": "defbf810-72d1-4b3a-837c-5d3f161e11c2",
            "name": "Related articles",
            "codename": "related_articles"
          },
          "messages": [
            "Element 'Related articles' references a non-existent content item 7abbcbb8-2ac2-509d-b028-49cf8e299793."
          ]
        }
      ]
    }
  ]
}
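After a large import, you probably want a summary of what is still missing rather than the raw report. The sketch below groups the dangling references from a report shaped like the one above. The helper `missing_references` is hypothetical, and pulling the ID out of the message text with a regular expression is an assumption based solely on the two sample messages, so treat it as fragile.

```python
import re

# Assumption: messages name the missing item as in the sample report above.
MISSING_ID = re.compile(r"non-existent content item ([0-9a-f-]{36})")


def missing_references(report):
    """Map each missing item ID to the (item, element) codename pairs that reference it."""
    missing = {}
    for variant in report.get("variant_issues", []):
        for issue in variant["issues"]:
            for message in issue["messages"]:
                match = MISSING_ID.search(message)
                if match:
                    missing.setdefault(match.group(1), []).append(
                        (variant["item"]["codename"], issue["element"]["codename"])
                    )
    return missing


# The sample report, reduced to the fields the helper actually reads.
report = {
    "variant_issues": [
        {
            "item": {"codename": "using_cm_api_is_so_straightforward"},
            "issues": [
                {
                    "element": {"codename": "body"},
                    "messages": [
                        "Element 'Body' contains a content link referencing a non-existent content item 7abbcbb8-2ac2-509d-b028-49cf8e299793."
                    ],
                },
                {
                    "element": {"codename": "related_articles"},
                    "messages": [
                        "Element 'Related articles' references a non-existent content item 7abbcbb8-2ac2-509d-b028-49cf8e299793."
                    ],
                },
            ],
        }
    ]
}

for item_id, usages in missing_references(report).items():
    print(item_id, "is referenced by", usages)
```

An empty result after the final import run is a good sign that every link found its target.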
Kentico Cloud provides you with a powerful Content Management API to manipulate the content and assets in your project. It is designed to alleviate the common pains developers have when writing imports, such as reassembling content relationships or working out the order in which objects are imported. The CM API keeps evolving, so stay tuned for features that will let you do more than import content and assets. So, why don’t you sign in today and give it a try?