Searching Headless Content with Algolia, Rails, Kontent & Next.js

By Tom Marshall, Sep 29, 2021

In this guest article, Tom Marshall from Kyan walks you through the process of delivering engaging and fast search experiences that return relevant results.

As anyone who has implemented search functionality before will know, search is a deceptively complex problem. Google set a high bar. Users naturally assume that search experiences will return exactly what they’re after in the first few results whilst forgiving any mistakes or inaccuracies in their input.

From the user’s perspective, search should ‘just work,’ but that’s easier said than done.

A headless CMS decouples the content from the presentation layer. That provides many benefits, which is why we love headless, but search functionality requires that presentational knowledge. Without this context of how the content comes together to form the site’s pages, headless CMSs cannot provide search solutions capable of meeting users’ expectations out of the box.

Thankfully, integrations with Search-as-a-Service products like Algolia offer a solution here.

At Kyan, we deliver search experiences that are engaging, fast, and most importantly, return relevant results. In this article, I’ll show you how, with Algolia, Next.js, Ruby on Rails, and Kontent by Kentico.

Our system architecture

We have a Next.js project as our web front end, which fetches content from the Kontent headless CMS, speaks to a Rails API for additional functionality (e.g., comments), and integrates Algolia’s Instant Search to provide the search UI.

Kontent is the canonical content store, but Algolia cannot read from the Kontent Delivery API in real time; that would be too slow. It needs its own copy of the content. We could regularly copy the content in bulk, e.g., with a nightly cron job, but we want our search results to remain in sync as content changes are published.

We want changes to be pushed, not pulled.

Kontent emits webhook notifications as content changes are published, but we cannot connect the raw Kontent webhook notifications to Algolia directly. We need something in the middle to listen for those webhook notifications, extract the content relevant to the search, then push that to Algolia.

At Kyan, we love Rails for building backend services and APIs, so for us, that’s going to be the Rails app, but it could equally be a Next.js API endpoint or serverless function. It just needs to be something that speaks HTTP and ideally has open-source client libraries available for the Kontent Delivery and Algolia APIs, though that’s not essential.

Designing the index

Algolia does not need a complete copy of the content, only the fields relevant to the search functionality and UI. To determine which fields are required, we need to answer the following questions.

Which fields are required:

  • For matching search terms against
  • To filter the results with
  • To order results by
  • To display on result cards
  • And, finally, any IDs needed for reference

Using blog articles as an example:

Search use | Fields required
For matching search terms against | Title & Body
To filter the results with | Tags & Author
To order results by | Publish Date
For presenting results | Image, Estimated Reading Time & URL Slug
IDs for referencing | Kontent Item ID

Some fields may fall into multiple categories, which is fine; we just need a complete set.

Now that we know which fields we need for the search functionality, we can set up our Rails application to maintain the index within Algolia.
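To make the field list concrete, here is a rough sketch of what a single search record could look like, written as a plain Ruby hash. The title and Kontent item ID come from the demo content used later in this article; every other value (author, tags, image URL, slug, body) is a made-up placeholder, and the FIELD_USES constant is just a hypothetical helper for checking that each search use is covered.

```ruby
# Hypothetical example record: only the title and kentico_id come from the
# demo content, every other value is an illustrative placeholder.
ARTICLE_RECORD = {
  title: 'The 9 worst songs about clothing websites',
  body: 'Placeholder body text that search terms are matched against.',
  tags: %w[music fashion],
  author: 'Jane Doe',
  published_at: '2021-07-15T16:39:23Z',
  image: 'https://example.com/images/cover.jpg',
  estimated_reading_time_mins: 4,
  url_slug: 'the-9-worst-songs-about-clothing-websites',
  kentico_id: '18862937-3bc2-481e-9ca8-c177b813570a'
}.freeze

# The search uses from the table above, mapped to the fields that serve them.
FIELD_USES = {
  match: %i[title body],
  filter: %i[tags author],
  order: %i[published_at],
  display: %i[image estimated_reading_time_mins url_slug],
  reference: %i[kentico_id]
}.freeze

# Sanity check: every field named in the table is present on the record.
missing = FIELD_USES.values.flatten.uniq - ARTICLE_RECORD.keys
puts missing.empty? ? 'record covers every search use' : "missing: #{missing.inspect}"
```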

Maintaining the index

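As content editors publish changes in Kontent, we need those propagated to Algolia via webhook. The sequence for that is as follows:

  • Kontent sends a webhook notification to the Rails app
  • Rails fetches the full content for the changed items from the Kontent Delivery API
  • Rails saves the fields relevant to search as an Article record
  • The algoliasearch-rails gem pushes that record to Algolia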

Setting up Algolia

Having logged in to the Algolia dashboard, we need to create a new application. Don’t worry about creating an index or importing any records at this stage. The algoliasearch-rails gem will take care of that. We just need to extract the API keys for the Rails and Next.js applications.

We will use the pre-generated 'Search-Only API Key' for Next.js and the 'Admin API Key' in the Rails application for demo purposes. For production, you should create specific API keys to secure access control, specify rate limits, HTTP referrers, etc.

Handling the webhook notifications

First, we need to create a Rails application and add the additional gems we’ll need:

$ rails new -T --api -d postgresql blog-kentico-algolia-demo
$ cd blog-kentico-algolia-demo
$ bin/setup
$ bundle add algoliasearch-rails kontent-delivery-sdk-ruby dotenv-rails

We configure the Algolia gem by adding an initializer at config/initializers/algoliasearch.rb:

AlgoliaSearch.configuration = {
  application_id: ENV.fetch('ALGOLIA_APPLICATION_ID'),
  api_key: ENV.fetch('ALGOLIA_API_KEY'),
}

Next, let’s add the values for those Algolia environment variables to our .env:

ALGOLIA_APPLICATION_ID=<your value here>
ALGOLIA_API_KEY=<your value here>

The Algolia gem works as an extension of Rails’ ActiveRecord ORM, so we need to create an Article model and run the generated migration.

$ rails g model article title:string body:text tags:string author:string published_at:datetime image:string estimated_reading_time_mins:integer url_slug:string kentico_id:string published:boolean

> Note: We’re using PostgreSQL, so we’ll tweak the tags attribute to be an array (t.string :tags, array: true, default: []) before running the migrations. Alternatively, tags could be a reference to a Tag model, but we only need the tag names to match and filter against currently.

$ rails db:migrate

Because the Algolia gem works as an extension of ActiveRecord, the Article records will be automatically persisted locally in the PostgreSQL database. This database persistence isn’t strictly necessary, so you can skip it if you’re using a serverless function or Next.js API endpoint. However, this database persistence is how the algoliasearch-rails gem operates by default, so it is the path of least resistance for us in Rails.

Using the algoliasearch-rails methods, we can define how Algolia should use the Article model attributes:

class Article < ApplicationRecord
 include AlgoliaSearch

 algoliasearch if: :indexable? do
   # the list of attributes to include in the Algolia record
   attributes :title,
              :body,
              :tags,
              :author,
              :published_at,
              :image,
              :estimated_reading_time_mins,
              :estimated_reading_time_human_readable,
              :url_slug,
              :kentico_id,
              # exposed so that customRanking below has a value to sort by
              :published_at_unix_timestamp

   # defines the attributes to match search terms against.
   # list them by order of importance.
   searchableAttributes %w[title tags author body]

   # attributes to filter results by
   attributesForFaceting %w[author tags estimated_reading_time_human_readable]

   # defines the ranking criteria used to compare two matching records in case
   # their text-relevance is equal. It should reflect your record popularity.
   customRanking ['desc(published_at_unix_timestamp)']
 end

 # only include articles that are published within Kontent
 def indexable?
   published?
 end

 # group the estimated reading times into a human readable set for friendlier
 # filtering, rather than having a filter option for each integer value
 def estimated_reading_time_human_readable
   case estimated_reading_time_mins
   when 0...2
     'Less than 2 minutes'
   when 2...6
     '2 to 6 minutes'
   when 6...10
     '6 to 10 minutes'
   when (10..)
     '10+ minutes'
   end
 end

 # convert the date attribute to an integer for sorting
 def published_at_unix_timestamp
   published_at.to_i
 end
end

Webhook notifications from Kontent are POSTed to a URL of your choice. To handle the incoming webhook, we’ll need to add a route to config/routes.rb:

Rails.application.routes.draw do
  namespace :webhooks, defaults: { format: :json } do
    post :kontent
  end
end

Next, we need a controller action to handle the requests for that route.

Here’s an example webhook request body from a change to an Article item in Kontent:

{
 "data": {
   "items": [
     {
       "id": "18862937-3bc2-481e-9ca8-c177b813570a",
       "codename": "the_9_worst_songs_about_clothing_websites",
       "language": "default",
       "type": "article",
       "collection": "default"
     }
   ],
   "taxonomies": []
 },
 "message": {
   "id": "f01a8710-3f62-4c3c-b048-61951ece3a4b",
   "project_id": "5e56b927-e956-012f-97f2-fce44a1b6e28",
   "type": "content_item_variant",
   "operation": "publish",
   "api_name": "delivery_production",
   "created_timestamp": "2021-07-15T16:39:23.8202009Z",
   "webhook_url": "https://example.com/api/v1/webhooks/kentico"
 }
}

You’ll notice that the request body does not contain all of the fields of the Article item that we need for the search index. Instead, the webhook just enumerates which content items have changed. This is why we need to fetch the full content from Kontent after having received the notification.
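As a minimal sketch of that first step, the snippet below pulls the changed items and the operation out of a notification body using only Ruby's standard library. The method name changes_from_webhook and the wanted_type parameter are hypothetical, and the sample body is a trimmed copy of the example above; the result tells us which codenames to fetch from the Kontent Delivery API.

```ruby
require 'json'

# Hypothetical helper: lists the changed items of one content type, together
# with the operation that triggered the notification.
def changes_from_webhook(raw_body, wanted_type: 'article')
  payload = JSON.parse(raw_body)
  operation = payload.dig('message', 'operation')
  items = payload.dig('data', 'items') || []

  items
    .select { |item| item['type'] == wanted_type }
    .map { |item| { id: item['id'], codename: item['codename'], operation: operation } }
end

# Trimmed copy of the example notification shown above.
SAMPLE_BODY = <<~BODY
  {
    "data": {
      "items": [
        {
          "id": "18862937-3bc2-481e-9ca8-c177b813570a",
          "codename": "the_9_worst_songs_about_clothing_websites",
          "language": "default",
          "type": "article",
          "collection": "default"
        }
      ],
      "taxonomies": []
    },
    "message": { "type": "content_item_variant", "operation": "publish" }
  }
BODY

changes_from_webhook(SAMPLE_BODY).each do |change|
  puts "#{change[:operation]}: #{change[:codename]}"
end
```

Items of other types are simply skipped here, which matches the behaviour described in this article until the modular content section at the end.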

To keep things manageable, we’re going to break down the Rails logic across four different files, which you can find in the demo project repo.

> Note: In a production application, we’d use background jobs here to avoid overloading the main thread, but that’s beyond the scope of this demo.

Setting up Kontent

First, log in to Kontent and create a new, empty project.

Next, we’ll need to create a webhook.

To develop and test our Kontent webhook implementation locally, we’ll need the incoming webhooks from the public internet to reach our local development machine.

ngrok can provide a public URL for the webhook that will then tunnel to your local machine.

$ ngrok http 3000

Create a webhook in the Kontent project settings and set the URL as the tunneled URL from ngrok followed by the path of the route we added earlier, e.g., https://d347-86-150-50-107.ngrok.io/webhooks/kontent.

> Note: In Rails 6 or newer, you’ll need to add the ngrok host to development config or disable host restriction entirely by adding config.hosts.clear to the config/environments/development.rb.

Add the Kontent project ID and webhook secret to the .env alongside the Algolia environment variables:

KENTICO_PROJECT_ID=<your value here>
KENTICO_WEBHOOK_SECRET=<your value here>
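The webhook secret lets the Rails app check that a notification really came from Kontent. The sketch below assumes, based on my understanding of Kontent's webhook documentation, that the signature is a Base64-encoded HMAC-SHA256 of the raw request body keyed with the secret and sent in the X-KC-Signature header; please confirm that against the current docs. The method names kontent_signature and valid_signature? are hypothetical.

```ruby
require 'openssl'

# Assumed scheme: Base64(HMAC-SHA256(secret, raw request body)).
def kontent_signature(raw_body, secret)
  digest = OpenSSL::HMAC.digest(OpenSSL::Digest.new('sha256'), secret, raw_body)
  [digest].pack('m0') # strict Base64 without a trailing newline
end

# Compares the received signature with the expected one without returning
# early on the first differing byte.
def valid_signature?(raw_body, secret, received)
  expected = kontent_signature(raw_body, secret)
  return false if received.nil? || received.bytesize != expected.bytesize

  diff = 0
  expected.bytes.zip(received.bytes) { |a, b| diff |= a ^ b }
  diff.zero?
end

# Usage inside a controller action would look roughly like:
#   valid_signature?(request.raw_post,
#                    ENV.fetch('KENTICO_WEBHOOK_SECRET'),
#                    request.headers['X-KC-Signature'])
```

The check has to run against the raw body exactly as received, because re-serialising the parsed JSON would change the bytes and therefore the signature.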

Then start the local Rails server:

$ rails s

At this stage, we need to define our content types and create some test content items. You can restore from the backup in the demo project repo using kontent-cli to avoid doing this manually.

$ npx @kentico/kontent-cli backup --action restore --apiKey=<your value here> \
  --projectId=<your value here> --name="kontent-blog-demo-backup"

> Note: You’ll need to enable the Management API in the Kontent project settings to generate the API key, and the --name parameter will automatically apply the .zip suffix.

If everything is working correctly, you should see activity in the ngrok and rails terminals, as the backup restoration triggers webhook requests for the newly created content items.

Once complete, you should have the demo content items in the Kontent project and corresponding Article records in the Rails database.

$ rails c
> Article.find_by(kentico_id: '18862937-3bc2-481e-9ca8-c177b813570a').title
=> "The 9 worst songs about clothing websites"

The same records should also appear in the Algolia index, which you can inspect in the Algolia web dashboard (it might need a hard refresh).

Debugging

If you run into issues, Kontent offers a debug log for each webhook to inspect any errors and copy the webhook request bodies, which allows you to test yourself locally with an HTTP client like Postman or cURL.

Finally, the Next.js UI

Algolia offers an InstantSearch component library that provides a customizable pre-built set of components for building a live search experience. As our front-end is a Next.js application, we’ll be using InstantSearch for React.

Algolia’s InstantSearch docs do an excellent job of explaining how to compose the InstantSearch components together into a search interface that works for your product, so we won’t cover that here. Instead, we’ll pull Algolia’s pre-built Next.js server-side rendering demo project and customize that for demonstration purposes.

$ yarn create next-app --example https://github.com/algolia/react-instantsearch/tree/master/examples/next client

Once yarn create next-app has finished, we need to customize two files to connect Algolia’s demo project to our Algolia application.

First, we need to update the Algolia search client configuration in pages/index.js to use our Application ID and Search-Only API Key (ideally pulled from the environment config rather than hard-coded):

const searchClient = algoliasearch(
  '<your Application ID value here>',
  '<your Search-Only API Key value here>'
);

And then in the same file, the indexName in the DEFAULT_PROPS:

const DEFAULT_PROPS = {
  searchClient,
  indexName: "Article",
};

> Note: With the algoliasearch-rails gem, the index name will default to the model name, but if in doubt, the index names are visible within the Algolia web dashboard.

Second, we need to update the HitComponent markup in the components/app.js to use the attributes in our index, rather than the ones from the demo Algolia project:

const HitComponent = ({ hit }) => (
 <div className="hit">
   <div>
     <div className="hit-picture">
       <img src={hit.image} alt={hit.title} />
     </div>
   </div>
   <div className="hit-content">
     <div>
       <a href={`/blog/${hit.url_slug}`}>
         <Highlight attribute="title" hit={hit} />
       </a>
     </div>
     <hr />
     <div className="hit-author">
       <span>
         By&nbsp;
         <Highlight attribute="author" hit={hit} />
       </span>
       <hr />
     </div>
     <div>Est. Read Time: {hit.estimated_reading_time_mins} mins</div>
   </div>
 </div>
);

And then in the same file update the RefinementList menu with our faceted attributes:

<div className="menu">
 <h2>Authors</h2>
 <RefinementList attribute="author" />
 <h2>Tags</h2>
 <RefinementList attribute="tags" />
 <h2>Est. Read Time</h2>
 <RefinementList attribute="estimated_reading_time_human_readable" />
</div>

Finally, Next.js and Rails both default to port 3000, so to run both at the same time, you’ll need to override the port in the Next.js dev script in package.json.

"scripts": {
   "dev": "next -p 3001",
   ...

With that, we have an end-to-end search solution. We can start the Next.js app locally with:

$ yarn dev

And view it in the browser at http://localhost:3001.

Success!

Any content updates published in Kontent are automatically propagated to Algolia via Rails, and users can filter the results in real time as they type a search term and select from the faceted attributes.

You can find the demo project on GitHub if you want to try it out for yourself.

Extra credit

Hidden complexity—modular content

Currently, our Rails app will update the Algolia search index when users publish changes to Article content items within Kontent. However, we’re going to allow users to search and filter the blog articles by Tag and Author. What happens if these content items change? Currently, nothing.

To update the search index when Tags or Authors change, we need to update our webhook handling code so that instead of discarding webhook notifications for Tags and Authors, we update all of the Article items linked to that Tag or Author.
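One way to approach this is sketched below in plain Ruby, with an in-memory list standing in for the Article table. It assumes each article also stores the codenames of its linked Author and Tags (author_codename and tag_codenames are hypothetical attributes that the model above does not have), and that the Kontent content types are named 'author' and 'tag'.

```ruby
# Hypothetical stand-in for an Article record that remembers which Kontent
# items it links to.
LinkedArticle = Struct.new(:kentico_id, :author_codename, :tag_codenames, keyword_init: true)

# Given one item from a webhook notification, returns the articles whose
# search records need refreshing.
def articles_affected_by(item, articles)
  case item['type']
  when 'article' then articles.select { |a| a.kentico_id == item['id'] }
  when 'author'  then articles.select { |a| a.author_codename == item['codename'] }
  when 'tag'     then articles.select { |a| a.tag_codenames.include?(item['codename']) }
  else []
  end
end

DEMO_ARTICLES = [
  LinkedArticle.new(kentico_id: 'a1', author_codename: 'jane_doe', tag_codenames: %w[music]),
  LinkedArticle.new(kentico_id: 'a2', author_codename: 'john_roe', tag_codenames: %w[music fashion]),
  LinkedArticle.new(kentico_id: 'a3', author_codename: 'jane_doe', tag_codenames: %w[fashion])
].freeze

changed_tag = { 'type' => 'tag', 'codename' => 'music' }
puts articles_affected_by(changed_tag, DEMO_ARTICLES).map(&:kentico_id).inspect
```

Matching on codenames rather than display names means a renamed Tag or Author can still be traced back to its articles.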

You can find out more about indexing modular content with Kontent and Algolia on the Kontent blog.

Batch synchronisation

Whenever you’re maintaining a copy of data from the canonical source, it’s prudent to have a mechanism in place for refreshing the complete data set. This mechanism can be used to load the initial data set, but also in the future if you need to update every record following a change in structure to the canonical data.

In our Rails app, we’d add a rake task for this, which would fetch and loop through all of the Article content items to update Algolia.

However, we need to keep in mind that the Kontent Delivery API is restricted to a maximum of 2000 items per response. That includes any linked items, so if you’re requesting data with any linked items included, paginating in chunks of 2000 will not prevent an error.

The simple solution is to loop through in page sizes small enough to be confident that you won’t hit a response with more than 2000 items, e.g., 10, though this value should be tuned based on your specific content.

A more sophisticated approach is to implement an incremental back-off so that if you encounter a maximum items error, the request is retried with a smaller page size until it succeeds. This approach ensures success and should also be more performant but is more complex to implement.
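A minimal sketch of that back-off, under some assumptions: fetch_page is a hypothetical callable that returns one page of items for a given skip and limit and raises the hypothetical MaxItemsExceeded error when the response would be too large. In a real app it would wrap the Kontent Delivery API call; here a fake fetcher simulates the limit so the example runs on its own.

```ruby
# Hypothetical error raised by the fetcher when a response would exceed the
# API's item limit.
class MaxItemsExceeded < StandardError; end

# Fetches every item, halving the page size whenever a page is too large.
def fetch_all_items(fetch_page:, initial_page_size: 100, min_page_size: 1)
  items = []
  skip = 0
  page_size = initial_page_size

  loop do
    begin
      page = fetch_page.call(skip: skip, limit: page_size)
    rescue MaxItemsExceeded
      raise if page_size <= min_page_size # cannot shrink any further

      page_size = [page_size / 2, min_page_size].max
      retry
    end

    break if page.empty?

    items.concat(page)
    skip += page.size
  end

  items
end

# Fake data: 25 articles that each drag in 300 linked items.
DEMO_ITEMS = Array.new(25) { |i| { id: i, linked: 300 } }.freeze

# Fake fetcher that enforces a 2000-item limit, counting linked items too.
DEMO_FETCH = lambda do |skip:, limit:|
  page = DEMO_ITEMS[skip, limit] || []
  raise MaxItemsExceeded if page.sum { |item| 1 + item[:linked] } > 2000

  page
end

puts fetch_all_items(fetch_page: DEMO_FETCH).size
```

This version keeps the reduced page size for the rest of the run; growing it again after a few successful pages would be a reasonable refinement.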

We are Kyan, a technology agency powered by people.

Written by
Tom Marshall

I’m Head of Technology for Kyan, focusing on building technology that changes businesses for the better. I’m a Rubyist at heart, but I’m spending more and more time in the Jamstack space, primarily with Next.js.
