From the very beginning of this blog, I wanted to offer some kind of full-text search so that my users could find posts by keyword.
There are a few Hexo plugins that have approached the subject, but none of them was really satisfactory or performant. So I relied on the world's biggest search engine: Google. A search button slid out a small input field, and pressing the ENTER key sent the form via GET to
The procedure was simple, but it came with the drawback that I always exposed my users to Google. At least until now … :)
Pagefind is a fully static search library that aims to perform well on large sites, while using as little of your users’ bandwidth as possible, and without hosting any infrastructure …
--- Liam Bigelow @ pagefind.app
Pardon me? A full-text search for SSGs that runs completely in the browser? It sounded so great that I had to try it right away. And what can I say … it not only works fantastically well, but is also extremely easy to implement. Of course, there are a few things to consider, especially with regard to Hexo, the SSG I use, but I didn’t run into any big hurdles, not least because the tool is so well documented. Let’s see what my implementation looks like…
… but Liam can explain better how it works:
First of all I decided to store all necessary parameters in a supported config file in the root of my blog project.
source defines the relative folder where all static files are created during the build and which should now be indexed.
bundle_dir overrides the default storage folder called _pagefind, which is created in the build folder for the search files. This is necessary because my blog is built and hosted on GitHub Pages, and the responsible GitHub Action skips folders whose names start with an underscore on deployment. More info on that here and here.
exclude_selectors is a list of all those page elements whose content should NOT be indexed, but more about that later.
With another setting called glob, it is possible to tell Pagefind which files to index, but this currently has its pitfalls when trying to exclude some files. Liam already has this on his radar for one of the next versions.
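To make the three settings more concrete, here is a minimal sketch of such a config, built as a JavaScript object and printed as JSON (Pagefind also accepts a pagefind.json file). The values are illustrative assumptions rather than the ones used on this blog: public is Hexo's default output folder, pagefind is simply a folder name without a leading underscore, and the selectors are purely hypothetical.

```javascript
// Sketch of a Pagefind config with the three settings described above.
// All values are assumptions for illustration only.
const config = {
  source: "public",                        // assumption: Hexo's default output folder
  bundle_dir: "pagefind",                  // no leading underscore, so GitHub Pages keeps it
  exclude_selectors: ["nav", ".sidebar"],  // hypothetical selectors, adapt to your theme
};

// Serialize it; the result could be saved as pagefind.json in the project root.
const json = JSON.stringify(config, null, 2);
console.log(json);
```

Keeping the settings in a file means the indexing command itself needs no extra arguments.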
A post on a web page never stands alone, but is surrounded by other elements such as navigation, further links, etc. However, these additional elements should not end up in the index. Pagefind skips some of them, like script, automatically, but there are always some left that have to be excluded by hand.
The best option to narrow down the indexable content is the attribute data-pagefind-body. Instead of excluding something, you tell Pagefind what to include. This approach makes things easier, but it also has consequences:
If data-pagefind-body is found anywhere on your site, any pages without this attribute will be removed from your index.
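The rule quoted above is easy to overlook, so here is a toy model of it in JavaScript. It is not Pagefind's implementation; each page is reduced to a hypothetical record with a url and a flag saying whether the page contains data-pagefind-body.

```javascript
// Toy model of the opt-in rule; NOT Pagefind's actual code.
// Each page is a hypothetical { url, hasBodyAttribute } record.
function selectIndexedPages(pages) {
  const anyOptIn = pages.some((page) => page.hasBodyAttribute);
  // No page uses the attribute: every page stays in the index.
  if (!anyOptIn) {
    return pages.map((page) => page.url);
  }
  // At least one page uses it: pages without the attribute are dropped.
  return pages
    .filter((page) => page.hasBodyAttribute)
    .map((page) => page.url);
}

// Hypothetical site with one attributed post and one plain page.
const site = [
  { url: "/2022/some-post/", hasBodyAttribute: true },
  { url: "/about/", hasBodyAttribute: false },
];
console.log(selectIndexedPages(site));
```

In other words, as soon as one template gets the attribute, every other template that should stay searchable needs it too.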
In my case, I had a few places in my templates that I needed to add the attribute to:
Any elements nested inside these attributed elements that should not be indexed had to be excluded via the exclude_selectors setting in the config (see above).
It was important to me to show the date of a post in the search result, because nothing is as inaccurate as a post that is many years old. With Pagefind, you pick the HTML element in your templates that holds the meta value and mark it with data-pagefind-meta, for example:
<time class="published dt-published" itemprop="datePublished"
As the title for the search hit, Pagefind searches for H1 tags and takes the value of the last tag it finds. If you are not sure that there is always only one H1 tag on the page (as is the case for me), then you had better specify which tag it should take:
<h1 class="<%= class_name %>" itemprop="name" data-pagefind-meta="title">
So when specifying metadata, you can refer not only to the content of a tag, but also to its attributes. Here’s the example from my Hexo implementation for header images:
<img id="header-photo" data-pagefind-meta="image[src], image_alt[alt]"
In case there is simply no element that contains the meta value, you can also specify it within the attribute:
<article id="note-<%= page.slug %>" itemprop="blogPost"
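The three ways of writing a data-pagefind-meta value (element content, element attribute, inline value) can be summarized with a small toy parser. This is only a sketch of the syntax, not how Pagefind reads the attribute internally, and the inline value date:2022-06-01 is a hypothetical example. It also ignores inline values that contain commas.

```javascript
// Toy parser for data-pagefind-meta values; illustrative only.
function parseMetaAttribute(value) {
  return value.split(",").map((part) => {
    const entry = part.trim();
    // Form "key[attr]": take the value from another attribute of the element.
    const attrMatch = entry.match(/^([^\[:]+)\[([^\]]+)\]$/);
    if (attrMatch) {
      return { key: attrMatch[1], source: "attribute", attribute: attrMatch[2] };
    }
    // Form "key:value": the value is written inline in the attribute itself.
    const colon = entry.indexOf(":");
    if (colon !== -1) {
      return { key: entry.slice(0, colon), source: "inline", value: entry.slice(colon + 1) };
    }
    // Form "key": take the text content of the element.
    return { key: entry, source: "content" };
  });
}

console.log(parseMetaAttribute("title"));
console.log(parseMetaAttribute("image[src], image_alt[alt]"));
console.log(parseMetaAttribute("date:2022-06-01")); // hypothetical inline value
```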
The following code is the basic structure of the search page as suggested by Pagefind:
<link href="/pagefind/pagefind-ui.css" rel="stylesheet">
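Besides the stylesheet, the page needs the Pagefind UI script and a container element to mount the search into. The script part can be sketched as follows, assuming that pagefind-ui.js has been loaded from the same /pagefind/ folder via a script tag and that the page contains a div with the hypothetical id search.

```javascript
// Sketch of the UI initialization. Assumptions: pagefind-ui.js is already
// loaded (which provides the global PagefindUI) and the page contains
// <div id="search"></div>; the id is a hypothetical choice.
function mountSearch(target, PagefindUICtor) {
  // Wait until the DOM is parsed so the container element exists.
  target.addEventListener("DOMContentLoaded", () => {
    new PagefindUICtor({ element: "#search" });
  });
}

// Only wire things up in a browser where the UI script is available.
if (typeof window !== "undefined" && typeof PagefindUI !== "undefined") {
  mountSearch(window, PagefindUI);
}
```

Passing the constructor in as a parameter is only done here to keep the sketch self-contained; in a template you can call new PagefindUI(...) directly inside the event listener.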
To accommodate this in Hexo, it is advisable to use your own template and generate the page with an appropriate generator. A standard PAGE in Markdown format is only of limited use for this, because links and scripts are needed. In my post Pattern for dynamic Hexo pages, I described how to implement such a generator that renders descriptive Markdown in addition to the EJS template, and I’ve taken that approach here as well.
For simplicity, I won’t list the full code here, but link to my blog’s GitHub repo:
|Markdown file with Frontmatter data and introduction text|
|Layout template for search page|
|Hexo Generator for creating the page during build|
|Customized CSS Variables and style overrides|
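For orientation, a stripped-down Hexo generator for such a page could look like the sketch below. It is not the code from the repo: the route, the layout name search and the title are hypothetical, and the Markdown rendering described in the linked post is left out.

```javascript
// Minimal sketch of a Hexo generator for a search page.
// Route, layout name and title are hypothetical placeholders.
function searchPageGenerator(locals) {
  return {
    path: "search/index.html",  // where the page ends up in the build folder
    layout: ["search"],         // expects a layout template named "search" in the theme
    data: { title: "Search" },  // values made available to the template
  };
}

// Hexo exposes a global "hexo" object to scripts in the scripts folder.
if (typeof hexo !== "undefined") {
  hexo.extend.generator.register("search", searchPageGenerator);
}
```

Because the page is produced during hexo generate, it is already part of the build output when Pagefind runs afterwards.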
For the visual customization of the user interface, Pagefind provides some CSS variables in the automatically generated CSS file. These help a bit to adapt the UI to your own ideas, but I decided to override some of the styles in a separate file called _pagefind.styl, which will be bundled via @import "_pagefind" in the main
Since the main bundled CSS file is loaded in the HEAD, before the _pagefind.css that is loaded somewhere in the page, I first made sure the overrides win by marking them with !important, for simplicity. This is not pretty yet, and I will have to revise it later on.
Thus prepared, the rest is a piece of cake. Pagefind does not need to be installed: if you call the npm package via npx, the latest version is downloaded and executed automatically. You just have to make sure that the Hexo build has run beforehand. The best way is to run the following command:
hexo clean && hexo generate &&
This is my result in the console:
Running Pagefind v0.10.7 (Extended)
Since I am hosting this blog on GitHub Pages and the complete build and deployment is done by a GitHub Action, I added a step to the hexo-build job in the workflow file so that Pagefind indexes the result after the build:
Thankfully, in his article on Pagefind, Bryce also put me on the track of how to prevent a possible security prompt caused by npx from blocking Pagefind from running … npm_config_yes=true.
I hope that my explanation has inspired you to try it out for yourself on your Hexo driven blog or website. If you need some help or advice, drop me a line…
- CloudCannon: Pagefind
- Liam Bigelow: Introducing Pagefind: Static Low-bandwidth Search at Scale
- Bryce Wray: Pagefind is quite a find for site search
- Bryce Wray: Sweeter searches with Pagefind
- Nicolas Deville: Pagefind