
Convert HTML into Plain Text in Hexo

Render Page.Excerpt into META Tag

Hexo, on which this blog is based, is a Static Site Generator (SSG) that generates a complete structure of HTML files from the individual Markdown files in which the articles are written. Besides the actual posts, it also generates overview pages such as the archives. For these, however, it only needs an excerpt of each article, which Hexo creates automatically from the generated HTML content and which is therefore itself HTML.

For my Page Meta dialog, however, I recently needed the excerpt as plain text, for example to make it easier to copy it manually into a Mastodon post. My initial attempts to extract the plain text from the original Markdown turned out to be quite difficult, because Hexo posts contain not only Markdown, but also special Tag Plugins in Nunjucks format and, of course, plain HTML. Long story short … after the first dozen RegEx replace calls, I began to doubt that I was on the right track and remembered Page.Excerpt, the variant Hexo has already generated as HTML.

Now you would think that JavaScript has some built-in function to extract the plain text out of a bunch of HTML tags, but this is actually not the case. You have to take a little detour to do this:

function convertHtml2PlainText(excerpt) {  
  let e = document.createElement("div");
  e.innerHTML = excerpt;
  return e.textContent || e.innerText;
}

let plainText = convertHtml2PlainText(page.excerpt);

Fine, my problem is solved … hmm … no, because Hexo runs on Node.js, which has no document object: a DOM exists only in the browser. But there are libraries like jsdom that make a DOM available in Node.js:

const { JSDOM } = require("jsdom");

function convertHtml2PlainText(excerpt) {
  const dom = new JSDOM('<!DOCTYPE html>');
  let e = dom.window.document.createElement("div");
  e.innerHTML = excerpt;
  return e.textContent || e.innerText;
}

let plainText = convertHtml2PlainText(page.excerpt);

Nice … but this doesn't work either. I need this piece of code in an EJS template, and although the JavaScript embedded in a template is executed when the template is rendered to HTML, loading external libraries via require() is not supported there.

And once again Hexo’s Helpers come to my rescue:

helper-excerpt-plain.js
const { JSDOM } = require("jsdom");

hexo.extend.helper.register('excerpt_plain', function(excerpt){
  const dom = new JSDOM('<!DOCTYPE html>');
  let e = dom.window.document.createElement("div");
  e.innerHTML = excerpt;
  return e.textContent || e.innerText;
});

For the sake of tidiness, I also strip leading and repeated line breaks after the conversion and put the result into a custom meta tag in the head of the HTML page, so that JavaScript running in the browser can access it later:

head.ejs
...
let excerpt = excerpt_plain(page.excerpt)
  .replace(/^(\r\n|\n|\r)/, "")       // Remove leading break
  .replace(/(\r\n|\n|\r){2,}/g, " ")  // Remove multiple breaks
  .trim();
...
<meta name="excerpt" content="<%= excerpt %>">
...
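
To see what the two replace() calls and the trim() in head.ejs do, here is a minimal sketch that applies the same chain to a sample string. The function name tidyExcerpt and the sample text are made up for illustration; only the regular expressions are taken from the template.

```javascript
// Hypothetical wrapper around the cleanup chain used in head.ejs.
function tidyExcerpt(plain) {
  return plain
    .replace(/^(\r\n|\n|\r)/, "")       // drop a single leading line break
    .replace(/(\r\n|\n|\r){2,}/g, " ")  // collapse runs of two or more breaks into one space
    .trim();                            // strip remaining leading/trailing whitespace
}

// Made-up sample, resembling the textContent of a converted two-paragraph excerpt.
const sample = "\nFirst paragraph.\n\nSecond paragraph.\n";

console.log(JSON.stringify(tidyExcerpt(sample)));
// → "First paragraph. Second paragraph."
```

Note that a single line break inside the text is left untouched by this chain; only runs of two or more breaks are replaced.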

Et voilà … I have my excerpt as plain text to show in my Page Meta dialog.
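
Reading the value back in the browser could look roughly like the following sketch. The function name readExcerpt is hypothetical, and the document is passed in as a parameter only so that the function can also be tried outside a browser; the selector matches the meta tag written in head.ejs.

```javascript
// Hypothetical reader for the custom meta tag; "doc" is the page's document object.
function readExcerpt(doc) {
  const meta = doc.querySelector('meta[name="excerpt"]');
  // Fall back to an empty string if the page has no excerpt meta tag.
  return meta ? meta.getAttribute("content") : "";
}

// In the browser this would be called as: readExcerpt(document)
```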
