[Extending Hexo For My Site] Part 1 - Better MathJax Rendering

I am a heavy user of MathJax, a library that renders TeX-compatible syntax into pretty equations on web pages. As a result, I am always mixing Markdown and TeX snippets in my writing. The annoying part is that TeX snippets have low priority in my Markdown renderer and are sometimes incorrectly rendered as Markdown elements. For instance, $a_1, a_2$ becomes $a1, a2$ with "1, a" in italics, because the underscores within $...$ are mistakenly recognized as emphasis markers. Avoiding this requires a lot of escaping, which drives me mad. So I set out to find a permanent solution.

My first attempt was to add specialized logic for TeX snippets to the Markdown renderer, but I soon found it far from neat. The renderer I use is marked. Within Hexo, marked does not work alone; it is plugged in through hexo-renderer-marked as a wrapper. To add such magic to marked, I would have to fork and patch both marked and hexo-renderer-marked. Also, parsing TeX snippets is beyond the duty of a Markdown renderer. There is no reason to couple the two, so I gave up.

Then I came up with another idea: what if I “guard” the TeX snippets before feeding the content into the Markdown renderer? By “guarding” I mean replacing TeX snippets with something that marked ignores, and restoring them after marked finishes rendering.

Actually, Hexo DOES provide such a trick officially, though it is not mentioned in the docs. In short, you can wrap a piece of content with the special tag <hexoPostRenderCodeBlock>...</hexoPostRenderCodeBlock> in a before_post_render filter to prevent it from being parsed by the renderer. The code that handles the trick can be found in hexo/post.js. Before the content is handed to the renderer, all <hexoPostRenderCodeBlock> tags are detached and replaced by special HTML comment tags like <!--code\uFFFCxxx-->. After the renderer finishes, each comment tag is substituted back with the inner content of the corresponding <hexoPostRenderCodeBlock> tag. This usage appears both in Hexo's built-in plugins and in third-party plugins in the wild.
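To make the mechanism concrete, here is a simplified sketch of the guard/restore idea. It is not Hexo's actual code. The function names and the toy renderer are hypothetical; only the tag name and the comment format follow the description above.

```javascript
// Simplified sketch of the guard/restore trick.
// NOT Hexo's source: escapeGuarded, restoreGuarded and toyRender are
// hypothetical names used only for illustration.
const PLACEHOLDER = '\uFFFC';
const rGuarded = /<hexoPostRenderCodeBlock>([\s\S]+?)<\/hexoPostRenderCodeBlock>/g;
const rComment = new RegExp(`<!--code${PLACEHOLDER}(\\d+)-->`, 'g');

// Before rendering: move each guarded body into the cache and leave an
// HTML comment that carries its cache index.
function escapeGuarded(content, cache) {
  return content.replace(rGuarded, (_, inner) => {
    cache.push(inner);
    return `<!--code${PLACEHOLDER}${cache.length - 1}-->`;
  });
}

// After rendering: swap every comment back for the cached body.
// A replacement *function* returns text verbatim, so "$" in it is safe.
function restoreGuarded(rendered, cache) {
  return rendered.replace(rComment, (_, index) => cache[Number(index)]);
}

// Toy stand-in for the Markdown renderer: turns _text_ into <em>text</em>.
const toyRender = (s) => s.replace(/_([^_]+)_/g, '<em>$1</em>');

const cache = [];
const source = 'Hello _world_ <hexoPostRenderCodeBlock>$a_1, a_2$</hexoPostRenderCodeBlock>';
const output = restoreGuarded(toyRender(escapeGuarded(source, cache)), cache);
console.log(output); // Hello <em>world</em> $a_1, a_2$
```

Without the guard, the same toy renderer turns `Hello _world_ $a_1, a_2$` into `Hello <em>world</em> $a<em>1, a</em>2$`, which is exactly the symptom described at the beginning.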

Our goal is now clear: build a before_post_render filter that finds every legitimate TeX snippet, either inline-level $...$ or block-level $$...$$, and wraps it with the magical <hexoPostRenderCodeBlock> tag. The matching can be done with simple regular expressions:

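```javascript
// Block-level math: $$...$$ starting at the beginning of a line; may span lines.
const rBlockMath = /^\$\$([^\$]+?)\$\$/gm
// Inline math: $...$ on a single line, not adjacent to another $.
const rInlineMath = /(?<!\$)\$([^\$\n]+?)\$(?!\$)/g
```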

rInlineMath contains lookbehind and lookahead assertions so that block-level snippets are not mistakenly interpreted as inline ones. The same concern could alternatively be addressed by matching block-level snippets first and inline ones afterwards, but I prefer the assertions for clarity.
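A quick way to check the two patterns is to run them over a small sample. The sample text below is made up. The lookarounds keep the inline pattern from picking up the $$ delimiters of the display block. Note that `(?<!\$)` needs an engine with ES2018 lookbehind support, and `matchAll` needs a reasonably recent Node.js.

```javascript
// Sketch: how the two patterns behave on a small, made-up sample.
const rBlockMath = /^\$\$([^\$]+?)\$\$/gm;
const rInlineMath = /(?<!\$)\$([^\$\n]+?)\$(?!\$)/g;

const sample = [
  'Inline: $a_1, a_2$ and $b^2$.',
  '$$',
  'E = mc^2',
  '$$',
].join('\n');

// Collect the captured body of every match.
const blocks = [...sample.matchAll(rBlockMath)].map((m) => m[1]);
const inlines = [...sample.matchAll(rInlineMath)].map((m) => m[1]);

// blocks  → ['\nE = mc^2\n']
// inlines → ['a_1, a_2', 'b^2']  (the $$ delimiters are not picked up)
console.log(blocks, inlines);
```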

Beyond the regular expressions, there are still some edge cases that must be carefully taken into consideration: $ within code blocks should never be touched. Unfortunately, this is hard, if not impossible, to achieve with regular expressions alone, since the context of a code block is too complicated to match. A workaround is required: detach all code blocks beforehand, then handle the TeX snippets, and finally restore the code blocks.
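The following sketch shows the pitfall on a made-up line containing shell snippets. The inline pattern pairs up the dollar signs of two separate code spans, and the text between them would be wrapped as math.

```javascript
// Illustration: the inline-math pattern knows nothing about code spans,
// so dollar signs inside backticks get paired up across spans.
const rInlineMath = /(?<!\$)\$([^\$\n]+?)\$(?!\$)/g;

const text = 'Run `echo $HOME` and then `echo $PATH` in a shell.';
const bodies = [...text.matchAll(rInlineMath)].map((m) => m[1]);

// bodies → ['HOME` and then `echo ']  (clearly not a TeX snippet)
console.log(bodies);
```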

It is also worth noting that before_post_render filters from other plugins may run before ours, so <hexoPostRenderCodeBlock> tags can already exist in the content, and their inner text is also expected to remain untouched. Our workaround therefore needs to handle three cases: magical tags, and two forms of code blocks (single-backticked and triple-backticked).

In fact, however, it is enough to consider only two of them. There is a lesser-known filter named backtick_code_block bundled in Hexo's built-in plugins. It syntax-highlights triple-backticked code blocks and guards them with the magical tag. By the time our filter is executed, triple-backticked code blocks have already been wrapped in <hexoPostRenderCodeBlock>, so there is no need to handle them.

So here is the pipeline:

1) find all existing <hexoPostRenderCodeBlock> tags and single-backticked code blocks, and detach them;
2) find all TeX snippets (inline or block-level), and wrap them with the <hexoPostRenderCodeBlock> tag;
3) restore the detached items.

It is also critical to register the filter with a priority larger than 10. In Hexo, filters with a lower priority run first, and the default is 10, so this guarantees that the backtick_code_block filter is called before ours.
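Putting the three steps together, a minimal sketch of such a filter might look like the following. It is not the code from hexo-enhanced. `guardMath` and the `\uFFFC`-based placeholders are hypothetical choices for illustration. It also assumes that single-backticked code spans never nest or contain newlines. The registration uses Hexo's `hexo.extend.filter.register(type, fn, priority)`. The priority 15 is just an arbitrary value above the default 10.

```javascript
// Sketch of the detach → wrap → restore pipeline (hypothetical names).
const TAG_OPEN = '<hexoPostRenderCodeBlock>';
const TAG_CLOSE = '</hexoPostRenderCodeBlock>';
const MARK = '\uFFFC';

const rGuarded = /<hexoPostRenderCodeBlock>[\s\S]+?<\/hexoPostRenderCodeBlock>/g;
const rInlineCode = /`[^`\n]+`/g;
const rBlockMath = /^\$\$([^\$]+?)\$\$/gm;
const rInlineMath = /(?<!\$)\$([^\$\n]+?)\$(?!\$)/g;
const rMark = new RegExp(`${MARK}(\\d+)${MARK}`, 'g');

function guardMath(content) {
  const saved = [];
  // Stash a match and leave a placeholder that contains no "$" or backtick.
  const stash = (match) => {
    saved.push(match);
    return `${MARK}${saved.length - 1}${MARK}`;
  };
  const wrap = (match) => `${TAG_OPEN}${match}${TAG_CLOSE}`;

  // 1) Detach existing guarded blocks, then single-backticked code spans.
  let text = content.replace(rGuarded, stash).replace(rInlineCode, stash);
  // 2) Wrap block-level snippets first, then inline ones.
  text = text.replace(rBlockMath, wrap).replace(rInlineMath, wrap);
  // 3) Restore the detached items verbatim.
  return text.replace(rMark, (_, i) => saved[Number(i)]);
}

// In a Hexo site's scripts/ folder, `hexo` is available as a global.
// The guard lets this file also run outside Hexo.
if (typeof hexo !== 'undefined') {
  hexo.extend.filter.register('before_post_render', (data) => {
    data.content = guardMath(data.content);
    return data;
  }, 15);
}
```

For a line containing both `$a_1$` and a code span with `$x$` in it, only `$a_1$` ends up wrapped. The code span is restored unchanged, because the replacement functions return the stashed text verbatim.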

The mathjax filter is packaged in the plugin hexo-enhanced, with the code at lib/filter/before_post_render/mathjax.js. With it enabled, you can now write math equations on the fly just like in a TeX file, without the burden of escaping special characters.

Author: hsfzxjy.
Link: .
License: CC BY-NC-ND 4.0.
All rights reserved by the author.
Commercial use of this post in any form is NOT permitted.
Non-commercial use of this post should be attributed with this block of text.
