Render mermaid diagrams to inline SVG at build time #656

Closed
srid wants to merge 20 commits into master from polite-claim

srid commented Apr 25, 2026

Mermaid code blocks now render to inline SVG at build time — the generated site no longer needs the CDN-hosted mermaid.js to display diagrams and works fully offline. A new Emanote.Pandoc.Mermaid module walks every parsed note's AST and replaces ```mermaid blocks with the SVG produced by mmdc (mermaid-cli). The mmdc binary path is baked in at compile time via staticWhich — same convention as tailwindBin/storkBin — so the dep is enforced by the Nix flake rather than discovered at runtime. No fallback by environment: server-side rendering is the default, full stop. The retained js.mermaid snippet stays documented as an opt-in alternative for users who explicitly want client-side CDN rendering.

The transform runs inside parseNote's WriterT block, so per-diagram render failures join the same per-note error stream as parser failures and surface in the document-top error banner. A failing diagram is also rendered inline as a visible Mermaid rendering failed: message above its preserved source — readers see both signals at the right granularity.
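
The walk-and-replace behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in `Emanote.Pandoc.Mermaid`: the `Block` type is a simplified stand-in for Pandoc's AST, and `Renderer`, `transformBlocks`, `stubRender`, and `renderPure` are hypothetical names. The real module shells out to `mmdc` and wraps failures in a `Div .mermaid-error`; here the renderer is passed in as a function so the sketch stays pure.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Functor.Identity (Identity, runIdentity)
import Data.Text (Text)
import qualified Data.Text as T

-- | Simplified stand-in for Pandoc's Block type (hypothetical; the real
-- AST has many more constructors).
data Block
  = CodeBlock [Text] Text -- ^ classes and source text
  | RawHtml Text          -- ^ raw HTML passed through to the page
  | Para Text             -- ^ a plain paragraph
  deriving (Eq, Show)

-- | A renderer turns mermaid source into SVG, or reports an error message.
type Renderer m = Text -> m (Either Text Text)

-- | Replace each mermaid code block; leave every other block untouched.
transformBlocks :: Monad m => Renderer m -> [Block] -> m [Block]
transformBlocks render = fmap concat . mapM go
  where
    go blk@(CodeBlock classes src)
      | "mermaid" `elem` classes = do
          result <- render src
          pure $ case result of
            Right svg -> [RawHtml svg]
            -- On failure: a visible message first, then the preserved source.
            Left err -> [Para ("Mermaid rendering failed: " <> err), blk]
    go blk = pure [blk]

-- | Stub renderer for illustration only: fails on blank input.
stubRender :: Renderer Identity
stubRender src
  | T.null (T.strip src) = pure (Left "empty diagram")
  | otherwise = pure (Right "<svg></svg>")

-- | Run the transform with the stub renderer.
renderPure :: [Block] -> [Block]
renderPure = runIdentity . transformBlocks stubRender
```

Loaded in GHCi, `renderPure` turns a well-formed mermaid block into a single `RawHtml` block and a blank one into a message paragraph followed by the original code block; non-mermaid blocks pass through unchanged.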

Upstream fix: The mermaid SVG path tripped a long-standing bug in heist-extra's rawNode: xmlhtml errors with div cannot contain text looking like its end tag whenever a raw HTML blob contains a literal </div> (which mermaid's <foreignObject>-based labels always do). Fixed in srid/heist-extra#13 by switching the wrapper to a unique <rawhtml> element with display: contents. The flake input is pinned to that branch until the PR lands on master.

Deferred: The structural review (Hickey/Lowy lenses on Opus) flagged three follow-ups worth tracking — config knob over rendering mode (#653), content-addressed cache to avoid re-running mmdc on every parse (#654), and widening the renderer interface so this transform can move from parse-time to render-time (#655).

Closes #625, closes #119.

Try it locally

nix run github:srid/emanote/polite-claim

srid added 9 commits April 24, 2026 23:04
Walk every parsed note's Pandoc AST and replace `mermaid` code blocks
with `RawBlock html` containing the SVG produced by `mmdc` (mermaid-cli).
The generated site no longer needs the CDN-hosted `mermaid.js` to
display diagrams and works fully offline.

When `mmdc` is missing from PATH, a single warning is logged and the
code blocks are left untouched. Per-diagram failures preserve the
original block alongside an HTML comment carrying the underlying error.

The Nix flake pulls in `pkgs.mermaid-cli` as a build/runtime dependency.
The `js.mermaid` snippet stays for users who want client-side rendering
instead.

Closes #625

Run transformMermaidBlocks inside the WriterT [Text] block in
parseNote so per-diagram render failures join the same error list as
parser/filter errors. Reviewers and end users see one error stream per
note instead of two (logger vs. surfaced errs).

Missing mmdc remains a logger-only warning since it is an environment
issue, not a per-note problem.
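
A rough sketch of what "one error stream per note" means in this commit: both the parse stage and the diagram stage append to the same writer log. Everything here is a stand-in (a note is a list of lines, a line starting with `mermaid:` is a diagram, and `parseLines`, `renderDiagrams`, and `processNote` are hypothetical names); it uses `Writer` from `transformers` rather than Emanote's actual `WriterT [Text] m` stack.

```haskell
import Control.Monad (forM, when)
import Control.Monad.Trans.Writer.Strict (Writer, runWriter, tell)
import Data.List (stripPrefix)

type Errors = [String]

-- | Stand-in parser: reports lines that contain a tab character.
parseLines :: String -> Writer Errors [String]
parseLines raw = forM (lines raw) $ \l -> do
  when ('\t' `elem` l) $ tell ["parse: tab character found"]
  pure l

-- | Stand-in diagram pass: "mermaid:<src>" is a diagram; empty src fails
-- and the original line is preserved.
renderDiagrams :: [String] -> Writer Errors [String]
renderDiagrams ls = forM ls $ \l -> case stripPrefix "mermaid:" l of
  Just "" -> tell ["mermaid: empty diagram"] >> pure l
  Just _ -> pure "<svg></svg>"
  Nothing -> pure l

-- | Both stages write into the same list, so callers see one stream.
processNote :: String -> ([String], Errors)
processNote raw = runWriter (parseLines raw >>= renderDiagrams)
```

Because the stages are sequenced in one `Writer` computation, the caller receives parse and render messages together, in the order they occurred.
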

Replace the HTML-comment-only diagnostic in mermaid-error blocks with a
visible Para that names the failure and shows mmdc's error inline. The
original code block is still preserved so the source survives.

The previous design hid the error message from the very humans who need
to act on it; the only signal was a logger line in the build output.

Rewrite stripXmlPrologue to handle any leading processing instruction
(<?...?>), not just <?xml ...?>, and to leave malformed prologues
untouched instead of silently truncating to empty when no terminator
is present. Cover both behaviours with tests, including
<?xml-stylesheet?> which mmdc may emit in future versions.
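
The two behaviours named in this commit (consume any leading `<?...?>`, leave input untouched when no terminator is found) can be sketched with `T.stripPrefix` and `T.breakOn`. This is an illustrative reconstruction under assumptions, not the PR's `stripXmlPrologue`: the name `stripXmlPrologueSketch` is hypothetical, and consuming several consecutive processing instructions plus the whitespace between them is a choice made here.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as T

-- | Drop leading processing instructions such as <?xml ...?> or
-- <?xml-stylesheet ...?>. If a "<?" has no closing "?>", the text is
-- returned unchanged rather than truncated.
stripXmlPrologueSketch :: Text -> Text
stripXmlPrologueSketch input =
  case T.stripPrefix "<?" (T.stripStart input) of
    Nothing -> input -- no processing instruction at the front
    Just rest ->
      let (_, after) = T.breakOn "?>" rest
       in case T.stripPrefix "?>" after of
            Nothing -> input -- malformed: no terminator, leave untouched
            Just remainder -> stripXmlPrologueSketch (T.stripStart remainder)
```

`T.breakOn` returns the whole text and an empty remainder when the needle is absent, which is what makes the "no terminator" case fall through to the untouched input.
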

Make the implicit ordering contract at the call site explicit: mermaid
runs after preparePandoc and applyPandocFilters so user-defined Lua
filters can manipulate mermaid code blocks before they are baked into
SVG.
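
To illustrate why the order matters, here is a toy pipeline built with Kleisli composition. The stage names echo `preparePandoc` and `applyPandocFilters` from the commit message, but every function below is a hypothetical stub over a list of strings, not Emanote's real pipeline.

```haskell
import Control.Monad ((>=>))
import Data.Functor.Identity (Identity, runIdentity)
import Data.List (stripPrefix)

-- | Stand-in document: one string per block; "mermaid:<src>" marks a diagram.
type Doc = [String]

-- | Stub for the preparation step: drops empty blocks.
preparePandocStub :: Doc -> Identity Doc
preparePandocStub = pure . filter (not . null)

-- | Stub for a user-defined filter that rewrites diagram source.
userFilterStub :: Doc -> Identity Doc
userFilterStub = pure . map rewrite
  where
    rewrite b = if b == "mermaid:graph TD" then "mermaid:graph LR" else b

-- | Stub for the mermaid pass: bakes the source into a placeholder SVG.
mermaidStub :: Doc -> Identity Doc
mermaidStub = pure . map render
  where
    render b = case stripPrefix "mermaid:" b of
      Just src -> "<svg data-src=" ++ show src ++ "/>"
      Nothing -> b

-- | Filters run first, so they still see the mermaid source.
pipeline :: Doc -> Identity Doc
pipeline = preparePandocStub >=> userFilterStub >=> mermaidStub

runPipeline :: Doc -> Doc
runPipeline = runIdentity . pipeline
```

Swapping the last two stages would bake `graph TD` into the SVG before the filter could touch it, which is the situation the explicit ordering comment guards against.
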

Drop the magic numbers (T.drop 2, T.drop 9) by using T.stripPrefix to
both detect and consume the prefix in one step. Future renames of the
literal can no longer silently desync from the length argument.
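
A small side-by-side sketch of the pattern. The literal `<!DOCTYPE` is used here only because it happens to be nine characters long; the commit does not say which literals the original `T.drop` calls were paired with, so treat that pairing and the function names as assumptions.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as T

-- | Fragile: the 9 must be kept in sync with the literal by hand.
afterDoctypeDrop :: Text -> Maybe Text
afterDoctypeDrop t
  | "<!DOCTYPE" `T.isPrefixOf` t = Just (T.drop 9 t)
  | otherwise = Nothing

-- | Robust: detection and consumption share a single literal.
afterDoctype :: Text -> Maybe Text
afterDoctype = T.stripPrefix "<!DOCTYPE"
```

Both functions agree today, but only the second keeps working if the literal is later edited.
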

Hoist the ExitFailure case to the top so the happy path no longer
nests inside an outer case. Replace the LambdaCase on doesFileExist's
Bool with a plain if/else, which reads more naturally.
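
The resulting shape might look something like the sketch below. It is not the PR's `runMmdc`: the function name `runRenderer`, its parameters, and the error strings are hypothetical, and the executable is taken as an argument so the control flow can be exercised without `mmdc` installed.

```haskell
import System.Directory (doesFileExist)
import System.Exit (ExitCode (..))
import System.Process (readProcessWithExitCode)

-- | Run an external renderer and read back the file it should produce.
-- The failure case comes first; the happy path follows without extra nesting.
runRenderer :: FilePath -> [String] -> FilePath -> IO (Either String String)
runRenderer exe args outFile = do
  (code, _stdout, stderrText) <- readProcessWithExitCode exe args ""
  case code of
    ExitFailure n ->
      pure (Left ("exit code " ++ show n ++ ": " ++ stderrText))
    ExitSuccess -> do
      exists <- doesFileExist outFile
      -- A plain if/else on the Bool, rather than a LambdaCase.
      if exists
        then Right <$> readFile outFile
        else pure (Left ("no output file at " ++ outFile))
```

A caller would pass the renderer's path, its arguments, and the output path it expects the tool to write.
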

…tating comment

mmdcExe was bound once and used at exactly one call site, with the
literal "mmdc" still inlined in the warning message — so naming it
didn't deduplicate anything. errorMessage's docstring restated its
name; well-named identifiers already say what.

Drop hasMermaidBlock from exports. Its only consumer was a dedicated
test block that was redundant with the existing transformMermaidBlocks
no-op test (which exercises the same early-out path through the public
API). Keep stripXmlPrologue exported and rename the comment from "For
testing" to "Internal helpers (exported for unit tests)" — the helper
is a legitimately useful pure function with seven distinct edge cases.

srid commented Apr 25, 2026

Hickey/Lowy Analysis

Reviewers ran on opus per --review-model=opus.

| # | Lens | Finding | Disposition |
|---|------|---------|-------------|
| 1 | Hickey | Fragmented error channel (logger vs WriterT errs) | Fixed in this PR |
| 2 | Hickey | Parse-time IO complects parsing with rendering | Deferred #655 |
| 3 | Hickey | "Always-on if PATH" interleaves environment with semantics | Deferred #653 |
| 4 | Hickey | No caching: render coupled to build | Deferred #654 |
| 5 | Hickey | Div .mermaid-error packs three concerns; error hidden in HTML comment | Fixed in this PR |
| 6 | Hickey | stripXmlPrologue silently truncates malformed input | Fixed in this PR |
| 7 | Hickey | Implicit pipeline order across md/org branches | Deferred (cosmetic) |
| 8 | Lowy | Per-note opt-out unencapsulated; sequence volatility named but rigid | Deferred #653 |
| 9 | Lowy | XML prologue stripping narrow (only `<?xml ?>`/`<!DOCTYPE>`) | Fixed in this PR |
| 10 | Lowy | Error message escaping incomplete | No-op (subsumed by #5: visible message replaces HTML comment) |
| 11 | Lowy | Pipeline placement is implicit at call site | Fixed in this PR |

Hickey rationale

The module itself is structurally clean; the wiring is where ease beat simplicity. The decisive complect was the parse-time hook (#2): SVG generation is "turn AST into a presentation artifact" and belongs at render time, but the existing PandocBlockRenderer returns Splice Identity and can't perform IO. The author chose parse-time as the at-hand fold rather than widening the render interface — a real ease-vs-simplicity trade. Fixing it in this PR would mean a separate refactor of the renderer machinery; deferred to #655.

The dual error channel (#1, fixed) and the hidden HTML-comment diagnostic (#5, fixed) were the two complects that could be unwound without architectural change. Errors now flow through a single WriterT [Text] m channel, and a failed diagram displays a visible Para above the preserved source instead of a comment readers never see. The malformed-input silent truncate in stripXmlPrologue (#6) was a classic "parser dressed as dropWhile" — replaced with T.breakOn returning the input unchanged on no-terminator.

The PATH-driven on/off behaviour (#3) is the remaining structural concern: the same .md produces different output on hosts with vs. without mmdc. A proper fold needs an explicit mermaid.renderer: server | client | off knob with server erroring loudly when mmdc is absent. Deferred to #653 because it's a config-feature design question worth handling on its own.

Lowy rationale

Volatility map: mmdc CLI surface (lives in runMmdc — correctly encapsulated as activity volatility), SVG format quirks (stripXmlPrologue — pure, narrow, now extended to handle arbitrary <?...?> PIs per #9), error envelope (renderBlock's Div .mermaid-error — correctly encapsulated). The boundary is sound and named after the right axis.

The two correctness gaps were patched in this PR: prologue robustness now leaves malformed prefixes untouched and consumes any leading processing instruction; the implicit ordering contract at the call site (#11) is now stated explicitly so a future reader doesn't have to reverse-engineer "mermaid runs after Lua filters because user filters get first crack at mermaid blocks." The error-escape concern (#10) became moot once #5's visible message replaced the HTML-comment diagnostic — there is no comment to break out of.

The remaining unfixed volatility is sequence (when does rendering happen) and per-site/per-note opt-out, both packaged into #653. Caching as a separate volatility axis lives in #654 — explicitly not inside transformMermaidBlocks if added later, since cache invalidation is a different rate of change from the AST traversal it would speed up.


srid commented Apr 25, 2026

/do results

| Step | Status | Duration | Verification |
|------|--------|----------|--------------|
| sync | | 0s | git fetch ok; forge=github |
| research | | 6m 44s | Mapped renderer pipeline; hook point at Note.hs:parseNote with WriterT [Text] |
| branch | | 0s | On feature branch polite-claim |
| implement | | 4m 17s | New Emanote.Pandoc.Mermaid, wired in parseNote (md+org), Nix flake + tests |
| check | | 1m 55s | cabal build all clean; pkgs.mermaid-cli v11.12.0 |
| docs | | 43s | Updated docs/tips/js/mermaid.md |
| fmt | | 27s | fourmolu/cabal-fmt/hlint/nixpkgs-fmt all clean |
| commit | | 1m 46s | Feature commit pushed |
| hickey+lowy | | 9m 59s | Opus sub-agents; 4 Fix-in-PR commits, 3 deferred to #653 #654 #655 |
| police | | 9m 47s | rules+fact-check clean; 4 elegance refinements as separate commits |
| test | | 10s | 51 examples, 0 failures |
| create-pr | | 2m 15s | Draft PR opened with Hickey/Lowy analysis |
| ci | | 5m 48s | flake-parts-docs (1m48s), e2e live (4m27s), e2e static (4m28s) |
| Total | | 44m 12s | |

Slowest step: hickey+lowy (9m 59s)

Optimization suggestions

  • Review steps dominated (hickey+lowy 10m + police 10m ≈ 45% of run): --review-model=opus was the right call given the structural depth surfaced (parse-time-IO complect, dual error channel, env-driven semantics), but on smaller diffs --review-model=sonnet would shave ~6-8 minutes with comparable rule/fact-check coverage. Reserve opus for diffs that cross module boundaries.
  • research at 6m 44s: two Explore subagents needed before the hook point was clear. Pre-reading Emanote.Pandoc.Renderer and Emanote.Model.Note before invoking /do would have collapsed this to one subagent or zero.
  • CI at 5m 48s is dominated by playwright browser install + e2e — caching is already configured. Re-runs on followups should hit cache and be ~3 minutes.
  • test at 10s confirms the unit suite is fast enough to run on every iteration; no incremental savings to be had there.

Workflow completed at $(date -u +%Y-%m-%dT%H:%M:%SZ).

srid added 11 commits April 25, 2026 08:45
Match the stork/tailwindcss convention: bake the absolute mmdc path in
at compile time via Template Haskell rather than resolve it at runtime
with findExecutable. The Nix flake already pins pkgs.mermaid-cli, so
the dep is a hard requirement either way — staticWhich just makes the
contract explicit.

Drops the runtime "mmdc not found" warning path, the missing-mmdc unit
test, and the MonadLogger constraint on transformMermaidBlocks.

xmlhtml's renderer at HTML/Render.hs:131-133 errors with "div cannot
contain text looking like its end tag" whenever heist-extra's `rawNode`
wraps a raw HTML blob that contains a literal `</div>`. mermaid SVG
output trips this every time via `<foreignObject><div>...</div>`, so
accessing any page with a diagram serves a truncated response and
ERR_INCOMPLETE_CHUNKED_ENCODING.

Pin to srid/heist-extra#13 (rawNode now wraps in `<rawhtml>` with
`display: contents`) until that PR lands on master.

Add a notebook fixture with one mermaid code block and a smoke
scenario that asserts the rendered article contains an inline <svg>
and no leftover <code class="language-mermaid"> source. Either alone
could be satisfied by the wrong rendering strategy — the conjunction
nails the inline-SVG path.
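
The conjunction can be written down as a predicate. The real scenario runs in a browser against the rendered DOM; the Haskell below is only an illustration of the logic over an HTML string, and the function names are hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as T

-- | True when the page contains an inline SVG element.
hasInlineSvg :: Text -> Bool
hasInlineSvg html = "<svg" `T.isInfixOf` html

-- | True when the mermaid source code block survived into the output.
hasLeftoverSource :: Text -> Bool
hasLeftoverSource html = "language-mermaid" `T.isInfixOf` html

-- | Both conditions together: an SVG is present and no source is left over.
renderedStatically :: Text -> Bool
renderedStatically html = hasInlineSvg html && not (hasLeftoverSource html)
```

A page that has an SVG but still carries the `language-mermaid` code block, or one that has neither, fails the combined check.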

The success path can't run from the Haskell unit suite (no mmdc on
test PATH, no browser to validate the DOM), so e2e is the only place
this end-to-end check belongs. The suite already runs in both `live`
and `static` modes, so we get coverage for the dev-server and
generated-output paths simultaneously.

GitHub Actions runners (and most containerised CI) don't ship the
Chromium SUID sandbox helper with the right ownership/setuid bits, so
puppeteer aborts the launch with "The SUID sandbox helper binary was
found, but is not configured correctly." Static-mode e2e fails on
every CI run as a result; live-mode happens not to exercise the
diagram path before the suite ends.

Write a one-off puppeteer config alongside the diagram input and pass
it to mmdc with -p. mmdc only renders trusted local mermaid source —
opting out of the sandbox is safe.
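
A minimal sketch of that config, assuming the usual puppeteer launch-options shape of a JSON object with an `args` array. The helper names are hypothetical, the two flags are the ones quoted in this PR, and `-i`/`-o` are mmdc's input and output options alongside the `-p` mentioned above.

```haskell
import Data.List (intercalate)

-- | Chromium flags that opt out of the sandbox.
noSandboxArgs :: [String]
noSandboxArgs = ["--no-sandbox", "--disable-setuid-sandbox"]

-- | Render a puppeteer launch config as JSON: {"args": [...]}.
-- Uses 'show' for quoting, which matches JSON only for plain ASCII flags
-- like the ones above (an assumption of this sketch).
puppeteerConfigJson :: [String] -> String
puppeteerConfigJson args =
  "{\"args\": [" ++ intercalate ", " (map show args) ++ "]}"

-- | Write the config next to the diagram input.
writePuppeteerConfig :: FilePath -> IO ()
writePuppeteerConfig path = writeFile path (puppeteerConfigJson noSandboxArgs)

-- | Argument list for mmdc: input file, output file, puppeteer config.
mmdcArgs :: FilePath -> FilePath -> FilePath -> [String]
mmdcArgs input output config = ["-i", input, "-o", output, "-p", config]
```

A real implementation would more likely build the JSON with a library such as aeson; string concatenation keeps this sketch dependency-free.
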

The main mermaid doc now describes only the default static path
(inline SVG via mmdc, the recommended approach). The js.mermaid
snippet usage moves to docs/tips/js/mermaid/client-side.md, called
out as the alternative for environments that can't ship mmdc or
need browser-side interactive rendering. Drops the leftover
page.bodyHtml frontmatter from the main page (no longer needed
under the static path) and adds a wikilink across.

A page (or any ancestor index.yaml) can now set `mermaid.static: false`
to skip build-time SVG rendering and leave mermaid code blocks intact
for client-side JavaScript rendering. Defaults to true.

Threads the page meta into transformMermaidBlocks so the check can run
before walking the AST, and adds a unit test that asserts the opt-out
path returns the document unchanged.
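
The opt-out check can be pictured as a guard in front of the transform. In this sketch `PageMeta` is a hypothetical one-field stand-in for the page metadata (Emanote's real meta is a merged YAML/JSON value), and `staticEnabled` and `transformIfEnabled` are illustrative names.

```haskell
import Data.Maybe (fromMaybe)

-- | Stand-in for page metadata: only the `mermaid.static` setting, which
-- may be absent.
newtype PageMeta = PageMeta {mermaidStatic :: Maybe Bool}

-- | Static rendering defaults to True when the key is not set.
staticEnabled :: PageMeta -> Bool
staticEnabled = fromMaybe True . mermaidStatic

-- | Check the flag before doing any work; on opt-out the document is
-- returned unchanged and the transform is never invoked.
transformIfEnabled :: Applicative m => PageMeta -> (doc -> m doc) -> doc -> m doc
transformIfEnabled meta transform doc
  | staticEnabled meta = transform doc
  | otherwise = pure doc
```

With `PageMeta (Just False)` the document comes back untouched; with `PageMeta Nothing` or `PageMeta (Just True)` the supplied transform runs.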

Subset of the planned mermaid.renderer config in #653: implements the
on/off boolean now so the client-side docs page (next commit) can
actually demo the JS rendering path without the static path consuming
the source first.

mermaid is now static SVG by default; math has rendered to native
MathML for a while. Neither requires client-side JavaScript, so the
"JS behaviours" section is misleading. Move the pages up:

  tips/js/math.md      → tips/math.md
  tips/js/mermaid.md   → tips/mermaid.md
  tips/js/mermaid/client-side.md  → tips/mermaid/client-side.md

Drop tips/js.md (the section overview) entirely. _redirects keeps
the legacy /tips/js/* URLs working.

The main mermaid.md is trimmed to user-facing description (drops
Nix flake / staticWhich implementation detail) and points to
client-side.md via a [!tip] callout for users who want the JS path.
client-side.md becomes a real demo: it sets `mermaid.static: false`
plus the js.mermaid snippet in its frontmatter and renders the same
diagrams as the main page, live in the browser.

Also fixes the orgmode.org cross-reference to the new math path.

…-only

Two related fixes for nix build .#docs:

1. Add more puppeteer args (--disable-dev-shm-usage, --disable-gpu,
   --disable-crash-reporter, --no-zygote, --single-process) on top of
   --no-sandbox / --disable-setuid-sandbox. The Nix build sandbox lacks
   the /dev/shm and IPC capabilities Chromium expects, on top of the
   GitHub Actions runner limitations we already had to placate. None of
   these touch sandboxing for mmdc's actual rendering, which runs against
   trusted local source.

2. Stop feeding mermaid render failures into the per-note error list.
   checkBadMarkdownFiles treats anything in there as fatal and calls
   exitFailure, so a single transient Chromium hiccup tanked the whole
   site build. Errors now go to the logger plus the visible inline
   "Mermaid rendering failed:" Para — readers and authors still see
   them at the right granularity, but the build keeps going.

Trade-off: regresses the Hickey-#1 unification of error channels from
the earlier review pass. Worth it: parse errors really should fail the
build, render errors really shouldn't.

Development

Successfully merging this pull request may close these issues.

  • Render Mermaid diagrams to inline SVG at build time (offline support)
  • <div></div> cannot used in raw HTML