Summary
extractMarkdownText runs heading-marker stripping before list-marker stripping in sequence, on the same line. When a heading begins with a number (e.g. ### 1. How well are...), the heading regex turns it into 1. How well are... and then the numbered-list regex strips the 1. away, leaving How well are.... The HTML side keeps 1. as part of the heading text, so the segments don't match.
Repro
Source markdown:
### 1. How well are key programming languages supported by code examples?
Hugo renders:
<h3 id="1-how-well-are-key-programming-languages-supported-by-code-examples">1. How well are key programming languages supported by code examples?</h3>
extractHtmlText segment: 1. How well are key programming languages supported by code examples?
extractMarkdownText flow on this line:
^#{1,6}\s+ strips ### → 1. How well are key programming languages supported by code examples?
^[\s]*\d+\.\s+ strips 1. → How well are key programming languages supported by code examples?
Result: HTML has the 1. prefix, markdown doesn't. Substring containment fails.
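The flow above can be reproduced in isolation. The following is a minimal sketch, not the actual extractMarkdownText implementation: it models only the two regexes quoted in this report, and the function name stripSequentially and the direction of the containment check (HTML segment searched for inside the markdown text) are assumptions made for illustration.

```javascript
// Hypothetical minimal model of the two stripping steps described above.
// The real extractMarkdownText does more than this.
const HEADING_MARKER = /^#{1,6}\s+/gm;
const NUMBERED_LIST_MARKER = /^[\s]*\d+\.\s+/gm;

function stripSequentially(markdown) {
  return markdown
    .replace(HEADING_MARKER, "")        // "### 1. How..." becomes "1. How..."
    .replace(NUMBERED_LIST_MARKER, ""); // "1. How..." becomes "How..."
}

const source =
  "### 1. How well are key programming languages supported by code examples?";
const htmlSegment =
  "1. How well are key programming languages supported by code examples?";

const markdownText = stripSequentially(source);

console.log(markdownText);
// How well are key programming languages supported by code examples?
console.log(markdownText.includes(htmlSegment));
// false
```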
Why this is a bug
The numbered-list regex was meant to strip 1. from list items like 1. First thing so the item content matches <li>First thing</li>. It's wrong to apply it to text that came from a heading, because in <h3>1. How well…</h3> the 1. is part of the heading text, not list markup.
Both forms are valid markdown for different reasons:
- Authors who want numbered headings (research questions, RFC sections, "Step 1:") legitimately write ### 1. Title.
- The HTML renderer preserves the literal text inside the <h3>.
Suggested fix
Track which lines were headings and skip list-marker stripping on those lines. Two ways to do it:
Option A — placeholder-based, like the code protection
// Replace heading lines with placeholders before any other stripping
const headings = [];
text = text.replace(/^#{1,6}\s+(.*)$/gm, (_m, content) => {
  const idx = headings.length;
  headings.push(content);
  return `\x00HEAD${idx}\x00`;
});
// ...all other stripping (bullets, numbered lists, emphasis, etc.)...
// Restore heading text
text = text.replace(/\x00HEAD(\d+)\x00/g, (_m, idx) => headings[parseInt(idx, 10)]);
This guarantees no later regex touches heading content.
Option B — line-by-line state
Process line by line. If a line started with ^#{1,6}\s+, after stripping the marker, do not run the bullet/numbered-list regexes against it.
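Option B could look roughly like the sketch below. It is an illustration under assumptions, not a patch against the real code: the function name stripMarkers and the bullet regex are hypothetical, and the code span/block protection that the existing implementation performs is left out.

```javascript
// Hypothetical sketch of Option B. The heading and numbered-list regexes
// mirror the ones quoted in this issue; the bullet regex is an assumption.
const HEADING = /^#{1,6}\s+/;
const BULLET = /^\s*[-*+]\s+/;
const NUMBERED = /^\s*\d+\.\s+/;

function stripMarkers(markdown) {
  return markdown
    .split("\n")
    .map((line) => {
      if (HEADING.test(line)) {
        // Heading line: drop only the leading #'s and keep any "1." that
        // follows, because it is heading text rather than list markup.
        return line.replace(HEADING, "");
      }
      // Non-heading line: list markers are markup, so strip them.
      return line.replace(BULLET, "").replace(NUMBERED, "");
    })
    .join("\n");
}

const input = ["### 1. Title", "1. First thing", "- Bullet"].join("\n");
console.log(stripMarkers(input));
// 1. Title
// First thing
// Bullet
```

The heading check runs against the original line, so the decision to skip list stripping does not depend on what the line looks like after the marker is removed.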
Option A is more in line with how the existing code already protects code spans/blocks.
Repro material
Site: https://dacharycarey.com. Affected post: https://dacharycarey.com/2025/09/07/audit-conclusions/. Four H3 headings of the form ### 1. ... through ### 4. ... produce four false "missing" segments. Removing the leading numbers from the headings is a workaround but loses author intent (these are numbered research questions, and the surrounding prose refers to them by number).
Related
- `_emphasis_` not stripped, causes false 'missing' on CommonMark-valid prose #89 (CommonMark `_emphasis_` not stripped): same family of "extractMarkdownText doesn't fully match what the HTML renderer produces"
- `<tag>` code spans get text-stripped, causing false 'missing' #90 (inline `<tag>` code spans get text-stripped on HTML side): the other half of the audit-conclusions parity warning