Fix infinite loop when parsing H4–H6 headings by mohamorui · Pull Request #6 · lovstudio/any2pdf

mohamorui · 2026-04-14T03:11:51Z

Summary

parse_md() had handlers for H1–H3 only. Any line starting with ####, ##### or ###### (including most deeply-nested sections in real technical docs) would stall the parser in an infinite loop: the paragraph collector breaks on lines starting with #, so plines stays empty and i is never advanced. The outer while i < len(lines) loop then re-processes the same line forever, pinning one CPU core at 100% with no output. Any Markdown containing an H4 becomes non-convertible.
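The stall can be modelled with a toy parser. This sketch is not the actual `parse_md()` source, which this description does not show. `toy_parse` and its `max_steps` guard are hypothetical, and the guard exists only so the demonstration terminates.

```python
import re

def toy_parse(lines, max_steps=1000):
    """Toy model of the pre-fix loop. Returns (blocks, stalled)."""
    blocks = []
    i = 0
    steps = 0
    while i < len(lines):
        steps += 1
        if steps > max_steps:
            # Hypothetical guard: without it, this loop would spin forever.
            return blocks, True
        line = lines[i]
        if not line.strip():
            i += 1
            continue
        # Only H1-H3 are recognised, mirroring the behaviour described above.
        m = re.match(r"(#{1,3})\s+(.*)", line)
        if m:
            blocks.append(("h%d" % len(m.group(1)), m.group(2)))
            i += 1
            continue
        # Paragraph collector: stops at blank lines and at lines starting with '#'.
        plines = []
        while i < len(lines) and lines[i].strip() and not lines[i].startswith("#"):
            plines.append(lines[i])
            i += 1
        if plines:
            blocks.append(("p", " ".join(plines)))
        # If plines is empty, i was never advanced and the same line is retried.
    return blocks, False
```

With the reproducer input from this PR, the H1 is parsed and the loop then stalls on the `####` line; an input that stops at H3 parses normally.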

Reproduction

Minimal input that hangs on main:

# Title

#### Any H4 Heading

body

Observed: python3 md2pdf.py --input x.md --output x.pdf — process runs indefinitely at 100% CPU, never produces a PDF. Triggered in practice on a 1143-line SPEC where the first #### heading is on line 212.
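Because the failure mode is a hang rather than an error, a timeout wrapper makes it easier to check mechanically. This is a minimal sketch; `finishes_within` is a hypothetical helper and not part of the repository.

```python
import subprocess
import sys

def finishes_within(cmd, timeout_s):
    """Run cmd and return True if it exits before timeout_s seconds elapse."""
    try:
        # subprocess.run kills the child process when the timeout expires.
        subprocess.run(cmd, timeout=timeout_s,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return True
    except subprocess.TimeoutExpired:
        return False

# Self-check with a trivial command that exits immediately.
print(finishes_within([sys.executable, "-c", "pass"], timeout_s=10))

# Against the converter, the call would use the command line quoted above:
# finishes_within(["python3", "md2pdf.py", "--input", "x.md", "--output", "x.pdf"], 30)
```

On `main` the commented-out call would be expected to return `False` for the reproducer input, since the process never exits.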

Fix

lovstudio-any2pdf/scripts/md2pdf.py — extend the section handler to match #{3,6}\s+ so H3–H6 share the H3 style (ReportLab has no H4–H6 styles defined here; deeper levels are rendered rather than dropped).
lovstudio-any2pdf/scripts/md2pdf.py — defensive i += 1 fallback in the paragraph branch, so any future unmatched line (#!shebang, bare #, unknown marker) cannot stall the loop again.
tests/07-deep-headings.md — regression test covering H3–H6, a table between deep headings, and deep headings containing & and inline code.
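The two code changes can be sketched on a toy parser as follows. This is an illustration and not the patched `md2pdf.py`: `toy_parse_fixed` and the regex names are hypothetical, and emitting an unmatched line as plain text is an assumption (the PR only states that `i` is advanced).

```python
import re

H1_RE = re.compile(r"#\s+(.*)")
H2_RE = re.compile(r"##\s+(.*)")
SECTION_RE = re.compile(r"#{3,6}\s+(.*)")  # H3-H6 share one handler

def toy_parse_fixed(lines):
    """Toy model of the post-fix loop: every iteration advances i."""
    blocks = []
    i = 0
    while i < len(lines):
        line = lines[i]
        if not line.strip():
            i += 1
            continue
        matched = False
        for tag, pattern in (("h1", H1_RE), ("h2", H2_RE), ("h3", SECTION_RE)):
            m = pattern.match(line)
            if m:
                blocks.append((tag, m.group(1)))  # H4-H6 are tagged h3
                i += 1
                matched = True
                break
        if matched:
            continue
        plines = []
        while i < len(lines) and lines[i].strip() and not lines[i].startswith("#"):
            plines.append(lines[i])
            i += 1
        if plines:
            blocks.append(("p", " ".join(plines)))
        else:
            # Defensive fallback: an unmatched '#' line (e.g. '#!shebang' or a
            # bare '#') is kept as text here (assumption) and i always advances.
            blocks.append(("p", line))
            i += 1
    return blocks
```

The fallback branch is what guarantees termination: even if a future line type matches no handler, the loop index still moves forward.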

Net diff: +9/-3 in md2pdf.py, plus one new test file.

Verification

Ran on macOS 15 with Python 3.9 and reportlab 4.4.10.

| Case | Before | After |
| --- | --- | --- |
| Minimal `#### Heading` reproducer | hang (100% CPU) | 1 s → PDF |
| 1143-line real SPEC with 14× H4 | hang (>8 min, killed) | 1 s → 0.5 MB PDF |
| `tests/01`–`06` × 4 themes (existing) | 24/24 pass | 24/24 pass (no regression) |
| `tests/07-deep-headings` × 4 themes (new) | N/A | 4/4 pass |
| `pdftotext` extraction of new test | N/A | H3/H4/H5/H6 text all present |

Test plan

Syntax check (python -m py_compile)
Existing tests/01–06 still produce PDFs across warm-academic / nord-frost / tufte / github-light
New tests/07-deep-headings.md produces PDFs across the same themes
Extracted text confirms H3/H4/H5/H6 content is actually rendered (not silently dropped)
Real-world 1143-line document that used to hang now converts in ~1 s
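The extracted-text check can be scripted along these lines. This is a hedged sketch: `missing_headings` is a hypothetical helper, the heading strings in the usage comment are placeholders and not the contents of `tests/07-deep-headings.md`, and the text is assumed to have been obtained beforehand (for example from `pdftotext`).

```python
def missing_headings(extracted_text, expected):
    """Return the expected heading strings that do not appear in the text.

    Whitespace is normalised first, because text extracted from a PDF may
    wrap a heading across several lines.
    """
    normalized = " ".join(extracted_text.split())
    return [h for h in expected if " ".join(h.split()) not in normalized]

# Usage with placeholder heading names:
# missing_headings(text, ["Level 3", "Level 4", "Level 5", "Level 6"])
```

An empty result means every expected heading was rendered; any returned entry points to content that was silently dropped.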

parse_md() had handlers only for H1–H3. Any line starting with ####, ##### or ###### fell into the paragraph collector, which immediately broke on lines starting with '#', leaving `i` unchanged. The outer `while i < len(lines)` then re-processed the same line forever (100% CPU, no output), which broke conversion of documents using sub-sub-section headings.

- Extend the section handler to match H3–H6 (`#{3,6}\s+`). Deeper levels render with the H3 style so their content is preserved instead of stalling the parser.
- Add a defensive `i += 1` in the paragraph branch so future unmatched lines (e.g. `#!shebang` or a bare `#`) cannot stall the loop again.
- Add tests/07-deep-headings.md as a regression guard covering H3–H6, a table between deep headings, and deep headings with `&` and inline code.

mohamorui force-pushed the fix/h4-h6-infinite-loop branch 2 times, most recently from a392a70 to 713f026 Compare April 14, 2026 03:37

mohamorui wants to merge 1 commit into lovstudio:main from mohamorui:fix/h4-h6-infinite-loop
