Fix infinite loop when parsing H4–H6 headings #6

Open
mohamorui wants to merge 1 commit into lovstudio:main from mohamorui:fix/h4-h6-infinite-loop

@mohamorui

Summary

parse_md() had handlers for H1–H3 only. Any line starting with ####, ##### or ###### (which covers most deeply nested sections in real technical docs) stalled the parser in an infinite loop: the paragraph collector breaks on lines starting with #, so plines stays empty and i is never advanced. The outer while i < len(lines) loop then re-processes the same line forever, pinning one CPU core at 100% with no output. As a result, any Markdown file containing an H4 or deeper heading could not be converted.
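The stall described above can be reproduced in isolation. The following is a minimal sketch, not the actual parse_md() code: the function name, the tuple output format, and the max_steps guard (added only so the demo terminates) are all assumptions made for illustration.

```python
import re

def parse_headings_buggy(lines, max_steps=1000):
    """Simplified stand-in for the pre-fix loop; returns None if it stalls."""
    i, out, steps = 0, [], 0
    while i < len(lines):
        steps += 1
        if steps > max_steps:  # demo-only guard; the real loop had none
            return None
        line = lines[i]
        if not line.strip():  # blank line: skip it
            i += 1
            continue
        m = re.match(r"(#{1,3})\s+(.*)", line)  # H1-H3 only
        if m:
            out.append(("h%d" % len(m.group(1)), m.group(2)))
            i += 1
            continue
        # Paragraph collector: stops at any line starting with '#'.
        plines = []
        while i < len(lines) and lines[i].strip() and not lines[i].startswith("#"):
            plines.append(lines[i])
            i += 1
        if plines:
            out.append(("p", " ".join(plines)))
        # If plines is empty (e.g. an H4 line), i was never advanced.
    return out
```

With the input from the Reproduction section, the H4 line matches neither the heading pattern nor the paragraph collector, so the sketch hits the guard and returns None; input without deep headings parses normally.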

Reproduction

Minimal input that hangs on main:

# Title

#### Any H4 Heading

body

Observed: python3 md2pdf.py --input x.md --output x.pdf never returns; the process runs indefinitely at 100% CPU and produces no PDF. This was triggered in practice on a 1143-line SPEC whose first #### heading is on line 212.
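To turn a hang like this into a fast, visible failure rather than a stuck terminal, the command can be run under a timeout. This is a hedged sketch using only the standard library; the helper name finishes_within and the 30-second budget are assumptions, not part of this PR.

```python
import subprocess
import sys

def finishes_within(cmd, seconds):
    """Return True if cmd exits within `seconds`, False if it had to be killed."""
    try:
        subprocess.run(
            cmd,
            timeout=seconds,  # child is killed when the timeout expires
            check=False,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return True
    except subprocess.TimeoutExpired:
        return False

# Hypothetical usage with the reproducer above (paths are placeholders):
# ok = finishes_within(
#     [sys.executable, "md2pdf.py", "--input", "x.md", "--output", "x.pdf"], 30)
```

A True result only says the process exited in time, not that the PDF is correct; it is meant as a guard against regressions of the hang itself.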

Fix

  1. lovstudio-any2pdf/scripts/md2pdf.py — extend the section handler to match #{3,6}\s+ so H3–H6 share the H3 style (no ReportLab styles are defined for H4–H6 here, so deeper levels are rendered with the H3 style rather than dropped).
  2. lovstudio-any2pdf/scripts/md2pdf.py — defensive i += 1 fallback in the paragraph branch, so any future unmatched line (#!shebang, bare #, unknown marker) cannot stall the loop again.
  3. tests/07-deep-headings.md — regression test covering H3–H6, a table between deep headings, and deep headings containing & and inline code.

Net diff: +9/-3 in md2pdf.py, plus one new test file.
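The two code changes listed above can be sketched as follows. This is an illustration under stated assumptions, not the diff itself: the function name, the tuple output, and the separate H1–H2 pattern are invented for the example, and only the #{3,6}\s+ pattern and the i += 1 fallback come from the PR description.

```python
import re

HEADING_TOP = re.compile(r"(#{1,2})\s+(.*)")   # assumed shape of the H1-H2 handlers
HEADING_DEEP = re.compile(r"(#{3,6})\s+(.*)")  # pattern named in the PR: H3-H6

def parse_headings_fixed(lines):
    i, out = 0, []
    while i < len(lines):
        line = lines[i]
        if not line.strip():
            i += 1
            continue
        m = HEADING_TOP.match(line)
        if m:
            out.append(("h%d" % len(m.group(1)), m.group(2)))
            i += 1
            continue
        m = HEADING_DEEP.match(line)
        if m:
            out.append(("h3", m.group(2)))  # H3-H6 all share the H3 style
            i += 1
            continue
        plines = []
        while i < len(lines) and lines[i].strip() and not lines[i].startswith("#"):
            plines.append(lines[i])
            i += 1
        if plines:
            out.append(("p", " ".join(plines)))
        else:
            i += 1  # defensive fallback: never revisit an unmatched line
    return out
```

In this sketch an unmatched line such as #!shebang is simply skipped by the fallback; whether the real parser skips or renders such lines is not stated in the PR, so treat that detail as an assumption.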

Verification

Ran on macOS 15 with Python 3.9 and reportlab 4.4.10.

| Case | Before | After |
| --- | --- | --- |
| Minimal #### Heading reproducer | hang (100% CPU) | 1 s → PDF |
| 1143-line real SPEC with 14× H4 | hang (>8 min, killed) | 1 s → 0.5 MB PDF |
| tests/01–06 × 4 themes (existing) | 24/24 pass | 24/24 pass (no regression) |
| tests/07-deep-headings × 4 themes (new) | N/A | 4/4 pass |
| pdftotext extraction of new test | | H3/H4/H5/H6 text all present |

Test plan

  • Syntax check (python -m py_compile)
  • Existing tests/01–06 still produce PDFs across warm-academic / nord-frost / tufte / github-light
  • New tests/07-deep-headings.md produces PDFs across the same themes
  • Extracted text confirms H3/H4/H5/H6 content is actually rendered (not silently dropped)
  • Real-world 1143-line document that used to hang now converts in ~1 s
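The extracted-text check in the plan above could be automated along these lines. This is a rough sketch: the helper name missing_snippets and the heading strings in the usage comment are hypothetical, and it assumes the PDF text has already been obtained (for example with pdftotext out.pdf -, which writes the text to stdout).

```python
def missing_snippets(extracted_text, expected):
    """Return the expected snippets that do not appear in the extracted text.

    Whitespace is collapsed on both sides because PDF text extraction
    may wrap or re-space lines differently from the Markdown source.
    """
    normalized = " ".join(extracted_text.split())
    return [s for s in expected if " ".join(s.split()) not in normalized]

# Hypothetical usage: heading titles here are placeholders, not the
# actual contents of tests/07-deep-headings.md.
# text = subprocess.run(["pdftotext", "out.pdf", "-"],
#                       capture_output=True, text=True).stdout
# assert missing_snippets(text, ["Some H4 title", "Some H6 title"]) == []
```

An empty result means every expected heading survived rendering, which is the property that distinguishes "rendered" from "silently dropped".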

parse_md() had handlers only for H1–H3. Any line starting with ####,
##### or ###### fell into the paragraph collector, which immediately
broke on lines starting with '#', leaving `i` unchanged. The outer
`while i < len(lines)` then re-processed the same line forever
(100% CPU, no output), which broke conversion of documents using
sub-sub-section headings.

- Extend the section handler to match H3–H6 (`#{3,6}\s+`). Deeper
  levels render with the H3 style so their content is preserved
  instead of stalling the parser.
- Add a defensive `i += 1` in the paragraph branch so future
  unmatched lines (e.g. `#!shebang` or a bare `#`) cannot stall
  the loop again.
- Add tests/07-deep-headings.md as a regression guard covering
  H3–H6, a table between deep headings, and deep headings with
  `&` and inline code.
mohamorui force-pushed the fix/h4-h6-infinite-loop branch 2 times, most recently from a392a70 to 713f026 on April 14, 2026 03:37