Fix infinite loop when parsing H4–H6 headings #6
Open
mohamorui wants to merge 1 commit into lovstudio:main from
parse_md() had handlers only for H1–H3. Any line starting with ####,
##### or ###### fell into the paragraph collector, which immediately
broke on lines starting with '#', leaving `i` unchanged. The outer
`while i < len(lines)` then re-processed the same line forever
(100% CPU, no output), which broke conversion of documents using
sub-sub-section headings.
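
The stall described above can be reproduced in isolation. The following is a minimal sketch, not the real `parse_md()`: the function name `parse_buggy`, the `HEADING_H1_H3` pattern, the block tuples, and the `max_steps` cap are all assumptions made for illustration. The cap exists only so the demonstration terminates.

```python
import re

# Hypothetical stand-in for the pre-fix handlers: only H1-H3 are recognized.
HEADING_H1_H3 = re.compile(r"#{1,3}\s+")


def parse_buggy(lines, max_steps=50):
    """Return (blocks, stalled). max_steps is a demo-only safety cap."""
    blocks, i, steps = [], 0, 0
    while i < len(lines):
        steps += 1
        if steps > max_steps:
            # The same line keeps being re-processed: this is the stall.
            return blocks, True
        line = lines[i]
        if not line.strip():
            i += 1
            continue
        if HEADING_H1_H3.match(line):
            blocks.append(("heading", line.lstrip("#").strip()))
            i += 1
            continue
        # Paragraph collector: stops at blank lines and at lines starting with '#'.
        plines = []
        while i < len(lines) and lines[i].strip() and not lines[i].startswith("#"):
            plines.append(lines[i])
            i += 1
        if plines:
            blocks.append(("para", " ".join(plines)))
        # When plines is empty, i has not moved, so the outer loop repeats.
    return blocks, False
```

Calling `parse_buggy(["# Title", "", "text", "#### Deep"])` in this sketch reports a stall on the last line, while the same input without the `####` line finishes normally.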
- Extend the section handler to match H3–H6 (`#{3,6}\s+`). Deeper
levels render with the H3 style so their content is preserved
instead of stalling the parser.
- Add a defensive `i += 1` in the paragraph branch so future
unmatched lines (e.g. `#!shebang` or a bare `#`) cannot stall
the loop again.
- Add tests/07-deep-headings.md as a regression guard covering
H3–H6, a table between deep headings, and deep headings with
`&` and inline code.
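
The two code changes listed above could look roughly like the sketch below. It is a simplified illustration under stated assumptions, not the patched `md2pdf.py`: `parse_fixed`, the `TITLE` pattern for H1–H2, and the `("h3", text)` block tuples are hypothetical, and emitting an unmatched line as plain text is one possible choice (the source only says that `i` is advanced).

```python
import re

# Hypothetical patterns; only the #{3,6}\s+ part comes from the change itself.
TITLE = re.compile(r"(#{1,2})\s+(.*)")    # assumed: H1-H2 keep their own handlers
SECTION = re.compile(r"(#{3,6})\s+(.*)")  # H3-H6 now share the section handler


def parse_fixed(lines):
    """Parse lines into (style, text) tuples without ever stalling."""
    blocks, i = [], 0
    while i < len(lines):
        line = lines[i]
        if not line.strip():
            i += 1
            continue
        m = TITLE.match(line) or SECTION.match(line)
        if m:
            level = len(m.group(1))
            # Levels deeper than 3 reuse the H3 style.
            blocks.append(("h%d" % min(level, 3), m.group(2).strip()))
            i += 1
            continue
        plines = []
        while i < len(lines) and lines[i].strip() and not lines[i].startswith("#"):
            plines.append(lines[i])
            i += 1
        if plines:
            blocks.append(("para", " ".join(plines)))
        else:
            # Defensive fallback: always advance, so '#!shebang' or a bare '#'
            # cannot trap the loop. Keeping the text is an assumption here.
            blocks.append(("para", line))
            i += 1
    return blocks
```

In this sketch a line with seven `#` characters matches neither pattern, so it also goes through the fallback instead of hanging.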
a392a70 to
713f026
Summary
`parse_md()` had handlers for H1–H3 only. Any line starting with `####`, `#####` or `######` (including most deeply-nested sections in real technical docs) would stall the parser in an infinite loop: the paragraph collector breaks on lines starting with `#`, so `plines` stays empty and `i` is never advanced. The outer `while i < len(lines)` loop then re-processes the same line forever, pinning one CPU core at 100% with no output. Any Markdown containing an H4 becomes non-convertible.

Reproduction
Minimal input that hangs on `main`:

Observed: `python3 md2pdf.py --input x.md --output x.pdf` runs indefinitely at 100% CPU and never produces a PDF. Triggered in practice on a 1143-line SPEC where the first `####` heading is on line 212.

Fix
- `lovstudio-any2pdf/scripts/md2pdf.py`: extend the section handler to match `#{3,6}\s+` so H3–H6 share the H3 style (ReportLab has no H4–H6 styles defined here; deeper levels are rendered rather than dropped).
- `lovstudio-any2pdf/scripts/md2pdf.py`: defensive `i += 1` fallback in the paragraph branch, so any future unmatched line (`#!shebang`, bare `#`, unknown marker) cannot stall the loop again.
- `tests/07-deep-headings.md`: regression test covering H3–H6, a table between deep headings, and deep headings containing `&` and inline code.

Net diff: +9/-3 in `md2pdf.py`, plus one new test file.

Verification
Ran on macOS 15 with Python 3.9 and reportlab 4.4.10.
- `#### Heading` reproducer
- `tests/01–06` × 4 themes (existing)
- `tests/07-deep-headings` × 4 themes (new)
- `pdftotext` extraction of new test

Test plan
- `python -m py_compile`
- `tests/01–06` still produce PDFs across `warm-academic` / `nord-frost` / `tufte` / `github-light`
- `tests/07-deep-headings.md` produces PDFs across the same themes
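
Because the original failure was a hang rather than a crash, a check like the ones above is safer to automate with a time limit. The helper below is a hedged sketch: `finishes_within` is a hypothetical name, and the 60-second budget in the usage comment is an arbitrary assumption, not a measured value. It relies only on the standard-library `subprocess.run` with its `timeout` parameter.

```python
import subprocess
import sys


def finishes_within(cmd, seconds):
    """Run cmd and return True only if it exits with status 0 before the timeout."""
    try:
        result = subprocess.run(cmd, timeout=seconds, capture_output=True)
    except subprocess.TimeoutExpired:
        # A parser stall like the H4 loop would end up here instead of blocking.
        return False
    return result.returncode == 0


# Possible usage against the converter (paths and the 60 s budget are assumptions):
# finishes_within(
#     [sys.executable, "md2pdf.py", "--input", "x.md", "--output", "x.pdf"], 60
# )
```

Run against each theme, a helper of this kind would turn a regression of the stall into a failed check instead of a stuck test run.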