fix: PDF Unicode font + malformed Markdown fallback #34

Draft
lklimek wants to merge 2 commits into main from fix/pdf-unicode-and-malformed-md

lklimek commented May 5, 2026

Summary

  • QA-004 (Unicode TTF fonts in PDF): scripts/generate_review_report.py now auto-discovers a Unicode TrueType font via pdfmetrics.registerFont / registerFontFamily. Discovery order: $CLAUDIUS_PDF_FONT env override -> bundled scripts/fonts/DejaVuSans.ttf -> common Linux paths (/usr/share/fonts/truetype/dejavu, /usr/share/fonts/truetype/noto). ParagraphStyles now flow through an F dict so emoji and non-Latin scripts (Cyrillic, Arabic, Hebrew, etc.) render instead of tofu boxes. Falls back to Helvetica/Courier with a stderr warning when no Unicode TTF is available -- never crashes.
  • QA-005 (Malformed-Markdown fallback): render_markdown_to_reportlab() is now wrapped in try/except. On any failure (parser exception, ReportLab mini-XML rejection, unclosed fence, etc.) it logs a warning and falls back to a single XML-escaped preformatted block containing the raw source -- content is preserved, never silently swallowed.
  • Smoke-tested end-to-end with a payload containing emoji, Cyrillic, CJK, Arabic, and an unclosed code fence: DejaVuSans is embedded in the PDF, all scripts render, and the malformed-MD path renders as escaped plaintext as designed.
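
The QA-005 behaviour can be sketched roughly as follows. This is a minimal illustration, not the code in scripts/generate_review_report.py: `render_with_fallback` and its `render` argument are hypothetical names standing in for the real Markdown -> HTML -> ReportLab pass, and the sketch returns the escaped text rather than building a ReportLab flowable so that it runs with the standard library alone.

```python
import logging
from xml.sax.saxutils import escape

log = logging.getLogger(__name__)


def render_with_fallback(source, render):
    """Call render(source); on any exception return the escaped raw text.

    `render` is a stand-in (assumption) for the real rendering pass.
    Returns a (kind, payload) tuple so callers can tell which path ran.
    """
    try:
        return ("rendered", render(source))
    except Exception as exc:  # deliberately broad: any failure takes the fallback
        log.warning("Markdown rendering failed (%s); using raw-text fallback", exc)
        # Escape &, < and > so the raw source is safe inside ReportLab's mini-XML.
        return ("fallback", escape(source))
```

In the real script the escaped string would presumably be wrapped in a preformatted flowable; the point of the sketch is only that the raw source survives the failure, escaped, instead of being dropped.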

Bundled CJK / emoji-color TTFs are out of scope for this PR -- DejaVuSans covers Latin + Cyrillic + Greek + a useful subset of symbols. Users who need full CJK or color emoji can drop a TTF anywhere on disk and point CLAUDIUS_PDF_FONT at it.
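
For reviewers who want to reason about the discovery order, here is a minimal sketch of it under stated assumptions. The function and constant names are hypothetical, the exact system file names (DejaVuSans.ttf, NotoSans-Regular.ttf) are guesses at what the script looks for inside the two directories, and skipping an override that does not point at a file is also an assumption. Only `pdfmetrics.registerFont` and `TTFont` are real ReportLab calls.

```python
import os
import sys
from pathlib import Path

# Hypothetical names; the PR lists the locations but not how they are stored.
BUNDLED_FONT = Path("scripts/fonts/DejaVuSans.ttf")
SYSTEM_FONTS = (
    Path("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf"),
    Path("/usr/share/fonts/truetype/noto/NotoSans-Regular.ttf"),
)


def find_unicode_font(environ=os.environ):
    """Return the first existing TTF in discovery order, or None."""
    candidates = []
    override = environ.get("CLAUDIUS_PDF_FONT")
    if override:
        candidates.append(Path(override))  # 1. env override
    candidates.append(BUNDLED_FONT)        # 2. bundled font
    candidates.extend(SYSTEM_FONTS)        # 3. common Linux paths
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


def register_body_font(environ=os.environ):
    """Register the discovered font and return the font name to use."""
    path = find_unicode_font(environ)
    if path is None:
        print("warning: no Unicode TTF found; using Helvetica", file=sys.stderr)
        return "Helvetica"
    # Imported lazily so the discovery logic is testable without ReportLab.
    from reportlab.pdfbase import pdfmetrics
    from reportlab.pdfbase.ttfonts import TTFont

    pdfmetrics.registerFont(TTFont("ReportUnicode", str(path)))
    return "ReportUnicode"
```

With this shape, pointing CLAUDIUS_PDF_FONT at a CJK-capable TTF takes priority over the bundled DejaVuSans without any code change.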

Test plan

  • Helper unit tests for font auto-discovery (env override, bundled path, system path, no-font fallback warning)
  • Helper unit tests for render_markdown_to_reportlab fallback (malformed input -> escaped pre-block, no exception bubbles)
  • End-to-end PDF generation with multi-script content (Latin / Cyrillic / CJK / Arabic / emoji) -- DejaVuSans embedded, glyphs render
  • End-to-end PDF generation with malformed Markdown (unclosed fence) -- fallback path engages, content preserved
  • Reviewer spot-check on a real review report PDF in their own environment

🤖 Co-authored by Claudius the Magnificent AI Agent

claude added 2 commits May 5, 2026 11:03
QA-004: register a Unicode TrueType font (DejaVu Sans by default, with
bold/italic/mono siblings via pdfmetrics.registerFontFamily) so emoji
and non-Latin scripts (Cyrillic, Arabic, CJK ranges that the chosen
face supports) render correctly in PDF output instead of as tofu boxes.
Discovery order: $CLAUDIUS_PDF_FONT env override -> bundled
scripts/fonts/DejaVuSans.ttf -> common Linux locations
(/usr/share/fonts/truetype/dejavu, /usr/share/fonts/truetype/noto).
Falls back to Helvetica/Courier with a stderr warning when no TTF is
found -- never crashes.

QA-005: render_markdown_to_reportlab() now wraps the Markdown -> HTML
-> ReportLab pass in try/except. On parser failure, mini-XML rejection,
or any other exception it logs a warning and falls back to a single
XML-escaped preformatted block containing the raw source so no content
is silently swallowed.

Bumps plugin.json patch version (3.14.0 -> 3.14.1) and adds an
[Unreleased] CHANGELOG entry. The version bump may need reconciling
if another concurrent change also bumps the patch level -- pick the
next free patch at merge time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Skip 3.14.1 (claimed by PR #33) and 3.14.2 (reserved for #33 follow-up).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>