@JorjMcKie
Collaborator

These are important performance improvements for table handling:

Several standard methods of PyMuPDF's Rect class are far too slow for this module and noticeably slow down its work.
This is particularly true for rectangle containment and intersection checks.
We might consider re-assessing these places in PyMuPDF itself, but improving the table module is too urgent to wait. At first sight, the standard methods also handle cases that can occur in a general context (e.g. checking whether a Quad is contained in the rectangle) but that never occur during table processing.
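
To illustrate the kind of specialization meant here, the following is a minimal sketch and not the code from this PR. It assumes rectangles are plain `(x0, y0, x1, y1)` tuples with `x0 <= x1` and `y0 <= y1`; the function names `rect_contains` and `rects_intersect` are hypothetical. Skipping type dispatch (point, rect or quad arguments) leaves only four float comparisons per check.

```python
# Hypothetical helpers: rectangles are assumed to be plain
# (x0, y0, x1, y1) tuples with x0 <= x1 and y0 <= y1.

def rect_contains(outer, inner):
    """Return True if `inner` lies entirely within `outer` (edges may touch)."""
    ox0, oy0, ox1, oy1 = outer
    ix0, iy0, ix1, iy1 = inner
    return ox0 <= ix0 and oy0 <= iy0 and ix1 <= ox1 and iy1 <= oy1


def rects_intersect(a, b):
    """Return True if `a` and `b` share interior area (touching edges do not count)."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 < bx1 and bx0 < ax1 and ay0 < by1 and by0 < ay1


cell = (0.0, 0.0, 100.0, 20.0)
word = (10.0, 5.0, 40.0, 15.0)
neighbor = (100.0, 0.0, 200.0, 20.0)

print(rect_contains(cell, word))        # word box is inside the cell
print(rects_intersect(cell, neighbor))  # cells only share an edge
```

Whether touching edges should count as an intersection is a design choice; the sketch treats them as non-intersecting so that adjacent table cells are not reported as overlapping.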

Other optimizations:
Several redundant text extractions have been replaced by look-ups in the character array (CHAR).
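
The idea of a look-up in an already extracted character list could be sketched as follows. This is an illustration under assumptions, not the PR's implementation: each character is assumed to be a dict with the keys `x0`, `x1`, `top`, `bottom` and `text`, the list is assumed to be in reading order, and `text_in_bbox` is a hypothetical name.

```python
# Hypothetical sketch: `chars` is assumed to be a list of dicts with keys
# "x0", "x1", "top", "bottom", "text", already sorted in reading order.

def text_in_bbox(chars, bbox):
    """Join the text of all characters whose center lies inside `bbox`."""
    x0, y0, x1, y1 = bbox
    picked = []
    for ch in chars:
        cx = (ch["x0"] + ch["x1"]) / 2      # horizontal center of the character
        cy = (ch["top"] + ch["bottom"]) / 2  # vertical center of the character
        if x0 <= cx <= x1 and y0 <= cy <= y1:
            picked.append(ch["text"])
    return "".join(picked)


chars = [
    {"x0": 10, "x1": 16, "top": 5, "bottom": 15, "text": "A"},
    {"x0": 16, "x1": 22, "top": 5, "bottom": 15, "text": "b"},
    {"x0": 120, "x1": 126, "top": 5, "bottom": 15, "text": "Z"},
]
print(text_in_bbox(chars, (0, 0, 100, 20)))  # only characters of the first cell
```

Because the character list is built once per page, each cell look-up is a single pass over in-memory data and needs no further call into the text extraction machinery.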

@JorjMcKie force-pushed the table-performance branch 6 times, most recently from 705b890 to 16c0d56, December 17, 2025 21:39
Adjust test script:
Function test_2979() no longer shows the tail of error messages, because the text extractions are now avoided.
@JorjMcKie merged commit af475a7 into main on Dec 18, 2025
3 checks passed
@github-actions bot locked and limited conversation to collaborators on Dec 18, 2025
@JorjMcKie deleted the table-performance branch on December 18, 2025 13:12

3 participants