Conversation

@jonnythebard jonnythebard commented Jun 30, 2018

I've been processing over 50k PDF files with PyPDF2 for the last several weeks and found that it doesn't filter out some malformed PDF files. The problem with one malformed file was that it had the %%EOF marker at the beginning, followed by 30M bytes of b'\x00'. The current version of PyPDF2 tries to travel all the way through those 30M bytes of b'\x00' to find %%EOF. Since the %%EOF marker should appear in the last 1K of the file, I thought it would make sense to add a last1K limit to the readNextEndLine function. I applied this to my application and it works fine.
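
A minimal sketch of the idea, for illustration only. This is not the actual patch and not PyPDF2's code: `find_eof_marker` and `TAIL_LIMIT` are hypothetical names, and the sketch assumes a seekable binary stream. It reads only the last 1K of the file and searches for %%EOF there, so a long run of b'\x00' is never scanned.

```python
import io

EOF_MARKER = b"%%EOF"
TAIL_LIMIT = 1024  # assumption: only the last 1K of the file is searched


def find_eof_marker(stream, tail_limit=TAIL_LIMIT):
    """Return the absolute offset of the last %%EOF in the file tail, or -1.

    Hypothetical helper, not part of PyPDF2.
    """
    stream.seek(0, io.SEEK_END)
    size = stream.tell()
    start = max(0, size - tail_limit)  # never seek before the start of the file
    stream.seek(start)
    tail = stream.read(size - start)
    pos = tail.rfind(EOF_MARKER)
    return -1 if pos == -1 else start + pos


# Malformed file as described above: %%EOF near the start, then 30M NUL bytes.
malformed = io.BytesIO(b"%PDF-1.4\n%%EOF\n" + b"\x00" * 30_000_000)

# Well-formed layout: %%EOF within the last 1K.
well_formed = io.BytesIO(b"%PDF-1.4\n" + b"x" * 5000 + b"\n%%EOF\n")

print(find_eof_marker(malformed))    # -1: marker is outside the last 1K
print(find_eof_marker(well_formed))  # offset of the trailing %%EOF
```

With a bounded read like this, the malformed file is rejected after inspecting 1024 bytes instead of all 30M.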

@elyssonmr

Looking forward to this pull request being accepted.

@jonnythebard jonnythebard force-pushed the master branch 2 times, most recently from 272851b to bd3ae44 Compare December 19, 2019 04:54
@MartinThoma MartinThoma added the is-bug and Tiny labels Apr 6, 2022
@MartinThoma
Member

Have you seen #642? What do you think about it?

@jonnythebard
Author

@MartinThoma Yes, it looks much better than mine, because my commit replaces a condition in the if statement instead of adding one. I didn't notice my mistake. Glad that someone is finally aware of this issue though 😂

@codecov-commenter

codecov-commenter commented Apr 16, 2022

Codecov Report

Merging #439 (8c2cc97) into main (d5a5eea) will not change coverage.
The diff coverage is 80.00%.

@@           Coverage Diff           @@
##             main     #439   +/-   ##
=======================================
  Coverage   70.59%   70.59%           
=======================================
  Files          10       10           
  Lines        3425     3425           
  Branches      798      798           
=======================================
  Hits         2418     2418           
  Misses        763      763           
  Partials      244      244           
Impacted Files    Coverage Δ
PyPDF2/pdf.py     72.42% <80.00%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Labels

is-bug: From a users perspective, this is a bug - a violation of the expected behavior with a compliant PDF
Tiny: Pull requests that make a tiny change - and thus should be easy to merge

Development

Successfully merging this pull request may close these issues.

PdfFileReader keep looking for "%%EOF" on more than the last 1024 bytes of stream in malformed PDF files

4 participants