Closed
Description of the bug
Cannot read a .docx file: page.get_text() raises an AssertionError. The same code worked perfectly on v1.24.6.
Logs:
Traceback (most recent call last):
  File "~/d.py", line 15, in <module>
    print(extract_text(local_file_path))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/d.py", line 8, in extract_text
    content += page.get_text() + '\n\n'
               ^^^^^^^^^^^^^^^
  File "~/python3.11/site-packages/pymupdf/utils.py", line 798, in get_text
    cb = page.cropbox
         ^^^^^^^^^^^^
  File "~/python3.11/site-packages/pymupdf/__init__.py", line 8535, in cropbox
    page = self._pdf_page()
           ^^^^^^^^^^^^^^^^
  File "~/python3.11/site-packages/pymupdf/__init__.py", line 8051, in _pdf_page
    return _as_pdf_page(self.this)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "~/python3.11/site-packages/pymupdf/__init__.py", line 337, in _as_pdf_page
    assert ret.m_internal
AssertionError
How to reproduce the bug
import fitz


def extract_text(file: str) -> str:
    content = ""
    with fitz.open(file) as document:
        for page in document:
            content += page.get_text() + '\n\n'
    content = content.strip()
    return content


if __name__ == '__main__':
    local_file_path = '/path/to/TEST.docx'
    print(extract_text(local_file_path))
PyMuPDF version
1.24.7
Operating system
macOS
Python version
3.11
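Diagnostic sketch
To help narrow down which pages trigger the assertion, here is a minimal sketch that is not part of the original script. It runs the same loop as the reproduction, but records the index of any page whose get_text() raises AssertionError instead of aborting. The helper name extract_text_tolerant is hypothetical, and it assumes only that each page object has a get_text() method. It does not fix the underlying problem.

```python
from typing import Any, Iterable, List, Tuple


def extract_text_tolerant(pages: Iterable[Any]) -> Tuple[str, List[int]]:
    """Collect text page by page, noting pages that fail.

    Hypothetical helper: `pages` is any iterable of objects exposing
    get_text(), such as an opened document from the reproduction script.
    Returns the joined text and the 0-based indices of failing pages.
    """
    parts: List[str] = []
    failed: List[int] = []
    for index, page in enumerate(pages):
        try:
            parts.append(page.get_text())
        except AssertionError:
            # Same exception type as in the traceback above; record and continue.
            failed.append(index)
    return "\n\n".join(parts).strip(), failed
```

With the reproduction script, this could be called as `text, failed = extract_text_tolerant(document)` inside the `with fitz.open(file) as document:` block. If every page index ends up in `failed`, the assertion is probably not specific to one page of the file.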