issue with extract_text #5

@rsteca

When I run:

import doc2text
doc = doc2text.Document()
doc.read('something.pdf')
doc.process()
doc.extract_text()

I get the following error:

AttributeError                            Traceback (most recent call last)
<ipython-input-5-57184997370d> in <module>()
----> 1 doc.extract_text()

/usr/local/lib/python2.7/dist-packages/doc2text/__init__.pyc in extract_text(self)
     89             for page in self.processed_pages:
     90                 new = page
---> 91                 text = new.extract_text()
     92                 self.page_content.append(text)
     93         else:

/usr/local/lib/python2.7/dist-packages/doc2text/page.pyc in extract_text(self)
     36     def extract_text(self):
     37         temp_path = 'text_temp.png'
---> 38         cv2.imwrite(temp_path, self.image)
     39         self.text = pytesseract.image_to_string(Image.open(temp_path))
     40         os.remove(temp_path)

AttributeError: Page instance has no attribute 'image'
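
The traceback suggests that `image` is never assigned on the page before `extract_text()` reads it. The following is only a minimal sketch of that failure pattern, not doc2text's actual code: the `Page` class, its `original` attribute, and the `crop()` method here are hypothetical stand-ins, and the assumption that `image` is set only by a processing step is a guess based on the traceback.

```python
class Page(object):
    """Hypothetical stand-in for a page object; not doc2text's real class."""

    def __init__(self, original):
        # Note: 'image' is deliberately NOT assigned in the constructor.
        self.original = original

    def crop(self):
        # Assumption: 'image' only comes into existence when a processing
        # step like this one actually runs to completion.
        self.image = self.original

    def extract_text(self):
        # Guard so the failure names the missing step instead of surfacing
        # as a bare AttributeError deep inside the call.
        if not hasattr(self, 'image'):
            raise RuntimeError("page was not processed: 'image' is unset")
        return 'text from %s' % self.image


page = Page('scan.png')
try:
    page.extract_text()          # processing step was skipped
except RuntimeError as exc:
    message = str(exc)
```

If something like this is what happens, either initializing `image` in the constructor or raising a descriptive error when processing did not produce an image would make the problem easier to diagnose.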
