Corrupted PDF file export with hOCR data #486

@tukusejssirs

@manisandro, I believe it is either not yet fixed or not fully fixed.

I installed gIR v3.3.1 (GTK) from the Fedora repos and loaded 846 images (mostly text, plus three simple tables and the front and back covers as images). All images were processed in ST Advanced at 600 dpi. A typical image is around 500 KiB; five images are between 1.2 and 2.9 MiB, and the covers are 16 MiB and 36 MiB respectively. Altogether, they total 251 MiB.

gIR crashes a lot, and loading the hOCR HTML takes some time (this computer has 10 GB of RAM and 2 cores with 4 threads).

However, sometimes it works (albeit slowly). I’ve exported the PDF file, but both Evince and Adobe Acrobat Reader (both on Linux) fail to open it and report that it is corrupted/damaged.
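
A quick way to narrow down this kind of report is to check whether the exported file at least starts and ends like a PDF, since a file truncated by a crash during export would typically lack the trailer. The following is only a minimal sketch and not part of gIR: the function name `pdf_sanity_check` and the 1024-byte tail window are assumptions made for illustration, and passing these checks does not prove the file is valid.

```python
import os


def pdf_sanity_check(path, tail_bytes=1024):
    """Heuristic check of a PDF's header and trailer markers.

    Reads only the first few bytes and the last `tail_bytes` bytes,
    so it stays cheap even for exports of several hundred MiB.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(8)
        # Jump close to the end of the file to inspect the trailer.
        f.seek(max(0, size - tail_bytes))
        tail = f.read()
    return {
        "size_bytes": size,
        # A PDF is expected to begin with "%PDF-" followed by a version.
        "has_header": head.startswith(b"%PDF-"),
        # The trailer normally contains "startxref" and ends with "%%EOF".
        "has_startxref": b"startxref" in tail,
        "has_eof_marker": b"%%EOF" in tail,
    }


# Hypothetical usage (the file name is an assumption):
# print(pdf_sanity_check("export.pdf"))
```

If `has_header` is true but `has_eof_marker` is false, the export was most likely cut off before the trailer was written; if all three markers are present, the damage is probably inside the file body instead.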

I have no idea how to generate a PDF file with hOCR data. I’ve read about the hocr-tools package and its hocr-pdf command, and about hocr2pdf from the exactImage package, but I could not produce such a PDF (from either image files or a pre-created PDF file, combined with the hOCR data).

Originally posted by @tukusejssirs in #424 (comment)
