Corrupted PDF file export with hOCR data

@manisandro, I believe it is either not yet fixed or not _fully_ fixed.

I installed gIR v3.3.1 (GTK) from the Fedora repos and loaded 846 images (mostly text, three simple tables, and the front and back covers as images). All images were processed in ST Advanced at 600 dpi. A typical image is around 500 KiB, five images are between 1.2 and 2.9 MiB, and the covers are 16 MiB and 36 MiB respectively. Altogether, the images total 251 MiB.

gIR crashes a lot, and loading the hOCR HTML takes some time (this computer has 10 GB of RAM and 2 cores / 4 threads).

However, it sometimes works (albeit slowly). I exported the PDF file, but both Evince and Adobe Acrobat Reader (both on Linux) fail to open it, saying that it is corrupted/damaged.
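One quick way to narrow down "corrupted/damaged" is to check whether the exported file is simply truncated, for example because gIR crashed or was killed while writing it. The helpers below are hypothetical (not part of gImageReader) and only check that the file starts with the `%PDF-` header and that `startxref` and the `%%EOF` marker appear near the end, where the PDF trailer lives. Passing these checks does not prove the file is valid; `qpdf --check` gives a much deeper report.

```python
import os
import sys


def check_pdf_parts(head, tail):
    """Cheap structural checks given the first and last bytes of a PDF."""
    return {
        # The first line of a PDF file is a header such as "%PDF-1.7".
        "has_header": head.startswith(b"%PDF-"),
        # The trailer ends with "startxref", a byte offset, and "%%EOF".
        "has_startxref": b"startxref" in tail,
        "has_eof_marker": b"%%EOF" in tail,
    }


def check_pdf_bytes(data, tail_size=1024):
    """Run the checks on an in-memory PDF."""
    return check_pdf_parts(data[:8], data[-tail_size:])


def check_pdf_file(path, tail_size=1024):
    """Run the checks on a file without loading the whole (possibly huge) PDF."""
    with open(path, "rb") as f:
        head = f.read(8)
        f.seek(0, os.SEEK_END)  # jump to the end to learn the file size
        size = f.tell()
        f.seek(max(0, size - tail_size))
        tail = f.read()
    return check_pdf_parts(head, tail)


if __name__ == "__main__" and len(sys.argv) > 1:
    for name in sys.argv[1:]:
        report = check_pdf_file(name)
        verdict = "looks complete" if all(report.values()) else "likely truncated or damaged"
        print(f"{name}: {verdict} {report}")
```

Usage: `python check_pdf.py export.pdf`. A missing `%%EOF` would point towards the export being cut off rather than, say, a malformed text layer.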

I have no other idea how to generate a PDF file with hOCR data. I've read about the `hocr-tools` package and its `hocr-pdf` command, and about `hocr2pdf` from the `exactImage` package, but I could not produce such a PDF with either of them (neither from the image files nor from a pre-created PDF file combined with the hOCR data).
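With the `hocr-pdf` / `hocr2pdf` route, a common stumbling block is matching each page image to its hOCR text layer. The sketch below is a hypothetical pre-flight check that assumes (not verified against either tool's documentation) one `.hocr` file per page image sharing the same base name, e.g. `page-001.jpg` with `page-001.hocr`. It only reports matched pairs and mismatches; it does not build a PDF. The set of image extensions is also an assumption.

```python
import sys
from pathlib import Path

# Assumed set of page-image extensions; adjust to what your tool accepts.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def pair_images_with_hocr(names):
    """Match image files to .hocr files by base name (file name minus extension).

    Returns (pairs, images_without_hocr, hocr_without_image), each sorted.
    """
    images, hocrs = {}, {}
    for name in names:
        path = Path(name)
        suffix = path.suffix.lower()
        if suffix in IMAGE_SUFFIXES:
            images[path.stem] = name
        elif suffix == ".hocr":
            hocrs[path.stem] = name
    matched = images.keys() & hocrs.keys()
    pairs = [(images[stem], hocrs[stem]) for stem in sorted(matched)]
    missing_hocr = sorted(images[stem] for stem in images.keys() - matched)
    orphan_hocr = sorted(hocrs[stem] for stem in hocrs.keys() - matched)
    return pairs, missing_hocr, orphan_hocr


if __name__ == "__main__" and len(sys.argv) > 1:
    directory = Path(sys.argv[1])
    names = [str(p) for p in directory.iterdir() if p.is_file()]
    pairs, missing, orphans = pair_images_with_hocr(names)
    print(f"{len(pairs)} matched page(s)")
    for name in missing:
        print(f"no .hocr for image: {name}")
    for name in orphans:
        print(f"no image for hOCR: {name}")
```

Usage: `python pair_hocr.py ./pages`. If every image reports a missing `.hocr`, the hOCR data probably still needs to be split into per-page files before either tool can use it.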

_Originally posted by @tukusejssirs in https://github.com/manisandro/gImageReader/issues/424#issuecomment-758819893_

Corrupted PDF file export with hOCR data #486

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Corrupted PDF file export with hOCR data #486

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions