Making PDFs Searchable Using OCR (CLI)

**ernz** · October 6th, 2014

This has taken be the entire evening to figure out. It appears that when it comes to linux, there are dozens of paid-for options or dead FOSS projects.

Lots of old tutorials reference tesseract, hocr, gs conversions, cuneiform, scantailor. I have tried many MANY combinations of all of these and they always have something wrong. Resolutions are changed, the overlaid hocr mappings are off or the font size if huge, etc. Many of the how-to's are now 8+ years old and not of any use any more.

I finally settled on https://github.com/fritz-hh/OCRmyPDF

It's a beautilfully written python extension with lots of cli flexibility. It does have lots of dependencies, but it works exceptionally well!

**TheFu** · October 7th, 2014

gscan2pdf didn't work for you?
It worked on my 10.04 box, but I haven't tried it on the fresh 14.04 install.

Thread: Making PDFs Searchable Using OCR (CLI)

Thread Tools

Display

Making PDFs Searchable Using OCR (CLI)

Re: Making PDFs Searchable Using OCR (CLI)

Tags for this Thread

Bookmarks

Bookmarks

Posting Permissions