[all variants] Pdf ocr

August 14th, 2009, 04:56 AM

I own a tablet, and I have several books I'd like to scan so I can read and highlight them on the computer.

I want to scan them to PDF and use OCR to make them searchable. I want to keep the PDF format, though, instead of just dumping the text to, say, a .txt file.

Can someone recommend a good OCR program that will recognize the text in a PDF but still keep the formatting and images the way they are?


September 16th, 2009, 11:47 AM
ABBYY has released a beta version of FineReader Online: www.finereaderonline.com

The service converts scanned documents and digital images online, so it works on any OS. The OCR quality is really very good. I’m not sure whether it accepts PDFs as input, but you can scan to another supported format instead (for example, TIFF).

For now, FineReader Online supports six recognition languages and can also process multilingual documents.

Hope it will be helpful for you ;)

April 18th, 2010, 08:03 PM
For future reference, I have made a script called pdfocr which does this (performs OCR on a PDF file and embeds the text back into the PDF file). My guide for it is located at http://ubuntuforums.org/showthread.php?t=1456756
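
For anyone curious what "OCR a PDF and embed the text back" can look like in practice, here is a minimal Python sketch of one common approach: render each page to an image, let the OCR engine write a one-page PDF with an invisible text layer, then join the pages. This is not the pdfocr script itself (its internals are not described in this thread). It assumes the Poppler tools (pdfinfo, pdftoppm, pdfunite) and a Tesseract build that supports the `pdf` output config are installed, and the function names are made up for illustration.

```python
import re
import subprocess
import tempfile
from pathlib import Path


def parse_page_count(pdfinfo_output: str) -> int:
    """Extract the page count from the text printed by `pdfinfo`."""
    match = re.search(r"^Pages:\s+(\d+)", pdfinfo_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Pages:' line in pdfinfo output")
    return int(match.group(1))


def build_commands(pdf_in, pdf_out, workdir, page_count, dpi=300, lang="eng"):
    """Return the external commands for a rasterize -> OCR -> merge pipeline.

    Hypothetical helper: it only builds argument lists, it runs nothing.
    """
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    workdir = Path(workdir)
    commands = []
    page_pdfs = []
    for page in range(1, page_count + 1):
        base = workdir / f"page-{page:04d}"
        # Render exactly one page to <base>.png at the requested resolution.
        commands.append([
            "pdftoppm", "-r", str(dpi), "-png",
            "-f", str(page), "-l", str(page), "-singlefile",
            str(pdf_in), str(base),
        ])
        # OCR the image; the trailing "pdf" config makes Tesseract write
        # <base>.pdf containing the image plus a searchable text layer.
        commands.append(["tesseract", f"{base}.png", str(base), "-l", lang, "pdf"])
        page_pdfs.append(f"{base}.pdf")
    # Join the per-page PDFs, in order, into the final file.
    commands.append(["pdfunite", *page_pdfs, str(pdf_out)])
    return commands


def ocr_pdf(pdf_in, pdf_out, dpi=300, lang="eng"):
    """Run the whole pipeline in a temporary working directory."""
    info = subprocess.run(
        ["pdfinfo", str(pdf_in)], check=True, capture_output=True, text=True
    ).stdout
    with tempfile.TemporaryDirectory() as workdir:
        for cmd in build_commands(
            pdf_in, pdf_out, workdir, parse_page_count(info), dpi, lang
        ):
            subprocess.run(cmd, check=True)
```

Usage would be something like `ocr_pdf("book.pdf", "book-ocr.pdf")`. Note that this sketch re-renders every page as an image, so the output keeps the look of the scan but any vector text or graphics in the input are rasterized, and the file may get larger.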