[ubuntu] Installing Tesseract OCR on Ubuntu



programmer_007
March 31st, 2010, 09:53 AM
Can anyone guide me on how to install the Tesseract engine on Ubuntu? I am a newbie on Ubuntu.

ajgreeny
March 31st, 2010, 11:53 AM
Just use whatever way you prefer to install tesseract-ocr, e.g. Synaptic. This will also install tesseract-ocr-deu by default, but if you don't want German as the language, choose another and then uninstall the German version. You have to have at least one language, of course.
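
If you prefer the terminal to Synaptic, something like the following should do the same job. It is only a sketch: tesseract_install_cmds is a made-up helper name, it only prints the apt-get commands so you can check them before running anything, and it assumes the language packs are named tesseract-ocr-<code> (eng, deu, fra and so on).

```shell
#!/bin/sh
# Hypothetical helper: print the apt-get commands for installing tesseract
# with one language pack. Nothing is installed; the commands are only echoed.
# Assumption: language packs are named tesseract-ocr-<code> (eng, deu, fra, ...).
tesseract_install_cmds() {
    lang="${1:-eng}"
    echo "sudo apt-get install tesseract-ocr tesseract-ocr-${lang}"
    # The German pack is only worth removing if another language was chosen.
    if [ "$lang" != "deu" ]; then
        echo "sudo apt-get remove tesseract-ocr-deu"
    fi
}

# Show the commands for English.
tesseract_install_cmds eng
```

Copy the printed lines into the terminal once they look right for your language.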

To use it, you will have to scan documents at high resolution (300 dpi or more), save the image as a tif (note tif, not tiff) and then use the command
tesseract file.tif text
which will turn file.tif into a text file called text.txt. Note that you do not put the txt suffix on the output file name; tesseract adds it automatically.
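
If you have a whole folder of scans, a small loop saves typing the command for each one. This is a rough sketch, not anything that ships with tesseract: ocr_base and ocr_all are names invented here, and it assumes every scan in the current directory ends in .tif.

```shell
#!/bin/sh
# Hypothetical helper: derive the output base name tesseract expects,
# i.e. the input name minus ".tif" (tesseract appends ".txt" itself).
ocr_base() {
    printf '%s\n' "${1%.tif}"
}

# Hypothetical helper: OCR every .tif in the current directory,
# so page1.tif produces page1.txt next to it.
ocr_all() {
    for img in ./*.tif; do
        [ -e "$img" ] || continue   # skip if no .tif files matched
        tesseract "$img" "$(ocr_base "$img")"
    done
}
```

Run ocr_all from the directory holding the scans; each text file ends up beside its image.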

The only downside of tesseract compared with gocr is that it cannot be incorporated into xsane in the same way that gocr is. It is, however, much more accurate than gocr, which I found to be a waste of time.