I often need to use OCR (optical character recognition) to get new text onto the computer.
When I use tesseract, I get terrible results unless I first use GIMP to set the image to 1-bit monochrome and save it as TIFF.
(See "Preparing images for Tesseract" at the bottom of the Community OCR page.)
Since I get these files often, I'd like to automate the conversion rather than fire up GIMP and do it by hand every time.
I tried ImageMagick's convert, but it fails with an error when I ask for monochrome output:
Code:
$ convert newtext.jpg -flatten -monochrome newtext.tif
convert: BitsPerSample 1 not allowed for JPEG. `JPEGSetupEncode'.
If I run convert without the -monochrome option, tesseract outputs garbage.
Any ideas?
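One possible fix, sketched under an assumption: the error probably comes from the TIFF writer reusing the JPEG compression of the input file, and 1-bit samples can't be JPEG-compressed. Forcing a different compression with `-compress none` should get around it. The helper names below (`tif_name`, `prep_for_ocr`) are hypothetical, and the script assumes both `convert` and `tesseract` are on the PATH.

```shell
#!/bin/sh
# Hypothetical helpers (tif_name, prep_for_ocr); not part of ImageMagick or tesseract.

# Derive the output name by swapping the extension: newtext.jpg -> newtext.tif
tif_name() {
    printf '%s\n' "${1%.*}.tif"
}

# Convert one scan to a 1-bit TIFF, then OCR it.
# Assumption: -compress none overrides the JPEG compression that would
# otherwise be carried over from the JPEG input and trigger
# "BitsPerSample 1 not allowed for JPEG".
prep_for_ocr() {
    out=$(tif_name "$1")
    convert "$1" -flatten -monochrome -compress none "$out" || return 1
    # tesseract writes its text to <base>.txt, here newtext.txt
    tesseract "$out" "${out%.tif}"
}
```

To process a whole directory: `for f in *.jpg; do prep_for_ocr "$f"; done`. Note that -monochrome dithers by default; if tesseract still struggles, `-threshold 50%` in its place may give cleaner edges.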