I am excited about this script; I think it will be very useful. However, I ran into a different error from the one posted above:
Code:
aaron@aaron-desktop:~$ pdfocr -i 0175.pdf -o out3.pdf
Input file is /home/aaron/0175.pdf
Output file is /home/aaron/out3.pdf
Using working dir /tmp/d20100421-20950-4g11i8
Getting info from PDF file
InfoKey: Creator
InfoValue: XSane version 0.996 (sane 1.0) - by Oliver Rauch
InfoKey: Title
InfoValue: XSane scanned image
InfoKey: Producer
InfoValue: XSane 0.996
InfoKey: CreationDate
InfoValue: D:20100421215339+00'00'
NumberOfPages: 1
Converting 1 pages
==========
Extracting page 1
Converting page 1 to ppm
Running OCR on page 1
1.ppm is not a BMP file.
Cuneiform for Linux 0.9.0
Error while running OCR on page 1
Merging together PDF files
Error: Failed to open PDF file:
/tmp/d20100421-20950-4g11i8/*-new.pdf
Errors encountered. No output created.
Done. Input errors, so no output created.
Updating PDF info for /home/aaron/out3.pdf
Error: Failed to open PDF file:
/tmp/d20100421-20950-4g11i8/merged.pdf
Errors encountered. No output created.
Done. Input errors, so no output created.
Cleaning up temporary files
Notice the "1.ppm is not a BMP file." line. I get a similar error if I run cuneiform by itself:
Code:
aaron@aaron-desktop:~$ cuneiform -f hocr -o out.hocr 0175.pdf
Cuneiform for Linux 0.9.0
0175.pdf is not a BMP file.
It's as if cuneiform assumes by default that the input file is a BMP. Is there a way to change this?
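To narrow this down, it may help to check what kind of file cuneiform is actually being handed. Below is a minimal sketch, not something taken from pdfocr itself. The helper name `is_bmp` is made up for illustration; it only tests for the two-byte `BM` signature that BMP files begin with. The commented-out conversion assumes ImageMagick's `convert` is installed, and I have not verified that this cuneiform build accepts the BMP variant it writes.

```shell
#!/bin/sh
# is_bmp FILE: succeed if FILE starts with the BMP signature "BM" (hex 42 4d).
# Hypothetical helper for illustration; not part of pdfocr or cuneiform.
is_bmp() {
    # Read the first two bytes and compare them as hex so that
    # binary data never passes through the shell as a string.
    [ "$(head -c 2 "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')" = "424d" ]
}

# Possible use (assumption: ImageMagick's `convert` is available):
#   is_bmp 1.ppm || convert 1.ppm 1.bmp
#   cuneiform -f hocr -o 1.hocr 1.bmp
```

If `is_bmp` fails for the page image, the "is not a BMP file" message is at least consistent with cuneiform rejecting the input format rather than the file being corrupt.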
I should also mention that I am on amd64 and initially had problems with cuneiform throwing an error. I had to add /usr/local/lib64 to a .conf file in /etc/ld.so.conf.d and run ldconfig (per https://answers.launchpad.net/cuneif...uestion/100695). After making this change cuneiform seems to work, but there could still be other unseen issues.
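For reference, the library path change described above amounts to roughly the following. This is a sketch rather than the exact commands I ran: the helper name `add_lib_dir` and the file name `cuneiform.conf` are made up for illustration, and the real steps need root.

```shell
#!/bin/sh
# add_lib_dir DIR CONF: append DIR to the ld.so config fragment CONF
# unless that exact line is already there. Hypothetical helper.
add_lib_dir() {
    dir=$1
    conf=$2
    # -x matches whole lines, -F treats DIR as a literal string.
    grep -qxF -- "$dir" "$conf" 2>/dev/null || printf '%s\n' "$dir" >> "$conf"
}

# As root (the file name cuneiform.conf is an assumption):
#   add_lib_dir /usr/local/lib64 /etc/ld.so.conf.d/cuneiform.conf
#   ldconfig    # rebuild the shared library cache
```

Running the helper twice leaves a single entry in the file, so it is safe to repeat.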
Thanks.