Thread: OCR program?

  1. #1
    Join Date
    May 2006
    Location
    Ramsgate, England
    Beans
    147
    Distro
    Lubuntu

    OCR program?

    Hi, all

    I'm almost there! There, as in ready to ditch Windows altogether.
    I have a very specific problem: as a teacher, I sometimes need to scan in documents for editing.
    So far, I haven't been able to get an OCR program to work in Linux, or at least, not accurately enough.
    Could anyone please recommend one? I am quite happy to use the command line, but can't find a detailed "how to" for an OCR program.
    Regards, and thanks in advance

    Nifty

  2. #2
    Join Date
    Sep 2009
    Location
    Hampshire, UK
    Beans
    375
    Distro
    Ubuntu 11.04 Natty Narwhal

    Re: OCR program?

    Abbyyocr seems to do a good job, if this blog is to be believed, but it is a commercial product.
    It costs 10 euros / 9 GB pounds. There is a free 15-day trial here.
    Toshiba AMD64 2Ghz, 4Gb RAM: dual Natty/win7. Dell Mini10v: 10.04. SamsungN140:win7/11.04. NB200:10.10.
    ++ desktops P4 3Ghz ++ servers
    Full Circle Magazine | Ubuntu Pocket Guide | HantsLUG

  3. #3
    Join Date
    May 2006
    Location
    Ramsgate, England
    Beans
    147
    Distro
    Lubuntu

    Re: OCR program?

    Hi

    Thanks for the reply. I don't mind paying, provided the product works. I'm still wondering about a "how to", though. Do you start with Simple Scan?
    Regards
    Nifty

    PS. What have they done to the thanks button?

  4. #4
    Join Date
    Sep 2009
    Location
    Hampshire, UK
    Beans
    375
    Distro
    Ubuntu 11.04 Natty Narwhal

    Re: OCR program?

    Toshiba AMD64 2Ghz, 4Gb RAM: dual Natty/win7. Dell Mini10v: 10.04. SamsungN140:win7/11.04. NB200:10.10.
    ++ desktops P4 3Ghz ++ servers
    Full Circle Magazine | Ubuntu Pocket Guide | HantsLUG

  5. #5
    Join Date
    May 2006
    Location
    Ramsgate, England
    Beans
    147
    Distro
    Lubuntu

    Re: OCR program?

    Hi, Matt

    Thanks for that. I shall give this a try now.

    Your help is much appreciated.

    Regards

    Nifty

  6. #6
    Join Date
    Jul 2005
    Location
    I think I'm here! Maybe?
    Beans
    Hidden!
    Distro
    Xubuntu 22.04 Jammy Jellyfish

    Re: OCR program?

    I use tesseract v3 in Lucid 10.04. It can be added from a number of PPA repositories and is a big jump forward in accuracy, in my opinion.

    I can get it to use image formats other than the black-and-white .tif images that tesseract v2 requires, so I suggest you at least have a look at it before spending any money; you may find that it is good enough for what you want.

    I use xsane for the scan process when needed, though I see no reason why Simple Scan would not work just as well; I've just never tried it, but will do so now.

    Yes, I just tried it: a full page of Times New Roman 12-point text, scanned with Simple Scan at 300dpi and saved as an image.png or an image.jpg file, OCR'd with only one tiny error, an added ' at a point where the page was slightly mucky. Both jpg and png files worked with no problem.
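
    For anyone after the command-line "how to", here is a minimal Python sketch of the step described above: hand a scanned image to tesseract and get a text file back. It assumes the tesseract command is on your PATH and uses its usual "tesseract <image> <outputbase> -l <lang>" form; the function names and the page.png filename are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image, output_base, lang="eng"):
    """Return the argument list for: tesseract <image> <output_base> -l <lang>."""
    return ["tesseract", str(image), str(output_base), "-l", lang]


def ocr_image(image, lang="eng"):
    """Run tesseract on one image and return the path of the text file it writes.

    Hypothetical helper: tesseract appends ".txt" to the output base,
    so page.png is written out as page.txt next to the image.
    """
    image = Path(image)
    output_base = image.with_suffix("")  # page.png -> page
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on PATH")
    subprocess.run(build_tesseract_command(image, output_base, lang), check=True)
    return output_base.with_name(output_base.name + ".txt")
```

    Calling ocr_image("page.png") on a 300dpi scan should leave the recognised text in page.txt, which can then be opened in any editor or word processor.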

  7. #7
    Join Date
    May 2005
    Location
    Indiana
    Beans
    1,933
    Distro
    Hardy Heron (Ubuntu Development)

    Re: OCR program?

    There's no reason to go beyond the free software included in the repos. Open up Ubuntu Software Center and search for gscan2pdf. You can scan multiple pages and save as a pdf or tif or something, and it seems to have decent OCR capabilities.

    It's the best scanning/OCR software I've seen on here yet.
    Today you are You, that is truer than true. There is no one alive who is Youer than You. - Dr. Seuss

  8. #8
    Join Date
    Jul 2005
    Location
    I think I'm here! Maybe?
    Beans
    Hidden!
    Distro
    Xubuntu 22.04 Jammy Jellyfish

    Re: OCR program?

    Originally posted by forrestcupp:
        There's no reason to go beyond the free software included in the repos. Open up Ubuntu Software Center and search for gscan2pdf. You can scan multiple pages and save as a pdf or tif or something, and it seems to have decent OCR capabilities.

        It's the best scanning/OCR software I've seen on here yet.

    Not much help if you want to get a text document that you can edit to change the text, surely? Or does it allow you to select text from the pdf it makes and then copy/paste that to a document?

    When I use OCR it is almost always because there is a need to make considerable edits to all the text; to make a pdf out of the sheet of text does not seem to help much, or have I totally misunderstood what gscan2pdf does?

  9. #9
    Join Date
    May 2005
    Location
    Indiana
    Beans
    1,933
    Distro
    Hardy Heron (Ubuntu Development)

    Re: OCR program?

    Originally posted by ajgreeny:
        Not much help if you want to get a text document that you can edit to change the text, surely? Or does it allow you to select text from the pdf it makes and then copy/paste that to a document?

        When I use OCR it is almost always because there is a need to make considerable edits to all the text; to make a pdf out of the sheet of text does not seem to help much, or have I totally misunderstood what gscan2pdf does?

    For the OCR feature, you copy the text and paste it into LibreOffice or something.

    Edit: I just tried out the OCR on a screen shot I took of a Word document. You can't get a much better source than that. The OCR output royally sucked. Even the best OCR engine out of the 3 only got 2 words right. So this software is great for scanning multi-page documents, but not so great for OCR. The document was all in italics, but a decent OCR should be able to read a perfect image with italics.
    Last edited by forrestcupp; February 18th, 2012 at 02:20 AM.

  10. #10
    Join Date
    Jul 2005
    Location
    I think I'm here! Maybe?
    Beans
    Hidden!
    Distro
    Xubuntu 22.04 Jammy Jellyfish

    Re: OCR program?

    A screenshot will only be at a max of 96dpi, not the 300dpi needed for good OCR. Also, did you use tesseract 3, or the v2 from the repos? v3 is a lot better than v2.
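
    To put rough numbers on that, here is a small Python sketch of the arithmetic. It relies on one point being 1/72 of an inch and takes the 96dpi screenshot figure as given; the 12-point size is only an example (borrowed from the test page earlier in the thread), and the function names are made up for illustration.

```python
POINTS_PER_INCH = 72  # one typographic point is 1/72 of an inch


def text_height_px(point_size, dpi):
    """Nominal height in pixels of text of the given point size at the given dpi."""
    return point_size * dpi / POINTS_PER_INCH


def upscale_factor(source_dpi, target_dpi=300):
    """How much an image would have to be enlarged to reach the target dpi."""
    return target_dpi / source_dpi


print(text_height_px(12, 96))   # 16.0 pixels in a 96dpi screenshot
print(text_height_px(12, 300))  # 50.0 pixels in a 300dpi scan
print(upscale_factor(96))       # 3.125
```

    So the same 12-point text gets about three times as many pixels per character in a 300dpi scan as in a screenshot, which gives the OCR engine far more detail to work with.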

    I must try an italic page of print to see if I can do better than you managed. I'll report back asap.
