Thread: How to OCR from an image file?

  1. #1
    Join Date
    Mar 2010
    Location
    Zipangu
    Beans
    27
    Distro
    Ubuntu 16.04 Xenial Xerus

    How to OCR from an image file?

    I have an image file with some text captured in it:

    http://homepage1.nifty.com/algafield/microsofthumor.jpg

    How could I get a plain text file from this? Is there any software in Ubuntu that can do that?

    Thanks in advance!

  2. #2
    Join Date
    Jul 2006
    Beans
    607
    Distro
    Ubuntu 13.04 Raring Ringtail

    Re: How to OCR from an image file?

    It's not possible. Image files and text files are completely different and are read by your computer in completely different ways.

    If you're keen, you could retype it up into a text file!

  3. #3
    Join Date
    Mar 2009
    Location
    Brazil
    Beans
    475
    Distro
    Ubuntu 12.04 Precise Pangolin

    Re: How to OCR from an image file?

    I think gocr can do it, but IIRC it's command-line based:

    Open the terminal.

    Install:
    Code:
    sudo apt-get install gocr
    Save the image to some directory (saving it in your Home folder is easiest, since the terminal starts there)
    and run:
    Code:
    gocr <file name>
    If you need to change the directory the terminal is in, use:
    Code:
    cd Desktop
    if you want to go to the Desktop folder, for example.
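    If you have several images to convert, the same gocr call can be wrapped in a small shell function. This is a minimal sketch, assuming gocr is installed as above and can read your image format (as it did for the OP's JPEG); the function name `ocr_to_txt` is made up for illustration and is not part of gocr. It writes the recognized text for each image to a `.txt` file with the same base name.

```shell
#!/bin/sh
# Hypothetical helper (not part of gocr): OCR each image passed as an
# argument and save the recognized text as <name>.txt next to the image.
ocr_to_txt() {
    for img in "$@"; do
        # Strip the extension only when the file name actually has one.
        case "$(basename "$img")" in
            *.*) out="${img%.*}.txt" ;;
            *)   out="$img.txt" ;;
        esac
        # gocr prints the recognized text on standard output.
        if gocr "$img" > "$out"; then
            echo "wrote $out"
        else
            echo "gocr failed on $img" >&2
        fi
    done
}
```

    For example, `ocr_to_txt microsofthumor.jpg` would create `microsofthumor.txt` in the same directory.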

    You can also check the Software Centre for graphical alternatives.

    BTW, most OCR software will convert files to a text-only format, so you'll lose the table formatting.

  4. #4
    Join Date
    Mar 2010
    Location
    Zipangu
    Beans
    27
    Distro
    Ubuntu 16.04 Xenial Xerus

    Re: How to OCR from an image file?

    Thanks Marlonsm for your help.
    The gocr program did the trick.
    I think the errors in the result could be reduced by preprocessing the original image in GIMP.

    Thanks again.
    Ubuntu and its community are great!
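    GIMP works well for cleaning up an image by hand. If you would rather script it, a similar cleanup can be done from the terminal with ImageMagick's `convert` command. ImageMagick is not mentioned elsewhere in this thread; this sketch assumes it is installed (e.g. from the `imagemagick` package). It converts the image to grayscale, enlarges it and applies a hard black/white threshold before handing the result to gocr. The function name `preprocess_and_ocr` and the 200% / 60% values are hypothetical starting points to tune per image, not known-good settings.

```shell
#!/bin/sh
# Hypothetical helper: clean up an image with ImageMagick's `convert`,
# then OCR the cleaned copy with gocr.
# The 200% resize and 60% threshold are guesses; tune them per image.
preprocess_and_ocr() {
    img="$1"
    # PBM (black/white bitmap) is one of the PNM formats gocr reads natively.
    pbm="${img%.*}.clean.pbm"
    # Grayscale -> upscale -> threshold to pure black and white.
    convert "$img" -colorspace Gray -resize 200% -threshold 60% "$pbm" || return 1
    # Recognized text goes to standard output.
    gocr "$pbm"
}
```

    For example, `preprocess_and_ocr microsofthumor.jpg > microsofthumor.txt` would keep the cleaned-up `microsofthumor.clean.pbm` next to the original, so you can open it and check whether the threshold kept the letters intact.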

  5. #5
    Join Date
    Dec 2009
    Beans
    Hidden!
    Distro
    Ubuntu 10.04 Lucid Lynx

    Re: How to OCR from an image file?

    A recent issue of Full Circle Magazine mentions http://gscan2pdf.sourceforge.net/ as a good option, although I believe the part you need is really just what you already have with gocr. You may find that tesseract gives better results, though: see http://code.google.com/p/tesseract-ocr/ or look for tesseract-ocr in the repositories.
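    To check whether tesseract really does better than gocr on a particular image, you can run both on it and compare the output. This is a minimal sketch, assuming both the gocr and tesseract-ocr packages are installed; `compare_ocr` is a hypothetical name. Note that tesseract takes an output base name and appends `.txt` to it itself, whereas gocr writes to standard output.

```shell
#!/bin/sh
# Hypothetical helper: OCR one image with both gocr and tesseract and
# show where the two plain-text results differ.
compare_ocr() {
    img="$1"
    base="${img%.*}"
    # gocr writes the recognized text to standard output.
    gocr "$img" > "$base.gocr.txt"
    # tesseract writes to "<base name>.txt", so pass the name without ".txt".
    tesseract "$img" "$base.tesseract"
    # Unified diff of the two results; no output means they agree.
    diff -u "$base.gocr.txt" "$base.tesseract.txt"
}
```

    For example, `compare_ocr microsofthumor.jpg` would leave `microsofthumor.gocr.txt` and `microsofthumor.tesseract.txt` next to the image and print the lines where the two engines disagree.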

  6. #6
    Join Date
    Jul 2006
    Beans
    607
    Distro
    Ubuntu 13.04 Raring Ringtail

    Re: How to OCR from an image file?

    My apologies if I gave the OP a bum steer with my first reply. I completely misread the original post and misunderstood what you were trying to achieve. So yeah, apologies mate.
