
Thread: PDF to text file conversion,does program exist?

  1. #1
    Join Date
    Feb 2010
    Beans
    21

    PDF to text file conversion,does program exist?

    Is there a program that converts PDF files to text files? Emacs seems to be able to do it.

  2. #2
    Join Date
    Jul 2007
    Location
    Poland
    Beans
    4,499
    Distro
    Ubuntu 14.04 Trusty Tahr

    Re: PDF to text file conversion,does program exist?

    check out pdftotext
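    For anyone who has not used pdftotext before, here is a minimal sketch of the basic call. It assumes the poppler-utils package is installed; `statement.pdf` and the helper names `txt_name` and `pdf_to_text` are placeholders made up for illustration, not anything from this thread.

```shell
#!/bin/sh
# Build the output name by swapping a trailing .pdf for .txt.
txt_name() {
    printf '%s\n' "${1%.pdf}.txt"
}

# Convert one PDF to plain text: pdftotext <input.pdf> <output.txt>
pdf_to_text() {
    pdftotext "$1" "$(txt_name "$1")"
}

# Usage (placeholder file name):
#   pdf_to_text statement.pdf    # writes statement.txt
```

    Passing `-` as the output name instead sends the text to standard output, which is handy for piping it into other tools.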

  3. #3
    Join Date
    Nov 2012
    Location
    Halloween Town
    Beans
    Hidden!
    Distro
    Xubuntu Development Release

    Re: PDF to text file conversion,does program exist?

    Quote Originally Posted by Vaphell View Post
    check out pdftotext
    +1

    Besides pdftotext, there is also a command-line converter called unoconv:
    Code:
    sudo apt-get install unoconv
    It converts documents between any of the formats that OpenOffice supports.
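    The general form of the command is `unoconv -f <format> <file>`, where `-f` selects the output format. The sketch below is only an illustration: `report.odt` and the wrapper name `convert_to_txt` are made up, and whether a PDF is accepted as input depends on the import filters of the office suite behind unoconv, so treat that part as an assumption to test rather than a promise.

```shell
#!/bin/sh
# Convert a document to plain text with unoconv.
# The result is written next to the input file (report.odt -> report.txt).
convert_to_txt() {
    # Bail out early with a clear message if unoconv is missing.
    if ! command -v unoconv >/dev/null 2>&1; then
        echo "unoconv is not installed" >&2
        return 127
    fi
    unoconv -f txt "$1"
}

# Usage (placeholder file name):
#   convert_to_txt report.odt
```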

  4. #4
    Join Date
    Feb 2010
    Beans
    21

    Re: PDF to text file conversion,does program exist?

    Vaphell and slickymaster, thanks. Neither works yet. pdftotext was right under my nose, but I did not know it existed. Its output was just "^L^L^L......". I installed unoconv, but the form of the command was not clear to me. I just typed "unoconv something.pdf" and got:

    /q/z$ unoconv robco-20130930.pdf
    Warning: -headless is deprecated. Use --headless instead.
    Warning: -invisible is deprecated. Use --invisible instead.
    Warning: -nodefault is deprecated. Use --nodefault instead.
    Warning: -nofirststartwizard is deprecated. Use --nofirststartwizard instead.
    Warning: -nologo is deprecated. Use --nologo instead.
    Warning: -norestore is deprecated. Use --norestore instead.
    Warning: -accept=socket,host=localhost,port=2002;urp;StarOffice.ComponentContext is deprecated. Use --accept=socket,host=localhost,port=2002;urp;StarOffice.ComponentContext instead.
    ^Z
    [1]+ Stopped unoconv robco-20130930.pdf
    /q/z$


    If you can think of any other angles, please let me know.

  5. #5
    Join Date
    May 2010
    Beans
    61

    Re: PDF to text file conversion,does program exist?

    If it is a true PDF (not a PDF of an image), then copy and paste should work as well.

  6. #6
    Join Date
    Feb 2010
    Beans
    21

    Re: PDF to text file conversion,does program exist?

    Bootedguy, this is a little too sophisticated for me. Now that you mention it, I think it may be an image of a copy, since it is crooked on the page. That means, then, that it is not really a PDF file with text in some form, I suppose, but just a picture of the dots. In that case, I suppose there is no translation.
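    The "^L^L^L" output seen earlier is a run of form feeds, one per page, which is what pdftotext emits when a page holds no text layer. A rough way to check for that case is sketched below; the function name `is_blank_extraction` and the file name `scan.pdf` are hypothetical, and an empty result only suggests a scanned image rather than proving it.

```shell
#!/bin/sh
# Succeeds when the text on stdin contains nothing but whitespace
# (form feeds count as whitespace), i.e. no extractable text.
is_blank_extraction() {
    [ -z "$(tr -d '[:space:]')" ]
}

# Usage (placeholder file name); "-" sends the text to stdout:
#   pdftotext scan.pdf - | is_blank_extraction && echo "probably a scanned image"
```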

  7. #7
    Join Date
    Feb 2010
    Beans
    21

    Re: PDF to text file conversion,does program exist?

    The people sending me PDFs had sent the earlier ones as real PDFs and the later ones as scans of printouts which they named "....pdf". pdftotext did print the text in the real PDFs, though my downloaded unoconv did not. But I am not marking the problem as solved, because the quality of the pdftotext output is poor. It simply puts every text string it finds on a separate line. My PDFs were financial statements, so a column of descriptions followed by a column of numbers is not helpful. Do any programs have an alignment capability?
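    On the alignment question, pdftotext has a `-layout` option that tries to keep the physical layout of the page, so a description and its number stay on the same line. The sketch below adds one possible post-processing step; `statement.pdf` and the helper `columns_to_tsv` are invented for illustration, and the "two or more spaces means a column gap" rule is an assumption that will not suit every statement.

```shell
#!/bin/sh
# Turn layout-preserved lines into tab-separated fields:
# strip leading/trailing spaces, then treat each run of two or
# more spaces as a column gap.
columns_to_tsv() {
    awk '{ sub(/^ +/, ""); sub(/ +$/, ""); gsub(/  +/, "\t"); print }'
}

# Usage (placeholder file names):
#   pdftotext -layout statement.pdf - | columns_to_tsv > statement.tsv
```

    Single spaces inside a description are left alone, so "Cash and equivalents" stays in one field while the amount after the wide gap becomes the next one.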

  8. #8
    Join Date
    Sep 2006
    Beans
    8,627
    Distro
    Ubuntu 14.04 Trusty Tahr

    Re: PDF to text file conversion,does program exist?

    PDF is an end-stage format. It holds a representation of data on its way to either the bitbucket or the recycle bin (via the printer). By the time material hits PDF, all the pieces necessary for automated data processing are gone and you are left with only a visual layout. Again, PDF is for displaying or erasing.

    All information about the document's structure is gone. If you want that structure, you'll have to go upstream and get the original files in XML, SGML, CSV or whatever they may have been.

  9. #9
    Join Date
    Feb 2010
    Beans
    21

    Re: PDF to text file conversion,does program exist?

    Thank you, Lars. I see your point. I wish there were some structure conveyance but accept that there is not. I would like to declare this thread SOLVED to get it out of the way of all the others, but do not find a button for that.

  10. #10
    Join Date
    Jan 2006
    Location
    Not heaven... Iowa
    Beans
    Hidden!
    Distro
    Ubuntu

    Re: PDF to text file conversion,does program exist?

    Quote Originally Posted by gnostlos View Post
    I would like to declare this thread SOLVED to get it out of the way of all the others, but do not find a button for that.
    https://wiki.ubuntu.com/UnansweredPo.../SolvedThreads
