Thread: PDF Library

  1. #1
    Join Date
    Apr 2006
    Location
    Phoenix, AZ
    Beans
    251
    Distro
    Ubuntu 8.04 Hardy Heron

    PDF Library

    Hey,
    Does anyone know of a good PDF manipulation library out there? I've been able to find libraries that create PDF files, but there don't seem to be many that work with existing PDF files. Specifically, I need to do some processing on the URLs/links within the PDF files (not just standard page merging) and rewrite them to a new PDF.
    -Skeeterbug

  2. #2
    Join Date
    Jun 2006
    Location
    CT, USA
    Beans
    5,267
    Distro
    Ubuntu 6.10 Edgy

    Re: PDF Library

    That's a good question; same experience here. Most free libraries are for writing, merging, or page manipulation of PDFs, not for parsing existing ones.

    There are many closed-source libraries for sale, but it is hard to test how good they are. Does anyone have experience with them?

    The best quick-and-dirty solution for parsing (for my needs) was the command-line utility pdftotext (included in Ubuntu), followed by manually parsing the resulting text file. Of course, this way you lose all formatting information. IIRC pdftotext can create HTML instead of text, so that might preserve some formatting.
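
A minimal sketch of the pdftotext approach, assuming the `pdftotext` binary is on the PATH. The helper names (`pdftotext_command`, `split_pages`, `extract_pages`) are made up for illustration. It relies on pdftotext writing to standard output when the output file is `-` and separating pages with a form feed character.

```python
import subprocess


def pdftotext_command(pdf_path, layout=True):
    """Build the pdftotext argument list; '-' sends the text to stdout."""
    cmd = ["pdftotext"]
    if layout:
        # -layout tries to keep the physical layout of the page text.
        cmd.append("-layout")
    cmd += [pdf_path, "-"]
    return cmd


def split_pages(text):
    """Split pdftotext output into pages on the form feed separator."""
    pages = text.split("\f")
    # Drop the empty chunk left after the final form feed, if any.
    if pages and pages[-1].strip() == "":
        pages.pop()
    return pages


def extract_pages(pdf_path):
    """Run pdftotext (assumed to be installed) and return one string per page."""
    result = subprocess.run(
        pdftotext_command(pdf_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return split_pages(result.stdout)
```

Calling `extract_pages("input.pdf")` (a placeholder file name) would return a list with one string per page, which can then be parsed by hand as described above.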

    Another option: the Adobe website provides a web service that converts a PDF file to a different format, but you need to state which platform you use and why a PDF reader is not available.

    I cannot use that option because my file is huge: 900 MB ... :-/

    I need some way to get pictures out of that PDF. Any suggestions?

  3. #3
    Join Date
    Oct 2007
    Beans
    100

    Re: PDF Library

    pdfimages, which ships with poppler (and with xpdf), extracts images from PDFs.
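
A rough sketch of driving pdfimages from a script, assuming the `pdfimages` binary is installed. The function names and the example paths are hypothetical. It uses the `-j` flag (write JPEG images as .jpg files) and the `-f`/`-l` first/last page flags, so a very large file can be processed a range of pages at a time.

```python
import subprocess


def pdfimages_command(pdf_path, image_root, first_page=None, last_page=None,
                      keep_jpeg=True):
    """Build the pdfimages argument list for an optional page range."""
    cmd = ["pdfimages"]
    if keep_jpeg:
        # -j writes JPEG (DCT) images as .jpg instead of converting them.
        cmd.append("-j")
    if first_page is not None:
        cmd += ["-f", str(first_page)]
    if last_page is not None:
        cmd += ["-l", str(last_page)]
    # image_root is the prefix of the output files, e.g. out/img-000.jpg.
    cmd += [pdf_path, image_root]
    return cmd


def extract_images(pdf_path, image_root, **options):
    """Run pdfimages (assumed to be on the PATH); raises if it fails."""
    subprocess.run(pdfimages_command(pdf_path, image_root, **options), check=True)
```

For example, `extract_images("big.pdf", "out/img", first_page=1, last_page=50)` would pull the images from the first 50 pages only.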

  4. #4
    Join Date
    Apr 2006
    Location
    Phoenix, AZ
    Beans
    251
    Distro
    Ubuntu 8.04 Hardy Heron

    Re: PDF Library

    EDIT:

    Never mind, their sample mangled my PDF links. :-/
    Last edited by skeeterbug; January 18th, 2008 at 11:59 PM.
    -Skeeterbug

  5. #5
    Join Date
    Aug 2006
    Beans
    366

    Re: PDF Library

    There is pypdf. Also, searching for "python read pdf" on SourceForge yields 5716 hits, one of which is rsclib: "Small Python library with various things such as Configuration file parsing (in Python syntax), HTML and PDF parsing. Used in others of my projects." I haven't tried any of these, but I do use PostScript and convert to PDF, so I would appreciate it if you posted back with any good solution.
    Linux Counter entry # 99383 (since 1995), Feisty Xbuntu 64 bit
    Folders! We don't need no stinking folders. "I don't have anything on my machine that needs folding" -- Unknown
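
For the original question (processing links and writing them to a new PDF), here is an untested sketch using pypdf. It assumes the current `pypdf` package (the maintained successor of pyPdf) is installed, and that the links are stored as annotations with a `/URI` action, which is the usual case for web links but not the only way a PDF can encode one. `rewrite_url` is a made-up example rule and `rewrite_pdf_links` is a hypothetical helper, not part of pypdf.

```python
def rewrite_url(url):
    """Hypothetical rule: upgrade http:// links to https://."""
    prefix = "http://"
    if url.startswith(prefix):
        return "https://" + url[len(prefix):]
    return url


def rewrite_pdf_links(src_path, dst_path, rule=rewrite_url):
    """Copy src_path to dst_path, passing each link URI through rule.

    Returns the number of links that were changed.
    """
    # Imported here so the rule above can be tried without pypdf installed.
    from pypdf import PdfReader, PdfWriter
    from pypdf.generic import NameObject, TextStringObject

    reader = PdfReader(src_path)
    writer = PdfWriter()
    for page in reader.pages:
        writer.add_page(page)

    changed = 0
    for page in writer.pages:
        if "/Annots" not in page:
            continue
        for ref in page["/Annots"]:
            annot = ref.get_object()
            # Only annotations with a URI action are handled in this sketch.
            if "/A" not in annot:
                continue
            action = annot["/A"]
            if "/URI" not in action:
                continue
            old = str(action["/URI"])
            new = rule(old)
            if new != old:
                action[NameObject("/URI")] = TextStringObject(new)
                changed += 1

    with open(dst_path, "wb") as fh:
        writer.write(fh)
    return changed
```

Links that point at a destination inside the document (a `/Dest` entry or a GoTo action) are left untouched here, so it is worth checking the output against the original before relying on it.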
