
Thread: libpoppler/pdftops spool size issues when printing large PDF files.

  1. #1
    Join Date
    Jun 2011
    Location
    Wollongong, Australia
    Beans
    148
    Distro
    Ubuntu 11.04 Natty Narwhal

    libpoppler/pdftops spool size issues when printing large PDF files.

    This has been annoying me for years, and I'm wondering if anybody knows of a fix for the insane spool sizes generated by pdftops (I assume) when trying to print PDF files containing bitmap graphics.

    I see it all the time when printing lecture notes for uni classes: a few print fine, but any file with a graphic in it takes ages (easily half an hour). I doubt the printers are at fault, as it has been happening for years and I've experienced it on 4 different printers at work and 2 at home (BW lasers, colour lasers and colour inkjets from Toshiba, SHARP, Brother, HP and FUJI-XEROX).

    Examining the print queue in a Windows VM whilst printing from Ubuntu yielded some interesting results. Print settings in all cases were: A4, colour, duplex, long edge binding. The source was 23 of the 196 pages of a 5.41 MiB PDF containing selectable text (i.e. not a scan) and pictures.

    Spool Sizes:
    Windows VM, Acrobat Reader X (10.1.1) -> 2.24 MiB, Printing starts immediately.
    Linux, Acrobat Reader 9 (9.4.2) -> 28.9 MiB, Printing is slow, pages print as they are received.
    Linux, Evince/Document Viewer 2.32.0 (poppler/cairo 0.16.4) -> 325.6 MiB, Seems to wait for entire file before printing will start. Holds up the printer for 1/2 hour or more.
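
    To put the three figures above on a common scale, here is a minimal sketch that expresses each spool size as a multiple of the Windows one. The sizes are the ones listed above; the dictionary keys are just shortened labels chosen for this example.

```python
# Spool sizes in MiB, copied from the list above.
spool_mib = {
    "Windows VM, Acrobat Reader X": 2.24,
    "Linux, Acrobat Reader 9": 28.9,
    "Linux, Evince (poppler/cairo)": 325.6,
}

# Use the smallest spool file (Windows) as the baseline.
baseline = spool_mib["Windows VM, Acrobat Reader X"]
ratios = {name: size / baseline for name, size in spool_mib.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x the Windows spool size")
```

    So the Evince job is roughly 145 times the size of the Windows one, and the Linux Acrobat Reader job roughly 13 times.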

    Text-only documents print fine (i.e. no discernible difference from Windows), as do documents with vector graphics (e.g. my thesis has .eps graphics and prints fast). .tex->PDF and .tex->DVI->PS->PDF both print fine.

    I checked around and found an old bug on Launchpad (from Karmic days) that nobody appears to be working on.
    https://bugs.launchpad.net/ubuntu/+s...ce/+bug/516280

  2. #2
    Join Date
    May 2009
    Beans
    18

    Re: libpoppler/pdftops spool size issues when printing large PDF files.

    There are too many factors involved to know where to start from your description - not a criticism aimed at you, it happens to be a bit of a minefield area.

    For example, your PDF may contain PDF transparency constructs (alpha blending, and the like), and such transparency is not part of the Postscript imaging model. To "print" such a PDF using Postscript (or most other page description languages) the transparency needs to be "flattened" to one or more opaque imaging operations. In some cases, this may simply involve rendering the PDF content to a high-res raster image, and wrapping that in enough Postscript to render the image. Obviously, a full page, high resolution image is almost always going to be much larger than the original vector operations. Even if you take a "smarter" approach, and try to degenerate the transparency effects into a series of opaque vector operations, the file size will usually (in all but the most trivial cases) be considerably larger.
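
    As a rough illustration of why the "render to image" approach gets so big, here is a back-of-the-envelope sketch of the uncompressed size of one full-page raster. The 600 dpi resolution and 8-bit RGB colour are assumptions picked for the example, not what any particular converter uses, and real output is normally compressed, so treat this as an upper-end estimate rather than a prediction.

```python
def raster_bytes(width_in, height_in, dpi, channels=3, bits_per_channel=8):
    """Uncompressed size in bytes of a raster covering width_in x height_in inches."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * channels * bits_per_channel // 8


# A4 is 210 mm x 297 mm; convert to inches (25.4 mm per inch).
A4_INCHES = (210 / 25.4, 297 / 25.4)

# Assumed: 600 dpi, 3 channels (RGB), 8 bits per channel.
page_bytes = raster_bytes(*A4_INCHES, dpi=600)
print(f"One A4 page: {page_bytes / 2**20:.1f} MiB uncompressed")
```

    Under those assumptions a single A4 page comes to roughly 100 MiB before compression, which is why a handful of flattened pages can dwarf the original PDF.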

    The above is made worse by the fact that your PDF may not contain any actual transparency operations, but might still contain the PDF transparency constructs (I won't go into details unless someone wants me to!) - and that is almost always enough to trigger the "transparency flattening" mentioned above.

    PDFs generated by cairo/poppler are particularly prone to this: they wrap (almost?) all marking operations in PDF transparency constructs, even when there is no transparency in the page description. Worse, PDF transparency includes a bounding box so that the PDF interpreter knows the area to which it will be required to apply blending operations, but the cairo PDF always sets such bounding boxes to the full page size, regardless of the size of the actual marks on the page (this is not just for transparency bounding boxes, but most others as well). To my mind, this is bad form - a bounding box should be: "the smallest rectangle (oriented with the axes of the glyph coordinate system) that will just enclose the entire shape". But it is not actually contrary to the spec. It does mean that for such a case where the "render to image" flattening approach is taken, you will generally end up with one or more high-res images that are the full size of the page.
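
    To make the bounding box point concrete, here is a small sketch comparing a tight bounding box around some marks with a full-page one. The rectangles in `marks` are made-up coordinates in PDF points (1/72 inch) purely for illustration, and the A4 page is rounded to 595 x 842 points.

```python
def tight_bbox(rects):
    """Smallest axis-aligned rectangle enclosing all (x0, y0, x1, y1) rectangles."""
    x0s, y0s, x1s, y1s = zip(*rects)
    return (min(x0s), min(y0s), max(x1s), max(y1s))


def area(rect):
    """Area of an (x0, y0, x1, y1) rectangle."""
    x0, y0, x1, y1 = rect
    return (x1 - x0) * (y1 - y0)


# A4 page in points, rounded to whole numbers.
PAGE_A4_PT = (0, 0, 595, 842)

# Hypothetical marks: two short lines of text near the top of the page.
marks = [(72, 700, 300, 720), (72, 650, 200, 670)]

bbox = tight_bbox(marks)
print("tight bbox:", bbox)
print(f"full page is {area(PAGE_A4_PT) / area(bbox):.1f}x the tight area")
```

    With these made-up marks the full page covers about 31 times the area of the tight box, so a flattener that rasterises whatever the bounding box covers would produce an image about 31 times larger than it needs to.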

    For Ghostscript, we have discussed several times pre-scanning PDFs to find, not just the PDF transparency constructs (which we already do) but also whether any actual blending is required by those constructs, and if not, disable the transparency handling. The problem is, there is a considerable extra overhead for these checks, and the general feeling is that we should not penalise "well formed" PDFs from other creators just to handle poorly formed PDFs from one library.

    There are other possibilities: some "to Postscript" converters do not preserve fonts, so characters end up as either small bitmaps or being degenerated into vector operations. It is much less efficient to repeatedly draw the outline of a character than to draw it once in a font, and reuse it within the interpreter.

    Also, most "to Postscript" converters emit Level 2 Postscript, and some emit only Level 1. This means that things like shaded fill patterns (usually) have to be rendered to and emitted as an image, because shaded fills were only introduced in Postscript Level 3.

    I believe for Oneiric, the print spooler uses Ghostscript (with the ps2write device) instead of the poppler based pdftops. It *may* be worth installing that in a virtual machine to see whether the situation improves.
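
    If you would rather compare the two converters directly without setting up a VM, something along these lines may help. It is only a sketch: it assumes the `pdftops` (poppler-utils) and `gs` (Ghostscript) command-line tools are installed, the function names and file names are made up for the example, and running the converters by hand will not necessarily use the same options as the CUPS filter chain, so the sizes are indicative only.

```python
import os
import shutil
import subprocess


def pdftops_cmd(pdf_path, ps_path, level=3):
    """Command line for poppler's pdftops; level must be 1, 2 or 3."""
    return ["pdftops", f"-level{level}", pdf_path, ps_path]


def ghostscript_cmd(pdf_path, ps_path):
    """Command line for Ghostscript's ps2write device."""
    return ["gs", "-q", "-dNOPAUSE", "-dBATCH",
            "-sDEVICE=ps2write", f"-sOutputFile={ps_path}", pdf_path]


def convert_and_measure(cmd, ps_path):
    """Run a converter and return the output size in bytes, or None if the tool is missing."""
    if shutil.which(cmd[0]) is None:
        return None
    subprocess.run(cmd, check=True)
    return os.path.getsize(ps_path)


def compare(pdf_path):
    """Convert one PDF with both tools and return {tool name: output size in bytes}."""
    return {
        "pdftops": convert_and_measure(pdftops_cmd(pdf_path, "out_pdftops.ps"),
                                       "out_pdftops.ps"),
        "ghostscript": convert_and_measure(ghostscript_cmd(pdf_path, "out_gs.ps"),
                                           "out_gs.ps"),
    }
```

    Calling `compare("notes.pdf")` (with `notes.pdf` standing in for one of the problem files) returns the two output sizes in bytes, with `None` for any tool that is not installed.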

    If you want to make an example of your problem PDFs available to me, I can take a look and see if there is an obvious reason for the large spool size. I would promise not to share it, and to delete it from my system when I was done.

    Chris

    P.S. sorry for the long post.......

  3. #3
    Join Date
    Jun 2011
    Location
    Wollongong, Australia
    Beans
    148
    Distro
    Ubuntu 11.04 Natty Narwhal

    Re: libpoppler/pdftops spool size issues when printing large PDF files.

    Thanks very much for taking the time to explain the rendering situation, it was very informative. I have Oneiric installed in a VM (IIRC it doesn't boot anymore, will have to make another) and will see how well they are handled when I can get the chance.

    I can send you a malformed pdf if you want, but I don't think it's worth wasting your time as I have a workaround (even if it involves a windows VM), and that gets what I need onto paper. Mostly I was curious as to why cairo/poppler handles the problem pdfs in that way.

    Since most of my lecturers use MS Office to make their notes now (which is a shame, as many of my older notes were beautifully done using LaTeX and they never had a problem printing), my guess is that the PDF printers they are using don't make a very well structured document. I've even had ones that would print a few pages then just stop with an "Illegal Exception, Stack Overflow" page as the last from the printer.

  4. #4
    Join Date
    Jun 2011
    Location
    Wollongong, Australia
    Beans
    148
    Distro
    Ubuntu 11.04 Natty Narwhal

    Re: libpoppler/pdftops spool size issues when printing large PDF files.

    Eh, 4 pages of a textbook (MIPS datapaths/State machines, all raster images, no OCR), scaled to A3:

    Document Viewer: 414 MiB spool file.

    Adobe Reader (Windows VM): 14 MiB spool file. That is roughly 30 times smaller.

    That is just stupid... But, the version printed in windows kept the very pale background (so wasted colour toner) whereas the document viewer version basically turned the background into a transparent layer, so didn't waste toner.

  5. #5
    Join Date
    Dec 2007
    Beans
    49
    Distro
    Ubuntu

    Re: libpoppler/pdftops spool size issues when printing large PDF files.

    One workaround for smaller spool sizes and hence faster printing seems to be a PCL driver for the printer.
    There might not be one available in the CUPS list, but you can take a generic one and see what you can get out of it.

    cheers.

  6. #6
    Join Date
    Mar 2010
    Location
    Winnipeg, Canada
    Beans
    2
    Distro
    Ubuntu

    Re: libpoppler/pdftops spool size issues when printing large PDF files.

    I just posted regarding my network printer. I've had a feeling the problem has something to do with the buffer size as the problem seems to happen while the document is spooling to the printer.

    PDFs have largely replaced the fax machine in the business world. It's frustrating to have to keep booting over to Windows 7 to print a large PDF document. Yet it doesn't sound like much has been done to resolve the issue.

    Why is Microsoft Windows so much more successful at printing PDFs than Ubuntu? I've always thought what Windows can do Linux can do better.

    Now I'm starting to wonder.
