| The comp.text.pdf Frequently Asked Questions | ||
|---|---|---|
Q: Is there a way to automate the extracting of pages from a PDF document by using a script or batch process? [Q49]
A: I think we might have exactly what you're looking for. Our TK40 toolkit lets you define criteria (how pages break, what indexes exist, etc.) for extracting logical documents from a larger compound PDF file. http://www.maximal.com/products
Scott Robert Ladd (scott.ladd@maximal.com) [A77]
A: http://www.reportlab.com/pageCatcher/index.html
Dinu Gherman (dinu@reportlab.com) [A78]
A: Here are a few more (that aren't vaporware like TK40):
* Ari's PDF Splitter Pro (ask for the extraction version) (http://www.dionis.com)
* AppendPDF (http://www.appligent.com)
* Glance PDF CLT tools for Win/Unix (pdsel) (http://pdf.glance.ch)
JW (18tni7m5x001@sneakemail.com) [A79]
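None of the answers above shows an actual command line. As a minimal sketch under stated assumptions: Ghostscript (discussed elsewhere in this FAQ) can write a page range of a PDF to a new file through its pdfwrite device, assuming your build's PDF interpreter honours -dFirstPage and -dLastPage. The function name extract_pages is hypothetical, and pdfwrite regenerates the file rather than copying the pages verbatim, so compare the output with the original.

```shell
#!/bin/bash
# Hypothetical helper: copy pages FIRST..LAST of IN.pdf into OUT.pdf using
# Ghostscript's pdfwrite device. Assumes gs is on the PATH and supports
# -dFirstPage/-dLastPage for PDF input.
extract_pages() {
  if [ "$#" -ne 4 ]; then
    echo "usage: extract_pages IN.pdf FIRST LAST OUT.pdf" >&2
    return 2
  fi
  local in=$1 first=$2 last=$3 out=$4
  gs -q -dNOPAUSE -dBATCH -dSAFER -sDEVICE=pdfwrite \
     -dFirstPage="$first" -dLastPage="$last" \
     -sOutputFile="$out" "$in"
}

# Example batch run, splitting a compound report into logical documents:
# extract_pages report.pdf 1 12 chapter1.pdf
# extract_pages report.pdf 13 40 chapter2.pdf
```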
A: GhostScript 7.0 supports almost all of the features of Acrobat Distiller.
Alex Cherepanov (alexcher@erols.com) [A80]
A: http://www.ctrlp.com/freepdf.asp?st=pdf
Ingemar Djurhuus (djurhuus@mail1.stofanet.dk) [A81]
A: Jaws Creator (http://www.jawssystems.com/products/products_fs.html), at $120 US, is an enhanced, commercial alternative to the original 5D program, which can still be downloaded for free at http://www.ctrlp.com/freepdf.asp?st=pdf. According to the feature comparison chart, Jaws does almost everything Adobe does, including recognizing Word 97 and 2000 subcomponents.
Steve Auerweck (steven.auerweck@verizon.net) [A82]
A: The offer on www.ctrlp.com to download a free copy of "5D PDF Creator" (a version from 1999) no longer exists. The successor product, "Jaws PDF Creator 2.0", can be downloaded as an evaluation copy from www.jawssystems.com. For Ghostscript, look at www.ghostscript.com, www.cs.wisc.edu/~ghost or www.aladdin.com (the URL http://aladdin.com doesn't work).
Eckhard Henkel (eckhard.henkel@t-online.de) [A178]
A: There is also my PStill converter which is quite inexpensive for private use on Windows systems (and free for private and edu use on Linux, Solaris, IRIX, HPUX and AIX).
You can download the program here:
http://www.wizards.de/~frank/pstill.html
It also works unregistered; it then draws a small line of text near the bottom of each generated PDF page but is otherwise uncrippled.
Frank M. Siegert (frank@this.net) [A179]
A: 5D PDF Creator has now been replaced with Jaws PDF Creator at
http://www.jawssystems.com, which comes in two forms: a trial version and a full version. I was able to get the FREE 5D version, which allowed me to get the full version of Jaws. That FREE version is no longer available from the original ctrlp.com link. The trial version of Jaws lets you try it out, but your PDF files will have a ghosted message in the background until you pay a fee for the full version. I don't remember how much it is, but it is well worth it in my opinion.
All in all, Jaws is a good PDF program and creates PDFs easily from any application. BUT Adobe Acrobat gives you editability that I don't think you will get with any other PDF creation program. It seems to be the industry leader at this point.
Vi Wisdom (viwiz@worldnet.att.net) [A233]
A: ps2pdf, which is part of the Ghostscript package, works well for me.
Michael Still (mikal@stillhq.com) [A263]
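Because ps2pdf is a command-line wrapper around Ghostscript, it is easy to drive from a script. A rough sketch of batch-converting a directory, assuming ps2pdf is on the PATH; the function name ps_dir_to_pdf is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: convert every *.ps file in a directory (default: the
# current one) to PDF with ps2pdf; foo.ps becomes foo.pdf next to it.
ps_dir_to_pdf() {
  local dir=${1:-.}
  local ps
  for ps in "$dir"/*.ps; do
    [ -e "$ps" ] || continue          # glob matched nothing: skip it
    ps2pdf "$ps" "${ps%.ps}.pdf" || return 1
  done
}

# Example: ps_dir_to_pdf ~/printouts
```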
A: You could try xpdf's pdfimages (http://www.foolabs.com/xpdf)
Jorma Heimonen (Jorma.Heimonen@kone.com) [A83]
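A minimal sketch of scripting pdfimages, assuming the xpdf command-line tools are installed: -f and -l give the first and last page, and -j writes DCT-encoded (JPEG) images as .jpg files instead of converting them to PPM. Unlike a screen capture, this retrieves the images at their native resolution. The wrapper name extract_images is hypothetical.

```shell
#!/bin/bash
# Hypothetical wrapper around xpdf's pdfimages: write the images found on
# pages FIRST..LAST of IN.pdf as PREFIX-000.jpg / PREFIX-001.ppm / ...
extract_images() {
  if [ "$#" -ne 4 ]; then
    echo "usage: extract_images IN.pdf FIRST LAST PREFIX" >&2
    return 2
  fi
  pdfimages -j -f "$2" -l "$3" "$1" "$4"
}

# Example: extract_images manual.pdf 5 5 figs/page5
```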
A: Several ways... and I have used all of these, often just because it's easier than figuring out where I stored the original source imagery (or when I actually want the presented composite image, and not the original raw image).
* Some applications, like PhotoShop and Illustrator, can open an individual PDF page and edit it.
Illustrator is ideal for this, because it preserves all of the page elements as individually selectable entities, and further, preserves their vector or raster nature (and probably the color model).
PhotoShop may rasterize the whole PDF page as a single object, and rasterizing entire pages at high resolution produces HUGE data objects. Unfortunately, unless you can calculate precisely what dpi gives 1:1 source:captured pixels, you want to oversample by at least 2x. Run the numbers.
Neither Adobe app, however, will open the page if it has any kind of security (open or admin security). Further, embedded fonts may be incorrectly rendered.
* Reprint to .eps
If you can configure your PostScript driver to (a) print to file and (b) print in EPS format, you can print the page of interest to page.eps and then edit it with whatever EPS-capable image editor you have.
Some apps, like Adobe FrameMaker, can import an entire PDF page as a referenced graphics object. You can then reprint it as .eps. FM won't import pages from secured PDFs.
* Acrobat and Acrobat Reader can select rectangular sections of a page - if (big if) selecting graphics is allowed in the file (see File:DocInfo:Security).
On Windows, place the mouse cursor over the [T] text selection tool, hold the left mouse button down, and release it over the "graphics select tool" icon.
Outline the image desired. Zoom to the screen resolution desired. Choose Edit:Copy or press [Ctrl[c]]; the selected image is now on the clipboard.
The selected area can be off-screen, and even off logical desktop, but you will be limited on some systems to a maximum graphics object size - 32MB for Windows. If you get a nastygram dialog box, zoom out until you don't.
* Screen copy
The last resort is to use whatever tools the OS provides, or that are available as freeware, shareware or aftermarket products, to perform a full-screen copy or window-only copy. On MS Windows, the [PrntScrn] and [Alt[PrntScrn]] keys do this. Zoom to the desired size, but the entire desired image must [usually] be on-screen. Generally, unless you can run the numbers and match the object's raster resolution to 1 pixel per screen pixel, you want to zoom as large as possible and oversample to minimize resampling damage.
If your graphics card supports large logical desktops, as many Matrox cards do, the image can be partially off-screen as long as it is entirely in the card's on-board RAM, i.e. is entirely on the logical desktop.
The 32MB Windows GDI limit applies.
All of the above assumes that you own or have permission to re-use the image(s) in question.
Bob Niland (rjn@fc.hp.com) [A84]
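To make "run the numbers" concrete, here is a small sketch of the arithmetic, assuming an uncompressed capture at a fixed number of bytes per pixel (3 for 24-bit RGB) and page dimensions in PostScript points (72 per inch). The function name raster_bytes is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: uncompressed size in bytes of a page of W_PT x H_PT
# points rendered at DPI with BPP bytes per pixel (pixels truncated to
# whole numbers).
raster_bytes() {
  local w_pt=$1 h_pt=$2 dpi=$3 bpp=$4
  local w_px=$(( w_pt * dpi / 72 )) h_px=$(( h_pt * dpi / 72 ))
  echo $(( w_px * h_px * bpp ))
}

# US Letter (612 x 792 pt) at 300 dpi, oversampled 2x to 600 dpi, 24-bit RGB:
raster_bytes 612 792 600 3   # prints 100980000 (5100 x 6600 pixels x 3)
```

At roughly 101 MB, that single Letter page at 600 dpi is already about three times the 32MB Windows limit mentioned above.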
A: Yes. If a PDF file permits you to retrieve text, it also permits you to retrieve graphics.
Acrobat has three ways to do this. I'm not sure whether the free Acrobat Reader supports any of these. Someone else will have to check, as I don't even have it installed.
1. There is an object touch-up tool (a solid diagonal black arrow icon on a button). In Acrobat 5 this is accessed by pulling out the button strip attached to the text touch-up tool (outline capital T). Press and hold the T button to access the pullout strip. Then select the arrow button.
Using this tool, you can select one or more objects (graphics as well as text blocks). The tool selects every object it _touches_ as you drag it, not every object it _surrounds_. You can deselect unwanted objects with shift+click.
Once you select an object, you can move it, copy it, cut it, or delete it. If you copy it or cut it, you can paste it into another PDF but not into any other application that I'm aware of.
2. There is a graphics select tool (dotted box with a white and a black circle inside it). You can drag a box around any area of the page, which may include both graphic and text material. You get everything inside the rectangular box you draw, as a bitmap (screen capture style). If you don't draw the box exactly right the first time, you cannot adjust it. Just drag a new box. After selecting the area you want, you can increase the magnification on the view. You can zoom until the selected area is much larger than your monitor if you want to, in order to increase the resolution of the bitmap you are capturing. Of course this causes a geometric increase in the file size, and you may not need such high resolution, depending on what your ultimate purpose is.
Once you have the area selected and zoomed, you can copy it to the Windows clipboard (Ctrl-C) and then paste it in any application that accepts a raster image from the clipboard. (The dimensions in pixels of the copied area travel with the clip. If you are in PhotoShop and open a new image with the graphic on the clipboard, the default size of the new image will match the piece you clipped.)
3. You can export an EPS of a single page (File > Export) and select the graphic in a program that can parse the EPS.
Finally, you can open the PDF in Adobe Illustrator and select the graphic easily there.
Dick Margulis (margulis@fiam.net) [A215]
Q: I am astonished that this is (apparently) such a complex job. If I want to take a pdf graphic and move it to say, Word, I simply select what I want with the graphic tool (in Exchange), and paste into my Word doc. And I get the graphic - some or all of it depending on what is selected ... Am I missing something here? [Q52]
A: Possibly.
What you describe has some serious limitations:
1. If the document has inhibited "selecting text and graphics", it won't work at all. You'll be limited to screen capture using host OS tools.
2. Even when it does work, you get a RASTER image of the entire selection area. You can't easily separate elements or even eliminate overlay text.
Further, if the original graphic was vector, you still get raster - not as scaleable - and often vastly larger.
3. You get that raster at screen resolution. This means that even in the case of raster originals, you are either under- or over-sampling the original, with potential damage to the image.
Normally, you need to select the area, then zoom until the "copy" fails due to the object size (32MB in Windows), then downsize the resulting object in your image editor. This minimizes resampling artifacts.
Bob Niland (rjn@frii.com) [A85]
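The "zoom until the copy fails" step can also be estimated in advance. A sketch under stated assumptions: 24-bit RGB (3 bytes per pixel), page size in points (72 per inch), and the 32MB limit taken as 32*1024*1024 bytes. The function name max_dpi is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: highest whole-number resolution (dpi) at which a page
# of W_PT x H_PT points still fits under LIMIT bytes at BPP bytes per pixel.
# Starts from 72 dpi (100% zoom); if even that is too large, it reports 72.
max_dpi() {
  local w_pt=$1 h_pt=$2 bpp=$3 limit=$4
  local dpi=72 next bytes
  while :; do
    next=$(( dpi + 1 ))
    # Pixel dimensions are truncated to whole pixels at each resolution.
    bytes=$(( (w_pt * next / 72) * (h_pt * next / 72) * bpp ))
    (( bytes > limit )) && break
    dpi=$next
  done
  echo "$dpi"
}

# US Letter (612 x 792 pt), 24-bit RGB, 32MB limit:
max_dpi 612 792 3 $(( 32 * 1024 * 1024 ))   # prints 345
```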
A: Yes. What you describe is equivalent to faxing yourself a copy of the image. It's not the image itself. Not only is it no longer in PDF format but it's limited to the dot-density of your screen, so while it's OK for a Web site, the low resolution will be horribly obvious if you try to print it.
Adobe went out of their way to make it _difficult_ to extract a real image from a PDF file, under pressure from publishers who don't want their expensively generated images pirated.
Peter Flynn (peter@silmaril.ie) [A86]
A: http://www.reportlab.com/pageCatcher/index.html
Aaron Watters (aaron@at.reportlab.com) [A87]
A: StampPDF Batch will allow you to do this easily. More information on StampPDF Batch, including documentation and online demos, can be found on our web site at http://www.appligent.com
Mark Gavin (mgavin@appligent.com) [A88]
A: http://www.appligent.com/
Aandi Inston (quite@dial.pipex.com) [A90]
A: I'm working on a PDF Append application (Freeware), but it has yet to be beta tested. You're welcome to try it out, but keep in mind I'll assume no legal responsibilities (normal legal stuff). Anyway, it can be downloaded at: http://www.northlandpublishing.net/append.zip
Timothy L. Jordi (jordi@northlandpublishing.net) [A91]
A: PDF Crypt. What is pdfcrypt? pdfcrypt is a very flexible and powerful program that lets you set permissions on a PDF file. For example, you can publish a document without allowing it to be printed: the Print button will be disabled in the Acrobat Reader application. It's simple to use as a batch application to set permissions on every PDF in your archive, as a pipe, or inside your CGI scripts. We distribute only executable versions (ask us if you need the original Perl code). Download it and test it! http://www.sanface.com/pdfcrypt.html
SANFACE Software (sanface@yahoo.com) [A93]
Q: Using Acrobat 4 or 5, or any other program, is it possible to edit an existing PDF file: I need to add 1000 pictures (logos) to 1000 different, existing PDF-files. [Q58]
A: For the logos, if you know where you want to put them, you can use PageCatcher for this. http://www.reportlab.com/pageCatcher/ Regarding searching Usenet: if you go to http://www.deja.com you will be redirected to Google's Usenet search service, which will save you some time (try the advanced search option, where you can control the search with a lot of options: list, keyword, sort order, phrase, etc.)
Aaron Watters (aaron@at.reportlab.dot.com) [A94]
A: Have a look at www.enfocus.com for PitStop Pro or PitStop Server. PitStop Pro is a plug-in for Acrobat 4/5 with many editing capabilities. If you want to automate a task, you can use Action Lists and apply them to each file in Acrobat by opening each file, or process the files in a batch using PitStop Server, which is a standalone product. All our products have a 30-day trial period.
Filiep Maes (filiepm@enfocus.be) [A95]
A: The cheapest way (besides using an illegal copy) is to print from Acrobat Reader into a PostScript file. Interpret this file with CorelDraw, Illustrator or Freehand and add the logo. Then export as PDF from the application, or print to a PostScript file again and use Ghostscript to convert that PostScript file to a PDF. Don't ask me how to do it step by step or which settings you should use. The cheap way will also cost you (in time).
Marc Wieber (mgww@exmail.de) [A96]
A: StampPDF only costs $179 and is a plug-in for Adobe Acrobat. StampPDF can easily add "Evaluation Copy" across every page of your document.
"StampPDF Batch", costing $2995, is a stand alone, server based application designed for high production, on-demand web based applications.
More information on StampPDF can be found on our web site at
http://www.appligent.com.
StampPDF can be purchased through us or a variety of resellers and online purchases can be done at PlanetPDF at
http://www.planetpdf.com.
Mark Gavin (mgavin@appligent.com) [A201]
Q: Does anybody know if there is another program besides Adobe Acrobat that reads PDF files? I ask because my Acrobat Reader doesn't start; it says that there is an error in the file msvcrt.dll. [Q59]
A: You can read them with GSview, which requires GhostScript. Install GhostScript first, then GSview.
Matti Vuori (mvuori@koti.soon.fi) [A97]
A: Freshmeat is your friend...
http://freshmeat.net/search/?site=Freshmeat&q=pdf+viewer
laird@gunsmoke.ecn.purdue.edu (Kyler Laird) [A98]
Q: http://www.appligent.com/ - AppendPDF. http://www.dionis.com/ - PDF Splitter Pro I have tried both of these tools, as well as pdsel (pdf.glance.ch), and none of them works. AppendPDF eats up memory and dies, Ari's PDF Splitter can only split into single-page fragments, and pdsel doesn't work on files greater than several hundred pages. What other solutions are there? [Q60]
A: You may have about covered it. Did you contact the tech support, especially for AppendPDF? It is designed for very heavy duty use, so a major memory leak is a little surprising.
Aandi Inston (quite@dial.pipex.com) [A99]
A: You might try the PDF Handshake product from HELIOS (www.helios.de), which includes a pdfcat utility that can concatenate files as well as extract arbitrary page ranges from a Unix command line.
Jens-Uwe Mager (jum@anubis.han.de) [A100]
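For completeness, Ghostscript (mentioned elsewhere in this FAQ) can also concatenate PDF files from a Unix command line. A minimal sketch, not tested here against files of several hundred pages; the function name concat_pdfs is hypothetical, and because pdfwrite re-creates the document, bookmarks, links and form fields from the inputs may be lost.

```shell
#!/bin/bash
# Hypothetical helper: concatenate IN1.pdf IN2.pdf ... into OUT.pdf with
# Ghostscript's pdfwrite device (pages appear in argument order).
concat_pdfs() {
  if [ "$#" -lt 3 ]; then
    echo "usage: concat_pdfs OUT.pdf IN1.pdf IN2.pdf [...]" >&2
    return 2
  fi
  local out=$1
  shift
  gs -q -dNOPAUSE -dBATCH -dSAFER -sDEVICE=pdfwrite \
     -sOutputFile="$out" "$@"
}

# Example: concat_pdfs book.pdf part1.pdf part2.pdf part3.pdf
```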
Q: I have a LaTeX file with some included *.eps graphics. If I create a *.ps document from it, everything works fine. When I create a *.pdf document, none of the graphics are shown; only the captions are displayed. Does anybody know a solution for this problem? Can you recommend a package to make the *.pdf files? [Q61]
A: `\usepackage[pdftex]{graphicx}` (you *really* need to use the graphicx package)
`\usepackage{epstopdf}`
If you put these two commands before your `\begin{document}` and run the file through pdflatex (barring other complications), it will not only create PDF files from your EPS graphics files, but also include them in your final PDF document... really neat too ;-) You can get epstopdf.sty from CTAN in macros/latex/contrib/supported/oberdiek.
Mimi Burbank (mimi@csit.fsu.edu) [A101]
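The epstopdf package does the conversion during the pdflatex run by calling the external epstopdf program, which usually requires pdflatex to be allowed to run external commands (shell escape); how that is enabled depends on your TeX installation. An alternative, sketched here assuming the epstopdf script is on your PATH, is to convert the figures once before running pdflatex. The function name eps_dir_to_pdf is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: convert every *.eps figure in a directory (default:
# the current one) with the epstopdf script; fig.eps becomes fig.pdf.
eps_dir_to_pdf() {
  local dir=${1:-.}
  local eps
  for eps in "$dir"/*.eps; do
    [ -e "$eps" ] || continue          # glob matched nothing: skip it
    epstopdf "$eps" || return 1
  done
}

# Example: eps_dir_to_pdf figures && pdflatex paper.tex
```

With graphicx loaded for pdfTeX, `\includegraphics{fig}` written without the .eps extension then picks up fig.pdf.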
A: I thought you might like to know about pdfFactory, our new PDF printer driver for Windows that creates PDFs. There is a free trial version available at http://www.fineprint.com. The registration fee is $39.95.
Among its features:
- ability to seamlessly combine multiple print jobs (even from separate applications) into a single PDF file
- pdfFactory has a Send button to email a PDF to someone without having to Save As and type a file name
- easy font embedding UI
- preview jobs before freezing them into PDF, with the ability to delete pages
- all print jobs automatically saved for later retrieval if you need them again (especially useful for transient content like Web page searches, ecommerce receipts, emails that get deleted, etc.)
- smaller files than PDFWriter (usually, but not guaranteed)
The current version does not support non-ANSI charset fonts or security. We will be adding those features shortly. When used with FinePrint, our other product, you can create PDF booklets, multi-up renderings, watermarks and much more.
Check it out at http://www.fineprint.com
Jonathan Weiner (jonathan@singletrack.com) [A102]
A: BCL Computers just released the Beta version of their new PDF printer driver for Windows 2000. To download a copy, visit http://www.bcl-computers.com.
Rachel Burnsed (burnsed@bcl-computers.com) [A103]
A: The pdfMachine is a Windows print driver that produces great quality PDF files. It integrates with MAPI-compliant mail programs such as Outlook and Outlook Express to ease sending PDFs via email. pdfMachine is easy to install and use. If you know how to print a document, then you already know how to use pdfMachine. It works with virtually any Windows application: just print to the pdfMachine. The trial version of pdfMachine creates a faint watermark of "pdfMachine by BroadGun Software" on every page. This watermark is not present if the software is purchased and registered. For more information or to download a copy, go to http://broadgun.com/pdfmachine/index.htm
Craig Broadbear (craig@broadgun.com) [A165]
A: Acrobat Distiller, GhostScript (with redmon), Jaws PDF Creator, there are sure to be more.
Ken Sharp (ken@spamcop.net) [A254]
A: APStripFiles is a command line application that removes attached or embedded files from PDF documents. It enables you to protect your systems from malicious or unwanted PDF file attachments.
APStripFiles offers you the option to remove the files from the original PDF document or make a copy of the PDF without the attachment, leaving the original PDF untouched. It can be used on the desktop, a web server or directly on an e-mail server to avoid the transfer of viruses that can be carried by a PDF file attachment. APStripFiles supports the removal of attachments from multiple PDF files using file names or wildcards.
APStripFiles for Mac OS X can be downloaded free from:
http://www.appligent.com/newpages/freeSoftware_Mac.html
APStripFiles for AIX, HP-UX, Sun Solaris and Red Hat Linux can be downloaded free from:
http://www.appligent.com/newpages/freeSoftware_Unix.html
APStripFiles for Windows 95/98/NT/2000 can be downloaded free from:
http://www.appligent.com/newpages/freeSoftware_Win.html
lvincent (lvincent@digapp.com) [A177]
A: For server-side generation, you may want to check out Cocoon, part of the Apache Jakarta project. http://jakarta.apache.org
Frankie Lam (franky@mindless.com) [A181]
A: You can use the free tool DBtoPDF from http://www.kgo.de/dbtopdf.html
Klaus Gotthardt (k.gotthardt@em.uni-frankfurt.de) [A182]
Q: I need a package which can extract information about the document for me -- for instance the name of the author of the document. What is available? [Q145]
A: The "pdfinfo" application from my Xpdf package does this.
http://www.foolabs.com/xpdf/
The web page has source code and various binaries, licensed under the GPL.
Derek B. Noonburg (derekn@foolabs.com) [A225]
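A minimal sketch of scripting this, assuming pdfinfo's usual one-field-per-line "Key: value" output; the function name pdf_info_field is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read pdfinfo output on stdin and print the value of
# one field. FIELD is used as a sed pattern, so stick to plain names such
# as Author, Title or Producer.
pdf_info_field() {
  sed -n "s/^$1:[[:space:]]*//p"
}

# Example: list the author of every PDF in the current directory.
# for f in *.pdf; do printf '%s\t%s\n' "$f" "$(pdfinfo "$f" | pdf_info_field Author)"; done
```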
A: Here's something I've done on my Linux box: I have a printer named "pdf", and if I choose it, a PDF file is created in my home directory with a name based on the current date. You may have to adapt it, and of course you must have a Ghostscript version with the pdfwrite output device available (try "gs --help | grep pdfwrite"). Most applications with a GUI give you a menu where you can select the name of the printer.
New section in /etc/printcap:
```
####################################################################
pdf:\
:sd=/var/spool/lpd/pdf:\
:mx#0:\
:sh:\
:lp=/dev/null:\
:if=/var/spool/lpd/pdf/filter:
####################################################################
```
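The input filter /var/spool/lpd/pdf/filter:
```shell
#!/bin/bash
# Input filter: lpd passes the print job (normally PostScript) on stdin.
# set > /tmp/print_env.txt   # only for debugging: dump the environment
# Assumes $USER is the job owner; adapt this if your lpd sets it differently.
HOME="/home/$USER"
# Name the PDF after the current date and time ("/" replaced by ".").
OUTPUT_FILE=$(date +%x-%X | tr / .).pdf
# The final "-" makes Ghostscript read the job from standard input.
/usr/bin/gs -dNOPAUSE -q -sDEVICE=pdfwrite -sPAPERSIZE=a4 \
    -sOutputFile="$HOME/$OUTPUT_FILE" -
```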
(gauthier-vdm@ibelgique.com) [A232]