The comp.text.pdf FAQ
*********************

Welcome to the comp.text.pdf frequently asked questions document. If you have any suggestions on how this document could be improved, better answers to questions included, or even a new question, please contact the maintainer, Michael Still, at mikal@stillhq.com.

Whenever the name and email address of a person who has answered a question are known, they have been credited within this document. Note that some people chose to add extra text to their email addresses to confuse spam senders. These extra pieces of text have been retained in this document and will need to be removed manually before sending email.

Please note that questions and answers are sometimes edited for readability. I have started marking the answers which have been edited from the version labelled 2001-04. [] denotes a comment from the editor.

This document may also be found online at http://www.stillhq.com/cgi-bin/getpage?area=ctpfaq&page=index.htm which also includes past versions of the FAQ.

This version of the FAQ covers information up to and including 24 Jul 2001 17:08:35 GMT and is based on 4125 postings. It also includes all comments emailed to me up to 24 Jul 2001 03:00:00 GMT.

Version: 2001-04

********************************************************************************

Printing:
  1.1: How can I print a PDF that has the no-print option turned on by the author?
  1.2: Can I print a PDF directly to a printer?

Development using PDF:
  2.1: APIs not developed by Adobe
  2.2: Masks for images (PDF 1.3)
  2.3: How can I determine the compression used in a PDF file?
  2.4: ASCII85 encoding / decoding
  2.5: Discussion areas for PDF developers
  2.6: What is that stuff inserted into PostScript by Acrobat when printing encrypted PDFs?
  2.7: Why are large GETs (PDF forms) truncated in Microsoft Internet Explorer?
  2.8: How machine intensive is generating PDF documents dynamically?
  2.9: Where can I find samples of xxx?
  2.10: Problems with linking to web pages from within a PDF
  2.11: "The file is damaged but is being repaired."
  2.12: Compression algorithms
  2.13: Bezier curve approximation
  2.14: JBIG2 compression support
  2.15: Rotated text
  2.16: Centering text

Acrobat:
  3.1: Should I upgrade from 4.0 to 5.0?
  3.2: Toolbar icons
  3.3: Cutting text from a PDF and pasting into Word
  3.4: Guiding your readers through your document
  3.5: Printing from the command line
  3.6: Acrobat on Linux character set problems
  3.7: Control over OLE automation
  3.8: Editing existing PDF files
  3.9: Command line searching a PDF
  3.10: Running multiple versions of Acrobat
  3.11: Identifying the author of a document

Distiller:
  4.1: Licensing
  4.2: Colour in distilled PDFs
  4.3: Distiller page size problems
  4.4: Output filenames

Internet stuff:
  5.1: Problems with IE displaying PDFs
  5.2: Optimised, compressed, linearised? Arrrrgh! My brain hurts!
  5.3: Encrypted PDFs and search engines
  5.4: Document corruption in older web caches

History:
  6.1: How old is PDF?

PDF tools from people other than Adobe:
  7.1: Truetype capable PDF generators (not APIs)
  7.2: Page extraction
  7.3: Distiller equivalents
  7.4: Extracting images from PDF documents
  7.5: Why can't I just and paste the images
  7.6: Inserting watermarks over pages in a PDF document
  7.7: Linearization tools
  7.8: Tools to concatenate PDFs
  7.9: HTML to PDF conversion
  7.10: PDF permissions tools
  7.11: Inserting images onto existing PDFs
  7.12: PDF Viewers
  7.13: Problems with appending or extracting data from large documents
  7.14: LaTeX to PDF eps graphics support
  7.15: Printer drivers that produce PDF
  7.16: Online PDF generators

Conversion:
  8.1: Converting a PDF document to raster / bitmap files
  8.2: Converting TIFF documents to PDF
  8.3: Converting a PDF document to an editable format (such as RTF)
  8.4: HTML to PDF

Acrobat forms:
  9.1: Methods of creating Acrobat forms
  9.2: Images inside form elements
  9.3: Using ASP to collect form values
  9.4: Onscreen completion
  9.5: FDF form filling with CGI
  9.6: FDF, Java, and special characters
  9.7: Emailing completed forms

Online tutorials:
  10.1: What online PDF tutorials are available?

Specification things:
  11.1: Is there a maximum document size for PDF?
  11.2: Embedding a font into a PDF
  11.3: Required font tables
  11.4: Multiplatform cross book links
  11.5: Converting from user space coordinates into millimeters
  11.6: Large paper sizes

Common questions about PDF:
  12.1: Document sizes
  12.2: The size of a scanned page
  12.3: Reliability of Paper Capture
  12.4: Is it possible to have viruses inside a PDF document?
  12.5: Customizing bookmarks

Other:

Security:
  14.1: eBook security

=========== Printing ===========================================================

1.1: How can I print a PDF that has the no-print option turned on by the author?

How can I print a PDF whose security settings have been changed so that it cannot be printed? Should it be converted, or is there a crack to enable that function?

Email the owner of the document and ask for permission to print it. One must presume that the author has chosen not to let you print for a reason.

Michael Still (mikal@stillhq.com)

**********

http://www.elcomsoft.com/apdfpr.html

Vladimir Katalov (vkatalov@elcomsoft.com)

**********

http://www.password-crackers.com/crack/guapdf.html

Christian Koch (christian_koch@gmx.de)

**********

http://www.ecn.purdue.edu/~laird/PDF/

Christian Koch (christian_koch@gmx.de)

**********

http://www.cs.wisc.edu/~ghost/

Michael Mantz (michael.mantz@de.bosch.com)

**********

[Please note that there is more discussion on this topic in the security section of this document]

Michael Still (mikal@stillhq.com)

--------------------------------------------------------------------------------

1.2: Can I print a PDF directly to a printer?

I'd like to print a PDF file directly to a printer (or at least without human intervention) from a program running in the background on a Win32 machine.

If you have a PostScript Level 3 printer, it's easy. Just send the file to the printer, since PS 3 devices can print PDF files directly. Acrobat is not required. To send it to the printer you can use lpr on Windows or any Unix, or PRINT or COPY on any Windows platform. On the Mac, you probably need to use a font download utility and AppleScript.

Dan Sideen (dansideen@home.com)

=========== Development using PDF ==============================================

2.1: APIs not developed by Adobe

What APIs are available, and under what terms?

PDFLib, free for non-commercial use, http://www.pdflib.com
ClibPDF, free for non-commercial use, http://www.fastio.com
Panda, open source / free (GPL), http://www.stillhq.com
ReportLab, open source / free (BSD license), http://www.reportlab.com

Michael Still (mikal@stillhq.com)

--------------------------------------------------------------------------------

2.2: Masks for images (PDF 1.3)

PDF 1.3 has documented a new "Mask" key that provides explicit masking ability. This is supposed to allow you to embed an image (say, a JPEG) along with a 1 bit-per-sample mask that indicates where the JPEG should and should not appear on the page. I haven't been able to get this to work -- the image just renders without respecting the mask. Does anyone have any experience with this feature? The PDF reference does not have any sample code that does this, so I would be interested in any examples you could throw my way.

It would be easier if you provided an excerpt of the PDF you generated. Don't forget to put the ImageMask entry in the mask image's dictionary, set the mask's BitsPerComponent to 1, and give it no color space.

Pierre Baillargeon (pb@artquest.net)

**********

And another thing I noticed about legal masks is that the mask image XObject appears in the resource dictionary for the page it appears on.

Ed Bomke (edb@discryptic.com)

**********

Here are some fragments from one that works:

    25 0 obj
    <<
    /Type /XObject
    /Subtype /Image
    /Width 317
    /Height 299
    /BitsPerComponent 1
    /ImageMask true
    /Length 524
    /Filter /CCITTFaxDecode
    /DecodeParms << /K -1 /Columns 317 >>
    >>
    stream

    26 0 obj
    <<
    /Type /XObject
    /Subtype /Image
    /Mask 25 0 R
    /Width 317
    /Height 299
    /BitsPerComponent 8
    /ColorSpace /DeviceRGB
    /Length 95613
    /Filter /DCTDecode
    >>
    stream

Aandi Inston (quite@dial.pipex.com)

--------------------------------------------------------------------------------

2.3: How can I determine the compression used in a PDF file?

I got a few documents looking good and at a reasonable size for the final use.
But months later I have to re-issue and I can't repeat the results. Other than trial and error, is there a utility or a set of steps to determine the compression options used in a PDF?

Open the PDFs using a text editor that can handle binary data (like UltraEdit on Windows) and search for the "/Filter" keywords.

Helge Blischke (H.Blischke@srz-berlin.de)

**********

See http://www.enfocus.com/plugins.htm and look for the free Enfocus Browser plug-in. Enfocus Browser allows you to navigate the low-level object hierarchy in a PDF file, and view the PDF page description for a particular page.

Filiep Maes (filiepm@enfocus.be)

**********

Our Quite A Box Of Tricks product will tell you the compression used, image by image. This includes the level of JPEG used. This can be done with the free demo.

Aandi Inston (quite@dial.pipex.com)

--------------------------------------------------------------------------------

2.4: ASCII85 encoding / decoding

I'm looking for a piece of code to do ASCII85 encoding/decoding. Does anyone know where to get this?

ftp://ftp.webcom.com/pub/haahr/src/encode85.c
ftp://ftp.webcom.com/pub/haahr/src/decode85.c

Tom Kacvinsky (tjk@ams.org)

--------------------------------------------------------------------------------

2.5: Discussion areas for PDF developers

What mailing lists are available for PDF developers?

You could try some of the following:

- comp.text.pdf
- PDFDev (http://www.pdfzone.com)
- PlanetPDF developer's forum (http://www.planetpdf.com)

Aandi Inston (quite@dial.pipex.com)

--------------------------------------------------------------------------------

2.6: What is that stuff inserted into PostScript by Acrobat when printing encrypted PDFs?

Encrypted PDFs when printed have the following in them:

    % Removing the following eight lines is illegal, subject to the Digital Copyright Act of 1998.
    mark currentfile eexec
    54dc5232e897cbaaa7584b7da7c23a6c59e7451851159cdbf40334cc2600
    30036a856fabb196b3ddab71514d79106c969797b119ae4379c5ac9b7318
    33471fc81a8e4b87bac59f7003cddaebea2a741c4e80818b4b136660994b
    18a85d6b60e3c6b57cc0815fe834bc82704ac2caf0b6e228ce1b2218c8c7
    67e87aef6db14cd38dda844c855b4e9c46d510cab8fdaa521d67cbb83ee1
    af966cc79653b9aca2a5f91f908bbd3f06ecc0c940097ec77e210e6184dc
    2f5777aacfc6907d43f1edb490a2a89c9af5b90ff126c0c3c5da9ae99f59
    d47040be1c0336205bf3c6169b1b01cd78f922ec384cd0fcab955c0c20de
    000000000000000000000000000000000000000000000000000000000000

Which stops Distiller from converting to PDF. What is it?

    % Removing the following eight lines is illegal, subject to the Digital Copyright Act of 1998.
    mark currentfile eexec

The eexec'd code reads in cleartext:

    /currentdistillerparams where
    { pop /pdfmark where
      { pop
        (This PostScript file was created from an encrypted PDF file.\n) print
        (Redistilling encrypted PDF is not permitted.\n) print
        userdict /quit get exec
      } if
    } if
    currentfile closefile

That means, if either your printer knows about currentdistillerparams and pdfmark or the PostScript job itself defines these operators (even as dummies, see note below), this code assumes you are going to re-distill the PS job, which is forbidden.

NOTE: The PostScript driver you use might insert statements like

    /currentdistillerparams where {pop}
    {userdict/currentdistillerparams{1 dict}put}ifelse
    /pdfmark where {pop}
    {userdict/pdfmark{cleartomark}put}ifelse

or the like (see the recommendations in Adobe's Pdfmark Reference Manual).

Helge Blischke (H.Blischke@acm.org)

--------------------------------------------------------------------------------

2.7: Why are large GETs (PDF forms) truncated in Microsoft Internet Explorer?

Sometimes GETs are truncated for web pages (including PDFs), why is this?

See http://www.networkice.com/Advice/Intrusions/2000608/default.htm for a security discussion on GET Data Overflow, which might explain why MSIE-transmitted URL-encoded strings from PDF "submit" are sometimes truncated at something a bit less than 4KB in length.

Bill Segraves (wsegrave@mindspring.com)

--------------------------------------------------------------------------------

2.8: How machine intensive is generating PDF documents dynamically?

Is PDF like PostScript? What I'm trying to get an idea of is whether all those PDFLib function calls that are generating parts of the file are doing massive amounts of processing, or are they just doing simple things like writing PS-like markup tags wrapped around data?

PDF is conceptually similar to PostScript. It isn't hugely complex to generate, but there is an additional (small) overhead of not only generating the graphical information, but also generating a file structure and index. In general, the time taken to make PostScript and PDF directly should be comparable. I wouldn't describe either of them as at all like HTML, but in the sense that you aren't rendering to a bitmap or anything like that, they are similar.

This is very true. There are also the aspects of compression of data within the PDF to be considered... These can be quite processor intensive. PDF is also pretty picky about the format that bitmaps take, and it can take a fair bit of memory and time to convert the bitmaps to the right format. The operations aren't slow, it's more the fact that there can be millions of them.

I would say that the similarity ends when you say that both formats (HTML and PDF) have structure. PDFs are pretty funky in their object layouts.

I am not sure about the timing statement comparing PS and PDF though. I have generated large PS files (containing images), which have been quite slow, but the PDF has been much faster because of better compression support.
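
[As a rough illustration of the "file structure and index" overhead described above, the following Python sketch assembles a one-page PDF by hand. It is not taken from PDFLib or any other library mentioned here: the function name build_pdf, the Helvetica font, the US Letter page size and the optional Flate compression are all assumptions made for the example. The drawing operators are one short string; most of the code is bookkeeping for object byte offsets and the cross-reference table.]

```python
import zlib


def build_pdf(text: str, compress: bool = True) -> bytes:
    """Assemble a one-page PDF by hand (hypothetical helper, illustration only)."""
    # Page content: a handful of text operators, much like PostScript.
    content = f"BT /F1 24 Tf 72 720 Td ({text}) Tj ET".encode("latin-1")
    filter_entry = b""
    if compress:
        # Flate compression is where the CPU time goes for large streams.
        content = zlib.compress(content)
        filter_entry = b" /Filter /FlateDecode"

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
        b"<< /Length %d%s >>\nstream\n%s\nendstream"
        % (len(content), filter_entry, content),
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    out = bytearray(b"%PDF-1.3\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # exact byte offset of "N 0 obj"
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"

    # The "index": a cross-reference table of fixed-width, 20-byte entries.
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


if __name__ == "__main__":
    with open("hello.pdf", "wb") as fh:
        fh.write(build_pdf("Hello from comp.text.pdf"))
```

[The sketch does not escape parentheses or backslashes in the text, so keep the string simple. Note that the file is written in binary mode so that the recorded offsets stay exact.]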

I would think the best route with your ISP is to just do some trials and logging and see what happens. Also, I don't think pdflib does linearisation, which might cause problems with large documents online.

Michael Still (mikal@stillhq.com)

--------------------------------------------------------------------------------

2.9: Where can I find samples of xxx?

I need to obtain samples of streams that are encoded with each of the types of filters that PDF 1.3 supports. Is there a "test bank" anywhere? Ideally, the streams would be fairly small to facilitate debugging. Some of them, like FlateDecode and CCITTFaxDecode, are easy to find. But, for example, I haven't been able to locate an ASCIIHexDecode stream anywhere.

Have you tried the spec? There is an examples appendix, as well as many examples spread throughout the document. I can't check your specific example for you, as my copy is at work, but you should be fine.

Michael Still (mikal@stillhq.com)

--------------------------------------------------------------------------------

2.10: Problems with linking to web pages from within a PDF

I would like to create a link to an external document with a remote go-to action. However, using a URL file specification (chapter 3.10.4 of the PDF Reference) does not work. This is the created object in the PDF file:

    1 0 obj
    <<
    /Type /Annot
    /Subtype /Link
    /A <<
       /S /GoToR
       /F << /Type /Filespec /FS /URL /F (http://www.tug.org/applications/pdftex/calculat.pdf) >>
       /NewWindow true
       >>
    /Rect [124.802 706.129 266.534 791.168]
    >>
    endobj

It's clear from experiments that Acrobat does not support the full generality of what might theoretically be possible. If you create something Acrobat wouldn't do, then you may be stuck, especially with indirect file references, which tend to work only if Acrobat would expect them there. An Acrobat weblink would have an /A field more like

    /A << /S /URI /URI (http:...)
    >>

Aandi Inston (quite@dial.pipex.com)

--------------------------------------------------------------------------------

2.11: "The file is damaged but is being repaired."

I have created a PDF file and get the following error when trying to view the file in Acrobat Reader 4.0: "The file is damaged but is being repaired." No other information is given. The file then opens and I can view the document. The file has not been zipped or e-mailed. As a test, I created a PDF example from the Adobe Portable Document Format Manual Version 1.3 and I received the same error. In both cases, the result is messed up. Is there a way to either truly fix such a file, or at least "extract" the good parts?

This normally means you have the offsets in the XREF table wrong, so they are regenerated. Make sure you are using an editor that will show you binary content, and remember to count each newline as two bytes if you are running on Windows.

Michael Still (mikal@stillhq.com)

**********

When I see this message it means "something is wrong" and it could be just about anything. The first thing to think of is: remember that PDF is indexed with exact byte offsets -- so be sure you are using the right sort of line ending and counting all the bytes correctly.

Aaron Watters (aaron@at.reportlab.dot.com)

--------------------------------------------------------------------------------

2.12: Compression algorithms

Does anybody have a link to a site where I can download source code to do base85 encoding / decoding?

http://dogma.net/DataCompression/SourceCode.shtml

Guy Vdh (vdhguy@hotmail.com)

--------------------------------------------------------------------------------

2.13: Bezier curve approximation

I'm trying to draw a pie slice, given the pie's center and the start and end angles. As far as I can tell, I need to use Bezier curves to do the arc. Is there an easier way, or if not, what do you use for the Bezier's control points?

You can calculate them like:

    # Convert the start and end angles from degrees to radians.
    my $pi = 4 * atan2(1, 1);
    $alpha = $alpha * $pi / 180;
    $beta  = $beta  * $pi / 180;

    # Control point distance as a fraction of the radius;
    # this is equal to 4/3 * tan(($beta - $alpha)/4).
    my $bcp = 4.0/3 * (1 - cos(($beta - $alpha)/2)) / sin(($beta - $alpha)/2);

    my $sin_alpha = sin($alpha);
    my $sin_beta  = sin($beta);
    my $cos_alpha = cos($alpha);
    my $cos_beta  = cos($beta);

    # Start point, the two control points, and end point of the curve.
    my $p0_x = $x + $a * $cos_alpha;
    my $p0_y = $y + $b * $sin_alpha;
    my $p1_x = $x + $a * ($cos_alpha - $bcp * $sin_alpha);
    my $p1_y = $y + $b * ($sin_alpha + $bcp * $cos_alpha);
    my $p2_x = $x + $a * ($cos_beta + $bcp * $sin_beta);
    my $p2_y = $y + $b * ($sin_beta - $bcp * $cos_beta);
    my $p3_x = $x + $a * $cos_beta;
    my $p3_y = $y + $b * $sin_beta;

    $x,$y         ... center point of arc
    $alpha,$beta  ... start/end angle of arc (in degrees)
    $a,$b         ... x/y extents of the fitting ellipse (for a circle $a=$b)
    $p0_x,$p0_y   ... start-point of bezier
    $p1_x,$p1_y   ... control-point 1 of bezier
    $p2_x,$p2_y   ... control-point 2 of bezier
    $p3_x,$p3_y   ... end-point of bezier

Mind that you cannot calculate correct arcs for abs($beta-$alpha)>180 using this Bezier approximation, so if an arc spans more than 180 degrees, split it in two using a middle angle.

(alfredreibenschuh@yahoo.com)

--------------------------------------------------------------------------------

2.14: JBIG2 compression support

The version 1.4 PDF specification allows JBIG2 compression to be used in PDF files. Has anybody been able to get Acrobat 5.0 to create a PDF file that contains JBIG2-compressed images?

I don't think it will. I couldn't do it programmatically. There is certainly nothing in the user interface. The support may be read-only at the moment (and I have no way to confirm even that).

It may be that Adobe have done something unusually sensible with regard to changes: add support for reading the files, then wait another year before making anyone. This could make the transition a lot smoother.

Aandi Inston (quite@dial.pipex.com)

--------------------------------------------------------------------------------

2.15: Rotated text

I'm trying to draw rotated text, given the point where the text starts, and an angle of rotation (e.g., 90 degrees). So far, the text does not show up, so I guess my transformation matrix is incorrect, or maybe it's something else. If you know how to do this, could you show an example?

To be more specific, what I'm trying to do is rotate text 90 degrees without "changing" the page's coordinate system. In essence, with x,y being a point in the page's "normal" coordinate system, the text is to start at x,y, except be rotated 90 degrees. What I tried was

    0 1 -1 0 0 0 Tm

The text matrix is the way to move text. The best approach in debugging the matrix is to do the mathematics yourself. Since the text starts at 0,0 in user space, calculate the transformation of 0,0 with your matrix and see where it is. (This assumes that cm has not also been used.)

Aandi Inston (quite@dial.pipex.com)

**********

OK, what you want to do is save the graphic state ("q"), then do the rotation, then restore the old state ("Q"). The rotation itself is also a bit more complex than what you have. For a single line of text it will be 0 1 -1 0 h 0, where "h" is the height of the text. Think about rotating a rectangle about the origin and you can see where the "h" comes from.

Arne (arnet@hpcvplnx.cv.hp.com)

--------------------------------------------------------------------------------

2.16: Centering text

I am a fairly competent PostScript programmer but now need to generate some PDF. I have the basic tree structure down, but I need to accurately position some text; to be specific, I need to center some text. In PostScript it is easy to figure out the size of the string to be set, then translate appropriately. This seems impossible in PDF, so how do you estimate the font metrics programmatically to set the text?
To make it more difficult, I am generating PDF from a Java applet, so small code size is a definite factor.

You need to include a table of font metrics in the code, so you can calculate the width and height, and then use those to translate (just like PostScript). Adobe provides the data for the built-in fonts; otherwise you are on your own.

Dave Bloodgood (dabldgd@home.com)

=========== Acrobat ============================================================

3.1: Should I upgrade from 4.0 to 5.0?

What are the new features/improvements?

Too many to list here. Adobe's site presently has a ton of material listing all the new tools and features. Here are a few of my favorites.

- Workflow automation
- Database connectivity
- Online collaboration
- Much improved JavaScript editor
- Great new JavaScript objects and interface control
- Tools for adding PDF structure
- XML support

Mr T (nert@bobco.com)

**********

We just got our copy of Acrobat 5. First impressions are:

1. If you use forms, you can now use all available fonts, not just the base 13.

2. You can export PDFs to JPEG, TIFF, and RTF. The RTF means that you can now create Word documents from PDF documents. We tried it, and the results were OK for text, but poor for anything with complicated formatting, like tables and columns. And non-existent for graphics.

3. There are numerous improvements for the colour pre-press market, but we haven't evaluated them yet.

Dan Sideen (dansideen@home.com)

**********

4.05 has Paper Capture. It is an add-on in 5.x. Paper Capture is OCR. Use Acrobat to scan a doc and edit the doc in Acrobat, subject to the rule that you must edit text one line at a time. Very useful when making extra copies of documents.

(REMOVEraindoll@ziplip.com)

**********

Adobe have now announced that the Capture plug-in is coming back for v5. It will be a separate download (for v5.00 users anyway) and is slated to be available about June 2001.

Mark Anderson (mark@notmeyeardley.demon.co.uk)

--------------------------------------------------------------------------------

3.2: Toolbar icons

I've created my own ToolButton in the default ToolBar but I'm not able to load the icon I've designed. I'm using the AVToolButtonNew method.

What are the attributes of the icon:

- type (bitmap, icon, ...)
- size
- colours?

Aandi Inston (quite@dial.pipex.com)

**********

The icon type is Icon; the size is 18x18 with 256 colours.

Giacomo (x-ray69@usa.net)

**********

In my project it is a Bitmap, not Icon. This may be critical. Mine is also 20 x 20 with 16 colours. This may be less important. Refer to (I think) the ImageSel project; the documentation on this point is poor.

Aandi Inston (quite@dial.pipex.com)

**********

It displays just a question mark instead of my icon. I've also tried to convert the .ico file into a bitmap and load it using the same procedure as shown in the ImageSel project. No way. I'm banging my head on the wall... I need it to work out!

Giacomo (x-ray69@usa.net)

--------------------------------------------------------------------------------

3.3: Cutting text from a PDF and pasting into Word

Is there any way to cut and paste from a PDF to a Word document or to another document in the Adobe suite? I want to copy text from a PDF and put it somewhere else, but not as an image.

First put Acrobat into continuous view mode. That way you can copy more than one page at a time, up to the entire document, but subject to clipboard size limits.

Dave Braze (davebraze@yahoo.com)

**********

Adobe Acrobat 5.0 also has a save as RTF (a format Word can open) option...

Michael Still (mikal@stillhq.com)

**********

In Acrobat Reader or Acrobat, just use the text select tool to select your text, then copy and paste in the normal way.

Steve Cook (steve.cook@spamulike.bigfoot.com)

--------------------------------------------------------------------------------

3.4: Guiding your readers through your document

I received in the mail a PDF in which, when the pointer is over a page, the hand icon has a down arrow on it and clicking advances forward through the document. I would like to add this feature to my documents but can find no reference to it in the Acrobat User's Guide. Does anyone know how to add this?

You can do this in Acrobat by using the article tool. It allows you to guide your readers through the document. It is described in the Acrobat User's Guide from page 247 onwards.

mschulz (mschulz@bigpond.net.au)

--------------------------------------------------------------------------------

3.5: Printing from the command line

Can anybody tell me if a PDF command-line tool is available, i.e. I would like to execute a line like:

    PDFPRINT ACRO.PDF PRINTERXX [COPIES=4]

resulting in 4 copies of the ACRO.PDF document on the printer called PRINTERXX.

You can use the Acrobat Reader for printing PDF files via the command line (although you can't set the number of copies). From the Acrobat Developer FAQ:

"Using Command Lines with Acrobat and Acrobat Reader under Windows"

These are unsupported command lines, but have worked for some developers. There is no documentation for these commands other than what is listed below. You can display and print a PDF file using command lines with Acrobat and Acrobat Reader.

    AcroRd32.exe filename
        Executes the Reader and displays a file.

    AcroRd32.exe /p filename
        Executes the Reader and prints a file.

    AcroRd32.exe /t path printername drivername portname
        Initiates Acrobat Reader, prints a file while suppressing the Acrobat print dialog box, then terminates Reader.

The four parameters of the /t option evaluate to path, printername, drivername, and portname (all strings).

    printername - The name of your printer.
    drivername - Your printer driver's name.
        Whatever appears in the Driver Used box when you view your printer's properties.
    portname - The printer's port. portname cannot contain any "/" characters; if it does, output is routed to the default port for that printer.

If using Acrobat, substitute Acrobat.exe in place of AcroRd32.exe in the command lines."

Gunther Schmidt (g.schmidt@bigfoot.de)

**********

It has been reported that the acrord32 /t command line option does not work with Acrobat 5.0.

It would appear to be a genuine bug, and Adobe apparently doesn't care, because the command-line switches are officially "undocumented". I went back to Acrobat Reader 4 because the /t switch doesn't work in Acrobat Reader 5.

[This answer has been edited by mikal@stillhq.com]

Edward Mendelson (edward_mendelson@ziffdavis.INVALID)

--------------------------------------------------------------------------------

3.6: Acrobat on Linux character set problems

Sometimes, when trying to display a PDF file with Acrobat Reader (version 4.0 for Linux, from the Debian distribution), I get this message:

    Unable to extract the embedded font 'AHCAAI+CMSS17'. Some characters may not display or print correctly.

You can set the locale for Acrobat Reader to English. You must edit the start script "acroread", which is normally located in /usr/local/Acrobat4/bin:

    ----------
    #!/bin/sh
    #
    LC_ALL=C          # <--- Add these
    export LC_ALL     # <--- two lines
    ver=4.0
    install_dir=/usr/local/Acrobat4/Reader
    ----------

Christian Koch (christian_koch@gmx.de)

--------------------------------------------------------------------------------

3.7: Control over OLE automation

I have a general question in regard to OLE automation. I've built an application that controls Acrobat, and I'm trying to make it more robust. The problem I'm having is what to do when I try to open a non-existent file (an FDF pointing to a missing PDF file). Acrobat then displays a dialog box, and waits for user input.
Since there's no user to click on the cancel or OK button, my program freezes waiting for Acrobat to respond. Is there any way to pass a "cancel" or something similar to Acrobat via OLE automation?

You can get the name of the PDF file that the FDF points to using the FDFGetFile method from the Acrobat forms toolkit. Then use code to verify that the PDF exists (and display a suitable message if it doesn't) before trying to open it via the pdf.

Dan Sideen (dansideen@home.com)

--------------------------------------------------------------------------------

3.8: Editing existing PDF files

Using Acrobat 4 or 5, or any other program, is it possible to edit an existing PDF file? I need to add 1000 pictures (logos) to 1000 different, existing PDF files.

Well, yes, you can edit. One page at a time. You will need patience, or else some product that goes far beyond Acrobat. If using Acrobat, the easiest way to add a logo is to use the Object Touch-up tool, and copy/paste the logo from one PDF (which has it on a blank page, in the right place) to each new page.

Aandi Inston (quite@dial.pipex.com)

--------------------------------------------------------------------------------

3.9: Command line searching a PDF

Is there a parameter for having Acrobat Reader search for a specific word? If anybody knows some other parameters, can you post them (or just point me to a page where they are listed)? They can be so helpful.

No. The Acrobat Developer FAQ lists some of them, but specifically notes that they are unsupported. I would strongly recommend that they are not used, and a supported method used instead. (If you do choose to use them, please don't complain when they stop working on an upgrade.)

Aandi Inston (quite@dial.pipex.com)

--------------------------------------------------------------------------------

3.10: Running multiple versions of Acrobat

I am working with Acrobat 4.05 and we have purchased Acrobat 5.0 (not the upgrade, the new one).
Do I have to uninstall Acrobat 4 before installing 5.0?

No, you can keep both. But do NOT uninstall 4 later if you don't do it first. Or, if you do, you will have to reinstall 5.

Aandi Inston (quite@dial.pipex.com)

--------------------------------------------------------------------------------

3.11: Identifying the author of a document

Does PDF involuntarily store information relating to the serial number/author of the Acrobat install that was used to create the document?

No. You can often determine the name of the software that produced the document, and at what time, though.

Michael Still (mikal@stillhq.com)

=========== Distiller ==========================================================

4.1: Licensing

We want to use Distiller Server to create PDF versions of customer annual statements. These will then be made available to the customer over the internet. Is this a violation of the Distiller Server license agreement? I would normally have said not, but the FAQ says

"Using Distiller Server, can I create an Adobe PDF file from one of my company's internal documents and publish it on the internet for someone outside my company? For example, if a customer requests a bank statement over the Internet, can I publish it in Adobe PDF? No....[snip]"

This sounds exactly like what we are doing, except that the document is not generated at the customer's request, but periodically. Also, it seems to be saying that no company documents can _ever_ be published on the internet, which can't be right!! Any advice or clarification?

Tricky. It depends, I believe, on whether the customer will request these statements, or you will simply publish them anyway. The first is a violation; the second is (I think) not.

It certainly appears that you can read it that way. This would certainly restrict the usefulness of the format.
I *thought* the idea of the licence was to prevent you making a 'distillation service' available, whereby people could send you files and have them converted and sent back. Creating your own documents in PDF form would certainly seem reasonable, and if you used the standard Distiller, and paid someone to sit and convert the files, this would certainly be permitted under that licence.

I think you may have to seek a legal opinion, or better, a clarification from Adobe. If you get one, I'd be interested in hearing it.

Ken Sharp (ken@spamcop.net)

--------------------------------------------------------------------------------
4.2: Colour in distilled PDFs

Should I be able to get the JPEG to appear in color in my PDF doc created through Distiller 3.01?

Sure, use a colour driver.

Aandi Inston (quite@dial.pipex.com)

**********

Yes. Do not use the HP driver, but the one Acrobat installed in your system: Acrobat Distiller 3.01. Using a black and white driver makes your images black and white.

Matti Vuori (mvuori@koti.soon.fi)

--------------------------------------------------------------------------------
4.3: Distiller page size problems

Why is it that whenever I create a PDF file from Word I get a page size of 8x11? I have changed the paper size in the advanced properties of the Distiller printer driver to PostScript Custom Page Size and specified the new size, but it makes no difference. It doesn't matter whether I print to file and then use Distiller, or use the Create Adobe PDF button in Word and use Distiller that way.

I had encountered the same problem and thought it was only on a German Windows2000/NT4 platform, since it didn't occur on an English version that was at my disposal. Maybe the solution works for you as well.
If you have a user-defined format in Word and converting it to PDF via Acrobat Distiller does not output your format, try the following (only Win2000 or NT4):

1. In the printer dialog (Start - Settings - Printers), find the "File" menu in the uppermost menu list.
2. In the File menu, find the entry "Server Properties".
3. In "Server Properties" you can add a new format (like A4 or US-Letter). Name it "myformat" (or whatever you like).
4. Fill in your paper size (in the lower part of the dialog).
5. Press the "save format" button. Now every printer that is listed in the printer dialog window has the "myformat" format.
6. Switch to Word. Look in the File - "set up paper" (or whatever it's called in the English Word) dialog. The "myformat" format should be displayed instead of "custompapersize".

The new format is only local to your workstation (regardless of whether your computer is connected to your LAN). If you want to convert doc to PDF on another machine, you must first add your self-defined paper size to the printer server properties as described above.

Now convert your Word document to PDF via Distiller. It should work correctly now.

Andreas Wall (Andreas.Wall@shinkatech.com)

--------------------------------------------------------------------------------
4.4: Output filenames

Is there a possibility to automatically generate PDF files with Acrobat Distiller without typing in the name for the output file? I heard that in Acrobat 3 patching some INI files works fine, but is there any chance to do the same in Acrobat 4?

With a watched folder you can automatically create a PDF with the same name as the original PostScript file.

Veli Holopainen (veli.holopainen@kaleva.fi)

=========== Internet stuff =====================================================
5.1: Problems with IE displaying PDFs

Does anyone have any suggestions that could help us understand why IE intermittently will not get the PDF from the web server?
We just get a blank screen occasionally, with 'Done' on the status bar. We have tried unoptimizing the PDF so as not to use byte serving. Web server is Apache 1.3.12 HTTP 1.1 on Solaris UNIX. The client is NT4 + IE5.

I've noticed that it depends (somewhat, not always) on how you're serving up your PDFs. I have some software on my site that creates PDFs on-the-fly which wasn't working in IE (got the white screen or Adobe complained it wasn't a valid PDF) but worked fine in Netscape. From your message I take it you have the PDFs on disk already? Instead of writing the PDF stream out to the user's browser I started putting the PDFs on disk and doing a redirect. It seems to have alleviated the problem (server is NT with PDFs being created from Perl scripts).

What was really strange was that when I sent the data directly to the browser, and Adobe complained in IE, I did a save to my disk to inspect the contents. Parts of the PDF were missing and even some of the internal structure, like what object the page info was stored at, was changed. I suspect that the plug-in for IE was modifying it (not sure what else could have). Perhaps the same thing is happening on your end? Even some static PDFs I have out there don't always come up properly in IE, although in Netscape I don't think I've ever had a problem.

Mike Bernardo (mbernardo@chartermi.net)

--------------------------------------------------------------------------------
5.2: Optimised, compressed, linearised? Arrrrgh! My brain hurts!

What is the difference between an optimised PDF, a compressed PDF, a linearised PDF, and a cheese stick?

These terms are confusing because many people misuse them (to a certain extent Adobe didn't pick very helpful terms either). The PDF specification uses the terms to mean:

Optimised: the PDF has been laid out in the most graceful manner possible. For instance, you have not saved a black and white image as a colour or grayscale image, which would take a lot more space.
Compressed: some elements in the PDF are compressed. The whole document is not required to be compressed however.

Linearised: the document has had its internals rearranged so that byte serving will work. Byte serving is that thing you get on some web sites when only the page you are currently reading is downloaded... This means that you can flick through large documents without having to first download the entire thing. It is often called optimised by people who haven't read the PDF specification.

Michael Still (mikal@stillhq.com)

--------------------------------------------------------------------------------
5.3: Encrypted PDFs and search engines

I noticed that in Windows, PDF files have an extra 'PDF Properties' tab in file properties. It shows the 'General Info' (ctrl+D in Acrobat) of the PDF file. However, if the PDF is encrypted (e.g. changing the document is not allowed), this info is not shown. Does anyone know if encrypting affects the indexing of PDFs (Acrobat Catalog, search engines)?

Certainly, Catalog used not to be able to index encrypted PDFs. With Acrobat 5.0 it is now a plug-in, so it probably can. Many, perhaps most, search engines will not be able to index files that are encrypted.

Aandi Inston (quite@dial.pipex.com)

--------------------------------------------------------------------------------
5.4: Document corruption in older web caches

I'm having trouble. I've tested with Acrobat Reader 5.0.1 27.3.2001 & W2k (both SP1 & SP2) and IE 5.5SP1 & IE 6.0beta. Error message is "File does not start with '%PDF-'". I get the error message on about 99% of the PDF viewing attempts, also with www.airtug.com/brochure.pdf

When PDF files are fetched from the web, this is often in small pieces - a process called byteserving. Some proxy servers don't recognise that the pieces are separate, and so they give back the wrong pieces. This leaves the PDF files in quite a mess, as if they are broken up and glued together wrong.
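
[Editor's sketch, not part of the original answer: the effect described above can be imitated in a few lines of Python. The sample bytes and the helper names are made up for illustration, and the check is deliberately strict; a real viewer may be more forgiving about what comes before the header.]

```python
PDF_MAGIC = b"%PDF-"


def starts_with_pdf_header(data: bytes) -> bool:
    """Return True if the data begins with the '%PDF-' marker."""
    return data.startswith(PDF_MAGIC)


def reassemble(pieces) -> bytes:
    """Glue byte-range pieces together in the order they were handed back."""
    return b"".join(pieces)


# A made-up stand-in for a real file, split into three byte ranges.
original = b"%PDF-1.3\n...page data...\n%%EOF\n"
pieces = [original[0:9], original[9:23], original[23:]]

# Pieces returned in the right order give back the original file.
good = reassemble(pieces)

# A confused cache answers the first request with the wrong piece.
bad = reassemble([pieces[1], pieces[0], pieces[2]])

print(starts_with_pdf_header(good))  # the header is where it should be
print(starts_with_pdf_header(bad))   # the header has moved away from the start
```

[Saving the file that the browser actually received and looking at its first few bytes is a quick way to tell this kind of damage from a problem in the PDF itself.]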
I don't know if that proxy has problems, but if a proxy didn't understand byteserving your symptoms are exactly what I'd expect.

Aandi Inston (quite@dial.pipex.com)

=========== History ============================================================
6.1: How old is PDF?

Does anybody know when PDF was invented or introduced?

The first PDF specification was introduced in 1993. PDF's roots go back much earlier, though, to the invention of PostScript.

scott.ladd@maximal.com (Scott Robert Ladd)

**********

Here is some information from the preface of Adobe's PDF Specification.

http://partners.adobe.com/asn/developer/acrosdk/docs/PDFRef.pdf

Hope this helps.

"THE ORIGINS OF THE Portable Document Format and the Adobe Acrobat product family date to early 1990. At that time, the PostScript page description language was rapidly becoming the worldwide standard for the production of the printed page. PDF builds on the PostScript page description language by layering a document structure and interactive navigation features on PostScript's underlying imaging model, providing a convenient, efficient mechanism enabling documents to be reliably viewed and printed anywhere.

The PDF specification was first published at the same time the first Acrobat products were introduced in 1993. Since then, updated versions of the specification have been and continue to be available from Adobe via the World Wide Web. This book is the first version of the specification that is completely self-contained, including the precise documentation of the underlying imaging model from PostScript along with the PDF-specific features that are combined in version 1.3 of the PDF standard."

DesQuite (desquite@hotmail.com)

=========== PDF tools from people other than Adobe =============================
7.1: Truetype capable PDF generators (not APIs)

I am looking -- mainly for my Word docs -- for a freeware or shareware PDF creator (Win98) which can embed TrueType and Type1. Any idea?
http://www.this.net/~frank/pstill_win.html

Doug Milliken (bd427@freenet.buffalo.edu)

--------------------------------------------------------------------------------
7.2: Page extraction

Is there a way to automate the extracting of pages from a PDF document by using a script or batch process?

I think we might have exactly what you're looking for. Our TK40 toolkit lets you define criteria (how pages break, what indexes exist, etc.) for extracting logical documents from a larger compound PDF file.

http://www.maximal.com/products

Scott Robert Ladd (scott.ladd@maximal.com)

**********

http://www.reportlab.com/pageCatcher/index.html

Dinu Gherman (dinu@reportlab.com)

**********

Here are a few more (that aren't vaporware like TK40):

Ari's PDF Splitter Pro (ask for the extraction version) (http://www.dionis.com)
AppendPDF (http://www.appligent.com)
Glance PDF CLT tools for Win/Unix (pdsel) (http://pdf.glance.ch)

JW (18tni7m5x001@sneakemail.com)

--------------------------------------------------------------------------------
7.3: Distiller equivalents

Are there any equivalents of Adobe Distiller?

GhostScript 7.0 supports almost all of the features of Distiller.

Alex Cherepanov (alexcher@erols.com)

**********

http://www.ctrlp.com/freepdf.asp?st=pdf

Ingemar Djurhuus (djurhuus@mail1.stofanet.dk)

**********

Jaws Creator (http://www.jawssystems.com/products/products_fs.html), at $120 US, is an enhanced, commercial alternative to the original 5D program, which can still be downloaded for free at http://www.ctrlp.com/freepdf.asp?st=pdf

According to the feature comparison chart, Jaws does almost everything Adobe does, including recognizing Word 97 and 2000 subcomponents.

Steve Auerweck (steven.auerweck@verizon.net)

--------------------------------------------------------------------------------
7.4: Extracting images from PDF documents

How can I extract images from a PDF file?
You could try xpdf's pdfimages (http://www.foolabs.com/xpdf)

Jorma Heimonen (Jorma.Heimonen@kone.com)

**********

Several ways... and I have used all of these, often just because it's easier than figuring out where I stored the original source imagery (or when I actually want the presented composite image, and not the original raw image).

* Some applications, like Photoshop and Illustrator, can open an individual PDF page and edit it.

Illustrator is ideal for this, because it preserves all of the page elements as individually selectable entities, and further, preserves their vector or raster nature (and probably the color model).

Photoshop may rasterize the whole PDF page as a single object. Rasterizing entire pages at high resolutions results in HUGE data objects. And unfortunately, unless you can calculate precisely what dpi to use to get 1:1 source:captured pixels, you want to oversample by at least 2x. Run the numbers.

Neither Adobe app, however, will open the page if it has any kind of security (open or admin security). Further, embedded fonts may be incorrectly rendered.

* Reprint to .eps

If you can configure your PostScript driver to
a. print-to-file, and
b. print in EPS format
you can print the page of interest to page.eps, then edit it with whatever EPS-capable image editor you have.

Some apps, like Adobe FrameMaker, can import an entire PDF page as a referenced graphics object. You can then reprint it as .eps. FM won't import pages from secured PDFs.

* Acrobat and Acrobat Reader can select rectangular sections of a page - if (big if) selecting graphics is allowed in the file (see File:DocInfo:Security).

On Windows, place the mouse cursor over the [T] text selection tool, hold the left mouse button down, and let up over the "graphics select tool" icon. Outline the image desired. Zoom to the screen resolution desired. Edit:Copy or [Ctrl[c]]. The selected image is now on the clipboard.
The selected area can be off-screen, and even off the logical desktop, but you will be limited on some systems to a maximum graphics object size - 32MB for Windows. If you get a nastygram dialog box, zoom out until you don't.

* Screen copy

The last resort is to use whatever tools the OS provides, or are available in the freeware, shareware or aftermarkets, to perform a full-screen-copy or window-only-copy. On MS Windows, the [PrntScrn] and [Alt[PrntScrn]] keys do this.

Zoom to the desired size - but the entire desired image must [usually] be on-screen. Generally, unless you can run the numbers and match the object raster res to 1 pixel per screen pixel, you want to zoom as large as possible and over-sample to minimize re-sampling damage.

If your graphics card supports large logical desktops, as many Matrox cards do, the image can be partially off-screen as long as it is entirely in the card's on-board RAM, i.e. is entirely on the logical desktop. The 32MB Windows GDI limit applies.

All of the above assumes that you own or have permission to re-use the image(s) in question.

Bob Niland (rjn@fc.hp.com)

--------------------------------------------------------------------------------
7.5: Why can't I just cut and paste the images

I am astonished that this is (apparently) such a complex job. If I want to take a PDF graphic and move it to, say, Word, I simply select what I want with the graphic tool (in Exchange), and paste into my Word doc. And I get the graphic - some or all of it depending on what is selected ... Am I missing something here?

Possibly. What you describe has some serious limitations:

1. If the document has inhibited "selecting text and graphics", it won't work at all. You'll be limited to screen capture using host OS tools.

2. Even when it does work, you get a RASTER image of the entire selection area. You can't easily separate elements or even eliminate overlay text.
Further, if the original graphic was vector, you still get raster - not as scaleable - and often vastly larger.

3. You get that raster at screen resolution. This means that even in the case of raster originals, you are either under- or over-sampling the original, with potential damage to the image. Normally, you need to select the area, then zoom until the "copy" fails due to the object size (32MB in Windows), then down-size the resulting object in your image editor. This minimizes re-sampling artifacts.

Bob Niland (rjn@frii.com)

**********

Yes. What you describe is equivalent to faxing yourself a copy of the image. It's not the image itself. Not only is it no longer in PDF format but it's limited to the dot-density of your screen, so while it's OK for a Web site, the low resolution will be horribly obvious if you try to print it.

Adobe went out of their way to make it _difficult_ to extract a real image from a PDF file, under pressure from publishers who don't want their expensively-generated images being pirated.

Peter Flynn (peter@silmaril.ie)

--------------------------------------------------------------------------------
7.6: Inserting watermarks over pages in a PDF document

What tools can help me insert a watermark over pages in a PDF document?

http://www.reportlab.com/pageCatcher/index.html

Aaron Watters (aaron@at.reportlab.com)

**********

StampPDF Batch will allow you to do this easily. More information on StampPDF Batch, including documentation and online demos, can be found on our web site at http://www.appligent.com

Mark Gavin (mgavin@appligent.com)

--------------------------------------------------------------------------------
7.7: Linearization tools

What non-Adobe linearization tools are available?
[See earlier questions for a description of linearization itself]

http://www.pdfzone.com/products/software/tool_pdlinearize.html

Bryan Guignard (bryang@sympatico.ca)

--------------------------------------------------------------------------------
7.8: Tools to concatenate PDFs

What tools are available to concatenate PDFs?

http://www.appligent.com/

Aandi Inston (quite@dial.pipex.com)

**********

I'm working on a PDF Append application (Freeware), but it has yet to be beta tested. You're welcome to try it out, but keep in mind I'll assume no legal responsibilities (normal legal stuff). Anyway, it can be downloaded at: http://www.northlandpublishing.net/append.zip

Timothy L. Jordi (jordi@northlandpublishing.net)

--------------------------------------------------------------------------------
7.9: HTML to PDF conversion

Is there somebody who knows a solution to convert HTML files to PDF files on a Linux system?

Try HTMLDOC from http://www.easysw.com/htmldoc/

Matthias Haeusser (matthias.haeusser@t-systems.de)

--------------------------------------------------------------------------------
7.10: PDF permissions tools

What tools are available to manipulate permissions on PDF documents?

PDF Crypt

What is pdfcrypt? pdfcrypt is a very flexible and powerful program. pdfcrypt allows you to set permissions on a PDF file. For example, you can publish a document without allowing it to be printed. The button to print the file will be disabled in the Acrobat Reader application.

It's simple to use it as a batch application to set permissions on every PDF in your archive. It's simple to use it as a pipe application. It's simple to use it inside your CGIs. We distribute only executable versions (if you need it, ask us for the original Perl code). Download it and test it!
http://www.sanface.com/pdfcrypt.html

SANFACE Software (sanface@yahoo.com)

--------------------------------------------------------------------------------
7.11: Inserting images onto existing PDFs

Using Acrobat 4 or 5, or any other program, is it possible to edit an existing PDF file? I need to add 1000 pictures (logos) to 1000 different, existing PDF files.

For the logos, if you know where you want to put them you can use PageCatcher for this.

http://www.reportlab.com/pageCatcher/

Regarding searching usenet: if you go to http://www.deja.com you will be redirected to Google's usenet searching service, which will save you some time (try the advanced search option where you can control the search with a lot of options -- list, keyword, sort order, phrase, etc.)

Aaron Watters (aaron@at.reportlab.dot.com)

**********

Have a look at www.enfocus.com for PitStop Pro or PitStop Server.

PitStop Pro is a plug-in for Acrobat 4/5. PitStop Pro has many edit capabilities. If you want to automate a task then you can make use of Action lists and apply them to each file in Acrobat by opening each file, or do it in a batch using PitStop Server. PitStop Server is a standalone product.

All our products have a 30-day trial period.

Filiep Maes (filiepm@enfocus.be)

--------------------------------------------------------------------------------
7.12: PDF Viewers

Does anybody know if there is another program which reads PDF files besides Adobe Acrobat? I ask this because my Acrobat Reader doesn't start; it says that there is an error in the file msvcrt.dll.

You can read them with GSview, which requires GhostScript. Install GhostScript first, then GSview.

Matti Vuori (mvuori@koti.soon.fi)

**********

Freshmeat is your friend...
http://freshmeat.net/search/?site=Freshmeat&q=pdf+viewer

laird@gunsmoke.ecn.purdue.edu (Kyler Laird)

--------------------------------------------------------------------------------
7.13: Problems with appending or extracting data from large documents

http://www.appligent.com/ - AppendPDF
http://www.dionis.com/ - PDF Splitter Pro

I have tried both of these tools, as well as pdsel (pdf.glance.ch), and none of them works. AppendPDF eats up memory and dies, Ari's PDF Splitter can only split into single-page fragments, and pdsel doesn't work on files greater than several hundred pages. What other solutions are there?

You may have about covered it. Did you contact the tech support, especially for AppendPDF? It is designed for very heavy duty use, so a major memory leak is a little surprising.

Aandi Inston (quite@dial.pipex.com)

**********

You might try the PDF Handshake product from HELIOS (www.helios.de), which includes a pdfcat utility that can concatenate as well as extract arbitrary page ranges from a Unix command line.

Jens-Uwe Mager (jum@anubis.han.de)

--------------------------------------------------------------------------------
7.14: LaTeX to PDF eps graphics support

I have a LaTeX file with some included *.eps graphics. If I create a *.ps document of this everything works fine. Creating a *.pdf document, all the graphics are not shown - only the captions are displayed. Does anybody know a solution for this problem? Can you recommend a package to make the *.pdf files?

    \usepackage[pdftex]{graphicx}   <-- (you *really* need to use the graphicx pkg)
    \usepackage{epstopdf}

if you put these two commands before your \begin{document} and run the file (barring the fact that there are no other complications) it will not only create pdf files of your EPS graphics files, but include them in your final PDF document...
really neat too ;-)

You can get the epstopdf.sty from CTAN in macros/latex/contrib/supported/oberdiek

Mimi Burbank (mimi@csit.fsu.edu)

--------------------------------------------------------------------------------
7.15: Printer drivers that produce PDF

What non-Adobe PDF generating printer drivers are available?

I thought you might like to know about pdfFactory, our new PDF printer driver for Windows that creates PDFs. There is a free trial version available at http://www.fineprint.com. The registration fee is $39.95. Among its features:

- ability to seamlessly combine multiple print jobs (even from separate applications) into a single PDF file
- pdfFactory has a Send button to email a PDF to someone without having to Save As and type a file name
- easy font embedding UI
- preview jobs before freezing into PDF, ability to delete pages
- all print jobs automatically saved for later retrieval if you need them again (especially useful for transient content like Web page searches, ecommerce receipts, emails that get deleted, etc.)
- smaller files than PDFWriter (usually, but not guaranteed)

The current version does not support non-ANSI charset fonts or security. We will be adding those features shortly.

When used with FinePrint, our other product, you can create PDF booklets, multi-up renderings, watermarks and much more. Check it out at http://www.fineprint.com

Jonathan Weiner (jonathan@singletrack.com)

**********

BCL Computers just released the Beta version of their new PDF printer driver for Windows 2000. To download a copy, visit http://www.bcl-computers.com.

Rachel Burnsed (burnsed@bcl-computers.com)

--------------------------------------------------------------------------------
7.16: Online PDF generators

What online PDF generators are available?

goBCL is a free service for converting your files to PDF. PDF to HTML conversion is also available.
http://www.gobcl.com

Rachel Burnsed (burnsed@bcl-computers.com)

=========== Conversion =========================================================
8.1: Converting a PDF document to raster / bitmap files

Is there an easy way to turn a PDF into a TIFF [or other raster / bitmap files]? I've tried exporting as an EPS then opening it in Photoshop, but Photoshop is giving me an error message.

I just did it with Illustrator 9.

Jim K (jkajpust@concentricRATS.net)

**********

Try using ghostscript from http://www.ghostscript.com

Michael Still (mikal@stillhq.com)

**********

Try Konvertor_pdf2xxx (http://www.logipole.com)

Jean Piquemal (j.piquemal@wanadoo.fr)

**********

BCL Computers' Freebird software is available as a command line program. Freebird converts PDF to TIFF, JPEG, and BMP. For more information, visit our website at http://www.bcl-computers.com

Rachel Burnsed (burnsed@bcl-computers.com)

**********

Try ghostscript, e.g.:

    gs -sDEVICE=jpeg -sOutputFile=temp.%03d.jpg -c save pop -f your_file.pdf

This will create one jpg file for every page, the "%03d" being replaced by a 3 digit page count.

Helge Blischke (H.Blischke@srz-berlin.de)

--------------------------------------------------------------------------------
8.2: Converting TIFF documents to PDF

I'm trying to convert TIFF files to PDF files. Preferably, I'd like to do this on a Unix (Solaris) server, and it definitely has to be something that can run in the background, unattended.

PDFlib does this. And yes, we will answer your support mails :-) There's also a dedicated PDFlib mailing list available.

Thomas Merz (tm@pdflib.com)

**********

Panda is a free (as in GNU GPLed) API which runs on various Unices, Linux and Windows and can do this sort of conversion work. Have a look at http://www.stillhq.com for more details.

Michael Still (mikal@stillhq.com)

**********

`convert' (part of the ImageMagick tools) can translate almost any image format to almost any other. So `convert your.tiff your.pdf' will do what you want.
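
[Editor's sketch, not part of the original answer: for the unattended case asked about, the same `convert' call can be wrapped in a small Python script. The function names and the dry_run switch are made up for illustration, and the sketch assumes that `convert' is on the PATH; newer ImageMagick installs may expose the command under a different name.]

```python
import subprocess
from pathlib import Path


def convert_command(tiff_path: Path) -> list:
    """Build the argv for `convert in.tiff in.pdf`; the PDF sits next to the input."""
    return ["convert", str(tiff_path), str(tiff_path.with_suffix(".pdf"))]


def convert_all(folder: Path, dry_run: bool = True) -> list:
    """Build one command per TIFF in the folder, and run them unless dry_run is set."""
    # "*.tif*" picks up both the .tif and .tiff spellings.
    commands = [convert_command(p) for p in sorted(folder.glob("*.tif*"))]
    if not dry_run:
        for argv in commands:
            # check=True raises an error if convert reports a failure.
            subprocess.run(argv, check=True)
    return commands
```

[Calling convert_all(Path("scans")) only lists the commands; passing dry_run=False actually runs them, which is the form a cron job would use.]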
I just tried it on a tiff file lying around and it worked fine.

Convert is tremendously useful. On Linux it is part of the ImageMagick RPM package. For Solaris I guess you have to get the source and compile. If rpm runs under Solaris, you might be able to get the src rpm and do

    rpm --rebuild ImageMagic...src.rpm

where the ... depends on what version you get. Go to and you'll be redirected to a page pointing to tons of versions of the RPM (for various releases of many different Linux distributions). They list http://www.imagemagick.org as the source for the software but it seems to be down right now.

Sanjoy Mahajan (sanjoy@skye.ra.phy.cam.ac.uk)

--------------------------------------------------------------------------------
8.3: Converting a PDF document to an editable format (such as RTF)

How can I eliminate certain page breaks in PDF documents, like in Word, with Acrobat or in any other way?

You could convert the PDF file to another format. BCL Computers' Drake software will convert PDF to RTF for editing in Word; however, it will also preserve page breaks. If you use BCL Magellan for converting PDF to HTML, and select the "HTML3" option, you should get an output file which can be opened and edited in Word. Free demos of these Acrobat plug-ins are available at http://www.bcl-computers.com.

Rachel Burnsed (burnsed@bcl-computers.com)

--------------------------------------------------------------------------------
8.4: HTML to PDF

My question is how to convert an HTML file into a PDF file. Is there any free software for doing this?

HTMLDOC does this: http://www.easysw.com/htmldoc/

Michael Sweet (mike@easysw.com)

=========== Acrobat forms ======================================================
9.1: Methods of creating Acrobat forms

We use Word for producing forms, which I then dutifully run thru Distiller to get PDFs. Now I'd like to go to the next stage, which is putting in form fields so people can fill them out onscreen rather than on paper.
Now the only option that I can see is that either

(a) I get busy with the Acrobat Exchange form tool (boring, and the forms are vveeerryy long), or
(b) I get the people who create the forms to put in some marker in Word that gets converted by an Acrobat add-on to Acrobat fields, or
(c) install Acrobat on form producers' PCs and train them up (not more attractive than (a), really, and it's certainly expensive).

Can anyone advise on the possibilities of (b)? This must be a relatively common problem. I've checked whether Exchange converts Word forms into Acrobat ones: no, it doesn't (in v3, not sure about v5).

Here are your options, and there aren't many.

Learn all the intricacies of the Acrobat forms tools. It has many time saving shortcuts you can use. Learning how to duplicate and rename fields effectively can really cut down your development time. Also learn to "borrow" ready made stuff from other forms. You can easily cut your development time in half or less by using these shortcuts. There's an excellent forms tutorial that explains these tricks. It's here: http://www.planetpdf.com/mainpage.asp?webpageid=1155

Since you are using Word, you should seriously consider Caere Omni form. What it does is open a Word form and convert it to an Omni form, which can then be saved as a PDF form or HTML form. The conversion is not perfect, but you can expect about a 90% accuracy rate. It will even convert simple form functions and calculations for you. I highly recommend it, and the price is right.

Another option is to import your Word files into a PDF savvy app like FrameMaker or PageMaker, and let them perform the PDF conversion. FrameMaker has a feature called PostScript frames which allows you to manually enter the PostScript or pdfmark code required for generating PDF form fields. PageMaker has a third party plugin you can purchase. It simulates the Acrobat forms tool. I have yet to find a way to do this with Ventura.
And the last way I know is to include pdfmarks directly in your Word documents, by inserting them through Word's field codes.

Thomas Merz has an excellent pdfmark primer that explains how to do most of these procedures. It's free. Get it here: http://www.pdflib.com/pdfmark/index.html

Bryan Guignard (bryang@sympatico.ca)

--------------------------------------------------------------------------------
9.2: Images inside form elements

Can anyone confirm that it is not possible to put an image (jpg/gif) into a form element? I would be happy to convert the image to PDF and then place it into the form if need be. I want to make a PDF document service on our website that provides our customized data and allows the user to upload their logo to place into the document. At this point, I think we need a form (FDF) which will act as a template container, which we can populate from the database and HTML form, and which will include images..... Can anyone provide some pointers, even if to say it's not possible?

If you make the form field of type Button, and select Icon only to display, you can choose a PDF as the icon. This effectively puts a PDF inside a PDF, and, if the PDF is an image, your problem is solved. It's even easier in Acrobat 5, which allows you to select most bit mapped image formats as icons for buttons.

Dan Sideen (dansideen@home.com)

--------------------------------------------------------------------------------
9.3: Using ASP to collect form values

Can I use ASP to collect field values from a PDF form?

Yes, you can. Put the field values into session variables, then assign the variable name to a form field using the FDFSetValue method available in the FDF toolkit from Adobe. You can code in VBScript (preferred) or Javascript.
Here is some sample code that does that, then displays the data in a "master" PDF:

    ' Create the FDF toolkit object and a new, empty FDF
    Set FdfAcX = Server.CreateObject("FdfApp.FdfApp")
    Dim objFdf
    Set objFdf = FdfAcX.FDFCreate

    ' Collect the values stored in the session variables
    name = Session("name")
    address1 = Session("street") & " " & Session("poboxorfloor")
    address2 = Session("city") & " " & Session("province") & " " & Session("postalcode")
    title1 = Session("title1")
    title2 = Session("title2")
    email = Session("email")

    ' Assign each value to the form field of the same name
    objFdf.FDFSetValue "name", name, False
    objFdf.FDFSetValue "title1", title1, False
    objFdf.FDFSetValue "title2", title2, False
    objFdf.FDFSetValue "address1", address1, False
    objFdf.FDFSetValue "address2", address2, False
    objFdf.FDFSetValue "email", email, False

    ' Point the FDF at the "master" PDF that holds the form
    objFdf.FDFSetFile "http://warehouse.informco.com/asp_projects/liberty/lib06_D.pdf"

    ' Send the FDF to the browser, then clean up
    Response.ContentType = "application/vnd.fdf"
    Response.BinaryWrite objFdf.FDFSaveToBuf
    objFdf.FDFClose
    Set objFdf = Nothing
    Set FdfAcX = Nothing

Dan Sideen (dansideen@home.com)

--------------------------------------------------------------------------------
9.4: Onscreen completion

Many forms are available in PDF format. I was wondering if there is a package that will allow the user to fill them out onscreen before printing them out? Note that I am not looking for the user to edit them, only to fill the blanks.

Acrobat can make any (unprotected) PDF file into a fillable form. Try for instance http://www.quite.com/box/qboxbuy1.pdf (it's an order form, but don't worry, you'd have to print and fax it to order).

Aandi Inston (quite@dial.pipex.com)

**********

Acrobat Reader, if the forms already have fillable form fields. Acrobat (full distribution), if you need to add any fields, including submit/import methods.

Once you've added the required fields, Acrobat Reader, operating as a plugin in your browser, as well as a live TCP/IP connection, is enough to fill, submit, and import data with your form. If you've deduced this means you can save form data (not PDF, but FDF), using Acrobat Reader, you are correct.
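
[Editor's sketch, not part of the original answer: the FDF data mentioned above is a small text file, and its general shape can be produced with a few lines of Python. The helper names, field names and the form.pdf target are made up for illustration, and the sketch assumes plain ASCII values; anything beyond that needs proper PDF string encoding.]

```python
def pdf_string(text: str) -> str:
    """Escape backslashes and parentheses so text can sit inside a PDF (...) string."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return "(" + escaped + ")"


def build_fdf(fields: dict, pdf_url: str) -> str:
    """Return a minimal FDF document: one /T (name) /V (value) pair per field."""
    entries = "".join(
        "<< /T %s /V %s >>\n" % (pdf_string(name), pdf_string(value))
        for name, value in fields.items()
    )
    return (
        "%FDF-1.2\n"
        "1 0 obj\n"
        "<< /FDF << /Fields [\n" + entries + "] /F " + pdf_string(pdf_url) + " >> >>\n"
        "endobj\n"
        "trailer\n"
        "<< /Root 1 0 R >>\n"
        "%%EOF\n"
    )


print(build_fdf({"name": "A. Person", "email": "someone@example.com"}, "form.pdf"))
```

[The /F entry names the PDF form that the data belongs to, which is how opening the FDF leads the viewer to load that form and fill in its fields.]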
See http://segraves.tripod.com/index3.htm for a couple of simple examples. You DO NOT have to be connected to the internet to do this. It works fine with a web server running on the same computer as the browser, i.e., "localhost".

Bill Segraves (wsegrave@mindspring.com)

--------------------------------------------------------------------------------

9.5: FDF form filling with CGI

Could you please tell me whether and how it is possible to use CGI to fill in a PDF form?

Yes. Most easily, by the CGI generating an FDF file and delivering that.

Aandi Inston (quite@dial.pipex.com)

**********

No and Yes!

No: CGI itself can't do it, as it is a standard, i.e. the Common Gateway Interface.

Yes: You could write a script in a CGI-compliant language, e.g., Perl, that would generate an FDF file with the field names and associated values in it. For an example, please see http://segraves.tripod.com/index3.htm. You can see the required format of the FDF by simply submitting a blank form.

Bill Segraves (wsegrave@mindspring.com)

**********

Get the Adobe FDF Toolkit from www.adobe.com. It contains ActiveX controls and is usable in C++, VB, VBS, ASP and Java to automate the creation of separate FDF files. The populated forms can then easily be displayed inside a browser just by pointing to the FDF. This loads the "parent" PDF and fills the fields.

Dan Sideen (dansideen@home.com)

--------------------------------------------------------------------------------

9.6: FDF, Java, and special characters

When I do a setValue("fieldname", a Java String, false) and the Java string has a French character inside (like an accented character), the characters are transformed in the document and not displayed properly. I use the Times Roman font in the fields.

The Java version of the FDF toolkit is horribly broken with respect to special characters. It uses the Java Native Interface (JNI) for attaching C libraries to Java, and the JNI relies on a (modified) UTF-8 encoding.
This doesn't seem to be implemented in the FDF toolkit, so the Java FDF toolkit is unusable from a European point of view. I haven't yet tested the new version 5 of the FDF toolkit.

Thomas Merz (tm@pdflib.com)

--------------------------------------------------------------------------------

9.7: Emailing completed forms

When I create the form with the full version of Acrobat 5.0 using JavaScript, the email function works great. When I access the form with Acrobat Reader 5.0, the function does not work. Does anyone have any recommendations on the best way to email forms using the Reader?

Use a CGI or ASP. Except in very limited circumstances (centrally controlled intranets), sending direct e-mail is completely impractical. This limitation has nothing to do with Acrobat.

Aandi Inston (quite@dial.pipex.com)

=========== Online tutorials ===================================================

10.1: What online PDF tutorials are available?

Are there any tutorials online which deal with PDF related issues?

I've a number of tutorials on my site (see www.yeardley.demon.co.uk/pdf.html and www.yeardley.demon.co.uk/morpdf.html) that might help folk & your FAQ. 4 spring to mind:

1. PDF on how to set up Reader v4.x to run from PC
2. PDF explaining the v4.x Acrobat/ABT/Reader toolbar and tool shortcuts.
3. PDF showing how to create Doc level JavaScripts.
4. PDF / Word97 DOC for troubleshooting Reader v4.05 (Win) installs (icons, double-click link, etc.)

Mark Anderson (mark@yeardley.demon.co.uk)

=========== Specification things ===============================================

11.1: Is there a maximum document size for PDF?

Is there a size limit (either in terms of file size or number of pages) for PDF files? We are looking at creating a document of around 1000 pages and would like to know if it's feasible before we start.
Check out the Acrobat SDK on http://partners.adobe.com/ - not because you need the contents, but because it has a file (Core API Reference) which is close to 3000 pages and heavily hyperlinked. It works well for me.

Aandi Inston (quite@dial.pipex.com)

**********

I have personally created 10,000 page PDF documents, admittedly on fairly large machines with a lot of RAM (4 GB plus).

Whilst the PDF specification does not specify a maximum file size (that I can find), the byte offset for an object reference is limited to a 10 digit number (page 74, edition 2), so I would imagine that the absolute maximum file size is 9,999,999,999 bytes, or about 10 GB. I think you would need a pretty meaty machine to be able to open the document at that point anyway.

Michael Still (mikal@stillhq.com)

**********

We regularly deal with PDFs containing 50,000 to 100,000+ pages, so I think your 1,000-page document will be fine...

That said, be aware that Acrobat itself can sometimes have trouble with very large PDF files, especially those a gigabyte or more in size.

Scott Robert Ladd (scott.ladd@maximal.com)

**********

We have created Acrobat documents of up to 20,000 pages (mutual fund statements for archiving) without problems. Files can be several GB in size, so make sure that you have lots of disk space.

Dan Sideen (dansideen@home.com)

--------------------------------------------------------------------------------

11.2: Embedding a font into a PDF

I have a plug-in that adds a text banner to the foot of each page in a PDF. How do I correctly add a font resource, e.g. Arial, to the PDF so that the banner prints correctly? The PDFs are straight bitmap conversions, so they have no font info.

If you are using the Cos layer, this is an interesting and serious challenge. If you are using the PDFEdit layer, this does include options to create and embed a font from a "system" (i.e. installed) font.
Since Arial is a virtual clone of Helvetica, that particular example would not be worthwhile. The base fonts like Helvetica are guaranteed to print without embedding.

Aandi Inston (quite@dial.pipex.com)

--------------------------------------------------------------------------------

11.3: Required font tables

Can anyone verify which of these tables are required when embedding a TrueType font? I've got 3 sources that say 3 different things.

OS/2 cmap cvt fpgm glyf head hhea hmtx loca maxp name post prep

Depends on what you mean by embedding. You need to include these tables in your embedded file:

    String[] tables = { "head", "hhea", "loca", "maxp", "cvt ", "prep", "glyf", "hmtx", "fpgm", "cmap" };

but you need to read some of the others (such as OS/2) to get values you need during the embedding process.

John Farrow (jfarrow@visualprogramxming.co.nz)

**********

cvt, fpgm, and prep are definitely not required by Acrobat, although the PDF reference may specify otherwise.

Thomas Merz (tm@pdflib.com)

--------------------------------------------------------------------------------

11.4: Multiplatform cross book links

Does anyone know why cross-book links (from an index file to another file) will not work on a Solaris machine while they will work on a PC? The names of the PDF files and the links are all lowercase to avoid upper/lowercase problems.

To work cross-platform, the file names of the destination files must be stored as "generic" names in the PDF file. If they are platform-specific, they won't work on other platforms.

Helge Blischke (H.Blischke@srz-berlin.de)

--------------------------------------------------------------------------------

11.5: Converting from user space coordinates into millimeters

What is the best method of converting user space units to millimetres?

I multiply by 25.4 and divide by 72.
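
[Editor's sketch, not part of the original answer: the arithmetic above written out as code. It assumes the default user space unit of 1/72 inch and 25.4 mm per inch; the function and constant names are made up for illustration.]

```python
UNITS_PER_INCH = 72.0  # assumption: default user space, 1 unit = 1/72 inch
MM_PER_INCH = 25.4


def units_to_mm(units):
    """Convert default user space units to millimetres."""
    return units * MM_PER_INCH / UNITS_PER_INCH


def mm_to_units(mm):
    """Convert millimetres back to default user space units."""
    return mm * UNITS_PER_INCH / MM_PER_INCH


if __name__ == "__main__":
    # An A4 page is commonly described as 595 x 842 units.
    print(round(units_to_mm(595), 1), round(units_to_mm(842), 1))  # 209.9 297.0
```

[If a transformation has scaled the coordinate system, the scale factor has to be applied as well; the sketch covers only the unscaled default.]
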
Aandi Inston (quite@dial.pipex.com)

--------------------------------------------------------------------------------

11.6: Large paper sizes

I am trying to understand if there are any paper size limitations for PDF files. If so, what are they? My objective is to create PostScript (PS) files from all the corporate applications, and then create PDFs from the PS. The corporate applications will include AutoCAD, Word, images, some graphics apps and so on. I searched Adobe and many other web sites regarding this, but I could not get any info. Is there anyone out there that has generated PDFs for larger sizes, like D or E drawing sizes?

One version of PDF allows 200 inches x 200 inches and the other 40 inches x 40 inches; I think the versions are 1.3 and 1.2 respectively. So the PDFs can be the size of the drawings.

Ross (apex@calpha.com)

=========== Common questions about PDF =========================================

12.1: Document sizes

I would like to know the size per page of a PDF document when using a scanner, for: a) text only b) text and pictures

Impossible to say. It depends first and most importantly on the resolution. A 400 dpi scan holds 16 times as many pixels as a 100 dpi scan. Beyond that, everything depends on the kind of original, other scanning settings, and how you make the PDF. DON'T scan directly to Acrobat UNLESS you run Paper Capture.

Aandi Inston (quite@dial.pipex.com)

--------------------------------------------------------------------------------

12.2: The size of a scanned page

I have tried to make a decent PDF, but it is hard to get quality and size into an acceptable range. I scanned an original consisting of 5 pages of clean-quality paper (A4), but the result is a 1.7MB PDF: readable, but with poor, ragged screen text. It is acceptable when printed, but the byte size is hardly acceptable for the web?

We find an A4 page, once G4 fax compressed (CCITT Group 4), is about 50 KB.

Michael Still (mikal@stillhq.com)

**********

Pure scans are rarely suitable for the web.
They are too large. That is a key reason to use Paper Capture. However, you may be OK with a pure black and white scan.

If you are not going to run Paper Capture, do NOT scan direct to Acrobat. Use Distiller. Scan to Acrobat (and import images to Acrobat 4) ONLY as a preparation for Paper Capture.

Aandi Inston (quite@dial.pipex.com)

**********

I've noticed a strange thing about scanning into Acrobat: I scanned (grayscale) the first of 5 A4 docs, saved it, and it was 51KB. Then, without altering any settings, I scanned all 5 pages consecutively, and the saved PDF was 2MB. I don't get the math here! It seems to be a waste of time to scan directly into Acrobat.

Kent Isaksson (kentisaksson@netscape.net)

--------------------------------------------------------------------------------

12.3: Reliability of Paper Capture

How reliable is Paper Capture at OCRing scanned text?

Paper Capture is good, but it really doesn't produce a reliable result, which is the case with all the OCR programs I've seen. It mistakes 0 for O, 6 for 8, etc., and production hours soar.

Kent Isaksson (kentisaksson@netscape.net)

--------------------------------------------------------------------------------

12.4: Is it possible to have viruses inside a PDF document?

It doesn't seem possible to me, as I didn't know there could be executable code in a PDF. As long as there is no code inside a PDF file, there cannot be any kind of virus. AFAIK.

No, this is incorrect. PDF can contain embedded executable code (in a wide variety of ways), which can in turn contain malicious code like viruses, worms and trojans.

However, executable code must first be extracted from the PDF file, with Acrobat's built-in tools or with some other tool, before it can be executed and the malicious code can activate itself. That is why, when you extract file annotations with Acrobat, a warning window appears letting users know of the potential for malicious code. Non-Acrobat tools may or may not offer such warnings.
This is fully documented in the Acrobat online guide. McAfee now has virus detecting software specifically for PDF files.

Bryan Guignard (bryang@sympatico.ca)

--------------------------------------------------------------------------------

12.5: Customizing bookmarks

I'm just tired of the bland icons that Adobe serves up for use when creating bookmarks. Is there a way to make different symbols, or perhaps even colors, for use in the bookmarks section? I use PageMaker and Photoshop to create PDF files, and the default bookmarks take away from the overall look and feel.

Acrobat 5 allows you to specify text formatting such as Bold, Italic, and different colors for the text part of bookmarks. I'm not aware of any way to change the icon. It would be nice if Acrobat had some different bookmark icons to choose from, the way you can change text annot icons.

Bryan Guignard (bryang@sympatico.ca)

**********

Versions of Acrobat prior to 5.0 do not understand the color entry in the outlines (or text styles); they ignore them.

MicroPress, Inc (support@micropress-inc.com)

=========== Other ==============================================================

=========== Security ===========================================================

14.1: eBook security

How secure are ebooks?

ElcomSoft Co. Ltd. has released Advanced eBook Processor, a Windows ME/98/95/NT4/2000/XP program that makes it easy to remove both password encryption and usage restrictions from Adobe Acrobat PDF files and eBooks. The latest addition to ElcomSoft's family of password recovery software allows business managers to deal with lost and destroyed passwords, as well as with employees who, intentionally or unintentionally, are unable to edit and print password-protected PDF files.
Advanced eBook Processor lets users make backup copies of eBooks that are protected with passwords, security plug-ins, and various DRM (Digital Rights Management) schemes like EBX and WebBuy, enabling them to be read with any PDF viewer, without additional plug-ins. In addition, the program makes it easy to decrypt eBooks and load them onto Palm Pilots and other small, portable devices. This gives users - especially users who read on airplanes or in hotels - a more convenient option than using larger notebooks with limited battery power to read their eBooks.

PDF protection can prevent users from changing or printing information, adding or changing annotations and form fields, or even selecting and copying text or graphics. With Advanced eBook Processor, these PDF files can be decrypted, opened, and used without any of these restrictions. Once protection has been removed, PDF files created with Adobe's Acrobat program can be opened in any PDF viewer, including Adobe's Acrobat Reader.

Advanced eBook Processor protects businesses from losing control of their eBooks, technical articles, documentation manuals, presentations, and all PDF documents that could be rendered unusable by improperly managed passwords and licenses.

Advanced eBook Processor costs $99(US) and may be purchased securely online at http://www.elcomsoft.com/aebpr.html. You can download a free trial version of the software at the same web address.

Vladimir Katalov (kitten@elcomsoft.com)

**********

The Anti-Piracy Enforcement Team at Adobe Systems has notified ElcomSoft, a software company based in Russia, that its Advanced eBook Processor software program violates the copyright of materials (primarily PDF-based eBooks) published by Adobe. ElcomSoft has five days to meet Adobe's demand that the commercial product be removed, after which the matter will be "pursued aggressively" by the San Jose-based company. Watch for updates.
http://www.planetebook.com/mainpage.asp?webpageid=157&nl

Kurt Foss (kfoss@planetpdf.com)

********************************************************************************
END OF FILE
********************************************************************************