Development: General questions

1. APIs not developed by Adobe
Q: What APIs are available, and under what terms? [Q3]
2. How can I determine the compression used in a PDF file?
Q: I got a few documents looking good and at a reasonable size for the final use. But months later I have to re-issue and I can't repeat the results. Other than trial and error, is there a utility or steps to determine the compression options used in a PDF? [Q5]
3. Discussion areas for PDF developers
Q: What mailing lists are available for PDF developers? [Q7]
4. What is that stuff inserted into PostScript by Acrobat when printing encrypted PDFs?
Q: Encrypted PDFs when printed have the following in them:
5. Why are large GETs (PDF forms) truncated in Microsoft Internet Explorer?
Q: Sometimes GETs are truncated for web pages (including PDFs), why is this? [Q9]
6. How machine intensive is generating PDF documents dynamically?
Q: Is PDF like Postscript? What I'm trying to get an idea of is whether all those PDFLib function calls that are generating parts of the file are doing massive amounts of processing or are they just doing simple things like writing PS-like markup tags wrapped around data.
7. Problems with linking to web pages from within a PDF
Q: I would like to create a link to an external document with a remote go-to action. However using a URL file specification (chapter 3.10.4 of the PDF Reference) does not work. This is the created object in the pdf file:
8. Bezier curve approximation
Q: I'm trying to draw a pie slice, given the pie's center and the start and end angles. As far as I can tell, I need to use bezier curves to do the arc. Is there an easier way, or if not, what do you use for the bezier's control points? [Q15]
9. JBIG2 compression support
Q: The version 1.4 PDF specification allows JBIG2 compression to be used in PDF files. Has anybody been able to get Acrobat 5.0 create a PDF file that contains JBIG2-compressed images? [Q16]
10. Rotated text
Q: I'm trying to draw rotated text, given the point where the text starts, and an angle of rotation (e.g., 90 degrees). So far, the text does not show up, so I guess my transformation matrix is incorrect, or maybe it's something else. If you know how to do this, could you show an example?
11. Centering text
Q: I am a fairly competent PostScript programmer but now need to generate some PDF. I have the basic tree structure down, but I need to accurately position some text; to be specific, I need to center some text. In PostScript it is easy to figure out the size of the string to be set, then translate appropriately. This seems impossible in PDF, so how do you estimate the font metrics programmatically to set the text? To make it more difficult, I am generating PDF from a Java applet, so small code size is a definite factor. [Q18]
12. Enumerating the images in a PDF
Q: I'm new to PDFs and am wondering how I might go about programmatically enumerating all embedded images in a PDF, extracting them, then inserting a modified version back into the PDF. Basically I am looking to create a Windows program that will make a small change to all images in a PDF but otherwise leave the PDF alone. Are there decent PDF SDKs out there that will let me programmatically manipulate a PDF? [Q19]
13. Modifying elements in a PDF
Q: Is it possible to extract an embedded image, modify it and put it back without creating a whole new PDF? [Q20]
14. Embedding WMF in PDF
Q: Is there a possibility to easily embed WMF in a PDF file while creating the PDF using a PDF library? Normally the libs only support JPG, PNG and TIFF. [Q21]
15. Unicode support
Q: I have a problem concerning special Polish characters which I write down as Unicode characters but which don't show up correctly in a PDF document when using the Helvetica standard PDF font. The problem scenario is the following: I have implemented a PDF Report Generator in Java. To describe it briefly, the generator is a piece of Java code that reads an XML layout specification and a stream of Java business objects and generates a PDF document from it. For the PDF low-level stuff, I am relying on Bruno Lowagie's Java library `iText', version 0.37. Generally, only PDF standard fonts are used. [Q22]
16. Raster vs Vector images?
Q: I've been looking at the PDF file in a text editor and found a number of /Subtype /Image objects. Are these raster or vector images? [Q23]
17. Required font tables
Q: Can anyone verify which of these tables are required when embedding a true type font? I've got 3 sources that say 3 different things. OS/2 cmap cvt fpgm glyf head hhea hmtx loca maxp name post prep [Q79]
18. Custom fields in the info dictionary
Q: Has anybody out there added their own custom Info Dictionary fields? Does Acrobat Reader's "doc info" command show them? Does Exchange etc. interfere with them? I noticed that the spec recommends adding your own custom fields to the "catalogue" object, but I intend adding them to the Info Dictionary in the hope that Verity will index them. [Q134]
19. CMYK to RGB conversion
Q: What is the correct algorithm for converting between colour spaces? [Q137]
20. JavaScript popup menus on hot spots
Q: I want to create WWW links in my PDF documents. The URL will start a JAVA request servlet on my webserver, and the result may be more than one document. Is it possible to create a list menu behind the hotspots, which displays the documents in a list so that the reader can select which he/she wants to open? [Q138]
21. Internationalization of accented characters
Q: Does anybody have information about using German umlauts and other European special characters in PDF documents? [Q144]
22. Internationalization of fonts
Q: I'm writing a program that has PDF as its output format. Now I need to display text in Cyrillic letters, and I have no idea how to do that. Can you help me, or at least point me to some documentation? [Q154]
23. Signature sample code
Q: Can I get the wrapper code that builds a PKCS#7 object from the Raw Signature format? [Q160]
24. Illegal command m in text object
Q: Does anybody know why any .PDF files I make in Gymnast give me an "Illegal operation 'm' inside a text object" error message when I open them in Acrobat Reader? [Q176]

1. APIs not developed by Adobe

Q: What APIs are available, and under what terms? [Q3]

A: PDFLib, free for non-commercial use, http://www.pdflib.com

ClibPDF, free for non-commercial use, http://www.fastio.com

Panda, open source / free (GPL), http://www.stillhq.com

ReportLab, open source / free (BSD license), http://www.reportlab.com

Michael Still (mikal@stillhq.com) [A8]

A: We are pleased to announce the release of version 1.0 of the XMLPDF library for Java.

The preview period for this product has been completed and the first production version has been released.

This library converts XML to PDF including support for:

- complex table formatting including nested tables

- JPEG and PNG images

- automatic pagination of text and tables

- text kerning

- Type 1 and TrueType fonts including embedding

- defining a document template in XML and merging of data from a separate XML source

- server-side use in Web and Application servers

A quick overview of capabilities is at

http://www.xmlpdf.com/overview.html.

XMLPDF retails for US$ 99 per developer seat. For information on source licences contact sales@xmlpdf.com.

John Farrow (john.farrow@xmlpdf.com) [A194]

A: I've made an inexpensive (149 USD), royalty free component to create PDF files with bookmarks, images (jpg), standard base 14 fonts, vector drawing...

It has a very small footprint; the Delphi version compiles in the exe and the OCX is only 350 Kb in size.

Developers can distribute freely, royalty free applications with PDF creation features.

Very fast (1000 pages = 15 secs on an old Celeron 400), no DLL Hell, no Acrobat needed.

If interested visit www.dreamscape.it

Massimo Brini (brinim@tiscalinet.it) [A214]

2. How can I determine the compression used in a PDF file?

Q: I got a few documents looking good and at a reasonable size for the final use. But months later I have to re-issue and I can't repeat the results. Other than trial and error, is there a utility or steps to determine the compression options used in a PDF? [Q5]

A: Open the PDFs using a text editor that can handle binary data (like UltraEdit on Winxx) and search for the "/Filter" keywords.

Helge Blischke (H.Blischke@srz-berlin.de) [A12]
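
The "/Filter" search suggested above can also be scripted. The following is only a minimal sketch in Python, not a full PDF parser: the function name count_filters and the regular expressions are illustrative assumptions, and the scan only sees dictionaries that are stored as plain text in the file, so filters declared inside compressed object streams or abbreviated in inline images would be missed. As a reminder, DCTDecode indicates JPEG, FlateDecode indicates ZIP/deflate, and CCITTFaxDecode and LZWDecode indicate fax and LZW compression.

```python
import re
from collections import Counter

# Matches "/Filter /Name" as well as "/Filter [/Name1 /Name2]".
FILTER_RE = re.compile(rb"/Filter\s*(\[[^\]]*\]|/[A-Za-z0-9]+)")
NAME_RE = re.compile(rb"/([A-Za-z0-9]+)")


def count_filters(pdf_bytes: bytes) -> Counter:
    """Count filter names found in plain-text dictionaries of a PDF."""
    counts = Counter()
    for match in FILTER_RE.finditer(pdf_bytes):
        for name in NAME_RE.findall(match.group(1)):
            counts[name.decode("ascii")] += 1
    return counts


# Tiny hand-made sample standing in for the bytes of a real file.
sample = (b"<< /Subtype /Image /Filter /DCTDecode >> "
          b"<< /Filter [/ASCII85Decode /FlateDecode] >>")
for name, total in count_filters(sample).most_common():
    print(name, total)
```

For a real document, read the file in binary mode (open(path, "rb").read()) and pass the bytes to count_filters. The counts tell you which filters were used but not the quality settings, for example the JPEG level, which a tool such as the one mentioned in a later answer reports.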

A: http://www.enfocus.com/plugins.htm and look for the free Enfocus Browser plug-in. Enfocus Browser allows you to navigate the low-level object hierarchy in a PDF file, and view the PDF page description for a particular page.

Filiep Maes (filiepm@enfocus.be) [A13]

A: Our Quite A Box Of Tricks product will tell you the compression used, image by image. This includes the level of JPEG used. This can be done with the free demo.

Aandi Inston (quite@dial.pipex.com) [A14]

3. Discussion areas for PDF developers

Q: What mailing lists are available for PDF developers? [Q7]

A: You could try some of the following:

- comp.text.pdf

- PDFDev (http://www.pdfzone.com)

- PlanetPDF developer's forum (http://www.planetpdf.com)

Aandi Inston (quite@dial.pipex.com) [A16]

4. What is that stuff inserted into PostScript by Acrobat when printing encrypted PDFs?

Q: Encrypted PDFs when printed have the following in them:

 
% Removing the following eight lines is illegal, subject to the Digital Copyright Act of 1998.

mark currentfile eexec 
54dc5232e897cbaaa7584b7da7c23a6c59e7451851159cdbf40334cc2600 
30036a856fabb196b3ddab71514d79106c969797b119ae4379c5ac9b7318 
33471fc81a8e4b87bac59f7003cddaebea2a741c4e80818b4b136660994b 
18a85d6b60e3c6b57cc0815fe834bc82704ac2caf0b6e228ce1b2218c8c7 
67e87aef6db14cd38dda844c855b4e9c46d510cab8fdaa521d67cbb83ee1 
af966cc79653b9aca2a5f91f908bbd3f06ecc0c940097ec77e210e6184dc 
2f5777aacfc6907d43f1edb490a2a89c9af5b90ff126c0c3c5da9ae99f59 
d47040be1c0336205bf3c6169b1b01cd78f922ec384cd0fcab955c0c20de 
000000000000000000000000000000000000000000000000000000000000
 

Which stops Distiller from converting to PDF. What is it? [Q8]

A: The eexec'd code reads in cleartext:

 
/currentdistillerparams where {
  pop /pdfmark where {
    pop
    (This PostScript file  was created from an encrypted PDF file.) print
    (Redistilling encrypted PDF is not permitted.) print
    userdict /quit get exec
  } if
} if
currentfile closefile
 

That means, if either your printer knows about currentdistillerparams and pdfmark or the PostScript job itself defines these operators (even as dummies, see note below), this code assumes you are going to re-distill the PS job which is forbidden.

NOTE: The PostScript driver you use might insert statements like

 
/currentdistillerparams where {pop} {userdict /currentdistillerparams {1 dict} put} ifelse
/pdfmark where {pop} {userdict /pdfmark {cleartomark} put} ifelse
 

or the like (see the recommendations in Adobe's Pdfmark Reference Manual).

Helge Blischke (H.Blischke@acm.org) [A17]

5. Why are large GETs (PDF forms) truncated in Microsoft Internet Explorer?

Q: Sometimes GETs are truncated for web pages (including PDFs), why is this? [Q9]

A: See http://www.networkice.com/Advice/Intrusions/2000608/default.htm for a security discussion on GET Data Overflow, which might explain why MSIE-transmitted URL-encoded strings from PDF "submit" are sometimes truncated at something a bit less than 4KB length.

Bill Segraves (wsegrave@mindspring.com) [A18]

6. How machine intensive is generating PDF documents dynamically?

Q: Is PDF like Postscript? What I'm trying to get an idea of is whether all those PDFLib function calls that are generating parts of the file are doing massive amounts of processing or are they just doing simple things like writing PS-like markup tags wrapped around data.

PDF is conceptually similar to PostScript. It isn't hugely complex to generate, but there is an additional (small) overhead of not only generating the graphical information, but also generating a file structure and index.

In general, the time taken to make PostScript and PDF directly should be comparable. I wouldn't describe either of them as at all like HTML, but in the sense that you aren't rendering to a bitmap or anything like that, they are similar. [Q10]

A: This is very true. There are also the aspects of compression of data within the PDF to be considered as well... These can be quite processor intensive.

PDF is also pretty fussy about the format that bitmaps take, and it can take a fair bit of memory and time to convert the bitmaps to the right format. The operations aren't slow, it's more the fact that there can be millions of them.

I would say that the similarity ends when you say that both formats (HTML and PDF) have structure. PDFs are pretty funky in their object layouts. I am not sure about the timing statement comparing PS and PDF though. I have generated large PS files (containing images), which have been quite slow, but the PDF has been much faster because of better compression support.

I would think the best route with your ISP is to just do some trials and logging and see what happens. Also, I don't think pdflib does linearisation, which might cause problems with large documents online.

Michael Still (mikal@stillhq.com) [A19]

7. Problems with linking to web pages from within a PDF

Q: I would like to create a link to an external document with a remote go-to action. However using a URL file specification (chapter 3.10.4 of the PDF Reference) does not work. This is the created object in the pdf file:

 
1 0 obj << 
  /Type /Annot /Subtype /Link /A << /S /GoToR /F << 
    /Type /Filespec /FS /URL 
    /F (http://www.tug.org/applications/pdftex/calculat.pdf) >> 
  /NewWindow true >> 
/Rect [124.802 706.129 266.534 791.168] >> endobj
 
[Q12]

A: It's clear from experiments that Acrobat does not support the full generality of what might theoretically be possible. If you create something Acrobat wouldn't do, then you may be stuck, especially with indirect file references, which tend to work only if Acrobat would expect them there.

An Acrobat weblink would have an /A field more like

 
/A << /S /URI /URI (http:...) >>
 

Aandi Inston (quite@dial.pipex.com) [A21]
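
For anyone generating such links from a program, here is a minimal sketch in Python of writing the annotation with a /URI action instead of /GoToR. The helper names pdf_string and uri_link_annotation are made up for illustration, and the object is returned as text only; wiring it into the page's /Annots array and the xref table is left to the surrounding generator.

```python
def pdf_string(text: str) -> str:
    """Escape backslashes and parentheses for a PDF literal string."""
    escaped = (text.replace("\\", "\\\\")
                   .replace("(", "\\(")
                   .replace(")", "\\)"))
    return f"({escaped})"


def uri_link_annotation(obj_num: int, rect, url: str) -> str:
    """Return an indirect object for a Link annotation with a URI action."""
    x1, y1, x2, y2 = rect
    return (
        f"{obj_num} 0 obj <<\n"
        f"  /Type /Annot /Subtype /Link\n"
        f"  /Rect [{x1} {y1} {x2} {y2}]\n"
        f"  /A << /S /URI /URI {pdf_string(url)} >>\n"
        f">> endobj\n"
    )


print(uri_link_annotation(
    1, (124.802, 706.129, 266.534, 791.168),
    "http://www.tug.org/applications/pdftex/calculat.pdf"))
```

The rectangle and URL are the ones from the question; only the action dictionary differs.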

8. Bezier curve approximation

Q: I'm trying to draw a pie slice, given the pie's center and the start and end angles. As far as I can tell, I need to use bezier curves to do the arc. Is there an easier way, or if not, what do you use for the bezier's control points? [Q15]

A: You can calculate them like:

 
  # Inputs:
  #   $x, $y         ... center point of arc
  #   $alpha, $beta  ... start/end angle of arc, in degrees
  #   $a, $b         ... x/y extents of the fitting ellipse (for a circle $a = $b)

  my $pi = 4 * atan2(1, 1);
  $alpha = ($alpha * $pi / 180);
  $beta  = ($beta * $pi / 180);

  my $bcp = (4.0/3 * (1 - cos(($beta - $alpha)/2)) / sin(($beta - $alpha)/2));

  my $sin_alpha = sin($alpha);
  my $sin_beta  = sin($beta);
  my $cos_alpha = cos($alpha);
  my $cos_beta  = cos($beta);

  my $p0_x = $x + $a * $cos_alpha;
  my $p0_y = $y + $b * $sin_alpha;
  my $p1_x = $x + $a * ($cos_alpha - $bcp * $sin_alpha);
  my $p1_y = $y + $b * ($sin_alpha + $bcp * $cos_alpha);
  my $p2_x = $x + $a * ($cos_beta + $bcp * $sin_beta);
  my $p2_y = $y + $b * ($sin_beta - $bcp * $cos_beta);
  my $p3_x = $x + $a * $cos_beta;
  my $p3_y = $y + $b * $sin_beta;

  # Results:
  #   $p0_x, $p0_y ... start point of bezier
  #   $p1_x, $p1_y ... control point 1 of bezier
  #   $p2_x, $p2_y ... control point 2 of bezier
  #   $p3_x, $p3_y ... end point of bezier
 

Mind that you cannot calculate correct arcs for abs($beta-$alpha)>180 using this bezier approximation, so if your arcs span more than 180 degrees split it into two using a middle angle.

(alfredreibenschuh@yahoo.com) [A25]
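
As an illustration of the splitting advice above, here is a sketch in Python that applies the same formulas and cuts the arc into pieces automatically. The function names and the 90-degree default per piece are my own assumptions rather than part of the answer above; smaller pieces simply give a closer fit. The expression 4/3 * tan(theta/4) is algebraically the same as the $bcp formula.

```python
import math


def arc_to_beziers(x, y, a, b, alpha_deg, beta_deg, max_span_deg=90.0):
    """Approximate an elliptical arc by cubic Bezier segments.

    Returns a list of (p0, p1, p2, p3) tuples, each point an (x, y) pair.
    """
    span = beta_deg - alpha_deg
    pieces = max(1, math.ceil(abs(span) / max_span_deg))
    step = math.radians(span) / pieces
    segments = []
    for i in range(pieces):
        t0 = math.radians(alpha_deg) + i * step
        t1 = t0 + step
        bcp = 4.0 / 3.0 * math.tan((t1 - t0) / 4.0)
        p0 = (x + a * math.cos(t0), y + b * math.sin(t0))
        p1 = (x + a * (math.cos(t0) - bcp * math.sin(t0)),
              y + b * (math.sin(t0) + bcp * math.cos(t0)))
        p2 = (x + a * (math.cos(t1) + bcp * math.sin(t1)),
              y + b * (math.sin(t1) - bcp * math.cos(t1)))
        p3 = (x + a * math.cos(t1), y + b * math.sin(t1))
        segments.append((p0, p1, p2, p3))
    return segments


def pie_slice_path(x, y, r, alpha_deg, beta_deg):
    """Build PDF path operators for a filled circular pie slice."""
    segs = arc_to_beziers(x, y, r, r, alpha_deg, beta_deg)
    start = segs[0][0]
    ops = [f"{x:.3f} {y:.3f} m",                   # move to the center
           f"{start[0]:.3f} {start[1]:.3f} l"]     # line out to the arc start
    for _, p1, p2, p3 in segs:
        ops.append(" ".join(f"{v:.3f}" for v in (*p1, *p2, *p3)) + " c")
    ops.append("h f")                              # close the path and fill
    return "\n".join(ops)


print(pie_slice_path(300, 400, 100, 30, 250))
```

The output is a sequence of m, l, c, h and f operators that can be placed in a page content stream.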

9. JBIG2 compression support

Q: The version 1.4 PDF specification allows JBIG2 compression to be used in PDF files. Has anybody been able to get Acrobat 5.0 create a PDF file that contains JBIG2-compressed images? [Q16]

A: I don't think it will. I couldn't do it programmatically. There is certainly nothing in the user interface. The support may be read-only at the moment (and I have no way to confirm even that).

It may be that Adobe have done something unusually sensible with regard to changes: add support for reading the files, then wait another year before letting anyone make them. This could make the transition a lot smoother.

Aandi Inston (quite@dial.pipex.com) [A26]

10. Rotated text

Q: I'm trying to draw rotated text, given the point where the text starts, and an angle of rotation (e.g., 90 degrees). So far, the text does not show up, so I guess my transformation matrix is incorrect, or maybe it's something else. If you know how to do this, could you show an example?

To be more specific, what I'm trying to do is rotate text 90 degrees without "changing" the page's coordinate system. In essence, with x,y being a point in the page's "normal" coordinate system, the text is to start at x,y, except be rotated 90 degrees. What I tried was

 
0 1 -1 0 0 0 Tm
 
[Q17]

A: The text matrix is the way to move text. The best approach in debugging the matrix is to do the mathematics yourself. Since the text starts at 0,0 in user space, calculate the transformation of 0,0 with your matrix and see where it is. (This assumes that cm has not also been used).

Aandi Inston (quite@dial.pipex.com) [A27]

A: OK, what you want to do is save the graphic state ("q") then do the rotation, then restore the old state ("Q"). The rotation itself is also a bit more complex than what you have. For a single line of text it will be 0 1 -1 0 h 0, where "h" is the height of the text. Think about rotating a rectangle about the origin and you can see where the "h" comes from.

Arne (arnet@hpcvplnx.cv.hp.com) [A28]
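
To make the matrix arithmetic concrete: the matrix in the question, 0 1 -1 0 0 0, rotates the text about the page origin, so the string starts at 0,0 and runs up the left edge, with the glyphs extending into negative x, off the page. Putting the start point into the last two entries of Tm keeps the rotation but moves the text origin to x,y. The Python sketch below shows one way to build the operators. It assumes a font resource named /F1 already exists on the page, and the function name rotated_text_ops is illustrative only. Because Tm is set inside BT ... ET, this variant does not touch the page's coordinate system, so no q/Q pair is needed.

```python
import math


def rotated_text_ops(text, x, y, angle_deg, font="/F1", size=12):
    """Return content-stream operators that show text rotated about (x, y)."""
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)

    def num(value):
        value = round(value, 4)          # drop floating-point noise
        return "0" if value == 0 else f"{value:g}"

    # Text matrix [a b c d e f] = [cos sin -sin cos x y]
    matrix = " ".join(num(v) for v in (c, s, -s, c, x, y))
    escaped = (text.replace("\\", "\\\\")
                   .replace("(", "\\(")
                   .replace(")", "\\)"))
    return f"BT {font} {size} Tf {matrix} Tm ({escaped}) Tj ET"


print(rotated_text_ops("Hello", 100, 200, 90))
```

For 90 degrees at 100,200 this yields the matrix 0 1 -1 0 100 200, which is the matrix from the question with the start point filled in.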

11. Centering text

Q: I am a fairly competent PostScript programmer but now need to generate some PDF. I have the basic tree structure down, but I need to accurately position some text; to be specific, I need to center some text. In PostScript it is easy to figure out the size of the string to be set, then translate appropriately. This seems impossible in PDF, so how do you estimate the font metrics programmatically to set the text? To make it more difficult, I am generating PDF from a Java applet, so small code size is a definite factor. [Q18]

A: You need to include a table of font metrics in the code, so you can calculate the width and height, and then use those to translate (just like PostScript). Adobe provides the data for the built-in fonts; otherwise you are on your own.

Dave Bloodgood (dabldgd@home.com) [A29]
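
A small worked example of the metrics-table approach, sketched in Python. Glyph widths in the Adobe metrics files are given in units of 1/1000 of the font size, so the string width is the sum of the glyph widths times size / 1000. The few Helvetica widths below are quoted from memory and should be checked against the actual Helvetica AFM file before use; the table and function names are illustrative, and kerning is ignored.

```python
# A handful of Helvetica advance widths (1/1000 em). Assumed values:
# verify against the AFM file and extend to the full character set.
HELVETICA_WIDTHS = {
    " ": 278, "H": 722, "W": 944, "d": 556,
    "e": 556, "l": 222, "o": 556, "r": 333,
}


def string_width(text, size, widths=HELVETICA_WIDTHS):
    """Width of text in points at the given font size (no kerning)."""
    return sum(widths[ch] for ch in text) * size / 1000.0


def centered_x(text, size, left, right, widths=HELVETICA_WIDTHS):
    """x coordinate at which text must start to be centered in [left, right]."""
    return left + ((right - left) - string_width(text, size, widths)) / 2.0


x = centered_x("Hello World", 12, 0, 612)   # 612 pt = US Letter width
print(f"BT /F1 12 Tf {x:.3f} 700 Td (Hello World) Tj ET")
```

For a Java applet the same idea reduces to one int array of widths per font plus a loop, which keeps the code size small.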

12. Enumerating the images in a PDF

Q: I'm new to PDFs and am wondering how I might go about programmatically enumerating all embedded images in a PDF, extracting them, then inserting a modified version back into the PDF. Basically I am looking to create a Windows program that will make a small change to all images in a PDF but otherwise leave the PDF alone. Are there decent PDF SDKs out there that will let me programmatically manipulate a PDF? [Q19]

A: The Acrobat SDK includes the Core API. This can be used to write plug-ins in C or C++. Provided that you don't need to run the result on a server, this may be suitable. The Acrobat SDK mostly requires a full copy of Acrobat installed and licensed per machine.

Acrobat SDK: http://partners.adobe.com/

You may find the PDFEdit layer does what you want, though you might be surprised at the learning curve, the amount of coding required, and how much you have to understand PDF internal formats. Indeed, you should start this task by reading the 600 page PDF Specification.

Aandi Inston (quite@dial.pipex.com) [A30]

13. Modifying elements in a PDF

Q: Is it possible to extract an embedded image, modify it and put it back without creating a whole new PDF? [Q20]

A: It depends what you mean. The SDK could edit a PDF file and save it. The changes would be added to the end of the file - so you are storing both the original and the new image. The SDK can also do a Save As, rewriting the file.

Aandi Inston (quite@dial.pipex.com) [A31]

A: Most information inside a PDF document is stored in a data structure known as an object. These objects have two major identifiers, an object number and a generation number. It is therefore possible to create a new version of an object without having to recreate the entire PDF structure -- you simply increment the generation number. Then again, you still need to update the xref table and other areas of the document to refer to the new version of the object. Hence, it probably is faster just to recreate the entire file programmatically.

Michael Still (mikal@stillhq.com) [A32]

14. Embedding WMF in PDF

Q: Is there a possibility to easily embed WMF in a PDF file while creating the PDF using a PDF library? Normally the libs only support JPG, PNG and TIFF. [Q21]

A: Easily? No. WMF is a Windows-specific format, containing graphical concepts with no easy or direct equivalent in PDF. If you are on Windows you can play the metafile to get the graphical constructs, and try to convert them to PDF equivalents. But on other platforms this is a huge development - probably larger than all of the rest of PDFLib.

Aandi Inston (quite@dial.pipex.com) [A33]

A: The WMF file format is discussed in the "Graphics File Format FAQ", which can be found at http://www.faqs.org/faqs/graphics/fileformats-faq/part3/ -- this includes a link to the specification and some discussion. This should give you enough information to be able to implement changes to the PDF library of your choice (assuming you have the source code).

Michael Still (mikal@stillhq.com) [A34]

A: There is a library to support the WMF file format at:

http://sourceforge.net/project/showfiles.php?group_id=10501

Michael Still (mikal@stillhq.com) [A35]

A: This is not helped by the fact that the file format is not fully documented by Microsoft. Only the header format has documentation; the contents are basically renderings of GDI calls, and the documentation for them only comes with MS-platform compilers.

To the original poster (and those saying 'me too!'):

If you are a reasonable programmer, you might well find it useful to look at wmflib2 at http://www.wvware.com/libwmf.html

As well as documenting those parts of the WMF file format that have been successfully reverse-engineered, there is software to convert WMF to other formats such as fig which you might find more tractable.

Aandi's comment that WMF contains graphical concepts with no easy or direct equivalent in PDF is true, but you may also find that _many_ of the most commonly-used primitives in WMF have pretty direct analogues in PDF, postscript and many other drawing languages.

So no, it isn't easy. But neither is it impossibly difficult. It's somewhere in-between. YMMV.

Kevin Ashley (K.Ashley@ulcc.ac.uk) [A36]

15. Unicode support

Q: I have a problem concerning special Polish characters which I write down as Unicode characters but which don't show up correctly in a PDF document when using the Helvetica standard PDF font. The problem scenario is the following: I have implemented a PDF Report Generator in Java. To describe it briefly, the generator is a piece of Java code that reads an XML layout specification and a stream of Java business objects and generates a PDF document from it. For the PDF low-level stuff, I am relying on Bruno Lowagie's Java library `iText', version 0.37. Generally, only PDF standard fonts are used. [Q22]

A: I think that's the problem right there. PDF standard fonts include only ISOLatin1Encoding. That indeed doesn't include Polish characters. Helvetica does NOT include "Z with a dot accent". Standard fonts are 1-byte, not Unicode. Generating a PDF with Unicode characters is possible, but the program to do it has to be written COMPLETELY differently and is generally an order of magnitude more complex. Check if iText offers this option. Another possibility, again if iText offers it, is to use a non-Unicode font containing Polish characters, and embed it in the PDF.

Aandi Inston (quite@dial.pipex.com) [A37]

A: Yes, you can change the font-encoding so you can use the regular codepage with the embedded font, or with the base 14 fonts, or with Type 1 fonts with definitions for the glyph names, or the WGL Truetype fonts with the glyphs implemented for the codepage.

Thus, depending on what codepage it is, then there are font-encodings for that codepage in the PDF so that the unchanged text character data is used with the fonts.

Converting it to Unicode would be the step if the PDFs were ever used to be sources of Unicode PDF data in readers that were unaware how to reorder the glyphs in the glyph map from a non-standard encoding of a font in the PDF to Unicode, or to some other codepage. That is to say, software meant to search through them would have to understand the glyph mappings to the character values outside the regular printable ASCII characters.

Where the string elements' data like the glyphs can be in the native codepage, the elements in the navigational interface for the PDF reader applications require the PDFDocEncoding or UnicodeEncoding, those are predefined encodings.

For example, C code that writes the font encoding differences from the predefined WinAnsiEncoding for codepage 1250 looks like this:

 
#include <stdio.h>

/* Writes the /Differences entries that remap WinAnsiEncoding to CP1250.
   Returns the number of bytes written. */
extern long writePDFFontEncodingDifferencesCentralEurope(FILE* fd){
//CP1250

        long written=0;

        written+=fwrite("128 /Euro ", 1, 10, fd);
        written+=fwrite("140 /Sacute /Tcaron /Zcaron /Zacute ", 1, 36, fd);
        written+=fwrite("156 \n/sacute /tcaron /zcaron /zacute ", 1, 37, fd);
        written+=fwrite("161 /caron /breve /Lslash ", 1, 26, fd);
        written+=fwrite("165 /Aogonek ", 1, 13, fd);
        written+=fwrite("170 /Scedilla ", 1, 14, fd);
        written+=fwrite("\n175 /Zdotaccent ", 1, 17, fd);
        written+=fwrite("178 /ogonek /lslash ", 1, 20, fd);
        written+=fwrite("185 /aogonek /scedilla ", 1, 23, fd);
        written+=fwrite("188 /Lcaron /hungarumlaut \n/lcaron /zdotaccent /Racute ", 1, 55, fd);
        written+=fwrite("195 /Abreve ", 1, 12, fd);
        written+=fwrite("197 /Lacute /Cacute ", 1, 20, fd);
        written+=fwrite("200 /Ccaron ", 1, 12, fd);
        written+=fwrite("202 /Eogonek ", 1, 13, fd);
        written+=fwrite("\n204 /Ecaron ", 1, 13, fd);
        written+=fwrite("207 /Dcaron /Dslash ", 1, 20, fd);
        /* length corrected from 15 to 35 so the whole string is written */
        written+=fwrite("209 /Nacute /Ncaron /Ohungarumlaut ", 1, 35, fd);

        written+=fwrite("216 /Rcaron /Uring ", 1, 19, fd);
        written+=fwrite("219 /Uhungarumlaut ", 1, 19, fd);
        written+=fwrite("\n222 /Tcedilla ", 1, 15, fd);
        written+=fwrite("224 /racute ", 1, 12, fd);
        written+=fwrite("227 /abreve ", 1, 12, fd);
        written+=fwrite("229 /lacute /cacute /ccedilla /ccaron \n", 1, 39, fd);
        written+=fwrite("234 /eogonek ", 1, 13, fd);
        written+=fwrite("236 /ecaron ", 1, 12, fd);
        written+=fwrite("239 /dcaron /dstroke ", 1, 21, fd);
        written+=fwrite("241 /nacute /ncaron ", 1, 20, fd);
        written+=fwrite("245 \n/ohungarumlaut ", 1, 20, fd);
        written+=fwrite("248 /rcaron /uring ", 1, 19, fd);
        written+=fwrite("251 /uhungarumlaut ", 1, 19, fd);
        written+=fwrite("254 /tcedilla /dotaccent ", 1, 25, fd);

        return(written);

}
 

Your data might be in a different codepage, on UNIX systems probably one of the ISO 8859 codepages or ASCII codepages, on Windows the Windows codepages, and on Macintosh the Mac codepages. The PDF has the predefined encodings for those types of codepages, basically, then writing the font encoding differences remaps the characters back to their glyphs in the font in the native codepage.

That way, you can use the extended Western glyphs in the base 14 fonts. The extended CJK fonts are using the composite Type 0 fonts. For example, the Big 5 and JIS encodings are for much of Chinese or Japanese, respectively, where their non-Western characters are not in the Western codepages. Basically that is the difference between a single byte codepage which has enough characters for almost all of the glyphs of a Western script, and the multiple byte codepages to support more than 256 glyphs. The single byte codepages are mostly having around 220 glyphs, in the codepages, not of their script.

Overall, it is probably best to convert it to Unicode, what is required then is the knowledge of the Unicode fonts. The Unicode fonts can be very large, for example a single Unicode font has tens of thousands of characters in it, where a regular font file has less than a thousand. So, feasibly, it might be better to write the font encoding differences into the PDF generated from the Java. In a different case, it might be more feasible to convert the data to Unicode and then embed a subset of the font into the PDF, where it would only have as many characters as are glyphs in the data, which is for Western languages probably less than a hundred, not the complete font.

So, for Western languages and single-byte codepages, the most direct way to put the text data into a PDF is using the base 14 fonts with the font encoding differences. This is very convenient, for example, to have the Euro glyph without Unicode, with the font encoding differences mapping it to 128, and to remap Zcaron and zcaron to 142 and 158, where those are the names of the glyphs that are used in the font encoding.

The names of the glyphs are in the Adobe PostScript glyph lists, and there are representations for each of those I think in Unicode. Information about this is from Unicode and Adobe.

Then, there are the strings that have to be the predefined PDFDocEncoding or UnicodeEncoding, those as part of navigational elements.

Ross Finlayson (apex@calpha.com) [A38]
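
The differences list written by the C function above is the body of a /Differences array inside an encoding dictionary. As a rough sketch of how such an array could be generated from a table rather than hand-counted strings, here is a Python version. The function names are illustrative assumptions, and only a few CP1250 entries taken from the C code are shown.

```python
def differences_array(code_to_glyph):
    """Format a {code: glyph_name} mapping as a PDF /Differences array.

    Consecutive codes share one leading number, as the PDF syntax allows.
    """
    parts = []
    previous = None
    for code in sorted(code_to_glyph):
        if previous is None or code != previous + 1:
            parts.append(str(code))      # start a new run at this code
        parts.append("/" + code_to_glyph[code])
        previous = code
    return "[" + " ".join(parts) + "]"


def encoding_dictionary(code_to_glyph, base="WinAnsiEncoding"):
    """Wrap the differences in an /Encoding dictionary on a base encoding."""
    return (f"<< /Type /Encoding /BaseEncoding /{base} "
            f"/Differences {differences_array(code_to_glyph)} >>")


# A few CP1250 positions taken from the C example above.
cp1250_sample = {128: "Euro", 140: "Sacute", 141: "Tcaron",
                 142: "Zcaron", 143: "Zacute", 175: "Zdotaccent"}
print(encoding_dictionary(cp1250_sample))
```

The font dictionary then refers to this dictionary through its /Encoding entry. Whether the glyphs actually render still depends on the font containing them, as the first answer points out.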

16. Raster vs Vector images?

Q: I've been looking at the PDF file in a text editor and found a number of /Subtype /Image objects. Are these raster or vector images? [Q23]

A: These are raster images, but all the vector images are held in the page stream, as are text and some bitmap images, and you won't be able easily to see what they are.

Aandi Inston (quite@dial.pipex.com) [A39]

A: You are on the right track. Vector "images" are not images, but consist of PDF (Postscript) drawing commands. In an uncompressed PDF you can recognise them easily. In a compressed PDF I do not know of a way to tell text sections from vector sections, as they both consist of PS drawing commands. Maybe someone else knows more.

Arne (arnet@hpcvplnx.cv.hp.com) [A40]

17. Required font tables

Q: Can anyone verify which of these tables are required when embedding a true type font? I've got 3 sources that say 3 different things. OS/2 cmap cvt fpgm glyf head hhea hmtx loca maxp name post prep [Q79]

A: It depends on what you mean by embedding. You need to include these tables in your embedded file:

String[] tables = { "head", "hhea", "loca", "maxp", "cvt ", "prep", "glyf", "hmtx", "fpgm", "cmap" };

but you need to read some of the others (such as OS/2) to get values you need during the embedding process.

John Farrow (jfarrow@visualprogramxming.co.nz) [A133]

A: cvt, fpgm, and prep are definitely not required by Acrobat, although the PDF reference may specify otherwise.

Thomas Merz (tm@pdflib.com) [A134]
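
To check which of these tables a given font actually contains, the TrueType table directory can be read directly: a 12-byte header (with the table count at offset 4) is followed by one 16-byte record per table holding a 4-byte tag, checksum, offset and length, all big-endian. The Python sketch below does this for a single-font file. The function names are illustrative, the REQUIRED tuple is simply the list from the first answer above, and TrueType collections (.ttc) are not handled.

```python
import struct

# Table list taken from the first answer above; note the trailing
# space in "cvt ", since tags are always four bytes long.
REQUIRED = ("head", "hhea", "loca", "maxp", "cvt ",
            "prep", "glyf", "hmtx", "fpgm", "cmap")


def list_tables(font_bytes):
    """Return {tag: (offset, length)} from a TrueType table directory."""
    num_tables = struct.unpack_from(">H", font_bytes, 4)[0]
    tables = {}
    for i in range(num_tables):
        tag, _checksum, offset, length = struct.unpack_from(
            ">4sIII", font_bytes, 12 + 16 * i)
        tables[tag.decode("latin-1")] = (offset, length)
    return tables


def missing_tables(font_bytes, wanted=REQUIRED):
    """List the wanted tags that are absent from the font."""
    present = list_tables(font_bytes)
    return [tag for tag in wanted if tag not in present]


# Hand-built two-table directory standing in for a real font file.
fake_font = (struct.pack(">IHHHH", 0x00010000, 2, 32, 1, 0)
             + struct.pack(">4sIII", b"glyf", 0, 44, 0)
             + struct.pack(">4sIII", b"head", 0, 44, 0))
print(sorted(list_tables(fake_font)))
print(missing_tables(fake_font))
```

For a real font, read the .ttf file in binary mode and pass its bytes to missing_tables.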

18. Custom fields in the info dictionary

Q: Has anybody out there added their own custom Info Dictionary fields? Does Acrobat Reader's "doc info" command show them? Does Exchange etc. interfere with them? I noticed that the spec recommends adding your own custom fields to the "catalogue" object, but I intend adding them to the Info Dictionary in the hope that Verity will index them. [Q134]

A: You may add any key value pairs to the DOCINFO dictionary, and Verity will index them (if properly configured). But Acrobat (reader) only shows the predefined ones.

H.Blischke@srz-berlin.de (Helge Blischke) [A205]

19. CMYK to RGB conversion

Q: What is the correct algorithm for converting between colour spaces? [Q137]

A: The problem is that Ghostscript uses Adobe's documented method of converting CMYK to RGB. In this method, for instance, 100% magenta + 100% yellow simply becomes 100% red. This is optically correct, but it is not a good approximation of a printer, and not what Acrobat does. Adding colour management, perhaps to match a profile, and ideally using embedded source profiles from the PDF, would not be a simple undertaking. It may be worth posting a new topic specifically asking about colour management of CMYK conversion in Ghostscript so that the question reaches the largest number of people. I know you're not the first to be interested in this area.

Aandi Inston (quite@dial.pipex.com) [A211]
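
For reference, here is a minimal sketch of the simple conversion, assuming the "documented method" means the DeviceCMYK to DeviceRGB formula in the PDF Reference (each RGB component is 1 minus the sum of the matching ink and black, clamped at 0). The function name cmykToRgb is made up for illustration; no colour management is involved.

```javascript
// Naive DeviceCMYK -> DeviceRGB conversion. All components are in 0..1.
// This is the simple arithmetic method, not a profile-based conversion.
function cmykToRgb(c, m, y, k) {
  const r = 1 - Math.min(1, c + k);
  const g = 1 - Math.min(1, m + k);
  const b = 1 - Math.min(1, y + k);
  return [r, g, b];
}
```

With this formula cmykToRgb(0, 1, 1, 0) gives [1, 0, 0], the "100% magenta + 100% yellow becomes 100% red" case described in the answer.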

20. JavaScript popup menus on hot spots

Q: I want to create WWW links in my PDF documents. The URL will start a Java request servlet on my webserver, and the result may be more than one document. Is it possible to create a list menu behind the hotspots, which displays the documents in a list so that the reader can select which one he/she wants to open? [Q138]

A: I have been playing with Named Destinations and the popUpMenu method in PDF JavaScript. Perhaps this JavaScript will lead you to a solution.

Here I had set up some named destinations, each named by appending one of a standard set of suffixes to a variable name.

Grab the name of the hotspot and use it to build a destination name. Use popUpMenu to have the user select the suffix they want (in this case it's always "removal").

Call gotoNamedDest to go there.

  
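// Name of the hotspot (field) that triggered this script.
var itemname = event.target.name;

var xreply;
// An array argument is shown as a submenu: the first element is the
// parent item, the remaining elements are the choices.
var reply = app.popUpMenu( [ itemname+"removal",
    "Removal",
    "Installation",
    "Operation",
    "Part Info",
    "Loc on Dwg",
    "Loc on Schematic",
    "Detail Dwg or Picture"] );

if (reply != null)
{
    // Map the chosen label to a destination suffix. Every label maps to
    // "removal" here; substitute the other suffixes as they are set up.
    switch ( reply ) {
    case "Removal":
         xreply = "removal";
         break;
    case "Installation":
         xreply = "removal";
         break;
    case "Operation":
         xreply = "removal";
         break;
    case "Part Info":
         xreply = "removal";
         break;
    case "Loc on Dwg":
         xreply = "removal";
         break;
    case "Loc on Schematic":
         xreply = "removal";
         break;
    case "Detail Dwg or Picture":
         xreply = "removal";
         break;
    }

    // Destination name = hotspot name + suffix.
    var args = itemname + xreply;

    this.gotoNamedDest(args);
}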
 

John Freund (jfreund@freundassociates.com) [A212]

21. Internationalization of accented characters

Q: Does anybody have information about using German umlauts and other European special characters in PDF documents? [Q144]

A: I think that using fonts with WinAnsiEncoding will solve your problem. It seems to correspond to the ISO Latin 1 character set (could somebody please confirm this?). So you can use accented characters directly in PDF text commands. I'm using them for Portuguese words. For example:

  
6 0 obj
<<
  /Type /Font 
  /Subtype /Type1 
  /Encoding /WinAnsiEncoding 
  /BaseFont /Helvetica
>>
endobj
8 0 obj
<<
  /Type /Font 
  /Subtype /Type1 
  /Encoding /WinAnsiEncoding 
  /BaseFont /Helvetica-Bold
>>
endobj
10 0 obj
<</Length 11 0 R>>
stream
0 g
BT
1.0204 0 0 1.0237 368.5677 641.0251 Tm
/HB 4.081 Tf     % set Helvetica-Bold font
(CLASSIFICAÇÃO)Tj
ET
BT
181.548 703.3136 TD
(Freqüência)Tj
ET
endstream
endobj
 

Jose Fernando Tepedino (jose@wiser.com.br) [A224]
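
If the PDF file itself has to stay 7-bit clean, the same text can be written with octal escapes instead of raw bytes. The following is a minimal sketch, not part of the answer above; the function name toPdfLiteral is made up for illustration. It assumes the font uses WinAnsiEncoding and only handles printable ASCII and the range U+00A0 to U+00FF, where the WinAnsi byte value equals the Unicode code point.

```javascript
// Sketch: build a PDF literal string for text shown with a WinAnsiEncoding font.
// Covers U+0020..U+007E and U+00A0..U+00FF only; anything else is rejected.
function toPdfLiteral(text) {
  let out = "(";
  for (const ch of text) {
    const code = ch.codePointAt(0);
    if (ch === "(" || ch === ")" || ch === "\\") {
      out += "\\" + ch; // escape string delimiters and the backslash itself
    } else if (code >= 0x20 && code <= 0x7e) {
      out += ch; // printable ASCII passes through unchanged
    } else if (code >= 0xa0 && code <= 0xff) {
      out += "\\" + code.toString(8).padStart(3, "0"); // three-digit octal escape
    } else {
      throw new RangeError("not covered by this sketch: U+" + code.toString(16));
    }
  }
  return out + ")";
}
```

For example, toPdfLiteral("Freqüência") returns (Freq\374\352ncia), which can be followed directly by Tj in a content stream.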

22. Internationalization of fonts

Q: I'm writing a program that has PDF as its output format. Now I need to display text in Cyrillic letters, and I have no idea how to do that. Can you help me, or at least point me to some documentation? [Q154]

A: It's all in the PDF Reference, which I assume you already have. The main problem is that the built-in fonts in Acrobat don't cover Cyrillic, so you'll have to embed an external font or at least refer to it. If you are not yet discouraged, I must tell you that you'll also have to peek at the font to get the character widths, which is rather easy with a Type 1 AFM file and not so easy with a TrueType file.

Here are a couple of dictionaries created with iText for a Cyrillic TrueType font without embedding:

 


6 0 obj
<<
/Ascent 728
/Flags 32
/StemV 80
/ItalicAngle 0
/Type /FontDescriptor
/FontName /ArialMT
/CapHeight 699
/FontBBox [-222 -324 1071 1037]
/Descent -210
>>
endobj
3 0 obj
<<
/FirstChar 32
/BaseFont /ArialMT
/FontDescriptor 6 0 R
/Encoding <<
/Differences [32 /space /exclam /quotedbl /numbersign /dollar /percent
/ampersand /quotesingle /parenleft /parenright /asterisk /plus /comma
/hyphen /period /slash /zero /one /two /three /four /five /six /seven
/eight /nine /colon /semicolon /less /equal /greater /question /at /A
/B /C /D /E /F /G /H /I /J /K /L /M /N /O /P /Q /R /S /T /U /V /W /X
/Y /Z /bracketleft /backslash /bracketright /asciicircum /underscore
/grave /a /b /c /d /e /f /g /h /i /j /k /l /m /n /o /p /q /r /s /t /u
/v /w /x /y /z /braceleft /bar /braceright /asciitilde /.notdef
/afii10051 /afii10052 /quotesinglbase /afii10100 /quotedblbase
/ellipsis /dagger /daggerdbl /Euro /perthousand /afii10058
/guilsinglleft /afii10059 /afii10061 /afii10060 /afii10145 /afii10099
/quoteleft /quoteright /quotedblleft /quotedblright /bullet /endash
/emdash /.notdef /trademark /afii10106 /guilsinglright /afii10107
/afii10109 /afii10108 /afii10193 /space /afii10062 /afii10110
/afii10057 /currency /afii10050 /brokenbar /section /afii10023
/copyright /afii10053 /guillemotleft /logicalnot /hyphen /registered
/afii10056 /degree /plusminus /afii10055 /afii10103 /afii10098 /mu
/paragraph /periodcentered /afii10071 /afii61352 /afii10101
/guillemotright /afii10105 /afii10054 /afii10102 /afii10104 /afii10017
/afii10018 /afii10019 /afii10020 /afii10021 /afii10022 /afii10024
/afii10025 /afii10026 /afii10027 /afii10028 /afii10029 /afii10030
/afii10031 /afii10032 /afii10033 /afii10034 /afii10035 /afii10036
/afii10037 /afii10038 /afii10039 /afii10040 /afii10041 /afii10042
/afii10043 /afii10044 /afii10045 /afii10046 /afii10047 /afii10048
/afii10049 /afii10065 /afii10066 /afii10067 /afii10068 /afii10069
/afii10070 /afii10072 /afii10073 /afii10074 /afii10075 /afii10076
/afii10077 /afii10078 /afii10079 /afii10080 /afii10081 /afii10082
/afii10083 /afii10084 /afii10085 /afii10086 /afii10087 /afii10088
/afii10089 /afii10090 /afii10091 /afii10092 /afii10093 /afii10094
/afii10095 /afii10096 /afii10097]
/Type /Encoding
>>
/LastChar 255
/Subtype /TrueType
/Widths [277 277 354 556 556 889 666 190 333 333 389 583 277 333 277
277 556 556 556 556 556 556 556 556 556 556 277 277 583 583 583 556
1015 666 666 722 722 666 610 777 722 277 500 666 556 833 722 777 666
777 722 666 610 722 666 943 666 666 610 277 277 277 469 556 333 556
556 500 556 556 277 556 556 222 222 500 222 833 556 556 556 556 333
500 277 556 500 722 500 500 500 333 259 333 583 0 864 541 222 364 333
1000 556 556 556 1000 1057 333 1010 582 854 718 556 222 222 333 333
350 556 1000 0 1000 906 333 812 437 556 552 277 635 500 500 556 488
259 556 667 736 718 556 583 333 736 277 399 548 277 222 411 576 537
277 556 1072 510 556 222 666 500 277 666 656 666 541 677 666 923 604
718 718 582 656 833 722 777 718 666 722 610 635 760 666 739 666 916
937 791 885 656 718 1010 722 556 572 531 364 583 556 668 458 558 558
437 583 687 552 556 541 556 500 458 500 822 500 572 520 802 822 625
718 520 510 750 541]
/Type /Font
>>
endobj
 

Paulo Soares (psoares@ip.pt) [A237]
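
To show what the /FirstChar and /Widths entries are used for, here is a minimal sketch that measures a run of character codes. It is not iText code: the function name textWidth is made up for illustration, and WIDTHS holds only the first 34 entries of the array above (codes 32 to 65). Widths are in thousandths of a text space unit, so the total is divided by 1000 and scaled by the font size.

```javascript
// First 34 entries of the /Widths array above: character codes 32..65.
const FIRST_CHAR = 32;
const WIDTHS = [
  277, 277, 354, 556, 556, 889, 666, 190, 333, 333, 389, 583, 277, 333, 277,
  277, 556, 556, 556, 556, 556, 556, 556, 556, 556, 556, 277, 277, 583, 583,
  583, 556, 1015, 666];

// Width of a run of character codes at the given font size,
// in the same units as the font size (typically points).
function textWidth(codes, fontSize, firstChar = FIRST_CHAR, widths = WIDTHS) {
  let total = 0;
  for (const code of codes) {
    const w = widths[code - firstChar]; // index = code - FirstChar
    if (w === undefined) throw new RangeError("no width for code " + code);
    total += w;
  }
  return (total / 1000) * fontSize;
}
```

For instance, textWidth([65, 49], 12) adds the widths for codes 65 ("A", 666) and 49 ("1", 556) and scales the sum to a 12-point font, giving roughly 14.66 points.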

23. Signature sample code

Q: Can I get the wrapper code that builds a PKCS#7 object from the Raw Signature format? [Q160]

A: It's buried at the Adobe site at http://support.adobe.com/devsup/devsup.nsf/docs/50752.htm

Thomas Merz (tm@pdflib.com) [A245]

24. Illegal command m in text object

Q: Does anybody know why any .PDF files I make in Gymnast give me an "Illegal operation 'm' inside a text object" error message when I open them in Acrobat Reader? [Q176]

A: Because the m operation is not allowed (but tolerated up to version 4 of the reader) within a text object.

Herbert Kleebauer (klee@unibwm.de) [A269]

A: It's caused by a bug in the program producing the file.

Text-related instructions must be separate from graphics instructions. m is a graphics instruction (short for move) and should not appear in the PDF between a start-text and end-text instruction. When it does you get this message.

John Farrow (john.farrow@xmlpdf.com) [A270]
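
If you generate PDFs yourself, a rough check for this mistake is to scan the uncompressed content stream for path construction operators between BT and ET. The following is a minimal sketch under stated assumptions, not a full PDF parser: the function name pathOpsInsideText is made up for illustration, the stream is passed in as an already decompressed string, and inline images are not handled.

```javascript
// Path construction operators, which must not appear between BT and ET.
const PATH_OPS = new Set(["m", "l", "c", "v", "y", "h", "re"]);

// Return the path operators found inside text objects, in order of appearance.
function pathOpsInsideText(content) {
  const found = [];
  let inText = false;
  let i = 0;
  while (i < content.length) {
    const ch = content[i];
    if (ch === "(") {
      // Skip a literal string, honouring nesting and backslash escapes.
      let depth = 1;
      i++;
      while (i < content.length && depth > 0) {
        if (content[i] === "\\") i++;
        else if (content[i] === "(") depth++;
        else if (content[i] === ")") depth--;
        i++;
      }
    } else if (ch === "<" && content[i + 1] !== "<") {
      // Skip a hex string so its digits a-f are not read as operators.
      while (i < content.length && content[i] !== ">") i++;
    } else if (ch === "%") {
      // Skip a comment up to the end of the line.
      while (i < content.length && content[i] !== "\n" && content[i] !== "\r") i++;
    } else if (ch === "/") {
      // Skip a name such as /F1.
      i++;
      while (i < content.length && !/[\s()<>\[\]{}\/%]/.test(content[i])) i++;
    } else if (/[A-Za-z'"]/.test(ch)) {
      // Read an operator or keyword.
      let j = i;
      while (j < content.length && /[A-Za-z0-9*'"]/.test(content[j])) j++;
      const word = content.slice(i, j);
      if (word === "BT") inText = true;
      else if (word === "ET") inText = false;
      else if (inText && PATH_OPS.has(word)) found.push(word);
      i = j;
    } else {
      i++; // whitespace, digits, signs, brackets
    }
  }
  return found;
}
```

A stream such as "BT /F1 12 Tf 10 20 m (text) Tj ET" is reported with one offending "m", while the same "m" placed before BT or after ET is not reported.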