More Information

Portable Document Format (PDF) is a file format created by Adobe Systems in 1993 for document exchange. PDF is used for representing two-dimensional documents in a manner independent of the application software, hardware, and operating system.
Each PDF file encapsulates a complete description of a fixed-layout 2-D document (and, with Acrobat 3-D, embedded 3-D documents), including the text, fonts, images, and 2-D vector graphics that make up the document.
PDF is an open standard that was officially published on July 1, 2008 by the ISO as ISO 32000-1:2008.
A PDF file consists primarily of objects, of which there are eight types:

  • Boolean values, representing true or false
  • Numbers
  • Strings
  • Names
  • Arrays, ordered collections of objects
  • Dictionaries, collections of objects indexed by Names
  • Streams, usually containing large amounts of data
  • The Null object
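
The eight object types above can be sketched as a small serializer from Python values to PDF syntax. This is a minimal illustration, not a complete writer: the names `Name`, `to_pdf`, and `stream` are hypothetical, Names containing special characters and non-Latin-1 Strings are not handled, and stream data is left uncompressed.

```python
class Name(str):
    """Hypothetical wrapper marking a value as a PDF Name rather than a String."""


def to_pdf(obj):
    """Serialize a Python value to PDF object syntax (sketch, returns bytes)."""
    if obj is None:
        return b"null"                                  # the Null object
    if isinstance(obj, bool):                           # bool before int: bool subclasses int
        return b"true" if obj else b"false"
    if isinstance(obj, int):
        return str(obj).encode("ascii")
    if isinstance(obj, float):                          # fixed-point text, no exponent form
        return f"{obj:.4f}".rstrip("0").rstrip(".").encode("ascii")
    if isinstance(obj, Name):                           # Name before str: Name subclasses str
        return b"/" + obj.encode("ascii")
    if isinstance(obj, str):                            # literal String with basic escaping
        escaped = obj.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        return b"(" + escaped.encode("latin-1") + b")"
    if isinstance(obj, list):                           # Array: ordered collection
        return b"[" + b" ".join(to_pdf(x) for x in obj) + b"]"
    if isinstance(obj, dict):                           # Dictionary: keys are Names
        items = b" ".join(to_pdf(Name(k)) + b" " + to_pdf(v) for k, v in obj.items())
        return b"<< " + items + b" >>"
    raise TypeError(f"no PDF representation for {type(obj).__name__}")


def stream(data, extra=None):
    """Stream: a dictionary carrying /Length, followed by the raw bytes."""
    entries = {"Length": len(data), **(extra or {})}
    return to_pdf(entries) + b"\nstream\n" + data + b"\nendstream"


print(to_pdf({"Type": Name("Page"), "MediaBox": [0, 0, 612, 792]}).decode())
# prints: << /Type /Page /MediaBox [0 0 612 792] >>
```

The order of the `isinstance` checks matters in this sketch because Python's `bool` is a subclass of `int` and the hypothetical `Name` is a subclass of `str`.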

Objects may be either direct (embedded in another object) or indirect. Indirect objects are numbered with an object number and a generation number. An index table called the xref table gives the byte offset of each indirect object from the start of the file. This design allows for efficient random access to the objects in the file, and also allows for small changes to be made without rewriting the entire file (incremental update). Beginning with PDF version 1.5, indirect objects may also be located in special streams known as object streams. This technique reduces the size of files that have large numbers of small indirect objects and is especially useful for Tagged PDF.
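
To make the role of the xref table concrete, the following sketch builds a tiny PDF in memory and then uses the table to jump straight to each indirect object. It assumes a single classic xref section with one subsection (no incremental updates, no object streams or xref streams); `build_pdf` and `read_xref` are hypothetical helper names, and the two object bodies are placeholders rather than a displayable document.

```python
def build_pdf(bodies):
    """Assemble indirect objects plus an xref table; bodies[i] becomes object i+1."""
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(bodies, start=1):
        offsets.append(len(out))                       # byte offset from start of file
        out += b"%d 0 obj\n%s\nendobj\n" % (num, body)  # object number, generation 0
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(bodies) + 1)
    out += b"0000000000 65535 f \n"                    # entry 0 heads the free list
    for off in offsets:
        out += b"%010d %05d n \n" % (off, 0)           # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(bodies) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


def read_xref(pdf):
    """Return {object_number: byte_offset} for the in-use entries."""
    pos = pdf.rfind(b"startxref")                      # the last one wins in a real file
    start = int(pdf[pos + len(b"startxref"):].split()[0])
    lines = pdf[start:].split(b"\n")
    if lines[0] != b"xref":
        raise ValueError("not a classic xref table")
    first, count = map(int, lines[1].split())
    table = {}
    for i in range(count):
        offset, _generation, kind = lines[2 + i].split()
        if kind == b"n":                               # 'n' = in use, 'f' = free
            table[first + i] = int(offset)
    return table


pdf = build_pdf([
    b"<< /Type /Catalog /Pages 2 0 R >>",
    b"<< /Type /Pages /Kids [] /Count 0 >>",
])
table = read_xref(pdf)
for num, offset in table.items():
    # Random access: seek directly to the object without scanning the file.
    print(num, offset, pdf[offset:offset + 7])
```

An incremental update would append new objects, a new xref section, and a new `startxref` to the end of the file, which is why the sketch searches for the last `startxref` rather than the first.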
PDF files come in two layouts: non-linear (not “optimized”) and linear (“optimized”). Non-linear PDF files consume less disk space than their linear counterparts, but they are slower to access because the data required to assemble each page is scattered throughout the file. Linear PDF files (also called “optimized” or “web optimized” PDF files) are written to disk in page order, so a Web browser plug-in can begin displaying them without waiting for the entire file to download. PDF files may be optimized using pdfopt, which is part of GPL Ghostscript.
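
A rough way to tell the two layouts apart is sketched below. It assumes, following the linearization rules in ISO 32000-1, that a linear file begins with a parameter dictionary containing a `/Linearized` key within the first 1024 bytes. The function names are hypothetical, and this is a heuristic only: a file that was linearized and later changed by incremental update still carries the key.

```python
def looks_linearized(data: bytes) -> bool:
    """Heuristic: is a /Linearized key present near the start of the file?"""
    return b"/Linearized" in data[:1024]


def file_looks_linearized(path: str) -> bool:
    """Read only the head of the file at `path` and apply the same check."""
    with open(path, "rb") as handle:
        return looks_linearized(handle.read(1024))


linear_head = b"%PDF-1.4\n1 0 obj\n<< /Linearized 1 /L 1234 >>\nendobj\n"
plain_head = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
print(looks_linearized(linear_head), looks_linearized(plain_head))
# prints: True False
```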

A PDF file is often a combination of vector graphics, text, and raster graphics. The basic types of content in a PDF are:

  • text stored as such
  • vector graphics for illustrations and designs that consist of shapes and lines
  • raster graphics for photographs and other types of image

Later PDF revisions also support links (within the document or to web pages), forms, JavaScript (initially available as a plug-in for Acrobat 3.0), and other types of embedded content that can be handled using plug-ins.
PDF 1.6 supports interactive 3D documents embedded in the PDF.
Two PDF files that look similar on a computer screen may be of very different sizes. For example, a high-resolution raster image takes more space than a low-resolution one, and printing typically requires a higher resolution than on-screen display. Other factors that may increase file size are embedding full fonts, especially for Asian scripts, and storing text as graphics.
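
The effect of resolution on size can be worked through with a short calculation. The figures below assume a full-page US Letter image (8.5 × 11 inches) stored as uncompressed 24-bit RGB; real PDF image streams are usually compressed, so actual sizes are smaller, but the ratio between resolutions is the point. `raw_image_bytes` is a hypothetical helper name.

```python
def raw_image_bytes(width_in, height_in, dpi, bytes_per_pixel=3):
    """Uncompressed size of a raster image covering the given area."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


screen = raw_image_bytes(8.5, 11, 72)    # typical screen resolution
printed = raw_image_bytes(8.5, 11, 300)  # typical print resolution

print(f"{screen:,} bytes at 72 dpi")     # prints: 1,454,112 bytes at 72 dpi
print(f"{printed:,} bytes at 300 dpi")   # prints: 25,245,000 bytes at 300 dpi
print(f"ratio: {printed / screen:.1f}x")  # prints: ratio: 17.4x
```

Size grows with the square of the resolution, so moving from 72 dpi to 300 dpi multiplies the raw data by (300/72)², roughly 17 times.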

There are fourteen typefaces that have a special significance to PDF documents:

  • Times (v3) or Times Roman PS MT (v4.x) (in regular, italic or oblique, bold, and bold italic)
  • Courier (in regular, italic or oblique, bold and bold italic)
  • Helvetica (v3) or Arial MT (v4.x) (in regular, italic or oblique, bold and bold italic)
  • Symbol
  • Zapf Dingbats

These fonts, or close substitutes for them, should always be available to a PDF viewer, so they need not be embedded in a PDF. PDF viewers must know the metrics of these fonts. Any other font that is not embedded in a PDF may be replaced by a substitute when the file is displayed.
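
As a small sketch, the fourteen typefaces can be listed under the base font names commonly used for them in PDF font dictionaries, together with a check for whether a given font falls in that set. The helper `is_standard_14` is a hypothetical name, and treating any match as "embedding not required" is a simplifying assumption.

```python
STANDARD_14 = frozenset({
    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
    "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
    "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
    "Symbol", "ZapfDingbats",
})


def is_standard_14(base_font: str) -> bool:
    """True if the name (with or without a leading '/') is one of the fourteen."""
    return base_font.lstrip("/") in STANDARD_14


print(len(STANDARD_14))                  # prints: 14
print(is_standard_14("/Helvetica-Bold"))  # prints: True
print(is_standard_14("Garamond"))         # prints: False
```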

