Exploring the PDF Format Specification

Updates on the PDF format spec, the latest errata collection for PDF 2.0, and what it means for developers and document professionals.

The Portable Document Format specification continues to evolve, offering new capabilities and greater clarity for software developers and end users alike. Recently, the PDF Association has published ISO 32000-2:2020, the second edition of the specification commonly referred to as PDF 2.0, together with a comprehensive collection of errata and amendments. See our PDF 1.7 vs 2.0 comparison for a detailed look at what changed between versions.

You can also check your PDF version and specification online for free using Mapsoft's PDF Hub — no installation required.
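
For readers who prefer to check a file locally, the header version can be read with a few lines of Python. This is a minimal sketch, not a full version check: the function name and the 1024-byte search window are assumptions made for illustration, and the version in the header can be overridden by a Version entry in the document catalog, which this sketch does not read.

```python
import re
from typing import Optional, Tuple

# Header pattern: "%PDF-" followed by major.minor, e.g. %PDF-1.7 or %PDF-2.0.
HEADER_RE = re.compile(rb"%PDF-(\d)\.(\d)")

def header_version(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from the file header, or None if absent.

    Only the first 1024 bytes are searched, which tolerates a little
    leading junk; the window size is an assumption of this sketch.
    """
    match = HEADER_RE.search(data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

# A hypothetical in-memory sample; in practice read the bytes from a file.
sample = b"%PDF-2.0\n%\xe2\xe3\xcf\xd3\n1 0 obj\n<< >>\nendobj\n"
print(header_version(sample))
```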

What Is ISO 32000-2?

ISO 32000-2:2020 defines the global standard for representing electronic documents, ensuring compatibility across systems and platforms. This version builds upon its predecessor with corrections, updates, and expanded features tailored to the needs of modern digital document workflows.

The specification provides the technical framework for creating, processing, and rendering PDF files in a consistent and predictable manner across different software implementations and platforms.

Key Updates in PDF 2.0

The PDF 2.0 updates align with evolving industry needs, giving developers a robust framework for building more secure, interoperable, and efficient applications. Key areas of improvement include:

  • Enhanced Security: Support for 256-bit AES encryption and advanced digital signature mechanisms including PAdES (PDF Advanced Electronic Signatures).
  • Improved Accessibility: Enhanced tagged PDF structures that better support assistive technologies and compliance with accessibility standards.
  • Removal of Proprietary Features: Deprecated features such as XFA forms and Flash-based rich media annotations have been removed in favour of open standards.
  • Associated Files: Support for embedding related data and files within the PDF structure.
  • Clarifications: Numerous refinements based on years of implementation experience, reducing ambiguity for developers.

Errata and Continuous Improvement

The PDF Association has addressed numerous errata to enhance clarity and precision within the specification. Developers can view the latest resolved issues and track updates through the official errata repository. The amendments ensure that the specification remains a reliable reference for producing and processing conforming PDF documents.

The second collection of errata for PDF 2.0 features 265 corrections, addressing issues ranging from typographical errors to substantive technical clarifications. These corrections are essential for ensuring consistent implementation across different PDF tools and libraries.

Why ISO 32000-2 Matters

Adherence to the Portable Document Format Specification ensures consistent and predictable behaviour across PDF tools. From creating interactive forms to archiving critical documents, PDF 2.0 provides the technical foundation for reliable document exchange and long-term digital preservation.

Explore the full specification and stay updated with the latest changes by visiting the PDF Association website. Whether you're a developer, designer, or document manager, understanding the PDF Format Spec is key to leveraging the full potential of PDF technology.

Summary of the ISO 32000-2:2020 Specification

The ISO 32000-2:2020 specification, also known as PDF 2.0, is a comprehensive document detailing the technical framework and standards for creating and processing Portable Document Format (PDF) files. Below is an overview of its contents:

  • Document Structure: Defines the fundamental building blocks of a PDF file, including the file header, body, cross-reference table, and trailer.
  • Page Description: Specifies how page content is described, including text, graphics, images, and colour spaces.
  • Fonts: Covers font embedding, subsetting, and the various font formats supported by PDF 2.0.
  • Interactive Features: Describes annotations, form fields, actions, and JavaScript support for interactive documents.
  • Security: Details encryption, digital signatures, and permissions management.
  • Accessibility: Specifies tagged PDF structures, alternative text, and other features for document accessibility.
  • Transparency: Defines the transparency imaging model for compositing graphical elements.
  • Metadata: Covers XMP metadata, document information dictionaries, and associated files.

The annexes of the specification contain a good deal of supplementary material, including operator summaries, best practices for portability, and compatibility advice. This makes the document an important resource for both PDF developers and users.

From Adobe Proprietary to ISO Standard

PDF's journey from a proprietary Adobe format to an open international standard is worth understanding, because it explains a great deal about how the specification is structured and why some sections are clearer than others. Adobe first published the PDF specification in 1993 alongside Acrobat 1.0. For the next 15 years, Adobe controlled the format entirely — each version of Acrobat introduced a new PDF version with new features, and Adobe published the corresponding specification as a free PDF document. The version numbers ran from 1.0 through 1.7.

In 2008, Adobe submitted PDF 1.7 to ISO, which published it as ISO 32000-1:2008. This was a significant moment — PDF became a genuine open standard rather than an Adobe-controlled format. The PDF Association (formerly AIIM's PDF subcommittee, now an independent organisation) took on a coordination role, running working groups that contributed to subsequent development. PDF 2.0 (ISO 32000-2:2017, revised in 2020) was the first version developed under the full ISO process, with contributions from multiple vendors rather than just Adobe. The result is a more carefully specified document, though the sheer number of errata (265 corrections in the second errata set alone) tells you something about the complexity of the task.

How to Navigate the Specification

The full ISO 32000-2 specification runs to over 900 pages. Reading it cover to cover is neither practical nor particularly useful for most development work. The sections you actually need depend on what you are building.

For developers working with document structure — parsing or creating PDF files from scratch — the essential sections are the file structure (Section 7.5, covering header, body, cross-reference table, and trailer), the object types (Section 7.3, covering Boolean, integer, real, string, name, array, dictionary, stream, and null), and the PDF name tree and number tree structures. Understanding how indirect objects and object references work is fundamental to everything else.
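
The relationship between the trailer, the startxref keyword, and the cross-reference table can be sketched in Python. The example below builds a tiny file in memory so that the byte offsets are guaranteed to be correct, then reads them back. All function names are hypothetical, the sample is not a complete document (it has no pages), and the parser assumes a classic cross-reference table with "\n" line endings; files that use cross-reference streams need a different approach.

```python
import re
from typing import Dict

def build_minimal_pdf() -> bytes:
    """Assemble a tiny two-object file with a classic cross-reference table."""
    out = bytearray(b"%PDF-2.0\n")
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [] /Count 0 >>",
    ]
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))               # byte offset of "N 0 obj"
        out += b"%d 0 obj\n%s\nendobj\n" % (num, body)
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"            # entry 0 heads the free list
    for off in offsets:
        out += b"%010d 00000 n \n" % off       # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)

def find_startxref(data: bytes) -> int:
    """Return the byte offset recorded after the last 'startxref' keyword."""
    tail = data[-1024:]
    idx = tail.rfind(b"startxref")
    if idx < 0:
        raise ValueError("startxref keyword not found")
    match = re.match(rb"startxref\s+(\d+)", tail[idx:])
    if match is None:
        raise ValueError("no offset after startxref")
    return int(match.group(1))

def read_xref_table(data: bytes, pos: int) -> Dict[int, int]:
    """Parse one classic xref section; return {object number: offset} for in-use entries."""
    lines = data[pos:].split(b"\n")            # assumes "\n" line endings
    if lines[0].strip() != b"xref":
        raise ValueError("no xref keyword at offset (may be an xref stream)")
    in_use = {}
    i = 1
    while i < len(lines) and lines[i].strip() != b"trailer":
        first, count = map(int, lines[i].split())   # subsection header
        for k in range(count):
            offset, _gen, kind = lines[i + 1 + k].split()
            if kind == b"n":                   # "n" = in use, "f" = free
                in_use[first + k] = int(offset)
        i += 1 + count
    return in_use

pdf = build_minimal_pdf()
table = read_xref_table(pdf, find_startxref(pdf))
for num, off in sorted(table.items()):
    print(num, off, pdf[off:off + 7])
```

Reading from the end of the file first is the point of the design: a reader can locate any indirect object through the table without scanning the whole body.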

For page content and rendering work, Section 8 (Graphics) is the core reference. It defines the content stream operators for path painting, text rendering, image placement, and colour management. The operator summary in Annex A provides a quick reference once you know what you are looking for. Section 9 covers text in detail, including font types, glyph encodings, the text matrix, and the subtleties of character spacing, and it is genuinely complex. Type 3 fonts, CIDFonts, and the ToUnicode mapping necessary for text extraction all warrant careful reading.
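
Content streams use postfix notation: operands come first and the operator follows. A rough Python sketch of how a reader might group tokens into operations is shown below. It is deliberately simplified and the names are hypothetical: it assumes an already decoded stream, does not handle nested parentheses in strings, dictionaries, inline images, or comments, and treats every token that is not a string, name, array delimiter, or number as an operator.

```python
import re

TOKEN_RE = re.compile(
    rb"""
    \((?:\\.|[^\\()])*\)      # literal string without nested parentheses
  | <[0-9A-Fa-f\s]*>          # hexadecimal string
  | /[^\s/\[\]()<>{}%]*       # name
  | [\[\]]                    # array delimiters
  | [^\s/\[\]()<>{}%]+        # number or operator keyword
    """,
    re.VERBOSE,
)
NUMBER_RE = re.compile(rb"[+-]?(?:\d+\.?\d*|\.\d+)\Z")

def iter_operations(stream: bytes):
    """Yield (operator, operands) pairs from a simple content stream."""
    operands = []
    for match in TOKEN_RE.finditer(stream):
        tok = match.group(0)
        is_operand = (
            tok[:1] in (b"(", b"<", b"/", b"[", b"]")
            or NUMBER_RE.match(tok) is not None
        )
        if is_operand:
            operands.append(tok.decode("latin-1"))
        else:
            # An operator consumes all operands collected since the last one.
            yield tok.decode("latin-1"), operands
            operands = []

# BT/ET bracket a text object; Tf selects a font, Td moves, Tj shows text.
sample_stream = b"BT /F1 12 Tf 72 712 Td (Hello, PDF) Tj ET"
for operator, args in iter_operations(sample_stream):
    print(operator, args)
```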

For interactive features, Section 12 covers annotations, actions, form fields, and JavaScript. Section 7.6 covers encryption. Section 14 covers document information, metadata, tagged PDF, and accessibility structures. For PDF/A and PDF/UA compliance work, you will also need the relevant ISO standards — ISO 19005 for PDF/A, ISO 14289 for PDF/UA — as these are separate specifications that reference and constrain ISO 32000 rather than forming part of it.

Working with Errata

Errata matter more than many developers realise. The PDF specification, like any document of its complexity, has errors, ambiguities, and genuine technical mistakes that were caught after publication. The PDF Association publishes errata publicly, and the corrections range from typographical fixes to substantive clarifications that change how implementations should behave.

The second errata set for PDF 2.0 includes corrections to tables that omitted mandatory entries, fixes to incorrect cross-references between sections, clarifications to how certain action types should behave, and corrections to the Tagged PDF structure requirements. If you implement a feature against the original 2017 text and then encounter interoperability problems, the errata are the first place to look. The PDF Association maintains these at pdfa.org, and they are incorporated into the updated 2020 edition of the specification.

For practical implementation, the Matterhorn Protocol, maintained by the PDF Association, is also valuable for PDF/UA compliance work. It translates the abstract accessibility requirements of ISO 14289 into 31 checkpoints comprising 136 concrete, testable failure conditions, which makes it far more useful for implementation and testing than reading the normative standard alone.

Practical Tips for Developers

Experience with the specification teaches a few things that the document itself does not spell out clearly. First, "shall" versus "should" in ISO standards is not casual: "shall" marks a requirement that is mandatory for conformance, while "should" marks a recommendation that is not required. Reading these distinctions carefully prevents wasted effort implementing optional features as if they were required, and vice versa.

Second, the specification defines what a conforming file should contain, but also what a conforming reader should accept. These are different things, and in practice PDF files in the wild frequently violate the specification in minor ways. A robust PDF library cannot simply reject any file that deviates from the spec — it needs error recovery heuristics. This is why PDF parsing is harder than it looks from the specification alone.
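
One common recovery heuristic, when the cross-reference data is missing or points to the wrong place, is to scan the raw bytes for object headers and rebuild the offsets from what is found. The Python sketch below illustrates the idea only; the function name and sample bytes are hypothetical, and a real library would also have to guard against false matches inside stream data and handle objects stored in object streams.

```python
import re
from typing import Dict, Tuple

# "N G obj" at a token boundary: object number, generation, keyword.
OBJ_RE = re.compile(rb"(?<![0-9])(\d+)\s+(\d+)\s+obj\b")

def rebuild_offsets(data: bytes) -> Dict[Tuple[int, int], int]:
    """Map (object number, generation) to the offset of its last definition."""
    found = {}
    for match in OBJ_RE.finditer(data):
        key = (int(match.group(1)), int(match.group(2)))
        # Incremental updates append new versions of an object,
        # so a later definition replaces an earlier one.
        found[key] = match.start()
    return found

# A hypothetical damaged file: no xref table, no trailer, object 1 redefined.
damaged = (
    b"%PDF-1.7\n"
    b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    b"2 0 obj\n<< /Length 0 >>\nendobj\n"
    b"1 0 obj\n<< /Type /Catalog /Lang (en) >>\nendobj\n"
)
for (num, gen), off in sorted(rebuild_offsets(damaged).items()):
    print(num, gen, off)
```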

Third, the specification does not cover everything. It defines the file format, but not how applications should render it beyond broad conformance requirements. Font rendering quality, colour management precision, and rendering of overlapping transparent objects with complex blend modes are implementation concerns. The spec defines the transparency imaging model in detail, but "correctly implementing" it requires considerable additional engineering effort beyond reading Section 11.

Using Mapsoft's free online PDF analyser to inspect the internal structure of PDF files is a useful complement to spec reading — seeing how real-world PDFs from various generators implement features often illuminates what the specification text means in practice.

About Mapsoft and Our PDF Solutions

Mapsoft specialises in providing advanced PDF solutions, including a range of Adobe Acrobat plug-ins and custom software development services. We design our tools and services to streamline workflows, enhance productivity, and cater to the diverse needs of businesses dealing with PDF documents. From PDF library integrations to custom plug-ins, we deliver high-quality, cost-effective results tailored to your requirements. You can also try our free online PDF analyser tool to inspect the structure and properties of any PDF file.
