PDF/A: The Archival PDF Standard

How ISO 19005 defines a self-contained, reproducible PDF format suitable for long-term preservation — and what that means in practice.


What Is PDF/A?

PDF/A is an ISO standard (ISO 19005) that defines a constrained subset of PDF intended for long-term archiving of electronic documents. The "A" stands for archiving. The core principle of PDF/A is self-containment: a conforming document must carry everything needed to render it identically regardless of the environment or era in which it is opened. No external fonts, no references to external colour profiles, no dynamic content, no encryption that could prevent future access.

The standard was first published as ISO 19005-1 in 2005, based on PDF 1.4. It has since been extended through multiple parts to accommodate newer PDF capabilities. PDF/A is now one of the most widely adopted document standards in the world, required by archives, courts, government agencies, and regulated industries across dozens of countries.

Why PDF/A Matters

Standard PDF files contain many features that create dependencies on external resources or specific software behaviour. A PDF might reference fonts installed on the author's system but not embedded in the file, use proprietary colour spaces, or rely on JavaScript to render correctly. A decade later, those fonts may not exist, the proprietary colour space may be undefined, and the JavaScript API may have changed. The document may then render incorrectly, or not at all.

PDF/A addresses this by prohibiting or mandating specific features so that any conforming reader — including one built decades from now — can accurately reproduce the document's appearance and text content. This matters enormously for:

  • Legal and court records: Many jurisdictions require electronic filings in PDF/A format. In the EU, eIDAS and associated technical standards reference PDF/A for electronic signatures on archived documents.
  • Government archives: National archives in Germany, the US, the UK, and many others mandate or recommend PDF/A for permanent electronic records.
  • Healthcare and regulated industries: Long retention periods for patient records, clinical trial data, and financial records benefit from a format that does not become unreadable as software evolves.
  • Libraries and cultural heritage: Digital preservation of books, journals, and manuscripts requires stable, reproducible formats.

Conformance Levels

PDF/A has accumulated multiple conformance levels across its four published parts. Understanding which level applies to a given use case is important:

PDF/A-1 (ISO 19005-1, based on PDF 1.4)

  • PDF/A-1b (Basic): Ensures reliable visual reproduction of the document. All fonts must be embedded, colour must be device-independent, and prohibitions on encryption, transparency, and JavaScript apply. This is the minimum level and the most commonly implemented.
  • PDF/A-1a (Accessible): All requirements of 1b plus a tagged logical structure (tagged PDF), Unicode character mappings for all text, and natural language specification. This enables text extraction, reflow, and accessibility. Significantly more demanding to produce correctly.

PDF/A-2 (ISO 19005-2, based on PDF 1.7)

  • PDF/A-2b: Extends -1b to PDF 1.7 features. Permits JPEG 2000 compression, transparency, layers (Optional Content Groups), and embedding of PDF/A-conforming file attachments.
  • PDF/A-2u: All requirements of 2b plus Unicode mappings for all text, but without the full logical structure required by 2a.
  • PDF/A-2a: Full accessible conformance equivalent to 1a but based on PDF 1.7.

PDF/A-3 (ISO 19005-3, based on PDF 1.7)

  • PDF/A-3b, 3u, 3a: Identical to the PDF/A-2 levels but with one significant relaxation: file attachments are no longer required to be PDF/A conforming. Any file type may be embedded as an attachment. This makes PDF/A-3 useful for hybrid formats like ZUGFeRD and Factur-X (invoice PDFs with embedded XML), where a machine-readable structured data file is attached to a human-readable archival PDF.

PDF/A-4 (ISO 19005-4, based on PDF 2.0)

  • PDF/A-4: Based on PDF 2.0. Removes the a/b/u level distinction — all PDF/A-4 documents require Unicode mappings. Transparency is fully supported. Encryption is still prohibited.
  • PDF/A-4e (Engineering): Allows 3D artwork (U3D and PRC) and rich media. Intended for engineering and technical documentation.
  • PDF/A-4f (with Files): Allows arbitrary embedded file attachments, similar to PDF/A-3.
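
The combinations above can be captured in a small lookup. The following is a minimal sketch in Python, not part of any PDF library: the names `VALID_LEVELS`, `Flavour`, `parse_flavour`, and `allows_arbitrary_attachments` are hypothetical, and the table is simply transcribed from the list above.

```python
from typing import NamedTuple

# Levels per part, transcribed from the list above.
# The empty string stands for plain PDF/A-4, which has no level suffix.
VALID_LEVELS = {
    1: {"a", "b"},
    2: {"a", "b", "u"},
    3: {"a", "b", "u"},
    4: {"", "e", "f"},
}


class Flavour(NamedTuple):
    part: int
    level: str


def parse_flavour(text: str) -> Flavour:
    """Parse a label such as 'PDF/A-2b', 'pdfa-4f', or '3u'."""
    label = text.strip().lower()
    for prefix in ("pdf/a-", "pdfa-"):
        if label.startswith(prefix):
            label = label[len(prefix):]
            break
    if not label or not label[0].isdigit():
        raise ValueError(f"not a PDF/A flavour: {text!r}")
    part, level = int(label[0]), label[1:]
    if level not in VALID_LEVELS.get(part, set()):
        raise ValueError(f"unknown part/level combination: {text!r}")
    return Flavour(part, level)


def allows_arbitrary_attachments(flavour: Flavour) -> bool:
    """True for the flavours described above as accepting any file type."""
    return flavour.part == 3 or flavour == Flavour(4, "f")
```

For example, `parse_flavour("PDF/A-2b")` yields `Flavour(part=2, level="b")`, while `parse_flavour("1u")` raises `ValueError` because PDF/A-1 defines no "u" level.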

What PDF/A Prohibits

To achieve self-containment, PDF/A imposes strict prohibitions on features that create external dependencies or unpredictable behaviour:

  • Encryption: Any form of password or certificate encryption is forbidden in all PDF/A levels. An encrypted document cannot guarantee future accessibility.
  • JavaScript: JavaScript actions are prohibited. A document's appearance must not depend on script execution.
  • Audio and video content: Multimedia annotations embedding sound or video are not permitted, as codecs and players cannot be assumed to exist in the future.
  • External content dependencies: OPI (Open Prepress Interface) comments and references to external resources that are not embedded are prohibited.
  • LZW compression: The LZWDecode filter is prohibited. PDF/A-1 introduced the ban because of historical patent concerns, and PDF/A-2 and PDF/A-3 retain it even though the patents have since expired.
  • Transparency (PDF/A-1 only): Transparency (PDF 1.4 transparency model) is prohibited in PDF/A-1 because PDF 1.4 transparency rendering is complex. PDF/A-2 and later permit transparency.
  • Non-embedded fonts: Every font used in the document must be embedded, either in full or as a subset. Even the Standard 14 fonts may not be referenced without embedding.
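
Some of these prohibitions leave recognisable name tokens in the file, which allows a coarse pre-check before running a real validator. The sketch below is a heuristic only, with hypothetical names (`SUSPECT_NAMES`, `quick_scan`). It searches the raw bytes, so it misses anything stored inside compressed object streams or written with escaped name characters, and it does not examine fonts or transparency at all. An empty result does not mean the file conforms.

```python
import re

# PDF name tokens associated with features from the list above.
# The selection is illustrative, not exhaustive.
SUSPECT_NAMES = {
    "Encrypt": "encryption",
    "JavaScript": "JavaScript action",
    "LZWDecode": "LZW compression",
    "Sound": "audio content",
    "Movie": "video content",
    "OPI": "OPI reference",
}


def quick_scan(pdf_bytes: bytes) -> list[str]:
    """Return labels for suspect name tokens found in the raw bytes."""
    findings = []
    for name, label in SUSPECT_NAMES.items():
        # The lookahead stops /Encrypt from matching /EncryptMetadata.
        pattern = rb"/" + re.escape(name.encode("ascii")) + rb"(?![A-Za-z0-9])"
        if re.search(pattern, pdf_bytes):
            findings.append(label)
    return findings


if __name__ == "__main__":
    sample = b"trailer\n<< /Root 1 0 R /Encrypt 5 0 R >>\n"
    print(quick_scan(sample))
```

A check like this is useful for triage in a batch pipeline, for instance to route obviously encrypted files to a separate queue, but the conformance decision itself belongs to a validator.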

What PDF/A Requires

Beyond prohibitions, PDF/A mandates several positive requirements:

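  • Embedded fonts: Every font used in the document must be embedded, either in full or as a subset containing the glyphs the document uses. At the 'a' and 'u' levels, text must also map to Unicode, which typically means supplying a ToUnicode mapping for fonts, including composite (CIDFont) fonts, whose encoding does not already imply one.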
  • Device-independent colour: All colour specifications must use calibrated or ICC-profiled colour spaces, or named colour spaces with embedded profiles. Device colour spaces (DeviceRGB, DeviceCMYK, DeviceGray) are only permitted if an output intent (DestOutputProfile) specifying the intended rendering condition is present in the document.
  • XMP metadata: The document must contain XMP metadata (an embedded XML stream) including the PDF/A conformance identifier — the part number and conformance level — declared in the pdfaid:part and pdfaid:conformance XMP properties.
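  • Consistent document metadata: Where a document title or other descriptive property appears in the document information dictionary, PDF/A-1 requires the equivalent XMP property to match it. Supplying a title is good archival practice, though the standard itself does not make one mandatory.
  • Logical structure and Unicode (for 'a' and 'u' levels): At the 'a' level, the document must be tagged with a complete, correct logical structure tree, and all text must have Unicode mappings. The 'u' level requires only the Unicode mappings, without the structure tree.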
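
The XMP identification described above can be read with an ordinary XML parser once the metadata stream has been extracted from the PDF. The sketch below assumes the packet is already available as a string. `SAMPLE_XMP` is an illustrative packet written for this example, and `read_pdfa_claim` is a hypothetical helper name. Note that the properties only record what the file claims; they do not prove conformance.

```python
import xml.etree.ElementTree as ET

PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Illustrative packet declaring PDF/A-2b.
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
      <pdfaid:part>2</pdfaid:part>
      <pdfaid:conformance>B</pdfaid:conformance>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""


def read_pdfa_claim(xmp_xml: str):
    """Return (part, conformance) as declared in the XMP, or None."""
    root = ET.fromstring(xmp_xml)
    part_tag = f"{{{PDFAID_NS}}}part"
    conf_tag = f"{{{PDFAID_NS}}}conformance"
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        # XMP permits simple properties as child elements or as attributes.
        part = desc.findtext(part_tag) or desc.get(part_tag)
        if not part:
            continue
        conformance = desc.findtext(conf_tag) or desc.get(conf_tag) or ""
        return part.strip(), conformance.strip().upper()
    return None


if __name__ == "__main__":
    print(read_pdfa_claim(SAMPLE_XMP))  # ('2', 'B')
```

The conformance value falls back to an empty string so that a packet declaring only a part number is still reported rather than rejected.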

Converting to PDF/A in Acrobat

Adobe Acrobat Pro provides a Preflight panel (Tools > Print Production > Preflight) with built-in PDF/A conversion profiles. To convert a document, open Preflight, select the appropriate PDF/A standard profile (for example, "Convert to PDF/A-2b"), and run the fixup. Acrobat will embed fonts, convert colour spaces, remove prohibited features, and add the required XMP metadata. It will report any issues it cannot automatically fix — such as transparency that requires manual flattening, or missing Unicode mappings for custom-encoded fonts.

The Preflight panel can also be used to validate (not just convert) an existing document against a PDF/A profile, producing a detailed report of any non-conformance issues.

PDF/A-3 vs PDF/A-2: Embedded Attachments

The most practically significant difference between PDF/A-2 and PDF/A-3 is the treatment of file attachments. PDF/A-2 only allows attaching files that are themselves PDF/A-conforming. PDF/A-3 removes this restriction, allowing any file type — XML, CSV, spreadsheets, images — to be embedded as an attachment. This makes PDF/A-3 the basis for hybrid invoice formats: the PDF/A-3 file is the archival human-readable invoice, while the embedded XML attachment is the machine-readable structured data used for automated processing. The combination is both archival and interoperable with automated accounts-payable systems.

Validation Tools

Beyond Acrobat Preflight, several dedicated validation tools are available. veraPDF is the industry-standard open-source PDF/A validator, maintained by the PDF Association and the Open Preservation Foundation, and used by national archives worldwide. It validates against all PDF/A levels and produces detailed, standards-referenced reports. The PDF Association also maintains a test suite of conforming and non-conforming documents for calibrating validators.
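
As a rough illustration of scripted validation, the sketch below calls the veraPDF command-line tool from Python. It assumes a `verapdf` launcher is installed and on the PATH, and that the installed release accepts the `--flavour` and `--format` options. Option names and accepted values have varied between releases, so check `verapdf --help` before relying on them. The helper names `verapdf_command` and `run_verapdf` are hypothetical.

```python
import subprocess


def verapdf_command(pdf_path: str, flavour: str = "2b",
                    executable: str = "verapdf") -> list[str]:
    """Build the argument list for a single-file validation run."""
    # Assumed options: --flavour picks the PDF/A profile, --format the report type.
    return [executable, "--flavour", flavour, "--format", "xml", pdf_path]


def run_verapdf(pdf_path: str, flavour: str = "2b") -> subprocess.CompletedProcess:
    """Run the validator and return the completed process with captured output."""
    return subprocess.run(
        verapdf_command(pdf_path, flavour),
        capture_output=True,
        text=True,
        check=False,  # inspect the report rather than raising on a non-zero exit
    )


if __name__ == "__main__":
    # "invoice.pdf" is a placeholder path for illustration.
    result = run_verapdf("invoice.pdf", flavour="3b")
    print(result.stdout)
```

Keeping the command construction separate from the execution makes it possible to log or unit-test the exact invocation without having the validator installed.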

For programmatic validation and conversion at scale, the Adobe PDF Library provides the underlying engine that powers Acrobat's own PDF/A capabilities, giving developers direct access to validation and conversion within custom workflows.

PDF/A Conversion and Validation at Scale

Mapsoft's Adobe PDF Library integration enables developers to build high-throughput PDF/A conversion, validation, and archiving workflows into their own applications.