Understanding Tagged PDFs: Definition, Usage and Examples
A comprehensive guide to tagged PDF documents, their role in accessibility, and how to create them effectively.
What Is a Tagged PDF?
A tagged PDF is a PDF document that contains an underlying logical structure tree, sometimes called a "tag tree," which describes the organisation and semantic meaning of the document's content. Tags identify elements such as headings, paragraphs, lists, tables, images, and links, much like HTML tags describe the structure of a web page.
While a standard (untagged) PDF describes only the visual appearance of content on each page — where to place text, which fonts to use, how to draw graphics — a tagged PDF adds a layer of semantic information that describes what the content is, not just how it looks. This distinction is critical for accessibility, content reflow, and reliable text extraction.
Why Tagged PDFs Matter
Accessibility
The primary purpose of tagged PDF is to make documents accessible to people with disabilities. Screen readers and other assistive technologies rely on the tag structure to read content in the correct order, convey the document's logical hierarchy, and provide meaningful descriptions of non-text elements such as images.
Without tags, a screen reader can only extract raw text from the page content streams, which may not be in the correct reading order and provides no information about the document's structure. A visually impaired user would have no way of distinguishing a heading from body text, understanding table relationships, or knowing that an image is present.
Legal and Regulatory Requirements
Many jurisdictions and organisations require that electronic documents be accessible. Key regulations and standards include:
- Section 508 of the US Rehabilitation Act, which requires federal agencies to make electronic information accessible to people with disabilities.
- WCAG 2.1 (Web Content Accessibility Guidelines), which, while primarily focused on web content, is increasingly applied to PDF documents as well.
- PDF/UA (ISO 14289), the international standard for universally accessible PDF documents, which mandates the use of tags and specifies detailed requirements for how they must be structured.
- European Accessibility Act and the EN 301 549 standard, which require accessible ICT products and services across the European Union.
Content Reflow
Tagged PDFs enable content reflow, allowing the document to be displayed on screens of different sizes (such as mobile devices) by rearranging the content to fit the available space. Without tags, reflow is unreliable because the reading order and element relationships cannot be determined from the visual layout alone. For more on the broader topic of PDF accessibility, see our PDF accessibility guide.
Reliable Text Extraction and Repurposing
Tags make it possible to extract text from a PDF in the correct logical order, which is important for search indexing, copy-paste operations, and converting PDFs to other formats such as HTML, EPUB, or Word documents.
The Tag Structure
The tag tree in a tagged PDF is a hierarchy of structure elements, each identified by a structure type. The most common standard structure types defined in the PDF specification include:
Document-Level Tags
- Document: The root element of the tag tree, representing the entire document.
- Part: A large division of the document.
- Sect: A section within the document.
- Art: A self-contained article or body of content.
Block-Level Tags
- H, H1–H6: Headings at various levels, establishing the document's outline.
- P: A paragraph of text.
- L, LI, Lbl, LBody: Lists, list items, labels, and list item bodies.
- Table, TR, TH, TD: Tables, table rows, header cells, and data cells.
- BlockQuote: A block quotation.
- TOC, TOCI: Table of contents and table of contents items.
Inline-Level Tags
- Span: A generic inline element.
- Link: A hyperlink.
- Note: Explanatory text, such as a footnote or endnote, that is referred to from the body text.
- Reference: A citation or cross-reference.
- Code: Computer code.
- Em: Emphasis (typically rendered as italic).
- Strong: Strong emphasis (typically rendered as bold).
Illustration Tags
- Figure: An image, diagram, or other graphical content. Figures should have alternative text describing their content.
- Formula: A mathematical formula.
- Form: A widget annotation (form field).
Alternative Text and Attributes
Structure elements in a tagged PDF can carry several important attributes:
- Alt (Alternative Text): A textual description of a non-text element, such as an image. This is read by screen readers in place of the visual content.
- ActualText: The actual text represented by content that might be visually rendered in a non-standard way (for example, a ligature or decorative glyph).
- E (Expansion Text): The expansion of an abbreviation or acronym.
- Lang: The natural language of the content (for example, "en-GB" for British English), which helps screen readers use the correct pronunciation.
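Lang applies to an element and everything nested inside it unless a nested element sets its own value, so the language of any piece of content is resolved by walking up the tree. The sketch below illustrates that lookup on a hypothetical in-memory model; the `StructElem` class and `effective_langs` function are illustrative names, not part of any PDF library.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class StructElem:
    """Hypothetical in-memory model of one structure element (tag)."""
    tag: str
    lang: Optional[str] = None
    children: List["StructElem"] = field(default_factory=list)


def effective_langs(
    elem: StructElem, inherited: Optional[str] = None, path: str = ""
) -> Iterator[Tuple[str, Optional[str]]]:
    """Yield (path, language) pairs, inheriting Lang from the nearest ancestor."""
    lang = elem.lang or inherited
    here = f"{path}/{elem.tag}"
    yield here, lang
    for child in elem.children:
        yield from effective_langs(child, lang, here)


# A British English document containing one French block quotation.
doc = StructElem("Document", lang="en-GB", children=[
    StructElem("P"),
    StructElem("BlockQuote", lang="fr-FR", children=[StructElem("P")]),
])

for path, lang in effective_langs(doc):
    print(path, lang)
```

The paragraph inside the block quotation resolves to "fr-FR" while the top-level paragraph falls back to the document's "en-GB".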
Examples of Tagged PDF in Practice
Example 1: A Simple Document
Consider a simple document with a title, two paragraphs, and an image. The tag tree would look like:
- Document
  - H1: "Report Title"
  - P: "This is the first paragraph..."
  - Figure (Alt: "A bar chart showing quarterly sales")
  - P: "This is the second paragraph..."
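To make the role of the tree concrete, here is a minimal sketch of how an assistive technology might walk this example in tag-tree order. It is a simplification under stated assumptions: `StructElem` and `announce` are hypothetical names, and real screen readers work from the PDF's structure tree through platform accessibility APIs rather than from a model like this.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class StructElem:
    """Hypothetical in-memory model of one structure element (tag)."""
    tag: str
    text: str = ""
    alt: Optional[str] = None
    children: List["StructElem"] = field(default_factory=list)


def announce(elem: StructElem) -> Iterator[str]:
    """Depth-first walk in tag-tree order; figures contribute Alt, not text."""
    if elem.tag == "Figure":
        yield f"Figure: {elem.alt or '(no alternative text)'}"
    elif elem.text:
        yield f"{elem.tag}: {elem.text}"
    for child in elem.children:
        yield from announce(child)


report = StructElem("Document", children=[
    StructElem("H1", "Report Title"),
    StructElem("P", "This is the first paragraph..."),
    StructElem("Figure", alt="A bar chart showing quarterly sales"),
    StructElem("P", "This is the second paragraph..."),
])

lines = list(announce(report))
print("\n".join(lines))
```

Because the walk follows the tree rather than the page layout, the figure is announced between the two paragraphs regardless of where it sits visually.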
Example 2: A Table
For a table with headers and data, the tag structure would be:
- Table
  - TR
    - TH: "Product"
    - TH: "Price"
    - TH: "Quantity"
  - TR
    - TD: "Widget A"
    - TD: "$10.00"
    - TD: "500"
  - TR
The TH tags identify header cells, which allows assistive technologies to associate each data cell with its corresponding header when reading the table.
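The association between data cells and headers can be sketched as follows. This is a minimal illustration that assumes a simple table whose header row consists entirely of TH cells, with no merged cells or row headers; `StructElem`, `row`, and `read_rows` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StructElem:
    """Hypothetical in-memory model of one structure element (tag)."""
    tag: str
    text: str = ""
    children: List["StructElem"] = field(default_factory=list)


def row(cell_tag: str, *cells: str) -> StructElem:
    """Build a TR whose cells all share one tag (TH or TD)."""
    return StructElem("TR", children=[StructElem(cell_tag, c) for c in cells])


def read_rows(table: StructElem) -> List[Dict[str, str]]:
    """Pair each TD with the TH in the same column position."""
    headers: List[str] = []
    records: List[Dict[str, str]] = []
    for tr in table.children:
        cells = tr.children
        if not cells:
            continue  # ignore empty rows
        if all(cell.tag == "TH" for cell in cells):
            headers = [cell.text for cell in cells]
            continue
        records.append({
            (headers[i] if i < len(headers) else f"column {i + 1}"): cell.text
            for i, cell in enumerate(cells)
            if cell.tag == "TD"
        })
    return records


table = StructElem("Table", children=[
    row("TH", "Product", "Price", "Quantity"),
    row("TD", "Widget A", "$10.00", "500"),
])

for record in read_rows(table):
    print(", ".join(f"{header}: {value}" for header, value in record.items()))
# prints "Product: Widget A, Price: $10.00, Quantity: 500"
```

If the header cells were tagged TD instead of TH, the same walk would have no headers to attach, which is roughly what a screen reader user experiences with a poorly tagged table.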
Example 3: A List
A numbered list would be tagged as:
- L (ListNumbering: Decimal)
  - LI
    - Lbl: "1."
    - LBody: "First item in the list"
  - LI
    - Lbl: "2."
    - LBody: "Second item in the list"
  - LI
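When lists are generated rather than tagged by hand, the L/LI/Lbl/LBody pattern is regular enough to build mechanically. The sketch below constructs the structure from a plain sequence of strings; the `StructElem` model and the `numbered_list` and `outline` helpers are hypothetical and only illustrate the shape of the tree.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class StructElem:
    """Hypothetical in-memory model of one structure element (tag)."""
    tag: str
    text: str = ""
    attributes: Dict[str, str] = field(default_factory=dict)
    children: List["StructElem"] = field(default_factory=list)


def numbered_list(items: Sequence[str]) -> StructElem:
    """Wrap each item in LI with a Lbl (the number) and an LBody (the text)."""
    return StructElem("L", attributes={"ListNumbering": "Decimal"}, children=[
        StructElem("LI", children=[
            StructElem("Lbl", f"{number}."),
            StructElem("LBody", item),
        ])
        for number, item in enumerate(items, start=1)
    ])


def outline(elem: StructElem, depth: int = 0) -> List[str]:
    """Render the tree as an indented bullet outline."""
    label = elem.tag
    if elem.attributes:
        pairs = ", ".join(f"{k}: {v}" for k, v in elem.attributes.items())
        label += f" ({pairs})"
    if elem.text:
        label += f': "{elem.text}"'
    lines = ["  " * depth + "- " + label]
    for child in elem.children:
        lines.extend(outline(child, depth + 1))
    return lines


tree = numbered_list(["First item in the list", "Second item in the list"])
print("\n".join(outline(tree)))
```

Keeping the label in its own Lbl element lets assistive technology announce the item number separately from the item text.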
How to Create Tagged PDFs
There are several approaches to creating tagged PDF documents:
From Authoring Applications
The most reliable way to create tagged PDFs is to generate them from an authoring application that supports tagging:
- Microsoft Word: When exporting to PDF, select "Best for electronic distribution and accessibility" (Word for Mac) or tick "Document structure tags for accessibility" in the PDF options (Word for Windows). Ensure that proper heading styles (Heading 1, Heading 2, etc.) are used in the source document, because the exported tags are derived from them.
- Adobe InDesign: InDesign can export tagged PDFs when the "Create Tagged PDF" option is selected during PDF export. The Articles panel can be used to control the reading order.
- LibreOffice: LibreOffice Writer can export tagged PDFs when the "Tagged PDF" option is checked in the PDF export dialogue.
Using Adobe Acrobat
Adobe Acrobat Pro provides tools for adding, editing, and verifying tags in existing PDF documents:
- The Accessibility tools can automatically add tags to an untagged PDF, though the results typically require manual review and correction.
- The Tags panel (View > Show/Hide > Navigation Panes > Tags) allows you to view and edit the tag tree directly.
- The Reading Order tool provides a visual interface for assigning tags to content on the page.
- The Accessibility Checker can validate a tagged PDF against accessibility requirements and identify issues.
Programmatically
Developers can create tagged PDFs programmatically using PDF libraries that support the tagged PDF specification. This approach is ideal for automated document generation workflows where accessibility must be built in from the start.
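One common ingredient of such workflows is a mapping from the source document's paragraph styles to structure types, decided before any PDF is written. The following is a minimal sketch of that step only, not of PDF generation: the style names, the `STYLE_TO_TAG` table, and `build_tag_tree` are all assumptions made for illustration, and a real pipeline would hand the resulting tree to whichever PDF library it uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass
class StructElem:
    """Hypothetical in-memory model of one structure element (tag)."""
    tag: str
    text: str = ""
    children: List["StructElem"] = field(default_factory=list)


# Hypothetical mapping from source paragraph styles to structure types.
STYLE_TO_TAG: Dict[str, str] = {
    "Heading 1": "H1",
    "Heading 2": "H2",
    "Heading 3": "H3",
    "Body Text": "P",
    "Quote": "BlockQuote",
}


def build_tag_tree(
    blocks: Sequence[Tuple[str, str]]
) -> Tuple[StructElem, List[str]]:
    """Turn (style, text) blocks into a Document tree.

    Unmapped styles fall back to P and are reported so they can be reviewed
    instead of being silently flattened into paragraphs.
    """
    root = StructElem("Document")
    unmapped: List[str] = []
    for style, text in blocks:
        tag = STYLE_TO_TAG.get(style)
        if tag is None:
            tag = "P"
            if style not in unmapped:
                unmapped.append(style)
        root.children.append(StructElem(tag, text))
    return root, unmapped


tree, unmapped = build_tag_tree([
    ("Heading 1", "Report Title"),
    ("Body Text", "This is the first paragraph..."),
    ("Sidebar", "An aside with no mapped style"),
])
print([child.tag for child in tree.children], unmapped)
```

Reporting unmapped styles keeps the fallback visible, which matters because a heading that quietly becomes a paragraph is exactly the kind of defect that is hard to spot later.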
Common Issues and Best Practices
- Reading order: Ensure the tag tree reflects the correct logical reading order, which may differ from the visual layout. Multi-column layouts and documents with sidebars require particular attention.
- Alternative text: Every meaningful image must have appropriate alternative text. Decorative images should be tagged as artifacts so they are ignored by assistive technologies.
- Table structure: Complex tables with merged cells or nested headers need careful tagging with appropriate scope attributes and header-cell associations.
- Language specification: Set the document's primary language and mark any passages in a different language with the appropriate Lang attribute.
- Artifacts: Content that is not part of the logical document structure — such as page numbers, headers, footers, and decorative elements — should be marked as artifacts rather than tagged as content.
- Validation: Always validate tagged PDFs using accessibility checking tools such as Adobe Acrobat’s built-in checker or the PAC (PDF Accessibility Checker) tool.
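The alternative-text rule lends itself to a mechanical first pass: since decorative images should be artifacts and therefore absent from the tag tree, any Figure that remains in the tree is expected to carry Alt. The sketch below flags Figures with missing or blank Alt on a hypothetical in-memory model; `StructElem` and `figures_missing_alt` are illustrative names, and whether existing alt text is actually meaningful still needs human review.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StructElem:
    """Hypothetical in-memory model of one structure element (tag)."""
    tag: str
    alt: Optional[str] = None
    children: List["StructElem"] = field(default_factory=list)


def figures_missing_alt(
    elem: StructElem, path: str = "", index: int = 1
) -> List[str]:
    """Return paths of Figure elements whose Alt is missing or blank.

    The bracketed number is the element's position among all its siblings.
    """
    here = f"{path}/{elem.tag}[{index}]"
    missing: List[str] = []
    if elem.tag == "Figure" and not (elem.alt or "").strip():
        missing.append(here)
    for position, child in enumerate(elem.children, start=1):
        missing.extend(figures_missing_alt(child, here, position))
    return missing


doc = StructElem("Document", children=[
    StructElem("P"),
    StructElem("Figure", alt="A bar chart showing quarterly sales"),
    StructElem("Figure"),
    StructElem("Sect", children=[StructElem("Figure", alt="   ")]),
])

for problem in figures_missing_alt(doc):
    print("Missing Alt:", problem)
```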
Accessibility Compliance Checklist
Use this checklist before publishing any document that needs to meet PDF/UA, WCAG 2.1, Section 508, or the European Accessibility Act. The list covers what an accessibility checker will and won’t verify automatically; the items marked with an asterisk require human judgement and cannot be fully automated.
- Document language is set in the PDF’s metadata (the /Lang entry in the document catalog).
- Document title is set in the metadata, and the title is meaningful (not the source filename).
- Tag tree exists and uses the standard structure types (Document, Sect, P, H1–H6, L, LI, Table, TR, TD, TH, Figure).
- Reading order is correct* — the tag tree order matches the intended reading order, not the visual layout. Critical for multi-column and sidebar layouts.
- Heading hierarchy is logical* — H1 for the document title (one only), H2 for major sections, H3 nested under H2s, no skipped levels.
- All images have appropriate alt text* — meaningful alt text for content-bearing images, marked as artifacts for decorative images.
- Tables are properly tagged with TR, TH, and TD elements, with TH cells associated with the data cells they describe via scope or headers attributes.
- Lists use proper L/LI structure, with Lbl and LBody children where appropriate.
- Form fields have tooltips set in their properties — screen readers announce these as field labels.
- Links have meaningful link text* — not "click here" or bare URLs but descriptive text that makes sense out of context.
- Colour contrast meets WCAG AA* — 4.5:1 for normal text, 3:1 for large text. Acrobat’s checker doesn’t verify this; use a separate contrast tool.
- Page numbers, headers, and footers are artifacts, not tagged as content.
- Bookmarks exist for any document longer than ~10 pages, mirroring the heading hierarchy.
- The PDF passes Acrobat’s Accessibility Check with no errors (warnings are sometimes acceptable; document the rationale).
- The PDF passes PAC 2024 (the third-party PDF Accessibility Checker) with no PDF/UA failures.
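The heading-hierarchy item has a mechanical component that is easy to sketch: given the tags in reading order, skipped levels and the number of H1 elements can be detected automatically, even though judging whether the headings are sensible still needs a person. The `heading_issues` function below is a hypothetical helper that takes a flat list of tag names.

```python
from typing import List


def heading_issues(tags_in_reading_order: List[str]) -> List[str]:
    """Report skipped heading levels and a missing or repeated H1."""
    issues: List[str] = []
    previous_level = 0
    h1_count = 0
    for position, tag in enumerate(tags_in_reading_order, start=1):
        is_heading = len(tag) == 2 and tag[0] == "H" and tag[1] in "123456"
        if not is_heading:
            continue
        level = int(tag[1])
        if level == 1:
            h1_count += 1
        # Going deeper by more than one level means a level was skipped;
        # moving back up to any shallower level is always allowed.
        if level > previous_level + 1:
            issues.append(
                f"{tag} at position {position} skips from level {previous_level}"
            )
        previous_level = level
    if h1_count != 1:
        issues.append(f"expected exactly one H1, found {h1_count}")
    return issues


print(heading_issues(["H1", "P", "H2", "H3", "P", "H2"]))  # prints []
print(heading_issues(["H1", "P", "H3"]))
```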
For organisations subject to the German PDF/UA mandate covered in our Germany PDF/UA standardisation piece, this checklist is the working baseline for any document destined for federal or state agency use.
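For the colour contrast item, the ratio can be computed directly from the WCAG 2.x definitions of relative luminance and contrast ratio. The sketch below assumes colours are given as 8-bit sRGB triples; the function names are illustrative, and the thresholds are the AA values quoted in the checklist (4.5:1 for normal text, 3:1 for large text).

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def _linear(channel: int) -> float:
    """Convert one 8-bit sRGB channel to its linear-light value."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: RGB) -> float:
    """WCAG relative luminance of an sRGB colour (0 = black, 1 = white)."""
    r, g, b = (_linear(value) for value in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground: RGB, background: RGB) -> float:
    """WCAG contrast ratio, from 1 (identical) up to 21 (black on white)."""
    luminances = (relative_luminance(foreground), relative_luminance(background))
    lighter, darker = max(luminances), min(luminances)
    return (lighter + 0.05) / (darker + 0.05)


def meets_wcag_aa(foreground: RGB, background: RGB, large_text: bool = False) -> bool:
    """Check the AA thresholds: 4.5:1 for normal text, 3:1 for large text."""
    threshold = 3.0 if large_text else 4.5
    return contrast_ratio(foreground, background) >= threshold


white = (255, 255, 255)
grey = (0x76, 0x76, 0x76)
print(round(contrast_ratio(grey, white), 2), meets_wcag_aa(grey, white))
```

The ratio is symmetric, so swapping foreground and background gives the same result.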
Automated Tagging Tools Compared
Manually tagging PDFs is slow — a competent accessibility specialist works through one moderately complex 30-page document per day. Automated tagging tools have improved enormously in the last few years, and most production accessibility workflows now run automated tagging first and remediate by hand only what the tool got wrong.
| Tool | Best for | Strength | Limitation |
|---|---|---|---|
| Adobe Acrobat Auto-Tag | Single-document remediation by an accessibility specialist | Built into Acrobat Pro; handles most heading-list-paragraph structure correctly; integrated with the rest of the Acrobat tagging UI | Slower than batch tools; quality drops on multi-column or heavily formatted layouts |
| Adobe PDF Accessibility Auto-Tag API | Bulk remediation of legacy archives | Cloud REST API; powered by Sensei AI; handles thousands of documents per hour; covered in our Adobe PDF Services API post | Per-document pricing; output still needs human review for complex tables and reading order |
| CommonLook PDF GlobalAccess | Enterprise accessibility teams remediating high volumes with audit trails | Industry-leading accuracy; comprehensive remediation features; Section 508 / WCAG / PDF/UA compliance reporting | Expensive enterprise licensing; learning curve; overkill for small teams |
| axesPDF | Authoring-stage tagging from Word and InDesign | Word add-in produces well-tagged PDFs without post-processing; PDF/UA-aware throughout | Best when used at authoring time; less useful for legacy PDFs |
| PAC 2024 (free) | Validation rather than tagging | The de facto third-party checker for PDF/UA conformance; free; produces detailed reports | Validates only; doesn’t fix issues |
The pattern most production accessibility teams converge on: use Acrobat Auto-Tag (or the PDF Services API for volume) to produce a first-pass tag tree, validate with Acrobat’s checker plus PAC, and remediate the flagged issues by hand. The 80/20 split is real — the tools handle most of the work, and the remaining 20% takes a competent specialist a fraction of the time it would take to tag the document from scratch.
Conclusion
Tagged PDFs are essential for creating accessible, well-structured documents that can be used by everyone, including people who rely on assistive technologies. As accessibility requirements tighten across industries and jurisdictions, implementing tagged PDF properly is a critical skill for document professionals. Following the standards and best practices outlined above helps ensure that your PDF documents are accessible, compliant, and usable beyond simple visual display. For regulated industries, our article on FDA PDF compliance covers the requirements for pharmaceutical submissions.