How to Extract Embedded Files from a PDF
PDFs can carry arbitrary files inside them. Here's how to find those attachments and save them out — in Acrobat, online, or at the command line.
Quick summary
Quick answer: A PDF can contain embedded files — arbitrary attachments such as spreadsheets, images, source documents, or XML data — stored alongside the visible pages. You can extract them using Adobe Acrobat's Attachments panel, free online tools like Mapsoft's PDF Hub, or command-line utilities like pdfdetach and qpdf.
What are embedded files in a PDF?
An embedded file, also called a PDF attachment, is an arbitrary file stored inside a PDF container, separate from the page content. The PDF specification (ISO 32000) provides a name tree called EmbeddedFiles for exactly this purpose; it is reached through the Names dictionary in the document catalog. Any file type can live there: Excel workbooks, Word documents, images, CSVs, JSON, XML, even other PDFs.
Embedding is distinct from two things that sound similar. Inline images and fonts belong to the page's rendering data (inline images sit in the content stream, fonts in the page's resources); they render as part of the visual document and aren't separately downloadable. An embedded file, by contrast, is a complete discrete file that a reader can save back out intact, exactly as it was supplied. PDF Portfolios (sometimes called PDF Packages) take this to an extreme: they're essentially PDF containers whose main job is to bundle a set of embedded files together.
Why would a PDF have embedded files?
Embedded files exist to keep related materials bundled together so they travel with the document rather than as separate attachments that can get lost. Common real-world uses:
- Tax returns and government filings. The authoritative PDF includes the supporting spreadsheets, CSVs, and schedules as embedded files so auditors have the original data, not just a rendered snapshot.
- Academic and research papers. Datasets, supplementary materials, code, and figures travel with the paper so readers can reproduce the analysis.
- Engineering and architectural PDFs. DWG drawings, BIM files, and 3D models ship inside the PDF so a single file represents the complete deliverable.
- XFA and hybrid forms. The form definition, reference images, and scripts are embedded so the form works end-to-end on any reader.
- TDMRep policy files. Our TDM rights post covers how a PDF can embed a machine-readable policy that declares text-and-data-mining permissions.
- Emails archived to PDF. Some email-to-PDF converters, such as Acrobat's Outlook add-in, preserve the original attachments as embedded files in the archived PDF.
How to tell if a PDF has attachments
There's no visible indicator on the page itself. A PDF can carry hundreds of megabytes of embedded files and render identically to one with nothing attached. To check, you need to look at the document catalog or use a reader that surfaces attachments.
- In Adobe Acrobat / Reader: open the document, then click the paperclip icon on the left-hand navigation panel. If the icon isn't showing, right-click the panel and enable the Attachments pane. Any embedded files are listed there with their original filenames and sizes.
- In a browser: most built-in PDF viewers (Chrome, Edge, Firefox) show attachments in a sidebar or under a paperclip button. The exact UI varies by browser.
- At the command line: pdfdetach -list filename.pdf prints the attachment table without extracting anything.
- In a hex editor: search for /EmbeddedFiles. If the string appears inside a dictionary, the PDF contains attachments.
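The hex-editor check can be approximated in a few lines of Python. This is a minimal sketch, not a parser: the function names are illustrative, and it only looks for the raw /EmbeddedFiles name in the file's bytes. A hit is a strong hint rather than proof, and a miss is not conclusive, because PDFs that store their dictionaries in compressed object streams can hide the string.

```python
from pathlib import Path

# The PDF name that marks the embedded-files name tree.
MARKER = b"/EmbeddedFiles"


def has_embedded_files_marker(data: bytes) -> bool:
    """Return True if the raw PDF bytes contain the /EmbeddedFiles name."""
    return MARKER in data


def check_pdf(path: str) -> bool:
    """Read a PDF from disk and run the byte-level check on it."""
    return has_embedded_files_marker(Path(path).read_bytes())
```

When the result matters, confirm it with pdfdetach -list or a reader's Attachments panel, both of which parse the document properly.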
Extracting embedded files: three methods
Method 1 — Adobe Acrobat's Attachments panel
With the PDF open in Acrobat or Acrobat Reader, click the paperclip icon on the left navigation pane. For each attachment you want to save, right-click and choose Save Attachment (or Save All Attachments to export the whole set at once). Acrobat preserves each file's original name and content exactly as it was embedded.
This is the most reliable method for sensitive documents because the file never leaves your machine. The downside is that you need Acrobat (or Acrobat Reader) installed and the ability to open the file; if the PDF is locked or damaged, Acrobat may not give you access to the attachments panel.
Method 2 — Online, free, no installation
Mapsoft's free Extract Embedded Files tool runs in the browser. Upload a PDF and the tool lists every attachment it finds, with original filename, size, and MIME type, and lets you download them individually or as a zip. It's the fastest option when you're on a locked-down machine or working with a PDF that arrived on a colleague's laptop.
Like any online tool, check the service's privacy policy before uploading truly confidential material. For routine extraction — the bulk of real-world cases — it's the least-friction option available.
Method 3 — Command line (pdfdetach, qpdf)
Both pdfdetach (part of Poppler, packaged as poppler-utils) and qpdf handle extraction. pdfdetach is the most direct:
- pdfdetach -list file.pdf: show the table of attachments.
- pdfdetach -saveall -o ./out file.pdf: extract every attachment to the ./out directory.
- pdfdetach -save 3 -o data.xlsx file.pdf: extract the third attachment only.
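For scripted use, the -saveall command can be wrapped in a small Python helper. This is a sketch under stated assumptions: pdfdetach is installed and on PATH, and the helper names (saveall_command, extract_all) are made up for illustration. It creates the output directory first rather than relying on pdfdetach to do so.

```python
import shutil
import subprocess
from pathlib import Path


def saveall_command(pdf: str, out_dir: str) -> list[str]:
    """Build the pdfdetach argument list that extracts every attachment."""
    return ["pdfdetach", "-saveall", "-o", out_dir, pdf]


def extract_all(pdf: str, out_dir: str) -> list[Path]:
    """Extract all attachments from pdf into out_dir and return the files found there."""
    if shutil.which("pdfdetach") is None:
        raise RuntimeError("pdfdetach not found on PATH; install poppler-utils")
    # Create the target directory up front (assumption: safer than relying on the tool).
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if pdfdetach exits with an error.
    subprocess.run(saveall_command(pdf, out_dir), check=True)
    return sorted(p for p in Path(out_dir).iterdir() if p.is_file())
```

Calling extract_all("file.pdf", "./out") mirrors the second command in the list above; looping over a folder of PDFs with a separate output directory per file keeps attachments with the same name from overwriting each other.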
qpdf's --json mode can enumerate attachments in a machine-readable format that's easy to script against, which matters when you're processing hundreds of PDFs overnight rather than one on your desk.
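A rough sketch of scripting against that JSON output follows. It assumes, based on qpdf's documented JSON layout, that the output has a top-level "attachments" object keyed by attachment name and that --json-key=attachments limits the output to that section; verify both against the qpdf version you have installed. The function names are illustrative.

```python
import json
import subprocess


def attachment_names(qpdf_json: str) -> list[str]:
    """Return the sorted attachment keys from qpdf's JSON output.

    Assumes a top-level "attachments" object keyed by attachment name.
    """
    doc = json.loads(qpdf_json)
    attachments = doc.get("attachments") or {}
    return sorted(attachments)


def list_attachments(pdf: str) -> list[str]:
    """Run qpdf on a PDF and return its attachment keys."""
    result = subprocess.run(
        ["qpdf", "--json", "--json-key=attachments", pdf],
        capture_output=True,
        text=True,
        check=True,  # note: qpdf also exits non-zero when it only reports warnings
    )
    return attachment_names(result.stdout)
```

Keeping the parsing in its own function means it can be exercised on saved JSON without qpdf installed, which is handy when testing a batch pipeline.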
Security considerations
Embedded files can be any file type, including executables, macro-enabled Office documents, and scripts. A malicious PDF can carry a .exe, .js, or macro-laden .docm payload and wait for a user to click "Open attachment" from the Acrobat panel. Treat attachments from untrusted PDFs with the same care as email attachments from unknown senders.
A few sensible habits:
- Before opening any extracted attachment, inspect the filename and extension. Acrobat shows both in the Attachments panel.
- Scan extracted files with an up-to-date antivirus tool before opening them — especially Office documents, which can carry macros.
- For extractions you run as part of an automated pipeline, validate file types against an allowlist (e.g. only accept .xlsx, .csv, .png) and quarantine anything that doesn't match.
- If you're doing discovery or forensics work, use a sandboxed workstation so an opened attachment can't pivot to your main network.
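The allowlist habit for automated pipelines can be sketched as follows. The allowed extensions are the examples given above; the directory layout and function names are hypothetical. The check looks only at the filename extension, so it complements an antivirus or content-type scan rather than replacing one.

```python
import shutil
from pathlib import Path

# Example allowlist taken from the text; adjust to what your pipeline expects.
ALLOWED_EXTENSIONS = {".xlsx", ".csv", ".png"}


def is_allowed(filename: str) -> bool:
    """Return True if the file's final extension is on the allowlist (case-insensitive)."""
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS


def triage(extracted_dir: str, quarantine_dir: str) -> list[Path]:
    """Move every extracted file that fails the allowlist into quarantine_dir."""
    quarantine = Path(quarantine_dir)
    quarantine.mkdir(parents=True, exist_ok=True)
    moved = []
    for item in sorted(Path(extracted_dir).iterdir()):
        if item.is_file() and not is_allowed(item.name):
            target = quarantine / item.name
            shutil.move(str(item), str(target))
            moved.append(target)
    return moved
```

Because only the last extension counts, a name like invoice.pdf.exe is treated as .exe and quarantined, and files with no extension are rejected as well.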
Frequently Asked Questions
How do I know if a PDF has attachments?
Open the PDF in Adobe Acrobat or Reader and click the paperclip icon on the left navigation pane. You can also run pdfdetach -list on the file, or use an online tool that lists attachments.
Are embedded files the same as inline images?
No. Inline images are rendered as part of a page; embedded files are complete, standalone files bundled inside the PDF container. Embedded files are extracted back out as intact originals; inline images are not.
Can I extract embedded files without installing software?
Yes. Free online tools like Mapsoft's PDF Hub Extract Embedded Files work from the browser, with the uploaded file processed server-side. Most browsers' built-in PDF viewers also list attachments in a sidebar.
Are embedded files safe to open?
Not automatically. An attachment can be any file type, including executables and macro-enabled documents. Treat attachments from untrusted PDFs the same way you'd treat email attachments from unknown senders.
What file types can be embedded in a PDF?
Any. The PDF specification doesn't restrict the MIME type of an embedded file. Spreadsheets, CSVs, images, CAD files, other PDFs, XML, JSON, archives, and executables can all be embedded.
Related Articles
PDF Portfolios: Combining Files in a Single PDF
The natural next step: PDF Portfolios are documents whose primary purpose is to bundle embedded files.
PDF File Structure Explained
Where the EmbeddedFiles dictionary lives inside a PDF, and how extractors read it.
TDM Rights in PDF Documents and the TDMRep Protocol
One practical use of embedded files: carrying a machine-readable TDM policy inside the document.