Document sanitization removes hidden metadata and embedded content that could reveal sensitive information about the document's origin, authorship, editing history, or internal structure. This is a critical step when releasing redacted documents to the public or to third parties.
| Option | Description | Default |
|---|---|---|
| Document Properties | Remove Author, Title, Subject, Keywords, Creator, and Producer fields from the document information dictionary. | On |
| XMP Metadata | Remove the XMP metadata stream, which can contain extensive editing history, software version info, and timestamps. | On |
| Thumbnails | Remove embedded page thumbnail images. These can sometimes retain content from earlier versions of a page. | On |
| JavaScript | Remove all document-level and page-level JavaScript actions. Embedded scripts can run automatically when the document is opened. | On |
| Bookmarks | Remove all bookmarks (outlines). Bookmarks may contain text that reveals the document's structure or removed content. | Off |
| Form Fields | Flatten interactive form fields to their visual appearance, removing the ability to fill in or extract form data. | Off |
| Hidden Layers | Remove hidden optional content layers. Hidden layers may contain content that is not visible but is still present in the file. | On |
| Search Index | Remove the embedded search index. The index can contain text from the document that could be extracted separately. | On |
| Attachments | Remove all embedded file attachments from the document. | On |
| Annotations | Remove all annotations (comments, highlights, stamps, etc.). | Off |
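
For readers who script or audit their sanitization settings, the table above can be modeled as a small configuration object. The following is a minimal sketch only: the class name `SanitizeOptions`, its field names, and the helper `enabled` are hypothetical and are not part of the application. Only the default values are taken from the table.

```python
from dataclasses import dataclass, fields


@dataclass
class SanitizeOptions:
    """Hypothetical mirror of the dialog options; defaults follow the table."""
    document_properties: bool = True
    xmp_metadata: bool = True
    thumbnails: bool = True
    javascript: bool = True
    bookmarks: bool = False
    form_fields: bool = False
    hidden_layers: bool = True
    search_index: bool = True
    attachments: bool = True
    annotations: bool = False


def enabled(options: SanitizeOptions) -> list:
    """Return the names of the options that are switched on."""
    return [f.name for f in fields(options) if getattr(options, f.name)]


# Default profile: the seven options marked "On" in the table.
defaults = SanitizeOptions()
print(enabled(defaults))

# A stricter profile that also turns on the three options that default to "Off".
strict = SanitizeOptions(bookmarks=True, form_fields=True, annotations=True)
print(enabled(strict))
```

Keeping the defaults in one place like this makes it easier to see which options a stricter release profile changes relative to the dialog's defaults.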
The dialog provides two convenience buttons:
Warning: Sanitization permanently modifies the document. Removed metadata and content cannot be recovered. Save a backup copy before sanitizing.
Sanitization should typically be performed as the final step in the redaction workflow, after all redaction annotations have been applied and executed. A recommended workflow is:
Note: Even after redacting visible content, a PDF can still contain sensitive metadata such as the original author's name, editing timestamps, or hidden layers with pre-redaction content. Sanitization addresses these hidden information channels.
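
As a rough illustration of these hidden channels, the sketch below scans raw PDF bytes for name tokens that the PDF format uses for several of the items listed in the options table. It is a heuristic under stated assumptions, not how the application performs sanitization: the function name `find_residual_markers` and the `MARKERS` mapping are hypothetical, names inside compressed object streams will not be found, and some keys (for example `/Title`, which also appears in bookmark entries) can produce false positives. The search index is omitted because no single key is assumed for it here.

```python
import re

# PDF name tokens commonly associated with each sanitization option.
# This mapping is an illustrative assumption, not the application's own list.
MARKERS = {
    "Document Properties": [b"/Author", b"/Title", b"/Subject",
                            b"/Keywords", b"/Creator", b"/Producer"],
    "XMP Metadata": [b"/Metadata"],
    "Thumbnails": [b"/Thumb"],
    "JavaScript": [b"/JavaScript", b"/JS"],
    "Bookmarks": [b"/Outlines"],
    "Form Fields": [b"/AcroForm"],
    "Hidden Layers": [b"/OCProperties"],
    "Attachments": [b"/EmbeddedFiles"],
    "Annotations": [b"/Annots"],
}

# A PDF name ends at whitespace, a delimiter character, or end of data.
NAME_END = rb"(?=[\s()<>\[\]{}/%]|\Z)"


def find_residual_markers(data: bytes) -> dict:
    """Map each option to the marker names found in the uncompressed bytes."""
    found = {}
    for option, names in MARKERS.items():
        hits = [n.decode("ascii") for n in names
                if re.search(re.escape(n) + NAME_END, data)]
        if hits:
            found[option] = hits
    return found


# Tiny hand-written fragment standing in for a real file's contents.
sample = (b"<< /Author (J. Doe) /Producer (Example) >>\n"
          b"<< /Type /Catalog /OCProperties << >> >>")
print(find_residual_markers(sample))
```

To check a real file, the bytes could be read with `open(path, "rb").read()` and passed to the same function. A clean result from this scan does not prove that a file is free of hidden content.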