PDF Compression: Reducing File Size Without Losing Quality

A technical look at what makes PDFs large, the compression algorithms available for different content types, and how to optimise effectively.


What Contributes to PDF File Size?

A PDF file is a collection of objects — streams, dictionaries, arrays, and scalars — assembled into a document. The largest contributors to file size in typical PDFs are images, fonts, and content streams. Understanding where the bytes are going is the first step to reducing file size effectively.

  • Images: Raster images are almost always the dominant contributor to file size in PDFs containing photographs, scanned documents, or high-resolution artwork. A single uncompressed CMYK image at print resolution can easily run to tens of megabytes.
  • Fonts: Embedded font programs — particularly TrueType and OpenType fonts with large character sets — can add hundreds of kilobytes per font. A document using many typefaces or a large Unicode CJK font can accumulate several megabytes in font data alone.
  • Content streams: Page content streams describe all the text, paths, and graphics on each page. For pages with complex vector artwork or many small objects, content streams can be significant in size.
  • Other streams: Colour profiles, thumbnail images, embedded files (in PDF/A-3), XMP metadata, and other stream-based objects contribute smaller but non-trivial amounts to total file size.

For a detailed breakdown of where size is spent in a specific document, the Audit Space Usage button in Acrobat Pro's PDF Optimizer (Save As Other > Optimized PDF) provides a percentage breakdown by category. See also our related post on managing PDF file size.

Image Compression Algorithms

PDF supports multiple image compression filters, each suited to different types of image content:

JPEG (DCTDecode) — Lossy

JPEG (implemented in PDF as the DCTDecode filter) is the most widely used compression for continuous-tone photographic images. It achieves high compression ratios by discarding high-frequency image detail that the human eye is less sensitive to. JPEG is a lossy algorithm: each time a JPEG-compressed image is re-compressed, quality degrades further. Quality is chosen at encode time, conventionally on a scale from 0 (lowest) to 100 (highest). For screen-resolution or web-optimised PDFs, quality settings of 60–75 are common; print-quality PDFs typically use 80–90 or retain the original JPEG without re-encoding.
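The "discard high-frequency detail" idea can be illustrated with a toy one-dimensional DCT (a sketch of the principle, not JPEG's actual 8×8 pipeline, which adds quantisation and entropy coding):

```python
import math

def dct(x):
    """Forward DCT-II of a 1-D signal (unnormalised)."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * (n + 0.5) * k / N) for n in range(N))
            for k in range(N)]

def idct(X):
    """Inverse transform (DCT-III with 2/N scaling) -- exact round trip with dct()."""
    N = len(X)
    return [(X[0] / 2 + sum(X[k] * math.cos(math.pi * (n + 0.5) * k / N)
                            for k in range(1, N))) * 2 / N
            for n in range(N)]

signal = [10, 12, 15, 19, 24, 28, 31, 33]    # smooth, photo-like samples

coeffs = dct(signal)
kept = coeffs[:4] + [0.0] * 4                # discard the 4 highest frequencies
approx = idct(kept)

max_err = max(abs(a - b) for a, b in zip(signal, approx))
print(f"max reconstruction error with half the coefficients: {max_err:.3f}")
```

Because smooth image regions concentrate their energy in the low-frequency coefficients, throwing away half of them barely changes the reconstructed samples — which is exactly why photographic content compresses so well and hard edges (text, line art) do not.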

ZIP/Deflate (FlateDecode) — Lossless

FlateDecode applies the DEFLATE algorithm (the same used by ZIP files and PNG images) to compress any stream losslessly. It works well for images with large areas of flat colour, line art, screenshots, and indexed colour images. For photographic content FlateDecode produces significantly larger files than JPEG. It is the standard filter for content streams, form XObjects, and other non-image streams where lossless compression is required.
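The difference between flat-colour and photographic content under DEFLATE is easy to demonstrate with Python's `zlib` module (which implements the same algorithm FlateDecode uses); random bytes stand in here for high-entropy photographic data:

```python
import os
import zlib

# Flat-colour / screenshot-like data: long runs of identical bytes.
flat = bytes([200] * 30000 + [30] * 30000 + [140] * 4000)

# Photo-like data: high-entropy bytes that DEFLATE cannot model well.
noisy = os.urandom(64000)

flat_ratio = len(flat) / len(zlib.compress(flat, 9))
noisy_ratio = len(noisy) / len(zlib.compress(noisy, 9))

print(f"flat-colour data : {flat_ratio:6.1f}x smaller")
print(f"photo-like data  : {noisy_ratio:6.1f}x smaller")  # barely compresses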

JPEG 2000 (JPXDecode) — Lossy or Lossless

JPEG 2000 (JPXDecode in PDF) is a wavelet-based image compression standard that supports both lossy and lossless modes. It typically achieves better compression than JPEG at equivalent visual quality, and supports a wider range of features, including progressive display, multi-component images, and arbitrary bit depths. JPXDecode is permitted in PDF/A-2 and later (PDF/A-1 prohibits it), making it an option for photographic compression in modern archival workflows. Viewer support is solid in Acrobat-compatible readers but less universal than JPEG: some lightweight viewers decode JPXDecode slowly or not at all.

JBIG2 (JBIG2Decode) — Monochrome, Lossy or Lossless

JBIG2 is designed specifically for compressing bi-level (black-and-white) images such as scanned text pages, fax images, and monochrome line art. It achieves dramatic compression ratios — typically 3–10x better than CCITT Group 4 — by identifying repeated symbols (characters, logos) and encoding them as references to a shared dictionary of patterns. In lossy mode, JBIG2 can substitute visually similar glyphs, which is extremely effective for scanned text but can occasionally cause subtle glyph substitution artefacts. Acrobat's PDF Optimizer offers JBIG2 in both lossless and lossy modes for monochrome images; lossy mode should be used with care on documents where a substituted glyph (a digit in an invoice, say) could change meaning.
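The symbol-dictionary idea is simple to model. The sketch below is a toy (real JBIG2 additionally arithmetic-codes the dictionary and the references), but it shows where the savings come from when the same letterforms repeat hundreds of times on a scanned page:

```python
# Two tiny 8x8 "glyph" bitmaps standing in for scanned letterforms.
glyph_t = tuple(tuple(int(r == 0 or c == 3) for c in range(8)) for r in range(8))
glyph_e = tuple(tuple(int(c == 0 or r in (0, 3, 7)) for c in range(8)) for r in range(8))

# A page is a sequence of (x, y, bitmap) placements -- mostly repeats.
page = []
for i in range(500):
    page.append((i * 20, 0, glyph_t))
    page.append((i * 20 + 9, 0, glyph_e))

# Naive encoding: store every bitmap in full (64 bits per 8x8 glyph here).
naive_bits = sum(len(bm) * len(bm[0]) for _, _, bm in page)

# Symbol-dictionary encoding: store each distinct bitmap once, then emit
# only (x, y, symbol-index) references for every placement.
dictionary = {}
refs = []
for x, y, bm in page:
    idx = dictionary.setdefault(bm, len(dictionary))
    refs.append((x, y, idx))

dict_bits = sum(len(bm) * len(bm[0]) for bm in dictionary)
ref_bits = len(refs) * 16          # assume ~16 bits per encoded reference
print(f"naive: {naive_bits} bits, dictionary: {dict_bits + ref_bits} bits")
```

Real scanned glyphs are far larger than 8×8, so the gap between storing bitmaps and storing references is correspondingly wider in practice.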

CCITT Group 4 (CCITTFaxDecode) — Monochrome, Lossless

CCITT Group 4 is a lossless bi-level compression standard derived from fax technology (ITU-T T.6). It is highly efficient for scanned monochrome documents and was the dominant format for monochrome images in PDFs before JBIG2 was introduced. It remains in widespread use for archival scans (particularly in TIFF-based workflows that are then wrapped into PDF) and wherever maximum viewer compatibility matters: CCITTFaxDecode has been part of PDF since version 1.0, whereas JBIG2Decode only arrived in PDF 1.4.

Font Subsetting, Embedding, and Exclusion

Fonts contribute to file size in three ways depending on how they are handled:

  • Full embedding: The entire font program is included in the PDF. Guarantees accurate rendering on any device but maximises font data size. Common in print-ready PDFs where any character might appear in variable data.
  • Subsetting: Only the glyphs (individual characters) actually used in the document are embedded. Acrobat's default behaviour is to subset fonts when fewer than a configurable percentage of glyphs are used (typically 35% threshold). A subset of a large CJK font used for a handful of Chinese characters might be tens of kilobytes rather than several megabytes. Subsetting is identified by a six-character random prefix on the font name (e.g., "ABCDEF+Helvetica").
  • Not embedding: The font is referenced by name but not included in the file. This produces the smallest file but requires the font to be installed on the recipient's device; if absent, a substitution font will be used, potentially altering layout and appearance significantly. Not appropriate for document distribution; only viable for internal workflows with controlled environments.
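Since subset fonts are flagged by that six-uppercase-letter tag, a quick audit of which embedded fonts are subsets can be done with a simple pattern check on the BaseFont names (the names below are illustrative):

```python
import re

# A subset-embedded font carries a six-uppercase-letter prefix and '+'.
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")

def is_subset_font(base_font_name: str) -> bool:
    """True if a BaseFont name carries a subset tag (e.g. 'ABCDEF+Helvetica')."""
    return bool(SUBSET_TAG.match(base_font_name))

for name in ("ABCDEF+Helvetica", "QWERAS+NotoSansCJKsc-Regular", "Helvetica"):
    print(f"{name:35s} subset: {is_subset_font(name)}")
```

A fully embedded or unembedded font keeps its plain name, so `Helvetica` reports `False` while the tagged names report `True`.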

Content Stream Compression (FlateDecode)

Page content streams — the sequences of PDF operators that draw text, paths, and images on a page — are themselves compressible. Applying FlateDecode to content streams typically achieves 50–80% compression depending on content complexity. Recent versions of Acrobat compress content streams by default. Older PDFs created by early PostScript-to-PDF workflows or certain applications may have uncompressed content streams; reprocessing such files through Acrobat's Optimizer captures this size reduction.
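Operator streams are repetitive ASCII, which DEFLATE handles very well. A quick sketch (the repeated fragment below is artificial, so the ratio overstates what a varied real page achieves):

```python
import zlib

# A fragment of a typical page content stream: text-positioning and
# show-text operators, repeated as they would be across a text-heavy page.
ops = b"BT /F1 11 Tf 72 720 Td (Hello) Tj 0 -14 Td (world) Tj ET\n" * 400

compressed = zlib.compress(ops, 9)
saving = 100 * (1 - len(compressed) / len(ops))
print(f"{len(ops)} -> {len(compressed)} bytes ({saving:.0f}% smaller)")
```

Because identical lines repeat verbatim here, the saving lands well above the 50–80% typical of real content streams, but the mechanism is the same: operator names, coordinates, and resource names recur constantly across a page.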

Lossless vs Lossy Compression Trade-offs

The choice between lossless and lossy compression depends on the use case:

  • Lossless compression (FlateDecode, CCITT Group 4, lossless JBIG2, lossless JPEG 2000) preserves every bit of the original data and is appropriate for line art, text-as-image, technical drawings, archival scans, and any content where quality degradation is not acceptable.
  • Lossy compression (JPEG, lossy JBIG2, lossy JPEG 2000) achieves significantly better size reduction by discarding data deemed visually unimportant. For photographs and scanned pages destined for screen reading, well-tuned lossy compression is imperceptible and provides 5–10x size reductions over lossless alternatives.

The key risk of lossy re-compression is quality accumulation: if a PDF has already been JPEG-compressed, re-compressing it with JPEG again compounds the artefacts. Where possible, re-encode from the original uncompressed source rather than re-compressing an already-lossy file.
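Generation loss can be modelled with plain quantisation, the lossy step at the heart of JPEG (a deliberately simplified stand-in for the full DCT-domain quantisation):

```python
# Quantising twice with different step sizes can never beat a single pass:
# the second pass starts from already-shifted values.
def quantise(values, step):
    return [round(v / step) * step for v in values]

original = list(range(256))                    # every 8-bit sample value

once  = quantise(original, 23)                 # encode straight from the source
twice = quantise(quantise(original, 17), 23)   # re-compress an already-lossy copy

err_once  = sum(abs(a - b) for a, b in zip(original, once))  / len(original)
err_twice = sum(abs(a - b) for a, b in zip(original, twice)) / len(original)
print(f"single-pass error: {err_once:.2f}, double-pass error: {err_twice:.2f}")
```

Both outputs end on the same quantisation grid, but the single pass always lands on the grid point nearest the original sample, while the double pass sometimes lands one step further away — the numeric core of "compounded artefacts".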

Downsampling Images

In addition to compression, image resolution can be reduced (downsampled) to decrease the number of pixels and thus the compressed data size. PDF Optimizer offers three downsampling methods:

  • Average downsampling: Averages the colour values of a block of source pixels to produce each output pixel. Fast but can introduce slight blurring.
  • Subsampling: Uses the value of the centre pixel of a block to represent the block. Fastest but lowest quality — can produce aliasing artefacts.
  • Bicubic downsampling: Uses a weighted average of surrounding pixels based on a bicubic interpolation kernel. Slowest but highest quality — the recommended choice for photographs.
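The first two methods are short enough to sketch directly (bicubic is omitted here for brevity; its kernel spans a 4×4 neighbourhood rather than one block):

```python
def downsample(pixels, block, method):
    """Shrink a 2-D greyscale image by an integer factor `block`."""
    h, w = len(pixels), len(pixels[0])
    out = []
    for by in range(0, h, block):
        row = []
        for bx in range(0, w, block):
            if method == "average":      # mean of the block: smooth, slight blur
                cells = [pixels[y][x] for y in range(by, by + block)
                                      for x in range(bx, bx + block)]
                row.append(sum(cells) / len(cells))
            elif method == "subsample":  # one representative pixel: fast, aliased
                row.append(pixels[by + block // 2][bx + block // 2])
        out.append(row)
    return out

image = [[1, 3, 5, 7],
         [9, 11, 13, 15],
         [17, 19, 21, 23],
         [25, 27, 29, 31]]

print(downsample(image, 2, "average"))    # [[6.0, 10.0], [22.0, 26.0]]
print(downsample(image, 2, "subsample"))  # [[11, 15], [27, 31]]
```

Averaging blends every source pixel into the output (hence the mild blur), while subsampling simply throws three of every four pixels away — which is why it aliases on fine detail.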

Typical downsampling targets are 72 ppi for screen-only PDFs, 150 ppi for standard print, and 300 ppi (or original resolution) for high-quality print. Monochrome images are typically targeted at 600–1200 ppi to preserve the sharpness of text edges.
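The payoff is quadratic: halving the resolution quarters the pixel count. For a full US Letter page of 8-bit RGB imagery:

```python
# Uncompressed size of a full-page RGB image at common downsampling targets.
PAGE_W_IN, PAGE_H_IN = 8.5, 11   # US Letter, inches
BYTES_PER_PIXEL = 3              # 8-bit RGB

for ppi in (300, 150, 72):
    px = round(PAGE_W_IN * ppi) * round(PAGE_H_IN * ppi)
    mb = px * BYTES_PER_PIXEL / 1_000_000
    print(f"{ppi:3d} ppi: {px:>10,} pixels, {mb:5.1f} MB uncompressed")
```

Dropping from 300 ppi to 150 ppi cuts the raw data by a factor of four before any compression filter is even applied, which is why downsampling and re-compression are configured together in PDF Optimizer.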

PDF Optimizer and Reduce File Size in Acrobat

Acrobat Pro provides two tools for size reduction. File > Reduce File Size applies a predefined set of optimisations targeting compatibility with a specific Acrobat version, re-compressing images and removing some unnecessary data. This is fast but not configurable.

Save As Other > Optimized PDF opens the full PDF Optimizer dialog, which gives granular control over image compression settings per image type (colour, greyscale, monochrome), font handling, transparency flattening, Discard Objects (form data, comments, bookmarks, article threads, metadata), and Clean Up (object compression, encoding, invalid bookmarks). The Audit Space Usage button within this dialog shows exactly where size is being spent before you apply optimisation.

Object Streams and Cross-Reference Streams (PDF 1.5+)

PDF 1.5 introduced object streams — a mechanism for packing multiple PDF objects into a single compressed stream. This reduces the overhead of individual object headers and allows indirect objects (which are numerous in a typical PDF) to benefit from cross-object compression rather than being stored individually. Combined with cross-reference streams (which replace the traditional uncompressed cross-reference table with a compressed binary stream), PDF 1.5-format files are structurally more compact than equivalent PDF 1.4 files even with identical content. Modern Acrobat versions use these features by default when saving in PDF 1.5 or later compatibility mode.
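The cross-object benefit can be approximated with `zlib`: compressing many small, similar object dictionaries as one stream beats handling them individually, because DEFLATE can then exploit the redundancy *between* objects. (In a real PDF 1.4 file these dictionaries would be stored uncompressed, so the actual gap is wider than this sketch shows; the annotation dictionaries below are illustrative.)

```python
import zlib

# 200 small, similar objects, like the dictionaries that fill a typical PDF.
objects = [
    f"<< /Type /Annot /Subtype /Link /Rect [0 {i} 100 {i + 12}] >>".encode()
    for i in range(200)
]

# Per-object handling: each object compressed on its own -- no shared context.
individual = sum(len(zlib.compress(obj, 9)) for obj in objects)

# Object-stream style: pack all objects into one stream and compress once.
packed = len(zlib.compress(b"".join(objects), 9))

print(f"individually compressed: {individual} bytes, packed: {packed} bytes")
```

The packed form wins decisively because every dictionary shares almost all of its bytes with its neighbours — the same structural redundancy object streams were designed to capture.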

Re-distilling PDFs for Maximum Compression

For the maximum possible size reduction on a complex document, a technique called re-distilling involves printing the PDF to a PostScript file and then re-creating the PDF with Adobe Distiller using a compression-optimised job options file. This re-interprets and re-encodes all content through the full Distiller compression pipeline and discards every PDF structure that PostScript cannot express. The resulting PDF is a clean, consistently compressed version. The trade-off is that re-distilling is destructive: accessibility tags, bookmarks, hyperlinks, form fields, and other interactive or structural features are lost, and already-lossy images may be re-compressed a second time, compounding artefacts.

Automate PDF Optimisation in Your Workflow

Mapsoft's PDF solutions give developers and businesses the tools to compress, optimise, and process PDFs programmatically at scale — without manual intervention in Acrobat.