Extract by Text Pattern

Overview

Extract by Text Pattern scans page text for keywords or regular expressions and collects the matching pages into separate output files, one file per criterion. The pages collected for a single criterion do not need to be contiguous in the source document.

This is useful for pulling all pages that mention a particular client name, project code, or legal term from a large PDF.

How to use

  1. Open the PDF you want to search.
  2. Go to Plug-Ins > Split > Extract by Text Pattern.
  3. Type a pattern in the text box and click Add to build the criteria list. Repeat for additional patterns.
  4. Optionally click Regex Library to insert a saved regular expression into the pattern field.
  5. Choose the output folder and file-name pattern.
  6. Click OK to run the extraction.

Options

Criteria list: One or more regular expressions (or plain text strings). Each criterion produces a separate output file containing all pages that match it. Use Add to append an entry and Remove to delete the selected entry.
Regex Library: Opens the Regex Library dialog so you can insert a saved pattern into the pattern field.
Search annotation text: When checked, the text of annotations on each page (comments, sticky notes, free-text annotations) is searched along with the page body text.
Output folder: The destination folder for extracted files. Click Browse to select a folder.
File pattern: Controls the output file name. Use {pattern} to embed the criterion string and {n} for a sequential index. The default pattern is {pattern}_{n}.
Open outputs after creation: Opens each generated file in Acrobat after the operation completes.

Tip

Patterns use the ECMAScript regular-expression grammar of the C++ standard library. That grammar has no inline flags of its own, but you can prefix a pattern with (?i) for case-insensitive matching.

Note

A single page may match multiple criteria and will appear in each corresponding output file. Criteria that match no pages produce no output file.

See also