PDF Glossary: Extract, Page Range, Subset, Page Selection, and More
Definitions of key PDF terms related to page extraction: extract, page range, subset, page selection, content stream, shared resources, permissions password, and WebAssembly. A reference for understanding PDF extraction operations.
Extract
To extract PDF pages means to select one or more pages from an existing PDF and copy them into a new, separate document. The original file is not modified — the result is a new PDF containing only the pages you selected. Extraction differs from splitting (which distributes all pages across multiple output files) and from deletion (which removes pages from the original). See the complete extraction guide for a full walkthrough.
Page Range
A page range specifies a contiguous block of pages using a start page and an end page, for example 'pages 5 to 12' or '5–12'. Ranges are inclusive, so 5–12 covers eight pages. Most extraction tools accept range notation in a text input field, allowing you to type the range rather than click individual thumbnails. A page range is usually the quickest way to extract a chapter, a section, or any other group of consecutive pages from a long document.
Page Selection
Page selection refers to the act of choosing which specific pages to include in an extraction operation. You can select pages individually by clicking their thumbnails in a visual preview, by entering page numbers manually, or by combining range notation with individual page numbers (for example: 1–5, 12, 18–20). The result is a set of pages that will be copied into the new extracted PDF.
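The mixed notation above (ranges combined with single page numbers) can be expanded into a plain list of pages. The following is a minimal sketch in Python, not the parser of any particular tool; the function name `parse_page_selection` and its `page_count` parameter are assumptions made for illustration.

```python
import re

def parse_page_selection(spec: str, page_count: int) -> list[int]:
    """Expand a selection such as '1–5, 12, 18–20' into sorted, unique 1-based page numbers."""
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        # Accept either a hyphen or an en dash between the start and end page.
        match = re.fullmatch(r"(\d+)(?:\s*[-–]\s*(\d+))?", part)
        if match is None:
            raise ValueError(f"unrecognized selection item: {part!r}")
        start = int(match.group(1))
        end = int(match.group(2)) if match.group(2) else start
        if start < 1 or end > page_count or start > end:
            raise ValueError(f"range out of bounds: {part!r}")
        pages.update(range(start, end + 1))  # ranges are inclusive
    return sorted(pages)

selected = parse_page_selection("1–5, 12, 18–20", page_count=20)
print(selected)
```

Collecting pages in a set means overlapping items such as '1–5, 3' yield each page once. A tool that lets you repeat or reorder pages would keep a list instead.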
Subset
In the context of PDF extraction, a subset is the group of pages you select from the full document. If a PDF has 100 pages and you extract pages 10–20 and page 35, your extracted document is a subset of the original — it contains 12 of the 100 pages. The term is also used in PDF typography, where a font subset refers to only the characters of a font that are actually used in the document, rather than the entire typeface.
Content Stream
The content stream is the data within a PDF page that describes the visual elements on that page as a sequence of drawing instructions: text positioning, font references, image placements, vector drawing commands, and color settings. A page may hold its instructions in one stream or in several that are read in order. When you extract pages, the tool copies each selected page's content stream into the new document. This is what makes page extraction lossless: the visual appearance is defined by this data, and it is copied unchanged.
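To make the idea concrete, here is a small hand-written content stream held in a Python bytes value, with a helper that lists its operators. The sample is illustrative and not taken from a real file, and the helper `list_operators` is a hypothetical name. The naive whitespace split is adequate only for this sample, not for real streams.

```python
# A small, hand-written content stream (illustrative only).
SAMPLE_STREAM = b"""BT
/F1 12 Tf
72 720 Td
(Hello) Tj
ET
"""

# Operators used in the sample: begin text, set font and size,
# move the text position, show a string, end text.
TEXT_OPERATORS = {b"BT", b"Tf", b"Td", b"Tj", b"ET"}

def list_operators(stream: bytes) -> list[str]:
    """Return the operators in order, using a naive whitespace split."""
    return [tok.decode("ascii") for tok in stream.split() if tok in TEXT_OPERATORS]

# Extraction copies the bytes as they are, so the instructions are identical afterwards.
copied_stream = bytes(SAMPLE_STREAM)
print(list_operators(copied_stream))
```

Note that the stream refers to the font only by the name /F1. The font data itself lives outside the stream, which is why resources have to be carried along separately.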
Page Tree
The page tree is the internal structure of a PDF file that records the order of pages. Its leaves are page objects, each of which points to that page's content stream and associated resources, and a page's number is simply its position among those leaves. When you extract pages, the tool builds a new page tree for the extracted document containing only the selected pages. The page tree is also what gets rewritten when you reorder pages in an organization operation.
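As a rough illustration of building a new page tree, the sketch below models the tree with two small Python classes. `Page`, `PagesNode`, `flatten`, and `extract_tree` are hypothetical names for a toy model, not the object layout of a real PDF library.

```python
from dataclasses import dataclass, field

@dataclass
class Page:
    label: str  # stands in for the page's content stream and resources

@dataclass
class PagesNode:
    kids: list = field(default_factory=list)  # PagesNode or Page children, in order

def flatten(node) -> list:
    """Walk the tree depth-first; the order of the leaves is the page order."""
    if isinstance(node, Page):
        return [node]
    pages = []
    for kid in node.kids:
        pages.extend(flatten(kid))
    return pages

def extract_tree(root: PagesNode, page_numbers: list[int]) -> PagesNode:
    """Build a new, flat page tree holding only the chosen 1-based pages."""
    all_pages = flatten(root)
    return PagesNode(kids=[all_pages[n - 1] for n in page_numbers])

original = PagesNode(kids=[
    PagesNode(kids=[Page("p1"), Page("p2")]),
    PagesNode(kids=[Page("p3"), Page("p4")]),
])
extracted = extract_tree(original, [2, 4])
print([page.label for page in flatten(extracted)])
```

The original tree is left untouched, and passing the page numbers in a different order (for example `[4, 2]`) models a reorder rather than an extraction.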
Shared Resources
Shared resources are fonts, images, and color profiles that are embedded once in a PDF and referenced by multiple pages, rather than being duplicated for each page that uses them. When you extract a subset of pages, the tool must include every resource those pages need to render correctly, and a shared resource is copied in full even if only one of the extracted pages uses it. A font or image that was shared with pages outside your selection therefore still appears, whole, in the extracted document. This is why an extracted document can sometimes be larger than expected relative to the number of pages extracted.
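The size effect can be sketched with a toy model. The page-to-resource mapping and the byte sizes below are invented for illustration, and `resources_needed` and `resource_bytes` are hypothetical helper names.

```python
# Hypothetical model: each page number maps to the resources it references.
PAGE_RESOURCES = {
    1: {"FontA", "Logo"},
    2: {"FontA"},
    3: {"FontA", "Chart"},
    4: {"FontB"},
}

# Hypothetical embedded size of each resource, in bytes.
RESOURCE_SIZES = {"FontA": 80_000, "FontB": 60_000, "Logo": 150_000, "Chart": 40_000}

def resources_needed(pages: list[int]) -> set[str]:
    """Union of the resources the selected pages reference; shared ones count once."""
    needed: set[str] = set()
    for page in pages:
        needed |= PAGE_RESOURCES[page]
    return needed

def resource_bytes(pages: list[int]) -> int:
    """Total size of the resources that must travel with the selected pages."""
    return sum(RESOURCE_SIZES[name] for name in resources_needed(pages))

print(resource_bytes([1]), "of", resource_bytes([1, 2, 3, 4]))
```

In this model, extracting page 1 alone carries 230,000 of the 330,000 resource bytes, although it is only one of four pages. Adding page 2 costs nothing extra because FontA is already included.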
Permissions Password
A permissions password (also called an owner password or restriction password) is a type of PDF protection that does not prevent a user from opening or reading the document, but restricts operations like editing, printing, copying, and page extraction. If a PDF has a permissions password set to prevent content extraction, an extraction tool that honors the restriction will refuse to process the file until the restriction is removed. This is distinct from the open password (also called a user password), which prevents the document from being viewed at all without the correct password.
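Under the PDF standard security handler, these restrictions are stored as bit flags in a single signed integer (the /P entry of the encryption dictionary). The sketch below decodes that integer using the bit positions from the PDF specification. The dictionary keys and the function name `decode_permissions` are illustrative choices, and which flag a given tool consults before extracting pages is not assumed here.

```python
# Bit values from the PDF specification's user access permissions table.
# Bit numbers in the comments are 1-based, as in the specification.
PERMISSION_BITS = {
    "print": 1 << 2,                       # bit 3
    "modify": 1 << 3,                      # bit 4
    "copy_or_extract_content": 1 << 4,     # bit 5
    "annotate": 1 << 5,                    # bit 6
    "fill_forms": 1 << 8,                  # bit 9
    "extract_for_accessibility": 1 << 9,   # bit 10
    "assemble_document": 1 << 10,          # bit 11 (insert, rotate, delete pages)
    "print_high_quality": 1 << 11,         # bit 12
}

def decode_permissions(p: int) -> dict[str, bool]:
    """Map a /P value to named flags; a set bit means the operation is allowed."""
    return {name: bool(p & bit) for name, bit in PERMISSION_BITS.items()}

# -3904 leaves every permission bit clear, so every operation is restricted.
for name, allowed in decode_permissions(-3904).items():
    print(f"{name}: {'allowed' if allowed else 'restricted'}")
```

Python's bitwise operators treat negative integers as two's complement, so the signed value can be tested directly without converting it to an unsigned form first.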
WebAssembly (Wasm)
WebAssembly is a binary instruction format that allows code — including complex PDF processing libraries — to run in the browser at near-native speed. PDF ME uses WebAssembly to process page extraction entirely on your device, without uploading your file to any server. This approach provides both performance (large PDFs process quickly) and privacy (your document stays local). WebAssembly is available in all modern browsers.
PDF Split vs PDF Extract
These two terms are sometimes used interchangeably, but they describe different operations. Splitting divides all pages of a PDF into multiple output files — every page goes somewhere. Extracting creates a new document from a selected subset of pages, leaving the original intact. Use extraction when you need a specific subset; use splitting when you need to distribute all pages. Both operations preserve full page quality because they copy page content streams rather than re-rendering pages.
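The difference is easy to see when both operations are reduced to page numbers. This is a minimal sketch under the assumption that a split produces fixed-size chunks; real tools offer other split rules, and `split_pages` and `extract_pages` are hypothetical names.

```python
def split_pages(page_count: int, pages_per_file: int) -> list[list[int]]:
    """Distribute every page across consecutive output files of a fixed size."""
    if pages_per_file < 1:
        raise ValueError("pages_per_file must be at least 1")
    all_pages = list(range(1, page_count + 1))
    return [all_pages[i:i + pages_per_file] for i in range(0, page_count, pages_per_file)]

def extract_pages(page_count: int, selection: list[int]) -> list[int]:
    """Return only the selected pages for a single new document."""
    for n in selection:
        if not 1 <= n <= page_count:
            raise ValueError(f"page {n} is outside 1..{page_count}")
    return list(selection)

print(split_pages(10, 4))        # every page lands in some output file
print(extract_pages(10, [2, 7])) # only the chosen pages are copied
```

Every page appears in exactly one of the split outputs, while the extraction result contains only the selection and says nothing about the remaining pages.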
DPI (Dots Per Inch)
DPI measures the resolution of raster images embedded in a PDF: the number of image pixels per inch at the size the image is placed on the page. Higher DPI means more detail and a larger file size. PDF extraction does not change image DPI because the image data is copied directly without re-encoding. If an extracted page appears lower quality than expected, the source images likely had a low DPI to begin with. This is a property of the source document, not a result of the extraction operation.
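An image's effective resolution on the page follows from its pixel dimensions and the size at which it is placed. The sketch below works through the arithmetic, assuming the default PDF unit of 1/72 inch per point; the function name `effective_dpi` is illustrative.

```python
POINTS_PER_INCH = 72  # default PDF user space unit: 1 point = 1/72 inch

def effective_dpi(pixel_width: int, placed_width_points: float) -> float:
    """Pixels per inch of an image drawn at the given width on the page."""
    placed_width_inches = placed_width_points / POINTS_PER_INCH
    return pixel_width / placed_width_inches

# A 1275-pixel-wide scan filling a US Letter page width (612 points = 8.5 inches).
print(effective_dpi(1275, 612))  # 150.0
```

Because extraction changes neither the pixel data nor the placement size, the value is the same before and after the operation.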