# PDF Adapter

`source.pdf` is a read-only Markitect source adapter for local, digitally-readable PDF files.

## Dependency Policy

The first implementation is stdlib-only. The `pdf` optional dependency extra is present so a richer pure-Python backend can be added later without changing the adapter boundary or making PDF support mandatory for EPUB3 users.

The adapter does not use network access, external processes, OCR engines, native system services, or renderer-specific tooling.

## Supported Inputs

- Local files with media type `application/pdf` or extension `.pdf`.
- PDFs with extractable text in page content streams.
- Plain and FlateDecode content streams for the first deterministic slice.

## Deferred Inputs

- Scanned or image-only PDFs that require OCR.
- Encrypted or permission-restricted PDFs.
- Pixel-perfect layout reconstruction.
- Table, figure, annotation, form, signature, and attachment extraction.
- PDF writing/export.

## Options

- `page_range`: optional 1-based page range such as `1-3,5`.
- `include_page_breaks`: when true, prefixes each page segment with a Markdown page marker comment.
- `normalize_whitespace`: when true, collapses repeated horizontal whitespace while preserving extracted line breaks.

## Provenance And Quality

The adapter emits one segment per extracted page. Each segment carries page-level `SourceProvenance` with the source path, source digest, page number, and originating PDF page object id.

Quality metadata records the extraction backend, document page count, selected pages, extracted page count, page coverage, skipped pages, warning count, lossiness, and confidence.
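The `page_range`, `include_page_breaks`, and `normalize_whitespace` options described under Options could behave roughly as in the minimal stdlib sketch below. It assumes a comma-separated, inclusive, 1-based range syntax as in the `1-3,5` example. The function names `parse_page_range`, `normalize_page_text`, and `render_page`, the rejection of out-of-range pages, and the `<!-- page N -->` marker text are all hypothetical. This README does not specify the adapter's actual parser or marker format.

```python
import re

# Horizontal whitespace only (space, tab, form feed, vertical tab); "\n" and "\r" are untouched.
_HSPACE = re.compile(r"[ \t\f\v]+")


def parse_page_range(spec: str, page_count: int) -> list[int]:
    """Parse a 1-based, inclusive range spec such as "1-3,5" into sorted unique pages.

    Hypothetical behaviour: malformed or out-of-range components raise ValueError.
    """
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)  # int("") raises ValueError
        else:
            start = end = int(part)
        if start < 1 or end < start or end > page_count:
            raise ValueError(f"page range {part!r} is invalid for {page_count} pages")
        pages.update(range(start, end + 1))
    return sorted(pages)


def normalize_page_text(text: str) -> str:
    """Collapse runs of horizontal whitespace to one space, keeping line breaks."""
    return _HSPACE.sub(" ", text)


def render_page(
    page_number: int,
    text: str,
    *,
    include_page_breaks: bool = False,
    normalize_whitespace: bool = False,
) -> str:
    """Apply the per-page options to one extracted page's text."""
    if normalize_whitespace:
        text = normalize_page_text(text)
    if include_page_breaks:
        # Hypothetical marker; the adapter's actual comment text is not specified.
        text = f"<!-- page {page_number} -->\n{text}"
    return text


print(parse_page_range("1-3,5", 10))  # [1, 2, 3, 5]
```

Rejecting invalid components instead of silently clipping them is one way to keep page selection deterministic; the real adapter may choose differently.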
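Supported Inputs mentions plain and FlateDecode content streams. Under heavy simplifying assumptions, stdlib-only decoding of such a stream can look like the sketch below. `zlib` inflates a FlateDecode stream, and a regular expression picks out literal-string operands of the `Tj` text-showing operator. The sketch ignores `TJ` arrays, hex strings, nested parentheses, line continuations, font encodings, and text positioning, and it places each `Tj` string on its own line. The function names are hypothetical, and this is not the adapter's actual backend.

```python
import re
import zlib

# A literal-string operand immediately followed by the Tj operator, e.g. "(Hello) Tj".
# Naive: no TJ arrays, hex strings, nested parentheses, line continuations, or font decoding.
_TJ = re.compile(rb"\(((?:\\.|[^\\)])*)\)\s*Tj")
_ESCAPES = {b"n": b"\n", b"r": b"\r", b"t": b"\t", b"b": b"\b", b"f": b"\f"}


def decode_stream(raw: bytes, filters: list[str]) -> bytes:
    """Return decoded content-stream bytes for no filter or a single FlateDecode filter."""
    if not filters:
        return raw  # plain (unfiltered) stream
    if filters == ["FlateDecode"]:
        return zlib.decompress(raw)
    raise NotImplementedError(f"unsupported filter chain: {filters}")


def unescape_literal(body: bytes) -> bytes:
    """Resolve backslash escapes inside a PDF literal string (without its parentheses)."""
    out = bytearray()
    i = 0
    while i < len(body):
        if body[i] == 0x5C and i + 1 < len(body):  # backslash
            j = i + 1
            # Up to three octal digits, e.g. \101 -> "A".
            while j < len(body) and j < i + 4 and body[j] in b"01234567":
                j += 1
            if j > i + 1:
                out.append(int(body[i + 1 : j], 8) & 0xFF)
                i = j
                continue
            nxt = body[i + 1 : i + 2]
            # Other escapes, including \( \) and \\, yield the escaped character itself.
            out += _ESCAPES.get(nxt, nxt)
            i += 2
        else:
            out.append(body[i])
            i += 1
    return bytes(out)


def extract_tj_text(content: bytes) -> str:
    """Join Tj string operands with newlines (assumption: one line per Tj)."""
    strings = [unescape_literal(m.group(1)) for m in _TJ.finditer(content)]
    # latin-1 maps bytes 1:1; real PDFs need the font's encoding or ToUnicode CMap.
    return "\n".join(s.decode("latin-1") for s in strings)


stream = zlib.compress(b"BT /F1 12 Tf 72 712 Td (Hello, PDF) Tj 0 -14 Td (A \\(note\\)) Tj ET")
print(extract_tj_text(decode_stream(stream, ["FlateDecode"])))
```

Raising on unknown filter chains, rather than guessing, mirrors the deterministic, explicitly deferred scope described for the first slice.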
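The provenance and quality fields listed under Provenance And Quality can be modelled as plain records. In this sketch, `PageProvenance` and `PdfQuality` are hypothetical stand-ins, not Markitect's `SourceProvenance` type. The `sha256:` digest format is an assumption, and page coverage is assumed to mean extracted pages divided by selected pages. `lossiness` and `confidence` are supplied by the caller because the README does not say how they are derived.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PageProvenance:
    """Hypothetical stand-in for the page-level SourceProvenance fields."""

    source_path: str
    source_digest: str   # assumption: "sha256:<hex digest of the file bytes>"
    page_number: int     # 1-based page number
    page_object_id: str  # assumption: indirect reference such as "12 0 R"


@dataclass(frozen=True)
class PdfQuality:
    """Hypothetical record for the quality metadata fields listed in this README."""

    backend: str
    page_count: int
    selected_pages: tuple[int, ...]
    extracted_page_count: int
    page_coverage: float
    skipped_pages: tuple[int, ...]
    warning_count: int
    lossiness: str
    confidence: float


def source_digest(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()


def summarize_quality(
    backend: str,
    page_count: int,
    selected_pages: list[int],
    extracted_pages: set[int],
    warning_count: int,
    lossiness: str,
    confidence: float,
) -> PdfQuality:
    """Derive counts and coverage; lossiness and confidence are caller-supplied."""
    extracted = [p for p in selected_pages if p in extracted_pages]
    skipped = tuple(p for p in selected_pages if p not in extracted_pages)
    # Assumption: coverage = extracted selected pages / selected pages (0.0 if none selected).
    coverage = len(extracted) / len(selected_pages) if selected_pages else 0.0
    return PdfQuality(
        backend=backend,
        page_count=page_count,
        selected_pages=tuple(selected_pages),
        extracted_page_count=len(extracted),
        page_coverage=coverage,
        skipped_pages=skipped,
        warning_count=warning_count,
        lossiness=lossiness,
        confidence=confidence,
    )


quality = summarize_quality("stdlib", 10, [1, 2, 3, 5], {1, 2, 5}, 1, "text-only", 0.8)
print(quality.page_coverage, quality.skipped_pages)  # 0.75 (3,)
```

The `"text-only"` lossiness label and `0.8` confidence in the usage line are illustrative placeholders, not values the adapter is documented to emit.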