# PDF Adapter

`source.pdf` is a read-only Markitect source adapter for local, digitally
readable PDF files.

## Dependency Policy

The first implementation uses only the Python standard library. The optional
`pdf` dependency extra exists so that a richer pure-Python backend can be added
later without changing the adapter boundary and without forcing PDF
dependencies on users who only need EPUB3.
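One way to keep a richer backend optional is to probe for it at runtime and fall back to the stdlib extractor when it is absent. This is a minimal sketch: the module name `markitect_pdf_rich_backend` and the `select_backend` helper are hypothetical, since this document does not name the package the `pdf` extra will install.

```python
import importlib.util

# Hypothetical module name for a richer backend installed by the `pdf` extra.
RICH_BACKEND_MODULE = "markitect_pdf_rich_backend"


def select_backend(module_name: str = RICH_BACKEND_MODULE) -> str:
    """Return the backend to use, falling back to the stdlib-only extractor.

    find_spec() returns None for a top-level module that is not installed,
    so a missing optional extra never raises ImportError here.
    """
    if importlib.util.find_spec(module_name) is not None:
        return module_name
    return "stdlib"
```

Because the probe happens behind a single function, callers see the same adapter boundary whichever backend is selected.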

The adapter does not use network access, external processes, OCR engines,
native system services, or renderer-specific tooling.

## Supported Inputs

- Local files with media type `application/pdf` or extension `.pdf`.
- PDFs with extractable text in page content streams.
- Uncompressed and FlateDecode-compressed content streams (the scope of the
  first deterministic slice).
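A minimal sketch of the input check and content-stream decoding, assuming the stream's raw bytes and its `/Filter` names have already been parsed out of the PDF object. The helper names `is_pdf_input` and `decode_content_stream` are hypothetical. FlateDecode data uses the zlib format, so `zlib.decompress` handles it; `/DecodeParms` predictors are not handled in this sketch.

```python
import zlib
from pathlib import Path
from typing import Optional


def is_pdf_input(path: str, media_type: Optional[str] = None) -> bool:
    """Accept a local file by media type or by a `.pdf` extension."""
    if media_type == "application/pdf":
        return True
    return Path(path).suffix.lower() == ".pdf"


def decode_content_stream(raw: bytes, filters: list[str]) -> bytes:
    """Decode a plain or FlateDecode page content stream.

    `filters` lists the names from the stream's /Filter entry, in the order
    they must be applied when decoding (empty for a plain stream). Any
    filter other than FlateDecode is rejected rather than guessed at.
    """
    data = raw
    for name in filters:
        if name == "FlateDecode":
            data = zlib.decompress(data)
        else:
            raise ValueError(f"unsupported stream filter: {name}")
    return data
```

Rejecting unknown filters keeps the output deterministic: a page either decodes with a supported filter or fails loudly so it can be reported as skipped.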

## Deferred Inputs

- Scanned or image-only PDFs that require OCR.
- Encrypted or permission-restricted PDFs.
- Pixel-perfect layout reconstruction.
- Table, figure, annotation, form, signature, and attachment extraction.
- PDF writing/export.

## Options

- `page_range`: optional 1-based page range such as `1-3,5`.
- `include_page_breaks`: when true, prefixes each page segment with a Markdown
  page marker comment.
- `normalize_whitespace`: when true, collapses repeated horizontal whitespace
  while preserving extracted line breaks.
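The options above can be sketched as three small helpers. The function names and the exact page-marker text are hypothetical; the document only says the marker is a Markdown comment. This version rejects reversed or out-of-range segments instead of clamping them, which is one reasonable choice among several.

```python
import re

# Two or more horizontal whitespace characters; "\n" is deliberately excluded
# so extracted line breaks survive normalization.
_REPEATED_HORIZONTAL_WS = re.compile(r"[ \t\f\v]{2,}")


def parse_page_range(spec: str, page_count: int) -> list[int]:
    """Expand a 1-based spec such as "1-3,5" into sorted, unique page numbers."""
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end < start or end > page_count:
            raise ValueError(f"invalid page range segment: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)


def normalize_whitespace(text: str) -> str:
    """Collapse runs of horizontal whitespace to one space; keep newlines."""
    return _REPEATED_HORIZONTAL_WS.sub(" ", text)


def page_marker(page_number: int) -> str:
    """Return a Markdown comment marking the start of a page (format assumed)."""
    return f"<!-- page {page_number} -->"
```

For example, `parse_page_range("1-3,5", 10)` selects pages 1, 2, 3, and 5, and a segment such as `4-2` raises `ValueError`.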

## Provenance And Quality

The adapter emits one segment per extracted page. Each segment carries
page-level `SourceProvenance` with the source path, source digest, page number,
and originating PDF page object id.
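A sketch of the per-page provenance record, assuming a hex SHA-256 file digest and an object id rendered as `"<object number> <generation>"`. The field names and the `provenance_for_pages` helper are illustrative, not the adapter's actual `SourceProvenance` definition. The digest is computed once and shared, since every page segment comes from the same file.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class SourceProvenance:
    # Illustrative fields only; the adapter's real SourceProvenance may differ.
    source_path: str
    source_digest: str   # assumed: hex SHA-256 of the whole file
    page_number: int     # 1-based, same numbering as `page_range`
    page_object_id: str  # assumed format: "<object number> <generation>"


def file_digest(path: Path) -> str:
    """Hash the file in fixed-size chunks so large PDFs are not read at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_for_pages(
    path: Path, pages: list[tuple[int, int, int]]
) -> list[SourceProvenance]:
    """Build one record per (page_number, object_number, generation) tuple."""
    source_digest = file_digest(path)  # computed once, shared by every page
    return [
        SourceProvenance(
            source_path=str(path),
            source_digest=source_digest,
            page_number=page_number,
            page_object_id=f"{obj_num} {gen_num}",
        )
        for page_number, obj_num, gen_num in pages
    ]
```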

Quality metadata records the extraction backend, document page count, selected
pages, extracted page count, page coverage, skipped pages, warning count,
lossiness, and confidence.
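A sketch of how several of these fields can be derived from the selected and extracted page lists. The class name `ExtractionQuality` and the coverage formula (extracted pages divided by selected pages) are assumptions. Lossiness and confidence are omitted because this document does not say how they are computed.

```python
from dataclasses import dataclass, field


@dataclass
class ExtractionQuality:
    # Illustrative fields mirroring the quality list above; names are assumed.
    backend: str
    page_count: int
    selected_pages: list[int]
    extracted_pages: list[int]
    warnings: list[str] = field(default_factory=list)

    @property
    def extracted_page_count(self) -> int:
        return len(self.extracted_pages)

    @property
    def skipped_pages(self) -> list[int]:
        # Selected pages that produced no segment, in selection order.
        extracted = set(self.extracted_pages)
        return [page for page in self.selected_pages if page not in extracted]

    @property
    def page_coverage(self) -> float:
        # Assumed definition: fraction of selected pages that yielded text.
        if not self.selected_pages:
            return 0.0
        return self.extracted_page_count / len(self.selected_pages)

    @property
    def warning_count(self) -> int:
        return len(self.warnings)
```

Deriving skipped pages and coverage from the page lists, rather than storing them separately, keeps the counts consistent with each other.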