Add source attachment metadata compatibility

2026-05-15 14:36:24 +02:00
parent afa51f8764
commit ad137b214f
13 changed files with 724 additions and 28 deletions


@@ -23,7 +23,7 @@ native system services, or renderer-specific tooling.
 - Scanned or image-only PDFs that require OCR.
 - Encrypted or permission-restricted PDFs.
 - Pixel-perfect layout reconstruction.
-- Table, figure, annotation, form, signature, and attachment extraction.
+- Table, figure, annotation, form, signature, and rich attachment extraction.
 - PDF writing/export.
 
 ## Options
@@ -43,3 +43,9 @@ and originating PDF page object id.
 Quality metadata records the extraction backend, document page count, selected
 pages, extracted page count, page coverage, skipped pages, warning count,
 lossiness, and confidence.
+
+`NormalizedMarkdownDocument.attachments` may include read-side metadata for
+embedded file streams and image-resource signals when the stdlib parser can
+detect them. Embedded files include byte size and digest. Image resources are
+signal-only descriptors with page/object provenance; the adapter does not
+extract image bytes or perform OCR.
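The attachment descriptors described in the README change can be pictured with a small Python sketch. It assumes a dataclass shape. The names `AttachmentDescriptor`, `describe_embedded_file`, and `describe_image_resource`, the field names, and the choice of SHA-256 as the digest are all hypothetical, because the commit does not show the real types or name the digest algorithm. It is meant only to show the split between embedded files, which carry size and digest, and image resources, which are signal-only.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AttachmentDescriptor:
    """Hypothetical shape of one entry in NormalizedMarkdownDocument.attachments."""

    kind: str                    # "embedded_file" or "image_resource"
    page_number: Optional[int]   # page provenance, if the parser knows it
    object_id: Optional[int]     # PDF object id of the stream / image resource
    byte_size: Optional[int] = None  # set only for embedded files
    digest: Optional[str] = None     # hex digest, set only for embedded files


def describe_embedded_file(
    data: bytes, object_id: int, page_number: Optional[int] = None
) -> AttachmentDescriptor:
    # Assumption: SHA-256 over the decoded stream bytes; the README only says "digest".
    return AttachmentDescriptor(
        kind="embedded_file",
        page_number=page_number,
        object_id=object_id,
        byte_size=len(data),
        digest=hashlib.sha256(data).hexdigest(),
    )


def describe_image_resource(page_number: int, object_id: int) -> AttachmentDescriptor:
    # Signal only: records that an image resource exists on a page, never its bytes.
    return AttachmentDescriptor(
        kind="image_resource", page_number=page_number, object_id=object_id
    )


if __name__ == "__main__":
    attachments = [
        describe_embedded_file(b"hello", object_id=12),
        describe_image_resource(page_number=1, object_id=7),
    ]
    for item in attachments:
        print(item)
```

Keeping both kinds in one descriptor type with optional size/digest fields lets a single `attachments` list hold both. Leaving those fields unset for image resources mirrors the README's point that the adapter neither extracts image bytes nor performs OCR.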