Source Attachment Metadata
markitect-filter exposes read-side attachment metadata through
NormalizedMarkdownDocument.attachments. These entries are
markitect_tool.source.SourceAsset objects, so markitect-tool can consume
them when building passive render asset manifests.
The metadata schema marker is:
markitect-filter.source-attachment.v1
Common Fields
Attachment entries should preserve:
- uri: stable source package or document member URI
- path: package member path or signal path
- name: member filename or signal label
- media_type and extension when known
- size and digest when bytes are available
- metadata.source_adapter: adapter id such as source.epub3 or source.pdf
- metadata.source_role: logical read-side role
- metadata.package_path, metadata.page, metadata.pdf_object, or related provenance coordinates when known
- metadata.render_manifest_compatible: true when the entry can feed RenderAsset.from_source_asset
These entries describe source-side resources only. They do not imply output paths, copy execution, final artifact locations, or publication state.
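As an illustration, the common fields can be pictured with a small stand-in record. This is only a sketch: `AttachmentSketch` is a hypothetical dataclass, not the real `markitect_tool.source.SourceAsset`, and the sample URI form, role name, and hex digest encoding are assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass(frozen=True)
class AttachmentSketch:
    """Hypothetical stand-in for an attachment entry (not the real SourceAsset)."""

    uri: str
    path: str
    name: str
    media_type: Optional[str] = None   # set when known
    extension: Optional[str] = None    # set when known
    size: Optional[int] = None         # set when bytes are available
    digest: Optional[str] = None       # set when bytes are available
    metadata: dict[str, Any] = field(default_factory=dict)


payload = b"body { margin: 0; }"  # invented sample bytes

entry = AttachmentSketch(
    uri="epub3://book.epub/OEBPS/styles/main.css",  # invented URI form
    path="OEBPS/styles/main.css",
    name="main.css",
    media_type="text/css",
    extension=".css",
    size=len(payload),
    digest=hashlib.sha256(payload).hexdigest(),  # hex encoding is an assumption
    metadata={
        "source_adapter": "source.epub3",
        "source_role": "stylesheet",  # invented role name
        "package_path": "OEBPS/styles/main.css",
        "render_manifest_compatible": True,
    },
)
```

Nothing in the record names an output path or a copy step, which matches the source-side-only scope described above.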
EPUB3
The EPUB3 adapter records manifest resources for images, stylesheets, fonts, audio, and video when the package entry exists and can be read cheaply from the ZIP archive. It stores byte size and sha256 digest for each collected resource.
Unsupported non-XHTML package resources produce
source.epub3.skipped_resource warnings. Declared but missing resources produce
source.epub3.missing_resource warnings.
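The collection step can be sketched with the standard library alone. This is a minimal illustration, not the adapter's code: `collect_resources` is a hypothetical helper, the dict layout for entries and warnings is assumed, and only the missing-resource case is modelled.

```python
import hashlib
import io
import zipfile


def collect_resources(archive: zipfile.ZipFile, declared_paths: list[str]):
    """Return (entries, warnings) for declared package members.

    The warning code reuses the id named in the text; the dict shapes are assumptions.
    """
    entries, warnings = [], []
    members = set(archive.namelist())
    for path in declared_paths:
        if path not in members:
            # Declared in the manifest but absent from the ZIP archive.
            warnings.append({"code": "source.epub3.missing_resource", "path": path})
            continue
        data = archive.read(path)
        entries.append(
            {
                "path": path,
                "size": len(data),
                "digest": hashlib.sha256(data).hexdigest(),
            }
        )
    return entries, warnings


# Build a tiny in-memory archive so the sketch runs without a real EPUB.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("OEBPS/cover.png", b"not-a-real-png")

with zipfile.ZipFile(buffer) as zf:
    entries, warnings = collect_resources(
        zf, ["OEBPS/cover.png", "OEBPS/missing.css"]
    )
```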
PDF
The PDF adapter records embedded file streams when a stdlib scan can identify
Filespec and EmbeddedFile objects. It stores member bytes, a media type inferred
from the filename, size, digest, object id, and the source role embedded-file.
For image resources, the stdlib slice records signal-only entries with source
role image-signal. These entries preserve page/object provenance and a stable
digest of the detected page/resource signal, but they do not extract image
bytes. Image signals emit source.pdf.image_resource_signal warnings so callers
know the adapter detected media that it did not extract.
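A signal-only scan of this kind might look roughly like the following. It is a simplified sketch under stated assumptions: the regular expressions, the `image_signals` helper, the dict layout, and the choice to hash the object id plus the object body are all hypothetical, page lookup is omitted, and compressed object streams are ignored.

```python
import hashlib
import re

# Matches "N G obj ... endobj" bodies; ignores compressed object streams.
OBJECT_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)
IMAGE_RE = re.compile(rb"/Subtype\s*/Image\b")


def image_signals(pdf_bytes: bytes):
    """Return (signals, warnings) for image XObjects, without extracting image bytes."""
    signals, warnings = [], []
    for match in OBJECT_RE.finditer(pdf_bytes):
        obj_id = f"{int(match.group(1))} {int(match.group(2))}"
        body = match.group(3)
        if not IMAGE_RE.search(body):
            continue
        # Assumed signal digest: object id plus the raw object body.
        digest = hashlib.sha256(obj_id.encode() + b"\n" + body).hexdigest()
        signals.append(
            {
                "source_role": "image-signal",
                "pdf_object": obj_id,
                "digest": digest,
            }
        )
        warnings.append(
            {"code": "source.pdf.image_resource_signal", "pdf_object": obj_id}
        )
    return signals, warnings


# Hand-written fragment, not a complete PDF; enough to exercise the scan.
sample = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    b"2 0 obj\n<< /Type /XObject /Subtype /Image /Width 2 /Height 2 >>\n"
    b"stream\n\x00\x01\x02\x03\nendstream\nendobj\n"
)
signals, warnings = image_signals(sample)
```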
Render Manifest Handoff
markitect-tool can convert attachment entries to passive render assets:
from markitect_tool.render import RenderAsset

render_assets = [
    RenderAsset.from_source_asset(asset, role=asset.metadata["source_role"])
    for asset in document.attachments
]
The resulting render assets remain passive descriptors. Asset copying,
renderer output references, link rewriting, and final artifact validation stay
outside markitect-filter.
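Callers may want to hand off only entries flagged as render-manifest compatible. The sketch below shows one way to filter on that flag. It uses `types.SimpleNamespace` stand-ins rather than real `SourceAsset` objects, `handoff_candidates` is a hypothetical helper, and the assumption that a signal-only entry lacks the flag is made purely for illustration.

```python
from types import SimpleNamespace

# Stand-ins for document.attachments; real entries are SourceAsset objects.
attachments = [
    SimpleNamespace(
        name="cover.png",
        metadata={"source_role": "image", "render_manifest_compatible": True},
    ),
    SimpleNamespace(
        name="page-3-image",
        metadata={"source_role": "image-signal"},  # flag assumed absent here
    ),
]


def handoff_candidates(assets):
    """Keep only entries whose metadata marks them as render-manifest compatible."""
    return [
        asset
        for asset in assets
        if asset.metadata.get("render_manifest_compatible") is True
    ]


candidates = handoff_candidates(attachments)
```

The filtered list can then be passed through `RenderAsset.from_source_asset` in the same way as the full attachment list.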
Example normalized attachment envelopes live in:
- examples/source-attachments/epub3-attachments.normalized.yaml
- examples/source-attachments/pdf-attachments.normalized.yaml
Cross-repo validation can be run from this checkout with:
PYTHONPATH=src:/home/worsch/markitect-tool/src python3 -m pytest