id: MKTF-WP-0003
type: workplan
title: Source Attachment Manifest Compatibility
domain: markitect
status: done
owner: markitect-filter
topic_slug: markitect
planning_priority: complete
planning_order: 30
depends_on_workplans, related_workplans: MKTF-WP-0001, MKTF-WP-0002, MKTT-WP-0018, MKTT-WP-0021, MKTT-WP-0020
created: 2026-05-15
updated: 2026-05-15
state_hub_workstream_id: 16e5c830-31e3-4070-9e27-65d28ed06595

MKTF-WP-0003: Source Attachment Manifest Compatibility

Purpose

Provide the read-side source attachment and asset metadata needed by the Markitect render reference and asset manifest contract without making markitect-filter a renderer or export pipeline.

markitect-filter owns concrete source-format normalization. It should expose attachments, embedded media, package resources, and related provenance as normalized source metadata that markitect-tool can consume when building a render asset manifest.

Boundary

markitect-filter owns:

  • source-format-specific attachment discovery
  • read-side source asset metadata
  • media type, extension, size, and digest capture where available
  • provenance back to package paths, pages, anchors, or source members
  • diagnostics for skipped or unsupported embedded resources
  • fixtures proving EPUB3 and PDF adapters preserve read-side asset metadata

markitect-filter does not own:

  • write/export adapters
  • renderer execution
  • output asset copying
  • final render artifact paths
  • publication lifecycle or durable artifact storage
  • Quarkdown invocation

Those responsibilities belong to markitect-tool contracts, markitect-quarkdown render integration, or later runtime/publication systems.

Implementation Summary

Completed as a read-side attachment metadata compatibility slice:

  • Added shared source attachment metadata helpers and exported markitect-filter.source-attachment.v1.
  • EPUB3 read results now populate NormalizedMarkdownDocument.attachments for package images, stylesheets, fonts, audio, and video with byte size, digest, package path, manifest id, href, and render-manifest compatibility metadata.
  • PDF read results now populate attachments for embedded file streams and signal-only image resources where the stdlib parser can detect them.
  • Unsupported EPUB resources, missing EPUB resources, PDF image signals, and unreadable embedded files produce structured diagnostics.
  • Docs, handoff fixtures, adapter descriptors, README notes, and tests were added without introducing renderer/export behavior.

P3.1 - Align attachment metadata with Markitect source contracts

id: MKTF-WP-0003-T001
status: done
priority: high
state_hub_task_id: "d119daca-8141-4662-8ad7-ce43ccd79044"

Confirm how existing markitect_tool.source.SourceAsset and NormalizedMarkdownDocument.attachments should be populated by concrete read adapters.

Output: compatibility note, adapter metadata conventions, and tests that can be run with markitect-tool on PYTHONPATH.

P3.2 - Add EPUB3 embedded resource metadata

id: MKTF-WP-0003-T002
status: done
priority: medium
state_hub_task_id: "ebcbf480-210d-46e7-a4e4-fbe7e9baa39a"

Extend the EPUB3 adapter to report package resources that are relevant to future render manifests, such as images, stylesheets, fonts, and media where safe and cheap to inspect.

Output: EPUB3 attachment metadata, provenance, diagnostics, and fixtures.
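
As an illustration of the kind of read-side discovery this task describes, the following is a minimal sketch using only the Python standard library. It is not the markitect-filter implementation: the function name `list_epub_attachments`, the dictionary field names, the diagnostic code `missing-resource`, and the media type allow-list are all assumptions made for the example. It reads the OPF manifest, records size, digest, and package path for resources that look like attachments, and emits a diagnostic for manifest items whose file is absent from the package.

```python
import hashlib
import io
import posixpath
import xml.etree.ElementTree as ET
import zipfile

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
# Hypothetical allow-list; real EPUBs also use other font media types.
ATTACHMENT_PREFIXES = ("image/", "text/css", "font/", "audio/", "video/")


def list_epub_attachments(epub_bytes):
    """Return (attachments, diagnostics) for an EPUB package held in memory."""
    attachments, diagnostics = [], []
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        # container.xml points at the package document (the OPF file).
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", CONTAINER_NS).get("full-path")
        opf_dir = posixpath.dirname(opf_path)
        package = ET.fromstring(zf.read(opf_path))
        members = set(zf.namelist())

        for item in package.findall("opf:manifest/opf:item", OPF_NS):
            media_type = item.get("media-type", "")
            if not media_type.startswith(ATTACHMENT_PREFIXES):
                continue  # content documents are not treated as attachments here
            href = item.get("href", "")
            # Manifest hrefs are relative to the OPF file's directory.
            package_path = posixpath.normpath(posixpath.join(opf_dir, href))
            if package_path not in members:
                diagnostics.append(
                    {"code": "missing-resource", "manifest_id": item.get("id"), "href": href}
                )
                continue
            data = zf.read(package_path)
            attachments.append(
                {
                    "manifest_id": item.get("id"),
                    "href": href,
                    "package_path": package_path,
                    "media_type": media_type,
                    "byte_size": len(data),
                    "digest": "sha256:" + hashlib.sha256(data).hexdigest(),
                }
            )
    return attachments, diagnostics
```

The sketch never writes or copies a resource, which keeps it on the read side of the boundary described above. It does not handle percent-encoded hrefs or encrypted resources; a real adapter would likely report those through diagnostics as well.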

P3.3 - Add PDF attachment and image-signal metadata

id: MKTF-WP-0003-T003
status: done
priority: medium
state_hub_task_id: "d8b7b820-387f-4d45-bf22-296b227f917a"

Extend the PDF adapter with read-side metadata for attachments or embedded resource signals where the current dependency profile can expose them reliably.

Do not add OCR, layout reconstruction, or renderer behavior.

Output: PDF metadata conventions, diagnostics, and tests.
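
To make "embedded resource signals" concrete, here is a rough sketch of what a dependency-free scan could look like. It is an assumption-laden illustration, not the adapter's actual parser: the function name `scan_pdf_signals`, the result keys, and the diagnostic code `pdf-image-signal-only` are hypothetical. It only counts dictionary markers visible in the raw bytes, so anything stored inside compressed object streams would go undetected.

```python
import re

# Markers for embedded file streams and image XObjects in uncompressed PDF
# dictionaries. Objects inside compressed object streams are not visible here.
EMBEDDED_FILE = re.compile(rb"/Type\s*/EmbeddedFile\b")
IMAGE_XOBJECT = re.compile(rb"/Subtype\s*/Image\b")


def scan_pdf_signals(pdf_bytes):
    """Return (signals, diagnostics) from a byte-level scan of a PDF."""
    signals = {
        "embedded_file_count": len(EMBEDDED_FILE.findall(pdf_bytes)),
        "image_signal_count": len(IMAGE_XOBJECT.findall(pdf_bytes)),
    }
    diagnostics = []
    if signals["image_signal_count"]:
        # Images are reported as signals only: no decoding, OCR, or layout work.
        diagnostics.append(
            {
                "code": "pdf-image-signal-only",
                "severity": "info",
                "count": signals["image_signal_count"],
            }
        )
    return signals, diagnostics
```

Reporting a count and a diagnostic, rather than extracting pixels, matches the constraint that this task adds no OCR, layout reconstruction, or renderer behavior.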

P3.4 - Preserve checksums and provenance

id: MKTF-WP-0003-T004
status: done
priority: high
state_hub_task_id: "ca539c01-c272-4635-8f60-86f870bbef0c"

For each attachment or source asset, preserve stable identity fields:

  • source URI or package member path
  • media type and extension
  • byte size where available
  • digest/checksum where feasible
  • page, anchor, section, or package path provenance
  • extraction diagnostics and quality notes

Output: deterministic digest/provenance tests.
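
The identity fields listed above can be pictured as a small immutable record whose serialized form depends only on the source bytes and their location. The sketch below is a hypothetical stand-in, not the real `SourceAsset` type from markitect-tool: the class `AttachmentIdentity`, its field names, and the helpers `describe_attachment` and `to_canonical_json` are assumptions chosen for illustration.

```python
import hashlib
import json
import posixpath
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class AttachmentIdentity:
    """Hypothetical record of the stable identity fields for one attachment."""

    source_path: str            # source URI or package member path
    media_type: str
    extension: str
    byte_size: Optional[int]    # None when the size is not available
    digest: Optional[str]       # None when hashing is not feasible
    provenance: str             # page, anchor, section, or package path


def describe_attachment(source_path, media_type, data, provenance):
    """Build an identity record; `data` may be None if bytes are unreadable."""
    extension = posixpath.splitext(source_path)[1].lower()
    if data is None:
        return AttachmentIdentity(source_path, media_type, extension, None, None, provenance)
    digest = "sha256:" + hashlib.sha256(data).hexdigest()
    return AttachmentIdentity(source_path, media_type, extension, len(data), digest, provenance)


def to_canonical_json(identity):
    """Serialize with sorted keys and fixed separators so output is stable."""
    return json.dumps(asdict(identity), sort_keys=True, separators=(",", ":"))
```

A determinism test in this style would build the record twice from the same bytes and compare the canonical JSON, then change one byte and confirm the digest differs. Sorting keys and fixing separators keeps fixture files byte-stable across runs.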

P3.5 - Provide handoff fixtures for render manifests

id: MKTF-WP-0003-T005
status: done
priority: medium
state_hub_task_id: "f2213a20-ce6f-4e16-9b9b-557b99f8b4d1"

Add fixtures that MKTT-WP-0021 can use to prove source attachments flow into render asset manifests without renderer execution.

Output: fixture files, README/docs update, and cross-repo validation command.

Exit Criteria

  • EPUB3 and PDF read adapters can expose read-side asset metadata when present.
  • Unsupported or skipped resources produce structured diagnostics.
  • markitect-filter remains read-only and does not implement export/render behavior.
  • markitect-tool can consume the metadata for passive render asset manifests.