---
id: MRKD-WP-0004
type: workplan
domain: markitect
repo: marki-docx
status: active
state_hub_workstream_id: 91d06c92-caa8-42fc-b6d4-82340f1bed4f
created: 2026-03-16
updated: 2026-03-16
---

# MRKD-WP-0004 — Stable Documentation Corpus & Architecture Records

Fulfil FR-1101 by establishing the markidocx product documentation itself as a real, managed markidocx project. The specs (PRD, FRS, UCC) become a live round-trip corpus that the `release-regression` workflow runs against on every release. This workstream also writes the two deferred architecture decision records and generates the first SBOM.

**Scope:** FR-1101–1110 (stable corpus & self-test), ADR-002, ADR-003, SBOM

**Out of scope:** diagram rendering, packaging, CI/CD — addressed in WP-0005/0006

**Depends on:** MRKD-WP-0001, MRKD-WP-0002, MRKD-WP-0003 — all complete

---

## T01 — Set up specs as real markidocx project manifest

```task
id: MRKD-WP-0004-T01
status: done
priority: high
state_hub_task_id: f1a36613-ceaa-4786-ac39-cd3a7fd1c142
```

Create a manifest file that treats the markidocx product documentation as a live markidocx project. This makes the specs the stable corpus for regression testing, as required by FR-1101.

- Create `corpus/markidocx-docs/manifest.yaml`:
  - `project.name: markidocx-docs`
  - `project.feature_level: level1`
  - `project.family: article`
  - `sources`: PRD, FRS v0.2, UCC (relative paths into `specs/`)
  - `output.dir: corpus/markidocx-docs/dist`
- Run `markidocx validate corpus/markidocx-docs/manifest.yaml` — must exit 0
- Run `markidocx build corpus/markidocx-docs/manifest.yaml` — must produce a valid DOCX
- Run `markidocx import` + `markidocx compare` — must report clean or expected drift only
- Document any structural drift in `corpus/markidocx-docs/known-drift.md`

Deliverable: `markidocx build corpus/markidocx-docs/manifest.yaml` succeeds; a DOCX of the product documentation exists in `corpus/markidocx-docs/dist/`.
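
The T01 manifest fields can be sketched as Python data, assuming the YAML has already been parsed into a plain dict (for example by PyYAML's `yaml.safe_load`) and is then coerced into dataclasses, in the spirit of the dataclass coercion that ADR-003 (T04) records. The `Project`, `Output`, `Manifest`, and `load_manifest` names and the spec file names under `specs/` are hypothetical; the real `manifest.py` may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


# Hypothetical dataclasses mirroring the T01 manifest fields; the real
# manifest.py may name or structure these differently.
@dataclass
class Project:
    name: str
    feature_level: str
    family: str


@dataclass
class Output:
    dir: str


@dataclass
class Manifest:
    project: Project
    sources: list[str]
    output: Output
    metadata: dict[str, Any] = field(default_factory=dict)


def load_manifest(raw: dict[str, Any]) -> Manifest:
    """Coerce a parsed YAML mapping into typed dataclasses.

    A missing top-level section raises KeyError; a missing or unexpected
    field inside `project` or `output` raises TypeError from the
    dataclass constructor, which doubles as basic validation.
    """
    sources = raw["sources"]
    if not isinstance(sources, list) or not sources:
        raise ValueError("sources must be a non-empty list of paths")
    return Manifest(
        project=Project(**raw["project"]),
        sources=[str(s) for s in sources],
        output=Output(**raw["output"]),
        metadata=dict(raw.get("metadata") or {}),
    )


# What a YAML loader might return for corpus/markidocx-docs/manifest.yaml.
# Spec file names are placeholders: T01 only says the paths point into specs/.
raw_manifest = {
    "project": {
        "name": "markidocx-docs",
        "feature_level": "level1",
        "family": "article",
    },
    "sources": [
        "specs/PRD.md",       # hypothetical file name
        "specs/FRS-v0.2.md",  # hypothetical file name
        "specs/UCC.md",       # hypothetical file name
    ],
    "output": {"dir": "corpus/markidocx-docs/dist"},
}

manifest = load_manifest(raw_manifest)
print(manifest.project.name, manifest.output.dir)  # markidocx-docs corpus/markidocx-docs/dist
```

Letting the dataclass constructors reject unknown or missing keys keeps validation dependency-free, which matches the "no JSON Schema or Pydantic dependency" consequence listed for ADR-003.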

---

## T02 — Wire release-regression workflow against specs corpus

```task
id: MRKD-WP-0004-T02
status: done
priority: high
state_hub_task_id: f17e959f-28da-4386-9004-b5e036054b06
```

Connect the `release-regression` composite workflow to the real documentation corpus so that `markidocx workflow release-regression corpus/markidocx-docs/manifest.yaml` runs a full build → import → compare cycle and records evidence (FR-1102, FR-1103, FR-1106, FR-1107).

- Update the `release-regression` handler in `workflows.py` to accept a manifest path argument; default to the corpus manifest when none is supplied
- Run the workflow; assert the evidence set contains build, import, and drift reports
- Add `tests/regression/test_corpus_regression.py`:
  - Invokes `release-regression` on the corpus manifest
  - Asserts the workflow result is `full` or `with-fallback` (not `failed`)
  - Asserts evidence artefacts are present and have correct traceability fields (FR-1110)
- Disclose corpus identity in regression output (FR-1109): include the corpus manifest path and its git HEAD SHA as `corpus_id` in the workflow result

Deliverable: `pytest tests/regression/test_corpus_regression.py` passes; evidence is written to `.markidocx/evidence/` and is retrievable via the CLI.

---

## T03 — ADR-002: python-docx as conversion engine

```task
id: MRKD-WP-0004-T03
status: done
priority: medium
state_hub_task_id: bfe2a9fa-25b2-4b4b-b21b-eae457716ce0
```

Write the architecture decision record explaining the choice of python-docx as the DOCX conversion engine. This was identified as a deferred deliverable during WP-0001.

File: `architecture/ADR-002-python-docx-as-conversion-engine.md`

Cover:

- **Context:** need to produce and consume .docx files from Python; alternatives evaluated (pandoc subprocess, docx2python, mammoth, python-docx)
- **Decision:** python-docx for both build (write) and import (read)
- **Consequences:** the direct paragraph/run model maps cleanly to Markdown structure; no subprocess dependency; limited to the Open XML subset exposed by the python-docx API; complex Word features (track changes, SmartArt) are out of scope by design
- **Alternatives rejected:** pandoc — heavier dependency, harder to control structure; mammoth — read-only; docx2python — extraction only, no write support

Deliverable: `architecture/ADR-002-*.md` present and follows ADR-001 conventions.

---

## T04 — ADR-003: manifest YAML schema

```task
id: MRKD-WP-0004-T04
status: done
priority: medium
state_hub_task_id: b6de6733-b332-4efc-9e23-82fce205b856
```

Write the architecture decision record documenting the manifest YAML schema design.

File: `architecture/ADR-003-manifest-yaml-schema.md`

Cover:

- **Context:** need a project definition format that is human-writable, version-controlled, and parseable without a schema registry
- **Decision:** YAML with a fixed top-level structure (`project`, `sources`, `output`, `metadata`); validated on load via dataclass coercion
- **Schema snapshot:** include the current field definitions as a reference
- **Consequences:** simple for users; no JSON Schema or Pydantic dependency; evolving the schema requires coordination with `manifest.py`
- **Alternatives rejected:** TOML (less familiar in doc tooling), JSON (less writable), a database manifest (over-engineered for single-project use)

Deliverable: `architecture/ADR-003-*.md` present.

---

## T05 — SBOM generation and state-hub registration

```task
id: MRKD-WP-0004-T05
status: blocked
blocking_reason: ops-bridge ingest_sbom_tool cannot access /home/tegwick/ paths (runs as worsch). Configure host_paths mapping for marki-docx, then re-run ingest.
priority: medium
state_hub_task_id: 36aecd50-8176-4122-9706-a8697d8f5936
```

Generate and register the first SBOM for marki-docx so the state hub has an accurate dependency picture.

```bash
cd ~/the-custodian/state-hub
make ingest-sbom REPO=marki-docx SCAN=1 REPO_PATH=/home/tegwick/marki-docx
```

- Verify the SBOM ingestion completes without errors
- Confirm `last_sbom_at` is set for `marki-docx` in the state hub
- Document any licence issues or unexpected transitive dependencies
- Add a note to `CLAUDE.md` to re-run the SBOM ingest after dependency changes

Deliverable: State hub shows `last_sbom_at` set for `marki-docx`; no unresolved licence issues.

---

## How to Work

- Work through tasks in priority order: T01 → T02 (high), then T03 → T04 → T05 (medium)
- T01 must complete before T02 (T02 depends on the corpus manifest)
- T03 and T04 are independent writing tasks and can be done in any order or in parallel

## Updating Task Status

```
status: todo → status: in_progress (when you start it)
status: in_progress → status: done (when verified complete)
```

When every task is `done`, set the frontmatter `status: done`.

## Success Criteria

Before marking the workplan done:

1. Every task block has `status: done`
2. Workplan frontmatter `status: done`
3. `corpus/markidocx-docs/manifest.yaml` present and builds cleanly
4. `pytest tests/regression/test_corpus_regression.py` passes
5. `architecture/ADR-002-*.md` and `architecture/ADR-003-*.md` present
6. State hub shows `last_sbom_at` set for `marki-docx`
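
Success criteria 1 and 2 lend themselves to a mechanical check. The following is a minimal sketch, assuming task blocks are fenced as `task` blocks with one `key: value` pair per line and the workplan opens with YAML frontmatter, as described in this workplan; `undone_tasks`, `frontmatter_status`, and `ready_to_close` are hypothetical helpers, not part of markidocx or the state hub.

```python
from __future__ import annotations

import re

FENCE = "`" * 3  # built at runtime so this example contains no literal fence

# A task block: an opening "task" fence, its body (captured), and the next
# line that starts with a fence. Non-greedy so adjacent blocks stay separate.
TASK_BLOCK = re.compile("^" + FENCE + r"task\n(.*?)^" + FENCE, re.MULTILINE | re.DOTALL)

# "key: value" lines; [ \t]* (not \s*) so a value never spills onto the next line.
FIELD = re.compile(r"^(\w+):[ \t]*(.*)$", re.MULTILINE)


def _fields(text: str) -> dict[str, str]:
    return {key: value.strip() for key, value in FIELD.findall(text)}


def undone_tasks(workplan_text: str) -> list[tuple[str, str]]:
    """Return (id, status) for every task block whose status is not 'done'."""
    pending = []
    for match in TASK_BLOCK.finditer(workplan_text):
        fields = _fields(match.group(1))
        status = fields.get("status", "missing")
        if status != "done":
            pending.append((fields.get("id", "<no id>"), status))
    return pending


def frontmatter_status(workplan_text: str) -> str | None:
    """Return `status` from the leading YAML frontmatter, or None if absent."""
    fm = re.match(r"---\n(.*?)\n---\n", workplan_text, re.DOTALL)
    return _fields(fm.group(1)).get("status") if fm else None


def ready_to_close(workplan_text: str) -> bool:
    """Success criteria 1 and 2: all tasks done and frontmatter status done."""
    return not undone_tasks(workplan_text) and frontmatter_status(workplan_text) == "done"


# Tiny sample in the same shape as a workplan.
sample = "\n".join([
    "---", "id: EXAMPLE", "status: active", "---", "",
    FENCE + "task", "id: EXAMPLE-T01", "status: done", FENCE, "",
    FENCE + "task", "id: EXAMPLE-T02", "status: blocked", FENCE, "",
])

print(undone_tasks(sample))        # [('EXAMPLE-T02', 'blocked')]
print(frontmatter_status(sample))  # active
print(ready_to_close(sample))      # False
```

A regex scan is enough here because task blocks are flat `key: value` lists; if they ever gain nested YAML, parsing each block body with a real YAML loader would be the safer choice.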