marki-docx/workplans/MRKD-WP-0004-stable-corpus-architecture.md
Bernd Worsch ebc5eaee77 feat: WP-0004 T01-T04 — stable corpus, ADRs, regression test
- corpus/markidocx-docs/manifest.yaml: specs as live markidocx project (FR-1101)
- corpus/markidocx-docs/known-drift.md: documented structural drift
- workflows.py: release-regression accepts manifest path; emits corpus_id (FR-1109)
- tests/regression/test_corpus_regression.py: corpus regression suite (FR-1102–1110)
- architecture/ADR-002: python-docx as conversion engine
- architecture/ADR-003: manifest YAML schema
- workplans/MRKD-WP-0004: T01–T04 done; T05 blocked (SBOM path mapping needed)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-16 17:48:33 +00:00

---
id: MRKD-WP-0004
type: workplan
domain: markitect
repo: marki-docx
status: active
state_hub_workstream_id: 91d06c92-caa8-42fc-b6d4-82340f1bed4f
created: 2026-03-16
updated: 2026-03-16
---
# MRKD-WP-0004 — Stable Documentation Corpus & Architecture Records
Fulfil FR-1101 by establishing the markidocx product documentation itself as a real,
managed markidocx project. The specs (PRD, FRS, UCC) become a live round-trip corpus
that the `release-regression` workflow runs against on every release. This workstream
also writes the two deferred architecture decision records and generates the first SBOM.
**Scope:** FR-1101–1110 (stable corpus & self-test), ADR-002, ADR-003, SBOM
**Out of scope:** diagram rendering, packaging, CI/CD — addressed in WP-0005/0006
**Depends on:** MRKD-WP-0001, MRKD-WP-0002, MRKD-WP-0003 — all complete
---
## T01 — Set up specs as real markidocx project manifest
```task
id: MRKD-WP-0004-T01
status: done
priority: high
state_hub_task_id: f1a36613-ceaa-4786-ac39-cd3a7fd1c142
```
Create a manifest file that treats the markidocx product documentation as a live
markidocx project. This makes the specs the stable corpus for regression testing
as required by FR-1101.
- Create `corpus/markidocx-docs/manifest.yaml`:
  - `project.name: markidocx-docs`
  - `project.feature_level: level1`
  - `project.family: article`
  - `sources`: PRD, FRS v0.2, UCC (relative paths into `specs/`)
  - `output.dir: corpus/markidocx-docs/dist`
- Run `markidocx validate corpus/markidocx-docs/manifest.yaml` — must exit 0
- Run `markidocx build corpus/markidocx-docs/manifest.yaml` — must produce valid DOCX
- Run `markidocx import` + `markidocx compare` — must report clean or expected drift only
- Document any structural drift in `corpus/markidocx-docs/known-drift.md`
Deliverable: `markidocx build corpus/markidocx-docs/manifest.yaml` succeeds; a DOCX
of the product documentation exists in `corpus/markidocx-docs/dist/`.
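
The validate and build steps above can be scripted so they run in a fixed order and stop on the first failure. The following is a minimal sketch, not part of the repository: the function names `corpus_commands` and `run_corpus_checks` are hypothetical, and only the two CLI invocations spelled out in this task are used. The `import` and `compare` steps are left out because their arguments are not given here.

```python
import subprocess
from pathlib import Path

# Manifest location named in T01.
MANIFEST = Path("corpus/markidocx-docs/manifest.yaml")


def corpus_commands(manifest: Path = MANIFEST) -> list[list[str]]:
    """Return the validate and build invocations from T01, in order."""
    path = manifest.as_posix()
    return [
        ["markidocx", "validate", path],
        ["markidocx", "build", path],
    ]


def run_corpus_checks(manifest: Path = MANIFEST) -> None:
    """Run each command in turn.

    check=True makes subprocess.run raise CalledProcessError on a
    non-zero exit code, so a failed validate prevents the build.
    """
    for cmd in corpus_commands(manifest):
        subprocess.run(cmd, check=True)
```

Calling `run_corpus_checks()` from the repository root would then cover the "must exit 0" requirement for both commands, assuming the `markidocx` executable is on `PATH`.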
---
## T02 — Wire release-regression workflow against specs corpus
```task
id: MRKD-WP-0004-T02
status: done
priority: high
state_hub_task_id: f17e959f-28da-4386-9004-b5e036054b06
```
Connect the `release-regression` composite workflow to the real documentation corpus
so that `markidocx workflow release-regression corpus/markidocx-docs/manifest.yaml`
runs a full build → import → compare cycle and records evidence (FR-1102, FR-1103,
FR-1106, FR-1107).
- Update the `release-regression` handler in `workflows.py` to accept a manifest path argument,
  defaulting to the corpus manifest when none is supplied
- Run the workflow; assert the evidence set contains build, import, and drift reports
- Add `tests/regression/test_corpus_regression.py`:
  - Invokes `release-regression` on the corpus manifest
  - Asserts workflow result is `full` or `with-fallback` (not `failed`)
  - Asserts evidence artefacts are present and have correct traceability fields (FR-1110)
- Disclose corpus identity in regression output (FR-1109): include corpus manifest path
and its git HEAD SHA as `corpus_id` in the workflow result
Deliverable: `pytest tests/regression/test_corpus_regression.py` passes; evidence
written to `.markidocx/evidence/` and retrievable via CLI.
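
The assertions listed for the regression test can be sketched as a small result checker. This is an illustration under stated assumptions, not the code in `tests/regression/test_corpus_regression.py`: the result is assumed to be a plain dict with hypothetical keys `status`, `corpus_id`, and `evidence`, and the `corpus_id` shape (`manifest` plus `git_sha`) is likewise an assumption. The traceability fields required by FR-1110 are not modelled because they are not specified here.

```python
import subprocess
from pathlib import Path

# Result values that T02 treats as a pass.
ACCEPTED_STATUSES = {"full", "with-fallback"}
# Report kinds the evidence set is expected to contain.
REQUIRED_REPORTS = ("build", "import", "drift")


def git_head_sha(repo_dir: Path) -> str:
    """Return the HEAD commit SHA of the repository at repo_dir."""
    out = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()


def make_corpus_id(manifest_path: str, head_sha: str) -> dict:
    """Hypothetical corpus_id shape: manifest path plus git HEAD SHA."""
    return {"manifest": manifest_path, "git_sha": head_sha}


def check_result(result: dict) -> list[str]:
    """Collect problems with a workflow result; an empty list means it passes."""
    problems = []
    if result.get("status") not in ACCEPTED_STATUSES:
        problems.append(f"unexpected status: {result.get('status')!r}")
    corpus_id = result.get("corpus_id") or {}
    for key in ("manifest", "git_sha"):
        if not corpus_id.get(key):
            problems.append(f"corpus_id missing {key}")
    reports = set(result.get("evidence", []))
    for kind in REQUIRED_REPORTS:
        if kind not in reports:
            problems.append(f"missing {kind} report")
    return problems
```

A pytest test could then reduce to `assert check_result(result) == []`, which reports every missing piece at once instead of stopping at the first failed assertion.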
---
## T03 — ADR-002: python-docx as conversion engine
```task
id: MRKD-WP-0004-T03
status: done
priority: medium
state_hub_task_id: bfe2a9fa-25b2-4b4b-b21b-eae457716ce0
```
Write the architecture decision record explaining the choice of python-docx as the
DOCX conversion engine. This was identified as a deferred deliverable during WP-0001.
File: `architecture/ADR-002-python-docx-as-conversion-engine.md`
Cover:
- **Context:** need to produce and consume .docx files from Python; alternatives evaluated
(pandoc subprocess, docx2python, mammoth, python-docx)
- **Decision:** python-docx for both build (write) and import (read)
- **Consequences:** direct paragraph/run model maps cleanly to Markdown structure;
no subprocess dependency; limited to Open XML subset exposed by python-docx API;
complex Word features (track changes, SmartArt) are out of scope by design
- **Alternatives rejected:** pandoc — heavier dependency, harder to control structure;
mammoth — read-only; docx2python — limited write support
Deliverable: `architecture/ADR-002-*.md` present and follows ADR-001 conventions.
---
## T04 — ADR-003: manifest YAML schema
```task
id: MRKD-WP-0004-T04
status: done
priority: medium
state_hub_task_id: b6de6733-b332-4efc-9e23-82fce205b856
```
Write the architecture decision record documenting the manifest YAML schema design.
File: `architecture/ADR-003-manifest-yaml-schema.md`
Cover:
- **Context:** need a project definition format that is human-writable, version-controlled,
and parseable without a schema registry
- **Decision:** YAML with a fixed top-level structure (`project`, `sources`, `output`,
`metadata`); validated on load via dataclass coercion
- **Schema snapshot:** include the current field definitions as a reference
- **Consequences:** simple for users; no JSON Schema or Pydantic dependency; evolving
the schema requires coordination with manifest.py
- **Alternatives rejected:** TOML (less familiar in doc tooling), JSON (less writable),
a database manifest (over-engineered for single-project use)
Deliverable: `architecture/ADR-003-*.md` present.
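
To make "validated on load via dataclass coercion" concrete, here is a minimal sketch using only the standard library. It is not the code in `manifest.py`: the class names, the `coerce_manifest` helper, and the choice to treat `metadata` as optional are assumptions, and the field names are taken from the manifest keys listed in T01.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Project:
    name: str
    feature_level: str
    family: str


@dataclass
class Output:
    dir: str


@dataclass
class Manifest:
    project: Project
    sources: list[str]
    output: Output
    # Assumption: metadata may be omitted and defaults to an empty mapping.
    metadata: dict[str, Any] = field(default_factory=dict)


TOP_LEVEL_KEYS = {"project", "sources", "output", "metadata"}


def coerce_manifest(raw: dict[str, Any]) -> Manifest:
    """Coerce a parsed YAML mapping into a Manifest, or raise ValueError."""
    unknown = set(raw) - TOP_LEVEL_KEYS
    if unknown:
        raise ValueError(f"unknown top-level keys: {sorted(unknown)}")
    try:
        return Manifest(
            # Missing or unexpected fields surface as TypeError from the
            # dataclass constructors; missing sections as KeyError.
            project=Project(**raw["project"]),
            sources=[str(s) for s in raw["sources"]],
            output=Output(**raw["output"]),
            metadata=dict(raw.get("metadata") or {}),
        )
    except (KeyError, TypeError) as exc:
        raise ValueError(f"invalid manifest: {exc}") from exc
```

The trade-off noted under Consequences shows up directly: no schema library is needed, but every new manifest field means editing the dataclasses by hand.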
---
## T05 — SBOM generation and state-hub registration
```task
id: MRKD-WP-0004-T05
status: blocked
blocking_reason: ops-bridge ingest_sbom_tool cannot access /home/tegwick/ paths (runs as worsch). Configure host_paths mapping for marki-docx, then re-run ingest.
priority: medium
state_hub_task_id: 36aecd50-8176-4122-9706-a8697d8f5936
```
Generate and register the first SBOM for marki-docx so the state hub has an accurate
dependency picture.
```bash
cd ~/the-custodian/state-hub
make ingest-sbom REPO=marki-docx SCAN=1 REPO_PATH=/home/tegwick/marki-docx
```
- Verify the SBOM ingestion completes without errors
- Confirm `last_sbom_at` is set for `marki-docx` in the state hub
- Document any licence issues or unexpected transitive dependencies
- Add a note to CLAUDE.md as a reminder to re-run SBOM ingestion after dependency changes
Deliverable: State hub shows `last_sbom_at` set for `marki-docx`; no unresolved
licence issues.
---
## How to Work
- Work through tasks in priority order: T01 → T02 (high), then T03 → T04 → T05 (medium)
- T01 must complete before T02 (T02 depends on the corpus manifest)
- T03 and T04 are independent writing tasks — can be done in any order or in parallel
## Updating Task Status
```
status: todo → status: in_progress (when you start it)
status: in_progress → status: done (when verified complete)
```
When every task is `done`, set the frontmatter `status: done`.
## Success Criteria
Before marking the workplan done:
1. Every task block has `status: done`
2. Workplan frontmatter has `status: done`
3. `corpus/markidocx-docs/manifest.yaml` present and builds cleanly
4. `pytest tests/regression/test_corpus_regression.py` passes
5. `architecture/ADR-002-*.md` and `architecture/ADR-003-*.md` present
6. State hub shows `last_sbom_at` set for `marki-docx`
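
The first criterion can be checked mechanically by reading the `task` blocks out of the workplan text. The sketch below is one possible way to do that, assuming each block is a fenced `task` section of simple `key: value` lines as in this file; the function names are hypothetical and no such script is known to exist in the repository.

```python
import re

# Built at runtime so the fence marker never appears literally in this sketch.
FENCE = "`" * 3
TASK_BLOCK = re.compile(
    re.escape(FENCE) + r"task\n(.*?)" + re.escape(FENCE), re.DOTALL
)
# One "key: value" pair per line; the value may itself contain colons.
FIELD = re.compile(r"^(\w+):[ \t]*(.*)$", re.MULTILINE)


def task_statuses(workplan_text: str) -> dict[str, str]:
    """Map each task id to its status, in document order."""
    statuses = {}
    for block in TASK_BLOCK.findall(workplan_text):
        fields = {key: value.strip() for key, value in FIELD.findall(block)}
        if "id" in fields:
            statuses[fields["id"]] = fields.get("status", "")
    return statuses


def unfinished(workplan_text: str) -> list[str]:
    """Return the ids of tasks whose status is anything other than done."""
    return sorted(
        task_id
        for task_id, status in task_statuses(workplan_text).items()
        if status != "done"
    )
```

Run against this workplan as it stands, `unfinished` would be expected to list only `MRKD-WP-0004-T05`, since that is the one task block not marked `done`.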