marki-docx/workplans/MRKD-WP-0004-stable-corpus-architecture.md
Bernd Worsch ebc5eaee77 feat: WP-0004 T01-T04 — stable corpus, ADRs, regression test
- corpus/markidocx-docs/manifest.yaml: specs as live markidocx project (FR-1101)
- corpus/markidocx-docs/known-drift.md: documented structural drift
- workflows.py: release-regression accepts manifest path; emits corpus_id (FR-1109)
- tests/regression/test_corpus_regression.py: corpus regression suite (FR-1102–1110)
- architecture/ADR-002: python-docx as conversion engine
- architecture/ADR-003: manifest YAML schema
- workplans/MRKD-WP-0004: T01–T04 done; T05 blocked (SBOM path mapping needed)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-16 17:48:33 +00:00

id: MRKD-WP-0004
type: workplan
domain: markitect
repo: marki-docx
status: active
state_hub_workstream_id: 91d06c92-caa8-42fc-b6d4-82340f1bed4f
created: 2026-03-16
updated: 2026-03-16

MRKD-WP-0004 — Stable Documentation Corpus & Architecture Records

Fulfil FR-1101 by establishing the markidocx product documentation itself as a real, managed markidocx project. The specs (PRD, FRS, UCC) become a live round-trip corpus that the release-regression workflow runs against on every release. This workstream also writes the two deferred architecture decision records and generates the first SBOM.

Scope: FR-1101–1110 (stable corpus & self-test), ADR-002, ADR-003, SBOM
Out of scope: diagram rendering, packaging, CI/CD (addressed in WP-0005/0006)
Depends on: MRKD-WP-0001, MRKD-WP-0002, MRKD-WP-0003 (all complete)


T01 — Set up specs as real markidocx project manifest

id: MRKD-WP-0004-T01
status: done
priority: high
state_hub_task_id: f1a36613-ceaa-4786-ac39-cd3a7fd1c142

Create a manifest file that treats the markidocx product documentation as a live markidocx project. This makes the specs the stable corpus for regression testing as required by FR-1101.

  • Create corpus/markidocx-docs/manifest.yaml:
    • project.name: markidocx-docs
    • project.feature_level: level1
    • project.family: article
    • sources: PRD, FRS v0.2, UCC (relative paths into specs/)
    • output.dir: corpus/markidocx-docs/dist
  • Run markidocx validate corpus/markidocx-docs/manifest.yaml — must exit 0
  • Run markidocx build corpus/markidocx-docs/manifest.yaml — must produce valid DOCX
  • Run markidocx import + markidocx compare — must report clean or expected drift only
  • Document any structural drift in corpus/markidocx-docs/known-drift.md
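The manifest fields listed above can be pictured as the structure that loading manifest.yaml would produce. This is a hedged sketch only. The key names come from the list. The three source paths are placeholders, because the real filenames under specs/ are not given here. The authoritative checks remain markidocx validate and manifest.py.

```python
from pathlib import PurePosixPath

# Hypothetical in-memory form of corpus/markidocx-docs/manifest.yaml.
# Keys mirror the T01 list; the source filenames are placeholders, not the real ones.
CORPUS_MANIFEST = {
    "project": {
        "name": "markidocx-docs",
        "feature_level": "level1",
        "family": "article",
    },
    "sources": [
        "specs/PRD.md",       # placeholder for the PRD
        "specs/FRS-v0.2.md",  # placeholder for FRS v0.2
        "specs/UCC.md",       # placeholder for the UCC
    ],
    "output": {"dir": "corpus/markidocx-docs/dist"},
}

REQUIRED_PROJECT_KEYS = ("name", "feature_level", "family")


def manifest_problems(manifest: dict) -> list[str]:
    """Return human-readable problems; an empty list means this sketch's checks pass."""
    problems = []
    project = manifest.get("project", {})
    for key in REQUIRED_PROJECT_KEYS:
        if not project.get(key):
            problems.append(f"project.{key} is missing")
    sources = manifest.get("sources", [])
    if not sources:
        problems.append("sources is empty")
    for src in sources:
        path = PurePosixPath(src)
        # T01 asks for relative paths into specs/; reject absolute paths or paths outside specs/.
        if path.is_absolute() or "specs" not in path.parts:
            problems.append(f"source {src!r} is not a relative path into specs/")
    if not manifest.get("output", {}).get("dir"):
        problems.append("output.dir is missing")
    return problems
```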

Deliverable: markidocx build corpus/markidocx-docs/manifest.yaml succeeds; a DOCX of the product documentation exists in corpus/markidocx-docs/dist/.
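A quick package-level check of this deliverable could look like the sketch below. It relies on one fact: a .docx file is a ZIP (OPC) package that contains [Content_Types].xml and the main part word/document.xml. The helper names are hypothetical. The check does not test whether the content matches the sources, which is the job of import + compare.

```python
import zipfile
from pathlib import Path


def built_docx_files(dist_dir: Path) -> list[Path]:
    """List .docx files in the build output directory, sorted for stable output."""
    return sorted(dist_dir.glob("*.docx"))


def looks_like_docx(path: Path) -> bool:
    """Package-level sanity check: an intact ZIP holding the standard DOCX parts."""
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as zf:
        names = set(zf.namelist())
        has_parts = "[Content_Types].xml" in names and "word/document.xml" in names
        # testzip() returns None when every member's CRC checks out.
        return has_parts and zf.testzip() is None


def check_dist(dist_dir: Path) -> bool:
    """True if at least one DOCX exists and every DOCX passes the package check."""
    docs = built_docx_files(dist_dir)
    return bool(docs) and all(looks_like_docx(p) for p in docs)
```

Run check_dist(Path("corpus/markidocx-docs/dist")) after markidocx build. It returns False for an empty directory as well as for a damaged file.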


T02 — Wire release-regression workflow against specs corpus

id: MRKD-WP-0004-T02
status: done
priority: high
state_hub_task_id: f17e959f-28da-4386-9004-b5e036054b06

Connect the release-regression composite workflow to the real documentation corpus so that markidocx workflow release-regression corpus/markidocx-docs/manifest.yaml runs a full build → import → compare cycle and records evidence (FR-1102, FR-1103, FR-1106, FR-1107).

  • Update the release-regression handler in workflows.py to accept a manifest path argument, defaulting to the corpus manifest when none is supplied
  • Run the workflow; assert the evidence set contains build, import, and drift reports
  • Add tests/regression/test_corpus_regression.py:
    • Invokes release-regression on the corpus manifest
    • Asserts workflow result is full or with-fallback (not failed)
    • Asserts evidence artefacts are present and have correct traceability fields (FR-1110)
  • Disclose corpus identity in regression output (FR-1109): include the corpus manifest path and the repository's git HEAD SHA (as corpus_id) in the workflow result
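The manifest default and the corpus_id disclosure could be sketched as follows. The names resolve_manifest, git_head_sha and corpus_identity, and the corpus_manifest result key, are hypothetical and are not the actual workflows.py API. Only the corpus_id field and the default manifest path come from T02. git rev-parse HEAD is the standard command for reading the HEAD commit SHA.

```python
from __future__ import annotations

import subprocess
from pathlib import Path

# Used when release-regression is invoked without a manifest argument (T02).
DEFAULT_CORPUS_MANIFEST = Path("corpus/markidocx-docs/manifest.yaml")


def resolve_manifest(arg: str | None) -> Path:
    """Hypothetical helper: fall back to the corpus manifest when no path is given."""
    return Path(arg) if arg else DEFAULT_CORPUS_MANIFEST


def git_head_sha(repo_dir: Path) -> str:
    """Return the HEAD commit SHA of the repository containing repo_dir."""
    completed = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError outside a git checkout
    )
    return completed.stdout.strip()


def corpus_identity(manifest: Path, head_sha: str) -> dict[str, str]:
    """Fields to merge into the workflow result so the corpus is disclosed (FR-1109)."""
    return {"corpus_manifest": manifest.as_posix(), "corpus_id": head_sha}
```

A handler could call corpus_identity(manifest, git_head_sha(manifest.parent)). git rev-parse resolves the repository from any subdirectory.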

Deliverable: pytest tests/regression/test_corpus_regression.py passes; evidence written to .markidocx/evidence/ and retrievable via CLI.
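The assertions planned for tests/regression/test_corpus_regression.py could be collected in a checker over the workflow result. This is a sketch under assumptions. T02 supplies the accepted statuses (full, with-fallback) and the three evidence kinds. The result shape, a dict with status and evidence entries, is assumed. So are the traceability field names, since FR-1110 defines the real ones.

```python
ACCEPTED_STATUSES = {"full", "with-fallback"}
REQUIRED_EVIDENCE = {"build", "import", "drift"}
# Hypothetical traceability fields; FR-1110 is the authority on the real names.
TRACE_FIELDS = ("corpus_id", "manifest_path")


def regression_failures(result: dict) -> list[str]:
    """Collect every problem instead of stopping at the first, for a readable test report."""
    failures = []
    status = result.get("status")
    if status not in ACCEPTED_STATUSES:
        failures.append(f"unexpected status {status!r}")
    evidence = result.get("evidence", [])
    kinds = {item.get("kind") for item in evidence}
    for missing in sorted(REQUIRED_EVIDENCE - kinds):
        failures.append(f"missing {missing} evidence")
    for item in evidence:
        for field in TRACE_FIELDS:
            if not item.get(field):
                failures.append(f"{item.get('kind')} evidence lacks {field}")
    return failures


def test_corpus_regression_result_shape():
    # In the real suite, result would come from running release-regression on the corpus manifest.
    trace = {"corpus_id": "abc123", "manifest_path": "corpus/markidocx-docs/manifest.yaml"}
    result = {"status": "full", "evidence": [{"kind": k, **trace} for k in ("build", "import", "drift")]}
    assert regression_failures(result) == []
```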


T03 — ADR-002: python-docx as conversion engine

id: MRKD-WP-0004-T03
status: done
priority: medium
state_hub_task_id: bfe2a9fa-25b2-4b4b-b21b-eae457716ce0

Write the architecture decision record explaining the choice of python-docx as the DOCX conversion engine. This was identified as a deferred deliverable during WP-0001.

File: architecture/ADR-002-python-docx-as-conversion-engine.md

Cover:

  • Context: need to produce and consume .docx files from Python; alternatives evaluated (pandoc subprocess, docx2python, mammoth, python-docx)
  • Decision: python-docx for both build (write) and import (read)
  • Consequences: direct paragraph/run model maps cleanly to Markdown structure; no subprocess dependency; limited to the Open XML subset exposed by the python-docx API; complex Word features (track changes, SmartArt) are out of scope by design
  • Alternatives rejected: pandoc — heavier dependency, harder to control structure; mammoth — read-only; docx2python — limited write support
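The claim that python-docx's paragraph/run model maps cleanly to Markdown can be illustrated with a minimal sketch of the read direction. It handles only styles named "Heading N" and bold/italic runs, and it ignores whitespace at run edges. The function names are hypothetical and are not markidocx's import code. The attributes it reads (paragraph.style.name, paragraph.runs, run.text, run.bold, run.italic) are python-docx's. The functions accept any objects of that shape, so they can be tried without a .docx file.

```python
def run_to_markdown(run) -> str:
    """Wrap a run's text in Markdown emphasis; python-docx uses None for 'inherited'."""
    text = run.text
    if not text:
        return ""
    if run.bold and run.italic:
        return f"***{text}***"
    if run.bold:
        return f"**{text}**"
    if run.italic:
        return f"*{text}*"
    return text


def paragraph_to_markdown(paragraph) -> str:
    """Map a 'Heading N' paragraph to an ATX heading; other paragraphs become plain text."""
    body = "".join(run_to_markdown(r) for r in paragraph.runs)
    style = paragraph.style.name if paragraph.style is not None else ""
    if style.startswith("Heading "):
        level = style[len("Heading "):]
        if level.isdigit():
            return "#" * int(level) + " " + body
    return body
```

With python-docx installed, the same function can be applied to each item of Document(path).paragraphs.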

Deliverable: architecture/ADR-002-*.md present and follows ADR-001 conventions.


T04 — ADR-003: manifest YAML schema

id: MRKD-WP-0004-T04
status: done
priority: medium
state_hub_task_id: b6de6733-b332-4efc-9e23-82fce205b856

Write the architecture decision record documenting the manifest YAML schema design.

File: architecture/ADR-003-manifest-yaml-schema.md

Cover:

  • Context: need a project definition format that is human-writable, version-controlled, and parseable without a schema registry
  • Decision: YAML with a fixed top-level structure (project, sources, output, metadata); validated on load via dataclass coercion
  • Schema snapshot: include the current field definitions as a reference
  • Consequences: simple for users; no JSON Schema or Pydantic dependency; evolving the schema requires coordination with manifest.py
  • Alternatives rejected: TOML (less familiar in doc tooling), JSON (less writable), a database manifest (over-engineered for single-project use)
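Here is a minimal sketch of "validated on load via dataclass coercion". It assumes the fixed top-level keys named above (project, sources, output, metadata) and the project and output fields used by the T01 corpus manifest. The class and function names are hypothetical, and manifest.py remains the authority on the real schema. Parsing the YAML text itself is left out to keep the sketch stdlib-only, so it starts from the already-parsed mapping.

```python
from __future__ import annotations

from dataclasses import dataclass, field, fields


class ManifestError(ValueError):
    """Raised when a parsed manifest does not fit the expected structure."""


@dataclass(frozen=True)
class Project:
    name: str
    feature_level: str
    family: str


@dataclass(frozen=True)
class Output:
    dir: str


@dataclass(frozen=True)
class Manifest:
    project: Project
    sources: list[str]
    output: Output
    metadata: dict[str, str] = field(default_factory=dict)


def _check_keys(cls, data: object, where: str) -> dict:
    """Require a mapping whose keys are all fields of dataclass cls."""
    if not isinstance(data, dict):
        raise ManifestError(f"{where}: expected a mapping")
    unknown = set(data) - {f.name for f in fields(cls)}
    if unknown:
        raise ManifestError(f"{where}: unknown keys {sorted(unknown)}")
    return data


def _coerce(cls, data: object, where: str):
    """Build dataclass cls from a mapping; missing required keys surface as ManifestError."""
    checked = _check_keys(cls, data, where)
    try:
        return cls(**checked)
    except TypeError as exc:  # raised by __init__ for missing required fields
        raise ManifestError(f"{where}: {exc}") from exc


def load_manifest(data: object) -> Manifest:
    top = _check_keys(Manifest, data, "manifest")
    sources = top.get("sources")
    if not isinstance(sources, list) or not all(isinstance(s, str) for s in sources):
        raise ManifestError("sources: expected a list of paths")
    return Manifest(
        project=_coerce(Project, top.get("project"), "project"),
        sources=list(sources),
        output=_coerce(Output, top.get("output"), "output"),
        metadata=dict(top.get("metadata") or {}),
    )
```

Rejecting unknown keys, not just missing ones, is what makes the "fixed top-level structure" enforceable without a JSON Schema or Pydantic dependency.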

Deliverable: architecture/ADR-003-*.md present.


T05 — SBOM generation and state-hub registration

id: MRKD-WP-0004-T05
status: blocked
blocking_reason: ops-bridge ingest_sbom_tool cannot access /home/tegwick/ paths (the tool runs as user worsch). Configure a host_paths mapping for marki-docx, then re-run the ingest.
priority: medium
state_hub_task_id: 36aecd50-8176-4122-9706-a8697d8f5936

Generate and register the first SBOM for marki-docx so the state hub has an accurate dependency picture.

  • Run the SBOM ingest from the state-hub checkout:
      cd ~/the-custodian/state-hub
      make ingest-sbom REPO=marki-docx SCAN=1 REPO_PATH=/home/tegwick/marki-docx
  • Verify the SBOM ingestion completes without errors
  • Confirm last_sbom_at is set for marki-docx in the state hub
  • Document any licence issues or unexpected transitive dependencies
  • Add a note to CLAUDE.md reminding contributors to re-run the SBOM ingest after dependency changes

Deliverable: State hub shows last_sbom_at set for marki-docx; no unresolved licence issues.
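The blocking reason calls for a host_paths mapping. This workplan does not show the ops-bridge's real configuration format, so the sketch below illustrates only the general idea of longest-prefix path translation. The function name and the mirror path in the usage are hypothetical.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def map_host_path(path: str, host_paths: dict[str, str]) -> str:
    """Translate path using the longest matching prefix in host_paths.

    Raises ValueError for unmapped paths so that an ingest fails loudly instead of
    handing the tool a path it cannot read.
    """
    target = PurePosixPath(path)
    best: tuple[PurePosixPath, PurePosixPath] | None = None
    for src, dst in host_paths.items():
        src_path = PurePosixPath(src)
        # Match whole path components only, so /a/marki-docx-old does not match /a/marki-docx.
        if target == src_path or src_path in target.parents:
            if best is None or len(src_path.parts) > len(best[0].parts):
                best = (src_path, PurePosixPath(dst))
    if best is None:
        raise ValueError(f"no host_paths entry covers {path}")
    src_path, dst_path = best
    if target == src_path:
        return str(dst_path)
    return str(dst_path / target.relative_to(src_path))
```

For example, with the hypothetical mapping {"/home/tegwick/marki-docx": "/srv/mirror/marki-docx"}, the REPO_PATH from the make command would resolve to /srv/mirror/marki-docx.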


How to Work

  • Work through tasks in priority order: T01 → T02 (high), then T03 → T04 → T05 (medium)
  • T01 must complete before T02 (T02 depends on the corpus manifest)
  • T03 and T04 are independent writing tasks — can be done in any order or in parallel
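These ordering rules (priority first, with T02 waiting on T01) amount to a topological sort with priority tie-breaking. The sketch below uses the standard library's graphlib. The task table is transcribed from this workplan. Breaking ties between equal-priority tasks by task id is an assumption.

```python
from graphlib import TopologicalSorter

# Transcribed from this workplan: task -> priority, and task -> prerequisites.
PRIORITY = {"T01": "high", "T02": "high", "T03": "medium", "T04": "medium", "T05": "medium"}
PREREQS = {"T02": {"T01"}}  # T03 and T04 are independent; T05 has no stated prerequisite
RANK = {"high": 0, "medium": 1, "low": 2}


def work_order(priority: dict[str, str], prereqs: dict[str, set[str]]) -> list[str]:
    """Topological order that always picks the highest-priority ready task next."""
    graph = {task: prereqs.get(task, set()) for task in priority}
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    ready: list[str] = []
    order: list[str] = []
    while sorter.is_active():
        ready.extend(sorter.get_ready())
        # Tie-break: priority rank first, then task id (an assumption).
        ready.sort(key=lambda t: (RANK[priority[t]], t))
        task = ready.pop(0)
        order.append(task)
        sorter.done(task)
    return order
```

work_order(PRIORITY, PREREQS) yields T01, T02, T03, T04, T05. T02 goes ahead of the medium tasks as soon as T01 is done, because the ready pool is re-sorted after each completion.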

Updating Task Status

status: todo        →  status: in_progress   (when you start it)
status: in_progress →  status: done          (when verified complete)

When every task is done, set the frontmatter status: done.
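The two documented transitions can be encoded as a small lookup, for example to guard a script that edits task blocks. This is a hypothetical helper. The blocked status used by T05 is not covered by the two documented transitions, so the sketch rejects it unless it is added to the table.

```python
import re

# Only the transitions documented above; "blocked" would need its own entries.
ALLOWED_TRANSITIONS = {
    "todo": {"in_progress"},
    "in_progress": {"done"},
}

# A task block's status line, e.g. "status: todo" on its own line.
STATUS_LINE = re.compile(r"^status:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def can_transition(current: str, new: str) -> bool:
    return new in ALLOWED_TRANSITIONS.get(current, set())


def advance_status(block: str, new: str) -> str:
    """Rewrite the first status line of a task block, refusing undocumented transitions."""
    match = STATUS_LINE.search(block)
    if match is None:
        raise ValueError("task block has no status line")
    current = match.group(1)
    if not can_transition(current, new):
        raise ValueError(f"{current} -> {new} is not a documented transition")
    return block[: match.start(1)] + new + block[match.end(1):]
```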

Success Criteria

Before marking the workplan done:

  1. Every task block has status: done
  2. Workplan frontmatter status: done
  3. corpus/markidocx-docs/manifest.yaml present and builds cleanly
  4. pytest tests/regression/test_corpus_regression.py passes
  5. architecture/ADR-002-*.md and architecture/ADR-003-*.md present
  6. State hub shows last_sbom_at set for marki-docx
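Criteria 1, 2 and 5, plus the file-presence part of criterion 3, are mechanical enough to script. This sketch assumes the raw workplan marks each status on a line of its own (status: <value>), as the task blocks and frontmatter do. Under that assumption, the transition examples in "Updating Task Status" are not counted. Building the corpus, running pytest and querying the state hub (the rest of criteria 3, 4 and 6) are not covered. The function name is hypothetical.

```python
import re
from pathlib import Path

# Whole-line status entries only, so "status: todo  ->  status: in_progress" is ignored.
STATUS_RE = re.compile(r"^status:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def unmet_criteria(repo_root: Path, workplan_text: str) -> list[str]:
    """Return the scriptable success criteria that are not yet met."""
    unmet = []
    statuses = STATUS_RE.findall(workplan_text)
    not_done = [s for s in statuses if s != "done"]
    if not statuses or not_done:
        unmet.append(f"status lines not all done: {not_done or 'none found'}")
    if not (repo_root / "corpus/markidocx-docs/manifest.yaml").is_file():
        unmet.append("corpus manifest missing")
    for pattern in ("architecture/ADR-002-*.md", "architecture/ADR-003-*.md"):
        if not any(repo_root.glob(pattern)):
            unmet.append(f"no file matches {pattern}")
    return unmet
```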