marki-docx/workplans/MRKD-WP-0007-interface-completeness-evidence.md
Bernd Worsch 893b9fa57b
chore: add WP-0007 — Interface Completeness & Evidence
Workplan covering the remaining FRS v0.2 gaps: CLI parity (inspect, test,
evidence commands), style listing stub replacement, evidence assembly
strengthening, LEVEL3 edge-case coverage, and a new Word-first round-trip
capability (template extraction + rebuild verification).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-17 16:23:59 +00:00


id: MRKD-WP-0007
type: workplan
domain: markitect
repo: marki-docx
status: active
state_hub_workstream_id: 61701224-0813-4258-9308-025bcec41780
created: 2026-03-17
updated: 2026-03-17

MRKD-WP-0007 — Interface Completeness & Evidence

Close the remaining FRS v0.2 gaps identified after WP-0001 through WP-0006. The system is ~92% complete; this workplan brings it to full FRS coverage.

Four clusters of functional gaps plus one new capability:

  1. CLI parity: inspect, test, and evidence commands exist in REST and MCP but are absent from the CLI (FR-806, FR-810, FR-1409)
  2. Style listing stub: GET /styles and MCP list_styles return []; real style metadata enumeration is needed (FR-907)
  3. Evidence assembly: individual reports exist, but the release evidence set has no unified aggregation or completeness disclosure (FR-1406–1408, FR-1413)
  4. LEVEL3 edge-case coverage: core paths are tested; targeted tests are needed for diagram source mutation, bibliography ambiguity, and the processor-dependency matrix (FR-534, FR-538, FR-542)
  5. Word-first round-trip: a new end-to-end capability that extracts content and a content-free template from an existing DOCX, then verifies that MD + template → DOCX reproduces the original document

Scope: FR-806, FR-810, FR-907, FR-1409, FR-1406–1408, FR-1413, FR-534, FR-538, FR-542, and the new template-extraction capability
Out of scope: new document families, non-DOCX output formats
Depends on: WP-0001 through WP-0006 (all complete)


T01 — Add markidocx inspect and markidocx test CLI commands

id: MRKD-WP-0007-T01
status: todo
priority: high
state_hub_task_id: f77db529-b17b-4462-a704-2b9a3dbdc892

The underlying logic for both commands already exists and is exposed via MCP (inspect_project, run_tests) and REST. This task wires them into the CLI.

markidocx inspect <manifest> (FR-806)

  • Calls the same project-inspection logic as inspect_project in MCP
  • Outputs: source files, feature level, template family, detected LEVEL3 constructs, capability disclosure (which renderers/processors are available)
  • --json flag: machine-readable output
  • Mirrors the REST GET /inspect response structure
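This is a minimal sketch of how the --json flag could keep the CLI in line with GET /inspect. It assumes the shared inspection logic returns a plain dict. The payload keys and the render_inspect helper are hypothetical, not the project's actual names. Under --json the dict is emitted unchanged, which lets the parity test compare CLI, REST and MCP output field for field.

```python
import json
from typing import Any


def render_inspect(payload: dict[str, Any], as_json: bool) -> str:
    """Render a (hypothetical) inspection payload for the CLI.

    With as_json=True the payload is serialized unchanged, so CLI output can
    be compared field for field with the REST/MCP responses.
    """
    if as_json:
        return json.dumps(payload, indent=2, sort_keys=True)
    lines = [
        f"Feature level:   {payload['feature_level']}",
        f"Template family: {payload['template_family']}",
        "Source files:",
        *(f"  - {path}" for path in payload["source_files"]),
        "LEVEL3 constructs:",
        *(f"  - {name}" for name in payload["level3_constructs"]),
        "Capabilities:",
        # Capability disclosure: which renderers/processors are available.
        *(
            f"  - {name}: {'available' if ok else 'missing'}"
            for name, ok in sorted(payload["capabilities"].items())
        ),
    ]
    return "\n".join(lines)
```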

markidocx test <manifest> (FR-810)

  • Runs the regression test suite for the project (same as MCP run_tests)
  • Outputs: pass/fail counts, skipped tests, any failures with locations
  • --json flag: machine-readable output
  • Exit code 0 on pass, 1 on any failure
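Below is a sketch of the reporting and exit-code rule. It assumes the test runner yields pass, fail and skip counts plus failure locations. RunSummary and report are hypothetical names.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RunSummary:
    """Hypothetical result shape for `markidocx test`; the real fields may differ."""
    passed: int
    failed: int
    skipped: int
    failures: list[str] = field(default_factory=list)  # e.g. "tests/test_x.py::test_y"


def report(summary: RunSummary, as_json: bool) -> tuple[str, int]:
    """Return (text to print, exit code): 0 on pass, 1 on any failure."""
    exit_code = 1 if summary.failed else 0
    if as_json:
        return json.dumps(asdict(summary)), exit_code
    lines = [f"{summary.passed} passed, {summary.failed} failed, {summary.skipped} skipped"]
    lines += [f"FAILED {location}" for location in summary.failures]
    return "\n".join(lines), exit_code
```

In a Typer command, the returned code would be passed to raise typer.Exit(code=...), so shell scripts and CI can branch on it.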

Implementation notes:

  • Add @app.command() entries in cli.py; delegate to existing logic in builder.py / level3.py / workflows
  • Update test_interface_parity.py to assert CLI/REST/MCP parity for both commands
  • Add unit tests in tests/test_cli_inspect_test.py

Deliverable: markidocx inspect <manifest> and markidocx test <manifest> work; interface parity tests pass.


T02 — Add markidocx evidence CLI command

id: MRKD-WP-0007-T02
status: todo
priority: high
state_hub_task_id: 0af8c5bb-c01b-48cf-9895-f6c8033b0606

Evidence retrieval is exposed via REST (GET /evidence/{run_id}) and MCP (get_evidence) but has no CLI surface (FR-1409, FR-814).

markidocx evidence <run_id>

  • Accepts a run_id (returned by build, import, compare, workflow)
  • Retrieves the full evidence record from the evidence store
  • Outputs: human-readable summary (validation result, warnings, drift counts, overall pass/warn/fail status)
  • --json flag: full machine-readable evidence record
  • --output <path> flag: write evidence JSON to file

markidocx evidence list (subcommand)

  • Lists run IDs available in the evidence store, newest first
  • --limit N (default 10)
  • --json flag
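This sketch shows newest-first listing with the default limit of 10. It assumes each evidence-store entry carries an ISO 8601 created_at with a UTC offset. RunRecord and list_runs are hypothetical. Parsing the timestamps, rather than sorting the raw strings, keeps the order correct when entries were written with different offsets.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class RunRecord:
    """Hypothetical index entry from the evidence store."""
    run_id: str
    created_at: str  # ISO 8601 with offset, e.g. "2026-03-17T16:23:59+00:00"
    kind: str        # "build" | "import" | "compare" | "workflow"


def list_runs(records: list[RunRecord], limit: int = 10) -> list[RunRecord]:
    """Return at most `limit` records, newest first."""
    if limit < 1:
        raise ValueError("limit must be >= 1")
    # Assumes every timestamp has an offset: mixing naive and aware
    # datetimes would raise TypeError during the comparison.
    ordered = sorted(
        records,
        key=lambda r: datetime.fromisoformat(r.created_at),
        reverse=True,
    )
    return ordered[:limit]
```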

Implementation notes:

  • Extend cli.py with an evidence group using typer.Typer()
  • Delegates to evidence.py store
  • Include the run_id in the human-readable output of the existing build, import, and compare commands so the user knows which ID to retrieve (currently run_id appears only in JSON output)
  • Update test_interface_parity.py to assert parity

Deliverable: markidocx evidence <run_id> and markidocx evidence list work and are parity-tested against REST and MCP.


T03 — Implement style listing (replace stub in REST and MCP)

id: MRKD-WP-0007-T03
status: todo
priority: medium
state_hub_task_id: e26c824c-868f-470e-bdfc-e1ae18aa7ebe

GET /styles (FR-907) and MCP list_styles both return []. The template files (.docx) already contain named paragraph and character styles; they just need to be enumerated.

Style metadata model:

from dataclasses import dataclass

@dataclass
class StyleEntry:
    name: str          # e.g. "Heading 1", "Body Text"
    style_id: str      # Word's internal ID, e.g. "Heading1"
    type: str          # "paragraph" | "character" | "table" | "numbering"
    family: str        # template family this style belongs to, e.g. "article"
    built_in: bool     # True if a Word built-in style

list_styles(family: str | None) -> list[StyleEntry] in templates.py:

  • Opens the template DOCX for the given family (or default)
  • Enumerates all styles via python-docx's document.styles
  • Returns StyleEntry list sorted by type then name
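The plan enumerates styles through python-docx's document.styles. As a dependency-free illustration of what that enumeration reads, the sketch below parses word/styles.xml directly from the DOCX package with zipfile and ElementTree. list_styles_from_docx is a hypothetical name, and StyleEntry is redefined locally so the block runs on its own.

The sketch relies on two OOXML conventions:

  • A style with no w:type attribute is a paragraph style.
  • User-defined styles carry w:customStyle="1". python-docx's style.builtin is derived from the same flag.

Word stores several built-in names in lower case (e.g. "heading 1"). python-docx's style.name maps these to their UI names ("Heading 1"), whereas this sketch returns them as stored.

```python
import io
import xml.etree.ElementTree as ET
import zipfile
from dataclasses import dataclass

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


@dataclass
class StyleEntry:
    name: str
    style_id: str
    type: str
    family: str
    built_in: bool


def list_styles_from_docx(docx_bytes: bytes, family: str) -> list[StyleEntry]:
    """Enumerate styles declared in word/styles.xml, sorted by type then name."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        root = ET.fromstring(package.read("word/styles.xml"))
    entries = []
    for style in root.iter(f"{W}style"):
        name_el = style.find(f"{W}name")
        style_id = style.get(f"{W}styleId", "")
        entries.append(
            StyleEntry(
                # Names are returned as stored, e.g. "heading 1".
                name=name_el.get(f"{W}val", style_id) if name_el is not None else style_id,
                style_id=style_id,
                type=style.get(f"{W}type", "paragraph"),  # OOXML default type
                family=family,
                # User-defined styles are flagged with an on/off customStyle value.
                built_in=style.get(f"{W}customStyle") not in ("1", "true", "on"),
            )
        )
    return sorted(entries, key=lambda e: (e.type, e.name))
```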

Wire into interfaces:

  • REST GET /styles?family=article → list[StyleEntry] as JSON
  • MCP list_styles(family=...) → same
  • CLI markidocx template styles [--family article] → tabular output (already has template_app Typer sub-app)

Tests:

  • test_templates.py: assert at least the standard heading/body styles are present for each built-in family
  • Interface parity test: REST, MCP, CLI all return the same set for the same family

Deliverable: markidocx template styles, GET /styles, list_styles() return real style data for all three built-in families.


T04 — Strengthen evidence assembly — unified status summary and composition disclosure

id: MRKD-WP-0007-T04
status: todo
priority: medium
state_hub_task_id: d9ef5925-f70f-4e97-a2d4-6932c4c531d6

Individual evidence records (validation, build, import, drift) exist, but there is no formal aggregation into a release evidence set (FR-1406–1408, FR-1413).

Release evidence set structure (EvidenceSet in evidence.py):

from __future__ import annotations  # annotations stay unevaluated, so "..." and WarningRecord need no runtime definition

from dataclasses import dataclass

@dataclass
class EvidenceSet:
    run_id: str
    created_at: str
    manifest_path: str
    components: list[str]          # which reports are present (FR-1407)
    overall_status: str            # "pass" | "pass-with-warnings" | "fail" (FR-1408)
    validation_result: ... | None
    build_result: ... | None
    import_result: ... | None
    drift_result: ... | None
    warnings: list[WarningRecord]  # aggregated across all components
    completeness_note: str | None  # which expected components are absent (FR-1413)

assemble_evidence_set(run_id: str) -> EvidenceSet:

  • Reads all component records for the run from the evidence store
  • Derives overall_status: fail if any component failed; pass-with-warnings if any warnings exist; pass otherwise
  • Sets completeness_note if expected components are absent for the workflow type (e.g. a roundtrip workflow should have build + import + drift; if drift is absent, note it)
  • Enumerates components list (FR-1407)
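Here is a sketch of the status rule and the completeness check described above. It assumes each component record reports a failure flag and its warnings. ComponentRecord, derive_status, completeness_note and the EXPECTED_COMPONENTS table are hypothetical. Only the roundtrip entry (build + import + drift) comes from this plan.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical table; only the roundtrip expectation is taken from the plan.
EXPECTED_COMPONENTS: dict[str, set[str]] = {
    "roundtrip": {"build", "import", "drift"},
}


@dataclass
class ComponentRecord:
    name: str                                         # "validation" | "build" | "import" | "drift"
    failed: bool
    warnings: list[str] = field(default_factory=list)


def derive_status(components: list[ComponentRecord]) -> str:
    """fail if any component failed; pass-with-warnings if any warnings; else pass."""
    if any(c.failed for c in components):
        return "fail"
    if any(c.warnings for c in components):
        return "pass-with-warnings"
    return "pass"


def completeness_note(workflow: str, components: list[ComponentRecord]) -> str | None:
    """Name the expected components that are absent for this workflow type."""
    present = {c.name for c in components}
    missing = sorted(EXPECTED_COMPONENTS.get(workflow, set()) - present)
    if not missing:
        return None
    return f"{workflow} workflow is missing: {', '.join(missing)}"
```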

Wire into interfaces:

  • REST GET /evidence/{run_id} → return EvidenceSet instead of raw record
  • MCP get_evidence(run_id) → same
  • CLI markidocx evidence <run_id> → display EvidenceSet summary
  • Workflow commands: assemble and persist the evidence set at workflow completion

Tests:

  • test_evidence.py: assert assemble_evidence_set returns correct overall_status for pass / pass-with-warnings / fail scenarios
  • Assert components enumeration is accurate
  • Assert completeness_note fires when a component is absent

Deliverable: All three interfaces return a coherent EvidenceSet with overall_status, components, and completeness_note. Existing evidence tests still pass.


T05 — LEVEL3 edge-case coverage

id: MRKD-WP-0007-T05
status: todo
priority: low
state_hub_task_id: 20789d1c-4495-468f-bbb7-912e63e804e4

Core LEVEL3 paths are tested; this task adds targeted tests for three undertested edge-case areas.

FR-534 — Diagram source mutation on round-trip

  • Test: build a DOCX with a mermaid block; manually alter the alt-text source marker in the DOCX (simulate editorial mutation of the embedded diagram); import → assert that differ.py classifies the change as structural (not silently dropped)
  • Test: assert that a diagram block with empty source produces a WarningRecord

FR-538 — Processor dependency version matrix

  • Test: mock shutil.which("mmdc") to return a path; mock the subprocess call so that mmdc --version reports "10.x.x" (supported) vs. "8.x.x" (too old)
  • Assert that an outdated renderer produces WarningRecord(reason="renderer-version-unsupported") rather than silently falling back
  • If version-checking is not yet in diagrams.py, add it as part of this task
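The sketch below covers the version gate and the mocking described above. It assumes mmdc --version prints a dotted version such as "10.6.1".

  • MIN_MMDC_MAJOR = 10 is an assumed threshold, since the plan only fixes 10.x.x as supported and 8.x.x as too old.
  • Unparseable output is treated as unsupported. This is a design choice, not something the plan specifies.
  • The function names are hypothetical.

```python
from __future__ import annotations

import re
import shutil
import subprocess
from unittest import mock

# Assumed threshold: the plan only says 10.x.x is supported and 8.x.x is too old.
MIN_MMDC_MAJOR = 10


def parse_major(version_output: str) -> int | None:
    """Extract the major version from output such as '10.6.1'."""
    match = re.search(r"(\d+)\.\d+", version_output)
    return int(match.group(1)) if match else None


def version_warning(version_output: str, min_major: int = MIN_MMDC_MAJOR) -> str | None:
    """Return a warning reason for an unsupported or unreadable version, else None."""
    major = parse_major(version_output)
    if major is None or major < min_major:
        return "renderer-version-unsupported"
    return None


def check_mmdc() -> str | None:
    """Probe the installed mmdc; shutil.which and subprocess.run are what tests mock."""
    path = shutil.which("mmdc")
    if path is None:
        return None  # not installed: left to the existing availability handling
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False, timeout=30
    )
    return version_warning(result.stdout)


def test_outdated_mmdc_warns() -> None:
    fake = subprocess.CompletedProcess(args=["mmdc", "--version"], returncode=0, stdout="8.14.0\n", stderr="")
    with mock.patch("shutil.which", return_value="/usr/bin/mmdc"):
        with mock.patch("subprocess.run", return_value=fake):
            assert check_mmdc() == "renderer-version-unsupported"
```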

FR-542 — Bibliography ambiguity edge cases

  • Test: document with two citations sharing the same key → assert WarningRecord
  • Test: document with a citation key that has no corresponding reference entry → assert WarningRecord(reason="citation-key-missing")
  • Test: round-trip of a references section with special characters in author names
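This is a sketch of the two bibliography checks. It assumes citations are already parsed into a list of keys and reference entries into a list of declared keys. WarningRecord here is a minimal stand-in for the project's type.

  • The reason "citation-key-duplicate" is a hypothetical name; the plan names only "citation-key-missing".
  • The sketch reads the "same key" case as two reference entries declaring one key. Citing the same source twice is normal and is not flagged.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class WarningRecord:
    """Minimal stand-in for the project's WarningRecord."""
    reason: str
    detail: str = ""


def check_citations(cited_keys: list[str], reference_keys: list[str]) -> list[WarningRecord]:
    warnings: list[WarningRecord] = []
    # Ambiguity: one key declared by more than one reference entry.
    for key, count in sorted(Counter(reference_keys).items()):
        if count > 1:
            warnings.append(WarningRecord("citation-key-duplicate", f"{key} declared {count} times"))
    # Dangling citations: cited keys with no reference entry, reported once each.
    for key in sorted(set(cited_keys) - set(reference_keys)):
        warnings.append(WarningRecord("citation-key-missing", key))
    return warnings
```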

Tests location: extend tests/test_level3_diagrams.py, tests/test_level3_bibliography.py

Deliverable: All three edge-case areas have at least two targeted tests each. Existing LEVEL3 tests still pass.


T06 — End-to-end Word-first round-trip: template extraction and rebuild verification

id: MRKD-WP-0007-T06
status: todo
priority: high
state_hub_task_id: 0c16c598-bd49-4721-89a3-e989e1d36879

This task delivers a new capability: given an existing Word document as the starting point, marki-docx can decompose it into a Markdown content file and a content-free DOCX template, and then verify that recombining the two recreates the original document.

This closes the loop on the round-trip: the existing flow is MD → DOCX → MD; this adds DOCX → (MD + template) → DOCX, making Word-authored documents first-class inputs.

New command: markidocx template extract <source.docx>

Extracts the structural and stylistic shell of source.docx — keeping all styles, page setup, headers/footers, section properties, and theme data — while removing all body content (paragraphs, tables, figures, etc.).

markidocx template extract <source.docx> \
    [--template-out <template.docx>]   # default: <source>-template.docx
    [--content-out <content.md>]       # default: <source>.md  (runs import)
    [--family <name>]                  # register extracted template under this family name
    [--json]

Outputs:

  1. <template.docx> — the content-free shell (styles preserved, body empty)
  2. <content.md> — the Markdown content extracted via the existing import path

Implementation in templates.py:

from __future__ import annotations

from pathlib import Path

def extract_template(source_path: Path, template_out: Path) -> TemplateExtractionResult:
    """
    Open source_path with python-docx. Copy all styles, page setup,
    headers/footers, and theme. Clear the document body (remove all
    paragraphs and tables). Save to template_out.
    """

TemplateExtractionResult:

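from __future__ import annotations  # WarningRecord is the project's existing type

from dataclasses import dataclass
from pathlib import Path

@dataclass
class TemplateExtractionResult:
    template_path: Path
    styles_preserved: int      # count of styles copied
    warnings: list[WarningRecord]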
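The next sketch covers the "clear the document body" step. It uses the standard library on a parsed word/document.xml so it runs without python-docx; W and clear_body are illustrative names. The body-level w:sectPr is kept because it holds the last section's page setup and its header/footer references. With python-docx, the same loop could run over document.element.body, using docx.oxml.ns.qn("w:sectPr") as the tag to keep.

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def clear_body(document_root: ET.Element) -> int:
    """Remove all block content from w:body except the final w:sectPr.

    Returns the number of removed elements (0 on an already-empty body,
    which is what makes repeated extraction a no-op).
    """
    body = document_root.find(f"{W}body")
    if body is None:
        raise ValueError("not a WordprocessingML document: no w:body")
    removed = 0
    for child in list(body):  # iterate over a copy because we mutate body
        if child.tag != f"{W}sectPr":
            body.remove(child)
            removed += 1
    return removed
```

The tests may want to cover two caveats:

  • Mid-document section breaks live in w:p/w:pPr/w:sectPr, so a multi-section source collapses to its last section's layout.
  • The loop does not touch relationships from the document part (for example, to embedded images), so media parts would likely remain in the package unless pruned separately.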

Wire into CLI:

  • template_app already exists in cli.py; add extract subcommand
  • After extraction, optionally run import on the source to produce the .md file
  • Print a summary: styles preserved, content extracted, paths written

Wire into REST and MCP:

  • REST: POST /template/extract — multipart upload of source.docx; returns TemplateExtractionResult + download URLs for template and MD
  • MCP: extract_template(source_path: str, template_out: str, content_out: str)

End-to-end regression test

Add tests/regression/test_word_first_roundtrip.py:

Fixture: tests/regression/fixtures/word_first/source.docx
  — A representative Word document with headings, body text, a table,
    an image, and a footer. Committed to the repo as a binary fixture.

Test: test_word_first_roundtrip
  1. extract_template(source.docx) → template.docx + content.md
  2. Assert template.docx has zero body paragraphs
  3. Assert template.docx preserves at least the styles present in source.docx
  4. Assert content.md is non-empty and contains the expected headings
  5. build(manifest pointing at content.md + template.docx) → rebuilt.docx
  6. import(rebuilt.docx) → reimported.md
  7. Assert reimported.md is structurally equivalent to content.md
     (use differ.py; assert zero structural drift)

Test: test_template_extraction_idempotent
  1. extract_template(source.docx) → template_a.docx
  2. extract_template(template_a.docx) → template_b.docx
  3. Assert template_b has same style set as template_a (extraction of an
     already-empty template is a no-op)
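For step 4 of test_word_first_roundtrip, and as a quick sanity check before the differ.py comparison in step 7, here is a minimal heading extractor. It assumes content.md uses ATX headings (# Title). markdown_headings is a hypothetical helper that ignores setext headings and treats any ``` or ~~~ line as a fence toggle.

```python
import re

# ATX heading: 1-6 '#', whitespace, title, optional closing '#' sequence.
ATX = re.compile(r"^(#{1,6})\s+(.+?)\s*#*\s*$")


def markdown_headings(text: str) -> list[tuple[int, str]]:
    """Return (level, title) for each ATX heading outside fenced code blocks."""
    headings: list[tuple[int, str]] = []
    in_fence = False
    for line in text.splitlines():
        if line.lstrip().startswith(("```", "~~~")):
            in_fence = not in_fence  # simplified: does not match fence lengths
            continue
        match = None if in_fence else ATX.match(line)
        if match:
            headings.append((len(match.group(1)), match.group(2)))
    return headings
```

Step 4 could then assert that the expected titles are a subset of {title for _, title in markdown_headings(content)}.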

Fixture creation:

  • Create tests/regression/fixtures/word_first/ directory
  • Programmatically generate source.docx using python-docx in a fixture-generator script (tests/regression/fixtures/word_first/generate.py) — this keeps the binary reproducible from source
  • Commit the generated source.docx as a stable binary fixture (tracked in git)

Success criteria for T06

  1. markidocx template extract source.docx produces a valid content-free template and a Markdown content file
  2. The extracted template + content can be built back into a DOCX via markidocx build
  3. The rebuilt DOCX imports cleanly with zero structural drift against the extracted content
  4. test_word_first_roundtrip passes in CI
  5. REST and MCP surfaces expose the new capability

Execution order

  • T01, T02, T03 are independent — can be worked in any order or in parallel
  • T04 depends on T02 (the evidence CLI command exposes the assembled set)
  • T05 is independent — can be worked at any time
  • T06 is independent of T01–T05 but benefits from T04 (evidence for the rebuild step)

Updating task status

status: todo        →  status: in_progress   (when you start it)
status: in_progress →  status: done          (when verified complete)

When every task is done, set the frontmatter status: done.

Success criteria

Before marking the workplan done:

  1. Every task block has status: done
  2. Workplan frontmatter status: done
  3. Full test suite passes (pytest --tb=short -q)
  4. ruff check and mypy src/ clean
  5. markidocx inspect, markidocx test, markidocx evidence, markidocx template extract all present and functional
  6. GET /styles returns real style data (not [])
  7. markidocx evidence <run_id> returns an EvidenceSet with overall_status
  8. test_word_first_roundtrip passes
  9. LEVEL3 edge-case tests added and passing