marki-docx/workplans/MRKD-WP-0007-interface-completeness-evidence.md

---
id: MRKD-WP-0007
type: workplan
domain: communication
repo: marki-docx
status: done
state_hub_workstream_id: 61701224-0813-4258-9308-025bcec41780
created: 2026-03-17
updated: 2026-03-17
completed: 2026-03-17
---

# MRKD-WP-0007 — Interface Completeness & Evidence

Close the remaining FRS v0.2 gaps identified after WP-0001 through WP-0006.
The system is ~92% complete; this workplan brings it to full FRS coverage.

Four clusters of functional gaps plus one new capability:

1. **CLI parity** — `inspect`, `test`, and `evidence` commands exist in REST and MCP
   but are absent from the CLI (FR-806, FR-810, FR-1409)
2. **Style listing stub** — `GET /styles` and MCP `list_styles` return `[]`; real
   style metadata enumeration is needed (FR-907)
3. **Evidence assembly** — individual reports exist but the release evidence set
   has no unified aggregation or completeness disclosure (FR-1406–1408, FR-1413)
4. **LEVEL3 edge-case coverage** — core paths are tested; targeted tests needed for
   diagram source mutation, bibliography ambiguity, and processor-dependency matrix
   (FR-534, FR-538, FR-542)
5. **Word-first round-trip** — new end-to-end capability: extract content and a
   content-free template from an existing DOCX, then verify that MD + template → DOCX
   reproduces the original document

**Scope:** FR-806, FR-810, FR-907, FR-1409, FR-1406–1408, FR-1413,
FR-534, FR-538, FR-542, new template-extraction capability
**Out of scope:** new document families, non-DOCX output formats
**Depends on:** WP-0001 through WP-0006 — all complete

---

## T01 — Add `markidocx inspect` and `markidocx test` CLI commands

```task
id: MRKD-WP-0007-T01
status: done
priority: high
state_hub_task_id: f77db529-b17b-4462-a704-2b9a3dbdc892
```

The underlying logic for both commands already exists and is exposed via MCP
(`inspect_project`, `run_tests`) and REST. This task wires them into the CLI.

**`markidocx inspect <manifest>`** (FR-806)
- Calls the same project-inspection logic as `inspect_project` in MCP
- Outputs: source files, feature level, template family, detected LEVEL3 constructs,
  capability disclosure (which renderers/processors are available)
- `--json` flag: machine-readable output
- Mirrors the REST `GET /inspect` response structure

**`markidocx test <manifest>`** (FR-810)
- Runs the regression test suite for the project (same as MCP `run_tests`)
- Outputs: pass/fail counts, skipped tests, any failures with locations
- `--json` flag: machine-readable output
- Exit code 0 on pass, 1 on any failure
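
The `--json` and exit-code behaviour above can be sketched independently of the CLI framework. This is a minimal illustration only: `RunSummary`, `render`, and `exit_code` are hypothetical names, not the project's actual API, and the failure format is assumed.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RunSummary:  # hypothetical shape of a test-run result
    passed: int
    failed: int
    skipped: int
    failures: list[str] = field(default_factory=list)  # assumed "location: message" strings


def render(summary: RunSummary, as_json: bool) -> str:
    """Return JSON for --json, otherwise a short human-readable report."""
    if as_json:
        return json.dumps(asdict(summary), sort_keys=True)
    lines = [f"{summary.passed} passed, {summary.failed} failed, {summary.skipped} skipped"]
    lines.extend(f"FAIL {item}" for item in summary.failures)
    return "\n".join(lines)


def exit_code(summary: RunSummary) -> int:
    """0 when nothing failed, 1 on any failure."""
    return 1 if summary.failed else 0
```

Keeping rendering and exit-code logic in plain functions like these would let the CLI, REST, and MCP layers share one result object, which is what the parity tests compare.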

Implementation notes:
- Add `@app.command()` entries in `cli.py`; delegate to existing logic in
  `builder.py` / `level3.py` / workflows
- Update `test_interface_parity.py` to assert CLI/REST/MCP parity for both commands
- Add unit tests in `tests/test_cli_inspect_test.py`

Deliverable: `markidocx inspect <manifest>` and `markidocx test <manifest>` work;
interface parity tests pass.

---

## T02 — Add `markidocx evidence` CLI command

```task
id: MRKD-WP-0007-T02
status: done
priority: high
state_hub_task_id: 0af8c5bb-c01b-48cf-9895-f6c8033b0606
```

Evidence retrieval is exposed via REST (`GET /evidence/{run_id}`) and MCP
(`get_evidence`) but has no CLI surface (FR-1409, FR-814).

**`markidocx evidence <run_id>`**
- Accepts a `run_id` (returned by `build`, `import`, `compare`, `workflow`)
- Retrieves the full evidence record from the evidence store
- Outputs: human-readable summary (validation result, warnings, drift counts,
  overall pass/warn/fail status)
- `--json` flag: full machine-readable evidence record
- `--output <path>` flag: write evidence JSON to file

**`markidocx evidence list`** (subcommand)
- Lists run IDs available in the evidence store, newest first
- `--limit N` (default 10)
- `--json` flag
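
The "newest first, limited to N" behaviour of `evidence list` could look roughly like the sketch below. `RunRecord` and `list_runs` are hypothetical names, and the sketch assumes every stored run carries an ISO 8601 `created_at` string in one consistent format, so that text ordering matches chronological ordering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunRecord:  # hypothetical minimal view of a stored run
    run_id: str
    created_at: str  # ISO 8601 timestamp, same format and offset for all records


def list_runs(records: list[RunRecord], limit: int = 10) -> list[str]:
    """Return up to `limit` run IDs, newest first."""
    ordered = sorted(records, key=lambda record: record.created_at, reverse=True)
    return [record.run_id for record in ordered[:limit]]
```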

Implementation notes:
- Extend `cli.py` with an `evidence` group using `typer.Typer()`
- Delegates to `evidence.py` store
- Add `--run-id` output to existing `build`, `import`, `compare` commands so the
  user knows what ID to retrieve (currently run_id is only in JSON output)
- Update `test_interface_parity.py` to assert parity

Deliverable: `markidocx evidence <run_id>` and `markidocx evidence list` work and
are parity-tested against REST and MCP.

---

## T03 — Implement style listing (replace stub in REST and MCP)

```task
id: MRKD-WP-0007-T03
status: done
priority: medium
state_hub_task_id: e26c824c-868f-470e-bdfc-e1ae18aa7ebe
```

`GET /styles` (FR-907) and MCP `list_styles` both return `[]`. The template
files (`.docx`) already contain named paragraph and character styles; they just
need to be enumerated.

**Style metadata model:**
```python
from dataclasses import dataclass


@dataclass
class StyleEntry:
    name: str          # e.g. "Heading 1", "Body Text"
    style_id: str      # Word's internal ID, e.g. "Heading1"
    type: str          # "paragraph" | "character" | "table" | "numbering"
    family: str        # template family this style belongs to, e.g. "article"
    built_in: bool     # True if a Word built-in style

**`list_styles(family: str | None) -> list[StyleEntry]`** in `templates.py`:
- Opens the template DOCX for the given family (or default)
- Enumerates all styles via `python-docx`'s `document.styles`
- Returns `StyleEntry` list sorted by type then name
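
The "sorted by type then name" ordering can be illustrated without opening a DOCX. The sketch below assumes alphabetical ordering of the type strings and case-insensitive ordering of names; neither detail is specified in this plan, and reading the styles from the template is left out.

```python
from dataclasses import dataclass


@dataclass
class StyleEntry:
    name: str
    style_id: str
    type: str
    family: str
    built_in: bool


def sort_styles(entries: list[StyleEntry]) -> list[StyleEntry]:
    """Order by type, then by name (case-insensitive; an assumption)."""
    return sorted(entries, key=lambda entry: (entry.type, entry.name.casefold()))
```

A deterministic order matters mostly for the parity test: REST, MCP, and CLI output can then be compared as lists rather than as sets.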

**Wire into interfaces:**
- REST `GET /styles?family=article` → `list[StyleEntry]` as JSON
- MCP `list_styles(family=...)` → same
- CLI `markidocx template styles [--family article]` → tabular output (already has
  `template_app` Typer sub-app)

**Tests:**
- `test_templates.py`: assert at least the standard heading/body styles are present
  for each built-in family
- Interface parity test: REST, MCP, CLI all return the same set for the same family

Deliverable: `markidocx template styles`, `GET /styles`, `list_styles()` return real
style data for all three built-in families.

---

## T04 — Strengthen evidence assembly — unified status summary and composition disclosure

```task
id: MRKD-WP-0007-T04
status: done
priority: medium
state_hub_task_id: d9ef5925-f70f-4e97-a2d4-6932c4c531d6
```

Individual evidence records (validation, build, import, drift) exist but there is no
formal aggregation into a release evidence set (FR-1406–1408, FR-1413).

**Release evidence set structure** (`EvidenceSet` in `evidence.py`):
```python
from __future__ import annotations  # keeps the elided `...` annotations unevaluated

from dataclasses import dataclass


@dataclass
class EvidenceSet:
    run_id: str
    created_at: str
    manifest_path: str
    components: list[str]          # which reports are present (FR-1407)
    overall_status: str            # "pass" | "pass-with-warnings" | "fail" (FR-1408)
    validation_result: ... | None
    build_result: ... | None
    import_result: ... | None
    drift_result: ... | None
    warnings: list[WarningRecord]  # aggregated across all components
    completeness_note: str | None  # which expected components are absent (FR-1413)
```

**`assemble_evidence_set(run_id: str) -> EvidenceSet`**:
- Reads all component records for the run from the evidence store
- Derives `overall_status`: `fail` if any component failed; `pass-with-warnings` if
  any warnings exist; `pass` otherwise
- Sets `completeness_note` if expected components are absent for the workflow type
  (e.g. a roundtrip workflow should have build + import + drift; if drift is absent,
  note it)
- Enumerates `components` list (FR-1407)
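
The status derivation and completeness check described above can be sketched as two small pure functions. The function names, the per-component status strings, and the `EXPECTED_COMPONENTS` table are assumptions for illustration; only the roundtrip expectation (build + import + drift) comes from this plan.

```python
from __future__ import annotations

# Hypothetical mapping from workflow type to expected components.
EXPECTED_COMPONENTS: dict[str, tuple[str, ...]] = {
    "roundtrip": ("build", "import", "drift"),
}


def derive_overall_status(component_statuses: dict[str, str], warning_count: int) -> str:
    """fail if any component failed; pass-with-warnings if any warnings; else pass."""
    if any(status == "fail" for status in component_statuses.values()):
        return "fail"
    if warning_count > 0:
        return "pass-with-warnings"
    return "pass"


def completeness_note(workflow: str, present: list[str]) -> str | None:
    """Name expected components that are absent, or return None when complete."""
    expected = EXPECTED_COMPONENTS.get(workflow, ())
    missing = [name for name in expected if name not in present]
    if not missing:
        return None
    return "missing expected components: " + ", ".join(missing)
```

Note that a failure takes precedence over warnings, so a run with both a failed component and warnings reports `fail`.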

**Wire into interfaces:**
- REST `GET /evidence/{run_id}` → return `EvidenceSet` instead of raw record
- MCP `get_evidence(run_id)` → same
- CLI `markidocx evidence <run_id>` → display `EvidenceSet` summary
- Workflow commands: assemble and persist the evidence set at workflow completion

**Tests:**
- `test_evidence.py`: assert `assemble_evidence_set` returns correct `overall_status`
  for pass / pass-with-warnings / fail scenarios
- Assert `components` enumeration is accurate
- Assert `completeness_note` fires when a component is absent

Deliverable: All three interfaces return a coherent `EvidenceSet` with `overall_status`,
`components`, and `completeness_note`. Existing evidence tests still pass.

---

## T05 — LEVEL3 edge-case coverage

```task
id: MRKD-WP-0007-T05
status: done
priority: low
state_hub_task_id: 20789d1c-4495-468f-bbb7-912e63e804e4
```

Core LEVEL3 paths are tested; this task adds targeted tests for three
undertested edge-case areas.

**FR-534 — Diagram source mutation on round-trip**
- Test: build a DOCX with a mermaid block; manually alter the alt-text source marker
  in the DOCX (simulate editorial mutation of the embedded diagram); import → assert
  that `differ.py` classifies the change as `structural` (not silently dropped)
- Test: assert that a diagram block with empty source produces a `WarningRecord`

**FR-538 — Processor dependency version matrix**
- Test: mock `shutil.which("mmdc")` to return a path, and mock the `mmdc --version`
  subprocess call to return `"10.x.x"` (supported) or `"8.x.x"` (too old)
- Assert that an outdated renderer produces `WarningRecord(reason="renderer-version-unsupported")`
  rather than silently falling back
- If version-check logic is not yet present in the renderer backends in `diagrams.py`,
  add it as part of this task
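
A version check and its mocked test might be shaped like the sketch below. The function names, the `MIN_MMDC_MAJOR` threshold, and the `renderer-not-found` reason are assumptions; the plan only states that 10.x is supported and 8.x is too old. The check returns a reason string rather than a `WarningRecord` to stay self-contained.

```python
from __future__ import annotations

import re
import shutil
import subprocess
from unittest import mock

MIN_MMDC_MAJOR = 10  # assumed threshold, not confirmed by the plan


def mmdc_version_warning() -> str | None:
    """Return a warning reason if mmdc is missing or too old, else None."""
    path = shutil.which("mmdc")
    if path is None:
        return "renderer-not-found"  # hypothetical reason string
    result = subprocess.run([path, "--version"], capture_output=True, text=True, check=False)
    match = re.search(r"(\d+)\.", result.stdout)
    if match is None or int(match.group(1)) < MIN_MMDC_MAJOR:
        return "renderer-version-unsupported"
    return None


def check_with_fake_version(version: str) -> str | None:
    """Run the check with shutil.which and subprocess.run mocked out."""
    fake = subprocess.CompletedProcess(
        args=["mmdc", "--version"], returncode=0, stdout=version + "\n"
    )
    with mock.patch("shutil.which", return_value="/usr/bin/mmdc"), \
         mock.patch("subprocess.run", return_value=fake):
        return mmdc_version_warning()
```

Unparseable version output is treated as unsupported here, on the reasoning that a warning is safer than a silent fallback; the real backend may choose differently.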

**FR-542 — Bibliography ambiguity edge cases**
- Test: document with two citations sharing the same key → assert `WarningRecord`
- Test: document with a citation key that has no corresponding reference entry →
  assert `WarningRecord(reason="citation-key-missing")`
- Test: round-trip of a references section with special characters in author names
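
The first two bibliography checks could be approached as in the sketch below. It reads "two citations sharing the same key" as two reference entries defining the same key, which is an interpretation rather than something this plan states. `citation_warnings` and the `citation-key-duplicate` reason are hypothetical; `citation-key-missing` is the reason named above.

```python
from collections import Counter


def citation_warnings(cited_keys: list[str], reference_keys: list[str]) -> list[tuple[str, str]]:
    """Return (reason, key) pairs for duplicate definitions and missing references."""
    warnings: list[tuple[str, str]] = []
    # A key defined more than once in the reference list is ambiguous.
    for key, count in Counter(reference_keys).items():
        if count > 1:
            warnings.append(("citation-key-duplicate", key))  # hypothetical reason
    defined = set(reference_keys)
    # dict.fromkeys keeps first-seen order and reports each cited key once.
    for key in dict.fromkeys(cited_keys):
        if key not in defined:
            warnings.append(("citation-key-missing", key))
    return warnings
```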

**Tests location:** extend `tests/test_level3_diagrams.py`, `tests/test_level3_bibliography.py`

Deliverable: All three edge-case areas have at least two targeted tests each.
Existing LEVEL3 tests still pass.

---

## T06 — End-to-end Word-first round-trip: template extraction and rebuild verification

```task
id: MRKD-WP-0007-T06
status: done
priority: high
state_hub_task_id: 0c16c598-bd49-4721-89a3-e989e1d36879
```

This task delivers a new capability: given an existing Word document as the starting
point, marki-docx can decompose it into a Markdown content file and a content-free
DOCX template, and then verify that recombining the two recreates the original document.

This closes the loop on the round-trip: the existing flow is MD → DOCX → MD; this
adds DOCX → (MD + template) → DOCX, making Word-authored documents first-class inputs.

### New command: `markidocx template extract <source.docx>`

Extracts the structural and stylistic shell of `source.docx` — keeping all styles,
page setup, headers/footers, section properties, and theme data — while removing all
body content (paragraphs, tables, figures, etc.).

```
markidocx template extract <source.docx> \
    [--template-out <template.docx>]   # default: <source>-template.docx
    [--content-out <content.md>]       # default: <source>.md  (runs import)
    [--family <name>]                  # register extracted template under this family name
    [--json]
```

**Outputs:**
1. `<template.docx>` — the content-free shell (styles preserved, body empty)
2. `<content.md>` — the Markdown content extracted via the existing `import` path
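
The default output paths from the usage synopsis can be derived with `pathlib`. This sketch assumes `<source>` means the source file's stem and that both defaults are written next to the source file; `default_outputs` is a hypothetical helper name.

```python
from pathlib import Path


def default_outputs(source: Path) -> tuple[Path, Path]:
    """Return (template_out, content_out) defaults for a source DOCX."""
    template_out = source.with_name(f"{source.stem}-template.docx")
    content_out = source.with_suffix(".md")
    return template_out, content_out
```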

**Implementation in `templates.py`:**
```python
def extract_template(source_path: Path, template_out: Path) -> TemplateExtractionResult:
    """
    Open source_path with python-docx. Copy all styles, page setup,
    headers/footers, and theme. Clear the document body (remove all
    paragraphs and tables). Save to template_out.
    """
```

`TemplateExtractionResult`:
```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class TemplateExtractionResult:
    template_path: Path
    styles_preserved: int      # count of styles copied
    warnings: list[WarningRecord]
```

**Wire into CLI:**
- `template_app` already exists in `cli.py`; add `extract` subcommand
- After extraction, optionally run `import` on the source to produce the `.md` file
- Print a summary: styles preserved, content extracted, paths written

**Wire into REST and MCP:**
- REST: `POST /template/extract` — multipart upload of `source.docx`; returns
  `TemplateExtractionResult` + download URLs for template and MD
- MCP: `extract_template(source_path: str, template_out: str, content_out: str)`

### End-to-end regression test

Add `tests/regression/test_word_first_roundtrip.py`:

```
Fixture: tests/regression/fixtures/word_first/source.docx
  — A representative Word document with headings, body text, a table,
    an image, and a footer. Committed to the repo as a binary fixture.

Test: test_word_first_roundtrip
  1. extract_template(source.docx) → template.docx + content.md
  2. Assert template.docx has zero body paragraphs
  3. Assert template.docx preserves at least the styles present in source.docx
  4. Assert content.md is non-empty and contains the expected headings
  5. build(manifest pointing at content.md + template.docx) → rebuilt.docx
  6. import(rebuilt.docx) → reimported.md
  7. Assert reimported.md is structurally equivalent to content.md
     (use differ.py; assert zero structural drift)

Test: test_template_extraction_idempotent
  1. extract_template(source.docx) → template_a.docx
  2. extract_template(template_a.docx) → template_b.docx
  3. Assert template_b has same style set as template_a (extraction of an
     already-empty template is a no-op)
```
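
Step 2 of `test_word_first_roundtrip` (zero body paragraphs) can be checked without any DOCX library, since a DOCX file is a ZIP archive whose main part is `word/document.xml`. The sketch below uses `zipfile` and `ElementTree` instead of `python-docx` purely to stay dependency-free; `body_block_count` and `minimal_docx` are hypothetical helpers, and the in-memory archive is enough for this check but is not a complete DOCX.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def body_block_count(docx) -> int:
    """Count top-level body elements other than the section properties."""
    with zipfile.ZipFile(docx) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    body = root.find(f"{{{W_NS}}}body")
    if body is None:
        return 0
    return sum(1 for child in body if child.tag != f"{{{W_NS}}}sectPr")


def minimal_docx(body_xml: str) -> io.BytesIO:
    """Build an in-memory archive holding only word/document.xml."""
    document = f'<w:document xmlns:w="{W_NS}"><w:body>{body_xml}</w:body></w:document>'
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", document)
    buffer.seek(0)
    return buffer
```

Counting every non-`sectPr` child, rather than paragraphs alone, also catches tables left behind in the body. A real extractor may leave one empty paragraph in place, in which case the assertion would need to allow for it.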

**Fixture creation:**
- Create `tests/regression/fixtures/word_first/` directory
- Programmatically generate `source.docx` using `python-docx` in a fixture-generator
  script (`tests/regression/fixtures/word_first/generate.py`) — this keeps the binary
  reproducible from source
- Commit the generated `source.docx` as a stable binary fixture (tracked in git)

### Success criteria for T06

1. `markidocx template extract source.docx` produces a valid content-free template
   and a Markdown content file
2. The extracted template + content can be built back into a DOCX via `markidocx build`
3. The rebuilt DOCX imports cleanly with zero structural drift against the extracted
   content
4. `test_word_first_roundtrip` passes in CI
5. REST and MCP surfaces expose the new capability

---

## Execution order

- T01, T02, T03 are independent — can be worked in any order or in parallel
- T04 depends on T02 (the evidence CLI command exposes the assembled set)
- T05 is independent — can be worked at any time
- T06 is independent of T01–T05 but benefits from T04 (evidence for the rebuild step)

## Updating task status

```
status: todo        →  status: in_progress   (when you start it)
status: in_progress →  status: done          (when verified complete)
```

When every task is `done`, set the frontmatter `status: done`.

## Success criteria

Before marking the workplan done:

1. Every task block has `status: done`
2. Workplan frontmatter has `status: done`
3. Full test suite passes (`pytest --tb=short -q`)
4. `ruff check` and `mypy src/` clean
5. `markidocx inspect`, `markidocx test`, `markidocx evidence`, `markidocx template extract`
   all present and functional
6. `GET /styles` returns real style data (not `[]`)
7. `markidocx evidence <run_id>` returns an `EvidenceSet` with `overall_status`
8. `test_word_first_roundtrip` passes
9. LEVEL3 edge-case tests added and passing