marki-docx/docs/tutorial.md
Bernd Worsch 69d1789469
docs: add use-case tutorial covering all 25 UCC entries
Covers UC-001 through UC-025: project definition, inspect, validate,
build (LEVEL1 + LEVEL3), import, drift detection, full round-trip
workflows, test harness, evidence/audit trail, template management,
Word-first round-trip, REST service, MCP tools, version/health.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-17 19:48:08 +00:00

markidocx Tutorial

Overview

markidocx is a Markdown ↔ DOCX round-trip editing system. Markdown is the canonical source of truth; Word documents are editorial projections used for review. Every operation preserves this asymmetry: edits made in Word are imported back into Markdown, and the Word file never becomes the authoritative copy.

All capabilities are available through three equivalent interfaces:

  • CLI — local document workflows
  • REST — pipeline and automation integration (markidocx serve)
  • MCP — agent-accessible tools (markidocx mcp)

1. Define a Project (UC-001)

Everything in markidocx starts with a manifest file — a YAML declaration of your sources, feature level, and document family.

# manifest.yaml
project:
  name: "Technical Specification"
  feature_level: level1   # or level3 for advanced features
  family: article         # article | book | website

sources:
  - path: intro.md
  - path: chapters/design.md
  - path: chapters/api.md

output:
  dir: ./dist

Feature levels:

  • level1 — headings, lists, tables, footnotes, images, links
  • level3 — everything in LEVEL1, plus cross-references, numbered figures, auto-diagrams (Mermaid/Graphviz/PlantUML), bibliography

Built-in families:

| Family | Description |
| --- | --- |
| article | Single-document article layout |
| book | Multi-chapter book layout |
| website | Web-optimised document layout |
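
The constraints on feature levels and built-in families can be checked mechanically. The following is a minimal sketch, not markidocx's own validator: it assumes the manifest has already been parsed into a plain dict, and the function name `check_manifest` and its error strings are hypothetical.

```python
# Illustrative only: markidocx's real validation logic is not shown in this tutorial.
ALLOWED_LEVELS = {"level1", "level3"}
ALLOWED_FAMILIES = {"article", "book", "website"}  # built-in families only


def check_manifest(manifest: dict) -> list[str]:
    """Return a list of problems found in a parsed manifest (empty if none)."""
    problems = []
    project = manifest.get("project", {})
    if not project.get("name"):
        problems.append("project.name is missing")
    if project.get("feature_level") not in ALLOWED_LEVELS:
        problems.append("project.feature_level must be level1 or level3")
    if project.get("family") not in ALLOWED_FAMILIES:
        problems.append("project.family is not a built-in family")
    sources = manifest.get("sources", [])
    if not sources or any("path" not in entry for entry in sources):
        problems.append("sources must list at least one entry with a path")
    return problems


manifest = {
    "project": {
        "name": "Technical Specification",
        "feature_level": "level1",
        "family": "article",
    },
    "sources": [
        {"path": "intro.md"},
        {"path": "chapters/design.md"},
        {"path": "chapters/api.md"},
    ],
    "output": {"dir": "./dist"},
}
print(check_manifest(manifest))  # []
```

A project that uses a custom registered family would need the family check relaxed; the sketch only knows the three built-in names.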

2. Validate and Inspect (UC-002, UC-003)

Before building, confirm the project is well-formed.

Validate — checks manifest structure, source file existence, family/level compatibility:

markidocx validate manifest.yaml
# ✓ Manifest valid: Technical Specification

markidocx validate manifest.yaml --json
# {"status": "ok", "project": "Technical Specification"}

Inspect — shows the full resolved structure including LEVEL3 capability availability:

markidocx inspect manifest.yaml
# Project:  Technical Specification
#   family:        article
#   feature_level: level1
#   sources:       intro.md, chapters/design.md, chapters/api.md
#   level3 xref:   False
#   level3 diag:   False

markidocx inspect manifest.yaml --json
# {
#   "status": "ok",
#   "project": "Technical Specification",
#   "family": "article",
#   "feature_level": "level1",
#   "sources": ["intro.md", "chapters/design.md", "chapters/api.md"],
#   "level3": {"xref_available": false, "diagrams_available": false, ...}
# }

The level3 block tells you which optional processors (mmdc, dot, plantuml) are available on your PATH.
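
A script can read the `--json` report to decide whether a level3 build is worth attempting. This is a small sketch that assumes only the two `level3` keys shown in the sample output; the report contains further keys that the sample elides, and the helper name is hypothetical.

```python
import json


def unavailable_level3_features(inspect_json: str) -> list[str]:
    """List level3 capability flags that the inspect report marks as false."""
    report = json.loads(inspect_json)
    level3 = report.get("level3", {})
    return sorted(name for name, available in level3.items() if available is False)


# A trimmed stand-in for the output of `markidocx inspect manifest.yaml --json`.
sample = json.dumps({
    "status": "ok",
    "feature_level": "level1",
    "level3": {"xref_available": False, "diagrams_available": False},
})
print(unavailable_level3_features(sample))
# ['diagrams_available', 'xref_available']
```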


3. Build a DOCX (UC-004, UC-005, UC-014, UC-015)

Compile Markdown sources into a Word document:

markidocx build manifest.yaml
# ✓ Built: dist/technical-specification.docx

markidocx build manifest.yaml --json
# {"status": "ok", "output_path": "dist/...", "family": "article", "warnings": []}
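
In a pipeline it can be useful to treat build warnings as failures. The sketch below assumes the `status` and `warnings` fields shown in the JSON output above; the helper name, the `allow_warnings` switch, and the sample path and warning text are hypothetical.

```python
import json


def build_succeeded(build_json: str, allow_warnings: bool = True) -> bool:
    """Decide whether a build report counts as a success for pipeline purposes."""
    report = json.loads(build_json)
    if report.get("status") != "ok":
        return False
    # With allow_warnings=False, any warning turns the build into a failure.
    return allow_warnings or not report.get("warnings")


# Hypothetical report; the output path and warning text are made up.
report = json.dumps({
    "status": "ok",
    "output_path": "dist/example.docx",
    "family": "article",
    "warnings": ["example warning"],
})
print(build_succeeded(report))                        # True
print(build_succeeded(report, allow_warnings=False))  # False
```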

Switching families (UC-005) — change family: in the manifest to re-build with different styling. All three built-in families are always available without any setup.

LEVEL3 document (UC-015) — set feature_level: level3 and include advanced constructs in your Markdown:

<!-- Cross-reference -->
See [Section 2][sec-design].

<!-- Numbered figure -->
![Architecture diagram](arch.png)
*Figure 1: System architecture*
<!-- figure-label: fig-arch -->

<!-- Auto-diagram (requires mmdc on PATH) -->
```mermaid
graph TD
  A[Client] --> B[API]
  B --> C[Database]
```

<!-- Bibliography -->
As noted in [@smith2023], the approach is sound.

## References

- [@smith2023]: Smith, J. Technical Approaches. 2023.

If a diagram renderer is unavailable, markidocx falls back to embedding the source as a verbatim code block and emits a warning, so the diagram source is never silently discarded.
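
The fallback rule can be pictured as a simple PATH check. This is a sketch of the idea, not markidocx's implementation: the function name, the returned dict shape, and the warning wording are assumptions, and no diagram is actually rendered.

```python
import shutil


def embed_diagram(source: str, renderer: str) -> dict:
    """Choose how a diagram block is embedded, keeping its source either way."""
    if shutil.which(renderer) is None:
        # Renderer missing: keep the source as a verbatim code block and warn.
        warning = f"{renderer} not found on PATH; embedding diagram source verbatim"
        return {"kind": "code_block", "content": source, "warnings": [warning]}
    # Renderer present: a real tool would invoke it here to produce an image.
    return {"kind": "diagram", "content": source, "warnings": []}


result = embed_diagram("graph TD\n  A --> B\n", "markidocx-no-such-renderer")
print(result["kind"])      # code_block
print(result["warnings"])
```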


4. Import an Edited DOCX (UC-006)

After a reviewer edits the Word document, import their changes back to Markdown:

markidocx import manifest.yaml dist/technical-specification-reviewed.docx
# ✓ Imported (mapped)
#   → intro.md
#   → chapters/design.md
#   → chapters/api.md

For single-file projects the import produces one .md file. For multi-file projects, markidocx redistributes content back to the original source files using heading boundaries as guides. If redistribution is ambiguous, it falls back to a single merged file and reports mapping_status: fallback.
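
Redistribution by heading boundaries might look roughly like the sketch below. It is a simplified illustration under stated assumptions, not the algorithm markidocx uses: it splits only on level-1 headings, identifies each source file by its opening heading line, and the names `redistribute`, `first_heading`, and `merged.md` are hypothetical.

```python
import re


def split_on_headings(markdown: str) -> list[str]:
    """Split Markdown into chunks that each begin at a level-1 heading."""
    chunks = re.split(r"(?m)^(?=# )", markdown)
    return [chunk for chunk in chunks if chunk.strip()]


def redistribute(merged: str, first_heading: dict[str, str]) -> dict:
    """Assign chunks to source files by the heading each file starts with."""
    chunks = split_on_headings(merged)
    headings = [chunk.splitlines()[0].strip() for chunk in chunks]
    expected = list(first_heading.values())
    unambiguous = (
        len(set(headings)) == len(headings)
        and sorted(headings) == sorted(expected)
    )
    if not unambiguous:
        # Ambiguous mapping: keep everything in a single merged file.
        return {"mapping_status": "fallback", "files": {"merged.md": merged}}
    by_heading = dict(zip(headings, chunks))
    files = {path: by_heading[heading] for path, heading in first_heading.items()}
    return {"mapping_status": "mapped", "files": files}


merged = "# Intro\nHello\n# Design\nBody\n"
layout = {"intro.md": "# Intro", "chapters/design.md": "# Design"}
print(redistribute(merged, layout)["mapping_status"])           # mapped
print(redistribute("# Intro\nHello\n", layout)["mapping_status"])  # fallback
```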


5. Detect Round-Trip Drift (UC-011)

After importing, check whether any structure was lost or degraded:

markidocx compare manifest.yaml dist/technical-specification-reviewed.docx
# ✓ No drift detected
#   preserved: 12 elements

# Or with drift:
# ⚠ Drift detected
#   degraded: heading:## Background (1/2)
#   broken:   footnote:[^1]

The drift report classifies every structural element as:

  • preserved — identical in original and re-import
  • degraded — present but modified
  • broken — present in original, missing from re-import
  • unsupported — construct not supported at the declared feature level
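
The four categories amount to a per-element comparison between the original and the re-imported structure. The sketch below assumes each element is keyed the way the sample report shows (for example `heading:## Background`) and carries its text as the value; the function name and the `table:1` key are hypothetical, and the real comparison is likely more nuanced.

```python
def classify_elements(
    original: dict[str, str],
    reimported: dict[str, str],
    unsupported: frozenset[str] = frozenset(),
) -> dict[str, str]:
    """Label every element of the original as one of the four drift categories."""
    report = {}
    for key, text in original.items():
        if key in unsupported:
            report[key] = "unsupported"   # not available at the declared level
        elif key not in reimported:
            report[key] = "broken"        # missing after re-import
        elif reimported[key] == text:
            report[key] = "preserved"     # identical
        else:
            report[key] = "degraded"      # present but modified
    return report


original = {
    "heading:## Background": "Background",
    "footnote:[^1]": "See appendix.",
    "table:1": "a|b",
}
reimported = {"heading:## Background": "Background info", "table:1": "a|b"}
print(classify_elements(original, reimported))
# {'heading:## Background': 'degraded', 'footnote:[^1]': 'broken', 'table:1': 'preserved'}
```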

6. Full Round-Trip Workflow (UC-007)

The workflow command runs the full cycle in one step:

markidocx workflow single-file-roundtrip manifest.yaml
# ✓ Workflow single-file-roundtrip: pass
#   ✓ validate: executed
#   ✓ build: executed
#   ✓ import: executed
#   ✓ compare: executed
#   run_id: a3f91c2e-...

# Multi-file variant
markidocx workflow multi-file-roundtrip manifest.yaml

Available workflows:

| Workflow | Steps |
| --- | --- |
| single-file-roundtrip | validate → build → import → compare |
| multi-file-roundtrip | validate → build → import → redistribute → compare |
| release-regression | full regression against the stable corpus |
| family-switch-build | build under each of the three built-in families |
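
A named workflow is essentially an ordered list of steps. The following sketch models the two round-trip workflows as data and runs caller-supplied step functions in order; stopping at the first failing step is an assumption, and `run_workflow` and the step callables are hypothetical stand-ins rather than markidocx internals.

```python
from typing import Callable

# Step order taken from the workflow table; the other two workflows are omitted.
WORKFLOWS = {
    "single-file-roundtrip": ["validate", "build", "import", "compare"],
    "multi-file-roundtrip": ["validate", "build", "import", "redistribute", "compare"],
}


def run_workflow(name: str, steps: dict[str, Callable[[], bool]]) -> dict:
    """Run each step of a named workflow in order, stopping at the first failure."""
    executed = []
    for step in WORKFLOWS[name]:
        ok = steps[step]()
        executed.append((step, "executed" if ok else "failed"))
        if not ok:
            return {"workflow": name, "result": "fail", "steps": executed}
    return {"workflow": name, "result": "pass", "steps": executed}


always_ok = {step: (lambda: True) for step in WORKFLOWS["single-file-roundtrip"]}
print(run_workflow("single-file-roundtrip", always_ok)["result"])  # pass
```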

7. Run the Test Suite (UC-021)

Run the end-to-end regression harness for a project:

markidocx test manifest.yaml
# ✓ Tests: 4 passed, 0 failed, 0 skipped
#   ✓ validate: executed
#   ✓ build: executed
#   ✓ import: executed
#   ✓ compare: executed
#   run_id: b7d04a1e-...

# Exit code 0 on pass, 1 on any failure — CI-friendly
markidocx test manifest.yaml --json
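
Because the exit code carries the result, a CI script only needs to check the return code. A minimal sketch using the standard library follows; the `gate` helper is hypothetical, and stand-in commands are used so the example runs without markidocx installed.

```python
import subprocess
import sys


def gate(command: list[str]) -> bool:
    """Run a command and report success based on its exit code (0 means pass)."""
    completed = subprocess.run(command, capture_output=True, text=True)
    return completed.returncode == 0


# In CI this would be: gate(["markidocx", "test", "manifest.yaml", "--json"])
# Stand-in commands that simply exit with a chosen code:
print(gate([sys.executable, "-c", "raise SystemExit(0)"]))  # True
print(gate([sys.executable, "-c", "raise SystemExit(1)"]))  # False
```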

8. Evidence and Audit Trail (UC-025, UC-022)

Every build, import, compare, and workflow run produces a persistent evidence record keyed by run_id.

List recent runs:

markidocx evidence list
markidocx evidence list --limit 5 --json

Retrieve a run's evidence:

markidocx evidence get a3f91c2e-...
# ✓ Run: a3f91c2e-...  [pass]
#   Reports:  4
#   Warnings: 0
#   Errors:   0
#   • build (a3f91c2e-…)
#   • import (a3f91c2e-…)
#   • compare (a3f91c2e-…)
#   • validation (a3f91c2e-…)

markidocx evidence get a3f91c2e-... --json
markidocx evidence get a3f91c2e-... --output evidence.json

The assembled EvidenceSet reports:

  • classification: pass | pass-with-warnings | failed
  • components: which report types are present
  • completeness_note: set if expected reports are absent for the workflow type
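
One plausible way to derive these fields from the counts and reports in a run is sketched below. The mapping from warning and error counts to a classification, and the set of reports expected for a workflow, are assumptions made for illustration; the tutorial does not specify the exact rules.

```python
from typing import Optional

# Assumed expectation, based on the four reports listed in the sample run above.
EXPECTED_REPORTS = {
    "single-file-roundtrip": {"validation", "build", "import", "compare"},
}


def classify_run(warnings: int, errors: int) -> str:
    """Map warning and error counts to one of the three classification values."""
    if errors > 0:
        return "failed"
    if warnings > 0:
        return "pass-with-warnings"
    return "pass"


def completeness_note(workflow: str, components: list[str]) -> Optional[str]:
    """Describe missing report types, or return None when nothing is missing."""
    missing = sorted(EXPECTED_REPORTS.get(workflow, set()) - set(components))
    return f"missing reports: {', '.join(missing)}" if missing else None


print(classify_run(warnings=0, errors=0))  # pass
print(completeness_note("single-file-roundtrip", ["build", "import"]))
# missing reports: compare, validation
```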

9. Template Management (UC-012, UC-013)

List families:

markidocx template list
markidocx template list --json

List styles in a family — inspect the actual Word styles available:

markidocx template styles
markidocx template styles --family book
markidocx template styles --family article --json
# [
#   {"name": "Heading 1", "style_id": "Heading1", "type": "paragraph", ...},
#   {"name": "Normal",    "style_id": "Normal",   "type": "paragraph", ...},
#   ...
# ]

Register a custom template:

markidocx template register my-brand.docx --name brand --description "Corporate brand"

Extract a template from an existing Word document:

markidocx template extract existing-report.docx
# ✓ Template extracted: existing-report-template.docx
#   Styles preserved: 42

markidocx template extract existing-report.docx \
    --template-out corporate-template.docx \
    --family corporate

This strips all body content while preserving every style, page setup, header, footer, and theme from the source document. The result is a content-free template ready for use with markidocx build.


10. Word-First Round-Trip (UC-006 variant)

If you have an existing Word document and want to bring it into the markidocx workflow:

# Step 1: extract the template shell
markidocx template extract report.docx --template-out report-template.docx

# Step 2: import the content to Markdown
markidocx import manifest.yaml report.docx
# → content.md

# Step 3: edit content.md in your editor, then rebuild
markidocx build manifest.yaml
# ✓ Built: dist/report.docx

# Step 4: verify zero structural drift from the original
markidocx compare manifest.yaml dist/report.docx
# ✓ No drift detected

11. REST Service (UC-019)

Start the service:

markidocx serve                      # production
markidocx serve --dev --port 8080    # dev mode with auto-reload

All CLI operations have REST equivalents:

| CLI | REST |
| --- | --- |
| validate | POST /validate |
| build | POST /build |
| import | POST /import |
| compare | POST /compare |
| workflow | POST /workflows/{name} |
| evidence get | GET /evidence/{run_id} |
| template list | GET /templates |
| template styles | GET /styles?family=article |
| template extract | POST /template/extract |

Example — build via REST:

curl -X POST http://localhost:8000/build \
  -H "Content-Type: application/json" \
  -d '{
    "manifest_yaml": "project:\n  name: Test\n  feature_level: level1\n  family: article\nsources:\n  - path: doc.md\noutput:\n  dir: ./dist\n",
    "sources": [{"name": "doc.md", "content": "# Hello\n\nWorld.\n"}]
  }'
# {"status": "ok", "outputs": {"docx_base64": "..."}, "warnings": []}
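
Writing the escaped JSON body by hand is error-prone, so a client would normally generate it. The sketch below builds the request body and decodes the returned document using only the field names shown in the curl example; the helper names are hypothetical, and the HTTP call itself is left out so the example runs without a server.

```python
import base64
import json


def build_request_body(manifest_yaml: str, sources: dict[str, str]) -> str:
    """Serialise a manifest and its sources into the JSON body for POST /build."""
    payload = {
        "manifest_yaml": manifest_yaml,
        "sources": [
            {"name": name, "content": content} for name, content in sources.items()
        ],
    }
    return json.dumps(payload)


def extract_docx(response_text: str) -> bytes:
    """Decode the DOCX bytes from a successful /build response."""
    response = json.loads(response_text)
    if response.get("status") != "ok":
        raise ValueError(f"build failed: {response}")
    return base64.b64decode(response["outputs"]["docx_base64"])


body = build_request_body(
    "project:\n  name: Test\n  feature_level: level1\n  family: article\n"
    "sources:\n  - path: doc.md\noutput:\n  dir: ./dist\n",
    {"doc.md": "# Hello\n\nWorld.\n"},
)

# Fake response standing in for the server; the payload bytes are placeholders.
fake_response = json.dumps({
    "status": "ok",
    "outputs": {"docx_base64": base64.b64encode(b"PK placeholder").decode("ascii")},
    "warnings": [],
})
print(extract_docx(fake_response))  # b'PK placeholder'
```

The decoded bytes can then be written to disk with `open(path, "wb")`.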

Capability and health discovery:

curl http://localhost:8000/capabilities
curl http://localhost:8000/health
curl http://localhost:8000/version

12. MCP Tools (UC-020)

Start the MCP server:

markidocx mcp

Available tools (callable by any MCP-compatible agent):

| Tool | Description |
| --- | --- |
| validate_project(manifest_yaml) | Validate a manifest |
| inspect_project(manifest_yaml) | Inspect project structure + capabilities |
| build(manifest_yaml, sources) | Build DOCX, returns docx_base64 |
| import_docx(manifest_yaml, docx_base64) | Import DOCX to Markdown |
| compare(manifest_yaml, docx_base64, sources) | Drift detection |
| run_tests(manifest_yaml, sources) | End-to-end regression |
| invoke_workflow(name, manifest_yaml, sources) | Named workflow |
| get_evidence(run_id) | Retrieve evidence set |
| list_templates() | Available families |
| list_styles(family) | Styles in a family |
| extract_template(source_path, template_out) | Extract template shell |
| get_version() | Version info |

13. Version and Health (UC-024)

markidocx --version
# markidocx 0.1.0

# Via REST
curl http://localhost:8000/health
# {"status": "ok", "version": "0.1.0"}

Summary: The Core Workflow

1. Author writes Markdown  →  manifest.yaml + *.md files
2. markidocx inspect       →  confirm structure and capabilities
3. markidocx build         →  dist/document.docx  (send to reviewer)
4. Reviewer edits DOCX     →  document-reviewed.docx  (returned)
5. markidocx import        →  Markdown updated with reviewer edits
6. markidocx compare       →  drift report confirms what changed
7. markidocx evidence list →  audit trail for every run

All three interfaces (CLI, REST, MCP) expose the same functional model. No capability is interface-specific — every operation accessible via the CLI is equally accessible to a pipeline or an agent.