marki-docx/docs/tutorial.md
Bernd Worsch 69d1789469
docs: add use-case tutorial covering all 25 UCC entries
Covers UC-001 through UC-025: project definition, inspect, validate,
build (LEVEL1 + LEVEL3), import, drift detection, full round-trip
workflows, test harness, evidence/audit trail, template management,
Word-first round-trip, REST service, MCP tools, version/health.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-17 19:48:08 +00:00


# markidocx Tutorial
## Overview
markidocx is a **Markdown ↔ DOCX round-trip editing system**. Markdown is the canonical source of truth; Word documents are editorial projections used for review. Every operation preserves this asymmetry: DOCX files are always built from Markdown, and edits made in Word are imported back into the Markdown sources rather than treated as a second master copy.
All capabilities are available through three equivalent interfaces:
- **CLI** — local document workflows
- **REST** — pipeline and automation integration (`markidocx serve`)
- **MCP** — agent-accessible tools (`markidocx mcp`)
---
## 1. Define a Project (UC-001)
Everything in markidocx starts with a **manifest file** — a YAML declaration of your sources, feature level, and document family.
```yaml
# manifest.yaml
project:
  name: "Technical Specification"
  feature_level: level1   # or level3 for advanced features
  family: article         # article | book | website
sources:
  - path: intro.md
  - path: chapters/design.md
  - path: chapters/api.md
output:
  dir: ./dist
```
**Feature levels:**
- `level1` — headings, lists, tables, footnotes, images, links
- `level3` — everything in LEVEL1, plus cross-references, numbered figures, auto-diagrams (Mermaid/Graphviz/PlantUML), bibliography
**Built-in families:**
| Family | Description |
|--------|-------------|
| `article` | Single-document article layout |
| `book` | Multi-chapter book layout |
| `website` | Web-optimised document layout |
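
The fields and allowed values listed above can be sanity-checked before the CLI is invoked. The following is a minimal sketch, not markidocx's own validator: `check_manifest` is a hypothetical helper that takes the manifest as an already-parsed dictionary and checks only the values named in this section. Custom families registered later would need to be added to `FAMILIES`.

```python
# Hypothetical pre-flight check; `markidocx validate` remains authoritative.
FEATURE_LEVELS = {"level1", "level3"}
FAMILIES = {"article", "book", "website"}  # built-in families only


def check_manifest(manifest: dict) -> list[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    project = manifest.get("project") or {}
    if not project.get("name"):
        problems.append("project.name is missing")
    if project.get("feature_level") not in FEATURE_LEVELS:
        problems.append("project.feature_level must be level1 or level3")
    if project.get("family") not in FAMILIES:
        problems.append("project.family is not a built-in family")
    sources = manifest.get("sources") or []
    if not sources:
        problems.append("sources is empty")
    for i, entry in enumerate(sources):
        if not isinstance(entry, dict) or not entry.get("path"):
            problems.append(f"sources[{i}] has no path")
    return problems
```

An empty list means the manifest passed these basic checks; it does not guarantee that the source files exist on disk.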
---
## 2. Validate and Inspect (UC-002, UC-003)
Before building, confirm the project is well-formed.
**Validate** — checks manifest structure, source file existence, family/level compatibility:
```bash
markidocx validate manifest.yaml
# ✓ Manifest valid: Technical Specification
markidocx validate manifest.yaml --json
# {"status": "ok", "project": "Technical Specification"}
```
**Inspect** — shows the full resolved structure including LEVEL3 capability availability:
```bash
markidocx inspect manifest.yaml
# Project: Technical Specification
# family: article
# feature_level: level1
# sources: intro.md, chapters/design.md, chapters/api.md
# level3 xref: False
# level3 diag: False
markidocx inspect manifest.yaml --json
# {
# "status": "ok",
# "project": "Technical Specification",
# "family": "article",
# "feature_level": "level1",
# "sources": ["intro.md", "chapters/design.md", "chapters/api.md"],
# "level3": {"xref_available": false, "diagrams_available": false, ...}
# }
```
The `level3` block tells you which optional processors (`mmdc`, `dot`, `plantuml`) are available on your PATH.
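
A script can read the same information from the `--json` output. The sketch below assumes only the two `level3` keys visible in the sample above (`xref_available`, `diagrams_available`); the elided keys are not guessed. `renderers_on_path` is a hypothetical helper that probes PATH directly with the standard library, independent of markidocx.

```python
import json
import shutil

# Renderer executables named in this tutorial.
RENDERERS = ("mmdc", "dot", "plantuml")


def renderers_on_path() -> dict[str, bool]:
    """Report which diagram renderers can be found on PATH."""
    return {name: shutil.which(name) is not None for name in RENDERERS}


def level3_summary(inspect_json: str) -> dict[str, bool]:
    """Extract the two level3 flags shown in the sample `inspect --json` output."""
    level3 = json.loads(inspect_json).get("level3", {})
    return {
        "xref": bool(level3.get("xref_available", False)),
        "diagrams": bool(level3.get("diagrams_available", False)),
    }
```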
---
## 3. Build a DOCX (UC-004, UC-005, UC-014, UC-015)
Compile Markdown sources into a Word document:
```bash
markidocx build manifest.yaml
# ✓ Built: dist/technical-specification.docx
markidocx build manifest.yaml --json
# {"status": "ok", "output_path": "dist/...", "family": "article", "warnings": []}
```
**Switching families** (UC-005) — change `family:` in the manifest to re-build with different styling. All three built-in families are always available without any setup.
**LEVEL3 document** (UC-015) — set `feature_level: level3` and include advanced constructs in your Markdown:
````markdown
<!-- Cross-reference -->
See [Section 2][sec-design].

<!-- Numbered figure -->
![Architecture diagram](arch.png)
*Figure 1: System architecture*
<!-- figure-label: fig-arch -->

<!-- Auto-diagram (requires mmdc on PATH) -->
```mermaid
graph TD
    A[Client] --> B[API]
    B --> C[Database]
```

<!-- Citation -->
As noted in [@smith2023], the approach is sound.

## References
- [@smith2023]: Smith, J. *Technical Approaches*. 2023.
````
If a diagram renderer is unavailable, markidocx falls back to embedding the source as a verbatim code block and emits a warning — **source is never silently discarded**.
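
The fallback rule described above can be illustrated in a few lines. This is a sketch of the behaviour, not markidocx's implementation: `render_or_fallback` is hypothetical, the mapping from fence language to executable is an assumption (only `mermaid` and the three executable names appear in this tutorial), and real rendering is left out.

```python
import shutil

# Assumed mapping from fence language to renderer executable.
RENDERER_FOR = {"mermaid": "mmdc", "dot": "dot", "plantuml": "plantuml"}


def render_or_fallback(lang: str, source: str, which=shutil.which):
    """Return (kind, payload, warnings) for one diagram block.

    kind is "diagram" when a renderer is available, otherwise "code".
    Actual rendering is out of scope here, so the source is passed through.
    """
    exe = RENDERER_FOR.get(lang)
    if exe is not None and which(exe) is not None:
        return "diagram", source, []
    # The source is kept verbatim and the caller is told why.
    warning = f"no renderer for '{lang}' diagram; embedding source verbatim"
    return "code", source, [warning]
```

The `which` parameter exists only so the lookup can be replaced in tests; by default it is `shutil.which`.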
---
## 4. Import an Edited DOCX (UC-006)
After a reviewer edits the Word document, import their changes back to Markdown:
```bash
markidocx import manifest.yaml dist/technical-specification-reviewed.docx
# ✓ Imported (mapped)
# → intro.md
# → chapters/design.md
# → chapters/api.md
```
For **single-file projects** the import produces one `.md` file. For **multi-file projects**, markidocx redistributes content back to the original source files using heading boundaries as guides. If redistribution is ambiguous, it falls back to a single merged file and reports `mapping_status: fallback`.
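
To make the idea of heading boundaries concrete, here is a simplified sketch. It is not the algorithm markidocx uses: `redistribute` is hypothetical, it assumes each source file is identified by the exact text of its first heading, and the fallback file name `merged.md` is invented for illustration. Only the status values `mapped` and `fallback` come from this tutorial.

```python
def redistribute(merged: str, anchors: list[tuple[str, str]]):
    """Split merged Markdown back into files at known heading lines.

    anchors: (file_path, heading_line) pairs in document order, where
    heading_line is the first heading of that source file.
    Returns (status, files): "mapped" with one entry per source, or
    "fallback" with everything under a single "merged.md" key.
    """
    lines = merged.splitlines()
    starts = []
    for _path, heading in anchors:
        hits = [i for i, line in enumerate(lines) if line.strip() == heading]
        # Ambiguous if the heading is missing, duplicated, or out of order.
        if len(hits) != 1 or (starts and hits[0] <= starts[-1]):
            return "fallback", {"merged.md": merged}
        starts.append(hits[0])
    if not starts:
        return "fallback", {"merged.md": merged}
    starts[0] = 0  # keep any leading lines with the first file
    ends = starts[1:] + [len(lines)]
    files = {
        path: "\n".join(lines[start:end]).strip() + "\n"
        for (path, _heading), start, end in zip(anchors, starts, ends)
    }
    return "mapped", files
```

A reviewer who renames or duplicates a chapter's first heading would trigger the fallback branch in this sketch, which mirrors the ambiguity case described above.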
---
## 5. Detect Round-Trip Drift (UC-011)
After importing, check whether any structure was lost or degraded:
```bash
markidocx compare manifest.yaml dist/technical-specification-reviewed.docx
# ✓ No drift detected
# preserved: 12 elements
# Or with drift:
# ⚠ Drift detected
# degraded: heading:## Background (1/2)
# broken: footnote:[^1]
```
The drift report classifies every structural element as:
- **preserved** — identical in original and re-import
- **degraded** — present but modified
- **broken** — present in original, missing from re-import
- **unsupported** — construct not supported at the declared feature level
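
The four classes can be expressed as a small decision procedure. The sketch below is an illustration under stated assumptions rather than markidocx's comparison logic: `classify_drift` is hypothetical, and it assumes every structural element has already been reduced to a key (such as `heading:## Background`) and a normalised content string.

```python
def classify_drift(
    original: dict[str, str],
    reimported: dict[str, str],
    unsupported: frozenset = frozenset(),
) -> dict[str, list[str]]:
    """Bucket each element key of `original` into one of the four classes."""
    report = {"preserved": [], "degraded": [], "broken": [], "unsupported": []}
    for key, content in original.items():
        if key in unsupported:
            report["unsupported"].append(key)   # not supported at this level
        elif key not in reimported:
            report["broken"].append(key)        # missing from the re-import
        elif reimported[key] == content:
            report["preserved"].append(key)     # identical on both sides
        else:
            report["degraded"].append(key)      # present but modified
    return report
```

In this sketch, drift would be reported whenever the `degraded` or `broken` lists are non-empty.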
---
## 6. Full Round-Trip Workflow (UC-007)
The `workflow` command runs the full cycle in one step:
```bash
markidocx workflow single-file-roundtrip manifest.yaml
# ✓ Workflow single-file-roundtrip: pass
# ✓ validate: executed
# ✓ build: executed
# ✓ import: executed
# ✓ compare: executed
# run_id: a3f91c2e-...
# Multi-file variant
markidocx workflow multi-file-roundtrip manifest.yaml
```
Available workflows:
| Workflow | Steps |
|----------|-------|
| `single-file-roundtrip` | validate → build → import → compare |
| `multi-file-roundtrip` | validate → build → import → redistribute → compare |
| `release-regression` | full regression against the stable corpus |
| `family-switch-build` | build under each of the three built-in families |
---
## 7. Run the Test Suite (UC-021)
Run the end-to-end regression harness for a project:
```bash
markidocx test manifest.yaml
# ✓ Tests: 4 passed, 0 failed, 0 skipped
# ✓ validate: executed
# ✓ build: executed
# ✓ import: executed
# ✓ compare: executed
# run_id: b7d04a1e-...
# Exit code 0 on pass, 1 on any failure — CI-friendly
markidocx test manifest.yaml --json
```
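
Because the exit code carries the result, the command is easy to wrap in a pipeline step. The following is a minimal sketch: `run_markidocx_test` is a hypothetical wrapper, it uses only the `test` subcommand and `--json` flag shown above, and it makes no assumption about the fields of the JSON report.

```python
import json
import subprocess


def run_markidocx_test(manifest: str, runner=subprocess.run) -> tuple[bool, dict]:
    """Run `markidocx test <manifest> --json` and return (passed, report).

    Exit code 0 means pass and any other code means failure. The report is
    returned as parsed JSON without interpreting its fields.
    """
    result = runner(
        ["markidocx", "test", manifest, "--json"],
        capture_output=True,
        text=True,
        check=False,  # the exit code is inspected here instead of raising
    )
    report = json.loads(result.stdout) if result.stdout.strip() else {}
    return result.returncode == 0, report
```

The `runner` parameter defaults to `subprocess.run` and exists so the wrapper can be exercised without the CLI installed.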
---
## 8. Evidence and Audit Trail (UC-025, UC-022)
Every `build`, `import`, `compare`, and `workflow` run produces a persistent evidence record keyed by `run_id`.
**List recent runs:**
```bash
markidocx evidence list
markidocx evidence list --limit 5 --json
```
**Retrieve a run's evidence:**
```bash
markidocx evidence get a3f91c2e-...
# ✓ Run: a3f91c2e-... [pass]
# Reports: 4
# Warnings: 0
# Errors: 0
# • build (a3f91c2e-…)
# • import (a3f91c2e-…)
# • compare (a3f91c2e-…)
# • validation (a3f91c2e-…)
markidocx evidence get a3f91c2e-... --json
markidocx evidence get a3f91c2e-... --output evidence.json
```
The assembled **EvidenceSet** reports:
- `classification` — `pass` | `pass-with-warnings` | `failed`
- `components` — which report types are present
- `completeness_note` — if expected reports are absent for the workflow type
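
One plausible way to derive these fields is sketched below. The rule is an assumption, not documented behaviour: any error yields `failed`, any warning without errors yields `pass-with-warnings`, and the list of expected report types is supplied by the caller. Both function names are hypothetical.

```python
from typing import Optional


def classify_run(warnings: int, errors: int) -> str:
    """Map warning and error counts to a classification label (assumed rule)."""
    if errors > 0:
        return "failed"
    if warnings > 0:
        return "pass-with-warnings"
    return "pass"


def completeness_note(components: list[str], expected: list[str]) -> Optional[str]:
    """Describe expected report types that are absent, or return None."""
    missing = [name for name in expected if name not in components]
    if not missing:
        return None
    return "missing reports: " + ", ".join(missing)
```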
---
## 9. Template Management (UC-012, UC-013)
**List families:**
```bash
markidocx template list
markidocx template list --json
```
**List styles in a family** — inspect the actual Word styles available:
```bash
markidocx template styles
markidocx template styles --family book
markidocx template styles --family article --json
# [
# {"name": "Heading 1", "style_id": "Heading1", "type": "paragraph", ...},
# {"name": "Normal", "style_id": "Normal", "type": "paragraph", ...},
# ...
# ]
```
**Register a custom template:**
```bash
markidocx template register my-brand.docx --name brand --description "Corporate brand"
```
**Extract a template from an existing Word document:**
```bash
markidocx template extract existing-report.docx
# ✓ Template extracted: existing-report-template.docx
# Styles preserved: 42
markidocx template extract existing-report.docx \
--template-out corporate-template.docx \
--family corporate
```
This strips all body content while preserving every style, page setup, header, footer, and theme from the source document. The result is a content-free template ready for use with `markidocx build`.
---
## 10. Word-First Round-Trip (UC-006 variant)
If you have an existing Word document and want to bring it into the markidocx workflow:
```bash
# Step 1: extract the template shell
markidocx template extract report.docx --template-out report-template.docx
# Step 2: import the content to Markdown
markidocx import manifest.yaml report.docx
# → content.md
# Step 3: edit content.md in your editor, then rebuild
markidocx build manifest.yaml
# ✓ Built: dist/report.docx
# Step 4: verify zero structural drift from the original
markidocx compare manifest.yaml dist/report.docx
# ✓ No drift detected
```
---
## 11. REST Service (UC-019)
Start the service:
```bash
markidocx serve # production
markidocx serve --dev --port 8080 # dev mode with auto-reload
```
All CLI operations have REST equivalents:
| CLI | REST |
|-----|------|
| `validate` | `POST /validate` |
| `build` | `POST /build` |
| `import` | `POST /import` |
| `compare` | `POST /compare` |
| `workflow` | `POST /workflows/{name}` |
| `evidence get` | `GET /evidence/{run_id}` |
| `template list` | `GET /templates` |
| `template styles` | `GET /styles?family=article` |
| `template extract` | `POST /template/extract` |
**Example — build via REST:**
```bash
curl -X POST http://localhost:8000/build \
-H "Content-Type: application/json" \
-d '{
"manifest_yaml": "project:\n name: Test\n feature_level: level1\n family: article\nsources:\n - path: doc.md\noutput:\n dir: ./dist\n",
"sources": [{"name": "doc.md", "content": "# Hello\n\nWorld.\n"}]
}'
# {"status": "ok", "outputs": {"docx_base64": "..."}, "warnings": []}
```
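
The same call can be made from Python with the standard library. This sketch relies only on the request and response fields shown in the curl example (`manifest_yaml`, `sources`, `status`, `outputs.docx_base64`); the helper names `build_request` and `extract_docx` are hypothetical, and the base URL is whatever address the service was started on.

```python
import base64
import json
import urllib.request


def build_request(base_url: str, manifest_yaml: str,
                  sources: dict[str, str]) -> urllib.request.Request:
    """Prepare a POST /build request from a manifest string and source texts."""
    body = {
        "manifest_yaml": manifest_yaml,
        "sources": [{"name": n, "content": c} for n, c in sources.items()],
    }
    return urllib.request.Request(
        f"{base_url}/build",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_docx(response_body: str) -> bytes:
    """Decode the DOCX bytes from a /build response, or raise on failure."""
    payload = json.loads(response_body)
    if payload.get("status") != "ok":
        raise RuntimeError(f"build failed: {payload}")
    return base64.b64decode(payload["outputs"]["docx_base64"])
```

To send the request, pass the result of `build_request` to `urllib.request.urlopen`, read and decode the response body, and hand it to `extract_docx`; the returned bytes can then be written to a `.docx` file.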
**Capability and health discovery:**
```bash
curl http://localhost:8000/capabilities
curl http://localhost:8000/health
curl http://localhost:8000/version
```
---
## 12. MCP Tools (UC-020)
Start the MCP server:
```bash
markidocx mcp
```
Available tools (callable by any MCP-compatible agent):
| Tool | Description |
|------|-------------|
| `validate_project(manifest_yaml)` | Validate a manifest |
| `inspect_project(manifest_yaml)` | Inspect project structure + capabilities |
| `build(manifest_yaml, sources)` | Build DOCX, returns `docx_base64` |
| `import_docx(manifest_yaml, docx_base64)` | Import DOCX to Markdown |
| `compare(manifest_yaml, docx_base64, sources)` | Drift detection |
| `run_tests(manifest_yaml, sources)` | End-to-end regression |
| `invoke_workflow(name, manifest_yaml, sources)` | Named workflow |
| `get_evidence(run_id)` | Retrieve evidence set |
| `list_templates()` | Available families |
| `list_styles(family)` | Styles in a family |
| `extract_template(source_path, template_out)` | Extract template shell |
| `get_version()` | Version info |
---
## 13. Version and Health (UC-024)
```bash
markidocx --version
# markidocx 0.1.0
# Via REST
curl http://localhost:8000/health
# {"status": "ok", "version": "0.1.0"}
```
---
## Summary: The Core Workflow
```
1. Author writes Markdown   → manifest.yaml + *.md files
2. markidocx inspect        → confirm structure and capabilities
3. markidocx build          → dist/document.docx (send to reviewer)
4. Reviewer edits DOCX      → document-reviewed.docx (returned)
5. markidocx import         → Markdown updated with reviewer edits
6. markidocx compare        → drift report confirms what changed
7. markidocx evidence list  → audit trail for every run
```
All three interfaces (CLI, REST, MCP) expose the same functional model. No capability is interface-specific — every operation accessible via the CLI is equally accessible to a pipeline or an agent.