# markidocx Tutorial

## Overview

markidocx is a **Markdown ↔ DOCX round-trip editing system**. Markdown is the canonical source of truth; Word documents are editorial projections used for review. Every operation preserves this asymmetry: edits made in Word are imported back into Markdown, and a DOCX never replaces Markdown as the source.

All capabilities are available through three equivalent interfaces:

- **CLI**: local document workflows
- **REST**: pipeline and automation integration (`markidocx serve`)
- **MCP**: agent-accessible tools (`markidocx mcp`)

---

## 1. Define a Project (UC-001)

Everything in markidocx starts with a **manifest file**, a YAML declaration of your sources, feature level, and document family.

```yaml
# manifest.yaml
project:
  name: "Technical Specification"
  feature_level: level1   # or level3 for advanced features
  family: article         # article | book | website
sources:
  - path: intro.md
  - path: chapters/design.md
  - path: chapters/api.md
output:
  dir: ./dist
```

**Feature levels:**

- `level1`: headings, lists, tables, footnotes, images, links
- `level3`: everything in `level1`, plus cross-references, numbered figures, auto-diagrams (Mermaid/Graphviz/PlantUML), bibliography

**Built-in families:**

| Family | Description |
|--------|-------------|
| `article` | Single-document article layout |
| `book` | Multi-chapter book layout |
| `website` | Web-optimised document layout |

---

## 2. Validate and Inspect (UC-002, UC-003)

Before building, confirm the project is well-formed.

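
To make "well-formed" concrete, here is a minimal Python sketch of the kind of structural rules a manifest has to satisfy. It is not markidocx's implementation: the function name `check_manifest` and the error messages are hypothetical, the manifest is given as an already-parsed dict rather than read from YAML, and treating only the three built-in families as valid is an assumption that ignores custom templates.

```python
from typing import Any

# Values taken from the manifest description in section 1.
KNOWN_LEVELS = {"level1", "level3"}
BUILTIN_FAMILIES = {"article", "book", "website"}  # assumption: custom families ignored


def check_manifest(manifest: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means none were found.

    Hypothetical helper for illustration only, not part of markidocx.
    """
    problems: list[str] = []

    project = manifest.get("project") or {}
    if not project.get("name"):
        problems.append("project.name is missing")
    if project.get("feature_level") not in KNOWN_LEVELS:
        problems.append("project.feature_level must be level1 or level3")
    if project.get("family") not in BUILTIN_FAMILIES:
        problems.append("project.family is not a built-in family")

    sources = manifest.get("sources") or []
    if not sources:
        problems.append("sources is empty")
    for index, source in enumerate(sources):
        if not isinstance(source, dict) or not source.get("path"):
            problems.append(f"sources[{index}] has no path")

    return problems


# The example manifest from section 1, written out as a dict.
manifest = {
    "project": {
        "name": "Technical Specification",
        "feature_level": "level1",
        "family": "article",
    },
    "sources": [
        {"path": "intro.md"},
        {"path": "chapters/design.md"},
        {"path": "chapters/api.md"},
    ],
    "output": {"dir": "./dist"},
}

print(check_manifest(manifest))  # prints []
```

The sketch deliberately skips source file existence and family/level compatibility, which only the real tool can judge.
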

**Validate**: checks manifest structure, source file existence, family/level compatibility:

```bash
markidocx validate manifest.yaml
# ✓ Manifest valid: Technical Specification

markidocx validate manifest.yaml --json
# {"status": "ok", "project": "Technical Specification"}
```

**Inspect**: shows the full resolved structure including LEVEL3 capability availability:

```bash
markidocx inspect manifest.yaml
# Project: Technical Specification
#   family: article
#   feature_level: level1
#   sources: intro.md, chapters/design.md, chapters/api.md
#   level3 xref: False
#   level3 diag: False

markidocx inspect manifest.yaml --json
# {
#   "status": "ok",
#   "project": "Technical Specification",
#   "family": "article",
#   "feature_level": "level1",
#   "sources": ["intro.md", "chapters/design.md", "chapters/api.md"],
#   "level3": {"xref_available": false, "diagrams_available": false, ...}
# }
```

The `level3` block tells you which optional processors (`mmdc`, `dot`, `plantuml`) are available on your PATH.

---

## 3. Build a DOCX (UC-004, UC-005, UC-014, UC-015)

Compile Markdown sources into a Word document:

```bash
markidocx build manifest.yaml
# ✓ Built: dist/technical-specification.docx

markidocx build manifest.yaml --json
# {"status": "ok", "output_path": "dist/...", "family": "article", "warnings": []}
```

**Switching families** (UC-005): change `family:` in the manifest to re-build with different styling. All three built-in families are always available without any setup.

**LEVEL3 document** (UC-015): set `feature_level: level3` and include advanced constructs in your Markdown:

````markdown
See [Section 2][sec-design].

![Architecture diagram](arch.png)
*Figure 1: System architecture*

```mermaid
graph TD
  A[Client] --> B[API]
  B --> C[Database]
```

As noted in [@smith2023], the approach is sound.

## References
- [@smith2023]: Smith, J. *Technical Approaches*. 2023.
````

If a diagram renderer is unavailable, markidocx falls back to embedding the source as a verbatim code block and emits a warning: **source is never silently discarded**.

---

## 4. Import an Edited DOCX (UC-006)

After a reviewer edits the Word document, import their changes back to Markdown:

```bash
markidocx import manifest.yaml dist/technical-specification-reviewed.docx
# ✓ Imported (mapped)
#   → intro.md
#   → chapters/design.md
#   → chapters/api.md
```

For **single-file projects** the import produces one `.md` file. For **multi-file projects**, markidocx redistributes content back to the original source files using heading boundaries as guides. If redistribution is ambiguous, it falls back to a single merged file and reports `mapping_status: fallback`.

---

## 5. Detect Round-Trip Drift (UC-011)

After importing, check whether any structure was lost or degraded:

```bash
markidocx compare manifest.yaml dist/technical-specification-reviewed.docx
# ✓ No drift detected
#   preserved: 12 elements

# Or with drift:
# ⚠ Drift detected
#   degraded: heading:## Background (1/2)
#   broken: footnote:[^1]
```

The drift report classifies every structural element as:

- **preserved**: identical in original and re-import
- **degraded**: present but modified
- **broken**: present in original, missing from re-import
- **unsupported**: construct not supported at the declared feature level

---

## 6. Full Round-Trip Workflow (UC-007)

The `workflow` command runs the full cycle in one step:

```bash
markidocx workflow single-file-roundtrip manifest.yaml
# ✓ Workflow single-file-roundtrip: pass
#   ✓ validate: executed
#   ✓ build: executed
#   ✓ import: executed
#   ✓ compare: executed
#   run_id: a3f91c2e-...

# Multi-file variant
markidocx workflow multi-file-roundtrip manifest.yaml
```

Available workflows:

| Workflow | Steps |
|----------|-------|
| `single-file-roundtrip` | validate → build → import → compare |
| `multi-file-roundtrip` | validate → build → import → redistribute → compare |
| `release-regression` | full regression against the stable corpus |
| `family-switch-build` | build under each of the three built-in families |

---

## 7. Run the Test Suite (UC-021)

Run the end-to-end regression harness for a project:

```bash
markidocx test manifest.yaml
# ✓ Tests: 4 passed, 0 failed, 0 skipped
#   ✓ validate: executed
#   ✓ build: executed
#   ✓ import: executed
#   ✓ compare: executed
#   run_id: b7d04a1e-...

# Exit code 0 on pass, 1 on any failure; CI-friendly
markidocx test manifest.yaml --json
```

---

## 8. Evidence and Audit Trail (UC-025, UC-022)

Every `build`, `import`, `compare`, and `workflow` run produces a persistent evidence record keyed by `run_id`.

**List recent runs:**

```bash
markidocx evidence list
markidocx evidence list --limit 5 --json
```

**Retrieve a run's evidence:**

```bash
markidocx evidence get a3f91c2e-...
# ✓ Run: a3f91c2e-... [pass]
#   Reports: 4
#   Warnings: 0
#   Errors: 0
#   • build (a3f91c2e-…)
#   • import (a3f91c2e-…)
#   • compare (a3f91c2e-…)
#   • validation (a3f91c2e-…)

markidocx evidence get a3f91c2e-... --json
markidocx evidence get a3f91c2e-... --output evidence.json
```

The assembled **EvidenceSet** reports:

- `classification`: `pass` | `pass-with-warnings` | `failed`
- `components`: which report types are present
- `completeness_note`: if expected reports are absent for the workflow type

---

## 9. Template Management (UC-012, UC-013)

**List families:**

```bash
markidocx template list
markidocx template list --json
```

**List styles in a family**: inspect the actual Word styles available:

```bash
markidocx template styles
markidocx template styles --family book
markidocx template styles --family article --json
# [
#   {"name": "Heading 1", "style_id": "Heading1", "type": "paragraph", ...},
#   {"name": "Normal", "style_id": "Normal", "type": "paragraph", ...},
#   ...
# ]
```

**Register a custom template:**

```bash
markidocx template register my-brand.docx --name brand --description "Corporate brand"
```

**Extract a template from an existing Word document:**

```bash
markidocx template extract existing-report.docx
# ✓ Template extracted: existing-report-template.docx
#   Styles preserved: 42

markidocx template extract existing-report.docx \
  --template-out corporate-template.docx \
  --family corporate
```

This strips all body content while preserving every style, page setup, header, footer, and theme from the source document. The result is a content-free template ready for use with `markidocx build`.

---

## 10. Word-First Round-Trip (UC-006 variant)

If you have an existing Word document and want to bring it into the markidocx workflow:

```bash
# Step 1: extract the template shell
markidocx template extract report.docx --template-out report-template.docx

# Step 2: import the content to Markdown
markidocx import manifest.yaml report.docx
# → content.md

# Step 3: edit content.md in your editor, then rebuild
markidocx build manifest.yaml
# ✓ Built: dist/report.docx

# Step 4: verify zero structural drift from the original
markidocx compare manifest.yaml dist/report.docx
# ✓ No drift detected
```

---

## 11. REST Service (UC-019)

Start the service:

```bash
markidocx serve                    # production
markidocx serve --dev --port 8080  # dev mode with auto-reload
```

All CLI operations have REST equivalents:

| CLI | REST |
|-----|------|
| `validate` | `POST /validate` |
| `build` | `POST /build` |
| `import` | `POST /import` |
| `compare` | `POST /compare` |
| `workflow` | `POST /workflows/{name}` |
| `evidence get` | `GET /evidence/{run_id}` |
| `template list` | `GET /templates` |
| `template styles` | `GET /styles?family=article` |
| `template extract` | `POST /template/extract` |

**Example: build via REST:**

```bash
curl -X POST http://localhost:8000/build \
  -H "Content-Type: application/json" \
  -d '{
    "manifest_yaml": "project:\n name: Test\n feature_level: level1\n family: article\nsources:\n - path: doc.md\noutput:\n dir: ./dist\n",
    "sources": [{"name": "doc.md", "content": "# Hello\n\nWorld.\n"}]
  }'
# {"status": "ok", "outputs": {"docx_base64": "..."}, "warnings": []}
```

**Capability and health discovery:**

```bash
curl http://localhost:8000/capabilities
curl http://localhost:8000/health
curl http://localhost:8000/version
```

---

## 12. MCP Tools (UC-020)

Start the MCP server:

```bash
markidocx mcp
```

Available tools (callable by any MCP-compatible agent):

| Tool | Description |
|------|-------------|
| `validate_project(manifest_yaml)` | Validate a manifest |
| `inspect_project(manifest_yaml)` | Inspect project structure + capabilities |
| `build(manifest_yaml, sources)` | Build DOCX, returns `docx_base64` |
| `import_docx(manifest_yaml, docx_base64)` | Import DOCX to Markdown |
| `compare(manifest_yaml, docx_base64, sources)` | Drift detection |
| `run_tests(manifest_yaml, sources)` | End-to-end regression |
| `invoke_workflow(name, manifest_yaml, sources)` | Named workflow |
| `get_evidence(run_id)` | Retrieve evidence set |
| `list_templates()` | Available families |
| `list_styles(family)` | Styles in a family |
| `extract_template(source_path, template_out)` | Extract template shell |
| `get_version()` | Version info |

---

## 13. Version and Health (UC-024)

```bash
markidocx --version
# markidocx 0.1.0

# Via REST
curl http://localhost:8000/health
# {"status": "ok", "version": "0.1.0"}
```

---

## Summary: The Core Workflow

```
1. Author writes Markdown    →  manifest.yaml + *.md files
2. markidocx inspect         →  confirm structure and capabilities
3. markidocx build           →  dist/document.docx (send to reviewer)
4. Reviewer edits DOCX       →  document-reviewed.docx (returned)
5. markidocx import          →  Markdown updated with reviewer edits
6. markidocx compare         →  drift report confirms what changed
7. markidocx evidence list   →  audit trail for every run
```

All three interfaces (CLI, REST, MCP) expose the same functional model. No capability is interface-specific: every operation accessible via the CLI is equally accessible to a pipeline or an agent.
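
As an illustration of pipeline use, the following Python sketch prepares the JSON body for `POST /build` and decodes the `docx_base64` field from the response. The field names (`manifest_yaml`, `sources`, `name`, `content`, `status`, `outputs.docx_base64`) come from the curl example in section 11. The helper names `build_request_body` and `extract_docx` are hypothetical, the HTTP call itself is left out, and treating any `status` other than `ok` as a failure is an assumption about the error shape.

```python
import base64
import json


def build_request_body(manifest_yaml: str, sources: dict[str, str]) -> str:
    """Serialise a manifest and in-memory sources into the JSON shape
    shown for POST /build. Hypothetical helper, not part of markidocx."""
    payload = {
        "manifest_yaml": manifest_yaml,
        "sources": [
            {"name": name, "content": content}
            for name, content in sources.items()
        ],
    }
    return json.dumps(payload)


def extract_docx(response_text: str) -> bytes:
    """Decode the DOCX bytes from a build response.

    Assumption: a response whose status is not "ok" means the build failed.
    """
    response = json.loads(response_text)
    if response.get("status") != "ok":
        raise ValueError(f"build did not succeed: {response}")
    return base64.b64decode(response["outputs"]["docx_base64"])


# Request body for a one-file project, mirroring the curl example.
body = build_request_body(
    "project:\n name: Test\n feature_level: level1\n family: article\n"
    "sources:\n - path: doc.md\noutput:\n dir: ./dist\n",
    {"doc.md": "# Hello\n\nWorld.\n"},
)

# Stand-in response: four placeholder bytes instead of a real document.
fake_response = json.dumps({
    "status": "ok",
    "outputs": {"docx_base64": base64.b64encode(b"PK\x03\x04").decode("ascii")},
    "warnings": [],
})
docx_bytes = extract_docx(fake_response)
print(docx_bytes[:2])  # prints b'PK'
```

In a real pipeline, `body` would be sent with any HTTP client and the decoded bytes written to a `.docx` file. The same decode step applies to the MCP `build` tool, which the table in section 12 also describes as returning `docx_base64`.
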