marki-docx/architecture/ADR-003-manifest-yaml-schema.md
Bernd Worsch ebc5eaee77 feat: WP-0004 T01-T04 — stable corpus, ADRs, regression test
- corpus/markidocx-docs/manifest.yaml: specs as live markidocx project (FR-1101)
- corpus/markidocx-docs/known-drift.md: documented structural drift
- workflows.py: release-regression accepts manifest path; emits corpus_id (FR-1109)
- tests/regression/test_corpus_regression.py: corpus regression suite (FR-1102–1110)
- architecture/ADR-002: python-docx as conversion engine
- architecture/ADR-003: manifest YAML schema
- workplans/MRKD-WP-0004: T01–T04 done; T05 blocked (SBOM path mapping needed)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-16 17:48:33 +00:00


| id      | type | status   | created    | deciders         |
|---------|------|----------|------------|------------------|
| ADR-003 | adr  | accepted | 2026-03-16 | Bernd, Custodian |

ADR-003: Manifest YAML Schema

Status

Accepted

Context

markidocx needs a project definition format that:

  1. Describes which Markdown source files form a document project
  2. Declares the feature level (level1 / level3) and document family (article, book, website)
  3. Specifies output location and document metadata
  4. Is human-writable and version-controllable alongside source files
  5. Is parseable by the system without a schema registry or external validator

The format must support single-file and multi-file projects, and be extensible enough for future additions (e.g. bibliography sources, asset directories) without breaking existing manifests.

Decision

Use YAML with a fixed four-section top-level structure:

project:
  name: <string>
  feature_level: level1 | level3
  family: article | book | website

sources:
  - path: <relative path to .md file>
  - path: <relative path to .md file>

output:
  dir: <relative path to output directory>

metadata:
  title: <string>
  author: <string>
  date: <string>

All paths are resolved relative to the manifest file's location. Future versions may add keys to the metadata section and to individual sources entries.
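The path rule above can be sketched in a few lines of Python. This is a minimal illustration, not the code in manifest.py: the helper name resolve_against_manifest is hypothetical and only standard-library pathlib is assumed.

```python
from pathlib import Path


def resolve_against_manifest(manifest_path: str, relative: str) -> Path:
    """Resolve `relative` against the directory that contains the manifest.

    Hypothetical helper for illustration; the real manifest.py may differ.
    """
    # The manifest's own directory is the anchor, not the current working directory.
    manifest_dir = Path(manifest_path).resolve().parent
    return (manifest_dir / relative).resolve()
```

Anchoring on the manifest directory rather than the working directory is what lets the same manifest be used from any checkout location or invocation directory.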

Validation is performed on load by manifest.py using dataclass coercion: load_manifest(path) raises ManifestError on any schema violation (missing required fields, unknown feature levels, unresolvable source paths).
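As a rough sketch of what validation through dataclasses could look like, the example below checks the project section of an already-parsed manifest. Only the names load_manifest and ManifestError come from this ADR; the Project class, the project_from_dict helper, and the choice to derive ManifestError from ValueError are assumptions made for illustration.

```python
from dataclasses import dataclass

FEATURE_LEVELS = {"level1", "level3"}
FAMILIES = {"article", "book", "website"}


class ManifestError(ValueError):
    """Raised on any schema violation (base class is an assumption)."""


@dataclass(frozen=True)
class Project:
    name: str
    feature_level: str
    family: str

    def __post_init__(self):
        # Enum-like fields are checked against the values allowed by the schema.
        if self.feature_level not in FEATURE_LEVELS:
            raise ManifestError(f"unknown feature_level: {self.feature_level!r}")
        if self.family not in FAMILIES:
            raise ManifestError(f"unknown family: {self.family!r}")


def project_from_dict(raw: dict) -> Project:
    """Build a Project from parsed manifest data (hypothetical helper)."""
    section = raw.get("project")
    if not isinstance(section, dict):
        raise ManifestError("missing required section: project")
    try:
        return Project(**section)
    except TypeError as exc:
        # Missing or unexpected keys surface as TypeError from the constructor.
        raise ManifestError(f"invalid project section: {exc}") from exc
```

With this shape, a missing field and an unknown feature level both reach the caller as the same exception type, which matches the single-error contract described above.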

Current Field Definitions

| Field                 | Type   | Required | Default | Notes                                        |
|-----------------------|--------|----------|---------|----------------------------------------------|
| project.name          | string | yes      |         | Project identifier; used in output filenames |
| project.feature_level | enum   | yes      |         | level1 or level3                             |
| project.family        | enum   | yes      |         | article, book, or website                    |
| sources[].path        | string | yes      |         | Relative path; resolved against manifest dir |
| output.dir            | string | no       | ./dist  | Relative path for generated artefacts        |
| metadata.title        | string | no       |         | Propagated to DOCX document properties       |
| metadata.author       | string | no       |         | Propagated to DOCX document properties       |
| metadata.date         | string | no       |         | Propagated to DOCX document properties       |
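The optional fields and the ./dist default could be applied along the following lines. This is a sketch under assumptions: the Metadata class and the read_output_dir and read_metadata helpers are hypothetical names, and the input is taken to be the dictionary a YAML parser returns for the manifest.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_OUTPUT_DIR = "./dist"  # default listed for output.dir


@dataclass(frozen=True)
class Metadata:
    # All metadata fields are optional, so each one defaults to None.
    title: Optional[str] = None
    author: Optional[str] = None
    date: Optional[str] = None


def read_output_dir(raw: dict) -> str:
    """Return output.dir, falling back to the default when it is absent."""
    output = raw.get("output") or {}  # tolerates a missing or empty section
    return output.get("dir", DEFAULT_OUTPUT_DIR)


def read_metadata(raw: dict) -> Metadata:
    """Return the metadata section, with unset fields left as None."""
    return Metadata(**(raw.get("metadata") or {}))
```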

Consequences

Positive:

  • Human-readable and diff-friendly; natural fit for version-controlled documentation repositories
  • No external schema validation library needed — manifest.py owns validation
  • Simple enough for a first-time user to write by hand
  • Relative paths keep manifests portable across machines

Negative / accepted limitations:

  • Evolving the schema requires coordination between the manifest file format and manifest.py — there is no formal schema version field
  • No auto-completion support in editors without a JSON Schema / YAML Language Server configuration (out of scope for v0.1)
  • YAML's implicit type coercion can surprise users (e.g. a bare no is parsed as the boolean False); load_manifest validates all fields explicitly to catch these cases
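To illustrate the coercion point, the sketch below shows one way an explicit type check might reject values that a YAML 1.1 parser such as PyYAML has silently converted. The parser choice and the require_str helper are assumptions; the ADR does not say how load_manifest performs the check. The dictionaries stand in for parser output, so the example runs without a YAML library.

```python
class ManifestError(ValueError):
    """Raised on any schema violation (base class is an assumption)."""


def require_str(section: dict, key: str) -> str:
    """Return section[key] if it is a string, else raise ManifestError.

    Hypothetical helper: catches values coerced by the parser, for example
    `name: no` arriving as False, or an unquoted date arriving as a date object.
    """
    value = section.get(key)
    if not isinstance(value, str):
        raise ManifestError(
            f"{key} must be a string, got {type(value).__name__}; "
            "quote the value in the manifest"
        )
    return value


# Stand-in for what a YAML 1.1 parser yields for `name: no`.
coerced = {"name": False}
# Stand-in for the quoted form `name: "no"`.
quoted = {"name": "no"}
```

Calling require_str(quoted, "name") returns the string, while require_str(coerced, "name") raises ManifestError with a hint to quote the value.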

Alternatives Rejected

TOML — good alternative, but YAML is more common in documentation tooling (MkDocs, GitHub Actions, Kubernetes) and more familiar to the target audience.

JSON — less writable for humans; comments not supported; trailing commas disallowed; less pleasant for multi-line string values.

Database / registry — over-engineered for the single-project use case; would require a running service just to define a document project.

Pydantic / JSON Schema — considered for validation, but adds a dependency for functionality that a handful of explicit checks in load_manifest() already covers cleanly.