---
id: ADR-003
type: adr
status: accepted
created: 2026-03-16
deciders: [Bernd, Custodian]
---

# ADR-003: Manifest YAML Schema

## Status

Accepted

## Context

markidocx needs a project definition format that:

1. Describes which Markdown source files form a document project
2. Declares the feature level (`level1` / `level3`) and document family (`article`,
   `book`, `website`)
3. Specifies output location and document metadata
4. Is human-writable and version-controllable alongside source files
5. Is parseable by the system without a schema registry or external validator

The format must support single-file and multi-file projects, and be extensible
enough for future additions (e.g. bibliography sources, asset directories) without
breaking existing manifests.

## Decision

Use **YAML** with a fixed four-section top-level structure:

```yaml
project:
  name: <string>
  feature_level: level1 | level3
  family: article | book | website

sources:
  - path: <relative path to .md file>
  - path: <relative path to .md file>

output:
  dir: <relative path to output directory>

metadata:
  title: <string>
  author: <string>
  date: <string>
```

All paths are resolved relative to the manifest file's location. The `metadata`
section and individual `sources` entries (currently a single `path` key) may be
extended in future versions.

Validation is performed on load by `manifest.py` using dataclass coercion:
`load_manifest(path)` raises `ManifestError` on any schema violation (missing
required fields, unknown feature levels, unresolvable source paths).
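
The validation rules above can be illustrated with a minimal sketch. It is not the actual `manifest.py`: the `Manifest` dataclass layout, the `build_manifest` helper, and the constant names are hypothetical, and the sketch assumes the YAML text has already been parsed into a plain mapping (for example by a YAML library) so that only the checks themselves are shown. Treating an empty `sources` list as an error is likewise an assumption.

```python
from dataclasses import dataclass, field
from pathlib import Path

# Allowed enum values, taken from the schema in this ADR.
FEATURE_LEVELS = {"level1", "level3"}
FAMILIES = {"article", "book", "website"}


class ManifestError(ValueError):
    """Raised on any schema violation."""


@dataclass(frozen=True)
class Manifest:
    # Hypothetical in-memory shape; the real dataclass may differ.
    name: str
    feature_level: str
    family: str
    sources: list[Path]
    output_dir: Path
    metadata: dict = field(default_factory=dict)


def _require_str(section: dict, key: str, where: str) -> str:
    """Return section[key] if it is a non-empty string, else raise."""
    value = section.get(key)
    if not isinstance(value, str) or not value:
        raise ManifestError(f"{where}.{key} must be a non-empty string")
    return value


def build_manifest(data: object, base_dir: Path) -> Manifest:
    """Validate an already-parsed manifest mapping (hypothetical helper).

    base_dir is the directory containing the manifest file; every
    relative path is resolved against it.
    """
    if not isinstance(data, dict):
        raise ManifestError("manifest must be a mapping")

    project = data.get("project")
    if not isinstance(project, dict):
        raise ManifestError("missing required section: project")
    name = _require_str(project, "name", "project")
    level = _require_str(project, "feature_level", "project")
    if level not in FEATURE_LEVELS:
        raise ManifestError(f"unknown feature level: {level!r}")
    family = _require_str(project, "family", "project")
    if family not in FAMILIES:
        raise ManifestError(f"unknown family: {family!r}")

    entries = data.get("sources")
    # Assumption: at least one source is required.
    if not isinstance(entries, list) or not entries:
        raise ManifestError("sources must be a non-empty list")
    sources = []
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            raise ManifestError(f"sources[{i}] must be a mapping")
        rel = _require_str(entry, "path", f"sources[{i}]")
        path = (base_dir / rel).resolve()
        if not path.is_file():
            raise ManifestError(f"unresolvable source path: {path}")
        sources.append(path)

    output = data.get("output") or {}
    if not isinstance(output, dict):
        raise ManifestError("output must be a mapping")
    out_rel = output.get("dir", "./dist")  # documented default
    if not isinstance(out_rel, str):
        raise ManifestError("output.dir must be a string")

    metadata = data.get("metadata") or {}
    if not isinstance(metadata, dict):
        raise ManifestError("metadata must be a mapping")

    return Manifest(name, level, family, sources,
                    (base_dir / out_rel).resolve(), dict(metadata))
```

Under these assumptions, a `load_manifest(path)` wrapper would only need to parse the file and call `build_manifest(parsed, path.parent)`. A mapping that omits `output` would come back with `output_dir` pointing at `dist` next to the manifest.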

## Current Field Definitions

| Field | Type | Required | Default | Notes |
|-------|------|----------|---------|-------|
| `project.name` | string | yes | none | Project identifier; used in output filenames |
| `project.feature_level` | enum | yes | none | `level1` or `level3` |
| `project.family` | enum | yes | none | `article`, `book`, or `website` |
| `sources[].path` | string | yes | none | Relative path; resolved against manifest dir |
| `output.dir` | string | no | `./dist` | Relative path for generated artefacts |
| `metadata.title` | string | no | none | Propagated to DOCX document properties |
| `metadata.author` | string | no | none | Propagated to DOCX document properties |
| `metadata.date` | string | no | none | Propagated to DOCX document properties |

## Consequences

**Positive:**
- Human-readable and diff-friendly; natural fit for version-controlled documentation
  repositories
- No external schema validation library needed; `manifest.py` owns validation
- Simple enough for a first-time user to write by hand
- Relative paths keep manifests portable across machines

**Negative / accepted limitations:**
- Evolving the schema requires coordination between the manifest file format and
  `manifest.py`; there is no formal schema version field
- No auto-completion support in editors without a JSON Schema / YAML Language Server
  configuration (out of scope for v0.1)
- YAML's implicit type coercion can surprise users (e.g. bare `no` parsed as `False`);
  `load_manifest` validates all fields explicitly to catch these cases
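
The coercion problem in the last point can be made concrete with a small sketch. It assumes a YAML 1.1 parser (PyYAML behaves this way), which turns a bare `no` into `False` and an unquoted ISO date into a `datetime.date`. The helper names `require_string` and `coercion_errors` are hypothetical and not taken from `manifest.py`; the parsed values are written out by hand so the example has no third-party dependency.

```python
import datetime


class ManifestError(ValueError):
    """Raised on any schema violation."""


def require_string(value: object, field_name: str) -> str:
    """Reject any value the YAML parser did not leave as a string."""
    if not isinstance(value, str):
        raise ManifestError(
            f"{field_name}: expected string, got {type(value).__name__} "
            f"({value!r}); quote the value in the manifest"
        )
    return value


def coercion_errors(parsed: dict) -> list[str]:
    """Collect one message per field whose value is not a string."""
    errors = []
    for field_name, value in parsed.items():
        try:
            require_string(value, field_name)
        except ManifestError as exc:
            errors.append(str(exc))
    return errors


# Values as a YAML 1.1 parser would hand them back for:
#   name: no            (becomes the boolean False)
#   date: 2026-03-16    (becomes a datetime.date)
#   title: "Quoted"     (stays a string)
parsed = {
    "project.name": False,
    "metadata.date": datetime.date(2026, 3, 16),
    "metadata.title": "Quoted",
}

for message in coercion_errors(parsed):
    print(message)
```

Here the first two fields are reported and the quoted title passes, which is the kind of explicit check that keeps a silently coerced value from reaching the DOCX document properties.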

## Alternatives Rejected

**TOML**: good alternative, but YAML is more common in documentation tooling
(MkDocs, GitHub Actions, Kubernetes) and more familiar to the target audience.

**JSON**: less writable for humans; comments not supported; trailing commas
disallowed; less pleasant for multi-line string values.

**Database / registry**: over-engineered for the single-project use case; would
require a running service just to define a document project.

**Pydantic / JSON Schema**: considered for validation, but adds a dependency
for functionality that a handful of explicit checks in `load_manifest()` already
covers cleanly.