generated from coulomb/repo-seed
feat: WP-0004 T01-T04 — stable corpus, ADRs, regression test
- corpus/markidocx-docs/manifest.yaml: specs as live markidocx project (FR-1101)
- corpus/markidocx-docs/known-drift.md: documented structural drift
- workflows.py: release-regression accepts manifest path; emits corpus_id (FR-1109)
- tests/regression/test_corpus_regression.py: corpus regression suite (FR-1102–1110)
- architecture/ADR-002: python-docx as conversion engine
- architecture/ADR-003: manifest YAML schema
- workplans/MRKD-WP-0004: T01–T04 done; T05 blocked (SBOM path mapping needed)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
45
corpus/markidocx-docs/known-drift.md
Normal file
@@ -0,0 +1,45 @@
# Known Drift — markidocx-docs Corpus

Last updated: 2026-03-16

## Summary

The markidocx-docs corpus (PRD + FRS v0.2 + UCC) produces known structural drift
on round-trip at LEVEL1. This drift is expected and does not indicate a regression.

## Import mode: fallback (merged)

The three source files are composed into a single DOCX. On import, the system attempts
to redistribute content back to the three origin files using source-boundary markers.
The current build pipeline embeds section markers, but the 27 H1-level sections in the
combined document make boundary matching ambiguous, so the importer falls back to a
single merged output (`dist/imported_merged.md`).

**Classification:** expected / by-design. The merged output is complete and usable.

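The fallback decision described above can be pictured with a small sketch. This is only an illustration under stated assumptions: the `Section` type, the `source_marker` field, and the `redistribute` function are hypothetical names, and ambiguity is modeled simply as "a section whose marker is missing or does not name a known source". The commit does not show how the real importer detects ambiguity.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Section:
    """Hypothetical stand-in for one H1-level section of the imported DOCX."""
    title: str
    source_marker: Optional[str]  # origin file recorded at build time, if any


def redistribute(sections: List[Section], sources: List[str]) -> Optional[Dict[str, List[str]]]:
    """Map each section to its origin file, or return None to signal fallback.

    None means at least one boundary could not be matched, so the caller
    should write a single merged output instead of per-source files.
    """
    assignment: Dict[str, List[str]] = {src: [] for src in sources}
    for section in sections:
        if section.source_marker not in assignment:
            # Missing or unrecognized marker: treat the boundary as ambiguous.
            return None
        assignment[section.source_marker].append(section.title)
    return assignment
```

A caller would write one file per key when a mapping comes back, and a single merged file (such as `dist/imported_merged.md`) when the result is `None`.
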
## Structural drift items

### Bold inline text in list items (broken: ~70 items)

List items containing `**bold**` inline spans lose the bold markers on round-trip.
python-docx represents inline bold as a `Run` with `bold=True`, but the importer's
list-item text extractor concatenates run text without restoring markdown bold syntax.

**Classification:** known limitation of LEVEL1 inline formatting in list items.
**FR reference:** FR-508 (unsupported construct visibility): these items are surfaced
explicitly as `broken` rather than silently accepted.
**Impact:** content is preserved; the bold marker is lost.

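To make the missing step concrete, here is a minimal sketch of restoring bold syntax while joining runs. It is not the importer's code. `FakeRun` is a hypothetical stand-in that mimics only the `.text` and `.bold` attributes of a python-docx `Run` (where `bold` is `True`, `False`, or `None` for inherited), and `runs_to_markdown` is an illustrative name.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FakeRun:
    """Stand-in for a python-docx Run; only .text and .bold are used here."""
    text: str
    bold: Optional[bool] = None  # True / False / None (inherit from style)


def runs_to_markdown(runs: Iterable[FakeRun]) -> str:
    """Join run text, wrapping each stretch of consecutive bold runs in ** ... **."""
    parts = []
    in_bold = False
    for run in runs:
        if not run.text:
            continue  # skip empty runs so they cannot emit an empty "****"
        is_bold = run.bold is True  # treat None (inherited) as not bold
        if is_bold != in_bold:
            parts.append("**")  # open or close the bold span at the transition
            in_bold = is_bold
        parts.append(run.text)
    if in_bold:
        parts.append("**")  # close a bold span that runs to the end of the item
    return "".join(parts)
```

One caveat of this sketch: a bold run with leading or trailing whitespace yields markers such as `**text **`, which Markdown renderers may not treat as emphasis, so a real implementation would likely need to move that whitespace outside the markers.
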
### Table (broken: 1 of 1)

One table in the UCC is detected as missing after round-trip. Likely cause: the table
contains merged cells or a header row structure that the importer does not reconstruct.

**Classification:** known LEVEL1 table limitation.
**Impact:** table content is present in the DOCX but not re-imported to Markdown.

## Verdict

902 elements preserved; ~71 broken items (about 70 bold list items plus 1 table).
This corpus is suitable as a regression baseline: a clean round-trip regression test
can assert `preserved >= 900` and `broken <= 80` rather than exact zero-drift.
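The threshold assertion suggested in the verdict could look roughly like the sketch below. The report shape (a dict with `preserved` and `broken` counts) and the `check_drift` name are assumptions made for illustration; the actual suite in `tests/regression/test_corpus_regression.py` is not shown in this diff and may be structured differently.

```python
from typing import Dict, List


def check_drift(report: Dict[str, int], min_preserved: int = 900, max_broken: int = 80) -> List[str]:
    """Return failure messages; an empty list means the run is within the baseline.

    `report` is a hypothetical summary with integer 'preserved' and 'broken' counts.
    """
    failures = []
    if report["preserved"] < min_preserved:
        failures.append(
            f"preserved={report['preserved']} is below the baseline of {min_preserved}"
        )
    if report["broken"] > max_broken:
        failures.append(
            f"broken={report['broken']} exceeds the allowed {max_broken}"
        )
    return failures
```

With the figures recorded above (902 preserved, about 71 broken), the check passes with a small margin on both sides, so modest fluctuation does not fail the suite while a real regression still does.
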
17
corpus/markidocx-docs/manifest.yaml
Normal file
@@ -0,0 +1,17 @@
project:
  name: markidocx-docs
  feature_level: level1
  family: article

sources:
  - path: ../../specs/MarkiDocxProductRequirementsDocument_v0.1.md
  - path: ../../specs/MarkiDocxFunctionalRequirementsSpecification_v0.2.md
  - path: ../../specs/MarkiDocxUseCaseCatalog_v0.1.md

output:
  dir: ./dist

metadata:
  title: markidocx — Product Documentation
  author: Markitect Project
  date: "2026-03-16"
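The `sources` entries are relative paths, so a consumer has to decide what they are relative to. The sketch below assumes they resolve against the directory that contains the manifest, which would place them under the repository's `specs/` directory. That resolution rule, the `resolve_sources` name, and the already-parsed `manifest` dict are illustrative assumptions and not the documented schema from ADR-003.

```python
import posixpath
from typing import Dict, List


def resolve_sources(manifest: Dict, manifest_dir: str) -> List[str]:
    """Resolve each sources[].path against the manifest's directory.

    Uses purely lexical normalization (no filesystem access, no symlink
    handling), with POSIX separators so results are platform-independent.
    """
    resolved = []
    for entry in manifest["sources"]:
        joined = posixpath.join(manifest_dir, entry["path"])
        resolved.append(posixpath.normpath(joined))
    return resolved
```

For example, an entry `../../specs/MarkiDocxUseCaseCatalog_v0.1.md` with the manifest located in `corpus/markidocx-docs` normalizes to `specs/MarkiDocxUseCaseCatalog_v0.1.md`.
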