# Known Drift: markidocx-docs Corpus

Last updated: 2026-03-16

## Summary

The markidocx-docs corpus (PRD + FRS v0.2 + UCC) produces known structural drift on round-trip at LEVEL1. This drift is expected and does not indicate a regression.

## Import mode: fallback (merged)

The three source files are composed into a single DOCX. On import, the system attempts to redistribute content back to the three origin files using source-boundary markers. The current build pipeline embeds section markers, but the 27 H1-level sections in the combined document make boundary matching ambiguous, so the importer falls back to a single merged output (`dist/imported_merged.md`).

**Classification:** expected / by-design. The merged output is complete and usable.

## Structural drift items

### Bold inline text in list items (broken: ~70 items)

List items containing `**bold**` inline spans lose the bold markers on round-trip. python-docx represents inline bold as a `Run` with `bold=True`, but the importer's list-item text extractor concatenates run text without restoring markdown bold syntax.

**Classification:** known limitation of LEVEL1 inline formatting in list items.

**FR reference:** FR-508 (unsupported construct visibility). These are surfaced explicitly as `broken` rather than silently accepted.

**Impact:** content is preserved, presentation marker is lost.

### Table (broken: 1 of 1)

One table in the UCC is detected as missing after round-trip. Likely cause: the table contains merged cells or a header row structure that the importer does not reconstruct.

**Classification:** known LEVEL1 table limitation.

**Impact:** table content is present in the DOCX but not re-imported to Markdown.

## Verdict

902 elements preserved; ~71 broken items (all inline formatting in lists or 1 table). This corpus is suitable as a regression baseline: a clean round-trip regression test can assert `preserved >= 900` and `broken <= 80` rather than exact zero-drift.
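
The baseline assertion from the verdict could be sketched roughly as follows. This is a minimal illustration, not the project's actual test: `RoundTripReport` and `check_baseline` are hypothetical names, and only the two thresholds (`preserved >= 900`, `broken <= 80`) and the example counts come from this document.

```python
from dataclasses import dataclass

# Thresholds taken from the verdict; everything else here is illustrative.
MIN_PRESERVED = 900
MAX_BROKEN = 80


@dataclass(frozen=True)
class RoundTripReport:
    """Hypothetical summary of one round-trip run (not a markidocx type)."""
    preserved: int
    broken: int


def check_baseline(report: RoundTripReport) -> list[str]:
    """Return threshold violations; an empty list means the run passes."""
    problems = []
    if report.preserved < MIN_PRESERVED:
        problems.append(f"preserved {report.preserved} < {MIN_PRESERVED}")
    if report.broken > MAX_BROKEN:
        problems.append(f"broken {report.broken} > {MAX_BROKEN}")
    return problems


# Counts reported in this document: 902 preserved, ~71 broken.
baseline = RoundTripReport(preserved=902, broken=71)
assert check_baseline(baseline) == []
```

Returning a list of violations instead of raising on the first one lets a test report both a drop in preserved elements and a rise in broken items in a single failure message.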