feat: WP-0003 complete — LEVEL3 advanced features + error framework

Implements the full LEVEL3 feature set: cross-references (xref.py), numbered figures (figures.py), auto-diagrams (diagrams.py), bibliography/citations (bibliography.py), LEVEL3 capability detection (level3.py), and structured error/warning records (errors.py). Builder, importer, and differ updated for LEVEL3 round-trip support. REST and MCP interfaces updated with structured warning records. 259 tests passing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-16 10:51:38 +00:00
parent 760047b82b
commit ac442ea41f
26 changed files with 3713 additions and 74 deletions
--- a/tests/regression/level3/__init__.py
+++ b/tests/regression/level3/__init__.py
--- a/tests/regression/level3/bibliography_document.md
+++ b/tests/regression/level3/bibliography_document.md
@@ -0,0 +1,35 @@
+# Research Document with Citations
+
+## Introduction
+
+Prior work by [@smith2020] established the foundation. The approach was later
+refined by [@jones2021], building on the original insights of [@smith2020].
+
+## Related Work
+
+Several key contributions inform this work. The landmark paper [@brown2019]
+introduced the core technique. Further development appeared in [@davis2022]
+and [@wilson2023].
+
+## Methodology
+
+Based on [@smith2020] and the refinements of [@jones2021], our methodology
+proceeds as follows.
+
+## Results
+
+Our results confirm the predictions of [@brown2019] and extend the findings
+of [@davis2022].
+
+## Conclusion
+
+This work synthesises [@smith2020], [@jones2021], [@brown2019], [@davis2022],
+and [@wilson2023].
+
+## References
+
+- [@smith2020]: Smith, J. *Foundational Work*. Journal of Research, 2020.
+- [@jones2021]: Jones, B. *Refinements and Extensions*. Proceedings, 2021.
+- [@brown2019]: Brown, C. *The Core Technique*. Nature, 2019.
+- [@davis2022]: Davis, A. *Further Development*. Science, 2022.
+- [@wilson2023]: Wilson, E. *Recent Advances*. Review, 2023.
--- a/tests/regression/level3/combined_document.md
+++ b/tests/regression/level3/combined_document.md
@@ -0,0 +1,63 @@
+# Combined LEVEL3 Feature Document {#combined}
+
+This document exercises all LEVEL3 constructs in a single file.
+
+## Introduction {#intro}
+
+This document demonstrates the full LEVEL3 feature set as described by [@smith2020].
+See [Background][bg] for context.
+
+## Background {#bg}
+
+Context and prerequisites are discussed here. Refer to [Introduction][intro]
+for the problem statement.
+
+## Architecture {#arch-section}
+
+The system architecture is shown below.
+
+![System Architecture](arch.png){#fig:arch}
+
+The architecture overview in [Architecture][arch-section] establishes the
+baseline from which the data flow is derived.
+
+## Data Flow
+
+The data flow diagram illustrates message routing.
+
+```mermaid
+graph LR
+    A[Input] --> B[Processor]
+    B --> C[Output]
+```
+
+## Algorithm {#algo}
+
+The algorithm formalises the approach described in [@jones2021].
+
+```graphviz
+digraph algorithm {
+    start -> step1 -> step2 -> end;
+}
+```
+
+## Results {#results}
+
+Experimental results confirm the algorithm in [Algorithm][algo].
+
+![Experimental Results](results.png){#fig:results}
+
+The results align with predictions from [@brown2019] and the architectural
+choices described in [Architecture][arch-section].
+
+## Conclusion {#conclusion}
+
+All LEVEL3 constructs — cross-references, figures, diagrams, and citations —
+have been demonstrated. See [Introduction][intro] through [Results][results]
+for the complete narrative.
+
+## References
+
+- [@smith2020]: Smith, J. *LEVEL3 Design Principles*. 2020.
+- [@jones2021]: Jones, B. *Algorithm Formalisation*. 2021.
+- [@brown2019]: Brown, C. *Experimental Validation*. 2019.
--- a/tests/regression/level3/diagrams_document.md
+++ b/tests/regression/level3/diagrams_document.md
@@ -0,0 +1,44 @@
+# Document with Diagram Sources
+
+## State Machine
+
+The following Mermaid diagram describes the state machine:
+
+```mermaid
+stateDiagram-v2
+    [*] --> Idle
+    Idle --> Processing: start
+    Processing --> Done: complete
+    Processing --> Error: fail
+    Done --> [*]
+    Error --> Idle: reset
+```
+
+## Dependency Graph
+
+The Graphviz diagram shows dependencies:
+
+```graphviz
+digraph G {
+    A -> B;
+    A -> C;
+    B -> D;
+    C -> D;
+}
+```
+
+## Sequence
+
+The PlantUML sequence diagram:
+
+```plantuml
+@startuml
+Alice -> Bob: Request
+Bob --> Alice: Response
+Alice -> Carol: Forward
+@enduml
+```
+
+## Summary
+
+All three diagram types are supported in LEVEL3 source-only mode.
--- a/tests/regression/level3/figures_document.md
+++ b/tests/regression/level3/figures_document.md
@@ -0,0 +1,29 @@
+# Technical Report with Figures
+
+## Overview
+
+This document contains multiple numbered figures for LEVEL3 round-trip testing.
+
+## System Architecture
+
+The overall architecture is illustrated below.
+
+![System Architecture Overview](figures/architecture.png){#fig:arch}
+
+The architecture shows the main components and their interactions.
+
+## Data Flow
+
+The data flow is shown in the following figure.
+
+![Data Flow Diagram](figures/dataflow.png){#fig:dataflow}
+
+Compare the architecture in [fig:arch] with the data flow above.
+
+## Results
+
+Final results are captured in this chart.
+
+![Results Summary Chart](figures/results.png){#fig:results}
+
+The chart confirms the findings from the data flow in Figure 2.
--- a/tests/regression/level3/xref_document.md
+++ b/tests/regression/level3/xref_document.md
@@ -0,0 +1,21 @@
+# Introduction {#intro}
+
+This document demonstrates cross-reference support for LEVEL3 processing.
+
+## Background {#bg}
+
+The background section provides context. See [Introduction][intro] for the overview.
+
+## Methodology {#method}
+
+This section describes the approach. Refer to [Background][bg] for prerequisites,
+and see [Introduction][intro] for the original problem statement.
+
+## Results {#results}
+
+Results are discussed here. The methodology in [Methodology][method] led to these findings.
+
+## Conclusion
+
+This concludes the document. All sections from [Introduction][intro] through
+[Results][results] have been covered.
--- a/tests/regression/test_level3_roundtrip.py
+++ b/tests/regression/test_level3_roundtrip.py
@@ -0,0 +1,261 @@
+"""LEVEL3 end-to-end round-trip regression tests (FR-1100, MRKD-WP-0003 T07).
+
+Tests the full build → import → compare cycle for each corpus file in
+tests/regression/level3/, using feature_level: level3.
+
+All LEVEL1 regression tests must remain green (non-regression gate).
+"""
+
+from __future__ import annotations
+
+from pathlib import Path
+
+import pytest
+import yaml
+
+from markidocx.builder import build_document
+from markidocx.differ import compare
+from markidocx.importer import import_document
+from markidocx.manifest import load_manifest
+
+# Corpus files in tests/regression/level3/
+CORPUS_DIR = Path(__file__).parent / "level3"
+CORPUS_FILES = [
+    "xref_document.md",
+    "figures_document.md",
+    "diagrams_document.md",
+    "bibliography_document.md",
+    "combined_document.md",
+]
+
+
+def _make_level3_project(tmp_path: Path, markdown: str, name: str = "test") -> Path:
+    (tmp_path / "doc.md").write_text(markdown, encoding="utf-8")
+    manifest_path = tmp_path / "manifest.yaml"
+    manifest_path.write_text(
+        yaml.dump(
+            {
+                "project": {"name": name, "feature_level": "level3", "family": "article"},
+                "sources": [{"path": "doc.md"}],
+                "output": {"dir": "./dist"},
+            }
+        )
+    )
+    (tmp_path / "dist").mkdir()
+    return manifest_path
+
+
+# ---------------------------------------------------------------------------
+# Corpus round-trip tests
+# ---------------------------------------------------------------------------
+
+
+@pytest.mark.parametrize("corpus_file", CORPUS_FILES)
+def test_level3_corpus_builds(tmp_path: Path, corpus_file: str) -> None:
+    """Each corpus file builds successfully under LEVEL3."""
+    md = (CORPUS_DIR / corpus_file).read_text(encoding="utf-8")
+    manifest_path = _make_level3_project(tmp_path, md, name=corpus_file.replace(".md", ""))
+    manifest = load_manifest(manifest_path)
+
+    result = build_document(manifest)
+    assert result.success, f"Build failed for {corpus_file}: {result.errors}"
+    assert result.output_path.exists()
+    assert result.feature_level == "level3"
+
+
+@pytest.mark.parametrize("corpus_file", CORPUS_FILES)
+def test_level3_corpus_imports(tmp_path: Path, corpus_file: str) -> None:
+    """Each corpus file imports successfully after build."""
+    md = (CORPUS_DIR / corpus_file).read_text(encoding="utf-8")
+    manifest_path = _make_level3_project(tmp_path, md, name=corpus_file.replace(".md", ""))
+    manifest = load_manifest(manifest_path)
+
+    build_result = build_document(manifest)
+    assert build_result.success, f"Build failed for {corpus_file}"
+
+    import_result = import_document(manifest, build_result.output_path)
+    assert import_result.success, f"Import failed for {corpus_file}: {import_result.warnings}"
+
+
+@pytest.mark.parametrize("corpus_file", CORPUS_FILES)
+def test_level3_corpus_no_unexpected_breakage(tmp_path: Path, corpus_file: str) -> None:
+    """Round-trip diff for each corpus file has no broken headings."""
+    md = (CORPUS_DIR / corpus_file).read_text(encoding="utf-8")
+    manifest_path = _make_level3_project(tmp_path, md, name=corpus_file.replace(".md", ""))
+    manifest = load_manifest(manifest_path)
+
+    build_result = build_document(manifest)
+    assert build_result.success
+
+    import_result = import_document(manifest, build_result.output_path)
+    assert import_result.success
+
+    reimported = import_result.output_files[0].read_text(encoding="utf-8")
+    report = compare(md, reimported)
+
+    # Headings must not be broken
+    broken_headings = [b for b in report.broken if b.startswith("heading:")]
+    assert not broken_headings, (
+        f"Broken headings in {corpus_file}: {broken_headings}"
+    )
+
+
+# ---------------------------------------------------------------------------
+# Specific corpus: xref_document — cross-ref anchors preserved
+# ---------------------------------------------------------------------------
+
+
+def test_xref_document_anchors_preserved(tmp_path: Path) -> None:
+    md = (CORPUS_DIR / "xref_document.md").read_text(encoding="utf-8")
+    manifest_path = _make_level3_project(tmp_path, md, name="xref")
+    manifest = load_manifest(manifest_path)
+
+    build_result = build_document(manifest)
+    assert build_result.success
+
+    import_result = import_document(manifest, build_result.output_path)
+    assert import_result.success
+
+    reimported = import_result.output_files[0].read_text(encoding="utf-8")
+    # Core anchors must survive
+    assert "{#intro}" in reimported
+    assert "{#bg}" in reimported
+    assert "{#method}" in reimported
+    assert "{#results}" in reimported
+
+
+# ---------------------------------------------------------------------------
+# Specific corpus: figures_document — figure labels preserved
+# ---------------------------------------------------------------------------
+
+
+def test_figures_document_labels_preserved(tmp_path: Path) -> None:
+    md = (CORPUS_DIR / "figures_document.md").read_text(encoding="utf-8")
+    manifest_path = _make_level3_project(tmp_path, md, name="figures")
+    manifest = load_manifest(manifest_path)
+
+    build_result = build_document(manifest)
+    assert build_result.success
+
+    import_result = import_document(manifest, build_result.output_path)
+    assert import_result.success
+
+    reimported = import_result.output_files[0].read_text(encoding="utf-8")
+    assert "fig:arch" in reimported
+    assert "fig:dataflow" in reimported
+    assert "fig:results" in reimported
+
+
+# ---------------------------------------------------------------------------
+# Specific corpus: diagrams_document — diagram sources preserved
+# ---------------------------------------------------------------------------
+
+
+def test_diagrams_document_sources_preserved(tmp_path: Path, monkeypatch) -> None:
+    """Diagram sources survive round-trip in source-only path."""
+    import shutil
+
+    monkeypatch.setattr(shutil, "which", lambda _cmd: None)
+    md = (CORPUS_DIR / "diagrams_document.md").read_text(encoding="utf-8")
+    manifest_path = _make_level3_project(tmp_path, md, name="diagrams")
+    manifest = load_manifest(manifest_path)
+
+    build_result = build_document(manifest)
+    assert build_result.success
+
+    import_result = import_document(manifest, build_result.output_path)
+    assert import_result.success
+
+    reimported = import_result.output_files[0].read_text(encoding="utf-8")
+    # At least one diagram type must appear in reimported
+    assert "mermaid" in reimported or "graphviz" in reimported or "plantuml" in reimported
+
+
+# ---------------------------------------------------------------------------
+# Specific corpus: bibliography_document — citation keys preserved
+# ---------------------------------------------------------------------------
+
+
+def test_bibliography_document_citations_preserved(tmp_path: Path) -> None:
+    md = (CORPUS_DIR / "bibliography_document.md").read_text(encoding="utf-8")
+    manifest_path = _make_level3_project(tmp_path, md, name="bibliography")
+    manifest = load_manifest(manifest_path)
+
+    build_result = build_document(manifest)
+    assert build_result.success
+
+    import_result = import_document(manifest, build_result.output_path)
+    assert import_result.success
+
+    reimported = import_result.output_files[0].read_text(encoding="utf-8")
+    assert "smith2020" in reimported
+    assert "jones2021" in reimported
+    assert "brown2019" in reimported
+
+
+# ---------------------------------------------------------------------------
+# Specific corpus: combined_document — all LEVEL3 constructs
+# ---------------------------------------------------------------------------
+
+
+def test_combined_document_roundtrip(tmp_path: Path, monkeypatch) -> None:
+    """Combined document with all LEVEL3 constructs survives build+import."""
+    import shutil
+
+    monkeypatch.setattr(shutil, "which", lambda _cmd: None)
+    md = (CORPUS_DIR / "combined_document.md").read_text(encoding="utf-8")
+    manifest_path = _make_level3_project(tmp_path, md, name="combined")
+    manifest = load_manifest(manifest_path)
+
+    build_result = build_document(manifest)
+    assert build_result.success
+
+    import_result = import_document(manifest, build_result.output_path)
+    assert import_result.success
+
+    reimported = import_result.output_files[0].read_text(encoding="utf-8")
+
+    # Anchors preserved
+    assert "{#intro}" in reimported
+
+    # Figures preserved (at least the label)
+    assert "fig:arch" in reimported
+
+    # Citations preserved
+    assert "smith2020" in reimported
+
+
+# ---------------------------------------------------------------------------
+# Non-regression gate: LEVEL1 round-trip stays green alongside the LEVEL3 corpus
+# ---------------------------------------------------------------------------
+
+
+def test_level1_regression_still_passes(tmp_path: Path) -> None:
+    """LEVEL1 round-trip must remain green after LEVEL3 changes (non-regression)."""
+    from tests.regression.test_roundtrip import LEVEL1_MARKDOWN
+
+    (tmp_path / "doc.md").write_text(LEVEL1_MARKDOWN, encoding="utf-8")
+    manifest_path = tmp_path / "manifest.yaml"
+    manifest_path.write_text(
+        yaml.dump(
+            {
+                "project": {"name": "l1-nonreg", "feature_level": "level1", "family": "article"},
+                "sources": [{"path": "doc.md"}],
+                "output": {"dir": "./dist"},
+            }
+        )
+    )
+    (tmp_path / "dist").mkdir()
+    manifest = load_manifest(manifest_path)
+
+    build_result = build_document(manifest)
+    assert build_result.success
+    assert not build_result.errors
+
+    import_result = import_document(manifest, build_result.output_path)
+    assert import_result.success
+
+    reimported = import_result.output_files[0].read_text(encoding="utf-8")
+    report = compare(LEVEL1_MARKDOWN, reimported)
+    broken_headings = [b for b in report.broken if b.startswith("heading:")]
+    assert not broken_headings
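
Note on the `shutil.which` patch in the two diagram tests: replacing `shutil.which` with a function that always returns `None` makes every external tool look absent, which presumably steers the builder onto its source-only diagram path regardless of what is installed on the machine running the tests. The sketch below shows the same masking outside pytest, using `unittest.mock` from the standard library. `renderer_available` and `probe_with_tools_hidden` are hypothetical helpers written for this illustration and are not part of markidocx; how the builder actually probes for renderers is an assumption.

```python
import shutil
from unittest import mock


def renderer_available(command: str) -> bool:
    """Hypothetical probe: True when `command` resolves on PATH."""
    # Looked up through the module at call time, so a patch takes effect.
    return shutil.which(command) is not None


def probe_with_tools_hidden(command: str) -> bool:
    """Run the probe while shutil.which is masked to always return None."""
    with mock.patch.object(shutil, "which", lambda _cmd: None):
        return renderer_available(command)


if __name__ == "__main__":
    # With the mask in place, the probe reports the tool as missing
    # even if it is installed locally.
    print(probe_with_tools_hidden("dot"))
```

The mask only works when the code under test calls `shutil.which` through the module, as above. A module that does `from shutil import which` keeps its own reference, and the patch would have to target that name instead.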