feat: WP-0003 complete — LEVEL3 advanced features + error framework

Implements full LEVEL3 feature set: cross-references (xref.py), numbered
figures (figures.py), auto-diagrams (diagrams.py), bibliography/citations
(bibliography.py), LEVEL3 capability detection (level3.py), and structured
error/warning records (errors.py). Builder, importer, and differ updated for
LEVEL3 round-trip support. REST and MCP interfaces updated with structured
warning records. 259 tests passing.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-16 10:51:38 +00:00
parent 760047b82b
commit ac442ea41f
26 changed files with 3713 additions and 74 deletions

@@ -0,0 +1,35 @@
# Research Document with Citations
## Introduction
Prior work by [@smith2020] established the foundation. The approach was later
refined by [@jones2021], building on the original insights of [@smith2020].
## Related Work
Several key contributions inform this work. The landmark paper [@brown2019]
introduced the core technique. Further development appeared in [@davis2022]
and [@wilson2023].
## Methodology
Based on [@smith2020] and the refinements of [@jones2021], our methodology
proceeds as follows.
## Results
Our results confirm the predictions of [@brown2019] and extend the findings
of [@davis2022].
## Conclusion
This work synthesises [@smith2020], [@jones2021], [@brown2019], [@davis2022],
and [@wilson2023].
## References
- [@smith2020]: Smith, J. *Foundational Work*. Journal of Research, 2020.
- [@jones2021]: Jones, B. *Refinements and Extensions*. Proceedings, 2021.
- [@brown2019]: Brown, C. *The Core Technique*. Nature, 2019.
- [@davis2022]: Davis, A. *Further Development*. Science, 2022.
- [@wilson2023]: Wilson, E. *Recent Advances*. Review, 2023.
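A corpus like this one is only useful if every inline `[@key]` has a matching `- [@key]: ...` entry under References, and vice versa. The helper below is a hypothetical, standalone sketch for checking that. `check_citations` and its key character set are assumptions and are not part of markidocx's `bibliography.py`. Run on this file, it should report no missing keys and no unused keys.

```python
from __future__ import annotations

import re

# Inline citation such as [@smith2020]; the key charset is an assumption
CITATION_RE = re.compile(r"\[@([\w:.-]+)\]")
# Reference entry line such as "- [@smith2020]: Smith, J. ..."
ENTRY_RE = re.compile(r"^-\s+\[@([\w:.-]+)\]:")


def check_citations(markdown: str) -> tuple[set[str], set[str]]:
    """Return (keys cited but never defined, keys defined but never cited)."""
    lines = markdown.splitlines()
    # Keys defined by "- [@key]: ..." entry lines
    defined = {m.group(1) for line in lines if (m := ENTRY_RE.match(line))}
    # Keys cited anywhere except on the entry lines themselves
    cited = {
        key
        for line in lines
        if not ENTRY_RE.match(line)
        for key in CITATION_RE.findall(line)
    }
    return cited - defined, defined - cited
```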

@@ -0,0 +1,63 @@
# Combined LEVEL3 Feature Document {#combined}
This document exercises all LEVEL3 constructs in a single file.
## Introduction {#intro}
This document demonstrates the full LEVEL3 feature set as described by [@smith2020].
See [Background][bg] for context.
## Background {#bg}
Context and prerequisites are discussed here. Refer to [Introduction][intro]
for the problem statement.
## Architecture {#arch-section}
The system architecture is shown below.
![System Architecture](arch.png){#fig:arch}
The architecture overview in [Architecture][arch-section] establishes the
baseline from which the data flow is derived.
## Data Flow
The data flow diagram illustrates message routing.
```mermaid
graph LR
A[Input] --> B[Processor]
B --> C[Output]
```
## Algorithm {#algo}
The algorithm formalises the approach described in [@jones2021].
```graphviz
digraph algorithm {
start -> step1 -> step2 -> end;
}
```
## Results {#results}
Experimental results confirm the algorithm in [Algorithm][algo].
![Experimental Results](results.png){#fig:results}
The results align with predictions from [@brown2019] and the architectural
choices described in [Architecture][arch-section].
## Conclusion {#conclusion}
All LEVEL3 constructs — cross-references, figures, diagrams, and citations —
have been demonstrated. See [Introduction][intro] through [Results][results]
for the complete narrative.
## References
- [@smith2020]: Smith, J. *LEVEL3 Design Principles*. 2020.
- [@jones2021]: Jones, B. *Algorithm Formalisation*. 2021.
- [@brown2019]: Brown, C. *Experimental Validation*. 2019.
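This file is meant to exercise all four LEVEL3 constructs named in its conclusion. One quick way to confirm that a corpus file covers them is a per-construct pattern scan. The sketch below is hypothetical: it is not the capability detection in `level3.py`, whose rules this commit does not show, and it assumes the exact syntaxes used above.

```python
from __future__ import annotations

import re

# One pattern per LEVEL3 construct, matching the syntax used in this corpus
CONSTRUCT_PATTERNS = {
    # [Text][anchor-id]
    "cross-reference": re.compile(r"\[[^\]]+\]\[[^\]]+\]"),
    # ![Caption](path){#fig:label}
    "figure": re.compile(r"!\[[^\]]*\]\([^)]*\)\{#fig:[\w-]+\}"),
    # Opening fence line of a mermaid / graphviz / plantuml block
    "diagram": re.compile(r"^`{3}(?:mermaid|graphviz|plantuml)[ \t]*$", re.MULTILINE),
    # [@citation-key]
    "citation": re.compile(r"\[@[\w:.-]+\]"),
}


def level3_constructs(markdown: str) -> set[str]:
    """Names of the LEVEL3 constructs that appear at least once in `markdown`."""
    return {name for name, pattern in CONSTRUCT_PATTERNS.items() if pattern.search(markdown)}
```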

@@ -0,0 +1,44 @@
# Document with Diagram Sources
## State Machine
The following Mermaid diagram describes the state machine:
```mermaid
stateDiagram-v2
[*] --> Idle
Idle --> Processing: start
Processing --> Done: complete
Processing --> Error: fail
Done --> [*]
Error --> Idle: reset
```
## Dependency Graph
The Graphviz diagram shows dependencies:
```graphviz
digraph G {
A -> B;
A -> C;
B -> D;
C -> D;
}
```
## Sequence
The PlantUML sequence diagram:
```plantuml
@startuml
Alice -> Bob: Request
Bob --> Alice: Response
Alice -> Carol: Forward
@enduml
```
## Summary
All three diagram types are supported in LEVEL3 source-only mode.
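The three fenced blocks above are the inputs to the source-only path. The following is a minimal sketch of how such blocks could be extracted and classified. It assumes a hypothetical mapping from each language to a renderer CLI (`mmdc` for Mermaid, `dot` for Graphviz, `plantuml` for PlantUML); markidocx's own `diagrams.py` may detect tools differently. When no renderer is on `PATH`, every block falls back to source-only. The regression tests force exactly this situation by patching `shutil.which` to return `None`.

```python
from __future__ import annotations

import re
import shutil

# A fenced diagram block: opening ```lang line, body, closing ``` line
DIAGRAM_RE = re.compile(
    r"^`{3}(mermaid|graphviz|plantuml)[ \t]*\n(.*?)^`{3}[ \t]*$",
    re.MULTILINE | re.DOTALL,
)

# Hypothetical language -> renderer CLI mapping (an assumption, not markidocx config)
RENDERERS = {"mermaid": "mmdc", "graphviz": "dot", "plantuml": "plantuml"}


def extract_diagrams(markdown: str) -> list[tuple[str, str]]:
    """Return (language, source) pairs for each diagram block, in document order."""
    return [(m.group(1), m.group(2).rstrip("\n")) for m in DIAGRAM_RE.finditer(markdown)]


def render_mode(language: str) -> str:
    """'rendered' when the renderer CLI is on PATH, otherwise 'source-only'."""
    return "rendered" if shutil.which(RENDERERS[language]) else "source-only"
```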

@@ -0,0 +1,29 @@
# Technical Report with Figures
## Overview
This document contains multiple numbered figures for LEVEL3 round-trip testing.
## System Architecture
The overall architecture is illustrated below.
![System Architecture Overview](figures/architecture.png){#fig:arch}
The architecture shows the main components and their interactions.
## Data Flow
The data flow is shown in the following figure.
![Data Flow Diagram](figures/dataflow.png){#fig:dataflow}
Compare the architecture in [fig:arch] with the data flow above.
## Results
Final results are captured in this chart.
![Results Summary Chart](figures/results.png){#fig:results}
The chart confirms the findings from the data flow in Figure 2.
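The prose above calls the data-flow figure "Figure 2", which is consistent with numbering figures in document order. The sketch below shows that numbering, plus resolution of bare `[fig:label]` references. It assumes references render as "Figure N"; the actual output format of `figures.py` is not shown here, and the helper names are hypothetical.

```python
from __future__ import annotations

import re

# ![Caption](path){#fig:label}
FIGURE_RE = re.compile(r"!\[[^\]]*\]\([^)]*\)\{#(fig:[\w-]+)\}")
# Bare [fig:label] reference not followed by "[" or "(" (so not a link)
FIG_REF_RE = re.compile(r"\[(fig:[\w-]+)\](?![\[(])")


def number_figures(markdown: str) -> dict[str, int]:
    """Map each figure label to its 1-based position in document order."""
    numbers: dict[str, int] = {}
    for match in FIGURE_RE.finditer(markdown):
        numbers.setdefault(match.group(1), len(numbers) + 1)
    return numbers


def resolve_figure_refs(markdown: str) -> str:
    """Replace known [fig:label] references with 'Figure N'; leave unknown ones as-is."""
    numbers = number_figures(markdown)

    def replace(match: re.Match[str]) -> str:
        label = match.group(1)
        return f"Figure {numbers[label]}" if label in numbers else match.group(0)

    return FIG_REF_RE.sub(replace, markdown)
```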

@@ -0,0 +1,21 @@
# Introduction {#intro}
This document demonstrates cross-reference support for LEVEL3 processing.
## Background {#bg}
The background section provides context. See [Introduction][intro] for the overview.
## Methodology {#method}
This section describes the approach. Refer to [Background][bg] for prerequisites,
and see [Introduction][intro] for the original problem statement.
## Results {#results}
Results are discussed here. The methodology in [Methodology][method] led to these findings.
## Conclusion
This concludes the document. All sections from [Introduction][intro] through
[Results][results] have been covered.
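Every `[Text][id]` reference in this file should point at a heading anchor `{#id}`. The hypothetical checker below lists unresolved targets. It is not markidocx's `xref.py`, and it assumes anchors appear only at the end of ATX heading lines, as they do in this corpus.

```python
from __future__ import annotations

import re

# Heading anchor at the end of an ATX heading: "## Title {#id}"
ANCHOR_RE = re.compile(r"^#{1,6}[ \t].*\{#([\w:-]+)\}[ \t]*$", re.MULTILINE)
# Reference-style cross-reference: [Link text][id]
XREF_RE = re.compile(r"\[[^\]]+\]\[([\w:-]+)\]")


def unresolved_xrefs(markdown: str) -> list[str]:
    """Cross-reference targets that have no matching heading anchor, in order."""
    anchors = set(ANCHOR_RE.findall(markdown))
    return [target for target in XREF_RE.findall(markdown) if target not in anchors]
```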

@@ -0,0 +1,261 @@
"""LEVEL3 end-to-end round-trip regression tests (FR-1100, MRKD-WP-0003 T07).
Tests the full build → import → compare cycle for each corpus file in
tests/regression/level3/, using feature_level: level3.
All LEVEL1 regression tests must remain green (non-regression gate).
"""
from __future__ import annotations
from pathlib import Path
import pytest
import yaml
from markidocx.builder import build_document
from markidocx.differ import compare
from markidocx.importer import import_document
from markidocx.manifest import load_manifest

# Corpus files in tests/regression/level3/
CORPUS_DIR = Path(__file__).parent / "level3"
CORPUS_FILES = [
    "xref_document.md",
    "figures_document.md",
    "diagrams_document.md",
    "bibliography_document.md",
    "combined_document.md",
]


def _make_level3_project(tmp_path: Path, markdown: str, name: str = "test") -> Path:
    (tmp_path / "doc.md").write_text(markdown, encoding="utf-8")
    manifest_path = tmp_path / "manifest.yaml"
    manifest_path.write_text(
        yaml.dump(
            {
                "project": {"name": name, "feature_level": "level3", "family": "article"},
                "sources": [{"path": "doc.md"}],
                "output": {"dir": "./dist"},
            }
        )
    )
    (tmp_path / "dist").mkdir()
    return manifest_path


# ---------------------------------------------------------------------------
# Corpus round-trip tests
# ---------------------------------------------------------------------------


@pytest.mark.parametrize("corpus_file", CORPUS_FILES)
def test_level3_corpus_builds(tmp_path: Path, corpus_file: str) -> None:
    """Each corpus file builds successfully under LEVEL3."""
    md = (CORPUS_DIR / corpus_file).read_text(encoding="utf-8")
    manifest_path = _make_level3_project(tmp_path, md, name=corpus_file.replace(".md", ""))
    manifest = load_manifest(manifest_path)
    result = build_document(manifest)
    assert result.success, f"Build failed for {corpus_file}: {result.errors}"
    assert result.output_path.exists()
    assert result.feature_level == "level3"


@pytest.mark.parametrize("corpus_file", CORPUS_FILES)
def test_level3_corpus_imports(tmp_path: Path, corpus_file: str) -> None:
    """Each corpus file imports successfully after build."""
    md = (CORPUS_DIR / corpus_file).read_text(encoding="utf-8")
    manifest_path = _make_level3_project(tmp_path, md, name=corpus_file.replace(".md", ""))
    manifest = load_manifest(manifest_path)
    build_result = build_document(manifest)
    assert build_result.success, f"Build failed for {corpus_file}"
    import_result = import_document(manifest, build_result.output_path)
    assert import_result.success, f"Import failed for {corpus_file}: {import_result.warnings}"


@pytest.mark.parametrize("corpus_file", CORPUS_FILES)
def test_level3_corpus_no_unexpected_breakage(tmp_path: Path, corpus_file: str) -> None:
    """Round-trip diff for each corpus file has no broken headings."""
    md = (CORPUS_DIR / corpus_file).read_text(encoding="utf-8")
    manifest_path = _make_level3_project(tmp_path, md, name=corpus_file.replace(".md", ""))
    manifest = load_manifest(manifest_path)
    build_result = build_document(manifest)
    assert build_result.success
    import_result = import_document(manifest, build_result.output_path)
    assert import_result.success
    reimported = import_result.output_files[0].read_text(encoding="utf-8")
    report = compare(md, reimported)
    # Headings must not be broken
    broken_headings = [b for b in report.broken if b.startswith("heading:")]
    assert not broken_headings, (
        f"Broken headings in {corpus_file}: {broken_headings}"
    )


# ---------------------------------------------------------------------------
# Specific corpus: xref_document — cross-ref anchors preserved
# ---------------------------------------------------------------------------


def test_xref_document_anchors_preserved(tmp_path: Path) -> None:
    md = (CORPUS_DIR / "xref_document.md").read_text(encoding="utf-8")
    manifest_path = _make_level3_project(tmp_path, md, name="xref")
    manifest = load_manifest(manifest_path)
    build_result = build_document(manifest)
    assert build_result.success
    import_result = import_document(manifest, build_result.output_path)
    assert import_result.success
    reimported = import_result.output_files[0].read_text(encoding="utf-8")
    # Core anchors must survive
    assert "{#intro}" in reimported
    assert "{#bg}" in reimported
    assert "{#method}" in reimported
    assert "{#results}" in reimported


# ---------------------------------------------------------------------------
# Specific corpus: figures_document — figure labels preserved
# ---------------------------------------------------------------------------


def test_figures_document_labels_preserved(tmp_path: Path) -> None:
    md = (CORPUS_DIR / "figures_document.md").read_text(encoding="utf-8")
    manifest_path = _make_level3_project(tmp_path, md, name="figures")
    manifest = load_manifest(manifest_path)
    build_result = build_document(manifest)
    assert build_result.success
    import_result = import_document(manifest, build_result.output_path)
    assert import_result.success
    reimported = import_result.output_files[0].read_text(encoding="utf-8")
    assert "fig:arch" in reimported
    assert "fig:dataflow" in reimported
    assert "fig:results" in reimported


# ---------------------------------------------------------------------------
# Specific corpus: diagrams_document — diagram sources preserved
# ---------------------------------------------------------------------------


def test_diagrams_document_sources_preserved(tmp_path: Path, monkeypatch) -> None:
    """Diagram sources survive round-trip in source-only path."""
    import shutil

    monkeypatch.setattr(shutil, "which", lambda _cmd: None)
    md = (CORPUS_DIR / "diagrams_document.md").read_text(encoding="utf-8")
    manifest_path = _make_level3_project(tmp_path, md, name="diagrams")
    manifest = load_manifest(manifest_path)
    build_result = build_document(manifest)
    assert build_result.success
    import_result = import_document(manifest, build_result.output_path)
    assert import_result.success
    reimported = import_result.output_files[0].read_text(encoding="utf-8")
    # At least one diagram type must appear in reimported
    assert "mermaid" in reimported or "graphviz" in reimported or "plantuml" in reimported


# ---------------------------------------------------------------------------
# Specific corpus: bibliography_document — citation keys preserved
# ---------------------------------------------------------------------------


def test_bibliography_document_citations_preserved(tmp_path: Path) -> None:
    md = (CORPUS_DIR / "bibliography_document.md").read_text(encoding="utf-8")
    manifest_path = _make_level3_project(tmp_path, md, name="bibliography")
    manifest = load_manifest(manifest_path)
    build_result = build_document(manifest)
    assert build_result.success
    import_result = import_document(manifest, build_result.output_path)
    assert import_result.success
    reimported = import_result.output_files[0].read_text(encoding="utf-8")
    assert "smith2020" in reimported
    assert "jones2021" in reimported
    assert "brown2019" in reimported


# ---------------------------------------------------------------------------
# Specific corpus: combined_document — all LEVEL3 constructs
# ---------------------------------------------------------------------------


def test_combined_document_roundtrip(tmp_path: Path, monkeypatch) -> None:
    """Combined document with all LEVEL3 constructs survives build+import."""
    import shutil

    monkeypatch.setattr(shutil, "which", lambda _cmd: None)
    md = (CORPUS_DIR / "combined_document.md").read_text(encoding="utf-8")
    manifest_path = _make_level3_project(tmp_path, md, name="combined")
    manifest = load_manifest(manifest_path)
    build_result = build_document(manifest)
    assert build_result.success
    import_result = import_document(manifest, build_result.output_path)
    assert import_result.success
    reimported = import_result.output_files[0].read_text(encoding="utf-8")
    # Anchors preserved
    assert "{#intro}" in reimported
    # Figures preserved (at least the label)
    assert "fig:arch" in reimported
    # Citations preserved
    assert "smith2020" in reimported


# ---------------------------------------------------------------------------
# CLI: markidocx test executes LEVEL1 + LEVEL3 corpus (non-regression gate)
# ---------------------------------------------------------------------------


def test_level1_regression_still_passes(tmp_path: Path) -> None:
    """LEVEL1 round-trip must remain green after LEVEL3 changes (non-regression)."""
    from tests.regression.test_roundtrip import LEVEL1_MARKDOWN

    (tmp_path / "doc.md").write_text(LEVEL1_MARKDOWN, encoding="utf-8")
    manifest_path = tmp_path / "manifest.yaml"
    manifest_path.write_text(
        yaml.dump(
            {
                "project": {"name": "l1-nonreg", "feature_level": "level1", "family": "article"},
                "sources": [{"path": "doc.md"}],
                "output": {"dir": "./dist"},
            }
        )
    )
    (tmp_path / "dist").mkdir()
    manifest = load_manifest(manifest_path)
    build_result = build_document(manifest)
    assert build_result.success
    assert not build_result.errors
    import_result = import_document(manifest, build_result.output_path)
    assert import_result.success
    reimported = import_result.output_files[0].read_text(encoding="utf-8")
    report = compare(LEVEL1_MARKDOWN, reimported)
    broken_headings = [b for b in report.broken if b.startswith("heading:")]
    assert not broken_headings
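The anchor and label tests above spot-check a handful of hard-coded ids. A stricter alternative, sketched here as a hypothetical helper, compares the full set of `{#id}` anchors before and after the round-trip, treating heading anchors and `{#fig:...}` labels alike. These tests do not establish whether the current importer preserves every anchor in the corpus. A failing call should therefore be investigated, not assumed to be a regression.

```python
from __future__ import annotations

import re

# Any explicit anchor or label, e.g. {#intro} or {#fig:arch}
ANCHOR_RE = re.compile(r"\{#([\w:-]+)\}")


def missing_anchors(original: str, reimported: str) -> set[str]:
    """Anchor ids present in the original Markdown but absent after the round-trip."""
    return set(ANCHOR_RE.findall(original)) - set(ANCHOR_RE.findall(reimported))
```

Inside one of the tests above, this would read `assert not missing_anchors(md, reimported)`.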

@@ -0,0 +1,380 @@
"""Tests for structured error & warning framework (FR-1201 to FR-1210)."""

from __future__ import annotations

import textwrap
from pathlib import Path

# ---------------------------------------------------------------------------
# WarningRecord / FailureRecord / OutputState types (FR-1208 to FR-1210)
# ---------------------------------------------------------------------------


class TestWarningRecord:
    def test_to_dict(self) -> None:
        from markidocx.errors import Severity, WarningRecord

        w = WarningRecord(severity=Severity.WARNING, reason="unsupported-construct", construct="html_block")
        d = w.to_dict()
        assert d["severity"] == "warning"
        assert d["reason"] == "unsupported-construct"
        assert d["construct"] == "html_block"

    def test_str_with_construct(self) -> None:
        from markidocx.errors import WarningRecord

        w = WarningRecord(severity="warning", reason="test-reason", construct="my-token")
        assert "warning" in str(w)
        assert "test-reason" in str(w)
        assert "my-token" in str(w)

    def test_str_without_construct(self) -> None:
        from markidocx.errors import WarningRecord

        w = WarningRecord(severity="info", reason="test-reason")
        s = str(w)
        assert "info" in s
        assert "test-reason" in s


class TestFailureRecord:
    def test_to_dict(self) -> None:
        from markidocx.errors import FailureRecord, Severity

        f = FailureRecord(severity=Severity.ERROR, reason="docx-not-found", construct="some.docx")
        d = f.to_dict()
        assert d["severity"] == "error"
        assert d["reason"] == "docx-not-found"


class TestOutputState:
    def test_all_states_defined(self) -> None:
        from markidocx.errors import OutputState

        assert OutputState.FINAL == "final"
        assert OutputState.PARTIAL == "partial"
        assert OutputState.FALLBACK == "fallback"
        assert OutputState.DEGRADED == "degraded"
        assert OutputState.UNRESOLVED == "unresolved"


# ---------------------------------------------------------------------------
# Builder emits WarningRecord for unsupported constructs (FR-1203, FR-1205)
# ---------------------------------------------------------------------------


class TestBuilderWarningRecords:
    def test_unsupported_html_emits_warning_record(self, tmp_path: Path) -> None:
        from markidocx.builder import build_document
        from markidocx.errors import Severity
        from markidocx.manifest import load_manifest

        (tmp_path / "doc.md").write_text(
            "# Hello\n\n<div>raw html</div>\n\nNormal paragraph.",
            encoding="utf-8",
        )
        (tmp_path / "manifest.yaml").write_text(
            textwrap.dedent("""\
                project:
                  name: test
                  feature_level: level1
                  family: article
                sources:
                  - path: doc.md
                output:
                  dir: ./dist
            """),
            encoding="utf-8",
        )
        m = load_manifest(tmp_path / "manifest.yaml")
        result = build_document(m)
        assert result.success
        assert len(result.warning_records) > 0
        # construct is optional (see test_str_without_construct), so guard before the substring check
        html_warnings = [
            w for w in result.warning_records if w.construct and "html" in w.construct
        ]
        assert html_warnings, "Expected warning for html construct"
        assert all(w.severity == Severity.WARNING for w in html_warnings)

    def test_warning_records_have_reason(self, tmp_path: Path) -> None:
        from markidocx.builder import build_document
        from markidocx.manifest import load_manifest

        (tmp_path / "doc.md").write_text(
            "# Hello\n\n<div>raw html</div>",
            encoding="utf-8",
        )
        (tmp_path / "manifest.yaml").write_text(
            textwrap.dedent("""\
                project:
                  name: test
                  feature_level: level1
                  family: article
                sources:
                  - path: doc.md
                output:
                  dir: ./dist
            """),
            encoding="utf-8",
        )
        m = load_manifest(tmp_path / "manifest.yaml")
        result = build_document(m)
        # Guard against a vacuous pass if no records are emitted
        assert result.warning_records, "Expected at least one WarningRecord for the HTML block"
        for w in result.warning_records:
            assert w.reason, "WarningRecord must have a non-empty reason"

    def test_warnings_property_returns_strings(self, tmp_path: Path) -> None:
        from markidocx.builder import build_document
        from markidocx.manifest import load_manifest

        (tmp_path / "doc.md").write_text("# Hello\n\n<div>html</div>", encoding="utf-8")
        (tmp_path / "manifest.yaml").write_text(
            textwrap.dedent("""\
                project:
                  name: test
                  feature_level: level1
                  family: article
                sources:
                  - path: doc.md
                output:
                  dir: ./dist
            """),
            encoding="utf-8",
        )
        m = load_manifest(tmp_path / "manifest.yaml")
        result = build_document(m)
        assert all(isinstance(w, str) for w in result.warnings)

    def test_output_state_on_clean_build(self, tmp_path: Path) -> None:
        from markidocx.builder import build_document
        from markidocx.errors import OutputState
        from markidocx.manifest import load_manifest

        (tmp_path / "doc.md").write_text("# Hello\n\nContent.", encoding="utf-8")
        (tmp_path / "manifest.yaml").write_text(
            textwrap.dedent("""\
                project:
                  name: clean
                  feature_level: level1
                  family: article
                sources:
                  - path: doc.md
                output:
                  dir: ./dist
            """),
            encoding="utf-8",
        )
        m = load_manifest(tmp_path / "manifest.yaml")
        result = build_document(m)
        assert result.output_state == OutputState.FINAL


# ---------------------------------------------------------------------------
# Importer emits WarningRecord for errors and fallback paths (FR-1206, FR-1207)
# ---------------------------------------------------------------------------


class TestImporterWarningRecords:
    def test_not_found_emits_error_warning_record(self, tmp_path: Path) -> None:
        from markidocx.errors import OutputState, Severity
        from markidocx.importer import import_document
        from markidocx.manifest import load_manifest

        (tmp_path / "doc.md").write_text("# Hello", encoding="utf-8")
        (tmp_path / "manifest.yaml").write_text(
            textwrap.dedent("""\
                project:
                  name: test
                  feature_level: level1
                  family: article
                sources:
                  - path: doc.md
                output:
                  dir: ./dist
            """),
            encoding="utf-8",
        )
        m = load_manifest(tmp_path / "manifest.yaml")
        result = import_document(m, tmp_path / "missing.docx")
        assert not result.success
        assert result.output_state == OutputState.UNRESOLVED
        assert len(result.warning_records) > 0
        assert result.warning_records[0].severity == Severity.ERROR
        assert result.warning_records[0].reason == "docx-not-found"

    def test_warnings_property_returns_strings(self, tmp_path: Path) -> None:
        from markidocx.importer import import_document
        from markidocx.manifest import load_manifest

        (tmp_path / "doc.md").write_text("# Hello", encoding="utf-8")
        (tmp_path / "manifest.yaml").write_text(
            textwrap.dedent("""\
                project:
                  name: test
                  feature_level: level1
                  family: article
                sources:
                  - path: doc.md
                output:
                  dir: ./dist
            """),
            encoding="utf-8",
        )
        m = load_manifest(tmp_path / "manifest.yaml")
        result = import_document(m, tmp_path / "missing.docx")
        assert all(isinstance(w, str) for w in result.warnings)

    def test_fallback_emits_fallback_warning(self, tmp_path: Path) -> None:
        """Multi-source import that can't redistribute produces fallback WarningRecord."""
        from markidocx.builder import build_document
        from markidocx.errors import OutputState
        from markidocx.importer import import_document
        from markidocx.manifest import load_manifest

        # Build from two sources; the import below uses a three-source manifest,
        # so the DOCX sections cannot be redistributed one-to-one
        (tmp_path / "a.md").write_text("# Alpha\n\nContent.", encoding="utf-8")
        (tmp_path / "b.md").write_text("# Beta\n\nContent.", encoding="utf-8")
        (tmp_path / "manifest.yaml").write_text(
            textwrap.dedent("""\
                project:
                  name: multi
                  feature_level: level1
                  family: article
                sources:
                  - path: a.md
                  - path: b.md
                output:
                  dir: ./dist
            """),
            encoding="utf-8",
        )
        m = load_manifest(tmp_path / "manifest.yaml")
        # Build first to get a DOCX
        build_result = build_document(m)
        assert build_result.success
        # Now import with a manifest that has 3 sources (mismatch)
        (tmp_path / "c.md").write_text("# Gamma\n\nContent.", encoding="utf-8")
        (tmp_path / "manifest3.yaml").write_text(
            textwrap.dedent("""\
                project:
                  name: multi
                  feature_level: level1
                  family: article
                sources:
                  - path: a.md
                  - path: b.md
                  - path: c.md
                output:
                  dir: ./dist
            """),
            encoding="utf-8",
        )
        m3 = load_manifest(tmp_path / "manifest3.yaml")
        result = import_document(m3, build_result.output_path)
        assert result.success
        assert result.mapping_status == "merged"
        assert result.output_state == OutputState.FALLBACK
        fallback_warnings = [w for w in result.warning_records if w.reason == "fallback"]
        assert fallback_warnings, "Expected fallback WarningRecord"


# ---------------------------------------------------------------------------
# Differ output_state (FR-1204)
# ---------------------------------------------------------------------------


class TestDifferOutputState:
    def test_final_state_on_clean_diff(self) -> None:
        from markidocx.differ import compare
        from markidocx.errors import OutputState

        text = "# Hello\n\nSome paragraph.\n\n- item one\n- item two"
        report = compare(text, text)
        assert not report.has_drift
        assert report.output_state == OutputState.FINAL

    def test_degraded_state_on_degraded_diff(self) -> None:
        from markidocx.differ import compare
        from markidocx.errors import OutputState

        original = "# Hello\n\n- item one\n- item two\n- item three"
        reimported = "# Hello\n\n- item one"
        report = compare(original, reimported)
        assert report.has_drift
        assert report.output_state in (OutputState.DEGRADED, OutputState.PARTIAL)

    def test_partial_state_on_broken_diff(self) -> None:
        from markidocx.differ import compare
        from markidocx.errors import OutputState

        original = "# Section A\n\n## Sub\n\nParagraph."
        reimported = ""
        report = compare(original, reimported)
        assert report.has_drift
        assert report.output_state == OutputState.PARTIAL


# ---------------------------------------------------------------------------
# REST response envelope warnings are WarningRecord dicts (FR-1208)
# ---------------------------------------------------------------------------


class TestRestWarningRecords:
    def test_build_warnings_are_dicts(self, tmp_path: Path) -> None:
        """When build produces warnings, REST response warnings are dicts, not bare strings."""
        from fastapi.testclient import TestClient
        from markidocx.rest import create_app

        manifest_yaml = textwrap.dedent("""\
            project:
              name: test
              feature_level: level1
              family: article
            sources:
              - path: doc.md
            output:
              dir: ./dist
        """)
        # HTML in source will produce warnings
        sources = [{"name": "doc.md", "content": "# Hello\n\n<div>html</div>"}]
        client = TestClient(create_app())
        resp = client.post("/build", json={"manifest_yaml": manifest_yaml, "sources": sources})
        assert resp.status_code == 200
        body = resp.json()
        warnings = body.get("warnings", [])
        # Each warning should be a dict carrying at least severity and reason keys
        for w in warnings:
            assert isinstance(w, dict), f"Expected dict warning, got {type(w)}: {w}"
            assert "severity" in w
            assert "reason" in w

    def test_import_warnings_are_dicts_on_failure(self) -> None:
        """Import failure warns with WarningRecord dict, not bare string."""
        import base64

        from fastapi.testclient import TestClient
        from markidocx.rest import create_app

        manifest_yaml = textwrap.dedent("""\
            project:
              name: test
              feature_level: level1
              family: article
            sources:
              - path: doc.md
            output:
              dir: ./dist
        """)
        # Send bytes that are not a valid DOCX archive
        invalid_docx = base64.b64encode(b"not-a-docx").decode()
        client = TestClient(create_app())
        resp = client.post(
            "/import",
            json={"manifest_yaml": manifest_yaml, "docx_base64": invalid_docx},
        )
        body = resp.json()
        warnings = body.get("warnings", [])
        for w in warnings:
            assert isinstance(w, dict), f"Expected dict warning, got {type(w)}: {w}"
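For reference, here is a minimal sketch of record types that would satisfy the assertions in this file. It is not the `markidocx.errors` source. The `str`-mixin enums, the frozen dataclass, the `__str__` format, and the `diff_output_state` rule (broken → `PARTIAL`, degraded → `DEGRADED`, otherwise `FINAL`) are all assumptions, chosen to be consistent with `TestDifferOutputState`.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Severity(str, Enum):
    """str mixin, so Severity.WARNING == "warning" holds."""

    INFO = "info"
    WARNING = "warning"
    ERROR = "error"


class OutputState(str, Enum):
    FINAL = "final"
    PARTIAL = "partial"
    FALLBACK = "fallback"
    DEGRADED = "degraded"
    UNRESOLVED = "unresolved"


@dataclass(frozen=True)
class WarningRecord:
    severity: Severity | str  # plain strings such as "info" are accepted too
    reason: str
    construct: str | None = None

    def _level(self) -> str:
        # Severity(...) accepts either a member or its string value
        return Severity(self.severity).value

    def to_dict(self) -> dict[str, str | None]:
        return {"severity": self._level(), "reason": self.reason, "construct": self.construct}

    def __str__(self) -> str:
        if self.construct:
            return f"[{self._level()}] {self.reason}: {self.construct}"
        return f"[{self._level()}] {self.reason}"


@dataclass(frozen=True)
class FailureRecord(WarningRecord):
    """Same fields; marks a failure rather than a recoverable warning."""


def diff_output_state(degraded: list[str], broken: list[str]) -> OutputState:
    """Assumed mapping from diff buckets to an OutputState."""
    if broken:
        return OutputState.PARTIAL
    if degraded:
        return OutputState.DEGRADED
    return OutputState.FINAL
```

Passing each severity through `Severity(...)` lets callers supply either enum members or plain strings, as the `str(w)` tests above do. It also keeps the serialised form a bare string.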

@@ -0,0 +1,349 @@
"""Tests for LEVEL3 bibliography & citation support (FR-535, FR-536, FR-542)."""
from __future__ import annotations
import textwrap
from pathlib import Path

LEVEL3_MANIFEST = textwrap.dedent("""\
    project:
      name: bib-test
      feature_level: level3
      family: article
    sources:
      - path: doc.md
    output:
      dir: ./dist
""")


def _make_project(tmp_path: Path, markdown: str) -> Path:
    (tmp_path / "doc.md").write_text(markdown, encoding="utf-8")
    (tmp_path / "manifest.yaml").write_text(LEVEL3_MANIFEST, encoding="utf-8")
    return tmp_path


# ---------------------------------------------------------------------------
# bibliography module helpers
# ---------------------------------------------------------------------------


class TestBibliographyHelpers:
    def test_has_citations_true(self) -> None:
        from markidocx.bibliography import has_citations

        assert has_citations("See [@smith2020] for details.")

    def test_has_citations_false(self) -> None:
        from markidocx.bibliography import has_citations

        assert not has_citations("Normal paragraph without citations.")

    def test_extract_citation_keys(self) -> None:
        from markidocx.bibliography import extract_citation_keys

        text = "See [@smith2020] and [@jones2021:chap] for more."
        keys = extract_citation_keys(text)
        assert "smith2020" in keys
        assert "jones2021:chap" in keys

    def test_is_references_heading(self) -> None:
        from markidocx.bibliography import is_references_heading

        assert is_references_heading("## References")
        assert is_references_heading("# References")
        assert is_references_heading("### References")
        assert not is_references_heading("## Introduction")

    def test_parse_reference_entry(self) -> None:
        from markidocx.bibliography import parse_reference_entry

        result = parse_reference_entry("- [@smith2020]: Smith, J. *Title*. 2020.")
        assert result is not None
        key, entry = result
        assert key == "smith2020"
        assert "Smith, J." in entry

    def test_extract_references_section(self) -> None:
        from markidocx.bibliography import extract_references_section

        md = textwrap.dedent("""\
            # Document
            See [@smith2020].
            ## References
            - [@smith2020]: Smith, J. *A Book*. 2020.
            - [@jones2021]: Jones, B. *Another*. 2021.
        """)
        entries, text_without = extract_references_section(md)
        assert len(entries) == 2
        assert entries[0][0] == "smith2020"
        assert entries[1][0] == "jones2021"
        assert "## References" not in text_without

    def test_render_citation_text_unchanged(self) -> None:
        from markidocx.bibliography import render_citation_text

        text = "See [@smith2020] for details."
        assert render_citation_text(text) == text


# ---------------------------------------------------------------------------
# Builder: citations and references section (FR-535)
# ---------------------------------------------------------------------------


class TestBuilderBibliography:
    def test_build_with_citation_succeeds(self, tmp_path: Path) -> None:
        from markidocx.builder import build_document
        from markidocx.manifest import load_manifest

        md = textwrap.dedent("""\
            # Document
            As shown by [@smith2020], the approach works.
            ## References
            - [@smith2020]: Smith, J. *A Work*. 2020.
        """)
        _make_project(tmp_path, md)
        m = load_manifest(tmp_path / "manifest.yaml")
        result = build_document(m)
        assert result.success
        assert result.output_path.exists()

    def test_build_docx_contains_citation_marker(self, tmp_path: Path) -> None:
        """The built DOCX should contain the citation text."""
        from docx import Document as DocxReader
        from markidocx.builder import build_document
        from markidocx.manifest import load_manifest

        md = "# Doc\n\nSee [@smith2020].\n\n## References\n\n- [@smith2020]: Smith. *T*. 2020."
        _make_project(tmp_path, md)
        m = load_manifest(tmp_path / "manifest.yaml")
        result = build_document(m)
        assert result.success
        doc = DocxReader(str(result.output_path))
        texts = [p.text for p in doc.paragraphs]
        citation_paras = [t for t in texts if "smith2020" in t]
        assert citation_paras, f"No citation found in DOCX. Paragraphs: {texts}"

    def test_build_docx_contains_references_heading(self, tmp_path: Path) -> None:
        """The built DOCX should have a References heading."""
        from docx import Document as DocxReader
        from markidocx.builder import build_document
        from markidocx.manifest import load_manifest

        md = "# Doc\n\nText.\n\n## References\n\n- [@k1]: Author. *T*. 2020."
        _make_project(tmp_path, md)
        m = load_manifest(tmp_path / "manifest.yaml")
        result = build_document(m)
        assert result.success
        doc = DocxReader(str(result.output_path))
        texts = [p.text for p in doc.paragraphs]
        assert "References" in texts, f"No References heading. Paragraphs: {texts}"

    def test_build_multi_citation_document(self, tmp_path: Path) -> None:
        """Multiple citations and references entries all appear in DOCX."""
        from docx import Document as DocxReader
        from markidocx.builder import build_document
        from markidocx.manifest import load_manifest

        md = textwrap.dedent("""\
            # Introduction
            According to [@smith2020] and [@jones2021], this is true.
            ## References
            - [@smith2020]: Smith, J. *Work A*. 2020.
            - [@jones2021]: Jones, B. *Work B*. 2021.
        """)
        _make_project(tmp_path, md)
        m = load_manifest(tmp_path / "manifest.yaml")
        result = build_document(m)
        assert result.success
        doc = DocxReader(str(result.output_path))
        all_text = " ".join(p.text for p in doc.paragraphs)
        assert "smith2020" in all_text
        assert "jones2021" in all_text


# ---------------------------------------------------------------------------
# Importer: citations and references restoration (FR-536)
# ---------------------------------------------------------------------------


class TestImporterBibliography:
    def test_roundtrip_preserves_citation(self, tmp_path: Path) -> None:
        from markidocx.builder import build_document
        from markidocx.importer import import_document
        from markidocx.manifest import load_manifest

        md = "# Doc\n\nSee [@smith2020].\n\n## References\n\n- [@smith2020]: Smith. *T*. 2020."
        _make_project(tmp_path, md)
        m = load_manifest(tmp_path / "manifest.yaml")
        build_result = build_document(m)
        assert build_result.success
        import_result = import_document(m, build_result.output_path)
        assert import_result.success
        reimported = import_result.output_files[0].read_text(encoding="utf-8")
        assert "smith2020" in reimported

    def test_roundtrip_preserves_reference_entry(self, tmp_path: Path) -> None:
        from markidocx.builder import build_document
        from markidocx.importer import import_document
        from markidocx.manifest import load_manifest

        md = textwrap.dedent("""\
            # Doc
            See [@k1].
            ## References
            - [@k1]: Author. *Title*. 2020.
        """)
        _make_project(tmp_path, md)
        m = load_manifest(tmp_path / "manifest.yaml")
        build_result = build_document(m)
        assert build_result.success
        import_result = import_document(m, build_result.output_path)
        assert import_result.success
        reimported = import_result.output_files[0].read_text(encoding="utf-8")
        assert "k1" in reimported


# ---------------------------------------------------------------------------
# Differ: citation and bibliography comparison (FR-542)
# ---------------------------------------------------------------------------
class TestDifferBibliography:
    def test_preserved_citation(self) -> None:
        from markidocx.differ import compare
        text = "# Doc\n\nSee [@smith2020].\n\n## References\n\n- [@smith2020]: Smith. *T*. 2020."
        report = compare(text, text)
        assert any("citation:[@smith2020]" in p for p in report.preserved)

    def test_missing_citation_broken(self) -> None:
        from markidocx.differ import compare
        original = "See [@smith2020]."
        reimported = "See something."
        report = compare(original, reimported)
        assert any("citation:missing '[@smith2020]'" in b for b in report.broken)
        assert report.has_drift

    def test_missing_reference_entry_degraded(self) -> None:
        from markidocx.differ import compare
        original = textwrap.dedent("""\
            See [@k1].

            ## References

            - [@k1]: Author. *T*. 2020.
        """)
        reimported = "See [@k1]."
        report = compare(original, reimported)
        assert any("reference-entry" in d for d in report.degraded)

    def test_unresolvable_citation_emits_warning(self) -> None:
        """A citation missing from the reimported text emits a citation-ambiguity warning."""
        from markidocx.bibliography import compare_citations
        from markidocx.errors import WarningRecord
        original = "See [@missing]."
        reimported = "See something."
        preserved: list[str] = []
        degraded: list[str] = []
        broken: list[str] = []
        warning_records: list[WarningRecord] = []
        compare_citations(original, reimported, preserved, degraded, broken, warning_records)
        ambiguity = [w for w in warning_records if w.reason == "citation-ambiguity"]
        assert ambiguity, "Expected citation-ambiguity warning"
        assert ambiguity[0].construct == "@missing"


# ---------------------------------------------------------------------------
# Single citation round-trip
# ---------------------------------------------------------------------------
class TestCitationRoundTrip:
    def test_single_citation_roundtrip(self, tmp_path: Path) -> None:
        from markidocx.builder import build_document
        from markidocx.differ import compare
        from markidocx.importer import import_document
        from markidocx.manifest import load_manifest
        md = textwrap.dedent("""\
            # Introduction

            According to [@smith2020], things are good.

            ## References

            - [@smith2020]: Smith, J. *Good Stuff*. 2020.
        """)
        _make_project(tmp_path, md)
        m = load_manifest(tmp_path / "manifest.yaml")
        build_result = build_document(m)
        assert build_result.success
        import_result = import_document(m, build_result.output_path)
        assert import_result.success
        reimported = import_result.output_files[0].read_text(encoding="utf-8")
        report = compare(md, reimported)
        broken_citations = [b for b in report.broken if "citation" in b]
        assert not broken_citations, f"Broken citations: {broken_citations}"

    def test_multi_citation_document(self, tmp_path: Path) -> None:
        from markidocx.builder import build_document
        from markidocx.importer import import_document
        from markidocx.manifest import load_manifest
        md = textwrap.dedent("""\
            # Paper

            First point from [@a2020]. Second from [@b2021].

            ## References

            - [@a2020]: A. *Work A*. 2020.
            - [@b2021]: B. *Work B*. 2021.
        """)
        _make_project(tmp_path, md)
        m = load_manifest(tmp_path / "manifest.yaml")
        build_result = build_document(m)
        assert build_result.success
        import_result = import_document(m, build_result.output_path)
        assert import_result.success
        reimported = import_result.output_files[0].read_text(encoding="utf-8")
        assert "a2020" in reimported
        assert "b2021" in reimported
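The bibliography tests above fix the differ's reporting vocabulary: a surviving citation is reported as `citation:[@key]`, a citation that disappears is broken as `citation:missing '[@key]'`, and a lost reference-list entry only counts as degraded. As a rough illustration of that classification, here is a hypothetical, self-contained sketch. It is not the `markidocx.bibliography` implementation; the function name `classify_citations`, the citation-key pattern, and the exact `reference-entry:` label text are assumptions.

```python
from __future__ import annotations

import re

# Assumed citation-key syntax: word characters, colon, hyphen.
# The negative lookahead skips "[@key]:" so reference-list entries
# are not counted as in-text citations.
CITATION_RE = re.compile(r"\[@([\w:-]+)\](?!:)")
# Assumed reference-entry syntax: "- [@key]: ..." at the start of a line.
REFERENCE_RE = re.compile(r"^\s*-\s*\[@([\w:-]+)\]:", re.MULTILINE)


def classify_citations(
    original: str, reimported: str
) -> tuple[list[str], list[str], list[str]]:
    """Hypothetical sketch: return (preserved, degraded, broken) labels."""
    preserved: list[str] = []
    degraded: list[str] = []
    broken: list[str] = []

    reimported_citations = set(CITATION_RE.findall(reimported))
    # dict.fromkeys drops duplicate keys while keeping first-seen order.
    for key in dict.fromkeys(CITATION_RE.findall(original)):
        if key in reimported_citations:
            preserved.append(f"citation:[@{key}]")
        else:
            # An in-text citation that vanished is reported as broken.
            broken.append(f"citation:missing '[@{key}]'")

    reimported_entries = set(REFERENCE_RE.findall(reimported))
    for key in dict.fromkeys(REFERENCE_RE.findall(original)):
        if key in reimported_entries:
            preserved.append(f"reference-entry:[@{key}]")
        else:
            # A lost reference-list entry is only reported as degraded.
            degraded.append(f"reference-entry:missing '[@{key}]'")

    return preserved, degraded, broken
```

One plausible reading of the broken/degraded split in `TestDifferBibliography` is that losing a citation changes what the prose claims, while losing a list entry only weakens its sourcing.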

@@ -0,0 +1,231 @@
"""Tests for LEVEL3 auto-diagram support (FR-533, FR-534)."""
from __future__ import annotations

import textwrap
from pathlib import Path

LEVEL3_MANIFEST = textwrap.dedent("""\
    project:
      name: diag-test
      feature_level: level3
      family: article
    sources:
      - path: doc.md
    output:
      dir: ./dist
""")


def _make_project(tmp_path: Path, markdown: str) -> Path:
    (tmp_path / "doc.md").write_text(markdown, encoding="utf-8")
    (tmp_path / "manifest.yaml").write_text(LEVEL3_MANIFEST, encoding="utf-8")
    return tmp_path


# ---------------------------------------------------------------------------
# diagrams module helpers
# ---------------------------------------------------------------------------
class TestDiagramHelpers:
    def test_is_diagram_info_mermaid(self) -> None:
        from markidocx.diagrams import is_diagram_info
        assert is_diagram_info("mermaid")

    def test_is_diagram_info_graphviz(self) -> None:
        from markidocx.diagrams import is_diagram_info
        assert is_diagram_info("graphviz")

    def test_is_diagram_info_plantuml(self) -> None:
        from markidocx.diagrams import is_diagram_info
        assert is_diagram_info("plantuml")

    def test_is_diagram_info_python_false(self) -> None:
        from markidocx.diagrams import is_diagram_info
        assert not is_diagram_info("python")
        assert not is_diagram_info("")
        assert not is_diagram_info(None)

    def test_is_diagram_source_marker(self) -> None:
        from markidocx.diagrams import is_diagram_source_marker
        assert is_diagram_source_marker("diagram-source:mermaid\ngraph TD\nA-->B")
        assert not is_diagram_source_marker("normal text")

    def test_parse_diagram_source_marker(self) -> None:
        from markidocx.diagrams import parse_diagram_source_marker
        source = "graph TD\nA-->B"
        result = parse_diagram_source_marker(f"diagram-source:mermaid\n{source}")
        assert result is not None
        diagram_type, parsed_source = result
        assert diagram_type == "mermaid"
        assert parsed_source == source

    def test_reconstruct_diagram_md(self) -> None:
        from markidocx.diagrams import reconstruct_diagram_md
        result = reconstruct_diagram_md("mermaid", "graph TD\nA-->B")
        assert result.startswith("```mermaid")
        assert "graph TD" in result
        assert result.endswith("```")


# ---------------------------------------------------------------------------
# Builder: diagram blocks → source-only path (no renderer in test env) (FR-533)
# ---------------------------------------------------------------------------
class TestBuilderDiagrams:
    def test_build_with_mermaid_block_succeeds(self, tmp_path: Path) -> None:
        """Mermaid block builds without error (source-only path)."""
        from markidocx.builder import build_document
        from markidocx.manifest import load_manifest
        md = textwrap.dedent("""\
            # Document

            ```mermaid
            graph TD
            A --> B --> C
            ```

            Some text.
        """)
        _make_project(tmp_path, md)
        m = load_manifest(tmp_path / "manifest.yaml")
        result = build_document(m)
        assert result.success

    def test_build_emits_warning_for_unavailable_renderer(
        self, tmp_path: Path, monkeypatch
    ) -> None:
        """Warns about missing diagram renderer (FR-538)."""
        import shutil
        from markidocx.builder import build_document
        from markidocx.manifest import load_manifest
        monkeypatch.setattr(shutil, "which", lambda _cmd: None)
        md = "```mermaid\ngraph TD\nA-->B\n```"
        _make_project(tmp_path, md)
        m = load_manifest(tmp_path / "manifest.yaml")
        result = build_document(m)
        assert result.success
        dep_warnings = [
            w for w in result.warning_records
            if w.reason == "processor-dependency-unavailable"
        ]
        assert dep_warnings

    def test_build_docx_contains_source_marker(
        self, tmp_path: Path, monkeypatch
    ) -> None:
        """DOCX contains diagram-source marker for round-trip."""
        import shutil
        from docx import Document as DocxReader
        from markidocx.builder import build_document
        from markidocx.manifest import load_manifest
        monkeypatch.setattr(shutil, "which", lambda _cmd: None)
        md = "```mermaid\ngraph TD\nA-->B\n```"
        _make_project(tmp_path, md)
        m = load_manifest(tmp_path / "manifest.yaml")
        result = build_document(m)
        assert result.success
        doc = DocxReader(str(result.output_path))
        texts = [p.text for p in doc.paragraphs]
        marker_texts = [t for t in texts if t.startswith("diagram-source:")]
        assert marker_texts, f"No diagram-source marker found. Paragraphs: {texts}"

    def test_build_graphviz_block_succeeds(self, tmp_path: Path) -> None:
        from markidocx.builder import build_document
        from markidocx.manifest import load_manifest
        md = "```graphviz\ndigraph G { A -> B }\n```"
        _make_project(tmp_path, md)
        m = load_manifest(tmp_path / "manifest.yaml")
        result = build_document(m)
        assert result.success

    def test_non_diagram_code_block_not_warned(
        self, tmp_path: Path
    ) -> None:
        """Python code blocks don't trigger diagram warnings."""
        from markidocx.builder import build_document
        from markidocx.manifest import load_manifest
        md = "```python\nprint('hello')\n```"
        _make_project(tmp_path, md)
        m = load_manifest(tmp_path / "manifest.yaml")
        result = build_document(m)
        dep_warnings = [
            w for w in result.warning_records
            if w.reason == "processor-dependency-unavailable"
        ]
        # Only LEVEL3 diagram blocks should trigger this warning. The general
        # LEVEL3 partial-support check may still warn about mmdc/dot, but no
        # warning should name the python block.
        python_warnings = [w for w in dep_warnings if "python" in w.construct]
        assert not python_warnings


# ---------------------------------------------------------------------------
# Importer: diagram source-intent marker → fenced block (FR-534)
# ---------------------------------------------------------------------------
class TestImporterDiagrams:
    def test_roundtrip_source_only_path(self, tmp_path: Path, monkeypatch) -> None:
        """Source-only round-trip: diagram source is preserved in reimported MD."""
        import shutil
        from markidocx.builder import build_document
        from markidocx.importer import import_document
        from markidocx.manifest import load_manifest
        monkeypatch.setattr(shutil, "which", lambda _cmd: None)
        diagram_source = "graph TD\nA --> B --> C"
        md = f"# Document\n\n```mermaid\n{diagram_source}\n```\n\nText."
        _make_project(tmp_path, md)
        m = load_manifest(tmp_path / "manifest.yaml")
        build_result = build_document(m)
        assert build_result.success
        import_result = import_document(m, build_result.output_path)
        assert import_result.success
        reimported = import_result.output_files[0].read_text(encoding="utf-8")
        assert "mermaid" in reimported
        assert "graph TD" in reimported

    def test_no_source_discarded(self, tmp_path: Path, monkeypatch) -> None:
        """Diagram source is never silently dropped (FR-1205)."""
        import shutil
        from markidocx.builder import build_document
        from markidocx.importer import import_document
        from markidocx.manifest import load_manifest
        monkeypatch.setattr(shutil, "which", lambda _cmd: None)
        md = "```plantuml\n@startuml\nAlice -> Bob: Hi\n@enduml\n```"
        _make_project(tmp_path, md)
        m = load_manifest(tmp_path / "manifest.yaml")
        build_result = build_document(m)
        assert build_result.success
        import_result = import_document(m, build_result.output_path)
        assert import_result.success
        reimported = import_result.output_files[0].read_text(encoding="utf-8")
        # Source content must be present somewhere in the reimported text
        assert "plantuml" in reimported or "@startuml" in reimported
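The diagram tests describe a source-intent round trip: in the source-only path they exercise (no renderer found on `PATH`), the DOCX carries a paragraph that starts with `diagram-source:<type>` followed by the raw diagram source, and the importer turns it back into a fenced block. Below is a minimal, hypothetical sketch of helpers with the names and behaviour the tests expect. It assumes the marker's first line holds the type and everything after the first newline is the source; it is not the actual `markidocx.diagrams` code.

```python
from __future__ import annotations

# Diagram languages the tests treat as LEVEL3 diagrams.
DIAGRAM_TYPES = frozenset({"mermaid", "graphviz", "plantuml"})
# Assumed prefix of the DOCX paragraph that carries the diagram source.
MARKER_PREFIX = "diagram-source:"
# Built at runtime so the literal fence does not appear in this snippet.
FENCE = "`" * 3


def is_diagram_info(info: str | None) -> bool:
    """True if a fenced block's info string names a supported diagram type."""
    if not info:
        return False
    words = info.split()
    return bool(words) and words[0].lower() in DIAGRAM_TYPES


def is_diagram_source_marker(text: str) -> bool:
    return text.startswith(MARKER_PREFIX)


def parse_diagram_source_marker(text: str) -> tuple[str, str] | None:
    """Split a marker paragraph into (diagram type, diagram source)."""
    if not is_diagram_source_marker(text):
        return None
    header, _, source = text.partition("\n")
    diagram_type = header[len(MARKER_PREFIX):].strip()
    if not is_diagram_info(diagram_type):
        return None
    return diagram_type, source


def reconstruct_diagram_md(diagram_type: str, source: str) -> str:
    """Rebuild the fenced Markdown block the importer would emit."""
    return f"{FENCE}{diagram_type}\n{source}\n{FENCE}"
```

Keeping the source verbatim after the first newline means nothing in the diagram body has to be escaped, which is one way to satisfy the "never silently dropped" requirement in `test_no_source_discarded`.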

@@ -0,0 +1,342 @@
"""Tests for LEVEL3 numbered figure support (FR-532, FR-541)."""
from __future__ import annotations

import textwrap
from pathlib import Path

LEVEL3_MANIFEST = textwrap.dedent("""\
    project:
      name: fig-test
      feature_level: level3
      family: article
    sources:
      - path: doc.md
    output:
      dir: ./dist
""")


def _make_project(tmp_path: Path, markdown: str) -> Path:
    (tmp_path / "doc.md").write_text(markdown, encoding="utf-8")
    (tmp_path / "manifest.yaml").write_text(LEVEL3_MANIFEST, encoding="utf-8")
    return tmp_path


# ---------------------------------------------------------------------------
# figures module helpers
# ---------------------------------------------------------------------------
class TestFigureHelpers:
    def test_is_figure_paragraph_true(self) -> None:
        from markidocx.figures import is_figure_paragraph
        assert is_figure_paragraph("![My Caption](img/photo.png){#fig:photo}")

    def test_is_figure_paragraph_false(self) -> None:
        from markidocx.figures import is_figure_paragraph
        assert not is_figure_paragraph("Normal paragraph text.")
        assert not is_figure_paragraph("![alt](img.png)")  # no {#fig:} label

    def test_parse_figure(self) -> None:
        from markidocx.figures import parse_figure
        result = parse_figure("![Architecture Diagram](arch.png){#fig:arch}")
        assert result is not None
        caption, path, label = result
        assert caption == "Architecture Diagram"
        assert path == "arch.png"
        assert label == "fig:arch"

    def test_extract_figures_from_md(self) -> None:
        from markidocx.figures import extract_figures_from_md
        md = textwrap.dedent("""\
            # Title

            Some text.

            ![Figure One](fig1.png){#fig:f1}

            More text.

            ![Figure Two](fig2.png){#fig:f2}
        """)
        figs = extract_figures_from_md(md)
        assert len(figs) == 2
        assert figs[0] == ("Figure One", "fig1.png", "fig:f1")
        assert figs[1] == ("Figure Two", "fig2.png", "fig:f2")

    def test_extract_figure_labels(self) -> None:
        from markidocx.figures import extract_figure_labels
        md = "![Cap1](a.png){#fig:f1}\n\n![Cap2](b.png){#fig:f2}"
        labels = extract_figure_labels(md)
        assert labels == {"fig:f1", "fig:f2"}

    def test_is_caption_paragraph(self) -> None:
        from markidocx.figures import is_caption_paragraph
        assert is_caption_paragraph("Figure 1 — My Caption")
        assert is_caption_paragraph("Figure 3 - Another Caption")
        assert not is_caption_paragraph("Some normal text")

    def test_reconstruct_figure_md(self) -> None:
        from markidocx.figures import reconstruct_figure_md
        result = reconstruct_figure_md("My Caption", "img/photo.png", "fig:photo")
        assert result == "![My Caption](img/photo.png){#fig:photo}"


# ---------------------------------------------------------------------------
# Builder: figure declaration → DOCX caption paragraph (FR-532)
# ---------------------------------------------------------------------------
class TestBuilderFigures:
    def test_build_with_figure_succeeds(self, tmp_path: Path) -> None:
        from markidocx.builder import build_document
        from markidocx.manifest import load_manifest
        md = textwrap.dedent("""\
            # Document {#doc}

            Introduction.

            ![Architecture Diagram](arch.png){#fig:arch}

            More text.
        """)
        _make_project(tmp_path, md)
        m = load_manifest(tmp_path / "manifest.yaml")
        result = build_document(m)
        assert result.success
        assert result.output_path.exists()

    def test_build_docx_contains_figure_caption(self, tmp_path: Path) -> None:
        """The built DOCX should contain a caption paragraph with 'Figure 1'."""
        from docx import Document as DocxReader
        from markidocx.builder import build_document
        from markidocx.manifest import load_manifest
        md = "![My Diagram](diag.png){#fig:diag}\n\nSome text."
        _make_project(tmp_path, md)
        m = load_manifest(tmp_path / "manifest.yaml")
        result = build_document(m)
        assert result.success
        doc = DocxReader(str(result.output_path))
        texts = [p.text for p in doc.paragraphs]
        caption_paras = [t for t in texts if t.startswith("Figure 1")]
        assert caption_paras, f"No 'Figure 1' caption found. Paragraphs: {texts}"

    def test_multiple_figures_numbered_sequentially(self, tmp_path: Path) -> None:
        """Multiple figures get Figure 1, Figure 2, Figure 3."""
        from docx import Document as DocxReader
        from markidocx.builder import build_document
        from markidocx.manifest import load_manifest
        md = textwrap.dedent("""\
            # Doc

            ![First](a.png){#fig:a}

            Some text.

            ![Second](b.png){#fig:b}

            More text.

            ![Third](c.png){#fig:c}
        """)
        _make_project(tmp_path, md)
        m = load_manifest(tmp_path / "manifest.yaml")
        result = build_document(m)
        assert result.success
        doc = DocxReader(str(result.output_path))
        texts = [p.text for p in doc.paragraphs]
        assert any("Figure 1" in t for t in texts)
        assert any("Figure 2" in t for t in texts)
        assert any("Figure 3" in t for t in texts)

    def test_figure_not_activated_for_level1(self, tmp_path: Path) -> None:
        """LEVEL1: figure syntax is not stripped (no caption paragraphs added)."""
        from docx import Document as DocxReader
        from markidocx.builder import build_document
        from markidocx.manifest import load_manifest
        manifest_yaml = textwrap.dedent("""\
            project:
              name: l1-fig
              feature_level: level1
              family: article
            sources:
              - path: doc.md
            output:
              dir: ./dist
        """)
        (tmp_path / "doc.md").write_text(
            "# Title\n\n![My Diagram](diag.png){#fig:diag}", encoding="utf-8"
        )
        (tmp_path / "manifest.yaml").write_text(manifest_yaml, encoding="utf-8")
        m = load_manifest(tmp_path / "manifest.yaml")
        result = build_document(m)
        assert result.success
        doc = DocxReader(str(result.output_path))
        texts = [p.text for p in doc.paragraphs]
        # No "Figure N" captions in LEVEL1 output
        assert not any(t.startswith("Figure ") for t in texts)


# ---------------------------------------------------------------------------
# Importer: DOCX caption paragraphs → figure markdown (FR-532)
# ---------------------------------------------------------------------------
class TestImporterFigures:
    def test_roundtrip_preserves_figure_caption(self, tmp_path: Path) -> None:
        from markidocx.builder import build_document
        from markidocx.importer import import_document
        from markidocx.manifest import load_manifest
        md = "# Title\n\n![Architecture](arch.png){#fig:arch}\n\nText."
        _make_project(tmp_path, md)
        m = load_manifest(tmp_path / "manifest.yaml")
        build_result = build_document(m)
        assert build_result.success
        import_result = import_document(m, build_result.output_path)
        assert import_result.success
        reimported = import_result.output_files[0].read_text(encoding="utf-8")
        assert "Architecture" in reimported
        assert "fig:arch" in reimported

    def test_roundtrip_preserves_figure_label(self, tmp_path: Path) -> None:
        from markidocx.builder import build_document
        from markidocx.importer import import_document
        from markidocx.manifest import load_manifest
        md = "![Cap](img.png){#fig:myimg}\n\nText."
        _make_project(tmp_path, md)
        m = load_manifest(tmp_path / "manifest.yaml")
        build_result = build_document(m)
        assert build_result.success
        import_result = import_document(m, build_result.output_path)
        assert import_result.success
        reimported = import_result.output_files[0].read_text(encoding="utf-8")
        assert "{#fig:myimg}" in reimported


# ---------------------------------------------------------------------------
# Differ: figure identity coherence (FR-541)
# ---------------------------------------------------------------------------
class TestDifferFigures:
    def test_preserved_figure_label(self) -> None:
        from markidocx.differ import compare
        text = "# Title\n\n![Cap](img.png){#fig:img}\n\nText."
        report = compare(text, text)
        assert any("figure-label:fig:img" in p for p in report.preserved)

    def test_missing_figure_label_broken(self) -> None:
        from markidocx.differ import compare
        original = "![Cap](img.png){#fig:img}\n\nText."
        reimported = "Text."
        report = compare(original, reimported)
        assert any("figure-label:missing 'fig:img'" in b for b in report.broken)
        assert report.has_drift

    def test_missing_caption_degraded(self) -> None:
        from markidocx.differ import compare
        original = "![My Caption](img.png){#fig:img}"
        reimported = "![Different Caption](img.png){#fig:img}"
        report = compare(original, reimported)
        assert any("figure-caption" in d for d in report.degraded)

    def test_preserved_caption(self) -> None:
        from markidocx.differ import compare
        text = "![Same Caption](img.png){#fig:img}"
        report = compare(text, text)
        assert any("figure-caption" in p for p in report.preserved)


# ---------------------------------------------------------------------------
# Full figure round-trip
# ---------------------------------------------------------------------------
class TestFigureRoundTrip:
    def test_single_figure_roundtrip(self, tmp_path: Path) -> None:
        from markidocx.builder import build_document
        from markidocx.differ import compare
        from markidocx.importer import import_document
        from markidocx.manifest import load_manifest
        md = textwrap.dedent("""\
            # Document

            Introduction.

            ![System Architecture](arch.png){#fig:arch}

            Conclusion.
        """)
        _make_project(tmp_path, md)
        m = load_manifest(tmp_path / "manifest.yaml")
        build_result = build_document(m)
        assert build_result.success
        import_result = import_document(m, build_result.output_path)
        assert import_result.success
        reimported = import_result.output_files[0].read_text(encoding="utf-8")
        report = compare(md, reimported)
        # No broken figures
        broken_figs = [b for b in report.broken if "figure" in b]
        assert not broken_figs, f"Broken figures found: {broken_figs}"

    def test_multiple_figures_identity_coherent(self, tmp_path: Path) -> None:
        """Multiple figures survive round-trip with correct labels."""
        from markidocx.builder import build_document
        from markidocx.importer import import_document
        from markidocx.manifest import load_manifest
        md = textwrap.dedent("""\
            # Doc

            ![Figure One Caption](fig1.png){#fig:f1}

            Text between figures.

            ![Figure Two Caption](fig2.png){#fig:f2}
        """)
        _make_project(tmp_path, md)
        m = load_manifest(tmp_path / "manifest.yaml")
        build_result = build_document(m)
        assert build_result.success
        import_result = import_document(m, build_result.output_path)
        assert import_result.success
        reimported = import_result.output_files[0].read_text(encoding="utf-8")
        assert "{#fig:f1}" in reimported
        assert "{#fig:f2}" in reimported
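The figure tests fix the LEVEL3 figure syntax as a standalone paragraph `![caption](path){#fig:label}` and expect built caption paragraphs that start with `Figure N` followed by a dash separator. A hypothetical, regex-based sketch of the parsing side follows. The exact patterns (for instance, which characters a label may contain, and that paragraphs are separated by blank lines) are assumptions, and `markidocx.figures` may be implemented differently.

```python
from __future__ import annotations

import re

# Assumed figure syntax: the whole paragraph is ![caption](path){#fig:label}.
FIGURE_RE = re.compile(
    r"^!\[(?P<caption>[^\]]*)\]\((?P<path>[^)\s]+)\)\{#(?P<label>fig:[\w:.-]+)\}$"
)
# Caption paragraphs like "Figure 3 - Caption"; \u2014 matches an em dash.
CAPTION_RE = re.compile(r"^Figure \d+ [\u2014-] \S")


def parse_figure(paragraph: str) -> tuple[str, str, str] | None:
    """Return (caption, path, label) for a figure paragraph, else None."""
    match = FIGURE_RE.match(paragraph.strip())
    if match is None:
        return None
    return match["caption"], match["path"], match["label"]


def is_figure_paragraph(paragraph: str) -> bool:
    return parse_figure(paragraph) is not None


def extract_figures_from_md(markdown: str) -> list[tuple[str, str, str]]:
    """Parse every blank-line-separated paragraph that is a figure."""
    figures = []
    for block in re.split(r"\n\s*\n", markdown):
        parsed = parse_figure(block)
        if parsed is not None:
            figures.append(parsed)
    return figures


def extract_figure_labels(markdown: str) -> set[str]:
    return {label for _, _, label in extract_figures_from_md(markdown)}


def is_caption_paragraph(text: str) -> bool:
    return CAPTION_RE.match(text) is not None


def reconstruct_figure_md(caption: str, path: str, label: str) -> str:
    # Doubled braces in the f-string emit literal "{" and "}".
    return f"![{caption}]({path}){{#{label}}}"
```

Requiring the `{#fig:...}` suffix is what keeps a plain image such as `![alt](img.png)` out of the numbered-figure path, matching `test_is_figure_paragraph_false`.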

@@ -0,0 +1,271 @@
"""Tests for LEVEL3 plumbing: feature-level gating and disclosure (FR-537, FR-538, FR-539)."""
from __future__ import annotations

import textwrap
from pathlib import Path

from markidocx.level3 import (
    Level3Support,
    ProcessorDependency,
    capabilities_entry,
    check_level3_support,
)
from markidocx.manifest import FeatureLevel, load_manifest


# ---------------------------------------------------------------------------
# Level3 support detection (FR-537, FR-538)
# ---------------------------------------------------------------------------
class TestCheckLevel3Support:
    def test_returns_level3_support(self) -> None:
        support = check_level3_support()
        assert isinstance(support, Level3Support)

    def test_always_available(self) -> None:
        support = check_level3_support()
        assert support.available is True

    def test_dependencies_are_processor_dependency_instances(self) -> None:
        support = check_level3_support()
        for dep in support.dependencies:
            assert isinstance(dep, ProcessorDependency)
            assert dep.name in ("mmdc", "dot", "plantuml")
            assert isinstance(dep.available, bool)
            assert dep.description

    def test_partial_when_no_diagram_tools(self, monkeypatch) -> None:
        """When no diagram tool is found, partial=True and missing_coverage is populated."""
        import shutil
        monkeypatch.setattr(shutil, "which", lambda _cmd: None)
        support = check_level3_support()
        assert support.partial is True
        assert len(support.missing_coverage) > 0
        assert any("diagram" in m for m in support.missing_coverage)

    def test_not_partial_when_diagram_tool_present(self, monkeypatch) -> None:
        """When at least one diagram tool is found, partial=False."""
        import shutil

        def fake_which(cmd: str) -> str | None:
            return "/usr/bin/mmdc" if cmd == "mmdc" else None

        monkeypatch.setattr(shutil, "which", fake_which)
        support = check_level3_support()
        assert support.partial is False
        assert support.missing_coverage == []


# ---------------------------------------------------------------------------
# capabilities_entry (FR-537)
# ---------------------------------------------------------------------------
class TestCapabilitiesEntry:
    def test_returns_dict_with_level(self) -> None:
        entry = capabilities_entry()
        assert entry["level"] == "level3"

    def test_available_is_true(self) -> None:
        entry = capabilities_entry()
        assert entry["available"] is True

    def test_has_dependencies_list(self) -> None:
        entry = capabilities_entry()
        assert isinstance(entry["dependencies"], list)
        for dep in entry["dependencies"]:
            assert "name" in dep
            assert "available" in dep
            assert "description" in dep

    def test_has_partial_and_missing_coverage(self) -> None:
        entry = capabilities_entry()
        assert "partial" in entry
        assert "missing_coverage" in entry


# ---------------------------------------------------------------------------
# Manifest accepts feature_level: level3 (FR-537)
# ---------------------------------------------------------------------------
class TestManifestLevel3:
    def test_level3_accepted(self, tmp_path: Path) -> None:
        (tmp_path / "doc.md").write_text("# Hello", encoding="utf-8")
        (tmp_path / "manifest.yaml").write_text(
            textwrap.dedent("""\
                project:
                  name: test
                  feature_level: level3
                  family: article
                sources:
                  - path: doc.md
                output:
                  dir: ./dist
            """),
            encoding="utf-8",
        )
        m = load_manifest(tmp_path / "manifest.yaml")
        assert m.project.feature_level == FeatureLevel.LEVEL3

    def test_level3_routes_to_level3_processing(self, tmp_path: Path) -> None:
        """Building with feature_level: level3 succeeds (processing path reached)."""
        from markidocx.builder import build_document
        (tmp_path / "doc.md").write_text("# Hello\n\nContent.", encoding="utf-8")
        (tmp_path / "manifest.yaml").write_text(
            textwrap.dedent("""\
                project:
                  name: test-l3
                  feature_level: level3
                  family: article
                sources:
                  - path: doc.md
                output:
                  dir: ./dist
            """),
            encoding="utf-8",
        )
        m = load_manifest(tmp_path / "manifest.yaml")
        result = build_document(m)
        assert result.success
        assert result.feature_level == "level3"


# ---------------------------------------------------------------------------
# partial_level3 flag and processor-dependency disclosure (FR-538, FR-539)
# ---------------------------------------------------------------------------
class TestPartialLevel3Flag:
    def test_partial_level3_set_when_no_diagram_tools(
        self, tmp_path: Path, monkeypatch
    ) -> None:
        import shutil
        from markidocx.builder import build_document
        monkeypatch.setattr(shutil, "which", lambda _cmd: None)
        (tmp_path / "doc.md").write_text("# Hello\n\nContent.", encoding="utf-8")
        (tmp_path / "manifest.yaml").write_text(
            textwrap.dedent("""\
                project:
                  name: test-partial
                  feature_level: level3
                  family: article
                sources:
                  - path: doc.md
                output:
                  dir: ./dist
            """),
            encoding="utf-8",
        )
        m = load_manifest(tmp_path / "manifest.yaml")
        result = build_document(m)
        assert result.success
        assert result.partial_level3 is True
        assert len(result.missing_coverage) > 0

    def test_partial_level3_false_for_level1(self, tmp_path: Path) -> None:
        from markidocx.builder import build_document
        (tmp_path / "doc.md").write_text("# Hello\n\nContent.", encoding="utf-8")
        (tmp_path / "manifest.yaml").write_text(
            textwrap.dedent("""\
                project:
                  name: test-l1
                  feature_level: level1
                  family: article
                sources:
                  - path: doc.md
                output:
                  dir: ./dist
            """),
            encoding="utf-8",
        )
        m = load_manifest(tmp_path / "manifest.yaml")
        result = build_document(m)
        assert result.partial_level3 is False
        assert result.missing_coverage == []

    def test_dependency_warning_emitted_for_unavailable_tool(
        self, tmp_path: Path, monkeypatch
    ) -> None:
        import shutil
        from markidocx.builder import build_document
        from markidocx.errors import Severity
        monkeypatch.setattr(shutil, "which", lambda _cmd: None)
        (tmp_path / "doc.md").write_text("# Hello", encoding="utf-8")
        (tmp_path / "manifest.yaml").write_text(
            textwrap.dedent("""\
                project:
                  name: t
                  feature_level: level3
                  family: article
                sources:
                  - path: doc.md
                output:
                  dir: ./dist
            """),
            encoding="utf-8",
        )
        m = load_manifest(tmp_path / "manifest.yaml")
        result = build_document(m)
        dep_warnings = [
            w for w in result.warning_records
            if w.reason == "processor-dependency-unavailable"
        ]
        assert dep_warnings, "Expected processor-dependency-unavailable warning"
        assert all(w.severity == Severity.WARNING for w in dep_warnings)


# ---------------------------------------------------------------------------
# REST capabilities includes level3 (FR-537)
# ---------------------------------------------------------------------------
class TestRestCapabilitiesLevel3:
    def test_capabilities_includes_level3(self) -> None:
        from fastapi.testclient import TestClient
        from markidocx.rest import create_app
        client = TestClient(create_app())
        resp = client.get("/capabilities")
        assert resp.status_code == 200
        body = resp.json()
        outputs = body["outputs"]
        assert "level3" in outputs
        assert outputs["level3"]["level"] == "level3"
        assert outputs["level3"]["available"] is True
        assert "dependencies" in outputs["level3"]


# ---------------------------------------------------------------------------
# MCP validate_project includes level3 in context (FR-537)
# ---------------------------------------------------------------------------
class TestMcpLevel3:
    def test_validate_project_includes_level3(self) -> None:
        from markidocx.mcp_server import validate_project
        manifest_yaml = textwrap.dedent("""\
            project:
              name: test
              feature_level: level3
              family: article
            sources:
              - path: doc.md
            output:
              dir: ./dist
        """)
        result = validate_project(manifest_yaml)
        assert result["status"] == "ok"
        assert result["feature_level"] == "level3"
        assert "level3" in result["context"]
        assert result["context"]["level3"]["available"] is True
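Taken together, the LEVEL3 plumbing tests imply a small capability model: LEVEL3 itself is always available, each diagram renderer (`mmdc`, `dot`, `plantuml`) is probed on `PATH`, and support is reported as partial, with a non-empty `missing_coverage`, only when none of them is found. The sketch below is a hypothetical reconstruction of that model for illustration; the dataclass layouts, the dependency descriptions, and the coverage message are assumptions rather than the real `markidocx.level3` definitions.

```python
from __future__ import annotations

import shutil
from dataclasses import asdict, dataclass, field

# Renderer names come from the tests; the descriptions are assumed.
DIAGRAM_TOOLS = {
    "mmdc": "Mermaid CLI renderer for mermaid diagrams",
    "dot": "Graphviz renderer for graphviz diagrams",
    "plantuml": "PlantUML renderer for plantuml diagrams",
}


@dataclass
class ProcessorDependency:
    name: str
    available: bool
    description: str


@dataclass
class Level3Support:
    available: bool
    partial: bool
    dependencies: list[ProcessorDependency] = field(default_factory=list)
    missing_coverage: list[str] = field(default_factory=list)


def check_level3_support() -> Level3Support:
    # shutil.which is looked up at call time, so monkeypatching it works.
    deps = [
        ProcessorDependency(name, shutil.which(name) is not None, desc)
        for name, desc in DIAGRAM_TOOLS.items()
    ]
    partial = not any(dep.available for dep in deps)
    missing = ["auto-diagram rendering (no diagram renderer found)"] if partial else []
    # Assumption: only diagrams need an external tool, so LEVEL3 as a
    # whole stays available and merely becomes partial without one.
    return Level3Support(True, partial, deps, missing)


def capabilities_entry() -> dict:
    """Serialise the support check into a capabilities-style dict."""
    support = check_level3_support()
    return {
        "level": "level3",
        "available": support.available,
        "partial": support.partial,
        "missing_coverage": support.missing_coverage,
        "dependencies": [asdict(dep) for dep in support.dependencies],
    }
```

Treating `partial` as "no renderer at all" is consistent with `test_not_partial_when_diagram_tool_present`, where finding only `mmdc` is enough to clear the flag.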

tests/test_level3_xref.py
@@ -0,0 +1,326 @@
"""Tests for LEVEL3 cross-reference support (FR-531, FR-540)."""
from __future__ import annotations

import textwrap
from pathlib import Path

# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
LEVEL3_MANIFEST = textwrap.dedent("""\
    project:
      name: xref-test
      feature_level: level3
      family: article
    sources:
      - path: doc.md
    output:
      dir: ./dist
""")


def _make_project(tmp_path: Path, markdown: str, manifest_yaml: str = LEVEL3_MANIFEST) -> Path:
    (tmp_path / "doc.md").write_text(markdown, encoding="utf-8")
    (tmp_path / "manifest.yaml").write_text(manifest_yaml, encoding="utf-8")
    return tmp_path


# ---------------------------------------------------------------------------
# xref module helpers
# ---------------------------------------------------------------------------
class TestXrefHelpers:
    def test_extract_anchor_from_heading_plain(self) -> None:
        from markidocx.xref import extract_anchor_from_heading
        clean, anchor = extract_anchor_from_heading("Introduction {#intro}")
        assert clean == "Introduction"
        assert anchor == "intro"

    def test_extract_anchor_from_heading_no_anchor(self) -> None:
        from markidocx.xref import extract_anchor_from_heading
        clean, anchor = extract_anchor_from_heading("Introduction")
        assert clean == "Introduction"
        assert anchor is None

    def test_extract_anchors_from_text(self) -> None:
        from markidocx.xref import extract_anchors
        text = "# Section {#sec1}\n\n## Subsection {#sec2}\n\nNormal."
        anchors = extract_anchors(text)
        assert anchors == {"sec1", "sec2"}

    def test_extract_xref_links(self) -> None:
        from markidocx.xref import extract_xref_links
        text = "See [Section One][sec1] and [Section Two][sec2]."
        links = extract_xref_links(text)
        assert ("Section One", "sec1") in links
        assert ("Section Two", "sec2") in links

    def test_has_xref_links_true(self) -> None:
        from markidocx.xref import has_xref_links
        assert has_xref_links("See [Intro][intro] for details.")

    def test_has_xref_links_false(self) -> None:
        from markidocx.xref import has_xref_links
        assert not has_xref_links("Normal paragraph without refs.")


# ---------------------------------------------------------------------------
# Builder: headings with anchors → DOCX bookmarks (FR-531)
# ---------------------------------------------------------------------------


class TestBuilderXref:
    def test_build_with_anchor_succeeds(self, tmp_path: Path) -> None:
        from markidocx.builder import build_document
        from markidocx.manifest import load_manifest
        md = "# Introduction {#intro}\n\nSome text.\n\n## Section One {#sec1}\n\nContent."
        _make_project(tmp_path, md)
        m = load_manifest(tmp_path / "manifest.yaml")
        result = build_document(m)
        assert result.success
        assert result.output_path.exists()

    def test_build_docx_contains_bookmark(self, tmp_path: Path) -> None:
        """The built DOCX should contain a w:bookmarkStart named after the {#intro} anchor."""
        from docx import Document as DocxReader
        from markidocx.builder import build_document
        from markidocx.manifest import load_manifest
        md = "# Introduction {#intro}\n\nContent."
        _make_project(tmp_path, md)
        m = load_manifest(tmp_path / "manifest.yaml")
        result = build_document(m)
        assert result.success
        doc = DocxReader(str(result.output_path))
        _W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
        # Collect user bookmark names; Word treats names starting with "_"
        # (e.g. _GoBack, _Toc...) as hidden bookmarks, so skip those.
        bookmarks = [
            name
            for elem in doc.element.body.iter(f"{{{_W}}}bookmarkStart")
            if (name := elem.get(f"{{{_W}}}name")) and not name.startswith("_")
        ]
        assert "intro" in bookmarks

    def test_build_with_cross_ref_link(self, tmp_path: Path) -> None:
        """Cross-ref links [text][anchor] render without errors."""
        from markidocx.builder import build_document
        from markidocx.manifest import load_manifest
        md = textwrap.dedent("""\
            # Introduction {#intro}
            Some text.
            # Methodology {#method}
            See [Introduction][intro] for background.
        """)
        _make_project(tmp_path, md)
        m = load_manifest(tmp_path / "manifest.yaml")
        result = build_document(m)
        assert result.success
        assert result.output_path.exists()

    def test_build_xref_not_activated_for_level1(self, tmp_path: Path) -> None:
        """Level1 build: {#anchor} syntax is not processed as a cross-reference."""
        from markidocx.builder import build_document
        from markidocx.manifest import load_manifest
        manifest_yaml = textwrap.dedent("""\
            project:
              name: l1-test
              feature_level: level1
              family: article
            sources:
              - path: doc.md
            output:
              dir: ./dist
        """)
        # In LEVEL1 the builder should leave {#anchor} in the heading text and
        # add no bookmark; this test only checks that no xref warnings are raised.
        md = "# Introduction {#intro}\n\nContent."
        _make_project(tmp_path, md, manifest_yaml)
        m = load_manifest(tmp_path / "manifest.yaml")
        result = build_document(m)
        assert result.success
        # No cross-ref warnings
        xref_warnings = [w for w in result.warning_records if "xref" in w.reason.lower()]
        assert not xref_warnings
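

# --- Illustrative sketch (hypothetical helper; not part of markidocx) ---
# test_build_docx_contains_bookmark walks python-docx's element tree for
# w:bookmarkStart elements. The same filtering can be shown with only the
# standard library on a plain WordprocessingML string, which makes the
# namespaced attribute lookup and the "skip names starting with '_'" rule
# (Word's hidden bookmarks such as _GoBack) explicit. The helper name is an
# assumption for illustration.


def _sketch_user_bookmarks(document_xml: str) -> list[str]:
    """Return non-hidden bookmark names from a WordprocessingML fragment."""
    import xml.etree.ElementTree as ET

    w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    root = ET.fromstring(document_xml)
    names = []
    # ElementTree exposes prefixed names as "{namespace-uri}local-name".
    for elem in root.iter(f"{{{w}}}bookmarkStart"):
        name = elem.get(f"{{{w}}}name")
        if name and not name.startswith("_"):
            names.append(name)
    return names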


# ---------------------------------------------------------------------------
# Importer: DOCX bookmarks → {#anchor} labels (FR-531)
# ---------------------------------------------------------------------------


class TestImporterXref:
    def test_roundtrip_preserves_anchor(self, tmp_path: Path) -> None:
        """Build LEVEL3 doc with {#anchor}, import back → heading has {#anchor}."""
        from markidocx.builder import build_document
        from markidocx.importer import import_document
        from markidocx.manifest import load_manifest
        md = "# Introduction {#intro}\n\nSome text."
        _make_project(tmp_path, md)
        m = load_manifest(tmp_path / "manifest.yaml")
        build_result = build_document(m)
        assert build_result.success
        import_result = import_document(m, build_result.output_path)
        assert import_result.success
        reimported = import_result.output_files[0].read_text(encoding="utf-8")
        assert "{#intro}" in reimported

    def test_roundtrip_preserves_cross_ref_link(self, tmp_path: Path) -> None:
        """Cross-ref link [text][anchor] survives a round trip."""
        from markidocx.builder import build_document
        from markidocx.importer import import_document
        from markidocx.manifest import load_manifest
        md = textwrap.dedent("""\
            # Introduction {#intro}
            Some intro text.
            # Methodology {#method}
            See [Introduction][intro] for background.
        """)
        _make_project(tmp_path, md)
        m = load_manifest(tmp_path / "manifest.yaml")
        build_result = build_document(m)
        assert build_result.success
        import_result = import_document(m, build_result.output_path)
        assert import_result.success
        reimported = import_result.output_files[0].read_text(encoding="utf-8")
        assert "{#intro}" in reimported
        assert "[Introduction][intro]" in reimported
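

# --- Illustrative sketch (hypothetical; NOT the markidocx importer) ---
# The importer tests above expect a heading's Word bookmark to come back as a
# `{#anchor}` label and a cross-reference to come back as `[text][anchor]`.
# The rendering step for that direction might look like the helpers below.
# The helper names, their signatures, and the "first non-hidden bookmark
# wins" rule are all assumptions made for illustration.


def _sketch_heading_markdown(level: int, text: str, bookmarks: list[str]) -> str:
    """Render an ATX heading, re-attaching the first user bookmark as `{#id}`."""
    # Ignore Word's hidden bookmarks (names starting with "_").
    user = [b for b in bookmarks if not b.startswith("_")]
    prefix = "#" * level
    if user:
        # "{{" and "}}" emit literal braces inside the f-string.
        return f"{prefix} {text} {{#{user[0]}}}"
    return f"{prefix} {text}"


def _sketch_xref_markdown(text: str, anchor: str) -> str:
    """Render a link to an internal anchor in reference style."""
    return f"[{text}][{anchor}]"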


# ---------------------------------------------------------------------------
# Differ: cross-ref detection (FR-540)
# ---------------------------------------------------------------------------


class TestDifferXref:
    def test_preserved_anchor_reported(self) -> None:
        from markidocx.differ import compare
        original = "# Introduction {#intro}\n\nText."
        reimported = "# Introduction {#intro}\n\nText."
        report = compare(original, reimported)
        assert any("xref-anchor:intro" in p for p in report.preserved)
        assert not any("xref-anchor" in b for b in report.broken)

    def test_missing_anchor_reported_as_broken(self) -> None:
        from markidocx.differ import compare
        original = "# Introduction {#intro}\n\nText."
        reimported = "# Introduction\n\nText."
        report = compare(original, reimported)
        assert any("xref-anchor:missing 'intro'" in b for b in report.broken)
        assert report.has_drift

    def test_preserved_xref_link(self) -> None:
        from markidocx.differ import compare
        text = "# Intro {#intro}\n\nSee [Intro][intro]."
        report = compare(text, text)
        assert any("xref-link" in p for p in report.preserved)

    def test_broken_xref_link_target_missing(self) -> None:
        from markidocx.differ import compare
        original = "# Intro {#intro}\n\nSee [Intro][intro]."
        reimported = "# Intro\n\nSee something."
        report = compare(original, reimported)
        # anchor missing → broken xref link
        broken_xref = [b for b in report.broken if "xref" in b]
        assert broken_xref
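

# --- Illustrative sketch (hypothetical; NOT markidocx.differ.compare) ---
# The differ tests above look for "xref-anchor:<id>" in `preserved`,
# "xref-anchor:missing '<id>'" in `broken`, and "xref-link" / "xref" entries
# for links. A simplified comparison that yields labels of that shape is
# sketched below. The function name and the exact "xref-link:..." label text
# are assumptions; anchors are matched anywhere in the text, not only on
# heading lines.


def _sketch_compare_anchors(original: str, reimported: str) -> tuple[list[str], list[str]]:
    """Return (preserved, broken) xref labels for an original/reimported pair."""
    import re

    anchor_re = re.compile(r"\{#([A-Za-z0-9_-]+)\}")
    link_re = re.compile(r"\[([^\]]+)\]\[([A-Za-z0-9_-]+)\]")
    before = set(anchor_re.findall(original))
    after = set(anchor_re.findall(reimported))
    preserved: list[str] = []
    broken: list[str] = []
    for anchor in sorted(before):
        if anchor in after:
            preserved.append(f"xref-anchor:{anchor}")
        else:
            broken.append(f"xref-anchor:missing '{anchor}'")
    # A link is preserved only if it still appears and its target still exists.
    links_after = set(link_re.findall(reimported))
    for link in link_re.findall(original):
        label = f"[{link[0]}][{link[1]}]"
        if link in links_after and link[1] in after:
            preserved.append(f"xref-link:{label}")
        else:
            broken.append(f"xref-link:broken {label}")
    return preserved, broken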


# ---------------------------------------------------------------------------
# Full single-file xref round-trip
# ---------------------------------------------------------------------------


class TestXrefRoundTrip:
    def test_single_file_xref_roundtrip(self, tmp_path: Path) -> None:
        from markidocx.builder import build_document
        from markidocx.differ import compare
        from markidocx.importer import import_document
        from markidocx.manifest import load_manifest
        md = textwrap.dedent("""\
            # Introduction {#intro}
            Welcome.
            # Background {#bg}
            See [Introduction][intro] and [Background][bg].
        """)
        _make_project(tmp_path, md)
        m = load_manifest(tmp_path / "manifest.yaml")
        build_result = build_document(m)
        assert build_result.success
        import_result = import_document(m, build_result.output_path)
        assert import_result.success
        reimported = import_result.output_files[0].read_text(encoding="utf-8")
        report = compare(md, reimported)
        # No broken cross-refs
        broken_xrefs = [b for b in report.broken if "xref" in b]
        assert not broken_xrefs, f"Broken xrefs found: {broken_xrefs}"

    def test_multi_ref_document(self, tmp_path: Path) -> None:
        """A document with several anchors and references keeps every anchor after a round trip."""
        from markidocx.builder import build_document
        from markidocx.importer import import_document
        from markidocx.manifest import load_manifest
        md = textwrap.dedent("""\
            # Chapter One {#ch1}
            Opening.
            # Chapter Two {#ch2}
            See [Chapter One][ch1].
            # Chapter Three {#ch3}
            Refers to [Chapter One][ch1] and [Chapter Two][ch2].
        """)
        _make_project(tmp_path, md)
        m = load_manifest(tmp_path / "manifest.yaml")
        build_result = build_document(m)
        assert build_result.success
        import_result = import_document(m, build_result.output_path)
        assert import_result.success
        reimported = import_result.output_files[0].read_text(encoding="utf-8")
        # All three anchors should survive the round trip
        assert "{#ch1}" in reimported
        assert "{#ch2}" in reimported
        assert "{#ch3}" in reimported
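

# --- Illustrative sketch (hypothetical helper; not part of markidocx) ---
# The round-trip tests above check anchors and links after build → import.
# A related static check on a single Markdown source is to find references
# whose target anchor is never defined. The helper name and both regexes
# (same assumed id grammar as the earlier sketches) are illustrative
# assumptions, not taken from markidocx.


def _sketch_dangling_refs(markdown: str) -> set[str]:
    """Return `[text][id]` link targets that have no matching `{#id}` label."""
    import re

    anchors = set(re.findall(r"\{#([A-Za-z0-9_-]+)\}", markdown))
    link_pairs = re.findall(r"\[([^\]]+)\]\[([A-Za-z0-9_-]+)\]", markdown)
    targets = {target for _text, target in link_pairs}
    return targets - anchors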