feat(schema): add semantic schema generation as default mode

schema-generate now builds content-aware schemas from the document's
section hierarchy instead of counting markdown syntax elements. Detects
key-value tables, data tables, link lists, and mixed content patterns
to produce schemas that reflect the actual document outline.
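A minimal sketch of the outline-driven approach described above. It assumes that headings delimit sections and that each section's own body lines decide its content kind. The function names, the detection heuristics and the `x-content-kind` annotation key are all hypothetical; the commit does not show its actual rules. The heuristics are: a two-column table reads as key-value, a wider table as a data table, a list made only of `[text](url)` items as a link list, and any combination of kinds as mixed.

```python
import re
from typing import Any, Dict, List, Tuple

# Hypothetical detection heuristics; not taken from the commit.
HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$")
TABLE_ROW = re.compile(r"^\s*\|.*\|\s*$")
LINK_ITEM = re.compile(r"^\s*[-*+]\s+\[[^\]]+\]\([^)]+\)\s*$")


def classify_section_content(lines: List[str]) -> str:
    """Label one section's own body lines by the content pattern they form."""
    kinds = set()
    table_rows = []
    for line in lines:
        if not line.strip():
            continue
        if TABLE_ROW.match(line):
            table_rows.append(line)
        elif LINK_ITEM.match(line):
            kinds.add("link_list")
        else:
            kinds.add("text")
    if table_rows:
        # The header row decides the flavour: two columns read as key/value pairs.
        columns = len(table_rows[0].strip().strip("|").split("|"))
        kinds.add("key_value_table" if columns == 2 else "data_table")
    if not kinds:
        return "empty"
    return "mixed" if len(kinds) > 1 else kinds.pop()


def build_semantic_schema(markdown: str) -> Dict[str, Any]:
    """Nest one object schema per heading so the schema mirrors the outline."""
    root: Dict[str, Any] = {"type": "object", "properties": {}}
    stack: List[Tuple[int, Dict[str, Any]]] = [(0, root)]  # (heading level, node)
    sections: List[Tuple[Dict[str, Any], List[str]]] = [(root, [])]
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if not match:
            sections[-1][1].append(line)  # body of the most recent heading
            continue
        level, title = len(match.group(1)), match.group(2)
        while stack[-1][0] >= level:  # close sibling and deeper sections
            stack.pop()
        node: Dict[str, Any] = {"type": "object", "properties": {}}
        stack[-1][1]["properties"][title] = node
        stack.append((level, node))
        sections.append((node, []))
    for node, body in sections:
        kind = classify_section_content(body)
        if kind != "empty":
            node["x-content-kind"] = kind  # hypothetical annotation key
    return root
```

This sketch has two known gaps. Headings that repeat a title at the same level overwrite each other, and `#` lines inside fenced code blocks are misread as headings. A real implementation would need to handle both.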

Old behavior preserved via --mode syntactic. Validator and visualization
tools pinned to syntactic mode for compatibility.
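One way to picture the mode switch is as a keyword argument that defaults to `semantic`. Callers such as the validator then pass `mode='syntactic'` explicitly, matching the `generate_schema_from_file(temp_file, mode='syntactic')` calls in the updated tests. Only the `--mode syntactic` flag and the `mode` and `max_depth` parameter names come from the commit. The `SchemaGenerator` class, both builder bodies, the `validator_schema` helper and the argparse wiring of `schema-generate` are hypothetical placeholders.

```python
import argparse
import re
from pathlib import Path
from typing import Any, Dict, List, Optional

MODES = ("semantic", "syntactic")
HEADING = re.compile(r"^(#{1,6})\s")


class SchemaGenerator:
    """Hypothetical stand-in for the project's schema generator."""

    def generate_schema_from_file(
        self, path: Path, max_depth: Optional[int] = None, mode: str = "semantic"
    ) -> Dict[str, Any]:
        if mode not in MODES:
            raise ValueError(f"unknown mode {mode!r}; expected one of {MODES}")
        text = Path(path).read_text(encoding="utf-8")
        if mode == "syntactic":
            return self._syntactic(text, max_depth)
        return self._semantic(text, max_depth)

    def _syntactic(self, text: str, max_depth: Optional[int]) -> Dict[str, Any]:
        # Placeholder for the old path: count headings per level, up to max_depth.
        counts: Dict[str, int] = {}
        for line in text.splitlines():
            match = HEADING.match(line)
            if match and (max_depth is None or len(match.group(1)) <= max_depth):
                key = f"h{len(match.group(1))}"
                counts[key] = counts.get(key, 0) + 1
        return {"type": "object", "x-mode": "syntactic", "x-heading-counts": counts}

    def _semantic(self, text: str, max_depth: Optional[int]) -> Dict[str, Any]:
        # Placeholder for the new outline-driven path; ignores its inputs here.
        return {"type": "object", "x-mode": "semantic", "properties": {}}


def validator_schema(path: Path) -> Dict[str, Any]:
    # Pinned to the old behaviour so existing validation keeps working.
    return SchemaGenerator().generate_schema_from_file(path, mode="syntactic")


def main(argv: Optional[List[str]] = None) -> Dict[str, Any]:
    parser = argparse.ArgumentParser(prog="schema-generate")
    parser.add_argument("path", type=Path)
    parser.add_argument("--mode", choices=MODES, default="semantic")
    args = parser.parse_args(argv)
    return SchemaGenerator().generate_schema_from_file(args.path, mode=args.mode)
```

With this design the new behaviour applies wherever no mode is given. Compatibility-sensitive callers opt out in one place instead of every caller opting in.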

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-16 18:49:50 +01:00
parent 120ed89780
commit 60f33443ae
8 changed files with 408 additions and 55 deletions

@@ -50,8 +50,8 @@ Some text here.
  temp_file = Path(f.name)
  try:
- # Act - Generate schema with unlimited depth
- result = self.schema_generator.generate_schema_from_file(temp_file)
+ # Act - Generate schema in syntactic mode (element counting)
+ result = self.schema_generator.generate_schema_from_file(temp_file, mode='syntactic')
  # Assert - Schema should be valid JSON and contain expected structure
  assert isinstance(result, dict)
@@ -105,8 +105,8 @@ Very deep content.
  temp_file = Path(f.name)
  try:
- # Act - Generate schema with depth limit of 2
- result = self.schema_generator.generate_schema_from_file(temp_file, max_depth=2)
+ # Act - Generate schema in syntactic mode with depth limit of 2
+ result = self.schema_generator.generate_schema_from_file(temp_file, max_depth=2, mode='syntactic')
  # Assert - Only levels 1 and 2 should be included
  properties = result.get("properties", {})
@@ -173,8 +173,8 @@ Some implementation notes here.
  temp_file = Path(f.name)
  try:
- # Act - Generate schema
- result = self.schema_generator.generate_schema_from_file(temp_file)
+ # Act - Generate schema in syntactic mode
+ result = self.schema_generator.generate_schema_from_file(temp_file, mode='syntactic')
  # Assert - Schema should capture complex structures
  properties = result.get("properties", {})