generated from coulomb/repo-seed
Initial schemas and validation with extension workplan
@@ -47,6 +47,17 @@ consumer needs them through the new library contract:
|
|||||||
4. Treat old code as reference material; do not preserve backward compatibility unless the new contract explicitly needs it.
|
4. Treat old code as reference material; do not preserve backward compatibility unless the new contract explicitly needs it.
|
||||||
5. Keep database, platform, and domain lifecycle concerns out of this repo.
|
5. Keep database, platform, and domain lifecycle concerns out of this repo.
|
||||||

## Practicality Reassessment

The first implementation slices intentionally rebuilt the clean parser and JSON
Schema spine. That is necessary but not sufficient. The legacy project already
showed that heading counts and raw structural schemas have limited practical
utility.

The successor should prioritize a document contract framework before going much
deeper into generic tooling. See `docs/practical-schema-framework-research.md`
and `workplans/MKTT-WP-0004-practical-contract-framework.md`.
|
|
||||||
## Initial Architecture Target
|
## Initial Architecture Target
|
||||||
|
|
||||||
```text
|
```text
|
||||||
|
|||||||
323
docs/practical-schema-framework-research.md
Normal file
@@ -0,0 +1,323 @@
# Practical Schema Framework Research

Date: 2026-05-03

## Purpose

This document reassesses `markitect-tool` schema utility before further
implementation. The concern is that pure structural validation, such as heading
counts and min/max depth constraints, is rarely enough to make markdown document
pipelines useful.

The practical opportunity is to define a stronger framework for markdown-native
document contracts: section specifications, content assertions, form fields,
context-aware rules, LLM-assisted assessments, and high-quality diagnostics.

## Research Signals

### Structured Authoring

DITA is the strongest analogue for typed, reusable textual units. It emphasizes
information typing, semantic markup, modularity, reuse, interchange, and
multiple deliverables from one source. A DITA topic is the unit of authoring and
reuse; topics may be generic or specialized into roles such as concept, task, or
reference.

Relevance for `markitect-tool`:

- A markdown document or section should have an explicit information type.
- Information type should imply expected structure and reader purpose.
- Reuse and composition need stable addressing of sections, not only files.
- Specialization is a better mental model than ad hoc schema forks.

Sources:

- https://dita-lang.org/dita/archspec/base/basic-concepts
- https://dita-lang.org/dita/archspec/base/introduction-to-dita

### Document Schemas With Assertions

DocBook remains relevant because it combines formal document schemas with
Schematron-style assertions. That is the missing layer in many simplistic JSON
Schema approaches: grammar says what may exist; assertions say what must be true
in context.

Relevance for `markitect-tool`:

- JSON Schema over `Document.to_dict()` is useful but insufficient.
- We need a second assertion layer for document-specific semantics.
- Diagnostics must point to the document location and rule intention.

Source:

- https://docbook.org/schemas/docbook/

### Dynamic Form Rules

JSON Schema supports conditional validation through `dependentRequired`,
`dependentSchemas`, and `if`/`then`/`else`. JSON Forms separates data schema
from UI schema and uses rules to show, hide, enable, or disable UI elements
based on JSON Schema conditions. Form.io’s architecture treats the form schema
as a single source of truth for validation and conditional logic across client
and server.
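
As a rough illustration of the `dependentRequired` idea, the following stdlib-only sketch evaluates the same kind of rule by hand. The field names (`credit_card`, `billing_address`) and the helper `missing_dependents` are hypothetical and not part of `markitect-tool`; a real implementation would more likely delegate to a JSON Schema validator than reimplement the keyword.

```python
from typing import Any, Mapping, Sequence


def missing_dependents(
    data: Mapping[str, Any],
    dependent_required: Mapping[str, Sequence[str]],
) -> dict[str, list[str]]:
    """Return, per trigger field, the dependent fields that are absent.

    Follows the `dependentRequired` semantics: a rule only applies when
    its trigger property is present in the data.
    """
    problems: dict[str, list[str]] = {}
    for trigger, required in dependent_required.items():
        if trigger not in data:
            continue  # The rule is inactive when the trigger is absent.
        missing = [name for name in required if name not in data]
        if missing:
            problems[trigger] = missing
    return problems


# Hypothetical form rule: a credit card number requires a billing address.
RULES = {"credit_card": ["billing_address"]}

print(missing_dependents({"name": "Ada"}, RULES))
print(missing_dependents({"name": "Ada", "credit_card": "4111"}, RULES))
```

The first call reports nothing because the trigger field is absent; the second reports the missing dependent field under its trigger, which is the shape a form layer would need to mark a field as conditionally required.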

Relevance for `markitect-tool`:

- Forms should be first-class, not bolted onto document generation.
- Field definitions need static validation and dynamic rules.
- Prefill, visibility, requiredness, and calculated values should come from the
  same contract used for generation and validation.
- Context data must be explicit and typed.

Sources:

- https://json-schema.org/understanding-json-schema/reference/conditionals
- https://jsonforms.io/docs/uischema/rules/
- https://form.io/features/form-conditional-logic-form-validation/

### LLM-Assisted Assessment

Modern evaluation frameworks treat LLM assessment as explicit graders or
rubrics. OpenAI graders return scores in a 0–1 range and can combine grader
types. Promptfoo’s `llm-rubric` uses explicit criteria and expects structured
judge output with reason, score, and pass/fail.

Relevance for `markitect-tool`:

- LLM checks should be declared as assessment rules, not hidden in prompts.
- Deterministic validation and LLM assessment should produce one diagnostic
  model.
- Section-level rubrics are more useful than whole-document vague grading.
- The LLM provider must remain external; `markitect-tool` defines contracts and
  reports.

Sources:

- https://developers.openai.com/api/docs/guides/graders
- https://www.promptfoo.dev/docs/configuration/expected-outputs/model-graded/llm-rubric/

### Markdown Structure

CommonMark gives markdown a well-defined block/inline model. mdast gives a
language-neutral tree vocabulary for Markdown nodes. Both point toward keeping
the parse layer separate from domain/schema layers.

Relevance for `markitect-tool`:

- The core document model should stay close to CommonMark/mdast concepts.
- Practical document contracts should sit above the parse model.
- Section addressing, source spans, and block identity are foundational for good
  diagnostics.
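
To make "section addressing" concrete, here is a minimal sketch that derives a slash-joined address and a source line for each ATX heading. It is an assumption-laden illustration: `SectionAddress`, `slugify`, and `section_addresses` are hypothetical names, the line scanner ignores setext headings and duplicate titles, and a real implementation would presumably take positions from the parser rather than from a regular expression.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SectionAddress:
    path: str   # slash-joined slugs of the heading and its ancestors
    line: int   # 1-based line number of the heading in the source text
    level: int  # heading depth, 1 to 6


_HEADING_RE = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


def slugify(title: str) -> str:
    """Lowercase the title and collapse non-alphanumeric runs into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def section_addresses(text: str) -> list[SectionAddress]:
    addresses: list[SectionAddress] = []
    stack: list[tuple[int, str]] = []  # (level, slug) of open ancestors
    in_fence = False
    for number, line in enumerate(text.splitlines(), start=1):
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # Skip headings inside code fences.
            continue
        if in_fence:
            continue
        match = _HEADING_RE.match(line)
        if match is None:
            continue
        level = len(match.group(1))
        # Close every open section at the same or a deeper level.
        while stack and stack[-1][0] >= level:
            stack.pop()
        stack.append((level, slugify(match.group(2))))
        path = "/".join(slug for _, slug in stack)
        addresses.append(SectionAddress(path, number, level))
    return addresses


SAMPLE = "# ADR 7\n\n## Context\n\ntext\n\n## Decision\n\n### Risks\n"
for address in section_addresses(SAMPLE):
    print(address.line, address.path)
```

An address such as `adr-7/decision/risks` plus a line number is enough for a diagnostic to say where a rule failed without depending on file-level granularity.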

Sources:

- https://spec.commonmark.org/0.31.2/
- https://github.com/syntax-tree/mdast

## Terminology Proposal

| Term | Meaning |
| --- | --- |
| Document | A markdown artifact parsed into frontmatter, blocks, headings, sections, and source spans. |
| Section | A heading-led document region with content, children, source location, and stable identity. |
| Document Type | A named contract for a whole document, e.g. ADR, PRD, invoice letter, support reply, concept note. |
| Section Type | A reusable role for a section, e.g. Context, Decision, Risks, Procedure, Evidence, Conclusion. |
| Field | A typed value expected in frontmatter, inline matter, a section, or an external data record. |
| Form | A field collection with UI hints, validation rules, defaults, dynamic visibility, and calculations. |
| Context | External data available during validation/generation, such as user data, project data, dates, or related entities. |
| Rule | A deterministic condition evaluated against document, fields, context, or pipeline state. |
| Assertion | A claim that must hold for content, usually richer than shape validation. |
| Metric Band | A soft or hard target for size/complexity, such as word count, sentence count, section count, or reading level. |
| Assessment | A deterministic or LLM-assisted evaluation that returns pass/fail, score, reason, and diagnostics. |
| Rubric | A human-readable criterion for LLM-assisted assessment, scoped to a document or section type. |
| Diagnostic | A structured finding with severity, code, message, source location, rule id, and suggested repair. |
| Contract | The full specification for a document type: structure, sections, fields, rules, forms, assertions, rubrics, and outputs. |
| Pipeline | A repeatable sequence of parse, prefill, generate, validate, assess, transform, and compose operations. |

## Most Relevant Use Cases

### UC-001: Typed Document Contract

Define a document type such as ADR, PRD, FRS, workplan, customer letter, or
meeting brief. Specify required sections by semantic role, allowed alternatives,
field requirements, and diagnostics.

Practical value:

- Prevents missing critical content.
- Makes generated documents predictable.
- Creates an explicit contract for humans and agents.

Needed tooling:

- `mkt contract check <doc> --contract <contract.md>`
- Section matching by heading text, aliases, ids, or section type markers.
- Diagnostics that say which section/field/assertion failed and why.
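
Section matching by heading text and aliases could look roughly like the sketch below. It is illustrative only: `SectionSpec`, `check_sections`, and the ADR section list are hypothetical, and matching by ids or section type markers is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SectionSpec:
    section_type: str
    titles: tuple[str, ...]  # canonical heading text first, then aliases
    required: bool = True


def _normalize(heading: str) -> str:
    """Compare headings case-insensitively and ignore extra whitespace."""
    return " ".join(heading.casefold().split())


def check_sections(headings: list[str], specs: list[SectionSpec]) -> list[str]:
    """Report every spec that no heading satisfies, with the reason."""
    present = {_normalize(heading) for heading in headings}
    findings: list[str] = []
    for spec in specs:
        if any(_normalize(title) in present for title in spec.titles):
            continue
        severity = "error" if spec.required else "warning"
        accepted = ", ".join(spec.titles)
        findings.append(
            f"{severity}: missing section '{spec.section_type}' "
            f"(accepted headings: {accepted})"
        )
    return findings


# Hypothetical ADR contract: two required sections and one recommended one.
ADR_SPECS = [
    SectionSpec("Context", ("Context", "Background")),
    SectionSpec("Decision", ("Decision",)),
    SectionSpec("Consequences", ("Consequences", "Risks"), required=False),
]

for finding in check_sections(["Background", "Options"], ADR_SPECS):
    print(finding)
```

Here "Background" satisfies the Context role through its alias, while the missing Decision section is an error and the missing Consequences section is only a warning.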

### UC-002: Section-Level Content Expectations

Specify what a section is expected to contain: assertions, required evidence,
forbidden omissions, content patterns, examples, and reviewer prompts.

Practical value:

- Moves beyond “has a heading” toward “does the section do its job?”
- Enables review of generated or human-authored text.

Needed tooling:

- Deterministic assertions for regex, presence, references, counts, and field
  values.
- Optional LLM rubrics for semantic content checks.
- Per-section diagnostic reports.

### UC-003: Size and Complexity Bands

Define soft/hard bands for document and section size: words, characters,
sentences, paragraphs, sections, list items, code blocks, and nesting depth.

Practical value:

- Controls generation output size.
- Keeps templates from becoming bloated or underdeveloped.
- Helps compare intended vs actual document complexity.

Needed tooling:

- Metrics extractor.
- Rule severities: info, warning, error.
- “Too small/too large” diagnostics with actual and target values.
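
A metric band check can be sketched in a few lines. The names below (`MetricBand`, `extract_metrics`, `check_bands`) and the whitespace-based word count are assumptions made for illustration, not an existing API; sentence counts, nesting depth, and the other metrics listed above are omitted.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MetricBand:
    metric: str
    minimum: Optional[int] = None
    maximum: Optional[int] = None
    severity: str = "warning"  # one of: info, warning, error


def extract_metrics(text: str) -> dict[str, int]:
    """Compute a few size metrics from raw text."""
    paragraphs = [part for part in re.split(r"\n\s*\n", text) if part.strip()]
    return {
        "words": len(text.split()),
        "characters": len(text),
        "paragraphs": len(paragraphs),
    }


def check_bands(metrics: dict[str, int], bands: list[MetricBand]) -> list[str]:
    """Report each band violation with the actual and target values."""
    findings: list[str] = []
    for band in bands:
        actual = metrics[band.metric]
        if band.minimum is not None and actual < band.minimum:
            findings.append(
                f"{band.severity}: {band.metric} too small "
                f"(actual {actual}, minimum {band.minimum})"
            )
        elif band.maximum is not None and actual > band.maximum:
            findings.append(
                f"{band.severity}: {band.metric} too large "
                f"(actual {actual}, maximum {band.maximum})"
            )
    return findings


BANDS = [
    MetricBand("words", minimum=10, severity="error"),  # hard lower bound
    MetricBand("paragraphs", maximum=5),                # soft upper bound
]

metrics = extract_metrics("One two three.\n\nFour five.")
for finding in check_bands(metrics, BANDS):
    print(finding)
```

Treating the severity as part of the band is one way to express the soft/hard distinction: the same check produces a warning for a soft target and an error for a hard one.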

### UC-004: Form-Backed Markdown Generation

Define forms that collect or prefill structured fields, then render markdown
documents. Fields may be static, calculated, conditional, or context-derived.

Practical value:

- Bridges structured data capture and prose generation.
- Supports repeatable business documents.
- Makes prefill from user/project/entity data explicit.

Needed tooling:

- Field schema.
- UI schema or form hints.
- Dynamic rules for requiredness, visibility, defaults, and calculations.
- Template rendering with validation before and after render.
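
The flow from field definitions to rendered markdown might be sketched as follows. Everything here is hypothetical: the field spec keys (`required`, `required_if`, `default`, `calculate`), the invoice fields, and `resolve_fields` are illustrative choices, and `string.Template` stands in for whatever template engine is eventually used.

```python
from string import Template
from typing import Any

# Hypothetical field definitions for an invoice letter.
FIELDS: dict[str, dict[str, Any]] = {
    "customer": {"required": True},
    "net_cents": {"required": True},
    "vat_percent": {"default": 20},
    # Conditionally required: only business customers need a VAT id.
    "vat_id": {"required_if": lambda values: values.get("business", False)},
    # Calculated from other fields once they are known to be present.
    "total_cents": {
        "calculate": lambda v: v["net_cents"] * (100 + v["vat_percent"]) // 100
    },
}


def resolve_fields(
    fields: dict[str, dict[str, Any]], supplied: dict[str, Any]
) -> tuple[dict[str, Any], list[str]]:
    """Apply defaults, check requiredness, then compute calculated values."""
    values = dict(supplied)
    for name, spec in fields.items():
        if name not in values and "default" in spec:
            values[name] = spec["default"]
    errors: list[str] = []
    for name, spec in fields.items():
        required = spec.get("required", False)
        condition = spec.get("required_if")
        if condition is not None and condition(values):
            required = True
        if required and name not in values:
            errors.append(f"missing required field: {name}")
    if not errors:  # Only calculate when the inputs are complete.
        for name, spec in fields.items():
            if "calculate" in spec:
                values[name] = spec["calculate"](values)
    return values, errors


TEMPLATE = Template("# Invoice for $customer\n\nTotal due (cents): $total_cents\n")

values, errors = resolve_fields(FIELDS, {"customer": "Acme", "net_cents": 10000})
print("\n".join(errors) if errors else TEMPLATE.substitute(values))
```

Validation runs before rendering, so the template is only filled when every required input is present; a post-render check of the generated markdown would be a separate step.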

### UC-005: Context-Aware Validation

Validate a document against external context: user data, project metadata,
related entities, dates, policy constraints, or canonical terminology.

Practical value:

- Checks whether a document is correct for this case, not only generally
  well-formed.
- Enables pipelines like personalized letters, compliance reports, and
  project-specific workplans.

Needed tooling:

- Context object schema.
- Resolvers for local files, JSON/YAML data, and later higher-layer systems.
- Rule expressions that can reference document and context paths.
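
A rule expression that references both document and context paths can be sketched with a dotted-path resolver. The rule shape (`id`, `left`, `right`), the path syntax, and the helper names are assumptions for illustration only; no expression language has been chosen.

```python
from typing import Any, Optional


def resolve(path: str, scope: dict[str, Any]) -> Any:
    """Walk a dotted path such as 'context.project.owner' through nested dicts."""
    current: Any = scope
    for part in path.split("."):
        if isinstance(current, dict) and part in current:
            current = current[part]
        else:
            raise KeyError(f"unresolved path: {path}")
    return current


def check_equals(rule: dict[str, str], scope: dict[str, Any]) -> Optional[str]:
    """Return a finding when the two referenced values differ, else None."""
    left = resolve(rule["left"], scope)
    right = resolve(rule["right"], scope)
    if left == right:
        return None
    return (
        f"{rule['id']}: {rule['left']} ({left!r}) "
        f"does not match {rule['right']} ({right!r})"
    )


# Hypothetical rule: the document owner must be the project owner.
RULE = {
    "id": "owner-matches-project",
    "left": "document.frontmatter.owner",
    "right": "context.project.owner",
}

SCOPE = {
    "document": {"frontmatter": {"owner": "alice"}},
    "context": {"project": {"owner": "bob"}},
}

print(check_equals(RULE, SCOPE))
```

Keeping `document` and `context` as separate top-level roots makes it explicit which side of a rule comes from the markdown file and which from external data.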

### UC-006: LLM-Assisted Section Assessment

Attach rubrics to section types. Use an external LLM adapter to assess whether a
section satisfies the rubric, returning score, reason, and pass/fail.

Practical value:

- Handles semantic checks that deterministic rules cannot.
- Supports review loops for generated text.
- Makes subjective requirements explicit and auditable.

Needed tooling:

- Rubric declaration format.
- Provider-neutral assessment request/response models.
- Caching and reproducibility metadata.
- Clear distinction between deterministic errors and model-judged findings.
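
Provider-neutral request/response models could be shaped along these lines. This is a sketch under assumptions: the class names, the `judged_by` field, and the `KeywordAssessor` are hypothetical, and the keyword matcher is a deterministic stand-in for an external LLM adapter, which this repo would not ship.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class AssessmentRequest:
    rubric_id: str
    criterion: str     # human-readable rubric text
    section_text: str  # the section being judged


@dataclass(frozen=True)
class AssessmentResult:
    rubric_id: str
    score: float       # 0.0 to 1.0
    passed: bool
    reason: str
    judged_by: str     # separates deterministic from model-judged findings


class Assessor(Protocol):
    """Any adapter, LLM-backed or not, only has to provide this method."""

    def assess(self, request: AssessmentRequest) -> AssessmentResult: ...


class KeywordAssessor:
    """Deterministic stand-in used here instead of a real LLM adapter."""

    def __init__(self, keywords: tuple[str, ...], threshold: float = 0.5) -> None:
        self.keywords = keywords  # must be non-empty
        self.threshold = threshold

    def assess(self, request: AssessmentRequest) -> AssessmentResult:
        text = request.section_text.casefold()
        hits = [word for word in self.keywords if word in text]
        score = len(hits) / len(self.keywords)
        return AssessmentResult(
            rubric_id=request.rubric_id,
            score=score,
            passed=score >= self.threshold,
            reason=f"matched {len(hits)} of {len(self.keywords)} keywords",
            judged_by="deterministic",
        )


def run_assessment(assessor: Assessor, request: AssessmentRequest) -> AssessmentResult:
    return assessor.assess(request)


request = AssessmentRequest(
    rubric_id="risks-names-mitigation",
    criterion="The Risks section names each risk and its mitigation.",
    section_text="The main risk is vendor lock-in.",
)
print(run_assessment(KeywordAssessor(("risk", "mitigation")), request))
```

Because callers depend only on the `Assessor` protocol, swapping the stand-in for a model-backed adapter would not change the result shape, and `judged_by` keeps model-judged findings distinguishable in reports.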

### UC-007: Pipeline Diagnostics and Repair Guidance

Run a document pipeline and get one coherent diagnostic report from parsing,
schema checks, field validation, assertions, generation, composition, and
LLM-assisted assessments.

Practical value:

- Makes failures debuggable.
- Helps humans and agents repair documents.
- Avoids scattered errors from unrelated subsystems.

Needed tooling:

- Common diagnostic model.
- Error codes and severities.
- Source spans and rule ids.
- Suggested repair text or structured patches when safe.
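
A common diagnostic model might start as small as the sketch below. The `Diagnostic` fields follow the Diagnostic entry in the terminology table; the codes, the `build_report` helper, and the sort order are assumptions for illustration, and a single line number stands in for a full source span.

```python
from dataclasses import asdict, dataclass
from typing import Any, Optional

SEVERITY_ORDER = {"error": 0, "warning": 1, "info": 2}


@dataclass(frozen=True)
class Diagnostic:
    severity: str  # error, warning, or info
    code: str
    message: str
    rule_id: str
    line: Optional[int] = None  # simplified source location
    suggested_repair: Optional[str] = None


def build_report(*sources: list[Diagnostic]) -> dict[str, Any]:
    """Merge findings from several pipeline stages into one ordered report."""
    merged = [diagnostic for source in sources for diagnostic in source]
    merged.sort(key=lambda d: (SEVERITY_ORDER[d.severity], d.line or 0, d.code))
    return {
        # Only errors make the document invalid; warnings and info do not.
        "valid": not any(d.severity == "error" for d in merged),
        "diagnostics": [asdict(d) for d in merged],
    }


# Hypothetical findings from two otherwise unrelated stages.
schema_findings = [
    Diagnostic("warning", "MKT-W010", "section is short", "metric.words", line=12),
]
contract_findings = [
    Diagnostic(
        "error",
        "MKT-E001",
        "missing section 'Decision'",
        "section.decision",
        suggested_repair="Add a '## Decision' heading.",
    ),
]

report = build_report(schema_findings, contract_findings)
print(report["valid"])
for item in report["diagnostics"]:
    print(item["severity"], item["code"], item["message"])
```

Every stage emits the same record type, so the report can be sorted, filtered, and serialized without knowing which subsystem produced a finding.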

## Comparison With markitect-main

`markitect-main` had several useful seeds:

- `x-markitect-sections` for required/recommended/optional/discouraged/improper sections.
- `x-markitect-content-control` for required, discouraged, and forbidden patterns plus word-count metrics.
- Section and content validators with warnings/errors.
- Schema generation and validation experiments.
- Draft generation with `x-markitect-field-mapping`.
- Prompt quality gates with schema and pattern validators.
- Infospace entity parsing and LLM classification/evaluation.

The problem was not lack of ideas. The problem was that the ideas lived in
separate subsystems with different models:

- Schema validation compared generated schemas rather than validating a stable
  document contract.
- Semantic validation used `x-markitect-*` extensions but was not integrated
  into a unified contract framework.
- Field mapping existed in draft generation, not in a general form/context
  model.
- LLM quality gates existed inside prompt execution, not as provider-neutral
  document assessments.
- Infospace checks were domain/application layer behavior, not syntax-layer
  primitives.

## Strategic Direction

The successor should introduce a framework layer above parsing:

```text
Markdown parse model
-> document contract
-> section specifications
-> field/form specifications
-> deterministic rules/assertions
-> metric bands
-> optional LLM rubrics
-> unified diagnostics
```

This should not replace JSON Schema. JSON Schema remains useful for typed data
and machine validation. The new layer should make document-specific semantics
natural.

## Recommendation

Do not continue straight into generic query/transform work until this framework
direction is captured. The next implementation slice should be a small,
deterministic version of document contracts:

1. Define the contract schema and terminology.
2. Implement section specifications.
3. Implement metric bands.
4. Implement the unified diagnostic model.
5. Leave LLM rubrics and form dynamics as designed extension points for the next
   slice.

This slice is the utility inflection point: it will make `markitect-tool`
practically useful instead of merely structurally correct.
@@ -34,7 +34,7 @@ workplans/
|
|||||||
|
|
||||||
SBOM source: `sbom-tools.yaml`.
|
SBOM source: `sbom-tools.yaml`.
|
||||||
|
|
||||||
Initial SBOM ingest succeeded on 2026-05-03 with seven declared entries for the
|
Initial SBOM ingest succeeded on 2026-05-03 with eight declared entries for the
|
||||||
core and optional dependencies.
|
core and optional dependencies.
|
||||||
|
|
||||||
## Registered Extension Points
|
## Registered Extension Points
|
||||||
|
|||||||
@@ -11,6 +11,7 @@ requires-python = ">=3.12"
|
|||||||
license = { text = "MIT" }
|
license = { text = "MIT" }
|
||||||
dependencies = [
|
dependencies = [
|
||||||
"click>=8.0",
|
"click>=8.0",
|
||||||
|
"jsonschema>=4.0",
|
||||||
"markdown-it-py",
|
"markdown-it-py",
|
||||||
"PyYAML",
|
"PyYAML",
|
||||||
]
|
]
|
||||||
|
|||||||
@@ -7,6 +7,10 @@ tools:
|
|||||||
ecosystem: python
|
ecosystem: python
|
||||||
is_direct: true
|
is_direct: true
|
||||||
is_dev: false
|
is_dev: false
|
||||||
|
- name: jsonschema
|
||||||
|
ecosystem: python
|
||||||
|
is_direct: true
|
||||||
|
is_dev: false
|
||||||
- name: PyYAML
|
- name: PyYAML
|
||||||
ecosystem: python
|
ecosystem: python
|
||||||
is_direct: true
|
is_direct: true
|
||||||
|
|||||||
@@ -9,6 +9,14 @@ from markitect_tool.core import (
|
|||||||
parse_markdown,
|
parse_markdown,
|
||||||
parse_markdown_file,
|
parse_markdown_file,
|
||||||
)
|
)
|
||||||
from markitect_tool.schema import (
    MarkdownSchema,
    SchemaValidationResult,
    ValidationViolation,
    load_schema_file,
    validate_document,
    validate_markdown_file,
)
|
|
||||||
__all__ = [
|
__all__ = [
|
||||||
"ContentBlock",
|
"ContentBlock",
|
||||||
@@ -18,4 +26,10 @@ __all__ = [
|
|||||||
"Section",
|
"Section",
|
||||||
"parse_markdown",
|
"parse_markdown",
|
||||||
"parse_markdown_file",
|
"parse_markdown_file",
|
||||||
    "MarkdownSchema",
    "SchemaValidationResult",
    "ValidationViolation",
    "load_schema_file",
    "validate_document",
    "validate_markdown_file",
]
|
]
|
||||||
|
|||||||
@@ -9,6 +9,7 @@ import click
|
|||||||
import yaml
|
import yaml
|
||||||
|
|
||||||
from markitect_tool.core import parse_markdown_file
|
from markitect_tool.core import parse_markdown_file
|
||||||
|
from markitect_tool.schema import load_schema_file, validate_markdown_file, validate_schema
|
||||||
|
|
||||||
|
|
||||||
@click.group()
|
@click.group()
|
||||||
@@ -40,5 +41,66 @@ def parse(file: Path, output_format: str) -> None:
|
|||||||
click.echo(json.dumps(data, indent=2, ensure_ascii=False))
|
click.echo(json.dumps(data, indent=2, ensure_ascii=False))
|
||||||
|
|
||||||
|
|
||||||
@main.command()
@click.argument("file", type=click.Path(exists=True, dir_okay=False, path_type=Path))
@click.option(
    "--schema",
    "schema_file",
    required=True,
    type=click.Path(exists=True, dir_okay=False, path_type=Path),
)
@click.option(
    "--format",
    "output_format",
    type=click.Choice(["json", "yaml", "text"], case_sensitive=False),
    default="text",
    show_default=True,
)
def validate(file: Path, schema_file: Path, output_format: str) -> None:
    """Validate a Markdown file against a Markdown schema file."""

    result = validate_markdown_file(file, schema_file)
    _emit_result(result.to_dict(), output_format)
    raise click.exceptions.Exit(0 if result.valid else 1)


@main.group()
def schema() -> None:
    """Work with Markdown schema files."""


@schema.command("validate")
@click.argument("schema_file", type=click.Path(exists=True, dir_okay=False, path_type=Path))
@click.option(
    "--format",
    "output_format",
    type=click.Choice(["json", "yaml", "text"], case_sensitive=False),
    default="text",
    show_default=True,
)
def schema_validate(schema_file: Path, output_format: str) -> None:
    """Validate that a Markdown schema contains a well-formed JSON Schema."""

    loaded = load_schema_file(schema_file)
    result = validate_schema(loaded.schema)
    data = result.to_dict() | {"schema_path": str(schema_file)}
    _emit_result(data, output_format)
    raise click.exceptions.Exit(0 if result.valid else 1)


def _emit_result(data: dict, output_format: str) -> None:
    if output_format == "json":
        click.echo(json.dumps(data, indent=2, ensure_ascii=False))
    elif output_format == "yaml":
        click.echo(yaml.safe_dump(data, sort_keys=False))
    else:
        if data.get("valid"):
            click.echo("valid")
        else:
            click.echo("invalid")
            for violation in data.get("violations", []):
                click.echo(f"- {violation['path']}: {violation['message']}")


if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
main()
|
main()
|
||||||
|
|||||||
31
src/markitect_tool/schema/__init__.py
Normal file
@@ -0,0 +1,31 @@
"""Schema loading and validation for structured Markdown documents."""

from markitect_tool.schema.loader import (
    InvalidSchemaFormatError,
    MarkdownSchema,
    SchemaLoaderError,
    SchemaNotFoundError,
    load_schema_file,
    load_schema_text,
)
from markitect_tool.schema.validator import (
    SchemaValidationResult,
    ValidationViolation,
    validate_document,
    validate_markdown_file,
    validate_schema,
)

__all__ = [
    "InvalidSchemaFormatError",
    "MarkdownSchema",
    "SchemaLoaderError",
    "SchemaNotFoundError",
    "SchemaValidationResult",
    "ValidationViolation",
    "load_schema_file",
    "load_schema_text",
    "validate_document",
    "validate_markdown_file",
    "validate_schema",
]
124
src/markitect_tool/schema/loader.py
Normal file
@@ -0,0 +1,124 @@
"""Load JSON Schema definitions embedded in Markdown schema files."""

from __future__ import annotations

import json
import re
from dataclasses import dataclass
from pathlib import Path
from typing import Any

import yaml


class SchemaLoaderError(ValueError):
    """Base error raised for schema loading failures."""


class SchemaNotFoundError(SchemaLoaderError):
    """Raised when no JSON schema block can be found."""


class InvalidSchemaFormatError(SchemaLoaderError):
    """Raised when a schema block exists but is not valid JSON object data."""


@dataclass(frozen=True)
class MarkdownSchema:
    """A JSON Schema loaded from a Markdown schema document."""

    schema: dict[str, Any]
    metadata: dict[str, Any]
    documentation: str
    source_path: str | None = None

    def to_dict(self) -> dict[str, Any]:
        data = {
            "schema": self.schema,
            "metadata": self.metadata,
            "documentation": self.documentation,
            "source_path": self.source_path,
        }
        return {key: value for key, value in data.items() if value is not None}


_JSON_BLOCK_RE = re.compile(r"```json\s*(.*?)```", re.DOTALL | re.IGNORECASE)


def load_schema_file(path: str | Path) -> MarkdownSchema:
    """Load a Markdown schema file."""

    schema_path = Path(path)
    if not schema_path.exists():
        raise FileNotFoundError(f"Schema file not found: {schema_path}")
    return load_schema_text(schema_path.read_text(encoding="utf-8"), source_path=str(schema_path))


def load_schema_text(text: str, source_path: str | None = None) -> MarkdownSchema:
    """Load a Markdown schema document from text."""

    metadata, documentation = _split_frontmatter(text)
    schema = _extract_json_schema(documentation)
    schema = dict(schema)
    schema.setdefault(
        "x-markitect-source",
        {
            "format": "markdown",
            "file": source_path,
            "frontmatter": metadata,
        },
    )
    return MarkdownSchema(
        schema=schema,
        metadata=metadata,
        documentation=documentation,
        source_path=source_path,
    )


def _split_frontmatter(text: str) -> tuple[dict[str, Any], str]:
    if not text.startswith("---\n"):
        return {}, text

    end = text.find("\n---", 4)
    if end == -1:
        return {}, text

    closing_end = text.find("\n", end + 4)
    if closing_end == -1:
        closing_end = len(text)
    else:
        closing_end += 1

    raw = text[4:end]
    try:
        metadata = yaml.safe_load(raw) if raw.strip() else {}
    except yaml.YAMLError as exc:
        raise InvalidSchemaFormatError(f"Invalid schema frontmatter: {exc}") from exc
    if metadata is None:
        metadata = {}
    if not isinstance(metadata, dict):
        raise InvalidSchemaFormatError("Schema frontmatter must be a mapping")
    return metadata, text[closing_end:]


def _extract_json_schema(text: str) -> dict[str, Any]:
    candidates = list(_JSON_BLOCK_RE.finditer(text))
    if not candidates:
        raise SchemaNotFoundError("No JSON schema found in markdown schema")

    parsed_blocks: list[dict[str, Any]] = []
    for match in candidates:
        raw_json = match.group(1).strip()
        try:
            data = json.loads(raw_json)
        except json.JSONDecodeError as exc:
            raise InvalidSchemaFormatError(f"Invalid JSON schema block: {exc}") from exc
        if not isinstance(data, dict):
            raise InvalidSchemaFormatError("JSON schema block must contain an object")
        parsed_blocks.append(data)

    for data in parsed_blocks:
        if "$schema" in data or "type" in data:
            return data
    return parsed_blocks[0]
110
src/markitect_tool/schema/validator.py
Normal file
@@ -0,0 +1,110 @@
"""Validate parsed Markdown documents against JSON Schema."""

from __future__ import annotations

from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Any

from jsonschema import Draft202012Validator, SchemaError, ValidationError

from markitect_tool.core import Document, parse_markdown_file
from markitect_tool.schema.loader import MarkdownSchema, load_schema_file


@dataclass(frozen=True)
class ValidationViolation:
    """A single schema validation violation."""

    path: str
    message: str
    schema_path: str

    def to_dict(self) -> dict[str, str]:
        return asdict(self)


@dataclass(frozen=True)
class SchemaValidationResult:
    """Validation result for one document and one schema."""

    valid: bool
    violations: list[ValidationViolation]
    document_path: str | None = None
    schema_path: str | None = None

    def to_dict(self) -> dict[str, Any]:
        data = {
            "valid": self.valid,
            "violations": [violation.to_dict() for violation in self.violations],
            "document_path": self.document_path,
            "schema_path": self.schema_path,
        }
        return {key: value for key, value in data.items() if value is not None}


def validate_schema(schema: dict[str, Any]) -> SchemaValidationResult:
    """Validate that a JSON Schema itself is well formed."""

    try:
        Draft202012Validator.check_schema(schema)
    except SchemaError as exc:
        return SchemaValidationResult(
            valid=False,
            violations=[
                ValidationViolation(
                    path=_format_path(exc.path),
                    message=exc.message,
                    schema_path=_format_path(exc.schema_path),
                )
            ],
        )
    return SchemaValidationResult(valid=True, violations=[])


def validate_markdown_file(
    markdown_path: str | Path, schema_path: str | Path
) -> SchemaValidationResult:
    """Parse and validate a Markdown file against a Markdown schema file."""

    document = parse_markdown_file(markdown_path)
    loaded_schema = load_schema_file(schema_path)
    return validate_document(document, loaded_schema)


def validate_document(
    document: Document, schema: MarkdownSchema | dict[str, Any]
) -> SchemaValidationResult:
    """Validate a parsed document against a loaded or raw JSON Schema."""

    raw_schema = schema.schema if isinstance(schema, MarkdownSchema) else schema
    schema_path = schema.source_path if isinstance(schema, MarkdownSchema) else None
    schema_check = validate_schema(raw_schema)
    if not schema_check.valid:
        return SchemaValidationResult(
            valid=False,
            violations=schema_check.violations,
            document_path=document.source_path,
            schema_path=schema_path,
        )

    validator = Draft202012Validator(raw_schema)
    violations = [
        ValidationViolation(
            path=_format_path(error.path),
            message=error.message,
            schema_path=_format_path(error.schema_path),
        )
        for error in sorted(validator.iter_errors(document.to_dict()), key=str)
    ]
    return SchemaValidationResult(
        valid=not violations,
        violations=violations,
        document_path=document.source_path,
        schema_path=schema_path,
    )


def _format_path(path: Any) -> str:
    parts = [str(part) for part in path]
    return "$" if not parts else "$." + ".".join(parts)
19
tests/fixtures/simple-document-schema.md
vendored
Normal file
@@ -0,0 +1,19 @@
---
version: "1.0.0"
---

# Simple Document Schema

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "required": ["headings"],
  "properties": {
    "headings": {
      "type": "array",
      "minItems": 1
    }
  }
}
```
3
tests/fixtures/valid-document.md
vendored
Normal file
@@ -0,0 +1,3 @@
# Hello

World.
164
tests/test_schema_contract.py
Normal file
@@ -0,0 +1,164 @@
from pathlib import Path

from click.testing import CliRunner

from markitect_tool.cli import main
from markitect_tool.schema import (
    InvalidSchemaFormatError,
    SchemaNotFoundError,
    load_schema_file,
    validate_markdown_file,
    validate_schema,
)


SCHEMA_TEXT = """---
schema-id: "https://example.test/schemas/document/v1"
version: "1.0.0"
status: "stable"
---

# Document Schema

## Schema Definition

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Document Schema",
  "type": "object",
  "required": ["frontmatter", "headings"],
  "properties": {
    "frontmatter": {
      "type": "object",
      "required": ["title"],
      "properties": {
        "title": {"type": "string"}
      }
    },
    "headings": {
      "type": "array",
      "minItems": 1,
      "items": {
        "type": "object",
        "required": ["level", "text"],
        "properties": {
          "level": {"type": "integer"},
          "text": {"type": "string"}
        }
      }
    }
  }
}
```
"""


def test_load_schema_file_extracts_metadata_and_json_schema(tmp_path: Path):
    schema_file = tmp_path / "document-schema.md"
    schema_file.write_text(SCHEMA_TEXT, encoding="utf-8")

    loaded = load_schema_file(schema_file)

    assert loaded.metadata["schema-id"] == "https://example.test/schemas/document/v1"
    assert loaded.metadata["status"] == "stable"
    assert loaded.schema["title"] == "Document Schema"
    assert loaded.schema["x-markitect-source"]["format"] == "markdown"
    assert loaded.source_path == str(schema_file)


def test_load_schema_file_requires_json_block(tmp_path: Path):
    schema_file = tmp_path / "missing.md"
    schema_file.write_text("# Missing\n\nNo schema.", encoding="utf-8")

    try:
        load_schema_file(schema_file)
    except SchemaNotFoundError as exc:
        assert "No JSON schema found" in str(exc)
    else:
        raise AssertionError("expected SchemaNotFoundError")


def test_load_schema_file_rejects_invalid_json(tmp_path: Path):
    schema_file = tmp_path / "invalid.md"
    schema_file.write_text("```json\n{invalid json}\n```", encoding="utf-8")

    try:
        load_schema_file(schema_file)
    except InvalidSchemaFormatError as exc:
        assert "Invalid JSON schema block" in str(exc)
    else:
        raise AssertionError("expected InvalidSchemaFormatError")


def test_validate_markdown_file_returns_valid_result(tmp_path: Path):
    schema_file = tmp_path / "document-schema.md"
    schema_file.write_text(SCHEMA_TEXT, encoding="utf-8")
    markdown_file = tmp_path / "document.md"
    markdown_file.write_text("---\ntitle: Example\n---\n\n# Example\n\nBody.", encoding="utf-8")

    result = validate_markdown_file(markdown_file, schema_file)

    assert result.valid is True
    assert result.violations == []
    assert result.document_path == str(markdown_file)
    assert result.schema_path == str(schema_file)


def test_validate_markdown_file_reports_violations(tmp_path: Path):
    schema_file = tmp_path / "document-schema.md"
    schema_file.write_text(SCHEMA_TEXT, encoding="utf-8")
    markdown_file = tmp_path / "document.md"
    markdown_file.write_text("# Missing Title\n\nBody.", encoding="utf-8")

    result = validate_markdown_file(markdown_file, schema_file)

    assert result.valid is False
    assert result.violations
    assert result.violations[0].path == "$.frontmatter"
    assert "title" in result.violations[0].message


def test_validate_schema_reports_invalid_schema():
    result = validate_schema({"type": 7})

    assert result.valid is False
    assert result.violations


def test_mkt_validate_exits_zero_for_valid_document(tmp_path: Path):
    schema_file = tmp_path / "document-schema.md"
    schema_file.write_text(SCHEMA_TEXT, encoding="utf-8")
    markdown_file = tmp_path / "document.md"
    markdown_file.write_text("---\ntitle: Example\n---\n\n# Example\n", encoding="utf-8")

    result = CliRunner().invoke(
        main, ["validate", str(markdown_file), "--schema", str(schema_file)]
    )

    assert result.exit_code == 0
    assert "valid" in result.output


def test_mkt_validate_exits_nonzero_for_invalid_document(tmp_path: Path):
    schema_file = tmp_path / "document-schema.md"
    schema_file.write_text(SCHEMA_TEXT, encoding="utf-8")
    markdown_file = tmp_path / "document.md"
    markdown_file.write_text("# Missing Title\n", encoding="utf-8")

    result = CliRunner().invoke(
        main, ["validate", str(markdown_file), "--schema", str(schema_file)]
    )

    assert result.exit_code == 1
    assert "invalid" in result.output


def test_mkt_schema_validate(tmp_path: Path):
    schema_file = tmp_path / "document-schema.md"
    schema_file.write_text(SCHEMA_TEXT, encoding="utf-8")

    result = CliRunner().invoke(main, ["schema", "validate", str(schema_file)])

    assert result.exit_code == 0
    assert "valid" in result.output
@@ -52,7 +52,7 @@ sections, content blocks, parser tokens, API access, and `mkt parse`.

 ```task
 id: MKTT-WP-0003-T003
-status: todo
+status: done
 priority: high
 state_hub_task_id: "36a22def-d415-4c08-a793-836ee52e4308"
 ```
@@ -60,6 +60,9 @@ state_hub_task_id: "36a22def-d415-4c08-a793-836ee52e4308"
 Implement FR-010 through FR-012: define/derive schemas, validate documents,
 and report structured violations with file/location context.

+Initial implementation complete for Markdown schema loading, JSON Schema
+validation, structured violations, `mkt validate`, and `mkt schema validate`.
+
 ## P3.4 - Implement query and extraction

 ```task
154
workplans/MKTT-WP-0004-practical-contract-framework.md
Normal file
@@ -0,0 +1,154 @@
---
id: MKTT-WP-0004
type: workplan
title: "Practical Document Contract Framework"
domain: markitect
status: proposed
owner: markitect-tool
topic_slug: markitect
created: "2026-05-03"
updated: "2026-05-03"
---

# MKTT-WP-0004: Practical Document Contract Framework

## Purpose

Improve the practical utility of `markitect-tool` by moving beyond generic
heading-count schema validation toward document contracts with section
specifications, fields/forms, context-aware rules, metric bands, optional LLM
assessments, and unified diagnostics.

## Background

Research and legacy comparison are captured in:

- `docs/practical-schema-framework-research.md`
- `docs/markitect-main-scope-assessment.md`
- `docs/markitect-main-test-migration-inventory.md`

## P4.1 - Define contract terminology and file format

```task
id: MKTT-WP-0004-T001
status: todo
priority: high
```

Define the first `DocumentContract` format in markdown/YAML:

- document type
- section specifications
- field/form specifications
- deterministic rules/assertions
- metric bands
- optional assessment rubrics
- diagnostic metadata

Keep it provider-neutral and readable by humans.

## P4.2 - Implement unified diagnostic model

```task
id: MKTT-WP-0004-T002
status: todo
priority: high
```

Create diagnostics with severity, code, message, source location, contract
location, rule id, and optional repair guidance. Use this model for JSON Schema
violations and all new contract checks.
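
A minimal sketch of what such a diagnostic record could look like, assuming
frozen dataclasses. The names `Severity`, `SourceLocation`, `Diagnostic`, and
`from_schema_violation`, and the `schema.violation` code, are hypothetical and
not part of the current `markitect-tool` API; only the `path`, `message`, and
`schema_path` inputs mirror the existing `ValidationViolation` fields.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Severity(str, Enum):
    """Hypothetical severity levels; the workplan does not fix a set."""

    ERROR = "error"
    WARNING = "warning"
    INFO = "info"


@dataclass(frozen=True)
class SourceLocation:
    """Hypothetical location in a Markdown source file."""

    path: str | None = None
    line: int | None = None


@dataclass(frozen=True)
class Diagnostic:
    """One finding, carrying the attributes listed in P4.2."""

    severity: Severity
    code: str
    message: str
    source: SourceLocation | None = None
    contract_location: str | None = None
    rule_id: str | None = None
    repair: str | None = None


def from_schema_violation(
    path: str, message: str, schema_path: str, document_path: str | None = None
) -> Diagnostic:
    """Map the three ValidationViolation fields onto a Diagnostic."""
    return Diagnostic(
        severity=Severity.ERROR,
        code="schema.violation",  # assumed code, not defined by the workplan
        message=f"{path}: {message}",
        source=SourceLocation(path=document_path),
        contract_location=schema_path,
    )
```

Routing JSON Schema violations through one constructor like this would let
later contract checks emit the same record type without a second reporting
path.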

## P4.3 - Implement section specifications

```task
id: MKTT-WP-0004-T003
status: todo
priority: high
```

Support required, recommended, optional, discouraged, and forbidden sections.
Support aliases, expected heading level, section type, ordering constraints,
and clear diagnostics.
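
One possible shape for the presence and heading-level checks, as a sketch
only. `SectionSpec` and `check_sections` are hypothetical names, headings are
assumed to arrive as `(level, text)` pairs, and section type and ordering
constraints are deliberately left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SectionSpec:
    """Hypothetical section specification; field names are assumptions."""

    title: str
    # One of: required, recommended, optional, discouraged, forbidden.
    requirement: str = "required"
    aliases: tuple[str, ...] = ()
    level: int | None = None


def check_sections(
    headings: list[tuple[int, str]], specs: list[SectionSpec]
) -> list[tuple[str, str]]:
    """Return (severity, message) pairs for headings given as (level, text)."""
    findings: list[tuple[str, str]] = []
    for spec in specs:
        # A heading matches if its text equals the title or any alias.
        names = {spec.title.lower(), *(alias.lower() for alias in spec.aliases)}
        matches = [(lvl, text) for lvl, text in headings if text.lower() in names]
        if not matches:
            if spec.requirement == "required":
                findings.append(("error", f"missing required section '{spec.title}'"))
            elif spec.requirement == "recommended":
                findings.append(("warning", f"missing recommended section '{spec.title}'"))
            continue
        if spec.requirement == "forbidden":
            findings.append(("error", f"forbidden section '{spec.title}' is present"))
        elif spec.requirement == "discouraged":
            findings.append(("warning", f"discouraged section '{spec.title}' is present"))
        for lvl, text in matches:
            if spec.level is not None and lvl != spec.level:
                findings.append(
                    ("warning", f"section '{text}' is level {lvl}, expected {spec.level}")
                )
    return findings
```

Mapping the five requirement levels onto errors and warnings this way is an
assumption; the contract format could equally let each spec choose its own
severity.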

## P4.4 - Implement metric bands

```task
id: MKTT-WP-0004-T004
status: todo
priority: medium
```

Support document-level and section-level bands for words, characters,
sentences, paragraphs, sections, list items, code blocks, and nesting depth.
Allow soft warnings and hard errors.
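
The soft/hard distinction could be modeled roughly as follows. This is a
sketch under assumptions: `MetricBand`, its field names, and `count_words`
are hypothetical, and the word count is a naive whitespace split rather than
whatever tokenization the tool eventually adopts.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MetricBand:
    """Hypothetical band: soft limits warn, hard limits fail."""

    metric: str
    soft_min: int | None = None
    soft_max: int | None = None
    hard_min: int | None = None
    hard_max: int | None = None

    def check(self, value: int) -> str:
        """Return 'error', 'warning', or 'ok' for a measured value."""
        # Hard limits are checked first so they take precedence.
        if self.hard_min is not None and value < self.hard_min:
            return "error"
        if self.hard_max is not None and value > self.hard_max:
            return "error"
        if self.soft_min is not None and value < self.soft_min:
            return "warning"
        if self.soft_max is not None and value > self.soft_max:
            return "warning"
        return "ok"


def count_words(text: str) -> int:
    """Naive whitespace word count; a real metric may tokenize differently."""
    return len(text.split())
```

For example, `MetricBand("words", soft_max=500, hard_max=800)` would warn on
a 600-word section and fail an 900-word one.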

## P4.5 - Design form and context model

```task
id: MKTT-WP-0004-T005
status: todo
priority: medium
```

Specify fields, defaults, prefill sources, dynamic requiredness, conditional
visibility, calculations, and validation against external context. This task is
design-first; implementation can follow in a later workplan.

## P4.6 - Design LLM assessment adapter contract

```task
id: MKTT-WP-0004-T006
status: todo
priority: medium
```

Define provider-neutral request/response models for section-level rubrics:
criteria, inputs, context, score, pass/fail, reason, model metadata, and cache
keys. Do not bind core logic to any provider.
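
A provider-neutral boundary might be sketched with a `typing.Protocol`, so
core code never imports a provider SDK. Every name here
(`AssessmentRequest`, `AssessmentResponse`, `AssessmentAdapter`,
`KeywordAdapter`) is hypothetical, and the keyword adapter is only an offline
stand-in for tests, not a proposal for how assessment should work.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import Protocol


@dataclass(frozen=True)
class AssessmentRequest:
    """Hypothetical request model; field names are assumptions."""

    rubric_id: str
    criteria: tuple[str, ...]
    section_text: str
    context: dict[str, str] = field(default_factory=dict)

    def cache_key(self) -> str:
        """Stable SHA-256 over the canonical JSON form of the request."""
        payload = json.dumps(asdict(self), sort_keys=True, ensure_ascii=False)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class AssessmentResponse:
    """Hypothetical response model covering score, pass/fail, and reason."""

    score: float
    passed: bool
    reason: str
    model: str  # model metadata reported by the adapter


class AssessmentAdapter(Protocol):
    """Core code depends on this protocol, never on a provider SDK."""

    def assess(self, request: AssessmentRequest) -> AssessmentResponse: ...


class KeywordAdapter:
    """Offline stand-in adapter: passes if every criterion appears verbatim."""

    def assess(self, request: AssessmentRequest) -> AssessmentResponse:
        text = request.section_text.lower()
        hits = [c for c in request.criteria if c.lower() in text]
        score = len(hits) / len(request.criteria) if request.criteria else 1.0
        reason = f"{len(hits)} of {len(request.criteria)} criteria matched"
        return AssessmentResponse(score, score == 1.0, reason, "keyword-stub")
```

Deriving the cache key from the full request means identical section text,
criteria, and context reuse a cached assessment, while any change to them
produces a new key.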

## P4.7 - Add practical CLI surface

```task
id: MKTT-WP-0004-T007
status: todo
priority: high
```

Add:

```text
mkt contract validate <contract.md>
mkt contract check <document.md> --contract <contract.md>
mkt metrics <document.md>
```

Ensure output is useful to humans and machines.

## P4.8 - Build use-case examples

```task
id: MKTT-WP-0004-T008
status: todo
priority: medium
```

Create examples for:

- ADR
- PRD/FRS
- workplan
- personalized/business letter
- concept note or entity profile

Each example should include contract, valid document, invalid document, and
expected diagnostics.

## Decision Point

This workplan should probably run before WP-0003 query/transform/cache work,
because it changes what "validation" means and establishes the diagnostic model
that later query/transform/generation features should reuse.

If postponed, continue WP-0003 with query/extraction only if we commit to
revisiting diagnostics and contract semantics before generation or LLM hooks.