diff --git a/docs/contract-framework.md b/docs/contract-framework.md new file mode 100644 index 0000000..153e859 --- /dev/null +++ b/docs/contract-framework.md @@ -0,0 +1,161 @@ +# Document Contract Framework + +Date: 2026-05-03 + +## Purpose + +The contract framework makes markdown documents practically checkable. It keeps +Markdown as the authoring surface and uses fenced YAML as a structured extension +for rules that need machine interpretation. + +The first implementation is deterministic. It checks document type, fields, +sections, ordering, metric bands, and text assertions. Forms, context, and LLM +rubrics are represented in the contract vocabulary as extension points before +runtime adapters are added. + +## Contract File Shape + +A contract is a Markdown document with optional frontmatter and one fenced YAML +block marked as `yaml contract`. + +````markdown +--- +title: ADR Contract +version: "1.0" +--- + +# ADR Contract + +```yaml contract +id: adr-contract-v1 +document: + type: adr + title: Architecture Decision Record +fields: + status: + type: string + required: true +sections: + - id: context + title: Context + presence: required + level: 2 +metrics: + document: + words: + min: 100 + max: 1200 + severity: warning +``` +```` + +Markdown carries the explanation. YAML carries the contract. + +## Core Terms + +| Term | Meaning | +| --- | --- | +| Document contract | The machine-readable agreement for one typed Markdown artifact. | +| Document type | A named kind such as `adr`, `prd`, `workplan`, or `business-letter`. | +| Section spec | A semantic section role with matching headings, presence, level, order, metrics, and assertions. | +| Field spec | A typed value expected in frontmatter or later external context. | +| Metric band | A soft or hard size/complexity target. | +| Assertion | A deterministic content expectation over document or section text. 
| +| Diagnostic | A structured finding with severity, code, source, contract location, rule id, and guidance. | + +## Section Presence + +Section specs support these presence values: + +- `required`: missing section is an error. +- `recommended`: missing section is a warning. +- `optional`: section is allowed but not required. +- `discouraged`: present section is a warning. +- `forbidden`: present section is an error. + +Headings are matched case-insensitively against `title`, `id`, `headings`, or +`aliases`. + +## Metric Bands + +Supported metrics are: + +- `characters` +- `words` +- `sentences` +- `paragraphs` +- `sections` +- `headings` +- `list_items` +- `code_blocks` +- `max_heading_depth` +- `nesting_depth` + +Document-level bands live under `metrics.document`. Section-level bands live +inside a section spec. + +The current metrics layer follows the parser model: every heading-led region is +a section, including the document H1 title section. + +## Assertions + +Assertions currently support: + +- `contains` +- `contains_any` +- `not_contains` +- `matches` +- `not_matches` + +Assertions are deterministic and produce the same diagnostic model as sections, +fields, and metric bands. This is the bridge to later LLM rubrics: semantic +checks can become additional assessments without changing how failures are +reported. + +## Forms And Context + +Field specs are the first step toward form-backed Markdown generation. Runtime +form handling should build on the same field vocabulary: + +- `id` +- `type` +- `required` +- `default` +- `source` +- `path` +- `enum` +- `pattern` +- `min` / `max` +- `min_length` / `max_length` + +Dynamic requiredness, visibility, calculations, and prefill should be declared +as context-aware rules in later work. The contract should remain the source of +truth, while UI and generation layers act as adapters. + +## LLM Assessment Extension + +LLM-assisted checks should be declared as rubrics, scoped to document or section +roles. 
Core Markitect should not call a provider directly. A future adapter +should accept a provider-neutral request: + +- contract id and rule id +- document or section text +- relevant fields and context +- rubric criteria +- cache key material + +It should return: + +- pass/fail +- score +- reason +- model/provider metadata +- diagnostics using the shared diagnostic model + +## CLI + +```text +mkt contract validate +mkt contract check --contract +mkt metrics +``` diff --git a/docs/state-hub-integration.md b/docs/state-hub-integration.md index 12f2a6f..d2c4f10 100644 --- a/docs/state-hub-integration.md +++ b/docs/state-hub-integration.md @@ -37,6 +37,10 @@ SBOM source: `sbom-tools.yaml`. Initial SBOM ingest succeeded on 2026-05-03 with eight declared entries for the core and optional dependencies. +The DB-first onboarding workstream `repo-integration-markitect-tool` is now +completed. It remains visible as a completed ADR-001 bootstrap exception rather +than an active orphan. + ## Registered Extension Points | ID | Title | Source | diff --git a/examples/contracts/adr.contract.md b/examples/contracts/adr.contract.md new file mode 100644 index 0000000..2b3be4b --- /dev/null +++ b/examples/contracts/adr.contract.md @@ -0,0 +1,52 @@ +--- +title: ADR Contract +version: "1.0" +--- + +# ADR Contract + +```yaml contract +id: adr-contract-v1 +document: + type: adr + title: Architecture Decision Record +fields: + status: + type: string + required: true + enum: [proposed, accepted, superseded] +metrics: + document: + words: + min: 40 + max: 900 + severity: warning +sections: + - id: context + title: Context + presence: required + level: 2 + order: + before: decision + assertions: + - id: context-names-problem + contains_any: [problem, motivation, constraint] + severity: warning + guidance: Explain why the decision exists. 
+ - id: decision + title: Decision + presence: required + level: 2 + assertions: + - id: decision-commits + matches: "\\b(choose|adopt|use|will)\\b" + severity: error + guidance: State the actual decision, not only background. + - id: consequences + title: Consequences + presence: recommended + level: 2 + - id: deprecated + title: Deprecated Approach + presence: forbidden +``` diff --git a/examples/contracts/business-letter.contract.md b/examples/contracts/business-letter.contract.md new file mode 100644 index 0000000..fd3e742 --- /dev/null +++ b/examples/contracts/business-letter.contract.md @@ -0,0 +1,43 @@ +--- +title: Business Letter Contract +version: "0.1" +--- + +# Business Letter Contract + +```yaml contract +id: business-letter-contract-v1 +document: + type: business-letter +fields: + recipient_name: + type: string + required: true + source: context.recipient.name + sender_name: + type: string + required: true + source: context.sender.name +sections: + - id: greeting + title: Greeting + presence: required + level: 2 + - id: body + title: Body + presence: required + level: 2 + metrics: + words: + min: 40 + max: 350 + severity: warning + - id: closing + title: Closing + presence: required + level: 2 +rubrics: + - id: tone-fit + scope: section.body + criteria: The body should match the relationship and communication purpose. 
+``` diff --git a/examples/contracts/concept-note.contract.md b/examples/contracts/concept-note.contract.md new file mode 100644 index 0000000..6aab589 --- /dev/null +++ b/examples/contracts/concept-note.contract.md @@ -0,0 +1,43 @@ +--- +title: Concept Note Contract +version: "0.1" +--- + +# Concept Note Contract + +```yaml contract +id: concept-note-contract-v1 +document: + type: concept-note +fields: + concept_id: + type: string + required: true + status: + type: string + required: true + enum: [draft, reviewed, accepted, archived] +sections: + - id: definition + title: Definition + presence: required + level: 2 + - id: assertions + title: Assertions + presence: required + level: 2 + assertions: + - id: assertions-use-claims + contains_any: [claim, evidence, assumption] + severity: warning + - id: relationships + title: Relationships + presence: recommended + level: 2 +metrics: + document: + words: + min: 120 + max: 1200 + severity: warning +``` diff --git a/examples/contracts/prd-frs.contract.md b/examples/contracts/prd-frs.contract.md new file mode 100644 index 0000000..80b6e03 --- /dev/null +++ b/examples/contracts/prd-frs.contract.md @@ -0,0 +1,49 @@ +--- +title: PRD and FRS Contract +version: "0.1" +--- + +# PRD And FRS Contract + +```yaml contract +id: prd-frs-contract-v1 +document: + type: prd-frs +fields: + product: + type: string + required: true + owner: + type: string + required: true +metrics: + document: + words: + min: 300 + max: 4000 + severity: warning +sections: + - id: problem + title: Problem + presence: required + level: 2 + - id: goals + title: Goals + presence: required + level: 2 + assertions: + - id: goals-are-testable + contains_any: [measure, metric, success] + severity: warning + - id: functional-requirements + title: Functional Requirements + presence: required + level: 2 + - id: non-goals + title: Non-Goals + presence: recommended + level: 2 + - id: implementation-plan + title: Implementation Plan + presence: discouraged +``` diff 
--git a/examples/contracts/workplan.contract.md b/examples/contracts/workplan.contract.md new file mode 100644 index 0000000..1f0c6cb --- /dev/null +++ b/examples/contracts/workplan.contract.md @@ -0,0 +1,43 @@ +--- +title: Workplan Contract +version: "0.1" +--- + +# Workplan Contract + +```yaml contract +id: workplan-contract-v1 +document: + type: workplan +fields: + id: + type: string + required: true + status: + type: string + required: true + enum: [proposed, active, done, deferred] +sections: + - id: purpose + title: Purpose + presence: required + level: 2 + - id: tasks + title: Tasks + presence: required + level: 2 + assertions: + - id: tasks-have-task-blocks + contains: "status:" + severity: error + - id: decision-point + title: Decision Point + presence: recommended + level: 2 +metrics: + document: + sections: + min: 2 + max: 12 + severity: warning +``` diff --git a/examples/diagnostics/adr-invalid.expected-diagnostics.md b/examples/diagnostics/adr-invalid.expected-diagnostics.md new file mode 100644 index 0000000..39628fb --- /dev/null +++ b/examples/diagnostics/adr-invalid.expected-diagnostics.md @@ -0,0 +1,8 @@ +# Expected Diagnostics: adr-invalid.md + +- `contract.field.missing`: `status` is required. +- `contract.metric.too_low`: the document is below the target word band. +- `contract.assertion.contains_any_missing`: context does not mention problem, motivation, or constraint. +- `contract.section.missing`: `decision` is required. +- `contract.section.recommended_missing`: `consequences` is recommended. +- `contract.section.forbidden`: `deprecated` is present. 
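Reviewer note: the `contract.metric.too_low` finding expected above follows from comparing a measured value against a band's `min` and `max`. The sketch below is one way to read that rule, not the checker's actual code. `band_outcome`, `count_words`, and the `too_high` label are hypothetical names introduced for illustration, and whitespace-based word counting is an assumption about how the metrics layer counts.

```python
from __future__ import annotations


def band_outcome(
    value: float, minimum: float | None = None, maximum: float | None = None
) -> str | None:
    """Return "too_low", "too_high", or None when the value is inside the band.

    Hypothetical helper: only the `too_low` code appears in the examples;
    `too_high` is an assumed counterpart.
    """
    if minimum is not None and value < minimum:
        return "too_low"
    if maximum is not None and value > maximum:
        return "too_high"
    return None


def count_words(text: str) -> int:
    # Assumption: words are whitespace-delimited tokens.
    return len(text.split())


# The Context body of adr-invalid.md, checked against the ADR word band (40..900).
body = "This is short."
print(count_words(body), band_outcome(count_words(body), minimum=40, maximum=900))
```

A band with only `min` or only `max` works the same way, because the missing bound is simply skipped.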
diff --git a/examples/diagnostics/business-letter-invalid.expected-diagnostics.md b/examples/diagnostics/business-letter-invalid.expected-diagnostics.md new file mode 100644 index 0000000..4f81915 --- /dev/null +++ b/examples/diagnostics/business-letter-invalid.expected-diagnostics.md @@ -0,0 +1,5 @@ +# Expected Diagnostics: business-letter-invalid.md + +- `contract.field.missing`: `sender_name` is required. +- `contract.section.missing`: `closing` is required. +- `contract.metric.too_low`: the `Body` section is below the target word band. diff --git a/examples/diagnostics/concept-note-invalid.expected-diagnostics.md b/examples/diagnostics/concept-note-invalid.expected-diagnostics.md new file mode 100644 index 0000000..02329e2 --- /dev/null +++ b/examples/diagnostics/concept-note-invalid.expected-diagnostics.md @@ -0,0 +1,5 @@ +# Expected Diagnostics: concept-note-invalid.md + +- `contract.field.enum`: `status` must be one of the allowed lifecycle values. +- `contract.metric.too_low`: the document is below the target word band. +- `contract.section.missing`: `assertions` is required. diff --git a/examples/diagnostics/prd-frs-invalid.expected-diagnostics.md b/examples/diagnostics/prd-frs-invalid.expected-diagnostics.md new file mode 100644 index 0000000..95ec988 --- /dev/null +++ b/examples/diagnostics/prd-frs-invalid.expected-diagnostics.md @@ -0,0 +1,8 @@ +# Expected Diagnostics: prd-frs-invalid.md + +- `contract.field.missing`: `owner` is required. +- `contract.metric.too_low`: the document is below the target word band. +- `contract.assertion.contains_any_missing`: goals do not mention measure, metric, or success. +- `contract.section.missing`: `functional-requirements` is required. +- `contract.section.recommended_missing`: `non-goals` is recommended. +- `contract.section.discouraged`: `implementation-plan` is discouraged in this contract. 
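Reviewer note: the four section codes in these expected-diagnostics files line up with the presence values documented in `docs/contract-framework.md`. The table-driven sketch below shows that mapping under stated assumptions: `PRESENCE_RULES` and `presence_finding` are hypothetical names, and the real checker may structure this differently. The severities come from the presence list in the framework document and the codes from the expected diagnostics.

```python
from __future__ import annotations

# (presence value, section is present) -> (severity, diagnostic code).
# Combinations not listed here produce no finding, e.g. an `optional` section.
PRESENCE_RULES: dict[tuple[str, bool], tuple[str, str]] = {
    ("required", False): ("error", "contract.section.missing"),
    ("recommended", False): ("warning", "contract.section.recommended_missing"),
    ("discouraged", True): ("warning", "contract.section.discouraged"),
    ("forbidden", True): ("error", "contract.section.forbidden"),
}


def presence_finding(presence: str, present: bool) -> tuple[str, str] | None:
    """Return (severity, code) for a section spec, or None when nothing is wrong."""
    return PRESENCE_RULES.get((presence, present))


# Section specs from adr.contract.md, and the sections found in adr-invalid.md.
adr_specs = {
    "context": "required",
    "decision": "required",
    "consequences": "recommended",
    "deprecated": "forbidden",
}
found = {"context", "deprecated"}

for section_id, presence in adr_specs.items():
    finding = presence_finding(presence, section_id in found)
    if finding is not None:
        print(section_id, *finding)
```

For the invalid ADR this reports `decision`, `consequences`, and `deprecated`, which matches the three section findings listed in `adr-invalid.expected-diagnostics.md`.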
diff --git a/examples/diagnostics/workplan-invalid.expected-diagnostics.md b/examples/diagnostics/workplan-invalid.expected-diagnostics.md new file mode 100644 index 0000000..ab2e039 --- /dev/null +++ b/examples/diagnostics/workplan-invalid.expected-diagnostics.md @@ -0,0 +1,6 @@ +# Expected Diagnostics: workplan-invalid.md + +- `contract.field.missing`: `id` is required. +- `contract.field.enum`: `status` must be one of the allowed lifecycle values. +- `contract.assertion.contains_missing`: the `Tasks` section lacks task metadata. +- `contract.section.recommended_missing`: `decision-point` is recommended. diff --git a/examples/documents/adr-invalid.md b/examples/documents/adr-invalid.md new file mode 100644 index 0000000..35a66a1 --- /dev/null +++ b/examples/documents/adr-invalid.md @@ -0,0 +1,13 @@ +--- +document_type: adr +--- + +# Weak ADR + +## Context + +This is short. + +## Deprecated Approach + +This section should not be here. diff --git a/examples/documents/adr-valid.md b/examples/documents/adr-valid.md new file mode 100644 index 0000000..b790c62 --- /dev/null +++ b/examples/documents/adr-valid.md @@ -0,0 +1,23 @@ +--- +document_type: adr +status: accepted +--- + +# Use Markdown Contracts + +## Context + +The problem is that plain heading counts do not explain whether content is +useful. Authors and agents need a contract that names the expected sections and +the job each section must do. + +## Decision + +We will use markdown-native document contracts with deterministic diagnostics as +the foundation for generation, validation, and later semantic assessment. + +## Consequences + +The tool can check author intent before generation or review work continues. +Future adapters can add form prefill and LLM rubrics without replacing the core +diagnostic model. 
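Reviewer note: the ADR examples exercise two of the five assertion kinds (`contains_any` and `matches`). The sketch below evaluates all five over a section's text so the pass and fail cases can be checked by hand. `failed_conditions` is a hypothetical helper, and it makes three assumptions that this diff does not confirm: substring checks are case-insensitive, regular expressions are applied with `re.search` and no flags, and each condition holds a single value (the contract model appears to treat `matches` and `not_matches` as lists).

```python
from __future__ import annotations

import re


def failed_conditions(text: str, spec: dict) -> list[str]:
    """Return the names of the assertion conditions that `text` does not satisfy."""
    lowered = text.lower()
    failures: list[str] = []
    if "contains" in spec and spec["contains"].lower() not in lowered:
        failures.append("contains")
    if "contains_any" in spec and not any(
        term.lower() in lowered for term in spec["contains_any"]
    ):
        failures.append("contains_any")
    if "not_contains" in spec and spec["not_contains"].lower() in lowered:
        failures.append("not_contains")
    if "matches" in spec and not re.search(spec["matches"], text):
        failures.append("matches")
    if "not_matches" in spec and re.search(spec["not_matches"], text):
        failures.append("not_matches")
    return failures


# Decision text from adr-valid.md against the `decision-commits` assertion.
decision = "We will use markdown-native document contracts with deterministic diagnostics"
print(failed_conditions(decision, {"matches": r"\b(choose|adopt|use|will)\b"}))

# Context text from adr-invalid.md against the `context-names-problem` assertion.
context = "This is short."
print(failed_conditions(context, {"contains_any": ["problem", "motivation", "constraint"]}))
```

The second call reports `contains_any`, which corresponds to the `contract.assertion.contains_any_missing` finding expected for the invalid ADR.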
diff --git a/examples/documents/business-letter-invalid.md b/examples/documents/business-letter-invalid.md new file mode 100644 index 0000000..2937cb4 --- /dev/null +++ b/examples/documents/business-letter-invalid.md @@ -0,0 +1,14 @@ +--- +document_type: business-letter +recipient_name: Ada Lovelace +--- + +# Incomplete Letter + +## Greeting + +Hello, + +## Body + +Thanks. diff --git a/examples/documents/business-letter-valid.md b/examples/documents/business-letter-valid.md new file mode 100644 index 0000000..eead7cd --- /dev/null +++ b/examples/documents/business-letter-valid.md @@ -0,0 +1,25 @@ +--- +document_type: business-letter +recipient_name: Ada Lovelace +sender_name: Markitect Team +--- + +# Follow-Up Letter + +## Greeting + +Dear Ada Lovelace, + +## Body + +Thank you for the thoughtful discussion about structured Markdown generation. +We reviewed the requirements and will send a concise proposal that separates +document contracts, field prefill, validation diagnostics, and optional semantic +assessment. This keeps the implementation practical while leaving room for +future automation. + +## Closing + +Kind regards, + +Markitect Team diff --git a/examples/documents/concept-note-invalid.md b/examples/documents/concept-note-invalid.md new file mode 100644 index 0000000..656f58d --- /dev/null +++ b/examples/documents/concept-note-invalid.md @@ -0,0 +1,15 @@ +--- +document_type: concept-note +concept_id: contract-diagnostic-model +status: maybe +--- + +# Contract Diagnostic Model + +## Definition + +A vague note. + +## Relationships + +It relates to other things. 
diff --git a/examples/documents/concept-note-valid.md b/examples/documents/concept-note-valid.md new file mode 100644 index 0000000..d3ff2b8 --- /dev/null +++ b/examples/documents/concept-note-valid.md @@ -0,0 +1,24 @@ +--- +document_type: concept-note +concept_id: contract-diagnostic-model +status: draft +--- + +# Contract Diagnostic Model + +## Definition + +A contract diagnostic model is the shared representation for validation, +assessment, and repair findings emitted by Markitect pipeline tools. + +## Assertions + +The central claim is that authors and agents need one diagnostic vocabulary +across structural checks, field checks, metric bands, and semantic assessments. +Evidence comes from the way legacy Markitect scattered related failures across +different subsystems. + +## Relationships + +The model relates to document contracts, form fields, section specifications, +and future LLM rubric adapters. diff --git a/examples/documents/prd-frs-invalid.md b/examples/documents/prd-frs-invalid.md new file mode 100644 index 0000000..09558e6 --- /dev/null +++ b/examples/documents/prd-frs-invalid.md @@ -0,0 +1,18 @@ +--- +document_type: prd-frs +product: Markitect Tool +--- + +# Thin PRD + +## Problem + +The document is too vague. + +## Goals + +The goals are listed without criteria. + +## Implementation Plan + +Build something immediately. diff --git a/examples/documents/prd-frs-valid.md b/examples/documents/prd-frs-valid.md new file mode 100644 index 0000000..05d20fb --- /dev/null +++ b/examples/documents/prd-frs-valid.md @@ -0,0 +1,31 @@ +--- +document_type: prd-frs +product: Markitect Tool +owner: Platform Architecture +--- + +# Markitect Tool PRD And FRS + +## Problem + +Markdown pipelines often check document shape without knowing whether the +sections contain the content needed by authors, reviewers, and generation +agents. + +## Goals + +The product should make document contracts testable. 
Success metrics include +clear diagnostics, stable CLI behavior, and examples that show how contracts +apply to real document types. + +## Functional Requirements + +- Load Markdown contract files with fenced YAML contract blocks. +- Check required fields and section presence. +- Report metric bands and deterministic assertions. +- Produce machine-readable and human-readable diagnostics. + +## Non-Goals + +The first release does not execute provider-specific LLM calls or provide a UI +form renderer. diff --git a/examples/documents/workplan-invalid.md b/examples/documents/workplan-invalid.md new file mode 100644 index 0000000..fcee435 --- /dev/null +++ b/examples/documents/workplan-invalid.md @@ -0,0 +1,14 @@ +--- +document_type: workplan +status: blocked +--- + +# Weak Workplan + +## Purpose + +There is not enough implementation shape here. + +## Tasks + +The task list is prose only. diff --git a/examples/documents/workplan-valid.md b/examples/documents/workplan-valid.md new file mode 100644 index 0000000..b243517 --- /dev/null +++ b/examples/documents/workplan-valid.md @@ -0,0 +1,26 @@ +--- +document_type: workplan +id: MKTT-WP-EXAMPLE +status: active +--- + +# Example Workplan + +## Purpose + +Establish a focused implementation slice with enough structure for State Hub, +human review, and follow-on implementation. + +## Tasks + +```task +id: MKTT-WP-EXAMPLE-T001 +status: todo +priority: high +``` + +Implement the smallest practical behavior and verify it through the CLI. + +## Decision Point + +Continue only if diagnostics are clear enough for humans and agents. 
diff --git a/src/markitect_tool/__init__.py b/src/markitect_tool/__init__.py index 488bf00..9bf97ec 100644 --- a/src/markitect_tool/__init__.py +++ b/src/markitect_tool/__init__.py @@ -9,6 +9,18 @@ from markitect_tool.core import ( parse_markdown, parse_markdown_file, ) +from markitect_tool.contract import ( + ContractCheckResult, + ContractValidationResult, + DocumentContract, + check_document_contract, + check_markdown_file, + collect_metrics, + load_contract_file, + validate_contract, + validate_contract_file, +) +from markitect_tool.diagnostics import Diagnostic, SourceLocation from markitect_tool.schema import ( MarkdownSchema, SchemaValidationResult, @@ -32,4 +44,15 @@ __all__ = [ "load_schema_file", "validate_document", "validate_markdown_file", + "ContractCheckResult", + "ContractValidationResult", + "DocumentContract", + "check_document_contract", + "check_markdown_file", + "collect_metrics", + "load_contract_file", + "validate_contract", + "validate_contract_file", + "Diagnostic", + "SourceLocation", ] diff --git a/src/markitect_tool/cli/main.py b/src/markitect_tool/cli/main.py index a0ff5bf..331caa3 100644 --- a/src/markitect_tool/cli/main.py +++ b/src/markitect_tool/cli/main.py @@ -9,6 +9,13 @@ import click import yaml from markitect_tool.core import parse_markdown_file +from markitect_tool.contract import ( + ContractLoaderError, + check_markdown_file, + collect_metrics, + load_contract_file, + validate_contract, +) from markitect_tool.schema import load_schema_file, validate_markdown_file, validate_schema @@ -41,6 +48,23 @@ def parse(file: Path, output_format: str) -> None: click.echo(json.dumps(data, indent=2, ensure_ascii=False)) +@main.command() +@click.argument("file", type=click.Path(exists=True, dir_okay=False, path_type=Path)) +@click.option( + "--format", + "output_format", + type=click.Choice(["json", "yaml", "text"], case_sensitive=False), + default="text", + show_default=True, +) +def metrics(file: Path, output_format: str) -> None: + 
"""Report practical size and complexity metrics for a Markdown file.""" + + document = parse_markdown_file(file) + data = collect_metrics(document).to_dict() | {"document_path": str(file)} + _emit_metrics(data, output_format) + + @main.command() @click.argument("file", type=click.Path(exists=True, dir_okay=False, path_type=Path)) @click.option( @@ -88,6 +112,54 @@ def schema_validate(schema_file: Path, output_format: str) -> None: raise click.exceptions.Exit(0 if result.valid else 1) +@main.group() +def contract() -> None: + """Work with Markdown document contracts.""" + + +@contract.command("validate") +@click.argument("contract_file", type=click.Path(exists=True, dir_okay=False, path_type=Path)) +@click.option( + "--format", + "output_format", + type=click.Choice(["json", "yaml", "text"], case_sensitive=False), + default="text", + show_default=True, +) +def contract_validate(contract_file: Path, output_format: str) -> None: + """Validate that a Markdown contract file is well formed.""" + + result = validate_contract(load_contract_file(contract_file)) + _emit_diagnostic_result(result.to_dict(), output_format) + raise click.exceptions.Exit(0 if result.valid else 1) + + +@contract.command("check") +@click.argument("file", type=click.Path(exists=True, dir_okay=False, path_type=Path)) +@click.option( + "--contract", + "contract_file", + required=True, + type=click.Path(exists=True, dir_okay=False, path_type=Path), +) +@click.option( + "--format", + "output_format", + type=click.Choice(["json", "yaml", "text"], case_sensitive=False), + default="text", + show_default=True, +) +def contract_check(file: Path, contract_file: Path, output_format: str) -> None: + """Check a Markdown file against a Markdown document contract.""" + + try: + result = check_markdown_file(file, contract_file) + except ContractLoaderError as exc: + raise click.ClickException(str(exc)) from exc + _emit_diagnostic_result(result.to_dict(), output_format) + raise click.exceptions.Exit(0 if 
result.valid else 1) + + def _emit_result(data: dict, output_format: str) -> None: if output_format == "json": click.echo(json.dumps(data, indent=2, ensure_ascii=False)) @@ -102,5 +174,45 @@ def _emit_result(data: dict, output_format: str) -> None: click.echo(f"- {violation['path']}: {violation['message']}") +def _emit_diagnostic_result(data: dict, output_format: str) -> None: + if output_format == "json": + click.echo(json.dumps(data, indent=2, ensure_ascii=False)) + elif output_format == "yaml": + click.echo(yaml.safe_dump(data, sort_keys=False)) + else: + click.echo("valid" if data.get("valid") else "invalid") + for diagnostic in data.get("diagnostics", []): + click.echo( + f"- [{diagnostic['severity']}] {diagnostic['code']}: " + f"{diagnostic['message']}" + ) + if diagnostic.get("source"): + source = diagnostic["source"] + suffix = f":{source['line']}" if source.get("line") else "" + click.echo(f" source: {source.get('path', '')}{suffix}") + if diagnostic.get("guidance"): + click.echo(f" guidance: {diagnostic['guidance']}") + + +def _emit_metrics(data: dict, output_format: str) -> None: + if output_format == "json": + click.echo(json.dumps(data, indent=2, ensure_ascii=False)) + elif output_format == "yaml": + click.echo(yaml.safe_dump(data, sort_keys=False)) + else: + doc = data["document"] + click.echo("document") + for metric, value in doc.items(): + click.echo(f"- {metric}: {value}") + sections = data.get("sections", []) + if sections: + click.echo("sections") + for section in sections: + click.echo( + f"- {section['heading']}: words={section['words']}, " + f"paragraphs={section['paragraphs']}, line={section['line']}" + ) + + if __name__ == "__main__": main() diff --git a/src/markitect_tool/contract/__init__.py b/src/markitect_tool/contract/__init__.py new file mode 100644 index 0000000..c9acc26 --- /dev/null +++ b/src/markitect_tool/contract/__init__.py @@ -0,0 +1,47 @@ +"""Document contract loading, metrics, and validation.""" + +from 
markitect_tool.contract.checker import ( + ContractCheckResult, + ContractValidationResult, + check_document_contract, + check_markdown_file, + validate_contract, + validate_contract_file, +) +from markitect_tool.contract.loader import ( + ContractLoaderError, + ContractNotFoundError, + InvalidContractFormatError, + load_contract_file, + load_contract_text, +) +from markitect_tool.contract.metrics import DocumentMetrics, SectionMetrics, collect_metrics +from markitect_tool.contract.model import ( + AssertionSpec, + DocumentContract, + FieldSpec, + MetricBand, + SectionSpec, +) + +__all__ = [ + "AssertionSpec", + "ContractCheckResult", + "ContractLoaderError", + "ContractNotFoundError", + "ContractValidationResult", + "DocumentContract", + "DocumentMetrics", + "FieldSpec", + "InvalidContractFormatError", + "MetricBand", + "SectionMetrics", + "SectionSpec", + "check_document_contract", + "check_markdown_file", + "collect_metrics", + "load_contract_file", + "load_contract_text", + "validate_contract", + "validate_contract_file", +] diff --git a/src/markitect_tool/contract/checker.py b/src/markitect_tool/contract/checker.py new file mode 100644 index 0000000..167d2c1 --- /dev/null +++ b/src/markitect_tool/contract/checker.py @@ -0,0 +1,945 @@ +"""Validate contracts and check Markdown documents against them.""" + +from __future__ import annotations + +import re +from dataclasses import dataclass, field +from pathlib import Path +from typing import Any + +from markitect_tool.contract.loader import load_contract_file +from markitect_tool.contract.metrics import DocumentMetrics, SectionMetrics, collect_metrics +from markitect_tool.contract.model import ( + FIELD_TYPES, + METRIC_NAMES, + PRESENCE_VALUES, + AssertionSpec, + DocumentContract, + FieldSpec, + MetricBand, + SectionSpec, + normalize_metric_name, +) +from markitect_tool.core import Document, Section, parse_markdown_file +from markitect_tool.diagnostics import ( + Diagnostic, + SourceLocation, + has_error, + 
valid_severity, +) + + +@dataclass(frozen=True) +class ContractValidationResult: + """Validation result for a contract definition.""" + + valid: bool + diagnostics: list[Diagnostic] + contract_path: str | None = None + + def to_dict(self) -> dict[str, Any]: + data = { + "valid": self.valid, + "diagnostics": [diagnostic.to_dict() for diagnostic in self.diagnostics], + "contract_path": self.contract_path, + } + return {key: value for key, value in data.items() if value is not None} + + +@dataclass(frozen=True) +class ContractCheckResult: + """Check result for one document and one contract.""" + + valid: bool + diagnostics: list[Diagnostic] + document_path: str | None = None + contract_path: str | None = None + metrics: dict[str, Any] = field(default_factory=dict) + + def to_dict(self) -> dict[str, Any]: + data = { + "valid": self.valid, + "diagnostics": [diagnostic.to_dict() for diagnostic in self.diagnostics], + "document_path": self.document_path, + "contract_path": self.contract_path, + "metrics": self.metrics or None, + } + return {key: value for key, value in data.items() if value is not None} + + +def validate_contract_file(contract_path: str | Path) -> ContractValidationResult: + """Load and validate a Markdown contract file.""" + + return validate_contract(load_contract_file(contract_path)) + + +def validate_contract(contract: DocumentContract) -> ContractValidationResult: + """Validate the contract definition itself.""" + + diagnostics: list[Diagnostic] = [] + contract_location = _contract_location(contract) + + if not contract.id: + diagnostics.append( + Diagnostic( + severity="error", + code="contract.id.missing", + message="Contract must declare an id.", + contract=contract_location, + guidance="Add `id` to the contract YAML block or frontmatter.", + ) + ) + if not contract.document_type: + diagnostics.append( + Diagnostic( + severity="error", + code="contract.document_type.missing", + message="Contract must declare the document type it governs.", + 
contract=contract_location, + guidance="Add `document.type` or `document_type` to the contract.", + ) + ) + + section_ids: set[str] = set() + for section in contract.sections: + diagnostics.extend(_validate_section_spec(section, contract)) + if section.id: + if section.id in section_ids: + diagnostics.append( + Diagnostic( + severity="error", + code="contract.section.id.duplicate", + message=f"Section id `{section.id}` is declared more than once.", + contract=contract_location, + rule_id=section.id, + ) + ) + section_ids.add(section.id) + + for field_spec in contract.fields: + diagnostics.extend(_validate_field_spec(field_spec, contract)) + for band in contract.metrics: + diagnostics.extend(_validate_metric_band(band, contract, rule_id=band.rule_id)) + for assertion in contract.assertions: + diagnostics.extend(_validate_assertion(assertion, contract)) + + return ContractValidationResult( + valid=not has_error(diagnostics), + diagnostics=diagnostics, + contract_path=contract.source_path, + ) + + +def check_markdown_file( + markdown_path: str | Path, contract_path: str | Path +) -> ContractCheckResult: + """Parse and check a Markdown file against a contract file.""" + + document = parse_markdown_file(markdown_path) + contract = load_contract_file(contract_path) + return check_document_contract(document, contract) + + +def check_document_contract( + document: Document, contract: DocumentContract +) -> ContractCheckResult: + """Check a parsed Markdown document against a document contract.""" + + contract_validation = validate_contract(contract) + document_metrics = collect_metrics(document) + diagnostics = list(contract_validation.diagnostics) + if contract_validation.valid: + diagnostics.extend(_check_document_type(document, contract)) + diagnostics.extend(_check_fields(document, contract)) + diagnostics.extend(_check_document_metrics(document, contract, document_metrics)) + diagnostics.extend(_check_assertions(document.body, contract.assertions, document, contract)) 
+    diagnostics.extend(_check_sections(document, contract, document_metrics))
+
+    return ContractCheckResult(
+        valid=not has_error(diagnostics),
+        diagnostics=diagnostics,
+        document_path=document.source_path,
+        contract_path=contract.source_path,
+        metrics=document_metrics.to_dict(),
+    )
+
+
+def _validate_section_spec(
+    section: SectionSpec, contract: DocumentContract
+) -> list[Diagnostic]:
+    diagnostics: list[Diagnostic] = []
+    contract_location = _contract_location(contract)
+    if not section.id:
+        diagnostics.append(
+            Diagnostic(
+                severity="error",
+                code="contract.section.id.missing",
+                message="Every section specification must declare an id.",
+                contract=contract_location,
+            )
+        )
+    if section.presence not in PRESENCE_VALUES:
+        diagnostics.append(
+            Diagnostic(
+                severity="error",
+                code="contract.section.presence.invalid",
+                message=(
+                    f"Section `{section.id or ''}` uses unsupported presence "
+                    f"`{section.presence}`."
+                ),
+                contract=contract_location,
+                rule_id=section.id,
+            )
+        )
+    if section.level is not None and not isinstance(section.level, int):
+        diagnostics.append(
+            Diagnostic(
+                severity="error",
+                code="contract.section.level.invalid",
+                message=f"Section `{section.id}` level must be an integer.",
+                contract=contract_location,
+                rule_id=section.id,
+            )
+        )
+    for band in section.metrics:
+        diagnostics.extend(_validate_metric_band(band, contract, rule_id=section.id))
+    for assertion in section.assertions:
+        diagnostics.extend(_validate_assertion(assertion, contract))
+    return diagnostics
+
+
+def _validate_field_spec(field_spec: FieldSpec, contract: DocumentContract) -> list[Diagnostic]:
+    diagnostics: list[Diagnostic] = []
+    contract_location = _contract_location(contract)
+    if not field_spec.id:
+        diagnostics.append(
+            Diagnostic(
+                severity="error",
+                code="contract.field.id.missing",
+                message="Every field specification must declare an id.",
+                contract=contract_location,
+            )
+        )
+    if field_spec.type and field_spec.type not in FIELD_TYPES:
+        diagnostics.append(
+            Diagnostic(
+                severity="error",
+                code="contract.field.type.invalid",
+                message=f"Field `{field_spec.id}` uses unsupported type `{field_spec.type}`.",
+                contract=contract_location,
+                rule_id=field_spec.id,
+            )
+        )
+    if field_spec.pattern:
+        diagnostics.extend(_validate_regex(field_spec.pattern, contract, field_spec.id))
+    return diagnostics
+
+
+def _validate_metric_band(
+    band: MetricBand, contract: DocumentContract, rule_id: str | None = None
+) -> list[Diagnostic]:
+    diagnostics: list[Diagnostic] = []
+    contract_location = _contract_location(contract)
+    if not isinstance(band.raw, dict):
+        diagnostics.append(
+            Diagnostic(
+                severity="error",
+                code="contract.metric.band.invalid",
+                message=f"Metric `{band.metric}` band must be a mapping.",
+                contract=contract_location,
+                rule_id=rule_id,
+            )
+        )
+        return diagnostics
+    if band.metric not in METRIC_NAMES:
+        diagnostics.append(
+            Diagnostic(
+                severity="error",
+                code="contract.metric.unknown",
+                message=f"Unsupported metric `{band.metric}`.",
+                contract=contract_location,
+                rule_id=rule_id,
+            )
+        )
+    for severity in {band.severity, band.min_severity, band.max_severity}:
+        if severity is not None and not valid_severity(severity):
+            diagnostics.append(
+                Diagnostic(
+                    severity="error",
+                    code="contract.severity.invalid",
+                    message=f"Unsupported severity `{severity}` for metric `{band.metric}`.",
+                    contract=contract_location,
+                    rule_id=rule_id,
+                )
+            )
+    if band.min is None and band.max is None:
+        diagnostics.append(
+            Diagnostic(
+                severity="error",
+                code="contract.metric.band.empty",
+                message=f"Metric `{band.metric}` needs at least one of min or max.",
+                contract=contract_location,
+                rule_id=rule_id,
+            )
+        )
+    if band.min is not None and not isinstance(band.min, int | float):
+        diagnostics.append(
+            Diagnostic(
+                severity="error",
+                code="contract.metric.min.invalid",
+                message=f"Metric `{band.metric}` min must be numeric.",
+                contract=contract_location,
+                rule_id=rule_id,
+            )
+        )
+    if band.max is not None and not isinstance(band.max, int | float):
+        diagnostics.append(
+            Diagnostic(
+                severity="error",
+                code="contract.metric.max.invalid",
+                message=f"Metric `{band.metric}` max must be numeric.",
+                contract=contract_location,
+                rule_id=rule_id,
+            )
+        )
+    if (
+        isinstance(band.min, int | float)
+        and isinstance(band.max, int | float)
+        and band.min > band.max
+    ):
+        diagnostics.append(
+            Diagnostic(
+                severity="error",
+                code="contract.metric.band.inverted",
+                message=f"Metric `{band.metric}` min cannot be greater than max.",
+                contract=contract_location,
+                rule_id=rule_id,
+            )
+        )
+    return diagnostics
+
+
+def _validate_assertion(
+    assertion: AssertionSpec, contract: DocumentContract
+) -> list[Diagnostic]:
+    diagnostics: list[Diagnostic] = []
+    contract_location = _contract_location(contract)
+    if not valid_severity(assertion.severity):
+        diagnostics.append(
+            Diagnostic(
+                severity="error",
+                code="contract.severity.invalid",
+                message=f"Unsupported assertion severity `{assertion.severity}`.",
+                contract=contract_location,
+                rule_id=assertion.id,
+            )
+        )
+    if not any(
+        [
+            assertion.contains,
+            assertion.contains_any,
+            assertion.not_contains,
+            assertion.matches,
+            assertion.not_matches,
+        ]
+    ):
+        diagnostics.append(
+            Diagnostic(
+                severity="error",
+                code="contract.assertion.empty",
+                message="Assertion needs at least one deterministic condition.",
+                contract=contract_location,
+                rule_id=assertion.id,
+            )
+        )
+    for pattern in assertion.matches + assertion.not_matches:
+        diagnostics.extend(_validate_regex(pattern, contract, assertion.id))
+    return diagnostics
+
+
+def _validate_regex(
+    pattern: str, contract: DocumentContract, rule_id: str | None
+) -> list[Diagnostic]:
+    try:
+        re.compile(pattern)
+    except re.error as exc:
+        return [
+            Diagnostic(
+                severity="error",
+                code="contract.regex.invalid",
+                message=f"Invalid regular expression `{pattern}`: {exc}",
+                contract=_contract_location(contract),
+                rule_id=rule_id,
+            )
+        ]
+    return []
+
+
+def _check_document_type(document: Document, contract: DocumentContract) -> list[Diagnostic]:
+    declared = (
+        document.frontmatter.get("document_type")
+        or document.frontmatter.get("document-type")
+        or document.frontmatter.get("type")
+    )
+    if not declared or not contract.document_type or str(declared) == contract.document_type:
+        return []
+    return [
+        Diagnostic(
+            severity="error",
+            code="contract.document_type.mismatch",
+            message=(
+                f"Document declares type `{declared}`, but contract expects "
+                f"`{contract.document_type}`."
+            ),
+            source=SourceLocation(path=document.source_path, line=1),
+            contract=_contract_location(contract),
+            rule_id=contract.id,
+            guidance="Use the matching contract or update the document frontmatter type.",
+        )
+    ]
+
+
+def _check_fields(document: Document, contract: DocumentContract) -> list[Diagnostic]:
+    diagnostics: list[Diagnostic] = []
+    document_data = document.to_dict()
+    for field_spec in contract.fields:
+        value, exists = _resolve_path(document_data, field_spec.path or "")
+        field_location = SourceLocation(path=document.source_path, line=1)
+        if field_spec.required and not exists:
+            diagnostics.append(
+                Diagnostic(
+                    severity="error",
+                    code="contract.field.missing",
+                    message=f"Required field `{field_spec.id}` is missing.",
+                    source=field_location,
+                    contract=_contract_location(contract),
+                    rule_id=field_spec.id,
+                    guidance=f"Provide `{field_spec.path}` in the document or context.",
+                )
+            )
+            continue
+        if not exists:
+            continue
+        diagnostics.extend(_check_field_value(field_spec, value, field_location, contract))
+    return diagnostics
+
+
+def _check_field_value(
+    field_spec: FieldSpec,
+    value: Any,
+    field_location: SourceLocation,
+    contract: DocumentContract,
+) -> list[Diagnostic]:
+    diagnostics: list[Diagnostic] = []
+    if field_spec.type and not _value_matches_type(value, field_spec.type):
+        diagnostics.append(
+            Diagnostic(
+                severity="error",
+                code="contract.field.type_mismatch",
+                message=(
+                    f"Field `{field_spec.id}` must be `{field_spec.type}`, "
+                    f"got `{type(value).__name__}`."
+                ),
+                source=field_location,
+                contract=_contract_location(contract),
+                rule_id=field_spec.id,
+            )
+        )
+    if field_spec.enum is not None and value not in field_spec.enum:
+        diagnostics.append(
+            Diagnostic(
+                severity="error",
+                code="contract.field.enum",
+                message=f"Field `{field_spec.id}` must be one of {field_spec.enum}.",
+                source=field_location,
+                contract=_contract_location(contract),
+                rule_id=field_spec.id,
+            )
+        )
+    if field_spec.pattern and isinstance(value, str) and not re.search(field_spec.pattern, value):
+        diagnostics.append(
+            Diagnostic(
+                severity="error",
+                code="contract.field.pattern",
+                message=f"Field `{field_spec.id}` does not match its required pattern.",
+                source=field_location,
+                contract=_contract_location(contract),
+                rule_id=field_spec.id,
+            )
+        )
+    if field_spec.min_length is not None and hasattr(value, "__len__") and len(value) < field_spec.min_length:
+        diagnostics.append(
+            Diagnostic(
+                severity="error",
+                code="contract.field.min_length",
+                message=f"Field `{field_spec.id}` is shorter than {field_spec.min_length}.",
+                source=field_location,
+                contract=_contract_location(contract),
+                rule_id=field_spec.id,
+            )
+        )
+    if field_spec.max_length is not None and hasattr(value, "__len__") and len(value) > field_spec.max_length:
+        diagnostics.append(
+            Diagnostic(
+                severity="error",
+                code="contract.field.max_length",
+                message=f"Field `{field_spec.id}` is longer than {field_spec.max_length}.",
+                source=field_location,
+                contract=_contract_location(contract),
+                rule_id=field_spec.id,
+            )
+        )
+    if field_spec.min is not None and isinstance(value, int | float) and value < field_spec.min:
+        diagnostics.append(
+            Diagnostic(
+                severity="error",
+                code="contract.field.min",
+                message=f"Field `{field_spec.id}` is below {field_spec.min}.",
+                source=field_location,
+                contract=_contract_location(contract),
+                rule_id=field_spec.id,
+            )
+        )
+    if field_spec.max is not None and isinstance(value, int | float) and value > field_spec.max:
+        diagnostics.append(
+            Diagnostic(
+                severity="error",
+                code="contract.field.max",
+                message=f"Field `{field_spec.id}` is above {field_spec.max}.",
+                source=field_location,
+                contract=_contract_location(contract),
+                rule_id=field_spec.id,
+            )
+        )
+    return diagnostics
+
+
+def _check_document_metrics(
+    document: Document,
+    contract: DocumentContract,
+    metrics: DocumentMetrics,
+) -> list[Diagnostic]:
+    return _check_bands(
+        contract.metrics,
+        metrics.to_dict()["document"],
+        source=SourceLocation(path=document.source_path, line=1),
+        contract=contract,
+        subject=f"document `{contract.document_type or contract.id}`",
+    )
+
+
+def _check_sections(
+    document: Document,
+    contract: DocumentContract,
+    metrics: DocumentMetrics,
+) -> list[Diagnostic]:
+    diagnostics: list[Diagnostic] = []
+    section_metrics_by_index = {
+        index: section_metrics
+        for index, section_metrics in enumerate(metrics.section_metrics)
+    }
+    matches_by_id: dict[str, list[tuple[int, Section]]] = {}
+
+    for section_spec in contract.sections:
+        matches = _matching_sections(document.sections, section_spec)
+        if section_spec.id:
+            matches_by_id[section_spec.id] = matches
+        diagnostics.extend(_check_section_presence(document, contract, section_spec, matches))
+        if not matches or section_spec.presence in {"forbidden", "discouraged"}:
+            continue
+
+        if len(matches) > 1:
+            diagnostics.append(
+                Diagnostic(
+                    severity="warning",
+                    code="contract.section.duplicate",
+                    message=f"Section `{section_spec.id}` appears {len(matches)} times.",
+                    source=SourceLocation(path=document.source_path, line=matches[1][1].heading.line),
+                    contract=_contract_location(contract),
+                    rule_id=section_spec.id,
+                    guidance="Keep one authoritative section or split it into distinct section roles.",
+                )
+            )
+        for index, section in matches:
+            diagnostics.extend(_check_section_level(document, contract, section_spec, section))
+            section_metrics = section_metrics_by_index[index]
+            diagnostics.extend(
+                _check_section_metrics(document, section, section_metrics, contract, section_spec)
+            )
+            section_text = "\n".join(block.text for block in section.blocks if block.text)
+            diagnostics.extend(
+                _check_assertions(section_text, section_spec.assertions, document, contract, section)
+            )
+
+    diagnostics.extend(_check_ordering(document, contract, matches_by_id))
+    return diagnostics
+
+
+def _matching_sections(
+    sections: list[Section], section_spec: SectionSpec
+) -> list[tuple[int, Section]]:
+    expected = {_normalize_heading(value) for value in section_spec.headings}
+    if not expected:
+        return []
+    return [
+        (index, section)
+        for index, section in enumerate(sections)
+        if _normalize_heading(section.heading.text) in expected
+    ]
+
+
+def _check_section_presence(
+    document: Document,
+    contract: DocumentContract,
+    section_spec: SectionSpec,
+    matches: list[tuple[int, Section]],
+) -> list[Diagnostic]:
+    if matches and section_spec.presence == "forbidden":
+        return [
+            Diagnostic(
+                severity="error",
+                code="contract.section.forbidden",
+                message=f"Forbidden section `{section_spec.id}` is present.",
+                source=SourceLocation(path=document.source_path, line=matches[0][1].heading.line),
+                contract=_contract_location(contract),
+                rule_id=section_spec.id,
+                guidance=f"Remove the `{matches[0][1].heading.text}` section.",
+            )
+        ]
+    if matches and section_spec.presence == "discouraged":
+        return [
+            Diagnostic(
+                severity="warning",
+                code="contract.section.discouraged",
+                message=f"Discouraged section `{section_spec.id}` is present.",
+                source=SourceLocation(path=document.source_path, line=matches[0][1].heading.line),
+                contract=_contract_location(contract),
+                rule_id=section_spec.id,
+            )
+        ]
+    if not matches and section_spec.presence == "required":
+        return [
+            Diagnostic(
+                severity="error",
+                code="contract.section.missing",
+                message=f"Required section `{section_spec.id}` is missing.",
+                source=SourceLocation(path=document.source_path),
+                contract=_contract_location(contract),
+                rule_id=section_spec.id,
+                guidance=_section_guidance(section_spec),
+            )
+        ]
+    if not matches and section_spec.presence == "recommended":
+        return [
+            Diagnostic(
+                severity="warning",
+                code="contract.section.recommended_missing",
+                message=f"Recommended section `{section_spec.id}` is missing.",
+                source=SourceLocation(path=document.source_path),
+                contract=_contract_location(contract),
+                rule_id=section_spec.id,
+                guidance=_section_guidance(section_spec),
+            )
+        ]
+    return []
+
+
+def _check_section_level(
+    document: Document,
+    contract: DocumentContract,
+    section_spec: SectionSpec,
+    section: Section,
+) -> list[Diagnostic]:
+    if section_spec.level is None or section.heading.level == section_spec.level:
+        return []
+    return [
+        Diagnostic(
+            severity="error",
+            code="contract.section.level",
+            message=(
+                f"Section `{section_spec.id}` must use heading level "
+                f"{section_spec.level}, got {section.heading.level}."
+            ),
+            source=SourceLocation(path=document.source_path, line=section.heading.line),
+            contract=_contract_location(contract),
+            rule_id=section_spec.id,
+            guidance=f"Change the heading to {'#' * section_spec.level} {section.heading.text}.",
+        )
+    ]
+
+
+def _check_section_metrics(
+    document: Document,
+    section: Section,
+    section_metrics: SectionMetrics,
+    contract: DocumentContract,
+    section_spec: SectionSpec,
+) -> list[Diagnostic]:
+    return _check_bands(
+        section_spec.metrics,
+        section_metrics.to_dict(),
+        source=SourceLocation(path=document.source_path, line=section.heading.line),
+        contract=contract,
+        subject=f"section `{section.heading.text}`",
+        rule_id=section_spec.id,
+    )
+
+
+def _check_ordering(
+    document: Document,
+    contract: DocumentContract,
+    matches_by_id: dict[str, list[tuple[int, Section]]],
+) -> list[Diagnostic]:
+    diagnostics: list[Diagnostic] = []
+    for section_spec in contract.sections:
+        if not section_spec.id or not matches_by_id.get(section_spec.id):
+            continue
+        index = matches_by_id[section_spec.id][0][0]
+        for target in section_spec.order_before:
+            target_match = matches_by_id.get(target)
+            if target_match and index > target_match[0][0]:
+                diagnostics.append(
+                    Diagnostic(
+                        severity="error",
+                        code="contract.section.order",
+                        message=f"Section `{section_spec.id}` must appear before `{target}`.",
+                        source=SourceLocation(
+                            path=document.source_path,
+                            line=matches_by_id[section_spec.id][0][1].heading.line,
+                        ),
+                        contract=_contract_location(contract),
+                        rule_id=section_spec.id,
+                    )
+                )
+        for target in section_spec.order_after:
+            target_match = matches_by_id.get(target)
+            if target_match and index < target_match[0][0]:
+                diagnostics.append(
+                    Diagnostic(
+                        severity="error",
+                        code="contract.section.order",
+                        message=f"Section `{section_spec.id}` must appear after `{target}`.",
+                        source=SourceLocation(
+                            path=document.source_path,
+                            line=matches_by_id[section_spec.id][0][1].heading.line,
+                        ),
+                        contract=_contract_location(contract),
+                        rule_id=section_spec.id,
+                    )
+                )
+    return diagnostics
+
+
+def _check_bands(
+    bands: list[MetricBand],
+    values: dict[str, Any],
+    *,
+    source: SourceLocation,
+    contract: DocumentContract,
+    subject: str,
+    rule_id: str | None = None,
+) -> list[Diagnostic]:
+    diagnostics: list[Diagnostic] = []
+    for band in bands:
+        metric = normalize_metric_name(band.metric)
+        if metric not in values:
+            continue
+        actual = values[metric]
+        if band.min is not None and actual < band.min:
+            diagnostics.append(
+                Diagnostic(
+                    severity=band.severity_for("min"),
+                    code="contract.metric.too_low",
+                    message=(
+                        f"{subject} has {actual} {metric}; expected at least {band.min}."
+                    ),
+                    source=source,
+                    contract=_contract_location(contract),
+                    rule_id=band.rule_id or rule_id,
+                    guidance=band.guidance,
+                    details={"metric": metric, "actual": actual, "min": band.min},
+                )
+            )
+        if band.max is not None and actual > band.max:
+            diagnostics.append(
+                Diagnostic(
+                    severity=band.severity_for("max"),
+                    code="contract.metric.too_high",
+                    message=f"{subject} has {actual} {metric}; expected at most {band.max}.",
+                    source=source,
+                    contract=_contract_location(contract),
+                    rule_id=band.rule_id or rule_id,
+                    guidance=band.guidance,
+                    details={"metric": metric, "actual": actual, "max": band.max},
+                )
+            )
+    return diagnostics
+
+
+def _check_assertions(
+    text: str,
+    assertions: list[AssertionSpec],
+    document: Document,
+    contract: DocumentContract,
+    section: Section | None = None,
+) -> list[Diagnostic]:
+    diagnostics: list[Diagnostic] = []
+    source_line = section.heading.line if section else 1
+    for assertion in assertions:
+        diagnostics.extend(
+            _check_assertion(
+                text,
+                assertion,
+                source=SourceLocation(path=document.source_path, line=source_line),
+                contract=contract,
+            )
+        )
+    return diagnostics
+
+
+def _check_assertion(
+    text: str,
+    assertion: AssertionSpec,
+    *,
+    source: SourceLocation,
+    contract: DocumentContract,
+) -> list[Diagnostic]:
+    diagnostics: list[Diagnostic] = []
+    haystack = text if assertion.case_sensitive else text.lower()
+
+    for needle in assertion.contains:
+        expected = needle if assertion.case_sensitive else needle.lower()
+        if expected not in haystack:
+            diagnostics.append(
+                _assertion_diagnostic(
+                    assertion,
+                    "contract.assertion.contains_missing",
+                    assertion.message or f"Expected content to contain `{needle}`.",
+                    source,
+                    contract,
+                    {"expected": needle},
+                )
+            )
+
+    if assertion.contains_any:
+        if not any(
+            (needle if assertion.case_sensitive else needle.lower()) in haystack
+            for needle in assertion.contains_any
+        ):
+            diagnostics.append(
+                _assertion_diagnostic(
+                    assertion,
+                    "contract.assertion.contains_any_missing",
+                    assertion.message
+                    or f"Expected content to contain one of {assertion.contains_any}.",
+                    source,
+                    contract,
+                    {"expected_any": assertion.contains_any},
+                )
+            )
+
+    for needle in assertion.not_contains:
+        forbidden = needle if assertion.case_sensitive else needle.lower()
+        if forbidden in haystack:
+            diagnostics.append(
+                _assertion_diagnostic(
+                    assertion,
+                    "contract.assertion.forbidden_content",
+                    assertion.message or f"Content must not contain `{needle}`.",
+                    source,
+                    contract,
+                    {"forbidden": needle},
+                )
+            )
+
+    regex_flags = 0 if assertion.case_sensitive else re.IGNORECASE
+    for pattern in assertion.matches:
+        if not re.search(pattern, text, flags=regex_flags | re.MULTILINE):
+            diagnostics.append(
+                _assertion_diagnostic(
+                    assertion,
+                    "contract.assertion.pattern_missing",
+                    assertion.message or f"Expected content to match `{pattern}`.",
+                    source,
+                    contract,
+                    {"pattern": pattern},
+                )
+            )
+    for pattern in assertion.not_matches:
+        if re.search(pattern, text, flags=regex_flags | re.MULTILINE):
+            diagnostics.append(
+                _assertion_diagnostic(
+                    assertion,
+                    "contract.assertion.forbidden_pattern",
+                    assertion.message or f"Content must not match `{pattern}`.",
+                    source,
+                    contract,
+                    {"pattern": pattern},
+                )
+            )
+    return diagnostics
+
+
+def _assertion_diagnostic(
+    assertion: AssertionSpec,
+    code: str,
+    message: str,
+    source: SourceLocation,
+    contract: DocumentContract,
+    details: dict[str, Any],
+) -> Diagnostic:
+    return Diagnostic(
+        severity=assertion.severity,
+        code=code,
+        message=message,
+        source=source,
+        contract=_contract_location(contract),
+        rule_id=assertion.id,
+        guidance=assertion.guidance,
+        details=details,
+    )
+
+
+def _section_guidance(section_spec: SectionSpec) -> str:
+    heading = section_spec.title or (section_spec.headings[0] if section_spec.headings else section_spec.id)
+    level = section_spec.level or 2
+    return f"Add a {'#' * level} {heading} section."
+
+
+def _contract_location(contract: DocumentContract) -> SourceLocation:
+    return SourceLocation(path=contract.source_path, line=contract.source_line)
+
+
+def _normalize_heading(text: str) -> str:
+    return re.sub(r"\s+", " ", text.strip().lower())
+
+
+def _resolve_path(data: dict[str, Any], path: str) -> tuple[Any, bool]:
+    if not path:
+        return None, False
+    normalized = path.removeprefix("$.").removeprefix("document.")
+    current: Any = data
+    for part in normalized.split("."):
+        if isinstance(current, dict) and part in current:
+            current = current[part]
+        else:
+            return None, False
+    return current, True
+
+
+def _value_matches_type(value: Any, expected_type: str) -> bool:
+    if expected_type == "string":
+        return isinstance(value, str)
+    if expected_type == "number":
+        return isinstance(value, int | float) and not isinstance(value, bool)
+    if expected_type == "integer":
+        return isinstance(value, int) and not isinstance(value, bool)
+    if expected_type == "boolean":
+        return isinstance(value, bool)
+    if expected_type == "array":
+        return isinstance(value, list)
+    if expected_type == "object":
+        return isinstance(value, dict)
+    if expected_type == "date":
+        return isinstance(value, str)
+    return True
diff --git a/src/markitect_tool/contract/loader.py b/src/markitect_tool/contract/loader.py
new file mode 100644
index 0000000..0636b7c
--- /dev/null
+++ b/src/markitect_tool/contract/loader.py
@@ -0,0 +1,142 @@
+"""Load document contracts from Markdown files."""
+
+from __future__ import annotations
+
+from copy import deepcopy
+from pathlib import Path
+from typing import Any
+
+import yaml
+
+from markitect_tool.contract.model import DocumentContract
+from markitect_tool.core import parse_markdown
+
+
+class ContractLoaderError(ValueError):
+    """Raised when a contract file cannot be loaded."""
+
+
+class ContractNotFoundError(ContractLoaderError):
+    """Raised when no contract definition can be found in a Markdown file."""
+
+
+class InvalidContractFormatError(ContractLoaderError):
+    """Raised when the contract definition is not valid YAML."""
+
+
+def load_contract_file(path: str | Path) -> DocumentContract:
+    """Load a Markdown-native document contract file."""
+
+    file_path = Path(path)
+    text = file_path.read_text(encoding="utf-8")
+    return load_contract_text(text, source_path=str(file_path))
+
+
+def load_contract_text(text: str, source_path: str | None = None) -> DocumentContract:
+    """Load a document contract from Markdown text."""
+
+    document = parse_markdown(text, source_path=source_path)
+    frontmatter_contract = document.frontmatter.get("contract")
+    if frontmatter_contract is not None and not isinstance(frontmatter_contract, dict):
+        raise InvalidContractFormatError("Frontmatter `contract` must be a mapping")
+
+    block_data, block_line = _extract_contract_block(document.tokens, source_path)
+    merged = _merge_contracts(frontmatter_contract or {}, block_data or {})
+
+    metadata = {
+        key: value
+        for key, value in document.frontmatter.items()
+        if key != "contract"
+    }
+    if not merged and _looks_like_contract(metadata):
+        merged = deepcopy(metadata)
+    if not merged:
+        raise ContractNotFoundError(
+            "No contract definition found. Add a fenced ```yaml contract block."
+        )
+    return DocumentContract.from_mapping(
+        merged,
+        metadata=metadata,
+        source_path=source_path,
+        source_line=block_line,
+    )
+
+
+def _extract_contract_block(
+    tokens: list[dict[str, Any]], source_path: str | None
+) -> tuple[dict[str, Any] | None, int | None]:
+    yaml_candidates: list[tuple[dict[str, Any], int | None, bool]] = []
+    for token in tokens:
+        if token.get("type") != "fence":
+            continue
+        info = str(token.get("info", "")).strip().lower()
+        if not _is_yaml_info(info):
+            continue
+        line = _token_line(token)
+        raw_yaml = token.get("content", "")
+        try:
+            data = yaml.safe_load(raw_yaml) if raw_yaml.strip() else {}
+        except yaml.YAMLError as exc:
+            raise InvalidContractFormatError(
+                f"Invalid YAML contract block in {source_path or ''}: {exc}"
+            ) from exc
+        if data is None:
+            data = {}
+        if not isinstance(data, dict):
+            raise InvalidContractFormatError("Contract YAML block must be a mapping")
+        yaml_candidates.append((data, line, "contract" in info.split()))
+
+    for data, line, explicit in yaml_candidates:
+        if explicit:
+            return data, line
+    for data, line, _explicit in yaml_candidates:
+        if _looks_like_contract(data):
+            return data, line
+    return None, None
+
+
+def _is_yaml_info(info: str) -> bool:
+    parts = info.split()
+    return "yaml" in parts or "yml" in parts
+
+
+def _token_line(token: dict[str, Any]) -> int | None:
+    token_map = token.get("map")
+    if not token_map:
+        return None
+    return int(token_map[0]) + 1
+
+
+def _looks_like_contract(data: dict[str, Any]) -> bool:
+    return any(
+        key in data
+        for key in {
+            "document",
+            "document_type",
+            "document-type",
+            "sections",
+            "fields",
+            "metrics",
+            "metric_bands",
+            "assertions",
+            "forms",
+            "rubrics",
+        }
+    )
+
+
+def _merge_contracts(
+    frontmatter_contract: dict[str, Any], block_contract: dict[str, Any]
+) -> dict[str, Any]:
+    merged = deepcopy(frontmatter_contract)
+    for key, value in block_contract.items():
+        if (
+            isinstance(value, dict)
+            and isinstance(merged.get(key), dict)
+        ):
+            nested = deepcopy(merged[key])
+            nested.update(value)
+            merged[key] = nested
+        else:
+            merged[key] = value
+    return merged
diff --git a/src/markitect_tool/contract/metrics.py b/src/markitect_tool/contract/metrics.py
new file mode 100644
index 0000000..e973065
--- /dev/null
+++ b/src/markitect_tool/contract/metrics.py
@@ -0,0 +1,127 @@
+"""Metric extraction for parsed Markdown documents."""
+
+from __future__ import annotations
+
+import re
+from dataclasses import dataclass, field
+from typing import Any
+
+from markitect_tool.core import Document, Section
+
+
+WORD_RE = re.compile(r"[A-Za-z0-9]+(?:[-'][A-Za-z0-9]+)*")
+SENTENCE_RE = re.compile(r"[.!?]+(?:\s|$)")
+LIST_ITEM_RE = re.compile(r"^\s*(?:[-+*]|\d+[.)])\s+", re.MULTILINE)
+
+
+@dataclass(frozen=True)
+class SectionMetrics:
+    """Metrics for one heading-led section."""
+
+    heading: str
+    line: int
+    level: int
+    characters: int
+    words: int
+    sentences: int
+    paragraphs: int
+    sections: int = 1
+    headings: int = 1
+    list_items: int = 0
+    code_blocks: int = 0
+    nesting_depth: int = 1
+
+    def to_dict(self) -> dict[str, Any]:
+        return {
+            "heading": self.heading,
+            "line": self.line,
+            "level": self.level,
+            "characters": self.characters,
+            "words": self.words,
+            "sentences": self.sentences,
+            "paragraphs": self.paragraphs,
+            "sections": self.sections,
+            "headings": self.headings,
+            "list_items": self.list_items,
+            "code_blocks": self.code_blocks,
+            "nesting_depth": self.nesting_depth,
+        }
+
+
+@dataclass(frozen=True)
+class DocumentMetrics:
+    """Metrics for a parsed Markdown document."""
+
+    characters: int
+    words: int
+    sentences: int
+    paragraphs: int
+    sections: int
+    headings: int
+    list_items: int
+    code_blocks: int
+    max_heading_depth: int
+    section_metrics: list[SectionMetrics] = field(default_factory=list)
+
+    def to_dict(self) -> dict[str, Any]:
+        return {
+            "document": {
+                "characters": self.characters,
+                "words": self.words,
+                "sentences": self.sentences,
+                "paragraphs": self.paragraphs,
+                "sections": self.sections,
+                "headings": self.headings,
+                "list_items": self.list_items,
+                "code_blocks": self.code_blocks,
+                "max_heading_depth": self.max_heading_depth,
+            },
+            "sections": [section.to_dict() for section in self.section_metrics],
+        }
+
+
+def collect_metrics(document: Document) -> DocumentMetrics:
+    """Collect document-level and section-level metrics."""
+
+    section_metrics = [_section_metrics(section) for section in document.sections]
+    text = document.body.strip()
+    return DocumentMetrics(
+        characters=len(text),
+        words=count_words(text),
+        sentences=count_sentences(text),
+        paragraphs=sum(1 for block in document.blocks if block.type == "paragraph"),
+        sections=len(document.sections),
+        headings=len(document.headings),
+        list_items=count_list_items(text),
+        code_blocks=sum(1 for block in document.blocks if block.type == "code"),
+        max_heading_depth=max((heading.level for heading in document.headings), default=0),
+        section_metrics=section_metrics,
+    )
+
+
+def count_words(text: str) -> int:
+    return len(WORD_RE.findall(text))
+
+
+def count_sentences(text: str) -> int:
+    return len(SENTENCE_RE.findall(text))
+
+
+def count_list_items(text: str) -> int:
+    return len(LIST_ITEM_RE.findall(text))
+
+
+def _section_metrics(section: Section) -> SectionMetrics:
+    text = "\n".join(block.text for block in section.blocks if block.text).strip()
+    return SectionMetrics(
+        heading=section.heading.text,
+        line=section.heading.line,
+        level=section.heading.level,
+        characters=len(text),
+        words=count_words(text),
+        sentences=count_sentences(text),
+        paragraphs=sum(1 for block in section.blocks if block.type == "paragraph"),
+        list_items=count_list_items(text),
+        code_blocks=sum(1 for block in section.blocks if block.type == "code"),
+        nesting_depth=section.heading.level,
+    )
diff --git a/src/markitect_tool/contract/model.py b/src/markitect_tool/contract/model.py
new file mode 100644
index 0000000..2c518fc
--- /dev/null
+++ b/src/markitect_tool/contract/model.py
@@ -0,0 +1,364 @@
+"""Markdown-native document contract model."""
+
+from __future__ import annotations
+
+from dataclasses import dataclass, field
+from typing import Any
+
+
+PRESENCE_VALUES = {"required", "recommended", "optional", "discouraged", "forbidden"}
+FIELD_TYPES = {
+    "string",
+    "number",
+    "integer",
+    "boolean",
+    "array",
+    "object",
+    "date",
+}
+METRIC_ALIASES = {
+    "char": "characters",
+    "chars": "characters",
+    "character": "characters",
+    "characters": "characters",
+    "word": "words",
+    "words": "words",
+    "word_count": "words",
+    "sentence": "sentences",
+    "sentences": "sentences",
+    "paragraph": "paragraphs",
+    "paragraphs": "paragraphs",
+    "section": "sections",
+    "sections": "sections",
+    "heading": "headings",
+    "headings": "headings",
+    "list_item": "list_items",
+    "list_items": "list_items",
+    "code_block": "code_blocks",
+    "code_blocks": "code_blocks",
+    "max_heading_depth": "max_heading_depth",
+    "heading_depth": "max_heading_depth",
+    "nesting_depth": "nesting_depth",
+}
+METRIC_NAMES = set(METRIC_ALIASES.values())
+
+
+@dataclass(frozen=True)
+class MetricBand:
+    """A soft or hard target for one metric."""
+
+    metric: str
+    min: float | None = None
+    max: float | None = None
+    severity: str = "warning"
+    min_severity: str | None = None
+    max_severity: str | None = None
+    rule_id: str | None = None
+    guidance: str | None = None
+    raw: Any = field(default_factory=dict)
+
+    @classmethod
+    def from_mapping(cls, metric: str, raw: Any) -> "MetricBand":
+        normalized = normalize_metric_name(metric)
+        if not isinstance(raw, dict):
+            return cls(metric=normalized, raw=raw)
+        return cls(
+            metric=normalized,
+            min=raw.get("min"),
+            max=raw.get("max"),
+            severity=str(raw.get("severity", "warning")),
+            min_severity=raw.get("min_severity"),
+            max_severity=raw.get("max_severity"),
+            rule_id=raw.get("id") or raw.get("rule_id"),
+            guidance=raw.get("guidance"),
+            raw=raw,
+        )
+
+    def severity_for(self, bound: str) -> str:
+        if bound == "min":
+            return self.min_severity or self.severity
+        if bound == "max":
+            return self.max_severity or self.severity
+        return self.severity
+
+
+@dataclass(frozen=True)
+class AssertionSpec:
+    """A deterministic assertion over document or section text."""
+
+    id: str | None = None
+    message: str | None = None
+    severity: str = "error"
+    guidance: str | None = None
+    contains: list[str] = field(default_factory=list)
+    contains_any: list[str] = field(default_factory=list)
+    not_contains: list[str] = field(default_factory=list)
+    matches: list[str] = field(default_factory=list)
+    not_matches: list[str] = field(default_factory=list)
+    case_sensitive: bool = False
+    raw: Any = field(default_factory=dict)
+
+    @classmethod
+    def from_mapping(cls, raw: Any) -> "AssertionSpec":
+        if not isinstance(raw, dict):
+            return cls(raw=raw)
+        return cls(
+            id=raw.get("id") or raw.get("rule_id"),
+            message=raw.get("message"),
+            severity=str(raw.get("severity", "error")),
+            guidance=raw.get("guidance"),
+            contains=as_string_list(raw.get("contains")),
+            contains_any=as_string_list(raw.get("contains_any") or raw.get("contains_any_of")),
+            not_contains=as_string_list(raw.get("not_contains") or raw.get("forbid")),
+            matches=as_string_list(raw.get("matches") or raw.get("pattern")),
+            not_matches=as_string_list(raw.get("not_matches") or raw.get("forbid_pattern")),
+            case_sensitive=bool(raw.get("case_sensitive", False)),
+            raw=raw,
+        )
+
+
+@dataclass(frozen=True)
+class FieldSpec:
+    """A structured value expected in frontmatter or external context."""
+
+    id: str | None
+    path: str | None = None
+    type: str | None = None
+    required: bool = False
+    label: str | None = None
+    description: str | None = None
+    enum: list[Any] | None = None
+    pattern: str | None = None
+    min: float | None = None
+    max: float | None = None
+    min_length: int | None = None
+    max_length: int | None = None
+    default: Any = None
+    source: str | None = None
+    raw: Any = field(default_factory=dict)
+
+    @classmethod
+    def from_mapping(cls, raw: Any, fallback_id: str | None = None) -> "FieldSpec":
+        if not isinstance(raw, dict):
+            return cls(id=fallback_id, raw=raw)
+        field_id = raw.get("id") or raw.get("name") or fallback_id
+        return cls(
+            id=field_id,
+            path=raw.get("path") or (f"frontmatter.{field_id}" if field_id else None),
+            type=raw.get("type"),
+            required=bool(raw.get("required", False)),
+            label=raw.get("label"),
+            description=raw.get("description"),
+            enum=raw.get("enum"),
+            pattern=raw.get("pattern"),
+            min=raw.get("min"),
+            max=raw.get("max"),
+            min_length=raw.get("min_length"),
+            max_length=raw.get("max_length"),
+            default=raw.get("default"),
+            source=raw.get("source"),
+            raw=raw,
+        )
+
+
+@dataclass(frozen=True)
+class SectionSpec:
+    """Expected semantic role and constraints for a Markdown section."""
+
+    id: str | None
+    title: str | None = None
+    section_type: str | None = None
+    presence: str = "optional"
+    headings: list[str] = field(default_factory=list)
+    level: int | None = None
+    order_before: list[str] = field(default_factory=list)
+    order_after: list[str] = field(default_factory=list)
+    metrics: list[MetricBand] = field(default_factory=list)
+    assertions: list[AssertionSpec] = field(default_factory=list)
+    raw: Any = field(default_factory=dict)
+
+    @classmethod
+    def from_mapping(cls, raw: Any, fallback_id: str | None = None) -> "SectionSpec":
+        if not isinstance(raw, dict):
+            return cls(id=fallback_id, raw=raw)
+
+        section_id = raw.get("id") or fallback_id
+        match = raw.get("match") if isinstance(raw.get("match"), dict) else {}
+        headings = unique_strings(
+            as_string_list(raw.get("headings"))
+            + as_string_list(raw.get("aliases"))
+            + as_string_list(match.get("headings"))
+            + as_string_list(match.get("aliases"))
+            + as_string_list(raw.get("title"))
+            + as_string_list(section_id)
+        )
+        order = raw.get("order") if isinstance(raw.get("order"), dict) else {}
+        return cls(
+            id=section_id,
+            title=raw.get("title"),
+            section_type=raw.get("section_type") or raw.get("type") or raw.get("role"),
+            presence=normalize_presence(raw),
+            headings=headings,
+            level=raw.get("level"),
+            order_before=as_string_list(raw.get("before") or order.get("before")),
+            order_after=as_string_list(raw.get("after") or order.get("after")),
+            metrics=metric_bands_from_mapping(raw.get("metrics")),
+            assertions=assertions_from_value(raw.get("assertions")),
+            raw=raw,
+        )
+
+
+@dataclass(frozen=True)
+class DocumentContract:
+    """A contract for a typed Markdown document."""
+
+    id: str | None
+    document_type: str | None
+    title: str | None = None
+    version: str | None = None
+    description: str | None = None
+    sections: list[SectionSpec] = field(default_factory=list)
+    fields: list[FieldSpec] = field(default_factory=list)
+    metrics: list[MetricBand] = field(default_factory=list)
+    assertions: list[AssertionSpec] = field(default_factory=list)
+    forms: list[dict[str, Any]] = field(default_factory=list)
+    context: dict[str, Any] = field(default_factory=dict)
+    rubrics: list[dict[str, Any]] = field(default_factory=list)
+    metadata: dict[str, Any] = field(default_factory=dict)
+    raw: dict[str, Any] = field(default_factory=dict)
+    source_path: str | None = None
+    source_line: int | None = None
+
+    @classmethod
+    def from_mapping(
+        cls,
+        raw: dict[str, Any],
+        *,
+        metadata: dict[str, Any] | None = None,
+        source_path: str | None = None,
+        source_line: int | None = None,
+    ) -> "DocumentContract":
+        metadata = metadata or {}
+        document = raw.get("document") if isinstance(raw.get("document"), dict) else {}
+        return cls(
+            id=raw.get("id") or metadata.get("contract-id") or metadata.get("id"),
+            document_type=(
+                raw.get("document_type")
+                or raw.get("document-type")
+                or raw.get("type")
+                or document.get("type")
+                or metadata.get("document-type")
+            ),
+            title=raw.get("title") or document.get("title") or metadata.get("title"),
+            version=str(raw.get("version") or metadata.get("version") or
"") + or None, + description=raw.get("description") or document.get("description"), + sections=sections_from_value(raw.get("sections")), + fields=fields_from_value(raw.get("fields")), + metrics=metric_bands_from_mapping( + raw.get("metrics", {}).get("document") + if isinstance(raw.get("metrics"), dict) + and isinstance(raw.get("metrics", {}).get("document"), dict) + else raw.get("metrics") or raw.get("metric_bands") + ), + assertions=assertions_from_value(raw.get("assertions")), + forms=raw.get("forms") if isinstance(raw.get("forms"), list) else [], + context=raw.get("context") if isinstance(raw.get("context"), dict) else {}, + rubrics=raw.get("rubrics") if isinstance(raw.get("rubrics"), list) else [], + metadata=metadata, + raw=raw, + source_path=source_path, + source_line=source_line, + ) + + def to_dict(self) -> dict[str, Any]: + return { + "id": self.id, + "document_type": self.document_type, + "title": self.title, + "version": self.version, + "description": self.description, + "sections": [section.raw for section in self.sections], + "fields": [field.raw for field in self.fields], + "metrics": [band.raw for band in self.metrics], + "assertions": [assertion.raw for assertion in self.assertions], + "forms": self.forms, + "context": self.context, + "rubrics": self.rubrics, + "source_path": self.source_path, + } + + +def normalize_metric_name(metric: str) -> str: + return METRIC_ALIASES.get(str(metric).strip().lower(), str(metric).strip().lower()) + + +def normalize_presence(raw: dict[str, Any]) -> str: + explicit = raw.get("presence") + if explicit: + return str(explicit) + if raw.get("forbidden") is True or raw.get("prohibited") is True: + return "forbidden" + if raw.get("discouraged") is True: + return "discouraged" + if raw.get("required") is True: + return "required" + if raw.get("recommended") is True: + return "recommended" + return "optional" + + +def sections_from_value(value: Any) -> list[SectionSpec]: + return [ + SectionSpec.from_mapping(item, 
fallback_id=fallback_id) + for fallback_id, item in items_from_value(value) + ] + + +def fields_from_value(value: Any) -> list[FieldSpec]: + return [ + FieldSpec.from_mapping(item, fallback_id=fallback_id) + for fallback_id, item in items_from_value(value) + ] + + +def assertions_from_value(value: Any) -> list[AssertionSpec]: + if value is None: + return [] + values = value if isinstance(value, list) else [value] + return [AssertionSpec.from_mapping(item) for item in values] + + +def metric_bands_from_mapping(value: Any) -> list[MetricBand]: + if not isinstance(value, dict): + return [] if value is None else [MetricBand.from_mapping("", value)] + return [MetricBand.from_mapping(metric, raw) for metric, raw in value.items()] + + +def items_from_value(value: Any) -> list[tuple[str | None, Any]]: + if value is None: + return [] + if isinstance(value, dict): + return [(str(key), item) for key, item in value.items()] + if isinstance(value, list): + return [(None, item) for item in value] + return [(None, value)] + + +def as_string_list(value: Any) -> list[str]: + if value is None: + return [] + if isinstance(value, list): + return [str(item) for item in value if item is not None] + return [str(value)] + + +def unique_strings(values: list[str]) -> list[str]: + seen: set[str] = set() + result: list[str] = [] + for value in values: + normalized = value.strip() + if normalized and normalized.lower() not in seen: + seen.add(normalized.lower()) + result.append(normalized) + return result diff --git a/src/markitect_tool/core/parser.py b/src/markitect_tool/core/parser.py index 3ae44a3..c85a072 100644 --- a/src/markitect_tool/core/parser.py +++ b/src/markitect_tool/core/parser.py @@ -29,7 +29,7 @@ def parse_markdown(markdown: str, source_path: str | None = None) -> Document: frontmatter, body, body_line_offset = _split_frontmatter(markdown) tokens = _parse_tokens(body) - blocks, headings = _blocks_and_headings(tokens, body_line_offset) + blocks, headings = 
_blocks_and_headings(tokens, body_line_offset, body) sections = _sections_from_blocks(blocks, headings) return Document( source_path=source_path, @@ -97,7 +97,7 @@ def _token_to_dict(token: Token) -> dict[str, Any]: def _blocks_and_headings( - tokens: list[dict[str, Any]], line_offset: int + tokens: list[dict[str, Any]], line_offset: int, markdown: str ) -> tuple[list[ContentBlock], list[Heading]]: blocks: list[ContentBlock] = [] headings: list[Heading] = [] @@ -126,6 +126,8 @@ def _blocks_and_headings( if not text and token_type.endswith("_open"): inline = _next_inline(tokens, index) text = inline.get("content", "") if inline else "" + if not text: + text = _source_text(token, line_offset, markdown) blocks.append( ContentBlock( type=_block_type(token_type), @@ -151,6 +153,16 @@ def _line_range(token: dict[str, Any], line_offset: int) -> tuple[int | None, in return line_map[0] + line_offset + 1, line_map[1] + line_offset +def _source_text(token: dict[str, Any], line_offset: int, markdown: str) -> str: + line_start, line_end = _line_range(token, line_offset) + if line_start is None or line_end is None: + return "" + lines = markdown.splitlines() + start_index = max(line_start - line_offset - 1, 0) + end_index = max(line_end - line_offset, start_index) + return "\n".join(lines[start_index:end_index]).strip() + + def _block_type(token_type: str) -> str: return { "paragraph_open": "paragraph", diff --git a/src/markitect_tool/diagnostics.py b/src/markitect_tool/diagnostics.py new file mode 100644 index 0000000..10de833 --- /dev/null +++ b/src/markitect_tool/diagnostics.py @@ -0,0 +1,65 @@ +"""Shared diagnostic primitives for Markitect validation layers.""" + +from __future__ import annotations + +from dataclasses import dataclass, field +from typing import Any + + +SEVERITIES = {"info", "warning", "error"} + + +@dataclass(frozen=True) +class SourceLocation: + """A source location inside a document or contract.""" + + path: str | None = None + line: int | None = None + 
column: int | None = None + + def to_dict(self) -> dict[str, Any]: + data = { + "path": self.path, + "line": self.line, + "column": self.column, + } + return {key: value for key, value in data.items() if value is not None} + + +@dataclass(frozen=True) +class Diagnostic: + """A structured validation or assessment finding.""" + + severity: str + code: str + message: str + source: SourceLocation | None = None + contract: SourceLocation | None = None + rule_id: str | None = None + guidance: str | None = None + details: dict[str, Any] = field(default_factory=dict) + + def to_dict(self) -> dict[str, Any]: + data: dict[str, Any] = { + "severity": self.severity, + "code": self.code, + "message": self.message, + "source": self.source.to_dict() if self.source else None, + "contract": self.contract.to_dict() if self.contract else None, + "rule_id": self.rule_id, + "guidance": self.guidance, + "details": self.details or None, + } + return {key: value for key, value in data.items() if value is not None} + + +def valid_severity(severity: str | None) -> bool: + """Return whether a severity is supported by the diagnostic model.""" + + return severity in SEVERITIES + + +def has_error(diagnostics: list[Diagnostic]) -> bool: + """Return whether the diagnostic list contains at least one error.""" + + return any(diagnostic.severity == "error" for diagnostic in diagnostics) diff --git a/src/markitect_tool/schema/validator.py b/src/markitect_tool/schema/validator.py index 1c9e74b..c7ccc27 100644 --- a/src/markitect_tool/schema/validator.py +++ b/src/markitect_tool/schema/validator.py @@ -9,6 +9,7 @@ from typing import Any from jsonschema import Draft202012Validator, SchemaError, ValidationError from markitect_tool.core import Document, parse_markdown_file +from markitect_tool.diagnostics import Diagnostic, SourceLocation from markitect_tool.schema.loader import MarkdownSchema, load_schema_file @@ -23,6 +24,21 @@ class ValidationViolation: def to_dict(self) -> dict[str, str]: return 
asdict(self) + def to_diagnostic( + self, + *, + source_path: str | None = None, + contract_path: str | None = None, + ) -> Diagnostic: + return Diagnostic( + severity="error", + code="schema.validation", + message=self.message, + source=SourceLocation(path=source_path), + contract=SourceLocation(path=contract_path), + details={"path": self.path, "schema_path": self.schema_path}, + ) + @dataclass(frozen=True) class SchemaValidationResult: @@ -42,6 +58,17 @@ class SchemaValidationResult: } return {key: value for key, value in data.items() if value is not None} + def to_diagnostics(self) -> list[Diagnostic]: + """Return schema violations as unified diagnostics.""" + + return [ + violation.to_diagnostic( + source_path=self.document_path, + contract_path=self.schema_path, + ) + for violation in self.violations + ] + def validate_schema(schema: dict[str, Any]) -> SchemaValidationResult: """Validate that a JSON Schema itself is well formed.""" diff --git a/tests/test_contract_framework.py b/tests/test_contract_framework.py new file mode 100644 index 0000000..84fd40d --- /dev/null +++ b/tests/test_contract_framework.py @@ -0,0 +1,336 @@ +from pathlib import Path + +from click.testing import CliRunner + +from markitect_tool.cli import main +from markitect_tool.contract import ( + check_markdown_file, + collect_metrics, + load_contract_file, + validate_contract, +) +from markitect_tool.core import parse_markdown + + +EXAMPLE_CASES = [ + ( + "adr", + Path("examples/contracts/adr.contract.md"), + Path("examples/documents/adr-valid.md"), + Path("examples/documents/adr-invalid.md"), + { + "contract.field.missing", + "contract.metric.too_low", + "contract.assertion.contains_any_missing", + "contract.section.missing", + "contract.section.recommended_missing", + "contract.section.forbidden", + }, + ), + ( + "prd-frs", + Path("examples/contracts/prd-frs.contract.md"), + Path("examples/documents/prd-frs-valid.md"), + Path("examples/documents/prd-frs-invalid.md"), + { + 
"contract.field.missing", + "contract.metric.too_low", + "contract.assertion.contains_any_missing", + "contract.section.missing", + "contract.section.recommended_missing", + "contract.section.discouraged", + }, + ), + ( + "workplan", + Path("examples/contracts/workplan.contract.md"), + Path("examples/documents/workplan-valid.md"), + Path("examples/documents/workplan-invalid.md"), + { + "contract.field.missing", + "contract.field.enum", + "contract.assertion.contains_missing", + "contract.section.recommended_missing", + }, + ), + ( + "business-letter", + Path("examples/contracts/business-letter.contract.md"), + Path("examples/documents/business-letter-valid.md"), + Path("examples/documents/business-letter-invalid.md"), + { + "contract.field.missing", + "contract.section.missing", + "contract.metric.too_low", + }, + ), + ( + "concept-note", + Path("examples/contracts/concept-note.contract.md"), + Path("examples/documents/concept-note-valid.md"), + Path("examples/documents/concept-note-invalid.md"), + { + "contract.field.enum", + "contract.metric.too_low", + "contract.section.missing", + }, + ), +] + + +CONTRACT_TEXT = """--- +title: ADR Contract +version: "1.0" +--- + +# ADR Contract + +```yaml contract +id: adr-contract-v1 +document: + type: adr + title: Architecture Decision Record +fields: + status: + type: string + required: true + enum: [proposed, accepted, superseded] +metrics: + document: + words: + min: 12 + max: 240 + severity: warning +sections: + - id: context + title: Context + presence: required + level: 2 + order: + before: decision + metrics: + words: + min: 4 + max: 80 + severity: warning + assertions: + - id: context-names-problem + contains_any: [problem, motivation] + severity: warning + guidance: Explain why the decision exists. 
+ - id: decision + title: Decision + presence: required + level: 2 + assertions: + - id: decision-commits + matches: "\\\\b(choose|adopt|use|will)\\\\b" + severity: error + guidance: State the actual decision, not only background. + - id: consequences + title: Consequences + presence: recommended + level: 2 + - id: deprecated + title: Deprecated Approach + presence: forbidden +``` +""" + + +VALID_ADR = """--- +document_type: adr +status: accepted +--- + +# Use Markdown Contracts + +## Context + +The problem is that plain heading counts do not explain whether content is useful. + +## Decision + +We will use a markdown-native document contract with deterministic diagnostics. + +## Consequences + +The tool can check author intent before generation or review work continues. +""" + + +INVALID_ADR = """--- +document_type: adr +--- + +# Weak ADR + +## Context + +This is short. + +## Deprecated Approach + +This section should not be here. +""" + + +def test_load_contract_file_extracts_markdown_yaml_contract(tmp_path: Path): + contract_file = tmp_path / "adr.contract.md" + contract_file.write_text(CONTRACT_TEXT, encoding="utf-8") + + contract = load_contract_file(contract_file) + + assert contract.id == "adr-contract-v1" + assert contract.document_type == "adr" + assert contract.fields[0].id == "status" + assert [section.id for section in contract.sections] == [ + "context", + "decision", + "consequences", + "deprecated", + ] + + +def test_validate_contract_accepts_complete_contract(tmp_path: Path): + contract_file = tmp_path / "adr.contract.md" + contract_file.write_text(CONTRACT_TEXT, encoding="utf-8") + + result = validate_contract(load_contract_file(contract_file)) + + assert result.valid is True + assert result.diagnostics == [] + + +def test_validate_contract_reports_bad_regex(tmp_path: Path): + contract_file = tmp_path / "bad.contract.md" + contract_file.write_text( + CONTRACT_TEXT.replace("\\\\b(choose|adopt|use|will)\\\\b", "[bad"), + encoding="utf-8", + ) + + 
result = validate_contract(load_contract_file(contract_file)) + + assert result.valid is False + assert result.diagnostics[0].code == "contract.regex.invalid" + + +def test_check_markdown_file_accepts_valid_document(tmp_path: Path): + contract_file = tmp_path / "adr.contract.md" + document_file = tmp_path / "adr.md" + contract_file.write_text(CONTRACT_TEXT, encoding="utf-8") + document_file.write_text(VALID_ADR, encoding="utf-8") + + result = check_markdown_file(document_file, contract_file) + + assert result.valid is True + assert result.diagnostics == [] + assert result.metrics["document"]["sections"] == 4 + + +def test_check_markdown_file_reports_practical_failures(tmp_path: Path): + contract_file = tmp_path / "adr.contract.md" + document_file = tmp_path / "adr.md" + contract_file.write_text(CONTRACT_TEXT, encoding="utf-8") + document_file.write_text(INVALID_ADR, encoding="utf-8") + + result = check_markdown_file(document_file, contract_file) + codes = {diagnostic.code for diagnostic in result.diagnostics} + + assert result.valid is False + assert "contract.field.missing" in codes + assert "contract.section.missing" in codes + assert "contract.section.forbidden" in codes + assert "contract.metric.too_low" in codes + + +def test_check_markdown_file_keeps_warning_only_results_valid(tmp_path: Path): + contract_file = tmp_path / "adr.contract.md" + document_file = tmp_path / "adr.md" + contract_file.write_text(CONTRACT_TEXT, encoding="utf-8") + document_file.write_text( + VALID_ADR.replace("The problem is", "The situation is"), + encoding="utf-8", + ) + + result = check_markdown_file(document_file, contract_file) + + assert result.valid is True + assert [diagnostic.code for diagnostic in result.diagnostics] == [ + "contract.assertion.contains_any_missing" + ] + assert result.diagnostics[0].severity == "warning" + + +def test_collect_metrics_counts_document_and_sections(): + document = parse_markdown(VALID_ADR) + + metrics = collect_metrics(document) + + assert 
metrics.words > 20 + assert metrics.sections == 4 + context_metrics = next( + section for section in metrics.section_metrics if section.heading == "Context" + ) + assert context_metrics.words >= 10 + + +def test_mkt_contract_validate(tmp_path: Path): + contract_file = tmp_path / "adr.contract.md" + contract_file.write_text(CONTRACT_TEXT, encoding="utf-8") + + result = CliRunner().invoke(main, ["contract", "validate", str(contract_file)]) + + assert result.exit_code == 0 + assert "valid" in result.output + + +def test_mkt_contract_check_reports_invalid_document(tmp_path: Path): + contract_file = tmp_path / "adr.contract.md" + document_file = tmp_path / "adr.md" + contract_file.write_text(CONTRACT_TEXT, encoding="utf-8") + document_file.write_text(INVALID_ADR, encoding="utf-8") + + result = CliRunner().invoke( + main, ["contract", "check", str(document_file), "--contract", str(contract_file)] + ) + + assert result.exit_code == 1 + assert "contract.section.missing" in result.output + assert "guidance" in result.output + + +def test_mkt_metrics_outputs_text(tmp_path: Path): + document_file = tmp_path / "adr.md" + document_file.write_text(VALID_ADR, encoding="utf-8") + + result = CliRunner().invoke(main, ["metrics", str(document_file)]) + + assert result.exit_code == 0 + assert "document" in result.output + assert "words" in result.output + assert "Context" in result.output + + +def test_example_contracts_validate(): + for _name, contract_path, _valid_path, _invalid_path, _expected in EXAMPLE_CASES: + result = validate_contract(load_contract_file(contract_path)) + + assert result.valid is True + + +def test_example_valid_documents_have_no_error_diagnostics(): + for name, contract_path, valid_path, _invalid_path, _expected in EXAMPLE_CASES: + result = check_markdown_file(valid_path, contract_path) + + assert result.valid is True, name + assert all(diagnostic.severity != "error" for diagnostic in result.diagnostics) + + +def 
test_example_invalid_documents_report_expected_diagnostics(): + for name, contract_path, _valid_path, invalid_path, expected in EXAMPLE_CASES: + result = check_markdown_file(invalid_path, contract_path) + codes = {diagnostic.code for diagnostic in result.diagnostics} + + assert result.valid is False, name + assert expected <= codes diff --git a/workplans/MKTT-WP-0001-repo-foundation.md b/workplans/MKTT-WP-0001-repo-foundation.md index ea26aa8..ae7598c 100644 --- a/workplans/MKTT-WP-0001-repo-foundation.md +++ b/workplans/MKTT-WP-0001-repo-foundation.md @@ -3,7 +3,7 @@ id: MKTT-WP-0001 type: workplan title: "markitect-tool Repository Foundation" domain: markitect -status: active +status: done owner: markitect-tool topic_slug: markitect created: "2026-05-03" diff --git a/workplans/MKTT-WP-0002-markitect-main-migration.md b/workplans/MKTT-WP-0002-markitect-main-migration.md index 2c054f9..b3cd77d 100644 --- a/workplans/MKTT-WP-0002-markitect-main-migration.md +++ b/workplans/MKTT-WP-0002-markitect-main-migration.md @@ -3,7 +3,7 @@ id: MKTT-WP-0002 type: workplan title: "markitect-main Scope Extraction" domain: markitect -status: active +status: done owner: markitect-tool topic_slug: markitect created: "2026-05-03" diff --git a/workplans/MKTT-WP-0004-practical-contract-framework.md b/workplans/MKTT-WP-0004-practical-contract-framework.md index 30d1dce..d553c72 100644 --- a/workplans/MKTT-WP-0004-practical-contract-framework.md +++ b/workplans/MKTT-WP-0004-practical-contract-framework.md @@ -3,11 +3,12 @@ id: MKTT-WP-0004 type: workplan title: "Practical Document Contract Framework" domain: markitect -status: proposed +status: done owner: markitect-tool topic_slug: markitect created: "2026-05-03" updated: "2026-05-03" +state_hub_workstream_id: "558787e1-d287-46a5-9214-634e8b90a858" --- # MKTT-WP-0004: Practical Document Contract Framework @@ -19,6 +20,24 @@ heading-count schema validation toward document contracts with section specifications, fields/forms, context-aware 
rules, metric bands, optional LLM assessments, and unified diagnostics. +## Implementation Result + +Initial deterministic contract framework implemented: + +- Markdown contract files with fenced `yaml contract` blocks. +- Shared diagnostic model with severity, code, source, contract location, + rule id, details, and repair guidance. +- Contract validation, document contract checking, and metrics CLI commands. +- Required/recommended/optional/discouraged/forbidden section specs. +- Field specs for frontmatter values. +- Document-level and section-level metric bands. +- Deterministic content assertions. +- Design documentation for form/context and provider-neutral LLM rubric + adapters. +- Example contracts, valid documents, invalid documents, and expected + diagnostic notes for ADR, PRD/FRS, workplan, business letter, and concept + note use cases. + ## Background Research and legacy comparison are captured in: @@ -31,8 +50,9 @@ Research and legacy comparison are captured in: ```task id: MKTT-WP-0004-T001 -status: todo +status: done priority: high +state_hub_task_id: "2065d56a-9371-4fd0-9a3d-7a69c718e851" ``` Define the first `DocumentContract` format in markdown/YAML: @@ -51,8 +71,9 @@ Keep it provider-neutral and readable by humans. ```task id: MKTT-WP-0004-T002 -status: todo +status: done priority: high +state_hub_task_id: "3ed3af1b-c747-492c-acda-ecb4ee564a38" ``` Create diagnostics with severity, code, message, source location, contract @@ -63,8 +84,9 @@ violations and all new contract checks. ```task id: MKTT-WP-0004-T003 -status: todo +status: done priority: high +state_hub_task_id: "c4166e5a-53a5-4207-a3fb-b4ddf388cd5e" ``` Support required, recommended, optional, discouraged, and forbidden sections. @@ -75,8 +97,9 @@ and clear diagnostics. 
```task id: MKTT-WP-0004-T004 -status: todo +status: done priority: medium +state_hub_task_id: "304af70e-1a33-4ee2-bcbd-7b966436cf37" ``` Support document-level and section-level bands for words, characters, @@ -87,32 +110,42 @@ Allow soft warnings and hard errors. ```task id: MKTT-WP-0004-T005 -status: todo +status: done priority: medium +state_hub_task_id: "1bcc82fe-b578-446c-86a7-938f732b24fa" ``` Specify fields, defaults, prefill sources, dynamic requiredness, conditional visibility, calculations, and validation against external context. This task is design-first; implementation can follow in a later workplan. +Design captured in `docs/contract-framework.md`. Runtime form rendering, +dynamic visibility, calculations, and context resolvers remain later adapter +work. + ## P4.6 - Design LLM assessment adapter contract ```task id: MKTT-WP-0004-T006 -status: todo +status: done priority: medium +state_hub_task_id: "bef295ba-fbc0-4df6-9cc4-040ed9b5f346" ``` Define provider-neutral request/response models for section-level rubrics: criteria, inputs, context, score, pass/fail, reason, model metadata, and cache keys. Do not bind core logic to any provider. +Provider-neutral adapter shape captured in `docs/contract-framework.md`. +Execution, caching, and provider integration remain later work. + ## P4.7 - Add practical CLI surface ```task id: MKTT-WP-0004-T007 -status: todo +status: done priority: high +state_hub_task_id: "9f61a5af-0b65-460a-8231-ec50279c5c6a" ``` Add: @@ -129,8 +162,9 @@ Ensure output is useful to humans and machines. ```task id: MKTT-WP-0004-T008 -status: todo +status: done priority: medium +state_hub_task_id: "7ec8c0f2-c598-4095-aefe-f6f97e84a470" ``` Create examples for: