generated from coulomb/repo-seed
Contract framework with markdown-native contracts utilizing fenced YAML blocks
This commit is contained in:
161
docs/contract-framework.md
Normal file
161
docs/contract-framework.md
Normal file
@@ -0,0 +1,161 @@
|
||||
# Document Contract Framework
|
||||
|
||||
Date: 2026-05-03
|
||||
|
||||
## Purpose
|
||||
|
||||
The contract framework makes markdown documents practically checkable. It keeps
|
||||
Markdown as the authoring surface and uses fenced YAML as a structured extension
|
||||
for rules that need machine interpretation.
|
||||
|
||||
The first implementation is deterministic. It checks document type, fields,
|
||||
sections, ordering, metric bands, and text assertions. Forms, context, and LLM
|
||||
rubrics are represented in the contract vocabulary as extension points before
|
||||
runtime adapters are added.
|
||||
|
||||
## Contract File Shape
|
||||
|
||||
A contract is a Markdown document with optional frontmatter and one fenced YAML
|
||||
block marked as `yaml contract`.
|
||||
|
||||
````markdown
|
||||
---
|
||||
title: ADR Contract
|
||||
version: "1.0"
|
||||
---
|
||||
|
||||
# ADR Contract
|
||||
|
||||
```yaml contract
|
||||
id: adr-contract-v1
|
||||
document:
|
||||
type: adr
|
||||
title: Architecture Decision Record
|
||||
fields:
|
||||
status:
|
||||
type: string
|
||||
required: true
|
||||
sections:
|
||||
- id: context
|
||||
title: Context
|
||||
presence: required
|
||||
level: 2
|
||||
metrics:
|
||||
document:
|
||||
words:
|
||||
min: 100
|
||||
max: 1200
|
||||
severity: warning
|
||||
```
|
||||
````
|
||||
|
||||
Markdown carries the explanation. YAML carries the contract.
|
||||
|
||||
## Core Terms
|
||||
|
||||
| Term | Meaning |
|
||||
| --- | --- |
|
||||
| Document contract | The machine-readable agreement for one typed Markdown artifact. |
|
||||
| Document type | A named kind such as `adr`, `prd`, `workplan`, or `business-letter`. |
|
||||
| Section spec | A semantic section role with matching headings, presence, level, order, metrics, and assertions. |
|
||||
| Field spec | A typed value expected in frontmatter or later external context. |
|
||||
| Metric band | A soft or hard size/complexity target. |
|
||||
| Assertion | A deterministic content expectation over document or section text. |
|
||||
| Diagnostic | A structured finding with severity, code, source, contract location, rule id, and guidance. |
|
||||
|
||||
## Section Presence
|
||||
|
||||
Section specs support these presence values:
|
||||
|
||||
- `required`: missing section is an error.
|
||||
- `recommended`: missing section is a warning.
|
||||
- `optional`: section is allowed but not required.
|
||||
- `discouraged`: present section is a warning.
|
||||
- `forbidden`: present section is an error.
|
||||
|
||||
Headings are matched case-insensitively against `title`, `id`, `headings`, or
|
||||
`aliases`.
|
||||
|
||||
## Metric Bands
|
||||
|
||||
Supported metrics are:
|
||||
|
||||
- `characters`
|
||||
- `words`
|
||||
- `sentences`
|
||||
- `paragraphs`
|
||||
- `sections`
|
||||
- `headings`
|
||||
- `list_items`
|
||||
- `code_blocks`
|
||||
- `max_heading_depth`
|
||||
- `nesting_depth`
|
||||
|
||||
Document-level bands live under `metrics.document`. Section-level bands live
|
||||
inside a section spec.
|
||||
|
||||
The current metrics layer follows the parser model: every heading-led region is
|
||||
a section, including the document H1 title section.
|
||||
|
||||
## Assertions
|
||||
|
||||
Assertions currently support:
|
||||
|
||||
- `contains`
|
||||
- `contains_any`
|
||||
- `not_contains`
|
||||
- `matches`
|
||||
- `not_matches`
|
||||
|
||||
Assertions are deterministic and produce the same diagnostic model as sections,
|
||||
fields, and metric bands. This is the bridge to later LLM rubrics: semantic
|
||||
checks can become additional assessments without changing how failures are
|
||||
reported.
|
||||
|
||||
## Forms And Context
|
||||
|
||||
Field specs are the first step toward form-backed Markdown generation. Runtime
|
||||
form handling should build on the same field vocabulary:
|
||||
|
||||
- `id`
|
||||
- `type`
|
||||
- `required`
|
||||
- `default`
|
||||
- `source`
|
||||
- `path`
|
||||
- `enum`
|
||||
- `pattern`
|
||||
- `min` / `max`
|
||||
- `min_length` / `max_length`
|
||||
|
||||
Dynamic requiredness, visibility, calculations, and prefill should be declared
|
||||
as context-aware rules in later work. The contract should remain the source of
|
||||
truth, while UI and generation layers act as adapters.
|
||||
|
||||
## LLM Assessment Extension
|
||||
|
||||
LLM-assisted checks should be declared as rubrics, scoped to document or section
|
||||
roles. Core Markitect should not call a provider directly. A future adapter
|
||||
should accept a provider-neutral request:
|
||||
|
||||
- contract id and rule id
|
||||
- document or section text
|
||||
- relevant fields and context
|
||||
- rubric criteria
|
||||
- cache key material
|
||||
|
||||
It should return:
|
||||
|
||||
- pass/fail
|
||||
- score
|
||||
- reason
|
||||
- model/provider metadata
|
||||
- diagnostics using the shared diagnostic model
|
||||
|
||||
## CLI
|
||||
|
||||
```text
|
||||
mkt contract validate <contract.md>
|
||||
mkt contract check <document.md> --contract <contract.md>
|
||||
mkt metrics <document.md>
|
||||
```
|
||||
@@ -37,6 +37,10 @@ SBOM source: `sbom-tools.yaml`.
|
||||
Initial SBOM ingest succeeded on 2026-05-03 with eight declared entries for the
|
||||
core and optional dependencies.
|
||||
|
||||
The DB-first onboarding workstream `repo-integration-markitect-tool` is now
|
||||
completed. It remains visible as a completed ADR-001 bootstrap exception rather
|
||||
than an active orphan.
|
||||
|
||||
## Registered Extension Points
|
||||
|
||||
| ID | Title | Source |
|
||||
|
||||
52
examples/contracts/adr.contract.md
Normal file
52
examples/contracts/adr.contract.md
Normal file
@@ -0,0 +1,52 @@
|
||||
---
|
||||
title: ADR Contract
|
||||
version: "1.0"
|
||||
---
|
||||
|
||||
# ADR Contract
|
||||
|
||||
```yaml contract
|
||||
id: adr-contract-v1
|
||||
document:
|
||||
type: adr
|
||||
title: Architecture Decision Record
|
||||
fields:
|
||||
status:
|
||||
type: string
|
||||
required: true
|
||||
enum: [proposed, accepted, superseded]
|
||||
metrics:
|
||||
document:
|
||||
words:
|
||||
min: 40
|
||||
max: 900
|
||||
severity: warning
|
||||
sections:
|
||||
- id: context
|
||||
title: Context
|
||||
presence: required
|
||||
level: 2
|
||||
order:
|
||||
before: decision
|
||||
assertions:
|
||||
- id: context-names-problem
|
||||
contains_any: [problem, motivation, constraint]
|
||||
severity: warning
|
||||
guidance: Explain why the decision exists.
|
||||
- id: decision
|
||||
title: Decision
|
||||
presence: required
|
||||
level: 2
|
||||
assertions:
|
||||
- id: decision-commits
|
||||
matches: "\\b(choose|adopt|use|will)\\b"
|
||||
severity: error
|
||||
guidance: State the actual decision, not only background.
|
||||
- id: consequences
|
||||
title: Consequences
|
||||
presence: recommended
|
||||
level: 2
|
||||
- id: deprecated
|
||||
title: Deprecated Approach
|
||||
presence: forbidden
|
||||
```
|
||||
43
examples/contracts/business-letter.contract.md
Normal file
43
examples/contracts/business-letter.contract.md
Normal file
@@ -0,0 +1,43 @@
|
||||
---
|
||||
title: Business Letter Contract
|
||||
version: "0.1"
|
||||
---
|
||||
|
||||
# Business Letter Contract
|
||||
|
||||
```yaml contract
|
||||
id: business-letter-contract-v1
|
||||
document:
|
||||
type: business-letter
|
||||
fields:
|
||||
recipient_name:
|
||||
type: string
|
||||
required: true
|
||||
source: context.recipient.name
|
||||
sender_name:
|
||||
type: string
|
||||
required: true
|
||||
source: context.sender.name
|
||||
sections:
|
||||
- id: greeting
|
||||
title: Greeting
|
||||
presence: required
|
||||
level: 2
|
||||
- id: body
|
||||
title: Body
|
||||
presence: required
|
||||
level: 2
|
||||
metrics:
|
||||
words:
|
||||
min: 40
|
||||
max: 350
|
||||
severity: warning
|
||||
- id: closing
|
||||
title: Closing
|
||||
presence: required
|
||||
level: 2
|
||||
rubrics:
|
||||
- id: tone-fit
|
||||
scope: section.body
|
||||
criteria: The body should match the relationship and communication purpose.
|
||||
```
|
||||
43
examples/contracts/concept-note.contract.md
Normal file
43
examples/contracts/concept-note.contract.md
Normal file
@@ -0,0 +1,43 @@
|
||||
---
|
||||
title: Concept Note Contract
|
||||
version: "0.1"
|
||||
---
|
||||
|
||||
# Concept Note Contract
|
||||
|
||||
```yaml contract
|
||||
id: concept-note-contract-v1
|
||||
document:
|
||||
type: concept-note
|
||||
fields:
|
||||
concept_id:
|
||||
type: string
|
||||
required: true
|
||||
status:
|
||||
type: string
|
||||
required: true
|
||||
enum: [draft, reviewed, accepted, archived]
|
||||
sections:
|
||||
- id: definition
|
||||
title: Definition
|
||||
presence: required
|
||||
level: 2
|
||||
- id: assertions
|
||||
title: Assertions
|
||||
presence: required
|
||||
level: 2
|
||||
assertions:
|
||||
- id: assertions-use-claims
|
||||
contains_any: [claim, evidence, assumption]
|
||||
severity: warning
|
||||
- id: relationships
|
||||
title: Relationships
|
||||
presence: recommended
|
||||
level: 2
|
||||
metrics:
|
||||
document:
|
||||
words:
|
||||
min: 120
|
||||
max: 1200
|
||||
severity: warning
|
||||
```
|
||||
49
examples/contracts/prd-frs.contract.md
Normal file
49
examples/contracts/prd-frs.contract.md
Normal file
@@ -0,0 +1,49 @@
|
||||
---
|
||||
title: PRD and FRS Contract
|
||||
version: "0.1"
|
||||
---
|
||||
|
||||
# PRD And FRS Contract
|
||||
|
||||
```yaml contract
|
||||
id: prd-frs-contract-v1
|
||||
document:
|
||||
type: prd-frs
|
||||
fields:
|
||||
product:
|
||||
type: string
|
||||
required: true
|
||||
owner:
|
||||
type: string
|
||||
required: true
|
||||
metrics:
|
||||
document:
|
||||
words:
|
||||
min: 300
|
||||
max: 4000
|
||||
severity: warning
|
||||
sections:
|
||||
- id: problem
|
||||
title: Problem
|
||||
presence: required
|
||||
level: 2
|
||||
- id: goals
|
||||
title: Goals
|
||||
presence: required
|
||||
level: 2
|
||||
assertions:
|
||||
- id: goals-are-testable
|
||||
contains_any: [measure, metric, success]
|
||||
severity: warning
|
||||
- id: functional-requirements
|
||||
title: Functional Requirements
|
||||
presence: required
|
||||
level: 2
|
||||
- id: non-goals
|
||||
title: Non-Goals
|
||||
presence: recommended
|
||||
level: 2
|
||||
- id: implementation-plan
|
||||
title: Implementation Plan
|
||||
presence: discouraged
|
||||
```
|
||||
43
examples/contracts/workplan.contract.md
Normal file
43
examples/contracts/workplan.contract.md
Normal file
@@ -0,0 +1,43 @@
|
||||
---
|
||||
title: Workplan Contract
|
||||
version: "0.1"
|
||||
---
|
||||
|
||||
# Workplan Contract
|
||||
|
||||
```yaml contract
|
||||
id: workplan-contract-v1
|
||||
document:
|
||||
type: workplan
|
||||
fields:
|
||||
id:
|
||||
type: string
|
||||
required: true
|
||||
status:
|
||||
type: string
|
||||
required: true
|
||||
enum: [proposed, active, done, deferred]
|
||||
sections:
|
||||
- id: purpose
|
||||
title: Purpose
|
||||
presence: required
|
||||
level: 2
|
||||
- id: tasks
|
||||
title: Tasks
|
||||
presence: required
|
||||
level: 2
|
||||
assertions:
|
||||
- id: tasks-have-task-blocks
|
||||
contains: "status:"
|
||||
severity: error
|
||||
- id: decision-point
|
||||
title: Decision Point
|
||||
presence: recommended
|
||||
level: 2
|
||||
metrics:
|
||||
document:
|
||||
sections:
|
||||
min: 2
|
||||
max: 12
|
||||
severity: warning
|
||||
```
|
||||
8
examples/diagnostics/adr-invalid.expected-diagnostics.md
Normal file
8
examples/diagnostics/adr-invalid.expected-diagnostics.md
Normal file
@@ -0,0 +1,8 @@
|
||||
# Expected Diagnostics: adr-invalid.md
|
||||
|
||||
- `contract.field.missing`: `status` is required.
|
||||
- `contract.metric.too_low`: the document is below the target word band.
|
||||
- `contract.assertion.contains_any_missing`: context does not mention problem, motivation, or constraint.
|
||||
- `contract.section.missing`: `decision` is required.
|
||||
- `contract.section.recommended_missing`: `consequences` is recommended.
|
||||
- `contract.section.forbidden`: `deprecated` is present.
|
||||
@@ -0,0 +1,5 @@
|
||||
# Expected Diagnostics: business-letter-invalid.md
|
||||
|
||||
- `contract.field.missing`: `sender_name` is required.
|
||||
- `contract.section.missing`: `closing` is required.
|
||||
- `contract.metric.too_low`: the `Body` section is below the target word band.
|
||||
@@ -0,0 +1,5 @@
|
||||
# Expected Diagnostics: concept-note-invalid.md
|
||||
|
||||
- `contract.field.enum`: `status` must be one of the allowed lifecycle values.
|
||||
- `contract.metric.too_low`: the document is below the target word band.
|
||||
- `contract.section.missing`: `assertions` is required.
|
||||
@@ -0,0 +1,8 @@
|
||||
# Expected Diagnostics: prd-frs-invalid.md
|
||||
|
||||
- `contract.field.missing`: `owner` is required.
|
||||
- `contract.metric.too_low`: the document is below the target word band.
|
||||
- `contract.assertion.contains_any_missing`: goals do not mention measure, metric, or success.
|
||||
- `contract.section.missing`: `functional-requirements` is required.
|
||||
- `contract.section.recommended_missing`: `non-goals` is recommended.
|
||||
- `contract.section.discouraged`: `implementation-plan` is discouraged in this contract.
|
||||
@@ -0,0 +1,6 @@
|
||||
# Expected Diagnostics: workplan-invalid.md
|
||||
|
||||
- `contract.field.missing`: `id` is required.
|
||||
- `contract.field.enum`: `status` must be one of the allowed lifecycle values.
|
||||
- `contract.assertion.contains_missing`: the `Tasks` section lacks task metadata.
|
||||
- `contract.section.recommended_missing`: `decision-point` is recommended.
|
||||
13
examples/documents/adr-invalid.md
Normal file
13
examples/documents/adr-invalid.md
Normal file
@@ -0,0 +1,13 @@
|
||||
---
|
||||
document_type: adr
|
||||
---
|
||||
|
||||
# Weak ADR
|
||||
|
||||
## Context
|
||||
|
||||
This is short.
|
||||
|
||||
## Deprecated Approach
|
||||
|
||||
This section should not be here.
|
||||
23
examples/documents/adr-valid.md
Normal file
23
examples/documents/adr-valid.md
Normal file
@@ -0,0 +1,23 @@
|
||||
---
|
||||
document_type: adr
|
||||
status: accepted
|
||||
---
|
||||
|
||||
# Use Markdown Contracts
|
||||
|
||||
## Context
|
||||
|
||||
The problem is that plain heading counts do not explain whether content is
|
||||
useful. Authors and agents need a contract that names the expected sections and
|
||||
the job each section must do.
|
||||
|
||||
## Decision
|
||||
|
||||
We will use markdown-native document contracts with deterministic diagnostics as
|
||||
the foundation for generation, validation, and later semantic assessment.
|
||||
|
||||
## Consequences
|
||||
|
||||
The tool can check author intent before generation or review work continues.
|
||||
Future adapters can add form prefill and LLM rubrics without replacing the core
|
||||
diagnostic model.
|
||||
14
examples/documents/business-letter-invalid.md
Normal file
14
examples/documents/business-letter-invalid.md
Normal file
@@ -0,0 +1,14 @@
|
||||
---
|
||||
document_type: business-letter
|
||||
recipient_name: Ada Lovelace
|
||||
---
|
||||
|
||||
# Incomplete Letter
|
||||
|
||||
## Greeting
|
||||
|
||||
Hello,
|
||||
|
||||
## Body
|
||||
|
||||
Thanks.
|
||||
25
examples/documents/business-letter-valid.md
Normal file
25
examples/documents/business-letter-valid.md
Normal file
@@ -0,0 +1,25 @@
|
||||
---
|
||||
document_type: business-letter
|
||||
recipient_name: Ada Lovelace
|
||||
sender_name: Markitect Team
|
||||
---
|
||||
|
||||
# Follow-Up Letter
|
||||
|
||||
## Greeting
|
||||
|
||||
Dear Ada Lovelace,
|
||||
|
||||
## Body
|
||||
|
||||
Thank you for the thoughtful discussion about structured Markdown generation.
|
||||
We reviewed the requirements and will send a concise proposal that separates
|
||||
document contracts, field prefill, validation diagnostics, and optional semantic
|
||||
assessment. This keeps the implementation practical while leaving room for
|
||||
future automation.
|
||||
|
||||
## Closing
|
||||
|
||||
Kind regards,
|
||||
|
||||
Markitect Team
|
||||
15
examples/documents/concept-note-invalid.md
Normal file
15
examples/documents/concept-note-invalid.md
Normal file
@@ -0,0 +1,15 @@
|
||||
---
|
||||
document_type: concept-note
|
||||
concept_id: contract-diagnostic-model
|
||||
status: maybe
|
||||
---
|
||||
|
||||
# Contract Diagnostic Model
|
||||
|
||||
## Definition
|
||||
|
||||
A vague note.
|
||||
|
||||
## Relationships
|
||||
|
||||
It relates to other things.
|
||||
24
examples/documents/concept-note-valid.md
Normal file
24
examples/documents/concept-note-valid.md
Normal file
@@ -0,0 +1,24 @@
|
||||
---
|
||||
document_type: concept-note
|
||||
concept_id: contract-diagnostic-model
|
||||
status: draft
|
||||
---
|
||||
|
||||
# Contract Diagnostic Model
|
||||
|
||||
## Definition
|
||||
|
||||
A contract diagnostic model is the shared representation for validation,
|
||||
assessment, and repair findings emitted by Markitect pipeline tools.
|
||||
|
||||
## Assertions
|
||||
|
||||
The central claim is that authors and agents need one diagnostic vocabulary
|
||||
across structural checks, field checks, metric bands, and semantic assessments.
|
||||
Evidence comes from the way legacy Markitect scattered related failures across
|
||||
different subsystems.
|
||||
|
||||
## Relationships
|
||||
|
||||
The model relates to document contracts, form fields, section specifications,
|
||||
and future LLM rubric adapters.
|
||||
18
examples/documents/prd-frs-invalid.md
Normal file
18
examples/documents/prd-frs-invalid.md
Normal file
@@ -0,0 +1,18 @@
|
||||
---
|
||||
document_type: prd-frs
|
||||
product: Markitect Tool
|
||||
---
|
||||
|
||||
# Thin PRD
|
||||
|
||||
## Problem
|
||||
|
||||
The document is too vague.
|
||||
|
||||
## Goals
|
||||
|
||||
The goals are listed without criteria.
|
||||
|
||||
## Implementation Plan
|
||||
|
||||
Build something immediately.
|
||||
31
examples/documents/prd-frs-valid.md
Normal file
31
examples/documents/prd-frs-valid.md
Normal file
@@ -0,0 +1,31 @@
|
||||
---
|
||||
document_type: prd-frs
|
||||
product: Markitect Tool
|
||||
owner: Platform Architecture
|
||||
---
|
||||
|
||||
# Markitect Tool PRD And FRS
|
||||
|
||||
## Problem
|
||||
|
||||
Markdown pipelines often check document shape without knowing whether the
|
||||
sections contain the content needed by authors, reviewers, and generation
|
||||
agents.
|
||||
|
||||
## Goals
|
||||
|
||||
The product should make document contracts testable. Success metrics include
|
||||
clear diagnostics, stable CLI behavior, and examples that show how contracts
|
||||
apply to real document types.
|
||||
|
||||
## Functional Requirements
|
||||
|
||||
- Load Markdown contract files with fenced YAML contract blocks.
|
||||
- Check required fields and section presence.
|
||||
- Report metric bands and deterministic assertions.
|
||||
- Produce machine-readable and human-readable diagnostics.
|
||||
|
||||
## Non-Goals
|
||||
|
||||
The first release does not execute provider-specific LLM calls or provide a UI
|
||||
form renderer.
|
||||
14
examples/documents/workplan-invalid.md
Normal file
14
examples/documents/workplan-invalid.md
Normal file
@@ -0,0 +1,14 @@
|
||||
---
|
||||
document_type: workplan
|
||||
status: blocked
|
||||
---
|
||||
|
||||
# Weak Workplan
|
||||
|
||||
## Purpose
|
||||
|
||||
There is not enough implementation shape here.
|
||||
|
||||
## Tasks
|
||||
|
||||
The task list is prose only.
|
||||
26
examples/documents/workplan-valid.md
Normal file
26
examples/documents/workplan-valid.md
Normal file
@@ -0,0 +1,26 @@
|
||||
---
|
||||
document_type: workplan
|
||||
id: MKTT-WP-EXAMPLE
|
||||
status: active
|
||||
---
|
||||
|
||||
# Example Workplan
|
||||
|
||||
## Purpose
|
||||
|
||||
Establish a focused implementation slice with enough structure for State Hub,
|
||||
human review, and follow-on implementation.
|
||||
|
||||
## Tasks
|
||||
|
||||
```task
|
||||
id: MKTT-WP-EXAMPLE-T001
|
||||
status: todo
|
||||
priority: high
|
||||
```
|
||||
|
||||
Implement the smallest practical behavior and verify it through the CLI.
|
||||
|
||||
## Decision Point
|
||||
|
||||
Continue only if diagnostics are clear enough for humans and agents.
|
||||
@@ -9,6 +9,18 @@ from markitect_tool.core import (
|
||||
parse_markdown,
|
||||
parse_markdown_file,
|
||||
)
|
||||
from markitect_tool.contract import (
|
||||
ContractCheckResult,
|
||||
ContractValidationResult,
|
||||
DocumentContract,
|
||||
check_document_contract,
|
||||
check_markdown_file,
|
||||
collect_metrics,
|
||||
load_contract_file,
|
||||
validate_contract,
|
||||
validate_contract_file,
|
||||
)
|
||||
from markitect_tool.diagnostics import Diagnostic, SourceLocation
|
||||
from markitect_tool.schema import (
|
||||
MarkdownSchema,
|
||||
SchemaValidationResult,
|
||||
@@ -32,4 +44,15 @@ __all__ = [
|
||||
"load_schema_file",
|
||||
"validate_document",
|
||||
"validate_markdown_file",
|
||||
"ContractCheckResult",
|
||||
"ContractValidationResult",
|
||||
"DocumentContract",
|
||||
"check_document_contract",
|
||||
"check_markdown_file",
|
||||
"collect_metrics",
|
||||
"load_contract_file",
|
||||
"validate_contract",
|
||||
"validate_contract_file",
|
||||
"Diagnostic",
|
||||
"SourceLocation",
|
||||
]
|
||||
|
||||
@@ -9,6 +9,13 @@ import click
|
||||
import yaml
|
||||
|
||||
from markitect_tool.core import parse_markdown_file
|
||||
from markitect_tool.contract import (
|
||||
ContractLoaderError,
|
||||
check_markdown_file,
|
||||
collect_metrics,
|
||||
load_contract_file,
|
||||
validate_contract,
|
||||
)
|
||||
from markitect_tool.schema import load_schema_file, validate_markdown_file, validate_schema
|
||||
|
||||
|
||||
@@ -41,6 +48,23 @@ def parse(file: Path, output_format: str) -> None:
|
||||
click.echo(json.dumps(data, indent=2, ensure_ascii=False))
|
||||
|
||||
|
||||
@main.command()
|
||||
@click.argument("file", type=click.Path(exists=True, dir_okay=False, path_type=Path))
|
||||
@click.option(
|
||||
"--format",
|
||||
"output_format",
|
||||
type=click.Choice(["json", "yaml", "text"], case_sensitive=False),
|
||||
default="text",
|
||||
show_default=True,
|
||||
)
|
||||
def metrics(file: Path, output_format: str) -> None:
|
||||
"""Report practical size and complexity metrics for a Markdown file."""
|
||||
|
||||
document = parse_markdown_file(file)
|
||||
data = collect_metrics(document).to_dict() | {"document_path": str(file)}
|
||||
_emit_metrics(data, output_format)
|
||||
|
||||
|
||||
@main.command()
|
||||
@click.argument("file", type=click.Path(exists=True, dir_okay=False, path_type=Path))
|
||||
@click.option(
|
||||
@@ -88,6 +112,54 @@ def schema_validate(schema_file: Path, output_format: str) -> None:
|
||||
raise click.exceptions.Exit(0 if result.valid else 1)
|
||||
|
||||
|
||||
@main.group()
|
||||
def contract() -> None:
|
||||
"""Work with Markdown document contracts."""
|
||||
|
||||
|
||||
@contract.command("validate")
|
||||
@click.argument("contract_file", type=click.Path(exists=True, dir_okay=False, path_type=Path))
|
||||
@click.option(
|
||||
"--format",
|
||||
"output_format",
|
||||
type=click.Choice(["json", "yaml", "text"], case_sensitive=False),
|
||||
default="text",
|
||||
show_default=True,
|
||||
)
|
||||
def contract_validate(contract_file: Path, output_format: str) -> None:
|
||||
"""Validate that a Markdown contract file is well formed."""
|
||||
|
||||
result = validate_contract(load_contract_file(contract_file))
|
||||
_emit_diagnostic_result(result.to_dict(), output_format)
|
||||
raise click.exceptions.Exit(0 if result.valid else 1)
|
||||
|
||||
|
||||
@contract.command("check")
|
||||
@click.argument("file", type=click.Path(exists=True, dir_okay=False, path_type=Path))
|
||||
@click.option(
|
||||
"--contract",
|
||||
"contract_file",
|
||||
required=True,
|
||||
type=click.Path(exists=True, dir_okay=False, path_type=Path),
|
||||
)
|
||||
@click.option(
|
||||
"--format",
|
||||
"output_format",
|
||||
type=click.Choice(["json", "yaml", "text"], case_sensitive=False),
|
||||
default="text",
|
||||
show_default=True,
|
||||
)
|
||||
def contract_check(file: Path, contract_file: Path, output_format: str) -> None:
|
||||
"""Check a Markdown file against a Markdown document contract."""
|
||||
|
||||
try:
|
||||
result = check_markdown_file(file, contract_file)
|
||||
except ContractLoaderError as exc:
|
||||
raise click.ClickException(str(exc)) from exc
|
||||
_emit_diagnostic_result(result.to_dict(), output_format)
|
||||
raise click.exceptions.Exit(0 if result.valid else 1)
|
||||
|
||||
|
||||
def _emit_result(data: dict, output_format: str) -> None:
|
||||
if output_format == "json":
|
||||
click.echo(json.dumps(data, indent=2, ensure_ascii=False))
|
||||
@@ -102,5 +174,45 @@ def _emit_result(data: dict, output_format: str) -> None:
|
||||
click.echo(f"- {violation['path']}: {violation['message']}")
|
||||
|
||||
|
||||
def _emit_diagnostic_result(data: dict, output_format: str) -> None:
|
||||
if output_format == "json":
|
||||
click.echo(json.dumps(data, indent=2, ensure_ascii=False))
|
||||
elif output_format == "yaml":
|
||||
click.echo(yaml.safe_dump(data, sort_keys=False))
|
||||
else:
|
||||
click.echo("valid" if data.get("valid") else "invalid")
|
||||
for diagnostic in data.get("diagnostics", []):
|
||||
click.echo(
|
||||
f"- [{diagnostic['severity']}] {diagnostic['code']}: "
|
||||
f"{diagnostic['message']}"
|
||||
)
|
||||
if diagnostic.get("source"):
|
||||
source = diagnostic["source"]
|
||||
suffix = f":{source['line']}" if source.get("line") else ""
|
||||
click.echo(f" source: {source.get('path', '<document>')}{suffix}")
|
||||
if diagnostic.get("guidance"):
|
||||
click.echo(f" guidance: {diagnostic['guidance']}")
|
||||
|
||||
|
||||
def _emit_metrics(data: dict, output_format: str) -> None:
|
||||
if output_format == "json":
|
||||
click.echo(json.dumps(data, indent=2, ensure_ascii=False))
|
||||
elif output_format == "yaml":
|
||||
click.echo(yaml.safe_dump(data, sort_keys=False))
|
||||
else:
|
||||
doc = data["document"]
|
||||
click.echo("document")
|
||||
for metric, value in doc.items():
|
||||
click.echo(f"- {metric}: {value}")
|
||||
sections = data.get("sections", [])
|
||||
if sections:
|
||||
click.echo("sections")
|
||||
for section in sections:
|
||||
click.echo(
|
||||
f"- {section['heading']}: words={section['words']}, "
|
||||
f"paragraphs={section['paragraphs']}, line={section['line']}"
|
||||
)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
|
||||
47
src/markitect_tool/contract/__init__.py
Normal file
47
src/markitect_tool/contract/__init__.py
Normal file
@@ -0,0 +1,47 @@
|
||||
"""Document contract loading, metrics, and validation."""
|
||||
|
||||
from markitect_tool.contract.checker import (
|
||||
ContractCheckResult,
|
||||
ContractValidationResult,
|
||||
check_document_contract,
|
||||
check_markdown_file,
|
||||
validate_contract,
|
||||
validate_contract_file,
|
||||
)
|
||||
from markitect_tool.contract.loader import (
|
||||
ContractLoaderError,
|
||||
ContractNotFoundError,
|
||||
InvalidContractFormatError,
|
||||
load_contract_file,
|
||||
load_contract_text,
|
||||
)
|
||||
from markitect_tool.contract.metrics import DocumentMetrics, SectionMetrics, collect_metrics
|
||||
from markitect_tool.contract.model import (
|
||||
AssertionSpec,
|
||||
DocumentContract,
|
||||
FieldSpec,
|
||||
MetricBand,
|
||||
SectionSpec,
|
||||
)
|
||||
|
||||
__all__ = [
|
||||
"AssertionSpec",
|
||||
"ContractCheckResult",
|
||||
"ContractLoaderError",
|
||||
"ContractNotFoundError",
|
||||
"ContractValidationResult",
|
||||
"DocumentContract",
|
||||
"DocumentMetrics",
|
||||
"FieldSpec",
|
||||
"InvalidContractFormatError",
|
||||
"MetricBand",
|
||||
"SectionMetrics",
|
||||
"SectionSpec",
|
||||
"check_document_contract",
|
||||
"check_markdown_file",
|
||||
"collect_metrics",
|
||||
"load_contract_file",
|
||||
"load_contract_text",
|
||||
"validate_contract",
|
||||
"validate_contract_file",
|
||||
]
|
||||
945
src/markitect_tool/contract/checker.py
Normal file
945
src/markitect_tool/contract/checker.py
Normal file
@@ -0,0 +1,945 @@
|
||||
"""Validate contracts and check Markdown documents against them."""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import re
|
||||
from dataclasses import dataclass, field
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
from markitect_tool.contract.loader import load_contract_file
|
||||
from markitect_tool.contract.metrics import DocumentMetrics, SectionMetrics, collect_metrics
|
||||
from markitect_tool.contract.model import (
|
||||
FIELD_TYPES,
|
||||
METRIC_NAMES,
|
||||
PRESENCE_VALUES,
|
||||
AssertionSpec,
|
||||
DocumentContract,
|
||||
FieldSpec,
|
||||
MetricBand,
|
||||
SectionSpec,
|
||||
normalize_metric_name,
|
||||
)
|
||||
from markitect_tool.core import Document, Section, parse_markdown_file
|
||||
from markitect_tool.diagnostics import (
|
||||
Diagnostic,
|
||||
SourceLocation,
|
||||
has_error,
|
||||
valid_severity,
|
||||
)
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class ContractValidationResult:
|
||||
"""Validation result for a contract definition."""
|
||||
|
||||
valid: bool
|
||||
diagnostics: list[Diagnostic]
|
||||
contract_path: str | None = None
|
||||
|
||||
def to_dict(self) -> dict[str, Any]:
|
||||
data = {
|
||||
"valid": self.valid,
|
||||
"diagnostics": [diagnostic.to_dict() for diagnostic in self.diagnostics],
|
||||
"contract_path": self.contract_path,
|
||||
}
|
||||
return {key: value for key, value in data.items() if value is not None}
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class ContractCheckResult:
|
||||
"""Check result for one document and one contract."""
|
||||
|
||||
valid: bool
|
||||
diagnostics: list[Diagnostic]
|
||||
document_path: str | None = None
|
||||
contract_path: str | None = None
|
||||
metrics: dict[str, Any] = field(default_factory=dict)
|
||||
|
||||
def to_dict(self) -> dict[str, Any]:
|
||||
data = {
|
||||
"valid": self.valid,
|
||||
"diagnostics": [diagnostic.to_dict() for diagnostic in self.diagnostics],
|
||||
"document_path": self.document_path,
|
||||
"contract_path": self.contract_path,
|
||||
"metrics": self.metrics or None,
|
||||
}
|
||||
return {key: value for key, value in data.items() if value is not None}
|
||||
|
||||
|
||||
def validate_contract_file(contract_path: str | Path) -> ContractValidationResult:
|
||||
"""Load and validate a Markdown contract file."""
|
||||
|
||||
return validate_contract(load_contract_file(contract_path))
|
||||
|
||||
|
||||
def validate_contract(contract: DocumentContract) -> ContractValidationResult:
|
||||
"""Validate the contract definition itself."""
|
||||
|
||||
diagnostics: list[Diagnostic] = []
|
||||
contract_location = _contract_location(contract)
|
||||
|
||||
if not contract.id:
|
||||
diagnostics.append(
|
||||
Diagnostic(
|
||||
severity="error",
|
||||
code="contract.id.missing",
|
||||
message="Contract must declare an id.",
|
||||
contract=contract_location,
|
||||
guidance="Add `id` to the contract YAML block or frontmatter.",
|
||||
)
|
||||
)
|
||||
if not contract.document_type:
|
||||
diagnostics.append(
|
||||
Diagnostic(
|
||||
severity="error",
|
||||
code="contract.document_type.missing",
|
||||
message="Contract must declare the document type it governs.",
|
||||
contract=contract_location,
|
||||
guidance="Add `document.type` or `document_type` to the contract.",
|
||||
)
|
||||
)
|
||||
|
||||
section_ids: set[str] = set()
|
||||
for section in contract.sections:
|
||||
diagnostics.extend(_validate_section_spec(section, contract))
|
||||
if section.id:
|
||||
if section.id in section_ids:
|
||||
diagnostics.append(
|
||||
Diagnostic(
|
||||
severity="error",
|
||||
code="contract.section.id.duplicate",
|
||||
message=f"Section id `{section.id}` is declared more than once.",
|
||||
contract=contract_location,
|
||||
rule_id=section.id,
|
||||
)
|
||||
)
|
||||
section_ids.add(section.id)
|
||||
|
||||
for field_spec in contract.fields:
|
||||
diagnostics.extend(_validate_field_spec(field_spec, contract))
|
||||
for band in contract.metrics:
|
||||
diagnostics.extend(_validate_metric_band(band, contract, rule_id=band.rule_id))
|
||||
for assertion in contract.assertions:
|
||||
diagnostics.extend(_validate_assertion(assertion, contract))
|
||||
|
||||
return ContractValidationResult(
|
||||
valid=not has_error(diagnostics),
|
||||
diagnostics=diagnostics,
|
||||
contract_path=contract.source_path,
|
||||
)
|
||||
|
||||
|
||||
def check_markdown_file(
|
||||
markdown_path: str | Path, contract_path: str | Path
|
||||
) -> ContractCheckResult:
|
||||
"""Parse and check a Markdown file against a contract file."""
|
||||
|
||||
document = parse_markdown_file(markdown_path)
|
||||
contract = load_contract_file(contract_path)
|
||||
return check_document_contract(document, contract)
|
||||
|
||||
|
||||
def check_document_contract(
|
||||
document: Document, contract: DocumentContract
|
||||
) -> ContractCheckResult:
|
||||
"""Check a parsed Markdown document against a document contract."""
|
||||
|
||||
contract_validation = validate_contract(contract)
|
||||
document_metrics = collect_metrics(document)
|
||||
diagnostics = list(contract_validation.diagnostics)
|
||||
if contract_validation.valid:
|
||||
diagnostics.extend(_check_document_type(document, contract))
|
||||
diagnostics.extend(_check_fields(document, contract))
|
||||
diagnostics.extend(_check_document_metrics(document, contract, document_metrics))
|
||||
diagnostics.extend(_check_assertions(document.body, contract.assertions, document, contract))
|
||||
diagnostics.extend(_check_sections(document, contract, document_metrics))
|
||||
|
||||
return ContractCheckResult(
|
||||
valid=not has_error(diagnostics),
|
||||
diagnostics=diagnostics,
|
||||
document_path=document.source_path,
|
||||
contract_path=contract.source_path,
|
||||
metrics=document_metrics.to_dict(),
|
||||
)
|
||||
|
||||
|
||||
def _validate_section_spec(
|
||||
section: SectionSpec, contract: DocumentContract
|
||||
) -> list[Diagnostic]:
|
||||
diagnostics: list[Diagnostic] = []
|
||||
contract_location = _contract_location(contract)
|
||||
if not section.id:
|
||||
diagnostics.append(
|
||||
Diagnostic(
|
||||
severity="error",
|
||||
code="contract.section.id.missing",
|
||||
message="Every section specification must declare an id.",
|
||||
contract=contract_location,
|
||||
)
|
||||
)
|
||||
if section.presence not in PRESENCE_VALUES:
|
||||
diagnostics.append(
|
||||
Diagnostic(
|
||||
severity="error",
|
||||
code="contract.section.presence.invalid",
|
||||
message=(
|
||||
f"Section `{section.id or '<missing>'}` uses unsupported presence "
|
||||
f"`{section.presence}`."
|
||||
),
|
||||
contract=contract_location,
|
||||
rule_id=section.id,
|
||||
)
|
||||
)
|
||||
if section.level is not None and not isinstance(section.level, int):
|
||||
diagnostics.append(
|
||||
Diagnostic(
|
||||
severity="error",
|
||||
code="contract.section.level.invalid",
|
||||
message=f"Section `{section.id}` level must be an integer.",
|
||||
contract=contract_location,
|
||||
rule_id=section.id,
|
||||
)
|
||||
)
|
||||
for band in section.metrics:
|
||||
diagnostics.extend(_validate_metric_band(band, contract, rule_id=section.id))
|
||||
for assertion in section.assertions:
|
||||
diagnostics.extend(_validate_assertion(assertion, contract))
|
||||
return diagnostics
|
||||
|
||||
|
||||
def _validate_field_spec(field_spec: FieldSpec, contract: DocumentContract) -> list[Diagnostic]:
|
||||
diagnostics: list[Diagnostic] = []
|
||||
contract_location = _contract_location(contract)
|
||||
if not field_spec.id:
|
||||
diagnostics.append(
|
||||
Diagnostic(
|
||||
severity="error",
|
||||
code="contract.field.id.missing",
|
||||
message="Every field specification must declare an id.",
|
||||
contract=contract_location,
|
||||
)
|
||||
)
|
||||
if field_spec.type and field_spec.type not in FIELD_TYPES:
|
||||
diagnostics.append(
|
||||
Diagnostic(
|
||||
severity="error",
|
||||
code="contract.field.type.invalid",
|
||||
message=f"Field `{field_spec.id}` uses unsupported type `{field_spec.type}`.",
|
||||
contract=contract_location,
|
||||
rule_id=field_spec.id,
|
||||
)
|
||||
)
|
||||
if field_spec.pattern:
|
||||
diagnostics.extend(_validate_regex(field_spec.pattern, contract, field_spec.id))
|
||||
return diagnostics
|
||||
|
||||
|
||||
def _validate_metric_band(
|
||||
band: MetricBand, contract: DocumentContract, rule_id: str | None = None
|
||||
) -> list[Diagnostic]:
|
||||
diagnostics: list[Diagnostic] = []
|
||||
contract_location = _contract_location(contract)
|
||||
if not isinstance(band.raw, dict):
|
||||
diagnostics.append(
|
||||
Diagnostic(
|
||||
severity="error",
|
||||
code="contract.metric.band.invalid",
|
||||
message=f"Metric `{band.metric}` band must be a mapping.",
|
||||
contract=contract_location,
|
||||
rule_id=rule_id,
|
||||
)
|
||||
)
|
||||
return diagnostics
|
||||
if band.metric not in METRIC_NAMES:
|
||||
diagnostics.append(
|
||||
Diagnostic(
|
||||
severity="error",
|
||||
code="contract.metric.unknown",
|
||||
message=f"Unsupported metric `{band.metric}`.",
|
||||
contract=contract_location,
|
||||
rule_id=rule_id,
|
||||
)
|
||||
)
|
||||
for severity in {band.severity, band.min_severity, band.max_severity}:
|
||||
if severity is not None and not valid_severity(severity):
|
||||
diagnostics.append(
|
||||
Diagnostic(
|
||||
severity="error",
|
||||
code="contract.severity.invalid",
|
||||
message=f"Unsupported severity `{severity}` for metric `{band.metric}`.",
|
||||
contract=contract_location,
|
||||
rule_id=rule_id,
|
||||
)
|
||||
)
|
||||
if band.min is None and band.max is None:
|
||||
diagnostics.append(
|
||||
Diagnostic(
|
||||
severity="error",
|
||||
code="contract.metric.band.empty",
|
||||
message=f"Metric `{band.metric}` needs at least one of min or max.",
|
||||
contract=contract_location,
|
||||
rule_id=rule_id,
|
||||
)
|
||||
)
|
||||
if band.min is not None and not isinstance(band.min, int | float):
|
||||
diagnostics.append(
|
||||
Diagnostic(
|
||||
severity="error",
|
||||
code="contract.metric.min.invalid",
|
||||
message=f"Metric `{band.metric}` min must be numeric.",
|
||||
contract=contract_location,
|
||||
rule_id=rule_id,
|
||||
)
|
||||
)
|
||||
if band.max is not None and not isinstance(band.max, int | float):
|
||||
diagnostics.append(
|
||||
Diagnostic(
|
||||
severity="error",
|
||||
code="contract.metric.max.invalid",
|
||||
message=f"Metric `{band.metric}` max must be numeric.",
|
||||
contract=contract_location,
|
||||
rule_id=rule_id,
|
||||
)
|
||||
)
|
||||
if (
|
||||
isinstance(band.min, int | float)
|
||||
and isinstance(band.max, int | float)
|
||||
and band.min > band.max
|
||||
):
|
||||
diagnostics.append(
|
||||
Diagnostic(
|
||||
severity="error",
|
||||
code="contract.metric.band.inverted",
|
||||
message=f"Metric `{band.metric}` min cannot be greater than max.",
|
||||
contract=contract_location,
|
||||
rule_id=rule_id,
|
||||
)
|
||||
)
|
||||
return diagnostics
|
||||
|
||||
|
||||
def _validate_assertion(
|
||||
assertion: AssertionSpec, contract: DocumentContract
|
||||
) -> list[Diagnostic]:
|
||||
diagnostics: list[Diagnostic] = []
|
||||
contract_location = _contract_location(contract)
|
||||
if not valid_severity(assertion.severity):
|
||||
diagnostics.append(
|
||||
Diagnostic(
|
||||
severity="error",
|
||||
code="contract.severity.invalid",
|
||||
message=f"Unsupported assertion severity `{assertion.severity}`.",
|
||||
contract=contract_location,
|
||||
rule_id=assertion.id,
|
||||
)
|
||||
)
|
||||
if not any(
|
||||
[
|
||||
assertion.contains,
|
||||
assertion.contains_any,
|
||||
assertion.not_contains,
|
||||
assertion.matches,
|
||||
assertion.not_matches,
|
||||
]
|
||||
):
|
||||
diagnostics.append(
|
||||
Diagnostic(
|
||||
severity="error",
|
||||
code="contract.assertion.empty",
|
||||
message="Assertion needs at least one deterministic condition.",
|
||||
contract=contract_location,
|
||||
rule_id=assertion.id,
|
||||
)
|
||||
)
|
||||
for pattern in assertion.matches + assertion.not_matches:
|
||||
diagnostics.extend(_validate_regex(pattern, contract, assertion.id))
|
||||
return diagnostics
|
||||
|
||||
|
||||
def _validate_regex(
|
||||
pattern: str, contract: DocumentContract, rule_id: str | None
|
||||
) -> list[Diagnostic]:
|
||||
try:
|
||||
re.compile(pattern)
|
||||
except re.error as exc:
|
||||
return [
|
||||
Diagnostic(
|
||||
severity="error",
|
||||
code="contract.regex.invalid",
|
||||
message=f"Invalid regular expression `{pattern}`: {exc}",
|
||||
contract=_contract_location(contract),
|
||||
rule_id=rule_id,
|
||||
)
|
||||
]
|
||||
return []
|
||||
|
||||
|
||||
def _check_document_type(document: Document, contract: DocumentContract) -> list[Diagnostic]:
|
||||
declared = (
|
||||
document.frontmatter.get("document_type")
|
||||
or document.frontmatter.get("document-type")
|
||||
or document.frontmatter.get("type")
|
||||
)
|
||||
if not declared or not contract.document_type or str(declared) == contract.document_type:
|
||||
return []
|
||||
return [
|
||||
Diagnostic(
|
||||
severity="error",
|
||||
code="contract.document_type.mismatch",
|
||||
message=(
|
||||
f"Document declares type `{declared}`, but contract expects "
|
||||
f"`{contract.document_type}`."
|
||||
),
|
||||
source=SourceLocation(path=document.source_path, line=1),
|
||||
contract=_contract_location(contract),
|
||||
rule_id=contract.id,
|
||||
guidance="Use the matching contract or update the document frontmatter type.",
|
||||
)
|
||||
]
|
||||
|
||||
|
||||
def _check_fields(document: Document, contract: DocumentContract) -> list[Diagnostic]:
|
||||
diagnostics: list[Diagnostic] = []
|
||||
document_data = document.to_dict()
|
||||
for field_spec in contract.fields:
|
||||
value, exists = _resolve_path(document_data, field_spec.path or "")
|
||||
field_location = SourceLocation(path=document.source_path, line=1)
|
||||
if field_spec.required and not exists:
|
||||
diagnostics.append(
|
||||
Diagnostic(
|
||||
severity="error",
|
||||
code="contract.field.missing",
|
||||
message=f"Required field `{field_spec.id}` is missing.",
|
||||
source=field_location,
|
||||
contract=_contract_location(contract),
|
||||
rule_id=field_spec.id,
|
||||
guidance=f"Provide `{field_spec.path}` in the document or context.",
|
||||
)
|
||||
)
|
||||
continue
|
||||
if not exists:
|
||||
continue
|
||||
diagnostics.extend(_check_field_value(field_spec, value, field_location, contract))
|
||||
return diagnostics
|
||||
|
||||
|
||||
def _check_field_value(
|
||||
field_spec: FieldSpec,
|
||||
value: Any,
|
||||
field_location: SourceLocation,
|
||||
contract: DocumentContract,
|
||||
) -> list[Diagnostic]:
|
||||
diagnostics: list[Diagnostic] = []
|
||||
if field_spec.type and not _value_matches_type(value, field_spec.type):
|
||||
diagnostics.append(
|
||||
Diagnostic(
|
||||
severity="error",
|
||||
code="contract.field.type_mismatch",
|
||||
message=(
|
||||
f"Field `{field_spec.id}` must be `{field_spec.type}`, "
|
||||
f"got `{type(value).__name__}`."
|
||||
),
|
||||
source=field_location,
|
||||
contract=_contract_location(contract),
|
||||
rule_id=field_spec.id,
|
||||
)
|
||||
)
|
||||
if field_spec.enum is not None and value not in field_spec.enum:
|
||||
diagnostics.append(
|
||||
Diagnostic(
|
||||
severity="error",
|
||||
code="contract.field.enum",
|
||||
message=f"Field `{field_spec.id}` must be one of {field_spec.enum}.",
|
||||
source=field_location,
|
||||
contract=_contract_location(contract),
|
||||
rule_id=field_spec.id,
|
||||
)
|
||||
)
|
||||
if field_spec.pattern and isinstance(value, str) and not re.search(field_spec.pattern, value):
|
||||
diagnostics.append(
|
||||
Diagnostic(
|
||||
severity="error",
|
||||
code="contract.field.pattern",
|
||||
message=f"Field `{field_spec.id}` does not match its required pattern.",
|
||||
source=field_location,
|
||||
contract=_contract_location(contract),
|
||||
rule_id=field_spec.id,
|
||||
)
|
||||
)
|
||||
if field_spec.min_length is not None and hasattr(value, "__len__") and len(value) < field_spec.min_length:
|
||||
diagnostics.append(
|
||||
Diagnostic(
|
||||
severity="error",
|
||||
code="contract.field.min_length",
|
||||
message=f"Field `{field_spec.id}` is shorter than {field_spec.min_length}.",
|
||||
source=field_location,
|
||||
contract=_contract_location(contract),
|
||||
rule_id=field_spec.id,
|
||||
)
|
||||
)
|
||||
if field_spec.max_length is not None and hasattr(value, "__len__") and len(value) > field_spec.max_length:
|
||||
diagnostics.append(
|
||||
Diagnostic(
|
||||
severity="error",
|
||||
code="contract.field.max_length",
|
||||
message=f"Field `{field_spec.id}` is longer than {field_spec.max_length}.",
|
||||
source=field_location,
|
||||
contract=_contract_location(contract),
|
||||
rule_id=field_spec.id,
|
||||
)
|
||||
)
|
||||
if field_spec.min is not None and isinstance(value, int | float) and value < field_spec.min:
|
||||
diagnostics.append(
|
||||
Diagnostic(
|
||||
severity="error",
|
||||
code="contract.field.min",
|
||||
message=f"Field `{field_spec.id}` is below {field_spec.min}.",
|
||||
source=field_location,
|
||||
contract=_contract_location(contract),
|
||||
rule_id=field_spec.id,
|
||||
)
|
||||
)
|
||||
if field_spec.max is not None and isinstance(value, int | float) and value > field_spec.max:
|
||||
diagnostics.append(
|
||||
Diagnostic(
|
||||
severity="error",
|
||||
code="contract.field.max",
|
||||
message=f"Field `{field_spec.id}` is above {field_spec.max}.",
|
||||
source=field_location,
|
||||
contract=_contract_location(contract),
|
||||
rule_id=field_spec.id,
|
||||
)
|
||||
)
|
||||
return diagnostics
|
||||
|
||||
|
||||
def _check_document_metrics(
|
||||
document: Document,
|
||||
contract: DocumentContract,
|
||||
metrics: DocumentMetrics,
|
||||
) -> list[Diagnostic]:
|
||||
return _check_bands(
|
||||
contract.metrics,
|
||||
metrics.to_dict()["document"],
|
||||
source=SourceLocation(path=document.source_path, line=1),
|
||||
contract=contract,
|
||||
subject=f"document `{contract.document_type or contract.id}`",
|
||||
)
|
||||
|
||||
|
||||
def _check_sections(
|
||||
document: Document,
|
||||
contract: DocumentContract,
|
||||
metrics: DocumentMetrics,
|
||||
) -> list[Diagnostic]:
|
||||
diagnostics: list[Diagnostic] = []
|
||||
section_metrics_by_index = {
|
||||
index: section_metrics
|
||||
for index, section_metrics in enumerate(metrics.section_metrics)
|
||||
}
|
||||
matches_by_id: dict[str, list[tuple[int, Section]]] = {}
|
||||
|
||||
for section_spec in contract.sections:
|
||||
matches = _matching_sections(document.sections, section_spec)
|
||||
if section_spec.id:
|
||||
matches_by_id[section_spec.id] = matches
|
||||
diagnostics.extend(_check_section_presence(document, contract, section_spec, matches))
|
||||
if not matches or section_spec.presence in {"forbidden", "discouraged"}:
|
||||
continue
|
||||
|
||||
if len(matches) > 1:
|
||||
diagnostics.append(
|
||||
Diagnostic(
|
||||
severity="warning",
|
||||
code="contract.section.duplicate",
|
||||
message=f"Section `{section_spec.id}` appears {len(matches)} times.",
|
||||
source=SourceLocation(path=document.source_path, line=matches[1][1].heading.line),
|
||||
contract=_contract_location(contract),
|
||||
rule_id=section_spec.id,
|
||||
guidance="Keep one authoritative section or split it into distinct section roles.",
|
||||
)
|
||||
)
|
||||
for index, section in matches:
|
||||
diagnostics.extend(_check_section_level(document, contract, section_spec, section))
|
||||
section_metrics = section_metrics_by_index[index]
|
||||
diagnostics.extend(
|
||||
_check_section_metrics(document, section, section_metrics, contract, section_spec)
|
||||
)
|
||||
section_text = "\n".join(block.text for block in section.blocks if block.text)
|
||||
diagnostics.extend(
|
||||
_check_assertions(section_text, section_spec.assertions, document, contract, section)
|
||||
)
|
||||
|
||||
diagnostics.extend(_check_ordering(document, contract, matches_by_id))
|
||||
return diagnostics
|
||||
|
||||
|
||||
def _matching_sections(
|
||||
sections: list[Section], section_spec: SectionSpec
|
||||
) -> list[tuple[int, Section]]:
|
||||
expected = {_normalize_heading(value) for value in section_spec.headings}
|
||||
if not expected:
|
||||
return []
|
||||
return [
|
||||
(index, section)
|
||||
for index, section in enumerate(sections)
|
||||
if _normalize_heading(section.heading.text) in expected
|
||||
]
|
||||
|
||||
|
||||
def _check_section_presence(
|
||||
document: Document,
|
||||
contract: DocumentContract,
|
||||
section_spec: SectionSpec,
|
||||
matches: list[tuple[int, Section]],
|
||||
) -> list[Diagnostic]:
|
||||
if matches and section_spec.presence == "forbidden":
|
||||
return [
|
||||
Diagnostic(
|
||||
severity="error",
|
||||
code="contract.section.forbidden",
|
||||
message=f"Forbidden section `{section_spec.id}` is present.",
|
||||
source=SourceLocation(path=document.source_path, line=matches[0][1].heading.line),
|
||||
contract=_contract_location(contract),
|
||||
rule_id=section_spec.id,
|
||||
guidance=f"Remove the `{matches[0][1].heading.text}` section.",
|
||||
)
|
||||
]
|
||||
if matches and section_spec.presence == "discouraged":
|
||||
return [
|
||||
Diagnostic(
|
||||
severity="warning",
|
||||
code="contract.section.discouraged",
|
||||
message=f"Discouraged section `{section_spec.id}` is present.",
|
||||
source=SourceLocation(path=document.source_path, line=matches[0][1].heading.line),
|
||||
contract=_contract_location(contract),
|
||||
rule_id=section_spec.id,
|
||||
)
|
||||
]
|
||||
if not matches and section_spec.presence == "required":
|
||||
return [
|
||||
Diagnostic(
|
||||
severity="error",
|
||||
code="contract.section.missing",
|
||||
message=f"Required section `{section_spec.id}` is missing.",
|
||||
source=SourceLocation(path=document.source_path),
|
||||
contract=_contract_location(contract),
|
||||
rule_id=section_spec.id,
|
||||
guidance=_section_guidance(section_spec),
|
||||
)
|
||||
]
|
||||
if not matches and section_spec.presence == "recommended":
|
||||
return [
|
||||
Diagnostic(
|
||||
severity="warning",
|
||||
code="contract.section.recommended_missing",
|
||||
message=f"Recommended section `{section_spec.id}` is missing.",
|
||||
source=SourceLocation(path=document.source_path),
|
||||
contract=_contract_location(contract),
|
||||
rule_id=section_spec.id,
|
||||
guidance=_section_guidance(section_spec),
|
||||
)
|
||||
]
|
||||
return []
|
||||
|
||||
|
||||
def _check_section_level(
|
||||
document: Document,
|
||||
contract: DocumentContract,
|
||||
section_spec: SectionSpec,
|
||||
section: Section,
|
||||
) -> list[Diagnostic]:
|
||||
if section_spec.level is None or section.heading.level == section_spec.level:
|
||||
return []
|
||||
return [
|
||||
Diagnostic(
|
||||
severity="error",
|
||||
code="contract.section.level",
|
||||
message=(
|
||||
f"Section `{section_spec.id}` must use heading level "
|
||||
f"{section_spec.level}, got {section.heading.level}."
|
||||
),
|
||||
source=SourceLocation(path=document.source_path, line=section.heading.line),
|
||||
contract=_contract_location(contract),
|
||||
rule_id=section_spec.id,
|
||||
guidance=f"Change the heading to {'#' * section_spec.level} {section.heading.text}.",
|
||||
)
|
||||
]
|
||||
|
||||
|
||||
def _check_section_metrics(
|
||||
document: Document,
|
||||
section: Section,
|
||||
section_metrics: SectionMetrics,
|
||||
contract: DocumentContract,
|
||||
section_spec: SectionSpec,
|
||||
) -> list[Diagnostic]:
|
||||
return _check_bands(
|
||||
section_spec.metrics,
|
||||
section_metrics.to_dict(),
|
||||
source=SourceLocation(path=document.source_path, line=section.heading.line),
|
||||
contract=contract,
|
||||
subject=f"section `{section.heading.text}`",
|
||||
rule_id=section_spec.id,
|
||||
)
|
||||
|
||||
|
||||
def _check_ordering(
|
||||
document: Document,
|
||||
contract: DocumentContract,
|
||||
matches_by_id: dict[str, list[tuple[int, Section]]],
|
||||
) -> list[Diagnostic]:
|
||||
diagnostics: list[Diagnostic] = []
|
||||
for section_spec in contract.sections:
|
||||
if not section_spec.id or not matches_by_id.get(section_spec.id):
|
||||
continue
|
||||
index = matches_by_id[section_spec.id][0][0]
|
||||
for target in section_spec.order_before:
|
||||
target_match = matches_by_id.get(target)
|
||||
if target_match and index > target_match[0][0]:
|
||||
diagnostics.append(
|
||||
Diagnostic(
|
||||
severity="error",
|
||||
code="contract.section.order",
|
||||
message=f"Section `{section_spec.id}` must appear before `{target}`.",
|
||||
source=SourceLocation(
|
||||
path=document.source_path,
|
||||
line=matches_by_id[section_spec.id][0][1].heading.line,
|
||||
),
|
||||
contract=_contract_location(contract),
|
||||
rule_id=section_spec.id,
|
||||
)
|
||||
)
|
||||
for target in section_spec.order_after:
|
||||
target_match = matches_by_id.get(target)
|
||||
if target_match and index < target_match[0][0]:
|
||||
diagnostics.append(
|
||||
Diagnostic(
|
||||
severity="error",
|
||||
code="contract.section.order",
|
||||
message=f"Section `{section_spec.id}` must appear after `{target}`.",
|
||||
source=SourceLocation(
|
||||
path=document.source_path,
|
||||
line=matches_by_id[section_spec.id][0][1].heading.line,
|
||||
),
|
||||
contract=_contract_location(contract),
|
||||
rule_id=section_spec.id,
|
||||
)
|
||||
)
|
||||
return diagnostics
|
||||
|
||||
|
||||
def _check_bands(
|
||||
bands: list[MetricBand],
|
||||
values: dict[str, Any],
|
||||
*,
|
||||
source: SourceLocation,
|
||||
contract: DocumentContract,
|
||||
subject: str,
|
||||
rule_id: str | None = None,
|
||||
) -> list[Diagnostic]:
|
||||
diagnostics: list[Diagnostic] = []
|
||||
for band in bands:
|
||||
metric = normalize_metric_name(band.metric)
|
||||
if metric not in values:
|
||||
continue
|
||||
actual = values[metric]
|
||||
if band.min is not None and actual < band.min:
|
||||
diagnostics.append(
|
||||
Diagnostic(
|
||||
severity=band.severity_for("min"),
|
||||
code="contract.metric.too_low",
|
||||
message=(
|
||||
f"{subject} has {actual} {metric}; expected at least {band.min}."
|
||||
),
|
||||
source=source,
|
||||
contract=_contract_location(contract),
|
||||
rule_id=band.rule_id or rule_id,
|
||||
guidance=band.guidance,
|
||||
details={"metric": metric, "actual": actual, "min": band.min},
|
||||
)
|
||||
)
|
||||
if band.max is not None and actual > band.max:
|
||||
diagnostics.append(
|
||||
Diagnostic(
|
||||
severity=band.severity_for("max"),
|
||||
code="contract.metric.too_high",
|
||||
message=f"{subject} has {actual} {metric}; expected at most {band.max}.",
|
||||
source=source,
|
||||
contract=_contract_location(contract),
|
||||
rule_id=band.rule_id or rule_id,
|
||||
guidance=band.guidance,
|
||||
details={"metric": metric, "actual": actual, "max": band.max},
|
||||
)
|
||||
)
|
||||
return diagnostics
|
||||
|
||||
|
||||
def _check_assertions(
|
||||
text: str,
|
||||
assertions: list[AssertionSpec],
|
||||
document: Document,
|
||||
contract: DocumentContract,
|
||||
section: Section | None = None,
|
||||
) -> list[Diagnostic]:
|
||||
diagnostics: list[Diagnostic] = []
|
||||
source_line = section.heading.line if section else 1
|
||||
for assertion in assertions:
|
||||
diagnostics.extend(
|
||||
_check_assertion(
|
||||
text,
|
||||
assertion,
|
||||
source=SourceLocation(path=document.source_path, line=source_line),
|
||||
contract=contract,
|
||||
)
|
||||
)
|
||||
return diagnostics
|
||||
|
||||
|
||||
def _check_assertion(
|
||||
text: str,
|
||||
assertion: AssertionSpec,
|
||||
*,
|
||||
source: SourceLocation,
|
||||
contract: DocumentContract,
|
||||
) -> list[Diagnostic]:
|
||||
diagnostics: list[Diagnostic] = []
|
||||
haystack = text if assertion.case_sensitive else text.lower()
|
||||
|
||||
for needle in assertion.contains:
|
||||
expected = needle if assertion.case_sensitive else needle.lower()
|
||||
if expected not in haystack:
|
||||
diagnostics.append(
|
||||
_assertion_diagnostic(
|
||||
assertion,
|
||||
"contract.assertion.contains_missing",
|
||||
assertion.message or f"Expected content to contain `{needle}`.",
|
||||
source,
|
||||
contract,
|
||||
{"expected": needle},
|
||||
)
|
||||
)
|
||||
|
||||
if assertion.contains_any:
|
||||
if not any(
|
||||
(needle if assertion.case_sensitive else needle.lower()) in haystack
|
||||
for needle in assertion.contains_any
|
||||
):
|
||||
diagnostics.append(
|
||||
_assertion_diagnostic(
|
||||
assertion,
|
||||
"contract.assertion.contains_any_missing",
|
||||
assertion.message
|
||||
or f"Expected content to contain one of {assertion.contains_any}.",
|
||||
source,
|
||||
contract,
|
||||
{"expected_any": assertion.contains_any},
|
||||
)
|
||||
)
|
||||
|
||||
for needle in assertion.not_contains:
|
||||
forbidden = needle if assertion.case_sensitive else needle.lower()
|
||||
if forbidden in haystack:
|
||||
diagnostics.append(
|
||||
_assertion_diagnostic(
|
||||
assertion,
|
||||
"contract.assertion.forbidden_content",
|
||||
assertion.message or f"Content must not contain `{needle}`.",
|
||||
source,
|
||||
contract,
|
||||
{"forbidden": needle},
|
||||
)
|
||||
)
|
||||
|
||||
regex_flags = 0 if assertion.case_sensitive else re.IGNORECASE
|
||||
for pattern in assertion.matches:
|
||||
if not re.search(pattern, text, flags=regex_flags | re.MULTILINE):
|
||||
diagnostics.append(
|
||||
_assertion_diagnostic(
|
||||
assertion,
|
||||
"contract.assertion.pattern_missing",
|
||||
assertion.message or f"Expected content to match `{pattern}`.",
|
||||
source,
|
||||
contract,
|
||||
{"pattern": pattern},
|
||||
)
|
||||
)
|
||||
for pattern in assertion.not_matches:
|
||||
if re.search(pattern, text, flags=regex_flags | re.MULTILINE):
|
||||
diagnostics.append(
|
||||
_assertion_diagnostic(
|
||||
assertion,
|
||||
"contract.assertion.forbidden_pattern",
|
||||
assertion.message or f"Content must not match `{pattern}`.",
|
||||
source,
|
||||
contract,
|
||||
{"pattern": pattern},
|
||||
)
|
||||
)
|
||||
return diagnostics
|
||||
|
||||
|
||||
def _assertion_diagnostic(
|
||||
assertion: AssertionSpec,
|
||||
code: str,
|
||||
message: str,
|
||||
source: SourceLocation,
|
||||
contract: DocumentContract,
|
||||
details: dict[str, Any],
|
||||
) -> Diagnostic:
|
||||
return Diagnostic(
|
||||
severity=assertion.severity,
|
||||
code=code,
|
||||
message=message,
|
||||
source=source,
|
||||
contract=_contract_location(contract),
|
||||
rule_id=assertion.id,
|
||||
guidance=assertion.guidance,
|
||||
details=details,
|
||||
)
|
||||
|
||||
|
||||
def _section_guidance(section_spec: SectionSpec) -> str:
|
||||
heading = section_spec.title or (section_spec.headings[0] if section_spec.headings else section_spec.id)
|
||||
level = section_spec.level or 2
|
||||
return f"Add a {'#' * level} {heading} section."
|
||||
|
||||
|
||||
def _contract_location(contract: DocumentContract) -> SourceLocation:
|
||||
return SourceLocation(path=contract.source_path, line=contract.source_line)
|
||||
|
||||
|
||||
def _normalize_heading(text: str) -> str:
|
||||
return re.sub(r"\s+", " ", text.strip().lower())
|
||||
|
||||
|
||||
def _resolve_path(data: dict[str, Any], path: str) -> tuple[Any, bool]:
|
||||
if not path:
|
||||
return None, False
|
||||
normalized = path.removeprefix("$.").removeprefix("document.")
|
||||
current: Any = data
|
||||
for part in normalized.split("."):
|
||||
if isinstance(current, dict) and part in current:
|
||||
current = current[part]
|
||||
else:
|
||||
return None, False
|
||||
return current, True
|
||||
|
||||
|
||||
def _value_matches_type(value: Any, expected_type: str) -> bool:
|
||||
if expected_type == "string":
|
||||
return isinstance(value, str)
|
||||
if expected_type == "number":
|
||||
return isinstance(value, int | float) and not isinstance(value, bool)
|
||||
if expected_type == "integer":
|
||||
return isinstance(value, int) and not isinstance(value, bool)
|
||||
if expected_type == "boolean":
|
||||
return isinstance(value, bool)
|
||||
if expected_type == "array":
|
||||
return isinstance(value, list)
|
||||
if expected_type == "object":
|
||||
return isinstance(value, dict)
|
||||
if expected_type == "date":
|
||||
return isinstance(value, str)
|
||||
return True
|
||||
142
src/markitect_tool/contract/loader.py
Normal file
142
src/markitect_tool/contract/loader.py
Normal file
@@ -0,0 +1,142 @@
|
||||
"""Load document contracts from Markdown files."""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from copy import deepcopy
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
import yaml
|
||||
|
||||
from markitect_tool.contract.model import DocumentContract
|
||||
from markitect_tool.core import parse_markdown
|
||||
|
||||
|
||||
class ContractLoaderError(ValueError):
|
||||
"""Raised when a contract file cannot be loaded."""
|
||||
|
||||
|
||||
class ContractNotFoundError(ContractLoaderError):
|
||||
"""Raised when no contract definition can be found in a Markdown file."""
|
||||
|
||||
|
||||
class InvalidContractFormatError(ContractLoaderError):
|
||||
"""Raised when the contract definition is not valid YAML."""
|
||||
|
||||
|
||||
def load_contract_file(path: str | Path) -> DocumentContract:
|
||||
"""Load a Markdown-native document contract file."""
|
||||
|
||||
file_path = Path(path)
|
||||
text = file_path.read_text(encoding="utf-8")
|
||||
return load_contract_text(text, source_path=str(file_path))
|
||||
|
||||
|
||||
def load_contract_text(text: str, source_path: str | None = None) -> DocumentContract:
|
||||
"""Load a document contract from Markdown text."""
|
||||
|
||||
document = parse_markdown(text, source_path=source_path)
|
||||
frontmatter_contract = document.frontmatter.get("contract")
|
||||
if frontmatter_contract is not None and not isinstance(frontmatter_contract, dict):
|
||||
raise InvalidContractFormatError("Frontmatter `contract` must be a mapping")
|
||||
|
||||
block_data, block_line = _extract_contract_block(document.tokens, source_path)
|
||||
merged = _merge_contracts(frontmatter_contract or {}, block_data or {})
|
||||
|
||||
metadata = {
|
||||
key: value
|
||||
for key, value in document.frontmatter.items()
|
||||
if key != "contract"
|
||||
}
|
||||
if not merged and _looks_like_contract(metadata):
|
||||
merged = deepcopy(metadata)
|
||||
if not merged:
|
||||
raise ContractNotFoundError(
|
||||
"No contract definition found. Add a fenced ```yaml contract block."
|
||||
)
|
||||
return DocumentContract.from_mapping(
|
||||
merged,
|
||||
metadata=metadata,
|
||||
source_path=source_path,
|
||||
source_line=block_line,
|
||||
)
|
||||
|
||||
|
||||
def _extract_contract_block(
|
||||
tokens: list[dict[str, Any]], source_path: str | None
|
||||
) -> tuple[dict[str, Any] | None, int | None]:
|
||||
yaml_candidates: list[tuple[dict[str, Any], int | None, bool]] = []
|
||||
for token in tokens:
|
||||
if token.get("type") != "fence":
|
||||
continue
|
||||
info = str(token.get("info", "")).strip().lower()
|
||||
if not _is_yaml_info(info):
|
||||
continue
|
||||
line = _token_line(token)
|
||||
raw_yaml = token.get("content", "")
|
||||
try:
|
||||
data = yaml.safe_load(raw_yaml) if raw_yaml.strip() else {}
|
||||
except yaml.YAMLError as exc:
|
||||
raise InvalidContractFormatError(
|
||||
f"Invalid YAML contract block in {source_path or '<string>'}: {exc}"
|
||||
) from exc
|
||||
if data is None:
|
||||
data = {}
|
||||
if not isinstance(data, dict):
|
||||
raise InvalidContractFormatError("Contract YAML block must be a mapping")
|
||||
yaml_candidates.append((data, line, "contract" in info.split()))
|
||||
|
||||
for data, line, explicit in yaml_candidates:
|
||||
if explicit:
|
||||
return data, line
|
||||
for data, line, _explicit in yaml_candidates:
|
||||
if _looks_like_contract(data):
|
||||
return data, line
|
||||
return None, None
|
||||
|
||||
|
||||
def _is_yaml_info(info: str) -> bool:
|
||||
parts = info.split()
|
||||
return "yaml" in parts or "yml" in parts
|
||||
|
||||
|
||||
def _token_line(token: dict[str, Any]) -> int | None:
|
||||
token_map = token.get("map")
|
||||
if not token_map:
|
||||
return None
|
||||
return int(token_map[0]) + 1
|
||||
|
||||
|
||||
def _looks_like_contract(data: dict[str, Any]) -> bool:
|
||||
return any(
|
||||
key in data
|
||||
for key in {
|
||||
"document",
|
||||
"document_type",
|
||||
"document-type",
|
||||
"sections",
|
||||
"fields",
|
||||
"metrics",
|
||||
"metric_bands",
|
||||
"assertions",
|
||||
"forms",
|
||||
"rubrics",
|
||||
}
|
||||
)
|
||||
|
||||
|
||||
def _merge_contracts(
|
||||
frontmatter_contract: dict[str, Any], block_contract: dict[str, Any]
|
||||
) -> dict[str, Any]:
|
||||
merged = deepcopy(frontmatter_contract)
|
||||
for key, value in block_contract.items():
|
||||
if (
|
||||
isinstance(value, dict)
|
||||
and isinstance(merged.get(key), dict)
|
||||
):
|
||||
nested = deepcopy(merged[key])
|
||||
nested.update(value)
|
||||
merged[key] = nested
|
||||
else:
|
||||
merged[key] = value
|
||||
return merged
|
||||
127
src/markitect_tool/contract/metrics.py
Normal file
127
src/markitect_tool/contract/metrics.py
Normal file
@@ -0,0 +1,127 @@
|
||||
"""Metric extraction for parsed Markdown documents."""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import re
|
||||
from dataclasses import dataclass, field
|
||||
from typing import Any
|
||||
|
||||
from markitect_tool.core import Document, Section
|
||||
|
||||
|
||||
WORD_RE = re.compile(r"[A-Za-z0-9]+(?:[-'][A-Za-z0-9]+)*")
|
||||
SENTENCE_RE = re.compile(r"[.!?]+(?:\s|$)")
|
||||
LIST_ITEM_RE = re.compile(r"^\s*(?:[-+*]|\d+[.)])\s+", re.MULTILINE)
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class SectionMetrics:
|
||||
"""Metrics for one heading-led section."""
|
||||
|
||||
heading: str
|
||||
line: int
|
||||
level: int
|
||||
characters: int
|
||||
words: int
|
||||
sentences: int
|
||||
paragraphs: int
|
||||
sections: int = 1
|
||||
headings: int = 1
|
||||
list_items: int = 0
|
||||
code_blocks: int = 0
|
||||
nesting_depth: int = 1
|
||||
|
||||
def to_dict(self) -> dict[str, Any]:
|
||||
return {
|
||||
"heading": self.heading,
|
||||
"line": self.line,
|
||||
"level": self.level,
|
||||
"characters": self.characters,
|
||||
"words": self.words,
|
||||
"sentences": self.sentences,
|
||||
"paragraphs": self.paragraphs,
|
||||
"sections": self.sections,
|
||||
"headings": self.headings,
|
||||
"list_items": self.list_items,
|
||||
"code_blocks": self.code_blocks,
|
||||
"nesting_depth": self.nesting_depth,
|
||||
}
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class DocumentMetrics:
|
||||
"""Metrics for a parsed Markdown document."""
|
||||
|
||||
characters: int
|
||||
words: int
|
||||
sentences: int
|
||||
paragraphs: int
|
||||
sections: int
|
||||
headings: int
|
||||
list_items: int
|
||||
code_blocks: int
|
||||
max_heading_depth: int
|
||||
section_metrics: list[SectionMetrics] = field(default_factory=list)
|
||||
|
||||
def to_dict(self) -> dict[str, Any]:
|
||||
return {
|
||||
"document": {
|
||||
"characters": self.characters,
|
||||
"words": self.words,
|
||||
"sentences": self.sentences,
|
||||
"paragraphs": self.paragraphs,
|
||||
"sections": self.sections,
|
||||
"headings": self.headings,
|
||||
"list_items": self.list_items,
|
||||
"code_blocks": self.code_blocks,
|
||||
"max_heading_depth": self.max_heading_depth,
|
||||
},
|
||||
"sections": [section.to_dict() for section in self.section_metrics],
|
||||
}
|
||||
|
||||
|
||||
def collect_metrics(document: Document) -> DocumentMetrics:
|
||||
"""Collect document-level and section-level metrics."""
|
||||
|
||||
section_metrics = [_section_metrics(section) for section in document.sections]
|
||||
text = document.body.strip()
|
||||
return DocumentMetrics(
|
||||
characters=len(text),
|
||||
words=count_words(text),
|
||||
sentences=count_sentences(text),
|
||||
paragraphs=sum(1 for block in document.blocks if block.type == "paragraph"),
|
||||
sections=len(document.sections),
|
||||
headings=len(document.headings),
|
||||
list_items=count_list_items(text),
|
||||
code_blocks=sum(1 for block in document.blocks if block.type == "code"),
|
||||
max_heading_depth=max((heading.level for heading in document.headings), default=0),
|
||||
section_metrics=section_metrics,
|
||||
)
|
||||
|
||||
|
||||
def count_words(text: str) -> int:
|
||||
return len(WORD_RE.findall(text))
|
||||
|
||||
|
||||
def count_sentences(text: str) -> int:
|
||||
return len(SENTENCE_RE.findall(text))
|
||||
|
||||
|
||||
def count_list_items(text: str) -> int:
|
||||
return len(LIST_ITEM_RE.findall(text))
|
||||
|
||||
|
||||
def _section_metrics(section: Section) -> SectionMetrics:
|
||||
text = "\n".join(block.text for block in section.blocks if block.text).strip()
|
||||
return SectionMetrics(
|
||||
heading=section.heading.text,
|
||||
line=section.heading.line,
|
||||
level=section.heading.level,
|
||||
characters=len(text),
|
||||
words=count_words(text),
|
||||
sentences=count_sentences(text),
|
||||
paragraphs=sum(1 for block in section.blocks if block.type == "paragraph"),
|
||||
list_items=count_list_items(text),
|
||||
code_blocks=sum(1 for block in section.blocks if block.type == "code"),
|
||||
nesting_depth=section.heading.level,
|
||||
)
|
||||
364
src/markitect_tool/contract/model.py
Normal file
364
src/markitect_tool/contract/model.py
Normal file
@@ -0,0 +1,364 @@
|
||||
"""Markdown-native document contract model."""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from dataclasses import dataclass, field
|
||||
from typing import Any
|
||||
|
||||
|
||||
PRESENCE_VALUES = {"required", "recommended", "optional", "discouraged", "forbidden"}
|
||||
FIELD_TYPES = {
|
||||
"string",
|
||||
"number",
|
||||
"integer",
|
||||
"boolean",
|
||||
"array",
|
||||
"object",
|
||||
"date",
|
||||
}
|
||||
METRIC_ALIASES = {
|
||||
"char": "characters",
|
||||
"chars": "characters",
|
||||
"character": "characters",
|
||||
"characters": "characters",
|
||||
"word": "words",
|
||||
"words": "words",
|
||||
"word_count": "words",
|
||||
"sentence": "sentences",
|
||||
"sentences": "sentences",
|
||||
"paragraph": "paragraphs",
|
||||
"paragraphs": "paragraphs",
|
||||
"section": "sections",
|
||||
"sections": "sections",
|
||||
"heading": "headings",
|
||||
"headings": "headings",
|
||||
"list_item": "list_items",
|
||||
"list_items": "list_items",
|
||||
"code_block": "code_blocks",
|
||||
"code_blocks": "code_blocks",
|
||||
"max_heading_depth": "max_heading_depth",
|
||||
"heading_depth": "max_heading_depth",
|
||||
"nesting_depth": "nesting_depth",
|
||||
}
|
||||
METRIC_NAMES = set(METRIC_ALIASES.values())
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class MetricBand:
|
||||
"""A soft or hard target for one metric."""
|
||||
|
||||
metric: str
|
||||
min: float | None = None
|
||||
max: float | None = None
|
||||
severity: str = "warning"
|
||||
min_severity: str | None = None
|
||||
max_severity: str | None = None
|
||||
rule_id: str | None = None
|
||||
guidance: str | None = None
|
||||
raw: Any = field(default_factory=dict)
|
||||
|
||||
@classmethod
|
||||
def from_mapping(cls, metric: str, raw: Any) -> "MetricBand":
|
||||
normalized = normalize_metric_name(metric)
|
||||
if not isinstance(raw, dict):
|
||||
return cls(metric=normalized, raw=raw)
|
||||
return cls(
|
||||
metric=normalized,
|
||||
min=raw.get("min"),
|
||||
max=raw.get("max"),
|
||||
severity=str(raw.get("severity", "warning")),
|
||||
min_severity=raw.get("min_severity"),
|
||||
max_severity=raw.get("max_severity"),
|
||||
rule_id=raw.get("id") or raw.get("rule_id"),
|
||||
guidance=raw.get("guidance"),
|
||||
raw=raw,
|
||||
)
|
||||
|
||||
def severity_for(self, bound: str) -> str:
|
||||
if bound == "min":
|
||||
return self.min_severity or self.severity
|
||||
if bound == "max":
|
||||
return self.max_severity or self.severity
|
||||
return self.severity
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class AssertionSpec:
|
||||
"""A deterministic assertion over document or section text."""
|
||||
|
||||
id: str | None = None
|
||||
message: str | None = None
|
||||
severity: str = "error"
|
||||
guidance: str | None = None
|
||||
contains: list[str] = field(default_factory=list)
|
||||
contains_any: list[str] = field(default_factory=list)
|
||||
not_contains: list[str] = field(default_factory=list)
|
||||
matches: list[str] = field(default_factory=list)
|
||||
not_matches: list[str] = field(default_factory=list)
|
||||
case_sensitive: bool = False
|
||||
raw: Any = field(default_factory=dict)
|
||||
|
||||
@classmethod
|
||||
def from_mapping(cls, raw: Any) -> "AssertionSpec":
|
||||
if not isinstance(raw, dict):
|
||||
return cls(raw=raw)
|
||||
return cls(
|
||||
id=raw.get("id") or raw.get("rule_id"),
|
||||
message=raw.get("message"),
|
||||
severity=str(raw.get("severity", "error")),
|
||||
guidance=raw.get("guidance"),
|
||||
contains=as_string_list(raw.get("contains")),
|
||||
contains_any=as_string_list(raw.get("contains_any") or raw.get("contains_any_of")),
|
||||
not_contains=as_string_list(raw.get("not_contains") or raw.get("forbid")),
|
||||
matches=as_string_list(raw.get("matches") or raw.get("pattern")),
|
||||
not_matches=as_string_list(raw.get("not_matches") or raw.get("forbid_pattern")),
|
||||
case_sensitive=bool(raw.get("case_sensitive", False)),
|
||||
raw=raw,
|
||||
)
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class FieldSpec:
|
||||
"""A structured value expected in frontmatter or external context."""
|
||||
|
||||
id: str | None
|
||||
path: str | None = None
|
||||
type: str | None = None
|
||||
required: bool = False
|
||||
label: str | None = None
|
||||
description: str | None = None
|
||||
enum: list[Any] | None = None
|
||||
pattern: str | None = None
|
||||
min: float | None = None
|
||||
max: float | None = None
|
||||
min_length: int | None = None
|
||||
max_length: int | None = None
|
||||
default: Any = None
|
||||
source: str | None = None
|
||||
raw: Any = field(default_factory=dict)
|
||||
|
||||
@classmethod
|
||||
def from_mapping(cls, raw: Any, fallback_id: str | None = None) -> "FieldSpec":
|
||||
if not isinstance(raw, dict):
|
||||
return cls(id=fallback_id, raw=raw)
|
||||
field_id = raw.get("id") or raw.get("name") or fallback_id
|
||||
return cls(
|
||||
id=field_id,
|
||||
path=raw.get("path") or (f"frontmatter.{field_id}" if field_id else None),
|
||||
type=raw.get("type"),
|
||||
required=bool(raw.get("required", False)),
|
||||
label=raw.get("label"),
|
||||
description=raw.get("description"),
|
||||
enum=raw.get("enum"),
|
||||
pattern=raw.get("pattern"),
|
||||
min=raw.get("min"),
|
||||
max=raw.get("max"),
|
||||
min_length=raw.get("min_length"),
|
||||
max_length=raw.get("max_length"),
|
||||
default=raw.get("default"),
|
||||
source=raw.get("source"),
|
||||
raw=raw,
|
||||
)
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class SectionSpec:
|
||||
"""Expected semantic role and constraints for a Markdown section."""
|
||||
|
||||
id: str | None
|
||||
title: str | None = None
|
||||
section_type: str | None = None
|
||||
presence: str = "optional"
|
||||
headings: list[str] = field(default_factory=list)
|
||||
level: int | None = None
|
||||
order_before: list[str] = field(default_factory=list)
|
||||
order_after: list[str] = field(default_factory=list)
|
||||
metrics: list[MetricBand] = field(default_factory=list)
|
||||
assertions: list[AssertionSpec] = field(default_factory=list)
|
||||
raw: Any = field(default_factory=dict)
|
||||
|
||||
@classmethod
|
||||
def from_mapping(cls, raw: Any, fallback_id: str | None = None) -> "SectionSpec":
|
||||
if not isinstance(raw, dict):
|
||||
return cls(id=fallback_id, raw=raw)
|
||||
|
||||
section_id = raw.get("id") or fallback_id
|
||||
match = raw.get("match") if isinstance(raw.get("match"), dict) else {}
|
||||
headings = unique_strings(
|
||||
as_string_list(raw.get("headings"))
|
||||
+ as_string_list(raw.get("aliases"))
|
||||
+ as_string_list(match.get("headings"))
|
||||
+ as_string_list(match.get("aliases"))
|
||||
+ as_string_list(raw.get("title"))
|
||||
+ as_string_list(section_id)
|
||||
)
|
||||
order = raw.get("order") if isinstance(raw.get("order"), dict) else {}
|
||||
return cls(
|
||||
id=section_id,
|
||||
title=raw.get("title"),
|
||||
section_type=raw.get("section_type") or raw.get("type") or raw.get("role"),
|
||||
presence=normalize_presence(raw),
|
||||
headings=headings,
|
||||
level=raw.get("level"),
|
||||
order_before=as_string_list(raw.get("before") or order.get("before")),
|
||||
order_after=as_string_list(raw.get("after") or order.get("after")),
|
||||
metrics=metric_bands_from_mapping(raw.get("metrics")),
|
||||
assertions=assertions_from_value(raw.get("assertions")),
|
||||
raw=raw,
|
||||
)
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class DocumentContract:
|
||||
"""A contract for a typed Markdown document."""
|
||||
|
||||
id: str | None
|
||||
document_type: str | None
|
||||
title: str | None = None
|
||||
version: str | None = None
|
||||
description: str | None = None
|
||||
sections: list[SectionSpec] = field(default_factory=list)
|
||||
fields: list[FieldSpec] = field(default_factory=list)
|
||||
metrics: list[MetricBand] = field(default_factory=list)
|
||||
assertions: list[AssertionSpec] = field(default_factory=list)
|
||||
forms: list[dict[str, Any]] = field(default_factory=list)
|
||||
context: dict[str, Any] = field(default_factory=dict)
|
||||
rubrics: list[dict[str, Any]] = field(default_factory=list)
|
||||
metadata: dict[str, Any] = field(default_factory=dict)
|
||||
raw: dict[str, Any] = field(default_factory=dict)
|
||||
source_path: str | None = None
|
||||
source_line: int | None = None
|
||||
|
||||
@classmethod
|
||||
def from_mapping(
|
||||
cls,
|
||||
raw: dict[str, Any],
|
||||
*,
|
||||
metadata: dict[str, Any] | None = None,
|
||||
source_path: str | None = None,
|
||||
source_line: int | None = None,
|
||||
) -> "DocumentContract":
|
||||
metadata = metadata or {}
|
||||
document = raw.get("document") if isinstance(raw.get("document"), dict) else {}
|
||||
return cls(
|
||||
id=raw.get("id") or metadata.get("contract-id") or metadata.get("id"),
|
||||
document_type=(
|
||||
raw.get("document_type")
|
||||
or raw.get("document-type")
|
||||
or raw.get("type")
|
||||
or document.get("type")
|
||||
or metadata.get("document-type")
|
||||
),
|
||||
title=raw.get("title") or document.get("title") or metadata.get("title"),
|
||||
version=str(raw.get("version") or metadata.get("version") or "")
|
||||
or None,
|
||||
description=raw.get("description") or document.get("description"),
|
||||
sections=sections_from_value(raw.get("sections")),
|
||||
fields=fields_from_value(raw.get("fields")),
|
||||
metrics=metric_bands_from_mapping(
|
||||
raw.get("metrics", {}).get("document")
|
||||
if isinstance(raw.get("metrics"), dict)
|
||||
and isinstance(raw.get("metrics", {}).get("document"), dict)
|
||||
else raw.get("metrics") or raw.get("metric_bands")
|
||||
),
|
||||
assertions=assertions_from_value(raw.get("assertions")),
|
||||
forms=raw.get("forms") if isinstance(raw.get("forms"), list) else [],
|
||||
context=raw.get("context") if isinstance(raw.get("context"), dict) else {},
|
||||
rubrics=raw.get("rubrics") if isinstance(raw.get("rubrics"), list) else [],
|
||||
metadata=metadata,
|
||||
raw=raw,
|
||||
source_path=source_path,
|
||||
source_line=source_line,
|
||||
)
|
||||
|
||||
def to_dict(self) -> dict[str, Any]:
|
||||
return {
|
||||
"id": self.id,
|
||||
"document_type": self.document_type,
|
||||
"title": self.title,
|
||||
"version": self.version,
|
||||
"description": self.description,
|
||||
"sections": [section.raw for section in self.sections],
|
||||
"fields": [field.raw for field in self.fields],
|
||||
"metrics": [band.raw for band in self.metrics],
|
||||
"assertions": [assertion.raw for assertion in self.assertions],
|
||||
"forms": self.forms,
|
||||
"context": self.context,
|
||||
"rubrics": self.rubrics,
|
||||
"source_path": self.source_path,
|
||||
}
|
||||
|
||||
|
||||
def normalize_metric_name(metric: str) -> str:
|
||||
return METRIC_ALIASES.get(str(metric).strip().lower(), str(metric).strip().lower())
|
||||
|
||||
|
||||
def normalize_presence(raw: dict[str, Any]) -> str:
|
||||
explicit = raw.get("presence")
|
||||
if explicit:
|
||||
return str(explicit)
|
||||
if raw.get("forbidden") is True or raw.get("prohibited") is True:
|
||||
return "forbidden"
|
||||
if raw.get("discouraged") is True:
|
||||
return "discouraged"
|
||||
if raw.get("required") is True:
|
||||
return "required"
|
||||
if raw.get("recommended") is True:
|
||||
return "recommended"
|
||||
return "optional"
|
||||
|
||||
|
||||
def sections_from_value(value: Any) -> list[SectionSpec]:
|
||||
return [
|
||||
SectionSpec.from_mapping(item, fallback_id=fallback_id)
|
||||
for fallback_id, item in items_from_value(value)
|
||||
]
|
||||
|
||||
|
||||
def fields_from_value(value: Any) -> list[FieldSpec]:
|
||||
return [
|
||||
FieldSpec.from_mapping(item, fallback_id=fallback_id)
|
||||
for fallback_id, item in items_from_value(value)
|
||||
]
|
||||
|
||||
|
||||
def assertions_from_value(value: Any) -> list[AssertionSpec]:
|
||||
if value is None:
|
||||
return []
|
||||
values = value if isinstance(value, list) else [value]
|
||||
return [AssertionSpec.from_mapping(item) for item in values]
|
||||
|
||||
|
||||
def metric_bands_from_mapping(value: Any) -> list[MetricBand]:
|
||||
if not isinstance(value, dict):
|
||||
return [] if value is None else [MetricBand.from_mapping("<invalid>", value)]
|
||||
return [MetricBand.from_mapping(metric, raw) for metric, raw in value.items()]
|
||||
|
||||
|
||||
def items_from_value(value: Any) -> list[tuple[str | None, Any]]:
|
||||
if value is None:
|
||||
return []
|
||||
if isinstance(value, dict):
|
||||
return [(str(key), item) for key, item in value.items()]
|
||||
if isinstance(value, list):
|
||||
return [(None, item) for item in value]
|
||||
return [(None, value)]
|
||||
|
||||
|
||||
def as_string_list(value: Any) -> list[str]:
|
||||
if value is None:
|
||||
return []
|
||||
if isinstance(value, list):
|
||||
return [str(item) for item in value if item is not None]
|
||||
return [str(value)]
|
||||
|
||||
|
||||
def unique_strings(values: list[str]) -> list[str]:
|
||||
seen: set[str] = set()
|
||||
result: list[str] = []
|
||||
for value in values:
|
||||
normalized = value.strip()
|
||||
if normalized and normalized.lower() not in seen:
|
||||
seen.add(normalized.lower())
|
||||
result.append(normalized)
|
||||
return result
|
||||
@@ -29,7 +29,7 @@ def parse_markdown(markdown: str, source_path: str | None = None) -> Document:
|
||||
|
||||
frontmatter, body, body_line_offset = _split_frontmatter(markdown)
|
||||
tokens = _parse_tokens(body)
|
||||
blocks, headings = _blocks_and_headings(tokens, body_line_offset)
|
||||
blocks, headings = _blocks_and_headings(tokens, body_line_offset, body)
|
||||
sections = _sections_from_blocks(blocks, headings)
|
||||
return Document(
|
||||
source_path=source_path,
|
||||
@@ -97,7 +97,7 @@ def _token_to_dict(token: Token) -> dict[str, Any]:
|
||||
|
||||
|
||||
def _blocks_and_headings(
|
||||
tokens: list[dict[str, Any]], line_offset: int
|
||||
tokens: list[dict[str, Any]], line_offset: int, markdown: str
|
||||
) -> tuple[list[ContentBlock], list[Heading]]:
|
||||
blocks: list[ContentBlock] = []
|
||||
headings: list[Heading] = []
|
||||
@@ -126,6 +126,8 @@ def _blocks_and_headings(
|
||||
if not text and token_type.endswith("_open"):
|
||||
inline = _next_inline(tokens, index)
|
||||
text = inline.get("content", "") if inline else ""
|
||||
if not text:
|
||||
text = _source_text(token, line_offset, markdown)
|
||||
blocks.append(
|
||||
ContentBlock(
|
||||
type=_block_type(token_type),
|
||||
@@ -151,6 +153,16 @@ def _line_range(token: dict[str, Any], line_offset: int) -> tuple[int | None, in
|
||||
return line_map[0] + line_offset + 1, line_map[1] + line_offset
|
||||
|
||||
|
||||
def _source_text(token: dict[str, Any], line_offset: int, markdown: str) -> str:
|
||||
line_start, line_end = _line_range(token, line_offset)
|
||||
if line_start is None or line_end is None:
|
||||
return ""
|
||||
lines = markdown.splitlines()
|
||||
start_index = max(line_start - line_offset - 1, 0)
|
||||
end_index = max(line_end - line_offset, start_index)
|
||||
return "\n".join(lines[start_index:end_index]).strip()
|
||||
|
||||
|
||||
def _block_type(token_type: str) -> str:
|
||||
return {
|
||||
"paragraph_open": "paragraph",
|
||||
|
||||
65
src/markitect_tool/diagnostics.py
Normal file
65
src/markitect_tool/diagnostics.py
Normal file
@@ -0,0 +1,65 @@
|
||||
"""Shared diagnostic primitives for Markitect validation layers."""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from dataclasses import dataclass, field
|
||||
from typing import Any
|
||||
|
||||
|
||||
SEVERITIES = {"info", "warning", "error"}
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class SourceLocation:
|
||||
"""A source location inside a document or contract."""
|
||||
|
||||
path: str | None = None
|
||||
line: int | None = None
|
||||
column: int | None = None
|
||||
|
||||
def to_dict(self) -> dict[str, Any]:
|
||||
data = {
|
||||
"path": self.path,
|
||||
"line": self.line,
|
||||
"column": self.column,
|
||||
}
|
||||
return {key: value for key, value in data.items() if value is not None}
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class Diagnostic:
|
||||
"""A structured validation or assessment finding."""
|
||||
|
||||
severity: str
|
||||
code: str
|
||||
message: str
|
||||
source: SourceLocation | None = None
|
||||
contract: SourceLocation | None = None
|
||||
rule_id: str | None = None
|
||||
guidance: str | None = None
|
||||
details: dict[str, Any] = field(default_factory=dict)
|
||||
|
||||
def to_dict(self) -> dict[str, Any]:
|
||||
data: dict[str, Any] = {
|
||||
"severity": self.severity,
|
||||
"code": self.code,
|
||||
"message": self.message,
|
||||
"source": self.source.to_dict() if self.source else None,
|
||||
"contract": self.contract.to_dict() if self.contract else None,
|
||||
"rule_id": self.rule_id,
|
||||
"guidance": self.guidance,
|
||||
"details": self.details or None,
|
||||
}
|
||||
return {key: value for key, value in data.items() if value is not None}
|
||||
|
||||
|
||||
def valid_severity(severity: str | None) -> bool:
|
||||
"""Return whether a severity is supported by the diagnostic model."""
|
||||
|
||||
return severity in SEVERITIES
|
||||
|
||||
|
||||
def has_error(diagnostics: list[Diagnostic]) -> bool:
|
||||
"""Return whether the diagnostic list contains at least one error."""
|
||||
|
||||
return any(diagnostic.severity == "error" for diagnostic in diagnostics)
|
||||
@@ -9,6 +9,7 @@ from typing import Any
|
||||
from jsonschema import Draft202012Validator, SchemaError, ValidationError
|
||||
|
||||
from markitect_tool.core import Document, parse_markdown_file
|
||||
from markitect_tool.diagnostics import Diagnostic, SourceLocation
|
||||
from markitect_tool.schema.loader import MarkdownSchema, load_schema_file
|
||||
|
||||
|
||||
@@ -23,6 +24,21 @@ class ValidationViolation:
|
||||
def to_dict(self) -> dict[str, str]:
|
||||
return asdict(self)
|
||||
|
||||
def to_diagnostic(
|
||||
self,
|
||||
*,
|
||||
source_path: str | None = None,
|
||||
contract_path: str | None = None,
|
||||
) -> Diagnostic:
|
||||
return Diagnostic(
|
||||
severity="error",
|
||||
code="schema.validation",
|
||||
message=self.message,
|
||||
source=SourceLocation(path=source_path),
|
||||
contract=SourceLocation(path=contract_path),
|
||||
details={"path": self.path, "schema_path": self.schema_path},
|
||||
)
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class SchemaValidationResult:
|
||||
@@ -42,6 +58,17 @@ class SchemaValidationResult:
|
||||
}
|
||||
return {key: value for key, value in data.items() if value is not None}
|
||||
|
||||
def to_diagnostics(self) -> list[Diagnostic]:
|
||||
"""Return schema violations as unified diagnostics."""
|
||||
|
||||
return [
|
||||
violation.to_diagnostic(
|
||||
source_path=self.document_path,
|
||||
contract_path=self.schema_path,
|
||||
)
|
||||
for violation in self.violations
|
||||
]
|
||||
|
||||
|
||||
def validate_schema(schema: dict[str, Any]) -> SchemaValidationResult:
|
||||
"""Validate that a JSON Schema itself is well formed."""
|
||||
|
||||
336
tests/test_contract_framework.py
Normal file
336
tests/test_contract_framework.py
Normal file
@@ -0,0 +1,336 @@
|
||||
from pathlib import Path
|
||||
|
||||
from click.testing import CliRunner
|
||||
|
||||
from markitect_tool.cli import main
|
||||
from markitect_tool.contract import (
|
||||
check_markdown_file,
|
||||
collect_metrics,
|
||||
load_contract_file,
|
||||
validate_contract,
|
||||
)
|
||||
from markitect_tool.core import parse_markdown
|
||||
|
||||
|
||||
EXAMPLE_CASES = [
|
||||
(
|
||||
"adr",
|
||||
Path("examples/contracts/adr.contract.md"),
|
||||
Path("examples/documents/adr-valid.md"),
|
||||
Path("examples/documents/adr-invalid.md"),
|
||||
{
|
||||
"contract.field.missing",
|
||||
"contract.metric.too_low",
|
||||
"contract.assertion.contains_any_missing",
|
||||
"contract.section.missing",
|
||||
"contract.section.recommended_missing",
|
||||
"contract.section.forbidden",
|
||||
},
|
||||
),
|
||||
(
|
||||
"prd-frs",
|
||||
Path("examples/contracts/prd-frs.contract.md"),
|
||||
Path("examples/documents/prd-frs-valid.md"),
|
||||
Path("examples/documents/prd-frs-invalid.md"),
|
||||
{
|
||||
"contract.field.missing",
|
||||
"contract.metric.too_low",
|
||||
"contract.assertion.contains_any_missing",
|
||||
"contract.section.missing",
|
||||
"contract.section.recommended_missing",
|
||||
"contract.section.discouraged",
|
||||
},
|
||||
),
|
||||
(
|
||||
"workplan",
|
||||
Path("examples/contracts/workplan.contract.md"),
|
||||
Path("examples/documents/workplan-valid.md"),
|
||||
Path("examples/documents/workplan-invalid.md"),
|
||||
{
|
||||
"contract.field.missing",
|
||||
"contract.field.enum",
|
||||
"contract.assertion.contains_missing",
|
||||
"contract.section.recommended_missing",
|
||||
},
|
||||
),
|
||||
(
|
||||
"business-letter",
|
||||
Path("examples/contracts/business-letter.contract.md"),
|
||||
Path("examples/documents/business-letter-valid.md"),
|
||||
Path("examples/documents/business-letter-invalid.md"),
|
||||
{
|
||||
"contract.field.missing",
|
||||
"contract.section.missing",
|
||||
"contract.metric.too_low",
|
||||
},
|
||||
),
|
||||
(
|
||||
"concept-note",
|
||||
Path("examples/contracts/concept-note.contract.md"),
|
||||
Path("examples/documents/concept-note-valid.md"),
|
||||
Path("examples/documents/concept-note-invalid.md"),
|
||||
{
|
||||
"contract.field.enum",
|
||||
"contract.metric.too_low",
|
||||
"contract.section.missing",
|
||||
},
|
||||
),
|
||||
]
|
||||
|
||||
|
||||
CONTRACT_TEXT = """---
|
||||
title: ADR Contract
|
||||
version: "1.0"
|
||||
---
|
||||
|
||||
# ADR Contract
|
||||
|
||||
```yaml contract
|
||||
id: adr-contract-v1
|
||||
document:
|
||||
type: adr
|
||||
title: Architecture Decision Record
|
||||
fields:
|
||||
status:
|
||||
type: string
|
||||
required: true
|
||||
enum: [proposed, accepted, superseded]
|
||||
metrics:
|
||||
document:
|
||||
words:
|
||||
min: 12
|
||||
max: 240
|
||||
severity: warning
|
||||
sections:
|
||||
- id: context
|
||||
title: Context
|
||||
presence: required
|
||||
level: 2
|
||||
order:
|
||||
before: decision
|
||||
metrics:
|
||||
words:
|
||||
min: 4
|
||||
max: 80
|
||||
severity: warning
|
||||
assertions:
|
||||
- id: context-names-problem
|
||||
contains_any: [problem, motivation]
|
||||
severity: warning
|
||||
guidance: Explain why the decision exists.
|
||||
- id: decision
|
||||
title: Decision
|
||||
presence: required
|
||||
level: 2
|
||||
assertions:
|
||||
- id: decision-commits
|
||||
matches: "\\\\b(choose|adopt|use|will)\\\\b"
|
||||
severity: error
|
||||
guidance: State the actual decision, not only background.
|
||||
- id: consequences
|
||||
title: Consequences
|
||||
presence: recommended
|
||||
level: 2
|
||||
- id: deprecated
|
||||
title: Deprecated Approach
|
||||
presence: forbidden
|
||||
```
|
||||
"""
|
||||
|
||||
|
||||
VALID_ADR = """---
|
||||
document_type: adr
|
||||
status: accepted
|
||||
---
|
||||
|
||||
# Use Markdown Contracts
|
||||
|
||||
## Context
|
||||
|
||||
The problem is that plain heading counts do not explain whether content is useful.
|
||||
|
||||
## Decision
|
||||
|
||||
We will use a markdown-native document contract with deterministic diagnostics.
|
||||
|
||||
## Consequences
|
||||
|
||||
The tool can check author intent before generation or review work continues.
|
||||
"""
|
||||
|
||||
|
||||
INVALID_ADR = """---
|
||||
document_type: adr
|
||||
---
|
||||
|
||||
# Weak ADR
|
||||
|
||||
## Context
|
||||
|
||||
This is short.
|
||||
|
||||
## Deprecated Approach
|
||||
|
||||
This section should not be here.
|
||||
"""
|
||||
|
||||
|
||||
def test_load_contract_file_extracts_markdown_yaml_contract(tmp_path: Path):
|
||||
contract_file = tmp_path / "adr.contract.md"
|
||||
contract_file.write_text(CONTRACT_TEXT, encoding="utf-8")
|
||||
|
||||
contract = load_contract_file(contract_file)
|
||||
|
||||
assert contract.id == "adr-contract-v1"
|
||||
assert contract.document_type == "adr"
|
||||
assert contract.fields[0].id == "status"
|
||||
assert [section.id for section in contract.sections] == [
|
||||
"context",
|
||||
"decision",
|
||||
"consequences",
|
||||
"deprecated",
|
||||
]
|
||||
|
||||
|
||||
def test_validate_contract_accepts_complete_contract(tmp_path: Path):
|
||||
contract_file = tmp_path / "adr.contract.md"
|
||||
contract_file.write_text(CONTRACT_TEXT, encoding="utf-8")
|
||||
|
||||
result = validate_contract(load_contract_file(contract_file))
|
||||
|
||||
assert result.valid is True
|
||||
assert result.diagnostics == []
|
||||
|
||||
|
||||
def test_validate_contract_reports_bad_regex(tmp_path: Path):
|
||||
contract_file = tmp_path / "bad.contract.md"
|
||||
contract_file.write_text(
|
||||
CONTRACT_TEXT.replace("\\\\b(choose|adopt|use|will)\\\\b", "[bad"),
|
||||
encoding="utf-8",
|
||||
)
|
||||
|
||||
result = validate_contract(load_contract_file(contract_file))
|
||||
|
||||
assert result.valid is False
|
||||
assert result.diagnostics[0].code == "contract.regex.invalid"
|
||||
|
||||
|
||||
def test_check_markdown_file_accepts_valid_document(tmp_path: Path):
|
||||
contract_file = tmp_path / "adr.contract.md"
|
||||
document_file = tmp_path / "adr.md"
|
||||
contract_file.write_text(CONTRACT_TEXT, encoding="utf-8")
|
||||
document_file.write_text(VALID_ADR, encoding="utf-8")
|
||||
|
||||
result = check_markdown_file(document_file, contract_file)
|
||||
|
||||
assert result.valid is True
|
||||
assert result.diagnostics == []
|
||||
assert result.metrics["document"]["sections"] == 4
|
||||
|
||||
|
||||
def test_check_markdown_file_reports_practical_failures(tmp_path: Path):
|
||||
contract_file = tmp_path / "adr.contract.md"
|
||||
document_file = tmp_path / "adr.md"
|
||||
contract_file.write_text(CONTRACT_TEXT, encoding="utf-8")
|
||||
document_file.write_text(INVALID_ADR, encoding="utf-8")
|
||||
|
||||
result = check_markdown_file(document_file, contract_file)
|
||||
codes = {diagnostic.code for diagnostic in result.diagnostics}
|
||||
|
||||
assert result.valid is False
|
||||
assert "contract.field.missing" in codes
|
||||
assert "contract.section.missing" in codes
|
||||
assert "contract.section.forbidden" in codes
|
||||
assert "contract.metric.too_low" in codes
|
||||
|
||||
|
||||
def test_check_markdown_file_keeps_warning_only_results_valid(tmp_path: Path):
|
||||
contract_file = tmp_path / "adr.contract.md"
|
||||
document_file = tmp_path / "adr.md"
|
||||
contract_file.write_text(CONTRACT_TEXT, encoding="utf-8")
|
||||
document_file.write_text(
|
||||
VALID_ADR.replace("The problem is", "The situation is"),
|
||||
encoding="utf-8",
|
||||
)
|
||||
|
||||
result = check_markdown_file(document_file, contract_file)
|
||||
|
||||
assert result.valid is True
|
||||
assert [diagnostic.code for diagnostic in result.diagnostics] == [
|
||||
"contract.assertion.contains_any_missing"
|
||||
]
|
||||
assert result.diagnostics[0].severity == "warning"
|
||||
|
||||
|
||||
def test_collect_metrics_counts_document_and_sections():
|
||||
document = parse_markdown(VALID_ADR)
|
||||
|
||||
metrics = collect_metrics(document)
|
||||
|
||||
assert metrics.words > 20
|
||||
assert metrics.sections == 4
|
||||
context_metrics = next(
|
||||
section for section in metrics.section_metrics if section.heading == "Context"
|
||||
)
|
||||
assert context_metrics.words >= 10
|
||||
|
||||
|
||||
def test_mkt_contract_validate(tmp_path: Path):
|
||||
contract_file = tmp_path / "adr.contract.md"
|
||||
contract_file.write_text(CONTRACT_TEXT, encoding="utf-8")
|
||||
|
||||
result = CliRunner().invoke(main, ["contract", "validate", str(contract_file)])
|
||||
|
||||
assert result.exit_code == 0
|
||||
assert "valid" in result.output
|
||||
|
||||
|
||||
def test_mkt_contract_check_reports_invalid_document(tmp_path: Path):
|
||||
contract_file = tmp_path / "adr.contract.md"
|
||||
document_file = tmp_path / "adr.md"
|
||||
contract_file.write_text(CONTRACT_TEXT, encoding="utf-8")
|
||||
document_file.write_text(INVALID_ADR, encoding="utf-8")
|
||||
|
||||
result = CliRunner().invoke(
|
||||
main, ["contract", "check", str(document_file), "--contract", str(contract_file)]
|
||||
)
|
||||
|
||||
assert result.exit_code == 1
|
||||
assert "contract.section.missing" in result.output
|
||||
assert "guidance" in result.output
|
||||
|
||||
|
||||
def test_mkt_metrics_outputs_text(tmp_path: Path):
|
||||
document_file = tmp_path / "adr.md"
|
||||
document_file.write_text(VALID_ADR, encoding="utf-8")
|
||||
|
||||
result = CliRunner().invoke(main, ["metrics", str(document_file)])
|
||||
|
||||
assert result.exit_code == 0
|
||||
assert "document" in result.output
|
||||
assert "words" in result.output
|
||||
assert "Context" in result.output
|
||||
|
||||
|
||||
def test_example_contracts_validate():
|
||||
for _name, contract_path, _valid_path, _invalid_path, _expected in EXAMPLE_CASES:
|
||||
result = validate_contract(load_contract_file(contract_path))
|
||||
|
||||
assert result.valid is True
|
||||
|
||||
|
||||
def test_example_valid_documents_have_no_error_diagnostics():
|
||||
for name, contract_path, valid_path, _invalid_path, _expected in EXAMPLE_CASES:
|
||||
result = check_markdown_file(valid_path, contract_path)
|
||||
|
||||
assert result.valid is True, name
|
||||
assert all(diagnostic.severity != "error" for diagnostic in result.diagnostics)
|
||||
|
||||
|
||||
def test_example_invalid_documents_report_expected_diagnostics():
|
||||
for name, contract_path, _valid_path, invalid_path, expected in EXAMPLE_CASES:
|
||||
result = check_markdown_file(invalid_path, contract_path)
|
||||
codes = {diagnostic.code for diagnostic in result.diagnostics}
|
||||
|
||||
assert result.valid is False, name
|
||||
assert expected <= codes
|
||||
@@ -3,7 +3,7 @@ id: MKTT-WP-0001
|
||||
type: workplan
|
||||
title: "markitect-tool Repository Foundation"
|
||||
domain: markitect
|
||||
status: active
|
||||
status: done
|
||||
owner: markitect-tool
|
||||
topic_slug: markitect
|
||||
created: "2026-05-03"
|
||||
|
||||
@@ -3,7 +3,7 @@ id: MKTT-WP-0002
|
||||
type: workplan
|
||||
title: "markitect-main Scope Extraction"
|
||||
domain: markitect
|
||||
status: active
|
||||
status: done
|
||||
owner: markitect-tool
|
||||
topic_slug: markitect
|
||||
created: "2026-05-03"
|
||||
|
||||
@@ -3,11 +3,12 @@ id: MKTT-WP-0004
|
||||
type: workplan
|
||||
title: "Practical Document Contract Framework"
|
||||
domain: markitect
|
||||
status: proposed
|
||||
status: done
|
||||
owner: markitect-tool
|
||||
topic_slug: markitect
|
||||
created: "2026-05-03"
|
||||
updated: "2026-05-03"
|
||||
state_hub_workstream_id: "558787e1-d287-46a5-9214-634e8b90a858"
|
||||
---
|
||||
|
||||
# MKTT-WP-0004: Practical Document Contract Framework
|
||||
@@ -19,6 +20,24 @@ heading-count schema validation toward document contracts with section
|
||||
specifications, fields/forms, context-aware rules, metric bands, optional LLM
|
||||
assessments, and unified diagnostics.
|
||||
|
||||
## Implementation Result
|
||||
|
||||
Initial deterministic contract framework implemented:
|
||||
|
||||
- Markdown contract files with fenced `yaml contract` blocks.
|
||||
- Shared diagnostic model with severity, code, source, contract location,
|
||||
rule id, details, and repair guidance.
|
||||
- Contract validation, document contract checking, and metrics CLI commands.
|
||||
- Required/recommended/optional/discouraged/forbidden section specs.
|
||||
- Field specs for frontmatter values.
|
||||
- Document-level and section-level metric bands.
|
||||
- Deterministic content assertions.
|
||||
- Design documentation for form/context and provider-neutral LLM rubric
|
||||
adapters.
|
||||
- Example contracts, valid documents, invalid documents, and expected
|
||||
diagnostic notes for ADR, PRD/FRS, workplan, business letter, and concept
|
||||
note use cases.
|
||||
|
||||
## Background
|
||||
|
||||
Research and legacy comparison are captured in:
|
||||
@@ -31,8 +50,9 @@ Research and legacy comparison are captured in:
|
||||
|
||||
```task
|
||||
id: MKTT-WP-0004-T001
|
||||
status: todo
|
||||
status: done
|
||||
priority: high
|
||||
state_hub_task_id: "2065d56a-9371-4fd0-9a3d-7a69c718e851"
|
||||
```
|
||||
|
||||
Define the first `DocumentContract` format in markdown/YAML:
|
||||
@@ -51,8 +71,9 @@ Keep it provider-neutral and readable by humans.
|
||||
|
||||
```task
|
||||
id: MKTT-WP-0004-T002
|
||||
status: todo
|
||||
status: done
|
||||
priority: high
|
||||
state_hub_task_id: "3ed3af1b-c747-492c-acda-ecb4ee564a38"
|
||||
```
|
||||
|
||||
Create diagnostics with severity, code, message, source location, contract
|
||||
@@ -63,8 +84,9 @@ violations and all new contract checks.
|
||||
|
||||
```task
|
||||
id: MKTT-WP-0004-T003
|
||||
status: todo
|
||||
status: done
|
||||
priority: high
|
||||
state_hub_task_id: "c4166e5a-53a5-4207-a3fb-b4ddf388cd5e"
|
||||
```
|
||||
|
||||
Support required, recommended, optional, discouraged, and forbidden sections.
|
||||
@@ -75,8 +97,9 @@ and clear diagnostics.
|
||||
|
||||
```task
|
||||
id: MKTT-WP-0004-T004
|
||||
status: todo
|
||||
status: done
|
||||
priority: medium
|
||||
state_hub_task_id: "304af70e-1a33-4ee2-bcbd-7b966436cf37"
|
||||
```
|
||||
|
||||
Support document-level and section-level bands for words, characters,
|
||||
@@ -87,32 +110,42 @@ Allow soft warnings and hard errors.
|
||||
|
||||
```task
|
||||
id: MKTT-WP-0004-T005
|
||||
status: todo
|
||||
status: done
|
||||
priority: medium
|
||||
state_hub_task_id: "1bcc82fe-b578-446c-86a7-938f732b24fa"
|
||||
```
|
||||
|
||||
Specify fields, defaults, prefill sources, dynamic requiredness, conditional
|
||||
visibility, calculations, and validation against external context. This task is
|
||||
design-first; implementation can follow in a later workplan.
|
||||
|
||||
Design captured in `docs/contract-framework.md`. Runtime form rendering,
|
||||
dynamic visibility, calculations, and context resolvers remain later adapter
|
||||
work.
|
||||
|
||||
## P4.6 - Design LLM assessment adapter contract
|
||||
|
||||
```task
|
||||
id: MKTT-WP-0004-T006
|
||||
status: todo
|
||||
status: done
|
||||
priority: medium
|
||||
state_hub_task_id: "bef295ba-fbc0-4df6-9cc4-040ed9b5f346"
|
||||
```
|
||||
|
||||
Define provider-neutral request/response models for section-level rubrics:
|
||||
criteria, inputs, context, score, pass/fail, reason, model metadata, and cache
|
||||
keys. Do not bind core logic to any provider.
|
||||
|
||||
Provider-neutral adapter shape captured in `docs/contract-framework.md`.
|
||||
Execution, caching, and provider integration remain later work.
|
||||
|
||||
## P4.7 - Add practical CLI surface
|
||||
|
||||
```task
|
||||
id: MKTT-WP-0004-T007
|
||||
status: todo
|
||||
status: done
|
||||
priority: high
|
||||
state_hub_task_id: "9f61a5af-0b65-460a-8231-ec50279c5c6a"
|
||||
```
|
||||
|
||||
Add:
|
||||
@@ -129,8 +162,9 @@ Ensure output is useful to humans and machines.
|
||||
|
||||
```task
|
||||
id: MKTT-WP-0004-T008
|
||||
status: todo
|
||||
status: done
|
||||
priority: medium
|
||||
state_hub_task_id: "7ec8c0f2-c598-4095-aefe-f6f97e84a470"
|
||||
```
|
||||
|
||||
Create examples for:
|
||||
|
||||
Reference in New Issue
Block a user