Initial schemas and validation with extension workplan

2026-05-03 22:12:46 +02:00
parent b96b1fb745
commit 8c9129c371
15 changed files with 1025 additions and 2 deletions
--- a/docs/practical-schema-framework-research.md
+++ b/docs/practical-schema-framework-research.md
@@ -0,0 +1,323 @@
+# Practical Schema Framework Research
+
+Date: 2026-05-03
+
+## Purpose
+
+This document reassesses how useful `markitect-tool` schemas are before further
+implementation. The concern is that purely structural validation, such as
+heading counts and min/max depth constraints, is rarely enough to make markdown
+document pipelines useful.
+
+The practical opportunity is to define a stronger framework for markdown-native
+document contracts: section specifications, content assertions, form fields,
+context-aware rules, LLM-assisted assessments, and high-quality diagnostics.
+
+## Research Signals
+
+### Structured Authoring
+
+DITA is the strongest analogue for typed, reusable textual units. It emphasizes
+information typing, semantic markup, modularity, reuse, interchange, and
+multiple deliverables from one source. A DITA topic is the unit of authoring and
+reuse; topics may be generic or specialized into roles such as concept, task, or
+reference.
+
+Relevance for `markitect-tool`:
+
+- A markdown document or section should have an explicit information type.
+- Information type should imply expected structure and reader purpose.
+- Reuse and composition need stable addressing of sections, not only files.
+- Specialization is a better mental model than ad hoc schema forks.
+
+Sources:
+
+- https://dita-lang.org/dita/archspec/base/basic-concepts
+- https://dita-lang.org/dita/archspec/base/introduction-to-dita
+
+### Document Schemas With Assertions
+
+DocBook remains relevant because it combines formal document schemas with
+Schematron-style assertions. That is the missing layer in many simplistic JSON
+Schema approaches: grammar says what may exist; assertions say what must be true
+in context.
+
+Relevance for `markitect-tool`:
+
+- JSON Schema over `Document.to_dict()` is useful but insufficient.
+- We need a second assertion layer for document-specific semantics.
+- Diagnostics must point to the document location and state the intent of the
+  rule.
+
+Source:
+
+- https://docbook.org/schemas/docbook/
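+
+The split between grammar and assertions can be sketched in a few lines. This is
+a minimal illustration, not the planned implementation: the dictionary layout
+below is an assumed stand-in for `Document.to_dict()`, and `check_shape` and
+`check_assertions` are hypothetical names.
+
+```python
+# Hypothetical document dict; the real `Document.to_dict()` layout may differ.
+doc = {
+    "frontmatter": {"status": "accepted"},
+    "sections": [
+        {"title": "Context", "text": "We need a schema layer."},
+        {"title": "Decision", "text": ""},
+    ],
+}
+
+
+def check_shape(doc: dict) -> list[str]:
+    """Grammar layer: which keys and sections must exist."""
+    problems = []
+    if "status" not in doc.get("frontmatter", {}):
+        problems.append("frontmatter.status is missing")
+    titles = {s["title"] for s in doc.get("sections", [])}
+    for required in ("Context", "Decision"):
+        if required not in titles:
+            problems.append(f"section '{required}' is missing")
+    return problems
+
+
+def check_assertions(doc: dict) -> list[str]:
+    """Assertion layer: what must be true in context."""
+    problems = []
+    by_title = {s["title"]: s for s in doc.get("sections", [])}
+    decision = by_title.get("Decision", {"text": ""})
+    status = doc.get("frontmatter", {}).get("status")
+    # An accepted ADR must actually state its decision.
+    if status == "accepted" and not decision["text"].strip():
+        problems.append("status is 'accepted' but section 'Decision' is empty")
+    return problems
+
+
+shape_problems = check_shape(doc)
+assertion_problems = check_assertions(doc)
+```
+
+The sample document passes the shape check because both sections exist, yet it
+fails the assertion layer because the accepted decision has no text.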
+
+### Dynamic Form Rules
+
+JSON Schema supports conditional validation through `dependentRequired`,
+`dependentSchemas`, and `if`/`then`/`else`. JSON Forms separates data schema
+from UI schema and uses rules to show, hide, enable, or disable UI elements
+based on JSON Schema conditions. Form.io’s architecture treats the form schema
+as a single source of truth for validation and conditional logic across client
+and server.
+
+Relevance for `markitect-tool`:
+
+- Forms should be first-class, not bolted onto document generation.
+- Field definitions need static validation and dynamic rules.
+- Prefill, visibility, requiredness, and calculated values should come from the
+  same contract used for generation and validation.
+- Context data must be explicit and typed.
+
+Sources:
+
+- https://json-schema.org/understanding-json-schema/reference/conditionals
+- https://jsonforms.io/docs/uischema/rules/
+- https://form.io/features/form-conditional-logic-form-validation/
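+
+As a rough illustration of static and dynamic requiredness living in one
+contract, the sketch below evaluates the JSON Schema keywords `required` and
+`dependentRequired` by hand. The field names are invented, and
+`missing_fields` is a hypothetical helper covering only these two keywords; it
+is not a JSON Schema validator.
+
+```python
+# Field schema using the JSON Schema `dependentRequired` keyword.
+# The field names are hypothetical.
+schema = {
+    "type": "object",
+    "required": ["customer_name"],
+    "dependentRequired": {"discount_code": ["discount_reason"]},
+}
+
+
+def missing_fields(schema: dict, data: dict) -> list[str]:
+    """Evaluate only `required` and `dependentRequired` (a tiny subset)."""
+    missing = [name for name in schema.get("required", []) if name not in data]
+    for trigger, dependents in schema.get("dependentRequired", {}).items():
+        # Dependents become required only when the trigger field is present.
+        if trigger in data:
+            missing += [name for name in dependents if name not in data]
+    return missing
+
+
+plain = missing_fields(schema, {"customer_name": "Ada"})
+discounted = missing_fields(schema, {"customer_name": "Ada", "discount_code": "X1"})
+```
+
+Here `plain` is empty, while `discounted` reports `discount_reason` because the
+presence of `discount_code` makes it required.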
+
+### LLM-Assisted Assessment
+
+Modern evaluation frameworks treat LLM assessment as explicit graders or
+rubrics. OpenAI graders return scores in a 0–1 range and can combine grader
+types. Promptfoo’s `llm-rubric` uses explicit criteria and expects structured
+judge output with reason, score, and pass/fail.
+
+Relevance for `markitect-tool`:
+
+- LLM checks should be declared as assessment rules, not hidden in prompts.
+- Deterministic validation and LLM assessment should produce one diagnostic
+  model.
+- Section-level rubrics are more useful than vague whole-document grading.
+- The LLM provider must remain external; `markitect-tool` defines contracts and
+  reports.
+
+Sources:
+
+- https://developers.openai.com/api/docs/guides/graders
+- https://www.promptfoo.dev/docs/configuration/expected-outputs/model-graded/llm-rubric/
+
+### Markdown Structure
+
+CommonMark gives markdown a well-defined block/inline model. mdast gives a
+language-neutral tree vocabulary for Markdown nodes. Both point toward keeping
+the parse layer separate from domain/schema layers.
+
+Relevance for `markitect-tool`:
+
+- The core document model should stay close to CommonMark/mdast concepts.
+- Practical document contracts should sit above the parse model.
+- Section addressing, source spans, and block identity are foundational for good
+  diagnostics.
+
+Sources:
+
+- https://spec.commonmark.org/0.31.2/
+- https://github.com/syntax-tree/mdast
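+
+Section addressing and source spans could look roughly like the sketch below.
+It is a deliberately simplified scanner, not a CommonMark parser: it only
+recognizes ATX headings, ignores setext headings and closing `#` sequences,
+treats any line starting with a fence marker as a toggle, and does not
+disambiguate duplicate headings. `Section`, `slug`, and `index_sections` are
+hypothetical names.
+
+```python
+import re
+from dataclasses import dataclass
+
+# Up to three spaces of indent, one to six '#', then the heading text.
+ATX_HEADING = re.compile(r"^ {0,3}(#{1,6})[ \t]+(.+?)[ \t]*$")
+
+
+@dataclass
+class Section:
+    path: str        # e.g. "adr/decision"; a stable address for diagnostics
+    level: int
+    start_line: int  # 1-based heading line
+    end_line: int    # 1-based, inclusive
+
+
+def slug(title: str) -> str:
+    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
+
+
+def index_sections(markdown: str) -> list[Section]:
+    lines = markdown.splitlines()
+    sections: list[Section] = []
+    stack: list[Section] = []  # currently open ancestor sections
+    in_fence = False
+    for number, line in enumerate(lines, start=1):
+        if line.lstrip().startswith(("```", "~~~")):
+            in_fence = not in_fence
+            continue
+        match = None if in_fence else ATX_HEADING.match(line)
+        if not match:
+            continue
+        level = len(match.group(1))
+        # A heading closes every open section at the same or deeper level.
+        while stack and stack[-1].level >= level:
+            stack.pop().end_line = number - 1
+        parent = stack[-1].path + "/" if stack else ""
+        # Sections still open at the end of input keep the last line as end.
+        section = Section(parent + slug(match.group(2)), level, number, len(lines))
+        sections.append(section)
+        stack.append(section)
+    return sections
+```
+
+Keeping the path and line span on every section is what later lets a
+diagnostic say where a rule failed, independent of which file layer raised it.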
+
+## Terminology Proposal
+
+| Term | Meaning |
+| --- | --- |
+| Document | A markdown artifact parsed into frontmatter, blocks, headings, sections, and source spans. |
+| Section | A heading-led document region with content, children, source location, and stable identity. |
+| Document Type | A named contract for a whole document, e.g. ADR, PRD, invoice letter, support reply, concept note. |
+| Section Type | A reusable role for a section, e.g. Context, Decision, Risks, Procedure, Evidence, Conclusion. |
+| Field | A typed value expected in frontmatter, inline matter, a section, or an external data record. |
+| Form | A field collection with UI hints, validation rules, defaults, dynamic visibility, and calculations. |
+| Context | External data available during validation/generation, such as user data, project data, dates, or related entities. |
+| Rule | A deterministic condition evaluated against document, fields, context, or pipeline state. |
+| Assertion | A claim that must hold for content, usually richer than shape validation. |
+| Metric Band | A soft or hard target for size/complexity, such as word count, sentence count, section count, or reading level. |
+| Assessment | A deterministic or LLM-assisted evaluation that returns pass/fail, score, reason, and diagnostics. |
+| Rubric | A human-readable criterion for LLM-assisted assessment, scoped to a document or section type. |
+| Diagnostic | A structured finding with severity, code, message, source location, rule id, and suggested repair. |
+| Contract | The full specification for a document type: structure, sections, fields, rules, forms, assertions, rubrics, and outputs. |
+| Pipeline | A repeatable sequence of parse, prefill, generate, validate, assess, transform, and compose operations. |
+
+## Most Relevant Use Cases
+
+### UC-001: Typed Document Contract
+
+Define a document type such as ADR, PRD, FRS, workplan, customer letter, or
+meeting brief. Specify required sections by semantic role, allowed alternatives,
+field requirements, and diagnostics.
+
+Practical value:
+
+- Prevents missing critical content.
+- Makes generated documents predictable.
+- Creates an explicit contract for humans and agents.
+
+Needed tooling:
+
+- `mkt contract check <doc> --contract <contract.md>`
+- Section matching by heading text, aliases, ids, or section type markers.
+- Diagnostics that say which section/field/assertion failed and why.
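+
+Section matching by heading text and aliases might be sketched as follows.
+`SectionSpec`, `match_sections`, and `missing_required` are hypothetical names,
+and matching by ids or section type markers is left out of this sketch.
+
+```python
+from dataclasses import dataclass, field
+from typing import Optional
+
+
+@dataclass
+class SectionSpec:
+    section_type: str                      # semantic role, e.g. "decision"
+    aliases: list[str] = field(default_factory=list)
+    required: bool = True
+
+
+def normalize(heading: str) -> str:
+    """Compare headings case-insensitively and ignore extra whitespace."""
+    return " ".join(heading.lower().split())
+
+
+def match_sections(
+    specs: list[SectionSpec], headings: list[str]
+) -> dict[str, Optional[str]]:
+    """Map each section type to the heading that fills it, or None."""
+    seen = {normalize(h): h for h in headings}
+    matched: dict[str, Optional[str]] = {}
+    for spec in specs:
+        candidates = [spec.section_type, *spec.aliases]
+        matched[spec.section_type] = next(
+            (seen[normalize(c)] for c in candidates if normalize(c) in seen), None
+        )
+    return matched
+
+
+def missing_required(specs: list[SectionSpec], headings: list[str]) -> list[str]:
+    matched = match_sections(specs, headings)
+    return [s.section_type for s in specs if s.required and matched[s.section_type] is None]
+```
+
+With a spec for `context` that lists `Background` as an alias, a document headed
+`Background` satisfies the contract, while a missing `decision` section is
+reported by role rather than by literal heading text.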
+
+### UC-002: Section-Level Content Expectations
+
+Specify what a section is expected to contain: assertions, required evidence,
+forbidden omissions, content patterns, examples, and reviewer prompts.
+
+Practical value:
+
+- Moves beyond “has a heading” toward “does the section do its job?”
+- Enables review of generated or human-authored text.
+
+Needed tooling:
+
+- Deterministic assertions for regex, presence, references, counts, and field
+  values.
+- Optional LLM rubrics for semantic content checks.
+- Per-section diagnostic reports.
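+
+A deterministic content assertion can be as small as a required or forbidden
+pattern attached to a section type. The sketch below assumes a hypothetical
+`PatternAssertion` record and invented rule ids; presence, reference, and count
+assertions would follow the same shape.
+
+```python
+import re
+from dataclasses import dataclass
+
+
+@dataclass
+class PatternAssertion:
+    rule_id: str
+    pattern: str
+    mode: str      # "required" or "forbidden"
+    message: str
+
+
+def check_section(text: str, assertions: list[PatternAssertion]) -> list[tuple[str, str]]:
+    """Return (rule_id, message) for every assertion the section violates."""
+    failures = []
+    for a in assertions:
+        found = re.search(a.pattern, text, flags=re.IGNORECASE) is not None
+        if (a.mode == "required" and not found) or (a.mode == "forbidden" and found):
+            failures.append((a.rule_id, a.message))
+    return failures
+
+
+# Hypothetical rules for a "Risks" section type.
+risk_rules = [
+    PatternAssertion("risks.mitigation", r"\bmitigat", "required",
+                     "Risks section should name at least one mitigation."),
+    PatternAssertion("risks.placeholder", r"\b(TBD|TODO)\b", "forbidden",
+                     "Risks section still contains a placeholder."),
+]
+```
+
+Calling `check_section("Data loss. TBD.", risk_rules)` reports both rules,
+whereas a section that mentions a mitigation and has no placeholder passes.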
+
+### UC-003: Size and Complexity Bands
+
+Define soft/hard bands for document and section size: words, characters,
+sentences, paragraphs, sections, list items, code blocks, and nesting depth.
+
+Practical value:
+
+- Controls generation output size.
+- Keeps templates from becoming bloated or underdeveloped.
+- Helps compare intended vs actual document complexity.
+
+Needed tooling:
+
+- Metrics extractor.
+- Rule severities: info, warning, error.
+- “Too small/too large” diagnostics with actual and target values.
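+
+A metric band with soft and hard limits could be evaluated as in the sketch
+below. `MetricBand` and `evaluate_band` are hypothetical names, and the mapping
+of soft limits to warnings and hard limits to errors is an assumption of this
+sketch.
+
+```python
+from dataclasses import dataclass
+from typing import Optional
+
+
+@dataclass
+class MetricBand:
+    metric: str
+    soft: tuple[int, int]   # outside this range: warning
+    hard: tuple[int, int]   # outside this range: error
+
+
+def word_count(text: str) -> int:
+    return len(text.split())
+
+
+def evaluate_band(band: MetricBand, actual: int) -> Optional[tuple[str, str]]:
+    """Return (severity, message), or None when the value is inside the soft band."""
+    # Check the hard band first so the most severe finding wins.
+    for severity, (low, high) in (("error", band.hard), ("warning", band.soft)):
+        if actual < low:
+            return severity, f"{band.metric} too small: {actual} < {low}"
+        if actual > high:
+            return severity, f"{band.metric} too large: {actual} > {high}"
+    return None
+```
+
+Each message carries both the actual and the target value, which is the
+information a writer or generator needs to correct the size.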
+
+### UC-004: Form-Backed Markdown Generation
+
+Define forms that collect or prefill structured fields, then render markdown
+documents. Fields may be static, calculated, conditional, or context-derived.
+
+Practical value:
+
+- Bridges structured data capture and prose generation.
+- Supports repeatable business documents.
+- Makes prefill from user/project/entity data explicit.
+
+Needed tooling:
+
+- Field schema.
+- UI schema or form hints.
+- Dynamic rules for requiredness, visibility, defaults, and calculations.
+- Template rendering with validation before and after render.
+
+### UC-005: Context-Aware Validation
+
+Validate a document against external context: user data, project metadata,
+related entities, dates, policy constraints, or canonical terminology.
+
+Practical value:
+
+- Checks whether a document is correct for this case, not only generally
+  well-formed.
+- Enables pipelines like personalized letters, compliance reports, and
+  project-specific workplans.
+
+Needed tooling:
+
+- Context object schema.
+- Resolvers for local files, JSON/YAML data, and later higher-layer systems.
+- Rule expressions that can reference document and context paths.
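+
+A rule expression that references both document and context paths might be
+evaluated as sketched here. The dotted-path syntax, the `resolve` helper, and
+the rule dictionary layout are assumptions made for illustration only.
+
+```python
+from typing import Any, Optional
+
+
+def resolve(path: str, scopes: dict[str, Any]) -> Any:
+    """Resolve a dotted path such as 'context.project.owner'; None if absent."""
+    current: Any = scopes
+    for key in path.split("."):
+        if not isinstance(current, dict) or key not in current:
+            return None
+        current = current[key]
+    return current
+
+
+def check_equal(rule: dict, scopes: dict[str, Any]) -> Optional[str]:
+    """Return a failure message, or None when both paths resolve and match."""
+    left = resolve(rule["left"], scopes)
+    right = resolve(rule["right"], scopes)
+    if left is None or right is None:
+        return f"{rule['id']}: could not resolve both paths"
+    if left == right:
+        return None
+    return f"{rule['id']}: {rule['left']}={left!r} does not match {rule['right']}={right!r}"
+
+
+# Hypothetical scopes: parsed document data plus explicit external context.
+scopes = {
+    "document": {"frontmatter": {"owner": "alice"}},
+    "context": {"project": {"owner": "bob"}},
+}
+rule = {
+    "id": "owner-matches-project",
+    "left": "document.frontmatter.owner",
+    "right": "context.project.owner",
+}
+finding = check_equal(rule, scopes)
+```
+
+Treating an unresolved path as a failure, rather than as a silent pass, keeps
+missing context visible in the report.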
+
+### UC-006: LLM-Assisted Section Assessment
+
+Attach rubrics to section types. Use an external LLM adapter to assess whether a
+section satisfies the rubric, returning score, reason, and pass/fail.
+
+Practical value:
+
+- Handles semantic checks that deterministic rules cannot.
+- Supports review loops for generated text.
+- Makes subjective requirements explicit and auditable.
+
+Needed tooling:
+
+- Rubric declaration format.
+- Provider-neutral assessment request/response models.
+- Caching and reproducibility metadata.
+- Clear distinction between deterministic errors and model-judged findings.
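+
+Provider-neutral request and response models, plus a cache key for
+reproducibility, could be sketched as below. All class and field names are
+hypothetical, the 0.7 threshold is an arbitrary placeholder, and
+`KeywordAssessor` is a deterministic stand-in so the sketch runs without any
+LLM provider.
+
+```python
+import hashlib
+import json
+from dataclasses import asdict, dataclass
+from typing import Protocol
+
+
+@dataclass(frozen=True)
+class AssessmentRequest:
+    rubric_id: str
+    rubric_text: str
+    section_text: str
+    threshold: float = 0.7   # placeholder pass mark
+
+
+@dataclass(frozen=True)
+class AssessmentResult:
+    rubric_id: str
+    score: float              # expected in the 0-1 range
+    passed: bool
+    reason: str
+    model_judged: bool = True  # keeps these apart from deterministic errors
+
+
+class Assessor(Protocol):
+    def assess(self, request: AssessmentRequest) -> AssessmentResult: ...
+
+
+def cache_key(request: AssessmentRequest, model_name: str) -> str:
+    """Stable key for caching and reproducibility metadata."""
+    payload = json.dumps({"model": model_name, **asdict(request)}, sort_keys=True)
+    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
+
+
+class KeywordAssessor:
+    """Stand-in adapter; a real adapter would call an external LLM."""
+
+    def assess(self, request: AssessmentRequest) -> AssessmentResult:
+        score = 1.0 if "because" in request.section_text.lower() else 0.0
+        return AssessmentResult(request.rubric_id, score, score >= request.threshold,
+                                "looks for an explicit justification")
+```
+
+Because `Assessor` is a structural protocol, any adapter with a matching
+`assess` method can be plugged in without `markitect-tool` importing a provider
+SDK.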
+
+### UC-007: Pipeline Diagnostics and Repair Guidance
+
+Run a document pipeline and get one coherent diagnostic report from parsing,
+schema checks, field validation, assertions, generation, composition, and
+LLM-assisted assessments.
+
+Practical value:
+
+- Makes failures debuggable.
+- Helps humans and agents repair documents.
+- Avoids scattered errors from unrelated subsystems.
+
+Needed tooling:
+
+- Common diagnostic model.
+- Error codes and severities.
+- Source spans and rule ids.
+- Suggested repair text or structured patches when safe.
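+
+A common diagnostic model might look like the following sketch. The field set
+follows the terminology table above (severity, code, message, source location,
+rule id, suggested repair); the code scheme such as `MKT-SEC-001` and the helper
+names are hypothetical.
+
+```python
+from dataclasses import dataclass
+from typing import Optional
+
+SEVERITY_ORDER = {"error": 0, "warning": 1, "info": 2}
+
+
+@dataclass
+class Diagnostic:
+    severity: str                 # "info", "warning", or "error"
+    code: str                     # e.g. "MKT-SEC-001" (hypothetical scheme)
+    message: str
+    rule_id: str
+    line: Optional[int] = None    # 1-based source line, when known
+    suggestion: Optional[str] = None
+
+
+def merge_reports(*reports: list[Diagnostic]) -> list[Diagnostic]:
+    """Combine findings from several stages into one ordered report."""
+    merged = [d for report in reports for d in report]
+    # Most severe first, then by source position.
+    return sorted(merged, key=lambda d: (SEVERITY_ORDER[d.severity], d.line or 0))
+
+
+def format_diagnostic(d: Diagnostic) -> str:
+    location = f"line {d.line}" if d.line is not None else "document"
+    text = f"{d.severity.upper()} {d.code} [{d.rule_id}] {location}: {d.message}"
+    return text + (f" (suggestion: {d.suggestion})" if d.suggestion else "")
+```
+
+Every stage, whether parsing, metric bands, or an LLM-assisted assessment, would
+emit this one record type, so the report can be sorted and rendered in a single
+place.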
+
+## Comparison With markitect-main
+
+`markitect-main` had several useful seeds:
+
+- `x-markitect-sections` for required/recommended/optional/discouraged/improper sections.
+- `x-markitect-content-control` for required, discouraged, and forbidden patterns plus word-count metrics.
+- Section and content validators with warnings/errors.
+- Schema generation and validation experiments.
+- Draft generation with `x-markitect-field-mapping`.
+- Prompt quality gates with schema and pattern validators.
+- Infospace entity parsing and LLM classification/evaluation.
+
+The problem was not lack of ideas. The problem was that the ideas lived in
+separate subsystems with different models:
+
+- Schema validation compared generated schemas rather than validating a stable
+  document contract.
+- Semantic validation used `x-markitect-*` extensions but was not integrated
+  into a unified contract framework.
+- Field mapping existed in draft generation, not in a general form/context
+  model.
+- LLM quality gates existed inside prompt execution, not as provider-neutral
+  document assessments.
+- Infospace checks were domain/application-layer behavior, not syntax-layer
+  primitives.
+
+## Strategic Direction
+
+The successor should introduce a framework layer above parsing:
+
+```text
+Markdown parse model
+  -> document contract
+      -> section specifications
+      -> field/form specifications
+      -> deterministic rules/assertions
+      -> metric bands
+      -> optional LLM rubrics
+      -> unified diagnostics
+```
+
+This should not replace JSON Schema. JSON Schema remains useful for typed data
+and machine validation. The new layer should make document-specific semantics
+natural.
+
+## Recommendation
+
+Do not continue straight into generic query/transform work until this framework
+direction is captured. The next implementation slice should be a small,
+deterministic version of document contracts:
+
+1. Define the contract schema and terminology.
+2. Implement section specifications.
+3. Implement metric bands.
+4. Implement the unified diagnostic model.
+5. Leave LLM rubrics and form dynamics as designed extension points for the next
+   slice.
+
+This slice is the utility inflection point: it is what should make
+`markitect-tool` practically useful instead of merely structurally correct.