From cbde1dabc439176fbd776d5e628c0fa1ddb06ee5 Mon Sep 17 00:00:00 2001 From: tegwick Date: Sun, 8 Feb 2026 22:09:20 +0100 Subject: [PATCH] docs(prompts): add comprehensive implementation workplan Create detailed 26-week workplan for Prompt Dependency Resolution system implementing all 11 functional requirements across 8 phases: - Phase 1-2: Foundation (artifacts, templates, macros) - Phase 3-4: Resolution and execution engine with idempotent runs - Phase 5-6: Dependency tracking and incremental recomputation - Phase 7-8: Quality validation and observability/traceability Includes database schemas, verification strategies, risk management, and complete file structure for ~60 new modules. Co-Authored-By: Claude Opus 4.6 --- .../prompt-dependency-resolution/WORKPLAN.md | 1227 +++++++++++++++++ 1 file changed, 1227 insertions(+) create mode 100644 roadmap/prompt-dependency-resolution/WORKPLAN.md diff --git a/roadmap/prompt-dependency-resolution/WORKPLAN.md b/roadmap/prompt-dependency-resolution/WORKPLAN.md new file mode 100644 index 00000000..683cb86d --- /dev/null +++ b/roadmap/prompt-dependency-resolution/WORKPLAN.md @@ -0,0 +1,1227 @@ +# Prompt Dependency Resolution - Implementation Workplan + +## Overview + +This workplan details the implementation phases for building the Prompt Dependency Resolution infrastructure within MarkiTect. This system enables structured execution of PromptTemplates with deterministic dependency resolution, incremental recomputation, and quality validation across InformationSpaces. 
+ +The system transforms MarkiTect from a static markdown tool into an executable knowledge infrastructure that supports: +- Template-driven content generation with LLMs +- Automatic dependency tracking and resolution +- Idempotent execution with content-based caching +- Incremental recomputation with change impact analysis +- Quality gate validation with halting policies + +--- + +## Functional Requirements Mapping + +The implementation is organized into 8 phases covering all 11 functional requirements from the FRS: + +| FR ID | Requirement | Implementation Phase | +|-------|-------------|---------------------| +| FR-1 | InformationSpace Addressability | Phase 1: Foundation | +| FR-2 | PromptTemplate Definition | Phase 2: Templates & Macros | +| FR-3 | PromptResolver Behavior | Phase 3: Resolver Engine | +| FR-4 | PromptRun Lifecycle | Phase 4: Execution Engine | +| FR-5 | RunManifest Persistence | Phase 4: Execution Engine | +| FR-6 | Dependency Graph Construction | Phase 5: Dependency Tracking | +| FR-7 | Incremental Recompute | Phase 6: Incremental Execution | +| FR-8 | Change Impact Assessment | Phase 6: Incremental Execution | +| FR-9 | QualityGate Validation | Phase 7: Quality & Validation | +| FR-10 | Halting and Refinement Policy | Phase 7: Quality & Validation | +| FR-11 | Traceability and Auditability | Phase 8: Observability | + +--- + +## Phase 1: Foundation - Addressable Artifacts (FR-1) + +### Capability Requirements + +| ID | Capability | Description | Priority | +|----|-----------|-------------|----------| +| CAP-101 | Artifact Identity | Persistent identifiers for content artifacts | Critical | +| CAP-102 | Content Digest | SHA-256 content hashing for change detection | Critical | +| CAP-103 | Artifact Registry | Lookup artifacts by name or ID within spaces | Critical | +| CAP-104 | Cross-Space References | Reference artifacts across space boundaries | High | +| CAP-105 | Artifact Metadata | Store artifact metadata (type, created, modified) | High 
| + +### Implementation Tasks + +**Week 1: Core Models** +- Create `markitect/prompts/models.py` + - `Artifact` dataclass with id, name, space_id, content_digest, metadata + - `ArtifactReference` dataclass for cross-space addressing + - Content digest calculation utilities (SHA-256) +- Create `markitect/prompts/repositories/interfaces.py` + - `IArtifactRepository` interface +- Unit tests for artifact models and digest calculation + +**Week 2: Repository Implementation** +- Create `markitect/prompts/repositories/sqlite.py` + - `SQLiteArtifactRepository` implementing `IArtifactRepository` + - CRUD operations with content digest tracking + - Cross-space artifact lookup +- Database migration scripts +- Repository unit tests + +**Week 3: Artifact Service** +- Create `markitect/prompts/services/artifact_service.py` + - Register artifacts with automatic digest calculation + - Query artifacts by name, ID, or digest + - Track artifact modifications with digest updates +- Integration tests with existing InformationSpace service + +### Database Schema + +```sql +CREATE TABLE prompt_artifacts ( + id TEXT PRIMARY KEY, + space_id TEXT NOT NULL REFERENCES spaces(id), + name TEXT NOT NULL, + artifact_type TEXT NOT NULL, + content_digest TEXT NOT NULL, + content_size INTEGER, + metadata JSON, + created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + UNIQUE(space_id, name) +); + +CREATE INDEX idx_artifacts_digest ON prompt_artifacts(content_digest); +CREATE INDEX idx_artifacts_space ON prompt_artifacts(space_id); +``` + +### Verification + +```bash +pytest tests/unit/prompts/test_artifact_models.py +pytest tests/unit/prompts/test_artifact_repository.py +pytest tests/integration/prompts/test_artifact_service.py +``` + +--- + +## Phase 2: Templates & Macros (FR-2) + +### Capability Requirements + +| ID | Capability | Description | Priority | +|----|-----------|-------------|----------| +| CAP-201 | PromptTemplate Model | Template definition 
with content and metadata | Critical | +| CAP-202 | ContentMacro Detection | Parse and extract macros from template content | Critical | +| CAP-203 | Macro Types | Support Required, Optional, Generate macro kinds | Critical | +| CAP-204 | Template Analysis | Analyze templates to extract macro dependencies | High | +| CAP-205 | Template Validation | Validate template syntax and macro references | Medium | + +### Implementation Tasks + +**Week 4: Template Models** +- Create `markitect/prompts/templates/models.py` + - `PromptTemplate` dataclass extending `Artifact` + - `ContentMacro` dataclass with kind, target, parameters + - `MacroKind` enum: REQUIRED, OPTIONAL, GENERATE + - `TemplateMetadata` for template-specific metadata +- Unit tests for template models + +**Week 5: Macro Parser** +- Create `markitect/prompts/templates/parser.py` + - Regex-based macro extraction from markdown content + - Support macro syntax: `{{require:artifact}}`, `{{optional:artifact}}`, `{{generate:template}}` + - Parameter parsing for macro arguments +- Create `markitect/prompts/templates/analyzer.py` + - `TemplateAnalyzer` class for dependency extraction + - Identify all macros and their types + - Build initial dependency list +- Parser and analyzer unit tests + +**Week 6: Template Service** +- Create `markitect/prompts/services/template_service.py` + - Register templates with automatic analysis + - Query templates by ID or name + - Retrieve template with analyzed macro list +- Integration tests + +### Template Syntax + +```markdown +# Example PromptTemplate + +## Context + +{{require:project-overview}} +{{optional:technical-constraints}} + +## Task Description + +Generate a technical design for {{require:feature-name}}. 
+ +## Previous Designs + +{{generate:related-designs-collector}} +``` + +### Macro Format + +``` +{{<kind>:<target>[|<param>=<value>|<param>=<value>...]}} + +Examples: +{{require:glossary/authentication}} +{{optional:standards/api-design}} +{{generate:code-examples|language=python|framework=fastapi}} +``` + +### Verification + +```bash +pytest tests/unit/prompts/test_template_models.py +pytest tests/unit/prompts/test_macro_parser.py +pytest tests/unit/prompts/test_template_analyzer.py +pytest tests/integration/prompts/test_template_service.py +``` + +--- + +## Phase 3: Resolver Engine (FR-3) + +### Capability Requirements + +| ID | Capability | Description | Priority | +|----|-----------|-------------|----------| +| CAP-301 | Resolution Strategy | Deterministic multi-space resolution order | Critical | +| CAP-302 | Required Macro Resolution | Fail on missing required artifacts | Critical | +| CAP-303 | Optional Macro Resolution | Graceful fallback for missing optional artifacts | Critical | +| CAP-304 | Generate Macro Detection | Identify generator templates for nested execution | High | +| CAP-305 | Resolution Context | Track resolution state and errors | High | + +### Implementation Tasks + +**Week 7: Resolver Core** +- Create `markitect/prompts/resolver/models.py` + - `ResolutionContext` with resolution order, resolved artifacts, errors + - `ResolutionResult` with success status, resolved content, unresolved macros + - `ResolutionError` for missing required artifacts +- Create `markitect/prompts/resolver/strategy.py` + - `ResolutionStrategy` base class + - `MultiSpaceResolutionStrategy` implementing FR-3.1 order: + 1. Local InformationSpace + 2. Explicitly included InformationSpaces + 3. Default InformationSpace + 4. 
Team/Shared InformationSpace (if configured) +- Unit tests for resolution strategy + +**Week 8: PromptResolver Implementation** +- Create `markitect/prompts/resolver/resolver.py` + - `PromptResolver` class + - `resolve_template(template, context) -> ResolutionResult` + - Handle Required macros: fail if not found (FR-3.2) + - Handle Optional macros: resolve to empty (FR-3.3) + - Detect Generate macros for deferred resolution (FR-3.4) + - Track resolution errors and warnings +- Resolver unit tests + +**Week 9: Context Compilation** +- Create `markitect/prompts/resolver/compiler.py` + - `ContextCompiler` class + - Compile resolved artifacts into single prompt context + - Substitute macros with resolved content + - Generate `CompiledPrompt` with full context +- Integration tests for full resolution flow + +### Resolution Order Example + +```python +# Given template in space "my-project" referencing {{require:glossary}} +# Resolution search order: +1. my-project/glossary +2. <included-space-1>/glossary +3. <included-space-2>/glossary +4. default-space/glossary +5. 
shared-space/glossary # if configured +# If not found: ResolutionError(MacroKind.REQUIRED, "glossary") +``` + +### Database Schema Additions + +```sql +CREATE TABLE prompt_resolution_config ( + space_id TEXT PRIMARY KEY REFERENCES spaces(id), + included_spaces JSON, -- Array of space IDs to search + default_space_id TEXT REFERENCES spaces(id), + shared_space_id TEXT REFERENCES spaces(id), + max_generation_depth INTEGER DEFAULT 3, + config JSON, + updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP +); +``` + +### Verification + +```bash +pytest tests/unit/prompts/test_resolution_strategy.py +pytest tests/unit/prompts/test_prompt_resolver.py +pytest tests/unit/prompts/test_context_compiler.py +pytest tests/integration/prompts/test_resolution_flow.py +``` + +--- + +## Phase 4: Execution Engine (FR-4, FR-5) + +### Capability Requirements + +| ID | Capability | Description | Priority | +|----|-----------|-------------|----------| +| CAP-401 | PromptRun Lifecycle | Three-stage execution: Analysis, Compilation, Processing | Critical | +| CAP-402 | InputBundleHash | Content-based execution fingerprinting | Critical | +| CAP-403 | Idempotent Execution | Skip re-execution for identical input bundles | Critical | +| CAP-404 | LLM Integration | Execute compiled prompts via LLM provider | Critical | +| CAP-405 | RunManifest Persistence | Store complete execution provenance | Critical | +| CAP-406 | Nested Generator Runs | Execute generate macros recursively | High | + +### Implementation Tasks + +**Week 10: Execution Models** +- Create `markitect/prompts/execution/models.py` + - `PromptRun` dataclass with id, template_id, input_bundle_hash, status + - `ExecutionStage` enum: ANALYSIS, COMPILATION, PROCESSING, COMPLETE, FAILED + - `RunConfig` with model settings, depth limits, options + - `InputBundle` with template digest, dependency digests, config hash + - `InputBundleHash` calculation (SHA-256 of sorted input bundle) +- Create `markitect/prompts/execution/manifest.py` + - 
`RunManifest` comprehensive execution record + - Template metadata, resolved inputs, compiled prompt digest + - Model configuration, output artifacts, validation results + - Dependency edges, timing metadata +- Unit tests for execution models + +**Week 11: Execution Engine** +- Create `markitect/prompts/execution/engine.py` + - `PromptExecutionEngine` class + - `execute(template, config) -> PromptRun` + - Stage 1: Template analysis (use TemplateAnalyzer) + - Stage 2: Context compilation (use ContextCompiler) + - Stage 3: Prompt processing (LLM invocation) + - Calculate InputBundleHash before execution + - Check for existing run with same hash (FR-4.4) + - Store RunManifest on completion +- Engine unit tests + +**Week 12: LLM Integration Layer** +- Create `markitect/prompts/execution/llm_adapter.py` + - `LLMAdapter` abstract base class + - `execute_prompt(compiled_prompt, config) -> LLMResponse` + - Mock implementation for testing + - OpenAI/Anthropic adapter stubs (to be implemented) +- Create `markitect/prompts/execution/generator.py` + - `GeneratorExecutor` for nested generate macro execution + - Enforce max depth limit (FR-3.5) + - Track parent-child run relationships + - Link generator runs in RunManifest (FR-5.3) +- Integration tests for full execution flow + +### Database Schema Additions + +```sql +CREATE TABLE prompt_runs ( + id TEXT PRIMARY KEY, + template_id TEXT NOT NULL REFERENCES prompt_artifacts(id), + input_bundle_hash TEXT NOT NULL, + status TEXT NOT NULL, + stage TEXT NOT NULL, + parent_run_id TEXT REFERENCES prompt_runs(id), + depth INTEGER DEFAULT 0, + started_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + completed_at TIMESTAMP, + error_message TEXT, + UNIQUE(input_bundle_hash) -- Idempotency constraint +); + +CREATE TABLE run_manifests ( + run_id TEXT PRIMARY KEY REFERENCES prompt_runs(id), + template_metadata JSON NOT NULL, + resolved_inputs JSON NOT NULL, + compiled_prompt_digest TEXT NOT NULL, + model_config JSON NOT NULL, + output_artifacts 
JSON, + dependency_edges JSON, + validation_results JSON, + impact_debt JSON, + timing_metadata JSON, + created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP +); + +CREATE INDEX idx_runs_template ON prompt_runs(template_id); +CREATE INDEX idx_runs_bundle_hash ON prompt_runs(input_bundle_hash); +CREATE INDEX idx_runs_parent ON prompt_runs(parent_run_id); +``` + +### InputBundleHash Calculation + +```python +import hashlib +import json +from typing import Dict + + +def calculate_input_bundle_hash( + template_digest: str, + dependency_digests: Dict[str, str], # {artifact_name: digest} + config_hash: str, + model_settings: Dict +) -> str: + """ + Deterministic hash of complete input context. + + Components (sorted for determinism): + 1. Template content digest + 2. Sorted dependency digests by name + 3. Resolution configuration hash + 4. Model settings (name, temperature, etc.) + 5. Compilation options (not yet included in the bundle below) + """ + bundle = { + 'template': template_digest, + 'dependencies': sorted(dependency_digests.items()), + 'config': config_hash, + 'model': sorted(model_settings.items()) + } + return hashlib.sha256( + json.dumps(bundle, sort_keys=True).encode() + ).hexdigest() +``` + +### Verification + +```bash +pytest tests/unit/prompts/test_execution_models.py +pytest tests/unit/prompts/test_execution_engine.py +pytest tests/unit/prompts/test_llm_adapter.py +pytest tests/unit/prompts/test_generator_executor.py +pytest tests/integration/prompts/test_prompt_execution.py +pytest tests/integration/prompts/test_idempotent_execution.py +``` + +--- + +## Phase 5: Dependency Tracking (FR-6) + +### Capability Requirements + +| ID | Capability | Description | Priority | +|----|-----------|-------------|----------| +| CAP-501 | Dependency Edge Recording | Track input → output relationships | Critical | +| CAP-502 | Dependency Graph Construction | Build queryable dependency graph | Critical | +| CAP-503 | Circular Dependency Detection | Identify cycles in dependency chains | High | +| CAP-504 | Dependency Query | Find dependents of any artifact | High | +| 
CAP-505 | Cross-Space Dependencies | Track dependencies across spaces | Medium | + +### Implementation Tasks + +**Week 13: Dependency Models** +- Create `markitect/prompts/dependencies/models.py` + - `DependencyEdge` with source_id, target_id, run_id, edge_type + - `EdgeType` enum: REQUIRES, GENERATES, INCLUDES + - `DependencyGraph` class for graph operations + - `CircularDependencyError` exception +- Unit tests for dependency models + +**Week 14: Graph Builder** +- Create `markitect/prompts/dependencies/graph.py` + - `GraphBuilder` class + - Extract dependencies from RunManifest + - Add edges: artifact → run (input), run → artifact (output) + - Build adjacency list representation + - Cycle detection using DFS +- Create `markitect/prompts/dependencies/repository.py` + - `SQLiteDependencyRepository` + - Store and query dependency edges + - Efficient dependent lookup queries +- Graph builder and repository tests + +**Week 15: Query Operations** +- Create `markitect/prompts/dependencies/queries.py` + - `find_dependents(artifact_id, depth=1) -> List[Artifact]` + - `find_dependencies(artifact_id) -> List[Artifact]` + - `get_dependency_chain(source_id, target_id) -> List[Edge]` + - `detect_circular_dependencies(artifact_id) -> List[Cycle]` +- Integration tests for dependency queries + +### Database Schema Additions + +```sql +CREATE TABLE prompt_dependencies ( + id TEXT PRIMARY KEY, + source_artifact_id TEXT NOT NULL REFERENCES prompt_artifacts(id), + target_artifact_id TEXT NOT NULL REFERENCES prompt_artifacts(id), + run_id TEXT NOT NULL REFERENCES prompt_runs(id), + edge_type TEXT NOT NULL, + created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, + UNIQUE(source_artifact_id, target_artifact_id, run_id) +); + +CREATE INDEX idx_deps_source ON prompt_dependencies(source_artifact_id); +CREATE INDEX idx_deps_target ON prompt_dependencies(target_artifact_id); +CREATE INDEX idx_deps_run ON prompt_dependencies(run_id); +``` + +### Dependency Graph Example + +``` +Template: design-doc 
(ID: t1) + → requires: glossary (ID: a1) + → requires: requirements (ID: a2) + → generates: api-spec (ID: a3) + PromptRun: r1 + Edges: + a1 → r1 (REQUIRES) + a2 → r1 (REQUIRES) + r1 → a3 (GENERATES) + +When a1 (glossary) changes: + Dependents(a1, depth=1) = [r1] + Affected outputs = [a3] (need recomputation) +``` + +### Verification + +```bash +pytest tests/unit/prompts/test_dependency_models.py +pytest tests/unit/prompts/test_graph_builder.py +pytest tests/unit/prompts/test_dependency_repository.py +pytest tests/unit/prompts/test_dependency_queries.py +pytest tests/integration/prompts/test_dependency_graph.py +pytest tests/integration/prompts/test_circular_detection.py +``` + +--- + +## Phase 6: Incremental Execution (FR-7, FR-8) + +### Capability Requirements + +| ID | Capability | Description | Priority | +|----|-----------|-------------|----------| +| CAP-601 | Change Detection | Detect artifact modifications via digest comparison | Critical | +| CAP-602 | Incremental Recompute | Recompute direct dependents on change | Critical | +| CAP-603 | Depth Control | Configurable recomputation depth (default=1) | High | +| CAP-604 | Circular Suppression | Suppress recompute to prevent cycles | High | +| CAP-605 | Change Impact Analysis | Calculate change magnitude metrics | High | +| CAP-606 | Impact Debt Tracking | Record suppressed recomputations | Medium | + +### Implementation Tasks + +**Week 16: Change Detection** +- Create `markitect/prompts/incremental/models.py` + - `ArtifactChange` with old_digest, new_digest, change_type + - `ChangeType` enum: CREATED, MODIFIED, DELETED + - `ImpactDebt` for suppressed recomputations + - `RecomputeConfig` with depth, circular handling, budget limits +- Create `markitect/prompts/incremental/detector.py` + - `ChangeDetector` class + - Compare current digest with stored digest + - Identify change type and magnitude +- Unit tests for change detection + +**Week 17: Impact Analysis** +- Create 
`markitect/prompts/incremental/impact.py` + - `ImpactAnalyzer` class + - Calculate change magnitude (FR-8.2): + - Structural diff ratio (default) + - Content diff ratio (character-level) + - Optional: embedding distance + - Optional: LLM-assessed impact + - Generate impact score (0.0 to 1.0) +- Create `markitect/prompts/incremental/metrics.py` + - Diff calculation utilities + - Similarity scoring algorithms +- Impact analyzer tests + +**Week 18: Incremental Recompute Engine** +- Create `markitect/prompts/incremental/engine.py` + - `IncrementalExecutionEngine` class + - `recompute_dependents(artifact_id, config) -> RecomputeResult` + - Find direct dependents via dependency graph (depth=1 default) + - Check for circular dependencies + - Execute prompt runs for affected dependents + - Track suppressed recomputations as ImpactDebt + - Record impact assessments in RunManifest (FR-8.3) +- Integration tests for incremental execution + +### Recomputation Logic + +```python +def recompute_dependents(artifact_id: str, config: RecomputeConfig): + """ + FR-7: Incremental recompute with depth control. + + 1. Detect change in artifact + 2. Find dependents up to specified depth (default=1) + 3. For each dependent: + - Check if recompute would create cycle → suppress if yes + - Calculate change impact + - If impact > threshold and budget available: + - Recompute (re-execute PromptRun) + - Else: + - Record as ImpactDebt in RunManifest + 4. 
Return RecomputeResult with executed/suppressed counts + """ +``` + +### Database Schema Additions + +```sql +CREATE TABLE artifact_changes ( + id TEXT PRIMARY KEY, + artifact_id TEXT NOT NULL REFERENCES prompt_artifacts(id), + old_digest TEXT, + new_digest TEXT NOT NULL, + change_type TEXT NOT NULL, + detected_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP +); + +CREATE TABLE impact_debt ( + id TEXT PRIMARY KEY, + artifact_id TEXT NOT NULL REFERENCES prompt_artifacts(id), + dependent_run_id TEXT NOT NULL REFERENCES prompt_runs(id), + change_magnitude REAL NOT NULL, + suppression_reason TEXT NOT NULL, + recorded_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP +); + +CREATE INDEX idx_changes_artifact ON artifact_changes(artifact_id); +CREATE INDEX idx_debt_artifact ON impact_debt(artifact_id); +CREATE INDEX idx_debt_run ON impact_debt(dependent_run_id); +``` + +### Verification + +```bash +pytest tests/unit/prompts/test_change_detector.py +pytest tests/unit/prompts/test_impact_analyzer.py +pytest tests/unit/prompts/test_incremental_engine.py +pytest tests/integration/prompts/test_incremental_recompute.py +pytest tests/integration/prompts/test_circular_suppression.py +pytest tests/integration/prompts/test_impact_debt.py +``` + +--- + +## Phase 7: Quality & Validation (FR-9, FR-10) + +### Capability Requirements + +| ID | Capability | Description | Priority | +|----|-----------|-------------|----------| +| CAP-701 | Schema Validation | Validate generated artifacts against JSON schemas | High | +| CAP-702 | QualityGate Framework | Pluggable validation framework | High | +| CAP-703 | Validation Results | Record pass/fail with diagnostics | High | +| CAP-704 | Halting Policy | Configurable execution halting rules | Medium | +| CAP-705 | Refinement Loop | Iterative improvement with quality checks | Medium | + +### Implementation Tasks + +**Week 19: QualityGate Framework** +- Create `markitect/prompts/quality/models.py` + - `QualityGate` abstract base class + - `ValidationResult` with 
status, diagnostics, score + - `QualityPolicy` with halting rules + - `GateType` enum: SCHEMA, PATTERN, CUSTOM +- Create `markitect/prompts/quality/gates/schema_gate.py` + - `SchemaValidationGate` using existing schema validator + - Validate generated artifacts against JSON schemas +- Create `markitect/prompts/quality/gates/pattern_gate.py` + - `PatternValidationGate` for regex-based checks +- Unit tests for quality gates + +**Week 20: Validation Integration** +- Create `markitect/prompts/quality/validator.py` + - `QualityValidator` class + - Apply multiple gates to generated artifacts + - Aggregate validation results + - Record results in RunManifest (FR-9.3) +- Integrate with execution engine + - Run quality gates after prompt processing + - Store validation results in RunManifest +- Integration tests + +**Week 21: Halting Policy Engine** +- Create `markitect/prompts/quality/policy.py` + - `HaltingPolicyEngine` class + - Evaluate halting conditions (FR-10.2): + - QualityGate failures + - Marginal improvement below threshold + - Iteration limit reached + - Resource budget exhausted + - Record halting decisions in RunManifest (FR-10.3) +- Create `markitect/prompts/quality/refinement.py` + - `RefinementLoop` for iterative improvement + - Execute → Validate → Halt or Refine +- Policy engine and refinement loop tests + +### QualityGate Example + +```python +# Schema validation gate +schema_gate = SchemaValidationGate( + schema_path="schemas/api-spec-schema.json" +) + +# Pattern validation gate +pattern_gate = PatternValidationGate( + required_patterns=[r"## Endpoints", r"### Authentication"], + forbidden_patterns=[r"TODO", r"FIXME"] +) + +# Quality policy +policy = QualityPolicy( + gates=[schema_gate, pattern_gate], + halting_rules={ + 'max_iterations': 3, + 'min_improvement': 0.05, + 'fail_on_validation_error': True + } +) +``` + +### Database Schema Additions + +```sql +CREATE TABLE quality_gates ( + id TEXT PRIMARY KEY, + name TEXT NOT NULL, + gate_type TEXT NOT 
NULL, + config JSON NOT NULL, + created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP +); + +CREATE TABLE validation_results ( + id TEXT PRIMARY KEY, + run_id TEXT NOT NULL REFERENCES prompt_runs(id), + gate_id TEXT NOT NULL REFERENCES quality_gates(id), + artifact_id TEXT REFERENCES prompt_artifacts(id), + status TEXT NOT NULL, -- PASS, FAIL, WARNING + score REAL, + diagnostics JSON, + validated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP +); + +CREATE INDEX idx_validations_run ON validation_results(run_id); +CREATE INDEX idx_validations_artifact ON validation_results(artifact_id); +``` + +### Verification + +```bash +pytest tests/unit/prompts/test_quality_gates.py +pytest tests/unit/prompts/test_quality_validator.py +pytest tests/unit/prompts/test_halting_policy.py +pytest tests/unit/prompts/test_refinement_loop.py +pytest tests/integration/prompts/test_quality_validation.py +pytest tests/integration/prompts/test_halting_execution.py +``` + +--- + +## Phase 8: Observability & Traceability (FR-11) + +### Capability Requirements + +| ID | Capability | Description | Priority | +|----|-----------|-------------|----------| +| CAP-801 | Provenance Tracing | Trace any artifact to its producing run | High | +| CAP-802 | Dependency Visualization | Visualize dependency graph | Medium | +| CAP-803 | Run History | Query execution history | High | +| CAP-804 | Audit Logging | Complete audit trail of all operations | Medium | +| CAP-805 | GraphQL API | Query interface for all prompt operations | High | +| CAP-806 | CLI Commands | Command-line tools for management | High | + +### Implementation Tasks + +**Week 22: Traceability Service** +- Create `markitect/prompts/traceability/service.py` + - `TraceabilityService` class + - `trace_artifact(artifact_id) -> ProvenanceTrace` + - `get_producing_run(artifact_id) -> PromptRun` + - `get_input_artifacts(run_id) -> List[Artifact]` + - `get_generator_runs(run_id) -> List[PromptRun]` + - `get_validation_history(artifact_id) -> 
List[ValidationResult]` +- Unit and integration tests + +**Week 23: Query & Visualization** +- Create `markitect/prompts/visualization/graph.py` + - Export dependency graph in DOT format + - Generate Mermaid diagrams +- Create `markitect/prompts/queries/` + - Complex query operations + - Run history queries + - Impact analysis queries +- Visualization and query tests + +**Week 24: API Layer - GraphQL** +- Create `markitect/prompts/graphql/schema.py` + - Extend existing GraphQL schema with prompt types + - `PromptTemplate`, `PromptRun`, `Artifact`, `DependencyEdge` types + - Queries: template, templates, run, runs, artifact, dependencies + - Mutations: executeTemplate, recomputeDependents + - Subscriptions: onRunComplete, onArtifactChange +- Create `markitect/prompts/graphql/resolvers.py` + - Implement all query and mutation resolvers +- GraphQL integration tests + +**Week 25: CLI Commands** +- Extend `markitect/cli.py` with prompt commands: + - `markitect prompt template create/list/show/delete` + - `markitect prompt execute