markitect-main/roadmap/prompt-dependency-resolution/WORKPLAN.md
tegwick cbde1dabc4 docs(prompts): add comprehensive implementation workplan
Create detailed 26-week workplan for Prompt Dependency Resolution system
implementing all 11 functional requirements across 8 phases:

- Phase 1-2: Foundation (artifacts, templates, macros)
- Phase 3-4: Resolution and execution engine with idempotent runs
- Phase 5-6: Dependency tracking and incremental recomputation
- Phase 7-8: Quality validation and observability/traceability

Includes database schemas, verification strategies, risk management,
and complete file structure for ~60 new modules.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-08 22:09:20 +01:00

Prompt Dependency Resolution - Implementation Workplan

Overview

This workplan details the implementation phases for building the Prompt Dependency Resolution infrastructure within MarkiTect. This system enables structured execution of PromptTemplates with deterministic dependency resolution, incremental recomputation, and quality validation across InformationSpaces.

The system transforms MarkiTect from a static markdown tool into an executable knowledge infrastructure that supports:

  • Template-driven content generation with LLMs
  • Automatic dependency tracking and resolution
  • Idempotent execution with content-based caching
  • Incremental recomputation with change impact analysis
  • Quality gate validation with halting policies

Functional Requirements Mapping

The implementation is organized into 8 phases covering all 11 functional requirements from the FRS:

| FR ID | Requirement | Implementation Phase |
|-------|-------------|----------------------|
| FR-1 | InformationSpace Addressability | Phase 1: Foundation |
| FR-2 | PromptTemplate Definition | Phase 2: Templates & Macros |
| FR-3 | PromptResolver Behavior | Phase 3: Resolver Engine |
| FR-4 | PromptRun Lifecycle | Phase 4: Execution Engine |
| FR-5 | RunManifest Persistence | Phase 4: Execution Engine |
| FR-6 | Dependency Graph Construction | Phase 5: Dependency Tracking |
| FR-7 | Incremental Recompute | Phase 6: Incremental Execution |
| FR-8 | Change Impact Assessment | Phase 6: Incremental Execution |
| FR-9 | QualityGate Validation | Phase 7: Quality & Validation |
| FR-10 | Halting and Refinement Policy | Phase 7: Quality & Validation |
| FR-11 | Traceability and Auditability | Phase 8: Observability |

Phase 1: Foundation - Addressable Artifacts (FR-1)

Capability Requirements

| ID | Capability | Description | Priority |
|----|------------|-------------|----------|
| CAP-101 | Artifact Identity | Persistent identifiers for content artifacts | Critical |
| CAP-102 | Content Digest | SHA-256 content hashing for change detection | Critical |
| CAP-103 | Artifact Registry | Lookup artifacts by name or ID within spaces | Critical |
| CAP-104 | Cross-Space References | Reference artifacts across space boundaries | High |
| CAP-105 | Artifact Metadata | Store artifact metadata (type, created, modified) | High |

Implementation Tasks

Week 1: Core Models

  • Create markitect/prompts/models.py
    • Artifact dataclass with id, name, space_id, content_digest, metadata
    • ArtifactReference dataclass for cross-space addressing
    • Content digest calculation utilities (SHA-256)
  • Create markitect/prompts/repositories/interfaces.py
    • IArtifactRepository interface
  • Unit tests for artifact models and digest calculation
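
The Week 1 models could look roughly like the sketch below. The field names come from the task list above. The helper name `compute_digest`, the UUID4 identifiers, the UTF-8 encoding, and the `<space_id>/<name>` string form of a reference are assumptions made for illustration, not decisions recorded in this workplan.

```python
import hashlib
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict


def compute_digest(content: str) -> str:
    """Return the SHA-256 hex digest of the UTF-8 encoded content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Artifact:
    name: str
    space_id: str
    content_digest: str
    # Assumption: identifiers are random UUID4 strings.
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class ArtifactReference:
    """Cross-space address, rendered here as '<space_id>/<name>' (assumed format)."""
    space_id: str
    name: str

    def __str__(self) -> str:
        return f"{self.space_id}/{self.name}"


glossary = Artifact(
    name="glossary",
    space_id="my-project",
    content_digest=compute_digest("# Glossary\n"),
)
```

Hashing the encoded bytes rather than the Python string keeps the digest stable across processes and platforms, which is what change detection in later phases relies on.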

Week 2: Repository Implementation

  • Create markitect/prompts/repositories/sqlite.py
    • SQLiteArtifactRepository implementing IArtifactRepository
    • CRUD operations with content digest tracking
    • Cross-space artifact lookup
  • Database migration scripts
  • Repository unit tests

Week 3: Artifact Service

  • Create markitect/prompts/services/artifact_service.py
    • Register artifacts with automatic digest calculation
    • Query artifacts by name, ID, or digest
    • Track artifact modifications with digest updates
  • Integration tests with existing InformationSpace service

Database Schema

CREATE TABLE prompt_artifacts (
    id TEXT PRIMARY KEY,
    space_id TEXT NOT NULL REFERENCES spaces(id),
    name TEXT NOT NULL,
    artifact_type TEXT NOT NULL,
    content_digest TEXT NOT NULL,
    content_size INTEGER,
    metadata JSON,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    UNIQUE(space_id, name)
);

CREATE INDEX idx_artifacts_digest ON prompt_artifacts(content_digest);
CREATE INDEX idx_artifacts_space ON prompt_artifacts(space_id);
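
As a rough illustration of how a repository might write to this table, the sketch below registers an artifact and refreshes its digest when the same (space_id, name) pair is registered again. The function `register_artifact`, the one-column `spaces` stand-in table, and the `"document"` artifact type are hypothetical. The `ON CONFLICT ... DO UPDATE` clause assumes SQLite 3.24 or newer.

```python
import hashlib
import sqlite3
import uuid

SCHEMA = """
CREATE TABLE spaces (id TEXT PRIMARY KEY);  -- stand-in for the existing table
CREATE TABLE prompt_artifacts (
    id TEXT PRIMARY KEY,
    space_id TEXT NOT NULL REFERENCES spaces(id),
    name TEXT NOT NULL,
    artifact_type TEXT NOT NULL,
    content_digest TEXT NOT NULL,
    content_size INTEGER,
    metadata JSON,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    UNIQUE(space_id, name)
);
"""


def register_artifact(conn, space_id, name, artifact_type, content):
    """Insert the artifact, or refresh its digest if (space_id, name) exists."""
    data = content.encode("utf-8")
    digest = hashlib.sha256(data).hexdigest()
    conn.execute(
        """
        INSERT INTO prompt_artifacts
            (id, space_id, name, artifact_type, content_digest, content_size)
        VALUES (?, ?, ?, ?, ?, ?)
        ON CONFLICT(space_id, name) DO UPDATE SET
            content_digest = excluded.content_digest,
            content_size = excluded.content_size,
            updated_at = CURRENT_TIMESTAMP
        """,
        (str(uuid.uuid4()), space_id, name, artifact_type, digest, len(data)),
    )
    return digest


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO spaces (id) VALUES ('my-project')")
first = register_artifact(conn, "my-project", "glossary", "document", "v1")
second = register_artifact(conn, "my-project", "glossary", "document", "v2")
row_count = conn.execute("SELECT COUNT(*) FROM prompt_artifacts").fetchone()[0]
stored = conn.execute("SELECT content_digest FROM prompt_artifacts").fetchone()[0]
```

Note that `DEFAULT CURRENT_TIMESTAMP` only applies on insert, so the update branch sets `updated_at` explicitly.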

Verification

pytest tests/unit/prompts/test_artifact_models.py
pytest tests/unit/prompts/test_artifact_repository.py
pytest tests/integration/prompts/test_artifact_service.py

Phase 2: Templates & Macros (FR-2)

Capability Requirements

| ID | Capability | Description | Priority |
|----|------------|-------------|----------|
| CAP-201 | PromptTemplate Model | Template definition with content and metadata | Critical |
| CAP-202 | ContentMacro Detection | Parse and extract macros from template content | Critical |
| CAP-203 | Macro Types | Support Required, Optional, Generate macro kinds | Critical |
| CAP-204 | Template Analysis | Analyze templates to extract macro dependencies | High |
| CAP-205 | Template Validation | Validate template syntax and macro references | Medium |

Implementation Tasks

Week 4: Template Models

  • Create markitect/prompts/templates/models.py
    • PromptTemplate dataclass extending Artifact
    • ContentMacro dataclass with kind, target, parameters
    • MacroKind enum: REQUIRED, OPTIONAL, GENERATE
    • TemplateMetadata for template-specific metadata
  • Unit tests for template models

Week 5: Macro Parser

  • Create markitect/prompts/templates/parser.py
    • Regex-based macro extraction from markdown content
    • Support macro syntax: {{require:artifact}}, {{optional:artifact}}, {{generate:template}}
    • Parameter parsing for macro arguments
  • Create markitect/prompts/templates/analyzer.py
    • TemplateAnalyzer class for dependency extraction
    • Identify all macros and their types
    • Build initial dependency list
  • Parser and analyzer unit tests

Week 6: Template Service

  • Create markitect/prompts/services/template_service.py
    • Register templates with automatic analysis
    • Query templates by ID or name
    • Retrieve template with analyzed macro list
  • Integration tests

Template Syntax

# Example PromptTemplate

## Context

{{require:project-overview}}
{{optional:technical-constraints}}

## Task Description

Generate a technical design for {{require:feature-name}}.

## Previous Designs

{{generate:related-designs-collector}}

Macro Format

{{<kind>:<target>[|<param1>=<value1>|<param2>=<value2>...]}}

Examples:
{{require:glossary/authentication}}
{{optional:standards/api-design}}
{{generate:code-examples|language=python|framework=fastapi}}
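
A regex-based parser for the macro format above might be sketched as follows. The names `MACRO_RE` and `parse_macros` are hypothetical, and the sketch assumes targets and parameter values never contain `|` or `}`.

```python
import re
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class MacroKind(Enum):
    REQUIRED = "require"
    OPTIONAL = "optional"
    GENERATE = "generate"


@dataclass
class ContentMacro:
    kind: MacroKind
    target: str
    parameters: Dict[str, str] = field(default_factory=dict)


# Matches {{<kind>:<target>[|<key>=<value>...]}}
MACRO_RE = re.compile(
    r"\{\{(require|optional|generate):([^|}]+)((?:\|[^|}]+)*)\}\}"
)


def parse_macros(content: str) -> List[ContentMacro]:
    """Extract every macro in document order."""
    macros = []
    for kind, target, raw_params in MACRO_RE.findall(content):
        params = {}
        # raw_params looks like "|k1=v1|k2=v2"; drop the empty leading piece.
        for pair in filter(None, raw_params.split("|")):
            key, _, value = pair.partition("=")
            params[key.strip()] = value.strip()
        macros.append(ContentMacro(MacroKind(kind), target.strip(), params))
    return macros


macros = parse_macros(
    "{{require:glossary/authentication}}\n"
    "{{generate:code-examples|language=python|framework=fastapi}}"
)
```

The enum values mirror the keywords used in the template syntax, so `MacroKind(kind)` maps `require` to `MacroKind.REQUIRED` without a lookup table.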

Verification

pytest tests/unit/prompts/test_template_models.py
pytest tests/unit/prompts/test_macro_parser.py
pytest tests/unit/prompts/test_template_analyzer.py
pytest tests/integration/prompts/test_template_service.py

Phase 3: Resolver Engine (FR-3)

Capability Requirements

| ID | Capability | Description | Priority |
|----|------------|-------------|----------|
| CAP-301 | Resolution Strategy | Deterministic multi-space resolution order | Critical |
| CAP-302 | Required Macro Resolution | Fail on missing required artifacts | Critical |
| CAP-303 | Optional Macro Resolution | Graceful fallback for missing optional artifacts | Critical |
| CAP-304 | Generate Macro Detection | Identify generator templates for nested execution | High |
| CAP-305 | Resolution Context | Track resolution state and errors | High |

Implementation Tasks

Week 7: Resolver Core

  • Create markitect/prompts/resolver/models.py
    • ResolutionContext with resolution order, resolved artifacts, errors
    • ResolutionResult with success status, resolved content, unresolved macros
    • ResolutionError for missing required artifacts
  • Create markitect/prompts/resolver/strategy.py
    • ResolutionStrategy base class
    • MultiSpaceResolutionStrategy implementing FR-3.1 order:
      1. Local InformationSpace
      2. Explicitly included InformationSpaces
      3. Default InformationSpace
      4. Team/Shared InformationSpace (if configured)
  • Unit tests for resolution strategy

Week 8: PromptResolver Implementation

  • Create markitect/prompts/resolver/resolver.py
    • PromptResolver class
    • resolve_template(template, context) -> ResolutionResult
    • Handle Required macros: fail if not found (FR-3.2)
    • Handle Optional macros: resolve to empty (FR-3.3)
    • Detect Generate macros for deferred resolution (FR-3.4)
    • Track resolution errors and warnings
  • Resolver unit tests

Week 9: Context Compilation

  • Create markitect/prompts/resolver/compiler.py
    • ContextCompiler class
    • Compile resolved artifacts into single prompt context
    • Substitute macros with resolved content
    • Generate CompiledPrompt with full context
  • Integration tests for full resolution flow

Resolution Order Example

# Given template in space "my-project" referencing {{require:glossary}}
# Resolution search order:
1. my-project/glossary
2. <included-space-1>/glossary
3. <included-space-2>/glossary
4. default-space/glossary
5. shared-space/glossary  # if configured
# If not found: ResolutionError(MacroKind.REQUIRED, "glossary")
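
The search order above can be sketched as a pair of small functions. This is an illustration under stated assumptions: spaces are modelled as plain dictionaries, and `search_order` and `resolve` are hypothetical names rather than the planned `MultiSpaceResolutionStrategy` API.

```python
from typing import Dict, List, Optional


class ResolutionError(Exception):
    """Raised when a required artifact is missing from every searched space."""


def search_order(local: str, included: List[str],
                 default: str, shared: Optional[str] = None) -> List[str]:
    """Local, then included spaces in order, then default, then shared."""
    order = [local, *included, default]
    if shared is not None:
        order.append(shared)
    # Drop duplicates while preserving the first occurrence.
    return list(dict.fromkeys(order))


def resolve(name: str, order: List[str],
            spaces: Dict[str, Dict[str, str]], required: bool = True) -> str:
    """Return the first match along the search order."""
    for space_id in order:
        content = spaces.get(space_id, {}).get(name)
        if content is not None:
            return content
    if required:
        raise ResolutionError(f"required artifact {name!r} not found in {order}")
    return ""  # optional macros resolve to empty content


spaces = {
    "my-project": {"overview": "local overview"},
    "default-space": {"glossary": "default glossary"},
}
order = search_order("my-project", ["team-a"], "default-space")
```

Because the order is a plain list computed once per template, two resolutions with the same configuration always search spaces in the same sequence, which is the determinism CAP-301 asks for.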

Database Schema Additions

CREATE TABLE prompt_resolution_config (
    space_id TEXT PRIMARY KEY REFERENCES spaces(id),
    included_spaces JSON,  -- Array of space IDs to search
    default_space_id TEXT REFERENCES spaces(id),
    shared_space_id TEXT REFERENCES spaces(id),
    max_generation_depth INTEGER DEFAULT 3,
    config JSON,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Verification

pytest tests/unit/prompts/test_resolution_strategy.py
pytest tests/unit/prompts/test_prompt_resolver.py
pytest tests/unit/prompts/test_context_compiler.py
pytest tests/integration/prompts/test_resolution_flow.py

Phase 4: Execution Engine (FR-4, FR-5)

Capability Requirements

| ID | Capability | Description | Priority |
|----|------------|-------------|----------|
| CAP-401 | PromptRun Lifecycle | Three-stage execution: Analysis, Compilation, Processing | Critical |
| CAP-402 | InputBundleHash | Content-based execution fingerprinting | Critical |
| CAP-403 | Idempotent Execution | Skip re-execution for identical input bundles | Critical |
| CAP-404 | LLM Integration | Execute compiled prompts via LLM provider | Critical |
| CAP-405 | RunManifest Persistence | Store complete execution provenance | Critical |
| CAP-406 | Nested Generator Runs | Execute generate macros recursively | High |
CAP-406 Nested Generator Runs Execute generate macros recursively High

Implementation Tasks

Week 10: Execution Models

  • Create markitect/prompts/execution/models.py
    • PromptRun dataclass with id, template_id, input_bundle_hash, status
    • ExecutionStage enum: ANALYSIS, COMPILATION, PROCESSING, COMPLETE, FAILED
    • RunConfig with model settings, depth limits, options
    • InputBundle with template digest, dependency digests, config hash
    • InputBundleHash calculation (SHA-256 of sorted input bundle)
  • Create markitect/prompts/execution/manifest.py
    • RunManifest comprehensive execution record
    • Template metadata, resolved inputs, compiled prompt digest
    • Model configuration, output artifacts, validation results
    • Dependency edges, timing metadata
  • Unit tests for execution models

Week 11: Execution Engine

  • Create markitect/prompts/execution/engine.py
    • PromptExecutionEngine class
    • execute(template, config) -> PromptRun
    • Stage 1: Template analysis (use TemplateAnalyzer)
    • Stage 2: Context compilation (use ContextCompiler)
    • Stage 3: Prompt processing (LLM invocation)
    • Calculate InputBundleHash before execution
    • Check for existing run with same hash (FR-4.4)
    • Store RunManifest on completion
  • Engine unit tests
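
The idempotency check (FR-4.4) might reduce to a lookup before the LLM call, as in the sketch below. `InMemoryRunStore` stands in for the `prompt_runs` table, the LLM is passed as a plain callable, and the simplified `PromptRun` fields are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class PromptRun:
    input_bundle_hash: str
    status: str
    output: str


class InMemoryRunStore:
    """Stand-in for the prompt_runs table, keyed by input_bundle_hash."""

    def __init__(self) -> None:
        self._runs: Dict[str, PromptRun] = {}

    def find_by_hash(self, bundle_hash: str) -> Optional[PromptRun]:
        return self._runs.get(bundle_hash)

    def save(self, run: PromptRun) -> None:
        self._runs[run.input_bundle_hash] = run


def execute(bundle_hash: str, compiled_prompt: str,
            llm: Callable[[str], str], store: InMemoryRunStore) -> PromptRun:
    existing = store.find_by_hash(bundle_hash)
    if existing is not None and existing.status == "COMPLETE":
        return existing  # identical inputs: reuse the stored run
    run = PromptRun(bundle_hash, "COMPLETE", llm(compiled_prompt))
    store.save(run)
    return run


calls: List[str] = []


def fake_llm(prompt: str) -> str:
    """Test double that records each invocation."""
    calls.append(prompt)
    return prompt.upper()


store = InMemoryRunStore()
first = execute("abc123", "hello", fake_llm, store)
second = execute("abc123", "hello", fake_llm, store)
```

The sketch only reuses runs whose status is COMPLETE. With a uniqueness constraint on the hash, retrying a failed run would have to overwrite the existing record rather than add a second one, which is what `save` does here.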

Week 12: LLM Integration Layer

  • Create markitect/prompts/execution/llm_adapter.py
    • LLMAdapter abstract base class
    • execute_prompt(compiled_prompt, config) -> LLMResponse
    • Mock implementation for testing
    • OpenAI/Anthropic adapter stubs (to be implemented)
  • Create markitect/prompts/execution/generator.py
    • GeneratorExecutor for nested generate macro execution
    • Enforce max depth limit (FR-3.5)
    • Track parent-child run relationships
    • Link generator runs in RunManifest (FR-5.3)
  • Integration tests for full execution flow

Database Schema Additions

CREATE TABLE prompt_runs (
    id TEXT PRIMARY KEY,
    template_id TEXT NOT NULL REFERENCES prompt_artifacts(id),
    input_bundle_hash TEXT NOT NULL,
    status TEXT NOT NULL,
    stage TEXT NOT NULL,
    parent_run_id TEXT REFERENCES prompt_runs(id),
    depth INTEGER DEFAULT 0,
    started_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    completed_at TIMESTAMP,
    error_message TEXT,
    UNIQUE(input_bundle_hash)  -- Idempotency constraint
);

CREATE TABLE run_manifests (
    run_id TEXT PRIMARY KEY REFERENCES prompt_runs(id),
    template_metadata JSON NOT NULL,
    resolved_inputs JSON NOT NULL,
    compiled_prompt_digest TEXT NOT NULL,
    model_config JSON NOT NULL,
    output_artifacts JSON,
    dependency_edges JSON,
    validation_results JSON,
    impact_debt JSON,
    timing_metadata JSON,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_runs_template ON prompt_runs(template_id);
CREATE INDEX idx_runs_bundle_hash ON prompt_runs(input_bundle_hash);
CREATE INDEX idx_runs_parent ON prompt_runs(parent_run_id);

InputBundleHash Calculation

import hashlib
import json
from typing import Any, Dict


def calculate_input_bundle_hash(
    template_digest: str,
    dependency_digests: Dict[str, str],  # {artifact_name: digest}
    config_hash: str,
    model_settings: Dict[str, Any],
) -> str:
    """
    Deterministic hash of complete input context.

    Components (sorted for determinism):
    1. Template content digest
    2. Sorted dependency digests by name
    3. Resolution configuration hash
    4. Model settings (name, temperature, etc.)
    5. Compilation options (no parameter in this signature; assumed
       here to be folded into config_hash)
    """
    bundle = {
        'template': template_digest,
        # Sorting the (name, digest) pairs removes any dependence on
        # the insertion order of the input dictionaries.
        'dependencies': sorted(dependency_digests.items()),
        'config': config_hash,
        'model': sorted(model_settings.items()),
    }
    # sort_keys=True also orders the keys of any nested dicts.
    return hashlib.sha256(
        json.dumps(bundle, sort_keys=True).encode('utf-8')
    ).hexdigest()

Verification

pytest tests/unit/prompts/test_execution_models.py
pytest tests/unit/prompts/test_execution_engine.py
pytest tests/unit/prompts/test_llm_adapter.py
pytest tests/unit/prompts/test_generator_executor.py
pytest tests/integration/prompts/test_prompt_execution.py
pytest tests/integration/prompts/test_idempotent_execution.py

Phase 5: Dependency Tracking (FR-6)

Capability Requirements

| ID | Capability | Description | Priority |
|----|------------|-------------|----------|
| CAP-501 | Dependency Edge Recording | Track input → output relationships | Critical |
| CAP-502 | Dependency Graph Construction | Build queryable dependency graph | Critical |
| CAP-503 | Circular Dependency Detection | Identify cycles in dependency chains | High |
| CAP-504 | Dependency Query | Find dependents of any artifact | High |
| CAP-505 | Cross-Space Dependencies | Track dependencies across spaces | Medium |

Implementation Tasks

Week 13: Dependency Models

  • Create markitect/prompts/dependencies/models.py
    • DependencyEdge with source_id, target_id, run_id, edge_type
    • EdgeType enum: REQUIRES, GENERATES, INCLUDES
    • DependencyGraph class for graph operations
    • CircularDependencyError exception
  • Unit tests for dependency models

Week 14: Graph Builder

  • Create markitect/prompts/dependencies/graph.py
    • GraphBuilder class
    • Extract dependencies from RunManifest
    • Add edges: artifact → run (input), run → artifact (output)
    • Build adjacency list representation
    • Cycle detection using DFS
  • Create markitect/prompts/dependencies/repository.py
    • SQLiteDependencyRepository
    • Store and query dependency edges
    • Efficient dependent lookup queries
  • Graph builder and repository tests

Week 15: Query Operations

  • Create markitect/prompts/dependencies/queries.py
    • find_dependents(artifact_id, depth=1) -> List[Artifact]
    • find_dependencies(artifact_id) -> List[Artifact]
    • get_dependency_chain(source_id, target_id) -> List[Edge]
    • detect_circular_dependencies(artifact_id) -> List[Cycle]
  • Integration tests for dependency queries

Database Schema Additions

CREATE TABLE prompt_dependencies (
    id TEXT PRIMARY KEY,
    source_artifact_id TEXT NOT NULL REFERENCES prompt_artifacts(id),
    target_artifact_id TEXT NOT NULL REFERENCES prompt_artifacts(id),
    run_id TEXT NOT NULL REFERENCES prompt_runs(id),
    edge_type TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    UNIQUE(source_artifact_id, target_artifact_id, run_id)
);

CREATE INDEX idx_deps_source ON prompt_dependencies(source_artifact_id);
CREATE INDEX idx_deps_target ON prompt_dependencies(target_artifact_id);
CREATE INDEX idx_deps_run ON prompt_dependencies(run_id);

Dependency Graph Example

Template: design-doc (ID: t1)
  → requires: glossary (ID: a1)
  → requires: requirements (ID: a2)
  → generates: api-spec (ID: a3)
    PromptRun: r1
      Edges:
        a1 → r1 (REQUIRES)
        a2 → r1 (REQUIRES)
        r1 → a3 (GENERATES)

When a1 (glossary) changes:
  Dependents(a1, depth=1) = [r1]
  Affected outputs = [a3]  (need recomputation)
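
The example graph can be modelled with an adjacency list, a breadth-first dependents query, and DFS cycle detection. The sketch below treats runs and artifacts as nodes of one graph, as the example does. The class layout and method names are assumptions, and the recursive DFS is only suitable for graphs shallower than Python's recursion limit.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set


class DependencyGraph:
    def __init__(self) -> None:
        self._out: Dict[str, Set[str]] = defaultdict(set)

    def add_edge(self, source: str, target: str) -> None:
        self._out[source].add(target)

    def find_dependents(self, node: str, depth: int = 1) -> List[str]:
        """Breadth-first walk of outgoing edges, at most `depth` hops."""
        seen, result = {node}, []
        frontier = deque([(node, 0)])
        while frontier:
            current, dist = frontier.popleft()
            if dist == depth:
                continue
            for nxt in sorted(self._out.get(current, ())):
                if nxt not in seen:
                    seen.add(nxt)
                    result.append(nxt)
                    frontier.append((nxt, dist + 1))
        return result

    def has_cycle(self) -> bool:
        """Depth-first search; a node seen again while still on the stack is a cycle."""
        state: Dict[str, int] = {}  # absent = unvisited, 1 = on stack, 2 = done

        def visit(node: str) -> bool:
            state[node] = 1
            for nxt in self._out.get(node, ()):
                if state.get(nxt) == 1:
                    return True
                if nxt not in state and visit(nxt):
                    return True
            state[node] = 2
            return False

        return any(n not in state and visit(n) for n in list(self._out))


graph = DependencyGraph()
for source, target in [("a1", "r1"), ("a2", "r1"), ("r1", "a3")]:
    graph.add_edge(source, target)
```

With these edges, `graph.find_dependents("a1")` yields the run `r1`, and raising `depth` to 2 also reaches the output `a3`.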

Verification

pytest tests/unit/prompts/test_dependency_models.py
pytest tests/unit/prompts/test_graph_builder.py
pytest tests/unit/prompts/test_dependency_repository.py
pytest tests/unit/prompts/test_dependency_queries.py
pytest tests/integration/prompts/test_dependency_graph.py
pytest tests/integration/prompts/test_circular_detection.py

Phase 6: Incremental Execution (FR-7, FR-8)

Capability Requirements

| ID | Capability | Description | Priority |
|----|------------|-------------|----------|
| CAP-601 | Change Detection | Detect artifact modifications via digest comparison | Critical |
| CAP-602 | Incremental Recompute | Recompute direct dependents on change | Critical |
| CAP-603 | Depth Control | Configurable recomputation depth (default=1) | High |
| CAP-604 | Circular Suppression | Suppress recompute to prevent cycles | High |
| CAP-605 | Change Impact Analysis | Calculate change magnitude metrics | High |
| CAP-606 | Impact Debt Tracking | Record suppressed recomputations | Medium |

Implementation Tasks

Week 16: Change Detection

  • Create markitect/prompts/incremental/models.py
    • ArtifactChange with old_digest, new_digest, change_type
    • ChangeType enum: CREATED, MODIFIED, DELETED
    • ImpactDebt for suppressed recomputations
    • RecomputeConfig with depth, circular handling, budget limits
  • Create markitect/prompts/incremental/detector.py
    • ChangeDetector class
    • Compare current digest with stored digest
    • Identify change type and magnitude
  • Unit tests for change detection

Week 17: Impact Analysis

  • Create markitect/prompts/incremental/impact.py
    • ImpactAnalyzer class
    • Calculate change magnitude (FR-8.2):
      • Structural diff ratio (default)
      • Content diff ratio (character-level)
      • Optional: embedding distance
      • Optional: LLM-assessed impact
    • Generate impact score (0.0 to 1.0)
  • Create markitect/prompts/incremental/metrics.py
    • Diff calculation utilities
    • Similarity scoring algorithms
  • Impact analyzer tests
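
One possible way to turn the two default metrics into a 0.0 to 1.0 score is the standard library's `difflib.SequenceMatcher`. The sketch below is an assumption about how FR-8.2 could be realised, not the chosen algorithm. In particular, treating "structure" as the sequence of markdown heading lines is an illustrative simplification.

```python
from difflib import SequenceMatcher
from typing import List


def content_diff_ratio(old: str, new: str) -> float:
    """Character-level change magnitude: 0.0 = identical, 1.0 = nothing shared."""
    return 1.0 - SequenceMatcher(None, old, new, autojunk=False).ratio()


def _headings(text: str) -> List[str]:
    """Collect markdown heading lines, which stand in for document structure."""
    return [line.strip() for line in text.splitlines()
            if line.lstrip().startswith("#")]


def structural_diff_ratio(old: str, new: str) -> float:
    """Same measure, computed over the sequence of heading lines only."""
    matcher = SequenceMatcher(None, _headings(old), _headings(new), autojunk=False)
    return 1.0 - matcher.ratio()


old_doc = "# Glossary\n\n## Auth\nToken: a signed credential.\n"
new_doc = "# Glossary\n\n## Auth\nToken: a signed, expiring credential.\n"

content_score = content_diff_ratio(old_doc, new_doc)
structure_score = structural_diff_ratio(old_doc, new_doc)
```

Here the wording changed but the headings did not, so the structural score stays at zero while the content score is small and positive. That difference is the reason to offer both metrics.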

Week 18: Incremental Recompute Engine

  • Create markitect/prompts/incremental/engine.py
    • IncrementalExecutionEngine class
    • recompute_dependents(artifact_id, config) -> RecomputeResult
    • Find direct dependents via dependency graph (depth=1 default)
    • Check for circular dependencies
    • Execute prompt runs for affected dependents
    • Track suppressed recomputations as ImpactDebt
    • Record impact assessments in RunManifest (FR-8.3)
  • Integration tests for incremental execution

Recomputation Logic

def recompute_dependents(artifact_id: str, config: RecomputeConfig):
    """
    FR-7: Incremental recompute with depth control.

    1. Detect change in artifact
    2. Find dependents up to specified depth (default=1)
    3. For each dependent:
       - Check if recompute would create cycle → suppress if yes
       - Calculate change impact
       - If impact > threshold and budget available:
         - Recompute (re-execute PromptRun)
       - Else:
         - Record as ImpactDebt in RunManifest
    4. Return RecomputeResult with executed/suppressed counts
    """
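
The steps in the docstring could be realised along the following lines. To keep the sketch self-contained, the dependency lookup, impact calculation, and re-execution are injected as callables. Those extra parameters, the `RecomputeConfig` fields and their defaults, and the suppression reason strings are all hypothetical. A cycle is detected here only as a path that leads back to the changed artifact.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set


@dataclass
class RecomputeConfig:
    depth: int = 1
    impact_threshold: float = 0.1  # assumed default
    budget: int = 10               # assumed cap on runs per invocation


@dataclass
class RecomputeResult:
    executed: List[str] = field(default_factory=list)
    suppressed: Dict[str, str] = field(default_factory=dict)  # node -> reason


def recompute_dependents(
    artifact_id: str,
    config: RecomputeConfig,
    dependents_of: Callable[[str], Iterable[str]],
    impact_of: Callable[[str], float],
    rerun: Callable[[str], None],
) -> RecomputeResult:
    result = RecomputeResult()
    visited: Set[str] = {artifact_id}
    frontier = [artifact_id]
    for _ in range(config.depth):
        next_frontier = []
        for node in frontier:
            for dep in dependents_of(node):
                if dep == artifact_id:
                    # Recomputing would feed back into the changed artifact.
                    result.suppressed[dep] = "cycle"
                    continue
                if dep in visited:
                    continue  # already handled via another path
                visited.add(dep)
                if impact_of(dep) <= config.impact_threshold:
                    result.suppressed[dep] = "below_threshold"
                elif len(result.executed) >= config.budget:
                    result.suppressed[dep] = "budget_exhausted"
                else:
                    rerun(dep)
                    result.executed.append(dep)
                    next_frontier.append(dep)  # only re-executed nodes propagate
        frontier = next_frontier
    return result


graph = {
    "glossary": ["design-doc"],
    "design-doc": ["impl-guide", "changelog"],
    "impl-guide": ["glossary"],
}
impact = {"design-doc": 0.6, "impl-guide": 0.4, "changelog": 0.02}
reruns: List[str] = []
result = recompute_dependents(
    "glossary", RecomputeConfig(depth=3),
    lambda n: graph.get(n, []), lambda n: impact[n], reruns.append,
)
```

Every entry in `result.suppressed` corresponds to what the plan calls ImpactDebt. A real engine would persist those entries to the RunManifest instead of returning them in memory.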

Database Schema Additions

CREATE TABLE artifact_changes (
    id TEXT PRIMARY KEY,
    artifact_id TEXT NOT NULL REFERENCES prompt_artifacts(id),
    old_digest TEXT,
    new_digest TEXT NOT NULL,
    change_type TEXT NOT NULL,
    detected_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE impact_debt (
    id TEXT PRIMARY KEY,
    artifact_id TEXT NOT NULL REFERENCES prompt_artifacts(id),
    dependent_run_id TEXT NOT NULL REFERENCES prompt_runs(id),
    change_magnitude REAL NOT NULL,
    suppression_reason TEXT NOT NULL,
    recorded_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_changes_artifact ON artifact_changes(artifact_id);
CREATE INDEX idx_debt_artifact ON impact_debt(artifact_id);
CREATE INDEX idx_debt_run ON impact_debt(dependent_run_id);

Verification

pytest tests/unit/prompts/test_change_detector.py
pytest tests/unit/prompts/test_impact_analyzer.py
pytest tests/unit/prompts/test_incremental_engine.py
pytest tests/integration/prompts/test_incremental_recompute.py
pytest tests/integration/prompts/test_circular_suppression.py
pytest tests/integration/prompts/test_impact_debt.py

Phase 7: Quality & Validation (FR-9, FR-10)

Capability Requirements

| ID | Capability | Description | Priority |
|----|------------|-------------|----------|
| CAP-701 | Schema Validation | Validate generated artifacts against JSON schemas | High |
| CAP-702 | QualityGate Framework | Pluggable validation framework | High |
| CAP-703 | Validation Results | Record pass/fail with diagnostics | High |
| CAP-704 | Halting Policy | Configurable execution halting rules | Medium |
| CAP-705 | Refinement Loop | Iterative improvement with quality checks | Medium |

Implementation Tasks

Week 19: QualityGate Framework

  • Create markitect/prompts/quality/models.py
    • QualityGate abstract base class
    • ValidationResult with status, diagnostics, score
    • QualityPolicy with halting rules
    • GateType enum: SCHEMA, PATTERN, CUSTOM
  • Create markitect/prompts/quality/gates/schema_gate.py
    • SchemaValidationGate using existing schema validator
    • Validate generated artifacts against JSON schemas
  • Create markitect/prompts/quality/gates/pattern_gate.py
    • PatternValidationGate for regex-based checks
  • Unit tests for quality gates

Week 20: Validation Integration

  • Create markitect/prompts/quality/validator.py
    • QualityValidator class
    • Apply multiple gates to generated artifacts
    • Aggregate validation results
    • Record results in RunManifest (FR-9.3)
  • Integrate with execution engine
    • Run quality gates after prompt processing
    • Store validation results in RunManifest
  • Integration tests

Week 21: Halting Policy Engine

  • Create markitect/prompts/quality/policy.py
    • HaltingPolicyEngine class
    • Evaluate halting conditions (FR-10.2):
      • QualityGate failures
      • Marginal improvement below threshold
      • Iteration limit reached
      • Resource budget exhausted
    • Record halting decisions in RunManifest (FR-10.3)
  • Create markitect/prompts/quality/refinement.py
    • RefinementLoop for iterative improvement
    • Execute → Validate → Halt or Refine
  • Policy engine and refinement loop tests
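
Three of the four halting conditions listed for Week 21 can be evaluated from the score history alone, as in the sketch below. The function name `halting_reason`, the reason strings, and the order in which conditions are checked are assumptions. Resource budget tracking is left out for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HaltingRules:
    max_iterations: int = 3
    min_improvement: float = 0.05
    fail_on_validation_error: bool = True


def halting_reason(scores: List[float], gate_failed: bool,
                   rules: HaltingRules) -> Optional[str]:
    """Return why the refinement loop should stop, or None to keep refining.

    `scores` holds one quality score per completed iteration, oldest first.
    """
    if gate_failed and rules.fail_on_validation_error:
        return "quality_gate_failure"
    if len(scores) >= rules.max_iterations:
        return "iteration_limit"
    if len(scores) >= 2 and scores[-1] - scores[-2] < rules.min_improvement:
        return "marginal_improvement"
    return None


rules = HaltingRules()
```

Returning a reason string rather than a boolean makes it straightforward to record the halting decision in the RunManifest, as FR-10.3 requires.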

QualityGate Example

# Schema validation gate
schema_gate = SchemaValidationGate(
    schema_path="schemas/api-spec-schema.json"
)

# Pattern validation gate
pattern_gate = PatternValidationGate(
    required_patterns=[r"## Endpoints", r"### Authentication"],
    forbidden_patterns=[r"TODO", r"FIXME"]
)

# Quality policy
policy = QualityPolicy(
    gates=[schema_gate, pattern_gate],
    halting_rules={
        'max_iterations': 3,
        'min_improvement': 0.05,
        'fail_on_validation_error': True
    }
)
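
For orientation, a pattern gate compatible with the constructor call above might look like this. It is a minimal sketch: the `validate` method, the reduced `ValidationResult` fields, and the diagnostic wording are assumptions, and the abstract `QualityGate` base class is omitted.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class ValidationResult:
    status: str  # "PASS" or "FAIL"
    diagnostics: List[str] = field(default_factory=list)


class PatternValidationGate:
    def __init__(self, required_patterns: Iterable[str] = (),
                 forbidden_patterns: Iterable[str] = ()) -> None:
        self.required = [re.compile(p, re.MULTILINE) for p in required_patterns]
        self.forbidden = [re.compile(p, re.MULTILINE) for p in forbidden_patterns]

    def validate(self, content: str) -> ValidationResult:
        """Fail if any required pattern is absent or any forbidden one is present."""
        diagnostics = [f"missing required pattern: {p.pattern}"
                       for p in self.required if not p.search(content)]
        diagnostics += [f"found forbidden pattern: {p.pattern}"
                        for p in self.forbidden if p.search(content)]
        return ValidationResult("FAIL" if diagnostics else "PASS", diagnostics)


gate = PatternValidationGate(
    required_patterns=[r"## Endpoints", r"### Authentication"],
    forbidden_patterns=[r"TODO", r"FIXME"],
)
good = gate.validate("## Endpoints\n### Authentication\nBearer tokens.\n")
bad = gate.validate("## Endpoints\nTODO: describe auth\n")
```

Collecting every diagnostic instead of stopping at the first failure gives a refinement loop more to act on in its next iteration.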

Database Schema Additions

CREATE TABLE quality_gates (
    id TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    gate_type TEXT NOT NULL,
    config JSON NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE validation_results (
    id TEXT PRIMARY KEY,
    run_id TEXT NOT NULL REFERENCES prompt_runs(id),
    gate_id TEXT NOT NULL REFERENCES quality_gates(id),
    artifact_id TEXT REFERENCES prompt_artifacts(id),
    status TEXT NOT NULL,  -- PASS, FAIL, WARNING
    score REAL,
    diagnostics JSON,
    validated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_validations_run ON validation_results(run_id);
CREATE INDEX idx_validations_artifact ON validation_results(artifact_id);

Verification

pytest tests/unit/prompts/test_quality_gates.py
pytest tests/unit/prompts/test_quality_validator.py
pytest tests/unit/prompts/test_halting_policy.py
pytest tests/unit/prompts/test_refinement_loop.py
pytest tests/integration/prompts/test_quality_validation.py
pytest tests/integration/prompts/test_halting_execution.py

Phase 8: Observability & Traceability (FR-11)

Capability Requirements

| ID | Capability | Description | Priority |
|----|------------|-------------|----------|
| CAP-801 | Provenance Tracing | Trace any artifact to its producing run | High |
| CAP-802 | Dependency Visualization | Visualize dependency graph | Medium |
| CAP-803 | Run History | Query execution history | High |
| CAP-804 | Audit Logging | Complete audit trail of all operations | Medium |
| CAP-805 | GraphQL API | Query interface for all prompt operations | High |
| CAP-806 | CLI Commands | Command-line tools for management | High |

Implementation Tasks

Week 22: Traceability Service

  • Create markitect/prompts/traceability/service.py
    • TraceabilityService class
    • trace_artifact(artifact_id) -> ProvenanceTrace
    • get_producing_run(artifact_id) -> PromptRun
    • get_input_artifacts(run_id) -> List[Artifact]
    • get_generator_runs(run_id) -> List[PromptRun]
    • get_validation_history(artifact_id) -> List[ValidationResult]
  • Unit and integration tests

Week 23: Query & Visualization

  • Create markitect/prompts/visualization/graph.py
    • Export dependency graph in DOT format
    • Generate Mermaid diagrams
  • Create markitect/prompts/queries/
    • Complex query operations
    • Run history queries
    • Impact analysis queries
  • Visualization and query tests
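
Both export formats are plain text, so the visualization module could start as two small string builders. The sketch below assumes edges arrive as (source, target, edge_type) tuples and that node IDs are simple alphanumeric identifiers that need no escaping in Mermaid. The function names are hypothetical.

```python
from typing import Iterable, Tuple

Edge = Tuple[str, str, str]  # (source, target, edge_type)


def to_dot(edges: Iterable[Edge]) -> str:
    """Render edges as a Graphviz DOT digraph with labelled arrows."""
    lines = ["digraph dependencies {"]
    for source, target, edge_type in edges:
        lines.append(f'  "{source}" -> "{target}" [label="{edge_type}"];')
    lines.append("}")
    return "\n".join(lines)


def to_mermaid(edges: Iterable[Edge]) -> str:
    """Render edges as a left-to-right Mermaid flowchart."""
    lines = ["graph LR"]
    for source, target, edge_type in edges:
        lines.append(f"  {source} -->|{edge_type}| {target}")
    return "\n".join(lines)


edges = [
    ("a1", "r1", "REQUIRES"),
    ("a2", "r1", "REQUIRES"),
    ("r1", "a3", "GENERATES"),
]
dot_text = to_dot(edges)
mermaid_text = to_mermaid(edges)
```

Writing `mermaid_text` to a file gives the kind of output the `markitect prompt graph ... --format mermaid` command is meant to produce.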

Week 24: API Layer - GraphQL

  • Create markitect/prompts/graphql/schema.py
    • Extend existing GraphQL schema with prompt types
    • PromptTemplate, PromptRun, Artifact, DependencyEdge types
    • Queries: promptTemplate, promptTemplates, promptRun, promptRuns, artifact, dependencies, traceArtifact
    • Mutations: createTemplate, executeTemplate, recomputeDependents
    • Subscriptions: onRunComplete, onArtifactChange
  • Create markitect/prompts/graphql/resolvers.py
    • Implement all query and mutation resolvers
  • GraphQL integration tests

Week 25: CLI Commands

  • Extend markitect/cli.py with prompt commands:
    • markitect prompt template create/list/show/delete
    • markitect prompt execute <template> [--config CONFIG]
    • markitect prompt recompute <artifact> [--depth N]
    • markitect prompt trace <artifact>
    • markitect prompt graph <artifact> [--format dot|mermaid]
    • markitect prompt runs [--template TEMPLATE] [--status STATUS]
    • markitect prompt validate <artifact> [--gates GATES]
  • CLI integration tests

Week 26: Documentation & Polish

  • User guide for prompt dependency resolution
  • API documentation
  • Example prompt templates and workflows
  • Performance optimization
  • Final integration testing

GraphQL Schema Extensions

type PromptTemplate {
  id: ID!
  name: String!
  spaceId: ID!
  content: String!
  contentDigest: String!
  macros: [ContentMacro!]!
  metadata: JSON
  createdAt: DateTime!
  updatedAt: DateTime!
}

type ContentMacro {
  kind: MacroKind!
  target: String!
  parameters: JSON
}

enum MacroKind {
  REQUIRED
  OPTIONAL
  GENERATE
}

type PromptRun {
  id: ID!
  template: PromptTemplate!
  inputBundleHash: String!
  status: RunStatus!
  stage: ExecutionStage!
  parentRun: PromptRun
  depth: Int!
  manifest: RunManifest!
  startedAt: DateTime!
  completedAt: DateTime
}

type RunManifest {
  runId: ID!
  templateMetadata: JSON!
  resolvedInputs: [ResolvedInput!]!
  compiledPromptDigest: String!
  modelConfig: JSON!
  outputArtifacts: [Artifact!]!
  dependencyEdges: [DependencyEdge!]!
  validationResults: [ValidationResult!]!
  impactDebt: [ImpactDebt!]!
}

type DependencyEdge {
  id: ID!
  source: Artifact!
  target: Artifact!
  run: PromptRun!
  edgeType: EdgeType!
}

type Query {
  promptTemplate(id: ID!): PromptTemplate
  promptTemplates(spaceId: ID): [PromptTemplate!]!
  promptRun(id: ID!): PromptRun
  promptRuns(templateId: ID, status: RunStatus): [PromptRun!]!
  artifact(id: ID!): Artifact
  dependencies(artifactId: ID!, depth: Int): [DependencyEdge!]!
  traceArtifact(id: ID!): ProvenanceTrace!
}

type Mutation {
  createTemplate(input: CreateTemplateInput!): PromptTemplate!
  executeTemplate(templateId: ID!, config: ExecutionConfig): PromptRun!
  recomputeDependents(artifactId: ID!, config: RecomputeConfig): RecomputeResult!
}

type Subscription {
  onRunComplete(templateId: ID): PromptRun!
  onArtifactChange(spaceId: ID): ArtifactChange!
}

CLI Examples

# Create and execute a template
markitect prompt template create design-doc \
  --space my-project \
  --content @templates/design-template.md

markitect prompt execute design-doc \
  --config '{"model": "gpt-4", "temperature": 0.7}'

# Trace artifact provenance
markitect prompt trace api-spec
# Output:
# Artifact: api-spec (a3)
# Produced by: PromptRun r1
# Template: design-doc (t1)
# Input artifacts:
#   - glossary (a1)
#   - requirements (a2)
# Generated at: 2026-02-08 10:30:00

# Visualize dependencies
markitect prompt graph api-spec --format mermaid > deps.mmd

# Recompute after change
markitect prompt recompute glossary --depth 2
# Recomputing dependents of glossary...
# ✓ design-doc run r1 (api-spec regenerated)
# ✓ implementation-guide run r2 (guide regenerated)
# Summary: 2 runs executed, 0 suppressed

Verification

pytest tests/unit/prompts/test_traceability_service.py
pytest tests/unit/prompts/test_visualization.py
pytest tests/integration/prompts/test_graphql_api.py
pytest tests/integration/prompts/test_cli_commands.py
pytest tests/e2e/prompts/test_complete_workflow.py

Timeline Summary

| Phase | Focus | Duration | Cumulative |
|-------|-------|----------|------------|
| 1 | Foundation - Addressable Artifacts | 3 weeks | 3 weeks |
| 2 | Templates & Macros | 3 weeks | 6 weeks |
| 3 | Resolver Engine | 3 weeks | 9 weeks |
| 4 | Execution Engine | 3 weeks | 12 weeks |
| 5 | Dependency Tracking | 3 weeks | 15 weeks |
| 6 | Incremental Execution | 3 weeks | 18 weeks |
| 7 | Quality & Validation | 3 weeks | 21 weeks |
| 8 | Observability & Traceability | 5 weeks | 26 weeks |

Total: 26 weeks (~6 months)

Parallel Work Opportunities

  • Phases 7 (Quality) and 8 (Observability) can partially overlap
  • Documentation can be written incrementally throughout
  • CLI commands can start in parallel with Phase 7
  • GraphQL schema can be drafted early and implemented incrementally

Files to Create

Core Modules

markitect/prompts/
├── __init__.py
├── models.py                           # Phase 1: Artifact models
├── repositories/
│   ├── __init__.py
│   ├── interfaces.py                   # Phase 1: Repository interfaces
│   └── sqlite.py                       # Phase 1: SQLite implementations
├── templates/
│   ├── __init__.py
│   ├── models.py                       # Phase 2: Template models
│   ├── parser.py                       # Phase 2: Macro parser
│   └── analyzer.py                     # Phase 2: Template analyzer
├── resolver/
│   ├── __init__.py
│   ├── models.py                       # Phase 3: Resolution models
│   ├── strategy.py                     # Phase 3: Resolution strategies
│   ├── resolver.py                     # Phase 3: PromptResolver
│   └── compiler.py                     # Phase 3: Context compiler
├── execution/
│   ├── __init__.py
│   ├── models.py                       # Phase 4: Execution models
│   ├── manifest.py                     # Phase 4: RunManifest
│   ├── engine.py                       # Phase 4: Execution engine
│   ├── llm_adapter.py                  # Phase 4: LLM integration
│   └── generator.py                    # Phase 4: Generator executor
├── dependencies/
│   ├── __init__.py
│   ├── models.py                       # Phase 5: Dependency models
│   ├── graph.py                        # Phase 5: Graph builder
│   ├── repository.py                   # Phase 5: Dependency storage
│   └── queries.py                      # Phase 5: Graph queries
├── incremental/
│   ├── __init__.py
│   ├── models.py                       # Phase 6: Change models
│   ├── detector.py                     # Phase 6: Change detector
│   ├── impact.py                       # Phase 6: Impact analyzer
│   ├── metrics.py                      # Phase 6: Diff metrics
│   └── engine.py                       # Phase 6: Incremental engine
├── quality/
│   ├── __init__.py
│   ├── models.py                       # Phase 7: Quality models
│   ├── gates/
│   │   ├── __init__.py
│   │   ├── schema_gate.py              # Phase 7: Schema validation
│   │   └── pattern_gate.py             # Phase 7: Pattern validation
│   ├── validator.py                    # Phase 7: Quality validator
│   ├── policy.py                       # Phase 7: Halting policy
│   └── refinement.py                   # Phase 7: Refinement loop
├── traceability/
│   ├── __init__.py
│   └── service.py                      # Phase 8: Traceability
├── visualization/
│   ├── __init__.py
│   └── graph.py                        # Phase 8: Graph visualization
├── queries/
│   ├── __init__.py
│   └── operations.py                   # Phase 8: Complex queries
├── graphql/
│   ├── __init__.py
│   ├── schema.py                       # Phase 8: GraphQL schema
│   └── resolvers.py                    # Phase 8: Resolvers
└── services/
    ├── __init__.py
    ├── artifact_service.py             # Phase 1: Artifact operations
    └── template_service.py             # Phase 2: Template operations

Test Files

tests/unit/prompts/
├── test_artifact_models.py             # Phase 1
├── test_artifact_repository.py         # Phase 1
├── test_template_models.py             # Phase 2
├── test_macro_parser.py                # Phase 2
├── test_template_analyzer.py           # Phase 2
├── test_resolution_strategy.py         # Phase 3
├── test_prompt_resolver.py             # Phase 3
├── test_context_compiler.py            # Phase 3
├── test_execution_models.py            # Phase 4
├── test_execution_engine.py            # Phase 4
├── test_llm_adapter.py                 # Phase 4
├── test_generator_executor.py          # Phase 4
├── test_dependency_models.py           # Phase 5
├── test_graph_builder.py               # Phase 5
├── test_dependency_repository.py       # Phase 5
├── test_dependency_queries.py          # Phase 5
├── test_change_detector.py             # Phase 6
├── test_impact_analyzer.py             # Phase 6
├── test_incremental_engine.py          # Phase 6
├── test_quality_gates.py               # Phase 7
├── test_quality_validator.py           # Phase 7
├── test_halting_policy.py              # Phase 7
├── test_refinement_loop.py             # Phase 7
├── test_traceability_service.py        # Phase 8
└── test_visualization.py               # Phase 8

tests/integration/prompts/
├── test_artifact_service.py            # Phase 1
├── test_template_service.py            # Phase 2
├── test_resolution_flow.py             # Phase 3
├── test_prompt_execution.py            # Phase 4
├── test_idempotent_execution.py        # Phase 4
├── test_dependency_graph.py            # Phase 5
├── test_circular_detection.py          # Phase 5
├── test_incremental_recompute.py       # Phase 6
├── test_circular_suppression.py        # Phase 6
├── test_impact_debt.py                 # Phase 6
├── test_quality_validation.py          # Phase 7
├── test_halting_execution.py           # Phase 7
├── test_graphql_api.py                 # Phase 8
└── test_cli_commands.py                # Phase 8

tests/e2e/prompts/
└── test_complete_workflow.py           # Phase 8

Documentation Files

docs/prompts/
├── GETTING_STARTED.md
├── TEMPLATE_GUIDE.md
├── EXECUTION_GUIDE.md
├── DEPENDENCY_MANAGEMENT.md
├── QUALITY_GATES.md
└── API_REFERENCE.md

Database Migrations

migrations/prompts/
├── 001_create_artifacts_table.sql      # Phase 1
├── 002_create_resolution_config.sql    # Phase 3
├── 003_create_runs_and_manifests.sql   # Phase 4
├── 004_create_dependencies.sql         # Phase 5
├── 005_create_changes_and_debt.sql     # Phase 6
└── 006_create_quality_tables.sql       # Phase 7

Success Criteria

The implementation is considered complete when all of the following acceptance criteria from the FRS are met:

  1. FR-2 & FR-3: A PromptTemplate referencing Required, Optional, and Generate macros can be executed
  2. FR-3.4 & FR-4: Missing Generate dependencies are automatically generated and persisted
  3. FR-4.4: Re-running an unchanged PromptRun with identical InputBundleHash results in skipped execution
  4. FR-7: Changing an upstream artifact triggers recomputation of direct dependents
  5. FR-7.3: Circular recomputation is suppressed and logged
  6. FR-5: RunManifest contains complete provenance and dependency information
  7. FR-9: Schema validation failures are correctly recorded and influence halting policy

Additional Quality Metrics

  • Test Coverage: >85% for all prompt modules
  • Performance: Execute simple template in <500ms (excluding LLM call)
  • Performance: Build dependency graph for 1000 artifacts in <2s
  • Performance: Incremental recompute for 100 dependents in <5s
  • Documentation: Complete user guides for all major workflows
  • Integration: Zero regressions in existing MarkiTect functionality

Design Decisions

1. LLM Provider Abstraction

Decision: Abstract LLM integration behind an LLMAdapter interface.

Rationale: The FRS explicitly does not prescribe an LLM provider. The adapter pattern allows pluggable providers (OpenAI, Anthropic, local models).
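
A minimal sketch of what such an adapter boundary could look like, assuming a single `complete` method. The method name, the `LLMResponse` fields, and `MockLLMAdapter` are illustrative assumptions, not the interface defined by this workplan or the FRS.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class LLMResponse:
    """Hypothetical response container; field names are illustrative."""
    text: str
    model: str


class LLMAdapter(Protocol):
    """Provider-neutral interface; the method name `complete` is an assumption."""

    def complete(self, prompt: str, *, model: str, temperature: float = 0.0) -> LLMResponse:
        ...


class MockLLMAdapter:
    """Deterministic stand-in for tests: returns a canned reply per prompt."""

    def __init__(self, canned: dict[str, str], default: str = "") -> None:
        self._canned = canned
        self._default = default
        self.calls: list[str] = []  # records prompts so tests can assert on them

    def complete(self, prompt: str, *, model: str, temperature: float = 0.0) -> LLMResponse:
        self.calls.append(prompt)
        return LLMResponse(text=self._canned.get(prompt, self._default), model=model)


def run_prompt(adapter: LLMAdapter, prompt: str) -> str:
    """Calling code depends only on the interface, never on a provider SDK."""
    return adapter.complete(prompt, model="mock-1").text
```

Because `LLMAdapter` is a structural `Protocol`, a provider-specific adapter would only need to expose a matching `complete` method; no inheritance is required.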

2. Storage Backend

Decision: SQLite for persistence, in-memory graph for queries.

Rationale: Consistent with existing InformationSpace implementation. SQLite provides ACID guarantees. In-memory graph enables fast traversal.

3. Macro Syntax

Decision: Use {{kind:target|param=value}} syntax in markdown.

Rationale: Non-invasive in markdown source. Compatible with existing transclusion syntax. Easy to parse with regex.
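
To illustrate the "easy to parse with regex" point, here is a rough sketch of a parser for the `{{kind:target|param=value}}` form. The lowercase kind names, the `Macro` dataclass, and `parse_macros` are assumptions made for the example; the actual grammar would be fixed in Phase 2.

```python
import re
from dataclasses import dataclass, field

# Matches {{kind:target}} and {{kind:target|k=v|k2=v2}}; whitespace around
# the kind and target is tolerated. This grammar is an assumption.
MACRO_RE = re.compile(
    r"\{\{\s*(?P<kind>\w+)\s*:\s*(?P<target>[^|}]+?)\s*(?:\|(?P<params>[^}]*))?\}\}"
)


@dataclass(frozen=True)
class Macro:
    """Hypothetical parsed macro; field names are illustrative."""
    kind: str
    target: str
    params: dict[str, str] = field(default_factory=dict)


def parse_macros(text: str) -> list[Macro]:
    """Return every macro found in `text`, in document order."""
    macros = []
    for match in MACRO_RE.finditer(text):
        params = {}
        for pair in (match.group("params") or "").split("|"):
            if "=" in pair:  # silently skip malformed or empty segments
                key, value = pair.split("=", 1)
                params[key.strip()] = value.strip()
        macros.append(Macro(match.group("kind"), match.group("target").strip(), params))
    return macros
```

For example, `parse_macros("{{generate:summary.md|depth=2}}")` yields one `Macro` with kind `generate`, target `summary.md`, and params `{"depth": "2"}`. Parameter values stay as strings; type coercion is left to a later stage.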

4. Incremental Recompute Default Depth

Decision: Default depth=1 (direct dependents only).

Rationale: Per FR-7.2. Prevents cascading recomputation storms. User can increase depth explicitly when needed.
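
The depth limit can be pictured as a bounded breadth-first walk over reverse dependency edges. The sketch below assumes the graph is available as a plain mapping from an artifact ID to the IDs that depend on it; `dependents_within_depth` is a hypothetical helper, not part of the planned API.

```python
from collections import deque


def dependents_within_depth(
    dependents: dict[str, set[str]], changed: str, depth: int = 1
) -> list[str]:
    """Breadth-first walk over reverse edges, stopping after `depth` hops.

    `dependents` maps an artifact ID to the artifacts that consume it.
    """
    seen = {changed}
    order: list[str] = []
    queue = deque([(changed, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == depth:
            continue  # do not expand beyond the requested depth
        for child in sorted(dependents.get(node, ())):  # sorted for determinism
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append((child, dist + 1))
    return order
```

With the default `depth=1` only direct dependents are returned; passing a larger value widens the recompute set one hop at a time.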

5. Circular Dependency Handling

Decision: Suppress recomputation, record as ImpactDebt.

Rationale: Per FR-7.3. Avoids infinite loops. Debt tracking ensures visibility into suppressed updates.
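
One way to picture suppression is a recompute planner that tracks the current propagation path and records a debt entry instead of following an edge back into that path. The `ImpactDebt` fields and `plan_recompute` below are assumptions for illustration; the real record shape would come from the Phase 6 models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImpactDebt:
    """Hypothetical record of a suppressed update; field names are illustrative."""
    source: str  # artifact whose change would have triggered the recompute
    target: str  # artifact whose recompute was suppressed
    reason: str


def plan_recompute(
    dependents: dict[str, set[str]], changed: str, depth: int = 1
) -> tuple[list[str], list[ImpactDebt]]:
    """Return (artifacts to recompute, suppressed updates recorded as debt)."""
    scheduled: list[str] = []
    debts: list[ImpactDebt] = []

    def visit(node: str, path: tuple[str, ...], remaining: int) -> None:
        if remaining == 0:
            return
        for child in sorted(dependents.get(node, ())):
            if child in path:
                # Edge leads back into the current propagation path: suppress it.
                debts.append(ImpactDebt(node, child, "circular dependency"))
                continue
            if child not in scheduled:
                scheduled.append(child)
            visit(child, path + (child,), remaining - 1)

    visit(changed, (changed,), depth)
    return scheduled, debts
```

Checking membership in the path, rather than in a global visited set, keeps diamond-shaped graphs from being misreported as cycles.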

6. Change Impact Default Method

Decision: Structural diff ratio.

Rationale: Per FR-8.2. Fast, deterministic, no external dependencies. Embedding and LLM methods are optional enhancements.
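
As a rough illustration of a dependency-free diff ratio, the sketch below uses the standard library's `difflib.SequenceMatcher` on a line-by-line basis. Treating "structural" as line-level comparison is an assumption here, and `change_ratio` is a hypothetical name; the metric actually required by FR-8.2 may be defined differently.

```python
from difflib import SequenceMatcher


def change_ratio(old: str, new: str) -> float:
    """Return a change score in [0.0, 1.0]: 0.0 = identical, 1.0 = nothing shared.

    Compares the two texts line by line; autojunk is disabled so that
    frequently repeated lines are not silently ignored in long documents.
    """
    matcher = SequenceMatcher(None, old.splitlines(), new.splitlines(), autojunk=False)
    return 1.0 - matcher.ratio()
```

For a four-line artifact in which one line changes, three of the eight compared lines pair up on each side, so the similarity ratio is 2·3/8 = 0.75 and the change score is 0.25.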

7. InputBundleHash Components

Decision: Template digest + dependency digests + config + model settings.

Rationale: Per FR-4.3. Captures all factors affecting prompt output. Ensures idempotent execution.
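
A sketch of how the four components could be folded into one stable hash, assuming SHA-256 over canonical JSON. The function names, the payload keys, and the choice of SHA-256 are assumptions; the FRS may specify a different digest or serialization.

```python
import hashlib
import json


def input_bundle_hash(
    template_digest: str,
    dependency_digests: dict[str, str],
    config: dict,
    model_settings: dict,
) -> str:
    """Hash every input that can influence the prompt output.

    Keys are sorted and separators fixed so that logically equal inputs
    always serialize to the same bytes.
    """
    payload = {
        "template": template_digest,
        "dependencies": dependency_digests,
        "config": config,
        "model": model_settings,
    }
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def should_skip(bundle_hash: str, completed_hashes: set[str]) -> bool:
    """Idempotency check: skip when an identical bundle has already run."""
    return bundle_hash in completed_hashes
```

Any change to a dependency digest, a config value, or a model setting produces a different hash, which is what forces re-execution; reordering dictionary keys does not.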

8. RunManifest Storage Format

Decision: JSON columns in SQLite.

Rationale: Flexible schema for manifest evolution. Queryable via SQLite JSON functions. Easy to export for analysis.
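
The following sketch shows the general idea of storing a manifest as JSON text and querying into it with SQLite's JSON functions. The table name `run_manifests`, its columns, and the manifest keys are hypothetical; the real schema belongs to migration 003. It also assumes the SQLite build bundled with Python includes the JSON functions, which is the case for most current distributions.

```python
import json
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory store with one JSON text column (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE run_manifests (run_id TEXT PRIMARY KEY, manifest TEXT NOT NULL)"
    )
    return conn


def save_manifest(conn: sqlite3.Connection, run_id: str, manifest: dict) -> None:
    conn.execute(
        "INSERT OR REPLACE INTO run_manifests (run_id, manifest) VALUES (?, ?)",
        (run_id, json.dumps(manifest, sort_keys=True)),
    )


def runs_using_dependency(conn: sqlite3.Connection, artifact_id: str) -> list[str]:
    """Find runs whose manifest lists `artifact_id` among its dependencies."""
    rows = conn.execute(
        """
        SELECT DISTINCT m.run_id
        FROM run_manifests AS m,
             json_each(m.manifest, '$.dependencies') AS dep
        WHERE json_extract(dep.value, '$.artifact_id') = ?
        ORDER BY m.run_id
        """,
        (artifact_id,),
    )
    return [row[0] for row in rows]
```

New manifest fields can be added without a migration, while queries such as "which runs consumed this artifact" remain expressible in SQL.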


Risk Management

Risk 1: LLM Integration Complexity

Impact: High | Probability: Medium

Mitigation:

  • Start with mock LLM adapter for testing
  • Well-defined adapter interface
  • Implement one production adapter (e.g., OpenAI) in Phase 4
  • Additional adapters can be added incrementally

Risk 2: Performance at Scale

Impact: Medium | Probability: Medium

Mitigation:

  • Index all foreign keys and frequently queried columns
  • Use in-memory graph for dependency traversal
  • Implement pagination for large result sets
  • Performance testing with 10K+ artifacts in Phase 6

Risk 3: Circular Dependency Complexity

Impact: Medium | Probability: Low

Mitigation:

  • Thorough cycle detection testing
  • Clear documentation on when cycles occur
  • ImpactDebt provides visibility
  • Users can manually break cycles if needed

Risk 4: Quality Gate Extensibility

Impact: Low | Probability: Low

Mitigation:

  • Plugin-based architecture for gates
  • Well-defined QualityGate interface
  • Ship with schema and pattern gates
  • Document custom gate creation
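
The gate interface described in the mitigations above might look roughly like the sketch below. `QualityGate`, `GateResult`, `PatternGate`, and `run_gates` are illustrative names and shapes, assumed for the example rather than taken from the FRS.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass(frozen=True)
class GateResult:
    """Hypothetical outcome of a single gate check."""
    gate: str
    passed: bool
    message: str = ""


class QualityGate(Protocol):
    """Anything with a `name` and a `check` method can act as a gate."""
    name: str

    def check(self, content: str) -> GateResult:
        ...


class PatternGate:
    """Passes when the content contains at least one match for `pattern`."""

    def __init__(self, name: str, pattern: str) -> None:
        self.name = name
        self._pattern = re.compile(pattern, re.MULTILINE)

    def check(self, content: str) -> GateResult:
        if self._pattern.search(content):
            return GateResult(self.name, True)
        return GateResult(self.name, False, f"pattern {self._pattern.pattern!r} not found")


def run_gates(gates: Iterable[QualityGate], content: str) -> list[GateResult]:
    """Run every gate; what to do with failures is left to the halting policy."""
    return [gate.check(content) for gate in gates]
```

A custom gate would only need to provide the same two members, so third-party checks can be added without touching the validator.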

Risk 5: Integration with Existing InformationSpace

Impact: High | Probability: Low

Mitigation:

  • Build on top of existing space infrastructure
  • Reuse space repositories and services
  • Comprehensive integration tests
  • Incremental rollout per phase

Future Enhancements (Out of Scope)

The following capabilities are valuable but explicitly out of scope for initial implementation:

  1. Distributed Execution: Execute prompt runs across multiple workers
  2. Real-time Collaboration: Multiple users editing templates simultaneously
  3. Version Control Integration: Git-based template versioning
  4. Advanced Visualization: Interactive dependency graph UI
  5. Cost Tracking: Track LLM API costs per run
  6. Prompt Optimization: Automatic prompt refinement based on results
  7. Multi-modal Artifacts: Support for images, audio in artifacts
  8. External Data Sources: Pull artifacts from APIs, databases
  9. Scheduling: Cron-based automatic recomputation
  10. A/B Testing: Compare multiple template variations

Implementation Status

Status: Planning Complete
Next Step: Begin Phase 1 implementation
Target Start: TBD
Estimated Completion: 26 weeks from start


Conclusion

This workplan provides a comprehensive roadmap for implementing Prompt Dependency Resolution infrastructure in MarkiTect. The 8-phase approach ensures:

  • Incremental Delivery: Each phase delivers working functionality
  • Risk Mitigation: Complex features built on solid foundations
  • Testability: Comprehensive test coverage at every phase
  • Extensibility: Clean architecture supports future enhancements
  • Compliance: Full coverage of all FR-1 through FR-11 requirements

The implementation will transform MarkiTect into an executable knowledge infrastructure, enabling deterministic, traceable, and incremental execution of prompt-based content generation across InformationSpaces.


Status: Ready for Implementation 🚀