# MarkiTect Prompt Dependency Resolution Tutorial

**Example: Generating InfoTech Primers with Full Provenance Tracking**

This tutorial demonstrates how to use MarkiTect's Prompt Dependency Resolution infrastructure to systematically generate content with complete dependency tracking, quality validation, and traceability.

---

## Table of Contents

1. [Overview](#overview)
2. [Architecture](#architecture)
3. [Setup](#setup)
4. [Core Concepts](#core-concepts)
5. [Step-by-Step Walkthrough](#step-by-step-walkthrough)
6. [Advanced Features](#advanced-features)
7. [CLI Usage](#cli-usage)
8. [Best Practices](#best-practices)
9. [Comparison with Original System](#comparison-with-original-system)
10. [Next Steps](#next-steps)
11. [References](#references)

---

## Overview

### What This Example Does

This example shows how to generate **InfoTech Primers** (structured reference documents for IT concepts) using a prompt template system with:

- **Artifact Management**: Store and version all inputs (templates, topics, guidelines)
- **Dependency Resolution**: Automatically resolve macro references across information spaces
- **Provenance Tracking**: Trace any generated primer back to its inputs and template
- **Incremental Updates**: Detect when inputs change and regenerate affected primers
- **Quality Validation**: Apply quality gates to ensure output meets standards
- **Visualization**: View dependency graphs in DOT or Mermaid format

### Why Use Prompt Dependency Resolution?

**Before** (manual approach in `prepdr/`):
```markdown
# Template with manual macros
{{topic}}
{{AuthoringRules}}
{{ResearchPrompt}}
```

Problems:
- Manual macro substitution
- No version tracking
- No dependency awareness
- Can't detect when inputs change
- No provenance traceability

**After** (with infrastructure):
```markdown
# Template with resolved dependencies
@{topic}
@{authoring_rules}
@{research_prompt}
```

Benefits:
- Automatic macro resolution
- Content-based change detection (SHA-256 digests)
- Full dependency graph construction
- Incremental recomputation when inputs change
- Complete provenance: artifact → template → inputs → validation
- CLI commands for inspection and debugging

---

## Architecture

### Information Spaces

The system organizes artifacts into **information spaces** (logical namespaces):

```
primer-templates/        # PromptTemplates for generation
  ├─ generate-primer

primer-topics/           # Topic definitions (ETL, Microservices, OAuth, etc.)
  ├─ etl
  ├─ microservices
  └─ ...

primer-guidelines/       # Authoring and research guidelines
  ├─ authoring-rules
  └─ research-prompt

generated-primers/       # Output artifacts
  ├─ etl-primer
  ├─ microservices-primer
  └─ ...
```

### Dependency Graph

When you generate a primer, the system creates a dependency graph:

```mermaid
graph LR
    A[etl topic] -->|requires| B[generate-primer template]
    C[authoring-rules] -->|requires| B
    D[research-prompt] -->|requires| B
    B -->|generates| E[etl-primer output]
```

This graph enables:
- **Impact analysis**: "What primers need regeneration if authoring-rules changes?"
- **Provenance tracing**: "What inputs produced this primer?"
- **Incremental execution**: "Only regenerate affected primers"
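
The impact-analysis question reduces to a plain graph traversal. The following is a minimal sketch that does not use MarkiTect's own query service. It assumes edges are stored as `(source, target)` pairs and uses run nodes in place of the template node in the diagram; the node names and the `transitive_dependents` helper are illustrative only.

```python
from collections import defaultdict, deque

# Hypothetical edge list: (source, target). Two primers share one guideline.
EDGES = [
    ("etl", "run-etl-001"),
    ("authoring-rules", "run-etl-001"),
    ("research-prompt", "run-etl-001"),
    ("run-etl-001", "etl-primer"),
    ("microservices", "run-ms-001"),
    ("authoring-rules", "run-ms-001"),
    ("run-ms-001", "microservices-primer"),
]


def transitive_dependents(edges, start):
    """Return every node reachable downstream of `start` (breadth-first)."""
    downstream = defaultdict(list)
    for source, target in edges:
        downstream[source].append(target)

    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in downstream[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Changing authoring-rules affects both runs and both primers.
affected = transitive_dependents(EDGES, "authoring-rules")
print(sorted(affected))
```

Changing only the `etl` topic reaches a single run and a single primer, which is what makes incremental execution cheaper than regenerating everything.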

---

## Setup

### Prerequisites

```bash
# Ensure MarkiTect is installed
cd /path/to/markitect_project
pip install -e .
```

### Directory Structure

```
examples/content-generator/
├── TUTORIAL.md                          # This file
├── generate_primers.py                  # Main example script
├── templates/
│   └── generate-primer.md               # PromptTemplate
├── artifacts/
│   ├── topics/
│   │   ├── etl.md                       # Topic: ETL
│   │   └── microservices.md             # Topic: Microservices
│   └── guidelines/
│       ├── authoring-rules.md           # Authoring standards
│       └── research-prompt.md           # Research methodology
└── prepdr/                              # Original manual system (preserved)
    ├── README.md
    ├── ETL.md
    ├── AuthoringRules.md
    ├── AssistentPrompt.md
    └── GeneratePrimerTemplate.md
```

### Running the Example

```bash
cd examples/content-generator
python generate_primers.py
```

Expected output:
```
╔══════════════════════════════════════════════════════════════╗
║  MarkiTect Prompt Dependency Resolution Example             ║
║  InfoTech Primer Generation                                  ║
╚══════════════════════════════════════════════════════════════╝

=== Loading Artifacts ===
✓ Created artifact: generate-primer (digest: a7f3e2b1)
✓ Created artifact: etl (digest: 9c4d6e8a)
✓ Created artifact: microservices (digest: 5b2f1c9d)
✓ Created artifact: authoring-rules (digest: 3e7a9f2c)
✓ Created artifact: research-prompt (digest: 8d1b4e6f)

=== Generating Primer: etl ===
✓ Template created with 3 macro dependencies
✓ Resolved 3 macros
✓ Compiled prompt (digest: 4c9e2a7b)
✓ Persisted 3 dependency edges
✓ Generated primer: etl-primer

=== Provenance Trace ===
Artifact: abc-123-def-456
Producing Run: run-etl-001
Input Artifacts: 3
Dependency Chain: 5 artifacts

✓ Primer generation complete!
```

---

## Core Concepts

### 1. Artifacts

**Artifacts** are versioned content units with content-based addressing.

```python
from markitect.prompts.models import Artifact, ArtifactType

# Create an artifact
artifact = Artifact.create(
    space_id="primer-topics",
    name="etl",
    content=topic_content,
    artifact_type=ArtifactType.CONTENT,
)

# Automatic SHA-256 digest generation
print(artifact.content_digest)  # "9c4d6e8a..."
```

**Key features:**
- **Content digest**: SHA-256 hash for change detection
- **Space isolation**: Artifacts in different spaces can have same names
- **Type classification**: CONTENT, TEMPLATE, GENERATED, SCHEMA, CONFIG
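
To make content-based addressing concrete, here is a minimal sketch using only the standard library. It assumes the digest is the hex SHA-256 of the UTF-8 encoded text; how `Artifact.create` encodes content internally is not shown in this tutorial, and the `content_digest` function below is a stand-in, not the library's implementation.

```python
import hashlib


def content_digest(content: str) -> str:
    """Hex SHA-256 digest of the UTF-8 encoded content (assumed encoding)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


original = "ETL: extract, transform, load."
edited = original + "\n\n**New section added**"

# Identical content always yields the same digest.
print(content_digest(original) == content_digest(original))
# Any edit, however small, yields a different digest.
print(content_digest(original) == content_digest(edited))
# Short prefix, similar to the abbreviated digests in the example output.
print(content_digest(original)[:8])
```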

### 2. PromptTemplates

**PromptTemplates** are artifacts with macro references.

```markdown
---
id: generate-primer-v1
artifact_type: template
---

# Generate Primer

Topic: @{topic}
Guidelines: @{authoring_rules}
```

**Macro syntax:**
- `@{macro_name}` - Resolved to artifact content
- Resolution happens at execution time
- Macros can reference artifacts in any information space
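
As an illustration of the macro syntax, the sketch below extracts `@{macro_name}` references with a regular expression. The grammar (an identifier of letters, digits, and underscores) is an assumption; MarkiTect's real parser may accept more, and `extract_macros` is a hypothetical helper.

```python
import re

# Assumed grammar: "@{" + identifier + "}". The real parser may differ.
MACRO_PATTERN = re.compile(r"@\{([A-Za-z_][A-Za-z0-9_]*)\}")


def extract_macros(template_text: str) -> list[str]:
    """Return macro names in order of first appearance, without duplicates."""
    names = []
    for match in MACRO_PATTERN.finditer(template_text):
        name = match.group(1)
        if name not in names:
            names.append(name)
    return names


template_text = "Topic: @{topic}\nGuidelines: @{authoring_rules}\nAgain: @{topic}\n"
print(extract_macros(template_text))  # ['topic', 'authoring_rules']
```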

### 3. Resolution Strategy

**Resolution** finds artifacts to substitute for macros.

```python
from markitect.prompts.resolver.strategy import ResolutionConfig, ResolutionStrategy

config = ResolutionConfig(
    strategy=ResolutionStrategy.FIRST_MATCH,
    spaces=["primer-topics", "primer-guidelines"],
)
```

**Strategies:**
- `FIRST_MATCH`: Use first artifact found
- `LATEST_VERSION`: Use newest version (if artifacts have versions)
- `EXPLICIT_ONLY`: Require explicit space qualification

### 4. Dependency Tracking

**Dependency edges** are automatically created during resolution.

```python
# Edge types
EdgeType.REQUIRES    # Input dependency (input artifact → run)
EdgeType.GENERATES   # Output relationship (run → primer)
EdgeType.INCLUDES    # Composition (nested templates)
```

**Graph operations:**
```python
# Find all artifacts that depend on authoring-rules
dependents = query_service.find_transitive_dependents("authoring-rules-id")

# Find all inputs needed to regenerate a primer
dependencies = query_service.find_transitive_dependencies("etl-primer-id")

# Detect circular dependencies
cycles = query_service.detect_circular_dependencies()
```
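
For intuition about circular-dependency detection, the following sketch finds a cycle with a depth-first search over `(source, target)` pairs. It is a generic algorithm under stated assumptions, not the implementation behind `detect_circular_dependencies`; `find_cycle` is a hypothetical name.

```python
from collections import defaultdict


def find_cycle(edges):
    """Return one cycle as a list of nodes, or None if the graph is acyclic."""
    graph = defaultdict(list)
    for source, target in edges:
        graph[source].append(target)

    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = defaultdict(int)      # every node starts WHITE
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for nxt in graph[node]:
            if color[nxt] == GRAY:  # back edge: nxt is on the current path
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in list(graph):  # snapshot: visit() may add leaf keys
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


print(find_cycle([("a", "b"), ("b", "c")]))              # None
print(find_cycle([("a", "b"), ("b", "c"), ("c", "a")]))  # ['a', 'b', 'c', 'a']
```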

### 5. Traceability

**ProvenanceTrace** captures complete lineage.

```python
trace = trace_service.trace_artifact(artifact_id)

print(trace.producing_run)      # Run that generated this
print(trace.template)            # Template used
print(trace.input_artifacts)    # All input dependencies
print(trace.validation_results) # Quality gate results
print(trace.impact_debt)        # Suppressed recomputations
```

---

## Step-by-Step Walkthrough

### Step 1: Initialize Repositories

```python
from markitect.prompts.repositories.sqlite import SQLiteArtifactRepository
from markitect.prompts.dependencies.repository import SQLiteDependencyRepository

artifact_repo = SQLiteArtifactRepository("primers.db")
dep_repo = SQLiteDependencyRepository("primers.db")
```

**What this does:**
- Creates SQLite database with artifact and dependency tables
- Artifact table: id, space_id, name, content_digest, metadata
- Dependency table: source_id, target_id, edge_type, run_id

### Step 2: Load Artifacts

```python
from pathlib import Path

from markitect.prompts.models import Artifact, ArtifactType

# Read artifact file
content = Path("artifacts/topics/etl.md").read_text()

# Create artifact
artifact = Artifact.create(
    space_id="primer-topics",
    name="etl",
    content=content,
    artifact_type=ArtifactType.CONTENT,
)

# Store in repository
artifact = artifact_repo.create(artifact)
```

**Content-based addressing:**
```python
# If you modify the content
updated_content = content + "\n\n**New section added**"
artifact.update_content(updated_content)

# Digest changes automatically
print(artifact.content_digest)  # Different hash!
```

### Step 3: Create PromptTemplate

```python
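from pathlib import Path

from markitect.prompts.templates.models import PromptTemplate, MacroReference

# Load the template text (shown below)
template_content = Path("templates/generate-primer.md").read_text()

template = PromptTemplate.create(
    id="generate-primer-v1",
    name="generate-primer",
    content=template_content,
    space_id="primer-templates",
)

# Add one macro dependency per @{...} reference in the template
template.add_macro(MacroReference(
    name="topic",
    source_space="primer-topics"
))
template.add_macro(MacroReference(
    name="authoring_rules",
    source_space="primer-guidelines"
))
template.add_macro(MacroReference(
    name="research_prompt",
    source_space="primer-guidelines"
))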

**Template content** (`templates/generate-primer.md`):
```markdown
# Generate InfoTech Primer

## Topic
@{topic}

## Guidelines
@{authoring_rules}

## Research Protocol
@{research_prompt}

Generate a complete primer following the authoring rules.
```

### Step 4: Resolve Dependencies

```python
from markitect.prompts.resolver.resolver import PromptResolver
from markitect.prompts.resolver.strategy import ResolutionConfig, ResolutionStrategy

resolver = PromptResolver(artifact_repo)

config = ResolutionConfig(
    strategy=ResolutionStrategy.FIRST_MATCH,
    spaces=["primer-topics", "primer-guidelines"],
)

resolution_result = resolver.resolve_template(template, config)

if resolution_result.success:
    for resolved in resolution_result.context.resolved_macros:
        print(f"{resolved.macro_name} → {resolved.artifact.name}")
else:
    print("Resolution failed:", resolution_result.context.errors)
```

**Resolution algorithm:**
1. Parse template to extract `@{macro_name}` references
2. For each macro:
   - Search configured spaces in order
   - Match by name (case-sensitive)
   - Return first match (FIRST_MATCH strategy)
3. Build ResolutionResult with all resolved artifacts
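
The algorithm above can be sketched without the library as follows. This is a simplified model: the in-memory `SPACES` store and the `resolve_first_match` helper are hypothetical, and artifacts are keyed directly by macro name, since the tutorial does not show how macro names such as `authoring_rules` map to artifact names such as `authoring-rules`.

```python
# Hypothetical in-memory store: space id -> {artifact name -> content}.
SPACES = {
    "primer-topics": {"topic": "ETL topic text"},
    "primer-guidelines": {
        "topic": "shadowed entry, never chosen",
        "authoring_rules": "Authoring rules text",
    },
}


def resolve_first_match(macro_names, search_order, spaces):
    """Map each macro to (space, content) from the first space that has it."""
    resolved, errors = {}, []
    for name in macro_names:
        for space_id in search_order:
            if name in spaces.get(space_id, {}):  # exact, case-sensitive match
                resolved[name] = (space_id, spaces[space_id][name])
                break
        else:
            # No space in the search order contained the name.
            errors.append(f"unresolved macro: {name}")
    return resolved, errors


resolved, errors = resolve_first_match(
    ["topic", "authoring_rules", "research_prompt"],
    ["primer-topics", "primer-guidelines"],
    SPACES,
)
print(resolved["topic"][0])  # primer-topics
print(errors)                # ['unresolved macro: research_prompt']
```

Because spaces are searched in order, the `topic` entry in `primer-guidelines` is never used; reordering `search_order` would change the result.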

### Step 5: Compile Prompt

```python
from markitect.prompts.resolver.compiler import ContextCompiler

compiler = ContextCompiler()
compiled = compiler.compile(template, template_content, resolution_result)

print(compiled.content)  # Fully expanded prompt
print(compiled.content_digest)  # Hash for caching
print(compiled.dependency_digests)  # Map of macro → artifact digest
```

**Compiled output:**
```markdown
# Generate InfoTech Primer

## Topic
A three-phase computing process where data is extracted from source systems,
transformed (including validation, cleaning, enrichment, and aggregation),
and loaded into a target data store or data warehouse.
...

## Guidelines
[Full authoring-rules content]
...

## Research Protocol
[Full research-prompt content]
...
```
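
Conceptually, compilation is macro substitution plus hashing. The sketch below shows one way to produce the three values printed in this step; it is an assumption-laden stand-in for `ContextCompiler`, with a hypothetical `compile_prompt` function and SHA-256 over UTF-8 text assumed for all digests.

```python
import hashlib
import re

MACRO_PATTERN = re.compile(r"@\{(\w+)\}")


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def compile_prompt(template_text: str, resolved: dict[str, str]):
    """Expand macros; return (content, content_digest, dependency_digests)."""

    def substitute(match):
        name = match.group(1)
        if name not in resolved:
            raise KeyError(f"unresolved macro: {name}")
        return resolved[name]

    # A replacement function avoids backslash-escape surprises in re.sub.
    content = MACRO_PATTERN.sub(substitute, template_text)
    dependency_digests = {name: sha256_hex(text) for name, text in resolved.items()}
    return content, sha256_hex(content), dependency_digests


template_text = "## Topic\n@{topic}\n\n## Guidelines\n@{authoring_rules}\n"
content, digest, dep_digests = compile_prompt(
    template_text,
    {"topic": "ETL moves data.", "authoring_rules": "Be concise."},
)
print(content)
print(sorted(dep_digests))  # ['authoring_rules', 'topic']
```

Since the compiled digest depends only on the template text and the resolved inputs, an unchanged digest indicates that a cached result could be reused.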

### Step 6: Track Dependencies

```python
from markitect.prompts.execution.manifest import RunManifest
from markitect.prompts.dependencies.graph import GraphBuilder

# Create run manifest
manifest = RunManifest.create(
    run_id="run-etl-001",
    template_id=template.id,
    template_name=template.name,
    template_digest=template.content_digest,
)

# Add resolved inputs
for resolved in resolution_result.context.resolved_macros:
    manifest.add_resolved_input(
        name=resolved.macro_name,
        artifact_id=resolved.artifact.id,
        space_id=resolved.space_id,
        digest=resolved.artifact.content_digest,
    )

    # Create dependency edge
    manifest.add_dependency_edge(
        source_id=resolved.artifact.id,
        target_id="run-etl-001",
        edge_type="requires",
    )

# Persist to database
builder = GraphBuilder(dep_repo)
edges = builder.persist_edges(manifest)
```

**Result:** Dependency edges stored in database:
```
source_artifact_id  | target_artifact_id | edge_type | run_id
--------------------|--------------------|-----------|-----------
etl-id              | run-etl-001        | requires  | run-etl-001
authoring-rules-id  | run-etl-001        | requires  | run-etl-001
research-prompt-id  | run-etl-001        | requires  | run-etl-001
```
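
To show how such rows can be stored and queried, here is a self-contained sketch using Python's built-in `sqlite3` module and an in-memory database. The table name `dependency_edges` is hypothetical, and the column names simply follow the table above; the schema that `SQLiteDependencyRepository` actually creates may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE dependency_edges (
        source_artifact_id TEXT NOT NULL,
        target_artifact_id TEXT NOT NULL,
        edge_type TEXT NOT NULL,
        run_id TEXT NOT NULL
    )
    """
)
rows = [
    ("etl-id", "run-etl-001", "requires", "run-etl-001"),
    ("authoring-rules-id", "run-etl-001", "requires", "run-etl-001"),
    ("research-prompt-id", "run-etl-001", "requires", "run-etl-001"),
]
conn.executemany("INSERT INTO dependency_edges VALUES (?, ?, ?, ?)", rows)
conn.commit()

# Which inputs did a given run require?
inputs = [
    row[0]
    for row in conn.execute(
        "SELECT source_artifact_id FROM dependency_edges "
        "WHERE run_id = ? AND edge_type = 'requires' "
        "ORDER BY source_artifact_id",
        ("run-etl-001",),
    )
]
print(inputs)  # ['authoring-rules-id', 'etl-id', 'research-prompt-id']
```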

### Step 7: Generate Output

```python
# In real usage, this would call an LLM API
# For demo, we create a mock output
output_content = """
# ETL Primer

## Definition
ETL (Extract, Transform, Load) is a data integration pattern...
[Generated content]
"""

output_artifact = Artifact.create(
    space_id="generated-primers",
    name="etl-primer",
    content=output_content,
    artifact_type=ArtifactType.GENERATED,
)
output_artifact = artifact_repo.create(output_artifact)

# Add to manifest
manifest.add_output_artifact(
    artifact_id=output_artifact.id,
    name=output_artifact.name,
    digest=output_artifact.content_digest,
    artifact_type="generated",
)

manifest.add_dependency_edge(
    source_id="run-etl-001",
    target_id=output_artifact.id,
    edge_type="generates",
)

# Persist output edges
builder.persist_edges(manifest)
```

### Step 8: Trace Provenance

```python
from markitect.prompts.traceability.service import TraceabilityService

trace_service = TraceabilityService(artifact_repo, dep_repo, db_path="primers.db")

# Trace the generated primer
trace = trace_service.trace_artifact(output_artifact.id)

# Inspect provenance
print("Template:", trace.template.name if trace.template else "None")
print("Producing run:", trace.producing_run.run_id if trace.producing_run else "None")
print("Input artifacts:")
for inp in trace.input_artifacts:
    print(f"  - {inp.name} ({inp.artifact_type})")

print("Dependency chain:")
for dep_id in trace.dependency_chain:
    artifact = artifact_repo.get_by_id(dep_id)
    print(f"  - {artifact.name if artifact else dep_id}")
```
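
The trace can be understood as two lookups over the edges persisted in Steps 6 and 7: first the `generates` edge that points at the artifact, then the `requires` edges that point at that run. The sketch below models this with plain tuples; the `trace` function is a hypothetical simplification and not how `TraceabilityService` is implemented.

```python
# Hypothetical edge rows: (source, target, edge_type), as in Steps 6 and 7.
EDGES = [
    ("etl-id", "run-etl-001", "requires"),
    ("authoring-rules-id", "run-etl-001", "requires"),
    ("research-prompt-id", "run-etl-001", "requires"),
    ("run-etl-001", "etl-primer-id", "generates"),
]


def trace(edges, artifact_id):
    """Return (producing_run, input_ids) for an artifact, or (None, [])."""
    producing_run = next(
        (src for src, dst, kind in edges if dst == artifact_id and kind == "generates"),
        None,
    )
    if producing_run is None:
        # Source artifacts (topics, guidelines) have no producing run.
        return None, []
    inputs = [
        src for src, dst, kind in edges if dst == producing_run and kind == "requires"
    ]
    return producing_run, inputs


run_id, input_ids = trace(EDGES, "etl-primer-id")
print(run_id)     # run-etl-001
print(input_ids)  # ['etl-id', 'authoring-rules-id', 'research-prompt-id']
```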

---

## Advanced Features

### Incremental Recomputation

When an input changes, the system can detect the affected outputs automatically:

```python
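from pathlib import Path

from markitect.prompts.dependencies.queries import DependencyQueryService
from markitect.prompts.incremental.detector import ChangeDetector
from markitect.prompts.incremental.engine import IncrementalExecutionEngine
from markitect.prompts.incremental.models import RecomputeConfig

# Query service the engine uses to walk the dependency graph
query_service = DependencyQueryService(dep_repo)

# Detect change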
detector = ChangeDetector("primers.db")
authoring_rules = artifact_repo.get_by_name("primer-guidelines", "authoring-rules")

# User updates the file
new_content = Path("artifacts/guidelines/authoring-rules.md").read_text()
change = detector.detect_change(authoring_rules, new_content)

if change:
    detector.record_change(change)

    # Find affected primers
    engine = IncrementalExecutionEngine("primers.db", query_service)
    result = engine.recompute(
        change,
        config=RecomputeConfig(max_depth=2, impact_threshold=0.1),
        old_content=authoring_rules.content,
        new_content=new_content,
    )

    print(f"Total dependents: {result.total_dependents}")
    print(f"Recomputed: {result.recomputed_count}")
    print(f"Suppressed: {result.suppressed_count}")
```

**Recomputation controls** (fields of `RecomputeConfig`):
- **max_depth**: Traverse at most N levels of the dependency graph
- **impact_threshold**: Recompute only if the change magnitude exceeds the threshold
- **max_recomputes**: Budget limit to prevent runaway execution
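
The interplay of these three limits can be sketched as a bounded breadth-first traversal. Everything here is an assumption made for illustration: the tutorial does not say how change magnitude is measured, so `difflib.SequenceMatcher` stands in for it, and `plan_recompute` is a hypothetical planner rather than the engine's real logic.

```python
from collections import defaultdict, deque
from difflib import SequenceMatcher


def change_magnitude(old: str, new: str) -> float:
    """Assumed metric: 0.0 for identical text, approaching 1.0 for no overlap."""
    return 1.0 - SequenceMatcher(None, old, new).ratio()


def plan_recompute(edges, changed, magnitude, max_depth, impact_threshold, max_recomputes):
    """Split the dependents of `changed` into (to_recompute, suppressed)."""
    downstream = defaultdict(list)
    for source, target in edges:
        downstream[source].append(target)

    to_recompute, suppressed = [], []
    seen = {changed}
    queue = deque([(changed, 0)])
    while queue:
        node, depth = queue.popleft()
        for nxt in downstream[node]:
            if nxt in seen:
                continue
            seen.add(nxt)
            within_limits = (
                depth + 1 <= max_depth
                and magnitude > impact_threshold
                and len(to_recompute) < max_recomputes
            )
            (to_recompute if within_limits else suppressed).append(nxt)
            queue.append((nxt, depth + 1))  # keep walking to record suppressed nodes
    return to_recompute, suppressed


edges = [("rules", "run-1"), ("run-1", "primer-1"), ("primer-1", "run-2"), ("run-2", "summary")]
magnitude = change_magnitude("Use short sentences.", "Use short sentences. Cite sources.")
plan = plan_recompute(edges, "rules", magnitude, max_depth=2, impact_threshold=0.1, max_recomputes=10)
print(plan)  # (['run-1', 'primer-1'], ['run-2', 'summary'])
```

In this model, suppressed nodes are what a later debt report would list as stale.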

### Quality Validation

Apply quality gates to generated primers:

```python
from markitect.prompts.quality.validator import QualityValidator
from markitect.prompts.quality.gates.pattern_gate import PatternValidationGate

# Create validation gate
gate = PatternValidationGate(
    required_patterns=[
        r"## Definition",
        r"## Context",
        r"## Core Concepts",
        r"## Scope and Non-Scope",
    ],
    forbidden_patterns=[
        r"TODO",
        r"FIXME",
    ],
    gate_id="primer-structure-check",
    name="Primer Structure Validator",
)

validator = QualityValidator(gates=[gate], db_path="primers.db")

# Validate output
results = validator.validate_artifact(
    content=output_content,
    artifact_id=output_artifact.id,
    run_id="run-etl-001",
)

if validator.all_passed(results):
    print("✓ All quality gates passed")
else:
    failed = validator.get_failed_gates(results)
    for result in failed:
        print(f"✗ {result.gate_id} failed")
        for diag in result.diagnostics:
            print(f"  {diag.message}")
```
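
What a pattern gate checks can be illustrated with the standard `re` module. The sketch below is a stand-in written under assumptions, not the `PatternValidationGate` implementation: `GateResult` and `check_patterns` are hypothetical names, and multiline matching is assumed so that `^` and `$` anchor to heading lines.

```python
import re
from dataclasses import dataclass, field


@dataclass
class GateResult:
    passed: bool
    diagnostics: list[str] = field(default_factory=list)


def check_patterns(content, required_patterns, forbidden_patterns):
    """Pass only if every required pattern is found and no forbidden one is."""
    diagnostics = []
    for pattern in required_patterns:
        if not re.search(pattern, content, flags=re.MULTILINE):
            diagnostics.append(f"missing required pattern: {pattern}")
    for pattern in forbidden_patterns:
        if re.search(pattern, content, flags=re.MULTILINE):
            diagnostics.append(f"found forbidden pattern: {pattern}")
    return GateResult(passed=not diagnostics, diagnostics=diagnostics)


draft = "# ETL Primer\n\n## Definition\nETL is ...\n\nTODO: add context\n"
result = check_patterns(
    draft,
    required_patterns=[r"^## Definition$", r"^## Context$"],
    forbidden_patterns=[r"TODO", r"FIXME"],
)
print(result.passed)  # False
for message in result.diagnostics:
    print(message)
```

Anchoring heading patterns with `^` and `$` avoids accidental matches inside body text, for example a sentence that merely mentions "## Definition".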

### Visualization

Generate dependency graphs:

```python
from pathlib import Path

from markitect.prompts.dependencies.graph import GraphBuilder
from markitect.prompts.dependencies.queries import DependencyQueryService
from markitect.prompts.visualization.graph import GraphExporter

query_service = DependencyQueryService(dep_repo)

# Find all related artifacts
deps = query_service.find_transitive_dependencies(output_artifact.id)
dependents = query_service.find_transitive_dependents(output_artifact.id)
all_ids = deps | dependents | {output_artifact.id}

# Build graph
builder = GraphBuilder(dep_repo)
graph = builder.build_graph(all_ids)

# Export to Mermaid
mermaid = GraphExporter.to_mermaid(graph, "Primer Dependencies")
Path("dependencies.mermaid").write_text(mermaid)

# Export to DOT (Graphviz)
dot = GraphExporter.to_dot(graph, "Primer Dependencies")
Path("dependencies.dot").write_text(dot)
```

**Mermaid output:**
```mermaid
%%{ title: Primer Dependencies }%%
graph LR
  etl-id-->|requires|run-etl-001
  authoring-rules-id-->|requires|run-etl-001
  research-prompt-id-->|requires|run-etl-001
  run-etl-001-.->|generates|etl-primer-id
```
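
The edge lines in the output above are simple to produce by hand. The sketch below renders `(source, target, edge_type)` triples in the same shape; the dotted arrow for `generates` edges is inferred from the sample output, the title directive is omitted, and `to_mermaid` here is a hypothetical free function rather than `GraphExporter.to_mermaid`.

```python
def to_mermaid(edges, direction="LR"):
    """Render (source, target, edge_type) triples as a Mermaid flowchart."""
    lines = [f"graph {direction}"]
    for source, target, edge_type in edges:
        # Dotted arrows for generated outputs, solid arrows otherwise.
        arrow = "-.->" if edge_type == "generates" else "-->"
        lines.append(f"  {source}{arrow}|{edge_type}|{target}")
    return "\n".join(lines) + "\n"


edges = [
    ("etl-id", "run-etl-001", "requires"),
    ("run-etl-001", "etl-primer-id", "generates"),
]
mermaid_text = to_mermaid(edges)
print(mermaid_text)
```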

---

## CLI Usage

The Prompt Dependency Resolution infrastructure includes CLI commands:

### Trace Provenance

```bash
markitect prompt trace <artifact-id> --database primers.db
```

Output (JSON):
```json
{
  "artifact_id": "abc-123-def-456",
  "producing_run": {
    "run_id": "run-etl-001",
    "template_id": "generate-primer-v1",
    "status": "success"
  },
  "input_artifacts": [
    {
      "artifact_id": "...",
      "name": "etl",
      "role": "input"
    }
  ],
  "dependency_chain": ["...", "..."]
}
```

### Visualize Graph

```bash
markitect prompt graph <artifact-id> --format mermaid --database primers.db
```

### List Runs

```bash
# All runs
markitect prompt runs --database primers.db

# Filter by template
markitect prompt runs --template generate-primer-v1 --database primers.db

# Filter by status
markitect prompt runs --status success --limit 10 --database primers.db
```

### Show Impact Debt

```bash
# All stale artifacts
markitect prompt debt --database primers.db

# Specific artifact
markitect prompt debt --artifact authoring-rules-id --database primers.db
```

### Graph Statistics

```bash
markitect prompt stats --database primers.db
```

Output:
```json
{
  "total_nodes": 12,
  "total_edges": 18,
  "root_count": 3,
  "leaf_count": 2,
  "has_cycles": false
}
```
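
For a sense of how such statistics can be derived, the sketch below computes the same fields from a list of `(source, target)` pairs. The definitions are assumptions: a root is a node with no incoming edges, a leaf is a node with no outgoing edges, and cycles are detected with Kahn's topological sort. The CLI may define these differently.

```python
from collections import defaultdict, deque


def graph_stats(edges):
    """Compute node, edge, root, and leaf counts plus a cycle flag."""
    nodes = {node for edge in edges for node in edge}
    indegree = {node: 0 for node in nodes}
    outgoing = defaultdict(list)
    for source, target in edges:
        indegree[target] += 1
        outgoing[source].append(target)

    roots = [node for node in nodes if indegree[node] == 0]
    leaves = [node for node in nodes if not outgoing[node]]

    # Kahn's algorithm: nodes that can never be dequeued lie on or behind a cycle.
    remaining = dict(indegree)
    queue = deque(roots)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for nxt in outgoing[node]:
            remaining[nxt] -= 1
            if remaining[nxt] == 0:
                queue.append(nxt)

    return {
        "total_nodes": len(nodes),
        "total_edges": len(edges),
        "root_count": len(roots),
        "leaf_count": len(leaves),
        "has_cycles": visited != len(nodes),
    }


stats = graph_stats([
    ("etl-id", "run-etl-001"),
    ("authoring-rules-id", "run-etl-001"),
    ("research-prompt-id", "run-etl-001"),
    ("run-etl-001", "etl-primer-id"),
])
print(stats)
```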

---

## Best Practices

### 1. Organize Artifacts by Space

```
Clear separation of concerns:
- templates/     ← Reusable PromptTemplates
- topics/        ← Domain-specific content
- guidelines/    ← Standards and methodologies
- output/        ← Generated artifacts
```

### 2. Use Content Digests for Change Detection

```python
# Don't compare content strings
if old_content != new_content:  # ✗ Inefficient
    ...

# Do compare digests
if artifact.has_changed(new_digest):  # ✓ Fast, hash-based
    ...
```

### 3. Apply Quality Gates

```python
# Define quality standards as code
gates = [
    PatternValidationGate(required_patterns=[...]),
    SchemaValidationGate(schema={...}),
]

# Fail fast if quality checks fail
if not validator.all_passed(results):
    raise QualityError("Output does not meet standards")
```

### 4. Track All Dependencies

```python
# Always persist dependency edges
manifest.add_dependency_edge(source, target, edge_type)
builder.persist_edges(manifest)

# This enables:
# - Impact analysis
# - Incremental recomputation
# - Provenance tracing
```

### 5. Use Incremental Execution

```python
# Don't regenerate everything on every change
config = RecomputeConfig(
    max_depth=2,              # Limit blast radius
    impact_threshold=0.1,     # Skip minor changes
    max_recomputes=10,        # Budget limit
)
```

### 6. Version Your Templates

```yaml
# Include version in template metadata
---
id: generate-primer-v1
version: 1.0.0
---

# When template changes significantly, create v2
---
id: generate-primer-v2
version: 2.0.0
---
```

### 7. Leverage Traceability

```python
# Use provenance traces for debugging
trace = trace_service.trace_artifact(failed_output_id)

print("Inputs used:")
for inp in trace.input_artifacts:
    print(f"  {inp.name} @ {inp.content_digest[:8]}")

# This helps identify which input caused the issue
```

---

## Comparison with Original System

### Original (`prepdr/`)

**GeneratePrimerTemplate.md:**
```markdown
<topic>
{{ETL}}
</topic>

<guidance>
{{AuthoringRules}}
</guidance>
```

**Process:**
1. Manually copy-paste content
2. Replace `{{...}}` markers by hand
3. Run through LLM
4. No record of what inputs were used
5. No change detection

**Limitations:**
- No automation
- No version control on inputs
- Can't regenerate from history
- No impact analysis when guidelines change

### With Infrastructure

**templates/generate-primer.md:**
```markdown
@{topic}
@{authoring_rules}
```

**Process:**
1. Define artifacts once
2. Create template with `@{...}` macros
3. Run resolver → compiler → executor
4. Full dependency graph persisted
5. Complete provenance trace available

**Benefits:**
- Fully automated resolution
- Content-based change detection (SHA-256)
- Reproducible: "same inputs → same output"
- Impact analysis: "what needs regeneration?"
- Traceability: "how was this generated?"
- Quality validation: automated checks
- Visualization: see dependency relationships

---

## Next Steps

1. **Extend the example:**
   - Add more topics (OAuth, Docker, Kubernetes)
   - Create topic-specific quality gates
   - Implement actual LLM integration

2. **Build a workflow:**
   - Git hooks to detect artifact changes
   - CI/CD pipeline to regenerate affected primers
   - Dashboard to show primer freshness

3. **Add advanced features:**
   - Version conflict resolution
   - A/B testing different templates
   - Batch generation with parallelization

4. **Integrate with MarkiTect:**
   - Use MarkiTect ingestion for artifact storage
   - Query relationships with relational metadata
   - Generate documentation sites from primers

---

## References

- [Prompt Dependency Resolution Specification](../../roadmap/prompt-dependency-resolution/)
- [MarkiTect Documentation](../../README.md)
- [Phase 8 Implementation](../../markitect/prompts/)

---

**Questions or feedback?** File an issue or reach out to the maintainers.