markitect-main/examples/content-generator/TUTORIAL.md
tegwick 360c3b1de2
feat(examples): add content-generator example demonstrating Prompt Dependency Resolution
This example demonstrates the full workflow of generating InfoTech primers
using MarkiTect's Prompt Dependency Resolution infrastructure.

Features demonstrated:
- Artifact creation and storage with content-based addressing
- PromptTemplate with @{macro} resolution across multiple spaces
- Automatic dependency tracking and graph construction
- Provenance tracing from outputs back to inputs
- Visualization export (Mermaid format)
- Incremental execution with change detection

Files added:
- generate_primers.py: Complete working example
- README.md: Quick start guide and architecture overview
- TUTORIAL.md: Comprehensive 500+ line tutorial
- templates/generate-primer.md: Template with macros
- artifacts/topics/: ETL and Microservices topic definitions
- artifacts/guidelines/: Authoring rules and research protocol
- prepdr/: Original manual system (preserved for reference)

Example output:
- Generates 2 primers (ETL, Microservices)
- Creates 8 artifacts across 4 information spaces
- Records 8 dependency edges in SQLite database
- Exports dependency graph visualization

Run with: cd examples/content-generator && python generate_primers.py

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-09 23:50:07 +01:00


# MarkiTect Prompt Dependency Resolution Tutorial
**Example: Generating InfoTech Primers with Full Provenance Tracking**
This tutorial demonstrates how to use MarkiTect's Prompt Dependency Resolution infrastructure to systematically generate content with complete dependency tracking, quality validation, and traceability.
---
## Table of Contents
1. [Overview](#overview)
2. [Architecture](#architecture)
3. [Setup](#setup)
4. [Core Concepts](#core-concepts)
5. [Step-by-Step Walkthrough](#step-by-step-walkthrough)
6. [Advanced Features](#advanced-features)
7. [CLI Usage](#cli-usage)
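8. [Best Practices](#best-practices)
9. [Comparison with Original System](#comparison-with-original-system)
10. [Next Steps](#next-steps)
11. [References](#references)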
---
## Overview
### What This Example Does
This example shows how to generate **InfoTech Primers** (structured reference documents for IT concepts) using a prompt template system with:
- **Artifact Management**: Store and version all inputs (templates, topics, guidelines)
- **Dependency Resolution**: Automatically resolve macro references across information spaces
- **Provenance Tracking**: Trace any generated primer back to its inputs and template
- **Incremental Updates**: Detect when inputs change and regenerate affected primers
- **Quality Validation**: Apply quality gates to ensure output meets standards
- **Visualization**: View dependency graphs in DOT or Mermaid format
### Why Use Prompt Dependency Resolution?
**Before** (manual approach in `prepdr/`):
```markdown
# Template with manual macros
{{topic}}
{{AuthoringRules}}
{{ResearchPrompt}}
```
Problems:
- Manual macro substitution
- No version tracking
- No dependency awareness
- Can't detect when inputs change
- No provenance traceability
**After** (with infrastructure):
```markdown
# Template with resolved dependencies
@{topic}
@{authoring_rules}
@{research_prompt}
```
Benefits:
- Automatic macro resolution
- Content-based change detection (SHA-256 digests)
- Full dependency graph construction
- Incremental recomputation when inputs change
- Complete provenance: artifact → template → inputs → validation
- CLI commands for inspection and debugging
---
## Architecture
### Information Spaces
The system organizes artifacts into **information spaces** (logical namespaces):
```
primer-templates/           # PromptTemplates for generation
  └─ generate-primer
primer-topics/              # Topic definitions (ETL, Microservices, OAuth, etc.)
  ├─ etl
  ├─ microservices
  └─ ...
primer-guidelines/          # Authoring and research guidelines
  ├─ authoring-rules
  └─ research-prompt
generated-primers/          # Output artifacts
  ├─ etl-primer
  ├─ microservices-primer
  └─ ...
```
### Dependency Graph
When you generate a primer, the system creates a dependency graph:
```mermaid
graph LR
A[etl topic] -->|requires| B[generate-primer template]
C[authoring-rules] -->|requires| B
D[research-prompt] -->|requires| B
B -->|generates| E[etl-primer output]
```
This graph enables:
- **Impact analysis**: "What primers need regeneration if authoring-rules changes?"
- **Provenance tracing**: "What inputs produced this primer?"
- **Incremental execution**: "Only regenerate affected primers"
---
## Setup
### Prerequisites
```bash
# Ensure MarkiTect is installed
cd /path/to/markitect_project
pip install -e .
```
### Directory Structure
```
examples/content-generator/
├── TUTORIAL.md                  # This file
├── generate_primers.py          # Main example script
├── templates/
│   └── generate-primer.md       # PromptTemplate
├── artifacts/
│   ├── topics/
│   │   ├── etl.md               # Topic: ETL
│   │   └── microservices.md     # Topic: Microservices
│   └── guidelines/
│       ├── authoring-rules.md   # Authoring standards
│       └── research-prompt.md   # Research methodology
└── prepdr/                      # Original manual system (preserved)
    ├── README.md
    ├── ETL.md
    ├── AuthoringRules.md
    ├── AssistentPrompt.md
    └── GeneratePrimerTemplate.md
```
### Running the Example
```bash
cd examples/content-generator
python generate_primers.py
```
Expected output:
```
╔══════════════════════════════════════════════════════════════╗
║ MarkiTect Prompt Dependency Resolution Example ║
║ InfoTech Primer Generation ║
╚══════════════════════════════════════════════════════════════╝
=== Loading Artifacts ===
✓ Created artifact: generate-primer (digest: a7f3e2b1)
✓ Created artifact: etl (digest: 9c4d6e8a)
✓ Created artifact: microservices (digest: 5b2f1c9d)
✓ Created artifact: authoring-rules (digest: 3e7a9f2c)
✓ Created artifact: research-prompt (digest: 8d1b4e6f)
=== Generating Primer: etl ===
✓ Template created with 3 macro dependencies
✓ Resolved 3 macros
✓ Compiled prompt (digest: 4c9e2a7b)
✓ Persisted 3 dependency edges
✓ Generated primer: etl-primer
=== Provenance Trace ===
Artifact: abc-123-def-456
Producing Run: run-etl-001
Input Artifacts: 3
Dependency Chain: 5 artifacts
✓ Primer generation complete!
```
---
## Core Concepts
### 1. Artifacts
**Artifacts** are versioned content units with content-based addressing.
```python
from markitect.prompts.models import Artifact, ArtifactType

# Create an artifact
artifact = Artifact.create(
    space_id="primer-topics",
    name="etl",
    content=topic_content,
    artifact_type=ArtifactType.CONTENT,
)

# Automatic SHA-256 digest generation
print(artifact.content_digest)  # "9c4d6e8a..."
```
**Key features:**
- **Content digest**: SHA-256 hash for change detection
- **Space isolation**: Artifacts in different spaces can have same names
- **Type classification**: CONTENT, TEMPLATE, GENERATED, SCHEMA, CONFIG
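
Content-based addressing can be illustrated with the standard library alone. The sketch below assumes the digest is a SHA-256 hex string over the UTF-8 bytes of the content; `content_digest` and `has_changed` are hypothetical helpers for illustration, not MarkiTect's API.

```python
import hashlib


def content_digest(content: str) -> str:
    """Return the SHA-256 hex digest of the content (UTF-8 encoding is assumed)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def has_changed(stored_digest: str, new_content: str) -> bool:
    """Report whether new_content differs from the content behind stored_digest."""
    return content_digest(new_content) != stored_digest


original = "ETL: extract, transform, load."
stored = content_digest(original)

print(len(stored))                            # 64 hex characters
print(has_changed(stored, original))          # False
print(has_changed(stored, original + " v2"))  # True
```

Because only the stored digest is needed for the comparison, the previous content does not have to be kept around to notice a change.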
### 2. PromptTemplates
**PromptTemplates** are artifacts with macro references.
```markdown
---
id: generate-primer-v1
artifact_type: template
---
# Generate Primer
Topic: @{topic}
Guidelines: @{authoring_rules}
```
**Macro syntax:**
- `@{macro_name}` - Resolved to artifact content
- Resolution happens at execution time
- Macros can reference artifacts in any information space
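
As a rough illustration of the macro syntax, the sketch below extracts `@{macro_name}` references with a regular expression. The grammar (an identifier between `@{` and `}`) and the helper name `extract_macros` are assumptions; the real template parser may accept more.

```python
import re

# Assumed macro grammar: "@{" + identifier + "}".
MACRO_PATTERN = re.compile(r"@\{([A-Za-z_][A-Za-z0-9_]*)\}")


def extract_macros(template_text: str) -> list[str]:
    """Return macro names in order of first appearance, without duplicates."""
    seen: dict[str, None] = {}
    for match in MACRO_PATTERN.finditer(template_text):
        seen.setdefault(match.group(1), None)
    return list(seen)


template_text = "Topic: @{topic}\nGuidelines: @{authoring_rules}\nAgain: @{topic}\n"
print(extract_macros(template_text))  # ['topic', 'authoring_rules']
```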
### 3. Resolution Strategy
**Resolution** finds artifacts to substitute for macros.
```python
from markitect.prompts.resolver.strategy import ResolutionConfig, ResolutionStrategy

config = ResolutionConfig(
    strategy=ResolutionStrategy.FIRST_MATCH,
    spaces=["primer-topics", "primer-guidelines"],
)
```
**Strategies:**
- `FIRST_MATCH`: Use first artifact found
- `LATEST_VERSION`: Use newest version (if artifacts have versions)
- `EXPLICIT_ONLY`: Require explicit space qualification
### 4. Dependency Tracking
**Dependency edges** are automatically created during resolution.
```python
# Edge types
EdgeType.REQUIRES # Input dependency (template → topic)
EdgeType.GENERATES # Output relationship (run → primer)
EdgeType.INCLUDES # Composition (nested templates)
```
**Graph operations:**
```python
# Find all artifacts that depend on authoring-rules
dependents = query_service.find_transitive_dependents("authoring-rules-id")
# Find all inputs needed to regenerate a primer
dependencies = query_service.find_transitive_dependencies("etl-primer-id")
# Detect circular dependencies
cycles = query_service.detect_circular_dependencies()
```
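
To make the traversal concrete, here is a minimal sketch of a transitive-dependents query as a breadth-first search over an in-memory edge list. The edge tuples and the function `transitive_dependents` are hypothetical stand-ins; how the query service is implemented is not shown in this tutorial.

```python
from collections import defaultdict, deque

# Hypothetical edges as (source_id, target_id): the source is consumed by the target.
EDGES = [
    ("authoring-rules-id", "run-etl-001"),
    ("etl-id", "run-etl-001"),
    ("run-etl-001", "etl-primer-id"),
]


def transitive_dependents(start: str, edges=EDGES) -> set[str]:
    """Collect every node reachable from start by following edges forward."""
    forward = defaultdict(list)
    for source, target in edges:
        forward[source].append(target)

    seen: set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in forward[node]:
            if nxt not in seen:  # the seen set also keeps cycles from looping forever
                seen.add(nxt)
                queue.append(nxt)
    return seen


print(sorted(transitive_dependents("authoring-rules-id")))
# ['etl-primer-id', 'run-etl-001']
```

Following the same edges in the opposite direction gives the transitive dependencies of an output.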
### 5. Traceability
**ProvenanceTrace** captures complete lineage.
```python
trace = trace_service.trace_artifact(artifact_id)
print(trace.producing_run) # Run that generated this
print(trace.template) # Template used
print(trace.input_artifacts) # All input dependencies
print(trace.validation_results) # Quality gate results
print(trace.impact_debt) # Suppressed recomputations
```
---
## Step-by-Step Walkthrough
### Step 1: Initialize Repositories
```python
from markitect.prompts.repositories.sqlite import SQLiteArtifactRepository
from markitect.prompts.dependencies.repository import SQLiteDependencyRepository
artifact_repo = SQLiteArtifactRepository("primers.db")
dep_repo = SQLiteDependencyRepository("primers.db")
```
**What this does:**
- Creates SQLite database with artifact and dependency tables
- Artifact table: id, space_id, name, content_digest, metadata
- Dependency table: source_id, target_id, edge_type, run_id
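
The two tables can be pictured with a small `sqlite3` sketch. The column names follow the bullets above, while the table names, column types, and constraints are assumptions made for illustration; the actual schema created by the repositories may differ.

```python
import sqlite3

# Column names follow the bullets above; table names and types are assumptions.
SCHEMA = """
CREATE TABLE artifacts (
    id TEXT PRIMARY KEY,
    space_id TEXT NOT NULL,
    name TEXT NOT NULL,
    content_digest TEXT NOT NULL,
    metadata TEXT
);
CREATE TABLE dependencies (
    source_id TEXT NOT NULL,
    target_id TEXT NOT NULL,
    edge_type TEXT NOT NULL,
    run_id TEXT
);
"""

conn = sqlite3.connect(":memory:")  # in-memory, so primers.db is left untouched
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO dependencies VALUES (?, ?, ?, ?)",
    ("etl-id", "run-etl-001", "requires", "run-etl-001"),
)
rows = conn.execute(
    "SELECT source_id, edge_type FROM dependencies WHERE run_id = ?",
    ("run-etl-001",),
).fetchall()
print(rows)  # [('etl-id', 'requires')]
```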
### Step 2: Load Artifacts
```python
from pathlib import Path

from markitect.prompts.models import Artifact, ArtifactType

# Read artifact file
content = Path("artifacts/topics/etl.md").read_text()

# Create artifact
artifact = Artifact.create(
    space_id="primer-topics",
    name="etl",
    content=content,
    artifact_type=ArtifactType.CONTENT,
)

# Store in repository
artifact = artifact_repo.create(artifact)
```
**Content-based addressing:**
```python
# If you modify the content
updated_content = content + "\n\n**New section added**"
artifact.update_content(updated_content)
# Digest changes automatically
print(artifact.content_digest) # Different hash!
```
### Step 3: Create PromptTemplate
```python
from markitect.prompts.templates.models import PromptTemplate, MacroReference

template = PromptTemplate.create(
    id="generate-primer-v1",
    name="generate-primer",
    content=template_content,
    space_id="primer-templates",
)

# Add macro dependencies (one per @{...} reference in the template)
template.add_macro(MacroReference(
    name="topic",
    source_space="primer-topics",
))
template.add_macro(MacroReference(
    name="authoring_rules",
    source_space="primer-guidelines",
))
template.add_macro(MacroReference(
    name="research_prompt",
    source_space="primer-guidelines",
))
```
**Template content** (`templates/generate-primer.md`):
```markdown
# Generate InfoTech Primer
## Topic
@{topic}
## Guidelines
@{authoring_rules}
## Research Protocol
@{research_prompt}
Generate a complete primer following the authoring rules.
```
### Step 4: Resolve Dependencies
```python
from markitect.prompts.resolver.resolver import PromptResolver
from markitect.prompts.resolver.strategy import ResolutionConfig, ResolutionStrategy

resolver = PromptResolver(artifact_repo)
config = ResolutionConfig(
    strategy=ResolutionStrategy.FIRST_MATCH,
    spaces=["primer-topics", "primer-guidelines"],
)
resolution_result = resolver.resolve_template(template, config)

if resolution_result.success:
    for resolved in resolution_result.context.resolved_macros:
        print(f"{resolved.macro_name} → {resolved.artifact.name}")
else:
    print("Resolution failed:", resolution_result.context.errors)
```
**Resolution algorithm:**
1. Parse template to extract `@{macro_name}` references
2. For each macro:
- Search configured spaces in order
- Match by name (case-sensitive)
- Return first match (FIRST_MATCH strategy)
3. Build ResolutionResult with all resolved artifacts
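
The search order described above can be sketched with plain dictionaries. This is a simplified model, not the resolver's actual code: the `SPACES` store and `resolve_first_match` are hypothetical, and the sketch assumes a macro resolves to an artifact of the same name.

```python
from typing import Optional

# Hypothetical in-memory store: space_id -> {artifact name -> content}.
SPACES = {
    "primer-topics": {"etl": "ETL topic text"},
    "primer-guidelines": {"authoring_rules": "Rules text", "etl": "shadowed"},
}


def resolve_first_match(macro_name, search_order, spaces=SPACES) -> Optional[tuple]:
    """Return (space_id, content) from the first space that contains macro_name."""
    for space_id in search_order:
        artifacts = spaces.get(space_id, {})
        if macro_name in artifacts:  # exact, case-sensitive name match
            return space_id, artifacts[macro_name]
    return None


order = ["primer-topics", "primer-guidelines"]
print(resolve_first_match("etl", order))              # ('primer-topics', 'ETL topic text')
print(resolve_first_match("authoring_rules", order))  # ('primer-guidelines', 'Rules text')
print(resolve_first_match("ETL", order))              # None
```

Under a first-match rule the order of `spaces` matters: the `etl` entry in the second space is never reached because the first space already supplies one.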
### Step 5: Compile Prompt
```python
from markitect.prompts.resolver.compiler import ContextCompiler
compiler = ContextCompiler()
compiled = compiler.compile(template, template_content, resolution_result)
print(compiled.content) # Fully expanded prompt
print(compiled.content_digest) # Hash for caching
print(compiled.dependency_digests) # Map of macro → artifact digest
```
**Compiled output:**
```markdown
# Generate InfoTech Primer
## Topic
A three-phase computing process where data is extracted from source systems,
transformed (including validation, cleaning, enrichment, and aggregation),
and loaded into a target data store or data warehouse.
...
## Guidelines
[Full authoring-rules content]
...
## Research Protocol
[Full research-prompt content]
...
```
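
Compilation can be thought of as macro substitution plus digest bookkeeping. The following sketch assumes that model; `compile_prompt` and its return shape are hypothetical and only mirror the three attributes printed above.

```python
import hashlib
import re

# Assumed macro grammar: "@{" + identifier + "}".
MACRO_PATTERN = re.compile(r"@\{([A-Za-z_][A-Za-z0-9_]*)\}")


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def compile_prompt(template_text: str, resolved: dict[str, str]) -> dict:
    """Expand macros and record one digest per substituted input."""
    def substitute(match: re.Match) -> str:
        return resolved[match.group(1)]  # a KeyError signals an unresolved macro

    content = MACRO_PATTERN.sub(substitute, template_text)
    return {
        "content": content,
        "content_digest": sha256_hex(content),
        "dependency_digests": {name: sha256_hex(text) for name, text in resolved.items()},
    }


compiled = compile_prompt(
    "## Topic\n@{topic}\n## Guidelines\n@{authoring_rules}\n",
    {"topic": "ETL definition", "authoring_rules": "Be concise."},
)
print(compiled["content"])
print(sorted(compiled["dependency_digests"]))  # ['authoring_rules', 'topic']
```

Passing a function to `sub` rather than a replacement string keeps backslashes in artifact content from being interpreted as regex escapes.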
### Step 6: Track Dependencies
```python
from markitect.prompts.execution.manifest import RunManifest
from markitect.prompts.dependencies.graph import GraphBuilder

# Create run manifest
manifest = RunManifest.create(
    run_id="run-etl-001",
    template_id=template.id,
    template_name=template.name,
    template_digest=template.content_digest,
)

# Add resolved inputs
for resolved in resolution_result.context.resolved_macros:
    manifest.add_resolved_input(
        name=resolved.macro_name,
        artifact_id=resolved.artifact.id,
        space_id=resolved.space_id,
        digest=resolved.artifact.content_digest,
    )
    # Create dependency edge
    manifest.add_dependency_edge(
        source_id=resolved.artifact.id,
        target_id="run-etl-001",
        edge_type="requires",
    )

# Persist to database
builder = GraphBuilder(dep_repo)
edges = builder.persist_edges(manifest)
```
**Result:** Dependency edges stored in database:
```
source_artifact_id | target_artifact_id | edge_type | run_id
--------------------|--------------------|-----------|-----------
etl-id | run-etl-001 | requires | run-etl-001
authoring-rules-id | run-etl-001 | requires | run-etl-001
research-prompt-id | run-etl-001 | requires | run-etl-001
```
### Step 7: Generate Output
```python
# In real usage, this would call an LLM API
# For demo, we create a mock output
output_content = """
# ETL Primer
## Definition
ETL (Extract, Transform, Load) is a data integration pattern...
[Generated content]
"""

output_artifact = Artifact.create(
    space_id="generated-primers",
    name="etl-primer",
    content=output_content,
    artifact_type=ArtifactType.GENERATED,
)
output_artifact = artifact_repo.create(output_artifact)

# Add to manifest
manifest.add_output_artifact(
    artifact_id=output_artifact.id,
    name=output_artifact.name,
    digest=output_artifact.content_digest,
    artifact_type="generated",
)
manifest.add_dependency_edge(
    source_id="run-etl-001",
    target_id=output_artifact.id,
    edge_type="generates",
)

# Persist output edges
builder.persist_edges(manifest)
```
### Step 8: Trace Provenance
```python
from markitect.prompts.traceability.service import TraceabilityService

trace_service = TraceabilityService(artifact_repo, dep_repo, db_path="primers.db")

# Trace the generated primer
trace = trace_service.trace_artifact(output_artifact.id)

# Inspect provenance
print("Template:", trace.template.name if trace.template else "None")
print("Producing run:", trace.producing_run.run_id if trace.producing_run else "None")

print("Input artifacts:")
for inp in trace.input_artifacts:
    print(f"  - {inp.name} ({inp.artifact_type})")

print("Dependency chain:")
for dep_id in trace.dependency_chain:
    artifact = artifact_repo.get_by_id(dep_id)
    print(f"  - {artifact.name if artifact else dep_id}")
```
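
Conceptually, a provenance trace walks the persisted edges backwards: first from the output to the run that generated it, then from that run to its required inputs. The sketch below models this over the edge rows from Step 6; `trace_provenance` is a hypothetical helper, and the real service returns richer objects.

```python
# Hypothetical edge rows: (source_id, target_id, edge_type), as persisted in Step 6.
EDGES = [
    ("etl-id", "run-etl-001", "requires"),
    ("authoring-rules-id", "run-etl-001", "requires"),
    ("research-prompt-id", "run-etl-001", "requires"),
    ("run-etl-001", "etl-primer-id", "generates"),
]


def trace_provenance(artifact_id: str, edges=EDGES) -> dict:
    """Find the run that generated artifact_id and the inputs that run required."""
    producing_run = next(
        (src for src, dst, kind in edges if dst == artifact_id and kind == "generates"),
        None,
    )
    inputs = [
        src for src, dst, kind in edges
        if dst == producing_run and kind == "requires"
    ]
    return {"producing_run": producing_run, "input_artifacts": inputs}


trace = trace_provenance("etl-primer-id")
print(trace["producing_run"])    # run-etl-001
print(trace["input_artifacts"])  # ['etl-id', 'authoring-rules-id', 'research-prompt-id']
```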
---
## Advanced Features
### Incremental Recomputation
When an input changes, automatically detect affected outputs:
```python
from pathlib import Path

from markitect.prompts.dependencies.queries import DependencyQueryService
from markitect.prompts.incremental.detector import ChangeDetector
from markitect.prompts.incremental.engine import IncrementalExecutionEngine
from markitect.prompts.incremental.models import RecomputeConfig

# Detect change
detector = ChangeDetector("primers.db")
authoring_rules = artifact_repo.get_by_name("primer-guidelines", "authoring-rules")

# User updates the file
new_content = Path("artifacts/guidelines/authoring-rules.md").read_text()
change = detector.detect_change(authoring_rules, new_content)

if change:
    detector.record_change(change)

    # Find affected primers
    query_service = DependencyQueryService(dep_repo)
    engine = IncrementalExecutionEngine("primers.db", query_service)
    result = engine.recompute(
        change,
        config=RecomputeConfig(max_depth=2, impact_threshold=0.1),
        old_content=authoring_rules.content,
        new_content=new_content,
    )
    print(f"Total dependents: {result.total_dependents}")
    print(f"Recomputed: {result.recomputed_count}")
    print(f"Suppressed: {result.suppressed_count}")
```
**Recomputation controls** (`RecomputeConfig` fields):
- **max_depth**: Traverse dependency graph N levels
- **impact_threshold**: Only recompute if change magnitude > threshold
- **max_recomputes**: Budget limit to prevent runaway execution
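
How these three limits could interact is sketched below. Everything here is an assumption for illustration: `change_magnitude` uses `difflib` similarity as a stand-in metric, and `select_recomputes` is a hypothetical breadth-first selection, not the engine's actual algorithm.

```python
import difflib
from collections import defaultdict, deque


def change_magnitude(old: str, new: str) -> float:
    """Assumed metric: 0.0 for identical text, approaching 1.0 for unrelated text."""
    return 1.0 - difflib.SequenceMatcher(None, old, new).ratio()


def select_recomputes(edges, changed_id, magnitude, max_depth=2,
                      impact_threshold=0.1, max_recomputes=10):
    """Return dependents to recompute, breadth-first, within depth and budget."""
    if magnitude <= impact_threshold:
        return []  # change too small: every dependent is suppressed
    forward = defaultdict(list)
    for source, target in edges:
        forward[source].append(target)

    selected, seen = [], {changed_id}
    queue = deque([(changed_id, 0)])
    while queue and len(selected) < max_recomputes:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not look past the depth limit
        for nxt in forward[node]:
            if nxt not in seen and len(selected) < max_recomputes:
                seen.add(nxt)
                selected.append(nxt)
                queue.append((nxt, depth + 1))
    return selected


edges = [("rules", "run-1"), ("run-1", "primer-1"), ("primer-1", "index")]
print(select_recomputes(edges, "rules", change_magnitude("abc", "xyz")))
# ['run-1', 'primer-1']
print(select_recomputes(edges, "rules", change_magnitude("abc", "abc")))
# []
```

In this model `index` sits three levels away from the change, so `max_depth=2` leaves it out; it would count as suppressed rather than recomputed.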
### Quality Validation
Apply quality gates to generated primers:
```python
from markitect.prompts.quality.validator import QualityValidator
from markitect.prompts.quality.gates.pattern_gate import PatternValidationGate

# Create validation gate
gate = PatternValidationGate(
    required_patterns=[
        r"## Definition",
        r"## Context",
        r"## Core Concepts",
        r"## Scope and Non-Scope",
    ],
    forbidden_patterns=[
        r"TODO",
        r"FIXME",
    ],
    gate_id="primer-structure-check",
    name="Primer Structure Validator",
)
validator = QualityValidator(gates=[gate], db_path="primers.db")

# Validate output
results = validator.validate_artifact(
    content=output_content,
    artifact_id=output_artifact.id,
    run_id="run-etl-001",
)

if validator.all_passed(results):
    print("✓ All quality gates passed")
else:
    failed = validator.get_failed_gates(results)
    for result in failed:
        print(f"{result.gate_id} failed")
        for diag in result.diagnostics:
            print(f"  {diag.message}")
```
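
The idea behind a pattern gate fits in a few lines of standard-library Python. This is a minimal sketch under the assumption that patterns are regular expressions searched anywhere in the content; `check_patterns` is a hypothetical function, not the `PatternValidationGate` implementation.

```python
import re


def check_patterns(content: str, required: list[str], forbidden: list[str]) -> list[str]:
    """Return diagnostic messages; an empty list means the gate passed."""
    diagnostics = []
    for pattern in required:
        if not re.search(pattern, content, flags=re.MULTILINE):
            diagnostics.append(f"missing required pattern: {pattern}")
    for pattern in forbidden:
        if re.search(pattern, content, flags=re.MULTILINE):
            diagnostics.append(f"found forbidden pattern: {pattern}")
    return diagnostics


draft = "# ETL Primer\n## Definition\nETL is ...\nTODO: add context\n"
for message in check_patterns(draft, [r"^## Definition", r"^## Context"], [r"TODO", r"FIXME"]):
    print(message)
# missing required pattern: ^## Context
# found forbidden pattern: TODO
```

Anchoring heading patterns with `^` under `re.MULTILINE` avoids matching a heading name that merely appears inside a sentence.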
### Visualization
Generate dependency graphs:
```python
from pathlib import Path

from markitect.prompts.dependencies.graph import GraphBuilder
from markitect.prompts.dependencies.queries import DependencyQueryService
from markitect.prompts.visualization.graph import GraphExporter

query_service = DependencyQueryService(dep_repo)

# Find all related artifacts
deps = query_service.find_transitive_dependencies(output_artifact.id)
dependents = query_service.find_transitive_dependents(output_artifact.id)
all_ids = deps | dependents | {output_artifact.id}

# Build graph
builder = GraphBuilder(dep_repo)
graph = builder.build_graph(all_ids)

# Export to Mermaid
mermaid = GraphExporter.to_mermaid(graph, "Primer Dependencies")
Path("dependencies.mermaid").write_text(mermaid)

# Export to DOT (Graphviz)
dot = GraphExporter.to_dot(graph, "Primer Dependencies")
Path("dependencies.dot").write_text(dot)
```
**Mermaid output:**
```mermaid
%%{ title: Primer Dependencies }%%
graph LR
etl-id-->|requires|run-etl-001
authoring-rules-id-->|requires|run-etl-001
research-prompt-id-->|requires|run-etl-001
run-etl-001-.->|generates|etl-primer-id
```
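
For readers who want to see how such text can be produced, here is a small sketch that renders edge rows as a Mermaid flowchart. It is not `GraphExporter`; the function `to_mermaid` below is a hypothetical stand-in, and it deliberately differs from the sample output by giving each node a generated ID with a quoted label.

```python
def to_mermaid(edges: list[tuple[str, str, str]]) -> str:
    """Render (source, target, edge_type) rows as a left-to-right Mermaid flowchart."""
    node_ids: dict[str, str] = {}

    def node(label: str) -> str:
        # Generated IDs (n0, n1, ...) with quoted labels, so hyphens in
        # artifact IDs cannot be confused with arrow syntax.
        if label not in node_ids:
            node_ids[label] = f"n{len(node_ids)}"
        return f'{node_ids[label]}["{label}"]'

    lines = ["graph LR"]
    for source, target, edge_type in edges:
        # "generates" edges are drawn dotted, mirroring the sample output above.
        arrow = "-.->" if edge_type == "generates" else "-->"
        lines.append(f"    {node(source)} {arrow}|{edge_type}| {node(target)}")
    return "\n".join(lines)


print(to_mermaid([
    ("etl-id", "run-etl-001", "requires"),
    ("run-etl-001", "etl-primer-id", "generates"),
]))
```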
---
## CLI Usage
The Prompt Dependency Resolution infrastructure includes CLI commands:
### Trace Provenance
```bash
markitect prompt trace <artifact-id> --database primers.db
```
Output (JSON):
```json
{
  "artifact_id": "abc-123-def-456",
  "producing_run": {
    "run_id": "run-etl-001",
    "template_id": "generate-primer-v1",
    "status": "success"
  },
  "input_artifacts": [
    {
      "artifact_id": "...",
      "name": "etl",
      "role": "input"
    }
  ],
  "dependency_chain": ["...", "..."]
}
```
### Visualize Graph
```bash
markitect prompt graph <artifact-id> --format mermaid --database primers.db
```
### List Runs
```bash
# All runs
markitect prompt runs --database primers.db
# Filter by template
markitect prompt runs --template generate-primer-v1 --database primers.db
# Filter by status
markitect prompt runs --status success --limit 10 --database primers.db
```
### Show Impact Debt
```bash
# All stale artifacts
markitect prompt debt --database primers.db
# Specific artifact
markitect prompt debt --artifact authoring-rules-id --database primers.db
```
### Graph Statistics
```bash
markitect prompt stats --database primers.db
```
Output:
```json
{
  "total_nodes": 12,
  "total_edges": 18,
  "root_count": 3,
  "leaf_count": 2,
  "has_cycles": false
}
```
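
The fields in this report can be derived from an edge list alone. The sketch below assumes that a root is a node with no incoming edges and a leaf is a node with no outgoing edges, and it detects cycles with Kahn's algorithm; `graph_stats` is a hypothetical helper, and the CLI may define these terms differently.

```python
def graph_stats(edges: list[tuple[str, str]]) -> dict:
    """Compute node/edge counts, roots, leaves, and cycle presence from (source, target) pairs."""
    nodes = {n for edge in edges for n in edge}
    sources = {s for s, _ in edges}
    targets = {t for _, t in edges}

    # Kahn's algorithm: repeatedly remove nodes with no remaining incoming edges.
    # If some nodes can never be removed, they lie on or behind a cycle.
    indegree = {n: 0 for n in nodes}
    outgoing = {n: [] for n in nodes}
    for s, t in edges:
        indegree[t] += 1
        outgoing[s].append(t)
    ready = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while ready:
        node = ready.pop()
        removed += 1
        for nxt in outgoing[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    return {
        "total_nodes": len(nodes),
        "total_edges": len(edges),
        "root_count": len(nodes - targets),
        "leaf_count": len(nodes - sources),
        "has_cycles": removed != len(nodes),
    }


edges = [
    ("etl-id", "run-etl-001"),
    ("authoring-rules-id", "run-etl-001"),
    ("research-prompt-id", "run-etl-001"),
    ("run-etl-001", "etl-primer-id"),
]
print(graph_stats(edges))
```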
---
## Best Practices
### 1. Organize Artifacts by Space
```
Clear separation of concerns:
- templates/ ← Reusable PromptTemplates
- topics/ ← Domain-specific content
- guidelines/ ← Standards and methodologies
- output/ ← Generated artifacts
```
### 2. Use Content Digests for Change Detection
```python
# Don't compare content strings
if old_content != new_content:  # ✗ Inefficient
    ...

# Do compare digests
if artifact.has_changed(new_digest):  # ✓ Fast, hash-based
    ...
```
### 3. Apply Quality Gates
```python
# Define quality standards as code
gates = [
    PatternValidationGate(required_patterns=[...]),
    SchemaValidationGate(schema={...}),
]

# Fail fast if quality checks fail
if not validator.all_passed(results):
    raise QualityError("Output does not meet standards")
```
### 4. Track All Dependencies
```python
# Always persist dependency edges
manifest.add_dependency_edge(source, target, edge_type)
builder.persist_edges(manifest)
# This enables:
# - Impact analysis
# - Incremental recomputation
# - Provenance tracing
```
### 5. Use Incremental Execution
```python
# Don't regenerate everything on every change
config = RecomputeConfig(
    max_depth=2,           # Limit blast radius
    impact_threshold=0.1,  # Skip minor changes
    max_recomputes=10,     # Budget limit
)
```
### 6. Version Your Templates
```yaml
# Include version in template metadata
---
id: generate-primer-v1
version: 1.0.0
---
# When template changes significantly, create v2
---
id: generate-primer-v2
version: 2.0.0
---
```
### 7. Leverage Traceability
```python
# Use provenance traces for debugging
trace = trace_service.trace_artifact(failed_output_id)

print("Inputs used:")
for inp in trace.input_artifacts:
    print(f"  {inp.name} @ {inp.content_digest[:8]}")

# This helps identify which input caused the issue
```
---
## Comparison with Original System
### Original (`prepdr/`)
**GeneratePrimerTemplate.md:**
```markdown
<topic>
{{ETL}}
</topic>
<guidance>
{{AuthoringRules}}
</guidance>
```
**Process:**
1. Manually copy-paste content
2. Replace `{{...}}` markers by hand
3. Run through LLM
4. No record of what inputs were used
5. No change detection
**Limitations:**
- No automation
- No version control on inputs
- Can't regenerate from history
- No impact analysis when guidelines change
### With Infrastructure
**templates/generate-primer.md:**
```markdown
@{topic}
@{authoring_rules}
```
**Process:**
1. Define artifacts once
2. Create template with `@{...}` macros
3. Run resolver → compiler → executor
4. Full dependency graph persisted
5. Complete provenance trace available
**Benefits:**
- Fully automated resolution
- Content-based change detection (SHA-256)
- Reproducible: the same inputs compile to the same prompt digest (LLM output may still vary between runs)
- Impact analysis: "what needs regeneration?"
- Traceability: "how was this generated?"
- Quality validation: automated checks
- Visualization: see dependency relationships
---
## Next Steps
1. **Extend the example:**
- Add more topics (OAuth, Docker, Kubernetes)
- Create topic-specific quality gates
- Implement actual LLM integration
2. **Build a workflow:**
- Git hooks to detect artifact changes
- CI/CD pipeline to regenerate affected primers
- Dashboard to show primer freshness
3. **Add advanced features:**
- Version conflict resolution
- A/B testing different templates
- Batch generation with parallelization
4. **Integrate with MarkiTect:**
- Use MarkiTect ingestion for artifact storage
- Query relationships with relational metadata
- Generate documentation sites from primers
---
## References
- [Prompt Dependency Resolution Specification](../../roadmap/prompt-dependency-resolution/)
- [MarkiTect Documentation](../../README.md)
- [Phase 8 Implementation](../../markitect/prompts/)
---
**Questions or feedback?** File an issue or reach out to the maintainers.