Two root causes of metric fragmentation observed in collection checks:
1. The schema's Economic Domain field used free-form examples ("labour
economics, trade theory"), which overrode the enum in extraction-rules.md
and caused the LLM to produce multi-domain strings and non-canonical values.
Fix: schema now specifies the exact 7-value enum with descriptions.
2. Source Chapter had no format constraint, producing 9 different formats
for 7 chapters (full titles, mixed Roman/Arabic numerals, asterisks).
Fix: extraction-rules now mandate "Book [Roman], Chapter [n]" exactly.
These fixes are prerequisites for clean reprocessing (S3.2 continuation).
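A format rule like the Source Chapter one can be checked mechanically. The sketch below is illustrative only: the pattern and the name `is_canonical_source_chapter` are assumptions, and it assumes `[n]` means an Arabic chapter number; the authoritative rule is the one in extraction-rules.md.

```python
import re

# Hypothetical pattern for "Book [Roman], Chapter [n]".
# Assumes the book is a Roman numeral and the chapter an Arabic number.
SOURCE_CHAPTER_RE = re.compile(r"Book [IVXLCDM]+, Chapter [1-9][0-9]*")


def is_canonical_source_chapter(value: str) -> bool:
    """Return True only when the whole string matches the mandated format."""
    return SOURCE_CHAPTER_RE.fullmatch(value) is not None
```

Using `fullmatch` rejects values with surrounding asterisks or appended chapter titles, which are the variants described above.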
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The SQLite artifact database is a derived cache that can be regenerated from
committed files with no LLM calls. Added a tutorial section explaining why it
is excluded from version control and how to rebuild it after a fresh clone.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add OpenAIAdapter for the OpenAI chat completions API (apikey-chatgpt.txt
or OPENAI_API_KEY). Set default model to arcee-ai/trinity-large-preview:free
for the infospace pipeline and increase max_tokens from 4096 to 8192.
Reprocess chapter 05 with Trinity Large (was Gemini: 1 truncated entity,
now 19 complete entities). Process chapters 06 (Aurora Alpha, 10 entities)
and 07 (Trinity Large, 15 entities including regenerated violent-policy.md).
Canonical set now at 85 unique entities.
Add entity archive policy: entities are never silently deleted. Retired
entities move to output/entities/archive/ with a dated reason header.
New CLI option: --archive-entity <slug> --reason "...". The --list
output shows the archive count alongside the canonical set.
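A minimal sketch of what the archive step could look like. The function name `archive_entity`, the `.md` file extension, and the HTML-comment header format are assumptions for illustration; only the archive/ subdirectory and the "dated reason header" idea come from this commit.

```python
from datetime import date
from pathlib import Path


def archive_entity(entities_dir, slug, reason, today=None):
    """Move <slug>.md into entities_dir/archive/ with a dated reason header."""
    entities_dir = Path(entities_dir)
    source = entities_dir / f"{slug}.md"
    if not source.is_file():
        raise FileNotFoundError(source)
    archive_dir = entities_dir / "archive"
    archive_dir.mkdir(exist_ok=True)
    stamp = (today or date.today()).isoformat()
    # Hypothetical header format: one HTML comment line above the original text.
    header = f"<!-- archived {stamp}: {reason} -->\n"
    target = archive_dir / source.name
    target.write_text(header + source.read_text(encoding="utf-8"), encoding="utf-8")
    source.unlink()  # the entity leaves the canonical set but is not lost
    return target
```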
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add GeminiAdapter calling Google's Generative Language REST API
(default model: gemini-2.5-flash). Register "gemini" as third
provider in the factory and CLI. Add rate-limit retry with
exponential backoff to the pipeline's _call_llm helper. Increase
default max_tokens from 2000 to 4096.
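The retry behaviour can be sketched roughly as follows. This is not the pipeline's actual `_call_llm` code: `RateLimitError`, `call_with_backoff`, and the default retry count and delays are hypothetical names and values chosen for illustration.

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for whatever the adapter raises on a rate limit."""


def call_with_backoff(fn, max_retries=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with delays of base_delay * 2**attempt."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # retries exhausted: surface the error to the caller
            sleep(base_delay * 2 ** attempt)
```

Passing `sleep` as a parameter keeps the helper testable without real waiting.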
Process book-1-chapter-05 via Gemini free tier — 1 new entity
extracted (necessaries-conveniencies-and-amusements-of-life),
41 existing entities correctly skipped by dedup. Canonical set
now at 42 unique entities.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Restructure entity storage from per-chapter subdirectories to a flat
canonical set in output/entities/. Each entity exists as a single file;
duplicates across chapters are detected by slug collision and skipped
(first occurrence wins). Chapter views use {{ include }} transclusion
to reference shared entity files.
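The first-occurrence-wins rule can be illustrated with a small in-memory sketch. The `slugify` rule and the `merge_entities` helper are assumptions made for the example; the real pipeline works on files in output/entities/ and may derive slugs differently.

```python
import re


def slugify(name: str) -> str:
    # Hypothetical slug rule: lowercase, runs of non-alphanumerics become "-".
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def merge_entities(chapters):
    """Build a flat canonical set; a slug collision keeps the first occurrence.

    chapters is an ordered list of (chapter_id, [entity names]).
    Returns (canonical slug -> first chapter, list of skipped duplicates).
    """
    canonical = {}
    skipped = []
    for chapter, names in chapters:
        for name in names:
            slug = slugify(name)
            if slug in canonical:
                skipped.append((chapter, slug))
            else:
                canonical[slug] = chapter
    return canonical, skipped
```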
Add @{existing_entities} macro to extract-entities template so the LLM
knows which entities already exist and focuses on genuinely new ones.
Factor _call_llm() out of _execute_llm() for callers that handle their
own file I/O. Result: 41 unique entities from 4 chapters (2 duplicates removed).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comprehensive walkthrough covering schema design, prompt templates,
artifact population, pipeline usage, LLM integration, git history
tracking, metrics, and how to complete the remaining 31 chapters.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
All 3 stages (entities, mappings, analysis) auto-generated.
1m53s wall time, 9,478 tokens (real), ~$0.07 est. cost.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Auto-generated mappings and analysis via Claude Code CLI adapter.
Entities were already present from a previous session.
Stats: 5m04s wall time, ~51K estimated tokens, ~$0.35 estimated cost.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implements markitect/llm/ package with concrete LLMAdapter implementations:
- OpenRouterAdapter: HTTP via urllib with retry/backoff on 429/5xx
- ClaudeCodeAdapter: subprocess-based Claude CLI with stdin piping
- Factory pattern: create_adapter("openrouter") or create_adapter("claude-code")
- API key resolution chain: constructor > env var > project-root key file
- 42 unit tests, 2 integration tests (gated on API key / CLI availability)
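The key resolution order (constructor > env var > key file) could be sketched like this. The function name `resolve_api_key`, the `EXAMPLE_API_KEY` variable name, and the parameters are illustrative assumptions, not the signatures used in markitect/llm/.

```python
import os
from pathlib import Path


def resolve_api_key(explicit=None, env_var="EXAMPLE_API_KEY", key_file=None, environ=None):
    """Return the first key found: explicit argument, then env var, then key file."""
    environ = os.environ if environ is None else environ
    if explicit:
        return explicit
    value = environ.get(env_var)
    if value:
        return value
    if key_file is not None:
        path = Path(key_file)
        if path.is_file():
            text = path.read_text(encoding="utf-8").strip()
            if text:
                return text
    return None  # callers decide whether a missing key is an error
```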
Also adds the infospace-with-history example, including a Wealth of Nations
VSM analysis pipeline, templates, schemas, source chapters, and processed
output for chapters 1-2. process_chapters.py now supports --provider
and --model flags for automatic LLM-driven processing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This example demonstrates the full workflow of generating InfoTech primers
using MarkiTect's Prompt Dependency Resolution infrastructure.
Features demonstrated:
- Artifact creation and storage with content-based addressing
- PromptTemplate with @{macro} resolution across multiple spaces
- Automatic dependency tracking and graph construction
- Provenance tracing from outputs back to inputs
- Visualization export (Mermaid format)
- Incremental execution with change detection
Files added:
- generate_primers.py: Complete working example
- README.md: Quick start guide and architecture overview
- TUTORIAL.md: Comprehensive 500+ line tutorial
- templates/generate-primer.md: Template with macros
- artifacts/topics/: ETL and Microservices topic definitions
- artifacts/guidelines/: Authoring rules and research protocol
- prepdr/: Original manual system (preserved for reference)
Example output:
- Generates 2 primers (ETL, Microservices)
- Creates 8 artifacts across 4 information spaces
- Records 8 dependency edges in SQLite database
- Exports dependency graph visualization
Run with: cd examples/content-generator && python generate_primers.py
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This commit closes the schema-evolution topic (260105) by adding the final
deliverable (ADR schema) and fixing markdown schema support across commands.
**ADR Schema Created**:
- Comprehensive Architecture Decision Record validation schema
- 12 section classifications (7 required, 2 recommended, 2 optional, 3 improper/discouraged)
- Content pattern validation for ADR formatting rules (status dates, decision statements, rationale structure)
- Quality metrics for completeness (word counts, sentence counts)
- Follows title case naming convention (Status, Context, Decision, etc.)
**Markdown Schema Support Fixed**:
- Fixed `markitect validate` command to support .md schemas
  - Added load_schema_from_path() for both .json and .md files
  - Updated structural and semantic validation to use schema dict
- Fixed `markitect generate-stub` command to support .md schemas
  - Uses load_schema_from_path() instead of direct JSON loading
- Created DocumentWrapper class in semantic_validator.py
  - Extracts headings from AST tokens (heading_open, inline)
  - Provides get_headings_by_level() interface expected by validators
  - Enables section validation to work with real documents
**Topic Closure**:
- Updated SCHEMA_EVOLUTION_WORKPLAN.md with completion summary
  - Phases 1-3: 100% complete (via Schema-of-Schemas and Semantic Validation)
  - Phase 4: Deferred as future enhancement (15-20 sessions)
  - Phase 5: 70% complete (docs done, CI/CD templates deferred)
- Created DONE.md with comprehensive task checklist
- Generated ADR template stub (examples/templates/adr-template.md)
- Moved topic from roadmap/ to history/260105-schema-evolution/
**Files Changed**:
- markitect/cli.py: Added markdown schema support to validate and generate-stub
- markitect/semantic_validator.py: Added DocumentWrapper class for AST parsing
- markitect/schemas/adr-schema-v1.0.md: New ADR validation schema (560 lines)
- examples/templates/adr-template.md: Generated ADR template stub
- history/260105-schema-evolution/: Moved completed topic to history
**Status**: Schema evolution topic successfully closed with ADR schema as final deliverable.
All schema commands now support markdown schemas. Section validation working correctly.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Updated example documentation to use the new schema-of-schemas standard
with markdown schema format and multi-schema validation commands.
**Manpage Example Updates:**
- Changed schema reference from markdown-manpage-schema.json to manpage-schema-v1.0.md
- Updated all commands to use new multi-schema validation syntax
- Added examples of number-based validation (markitect schema-validate 2)
- Added examples of batch validation (--all, ranges, lists)
- Updated integration examples (CI/CD, pre-commit hooks, Makefile)
- Documented schema registry workflow
**Terminology Example Updates:**
- Changed schema reference from terminology-schema.json to terminology-schema-v1.0.md
- Updated all validation commands to use new CLI syntax
- Added examples of schema-list and numbered selection
- Added batch validation examples
- Updated GitHub Actions and pre-commit hook examples
- Documented schema registry access methods
**Key Changes:**
- All schema filenames now follow {domain}-schema-v{major}.{minor}.md convention
- Commands use schema registry with numbered or filename selection
- Batch validation examples added throughout
- Integration examples updated to new standard
- Documentation reflects markdown-first schema format
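
As an illustration of the filename convention, a name could be checked and split like this. The regex and the helper name `parse_schema_filename` are assumptions for the example (in particular, that domains are lowercase words joined by hyphens); they are not part of the markitect CLI.

```python
import re

# Hypothetical pattern for {domain}-schema-v{major}.{minor}.md
SCHEMA_NAME_RE = re.compile(
    r"(?P<domain>[a-z0-9]+(?:-[a-z0-9]+)*)-schema-v(?P<major>\d+)\.(?P<minor>\d+)\.md"
)


def parse_schema_filename(name):
    """Return (domain, major, minor) or None if the name breaks the convention."""
    match = SCHEMA_NAME_RE.fullmatch(name)
    if match is None:
        return None
    return match["domain"], int(match["major"]), int(match["minor"])
```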
All schemas validated successfully against metaschema.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This commit completes Phase 2 of schema evolution work and establishes
a new example demonstrating schema usage for terminology documents.
## New Features
### Terminology Validation Example (examples/terminology/)
- Complete example terminology document with proper structure
- JSON schema with MarkiTect extensions for validation
- Demonstrates schema usage beyond manpages (glossaries, lexicons)
- Validates term structure: Definition, Synonyms, Related Terms, Examples
- Includes content control and quality validation rules
- Full documentation with usage examples and best practices
### Schema Registration System
- Registered terminology schema in markitect database
- Created schema catalog (markitect/schemas/schema-catalog.yaml)
- Copied schema to official location (markitect/schemas/)
- Provides metadata, features, and usage info for all schemas
### Improved schema-list Command
- Now displays creation timestamps in default output
- Table format includes Created/Updated columns
- Cleaner timestamp formatting (removed microseconds)
- Better visibility into when schemas were added
## Files Changed
Added:
- examples/terminology/README.md - Complete documentation
- examples/terminology/terminology-example.md - Example glossary
- examples/terminology/terminology-schema.json - Validation schema
- markitect/schemas/terminology-schema.json - Registered schema
- markitect/schemas/schema-catalog.yaml - Schema registry
Modified:
- markitect/cli.py - Enhanced schema-list with timestamps
- TODO.md - Documented Phase 2 completion and new example
Moved:
- SCHEMA_EVOLUTION_WORKPLAN.md → todo/ directory
## Schema Features Demonstrated
- Heading hierarchy validation (H1 → H2 → H3)
- Term structure validation with required/optional fields
- Content quality metrics (word counts, readability targets)
- MarkiTect extensions (x-markitect-sections, x-markitect-content-control)
- Classification system (required/recommended/optional/discouraged/improper)
## Usage
```bash
# List schemas with timestamps
markitect schema-list
# Validate terminology document
markitect validate glossary.md --schema terminology-schema.json
# View in table format
markitect schema-list --format table
```
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Comprehensively document the new classification system and content control
features added in Phase 1.
## Documentation Updates
### New Content Added
**1. Updated MarkiTect Extensions Section**
- Replaced deprecated x-markitect-required/recommended-sections
- Documented x-markitect-sections with five classification levels
- Documented x-markitect-content-control for content validation
**2. Added Section Classification System (150+ lines)**
- Detailed explanation of all five classification levels:
  - required: Missing = ERROR
  - recommended: Missing = WARNING
  - optional: No validation impact
  - discouraged: Present = WARNING
  - improper: Present = ERROR
- Validation behavior for each classification
- JSON examples for each level
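
The five levels map naturally onto a small lookup table. The sketch below is a simplified illustration: `RULES` and `check_sections` are hypothetical names, and the real validator reads classifications from x-markitect-sections rather than a plain dict.

```python
# classification -> (state that triggers a finding, severity)
RULES = {
    "required": ("missing", "ERROR"),
    "recommended": ("missing", "WARNING"),
    "optional": (None, None),  # never produces a finding
    "discouraged": ("present", "WARNING"),
    "improper": ("present", "ERROR"),
}


def check_sections(classified, present):
    """Return (severity, section, state) findings.

    classified maps section name -> classification level;
    present is the set of section names found in the document.
    """
    findings = []
    for section, level in classified.items():
        trigger, severity = RULES[level]
        state = "present" if section in present else "missing"
        if trigger == state:
            findings.append((severity, section, state))
    return findings
```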
**3. Added Content Control Documentation**
- Pattern validation (required/discouraged/forbidden)
- Content quality metrics (word count, readability targets)
- Content instructions for authors
- Complete examples with explanations
**4. Updated Schema Design Best Practices**
- Replaced old extension examples with new classification system
- Added guidance on choosing appropriate classifications
- Examples showing required, recommended, optional, discouraged, improper
**5. Added Classification System Example**
- Complete working schema demonstrating all features
- Validation scenarios showing different outcomes
- Integration of sections and content-control extensions
## Changes Summary
**Lines Added**: ~200 lines of new documentation
**Sections Updated**: 4 major sections
**Examples Added**: 8 new code examples
**Key Topics Covered**:
- Five-level classification system (required → improper)
- Content pattern validation
- Quality metrics and readability targets
- Content instructions for document authors
- Validation behavior for each classification
- Complete working examples
## Validation
✅ Manual validates against improved markdown-manpage-schema.json
✅ All new features documented with examples
✅ Backward compatibility maintained
✅ Self-documenting: manual uses the features it documents
The manual now comprehensively documents the Phase 1 enhanced schema
system while itself validating against a schema using those features.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add a comprehensive example showcasing schema validation with a
self-documenting manpage system:
- markdown-manpage-schema.json: Reusable schema for Unix manpage structure
- markdown-schema-validation.1.md: Complete manual about schema validation
- README.md: Usage guide, integration examples, and best practices
- SCHEMA_EVOLUTION_WORKPLAN.md: Roadmap for enhanced schema system
The manual validates against its own schema, demonstrating the dogfooding
principle. The workplan outlines a 5-phase evolution from rigid structural
validation to flexible content control with blueprints.
Key features demonstrated:
- Schema-driven documentation structure
- Self-validating documentation
- Reusable validation patterns
- Classification system design (required/recommended/optional/discouraged/improper)
This sets the foundation for the Phase 1 implementation: an enhanced schema
format with section classification and content control.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add Design Pattern Documentation:
- Add CopyFirstMigration.md - Documents the copy-first migration principle
used in the TestDrive-JSUI capability migration
- Add DontRepeatYourself.md - Documents the DRY principle
- Add DesignPrincipleSchema.json - JSON schema for design pattern documentation
Update Submodule:
- Update testdrive-jsui submodule pointer to include Phase 4 documentation
(migration completion with legacy file cleanup)
Context:
These design pattern examples document the principles applied during the
successful TestDrive-JSUI migration, which serves as a reference implementation
of the copy-first migration pattern.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Asset Management System (Issue #142):
- Add complete asset management framework with deduplication
- Implement AssetManager, AssetRegistry, and AssetDeduplicator classes
- Add AssetPackager for markdown document packaging
- Create comprehensive test suite for all asset management components
- Add asset constants and custom exceptions for robust error handling
Markdown Processing Enhancements:
- Update markdown_commands.py with improved functionality
- Enhance parsing and content aggregation capabilities
- Improve filename encoding/decoding for special characters
Test Suite Improvements:
- Add comprehensive tests for Issue #138 markdown parsing
- Enhance Issue #139 content aggregation and end-to-end testing
- Complete test coverage for new asset management features
Examples and Documentation:
- Update BildungsKanonJon.md example with enhanced content
- Generate corresponding HTML output for documentation
- Add asset registry configuration
Development Tools:
- Add install script for simplified setup
This commit represents a major enhancement to MarkiTect's asset handling
capabilities with full test coverage and improved markdown processing.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Comprehensive analysis and implementation concepts for handling images and
file includes with automatic deduplication, based on a study of the
MarkdownPackageFormats wiki.
## Two Complete Concepts Delivered
### Concept A: Hash-Based Asset Store
- Content-addressable storage using SHA-256 hashes
- SQLite database for virtual name mapping and metadata
- Perfect deduplication regardless of filename
- Hash-based directory structure for optimal storage
- Working prototype with 47 KB of implementation code
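
A minimal sketch of the content-addressable idea behind Concept A. The class name `HashStore`, the two-character directory fan-out, and the in-memory name map are simplifying assumptions for illustration; the prototype described here keeps the name mapping in SQLite.

```python
import hashlib
from pathlib import Path


class HashStore:
    """Store each distinct content once, addressed by its SHA-256 digest."""

    def __init__(self, root):
        self.root = Path(root)
        self.names = {}  # virtual name -> digest (stand-in for the SQLite mapping)

    def _path(self, digest):
        # Hypothetical layout: first two hex chars form a subdirectory.
        return self.root / digest[:2] / digest

    def add(self, name, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        path = self._path(digest)
        if not path.exists():  # identical content is written only once
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_bytes(data)
        self.names[name] = digest
        return digest

    def read(self, name) -> bytes:
        return self._path(self.names[name]).read_bytes()
```

Two different filenames with identical bytes resolve to the same stored file, which is the deduplication property listed above.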
### Concept B: Package + Symlinks System (RECOMMENDED)
- ZIP-based .mdpkg packages following wiki standards
- Symlink-based deduplication in shared asset library
- Compatible with standard tools and workflows
- Visual transparency and tool integration
- Working prototype with 51 KB of implementation code
## Key Features Demonstrated
- ✅ Content deduplication: Same image content → single storage
- ✅ Multiple names: Different filenames for identical content
- ✅ Database integration: Asset metadata queryable and indexed
- ✅ Package portability: ZIP-based distribution format
- ✅ Working demos: Both concepts fully functional
## Analysis Results
- **Perfect Deduplication**: Both concepts eliminate duplicate content storage
- **Implementation Complexity**: Concept B more approachable, Concept A more efficient
- **Platform Compatibility**: Concept A universal, Concept B symlink-dependent
- **User Experience**: Concept B familiar workflows, Concept A requires tooling
## Technical Approach
- Based on MarkdownPackageFormats wiki standards (.mdpkg, .mdz formats)
- Python standard library (hashlib, sqlite3, zipfile, pathlib)
- Content-addressable storage patterns for efficiency
- Manifest-based metadata for package integrity
## Recommendations
1. **Start with Concept B** for rapid prototyping and user acceptance
2. **Evolve to hybrid approach** incorporating Concept A's hash-based efficiency
3. **Follow .mdpkg standards** for interoperability with emerging ecosystem
4. **Implement CLI integration** for seamless markitect workflow
Both concepts solve the core requirements with working prototypes and clear trade-offs.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- Created invoice template demonstrating business document requirements
- Added design pattern example showing knowledge management use case
- Included sample data file for template + data scenarios
- Comprehensive gap analysis identifying 6 critical tooling limitations
- Documented 3-phase development roadmap for enhanced capabilities
- Based on Issue #63 use case brainstorming requirements
Key gaps identified:
1. Template engine for dynamic document generation
2. Calculation system for mathematical operations
3. Batch processing for multi-document workflows
4. External data integration capabilities
5. Cross-document relationship management
6. Advanced output format support
Ready for requirements engineering and epic decomposition.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
## Major Integration
- ✅ Integrated Requirements Engineering Agent into development workflow
- ✅ Enhanced Makefile with requirements validation targets
- ✅ Added pre-commit validation with mock compatibility checking
- ✅ Enhanced TDD workflow to include foundation analysis
## Test Fixes
- ✅ Fixed GiteaPlugin missing _add_comment_async method
- ✅ Fixed LocalPlugin config.yml file not found errors in tests
- ✅ Enhanced mock objects in CLI tests with proper domain model attributes
- ✅ All Issue #59 tests now passing (38/38 tests pass)
## New Capabilities
- `make validate-requirements` - Foundation analysis before development
- `make check-interface-compatibility INTERFACE=Name` - Interface compatibility checking
- `make generate-dev-checklist FEATURE='Name'` - Development checklist generation
- `make validate-mocks` - Mock object compatibility validation
- `make pre-commit-validate` - Complete pre-commit validation workflow
## Problem Prevention
This integration is intended to prevent the kind of interface compatibility
issues and mock object mismatches that caused hours of debugging in Issue #59.
The Requirements Engineering Agent runs foundation analysis up front so that
such problems surface before development starts.
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>