7c38f9b427b0fd379fc6eafc1a26b338aa0c6eb0
Merges the reprocess-v2 branch into main, covering:
Infrastructure changes:
- markitect infospace process — new CLI command for batch source processing
- SourcePipeline — @{macro} substitution, skip-if-exists, git commit per source
- PipelineStage config extended with name, output_dir, output_macro, split_entities, macros, max_tokens fields
- Per-stage max_tokens (extract=8k, map-to-vsm=10k, synthesize=4k)
- LLM provenance comment in each new entity file
- output/processing-log.yaml with per-source token/cost/duration/retry stats
- Retry on all LLM errors (not just rate limits) with 5s back-off
- C2 coverage: add domain_densities, density_std, cross_cutting_ratio
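As a rough illustration of two items in the list above, the `@{macro}` substitution and the retry on any LLM error with a fixed 5-second back-off might look like the sketch below. The names `expand_macros` and `call_with_retry`, the `max_attempts` parameter, and the choice to leave unknown macros untouched are assumptions made for illustration; none of them are taken from the MarkiTect source.

```python
import re
import time

# Matches @{name} where name is letters, digits, or underscores.
MACRO_PATTERN = re.compile(r"@\{(\w+)\}")


def expand_macros(template, macros):
    """Replace each @{name} with macros[name].

    Unknown names are left in place (an assumption, not confirmed behaviour).
    """
    def _substitute(match):
        name = match.group(1)
        return str(macros.get(name, match.group(0)))

    return MACRO_PATTERN.sub(_substitute, template)


def call_with_retry(func, max_attempts=3, backoff_seconds=5.0, sleep=time.sleep):
    """Call func(); on any exception, wait backoff_seconds and try again.

    Retries on every exception type, not only rate-limit errors.
    Re-raises the last error once max_attempts is exhausted.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception as exc:
            last_error = exc
            if attempt < max_attempts:
                sleep(backoff_seconds)
    raise last_error
```

Passing `sleep` as a parameter keeps the sketch testable without real delays; a hypothetical call such as `expand_macros("out/@{stage}/@{source}.md", {"stage": "extract", "source": "ch01"})` would produce an output path for one stage and source.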
Example (infospace-with-history):
- All 35 chapters processed: 1021 entities across Books 1–5
- Per-chapter git commits showing metric evolution from 0 → final state
- Final metrics: coverage=0.44, granularity=2.95, redundancy=0.006
- METRICS-METHODOLOGY.md C2 section corrected and expanded
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
MarkiTect Documentation
Welcome to the MarkiTect documentation. This directory contains comprehensive documentation for developers, users, and contributors.
Documentation Structure
📐 Architecture Documentation (architecture/)
Deep technical documentation about system design, performance, and implementation details.
- Capabilities Architecture - Critical: how capabilities work as independent git submodules, and how concerns are separated between them
- Caching System - Why and how MarkiTect's AST caching delivers 60-85% performance improvements
- Coming soon: Database Schema, CLI Architecture
👥 User Guides (user-guides/)
End-user documentation for working with MarkiTect CLI and features.
- Coming soon: Getting Started, Command Reference, Best Practices
🔧 Development Documentation (development/)
Documentation for contributors and developers extending MarkiTect.
- Coming soon: Contributing Guide, Testing Strategy, Release Process
Quick Links
For Users
- Installation & Setup
- Command Reference (coming soon)
- Performance Guide (coming soon)
For Developers
- Architecture Overview - System design and component relationships
- Development Setup - Local development environment
- API Documentation (coming soon)
Project Management
- Project Status - Current development status
- Roadmap - Strategic development plan
- Current Tasks - Task management using Keep a Todofile format
Key Concepts
Core Architecture Principles
- Parse Once, Use Many Times - AST caching for 60-85% performance improvement
- Convention Over Configuration - Sensible defaults with minimal setup
- Schema-Driven Processing - Structured markdown with validation
- Relational Metadata - Database-powered document relationships
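The "Parse Once, Use Many Times" principle can be sketched as a cache keyed by a hash of the document text, so that unchanged content is never parsed twice. This is a minimal sketch under stated assumptions: `ParseCache` and `parse_headings` are hypothetical names, the in-memory dictionary stands in for whatever storage MarkiTect actually uses, and the heading extractor is a toy substitute for a real markdown parser.

```python
import hashlib
import re

# ATX headings only: one to six '#' characters followed by a title.
HEADING = re.compile(r"^(#{1,6})[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def parse_headings(text):
    """Toy stand-in for a markdown parser: return (level, title) pairs."""
    return [(len(hashes), title) for hashes, title in HEADING.findall(text)]


class ParseCache:
    """Cache parse results by the SHA-256 of the source text (hypothetical design)."""

    def __init__(self, parse_fn):
        self._parse_fn = parse_fn
        self._entries = {}
        self.hits = 0
        self.misses = 0

    def get(self, text):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key in self._entries:
            self.hits += 1
            return self._entries[key]
        # Cache miss: parse once and remember the result for later calls.
        self.misses += 1
        result = self._parse_fn(text)
        self._entries[key] = result
        return result
```

Keying on the content hash rather than the file path means a renamed but unchanged file still hits the cache, while any edit to the text forces a fresh parse.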
Performance Philosophy
MarkiTect treats markdown documents as structured, queryable data rather than plain text. This approach enables:
- Fast document processing through AST caching
- Complex querying and relationship management
- Schema validation and consistency enforcement
- Performance that scales as your content grows
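To make "structured, queryable data" concrete, the sketch below indexes markdown headings into an in-memory SQLite table and then queries across documents. The table layout and the function names `index_documents` and `docs_with_heading` are assumptions for illustration only; they do not reflect MarkiTect's actual database schema.

```python
import re
import sqlite3

# ATX headings only: one to six '#' characters followed by a title.
HEADING = re.compile(r"^(#{1,6})[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def index_documents(docs):
    """Store every heading of every document in a hypothetical 'headings' table.

    docs maps a document name to its markdown text.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE headings (doc TEXT, level INTEGER, title TEXT)")
    for name, text in docs.items():
        rows = [(name, len(hashes), title) for hashes, title in HEADING.findall(text)]
        conn.executemany("INSERT INTO headings VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def docs_with_heading(conn, title):
    """Return the sorted names of documents that contain a heading with this title."""
    cursor = conn.execute(
        "SELECT DISTINCT doc FROM headings WHERE title = ? ORDER BY doc",
        (title,),
    )
    return [row[0] for row in cursor]
```

Once headings live in a table, questions such as "which documents have a Setup section" become a single SQL query instead of a scan over raw text files.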
Contributing to Documentation
Documentation follows the same quality standards as code:
- Clear Structure - Logical organization and navigation
- Practical Examples - Real-world usage patterns
- Performance Context - Why architectural decisions matter
- User-Focused - Written for the intended audience
Documentation Standards
- Use clear, concise language
- Include practical examples
- Explain the "why" behind design decisions
- Keep technical accuracy as the highest priority
- Update docs when changing functionality
This documentation is maintained alongside the codebase. For the most current information, always refer to the latest version in the repository.
Releases: 1 (latest: MarkiTect 0.8.0)
Languages: Python 84.7%, JavaScript 8%, HTML 5.6%, Makefile 1.3%, Shell 0.2%, Other 0.1%