e3e5b8ecc1ffd41847141bf178bb90098ff483ac
Three coordinated changes that let the pipeline produce a clean chapter-by-chapter git history on long texts without archaeology after the fact.

1. Richer commit messages. `SourcePipeline._git_commit` now diffs the staged changes, buckets added files by output subdirectory (entities, evaluations, classifications, mappings, analyses, metrics, logs), and includes counts in the commit body. So `git log` reads "entities: +23, evaluations: +23" per chapter instead of the same generic blurb on every commit. Zero behaviour change when no output changed; falls back to the original message if the diff query fails.

2. --eval-after-source / --classify-after-source on `infospace process`. After a source's stages succeed, the pipeline identifies which entity files are *new* (set diff of entity slugs before vs after), loads their EntityMeta, and runs per-entity evaluation and/or classification scoped to just those slugs before the per-source git commit lands. Result: each chapter's commit is self-contained, with extraction + evaluation + classification in one atomic unit. Gated behind explicit flags because the cost is real (LLM latency per chapter rather than amortised across one bulk batch).

3. `markitect infospace chapters` subcommand. Lists source files in canonical order with entity count, evaluated count, classified count, and mean per-entity score per source. Text or JSON output. Natural triage surface for long-text infospaces: spot chapters that under-extracted or evaluated poorly.

Also: `docs/advanced-usage.md` gets a new "Systematic processing of long texts" section with the recommended flag combo and the tradeoff note on cost.

11 new unit tests cover the chapters command (text/json/no-sources), the process flag wiring (help + provider requirement), and the commit-body bucket logic. Full infospace+llm unit suite (315 tests) green; 3 pre-existing infospace failures unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
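
For readers who want a concrete picture of the bucket counting in change 1 and the new-slug set difference in change 2, here is a minimal sketch. It is not the code in `SourcePipeline._git_commit`: the function names (`bucket_added_files`, `commit_body`, `new_entity_slugs`) and the path layout are hypothetical, and only the bucket names and the "entities: +23" message shape come from the commit message. The list of added paths is assumed to come from something like `git diff --cached --name-only --diff-filter=A`.

```python
from collections import Counter
from pathlib import PurePosixPath

# Bucket names are taken from the commit message; everything else here
# (function names, path layout) is an illustrative assumption.
BUCKETS = ("entities", "evaluations", "classifications", "mappings",
           "analyses", "metrics", "logs")


def bucket_added_files(added_paths):
    """Count added files per output subdirectory, reported in BUCKETS order."""
    counts = Counter()
    for path in added_paths:
        # Look only at directory components, not the file name itself.
        for part in PurePosixPath(path).parts[:-1]:
            if part in BUCKETS:
                counts[part] += 1
                break
    return {name: counts[name] for name in BUCKETS if counts[name]}


def commit_body(added_paths):
    """Render a summary line such as 'entities: +23, evaluations: +23'."""
    counts = bucket_added_files(added_paths)
    return ", ".join(f"{name}: +{n}" for name, n in counts.items())


def new_entity_slugs(before, after):
    """Slugs present after a source was processed but not before it."""
    return sorted(set(after) - set(before))
```

An empty string from `commit_body` corresponds to the "no output changed" case, where the caller would keep the original generic message.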
MarkiTect Documentation
Welcome to the MarkiTect documentation. This directory contains comprehensive documentation for developers, users, and contributors.
Documentation Structure
📐 Architecture Documentation (architecture/)
Deep technical documentation about system design, performance, and implementation details.
- Capabilities Architecture - Critical: How capabilities work as independent git submodules and separation of concerns
- Caching System - Why and how MarkiTect's AST caching delivers 60-85% performance improvements
- Coming soon: Database Schema, CLI Architecture
👥 User Guides (user-guides/)
End-user documentation for working with MarkiTect CLI and features.
- Coming soon: Getting Started, Command Reference, Best Practices
🔧 Development Documentation (development/)
Documentation for contributors and developers extending MarkiTect.
- Coming soon: Contributing Guide, Testing Strategy, Release Process
Quick Links
For Users
- Installation & Setup
- Command Reference (coming soon)
- Performance Guide (coming soon)
For Developers
- Architecture Overview - System design and component relationships
- Development Setup - Local development environment
- API Documentation (coming soon)
Project Management
- Project Status - Current development status
- Roadmap - Strategic development plan
- Current Tasks - Task management using Keep a Todofile format
Key Concepts
Core Architecture Principles
- Parse Once, Use Many Times - AST caching for 60-85% performance improvement
- Convention Over Configuration - Sensible defaults with minimal setup
- Schema-Driven Processing - Structured markdown with validation
- Relational Metadata - Database-powered document relationships
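
As an illustration of the "Parse Once, Use Many Times" principle, the sketch below caches a parse result under a hash of the document text, so repeated requests for unchanged content skip the parser. It is a simplified stand-in, not MarkiTect's caching system: `ParseCache` and `toy_parse` are hypothetical names, the toy parser only collects heading lines, and the in-memory dictionary is an assumption (the real cache design is described in the Caching System document).

```python
import hashlib


class ParseCache:
    """Cache parse results keyed by a SHA-256 hash of the document text."""

    def __init__(self, parse):
        self._parse = parse   # any callable: text -> parsed structure
        self._store = {}      # hash -> parsed structure (in-memory assumption)
        self.misses = 0       # number of times the parser actually ran

    def get(self, text):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._parse(text)
        return self._store[key]


def toy_parse(text):
    """Stand-in parser: return the heading lines of a markdown string."""
    return [line for line in text.splitlines() if line.startswith("#")]
```

Keying on content rather than on file path or modification time means an edited document is re-parsed automatically, while a renamed or re-saved but unchanged document still hits the cache.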
Performance Philosophy
MarkiTect treats markdown documents as structured, queryable data rather than plain text. This approach enables:
- Faster document processing through AST caching (the 60-85% improvement cited above)
- Complex querying and relationship management
- Schema validation and consistency enforcement
- Performance that holds up as your content grows
Contributing to Documentation
Documentation follows the same quality standards as code:
- Clear Structure - Logical organization and navigation
- Practical Examples - Real-world usage patterns
- Performance Context - Why architectural decisions matter
- User-Focused - Written for the intended audience
Documentation Standards
- Use clear, concise language
- Include practical examples
- Explain the "why" behind design decisions
- Keep technical accuracy as the highest priority
- Update docs when changing functionality
This documentation is maintained alongside the codebase. For the most current information, always refer to the latest version in the repository.
Releases: 1 (latest: MarkiTect 0.8.0)
Languages: Python 84.7%, JavaScript 8%, HTML 5.6%, Makefile 1.3%, Shell 0.2%, Other 0.1%