tegwick 7c38f9b427 merge(reprocess-v2): complete pipeline rewrite and full corpus processing
Merges the reprocess-v2 branch into main, covering:

Infrastructure changes:
- markitect infospace process — new CLI command for batch source processing
- SourcePipeline — @{macro} substitution, skip-if-exists, git commit per source
- PipelineStage config extended with name, output_dir, output_macro,
  split_entities, macros, max_tokens fields
- Per-stage max_tokens (extract=8k, map-to-vsm=10k, synthesize=4k)
- LLM provenance comment in each new entity file
- output/processing-log.yaml with per-source token/cost/duration/retry stats
- Retry on all LLM errors (not just rate limits) with 5s back-off
- C2 coverage: add domain_densities, density_std, cross_cutting_ratio
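
A minimal sketch of two behaviours listed above: `@{macro}` substitution and retrying on any error with a fixed 5-second back-off. The function names, the regular expression, the attempt limit, and the choice to leave unknown macros untouched are all assumptions made for illustration. This is not the actual `SourcePipeline` code.

```python
import re
import time
from typing import Callable, Mapping, TypeVar

T = TypeVar("T")

# Assumed macro syntax: @{name}, where name is letters, digits, or underscores.
MACRO_PATTERN = re.compile(r"@\{(\w+)\}")


def substitute_macros(template: str, macros: Mapping[str, str]) -> str:
    """Replace each @{name} with macros[name]; unknown names are left as-is."""
    def replace(match: "re.Match[str]") -> str:
        return macros.get(match.group(1), match.group(0))

    return MACRO_PATTERN.sub(replace, template)


def call_with_retry(
    fn: Callable[[], T],
    attempts: int = 3,                 # hypothetical limit, not from the source
    backoff_seconds: float = 5.0,      # the 5s back-off named in the commit
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on any exception with a fixed pause between attempts."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(backoff_seconds)
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the back-off testable without real delays.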

Example (infospace-with-history):
- All 35 chapters processed: 1021 entities across Books 1–5
- Per-chapter git commits showing metric evolution from 0 → final state
- Final metrics: coverage=0.44, granularity=2.95, redundancy=0.006
- METRICS-METHODOLOGY.md C2 section corrected and expanded

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-20 00:11:39 +01:00

MarkiTect Documentation

Welcome to the MarkiTect documentation. This directory contains comprehensive documentation for developers, users, and contributors.

Documentation Structure

📐 Architecture Documentation (architecture/)

Deep technical documentation about system design, performance, and implementation details.

  • Capabilities Architecture - Critical: how capabilities work as independent git submodules and how separation of concerns is maintained
  • Caching System - Why and how MarkiTect's AST caching delivers 60-85% performance improvements
  • Coming soon: Database Schema, CLI Architecture

👥 User Guides (user-guides/)

End-user documentation for working with MarkiTect CLI and features.

  • Coming soon: Getting Started, Command Reference, Best Practices

🔧 Development Documentation (development/)

Documentation for contributors and developers extending MarkiTect.

  • Coming soon: Contributing Guide, Testing Strategy, Release Process

For Users

For Developers

Project Management

Key Concepts

Core Architecture Principles

  1. Parse Once, Use Many Times - AST caching for 60-85% performance improvement
  2. Convention Over Configuration - Sensible defaults with minimal setup
  3. Schema-Driven Processing - Structured markdown with validation
  4. Relational Metadata - Database-powered document relationships
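
The "Parse Once, Use Many Times" principle can be sketched as a cache keyed by a hash of the document text. This is an illustrative sketch only: `AstCache`, `parse_markdown`, and the in-memory dictionary are hypothetical stand-ins, not MarkiTect's actual caching API or storage.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

# Stand-in AST: a list of (node_type, text) pairs.
Ast = List[Tuple[str, str]]


def parse_markdown(text: str) -> Ast:
    """Toy parser: classify each non-blank line as a heading or a paragraph."""
    ast: Ast = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        kind = "heading" if stripped.startswith("#") else "paragraph"
        ast.append((kind, stripped.lstrip("# ").strip()))
    return ast


class AstCache:
    """Parse each distinct document text once and reuse the result."""

    def __init__(self, parser: Callable[[str], Ast]) -> None:
        self._parser = parser
        self._entries: Dict[str, Ast] = {}
        self.hits = 0
        self.misses = 0

    def get(self, text: str) -> Ast:
        # Key on content, so an edited document is re-parsed automatically.
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key in self._entries:
            self.hits += 1
        else:
            self.misses += 1
            self._entries[key] = self._parser(text)
        return self._entries[key]
```

Keying on content rather than file path means a changed file never returns a stale AST, at the cost of hashing the text on every lookup.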

Performance Philosophy

MarkiTect treats markdown documents as structured, queryable data rather than plain text. This approach enables:

  • Fast document processing through AST caching
  • Complex querying and relationship management
  • Schema validation and consistency enforcement
  • Performance that scales as your content grows
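
To illustrate what treating documents as queryable data might look like, here is a small sketch using Python's standard `sqlite3` module. The table names, columns, and the `backlinks` helper are assumptions for illustration and do not reflect MarkiTect's real database schema.

```python
import sqlite3
from typing import Iterable, List, Tuple


def build_index(
    documents: Iterable[Tuple[str, str]],   # (path, title)
    links: Iterable[Tuple[str, str]],       # (source path, target path)
) -> sqlite3.Connection:
    """Load document metadata and links into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents (path TEXT PRIMARY KEY, title TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE links ("
        " source TEXT NOT NULL REFERENCES documents(path),"
        " target TEXT NOT NULL REFERENCES documents(path))"
    )
    conn.executemany("INSERT INTO documents VALUES (?, ?)", documents)
    conn.executemany("INSERT INTO links VALUES (?, ?)", links)
    return conn


def backlinks(conn: sqlite3.Connection, path: str) -> List[str]:
    """Return titles of all documents that link to the given path."""
    rows = conn.execute(
        "SELECT d.title FROM links AS l"
        " JOIN documents AS d ON d.path = l.source"
        " WHERE l.target = ? ORDER BY d.title",
        (path,),
    )
    return [title for (title,) in rows]
```

With relationships stored as rows, a question such as "which documents link here?" becomes a single join instead of a scan over every file.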

Contributing to Documentation

Documentation follows the same quality standards as code:

  1. Clear Structure - Logical organization and navigation
  2. Practical Examples - Real-world usage patterns
  3. Performance Context - Why architectural decisions matter
  4. User-Focused - Written for the intended audience

Documentation Standards

  • Use clear, concise language
  • Include practical examples
  • Explain the "why" behind design decisions
  • Treat technical accuracy as the highest priority
  • Update docs when changing functionality

This documentation is maintained alongside the codebase. For the most current information, always refer to the latest version in the repository.

Description
An advanced markdown engine
https://coulomb.social/open/MarkiTect
Languages
Python 84.7%
JavaScript 8%
HTML 5.6%
Makefile 1.3%
Shell 0.2%
Other 0.1%