tegwick e3e5b8ecc1
feat(infospace): systematic long-text processing — rich commit bodies, per-source eval/classify, chapters view
Three coordinated changes that let the pipeline produce a clean
chapter-by-chapter git history on long texts without archaeology after
the fact.

1. Richer commit messages. `SourcePipeline._git_commit` now diffs the
   staged changes, buckets added files by output subdirectory (entities,
   evaluations, classifications, mappings, analyses, metrics, logs), and
   includes counts in the commit body. So `git log` reads "entities:
   +23, evaluations: +23" per chapter instead of the same generic blurb
   on every commit. Zero behaviour change when no output changed; falls
   back to the original message if the diff query fails.

2. `--eval-after-source` / `--classify-after-source` on `infospace process`.
   After a source's stages succeed, the pipeline identifies which entity
   files are *new* (set diff of entity slugs before vs after), loads
   their EntityMeta, and runs per-entity evaluation and/or
   classification scoped to just those slugs before the per-source git
   commit lands. Result: each chapter's commit is self-contained —
   extraction + evaluation + classification in one atomic unit. Gated
   behind explicit flags because the cost is real (LLM latency per
   chapter rather than amortised across one bulk batch).

3. `markitect infospace chapters` subcommand. Lists source files in
   canonical order with entity count, evaluated count, classified
   count, and mean per-entity score per source. Text or JSON output.
   Natural triage surface for long-text infospaces — spot chapters that
   under-extracted or evaluated poorly.
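
The new-entity detection described in item 2 (a set difference of entity slugs before and after a source runs) can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: the helper names `entity_slugs` and `new_entity_slugs` are hypothetical, and it assumes entity files are `.md` files whose stem is the slug.

```python
from pathlib import Path


def entity_slugs(entities_dir: Path) -> set[str]:
    """Return the slugs (file stems) of entity files in a directory.

    Assumption: one `.md` file per entity, named `<slug>.md`.
    """
    if not entities_dir.is_dir():
        return set()
    return {p.stem for p in entities_dir.glob("*.md")}


def new_entity_slugs(before: set[str], after: set[str]) -> list[str]:
    """Slugs present after a source ran but not before, in sorted order."""
    return sorted(after - before)
```

Taking one snapshot before the source's stages and one after keeps the per-source evaluation and classification scoped to just the slugs that source produced.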

Also: `docs/advanced-usage.md` gets a new "Systematic processing of
long texts" section with the recommended flag combo and the tradeoff
note on cost.

11 new unit tests cover the chapters command (text/json/no-sources),
the process flag wiring (help + provider requirement), and the
commit-body bucket logic. Full infospace+llm unit suite (315 tests)
green; 3 pre-existing infospace failures unchanged.
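
For reference, the commit-body bucket logic from item 1 might look roughly like the sketch below. It is an illustration under assumptions, not the code in `SourcePipeline._git_commit`: the function names are hypothetical, the input is assumed to be the text printed by `git diff --cached --name-status`, and a file is assigned to the first directory in its path that matches a bucket name.

```python
from collections import Counter
from pathlib import PurePosixPath

# Bucket names taken from the commit message; the matching rule is an assumption.
BUCKETS = ("entities", "evaluations", "classifications",
           "mappings", "analyses", "metrics", "logs")


def bucket_added_files(name_status: str) -> Counter:
    """Count added files per bucket from `git diff --cached --name-status` text."""
    counts: Counter = Counter()
    for line in name_status.splitlines():
        status, _, path = line.partition("\t")
        if status != "A" or not path:
            continue  # only newly added files are counted
        for part in PurePosixPath(path).parts[:-1]:  # directory components only
            if part in BUCKETS:
                counts[part] += 1
                break
    return counts


def commit_body(counts: Counter) -> str:
    """Render non-empty buckets in a fixed order; empty string if nothing was added."""
    return ", ".join(f"{b}: +{counts[b]}" for b in BUCKETS if counts[b])
```

An empty result from `commit_body` is the natural signal to keep the original message, which matches the "zero behaviour change when no output changed" note above.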

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-22 08:24:26 +02:00

MarkiTect Documentation

Welcome to the MarkiTect documentation. This directory contains comprehensive documentation for developers, users, and contributors.

Documentation Structure

📐 Architecture Documentation (architecture/)

Deep technical documentation about system design, performance, and implementation details.

  • Capabilities Architecture - Critical: how capabilities work as independent git submodules, and how concerns are separated between them
  • Caching System - Why and how MarkiTect's AST caching delivers 60-85% performance improvements
  • Coming soon: Database Schema, CLI Architecture

👥 User Guides (user-guides/)

End-user documentation for working with the MarkiTect CLI and its features.

  • Coming soon: Getting Started, Command Reference, Best Practices

🔧 Development Documentation (development/)

Documentation for contributors and developers extending MarkiTect.

  • Coming soon: Contributing Guide, Testing Strategy, Release Process

For Users

For Developers

Project Management

Key Concepts

Core Architecture Principles

  1. Parse Once, Use Many Times - AST caching for 60-85% performance improvement
  2. Convention Over Configuration - Sensible defaults with minimal setup
  3. Schema-Driven Processing - Structured markdown with validation
  4. Relational Metadata - Database-powered document relationships
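
To make the "Parse Once, Use Many Times" principle concrete, here is a minimal sketch of a parse cache keyed by a hash of the document text. It is an illustration only and does not reflect MarkiTect's actual caching system: `ParseCache` and `toy_parse` are hypothetical names, and the stand-in parser merely collects heading lines.

```python
import hashlib
from typing import Callable


class ParseCache:
    """Cache parse results keyed by a SHA-256 hash of the source text."""

    def __init__(self, parse: Callable[[str], object]) -> None:
        self._parse = parse
        self._store: dict[str, object] = {}
        self.misses = 0  # number of times the parser actually ran

    def get(self, text: str) -> object:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._parse(text)
        return self._store[key]


def toy_parse(text: str) -> list[str]:
    """Stand-in parser: collect heading lines. Not MarkiTect's parser."""
    return [line for line in text.splitlines() if line.startswith("#")]
```

Unchanged text hashes to the same key, so repeat requests reuse the stored result; any edit to the text produces a new key and triggers a fresh parse.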

Performance Philosophy

MarkiTect treats markdown documents as structured, queryable data rather than plain text. This approach enables:

  • Fast repeat processing of documents through AST caching
  • Complex querying and relationship management
  • Schema validation and consistency enforcement
  • Performance that scales as your content grows

Contributing to Documentation

Documentation follows the same quality standards as code:

  1. Clear Structure - Logical organization and navigation
  2. Practical Examples - Real-world usage patterns
  3. Performance Context - Why architectural decisions matter
  4. User-Focused - Written for the intended audience

Documentation Standards

  • Use clear, concise language
  • Include practical examples
  • Explain the "why" behind design decisions
  • Keep technical accuracy as the highest priority
  • Update docs when changing functionality

This documentation is maintained alongside the codebase. For the most current information, always refer to the latest version in the repository.

Description
An advanced markdown engine
https://coulomb.social/open/MarkiTect
Languages
Python 84.7%
JavaScript 8%
HTML 5.6%
Makefile 1.3%
Shell 0.2%
Other 0.1%