markitect-main

Files

tegwick 7c38f9b427 merge(reprocess-v2): complete pipeline rewrite and full corpus processing

Merges the reprocess-v2 branch into main, covering:

Infrastructure changes:
- markitect infospace process — new CLI command for batch source processing
- SourcePipeline — @{macro} substitution, skip-if-exists, git commit per source
- PipelineStage config extended with name, output_dir, output_macro,
  split_entities, macros, max_tokens fields
- Per-stage max_tokens (extract=8k, map-to-vsm=10k, synthesize=4k)
- LLM provenance comment in each new entity file
- output/processing-log.yaml with per-source token/cost/duration/retry stats
- Retry on all LLM errors (not just rate limits) with 5s back-off
- C2 coverage: add domain_densities, density_std, cross_cutting_ratio
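The retry behaviour listed above (retry on any LLM error, 5s back-off) could look roughly like the following. This is a minimal sketch, not markitect's actual code: the function name `call_with_retry`, the `max_attempts` limit and the injectable `sleep` parameter are assumptions made for illustration, since the commit message only states the error scope and the 5-second pause.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    call: Callable[[], T],
    max_attempts: int = 3,          # assumed; the commit message gives no limit
    backoff_seconds: float = 5.0,   # fixed pause, as stated in the commit message
    sleep: Callable[[float], None] = time.sleep,  # injectable for testing
) -> T:
    """Run `call`, retrying on any exception with a fixed pause in between."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception:
            # Retry on every error type, not only on rate limits.
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(backoff_seconds)
    raise AssertionError("unreachable")
```

A caller would wrap the LLM request in a zero-argument callable, for example `call_with_retry(lambda: client.complete(prompt))`, where `client.complete` stands in for whichever adapter is in use.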

Example (infospace-with-history):
- All 35 chapters processed: 1021 entities across Books 1–5
- Per-chapter git commits showing metric evolution from 0 → final state
- Final metrics: coverage=0.44, granularity=2.95, redundancy=0.006
- METRICS-METHODOLOGY.md C2 section corrected and expanded
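The per-chapter commits mentioned above can be produced with two plain git invocations per chapter. The sketch below is an assumption about how such a step might be written, not the pipeline's actual implementation: `commit_chapter`, the commit message format and the injectable `run` parameter are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Callable, Sequence


def commit_chapter(
    repo_dir: Path,
    chapter_id: str,
    paths: Sequence[str],
    run: Callable[..., object] = subprocess.run,  # injectable for dry runs
) -> list[list[str]]:
    """Stage the given paths and create one commit for a processed chapter."""
    commands = [
        ["git", "-C", str(repo_dir), "add", "--", *paths],
        # Hypothetical message format; the real pipeline may use another one.
        ["git", "-C", str(repo_dir), "commit", "-m", f"process {chapter_id}"],
    ]
    for command in commands:
        # check=True raises CalledProcessError if git exits with an error.
        run(command, check=True)
    return commands
```

With one commit per chapter, `git log` over the output directory gives a step-by-step record against which the metrics can be recomputed.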

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-02-20 00:11:39 +01:00

| Name | Last commit | Date |
| --- | --- | --- |
| artifacts | fix(example): standardise domain enum and source chapter format in schema/rules | 2026-02-19 13:02:05 +01:00 |
| output | chore(example): final metrics snapshot - all 35 chapters processed | 2026-02-19 22:54:54 +01:00 |
| schemas | fix(example): standardise domain enum and source chapter format in schema/rules | 2026-02-19 13:02:05 +01:00 |
| templates | fix(pipeline): retry on 0-entity response, save raw debug, improve template | 2026-02-19 14:26:28 +01:00 |
| infospace.yaml | config(example): increase extract-entities max_tokens to 8000 | 2026-02-19 18:48:33 +01:00 |
| INFRA-TASKS.md | docs: metrics methodology, collection-level tasks, and infospace tooling roadmap | 2026-02-18 23:53:21 +01:00 |
| METRICS-METHODOLOGY.md | docs(metrics): clarify C2 coverage - domain×chapter matrix, not domain×VSM | 2026-02-20 00:08:46 +01:00 |
| process_chapters.py | feat(example): migrate to infospace config with tooling integration (S3.1) | 2026-02-19 02:29:53 +01:00 |
| README.md | feat(llm): add Gemini adapter and process book-1-chapter-05 | 2026-02-11 22:54:37 +01:00 |
| TUTORIAL.md | docs(example): rewrite tutorial for infospace tooling (S3.4) | 2026-02-19 11:11:45 +01:00 |

README.md

This example is a tutorial and reference experiment showing how to set up a viable infospace with history using markitect.

The task is to capture the knowledge in Adam Smith's The Wealth of Nations, which is available digitally in the public domain as a transcript of the original text, and to transform and extend it into a consistent and complete collection of concepts and entities, viewed from a systems-theoretical perspective based on Stafford Beer's Viable System Model.

The tutorial should explain how to use schemas as scaffolding for structuring the necessary information entities, and how to define a set of prompts and instructions that use the prompt dependency resolution infrastructure to incrementally inject the chapters of the book.
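To illustrate what schema scaffolding can mean in practice, here is a minimal sketch of a check that an extracted entity fits a schema before it enters the infospace. Everything in it is hypothetical: the field names, the domain values and the `validate_entity` helper are invented for illustration, and the chapter pattern is only inferred from identifiers such as `book-1-chapter-05`. The actual schemas live in the `schemas` directory and may look quite different.

```python
import re

# Hypothetical schema: field names and allowed values are illustrative only.
ENTITY_SCHEMA = {
    "required": ["name", "domain", "source_chapter"],
    "domain_enum": {"labour", "capital", "trade"},
    # Assumed identifier format, modelled on "book-1-chapter-05".
    "source_chapter_pattern": re.compile(r"book-[1-5]-chapter-\d{2}"),
}


def validate_entity(entity: dict, schema: dict = ENTITY_SCHEMA) -> list[str]:
    """Return a list of readable problems; an empty list means the entity is valid."""
    problems = []
    for field in schema["required"]:
        if not entity.get(field):
            problems.append(f"missing field: {field}")
    domain = entity.get("domain")
    if domain and domain not in schema["domain_enum"]:
        problems.append(f"unknown domain: {domain}")
    chapter = entity.get("source_chapter")
    if chapter and not schema["source_chapter_pattern"].fullmatch(chapter):
        problems.append(f"bad source_chapter: {chapter}")
    return problems
```

Running such a check after each injected chapter keeps malformed entities out of the collection, so that later chapters build on a consistent base.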

The information space should keep its changes as git history and define metrics for completeness and consistency.
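As one possible shape for a completeness metric, the sketch below computes coverage as the fraction of filled cells in a domain×chapter matrix, the matrix that the commit log for METRICS-METHODOLOGY.md mentions. The formula itself is an assumption: it is not taken from METRICS-METHODOLOGY.md and is not claimed to reproduce the reported metric values, and the `coverage` function and its parameters are hypothetical.

```python
from collections import Counter
from typing import Iterable


def coverage(
    entities: Iterable[tuple[str, str]],  # (domain, chapter) pair per entity
    domains: list[str],
    chapters: list[str],
) -> float:
    """Fraction of (domain, chapter) cells that contain at least one entity."""
    if not domains or not chapters:
        return 0.0  # an empty matrix has nothing to cover
    cells = Counter(entities)  # entity count per (domain, chapter) cell
    filled = sum(1 for d in domains for c in chapters if cells[(d, c)] > 0)
    return filled / (len(domains) * len(chapters))
```

Recomputing a value like this at every commit in the git history shows how completeness develops as chapters are added.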

While the experiment is running, no changes may be made to the markitect infrastructure.

If a need for optimization or error fixes arises, a list of corresponding tasks should be generated. This list will be used to improve the markitect infrastructure, after which the experiment is rerun, so that tooling and infospace are optimized iteratively over time.

--worsch, 10th Feb. 2026