All three stages of the infospace tooling roadmap are complete. The Wealth
of Nations / VSM example passes 6/6 viability thresholds on 988 entities,
and composition is demonstrated via the supply-chain-vsm example.
- Parent roadmap (roadmap/infospace-tooling/PLAN.md): header now shows the
closed status with final validation metrics.
- S3 close-out plan (roadmap/infospace-s3-closeout/PLAN.md): records the
final task dispositions. C.1–C.6 and C.8 done; C.7 (clean per-chapter
git history) is deferred indefinitely — the task was cosmetic, its
prerequisite branch no longer exists, and reconstructing 35 archival
commits would not change any output files. Rationale documented inline.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- `markitect infospace entity <name>`: single-entity lookup tolerating
hyphens/underscores/case, with substring matching, ambiguity listing,
and near-match hints. Prints slug, source path, domain, chapter, word
count, VSM system, overall score, evaluator, and evaluation file path.
- `markitect infospace evaluate --model-fallback <model>`: if any
entities fail with a rate-limit error, retry just those with a fresh
adapter on the fallback model (different free-tier models have
separate quota buckets).
- `markitect llm-check`: prints an advisory when `OPENROUTER_API_KEY` is set
but not used by the resolved provider, and a targeted hint when OpenRouter
returns 401 (almost always a stale key in the environment).
- `build_state`: raises `TypeError` with actionable message if passed a
path instead of an `InfospaceConfig` — prior failure mode was a
confusing `AttributeError` deep in the stack.
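The lookup order in the first bullet is: normalized exact match, then substring match, then near-match hints. It can be approximated as below. This is a minimal sketch, not markitect's implementation. `normalize_slug` and `lookup_entity` are hypothetical names, and `difflib.get_close_matches` stands in for whatever near-match logic the CLI actually uses.

```python
from __future__ import annotations

import difflib


def normalize_slug(name: str) -> str:
    """Fold case and treat hyphens, underscores and spaces as equivalent."""
    return name.strip().lower().replace("-", "_").replace(" ", "_")


def lookup_entity(query: str, slugs: list[str]) -> tuple[str | None, list[str]]:
    """Hypothetical helper returning (match, candidates).

    match is set only when exactly one slug matches; otherwise candidates
    holds either the ambiguous substring hits or near-match hints.
    """
    q = normalize_slug(query)
    by_norm = {normalize_slug(s): s for s in slugs}
    # 1. Exact match after normalization.
    if q in by_norm:
        return by_norm[q], []
    # 2. Substring match: a unique hit wins, several hits are listed as ambiguous.
    hits = sorted(orig for norm, orig in by_norm.items() if q in norm)
    if len(hits) == 1:
        return hits[0], []
    if hits:
        return None, hits
    # 3. No hit at all: offer near matches as hints.
    near = difflib.get_close_matches(q, list(by_norm), n=3, cutoff=0.6)
    return None, [by_norm[n] for n in near]
```

Here `lookup_entity("Bank-Notes", slugs)` resolves to `bank_notes`, while `lookup_entity("bank", slugs)` returns every slug containing "bank" so the caller can list them.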
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Five improvements that eliminate most of the agent-in-the-loop friction
observed while closing out the 988-entity WoN evaluation (C.1):
1. Gemini adapter now retries on 429 + 5xx with exponential backoff
(same pattern already used by OpenRouter/OpenAI). Removes the need
for shell-level retry wrappers when hitting free-tier rate limits.
2. evaluate CLI prints the underlying error ("ERROR — HTTP 503 …")
instead of a bare "ERROR", so agents don't have to drop into Python
to diagnose transient failures.
3. --entity/--chapter now respect existing evaluation files by default
(previously only the full-collection pass did). New --force flag
opts into re-evaluation. Stops silently burning free-tier quota on
re-runs of the same slug.
4. --entity accepts hyphenated slugs (matching entity filenames) and
normalizes them to the underscore form used on disk. On a miss the
CLI suggests near matches instead of a bare "not found".
5. eval-summary --update-metrics is no longer destructive:
read_metrics_file/write_metrics_file preserve structured values
(type_distribution) and don't flatten ints to floats. Fixes a
silent data loss observed on every run.
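Item 1 follows the usual exponential-backoff pattern. The following is a minimal sketch that assumes the adapter surfaces the HTTP status on the exception. `HTTPStatusError`, `call_with_backoff` and the default delays are hypothetical; they are not the Gemini adapter's actual names or settings.

```python
from __future__ import annotations

import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class HTTPStatusError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed call."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(f"HTTP {status} {message}".strip())
        self.status = status


def is_retryable(status: int) -> bool:
    # Rate limits (429) and server-side errors (5xx) are worth retrying;
    # other 4xx errors (bad request, auth) will not fix themselves.
    return status == 429 or 500 <= status <= 599


def call_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying retryable HTTP errors with exponential backoff plus jitter."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except HTTPStatusError as exc:
            if not is_retryable(exc.status) or attempt == max_retries:
                raise
            # Delay doubles per attempt (base, 2*base, 4*base, ...), capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 10))
    raise AssertionError("unreachable")
```

The `sleep` parameter exists only so the behaviour can be tested without waiting.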
Bonus: the evaluator field in written evaluation frontmatter now
falls back from run_config.model_name to the adapter's resolved model
(or the model echoed back in the API response), so rows no longer
show `evaluator: null` when --model is omitted.
Tests: new tests/unit/llm/test_gemini.py covers retry behavior;
tests/unit/infospace/test_history.py gains a round-trip test that
pins the type_distribution / int-preservation invariants.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Closes the gap between 988 entities and 985 evaluations in the Wealth of
Nations infospace. Entities advanced_state_of_society, bank_notes, and
bank_systemic_risk_management had no evaluation files. Evaluating them with
Gemini (2.5-flash, and 2.5-flash-lite for the last one after it hit the
free-tier RPM limit) brings the evaluation count to 988.
per_entity_mean nudged from 3.955635 to 3.95668; viability still
6/6 PASS.
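The two means can serve as a sanity check on the new evaluations. This assumes per_entity_mean is the plain arithmetic mean of one score per evaluated entity, which is not defined here. Under that assumption, the average of the three added scores can be recovered from the two means. `implied_mean_of_added` is a hypothetical helper.

```python
def implied_mean_of_added(old_mean: float, old_n: int, new_mean: float, new_n: int) -> float:
    """Mean of the items added when a running mean moves from old_mean (old_n items)
    to new_mean (new_n items). Assumes both values are plain arithmetic means."""
    added = new_n - old_n
    if added <= 0:
        raise ValueError("new_n must exceed old_n")
    # Totals before and after; their difference is the sum of the added scores.
    return (new_mean * new_n - old_mean * old_n) / added


print(round(implied_mean_of_added(3.955635, 985, 3.95668, 988), 2))  # → 4.3
```

Because 3.95668 is itself rounded, the result is good to roughly ±0.002. Under the stated assumption, the three new evaluations averaged about 4.3, slightly above the collection mean.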
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Closes out three docs tasks from roadmap/infospace-s3-closeout/PLAN.md:
- examples/infospace-with-history/docs/advanced-usage.md (C.4) — 5 worked
patterns covering incremental eval, re-eval workflow (no --force flag
exists; documents the rm-then-re-run pattern instead), interpreting the
eval-summary distribution, triaging low scorers via an awk pipeline
over overall_score (since `entities --sort-by score` does not exist),
and acting on check --json output.
- docs/composition-guide.md (C.5) — walks through how supply-chain-vsm
binds WoN as a discipline, then a step-by-step for creating a new
infospace that binds an existing one. Includes live output from
`markitect infospace disciplines`.
- examples/infospace-with-history/docs/performance-notes.md (C.6) — cites
the 6h 28m wall time of the 985-entity S3.3 batch, ~2.5 ent/min rate,
~2000–3000 tokens/entity estimate, word_overlap vs embedding backend
for redundancy checks, and a provider-by-scale recommendation table.
All commands in these docs were run against the live infospace at
commit time.
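The awk triage pattern in C.4 reads overall_score from each evaluation file. A Python equivalent is sketched below. It assumes each evaluation file opens with a `---` frontmatter block containing an unquoted top-level `overall_score: <number>` line. The function names are hypothetical, not documented markitect commands.

```python
from __future__ import annotations

import re
from pathlib import Path

# Matches an unquoted top-level "overall_score: 3.8" line inside the frontmatter.
SCORE_RE = re.compile(r"^overall_score:\s*([0-9]+(?:\.[0-9]+)?)\s*$")


def read_overall_score(path: Path) -> float | None:
    """Return overall_score from the file's leading '---' frontmatter block, or None."""
    lines = path.read_text(encoding="utf-8").splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for line in lines[1:]:
        if line.strip() == "---":  # end of frontmatter
            break
        m = SCORE_RE.match(line)
        if m:
            return float(m.group(1))
    return None


def low_scorers(eval_dir: Path, threshold: float = 3.0) -> list[tuple[float, str]]:
    """Hypothetical helper: (score, filename) pairs below threshold, lowest first."""
    rows = []
    for path in sorted(eval_dir.glob("*.md")):
        score = read_overall_score(path)
        if score is not None and score < threshold:
            rows.append((score, path.name))
    return sorted(rows)
```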
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Finishes the in-progress rename so docs, configs, tests, and capability
manifests all reference the current repo name consistently. Fixes two
tests (test_roundtrip_consolidated.py, test_issue_140_roundtrip_simplified.py)
whose hardcoded cwd paths would have broken under the renamed directory.
Archival content under history/, reports/, and roadmap/eat-the-frog/, plus
derived artifacts (.venv_old/, node_modules/, asset_registry.json) are
intentionally left untouched.
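One common way to remove hardcoded working-directory paths is to derive them from the test file's own location. Tests written this way survive a rename of the checkout directory. The sketch below is not the actual change to the two tests. `repo_root` and the `pyproject.toml` marker are assumptions about the layout.

```python
from pathlib import Path


def repo_root(start: Path, marker: str = "pyproject.toml") -> Path:
    """Hypothetical helper: walk up from start until a directory containing marker is found."""
    for candidate in [start, *start.parents]:
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"no {marker} above {start}")


# In a test module, instead of an absolute path that embeds the old repo name:
# EXAMPLES = repo_root(Path(__file__).resolve().parent) / "examples"
```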
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces the stub (State Hub integration only) with full dev commands,
module architecture overview, LLM config resolution chain, infospace
conventions, and active roadmap pointers. Removes CLAUDE.custodian.md
(superseded by the expanded CLAUDE.md).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Stage 1 — Decouple:
- Move RunConfig + LLMResponse to markitect/llm/models.py (canonical)
- Move LLMAdapter + Mock/ErrorLLMAdapter to markitect/llm/adapter.py
- markitect/prompts/execution/models.py and llm_adapter.py become re-export shims
- All 4 adapters + factory.py updated to import from markitect.llm.*
- Parameterize app_name in toml_config.py (resolve_llm, get_default_layers,
get_preference_layers): paths and env var now derived from app_name arg
- Add tests/test_llm_isolation.py: 7 isolation + backward-compat tests
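Parameterizing app_name lets a consumer other than markitect resolve its own config paths and environment variable. The sketch below illustrates the idea only. The layer locations, the `<APP>_CONFIG` variable naming and the function signature are hypothetical, not what toml_config.py actually uses.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from pathlib import Path


def get_default_layers(
    app_name: str,
    cwd: Path | None = None,
    environ: Mapping[str, str] | None = None,
) -> list[Path]:
    """Config files to consult for app_name, lowest precedence first (hypothetical layout)."""
    env = os.environ if environ is None else environ
    cwd = Path.cwd() if cwd is None else cwd
    layers = [
        Path.home() / ".config" / app_name / "config.toml",  # user-level
        cwd / f".{app_name}.toml",  # project-level
    ]
    # Explicit override named after the app, e.g. MARKITECT_CONFIG or LLM_CONNECT_CONFIG.
    env_var = app_name.upper().replace("-", "_") + "_CONFIG"
    if env.get(env_var):
        layers.append(Path(env[env_var]))
    return layers
```

Nothing in the function mentions markitect, so the same code serves both the original package and an extracted one.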
Stage 2 — Extract:
- Standalone llm-connect package created at ~/llm-connect/
- All 18 llm files copied; markitect.* imports replaced with llm_connect.*
- LLMError base inlined in llm_connect/exceptions.py (no markitect dep)
- llm-connect installed into markitect-venv; declared in pyproject.toml
Smoke test: markitect llm-check succeeds (live Gemini API call).
Backward compat: markitect.prompts.execution.{models,llm_adapter} still work.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
3-stage plan: decouple (RunConfig/LLMResponse move + app name
parameterization) → extract to standalone package → adopt in first
consumer. Registered as workstream in Custodian State Hub.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Registers markitect as a tracked domain in the Custodian State Hub.
Includes topic ID, session start/end protocol, and MCP tool reference.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a closing remark (23 Feb 2026) summarising the final state of the
infospace: 988 entities, 985 evaluations, 823 L2 classifications, 15 L3
relations, viability 6/6 PASS.
New open tasks 20–23:
20. Complete L2 classification batch (165 entities blocked on credits)
21. Run classify-links for 58 Relation-type entities
22. Refresh stale metrics-report.md narrative
23. Smoke-test the graph command end-to-end
Also committed: history.py fix — write_metrics_file now preserves
non-float metric values (type_distribution dict) instead of crashing
on round().
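The write_metrics_file fix comes down to rounding only floats. Calling `round()` on a dict raises `TypeError`, and coercing every value through `float()` turns counts into floats. The following is a minimal sketch of the type-aware alternative. `normalize_metric` is hypothetical, and the 6-decimal precision is an assumption.

```python
from __future__ import annotations

from typing import Any


def normalize_metric(value: Any, ndigits: int = 6) -> Any:
    """Hypothetical helper: round floats for stable output; leave ints, strings
    and containers intact (recursing into dicts and lists)."""
    if isinstance(value, bool):  # bool is a subclass of int; keep it unchanged
        return value
    if isinstance(value, float):
        return round(value, ndigits)
    if isinstance(value, dict):
        return {k: normalize_metric(v, ndigits) for k, v in value.items()}
    if isinstance(value, list):
        return [normalize_metric(v, ndigits) for v in value]
    return value  # ints, strings, None
```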
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
New graph_export.py module supporting the `markitect infospace graph`
command added in the previous commit.
- build_entity_graph(): constructs node/edge graph from L2 classifications
and L3 relation triplets, with feedback loop detection via networkx
- apply_filters(): subgraph filters by entity type, VSM system, ego
neighbourhood, feedback-loops-only, and classified-only
- to_mermaid(): Mermaid flowchart export
- Uses "-- label -->" syntax for all edges (robust with parentheses);
"== label ==>" thick arrows for feedback loop edges
- markdown_fence=True wraps output in ```mermaid block (VS Code / GitHub)
- color_by="type" or "vsm" with distinct palettes for each
- to_dot(): Graphviz DOT export with fillcolor per type/VSM system
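The edge-styling rules can be illustrated with a small sketch that has no dependencies. Unlike the module, it does not use networkx. Instead it flags an edge u→v as a feedback-loop edge when u is reachable from v, which is exactly the condition for that edge to lie on a directed cycle. The function names and the `(source, label, target)` edge shape are hypothetical.

```python
from __future__ import annotations

import re
from collections import defaultdict

Edge = tuple[str, str, str]  # (source, predicate label, target)
FENCE = "`" * 3


def _reachable(adj: dict[str, set[str]], start: str, goal: str) -> bool:
    """Stack-based DFS: can goal be reached from start?"""
    stack, seen = [start], {start}
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def feedback_edges(edges: list[Edge]) -> set[Edge]:
    """Edges (u, _, v) that lie on a directed cycle, i.e. u is reachable from v."""
    adj: dict[str, set[str]] = defaultdict(set)
    for u, _, v in edges:
        adj[u].add(v)
    return {e for e in edges if _reachable(adj, e[2], e[0])}


def _node_id(name: str) -> str:
    # Mermaid node IDs must be simple tokens; the readable name goes in a quoted label.
    return re.sub(r"\W", "_", name)


def to_mermaid(edges: list[Edge], markdown_fence: bool = False) -> str:
    """Thick "== label ==>" arrows for feedback-loop edges, "-- label -->" otherwise."""
    loops = feedback_edges(edges)
    lines = ["flowchart LR"]
    for e in edges:
        u, label, v = e
        arrow = f"== {label} ==>" if e in loops else f"-- {label} -->"
        lines.append(f'    {_node_id(u)}["{u}"] {arrow} {_node_id(v)}["{v}"]')
    out = "\n".join(lines)
    return f"{FENCE}mermaid\n{out}\n{FENCE}" if markdown_fence else out
```

The reachability check runs one DFS per edge. That is adequate for graphs of a few thousand relations; a cycle-enumeration routine such as the one networkx provides is the better fit when the loops themselves need listing.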
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
INFRA-TASKS #5 — process_chapters.py now skips writing *-prompt.md files
when the corresponding output file already exists on disk. DB-only rebuilds
no longer dirty the working tree with unchanged prompt content.
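The INFRA-TASKS #5 guard is a single existence check before writing. The sketch below assumes a hypothetical naming rule in which `<stem>-prompt.md` pairs with `<stem>.md` as its output. The real pairing in process_chapters.py may differ.

```python
from pathlib import Path


def maybe_write_prompt(prompt_path: Path, content: str) -> bool:
    """Write prompt_path unless its paired output already exists. Returns True if written."""
    if not prompt_path.name.endswith("-prompt.md"):
        raise ValueError(f"not a prompt file: {prompt_path.name}")
    # Hypothetical pairing: chapter-03-prompt.md -> chapter-03.md
    stem = prompt_path.name.removesuffix("-prompt.md")
    output_path = prompt_path.with_name(f"{stem}.md")
    if output_path.exists():
        return False  # output already on disk: leave the working tree untouched
    prompt_path.write_text(content, encoding="utf-8")
    return True
```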
INFRA-TASKS #8 — Added '## Quality Metrics' section to the entity and VSM
mapping schemas, defining the five evaluation dimensions (Definition Precision,
Source Grounding, Domain Placement, VSM Relevance, Explanatory Value) with
1–5 rubrics used by the evaluate-entity template.
Also updated INFRA-TASKS.md to reflect current resolution status for tasks
4–19 across S2 and S3.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
`markitect helper <QUESTION>` now works as a short alias for
`markitect llm-helper`, per the original plan specification.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix evaluate dimensions to match template file:
definition_precision, source_grounding, domain_placement,
vsm_relevance, explanatory_value (was domain_relevance,
discipline_alignment, conceptual_clarity)
- Add VSM background context to evaluation prompt so LLM can
score vsm_relevance without macro injection
- Fix model_name bug: was sending literal "default" to API (HTTP 400)
- Refactor run_entity_evaluation to write files incrementally via
callback rather than all at once after the batch — long runs are
now resumable if interrupted
- Add incremental skip in CLI: entities with existing eval files
are skipped automatically on re-run (acts as resume)
- Add eval-summary command: reads all eval files, shows per-dimension
means, optionally writes per_entity_mean to metrics.yaml
- Fix record_check_results to merge rather than overwrite metrics.yaml
so per_entity_mean survives subsequent check runs
- Add per_entity_mean viability threshold (min: 3.5) to infospace.yaml
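The resumability change has two parts. Each result is handed to a callback as soon as it is produced, and slugs that already have an evaluation file are skipped. The sketch below uses hypothetical names (`run_batch`, `pending_slugs`, `file_writer`) and `<slug>.md` file naming. It is not run_entity_evaluation's actual signature.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable


def pending_slugs(slugs: list[str], out_dir: Path) -> list[str]:
    """Slugs without an evaluation file yet (hypothetical <slug>.md naming)."""
    return [s for s in slugs if not (out_dir / f"{s}.md").exists()]


def run_batch(
    slugs: list[str],
    evaluate: Callable[[str], str],
    on_result: Callable[[str, str], None],
) -> None:
    """Evaluate slugs one at a time, handing each result to on_result immediately.

    If the run dies part-way, every result produced so far has already been
    persisted by the callback.
    """
    for slug in slugs:
        on_result(slug, evaluate(slug))


def file_writer(out_dir: Path) -> Callable[[str, str], None]:
    """Callback that writes one evaluation file per slug."""
    out_dir.mkdir(parents=True, exist_ok=True)

    def write(slug: str, text: str) -> None:
        (out_dir / f"{slug}.md").write_text(text, encoding="utf-8")

    return write
```

Re-running `run_batch(pending_slugs(all_slugs, out), evaluate, file_writer(out))` after an interruption picks up where the previous run stopped.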
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Demonstrates infospace composition: the Wealth of Nations infospace is
used as a discipline, applying Smith's economic framework as a lens to
analyse modern supply chain management concepts.
New example: examples/supply-chain-vsm/
- infospace.yaml binding WoN as discipline (../infospace-with-history)
- 3 source documents: coordination mechanisms, capital & inventory,
market structure (~400 words each, original content)
- supply-chain-entity-schema-v1.0.md with WoN Concept required section
- won-mapping-schema-v1.0.md with Conceptual Continuity rating
- artifacts/won-reference/core-entities.md — 12 curated WoN entities
for injection as discipline context
- 8 hand-crafted entity files demonstrating LLM output format
- 3 mapping files with full rationale and VSM inheritance chains
- Viable: YES (5/5 thresholds)
Key mappings demonstrated:
Demand Signal → Effectual Demand (Strong, S2)
Vendor-Managed Inventory → Division of Labour (Strong, S1/S2)
Just-in-Time Inventory → Circulating Capital (Strong, S1/S3)
Bullwhip Effect → Natural Price (Moderate, S2)
Platform Intermediary → Merchant Capital (Strong, S2/S4)
Monopsony Power → Combination of Masters (Strong, S3*)
Platform fix: entity_parser.py now recognises `## Supply Chain Domain`
as a domain alias for `## Economic Domain`, enabling composed infospaces
to use their own domain section name.
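The alias fix can be pictured as a table of accepted section headings that all feed the same domain field. In this sketch only the two heading names come from the change. The table shape and `extract_domain` are hypothetical, and entity_parser.py's real mechanism may differ.

```python
from __future__ import annotations

import re

# Headings treated as "the domain section"; the first is the original one.
DOMAIN_HEADINGS = ("Economic Domain", "Supply Chain Domain")

# Level-2 headings only: "### X" fails because the third "#" is not whitespace.
_HEADING_RE = re.compile(r"^##\s+(.+?)\s*$")


def extract_domain(markdown: str) -> str | None:
    """Return the first non-empty line under any recognised domain heading."""
    in_domain = False
    for line in markdown.splitlines():
        m = _HEADING_RE.match(line)
        if m:
            in_domain = m.group(1) in DOMAIN_HEADINGS
            continue
        if in_domain and line.strip():
            return line.strip()
    return None
```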
Tutorial §13 rewritten with real commands, real output, and the full
mapping table from the demo.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds LAYERED-DEVELOPMENT.md documenting the concept for evolving a flat
entity collection into a structured systemic model through four layers:
L0 Source text → L1 Raw entities (current) → L2 Typed entities
→ L3 Relation graph → L4 Minimal systemic model
Covers: the element/relation/principle/institution type taxonomy,
VSM as a structural coordinate system, the type × VSM coverage matrix,
triplet extraction with a controlled predicate vocabulary, feedback loop
detection, and the distillation hypothesis for finding the generative
core of a corpus.
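The type × VSM coverage matrix is a cross-tabulation. The following is a minimal sketch that assumes each entity carries exactly one type from the four-type taxonomy and one VSM system label. The label set, including S3*, and the `(type, system)` pair representation are illustrative assumptions.

```python
from collections import Counter

TYPES = ("element", "relation", "principle", "institution")
VSM = ("S1", "S2", "S3", "S3*", "S4", "S5")


def coverage_matrix(entities: list[tuple[str, str]]) -> dict[str, dict[str, int]]:
    """Count entities per (type, VSM system) cell; unoccupied cells stay at 0."""
    counts = Counter(entities)
    return {t: {s: counts[(t, s)] for s in VSM} for t in TYPES}


def empty_cells(matrix: dict[str, dict[str, int]]) -> list[tuple[str, str]]:
    """Cells with no entities: places where the model may have gaps."""
    return [(t, s) for t, row in matrix.items() for s, n in row.items() if n == 0]
```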
Extends TUTORIAL.md with sections 17–23:
17. Observing entity heterogeneity
18. The four-layer model overview
19. Layer 2 — classifying entities (schema, pipeline stage, metrics)
20. Layer 3 — extracting the relation graph (triplets, feedback loops)
21. Layer 4 — the minimal systemic model (core-model.md output)
22. Planned CLI commands for layers 2–4
23. Layers 2–4 as composed infospaces
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add `.*-raw\.md$` to `_DEFAULT_EXCLUDE_PATTERNS` in entity_parser.py to
prevent per-chapter raw LLM output files from being parsed as entities.
This eliminates 33 malformed domain values where delimiter text was
bleeding into the Economic Domain field.
- Lower coverage_ratio threshold from 0.50 to 0.40 in infospace.yaml to
reflect realistic multi-book corpus expectations (documented rationale
in METRICS-METHODOLOGY.md).
Post-fix metrics: 988 entities, 0 malformed, coverage_ratio=0.619 (pass).
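The exclude pattern works by matching against file names. The sketch below assumes patterns are applied to the file name with `re.match`. Only the `.*-raw\.md$` pattern comes from the change; the helper functions are hypothetical.

```python
import re
from pathlib import Path

# Only ".*-raw\.md$" comes from the commit; the surrounding helpers are illustrative.
_DEFAULT_EXCLUDE_PATTERNS = (r".*-raw\.md$",)


def is_excluded(path: Path, patterns: tuple = _DEFAULT_EXCLUDE_PATTERNS) -> bool:
    """True if the file name matches any exclude pattern."""
    return any(re.match(p, path.name) for p in patterns)


def entity_files(directory: Path) -> list[Path]:
    """Markdown files that should be parsed as entities."""
    return [p for p in sorted(directory.glob("*.md")) if not is_excluded(p)]
```

The `$` anchor matters: it excludes `chapter-01-raw.md` while still accepting an entity whose slug merely contains "raw", such as `raw-materials.md`.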
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- coverage.py: rewrite module docstring to explain what the metric actually
computes (domain × chapter cross-tabulation, not VSM system coverage),
what it does not capture (entity connectivity → C3), and when the
threshold is appropriate
- CoverageReport: add domain_densities, density_std, cross_cutting_ratio
for distribution-level insight beyond the aggregate ratio
- check_coverage: compute per-domain density and cross-cutting ratio
- METRICS-METHODOLOGY.md: correct C2 section to match implementation,
document the distribution-based interpretation, add implementation status
table distinguishing what is wired vs planned
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1021 entities extracted across all five books of The Wealth of Nations.
Final metrics: coverage=0.4424, granularity=2.9533, redundancy=0.0059.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Free-tier APIs intermittently return invalid JSON or empty responses.
Any exception raised in _call_llm now triggers up to 3 retries with a 5s
back-off, rather than failing immediately on non-rate-limit errors.
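This retry is deliberately blunt: any exception, a fixed delay. The sketch below reads "up to 3 retries" as three attempts after the first, which is an interpretation. `retry_fixed` is a hypothetical name, not the actual _call_llm code.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_fixed(
    fn: Callable[[], T],
    retries: int = 3,
    delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn; on any exception wait `delay` seconds and try again, up to `retries` more times."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception:  # not BaseException, so Ctrl-C still stops a long batch
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(delay)
    raise AssertionError("unreachable")
```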
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>