32 Commits

Author SHA1 Message Date
cd8339ecef Complete State Hub bootstrap workplans (WP-0001)
Some checks failed
Test Suite / security-scan (push) Has been cancelled
Test Suite / unit-tests (3.11) (push) Has been cancelled
Test Suite / unit-tests (3.12) (push) Has been cancelled
Test Suite / code-quality (push) Has been cancelled
Test Suite / test-summary (push) Has been cancelled
Test Suite / integration-tests (push) Has been cancelled
Test Suite / e2e-tests (push) Has been cancelled
Test Suite / performance-tests (push) Has been cancelled
- Review integration files; fill SCOPE where templated
- Document dev workflow in stack-and-commands.md
- Seed WP-0002 implementation workplan; mark bootstrap finished
- Hub sync via fix-consistency
2026-06-22 23:35:13 +02:00
f8ab58edbe chore(consistency): sync task status from DB [auto]
Some checks failed
Test Suite / unit-tests (3.11) (push) Has been cancelled
Test Suite / unit-tests (3.12) (push) Has been cancelled
Test Suite / integration-tests (push) Has been cancelled
Test Suite / e2e-tests (push) Has been cancelled
Test Suite / performance-tests (push) Has been cancelled
Test Suite / code-quality (push) Has been cancelled
Test Suite / security-scan (push) Has been cancelled
Test Suite / test-summary (push) Has been cancelled
Updated by fix-consistency on 2026-06-22:
  - update .custodian-brief.md for markitect-main
2026-06-22 23:32:31 +02:00
2b5e9743fe Add State Hub bootstrap workplan and agent integration files
Some checks failed
Test Suite / unit-tests (3.11) (push) Has been cancelled
Test Suite / unit-tests (3.12) (push) Has been cancelled
Test Suite / integration-tests (push) Has been cancelled
Test Suite / e2e-tests (push) Has been cancelled
Test Suite / performance-tests (push) Has been cancelled
Test Suite / code-quality (push) Has been cancelled
Test Suite / security-scan (push) Has been cancelled
Test Suite / test-summary (push) Has been cancelled
Seed workplans/ with bootstrap workplan to satisfy ADR-001 C-01.
Includes regenerated dev-hub session-protocol and agent instruction files.
2026-06-22 21:44:38 +02:00
753c3d4fc6 chore(consistency): sync task status from DB [auto]
Some checks failed
Test Suite / unit-tests (3.11) (push) Has been cancelled
Test Suite / unit-tests (3.12) (push) Has been cancelled
Test Suite / integration-tests (push) Has been cancelled
Test Suite / e2e-tests (push) Has been cancelled
Test Suite / performance-tests (push) Has been cancelled
Test Suite / code-quality (push) Has been cancelled
Test Suite / security-scan (push) Has been cancelled
Test Suite / test-summary (push) Has been cancelled
Updated by fix-consistency on 2026-06-22:
  - update .custodian-brief.md for markitect-main
2026-06-22 21:42:25 +02:00
94e84f0db9 chore(consistency): sync task status from DB [auto]
Some checks failed
Test Suite / security-scan (push) Has been cancelled
Test Suite / code-quality (push) Has been cancelled
Test Suite / unit-tests (3.11) (push) Has been cancelled
Test Suite / unit-tests (3.12) (push) Has been cancelled
Test Suite / integration-tests (push) Has been cancelled
Test Suite / e2e-tests (push) Has been cancelled
Test Suite / performance-tests (push) Has been cancelled
Test Suite / test-summary (push) Has been cancelled
Updated by fix-consistency on 2026-06-22:
  - update .custodian-brief.md for markitect-project
2026-06-22 21:40:39 +02:00
a765ccda21 chore(consistency): sync task status from DB [auto]
Some checks failed
Test Suite / security-scan (push) Has been cancelled
Test Suite / code-quality (push) Has been cancelled
Test Suite / unit-tests (3.11) (push) Has been cancelled
Test Suite / unit-tests (3.12) (push) Has been cancelled
Test Suite / integration-tests (push) Has been cancelled
Test Suite / e2e-tests (push) Has been cancelled
Test Suite / performance-tests (push) Has been cancelled
Test Suite / test-summary (push) Has been cancelled
Updated by fix-consistency on 2026-06-22:
  - update .custodian-brief.md for markitect-main
2026-06-22 21:40:31 +02:00
4472fa6c7f chore(consistency): sync task status from DB [auto]
Some checks failed
Test Suite / performance-tests (push) Has been cancelled
Test Suite / unit-tests (3.11) (push) Has been cancelled
Test Suite / unit-tests (3.12) (push) Has been cancelled
Test Suite / integration-tests (push) Has been cancelled
Test Suite / e2e-tests (push) Has been cancelled
Test Suite / code-quality (push) Has been cancelled
Test Suite / security-scan (push) Has been cancelled
Test Suite / test-summary (push) Has been cancelled
Updated by fix-consistency on 2026-06-22:
  - update .custodian-brief.md for markitect-main
2026-06-22 18:02:31 +02:00
526fa1e3bc Human-review .repo-classification.yaml (CUST-WP-0050 follow-up)
Some checks failed
Test Suite / unit-tests (3.11) (push) Has been cancelled
Test Suite / unit-tests (3.12) (push) Has been cancelled
Test Suite / integration-tests (push) Has been cancelled
Test Suite / e2e-tests (push) Has been cancelled
Test Suite / performance-tests (push) Has been cancelled
Test Suite / code-quality (push) Has been cancelled
Test Suite / security-scan (push) Has been cancelled
Test Suite / test-summary (push) Has been cancelled
2026-06-22 17:56:16 +02:00
86de18c247 Add .repo-classification.yaml (CUST-WP-0050 T11 agent first-pass)
2026-06-22 17:47:38 +02:00
ca9d0d7030 Add credential routing instructions for all agent runtimes
Some checks failed
Test Suite / code-quality (push) Has been cancelled
Test Suite / security-scan (push) Has been cancelled
Test Suite / unit-tests (3.11) (push) Has been cancelled
Test Suite / unit-tests (3.12) (push) Has been cancelled
Test Suite / integration-tests (push) Has been cancelled
Test Suite / e2e-tests (push) Has been cancelled
Test Suite / performance-tests (push) Has been cancelled
Test Suite / test-summary (push) Has been cancelled
Propagate shared credential-routing section (Codex, Claude, Grok, llm-connect)
from state-hub template via scripts/propagate_credential_routing.py.
2026-06-18 22:48:38 +02:00
bc527ec09a Add capability registry scaffold (REUSE-WP-0014-T05 B03)
Some checks failed
Test Suite / unit-tests (3.11) (push) Has been cancelled
Test Suite / security-scan (push) Has been cancelled
Test Suite / unit-tests (3.12) (push) Has been cancelled
Test Suite / code-quality (push) Has been cancelled
Test Suite / integration-tests (push) Has been cancelled
Test Suite / e2e-tests (push) Has been cancelled
Test Suite / performance-tests (push) Has been cancelled
Test Suite / test-summary (push) Has been cancelled
2026-06-16 01:54:12 +02:00
ce984482e2 assessment of forgotten functionality
Some checks failed
Test Suite / unit-tests (3.11) (push) Has been cancelled
Test Suite / unit-tests (3.12) (push) Has been cancelled
Test Suite / code-quality (push) Has been cancelled
Test Suite / security-scan (push) Has been cancelled
Test Suite / integration-tests (push) Has been cancelled
Test Suite / e2e-tests (push) Has been cancelled
Test Suite / performance-tests (push) Has been cancelled
Test Suite / test-summary (push) Has been cancelled
2026-05-23 06:44:38 +02:00
9266f124e6 Refresh agent instruction files
Some checks failed
Test Suite / unit-tests (3.11) (push) Has been cancelled
Test Suite / unit-tests (3.12) (push) Has been cancelled
Test Suite / code-quality (push) Has been cancelled
Test Suite / security-scan (push) Has been cancelled
Test Suite / integration-tests (push) Has been cancelled
Test Suite / e2e-tests (push) Has been cancelled
Test Suite / performance-tests (push) Has been cancelled
Test Suite / test-summary (push) Has been cancelled
2026-05-18 16:55:45 +02:00
8740a66611 chore(consistency): sync task status from DB [auto]
Some checks failed
Test Suite / unit-tests (3.11) (push) Has been cancelled
Test Suite / security-scan (push) Has been cancelled
Test Suite / unit-tests (3.12) (push) Has been cancelled
Test Suite / code-quality (push) Has been cancelled
Test Suite / integration-tests (push) Has been cancelled
Test Suite / e2e-tests (push) Has been cancelled
Test Suite / performance-tests (push) Has been cancelled
Test Suite / test-summary (push) Has been cancelled
Updated by fix-consistency on 2026-05-03:
  - update .custodian-brief.md for markitect-project
2026-05-03 19:31:36 +02:00
b7e9edbb4b chore(consistency): sync task status from DB [auto]
Some checks failed
Test Suite / unit-tests (3.11) (push) Has been cancelled
Test Suite / unit-tests (3.12) (push) Has been cancelled
Test Suite / integration-tests (push) Has been cancelled
Test Suite / e2e-tests (push) Has been cancelled
Test Suite / performance-tests (push) Has been cancelled
Test Suite / code-quality (push) Has been cancelled
Test Suite / security-scan (push) Has been cancelled
Test Suite / test-summary (push) Has been cancelled
Updated by fix-consistency on 2026-05-01:
  - update .custodian-brief.md for markitect-project
2026-05-01 23:07:28 +02:00
479fa95fdf Scope update from repo-scoping refactor
Some checks failed
Test Suite / unit-tests (3.11) (push) Has been cancelled
Test Suite / unit-tests (3.12) (push) Has been cancelled
Test Suite / code-quality (push) Has been cancelled
Test Suite / security-scan (push) Has been cancelled
Test Suite / integration-tests (push) Has been cancelled
Test Suite / e2e-tests (push) Has been cancelled
Test Suite / performance-tests (push) Has been cancelled
Test Suite / test-summary (push) Has been cancelled
2026-05-01 12:27:17 +02:00
eb9b622499 chore: gitignore Claude Code session lock files
Some checks failed
Test Suite / unit-tests (3.11) (push) Has been cancelled
Test Suite / unit-tests (3.12) (push) Has been cancelled
Test Suite / code-quality (push) Has been cancelled
Test Suite / security-scan (push) Has been cancelled
Test Suite / integration-tests (push) Has been cancelled
Test Suite / e2e-tests (push) Has been cancelled
Test Suite / performance-tests (push) Has been cancelled
Test Suite / test-summary (push) Has been cancelled
`.claude/scheduled_tasks.lock` is per-session runtime state (holds the
owning session id and pid for the ScheduleWakeup queue); it shouldn't
be committed. Widened the pattern to `.claude/*.lock` so future lock
kinds are covered too.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-22 21:50:20 +02:00
e3e5b8ecc1 feat(infospace): systematic long-text processing — rich commit bodies, per-source eval/classify, chapters view
Some checks failed
Test Suite / unit-tests (3.11) (push) Has been cancelled
Test Suite / unit-tests (3.12) (push) Has been cancelled
Test Suite / integration-tests (push) Has been cancelled
Test Suite / e2e-tests (push) Has been cancelled
Test Suite / performance-tests (push) Has been cancelled
Test Suite / code-quality (push) Has been cancelled
Test Suite / security-scan (push) Has been cancelled
Test Suite / test-summary (push) Has been cancelled
Three coordinated changes that let the pipeline produce a clean
chapter-by-chapter git history on long texts without archaeology after
the fact.

1. Richer commit messages. `SourcePipeline._git_commit` now diffs the
   staged changes, buckets added files by output subdirectory (entities,
   evaluations, classifications, mappings, analyses, metrics, logs), and
   includes counts in the commit body. So `git log` reads "entities:
   +23, evaluations: +23" per chapter instead of the same generic blurb
   on every commit. Zero behaviour change when no output changed; falls
   back to the original message if the diff query fails.

2. --eval-after-source / --classify-after-source on `infospace process`.
   After a source's stages succeed, the pipeline identifies which entity
   files are *new* (set diff of entity slugs before vs after), loads
   their EntityMeta, and runs per-entity evaluation and/or
   classification scoped to just those slugs before the per-source git
   commit lands. Result: each chapter's commit is self-contained —
   extraction + evaluation + classification in one atomic unit. Gated
   behind explicit flags because the cost is real (LLM latency per
   chapter rather than amortised across one bulk batch).

3. `markitect infospace chapters` subcommand. Lists source files in
   canonical order with entity count, evaluated count, classified
   count, and mean per-entity score per source. Text or JSON output.
   Natural triage surface for long-text infospaces — spot chapters that
   under-extracted or evaluated poorly.

Also: `docs/advanced-usage.md` gets a new "Systematic processing of
long texts" section with the recommended flag combo and the tradeoff
note on cost.

11 new unit tests cover the chapters command (text/json/no-sources),
the process flag wiring (help + provider requirement), and the
commit-body bucket logic. Full infospace+llm unit suite (315 tests)
green; 3 pre-existing infospace failures unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-22 08:24:26 +02:00
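The commit-body bucketing described in item 1 of the commit above can be sketched roughly as follows. This is a minimal illustration, not the actual `SourcePipeline._git_commit` code: the `output/` root, the example file names, and the helper names `bucket_added_files` and `format_commit_body` are assumptions; only the list of subdirectories comes from the commit message.

```python
from collections import Counter
from pathlib import PurePosixPath

# Output subdirectories named in the commit message.
BUCKETS = ("entities", "evaluations", "classifications",
           "mappings", "analyses", "metrics", "logs")


def bucket_added_files(paths, output_root="output"):
    """Count added files per known output subdirectory (hypothetical layout)."""
    counts = Counter()
    for raw in paths:
        parts = PurePosixPath(raw).parts
        # Expect <output_root>/<bucket>/<file>; anything else is ignored.
        if len(parts) >= 3 and parts[0] == output_root and parts[1] in BUCKETS:
            counts[parts[1]] += 1
    return counts


def format_commit_body(counts):
    """Render non-zero counts in BUCKETS order, separated by commas."""
    return ", ".join(f"{name}: +{counts[name]}" for name in BUCKETS if counts[name])


added = [
    "output/entities/bank_notes.md",
    "output/entities/division_of_labour.md",
    "output/evaluations/bank_notes.md",
    "README.md",  # outside the output root, so not counted
]
print(format_commit_body(bucket_added_files(added)))
# prints: entities: +2, evaluations: +1
```

An empty result string would correspond to the "no output changed" case, where the commit message says the original generic message is kept.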
9e8d73fa7d docs(roadmap): close out infospace tooling S3 and parent roadmap
All three stages of the infospace tooling roadmap are complete. The Wealth
of Nations / VSM example passes 6/6 viability thresholds on 988 entities,
and composition is demonstrated via the supply-chain-vsm example.

- Parent roadmap (roadmap/infospace-tooling/PLAN.md): header now shows the
  closed status with final validation metrics.
- S3 close-out plan (roadmap/infospace-s3-closeout/PLAN.md): records the
  final task dispositions. C.1–C.6 and C.8 done; C.7 (clean per-chapter
  git history) is deferred indefinitely — the task was cosmetic, its
  prerequisite branch no longer exists, and reconstructing 35 archival
  commits would not change any output files. Rationale documented inline.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-22 07:08:43 +02:00
d44a4cd3df feat(infospace,llm): agent ergonomics — entity lookup, model fallback, better errors
- `markitect infospace entity <name>`: single-entity lookup tolerating
  hyphens/underscores/case, with substring matching, ambiguity listing,
  and near-match hints. Prints slug, source path, domain, chapter, word
  count, VSM system, overall score, evaluator, and evaluation file path.
- `markitect infospace evaluate --model-fallback <model>`: if any
  entities fail with a rate-limit error, retry just those with a fresh
  adapter on the fallback model (different free-tier models have
  separate quota buckets).
- `markitect llm-check`: advisory when `OPENROUTER_API_KEY` is set but
  not used by the resolved provider; targeted hint when OpenRouter
  returns 401 (almost always a stale env key).
- `build_state`: raises `TypeError` with actionable message if passed a
  path instead of an `InfospaceConfig` — prior failure mode was a
  confusing `AttributeError` deep in the stack.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-22 01:07:25 +02:00
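The tolerant lookup behind `markitect infospace entity <name>` in the commit above might look roughly like this. It is a sketch under stated assumptions: `normalize` and `lookup_entity` are hypothetical names, the status labels are invented for illustration, and `difflib.get_close_matches` stands in for whatever near-match logic the real command uses.

```python
import difflib


def normalize(name):
    # Hyphens, underscores, and case are treated as equivalent.
    return name.strip().lower().replace("-", "_")


def lookup_entity(query, slugs):
    """Return (status, matches); status is exact, unique, ambiguous, or miss."""
    key = normalize(query)
    by_key = {normalize(s): s for s in slugs}
    if key in by_key:
        return "exact", [by_key[key]]
    # Fall back to substring matching on the normalized form.
    hits = sorted(s for k, s in by_key.items() if key in k)
    if len(hits) == 1:
        return "unique", hits
    if hits:
        return "ambiguous", hits
    # No substring hit: offer near matches as hints.
    near = difflib.get_close_matches(key, list(by_key), n=3, cutoff=0.6)
    return "miss", [by_key[k] for k in near]


slugs = ["bank_notes", "bank_systemic_risk_management", "advanced_state_of_society"]
print(lookup_entity("Bank-Notes", slugs))  # exact match after normalization
print(lookup_entity("bank", slugs))        # ambiguous: two slugs contain "bank"
print(lookup_entity("bnak_notes", slugs))  # miss, with a near-match hint
```

Returning a status alongside the matches lets a caller decide separately how to print an ambiguity listing and how to print a near-match hint.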
c0615c2d50 feat(infospace,llm): stabilize free-tier eval workflow
Some checks failed
Test Suite / code-quality (push) Has been cancelled
Test Suite / security-scan (push) Has been cancelled
Test Suite / unit-tests (3.11) (push) Has been cancelled
Test Suite / unit-tests (3.12) (push) Has been cancelled
Test Suite / integration-tests (push) Has been cancelled
Test Suite / e2e-tests (push) Has been cancelled
Test Suite / performance-tests (push) Has been cancelled
Test Suite / test-summary (push) Has been cancelled
Five improvements that eliminate most of the agent-in-the-loop friction
observed while closing out the 988-entity WoN evaluation (C.1):

1. Gemini adapter now retries on 429 + 5xx with exponential backoff
   (same pattern already used by OpenRouter/OpenAI). Removes the need
   for shell-level retry wrappers when hitting free-tier rate limits.

2. evaluate CLI prints the underlying error ("ERROR — HTTP 503 …")
   instead of a bare "ERROR", so agents don't have to drop into Python
   to diagnose transient failures.

3. --entity/--chapter now respect existing evaluation files by default
   (previously only the full-collection pass did). New --force flag
   opts into re-evaluation. Stops silently burning free-tier quota on
   re-runs of the same slug.

4. --entity accepts hyphenated slugs (matching entity filenames) and
   normalizes them to the underscore form used on disk. On a miss the
   CLI suggests near matches instead of a bare "not found".

5. eval-summary --update-metrics is no longer destructive:
   read_metrics_file/write_metrics_file preserve structured values
   (type_distribution) and don't flatten ints to floats. Fixes a
   silent data loss observed on every run.

Bonus: the evaluator field in written evaluation frontmatter now
falls back from run_config.model_name to the adapter's resolved model
(or the model echoed back in the API response), so rows no longer
show `evaluator: null` when --model is omitted.

Tests: new tests/unit/llm/test_gemini.py covers retry behavior;
tests/unit/infospace/test_history.py gains a round-trip test that
pins the type_distribution / int-preservation invariants.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-22 00:51:00 +02:00
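Item 1 of the commit above (retry on 429 and 5xx with exponential backoff) follows a common pattern that can be sketched as below. This is not the Gemini adapter's code: `TransientHTTPError`, `is_retryable`, and `call_with_backoff` are hypothetical names, and the attempt count and base delay are illustrative defaults.

```python
import time


class TransientHTTPError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def is_retryable(status):
    # Retry on rate limiting (429) and on any server-side 5xx error.
    return status == 429 or 500 <= status < 600


def call_with_backoff(fn, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on a retryable error wait base_delay * 2**attempt, then retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientHTTPError as exc:
            last_attempt = attempt == max_attempts - 1
            if last_attempt or not is_retryable(exc.status):
                raise
            sleep(base_delay * 2 ** attempt)


# Simulated endpoint: rate-limited once, unavailable once, then succeeds.
responses = iter([429, 503, 200])


def fake_request():
    status = next(responses)
    if status != 200:
        raise TransientHTTPError(status)
    return "ok"


delays = []
print(call_with_backoff(fake_request, sleep=delays.append), delays)
# prints: ok [1.0, 2.0]
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a production adapter would typically also add jitter and honor any `Retry-After` header.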
965508ec06 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-04-22:
  - update .custodian-brief.md for markitect-project
2026-04-22 00:28:46 +02:00
f325f89dc9 feat(infospace): evaluate 3 missing WoN entities (C.1)
Some checks failed
Test Suite / unit-tests (3.11) (push) Has been cancelled
Test Suite / unit-tests (3.12) (push) Has been cancelled
Test Suite / integration-tests (push) Has been cancelled
Test Suite / e2e-tests (push) Has been cancelled
Test Suite / performance-tests (push) Has been cancelled
Test Suite / code-quality (push) Has been cancelled
Test Suite / security-scan (push) Has been cancelled
Test Suite / test-summary (push) Has been cancelled
Fills the 988 entity / 985 evaluation gap in the Wealth of Nations
infospace. Entities advanced_state_of_society, bank_notes, and
bank_systemic_risk_management had no evaluation files; runs through
Gemini (2.5-flash / 2.5-flash-lite for the last one, which hit the
free-tier RPM limit) bring the eval count to 988.

per_entity_mean nudged from 3.955635 to 3.95668; viability still
6/6 PASS.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-21 23:52:04 +02:00
36a5136bdf docs(infospace): add advanced-usage, composition guide, and performance notes (C.4/C.5/C.6)
Closes out three docs tasks from roadmap/infospace-s3-closeout/PLAN.md:

- examples/infospace-with-history/docs/advanced-usage.md (C.4) — 5 worked
  patterns covering incremental eval, re-eval workflow (no --force flag
  exists; documents the rm-then-re-run pattern instead), interpreting the
  eval-summary distribution, triaging low scorers via an awk pipeline
  over overall_score (since `entities --sort-by score` does not exist),
  and acting on check --json output.
- docs/composition-guide.md (C.5) — walks through how supply-chain-vsm
  binds WoN as a discipline, then a step-by-step for creating a new
  infospace that binds an existing one. Includes live output from
  `markitect infospace disciplines`.
- examples/infospace-with-history/docs/performance-notes.md (C.6) — cites
  the 6h 28m wall time of the 985-entity S3.3 batch, ~2.5 ent/min rate,
  ~2000–3000 tokens/entity estimate, word_overlap vs embedding backend
  for redundancy checks, and a provider-by-scale recommendation table.

All commands in these docs were run against the live infospace at
commit time.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-21 07:02:46 +02:00
b7e11461f4 chore: rename markitect_project to markitect-main across project
Finishes the in-progress rename so docs, configs, tests, and capability
manifests all reference the current repo name consistently. Fixes two
tests (test_roundtrip_consolidated.py, test_issue_140_roundtrip_simplified.py)
whose hardcoded cwd paths would have broken under the renamed directory.

Archival content under history/, reports/, and roadmap/eat-the-frog/, plus
derived artifacts (.venv_old/, node_modules/, asset_registry.json) are
intentionally left untouched.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-21 01:57:35 +02:00
3966814868 updated SCOPE file
Some checks failed
Test Suite / unit-tests (3.11) (push) Has been cancelled
Test Suite / unit-tests (3.12) (push) Has been cancelled
Test Suite / code-quality (push) Has been cancelled
Test Suite / security-scan (push) Has been cancelled
Test Suite / integration-tests (push) Has been cancelled
Test Suite / e2e-tests (push) Has been cancelled
Test Suite / performance-tests (push) Has been cancelled
Test Suite / test-summary (push) Has been cancelled
2026-03-25 00:11:46 +01:00
f4610a46e3 docs: add SCOPE.md for rapid orientation
Some checks failed
Test Suite / unit-tests (3.11) (push) Has been cancelled
Test Suite / unit-tests (3.12) (push) Has been cancelled
Test Suite / code-quality (push) Has been cancelled
Test Suite / security-scan (push) Has been cancelled
Test Suite / integration-tests (push) Has been cancelled
Test Suite / e2e-tests (push) Has been cancelled
Test Suite / performance-tests (push) Has been cancelled
Test Suite / test-summary (push) Has been cancelled
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-17 23:11:42 +01:00
0d95e6dbcf docs(claude): expand CLAUDE.md with commands and architecture
Some checks failed
Test Suite / unit-tests (3.11) (push) Has been cancelled
Test Suite / unit-tests (3.12) (push) Has been cancelled
Test Suite / code-quality (push) Has been cancelled
Test Suite / security-scan (push) Has been cancelled
Test Suite / integration-tests (push) Has been cancelled
Test Suite / e2e-tests (push) Has been cancelled
Test Suite / performance-tests (push) Has been cancelled
Test Suite / test-summary (push) Has been cancelled
Replaces the stub (State Hub integration only) with full dev commands,
module architecture overview, LLM config resolution chain, infospace
conventions, and active roadmap pointers. Removes CLAUDE.custodian.md
(superseded by the expanded CLAUDE.md).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-04 23:28:03 +01:00
36c20f37d0 feat(llm): extract adapter layer for standalone llm-connect package (S1+S2)
Some checks failed
Test Suite / unit-tests (3.11) (push) Has been cancelled
Test Suite / unit-tests (3.12) (push) Has been cancelled
Test Suite / code-quality (push) Has been cancelled
Test Suite / security-scan (push) Has been cancelled
Test Suite / integration-tests (push) Has been cancelled
Test Suite / e2e-tests (push) Has been cancelled
Test Suite / performance-tests (push) Has been cancelled
Test Suite / test-summary (push) Has been cancelled
Stage 1 — Decouple:
- Move RunConfig + LLMResponse to markitect/llm/models.py (canonical)
- Move LLMAdapter + Mock/ErrorLLMAdapter to markitect/llm/adapter.py
- markitect/prompts/execution/models.py and llm_adapter.py become re-export shims
- All 4 adapters + factory.py updated to import from markitect.llm.*
- Parameterize app_name in toml_config.py (resolve_llm, get_default_layers,
  get_preference_layers): paths and env var now derived from app_name arg
- Add tests/test_llm_isolation.py: 7 isolation + backward-compat tests

Stage 2 — Extract:
- Standalone llm-connect package created at ~/llm-connect/
- All 18 llm files copied; markitect.* imports replaced with llm_connect.*
- LLMError base inlined in llm_connect/exceptions.py (no markitect dep)
- llm-connect installed into markitect-venv; declared in pyproject.toml

Smoke test: markitect llm-check succeeds (live Gemini API call).
Backward compat: markitect.prompts.execution.{models,llm_adapter} still work.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-27 08:04:50 +01:00
72b87fd82e docs(roadmap): add workplans for infospace S3 close-out and JSUI publication
infospace-s3-closeout: 8 tasks (C.1-C.8) covering 3 missing evals,
viability sign-off, docs (advanced usage, composition, perf), deferred
git history cleanup, and formal roadmap closure.

testdrive-jsui-publication: 9 tasks (P.1-P.9) covering repo structure
decision, Markitect integration gate, pack/dry-run, npm publish, CDN
verify, fresh install test, GitHub release, and badges.

Both registered as workstreams in Custodian State Hub.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-26 00:44:05 +01:00
eaf4a955af docs(roadmap): add workplan for extracting llm module as shared library
Some checks failed
Test Suite / unit-tests (3.11) (push) Has been cancelled
Test Suite / unit-tests (3.12) (push) Has been cancelled
Test Suite / code-quality (push) Has been cancelled
Test Suite / security-scan (push) Has been cancelled
Test Suite / integration-tests (push) Has been cancelled
Test Suite / e2e-tests (push) Has been cancelled
Test Suite / performance-tests (push) Has been cancelled
Test Suite / test-summary (push) Has been cancelled
3-stage plan: decouple (RunConfig/LLMResponse move + app name
parameterization) → extract to standalone package → adopt in first
consumer. Registered as workstream in Custodian State Hub.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24 21:51:54 +01:00
e9dc9a8517 docs(custodian): add session protocol CLAUDE.md for State Hub integration
Registers markitect as a tracked domain in the Custodian State Hub.
Includes topic ID, session start/end protocol, and MCP tool reference.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-24 21:42:41 +01:00
72 changed files with 3588 additions and 339 deletions

20
.claude/rules/agents.md Normal file

@@ -0,0 +1,20 @@
## Kaizen Agents
Specialized agent personas available on demand via the state-hub MCP.
**Discover:** `list_kaizen_agents()` — returns all agents with name, description, category
**Load:** `get_kaizen_agent("tdd-workflow")` — returns full instructions; read and follow them
Common agents:
| Agent | Category | When to use |
|-------|----------|-------------|
| `tdd-workflow` | testing | Step-by-step TDD8 workflow for any feature |
| `code-refactoring` | quality | Code quality analysis and safe refactoring |
| `test-maintenance` | testing | Diagnose and fix failing tests |
| `requirements-engineering` | process | Prevent interface/mock mismatches upfront |
| `keepaTodofile` | process | Maintain TODO.md during work |
| `project-management` | process | Track status, determine next steps |
| `datamodel-optimization` | quality | Optimize dataclasses and data structures |
All 17 agents: call `list_kaizen_agents()` for the full list.


@@ -0,0 +1,8 @@
## Architecture
<!-- TODO: Describe the key design decisions and component structure.
Key modules, data flows, external integrations, state machines, etc. -->
## Quick Reference
`~/state-hub/mcp_server/TOOLS.md` — MCP tool reference


@@ -0,0 +1,50 @@
# Credential and access routing
**Audience:** Codex, Claude Code, Grok, and custodian agents that call **llm-connect**
for inference. Run this check **before** requesting secrets, API keys, SSH access,
login tokens, or database passwords — in any repo, not only `ops-warden`.
ops-warden **issues SSH certificates only** (`warden sign`, `cert_command`). Every
other credential need belongs to another subsystem. **Do not** message
`ops-warden` on State Hub expecting a secret value; the reply is a pointer, not a key.
### Lookup (do this first)
```bash
warden route find "<describe your need>" --json
warden route show <catalog-id> --json
```
Requires the `warden` CLI from `~/ops-warden` (`uv tool install .` or `uv run warden`).
| Agent runtime | How to orient |
| --- | --- |
| **Codex / Grok** (shell, HTTP State Hub) | `warden route` commands above; inbox `to_agent=markitect-main` is for coordination, not secret vending |
| **Claude Code** (MCP when available) | `get_domain_summary("custodian")` for workstreams; **still** use `warden route` for credential ownership |
| **llm-connect** (inference service) | Never put secret retrieval in prompts; route custody to OpenBao/operator paths surfaced by `warden route` |
### Quick routing table
| I need… | Owner | ops-warden executes? |
| --- | --- | --- |
| SSH cert (`adm`/`agt`/`atm`) | ops-warden | **Yes** (`warden sign`) |
| API key, DB password, provider token | OpenBao (`railiance-platform`) | No — route only |
| Login / OIDC / MFA | key-cape / Keycloak | No — route only |
| Authorization decision | flex-auth | No — route only |
| activity-core → issue-core emission | activity-core + issue-core | No — `warden route show activity-core-issue-sink` |
| SSH tunnel | ops-bridge (+ `cert_command` from warden) | No — route only |
### Anti-patterns (do not do these)
- `POST /messages/` to `ops-warden` asking for `ISSUE_CORE_API_KEY`, `OPENROUTER_API_KEY`, etc.
- Inventing `warden secret`, `warden login`, `warden bao`, `warden tunnel` — they do not exist
- Pasting secrets into Git, State Hub, workplans, logs, or chat
### Other capabilities (reuse-surface)
Non-credential capabilities are usually discovered through **reuse-surface** federation
(`reuse-surface` registry / `capability.*` indexes). Credential routing is inlined in
every repo's agent instructions because it is high-frequency, high-risk, and easy to
get wrong.
**Canon:** `~/ops-warden/wiki/CredentialRouting.md` · catalog `~/ops-warden/registry/routing/catalog.yaml`


@@ -0,0 +1,38 @@
## First Session Protocol
Triggered when `get_domain_summary("communication")` shows **no workstreams**.
The project is registered but work has not yet been structured.
**Step 1 — Read, don't write**
- `~/the-custodian/canon/projects/communication/project_charter_v0.1.md` — purpose, scope
- `~/the-custodian/canon/projects/communication/roadmap_v0.1.md` — planned phases
- Scan repo root: README, directory structure, existing code or docs
**Step 2 — Survey in-progress work**
Look for TODOs, open branches, half-finished files. Note done vs. started but incomplete.
**Step 3 — Propose workstreams to Bernd**
Propose 1-3 workstreams, each a coherent strand, weeks to months, anchored to a
roadmap phase. **Wait for approval before creating.**
**Step 4 — Create workplan file first, then DB record (ADR-001)**
```
workplans/MARKITECT-WP-NNNN-<slug>.md ← write this first
```
Then register in the hub:
```
create_workstream(topic_id="36c7421b-c537-4723-bf75-42a3ebc6a1dc", title="...", owner="...", description="...")
create_task(workstream_id="<id>", title="...", priority="high|medium|low")
```
**Step 5 — Record the setup**
```
add_progress_event(
summary="First session: structured communication into N workstreams, M tasks",
event_type="milestone",
topic_id="36c7421b-c537-4723-bf75-42a3ebc6a1dc",
detail={"workstreams": [...], "tasks_created": M}
)
```
<!-- Delete or archive this file once past first session -->


@@ -0,0 +1,8 @@
## Repo boundary
This repo owns **Markitect Main** only. It does not own:
<!-- TODO: List what belongs in adjacent repos, e.g.:
- SSH key management → railiance-infra/
- State hub code → state-hub/
-->


@@ -0,0 +1,5 @@
**Purpose:** Markitect Main - (fill in purpose)
**Domain:** communication
**Repo slug:** markitect-main
**Topic ID:** 36c7421b-c537-4723-bf75-42a3ebc6a1dc


@@ -0,0 +1,85 @@
## Session Protocol
Dev Hub (State Hub API): http://127.0.0.1:8000
MCP server name in `~/.claude.json`: `dev-hub`
**Step 1 — Orient**
Read the offline-safe brief first — it works without a live hub connection:
```bash
cat .custodian-brief.md
```
Then call the MCP tool for richer cross-domain context when MCP tools are exposed:
```
get_domain_summary("communication")
```
If MCP tools are unavailable in the current agent session, use the REST API:
```bash
curl -s "http://127.0.0.1:8000/state/summary" | python3 -m json.tool
```
If the hub is offline: `cd ~/state-hub && make api`
**Step 2 — Check inbox**
With MCP tools:
```
get_messages(to_agent="markitect-main", unread_only=True)
```
Mark read with `mark_message_read(message_id)`. Reply or act on coordination
requests before proceeding.
Without MCP tools:
```bash
curl -s "http://127.0.0.1:8000/messages/?to_agent=markitect-main&unread_only=true" \
| python3 -m json.tool
curl -s -X PATCH "http://127.0.0.1:8000/messages/<id>/read" \
-H "Content-Type: application/json" -d '{}'
```
**Step 3 — Scan workplans**
```bash
ls workplans/
```
For each file with `status: ready`, `active`, or `blocked`, note pending
`wait`/`todo`/`progress` tasks.
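The scan in Step 3 could be automated roughly as below. This is a sketch under stated assumptions: frontmatter is treated as flat `key: value` lines between two `---` markers, task blocks follow the fenced `task` shape from the workplan convention, and the names `scan_workplans` and `_fields` are hypothetical.

```python
import re
from pathlib import Path

OPEN_PLAN_STATUSES = {"ready", "active", "blocked"}
PENDING_TASK_STATUSES = {"wait", "todo", "progress"}
FENCE = "`" * 3  # built at runtime so this example nests cleanly in markdown
TASK_BLOCK = re.compile(rf"^{FENCE}task\n(.*?)^{FENCE}", re.MULTILINE | re.DOTALL)


def _fields(block):
    """Parse flat 'key: value' lines, dropping trailing ' # ...' comments."""
    out = {}
    for line in block.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            out[key.strip()] = value.split(" #")[0].strip().strip('"')
    return out


def scan_workplans(root="workplans"):
    """Return (filename, status, pending_task_ids) for each open workplan."""
    results = []
    for path in sorted(Path(root).glob("*.md")):
        parts = path.read_text(encoding="utf-8").split("---\n", 2)
        if len(parts) < 3 or parts[0].strip():
            continue  # no leading frontmatter block
        status = _fields(parts[1]).get("status")
        if status not in OPEN_PLAN_STATUSES:
            continue
        tasks = (_fields(m.group(1)) for m in TASK_BLOCK.finditer(parts[2]))
        pending = [t["id"] for t in tasks
                   if "id" in t and t.get("status") in PENDING_TASK_STATUSES]
        results.append((path.name, status, pending))
    return results
```

The glob is non-recursive, so `workplans/archived/` is skipped.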
**Step 4 — Present brief**
1. **Active workstreams** for `communication` — title, task counts, blocking decisions
2. **Pending tasks** from `workplans/` + any `[repo:markitect-main]` hub tasks
3. **Goal guidance** — if `goal_guidance` in summary:
- `needs_workplan`: surface as top action — *"Repo goal '{title}' has no workplan yet"*
- `alignment_warnings`: flag if active work is not aligned with current goal
4. **Suggested next action** — highest-priority open item
5. **SBOM status** — flag if `last_sbom_at` is unset for this repo
If no workstreams: follow First Session Protocol (`first-session.md`).
**During work:** `record_decision()` · `add_progress_event()` · `resolve_decision()`
> State Hub is a *read model*. Bootstrap tools (`create_workstream`, `create_task`)
> are First Session Protocol only. Work structure belongs in repo files (ADR-001).
**Session close:**
With MCP tools:
```
add_progress_event(summary="...", topic_id="36c7421b-c537-4723-bf75-42a3ebc6a1dc", workstream_id="<uuid>")
```
Without MCP tools:
```bash
curl -s -X POST http://127.0.0.1:8000/progress/ \
-H "Content-Type: application/json" \
-d '{"topic_id":"36c7421b-c537-4723-bf75-42a3ebc6a1dc","workstream_id":"<uuid>","event_type":"note","summary":"what changed","author":"codex"}'
```
If workplan files were modified, ensure the local copy is up to date first:
```bash
git -C <repo_path> pull --ff-only
cd ~/state-hub && make fix-consistency REPO=markitect-main
```
For repos where implementation runs on a remote machine (e.g. CoulombCore),
use the combined target which pulls before fixing:
```bash
cd ~/state-hub && make fix-consistency-remote REPO=markitect-main
```
**C-15** (DB task ahead of file) is normal in multi-machine workflows — writeback
will sync the file to match DB. **C-16** (repo behind remote) blocks all writes
until you pull — intentional to prevent clobbering remote progress.


@@ -0,0 +1,16 @@
## Stack
- **Language:** Python 3.12+ (monorepo) + JavaScript UI (testdrive-jsui)
- **Key deps:** uv/pip, pytest, npm; see `pyproject.toml`, `package.json`, `Makefile`
## Dev Commands
```bash
make setup
make test
make test-js
make test-all
make lint
make build
make help
```


@@ -0,0 +1,40 @@
## Workplan Convention (ADR-001)
File location: `workplans/MARKITECT-WP-NNNN-<slug>.md`
ID prefix: `MARKITECT-WP-`
Work items originate as files in this repo **before** being registered in the hub.
Canonical workplan/workstream frontmatter statuses are:
`proposed`, `ready`, `active`, `blocked`, `backlog`, `finished`, `archived`.
Use `proposed` for a newly drafted plan, `ready` after review against current
repo state, and `finished` when implementation is complete. `stalled` and
`needs_review` are derived health labels, not stored statuses.
Closed workplans may be moved to `workplans/archived/` with a completion-date
prefix: `YYMMDD-MARKITECT-WP-NNNN-<slug>.md`. The frontmatter id remains
unchanged; the prefix is only for quick visual reference.
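The archive naming rule can be illustrated with a short Python sketch. The helper name `archived_path` is hypothetical; the `YYMMDD` prefix and `workplans/archived/` location come from the convention above.

```python
from datetime import date
from pathlib import Path


def archived_path(workplan, completed):
    """Return the archive location for a closed workplan file.

    Only the filename gains the YYMMDD completion-date prefix;
    the frontmatter id inside the file is left untouched.
    """
    workplan = Path(workplan)
    return workplan.parent / "archived" / f"{completed:%y%m%d}-{workplan.name}"
```

For example, a plan completed on 2026-06-22 would move to a file whose name starts with `260622-`.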
Small opportunistic tasks discovered during another session use **Ad Hoc Tasks**:
`workplans/ADHOC-YYYY-MM-DD.md`, workstream slug `adhoc-YYYY-MM-DD`, and task ids
`ADHOC-YYYY-MM-DD-T01`, `T02`, etc. Use adhocs only for low-risk work completed
directly. Promote anything requiring analysis, design, approval, dependencies, or
multiple planned phases into a normal workplan.
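The Ad Hoc naming scheme can be derived from a single date, as this small sketch shows. The function name `adhoc_names` and the returned dictionary shape are assumptions for illustration only.

```python
from datetime import date


def adhoc_names(day, task_count):
    """Derive the file path, workstream slug, and task ids for one ad hoc day."""
    stamp = day.isoformat()  # YYYY-MM-DD
    return {
        "file": f"workplans/ADHOC-{stamp}.md",
        "workstream_slug": f"adhoc-{stamp}",
        "task_ids": [f"ADHOC-{stamp}-T{n:02d}" for n in range(1, task_count + 1)],
    }
```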
Ecosystem todos from other agents arrive as `[repo:markitect-main]` hub tasks —
visible at session start. Pick one up by creating the workplan file, then registering
the workstream.
Task blocks use this shape:
```task
id: MARKITECT-WP-NNNN-T01
status: wait | todo | progress | done | cancel
priority: high | medium | low
state_hub_task_id: "<uuid>" # written by fix-consistency — do not edit
```
Status progression is `todo` → `progress` → `done`; use `wait` for waiting or
blocked work and `cancel` for stopped work.
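A task block's fields could be checked before committing, along these lines. This is a hedged sketch: `task_problems` is a hypothetical helper, and the id pattern assumes `NNNN` means four digits and `T01` means two digits, which the convention implies but does not state.

```python
import re

TASK_STATUSES = {"wait", "todo", "progress", "done", "cancel"}
TASK_PRIORITIES = {"high", "medium", "low"}
# Assumption: four-digit workplan number, two-digit task number.
TASK_ID = re.compile(r"^MARKITECT-WP-\d{4}-T\d{2}$")


def task_problems(fields):
    """Return a list of human-readable problems; an empty list means the block looks valid."""
    problems = []
    if not TASK_ID.match(fields.get("id", "")):
        problems.append("id does not match MARKITECT-WP-NNNN-TNN")
    if fields.get("status") not in TASK_STATUSES:
        problems.append(f"unknown status: {fields.get('status')!r}")
    if fields.get("priority") not in TASK_PRIORITIES:
        problems.append(f"unknown priority: {fields.get('priority')!r}")
    return problems
```

`state_hub_task_id` is deliberately not checked, since it is written by fix-consistency.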
<!-- Ralph Loop rules and HEUREKA sequence: ~/.claude/CLAUDE.md — do not duplicate here -->


@@ -10,7 +10,7 @@ principles with strict separation of concerns.
## Directory Structure & Clean Architecture
```
-markitect_project/
+markitect-main/
├── domain/ # Business logic (innermost layer)
├── application/ # Use cases and workflows
├── infrastructure/ # External interfaces (database, file system)

18
.custodian-brief.md Normal file

@@ -0,0 +1,18 @@
<!-- custodian-brief: generated by fix-consistency — do not edit manually -->
# Custodian Brief — markitect-main
**Domain:** communication
**Last synced:** 2026-06-22 21:32 UTC
**State Hub:** http://127.0.0.1:8000 *(adjust if running on a remote machine)*
## Active Workstreams
*(none — repo may need first-session setup)*
---
## MCP Orientation (when available)
If the state-hub MCP server is reachable, call:
`get_domain_summary("communication")`
This provides richer cross-domain context.
If the MCP call fails, use this file as your orientation source.

2
.gitignore vendored

@@ -91,6 +91,8 @@ debug_*.py
# Claude Code local settings (user-specific permissions)
.claude/settings.local.json
# Claude Code runtime session locks (per-session, not content)
.claude/*.lock
.aider*

2
.gitmodules vendored

@@ -1,6 +1,6 @@
[submodule "wiki"]
path = wiki
-url = http://92.205.130.254:32166/coulomb/markitect_project.wiki.git
+url = http://92.205.130.254:32166/coulomb/markitect-main.wiki.git
branch = main
[submodule "capabilities/kaizen-agentic"]
path = capabilities/kaizen-agentic

25
.repo-classification.yaml Normal file

@@ -0,0 +1,25 @@
repo_classification:
standard: Repo Classification Standard
version: '1.0'
classified_at: '2026-06-22'
classified_by: human
category: product
domain: communication
secondary_domains:
- infotech
- agents
capability_tags:
- knowledge
- documentation
- product-development
- platform
business_stake:
- product
- technology
- execution
business_mechanics:
- intention
- coordination
- operation
- adaptation
notes: Markitect successor to archived markitect-project; human confirmed.

219
AGENTS.md Normal file

@@ -0,0 +1,219 @@
# Markitect Main — Agent Instructions
## Repo Identity
**Purpose:** Markitect Main - (fill in purpose)
**Domain:** communication
**Repo slug:** markitect-main
**Topic ID:** `36c7421b-c537-4723-bf75-42a3ebc6a1dc`
**Workplan prefix:** `MARKITECT-WP-`
---
## State Hub Integration
The Custodian State Hub tracks work across all domains. Interact via HTTP REST —
there is no MCP server for Codex agents.
| Context | URL |
|---------|-----|
| Local workstation | `http://127.0.0.1:8000` |
| Remote via tunnel | `http://127.0.0.1:18000` |
### Orient at session start
```bash
# Offline brief — works without hub connection
cat .custodian-brief.md
# Active workstreams for this domain
curl -s "http://127.0.0.1:8000/workstreams/?topic_id=36c7421b-c537-4723-bf75-42a3ebc6a1dc&status=active" \
| python3 -m json.tool
# Check inbox
curl -s "http://127.0.0.1:8000/messages/?to_agent=markitect-main&unread_only=true" \
| python3 -m json.tool
```
Mark a message read:
```bash
curl -s -X PATCH "http://127.0.0.1:8000/messages/<id>/read" \
-H "Content-Type: application/json" -d '{}'
```
### Log progress (required at session close)
```bash
curl -s -X POST http://127.0.0.1:8000/progress/ \
-H "Content-Type: application/json" \
-d '{
"summary": "what was done",
"event_type": "note",
"author": "codex",
"workstream_id": "<uuid>",
"task_id": "<uuid>"
}'
```
Omit `workstream_id` / `task_id` when not applicable.
### Update task status
```bash
curl -s -X PATCH "http://127.0.0.1:8000/tasks/<task_id>" \
-H "Content-Type: application/json" \
-d '{"status": "progress"}'
# values: wait | todo | progress | done | cancel
```
### Flag a task for human review
```bash
curl -s -X PATCH "http://127.0.0.1:8000/tasks/<task_id>" \
-H "Content-Type: application/json" \
-d '{"needs_human": true, "intervention_note": "reason"}'
```
---
## Session Protocol
**Start:**
1. `cat .custodian-brief.md` — domain goal and open workstreams (offline-safe)
2. Check inbox: `GET /messages/?to_agent=markitect-main&unread_only=true`; mark read
3. Scan workplans: `ls workplans/` — note `status: ready`, `active`, or `blocked` files and open tasks
4. Check human-needed tasks: `GET /tasks/?needs_human=true`
**During work:**
- Update task statuses in workplan files as tasks progress
- Record significant decisions via `POST /decisions/`
**Close:**
1. Update workplan file task statuses to reflect progress
2. Log: `POST /progress/` with a summary of what changed
3. Note for the custodian operator: after workplan file changes, run from
`~/state-hub`:
```bash
make fix-consistency REPO=markitect-main
```
This syncs task status from files into the hub DB.
---
## Credential and access routing
**Audience:** Codex, Claude Code, Grok, and custodian agents that call **llm-connect**
for inference. Run this check **before** requesting secrets, API keys, SSH access,
login tokens, or database passwords — in any repo, not only `ops-warden`.
ops-warden **issues SSH certificates only** (`warden sign`, `cert_command`). Every
other credential need belongs to another subsystem. **Do not** message
`ops-warden` on State Hub expecting a secret value; the reply is a pointer, not a key.
### Lookup (do this first)
```bash
warden route find "<describe your need>" --json
warden route show <catalog-id> --json
```
Requires the `warden` CLI from `~/ops-warden` (`uv tool install .` or `uv run warden`).
| Agent runtime | How to orient |
| --- | --- |
| **Codex / Grok** (shell, HTTP State Hub) | `warden route` commands above; inbox `to_agent=markitect-main` is for coordination, not secret vending |
| **Claude Code** (MCP when available) | `get_domain_summary("custodian")` for workstreams; **still** use `warden route` for credential ownership |
| **llm-connect** (inference service) | Never put secret retrieval in prompts; route custody to OpenBao/operator paths surfaced by `warden route` |
### Quick routing table
| I need… | Owner | ops-warden executes? |
| --- | --- | --- |
| SSH cert (`adm`/`agt`/`atm`) | ops-warden | **Yes** — `warden sign` |
| API key, DB password, provider token | OpenBao (`railiance-platform`) | No — route only |
| Login / OIDC / MFA | key-cape / Keycloak | No — route only |
| Authorization decision | flex-auth | No — route only |
| activity-core → issue-core emission | activity-core + issue-core | No — `warden route show activity-core-issue-sink` |
| SSH tunnel | ops-bridge (+ `cert_command` from warden) | No — route only |
### Anti-patterns (do not do these)
- `POST /messages/` to `ops-warden` asking for `ISSUE_CORE_API_KEY`, `OPENROUTER_API_KEY`, etc.
- Inventing `warden secret`, `warden login`, `warden bao`, `warden tunnel` — they do not exist
- Pasting secrets into Git, State Hub, workplans, logs, or chat
### Other capabilities (reuse-surface)
Non-credential capabilities are usually discovered through **reuse-surface** federation
(`reuse-surface` registry / `capability.*` indexes). Credential routing is inlined in
every repo's agent instructions because it is high-frequency, high-risk, and easy to
get wrong.
**Canon:** `~/ops-warden/wiki/CredentialRouting.md` · catalog `~/ops-warden/registry/routing/catalog.yaml`
<!-- REPO-AGENTS-EXTENSIONS -->
<!-- Append repo-specific agent instructions below this marker.
The state-hub template sync preserves content after this line. -->
---
## Workplan Convention (ADR-001)
Work items originate as files in this repo — not in the hub. The hub is a
read/cache/index layer that rebuilds from files.
**File location:** `workplans/MARKITECT-WP-NNNN-<slug>.md`
**Archived location:** finished workplans may move to
`workplans/archived/YYMMDD-MARKITECT-WP-NNNN-<slug>.md`. The `YYMMDD` prefix is
the completion/archive date; the frontmatter `id` does not change.
**Ad Hoc Tasks:** small opportunistic fixes discovered during a session use
`workplans/ADHOC-YYYY-MM-DD.md` with task ids `ADHOC-YYYY-MM-DD-T01`, etc. Use
this only for low-risk work completed directly; create a normal workplan for
anything needing analysis, design, approval, dependencies, or multiple phases.
**Frontmatter:**
```yaml
---
id: MARKITECT-WP-NNNN
type: workplan
title: "..."
domain: communication
repo: markitect-main
status: proposed | ready | active | blocked | backlog | finished | archived
owner: codex
topic_slug: ...
created: "YYYY-MM-DD"
updated: "YYYY-MM-DD"
state_hub_workstream_id: "<uuid>" # written by fix-consistency — do not edit
---
```
Use `proposed` for a new draft, `ready` after review against current repo
state, and `finished` after implementation. `stalled` and `needs_review` are
derived health labels, not frontmatter statuses.
**Task block format** (one per `##` section):
```
## Task Title
` ` `task
id: MARKITECT-WP-NNNN-T01
status: wait | todo | progress | done | cancel
priority: high | medium | low
state_hub_task_id: "<uuid>" # written by fix-consistency — do not edit
` ` `
Task description text.
```
Status progression: `todo` → `progress` → `done`; use `wait` for waiting/blocked work and `cancel` for stopped work.
To create a new workplan:
1. Write the file following the format above
2. Notify the custodian operator to run `make fix-consistency REPO=markitect-main`
(or send a message to the hub agent via `POST /messages/`)
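Step 1 could be scripted as in the sketch below, which renders the frontmatter shown earlier. The function name `render_workplan` is hypothetical, the four-digit zero padding of `NNNN` is an assumption, and `state_hub_workstream_id` is intentionally left out because fix-consistency writes it.

```python
from datetime import date

WORKPLAN_STATUSES = ("proposed", "ready", "active", "blocked",
                     "backlog", "finished", "archived")


def render_workplan(number, slug, title, topic_slug, today,
                    status="proposed", owner="codex"):
    """Return (relative_path, file_text) for a new workplan skeleton."""
    if status not in WORKPLAN_STATUSES:
        # 'stalled' and 'needs_review' are derived labels, never stored.
        raise ValueError(f"{status!r} is not a stored workplan status")
    wp_id = f"MARKITECT-WP-{number:04d}"
    lines = [
        "---",
        f"id: {wp_id}",
        "type: workplan",
        f'title: "{title}"',  # note: quotes inside the title are not escaped here
        "domain: communication",
        "repo: markitect-main",
        f"status: {status}",
        f"owner: {owner}",
        f"topic_slug: {topic_slug}",
        f'created: "{today.isoformat()}"',
        f'updated: "{today.isoformat()}"',
        "---",
        "",
    ]
    return f"workplans/{wp_id}-{slug}.md", "\n".join(lines)
```

Task sections are then appended by hand below the frontmatter.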

12
CLAUDE.md Normal file

@@ -0,0 +1,12 @@
# Markitect Main — Claude Code Instructions
@SCOPE.md
@.claude/rules/repo-identity.md
@.claude/rules/session-protocol.md
@.claude/rules/first-session.md
@.claude/rules/workplan-convention.md
@.claude/rules/stack-and-commands.md
@.claude/rules/architecture.md
@.claude/rules/repo-boundary.md
@.claude/rules/credential-routing.md
@.claude/rules/agents.md


@@ -457,7 +457,7 @@ Sister projects can reuse these capabilities directly:
Install capabilities via local file references:
```toml
[project.dependencies]
-release-management = {path = "../markitect_project/capabilities/release-management"}
+release-management = {path = "../markitect-main/capabilities/release-management"}
```
### Shared Infrastructure

129
SCOPE.md Normal file

@@ -0,0 +1,129 @@
# SCOPE
> This file helps you quickly understand what this repository is about,
> when it is relevant, and when it is not.
> It is intentionally lightweight and may be incomplete.
---
## One-liner
Intelligent markdown engine and information management platform — treats documents as structured, queryable information spaces with schema validation, transclusion, LLM-driven evaluation, and infospace lifecycle management.
---
## Core Idea
MarkiTect turns fragmented knowledge (scattered docs, chats, notes) into structured, versioned, reusable artifacts. The core abstraction is an **infospace**: a curated collection of typed entities (concepts, mechanisms, observations) governed by a YAML config, validated against schemas, and evaluated for quality across five dimensions. The platform automates generation, validation, and transformation at scale, delegating domain-level judgment to LLMs while Python handles structure and evaluation.
---
## In Scope
- Parse, validate, and analyze markdown documents against schemas
- Generate schemas from example documents; enforce naming convention `{domain}-schema-v{major}.{minor}.md`
- Infospace lifecycle: create, populate, evaluate (per-entity + collection quality scores), compose, export
- Transclusion: embed content from one document into another, maintaining single source of truth
- LLM-driven prompt execution with dependency resolution and quality gates
- Relationship graph export (Mermaid, DOT) and analysis (networkx, FCA)
- Batch document processing; CLI (`markitect <command>`) and programmatic API
- Rendering: markdown → interactive HTML via plugin system (testdrive-jsui)
- Asset management (image embedding, resource handling)
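The schema naming convention listed above, `{domain}-schema-v{major}.{minor}.md`, could be checked with a regular expression. This is an illustrative sketch, not MarkiTect's actual validator: the helper name `parse_schema_name` and the assumption that domains are lowercase alphanumerics separated by hyphens are mine.

```python
import re

# Assumption: domain is lowercase alphanumeric words joined by hyphens.
SCHEMA_NAME = re.compile(
    r"^(?P<domain>[a-z0-9]+(?:-[a-z0-9]+)*)-schema-v(?P<major>\d+)\.(?P<minor>\d+)\.md$"
)


def parse_schema_name(filename):
    """Return (domain, major, minor), or None if the name breaks the convention."""
    match = SCHEMA_NAME.match(filename)
    if match is None:
        return None
    return match["domain"], int(match["major"]), int(match["minor"])
```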
---
## Out of Scope
- Visual/WYSIWYG editing (markdown-first, text-based workflows only)
- Real-time collaborative editing (git-based versioning instead)
- Financial transactions or external payment integration
- Making domain-level judgments in Python code (delegated to LLM via prompt templates)
- Storing secrets or credentials in plaintext
- Full GraphQL API (structure exists but not fully implemented)
- Vendor-specific integrations or lock-in
---
## Relevant When
- Managing large document sets (hundreds to thousands) needing consistent structure and validation
- Building or maintaining institutional knowledge bases, technical documentation, or canon releases
- Automating document generation from schemas or templates
- Tracking relationships and dependencies between knowledge artifacts
- Needing programmatic access to document structure (beyond file reading)
- Applying quality evaluation to a structured concept collection
---
## Not Relevant When
- Working with a handful of simple, unrelated documents
- Visual editor required
- Exclusively non-markdown source formats (PDF/Word need conversion first)
- No consistency, validation, or automation needed
---
## Current State
- Status: active (v0.13.0-dev, ~90 commits ahead of release)
- Implementation: substantial — core modules mature (CLI, parsing, schema management, prompt execution, infospace); infospace S3 close-out in progress; LLM adapter extracted to standalone `llm-connect` package
- Stability: stable core; plugin system and infospace tooling evolving; 200+ CHANGELOG entries since v0.6.0
- Usage: active personal development; examples with 988 entities and full evaluation pipeline
---
## How It Fits
- Upstream dependencies: `llm-connect` (LLM adapter library, extracted), `testdrive-jsui` (rendering plugin submodule), `markitect-utils` (utility library)
- Downstream consumers: Custodian — MarkiTect is the knowledge artifact platform in the canonical dependency order (Railiance → **Markitect** → Coulomb.social → Personhood/Foerster → Custodian)
- Often used with: the-custodian (state hub tracks markitect domain workstreams), kaizen-agentic (project-management agent for session workflow)
---
## Terminology
- Preferred terms: infospace, topic, discipline, entity, evaluation, viability, transclusion, schema, quality gates
- Also known as: "markitect", "the markdown engine"
- Potentially confusing terms: "topic" = the subject matter an infospace explains (not a chat thread); "discipline" = a reusable framework of concepts (itself a viable infospace); "infospace" ≠ filesystem directory (it's a curated conceptual collection with explicit quality thresholds)
---
## Related / Overlapping
- `llm-connect` — standalone LLM adapter extracted from MarkiTect (dependency)
- `the-custodian` — tracks markitect workstreams; custodian canon includes a markitect domain charter
- `marki-docx` — separate repo (on tegwick machine); relationship: docx export capability for MarkiTect artifacts
---
## Provided Capabilities
```capability
type: documentation
title: Structured document validation and schema management
description: Parse, validate, and enforce schemas on markdown documents — generate schemas from examples, validate entity collections, report naming convention compliance.
keywords: [markdown, schema, validation, document, structure, linting]
```
```capability
type: documentation
title: Infospace lifecycle management
description: Create, populate, evaluate (quality scores), compose, and export curated knowledge collections (infospaces) with transclusion and relationship graph analysis.
keywords: [infospace, knowledge, curation, evaluation, transclusion, quality, graph]
```
```capability
type: data
title: LLM-driven knowledge artifact generation
description: Execute prompts with dependency resolution and quality gates to generate typed entities — concepts, mechanisms, observations — at scale from schemas and templates.
keywords: [llm, generation, prompt, entity, artifact, knowledge, automation]
```
---
## Getting Oriented
- Start with: `CLAUDE.md` (dev commands, LLM config, infospace lifecycle), `INTRODUCTION.md` (use cases, philosophy)
- Key files / directories: `markitect/cli.py` (CLI entry point), `markitect/infospace/` (primary active area), `markitect/prompts/` (LLM execution), `roadmap/` (6 active planning tracks), `examples/infospace-with-history/` (988-entity reference implementation)
- Entry points: `markitect --help`; `markitect infospace --help`; `pytest tests/unit/` (inner TDD loop)


@@ -15,7 +15,7 @@ You are responsible for:
### Directory Structure
```
-markitect_project/
+markitect-main/
├── Makefile # Main project Makefile
├── scripts/
│ └── capability_discovery.mk # Auto-discovery and delegation system


@@ -7,7 +7,7 @@ detachment:
capability_name: issue-facade
capability_family: issue-tracking
integration_pattern: capabilities-directory
-original_location: /home/worsch/markitect_project/capabilities/issue-facade
+original_location: /home/worsch/markitect-main/capabilities/issue-facade
capability_metadata:
spec_file: CAPABILITY-issue-tracking.yaml
@@ -17,23 +17,23 @@ capability_metadata:
integration_details:
parent_project: capabilities
-parent_path: /home/worsch/markitect_project/capabilities
+parent_path: /home/worsch/markitect-main/capabilities
re_integration_guide: |
To re-integrate this capability using the new architecture:
# Option 1: Git submodule (recommended)
-cd /home/worsch/markitect_project/capabilities
+cd /home/worsch/markitect-main/capabilities
git submodule add <repo-url> _issue-facade
pip install -e _issue-facade/
# Option 2: Clone directly
-cd /home/worsch/markitect_project/capabilities
+cd /home/worsch/markitect-main/capabilities
git clone <repo-url> _issue-facade
pip install -e _issue-facade/
# Option 3: Copy into project
-cd /home/worsch/markitect_project/capabilities
+cd /home/worsch/markitect-main/capabilities
cp -r /path/to/issue-facade _issue-facade
pip install -e _issue-facade/


@@ -8,7 +8,7 @@ This test module validates outline mode schema generation improvements including
 - Content instruction integration
 - End-to-end workflow from example document to generated drafts
-Created for Issue #46: https://gitea.coulomb.social/coulomb/markitect_project/issues/46
+Created for Issue #46: https://gitea.coulomb.social/coulomb/markitect-main/issues/46
"""
import pytest


@@ -209,7 +209,7 @@ tests/
## 🎯 Detailed File Structure After Migration
```
-markitect_project/
+markitect-main/
├── capabilities/
│ └── release-management/
│ ├── README.md ✅ CREATED


@@ -162,7 +162,7 @@ clean_before_build = true
[tool.release-management.registries.gitea]
url = "http://92.205.130.254:32166"
owner = "coulomb"
-repo = "markitect_project"
+repo = "markitect-main"
auth_token_env = "GITEA_API_TOKEN"
[tool.release-management.registries.pypi]


@@ -141,7 +141,7 @@ make release-publish VERSION=0.8.0
## Registry Information
 - **Gitea URL**: http://92.205.130.254:32166
-- **Repository**: coulomb/markitect_project
+- **Repository**: coulomb/markitect-main
 - **PyPI Registry URL**: http://92.205.130.254:32166/api/packages/coulomb/pypi
 - **Package List URL**: http://92.205.130.254:32166/api/v1/packages/coulomb


@@ -8,7 +8,7 @@
```bash
# ❌ WRONG - Don't edit capability files from main repo
-cd /home/worsch/markitect_project/capabilities/testdrive-jsui
+cd /home/worsch/markitect-main/capabilities/testdrive-jsui
vim src/testdrive_jsui/core.py # DON'T DO THIS!
# ✅ CORRECT - Use separate Claude instance/session
@@ -29,7 +29,7 @@ cd /path/to/work/testdrive-jsui
| Session | Purpose | Location |
|---------|---------|----------|
-| **Main Repo** | Integration, configuration | `/home/worsch/markitect_project` |
+| **Main Repo** | Integration, configuration | `/home/worsch/markitect-main` |
| **Capability** | Feature development, bugs | Separate clone or `capabilities/capability-name` |
**Why?** Prevents accidental cross-contamination and respects repository boundaries.
@@ -40,7 +40,7 @@ cd /path/to/work/testdrive-jsui
```bash
# After pushing changes to capability repo
-cd /home/worsch/markitect_project
+cd /home/worsch/markitect-main
git submodule update --remote capabilities/testdrive-jsui
git add capabilities/testdrive-jsui
git commit -m "chore: update testdrive-jsui to latest"
@@ -50,7 +50,7 @@ git push
### Add New Capability
```bash
-cd /home/worsch/markitect_project
+cd /home/worsch/markitect-main
# Add as submodule
git submodule add http://gitea/coulomb/new-capability.git capabilities/new-capability
@@ -67,7 +67,7 @@ git commit -m "feat: add new-capability submodule"
```bash
# Option 1: In submodule directory (careful!)
-cd /home/worsch/markitect_project/capabilities/testdrive-jsui
+cd /home/worsch/markitect-main/capabilities/testdrive-jsui
git checkout -b feature-branch
# make changes
git commit -m "feat: new feature"
@@ -86,7 +86,7 @@ git push origin feature-branch
### Check Capability Status
```bash
-cd /home/worsch/markitect_project
+cd /home/worsch/markitect-main
# List all capabilities
make capabilities-list


@@ -9,7 +9,7 @@ MarkiTect is a markdown processing toolkit with transclusion, schema validation,
## Current Directory Structure
```
-markitect_project/
+markitect-main/
├── markitect/ # Main package
│ ├── [34 root-level .py files] # Core functionality (see below)
│ ├── assets/ # Asset discovery, management, caching (21 files)


@@ -8,7 +8,7 @@ MarkiTect uses a **capabilities-based architecture** where functionality is orga
### 1. **Separation of Concerns**
-**Critical Rule:** The main repository (`markitect_project`) **MUST NOT** directly modify capability code.
+**Critical Rule:** The main repository (`markitect-main`) **MUST NOT** directly modify capability code.
 - **DO**: Use capabilities as dependencies
 - **DO**: Configure capabilities through documented interfaces
@@ -28,7 +28,7 @@ MarkiTect uses a **capabilities-based architecture** where functionality is orga
Capabilities are integrated as **git submodules**, not regular directories:
```
-markitect_project/
+markitect-main/
├── .gitmodules # Submodule configuration
├── capabilities/
│ ├── testdrive-jsui/ # Git submodule → separate repo
@@ -80,8 +80,8 @@ engine.render_document(content, mode='edit', config=config)
#### Main Repository Session
```bash
-# In markitect_project/
-cd /home/worsch/markitect_project
+# In markitect-main/
+cd /home/worsch/markitect-main
# Main repo tasks:
# - Integrate capabilities
@@ -93,7 +93,7 @@ cd /home/worsch/markitect_project
#### Capability Session
```bash
# In capability repository
-cd /home/worsch/markitect_project/capabilities/testdrive-jsui
+cd /home/worsch/markitect-main/capabilities/testdrive-jsui
# OR clone separately
git clone http://gitea/coulomb/testdrive-jsui.git
@@ -122,7 +122,7 @@ cd testdrive-jsui
2. **Update main project** (different Claude instance)
```bash
-cd /home/worsch/markitect_project
+cd /home/worsch/markitect-main
git submodule update --remote capabilities/testdrive-jsui
git commit -m "chore: update testdrive-jsui submodule"
```
@@ -139,7 +139,7 @@ When a capability releases a new version:
```bash
# In main repo
-cd /home/worsch/markitect_project
+cd /home/worsch/markitect-main
# Update specific capability
cd capabilities/testdrive-jsui
@@ -160,7 +160,7 @@ git commit -am "chore: update all capabilities"
# http://gitea/coulomb/new-capability
# 2. Add as submodule to main repo
-cd /home/worsch/markitect_project
+cd /home/worsch/markitect-main
git submodule add http://gitea/coulomb/new-capability.git capabilities/new-capability
# 3. Add dependency to pyproject.toml
@@ -324,7 +324,7 @@ def test_testdrive_jsui_integration():
1. **Create separate git repo**
```bash
cd /tmp
-cp -r markitect_project/capabilities/capability-name capability-name
+cp -r markitect-main/capabilities/capability-name capability-name
cd capability-name
git init
git add .
@@ -335,7 +335,7 @@ def test_testdrive_jsui_integration():
2. **Remove from main repo**
```bash
-cd markitect_project
+cd markitect-main
git rm -rf capabilities/capability-name
git commit -m "chore: remove capability-name for submodule conversion"
```

203
docs/composition-guide.md Normal file

@@ -0,0 +1,203 @@
# Infospace Composition Guide
One completed, viable infospace can be reused as a **discipline** for
another infospace — a lens applied to a different topic. This guide
explains how composition works and walks through the live
`examples/supply-chain-vsm/` reference.
---
## What composition means
An **infospace** is a directory of typed entities governed by
`infospace.yaml`. Its entities and relations describe a specific topic
(for example, Adam Smith's *Wealth of Nations*).
A **discipline** is an infospace declared as a reusable analytical
framework by another infospace. When infospace B binds infospace A as a
discipline:
1. B's entities can reference A's entities in `## WoN Concept` (or
equivalent) sections.
2. Properties A has already computed on its entities — such as VSM system
placement — become available to B by transitivity through the mapping.
3. B can impose its own viability thresholds independently of A's. The two
infospaces each pass or fail viability on their own terms.
The binding is declarative: a relative path in `infospace.yaml` plus a
display name. No code. No import. The discipline is looked up on disk at
the declared path when B's commands run.
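To make the "looked up on disk" behaviour concrete, here is a minimal sketch of how a relative binding could be resolved. It is not MarkiTect's implementation: `resolve_disciplines` is a hypothetical helper, the config is assumed to be already parsed into a dict, and the check that a discipline directory contains `infospace.yaml` is my assumption based on the definition above.

```python
from pathlib import Path


def resolve_disciplines(infospace_dir, config):
    """Map each declared discipline name to its absolute directory.

    Relative paths are resolved against the binding infospace's own directory,
    so two infospaces moved together keep a working link.
    """
    base = Path(infospace_dir)
    resolved = {}
    for entry in config.get("disciplines", []):
        target = (base / entry["path"]).resolve()
        if not (target / "infospace.yaml").is_file():
            raise FileNotFoundError(f"discipline {entry['name']!r} not found at {target}")
        resolved[entry["name"]] = target
    return resolved
```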
---
## The viability pre-condition
Binding a non-viable infospace as a discipline is a mistake: a framework
that fails its own thresholds is not a stable reference frame. Before
binding, confirm the candidate discipline is viable:
```bash
cd examples/infospace-with-history
markitect infospace viability
```
```
Metric Value Threshold Status
---------------------------------------------------------------
redundancy_ratio 0.0061 max=0.1 PASS
coverage_ratio 0.6190 min=0.4 PASS
coherence_components 0.0000 max=3 PASS
consistency_cycles 0.0000 max=0 PASS
granularity_entropy 2.6748 min=1.0 PASS
per_entity_mean 3.9556 min=3.5 PASS
Viable: YES (6/6 thresholds met)
```
If the discipline is not viable, fix it first (see
`examples/infospace-with-history/docs/advanced-usage.md` §4 for triaging
low scorers).
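The pass/fail logic behind the table can be sketched as a comparison of each metric against a `max` or `min` bound. The values below are copied from the output above; the function name `check_viability` and the assumption that bounds are inclusive are mine, not confirmed behaviour of `markitect infospace viability`.

```python
# Thresholds and metric values as shown in the viability table above.
THRESHOLDS = {
    "redundancy_ratio": ("max", 0.1),
    "coverage_ratio": ("min", 0.4),
    "coherence_components": ("max", 3),
    "consistency_cycles": ("max", 0),
    "granularity_entropy": ("min", 1.0),
    "per_entity_mean": ("min", 3.5),
}
METRICS = {
    "redundancy_ratio": 0.0061,
    "coverage_ratio": 0.6190,
    "coherence_components": 0.0,
    "consistency_cycles": 0.0,
    "granularity_entropy": 2.6748,
    "per_entity_mean": 3.9556,
}


def check_viability(metrics, thresholds):
    """Return (viable, per_metric_results); bounds are assumed inclusive."""
    results = {}
    for name, (kind, limit) in thresholds.items():
        value = metrics[name]
        results[name] = value <= limit if kind == "max" else value >= limit
    return all(results.values()), results
```

With the values above, all six checks pass, matching the `Viable: YES (6/6 thresholds met)` line.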
---
## Example — how `supply-chain-vsm` binds WoN
The supply-chain infospace declares WoN as a discipline in its
`infospace.yaml`:
```yaml
topic:
name: "Modern Supply Chain Management"
domain: "Operations Management"
sources: artifacts/sources/
disciplines:
- name: "Wealth of Nations"
path: ../infospace-with-history
```
The binding is a **relative path**, so the two infospaces travel together
(they can be moved as a pair without breaking the link).
Verify the binding resolves and the discipline is viable:
```bash
cd examples/supply-chain-vsm
markitect infospace disciplines
```
```
Name Entities Viable Path
----------------------------------------------------------------------
Wealth of Nations 988 YES ../infospace-with-history
```
Each supply-chain entity then carries a `## WoN Concept` section
mapping it to exactly one WoN entity. The consolidated mapping files
(`output/mappings/*-mappings.md`) record the pairing, rationale, and a
conceptual-continuity rating (Strong / Moderate / Weak):
| Supply Chain Entity | WoN Concept | Strength | VSM |
|------------------------------|----------------------------------|----------|-------|
| Demand Signal | Effectual Demand | Strong | S2 |
| Vendor-Managed Inventory | Division of Labour | Strong | S1/S2 |
| Just-in-Time Inventory | Circulating Capital | Strong | S1/S3 |
| Bullwhip Effect | Natural Price as Central Price | Moderate | S2 |
| Safety Stock | Accumulation of Stock | Moderate | S3 |
Because each WoN entity already has a VSM system placement (S1–S5), the
supply-chain entities inherit a VSM position by transitivity through
their mapping — without supply-chain-vsm needing its own VSM reference.
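The transitivity can be pictured as two lookups chained together. In this sketch the data is taken from the table above, while the dictionary layout and the helper name `inherited_vsm` are illustrative assumptions rather than how MarkiTect stores mappings.

```python
# Supply-chain entity -> the single WoN entity it maps to (from the table above).
SC_TO_WON = {
    "Demand Signal": "Effectual Demand",
    "Vendor-Managed Inventory": "Division of Labour",
    "Just-in-Time Inventory": "Circulating Capital",
    "Bullwhip Effect": "Natural Price as Central Price",
    "Safety Stock": "Accumulation of Stock",
}

# WoN entity -> VSM systems, as the discipline would already have computed them.
WON_TO_VSM = {
    "Effectual Demand": ["S2"],
    "Division of Labour": ["S1", "S2"],
    "Circulating Capital": ["S1", "S3"],
    "Natural Price as Central Price": ["S2"],
    "Accumulation of Stock": ["S3"],
}


def inherited_vsm(entity):
    """Look up an entity's VSM placement via its WoN mapping."""
    return WON_TO_VSM[SC_TO_WON[entity]]
```

The supply-chain side stores no VSM data of its own; it only stores the mapping.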
---
## Creating a new infospace that binds an existing one
Step-by-step, using WoN as the discipline for a hypothetical "Modern
Monetary Policy" infospace:
### 1. Start from the target topic
```bash
mkdir -p examples/monetary-policy/artifacts/sources
cd examples/monetary-policy
markitect infospace init
```
### 2. Declare the discipline in `infospace.yaml`
```yaml
topic:
  name: "Modern Monetary Policy"
  domain: "Macroeconomics"
  sources: artifacts/sources/
disciplines:
  - name: "Wealth of Nations"
    path: ../infospace-with-history
```
Alternatively, bind imperatively after `init`:
```bash
markitect infospace bind-discipline ../infospace-with-history --name "Wealth of Nations"
```
### 3. Set your own viability thresholds
Copy the `viability:` block from a reference infospace and tune the
numbers to the scale and maturity of your topic. A smaller infospace
(50 entities, not 988) may need laxer `coverage_ratio` and stricter
`redundancy_ratio`.
### 4. Verify the binding
```bash
markitect infospace disciplines
```
If `Viable` is `NO`, stop and fix the discipline before continuing.
### 5. Reference discipline entities in your own entities
For each entity in the new infospace, add a `## <Discipline> Concept`
section that names the WoN entity the concept maps to, plus a rationale.
The exact section heading is configured per schema — see
`schemas/won-mapping-schema-v1.0.md` in `supply-chain-vsm` for the
template used there.
### 6. Run checks and evaluate
```bash
markitect infospace check
markitect infospace evaluate --provider openrouter
markitect infospace eval-summary --update-metrics
markitect infospace viability
```
The new infospace passes or fails viability independently of WoN.
---
## Why composition, not inclusion?
An alternative would be to copy WoN entities directly into the target
infospace. Composition avoids that by design:
- **One source of truth** — if WoN is refined, every infospace that binds
it picks up the improvement on the next run without a sync step.
- **Separation of concerns** — each infospace owns its own schema,
thresholds, and entity set. Changing the target topic cannot pollute
the discipline.
- **Bounded dependency** — the binding is a path, so the coupling is
visible in one place (`infospace.yaml`) and easy to remove.
---
## See also
- `examples/supply-chain-vsm/README.md` — the full reference composition.
- `examples/supply-chain-vsm/output/mappings/` — consolidated mapping
files showing the rationale and strength rating for each pairing.
- `examples/infospace-with-history/docs/advanced-usage.md` — patterns for
maintaining the discipline once it is in use.


@@ -0,0 +1,141 @@
# markitect-main → Successor Repos: Gap Assessment
**Date:** 2026-05-23
**Author:** Claude (custodian session)
**Status:** Draft — awaiting Bernd's decisions on items A/B/C below
## Purpose
Bernd is retiring `markitect-main` and has transferred most functionality to
sibling repos. This document identifies what was provided by `markitect-main`
that is **not addressed** in those successors, and flags candidates that may
not fit any successor's intent.
## Successor Ecosystem (5 repos, not 3)
| Repo | Role |
|---|---|
| `markitect-tool` | Markdown syntax layer + structured-document primitives; defines source-adapter and render-adapter contracts. CLI: `mkt`. |
| `kontextual-engine` | Headless knowledge operations engine: artifacts, collections, persistence, relationships, workflow runs/manifests, query, quality/assessment, API. |
| `infospace-bench` | Application layer — concrete infospaces, evaluation methodology, reference pilots. |
| `markitect-filter` | Source-format ingestion adapters (`source.epub3`, `source.pdf`) implementing the markitect-tool source-adapter contract. |
| `markitect-quarkdown` | Render/export adapter — implements the markitect-tool render-adapter contract via Quarkdown. |
## Method
Analysis is grounded in each successor's own assessment docs (recent, May 2026):
- `markitect-tool/docs/markitect-main-scope-assessment.md`
- `kontextual-engine/docs/markitect-main-scope-assessment.md`
- `kontextual-engine/docs/system-layer-extraction-inventory.md`
- `kontextual-engine/docs/system-layer-migration-backlog.md`
- `infospace-bench/docs/markitect-main-scope-assessment.md`
- `infospace-bench/docs/legacy-infospace-feature-inventory.md`
- `infospace-bench/docs/replacement-acceptance-matrix.md`
Cross-checked against actual `markitect-main` module sizing (Python LOC) and
`__init__.py` docstrings.
**Confidence:** These successor docs are authoritative on *intent*. They have
**not** been line-verified to confirm every "reimplement"-classified item
actually landed in the successor. Where verification matters, it's flagged.
---
## A. Doesn't fit any successor's intent — needs a new home or explicit retirement
These are explicitly pushed away by tool/engine/bench and are unrelated to
filter/quarkdown.
| markitect-main area | LOC | What it is | Status |
|---|---|---|---|
| `markitect/finance/` | ~8,100 | Cost-tracking system: cost items, period allocation to issues, financial reports, audit trails | **Orphan.** markitect-main's own SCOPE.md lists "financial transactions" as out-of-scope. Belongs with issue/project-ops, not knowledge tooling. |
| `issue_tracker/` + `_issue-tracking/` + `.issues/` | ~1,200 | Issue tracking (finance allocates costs to these issues) | **Orphan to the five** — but likely already superseded by the `issue-facade` capability / `use-issues` skill. **Verify before retiring.** |
| `markitect/profile/` | ~1,600 | User-profile CRUD, multi-profile, DB-backed | **Orphan.** Unrelated to all five. (Distinct from quarkdown's *render* "profile".) |
| `markitect/production/` | ~3,800 | Deployment-readiness validation, cross-platform checks, perf benchmarking | Engine keeps only "structured error/audit *ideas*". Deployment-validation bulk is orphan. |
| `tools/`, `services/`, gitea/tddai glue | ~5,500 | Project-ops tooling | Out-of-scope everywhere. |
| `markitect/legacy/` + `legacy_compat.py` | ~2,700 | Backward-compat shims | Retire by definition. |
## B. Rendering / asset / plugin layer — only *partially* covered, real residual gap
**This is the most consequential gap.** `SCOPE.md` lists "Rendering: markdown
→ interactive HTML via plugin system (testdrive-jsui)" as an in-scope
capability of markitect-main.
| Area | LOC | Covered? |
|---|---|---|
| `markitect/plugins/` (generic processor/formatter/validator/exporter plugin system) | ~8,000 | **No.** tool defines a render-adapter *contract* and an *extension* point, but the general plugin runtime isn't carried. |
| `markitect/assets/` (content-addressable asset store, dedup, `.mdpkg` ZIP packaging, symlink handling) + `asset_registry.json` (277 KB) | ~6,000 | **No.** Bench says "leave behind unless a concrete export needs assets." |
| Interactive-HTML / testdrive-jsui rendering, `static/`, `themes/`, `templates/document.html`, JS UI | — | **Partial only.** quarkdown covers a *Quarkdown* export path; the interactive-HTML / JS-UI path has no home. |
**Decision needed:** spin these into a dedicated render/asset repo (sibling to
quarkdown), fold the asset store into one of the existing repos, or retire the
interactive-HTML path.
## C. The other "Information Space" lineage — `markitect/spaces/` (~11,000 LOC)
**Distinct from `markitect/infospace/`** (which infospace-bench inherited).
`spaces/` is an older/parallel abstraction with features bench did *not* take:
- event-driven change tracking & notifications
- persistent transclusion context with cross-space references
- bidirectional directory synchronization
- HTML rendering of spaces with caching/themes
Engine takes generic persistence concepts and bench takes infospace semantics,
but **these specific `spaces/` behaviors (bidirectional sync, event
notifications, cross-space transclusion context) aren't mapped anywhere.**
Likely intended as dead/superseded — but 11k LOC warrants an explicit "retire
vs salvage" call.
## D. Declined-by-design (confirm retirement, don't re-extract)
| Area | LOC | Disposition |
|---|---|---|
| `markitect/graphql/` | ~4,000 | Tool, engine, and bench all explicitly declined GraphQL ("evidence of API need, not a commitment"). |
| `markitect/query_paradigms/` | ~3,500 | Engine/tool keep the *QueryResult envelope* concept but say "do not port the registry wholesale." |
| `markitect/proxy/` | ~870 | Non-markdown→md proxy with checksum/freshness tracking. **Overlaps markitect-filter.** Freshness/staleness-tracking mechanism may be worth checking against bench's deferred "stale-mappings." |
| `capabilities/` (top-level) | ~8,300 | Capability-packaging architecture; partially maps to tool (schema generation) but the packaging approach itself isn't carried. |
---
## What this means
The successors are, by their own assessments, **nearly complete for the
in-scope core** (parsing/schema → tool; persistence/workflow → engine;
infospace lifecycle → bench; ingestion → filter; one render path →
quarkdown). The truly unaddressed functionality is almost entirely the stuff
markitect-main accreted **beyond** its stated scope: finance, issue tracking,
user profiles, production/deployment validation, the asset/plugin/interactive-HTML
rendering stack, and the older `spaces/` abstraction.
## Decisions for Bernd
Three live decisions, not a long extraction backlog:
### Decision 1 — Render/asset stack (Section B)
The one with genuine product value left.
- **Option 1a:** new repo (sibling to quarkdown) for plugin runtime + asset store + interactive-HTML
- **Option 1b:** fold the asset store into an existing repo (most likely markitect-tool, behind a flag); retire interactive-HTML
- **Option 1c:** retire the interactive-HTML path entirely; trust quarkdown export as the single render story
### Decision 2 — `markitect/spaces/` (Section C)
- **Option 2a:** salvage bidirectional-sync / event-tracking / cross-space transclusion into engine (engine has the persistence story to support it)
- **Option 2b:** retire wholesale as superseded by infospace
### Decision 3 — Project-ops cluster (Section A: finance + issues + profile)
- **Option 3a:** confirm `issue-facade` already replaces `issue_tracker/` + `finance/`; retire both
- **Option 3b:** identify a home for any pieces worth keeping
---
## Suggested verification before deciding
If verification matters before committing:
- **For Decision 1:** grep the five repos for any render/asset adapter that already covers the HTML path beyond Quarkdown.
- **For Decision 2:** check whether engine's `OperationRun` + collection model can express bidirectional-sync semantics, or whether new primitives would be needed.
- **For Decision 3:** confirm whether `issue-facade` truly replaces `issue_tracker/` + `finance/` end-to-end.
Happy to do any of these focused passes when you're ready to decide.


@@ -117,7 +117,7 @@ This graph enables:
 ```bash
 # Ensure MarkiTect is installed
-cd /path/to/markitect_project
+cd /path/to/markitect-main
 pip install -e .
 ```


@@ -0,0 +1,230 @@
# Advanced Usage — Wealth of Nations Infospace
Patterns for working with the WoN infospace (988 entities) after the initial
pipeline run. Every command in this file has been run against the actual
infospace at the time of writing (2026-04-21); output shapes are excerpted
verbatim.
All commands assume `cwd = examples/infospace-with-history` and the
`markitect-venv` Python environment.
---
## 1. Incremental evaluation — add entities after the initial run
`markitect infospace evaluate` writes one file per entity under
`output/evaluations/<slug>.md`. It skips any entity whose evaluation file
already exists, so re-running after adding a new entity processes only the
new one.
```bash
# Add a new entity file
vim output/entities/new-concept.md
# Evaluate only the new entity (explicit)
markitect infospace evaluate --entity new-concept --provider openrouter
# Or re-run the whole pass — existing 988 are skipped, only the new file hits the LLM
markitect infospace evaluate --provider openrouter
```
**How skip detection works.** Evaluation slugs are normalised to underscores
with `_s_` preserving apostrophes (`farmers-capital` entity →
`farmer_s_capital.md` evaluation). If a new entity slug collides with an
existing evaluation under this normalisation, the eval will be skipped.
To be sure an entity was picked up, check:
```bash
# Count entities vs evaluations
ls output/entities/*.md | grep -Ev 'book-[0-9]+-(chapter-[0-9]+|introduction)-' | wc -l
ls output/evaluations/*.md | wc -l
```
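The counts above show whether something is missing, not which entity. A minimal sketch for listing the specific entities without an evaluation file follows. It assumes the only normalisation is hyphen to underscore (the `_s_` apostrophe case is not modelled) and reuses the chapter-file exclusion from the `grep` above; `missing_evaluations` is a hypothetical helper, not a markitect command.

```python
import re
from pathlib import Path

# Same exclusion as the grep above: per-chapter extraction files are not entities.
CHAPTER_FILE = re.compile(r"book-[0-9]+-(chapter-[0-9]+|introduction)-")


def missing_evaluations(entities_dir: Path, evaluations_dir: Path) -> list[str]:
    """Return entity slugs that have no matching evaluation file."""
    evaluated = {p.stem for p in evaluations_dir.glob("*.md")}
    missing = []
    for path in sorted(entities_dir.glob("*.md")):
        if CHAPTER_FILE.search(path.name):
            continue
        # Assumption: hyphens become underscores; apostrophe handling is ignored.
        if path.stem.replace("-", "_") not in evaluated:
            missing.append(path.stem)
    return missing


if __name__ == "__main__":
    for slug in missing_evaluations(Path("output/entities"), Path("output/evaluations")):
        print(slug)
```

Entities whose names contain apostrophes may show up as false positives here, so treat the list as a starting point for a manual check.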
---
## 2. Re-evaluating after guideline changes
`evaluate` has no `--force` flag; re-evaluation requires deleting the
existing file first.
```bash
# Re-evaluate a single entity after updating the evaluation rubric
rm output/evaluations/accumulation_of_stock.md
markitect infospace evaluate --entity accumulation-of-stock --provider openrouter
# Re-evaluate a whole chapter
ls output/entities/book-1-chapter-06-entities.md # see which entities the chapter produced
# Map chapter entities to eval filenames (apostrophe/underscore normalisation) and rm them
```
After re-evaluating, refresh the aggregate:
```bash
markitect infospace eval-summary --update-metrics
```
This merges `per_entity_mean` into `output/metrics/metrics.yaml` so the next
`markitect infospace viability` check reflects the new scores.
---
## 3. Interpreting per-entity score distributions
`eval-summary` shows the mean for each of the five evaluation dimensions
plus the overall range:
```
$ markitect infospace eval-summary
Evaluation summary — 985 entities evaluated
Dimension Mean
--------------------------------------
overall 3.956
definition_precision 3.620
domain_placement 4.559
explanatory_value 3.936
source_grounding 4.358
vsm_relevance 3.305
Range: 1.00 – 4.80
```
Interpretation:
- `overall` above the 3.5 viability threshold → the collection passes
`per_entity_mean`.
- The lowest dimension (`vsm_relevance` = 3.305) is the weakest signal. If
the collection is meant to be VSM-grounded, this is the dimension most
worth improving (via sharper entity definitions or schema changes).
- A wide range (1.00 – 4.80) tells you there are outliers at both ends —
worth triaging (see pattern 4).
---
## 4. Triaging low scorers
`markitect infospace entities --by-type` prints each entity's star score
in-line:
```
$ markitect infospace entities --by-type | head
=== Element (315 entities) ===
active_and_productive_stock Accumulation S1 ★4.6
advanced_state_of_society General Theory S5
agio_of_bank_money Exchange S2 ★4.8
```
Entities with no `★` have no evaluation yet. To list the lowest-scoring
entities across the whole collection:
```bash
# Extract overall_score from every evaluation file and sort ascending
for f in output/evaluations/*.md; do
  score=$(awk '/^overall_score:/ {print $2; exit}' "$f")
  printf "%s\t%s\n" "$score" "$(basename "$f" .md)"
done | sort -n | head -20
```
The 20 lowest scorers are the natural triage list — inspect their
`output/entities/<slug>.md` and evaluation rationales to decide whether to
refine the entity, merge it with a better-formed neighbour, or drop it.
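The same triage list can be produced in Python, which is convenient when the result feeds further processing. This is a sketch under the assumption that each evaluation file carries an `overall_score:` line in its frontmatter, as in the files shown in this repository; `lowest_scorers` is a hypothetical helper name.

```python
from pathlib import Path


def lowest_scorers(evaluations_dir: Path, n: int = 20) -> list[tuple[float, str]]:
    """Return the n lowest (overall_score, slug) pairs, ascending by score."""
    rows = []
    for path in evaluations_dir.glob("*.md"):
        for line in path.read_text(encoding="utf-8").splitlines():
            if line.startswith("overall_score:"):
                try:
                    rows.append((float(line.split(":", 1)[1]), path.stem))
                except ValueError:
                    pass  # e.g. "overall_score: null"; skipped in this sketch
                break  # first match only, like `exit` in the awk version
    return sorted(rows)[:n]


if __name__ == "__main__":
    for score, slug in lowest_scorers(Path("output/evaluations")):
        print(f"{score}\t{slug}")
```

Unlike the shell loop, files without a parseable score are dropped rather than printed with an empty score.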
---
## 5. Reading and acting on collection-check output
`markitect infospace check` runs five concerns (C1–C5). Use `--concern` to
focus on one and `--json` for machine-readable output:
```bash
# Redundancy — which pairs of entities are suspiciously similar?
markitect infospace check --concern redundancy --json
```
```json
{
  "redundancy": {
    "concern": "C1",
    "redundancy_ratio": 0.0061,
    "similar_pairs": [
      {"entity_a": "bank_economic_contribution_metrics",
       "entity_b": "bank_economic_development_metrics",
       "similarity": 1.0, "method": "word_overlap"},
      {"entity_a": "economic_system_objectives",
       "entity_b": "economic_system_purpose",
       "similarity": 0.9394, "method": "word_overlap"}
    ]
  }
}
```
Acting on this:
- **Similarity = 1.0** is almost certainly a duplicate — pick one slug and
merge or delete the other.
- **0.85–0.99** usually means two entities genuinely cover the same idea
with slight phrasing differences. Merging is the cleanest fix.
- **< 0.85** usually represents legitimate adjacent concepts — leave as-is
unless the definition rubric says otherwise.
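The three bands above can be applied mechanically to the `--json` output. A minimal sketch, assuming the JSON shape shown in the excerpt and treating anything from 0.85 up to (but not including) 1.0 as a merge candidate; `triage_pairs` and the bucket names are hypothetical.

```python
def triage_pairs(report: dict) -> dict[str, list[tuple[str, str]]]:
    """Bucket similar pairs from the redundancy report by similarity band."""
    buckets = {"duplicate": [], "merge_candidate": [], "leave": []}
    for pair in report["redundancy"]["similar_pairs"]:
        names = (pair["entity_a"], pair["entity_b"])
        similarity = pair["similarity"]
        if similarity >= 1.0:
            buckets["duplicate"].append(names)
        elif similarity >= 0.85:
            buckets["merge_candidate"].append(names)
        else:
            buckets["leave"].append(names)
    return buckets


if __name__ == "__main__":
    import json
    import sys

    # Usage: markitect infospace check --concern redundancy --json | python triage.py
    for bucket, pairs in triage_pairs(json.load(sys.stdin)).items():
        print(bucket, len(pairs))
```

The cut-offs are the rules of thumb from the list, not thresholds enforced by the check itself.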
For coverage and coherence, the pattern is the same: the `--json` output
surfaces the specific entities / missing links / disconnected components
you need to look at, rather than a bare ratio.
---
## 6. Systematic processing of long texts
For long source material (books, multi-chapter specifications, corpora), the
pipeline can produce a clean chapter-by-chapter git history on its own if
you let it. The pattern:
```bash
# Process all sources in canonical order, eval and classify per chapter,
# snapshot metrics after each chapter.
markitect infospace process --all \
--provider openrouter \
--eval-after-source \
--classify-after-source \
--check-after-each
```
What you get:
- **One commit per source file**, not per batch run. The commit message body
lists counts by bucket (`entities: +23`, `evaluations: +23`,
`classifications: +23`) derived from the actual staged diff, so `git log`
reads like the story of the infospace growing.
- **Chapter-atomic commits.** `--eval-after-source` and
`--classify-after-source` evaluate and classify *only the new entities*
from the just-processed source before the commit lands, so each commit is
a self-contained chapter snapshot.
- **Metrics-per-chapter trail.** `--check-after-each` appends a snapshot to
`output/metrics/history.yaml` after every chapter, so `markitect infospace
history` later shows the metric trajectory rather than just start/end.
**Cost tradeoff.** `--eval-after-source` pays LLM latency per chapter rather
than amortising it across one bulk batch. It's worth it when you care about
the git history or want early quality signal, not when you're bulk-backfilling
a known-good corpus.
**Triage during the run.** While processing, use `markitect infospace
chapters` in another shell to see per-source entity/eval/classify counts and
mean scores — handy for spotting chapters that under-extracted or evaluated
poorly.
```
$ markitect infospace chapters
source entities evaluated classified mean_score
------------------- -------- --------- ---------- ----------
book-1-chapter-01 96 96 79 4.22
book-1-chapter-02 16 16 10 4.06
```
---
## See also
- `METRICS-METHODOLOGY.md` — how each metric is computed.
- `docs/composition-guide.md` — using this infospace as a discipline for a
different domain.
- `docs/performance-notes.md` — observed timings and provider choices.


@@ -0,0 +1,106 @@
# Performance Notes — Wealth of Nations Infospace
Observed timings, file sizes, and provider choices from the 988-entity WoN
example. These are **operational notes**, not a benchmark — numbers come
from the actual S3.3 evaluation run (2026-02-23) rather than a controlled
experiment.
---
## Evaluation batch duration
The initial evaluation pass produced 985 `output/evaluations/*.md` files:
- First `evaluated_at`: `2026-02-23T00:11:52`
- Last `evaluated_at`: `2026-02-23T06:39:45`
- **Total wall time: ~6h 28m**
- **Effective throughput: ~2.5 entities/min** (~152 entities/hour)
Extracted from evaluation frontmatter:
```bash
grep -h '^evaluated_at:' output/evaluations/*.md | sort | sed -n '1p;$p'
```
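The wall time and throughput figures follow directly from the two timestamps and the file count. A short check of the arithmetic, using only the values quoted above:

```python
from datetime import datetime

first = datetime.fromisoformat("2026-02-23T00:11:52")
last = datetime.fromisoformat("2026-02-23T06:39:45")
evaluated = 985  # evaluation files produced by the pass

elapsed = last - first
per_minute = evaluated / (elapsed.total_seconds() / 60)
per_hour = per_minute * 60

print(elapsed, round(per_minute, 2), round(per_hour))
# 6:27:53 2.54 152
```

This is an average over the whole run; it says nothing about the bursts and plateaus mentioned in the caveats.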
Caveats:
- This was against OpenRouter's free tier, which applies implicit
rate-limiting and occasional retries.
- Throughput is not constant — gaps between bursts show up as plateaus
when you plot the timestamps.
- The batch was not fully parallelised; a tuned concurrent client could
likely 2–4× this throughput on a paid OpenRouter tier.
---
## Tokens per entity (estimate)
Direct token counts are not logged in the evaluation files, but the
inputs and outputs are on disk:
- **Input per request**: evaluation schema (~3.7 KB) + entity file
(~0.7 KB median) + fixed system prompt ≈ **~1500–2500 tokens in**
- **Output per request**: structured evaluation with 5 dimensions and
rationales, median eval file 3.6 KB ≈ **~600–800 tokens out**
- **Round-trip total**: **~2000–3000 tokens per entity**
- **Batch total estimate**: 985 entities × ~2500 tokens ≈ **~2.5M tokens**
for the full pass
The constant per-entity input means the cheapest way to reduce spend on a
re-run is to narrow the targeted entities (`--entity <slug>` or
`--chapter <n>`), not to shorten the schema.
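The batch estimate is a plain multiplication, which also makes the effect of narrowing a re-run easy to see. A small sketch using the per-entity range quoted above; the names `PER_ENTITY` and `batch_tokens` are illustrative only.

```python
EVALUATED = 985  # evaluation files produced by the initial pass

# Round-trip tokens per entity, taken from the estimate above.
PER_ENTITY = {"low": 2000, "typical": 2500, "high": 3000}


def batch_tokens(entity_count: int, per_entity: int) -> int:
    """Total tokens for evaluating entity_count entities."""
    return entity_count * per_entity


full_pass = {label: batch_tokens(EVALUATED, t) for label, t in PER_ENTITY.items()}
single_entity = batch_tokens(1, PER_ENTITY["typical"])

print(full_pass)
print(single_entity)
```

With the typical figure the full pass comes to 2,462,500 tokens, consistent with the ~2.5M estimate, while a single-entity re-run stays at roughly 2,500.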
---
## Embedding cache and collection checks
`markitect infospace check --concern redundancy` supports two similarity
backends (see `markitect/infospace/checks/redundancy.py`):
- **`word_overlap`** — the default, used when no embeddings are provided.
Pure-Python set intersection over tokenised entity text. **No LLM calls,
no cache needed.** This is what the current WoN check runs.
- **`embedding`** — active when a pre-computed `{slug: vector}` mapping is
passed in. No persistent on-disk embedding cache exists today; the
caller is responsible for computing and supplying the vectors.
Implication: the 988-entity `check` runs in seconds because it's all
word-overlap. Switching to embedding similarity would add an embedding
API pass (another ~988 requests) which is currently a manual step
outside the CLI.
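To make "set intersection over tokenised entity text" concrete, here is a minimal sketch of a word-overlap score. It assumes a Jaccard ratio over lowercase alphanumeric tokens; the tokenisation and the exact ratio used in `markitect/infospace/checks/redundancy.py` may differ, and both function names are hypothetical.

```python
import re


def tokenise(text: str) -> set[str]:
    """Split text into a set of lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def word_overlap(a: str, b: str) -> float:
    """Jaccard similarity of the two token sets (assumed formula)."""
    tokens_a, tokens_b = tokenise(a), tokenise(b)
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


if __name__ == "__main__":
    print(word_overlap("economic system objectives", "economic system purpose"))
```

A score like this needs no network access or cache, which is consistent with the check finishing in seconds.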
---
## Provider choice — recommendation
For the WoN dataset specifically (text-heavy entities, 5-dimension
rubric):
| Scale | Recommended provider | Rationale |
|-----------------------|----------------------------------|-----------|
| < 50 entities | `gemini/gemini-2.5-flash` | Fast default; free tier is generous enough; consistent with `markitect llm-check` out of the box. |
| 50 – 1000 entities | `openrouter` with a `:free` model (e.g. `arcee-ai/trinity-large-preview:free`) | What the S3.3 batch used; gets through 988 entities in one overnight run without cost. |
| > 1000 entities | `openrouter` with a paid small-context model, or `openai` | Free-tier rate limits start to dominate wall time; paying for higher concurrency is cheaper than calendar time. |
All providers are accepted by `markitect infospace evaluate --provider`.
The evaluation schema doesn't assume any provider-specific features.
Note on provider mixing: if part of a collection is evaluated under one
provider/model and the rest under another, `per_entity_mean` can drift
slightly (different models calibrate scores differently). For the
viability threshold of 3.5 the drift is usually negligible, but for
fine-grained outlier analysis prefer a single provider per batch.
---
## What is *not* measured here
- **End-to-end pipeline time** (entity extraction from raw chapters,
classification, relation graph) — only the evaluation phase is timed.
- **Memory footprint** — the full in-memory state for 988 entities is
small (< 200 MB observed), but not systematically measured.
- **Failure/retry rates** — the 985 vs 988 gap is three entities the
original run missed (plus one added later); no structured retry log
was kept.
Expanding any of these into a proper benchmark is **out of scope** for
the WoN example and should live alongside a synthetic corpus that can be
regenerated deterministically.


@@ -0,0 +1,28 @@
---
entity_slug: advanced_state_of_society
evaluator: gemini-2.5-flash
evaluated_at: '2026-04-21T21:32:17.135192'
overall_score: 4.5
scores:
- name: definition_precision
  value: 4.0
  max_value: 5.0
  rationale: The definition is precise, listing key characteristics like accumulated
    stock and private property. It clearly distinguishes the concept by contrasting
    it with earlier economic conditions.
- name: source_grounding
  value: 5.0
  max_value: 5.0
  rationale: This entity is deeply grounded in Smith's work, particularly in Book
    I
---
# Evaluation: Advanced State Of Society
## definition_precision — 4.0 / 5.0
The definition is precise, listing key characteristics like accumulated stock and private property. It clearly distinguishes the concept by contrasting it with earlier economic conditions.
## source_grounding — 5.0 / 5.0
This entity is deeply grounded in Smith's work, particularly in Book I


@@ -0,0 +1,61 @@
---
entity_slug: bank_notes
evaluator: null
evaluated_at: '2026-04-21T21:33:16.736926'
overall_score: 4.4
scores:
- name: definition_precision
  value: 5.0
  max_value: 5.0
  rationale: The definition is precise, clearly distinguishing bank notes by their
    issuer, form, and key characteristics (payable on demand, confidence-based). It
    avoids circularity and captures a distinct concept.
- name: source_grounding
  value: 5.0
  max_value: 5.0
  rationale: The entity is excellently grounded in "The Wealth of Nations," specifically
    Book II, Chapter 2, where Smith extensively discusses bank notes' role in economizing
    precious metals and their reliance on public confidence.
- name: domain_placement
  value: 4.0
  max_value: 5.0
  rationale: '"Exchange" is an appropriate domain as bank notes primarily function
    as a medium for facilitating transactions. While "Money" or "Finance" could also
    fit, "Exchange" accurately reflects their operational role in the economy.'
- name: vsm_relevance
  value: 3.0
  max_value: 5.0
  rationale: Bank notes are a critical *medium* or *tool* that enables the primary
    operations (S1) of an economy (i.e., exchange of goods and services). However,
    they are not a VSM system or management function themselves, making their direct
    mapping somewhat abstract.
- name: explanatory_value
  value: 5.0
  max_value: 5.0
  rationale: This entity offers significant explanatory power by detailing how paper
    money functions, its reliance on confidence, and its role in reducing the need
    for precious metals, thereby illuminating a key mechanism in Smith's economic
    theory.
---
# Evaluation: Bank Notes
## definition_precision — 5.0 / 5.0
The definition is precise, clearly distinguishing bank notes by their issuer, form, and key characteristics (payable on demand, confidence-based). It avoids circularity and captures a distinct concept.
## source_grounding — 5.0 / 5.0
The entity is excellently grounded in "The Wealth of Nations," specifically Book II, Chapter 2, where Smith extensively discusses bank notes' role in economizing precious metals and their reliance on public confidence.
## domain_placement — 4.0 / 5.0
"Exchange" is an appropriate domain as bank notes primarily function as a medium for facilitating transactions. While "Money" or "Finance" could also fit, "Exchange" accurately reflects their operational role in the economy.
## vsm_relevance — 3.0 / 5.0
Bank notes are a critical *medium* or *tool* that enables the primary operations (S1) of an economy (i.e., exchange of goods and services). However, they are not a VSM system or management function themselves, making their direct mapping somewhat abstract.
## explanatory_value — 5.0 / 5.0
This entity offers significant explanatory power by detailing how paper money functions, its reliance on confidence, and its role in reducing the need for precious metals, thereby illuminating a key mechanism in Smith's economic theory.


@@ -0,0 +1,60 @@
---
entity_slug: bank_systemic_risk_management
evaluator: gemini-2.5-flash-lite
evaluated_at: '2026-04-21T21:49:35.222637'
overall_score: 4.0
scores:
- name: definition_precision
  value: 4.0
  max_value: 5.0
  rationale: The definition is precise and clearly outlines the purpose of bank systemic
    risk management. It avoids being an overly broad umbrella term.
- name: source_grounding
  value: 3.0
  max_value: 5.0
  rationale: While the concept of managing risks to the banking system is present
    in Book II, Chapter 2, the explicit framing of "systemic risk management" as a
    distinct entity with specific practices might be a slight abstraction beyond Smith's
    direct terminology.
- name: domain_placement
  value: 5.0
  max_value: 5.0
  rationale: The "Regulation" domain is highly appropriate. Managing systemic risk
    is fundamentally a regulatory concern aimed at ensuring the stability of the financial
    system.
- name: vsm_relevance
  value: 4.0
  max_value: 5.0
  rationale: This entity strongly maps to VSM System 3 (Internal Regulation/Audit)
    as it involves monitoring and controlling internal operations to prevent systemic
    failures. It also has elements of System 5 (Policy) in setting overall stability
    goals.
- name: explanatory_value
  value: 4.0
  max_value: 5.0
  rationale: The entity provides good explanatory value by highlighting a crucial
    mechanism for maintaining financial stability. It explains *how* the banking system
    can be protected from cascading failures.
---
# Evaluation: Bank Systemic Risk Management
## definition_precision — 4.0 / 5.0
The definition is precise and clearly outlines the purpose of bank systemic risk management. It avoids being an overly broad umbrella term.
## source_grounding — 3.0 / 5.0
While the concept of managing risks to the banking system is present in Book II, Chapter 2, the explicit framing of "systemic risk management" as a distinct entity with specific practices might be a slight abstraction beyond Smith's direct terminology.
## domain_placement — 5.0 / 5.0
The "Regulation" domain is highly appropriate. Managing systemic risk is fundamentally a regulatory concern aimed at ensuring the stability of the financial system.
## vsm_relevance — 4.0 / 5.0
This entity strongly maps to VSM System 3 (Internal Regulation/Audit) as it involves monitoring and controlling internal operations to prevent systemic failures. It also has elements of System 5 (Policy) in setting overall stability goals.
## explanatory_value — 4.0 / 5.0
The entity provides good explanatory value by highlighting a crucial mechanism for maintaining financial stability. It explains *how* the banking system can be protected from cascading failures.


@@ -3,7 +3,7 @@ consistency_cycles: 0.0
 coverage_ratio: 0.619048
 granularity_entropy: 2.674752
 modularity: 0.0
-per_entity_mean: 3.955635
+per_entity_mean: 3.95668
 redundancy_ratio: 0.006073
 type_distribution:
   Element: 315


@@ -240,8 +240,14 @@ def llm_catalog(output_format):
 )
 def llm_check(provider, model):
     """Send a minimal prompt to verify a provider is reachable and responding."""
+    import os
     from markitect.llm import create_adapter
-    from markitect.llm.exceptions import LLMConfigurationError, LLMError
+    from markitect.llm.exceptions import (
+        LLMAPIError,
+        LLMConfigurationError,
+        LLMError,
+    )
     from markitect.prompts.execution.models import RunConfig
     resolved = resolve_llm(cli_provider=provider, cli_model=model)
@@ -252,6 +258,17 @@ def llm_check(provider, model):
         f" model from: {resolved.model_source}"
     )
+    # Advisory: OPENROUTER_API_KEY is set but this call won't use it. Common
+    # source of "works for me, fails for agents" when the env var holds a
+    # stale key that overrides a clean config entry.
+    if resolved.provider != "openrouter" and os.environ.get("OPENROUTER_API_KEY"):
+        click.echo(
+            " note: OPENROUTER_API_KEY is set but won't be used for this "
+            "provider. If OpenRouter calls fail elsewhere with 401, the env "
+            "var may be stale — unset or update it.",
+            err=True,
+        )
     try:
         adapter = create_adapter(
             provider=resolved.provider,
@@ -273,6 +290,19 @@ def llm_check(provider, model):
     except LLMError as exc:
         elapsed = time.monotonic() - start
         click.echo(f"ERROR \u2014 LLM error after {elapsed:.1f}s: {exc}", err=True)
+        # Targeted hint: 401 on openrouter almost always means a stale key.
+        if (
+            resolved.provider == "openrouter"
+            and isinstance(exc, LLMAPIError)
+            and exc.status_code == 401
+        ):
+            click.echo(
+                " hint: OpenRouter returned 401 (unauthorized). Check whether "
+                "OPENROUTER_API_KEY is stale (`unset OPENROUTER_API_KEY` to "
+                "fall back to the key in ~/.config/markitect/config.toml, or "
+                "update the env var).",
+                err=True,
+            )
         sys.exit(1)
     except Exception as exc:
         elapsed = time.monotonic() - start


@@ -7,8 +7,9 @@ inspecting, and evaluating infospaces.
 from __future__ import annotations
+import re
 from pathlib import Path
-from typing import Optional
+from typing import Dict, Optional
 import click
@@ -228,6 +229,227 @@ def _entities_by_type(cfg, root: "Path", entity_list: list) -> None:
click.echo(f"\nTotal: {total} entities")
# ── chapters (per-source triage view) ────────────────────────────────
@infospace_commands.command()
@click.option("--config", "config_path", default=None, help="Path to infospace.yaml.")
@click.option(
"--format", "output_format",
type=click.Choice(["text", "json"]),
default="text",
help="Output format.",
)
def chapters(config_path: Optional[str], output_format: str):
"""List source files in canonical order with per-source stats.
For each source file in the sources directory, reports entity count,
mean per-entity score (if evaluated), classification coverage, and
processing status. Useful for triaging long-text infospaces.
"""
cfg, cfg_path = _load_config_or_exit(config_path)
root = cfg_path.parent
sources_dir = root / cfg.topic.sources if cfg.topic.sources else root
if not sources_dir.is_dir():
click.echo(f"No sources directory at {sources_dir}.", err=True)
raise SystemExit(1)
source_files = sorted(sources_dir.glob("*.md"))
if not source_files:
click.echo(f"No source files in {sources_dir}.", err=True)
raise SystemExit(1)
entities_dir = root / cfg.entities_dir
entity_list = (
parse_entity_directory(entities_dir) if entities_dir.is_dir() else []
)
# Group entities by source file using the source_chapter field.
# Matching is lenient: an entity belongs to a source when its
# source_chapter contains (case-insensitively) any normalized form of
# the source stem.
def _chapter_keys(source_id: str) -> list:
"""Return strings an entity's source_chapter might contain."""
keys = [source_id, source_id.replace("-", " ")]
m = re.match(r"book-(\d+)-chapter-(\d+)", source_id)
if m:
book, chap = m.group(1), m.group(2)
roman = {"1": "I", "2": "II", "3": "III", "4": "IV", "5": "V"}
if book in roman:
keys.append(f"Book {roman[book]}, Chapter {int(chap)}")
keys.append(f"Book {roman[book]} Chapter {int(chap)}")
return keys
# Precompute evaluation scores and classification slugs once.
evals_dir = root / cfg.evaluations_dir
cls_dir = root / cfg.classifications_dir
eval_scores: Dict[str, float] = {}
if evals_dir.is_dir():
from markitect.infospace.evaluation_io import read_entity_evaluation
for ev_path in evals_dir.glob("*.md"):
try:
ev = read_entity_evaluation(ev_path)
if ev.overall_score is not None:
eval_scores[ev_path.stem] = ev.overall_score
except Exception:
continue
classified_slugs = (
{p.stem for p in cls_dir.glob("*.md")} if cls_dir.is_dir() else set()
)
rows = []
for source_file in source_files:
source_id = source_file.stem
keys = _chapter_keys(source_id)
matched = [
e for e in entity_list
if any(k.lower() in (e.source_chapter or "").lower() for k in keys)
]
slugs = {e.slug for e in matched}
evaluated = slugs & set(eval_scores)
classified = slugs & classified_slugs
mean = (
sum(eval_scores[s] for s in evaluated) / len(evaluated)
if evaluated else None
)
rows.append({
"source_id": source_id,
"entities": len(matched),
"evaluated": len(evaluated),
"classified": len(classified),
"mean_score": round(mean, 2) if mean is not None else None,
})
if output_format == "json":
import json
click.echo(json.dumps(rows, indent=2))
return
# Text: aligned table.
headers = ("source", "entities", "evaluated", "classified", "mean_score")
widths = [
max(len(h), max((len(str(r[h.replace(' ', '_')])) if h != "source"
else len(r["source_id"]))
for r in rows)) if rows else len(h)
for h in headers
]
fmt = " ".join(f"{{:<{w}}}" for w in widths)
click.echo(fmt.format(*headers))
click.echo(fmt.format(*("-" * w for w in widths)))
for r in rows:
click.echo(fmt.format(
r["source_id"],
r["entities"],
r["evaluated"],
r["classified"],
"-" if r["mean_score"] is None else f"{r['mean_score']:.2f}",
))
totals = {
"entities": sum(r["entities"] for r in rows),
"evaluated": sum(r["evaluated"] for r in rows),
"classified": sum(r["classified"] for r in rows),
}
click.echo(
f"\n{len(rows)} source file(s); "
f"{totals['entities']} entities, "
f"{totals['evaluated']} evaluated, "
f"{totals['classified']} classified."
)
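The lenient chapter matching used by `chapters` can be tried in isolation. The sketch below reproduces the key generation and the case-insensitive substring test from the hunk above; `belongs_to` is a hypothetical wrapper name, and the sample chapter strings are invented for illustration.

```python
import re
from typing import List, Optional

_ROMAN = {"1": "I", "2": "II", "3": "III", "4": "IV", "5": "V"}


def chapter_keys(source_id: str) -> List[str]:
    """Return strings an entity's source_chapter might contain."""
    keys = [source_id, source_id.replace("-", " ")]
    m = re.match(r"book-(\d+)-chapter-(\d+)", source_id)
    if m and m.group(1) in _ROMAN:
        book, chap = _ROMAN[m.group(1)], int(m.group(2))
        keys.append(f"Book {book}, Chapter {chap}")
        keys.append(f"Book {book} Chapter {chap}")
    return keys


def belongs_to(source_id: str, source_chapter: Optional[str]) -> bool:
    """Case-insensitive substring match against every candidate key."""
    haystack = (source_chapter or "").lower()
    return any(k.lower() in haystack for k in chapter_keys(source_id))


if __name__ == "__main__":
    print(chapter_keys("book-2-chapter-03"))
    print(belongs_to("book-2-chapter-03", "Book II, Chapter 3: Of Rent"))
```

One consequence of substring matching worth keeping in mind: under this sketch, `book-1-chapter-01` also matches an entity whose chapter reads "Book I, Chapter 10", because "Book I, Chapter 1" is a prefix of it. Whether that matters depends on how `source_chapter` values are written in a given infospace.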
# ── entity (single lookup) ───────────────────────────────────────────
@infospace_commands.command()
@click.argument("name")
@click.option("--config", "config_path", default=None, help="Path to infospace.yaml.")
def entity(name: str, config_path: Optional[str]):
"""Look up one entity by name, tolerating case / hyphens / underscores.
Prints slug, source path, domain, chapter, word count, overall score,
VSM system (if classified), and evaluation-file path.
"""
cfg, cfg_path = _load_config_or_exit(config_path)
root = cfg_path.parent
entities_dir = root / cfg.entities_dir
if not entities_dir.is_dir():
click.echo("No entities directory found.", err=True)
raise SystemExit(1)
entity_list = parse_entity_directory(entities_dir)
if not entity_list:
click.echo("No entities found.", err=True)
raise SystemExit(1)
# Normalize: lowercase; hyphens and spaces become underscores.
def norm(s: str) -> str:
return s.lower().replace("-", "_").replace(" ", "_")
target = norm(name)
by_slug = {e.slug: e for e in entity_list}
match = by_slug.get(target)
if match is None:
# Substring fallback for partial input.
candidates = [e for e in entity_list if target in norm(e.slug)]
if len(candidates) == 1:
match = candidates[0]
elif len(candidates) > 1:
click.echo(f"Ambiguous — '{name}' matches multiple entities:", err=True)
for c in sorted(candidates, key=lambda e: e.slug)[:10]:
click.echo(f" {c.slug}", err=True)
if len(candidates) > 10:
click.echo(f" … and {len(candidates) - 10} more", err=True)
raise SystemExit(1)
else:
click.echo(f"No entity matching '{name}'.", err=True)
near = sorted(
e.slug for e in entity_list
if target.split("_", 1)[0] in e.slug
)[:5]
if near:
click.echo(f" Near matches: {', '.join(near)}", err=True)
raise SystemExit(1)
# Load score + classification (best-effort).
score: Optional[float] = None
evaluator: Optional[str] = None
eval_file = root / cfg.evaluations_dir / f"{match.slug}.md"
if eval_file.is_file():
try:
from markitect.infospace.evaluation_io import read_entity_evaluation
ev = read_entity_evaluation(eval_file)
score = ev.overall_score
evaluator = ev.evaluator
except Exception:
pass
vsm: Optional[str] = None
cls_file = root / cfg.classifications_dir / f"{match.slug}.md"
if cls_file.is_file():
try:
from markitect.infospace.classification_io import read_entity_classification
cls = read_entity_classification(cls_file)
vsm = cls.vsm_system
except Exception:
pass
# Output — one field per line so it's easy to grep or pipe.
click.echo(f"slug: {match.slug}")
click.echo(f"source_path: {match.source_path}")
click.echo(f"domain: {match.domain or '-'}")
click.echo(f"chapter: {match.source_chapter or '-'}")
click.echo(f"word_count: {match.total_word_count}")
click.echo(f"vsm_system: {vsm or '-'}")
if score is not None:
click.echo(f"overall_score: {score:.2f}")
click.echo(f"evaluator: {evaluator or '-'}")
click.echo(f"evaluation: {eval_file}")
else:
click.echo("evaluation: (not yet evaluated)")
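The lookup order in `entity` (exact normalized slug first, then a substring fallback that must be unambiguous) can be sketched over plain slug strings. `lookup` is a hypothetical function written for this note; the slugs in the demo are made up.

```python
from typing import List, Optional, Tuple


def norm(s: str) -> str:
    """Lowercase and map hyphens and spaces to underscores."""
    return s.lower().replace("-", "_").replace(" ", "_")


def lookup(name: str, slugs: List[str]) -> Tuple[Optional[str], List[str]]:
    """Return (match, candidates); match is None when ambiguous or missing."""
    target = norm(name)
    if target in slugs:
        return target, [target]
    # Substring fallback for partial input.
    candidates = sorted(s for s in slugs if target in norm(s))
    if len(candidates) == 1:
        return candidates[0], candidates
    return None, candidates


if __name__ == "__main__":
    known = ["division_of_labour", "labour_theory_of_value", "invisible_hand"]
    print(lookup("Division-of-Labour", known))
    print(lookup("labour", known))
```

Returning the candidate list alongside the match lets a caller print the "Ambiguous" or "No entity matching" message without repeating the search.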
# ── evaluate ─────────────────────────────────────────────────────────
@@ -237,7 +459,14 @@ def _entities_by_type(cfg, root: "Path", entity_list: list) -> None:
@click.option("--model", default=None, help="LLM model name.")
@click.option("--entity", "entity_slug", default=None, help="Evaluate a single entity by slug.")
@click.option("--chapter", default=None, help="Evaluate entities from a specific chapter.")
def evaluate(config_path, provider, model, entity_slug, chapter):
@click.option("--force", is_flag=True, default=False,
help="Re-evaluate entities whose evaluation file already exists.")
@click.option("--model-fallback", "model_fallback", default=None,
help="If the primary model hits a rate limit (429), retry the "
"failed entities once with this model. Useful on free tiers "
"where models have separate quota buckets (e.g. "
"gemini-2.5-flash → gemini-2.5-flash-lite).")
def evaluate(config_path, provider, model, entity_slug, chapter, force, model_fallback):
"""Evaluate entities using LLM-based quality assessment."""
cfg, cfg_path = _load_config_or_exit(config_path)
root = cfg_path.parent
@@ -252,32 +481,44 @@ def evaluate(config_path, provider, model, entity_slug, chapter):
click.echo("No entities to evaluate.")
return
# Filter
# Filter. Accept hyphenated input for --entity by normalizing to the
# underscore slug format produced by parse_entity_directory.
if entity_slug:
entity_list = [e for e in entity_list if e.slug == entity_slug]
if not entity_list:
click.echo(f"Error: Entity '{entity_slug}' not found.", err=True)
normalized = entity_slug.replace("-", "_")
matches = [e for e in entity_list if e.slug == normalized]
if not matches:
# Build a short "did you mean…" list from entities sharing a stem.
stem = normalized.split("_", 1)[0]
near = sorted(e.slug for e in entity_list if e.slug.startswith(stem))[:5]
msg = f"Error: Entity '{entity_slug}' not found."
if near:
msg += f" Did you mean: {', '.join(near)} ?"
click.echo(msg, err=True)
raise SystemExit(1)
entity_list = matches
elif chapter:
entity_list = [e for e in entity_list if chapter in e.source_chapter]
if not entity_list:
click.echo(f"No entities found for chapter '{chapter}'.")
return
# Skip entities that already have evaluation files (incremental resume)
# Skip entities that already have evaluation files (incremental resume).
# Applies uniformly to full-pass, --entity, and --chapter runs unless
# --force is set.
from markitect.infospace.evaluate import run_entity_evaluation
output_dir = root / cfg.evaluations_dir
if not entity_slug and not chapter and output_dir.is_dir():
previous_digests = {
p.stem: "" # non-empty sentinel → triggers skip in BatchEvaluator
for p in output_dir.glob("*.md")
}
entity_list = [e for e in entity_list if e.slug not in previous_digests]
if not force and output_dir.is_dir():
existing = {p.stem for p in output_dir.glob("*.md")}
before = len(entity_list)
entity_list = [e for e in entity_list if e.slug not in existing]
skipped = before - len(entity_list)
if not entity_list:
click.echo("All entities already evaluated. Nothing to do.")
click.echo("All selected entities already evaluated. "
"Re-run with --force to overwrite.")
return
if previous_digests:
click.echo(f"Skipping {len(previous_digests)} already-evaluated entities.")
if skipped:
click.echo(f"Skipping {skipped} already-evaluated entities. "
"Use --force to re-evaluate.")
# Create adapter
from markitect.llm import create_adapter
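The incremental-resume rule in this hunk (skip anything with an existing evaluation file unless `--force` is given) reduces to a small filter. A minimal sketch, assuming evaluation files are named `<slug>.md` as in the hunk; `select_pending` is a hypothetical name.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def select_pending(
    slugs: List[str], output_dir: Path, force: bool
) -> Tuple[List[str], int]:
    """Return (slugs still to evaluate, number skipped)."""
    if force or not output_dir.is_dir():
        return list(slugs), 0
    existing = {p.stem for p in output_dir.glob("*.md")}
    pending = [s for s in slugs if s not in existing]
    return pending, len(slugs) - len(pending)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp)
        (out / "invisible_hand.md").write_text("done", encoding="utf-8")
        print(select_pending(["invisible_hand", "rent"], out, force=False))
        print(select_pending(["invisible_hand", "rent"], out, force=True))
```

Because the filter no longer depends on whether `--entity` or `--chapter` was passed, a single-entity run on an already-evaluated slug yields an empty list, which is where the "Re-run with --force to overwrite" message comes from.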
@@ -285,10 +526,14 @@ def evaluate(config_path, provider, model, entity_slug, chapter):
adapter = create_adapter(provider, model=model)
run_config = RunConfig(model_name=model, temperature=0.3, max_tokens=2000)
# Progress callback
# Progress callback — surface error detail so agents don't have to
# drop into Python to see whether an ERROR was 429, 503, or auth.
def on_progress(done, total, result):
status = result.status.upper()
click.echo(f" [{done}/{total}] {result.key}: {status}")
if status == "ERROR" and result.error:
click.echo(f" [{done}/{total}] {result.key}: ERROR — {result.error}")
else:
click.echo(f" [{done}/{total}] {result.key}: {status}")
click.echo(f"Evaluating {len(entity_list)} entities via {provider}...")
@@ -301,6 +546,42 @@ def evaluate(config_path, provider, model, entity_slug, chapter):
progress_callback=on_progress,
)
# Model fallback: if any entities failed with a rate-limit-looking
# error and the user opted in with --model-fallback, retry them once
# with a fresh adapter on the fallback model. Different free-tier
# models have separate quota buckets, so this often succeeds when
# the primary is exhausted.
if model_fallback and summary.failed > 0:
rate_limited = [
r for r in summary.results
if r.status == "error"
and r.error
and ("429" in r.error or "rate" in r.error.lower())
]
if rate_limited:
retry_slugs = {r.key for r in rate_limited}
retry_entities = [e for e in entity_list if e.slug in retry_slugs]
click.echo(
f"\n{len(retry_entities)} rate-limited entities — "
f"retrying with --model-fallback {model_fallback}..."
)
fb_adapter = create_adapter(provider, model=model_fallback)
fb_run_config = RunConfig(
model_name=model_fallback, temperature=0.3, max_tokens=2000
)
fb_summary = run_entity_evaluation(
config=cfg,
entities=retry_entities,
adapter=fb_adapter,
run_config=fb_run_config,
output_dir=output_dir,
progress_callback=on_progress,
)
summary.succeeded += fb_summary.succeeded
summary.failed = (summary.failed - len(retry_entities)) + fb_summary.failed
summary.total_prompt_tokens += fb_summary.total_prompt_tokens
summary.total_completion_tokens += fb_summary.total_completion_tokens
click.echo(f"\nDone: {summary.succeeded} succeeded, {summary.failed} failed, {summary.skipped} skipped")
if summary.total_tokens > 0:
click.echo(f"Tokens used: {summary.total_tokens}")
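The fallback pass hinges on two small pieces: the heuristic that decides which failures look rate-limited, and the arithmetic that folds the retry results back into the summary. The sketch below isolates both. `Result` is a hypothetical stand-in for the real result objects, carrying only the `key`, `status`, and `error` attributes the hunk reads.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Result:
    """Hypothetical stand-in for one per-entity result."""
    key: str
    status: str
    error: Optional[str] = None


def rate_limited_keys(results: List[Result]) -> List[str]:
    """Keys whose error text looks like a rate limit (same test as the hunk)."""
    return [
        r.key
        for r in results
        if r.status == "error"
        and r.error
        and ("429" in r.error or "rate" in r.error.lower())
    ]


def merged_failed(primary_failed: int, retried: int, fallback_failed: int) -> int:
    """Failures after the retry: untouched failures plus fallback failures."""
    return (primary_failed - retried) + fallback_failed


if __name__ == "__main__":
    batch = [
        Result("a", "error", "HTTP 429 Too Many Requests"),
        Result("b", "error", "HTTP 401 Unauthorized"),
        Result("c", "success"),
    ]
    print(rate_limited_keys(batch))
    print(merged_failed(primary_failed=2, retried=1, fallback_failed=0))
```

The substring test is deliberately loose. In this sketch an error such as "could not generate output" also counts as rate-limited, since "generate" contains "rate"; the cost of such a false positive is one extra retry on the fallback model.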
@@ -1015,6 +1296,18 @@ def disciplines(config_path: Optional[str]):
help="Run collection checks (C1-C5) after each source file.",
)
@click.option("--no-commit", is_flag=True, help="Skip git commits.")
@click.option(
"--eval-after-source",
is_flag=True,
help="After each source's stages succeed, evaluate just the newly-"
"added entities so the per-source commit is self-contained.",
)
@click.option(
"--classify-after-source",
is_flag=True,
help="After each source's stages succeed, classify just the newly-"
"added entities so the per-source commit is self-contained.",
)
def process(
glob_pattern: Optional[str],
process_all: bool,
@@ -1023,6 +1316,8 @@ def process(
model: Optional[str],
check_after_each: bool,
no_commit: bool,
eval_after_source: bool,
classify_after_source: bool,
):
"""Process source files through the pipeline defined in infospace.yaml.
@@ -1096,12 +1391,22 @@ def process(
# Run pipeline
from markitect.infospace.pipeline import SourcePipeline
if (eval_after_source or classify_after_source) and adapter is None:
click.echo(
"Error: --eval-after-source / --classify-after-source require "
"--provider (they call the LLM).",
err=True,
)
raise SystemExit(1)
pipeline = SourcePipeline(
cfg, root,
adapter=adapter,
provider=provider or "",
model=(model or _PROVIDER_DEFAULTS.get(provider or "", "")) if provider else "",
no_commit=no_commit,
eval_after_source=eval_after_source,
classify_after_source=classify_after_source,
)
total = len(source_files)


@@ -195,12 +195,23 @@ def run_entity_evaluation(
"""
topic = config.topic.name
evaluations_path = output_dir or Path(config.evaluations_dir)
evaluator_name = (run_config.model_name if run_config else "unknown")
# Fall back from run_config.model_name (may be None if the CLI user did
# not pass --model) to the adapter's resolved model, and only then to
# "unknown". Keeps the evaluator field in the written frontmatter
# informative for later audits.
default_evaluator = (
(run_config.model_name if run_config else None)
or getattr(adapter, "_model", None)
or "unknown"
)
def _write_and_notify(done: int, total: int, result) -> None:
# Write file immediately on success (incremental — run is resumable)
if result.status == "success" and result.response is not None:
scores = parse_evaluation_response(result.response.content, dimensions)
# Prefer the model name the adapter actually echoed back — it
# reflects post-resolution fallbacks (e.g. flash → flash-lite).
evaluator_name = result.response.model or default_evaluator
evaluation = EntityEvaluation(
entity_slug=result.key,
evaluator=evaluator_name,
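The evaluator name written to the frontmatter now comes from a three-step fallback chain. A minimal sketch of the precedence, with `pick_evaluator` as a hypothetical name and plain strings standing in for `result.response.model`, `run_config.model_name`, and the adapter's `_model` attribute:

```python
from typing import Optional


def pick_evaluator(
    response_model: Optional[str],
    run_config_model: Optional[str],
    adapter_model: Optional[str],
) -> str:
    """Prefer what the provider echoed back, then config, then adapter."""
    return response_model or run_config_model or adapter_model or "unknown"


if __name__ == "__main__":
    print(pick_evaluator("gemini-2.5-flash-lite", "gemini-2.5-flash", None))
    print(pick_evaluator(None, None, "gemini-2.5-flash"))
    print(pick_evaluator(None, None, None))
```

Because the chain uses `or`, an empty string is treated the same as `None` and falls through to the next source.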


@@ -81,17 +81,26 @@ def snapshot_from_checks(
# ── Metrics file I/O ────────────────────────────────────────────────
def write_metrics_file(metrics: Dict[str, float], path: Path) -> None:
def write_metrics_file(metrics: Dict[str, Any], path: Path) -> None:
"""Write the latest metrics to a simple YAML file.
This file is used by ``markitect infospace viability`` for quick
threshold checking.
threshold checking. Non-numeric values (e.g. ``type_distribution``)
are passed through unchanged; floats are rounded to 6 dp; ints are
preserved as ints so external consumers don't see ``29`` silently
become ``29.0`` on every round-trip.
"""
def _normalize(v: Any) -> Any:
if isinstance(v, bool):
return v
if isinstance(v, float):
return round(v, 6)
return v
path.parent.mkdir(parents=True, exist_ok=True)
path.write_text(
yaml.safe_dump(
{k: round(v, 6) if isinstance(v, float) else v
for k, v in sorted(metrics.items())},
{k: _normalize(v) for k, v in sorted(metrics.items())},
default_flow_style=False,
sort_keys=True,
),
@@ -99,14 +108,20 @@ def write_metrics_file(metrics: Dict[str, float], path: Path) -> None:
)
def read_metrics_file(path: Path) -> Dict[str, float]:
"""Read the latest metrics from a YAML file."""
def read_metrics_file(path: Path) -> Dict[str, Any]:
"""Read the latest metrics from a YAML file.
Returns all keys as written on disk, preserving types verbatim so a
round-trip via :func:`write_metrics_file` does not silently drop
structured values (e.g. ``type_distribution``) or flatten ints to
floats.
"""
if not path.is_file():
return {}
raw = yaml.safe_load(path.read_text(encoding="utf-8"))
if not isinstance(raw, dict):
return {}
return {k: float(v) for k, v in raw.items() if isinstance(v, (int, float))}
return raw
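The type-preservation change can be illustrated without YAML at all. The sketch below contrasts the new per-value normalization with the filter from the removed `read_metrics_file` line; `old_read_filter` is a hypothetical name for that removed expression, and the metric names are invented.

```python
from typing import Any, Dict


def normalize(v: Any) -> Any:
    """New behaviour: round floats, pass everything else through."""
    if isinstance(v, bool):
        return v
    if isinstance(v, float):
        return round(v, 6)
    return v


def old_read_filter(raw: Dict[str, Any]) -> Dict[str, float]:
    """Removed behaviour: keep only numbers and coerce them to float."""
    return {k: float(v) for k, v in raw.items() if isinstance(v, (int, float))}


if __name__ == "__main__":
    metrics = {
        "entity_count": 29,
        "coverage": 0.123456789,
        "type_distribution": {"concept": 12, "actor": 17},
    }
    print({k: normalize(v) for k, v in metrics.items()})
    print(old_read_filter(metrics))
```

Under the old filter, `entity_count` comes back as `29.0` and `type_distribution` disappears. Booleans were also coerced to `1.0` or `0.0`, since `bool` is a subclass of `int` in Python.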
# ── History operations ───────────────────────────────────────────────


@@ -62,6 +62,8 @@ class SourcePipeline:
provider: str = "",
model: str = "",
no_commit: bool = False,
eval_after_source: bool = False,
classify_after_source: bool = False,
) -> None:
self.config = config
self.root = root
@@ -69,6 +71,8 @@ class SourcePipeline:
self.provider = provider
self.model = model
self.no_commit = no_commit
self.eval_after_source = eval_after_source
self.classify_after_source = classify_after_source
# ── Public API ────────────────────────────────────────────────────
@@ -110,6 +114,12 @@ class SourcePipeline:
stage_outputs: Dict[str, str] = {}
stage_logs: List[Dict[str, Any]] = []
# Snapshot entity slugs before any stage runs so we can identify
# which entities were newly produced by this source. Used to scope
# --eval-after-source / --classify-after-source to only the new
# entities.
pre_entity_slugs = self._current_entity_slugs()
print(f"\nProcessing: {source_id}")
print("=" * 60)
@@ -133,6 +143,14 @@ class SourcePipeline:
print(f"\n {source_id}: all stages complete.")
self._write_processing_log(source_id, stage_logs, success=True)
# Per-source follow-ups: evaluate and/or classify just the new
# entities this source produced, so the next commit contains a
# fully-processed chapter.
new_slugs = self._current_entity_slugs() - pre_entity_slugs
if new_slugs and (self.eval_after_source or self.classify_after_source):
self._run_per_source_followups(new_slugs)
if not self.no_commit:
self._git_commit(source_id)
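The "newly produced entities" set is a before/after snapshot of file stems. A minimal sketch of that set difference on a temporary directory; `current_slugs` follows the helper added later in this diff, while `demo` and the file names are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Set


def current_slugs(entities_dir: Path) -> Set[str]:
    """Stems of the entity files currently on disk."""
    if not entities_dir.is_dir():
        return set()
    return {p.stem for p in entities_dir.glob("*.md")}


def demo() -> Set[str]:
    """Snapshot, add one file, and return only the new stems."""
    with tempfile.TemporaryDirectory() as tmp:
        entities = Path(tmp) / "entities"
        entities.mkdir()
        (entities / "old_entity.md").write_text("x", encoding="utf-8")
        before = current_slugs(entities)
        (entities / "new_entity.md").write_text("y", encoding="utf-8")
        return current_slugs(entities) - before


if __name__ == "__main__":
    print(demo())
```

A snapshot of stems only detects added files. An entity that a source rewrites in place keeps its stem, so under this scheme it would not be re-evaluated or re-classified by the per-source follow-ups.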
@@ -636,7 +654,13 @@ class SourcePipeline:
# ── Git Integration ───────────────────────────────────────────────
def _git_commit(self, source_id: str) -> None:
"""Stage all output changes and commit them for *source_id*."""
"""Stage all output changes and commit them for *source_id*.
The commit message body summarises what actually changed — counts
of entities / evaluations / classifications / analyses added — so
``git log`` reads like the chapter-by-chapter story of the
infospace growing, not a wall of identical messages.
"""
output_dir = self.root / "output"
try:
subprocess.run(
@@ -645,11 +669,11 @@ class SourcePipeline:
check=True,
capture_output=True,
)
body = self._compose_commit_body(source_id)
result = subprocess.run(
[
"git", "commit", "-m",
f"infospace: process {source_id}\n\n"
f"Extract entities, map to VSM, and synthesize analysis.",
f"infospace: process {source_id}\n\n{body}",
],
cwd=str(self.root),
capture_output=True,
@@ -666,3 +690,146 @@ class SourcePipeline:
except subprocess.CalledProcessError as e:
stderr = e.stderr.decode() if isinstance(e.stderr, bytes) else (e.stderr or "")
print(f" Warning: Git error: {stderr.strip()}")
# ── Per-source helpers ────────────────────────────────────────────
    def _current_entity_slugs(self) -> set:
        """Return the set of entity file stems currently on disk."""
        entities_dir = self.root / self.config.entities_dir
        if not entities_dir.is_dir():
            return set()
        return {p.stem for p in entities_dir.glob("*.md")}
def _run_per_source_followups(self, new_slugs: set) -> None:
"""Run per-source evaluation and/or classification on *new_slugs*.
Called after a source's pipeline stages succeed, before the git
commit, so each chapter's commit contains the full set of
artefacts derived from it.
"""
from markitect.infospace.entity_parser import parse_entity_directory
entities_dir = self.root / self.config.entities_dir
all_entities = parse_entity_directory(entities_dir)
new_entities = [e for e in all_entities if e.slug in new_slugs]
if not new_entities:
return
if self.adapter is None:
print(
" Skipping per-source eval/classify: no LLM adapter "
"configured (run with --provider)."
)
return
from markitect.prompts.execution.models import RunConfig
run_config = RunConfig(
model_name=self.model or None, temperature=0.3, max_tokens=2000
)
if self.eval_after_source:
from markitect.infospace.evaluate import run_entity_evaluation
print(f" Evaluating {len(new_entities)} new entity/entities…")
try:
run_entity_evaluation(
config=self.config,
entities=new_entities,
adapter=self.adapter,
run_config=run_config,
output_dir=self.root / self.config.evaluations_dir,
)
except Exception as exc:
print(f" Warning: per-source evaluation failed: {exc}")
if self.classify_after_source:
from markitect.infospace.classifier import run_entity_classification
print(f" Classifying {len(new_entities)} new entity/entities…")
try:
run_entity_classification(
config=self.config,
entities=new_entities,
adapter=self.adapter,
run_config=run_config,
output_dir=self.root / self.config.classifications_dir,
)
except Exception as exc:
print(f" Warning: per-source classification failed: {exc}")
def _compose_commit_body(self, source_id: str) -> str:
"""Summarise staged output changes into a commit-message body.
Counts added files per output subdirectory (entities, evaluations,
classifications, analyses, mappings…) and produces one line per
bucket that actually saw additions. Modified/deleted files are
noted separately for auditability.
"""
default = "Extract entities, map to VSM, and synthesize analysis."
try:
result = subprocess.run(
["git", "diff", "--cached", "--name-status", "--", "output"],
cwd=str(self.root),
check=True,
capture_output=True,
text=True,
)
except subprocess.CalledProcessError:
return default
added_by_bucket: Dict[str, int] = {}
modified = 0
deleted = 0
for line in result.stdout.splitlines():
parts = line.split("\t")
if len(parts) < 2:
continue
status = parts[0]
path = parts[-1]
if status.startswith("A"):
bucket = self._bucket_for(path)
if bucket:
added_by_bucket[bucket] = added_by_bucket.get(bucket, 0) + 1
elif status.startswith("M"):
modified += 1
elif status.startswith("D"):
deleted += 1
if not added_by_bucket and not modified and not deleted:
return default
# Emit buckets in a deterministic, reader-friendly order.
order = ["entities", "mappings", "analyses", "evaluations",
"classifications", "metrics", "logs", "other"]
lines: List[str] = []
for bucket in order:
n = added_by_bucket.get(bucket, 0)
if n:
lines.append(f"- {bucket}: +{n}")
if modified:
lines.append(f"- modified: {modified}")
if deleted:
lines.append(f"- deleted: {deleted}")
return "\n".join(lines) if lines else default
    def _bucket_for(self, path: str) -> Optional[str]:
        """Map an ``output/...`` path to a commit-summary bucket name."""
        # Use configured directory basenames where possible so non-default
        # layouts still bucket correctly.
        buckets = {
            Path(self.config.entities_dir).name: "entities",
            Path(self.config.evaluations_dir).name: "evaluations",
            Path(self.config.classifications_dir).name: "classifications",
        }
        parts = Path(path).parts
        if len(parts) < 2 or parts[0] != "output":
            return None
        sub = parts[1]
        if sub in buckets:
            return buckets[sub]
        # Heuristic fallback for common additional output subdirectories.
        known = {"mappings", "analyses", "metrics", "logs"}
        if sub in known:
            return sub
        return "other"
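The commit-body summary is built by parsing `git diff --cached --name-status` output, which is tab-separated with the status letter first. The sketch below applies the same parsing to a literal string so it runs without a repository. It is a simplification: `summarize_name_status` is a hypothetical name, buckets are taken directly from the second path segment and sorted alphabetically, and the configured-directory mapping and fixed bucket order from the commit are left out.

```python
from typing import Dict, List


def summarize_name_status(name_status: str, default: str = "no output changes") -> str:
    """Turn name-status lines into '- bucket: +N' summary lines."""
    added: Dict[str, int] = {}
    modified = 0
    deleted = 0
    for line in name_status.splitlines():
        parts = line.split("\t")
        if len(parts) < 2:
            continue
        status, path = parts[0], parts[-1]
        if status.startswith("A"):
            segments = path.split("/")
            if len(segments) >= 2 and segments[0] == "output":
                added[segments[1]] = added.get(segments[1], 0) + 1
        elif status.startswith("M"):
            modified += 1
        elif status.startswith("D"):
            deleted += 1
    lines: List[str] = [f"- {bucket}: +{n}" for bucket, n in sorted(added.items())]
    if modified:
        lines.append(f"- modified: {modified}")
    if deleted:
        lines.append(f"- deleted: {deleted}")
    return "\n".join(lines) if lines else default


if __name__ == "__main__":
    sample = (
        "A\toutput/entities/rent.md\n"
        "A\toutput/evaluations/rent.md\n"
        "M\toutput/metrics/latest.yaml\n"
    )
    print(summarize_name_status(sample))
```

Lines with other status letters, such as renames (`R100`), match none of the three branches, both here and in the commit's loop, so they are not counted. Whether git reports renames at all depends on its rename-detection settings.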


@@ -131,6 +131,12 @@ def build_state(
This is a convenience function that assembles the state object
and optionally runs viability checks if *metrics* are provided.
"""
if not isinstance(config, InfospaceConfig):
raise TypeError(
f"build_state(config=...) expects an InfospaceConfig instance, "
f"got {type(config).__name__}. If you have a path, load the "
f"config first with load_infospace_config(path)."
)
state = InfospaceState(
config=config,
entities=entities or [],


@@ -12,6 +12,8 @@ Quick start::
response = adapter.execute_prompt(prompt, run_config)
"""
from markitect.llm.models import RunConfig, LLMResponse
from markitect.llm.adapter import LLMAdapter, MockLLMAdapter, ErrorLLMAdapter
from markitect.llm.factory import create_adapter
from markitect.llm.openrouter import OpenRouterAdapter
from markitect.llm.claude_code import ClaudeCodeAdapter
@@ -37,6 +39,11 @@ from markitect.llm.similarity import (
)
__all__ = [
"RunConfig",
"LLMResponse",
"LLMAdapter",
"MockLLMAdapter",
"ErrorLLMAdapter",
"create_adapter",
"OpenRouterAdapter",
"ClaudeCodeAdapter",

169
markitect/llm/adapter.py Normal file

@@ -0,0 +1,169 @@
"""
LLM adapter interface for pluggable model providers.
Implements an abstraction layer for LLM integration, supporting
multiple providers (OpenAI, Anthropic, local models, etc.).
"""
from abc import ABC, abstractmethod
from typing import Dict, Any
from markitect.llm.models import RunConfig, LLMResponse
class LLMAdapter(ABC):
"""
Abstract base class for LLM providers.
Enables pluggable LLM backends without prescribing implementation.
Implementations can wrap OpenAI, Anthropic, or other APIs.
"""
@abstractmethod
def execute_prompt(
self,
prompt: str,
config: RunConfig,
) -> LLMResponse:
"""
Execute a prompt with the LLM.
Args:
prompt: Compiled prompt text
config: Execution configuration
Returns:
LLMResponse with generated content
Raises:
Exception: On LLM API errors
"""
pass
@abstractmethod
def validate_config(self, config: RunConfig) -> bool:
"""
Validate that configuration is supported.
Args:
config: Configuration to validate
Returns:
True if valid, False otherwise
"""
pass
class MockLLMAdapter(LLMAdapter):
"""
Mock LLM adapter for testing.
Returns deterministic responses without calling external APIs.
"""
def __init__(self, mock_response: str = "Mock LLM response"):
"""
Initialize mock adapter.
Args:
mock_response: Response to return
"""
self.mock_response = mock_response
self.call_count = 0
self.last_prompt = None
self.last_config = None
def execute_prompt(
self,
prompt: str,
config: RunConfig,
) -> LLMResponse:
"""
Return mock response.
Args:
prompt: Prompt (stored for inspection)
config: Config (stored for inspection)
Returns:
Mock LLMResponse
"""
self.call_count += 1
self.last_prompt = prompt
self.last_config = config
return LLMResponse(
content=self.mock_response,
model=config.model_name,
usage={
"prompt_tokens": len(prompt.split()),
"completion_tokens": len(self.mock_response.split()),
"total_tokens": len(prompt.split()) + len(self.mock_response.split()),
},
finish_reason="stop",
metadata={"mock": True},
)
def validate_config(self, config: RunConfig) -> bool:
"""
Mock validation always succeeds.
Args:
config: Configuration
Returns:
Always True
"""
return True
def reset(self) -> None:
"""Reset mock state."""
self.call_count = 0
self.last_prompt = None
self.last_config = None
class ErrorLLMAdapter(LLMAdapter):
"""
Mock adapter that always raises an error.
Useful for testing error handling.
"""
def __init__(self, error_message: str = "Mock LLM error"):
"""
Initialize error adapter.
Args:
error_message: Error message to raise
"""
self.error_message = error_message
def execute_prompt(
self,
prompt: str,
config: RunConfig,
) -> LLMResponse:
"""
Raise error.
Args:
prompt: Prompt
config: Config
Raises:
RuntimeError: Always
"""
raise RuntimeError(self.error_message)
def validate_config(self, config: RunConfig) -> bool:
"""
Validation succeeds.
Args:
config: Configuration
Returns:
True
"""
return True
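To show what implementing the two abstract methods involves, here is a minimal sketch of a custom adapter. Everything in it is a stand-in so the block runs on its own: `RunConfig`, `LLMResponse`, and `LLMAdapter` are trimmed local copies of the shapes in this diff, and `EchoAdapter` is a hypothetical provider that is not part of markitect.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class RunConfig:
    """Trimmed stand-in for markitect.llm.models.RunConfig."""
    model_name: str = "gpt-4"
    temperature: float = 0.7


@dataclass
class LLMResponse:
    """Trimmed stand-in for markitect.llm.models.LLMResponse."""
    content: str
    model: str
    usage: Dict[str, int] = field(default_factory=dict)
    finish_reason: str = "stop"
    metadata: Dict[str, Any] = field(default_factory=dict)


class LLMAdapter(ABC):
    """Stand-in for the abstract base class shown above."""

    @abstractmethod
    def execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse: ...

    @abstractmethod
    def validate_config(self, config: RunConfig) -> bool: ...


class EchoAdapter(LLMAdapter):
    """Hypothetical adapter that returns the prompt unchanged."""

    def execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse:
        if not self.validate_config(config):
            raise ValueError(f"unsupported temperature: {config.temperature}")
        words = len(prompt.split())
        return LLMResponse(
            content=prompt,
            model=config.model_name,
            usage={
                "prompt_tokens": words,
                "completion_tokens": words,
                "total_tokens": 2 * words,
            },
        )

    def validate_config(self, config: RunConfig) -> bool:
        return 0.0 <= config.temperature <= 2.0


if __name__ == "__main__":
    print(EchoAdapter().execute_prompt("hello there world", RunConfig()).usage)
```

Counting whitespace-separated words as tokens follows `MockLLMAdapter` above; a real provider would report its own usage figures.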


@@ -5,8 +5,8 @@ Claude Code CLI adapter — runs the ``claude`` CLI as a subprocess.
import subprocess
from typing import Optional
from markitect.prompts.execution.llm_adapter import LLMAdapter
from markitect.prompts.execution.models import RunConfig, LLMResponse
from markitect.llm.adapter import LLMAdapter
from markitect.llm.models import RunConfig, LLMResponse
from markitect.llm.config import LLMConfig
from markitect.llm._token_estimator import estimate_tokens
from markitect.llm.exceptions import (


@@ -4,7 +4,7 @@ Factory for creating LLM adapters by provider name.
from typing import Optional, Dict, Any
from markitect.prompts.execution.llm_adapter import LLMAdapter
from markitect.llm.adapter import LLMAdapter
from markitect.llm.exceptions import LLMConfigurationError
# Lazy imports to avoid pulling in every adapter at module load time.


@@ -5,11 +5,15 @@ Google Gemini adapter — calls the Generative Language REST API directly.
import time
from typing import Optional, Dict, Any
from markitect.prompts.execution.llm_adapter import LLMAdapter
from markitect.prompts.execution.models import RunConfig, LLMResponse
from markitect.llm.adapter import LLMAdapter
from markitect.llm.models import RunConfig, LLMResponse
from markitect.llm.config import resolve_api_key, find_project_root
from markitect.llm._http import post_json
from markitect.llm.exceptions import LLMConfigurationError
from markitect.llm.exceptions import (
LLMConfigurationError,
LLMAPIError,
LLMRateLimitError,
)
_DEFAULT_MODEL = "gemini-2.5-flash"
_API_BASE = "https://generativelanguage.googleapis.com/v1beta"
@@ -26,10 +30,12 @@ class GeminiAdapter(LLMAdapter):
model: Optional[str] = None,
api_key: Optional[str] = None,
system_prompt: Optional[str] = None,
max_retries: int = 3,
**_kwargs: Any,
):
self._model = model or _DEFAULT_MODEL
self._system_prompt = system_prompt
self._max_retries = max_retries
root = find_project_root()
key_file_paths = [root / "apikey-geminifree.txt"] if root else []
@@ -77,7 +83,7 @@ class GeminiAdapter(LLMAdapter):
url = f"{_API_BASE}/models/{model}:generateContent?key={self._api_key}"
start = time.time()
data = post_json(url, payload, timeout=config.timeout_seconds)
data = self._post_with_retries(url, payload, timeout=config.timeout_seconds)
latency = time.time() - start
# Parse Gemini response
@@ -113,3 +119,27 @@ class GeminiAdapter(LLMAdapter):
if not (0.0 <= config.temperature <= 2.0):
return False
return True
# ── Internals ───────────────────────────────────────────────────
    def _post_with_retries(
        self,
        url: str,
        payload: Dict[str, Any],
        timeout: int,
    ) -> Dict[str, Any]:
        last_exc: Optional[Exception] = None
        for attempt in range(self._max_retries + 1):
            try:
                return post_json(url, payload, timeout=timeout)
            except LLMRateLimitError as exc:
                last_exc = exc
                if attempt < self._max_retries:
                    time.sleep(2 ** attempt)
            except LLMAPIError as exc:
                if exc.status_code in (502, 503, 504) and attempt < self._max_retries:
                    last_exc = exc
                    time.sleep(2 ** attempt)
                else:
                    raise
        raise last_exc  # type: ignore[misc]
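The retry loop above sleeps `2 ** attempt` seconds between attempts (1, 2, 4, ...) and gives up after `max_retries` retries. The sketch below reproduces that schedule with an injectable `sleep` so the delays can be observed without waiting. `APIError` and `RateLimitError` are local stand-ins for `LLMAPIError` and `LLMRateLimitError`; making the rate-limit class a subclass of the API error is an assumption of this sketch, not something the diff states.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class APIError(Exception):
    """Stand-in for LLMAPIError (assumed to carry a status_code)."""

    def __init__(self, message: str, status_code: Optional[int] = None):
        super().__init__(message)
        self.status_code = status_code


class RateLimitError(APIError):
    """Stand-in for LLMRateLimitError (subclassing is an assumption)."""


def call_with_retries(
    fn: Callable[[], T],
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry rate limits and 502/503/504 with exponential backoff."""
    last_exc: Optional[Exception] = None
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError as exc:
            last_exc = exc
        except APIError as exc:
            if exc.status_code not in (502, 503, 504):
                raise  # Non-transient errors are not retried.
            last_exc = exc
        if attempt < max_retries:
            sleep(2 ** attempt)
    assert last_exc is not None
    raise last_exc


if __name__ == "__main__":
    delays = []
    calls = {"n": 0}

    def flaky() -> str:
        calls["n"] += 1
        if calls["n"] < 3:
            raise RateLimitError("quota", status_code=429)
        return "ok"

    print(call_with_retries(flaky, sleep=delays.append), delays)
```

With the default of three retries the worst case adds 1 + 2 + 4 = 7 seconds of sleeping before the last error is re-raised.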

86
markitect/llm/models.py Normal file

@@ -0,0 +1,86 @@
"""
Shared data models for LLM execution.

These classes are the canonical definitions; they are re-exported by
markitect.prompts.execution.models for backward compatibility.
"""

from dataclasses import dataclass, field
from typing import Dict, Any


@dataclass
class RunConfig:
    """
    Configuration for prompt execution.

    Attributes:
        model_name: LLM model to use
        temperature: Model temperature (0.0-1.0)
        max_tokens: Maximum tokens to generate
        model_params: Additional model parameters
        max_depth: Maximum generation depth for nested runs
        skip_if_exists: Skip if identical InputBundleHash exists
        timeout_seconds: Execution timeout
    """

    model_name: str = "gpt-4"
    temperature: float = 0.7
    max_tokens: int = 2000
    model_params: Dict[str, Any] = field(default_factory=dict)
    max_depth: int = 3
    skip_if_exists: bool = True
    timeout_seconds: int = 300

    def to_dict(self) -> Dict[str, Any]:
        """Convert to dictionary."""
        return {
            "model_name": self.model_name,
            "temperature": self.temperature,
            "max_tokens": self.max_tokens,
            "model_params": self.model_params,
            "max_depth": self.max_depth,
            "skip_if_exists": self.skip_if_exists,
            "timeout_seconds": self.timeout_seconds,
        }

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "RunConfig":
        """Create from dictionary."""
        return cls(
            model_name=data.get("model_name", "gpt-4"),
            temperature=data.get("temperature", 0.7),
            max_tokens=data.get("max_tokens", 2000),
            model_params=data.get("model_params", {}),
            max_depth=data.get("max_depth", 3),
            skip_if_exists=data.get("skip_if_exists", True),
            timeout_seconds=data.get("timeout_seconds", 300),
        )


@dataclass
class LLMResponse:
    """
    Response from LLM execution.

    Attributes:
        content: Generated content
        model: Model used
        usage: Token usage statistics
        finish_reason: Why generation stopped
        metadata: Additional response metadata
    """

    content: str
    model: str
    usage: Dict[str, int] = field(default_factory=dict)
    finish_reason: str = "stop"
    metadata: Dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> Dict[str, Any]:
        """Convert to dictionary."""
        return {
            "content": self.content,
            "model": self.model,
            "usage": self.usage,
            "finish_reason": self.finish_reason,
            "metadata": self.metadata,
        }


@@ -5,8 +5,8 @@ OpenAI (ChatGPT) adapter — calls the OpenAI chat completions API.
import time
from typing import Optional, Dict, Any
-from markitect.prompts.execution.llm_adapter import LLMAdapter
-from markitect.prompts.execution.models import RunConfig, LLMResponse
+from markitect.llm.adapter import LLMAdapter
+from markitect.llm.models import RunConfig, LLMResponse
from markitect.llm.config import resolve_api_key, find_project_root
from markitect.llm._http import post_json
from markitect.llm.exceptions import (


@@ -5,8 +5,8 @@ OpenRouter adapter — calls the OpenAI-compatible chat completions API.
import time
from typing import Optional, Dict, Any
-from markitect.prompts.execution.llm_adapter import LLMAdapter
-from markitect.prompts.execution.models import RunConfig, LLMResponse
+from markitect.llm.adapter import LLMAdapter
+from markitect.llm.models import RunConfig, LLMResponse
from markitect.llm.config import LLMConfig, resolve_api_key, find_project_root
from markitect.llm._http import post_json
from markitect.llm.exceptions import (


@@ -28,13 +28,28 @@ from markitect.llm.config import find_project_root
HARDCODED_PROVIDER = "gemini"
HARDCODED_MODEL = "gemini-2.5-flash"
MODEL_ENV_VAR = "MARKITECT_HELPER_MODEL"
# Default (markitect) values kept for backward compatibility.
MODEL_ENV_VAR = "MARKITECT_HELPER_MODEL"
USER_CONFIG_DIR = Path.home() / ".config" / "markitect"
USER_CONFIG_PATH = USER_CONFIG_DIR / "config.toml"
DIR_CONFIG_NAME = ".markitect.toml"
# ── App-name helpers ───────────────────────────────────────────────────────
def _model_env_var(app_name: str) -> str:
return f"{app_name.upper()}_HELPER_MODEL"
def _user_config_path(app_name: str) -> Path:
return Path.home() / ".config" / app_name / "config.toml"
def _dir_config_name(app_name: str) -> str:
return f".{app_name}.toml"
# ── Data classes ──────────────────────────────────────────────────────────
@dataclass
@@ -114,11 +129,11 @@ def _clear_llm_section(path: Path, section: str) -> bool:
# ── Directory config path helper ─────────────────────────────────────────
-def _dir_config_path() -> Optional[Path]:
+def _dir_config_path(app_name: str = "markitect") -> Optional[Path]:
     root = find_project_root()
     if root is None:
         return None
-    return root / DIR_CONFIG_NAME
+    return root / _dir_config_name(app_name)
# ── Resolution ───────────────────────────────────────────────────────────
@@ -126,13 +141,23 @@ def _dir_config_path() -> Optional[Path]:
 def resolve_llm(
     cli_provider: Optional[str] = None,
     cli_model: Optional[str] = None,
+    app_name: str = "markitect",
 ) -> ResolvedLLM:
     """Walk the 7-level priority chain and return a fully resolved config.
     Provider and model are resolved independently — each takes the value
     from its highest-priority source.
+    Args:
+        cli_provider: Provider override from CLI.
+        cli_model: Model override from CLI.
+        app_name: Application name used to derive config paths and the
+            env-var prefix (e.g. ``"railiance"`` → ``RAILIANCE_HELPER_MODEL``
+            and ``~/.config/railiance/config.toml``).
     """
-    dir_path = _dir_config_path()
+    dir_path = _dir_config_path(app_name)
+    user_cfg = _user_config_path(app_name)
+    env_var = _model_env_var(app_name)
     # Build the layers (highest priority first).
     layers: list[tuple[str, LLMLayer]] = []
@@ -141,13 +166,13 @@ def resolve_llm(
layers.append(("CLI flag", LLMLayer(provider=cli_provider, model=cli_model)))
# 2. Env var (model only)
-    env_model = os.environ.get(MODEL_ENV_VAR) or None
-    layers.append(("env MARKITECT_HELPER_MODEL", LLMLayer(model=env_model)))
+    env_model = os.environ.get(env_var) or None
+    layers.append((f"env {env_var}", LLMLayer(model=env_model)))
# 3. User preference
layers.append((
"user preference",
-        _read_llm_section(USER_CONFIG_PATH, "preference"),
+        _read_llm_section(user_cfg, "preference"),
))
# 4. Directory preference
@@ -167,7 +192,7 @@ def resolve_llm(
# 6. User default
layers.append((
"user default",
-        _read_llm_section(USER_CONFIG_PATH, "default"),
+        _read_llm_section(user_cfg, "default"),
))
# 7. Hardcoded
@@ -199,20 +224,22 @@ def resolve_llm(
)
-def get_default_layers() -> list[tuple[str, LLMLayer]]:
+def get_default_layers(app_name: str = "markitect") -> list[tuple[str, LLMLayer]]:
     """Return only the default layers for display."""
-    dir_path = _dir_config_path()
+    dir_path = _dir_config_path(app_name)
+    user_cfg = _user_config_path(app_name)
+    dir_cfg_name = _dir_config_name(app_name)
     layers: list[tuple[str, LLMLayer]] = []
     if dir_path:
         layers.append((
-            f"Directory default ({DIR_CONFIG_NAME})",
+            f"Directory default ({dir_cfg_name})",
             _read_llm_section(dir_path, "default"),
         ))
     layers.append((
-        f"User default ({USER_CONFIG_PATH})",
-        _read_llm_section(USER_CONFIG_PATH, "default"),
+        f"User default ({user_cfg})",
+        _read_llm_section(user_cfg, "default"),
     ))
layers.append((
@@ -223,19 +250,21 @@ def get_default_layers() -> list[tuple[str, LLMLayer]]:
return layers
-def get_preference_layers() -> list[tuple[str, LLMLayer]]:
+def get_preference_layers(app_name: str = "markitect") -> list[tuple[str, LLMLayer]]:
     """Return only the preference layers for display."""
-    dir_path = _dir_config_path()
+    dir_path = _dir_config_path(app_name)
+    user_cfg = _user_config_path(app_name)
+    dir_cfg_name = _dir_config_name(app_name)
     layers: list[tuple[str, LLMLayer]] = []
     layers.append((
-        f"User preference ({USER_CONFIG_PATH})",
-        _read_llm_section(USER_CONFIG_PATH, "preference"),
+        f"User preference ({user_cfg})",
+        _read_llm_section(user_cfg, "preference"),
     ))
     if dir_path:
         layers.append((
-            f"Directory preference ({DIR_CONFIG_NAME})",
+            f"Directory preference ({dir_cfg_name})",
             _read_llm_section(dir_path, "preference"),
         ))


@@ -1,169 +1,9 @@
 """
-LLM adapter interface for pluggable model providers.
+Re-exports from markitect.llm.adapter for backward compatibility.
-Implements abstraction layer for LLM integration, supporting
-multiple providers (OpenAI, Anthropic, local models, etc.).
+The LLM adapter interface was moved to markitect.llm.adapter in v1.1.
 """
-from abc import ABC, abstractmethod
-from typing import Dict, Any
+from markitect.llm.adapter import LLMAdapter, MockLLMAdapter, ErrorLLMAdapter
-from markitect.prompts.execution.models import RunConfig, LLMResponse
class LLMAdapter(ABC):
"""
Abstract base class for LLM providers.
Enables pluggable LLM backends without prescribing implementation.
Implementations can wrap OpenAI, Anthropic, or other APIs.
"""
@abstractmethod
def execute_prompt(
self,
prompt: str,
config: RunConfig,
) -> LLMResponse:
"""
Execute a prompt with the LLM.
Args:
prompt: Compiled prompt text
config: Execution configuration
Returns:
LLMResponse with generated content
Raises:
Exception: On LLM API errors
"""
pass
@abstractmethod
def validate_config(self, config: RunConfig) -> bool:
"""
Validate that configuration is supported.
Args:
config: Configuration to validate
Returns:
True if valid, False otherwise
"""
pass
class MockLLMAdapter(LLMAdapter):
"""
Mock LLM adapter for testing.
Returns deterministic responses without calling external APIs.
"""
def __init__(self, mock_response: str = "Mock LLM response"):
"""
Initialize mock adapter.
Args:
mock_response: Response to return
"""
self.mock_response = mock_response
self.call_count = 0
self.last_prompt = None
self.last_config = None
def execute_prompt(
self,
prompt: str,
config: RunConfig,
) -> LLMResponse:
"""
Return mock response.
Args:
prompt: Prompt (stored for inspection)
config: Config (stored for inspection)
Returns:
Mock LLMResponse
"""
self.call_count += 1
self.last_prompt = prompt
self.last_config = config
return LLMResponse(
content=self.mock_response,
model=config.model_name,
usage={
"prompt_tokens": len(prompt.split()),
"completion_tokens": len(self.mock_response.split()),
"total_tokens": len(prompt.split()) + len(self.mock_response.split()),
},
finish_reason="stop",
metadata={"mock": True},
)
def validate_config(self, config: RunConfig) -> bool:
"""
Mock validation always succeeds.
Args:
config: Configuration
Returns:
Always True
"""
return True
def reset(self) -> None:
"""Reset mock state."""
self.call_count = 0
self.last_prompt = None
self.last_config = None
class ErrorLLMAdapter(LLMAdapter):
"""
Mock adapter that always raises an error.
Useful for testing error handling.
"""
def __init__(self, error_message: str = "Mock LLM error"):
"""
Initialize error adapter.
Args:
error_message: Error message to raise
"""
self.error_message = error_message
def execute_prompt(
self,
prompt: str,
config: RunConfig,
) -> LLMResponse:
"""
Raise error.
Args:
prompt: Prompt
config: Config
Raises:
RuntimeError: Always
"""
raise RuntimeError(self.error_message)
def validate_config(self, config: RunConfig) -> bool:
"""
Validation succeeds.
Args:
config: Configuration
Returns:
True
"""
return True
__all__ = ["LLMAdapter", "MockLLMAdapter", "ErrorLLMAdapter"]


@@ -12,6 +12,7 @@ from typing import Dict, Any, List, Optional
from enum import Enum
from markitect.prompts.models import calculate_bundle_digest
from markitect.llm.models import RunConfig, LLMResponse # canonical; re-exported here
class ExecutionStage(Enum):
@@ -37,54 +38,6 @@ class RunStatus(Enum):
SKIPPED = "skipped" # Skipped due to identical InputBundleHash
@dataclass
class RunConfig:
"""
Configuration for prompt execution.
Attributes:
model_name: LLM model to use
temperature: Model temperature (0.0-1.0)
max_tokens: Maximum tokens to generate
model_params: Additional model parameters
max_depth: Maximum generation depth for nested runs
skip_if_exists: Skip if identical InputBundleHash exists (FR-4.4)
timeout_seconds: Execution timeout
"""
model_name: str = "gpt-4"
temperature: float = 0.7
max_tokens: int = 2000
model_params: Dict[str, Any] = field(default_factory=dict)
max_depth: int = 3
skip_if_exists: bool = True
timeout_seconds: int = 300
def to_dict(self) -> Dict[str, Any]:
"""Convert to dictionary."""
return {
"model_name": self.model_name,
"temperature": self.temperature,
"max_tokens": self.max_tokens,
"model_params": self.model_params,
"max_depth": self.max_depth,
"skip_if_exists": self.skip_if_exists,
"timeout_seconds": self.timeout_seconds,
}
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "RunConfig":
"""Create from dictionary."""
return cls(
model_name=data.get("model_name", "gpt-4"),
temperature=data.get("temperature", 0.7),
max_tokens=data.get("max_tokens", 2000),
model_params=data.get("model_params", {}),
max_depth=data.get("max_depth", 3),
skip_if_exists=data.get("skip_if_exists", True),
timeout_seconds=data.get("timeout_seconds", 300),
)
@dataclass
class InputBundle:
"""
@@ -151,35 +104,6 @@ class InputBundle:
}
@dataclass
class LLMResponse:
"""
Response from LLM execution.
Attributes:
content: Generated content
model: Model used
usage: Token usage statistics
finish_reason: Why generation stopped
metadata: Additional response metadata
"""
content: str
model: str
usage: Dict[str, int] = field(default_factory=dict)
finish_reason: str = "stop"
metadata: Dict[str, Any] = field(default_factory=dict)
def to_dict(self) -> Dict[str, Any]:
"""Convert to dictionary."""
return {
"content": self.content,
"model": self.model,
"usage": self.usage,
"finish_reason": self.finish_reason,
"metadata": self.metadata,
}
@dataclass
class PromptRun:
"""

package-lock.json (generated file, 4 lines changed)

@@ -1,11 +1,11 @@
 {
-  "name": "markitect_project",
+  "name": "markitect-main",
   "version": "1.0.0",
   "lockfileVersion": 3,
   "requires": true,
   "packages": {
     "": {
-      "name": "markitect_project",
+      "name": "markitect-main",
       "version": "1.0.0",
       "license": "ISC",
       "dependencies": {


@@ -1,5 +1,5 @@
 {
-  "name": "markitect_project",
+  "name": "markitect-main",
"version": "1.0.0",
"description": "",
"main": "index.js",
@@ -14,7 +14,7 @@
   },
   "repository": {
     "type": "git",
-    "url": "http://92.205.130.254:32166/coulomb/markitect_project"
+    "url": "http://92.205.130.254:32166/coulomb/markitect-main"
   },
"keywords": [],
"author": "",


@@ -18,6 +18,9 @@ dependencies = [
"aiohttp>=3.8.0",
"toml",
+    # Extracted LLM adapter library (standalone repo)
+    "llm-connect @ file:///home/worsch/llm-connect",
# Core capabilities (required for basic functionality)
"release-management @ file:./capabilities/release-management",
"testdrive-jsui @ file:./capabilities/testdrive-jsui",

registry/README.md (new file, 12 lines)

@@ -0,0 +1,12 @@
# Capability Registry
Markdown-first capability index for federation and reuse planning.
## Authoring
1. Copy a capability entry template (see reuse-surface `templates/capability-entry.template.md`).
2. Add the row to `indexes/capabilities.yaml`.
3. Run `reuse-surface validate` from a checkout with the CLI installed.
4. Merge to `main` and verify publish with `reuse-surface establish --publish-check`.
Federation contract: reuse-surface `docs/RegistryFederation.md`.


@@ -0,0 +1,4 @@
version: 1
updated: '2026-06-16'
domain: helix_forge
capabilities: []


@@ -0,0 +1,202 @@
# Infospace Tooling — Stage 3 Close-out
## Context
Stages 1 and 2 of the infospace tooling roadmap are complete. Stage 3 used the
Wealth of Nations / VSM example to validate the tooling end-to-end. Most of S3
is done; this workstream finishes the remaining tasks, addresses deferred cleanup,
and formally closes the roadmap.
**Parent roadmap:** `roadmap/infospace-tooling/PLAN.md`
**Example location:** `examples/infospace-with-history/`
**Status: CLOSED (2026-04-22).** All acceptance criteria except the cosmetic
per-chapter history (C.7) are met. Final metrics: 988 entities, 988 evaluations,
6/6 viability thresholds PASS (`per_entity_mean = 3.957`). Tooling work that
came out of this close-out landed as commits `c0615c2d` (gemini retry,
unified skip-existing, non-destructive metrics I/O) and `d44a4cd3`
(`infospace entity` lookup, `evaluate --model-fallback`, `llm-check`
stale-key advisory, `build_state` type guard).
### State at workstream open (2026-02-26)
| Item | Status |
|------|--------|
| S3.1 Migrate example to infospace config | ✅ Done |
| S3.3 Per-entity eval batch | ✅ 985/988 complete; metrics.yaml updated |
| S3.4 Tutorial rewrite | ✅ Done |
| S3.5 Supply-chain-vsm composition demo | ✅ Done |
| S3.2 Clean per-chapter git history | ⏳ Deferred — included here |
| 3 missing evaluations | ⏳ Outstanding |
| 4 follow-up items (commit b055c8d7) | ⏳ Outstanding |
### State at workstream close (2026-04-22)
| Task | Status |
|------|--------|
| C.1 Complete 3 missing entity evaluations | ✅ Done (commit f325f89d) |
| C.2 Run eval-summary and verify viability | ✅ Done — 6/6 PASS |
| C.3 Refresh metrics report (988 entities) | ✅ Done — snapshot `090bb961` |
| C.4 Document advanced usage patterns | ✅ Done — `examples/infospace-with-history/docs/advanced-usage.md` |
| C.5 Composition-examples documentation | ✅ Done — `docs/composition-guide.md` |
| C.6 Performance benchmarking note | ✅ Done — `examples/infospace-with-history/docs/performance-notes.md` |
| C.7 Clean per-chapter git history | ⏭️ Deferred indefinitely — see note below |
| C.8 Formally close S3 roadmap | ✅ This commit |
**C.7 disposition.** The task assumed a pre-existing `clean-example-history`
branch with chapters 1–8 already committed; that branch no longer exists in
the repo. The task is explicitly cosmetic ("does not change output files"),
and the output files themselves are canonical. Reconstructing a 35-commit
per-chapter history from scratch would be archaeological rather than useful.
Closing as "won't do" unless a specific archival need surfaces. If revisited,
entities can be grouped by their `## Source Chapter` markdown section to
reconstruct chapter membership.
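
If the history is ever reconstructed, the grouping step could look roughly like the sketch below. It assumes, without confirmation from the example files, that each entity file contains a `## Source Chapter` heading whose first non-empty line names the chapter; the function names are hypothetical.

```python
import re
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Optional

# Assumed section layout: a "## Source Chapter" heading followed by the chapter name.
SECTION_RE = re.compile(r"^## Source Chapter[ \t]*$", re.MULTILINE)


def source_chapter(markdown: str) -> Optional[str]:
    """Return the first non-empty line after '## Source Chapter', if any."""
    match = SECTION_RE.search(markdown)
    if match is None:
        return None
    for line in markdown[match.end():].splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            return None  # next heading reached; the section was empty
        if stripped:
            return stripped
    return None


def group_by_chapter(entity_dir: Path) -> Dict[str, List[str]]:
    """Map chapter name -> sorted entity slugs (file stems)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in sorted(entity_dir.glob("*.md")):
        chapter = source_chapter(path.read_text(encoding="utf-8"))
        groups[chapter or "(unknown)"].append(path.stem)
    return dict(groups)
```

Calling `group_by_chapter(Path("output/entities"))` from the example directory would give one slug list per chapter, which could then drive one commit per chapter.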
---
## Tasks
### C.1 — Complete the 3 missing entity evaluations
985 of 988 entities have evaluation files. Identify and evaluate the remaining 3.
```bash
cd examples/infospace-with-history
# Identify missing slugs
comm -23 \
<(ls output/entities/*.md | xargs -I{} basename {} .md | sort) \
<(ls output/evaluations/*.md | xargs -I{} basename {} .md | sort)
# Evaluate each missing entity individually
markitect infospace evaluate --entity <slug> --provider openrouter
```
**Acceptance:** `ls output/evaluations/*.md | wc -l` returns 988.
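
For environments without `comm`, the same set difference can be sketched in Python. This assumes, as the shell snippet above does, that both directories hold one `<slug>.md` file per entity; `missing_evaluations` is a hypothetical helper, not part of the markitect CLI.

```python
from pathlib import Path
from typing import List


def missing_evaluations(entities_dir: Path, evaluations_dir: Path) -> List[str]:
    """Return slugs that have an entity file but no evaluation file."""
    entity_slugs = {p.stem for p in entities_dir.glob("*.md")}
    evaluated_slugs = {p.stem for p in evaluations_dir.glob("*.md")}
    return sorted(entity_slugs - evaluated_slugs)
```

Run from `examples/infospace-with-history`, `missing_evaluations(Path("output/entities"), Path("output/evaluations"))` should list the slugs still to be evaluated.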
---
### C.2 — Run eval-summary and verify viability
Run the aggregation command to update per_entity_mean from all 988 evaluations,
then check all 6 viability gates pass.
```bash
cd examples/infospace-with-history
unset OPENROUTER_API_KEY # stale env var guard
markitect infospace eval-summary --update-metrics
markitect infospace viability
```
Current sample reading (985 entities): `per_entity_mean = 3.956` against threshold 3.5.
Expected: all 6 metrics pass.
**Acceptance:** `markitect infospace viability` exits 0 and shows 6/6 PASS.
---
### C.3 — Refresh the metrics report
The metrics report was generated from chapters 1–4 only. Regenerate it from
the full 988-entity set.
```bash
cd examples/infospace-with-history
markitect infospace check --provider openrouter # or reuse existing check outputs
markitect infospace history # confirm snapshot recorded
```
**Acceptance:** `output/metrics/metrics.yaml` reflects all 988 entities; a dated
snapshot exists in the metrics history.
---
### C.4 — Document advanced usage patterns
Write `examples/infospace-with-history/docs/advanced-usage.md` covering:
- Incremental evaluation (adding entities after initial run, skip-if-exists behaviour)
- Re-evaluating after guideline changes (`--force` flag)
- Interpreting per-entity score distributions and identifying outliers
- Using `markitect infospace entities --sort-by score` to triage low scorers
- Reading and acting on collection check outputs (redundancy pairs, coverage gaps)
**Acceptance:** File exists with ≥ 4 documented patterns, each with a worked command example.
---
### C.5 — Add composition examples to documentation
Document how the supply-chain-vsm example (`examples/supply-chain-vsm/`) demonstrates
composition. Add a `docs/composition-guide.md` covering:
- What composition means (discipline binding)
- How supply-chain-vsm binds WoN as a discipline
- How to create a new infospace that uses an existing one as a discipline
- Viability requirement: the discipline must pass its own thresholds before binding
Reference `examples/supply-chain-vsm/` throughout.
**Acceptance:** `docs/composition-guide.md` exists and links to supply-chain-vsm.
---
### C.6 — Performance benchmarking note
Rather than a full benchmarking guide (out of scope for a 988-entity example),
record observed timings in a `docs/performance-notes.md`:
- Eval batch duration (~4 hrs for 988 entities via OpenRouter)
- Tokens per entity (rough estimate from usage logs)
- Embedding cache hit rate after first run
- Recommendation: provider choice (OpenRouter vs Gemini) for different dataset sizes
**Acceptance:** File exists with at least 4 concrete measurements or estimates.
---
### C.7 — S3.2: Clean per-chapter git history (deferred cleanup)
Create a clean branch where each of the 35 processed chapters has its own commit.
Chapters 1–8 are already done on branch `clean-example-history`; 27 remain.
This is a cosmetic/archival task — it does not change output files.
```bash
git checkout clean-example-history
# For each remaining chapter (9–35):
# cherry-pick or re-commit the chapter output files with a per-chapter message
git log --oneline clean-example-history # verify 35 chapter commits
```
**Acceptance:** Branch `clean-example-history` has exactly 35 chapter commits
(one per chapter), rebased onto current main.
**Note:** This task can be done independently of C.1–C.6. Low urgency — do last.
---
### C.8 — Formally close the S3 roadmap
Update `roadmap/infospace-tooling/PLAN.md` to mark all S3 tasks as complete.
Add a close-out summary at the top of the file with final metrics and date.
Commit with a `docs(roadmap)` message.
**Acceptance:** PLAN.md header shows all stages complete; committed to main.
---
## Task order
```
C.1 → C.2 → C.3
C.4, C.5, C.6 (parallel)
C.8
C.7 (independent, do last)
```
## Out of scope
- Adding new entities or chapters (the WoN example is complete at 988 entities)
- Re-running collection checks from scratch (existing results are valid)
- Publishing the example as a standalone dataset


@@ -1,5 +1,31 @@
# Viable Infospace Tooling — Roadmap
## Status: CLOSED (2026-04-22)
All three stages complete.
| Stage | Status | Notes |
|-------|--------|-------|
| Stage 1 — Platform additions (S1.1–S1.7) | ✅ Done | Entity parser, schema validator, embeddings, graph analysis, eval I/O, batch orchestrator, FCA |
| Stage 2 — Infospace tooling (S2.1–S2.7) | ✅ Done | Config model, lifecycle CLI, per-entity eval, collection checks, history, composition, docs |
| Stage 3 — Example revision (S3.1–S3.5) | ✅ Done (except cosmetic S3.2) | See `roadmap/infospace-s3-closeout/PLAN.md` |
**Final validation (Wealth of Nations / VSM example, 988 entities):**
- 988 per-entity evaluations landed
- Collection checks pass 6/6 viability thresholds (`per_entity_mean = 3.957`
against threshold 3.5; `redundancy_ratio = 0.006`; `coverage_ratio = 0.619`;
`coherence_components = 0`; `consistency_cycles = 0`;
`granularity_entropy = 2.675`)
- Composition demonstrated via `examples/supply-chain-vsm/`
- S3.2 (clean per-chapter git history) deferred as cosmetic-only; rationale
in the close-out plan
See `roadmap/infospace-s3-closeout/PLAN.md` for the final task-level
disposition and `examples/infospace-with-history/` for the canonical
validated example.
---
## Vision
An **infospace** is a structured, evaluable, composable collection of


@@ -0,0 +1,214 @@
# LLM Adapter Layer — Extract as Shared Library
## Vision
The `markitect.llm` module is a clean, stdlib-only adapter layer for calling
LLMs via OpenRouter, Gemini, OpenAI, and the Claude Code CLI. It implements a
uniform interface, a 7-layer TOML config chain, embedding support with caching,
and typed exceptions. It should be usable by all projects in the Bernd Worsch
ecosystem without pulling in all of markitect.
This roadmap tracks extracting it into a standalone installable library.
---
## Current State
The module lives at `markitect/llm/` (~16 files, ~1500 LOC, stdlib-only) and
provides:
- **4 text adapters**: OpenRouter, Gemini, OpenAI, Claude Code CLI
- **2 embedding adapters**: OpenAI-compatible (OpenAI + OpenRouter)
- **Embedding cache**: JSON-backed, content-digest validated
- **Similarity utilities**: pure-Python cosine similarity, matrix, pair-finding
- **7-layer TOML config chain**: CLI > env > user/dir preference/default > hardcoded
- **Typed exceptions**: LLMError hierarchy
- **HTTP wrapper**: urllib-only, typed exception translation
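
To illustrate what "pure-Python cosine similarity" and "pair-finding" can mean in a stdlib-only module, here is a minimal sketch. It is not the code in `markitect/llm/similarity.py`; the function names, the zero-vector convention, and the threshold parameter are assumptions made for the example.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


def similar_pairs(
    vectors: Dict[str, Sequence[float]], threshold: float
) -> List[Tuple[str, str, float]]:
    """Return (name_a, name_b, score) for all pairs at or above the threshold."""
    pairs = []
    for (name_a, vec_a), (name_b, vec_b) in combinations(sorted(vectors.items()), 2):
        score = cosine_similarity(vec_a, vec_b)
        if score >= threshold:
            pairs.append((name_a, name_b, score))
    # Highest-scoring pairs first.
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)
```

The pair search is quadratic in the number of vectors, which is usually acceptable for collections in the low thousands.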
### Two Coupling Issues Blocking Clean Extraction
| Issue | Location | Severity |
|-------|----------|----------|
| `RunConfig` and `LLMResponse` are defined in `markitect.prompts.execution.models`, not in `markitect.llm` | `markitect/prompts/execution/models.py` | High — creates cross-module import for all consumers |
| TOML config chain hardcodes `"markitect"` as app name (paths: `~/.config/markitect/`, env prefix `MARKITECT_`, files: `.markitect.toml`) | `markitect/llm/toml_config.py` | Medium — consumers either accept markitect config or can't use the chain |
---
## Terminology
- **adapter**: concrete implementation of `LLMAdapter` for a single provider
- **factory**: `create_adapter()` / `create_embedding_adapter()` — provider-agnostic entry points
- **config chain**: 7-layer resolution of provider + model (CLI → env → TOML → hardcoded)
- **standalone library**: a Python package installable with `pip install` from a git URL or local path, without PyPI
- **consumer**: any project that imports and uses the library (markitect itself, custodian, railiance, etc.)
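
The config chain resolves provider and model independently, each from its highest-priority layer that sets it. A minimal sketch of that idea follows; `Layer` and `resolve` are hypothetical stand-ins rather than the real `LLMLayer` and `resolve_llm`, the layer contents are invented for the demo, and the fifth layer name is inferred from the default-layer listing in `toml_config.py`.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Layer:
    provider: Optional[str] = None
    model: Optional[str] = None


def resolve(layers: List[Tuple[str, Layer]]) -> Tuple[Tuple[str, str], Tuple[str, str]]:
    """Return ((provider, source), (model, source)), scanning highest priority first."""
    provider = next(((layer.provider, name) for name, layer in layers if layer.provider), None)
    model = next(((layer.model, name) for name, layer in layers if layer.model), None)
    if provider is None or model is None:
        raise ValueError("the lowest layer must supply both provider and model")
    return provider, model


# Highest priority first; values other than the hardcoded layer are made up.
layers = [
    ("CLI flag", Layer()),
    ("env var", Layer(model="some-env-model")),
    ("user preference", Layer(provider="openrouter")),
    ("directory preference", Layer()),
    ("directory default", Layer()),
    ("user default", Layer(provider="openai", model="gpt-4")),
    ("hardcoded", Layer(provider="gemini", model="gemini-2.5-flash")),
]

provider, model = resolve(layers)
```

Here the provider comes from the user preference while the model comes from the environment variable, which is the kind of mixed result independent resolution allows.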
---
## Packaging Decision (Pending)
Before Phase 2 starts, one architectural decision must be resolved:
> **D1: Where does the extracted library live?**
>
> **Option A — Standalone repo** (`~/bw-llm` or similar):
> - Clean separation, versioned independently, installable via `pip install git+file:///...` or git URL
> - Adds a repo to maintain; changes require bumping version in dependents
>
> **Option B — Subfolder of markitect with own `pyproject.toml`** (monorepo-lite):
> - Stays co-located with the main codebase that will use it most
> - Less friction for iteration; single git history
> - Slightly unorthodox but valid for personal infrastructure
>
> **Option C — Just `pip install markitect` in other projects**:
> - Zero extraction work; reuse today
> - Pulls all of markitect (prompts, infospace, CLI, etc.) as transitive deps
> - Acceptable short-term if other projects are small
---
## Stages
### Stage 1 — Decouple (within markitect)
Prepare the module for extraction without changing its public API.
#### S1.1 — Move RunConfig + LLMResponse into markitect.llm
`RunConfig` and `LLMResponse` are currently in `markitect.prompts.execution.models`.
The LLM adapters import from there, creating a hard dependency on the prompt system.
**Work:**
- Move both dataclasses to `markitect/llm/models.py`
- Update all imports in `markitect.llm` and `markitect.prompts`
- Keep a re-export shim in `markitect.prompts.execution.models` for backwards compat
**Acceptance:** `markitect/llm/` has zero imports from `markitect.prompts.*`
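
A re-export shim works because the legacy module binds the very same class object, so `isinstance` checks and pickled references keep working for old import paths. The sketch below demonstrates this with two throwaway in-memory modules; `demo_llm_models` and `demo_legacy_models` are hypothetical names standing in for `markitect.llm.models` and `markitect.prompts.execution.models`.

```python
import sys
import types
from dataclasses import dataclass


@dataclass
class RunConfig:
    """Trimmed-down stand-in for the real dataclass."""
    model_name: str = "gpt-4"
    temperature: float = 0.7


# Stand-in for the canonical module that now owns the class.
canonical = types.ModuleType("demo_llm_models")
canonical.RunConfig = RunConfig
sys.modules["demo_llm_models"] = canonical

# Stand-in for the legacy module: it defines nothing and only re-exports.
# In a real file this would be `from demo_llm_models import RunConfig`.
legacy = types.ModuleType("demo_legacy_models")
legacy.RunConfig = canonical.RunConfig
legacy.__all__ = ["RunConfig"]
sys.modules["demo_legacy_models"] = legacy

from demo_legacy_models import RunConfig as LegacyRunConfig
from demo_llm_models import RunConfig as CanonicalRunConfig

same_object = LegacyRunConfig is CanonicalRunConfig
legacy_instance_ok = isinstance(LegacyRunConfig(), CanonicalRunConfig)
```

Had the legacy module kept its own copy of the class instead, `same_object` would be false and mixed-path `isinstance` checks would fail.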
#### S1.2 — Parameterize the TOML config chain
Replace the hardcoded `"markitect"` app name with a configurable `app_name` parameter.
**Work:**
- Add `app_name: str = "markitect"` parameter to `resolve_llm()` and the config
path helpers in `toml_config.py`
- Derive config file path (`~/.config/{app_name}/config.toml`), env prefix
(`{APP_NAME}_HELPER_MODEL`), and local config file (`.{app_name}.toml`) from it
- All existing behaviour is preserved when `app_name="markitect"` (default)
**Acceptance:** A consumer can call `resolve_llm(app_name="railiance")` and get
config from `~/.config/railiance/config.toml` and `RAILIANCE_HELPER_MODEL`.
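
The three derivations can be sketched as below. The first three functions mirror the helpers added to `toml_config.py` in this change (without the leading underscore); `safe_env_prefix` is a hypothetical extra, not part of the diff, shown because an app name containing a hyphen would otherwise yield an environment variable name that most shells cannot export.

```python
from pathlib import Path


def model_env_var(app_name: str) -> str:
    """Environment variable that overrides the model, e.g. RAILIANCE_HELPER_MODEL."""
    return f"{app_name.upper()}_HELPER_MODEL"


def user_config_path(app_name: str) -> Path:
    """Per-user config file, e.g. ~/.config/railiance/config.toml."""
    return Path.home() / ".config" / app_name / "config.toml"


def dir_config_name(app_name: str) -> str:
    """Per-directory config file name, e.g. .railiance.toml."""
    return f".{app_name}.toml"


def safe_env_prefix(app_name: str) -> str:
    """Hypothetical: map non-alphanumeric characters to '_' before upper-casing."""
    return "".join(ch if ch.isalnum() else "_" for ch in app_name).upper()
```

With the default `app_name="markitect"` these produce `MARKITECT_HELPER_MODEL`, `~/.config/markitect/config.toml`, and `.markitect.toml`, matching the previously hardcoded values.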
#### S1.3 — Isolation tests
Write a test file that imports only from `markitect.llm.*` and verifies no
accidental coupling remains.
**Acceptance:** `pytest tests/test_llm_isolation.py` passes; no import of
`markitect.prompts` or `markitect.infospace` in the LLM module tree.
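
One way to check for accidental coupling is to scan the source tree statically instead of importing it. The sketch below is an assumption about how such a test could be written, not the contents of `tests/test_llm_isolation.py`; it parses each file with `ast` and reports absolute imports of the forbidden packages.

```python
import ast
from pathlib import Path
from typing import List, Tuple

FORBIDDEN_PREFIXES = ("markitect.prompts", "markitect.infospace")


def imported_modules(source: str) -> List[str]:
    """Collect absolute module names imported anywhere in the source."""
    names: List[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names.append(node.module)
            # Also catch `from markitect import prompts`.
            names.extend(f"{node.module}.{alias.name}" for alias in node.names)
    return names


def is_forbidden(module: str) -> bool:
    return any(
        module == prefix or module.startswith(prefix + ".")
        for prefix in FORBIDDEN_PREFIXES
    )


def find_violations(package_dir: Path) -> List[Tuple[str, str]]:
    """Return (file, module) for every forbidden import under package_dir."""
    violations = []
    for path in sorted(package_dir.rglob("*.py")):
        for module in imported_modules(path.read_text(encoding="utf-8")):
            if is_forbidden(module):
                violations.append((str(path), module))
    return violations
```

A pytest function could then assert that `find_violations(Path("markitect/llm"))` returns an empty list, after first checking that the directory exists so a wrong path does not pass silently.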
---
### Stage 2 — Extract
#### S2.1 — Resolve D1: packaging location
Record the decision and create the package scaffold.
**Acceptance:** D1 resolved, `pyproject.toml` for the library exists at the
chosen location with name, version `0.1.0`, and declared dependencies.
#### S2.2 — Create standalone package
Move (or symlink) the llm module into the new package structure. Wire up
the `pyproject.toml` entry points. Verify `pip install -e <path>` works.
**Files to carry over:**
```
llm/
__init__.py # re-exports: create_adapter, create_embedding_adapter,
# LLMAdapter, EmbeddingAdapter, LLMConfig, exceptions
models.py # RunConfig, LLMResponse (moved from S1.1)
config.py # load_config, resolve_api_key
toml_config.py # resolve_llm (parameterized from S1.2)
factory.py # create_adapter
exceptions.py # LLM exception hierarchy
openrouter.py
claude_code.py
gemini.py
openai.py
embedding_adapter.py
embedding_openai.py
embedding_factory.py # create_embedding_adapter
embedding_cache.py
similarity.py
_http.py
_token_estimator.py
```
**Acceptance:** `python -c "from bw_llm import create_adapter; print('ok')"` works
in a fresh venv with only the new package installed.
#### S2.3 — Update markitect to depend on extracted package
Replace `markitect/llm/` with an import alias pointing to the new package, or
add the package as a path dependency in markitect's `pyproject.toml`.
**Acceptance:** All markitect tests pass; `markitect/llm/__init__.py` is either
removed or becomes a thin re-export of `bw_llm`.
#### S2.4 — Integration smoke test
Run the full markitect infospace pipeline (entity extraction + evaluation) end-to-end
against a small fixture to confirm nothing broke.
**Acceptance:** `markitect infospace evaluate --dry-run` succeeds on a 3-entity fixture.
---
### Stage 3 — Adopt in First Consumer
#### S3.1 — Integrate in one other project
Pick the first real consumer (likely the custodian state-hub, for LLM-assisted
state summaries or decision rationale generation) and wire up the library.
**Work:**
- Add `bw-llm` (or equivalent) as a dependency
- Write a small usage example (e.g., `llm_helper.py`)
- Confirm config chain works with the consumer's own app name
#### S3.2 — Usage guide
Write `README.md` for the library covering:
- Installation (local path / git URL)
- Supported providers and env vars
- TOML config file locations and format
- `create_adapter()` / `create_embedding_adapter()` quick-start
- Error handling
**Acceptance:** Another developer (or agent) can follow the README to use the library
in a new project without reading source code.
---
## Stage Summary
| Stage | Description | Key Deliverable | Blocks |
|-------|-------------|-----------------|--------|
| S1.1 | Move RunConfig/LLMResponse to llm | Zero cross-module deps | S2.2 |
| S1.2 | Parameterize app name | Configurable config chain | S2.2 |
| S1.3 | Isolation tests | Green test suite | S2.1 |
| S2.1 | Resolve packaging decision (D1) | pyproject.toml scaffold | S2.2 |
| S2.2 | Create standalone package | `pip install` works | S2.3 |
| S2.3 | Update markitect | markitect uses extracted lib | S2.4 |
| S2.4 | Integration smoke test | Full pipeline passes | S3.1 |
| S3.1 | First consumer integration | Library used in real project | S3.2 |
| S3.2 | Usage guide | README published | — |
---
## Out of Scope
- Publishing to PyPI (unnecessary for personal infrastructure; git/local installs suffice)
- Adding new LLM providers (separate concern)
- Porting the helper CLI to the library (the CLI is markitect-specific)
- Async adapters (current sync interface is sufficient; can be added later)


@@ -0,0 +1,176 @@
# TestDrive-JSUI — npm Publication
## Context
TestDrive-JSUI is a JavaScript-first markdown editor library living at
`capabilities/testdrive-jsui/`. Phases 1–6 (build system, bundling, testing,
migration) are complete. 84 tests pass (68 JS + 15 Python + 1 fixes).
Single source of truth: `capabilities/testdrive-jsui/js/`.
This workstream covers the remaining work to publish the library to npm and
close out the capability.
**Source:** `capabilities/testdrive-jsui/TODO.md` (Phases 7–9)
**Package name:** `testdrive-jsui` (to be confirmed in P.1)
**Current version:** 1.0.0
---
## Tasks
### P.1 — Pre-publication: decide repository structure
The library currently lives inside the markitect monorepo. Before publishing to
npm, decide whether it ships from here or from a dedicated repo.
**Options:**
- A: Publish directly from `capabilities/testdrive-jsui/` — simpler, no repo split
- B: Extract to a standalone `testdrive-jsui` repo — cleaner for npm consumers
Record the decision and proceed accordingly.
**Acceptance:** Decision recorded; if B, standalone repo created and code copied.
---
### P.2 — Pre-publication: verify Markitect integration
Confirm the main Markitect application still works correctly with the current
capability code before publishing.
```bash
cd /home/worsch/markitect-main
make testdrive-jsui-test-all # 84 tests must pass
# Manually verify view and edit modes in the running Markitect app
```
**Acceptance:** All 84 tests pass; view and edit modes confirmed working.
---
### P.3 — Pre-publication: decide STANDALONE_PLAN.md
`STANDALONE_PLAN.md` exists in the capability but its status is unclear. Either:
- Implement it (if it describes meaningful standalone work)
- Explicitly archive it with a note that the standalone use case is covered by the npm package
**Acceptance:** File updated with a clear status note; or deleted if obsolete.
---
### P.4 — Pre-publication: pack and dry-run
Run the full pre-publish checklist.
```bash
cd capabilities/testdrive-jsui
npm run lint # zero errors
npm test # all 84 tests pass
npm run build:prod # clean production build
npm pack # creates testdrive-jsui-1.0.0.tgz
npm install ./testdrive-jsui-1.0.0.tgz --dry-run # verify install
npm publish --dry-run # verify what will be published
```
Review `--dry-run` output: confirm only intended files are included (check
`.npmignore` or `files` field in `package.json`).
**Acceptance:** `npm publish --dry-run` succeeds with expected file list; no
test files, source maps, or internal docs included unintentionally.
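The file-list review could also be scripted against the tarball that `npm pack` produces. The following is a minimal sketch under stated assumptions: the `UNWANTED_PATTERNS` list and both helper names are hypothetical and not part of the repo. It relies only on the fact that npm tarballs place their members under a `package/` prefix.

```python
import fnmatch
import tarfile

# Hypothetical patterns for files that should not ship; adjust to the real layout.
UNWANTED_PATTERNS = [
    "*.map",
    "*/tests/*",
    "*/test/*",
    "*.test.js",
    "*/TODO.md",
    "*/STANDALONE_PLAN.md",
]


def unintended_files(names, patterns=UNWANTED_PATTERNS):
    """Return the entries in *names* that match any unwanted pattern."""
    return [n for n in names if any(fnmatch.fnmatch(n, p) for p in patterns)]


def check_tarball(path):
    """List unwanted members of an npm tarball (members live under 'package/')."""
    with tarfile.open(path, "r:gz") as tar:
        return unintended_files(tar.getnames())
```

Calling `check_tarball("testdrive-jsui-1.0.0.tgz")` after `npm pack` should return an empty list when the `files` field or `.npmignore` is set up as intended.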
---
### P.5 — Pre-publication: create release tag
```bash
git tag -a v1.0.0 -m "Release testdrive-jsui v1.0.0"
# (push tag to remote when ready)
```
**Acceptance:** Tag `v1.0.0` exists on main; CHANGELOG.md entry present for 1.0.0.
---
### P.6 — Publication: publish to npm
```bash
cd capabilities/testdrive-jsui
npm login # if not already logged in
npm publish
```
Then verify:
- Package visible at `https://www.npmjs.com/package/testdrive-jsui`
- Wait 5–10 minutes, then check CDN availability:
  - `https://cdn.jsdelivr.net/npm/testdrive-jsui@1.0.0/dist/testdrive-jsui.min.js`
  - `https://unpkg.com/testdrive-jsui@1.0.0/dist/testdrive-jsui.min.js`
**Acceptance:** Package installable via `npm install testdrive-jsui`.
---
### P.7 — Publication: fresh install test
In a clean temporary directory, install from npm and verify the library works
with a minimal HTML file.
```bash
mkdir /tmp/testdrive-test && cd /tmp/testdrive-test
npm install testdrive-jsui marked
# Open standalone.html equivalent, confirm editor initialises
```
**Acceptance:** `new TestDriveJSUI({...})` works in a fresh install with no
reference to the capability source directory.
---
### P.8 — Publication: GitHub release
Create a GitHub release from the v1.0.0 tag with:
- Release notes (summary from CHANGELOG.md 1.0.0 entry)
- Link to npm package
- Link to CDN URLs (jsdelivr, unpkg)
**Acceptance:** GitHub release published and visible.
---
### P.9 — Post-publication: README badges and monitoring
Add npm badges to `capabilities/testdrive-jsui/README.md`:
```markdown
[![npm version](https://badge.fury.io/js/testdrive-jsui.svg)](...)
[![npm downloads](https://img.shields.io/npm/dm/testdrive-jsui.svg)](...)
```
Set a reminder to check download stats after 1 week.
Demo page and GitHub Pages are optional — do only if there's a specific audience
to point at it.
**Acceptance:** README has version and download count badges; committed.
---
## Task order
```
P.1 (repo decision)
P.2 (Markitect integration check) ← can run in parallel with P.1
P.3 (STANDALONE_PLAN decision) ← can run in parallel
P.4 (pack + dry-run) ← needs P.1, P.2, P.3 all done
P.5 (release tag) ← can run with P.4
P.6 (publish)
P.7 (fresh install test)
P.8 (GitHub release)
P.9 (badges + monitoring)
```
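The ordering above can be expressed as a dependency graph and grouped into batches that may run in parallel. This is a sketch only: the prerequisites of P.4 come from the list above, while the edges for P.5 to P.9 are an assumed reading of the sequential order, and `parallel_batches` is a hypothetical helper.

```python
from graphlib import TopologicalSorter

# Task -> set of prerequisite tasks.
DEPENDS_ON = {
    "P.1": set(),
    "P.2": set(),
    "P.3": set(),
    "P.4": {"P.1", "P.2", "P.3"},
    "P.5": {"P.1", "P.2", "P.3"},  # assumed: "can run with P.4"
    "P.6": {"P.4", "P.5"},         # assumed: publish after dry-run and tag
    "P.7": {"P.6"},
    "P.8": {"P.7"},
    "P.9": {"P.8"},
}


def parallel_batches(depends_on):
    """Group tasks into batches; every task in a batch has all prerequisites done."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches
```

With these edges, the first batch is P.1 to P.3, the second is P.4 and P.5, and the remaining tasks run one at a time.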
## Out of scope
- Adding new features before publication (ship what's there)
- Ruby or Java adapters (optional integrations, not blocking publication)
- Paid npm features (keep on free tier)


@@ -30,7 +30,7 @@ class TestActualRoundtripBehavior:
         cmd = ["python", "-m", "markitect.cli"] + args
         result = subprocess.run(
             cmd,
-            cwd="/home/worsch/markitect_project",
+            cwd="/home/worsch/markitect-main",
             capture_output=True,
             text=True
         )


@@ -5,7 +5,7 @@ This test implements the requirements for initializing a SQLite database
 and storing markdown files with front matter parsing.
 Issue #1: Initialize Database and Store Example Markdown File
-https://gitea.coulomb.social/coulomb/markitect_project/issues/1
+https://gitea.coulomb.social/coulomb/markitect-main/issues/1
 """
 import pytest

tests/test_llm_isolation.py

@@ -0,0 +1,159 @@
"""
S1.3 — LLM isolation gate.

Confirms that markitect.llm.* has zero imports from markitect.prompts.*
or markitect.infospace.*, making the module safe to extract into a
standalone llm-connect library.

These tests must pass before extraction (S2).
"""
import importlib
import pkgutil
import sys
from pathlib import Path


def _collect_llm_modules() -> list[str]:
    """Return fully-qualified names of all modules under markitect.llm."""
    import markitect.llm as pkg
    pkg_path = Path(pkg.__file__).parent
    names = []
    for info in pkgutil.walk_packages([str(pkg_path)], prefix="markitect.llm."):
        names.append(info.name)
    # Include the package itself
    names.insert(0, "markitect.llm")
    return names


def _direct_imports(module_name: str) -> set[str]:
    """Return set of top-level module names imported by *module_name*."""
    mod = importlib.import_module(module_name)
    src_file = getattr(mod, "__file__", None)
    if not src_file or not src_file.endswith(".py"):
        return set()
    imports: set[str] = set()
    with open(src_file) as f:
        for line in f:
            stripped = line.strip()
            if stripped.startswith("from ") or stripped.startswith("import "):
                # Extract the root package of the imported name
                parts = stripped.split()
                if parts[0] == "from" and len(parts) >= 2:
                    imports.add(parts[1].split(".")[0] + "." + parts[1].split(".")[1]
                                if "." in parts[1] else parts[1])
                    # Also capture full dotted path for cross-module check
                    imports.add(parts[1])
    return imports


def _import_lines(src_file: str) -> list[str]:
    """Return only import-statement lines from a Python source file."""
    lines = []
    with open(src_file) as f:
        for line in f:
            stripped = line.strip()
            if stripped.startswith("from ") or stripped.startswith("import "):
                lines.append(stripped)
    return lines


def test_no_prompts_import_in_llm_tree():
    """markitect.llm must not import anything from markitect.prompts.*"""
    violations = []
    for mod_name in _collect_llm_modules():
        try:
            mod = importlib.import_module(mod_name)
        except ImportError:
            continue
        src_file = getattr(mod, "__file__", None)
        if not src_file or not src_file.endswith(".py"):
            continue
        for line in _import_lines(src_file):
            if "markitect.prompts" in line:
                violations.append(mod_name)
                break
    assert violations == [], (
        f"These llm modules still import from markitect.prompts: {violations}"
    )


def test_no_infospace_import_in_llm_tree():
    """markitect.llm must not import anything from markitect.infospace.*"""
    violations = []
    for mod_name in _collect_llm_modules():
        try:
            mod = importlib.import_module(mod_name)
        except ImportError:
            continue
        src_file = getattr(mod, "__file__", None)
        if not src_file or not src_file.endswith(".py"):
            continue
        for line in _import_lines(src_file):
            if "markitect.infospace" in line:
                violations.append(mod_name)
                break
    assert violations == [], (
        f"These llm modules still import from markitect.infospace: {violations}"
    )


def test_runconfig_and_llmresponse_canonical_in_llm():
    """RunConfig and LLMResponse must be defined in markitect.llm.models."""
    from markitect.llm.models import RunConfig, LLMResponse
    assert RunConfig.__module__ == "markitect.llm.models", (
        f"RunConfig.module = {RunConfig.__module__!r}, expected 'markitect.llm.models'"
    )
    assert LLMResponse.__module__ == "markitect.llm.models", (
        f"LLMResponse.module = {LLMResponse.__module__!r}, expected 'markitect.llm.models'"
    )


def test_llmadapter_canonical_in_llm():
    """LLMAdapter must be defined in markitect.llm.adapter."""
    from markitect.llm.adapter import LLMAdapter
    assert LLMAdapter.__module__ == "markitect.llm.adapter", (
        f"LLMAdapter.module = {LLMAdapter.__module__!r}, expected 'markitect.llm.adapter'"
    )


def test_backward_compat_prompts_reexport():
    """markitect.prompts.execution.models must still export RunConfig/LLMResponse."""
    from markitect.prompts.execution.models import RunConfig, LLMResponse
    from markitect.llm.models import RunConfig as RC, LLMResponse as LR
    assert RunConfig is RC, "prompts re-export RunConfig must be the same object as llm.models.RunConfig"
    assert LLMResponse is LR, "prompts re-export LLMResponse must be the same object as llm.models.LLMResponse"


def test_backward_compat_llmadapter_reexport():
    """markitect.prompts.execution.llm_adapter must still export LLMAdapter."""
    from markitect.prompts.execution.llm_adapter import LLMAdapter
    from markitect.llm.adapter import LLMAdapter as LA
    assert LLMAdapter is LA, "prompts re-export LLMAdapter must be the same object as llm.adapter.LLMAdapter"


def test_app_name_parameterization():
    """resolve_llm(app_name=X) uses ~/.config/X/config.toml and X_HELPER_MODEL."""
    from markitect.llm.toml_config import (
        _model_env_var,
        _user_config_path,
        _dir_config_name,
        resolve_llm,
    )
    assert _model_env_var("railiance") == "RAILIANCE_HELPER_MODEL"
    assert _model_env_var("markitect") == "MARKITECT_HELPER_MODEL"
    assert str(_user_config_path("railiance")).endswith(".config/railiance/config.toml")
    assert _dir_config_name("railiance") == ".railiance.toml"
    # Smoke: resolve falls back to hardcoded for unknown app
    r = resolve_llm(app_name="nonexistent_app_xyz")
    assert r.provider_source == "hardcoded"
    assert r.model_source == "hardcoded"
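
The prefix check in `_import_lines` treats any line that starts with `from ` or `import ` as an import, so it can also match text inside a docstring, and it would not see an import that follows a semicolon. An AST-based scan is one possible alternative. The sketch below is an assumption-laden illustration: `imported_modules` is a hypothetical helper and not part of this test file.

```python
import ast


def imported_modules(source: str) -> set[str]:
    """Return dotted names of absolute imports found anywhere in *source*."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # import a.b, c  ->  {"a.b", "c"}
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # from a.b import c  ->  {"a.b", "a.b.c"}; relative imports are skipped
            found.add(node.module)
            found.update(f"{node.module}.{alias.name}" for alias in node.names)
    return found
```

A violation check could then test whether any returned name equals `markitect.prompts` or starts with `markitect.prompts.`, which also catches `from markitect import prompts`.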


@@ -33,7 +33,7 @@ class TestRoundtripBase:
             cmd,
             capture_output=True,
             text=True,
-            cwd="/home/worsch/markitect_project"
+            cwd="/home/worsch/markitect-main"
         )
     def validate_basic_structure_preservation(self, original: str, reconstructed: str) -> Dict[str, Any]:


@@ -223,3 +223,129 @@ class TestViabilityCommand:
        )
        assert result.exit_code == 0
        assert "No viability thresholds" in result.output


# ── chapters (per-source triage view) ────────────────────────────────


class TestChaptersCommand:

    @pytest.fixture
    def chapters_dir(self, tmp_path):
        """Infospace with 2 source files and matching entities."""
        config_yaml = """\
topic:
  name: "WoN"
  domain: "Economics"
  sources: artifacts/sources
"""
        (tmp_path / "infospace.yaml").write_text(config_yaml)
        sources = tmp_path / "artifacts" / "sources"
        sources.mkdir(parents=True)
        (sources / "book-1-chapter-01.md").write_text("# Chapter 1\n\nText.\n")
        (sources / "book-1-chapter-02.md").write_text("# Chapter 2\n\nText.\n")
        entities = tmp_path / "output" / "entities"
        entities.mkdir(parents=True)
        (entities / "alpha.md").write_text(
            "# Alpha\n\n## Definition\n\nX.\n\n"
            "## Source Chapter\n\nBook I, Chapter 1\n"
        )
        (entities / "beta.md").write_text(
            "# Beta\n\n## Definition\n\nY.\n\n"
            "## Source Chapter\n\nBook I, Chapter 2\n"
        )
        (entities / "gamma.md").write_text(
            "# Gamma\n\n## Definition\n\nZ.\n\n"
            "## Source Chapter\n\nBook I, Chapter 2\n"
        )
        return tmp_path

    def test_lists_sources_with_counts(self, runner, chapters_dir):
        result = runner.invoke(
            infospace_commands,
            ["chapters", "--config", str(chapters_dir / "infospace.yaml")],
        )
        assert result.exit_code == 0
        assert "book-1-chapter-01" in result.output
        assert "book-1-chapter-02" in result.output
        # ch 1 -> 1 entity, ch 2 -> 2 entities
        assert "2 source file(s); 3 entities" in result.output

    def test_json_format(self, runner, chapters_dir):
        result = runner.invoke(
            infospace_commands,
            ["chapters", "--config", str(chapters_dir / "infospace.yaml"),
             "--format", "json"],
        )
        assert result.exit_code == 0
        import json
        rows = json.loads(result.output)
        by_id = {r["source_id"]: r for r in rows}
        assert by_id["book-1-chapter-01"]["entities"] == 1
        assert by_id["book-1-chapter-02"]["entities"] == 2

    def test_no_sources_dir(self, runner, tmp_path):
        (tmp_path / "infospace.yaml").write_text(
            "topic:\n name: X\n sources: missing\n"
        )
        result = runner.invoke(
            infospace_commands,
            ["chapters", "--config", str(tmp_path / "infospace.yaml")],
        )
        assert result.exit_code == 1


# ── process: eval-after-source / classify-after-source flags ─────────


class TestProcessAfterSourceFlags:

    def test_flags_registered_in_help(self, runner):
        result = runner.invoke(infospace_commands, ["process", "--help"])
        assert result.exit_code == 0
        assert "--eval-after-source" in result.output
        assert "--classify-after-source" in result.output

    def test_flags_require_provider(self, runner, tmp_path):
        (tmp_path / "infospace.yaml").write_text(
            "topic:\n name: X\n sources: sources\n"
            "pipeline:\n stages:\n - template: extract-entities\n"
        )
        sources = tmp_path / "sources"
        sources.mkdir()
        (sources / "s1.md").write_text("source")
        result = runner.invoke(
            infospace_commands,
            ["process", "--all",
             "--config", str(tmp_path / "infospace.yaml"),
             "--eval-after-source"],
        )
        assert result.exit_code == 1
        assert "require --provider" in result.output


# ── pipeline: commit body composition ────────────────────────────────


class TestCommitBodyComposition:

    def test_bucket_for(self, tmp_path):
        from markitect.infospace.config import InfospaceConfig, TopicConfig
        from markitect.infospace.pipeline import SourcePipeline
        cfg = InfospaceConfig(topic=TopicConfig(name="T", domain="D"))
        p = SourcePipeline(cfg, tmp_path)
        assert p._bucket_for("output/entities/x.md") == "entities"
        assert p._bucket_for("output/evaluations/x.md") == "evaluations"
        assert p._bucket_for("output/classifications/x.md") == "classifications"
        assert p._bucket_for("output/mappings/x.md") == "mappings"
        assert p._bucket_for("output/notes/x.md") == "other"
        assert p._bucket_for("README.md") is None  # not under output/

    def test_compose_body_uses_default_on_no_diff(self, tmp_path):
        """When git diff fails or returns empty, fall back to the default blurb."""
        from markitect.infospace.config import InfospaceConfig, TopicConfig
        from markitect.infospace.pipeline import SourcePipeline
        cfg = InfospaceConfig(topic=TopicConfig(name="T", domain="D"))
        # Not a git repo, so `git diff --cached` will raise CalledProcessError.
        p = SourcePipeline(cfg, tmp_path)
        body = p._compose_commit_body("some-source")
        assert "Extract entities" in body
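
The assertions in `test_bucket_for` pin down a path-to-bucket mapping. A standalone function that satisfies them might look like the sketch below. It is inferred from the test expectations only; `bucket_for` and `KNOWN_BUCKETS` are hypothetical names, and the real `SourcePipeline._bucket_for` may be implemented differently.

```python
from pathlib import PurePosixPath

# Bucket names taken from the test expectations; anything else under output/ is "other".
KNOWN_BUCKETS = {"entities", "evaluations", "classifications", "mappings"}


def bucket_for(path: str):
    """Map a repo-relative path to a commit-body bucket, or None outside output/."""
    parts = PurePosixPath(path).parts
    if len(parts) < 2 or parts[0] != "output":
        return None
    return parts[1] if parts[1] in KNOWN_BUCKETS else "other"
```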


@@ -124,6 +124,33 @@ class TestMetricsFileIO:
        path.write_text("just a string", encoding="utf-8")
        assert read_metrics_file(path) == {}

    def test_round_trip_preserves_structured_values(self, tmp_path):
        """Non-numeric values like type_distribution must survive a round-trip.

        Regression: eval-summary --update-metrics used to drop any key
        whose value wasn't a bare number, silently erasing type_distribution
        from the file on every run.
        """
        path = tmp_path / "metrics.yaml"
        metrics = {
            "per_entity_mean": 3.9567,
            "vsm_type_matrix_cells": 29,
            "type_distribution": {
                "Element": 315,
                "Institution": 122,
                "Principle": 102,
            },
        }
        write_metrics_file(metrics, path)
        loaded = read_metrics_file(path)
        assert loaded["type_distribution"] == {
            "Element": 315, "Institution": 122, "Principle": 102,
        }
        # And the int stayed an int on disk, not 29.0.
        raw = path.read_text(encoding="utf-8")
        assert "vsm_type_matrix_cells: 29\n" in raw
        assert "vsm_type_matrix_cells: 29.0" not in raw


# ── record_check_results ────────────────────────────────────────────


@@ -0,0 +1,82 @@
"""Tests for markitect.llm.gemini — retry behavior + happy path."""
from unittest import mock

import pytest

from markitect.llm.gemini import GeminiAdapter
from markitect.llm.exceptions import LLMAPIError, LLMRateLimitError
from markitect.prompts.execution.models import RunConfig, LLMResponse


def _api_response(text="hello", model="gemini-2.5-flash"):
    return {
        "candidates": [
            {
                "content": {"parts": [{"text": text}], "role": "model"},
                "finishReason": "STOP",
            }
        ],
        "modelVersion": model,
        "usageMetadata": {
            "promptTokenCount": 3,
            "candidatesTokenCount": 2,
            "totalTokenCount": 5,
        },
    }


class TestGeminiAdapter:

    def _adapter(self, **kwargs):
        defaults = {"api_key": "AIza-test"}
        defaults.update(kwargs)
        return GeminiAdapter(**defaults)

    @mock.patch("markitect.llm.gemini.post_json")
    def test_success(self, mock_post):
        mock_post.return_value = _api_response("generated")
        adapter = self._adapter()
        resp = adapter.execute_prompt("hi", RunConfig())
        assert isinstance(resp, LLMResponse)
        assert resp.content == "generated"
        assert resp.metadata["provider"] == "gemini"

    @mock.patch("markitect.llm.gemini.post_json")
    @mock.patch("markitect.llm.gemini.time.sleep")
    def test_retry_on_429(self, mock_sleep, mock_post):
        mock_post.side_effect = [
            LLMRateLimitError("rate limited", status_code=429),
            _api_response("recovered"),
        ]
        adapter = self._adapter(max_retries=2)
        resp = adapter.execute_prompt("hi", RunConfig())
        assert resp.content == "recovered"
        assert mock_sleep.call_count == 1

    @mock.patch("markitect.llm.gemini.post_json")
    @mock.patch("markitect.llm.gemini.time.sleep")
    def test_retry_on_503(self, mock_sleep, mock_post):
        mock_post.side_effect = [
            LLMAPIError("unavailable", status_code=503),
            _api_response("back"),
        ]
        adapter = self._adapter(max_retries=2)
        resp = adapter.execute_prompt("hi", RunConfig())
        assert resp.content == "back"

    @mock.patch("markitect.llm.gemini.post_json")
    def test_no_retry_on_400(self, mock_post):
        mock_post.side_effect = LLMAPIError("bad request", status_code=400)
        adapter = self._adapter(max_retries=2)
        with pytest.raises(LLMAPIError) as exc_info:
            adapter.execute_prompt("hi", RunConfig())
        assert exc_info.value.status_code == 400

    @mock.patch("markitect.llm.gemini.post_json")
    @mock.patch("markitect.llm.gemini.time.sleep")
    def test_exhausted_retries_raises(self, mock_sleep, mock_post):
        mock_post.side_effect = LLMRateLimitError("rate limited", status_code=429)
        adapter = self._adapter(max_retries=1)
        with pytest.raises(LLMRateLimitError):
            adapter.execute_prompt("hi", RunConfig())
        assert mock_sleep.call_count == 1  # 1 retry before giving up
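
Taken together, these tests describe a retry policy: 429 and 503 are retried with a sleep in between, 400 is raised immediately, and the last error is re-raised once `max_retries` is used up. A generic loop with that behavior might look like the sketch below. It is not the `GeminiAdapter` implementation: `ApiError`, `call_with_retries`, the `RETRYABLE` set beyond 429 and 503, and the exponential delay are all assumptions for illustration.

```python
import time


class ApiError(Exception):
    """Hypothetical stand-in for an API error that carries an HTTP status code."""

    def __init__(self, message, status_code=None):
        super().__init__(message)
        self.status_code = status_code


# Assumed set of retryable statuses; the tests only cover 429 and 503.
RETRYABLE = {429, 500, 502, 503, 504}


def call_with_retries(fn, max_retries=2, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on a retryable ApiError, sleep and retry up to max_retries times."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except ApiError as exc:
            # Give up on non-retryable errors or when the retry budget is spent.
            if exc.status_code not in RETRYABLE or attempt == max_retries:
                raise
            sleep(base_delay * 2 ** attempt)  # assumed exponential backoff
```

Passing `sleep` as a parameter keeps the loop testable without patching `time.sleep`, which is a design choice of this sketch rather than something the tests above require.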


@@ -0,0 +1,67 @@
---
id: MARKITECT-WP-0001
type: workplan
title: "Bootstrap State Hub integration"
domain: communication
repo: markitect-main
status: finished
owner: codex
topic_slug: communication
created: "2026-06-22"
updated: "2026-06-22"
state_hub_workstream_id: "dfc40b03-fe8e-49fe-b8d4-86eb1fe26b4a"
---
# Bootstrap State Hub integration
Knowledge artifact management and markdown engine platform.
## Review Generated Integration Files
```task
id: MARKITECT-WP-0001-T01
status: done
priority: high
state_hub_task_id: "7455a381-a93d-4220-8f80-3b6ccf953cff"
```
Result 2026-06-22: SCOPE.md and INTRODUCTION.md reviewed; AGENTS.md confirmed.
Review `INTENT.md`, `SCOPE.md`, `AGENTS.md`, and `.custodian-brief.md`.
Replace generated placeholders with repo-specific facts where needed.
## Verify Local Developer Workflow
```task
id: MARKITECT-WP-0001-T02
status: done
priority: high
state_hub_task_id: "7e34bdab-aa49-49ca-b28a-b254725dd8db"
```
Result 2026-06-22: Documented make-based Python/JS workflow.
Identify the repo's install, test, lint, build, and run commands. Add or refine
those commands in the agent instructions so future coding sessions can verify
changes confidently.
## Seed First Real Workplan
```task
id: MARKITECT-WP-0001-T03
status: done
priority: medium
state_hub_task_id: "35a64da7-dda9-4315-901d-88c6827432d9"
```
Result 2026-06-22: MARKITECT-WP-0002 already exists (TestDrive npm publication).
Create the first implementation workplan for the repository's most important
next change. After workplan file updates, run from `~/state-hub`:
```bash
make fix-consistency REPO=markitect-main
```


@@ -0,0 +1,28 @@
---
id: MARKITECT-WP-0002
type: workplan
title: "TestDrive-JSUI — npm Publication"
domain: communication
repo: markitect-main
status: backlog
owner: codex
topic_slug: communication
created: "2026-06-22"
updated: "2026-06-22"
state_hub_workstream_id: "e203d487-01f1-494a-b14d-a436241a4c01"
---
# TestDrive-JSUI — npm Publication
Backlog workstream for publishing the TestDrive JSUI package to npm.
## Publication Readiness
```task
id: MARKITECT-WP-0002-T01
status: todo
priority: medium
state_hub_task_id: "88b3c206-4d45-4bb3-bbb3-47443cdf2123"
```
Define package scope, versioning, and publication checklist for TestDrive-JSUI.