Compare commits
32 Commits
b055c8d7bb...main
| SHA1 |
|---|
| cd8339ecef |
| f8ab58edbe |
| 2b5e9743fe |
| 753c3d4fc6 |
| 94e84f0db9 |
| a765ccda21 |
| 4472fa6c7f |
| 526fa1e3bc |
| 86de18c247 |
| ca9d0d7030 |
| bc527ec09a |
| ce984482e2 |
| 9266f124e6 |
| 8740a66611 |
| b7e9edbb4b |
| 479fa95fdf |
| eb9b622499 |
| e3e5b8ecc1 |
| 9e8d73fa7d |
| d44a4cd3df |
| c0615c2d50 |
| 965508ec06 |
| f325f89dc9 |
| 36a5136bdf |
| b7e11461f4 |
| 3966814868 |
| f4610a46e3 |
| 0d95e6dbcf |
| 36c20f37d0 |
| 72b87fd82e |
| eaf4a955af |
| e9dc9a8517 |
20
.claude/rules/agents.md
Normal file
@@ -0,0 +1,20 @@
|
||||
## Kaizen Agents
|
||||
|
||||
Specialized agent personas available on demand via the state-hub MCP.
|
||||
|
||||
**Discover:** `list_kaizen_agents()` — returns all agents with name, description, category
|
||||
**Load:** `get_kaizen_agent("tdd-workflow")` — returns full instructions; read and follow them
|
||||
|
||||
Common agents:
|
||||
|
||||
| Agent | Category | When to use |
|
||||
|-------|----------|-------------|
|
||||
| `tdd-workflow` | testing | Step-by-step TDD8 workflow for any feature |
|
||||
| `code-refactoring` | quality | Code quality analysis and safe refactoring |
|
||||
| `test-maintenance` | testing | Diagnose and fix failing tests |
|
||||
| `requirements-engineering` | process | Prevent interface/mock mismatches upfront |
|
||||
| `keepaTodofile` | process | Maintain TODO.md during work |
|
||||
| `project-management` | process | Track status, determine next steps |
|
||||
| `datamodel-optimization` | quality | Optimize dataclasses and data structures |
|
||||
|
||||
All 17 agents: call `list_kaizen_agents()` for the full list.
|
||||
8
.claude/rules/architecture.md
Normal file
@@ -0,0 +1,8 @@
|
||||
## Architecture
|
||||
|
||||
<!-- TODO: Describe the key design decisions and component structure.
|
||||
Key modules, data flows, external integrations, state machines, etc. -->
|
||||
|
||||
## Quick Reference
|
||||
|
||||
`~/state-hub/mcp_server/TOOLS.md` — MCP tool reference
|
||||
50
.claude/rules/credential-routing.md
Normal file
@@ -0,0 +1,50 @@
|
||||
# Credential and access routing
|
||||
|
||||
**Audience:** Codex, Claude Code, Grok, and custodian agents that call **llm-connect**
|
||||
for inference. Run this check **before** requesting secrets, API keys, SSH access,
|
||||
login tokens, or database passwords — in any repo, not only `ops-warden`.
|
||||
|
||||
ops-warden **issues SSH certificates only** (`warden sign`, `cert_command`). Every
|
||||
other credential need belongs to another subsystem. **Do not** message
|
||||
`ops-warden` on State Hub expecting a secret value; the reply is a pointer, not a key.
|
||||
|
||||
### Lookup (do this first)
|
||||
|
||||
```bash
|
||||
warden route find "<describe your need>" --json
|
||||
warden route show <catalog-id> --json
|
||||
```
|
||||
|
||||
Requires the `warden` CLI from `~/ops-warden` (`uv tool install .` or `uv run warden`).
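For agents that script the lookup instead of typing it, a minimal Python sketch might look like the following. It only wraps the `warden route find ... --json` command shown above; the function names are hypothetical, and because the shape of the JSON output is not described here, the sketch returns the parsed value without interpreting it.

```python
import json
import subprocess


def route_find_argv(need: str) -> list[str]:
    """Build the argument vector for the documented lookup command."""
    return ["warden", "route", "find", need, "--json"]


def route_find(need: str):
    """Run the lookup and return whatever JSON the CLI prints.

    Assumes the `warden` CLI is installed and on PATH; the output
    structure is not specified in this document, so it is returned as-is.
    """
    done = subprocess.run(
        route_find_argv(need), capture_output=True, text=True, check=True
    )
    return json.loads(done.stdout)
```

Passing the need as a single list element avoids shell quoting problems when the description contains spaces or quotes.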
|
||||
|
||||
| Agent runtime | How to orient |
|
||||
| --- | --- |
|
||||
| **Codex / Grok** (shell, HTTP State Hub) | `warden route` commands above; inbox `to_agent=markitect-main` is for coordination, not secret vending |
|
||||
| **Claude Code** (MCP when available) | `get_domain_summary("custodian")` for workstreams; **still** use `warden route` for credential ownership |
|
||||
| **llm-connect** (inference service) | Never put secret retrieval in prompts; route custody to OpenBao/operator paths surfaced by `warden route` |
|
||||
|
||||
### Quick routing table
|
||||
|
||||
| I need… | Owner | ops-warden executes? |
|
||||
| --- | --- | --- |
|
||||
| SSH cert (`adm`/`agt`/`atm`) | ops-warden | **Yes** — `warden sign` |
|
||||
| API key, DB password, provider token | OpenBao (`railiance-platform`) | No — route only |
|
||||
| Login / OIDC / MFA | key-cape / Keycloak | No — route only |
|
||||
| Authorization decision | flex-auth | No — route only |
|
||||
| activity-core → issue-core emission | activity-core + issue-core | No — `warden route show activity-core-issue-sink` |
|
||||
| SSH tunnel | ops-bridge (+ `cert_command` from warden) | No — route only |
|
||||
|
||||
### Anti-patterns (do not do these)
|
||||
|
||||
- `POST /messages/` to `ops-warden` asking for `ISSUE_CORE_API_KEY`, `OPENROUTER_API_KEY`, etc.
|
||||
- Inventing `warden secret`, `warden login`, `warden bao`, `warden tunnel` — they do not exist
|
||||
- Pasting secrets into Git, State Hub, workplans, logs, or chat
|
||||
|
||||
### Other capabilities (reuse-surface)
|
||||
|
||||
Non-credential capabilities are usually discovered through **reuse-surface** federation
|
||||
(`reuse-surface` registry / `capability.*` indexes). Credential routing is inlined in
|
||||
every repo's agent instructions because it is high-frequency, high-risk, and easy to
|
||||
get wrong.
|
||||
|
||||
**Canon:** `~/ops-warden/wiki/CredentialRouting.md` · catalog `~/ops-warden/registry/routing/catalog.yaml`
|
||||
38
.claude/rules/first-session.md
Normal file
@@ -0,0 +1,38 @@
|
||||
## First Session Protocol
|
||||
|
||||
Triggered when `get_domain_summary("communication")` shows **no workstreams**.
|
||||
The project is registered but work has not yet been structured.
|
||||
|
||||
**Step 1 — Read, don't write**
|
||||
- `~/the-custodian/canon/projects/communication/project_charter_v0.1.md` — purpose, scope
|
||||
- `~/the-custodian/canon/projects/communication/roadmap_v0.1.md` — planned phases
|
||||
- Scan repo root: README, directory structure, existing code or docs
|
||||
|
||||
**Step 2 — Survey in-progress work**
|
||||
Look for TODOs, open branches, and half-finished files. Note what is done versus what was started but left incomplete.
|
||||
|
||||
**Step 3 — Propose workstreams to Bernd**
|
||||
Propose 1–3 workstreams — each a coherent strand, weeks to months, anchored to a
|
||||
roadmap phase. **Wait for approval before creating.**
|
||||
|
||||
**Step 4 — Create workplan file first, then DB record (ADR-001)**
|
||||
```
|
||||
workplans/MARKITECT-WP-NNNN-<slug>.md ← write this first
|
||||
```
|
||||
Then register in the hub:
|
||||
```
|
||||
create_workstream(topic_id="36c7421b-c537-4723-bf75-42a3ebc6a1dc", title="...", owner="...", description="...")
|
||||
create_task(workstream_id="<id>", title="...", priority="high|medium|low")
|
||||
```
|
||||
|
||||
**Step 5 — Record the setup**
|
||||
```
|
||||
add_progress_event(
|
||||
summary="First session: structured communication into N workstreams, M tasks",
|
||||
event_type="milestone",
|
||||
topic_id="36c7421b-c537-4723-bf75-42a3ebc6a1dc",
|
||||
detail={"workstreams": [...], "tasks_created": M}
|
||||
)
|
||||
```
|
||||
|
||||
<!-- Delete or archive this file once past first session -->
|
||||
8
.claude/rules/repo-boundary.md
Normal file
@@ -0,0 +1,8 @@
|
||||
## Repo boundary
|
||||
|
||||
This repo owns **Markitect Main** only. It does not own:
|
||||
|
||||
<!-- TODO: List what belongs in adjacent repos, e.g.:
|
||||
- SSH key management → railiance-infra/
|
||||
- State hub code → state-hub/
|
||||
-->
|
||||
5
.claude/rules/repo-identity.md
Normal file
@@ -0,0 +1,5 @@
|
||||
**Purpose:** Markitect Main - (fill in purpose)
|
||||
|
||||
**Domain:** communication
|
||||
**Repo slug:** markitect-main
|
||||
**Topic ID:** 36c7421b-c537-4723-bf75-42a3ebc6a1dc
|
||||
85
.claude/rules/session-protocol.md
Normal file
@@ -0,0 +1,85 @@
|
||||
## Session Protocol
|
||||
|
||||
Dev Hub (State Hub API): http://127.0.0.1:8000
|
||||
MCP server name in `~/.claude.json`: `dev-hub`
|
||||
|
||||
**Step 1 — Orient**
|
||||
|
||||
Read the offline-safe brief first — it works without a live hub connection:
|
||||
```bash
|
||||
cat .custodian-brief.md
|
||||
```
|
||||
Then call the MCP tool for richer cross-domain context when MCP tools are exposed:
|
||||
```
|
||||
get_domain_summary("communication")
|
||||
```
|
||||
If MCP tools are unavailable in the current agent session, use the REST API:
|
||||
```bash
|
||||
curl -s "http://127.0.0.1:8000/state/summary" | python3 -m json.tool
|
||||
```
|
||||
If the hub is offline: `cd ~/state-hub && make api`
|
||||
|
||||
**Step 2 — Check inbox**
|
||||
With MCP tools:
|
||||
```
|
||||
get_messages(to_agent="markitect-main", unread_only=True)
|
||||
```
|
||||
Mark read with `mark_message_read(message_id)`. Reply or act on coordination
|
||||
requests before proceeding.
|
||||
|
||||
Without MCP tools:
|
||||
```bash
|
||||
curl -s "http://127.0.0.1:8000/messages/?to_agent=markitect-main&unread_only=true" \
|
||||
| python3 -m json.tool
|
||||
curl -s -X PATCH "http://127.0.0.1:8000/messages/<id>/read" \
|
||||
-H "Content-Type: application/json" -d '{}'
|
||||
```
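The same two REST calls can be sketched in Python with only the standard library. This is an illustration under assumptions: the helper names are hypothetical, the base URL is the local one given in this protocol, and the requests are built but not sent, so nothing here depends on a running hub.

```python
import json
import urllib.parse
import urllib.request

HUB_URL = "http://127.0.0.1:8000"  # local hub address from this protocol


def inbox_url(agent: str, unread_only: bool = True, base: str = HUB_URL) -> str:
    """Build the GET /messages/ URL used in the curl example."""
    query = urllib.parse.urlencode(
        {"to_agent": agent, "unread_only": str(unread_only).lower()}
    )
    return f"{base}/messages/?{query}"


def mark_read_request(message_id: str, base: str = HUB_URL) -> urllib.request.Request:
    """Build (but do not send) the PATCH /messages/<id>/read request."""
    return urllib.request.Request(
        f"{base}/messages/{message_id}/read",
        data=json.dumps({}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )
```

To actually send either request, pass the URL or the `Request` object to `urllib.request.urlopen`; that step needs a reachable hub.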
|
||||
|
||||
**Step 3 — Scan workplans**
|
||||
```bash
|
||||
ls workplans/
|
||||
```
|
||||
For each file with `status: ready`, `active`, or `blocked`, note pending
|
||||
`wait`/`todo`/`progress` tasks.
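The scan can be approximated with a short Python sketch. It assumes, as a simplification, that the first `status:` line in a file is the frontmatter status and every later one belongs to a task block; the function name and the input shape (file name mapped to file text) are hypothetical.

```python
import re

OPEN_PLAN = {"ready", "active", "blocked"}
PENDING_TASK = {"wait", "todo", "progress"}
STATUS_LINE = re.compile(r"^status:[ \t]*(\w+)[ \t]*$", re.MULTILINE)


def pending_by_workplan(plans: dict[str, str]) -> dict[str, int]:
    """Map each open workplan to its number of pending tasks."""
    result = {}
    for name, text in sorted(plans.items()):
        statuses = STATUS_LINE.findall(text)
        # Assumption: the first status line is the frontmatter one.
        if not statuses or statuses[0] not in OPEN_PLAN:
            continue
        result[name] = sum(s in PENDING_TASK for s in statuses[1:])
    return result
```

The input could be built from the repo with something like `{p.name: p.read_text() for p in Path("workplans").glob("*.md")}`.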
|
||||
|
||||
**Step 4 — Present brief**
|
||||
|
||||
1. **Active workstreams** for `communication` — title, task counts, blocking decisions
|
||||
2. **Pending tasks** from `workplans/` + any `[repo:markitect-main]` hub tasks
|
||||
3. **Goal guidance** — if `goal_guidance` in summary:
|
||||
- `needs_workplan`: surface as top action — *"Repo goal '{title}' has no workplan yet"*
|
||||
- `alignment_warnings`: flag if active work is not aligned with current goal
|
||||
4. **Suggested next action** — highest-priority open item
|
||||
5. **SBOM status** — flag if `last_sbom_at` is unset for this repo
|
||||
|
||||
If no workstreams: follow First Session Protocol (`first-session.md`).
|
||||
|
||||
**During work:** `record_decision()` · `add_progress_event()` · `resolve_decision()`
|
||||
|
||||
> State Hub is a *read model*. Bootstrap tools (`create_workstream`, `create_task`)
|
||||
> are First Session Protocol only. Work structure belongs in repo files (ADR-001).
|
||||
|
||||
**Session close:**
|
||||
With MCP tools:
|
||||
```
|
||||
add_progress_event(summary="...", topic_id="36c7421b-c537-4723-bf75-42a3ebc6a1dc", workstream_id="<uuid>")
|
||||
```
|
||||
Without MCP tools:
|
||||
```bash
|
||||
curl -s -X POST http://127.0.0.1:8000/progress/ \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"topic_id":"36c7421b-c537-4723-bf75-42a3ebc6a1dc","workstream_id":"<uuid>","event_type":"note","summary":"what changed","author":"codex"}'
|
||||
```
|
||||
If workplan files were modified, ensure the local copy is up to date first:
|
||||
```bash
|
||||
git -C <repo_path> pull --ff-only
|
||||
cd ~/state-hub && make fix-consistency REPO=markitect-main
|
||||
```
|
||||
For repos where implementation runs on a remote machine (e.g. CoulombCore),
|
||||
use the combined target which pulls before fixing:
|
||||
```bash
|
||||
cd ~/state-hub && make fix-consistency-remote REPO=markitect-main
|
||||
```
|
||||
**C-15** (DB task ahead of file) is normal in multi-machine workflows: writeback
will sync the file to match the DB. **C-16** (repo behind remote) blocks all writes
until you pull; this is intentional, to prevent clobbering remote progress.
|
||||
16
.claude/rules/stack-and-commands.md
Normal file
@@ -0,0 +1,16 @@
|
||||
## Stack
|
||||
|
||||
- **Language:** Python 3.12+ (monorepo) + JavaScript UI (testdrive-jsui)
|
||||
- **Key deps:** uv/pip, pytest, npm; see `pyproject.toml`, `package.json`, `Makefile`
|
||||
|
||||
## Dev Commands
|
||||
|
||||
```bash
|
||||
make setup
|
||||
make test
|
||||
make test-js
|
||||
make test-all
|
||||
make lint
|
||||
make build
|
||||
make help
|
||||
```
|
||||
40
.claude/rules/workplan-convention.md
Normal file
@@ -0,0 +1,40 @@
|
||||
## Workplan Convention (ADR-001)
|
||||
|
||||
File location: `workplans/MARKITECT-WP-NNNN-<slug>.md`
|
||||
ID prefix: `MARKITECT-WP-`
|
||||
|
||||
Work items originate as files in this repo **before** being registered in the hub.
|
||||
|
||||
Canonical workplan/workstream frontmatter statuses are:
|
||||
`proposed`, `ready`, `active`, `blocked`, `backlog`, `finished`, `archived`.
|
||||
Use `proposed` for a newly drafted plan, `ready` after review against current
|
||||
repo state, and `finished` when implementation is complete. `stalled` and
|
||||
`needs_review` are derived health labels, not stored statuses.
|
||||
|
||||
Closed workplans may be moved to `workplans/archived/` with a completion-date
|
||||
prefix: `YYMMDD-MARKITECT-WP-NNNN-<slug>.md`. The frontmatter id remains
|
||||
unchanged; the prefix is only for quick visual reference.
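As a small illustration of the archive naming, the following Python sketch derives the archived file name from a completion date. The function name and the example file name are hypothetical; only the `YYMMDD-` prefix rule comes from the convention above.

```python
from datetime import date


def archived_name(filename: str, completed: date) -> str:
    """Prefix a workplan file name with its completion date as YYMMDD."""
    return f"{completed:%y%m%d}-{filename}"
```

For example, `archived_name("MARKITECT-WP-0001-demo.md", date(2026, 6, 22))` gives `260622-MARKITECT-WP-0001-demo.md`; the frontmatter id inside the file is left untouched.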
|
||||
|
||||
Small opportunistic tasks discovered during another session use **Ad Hoc Tasks**:
|
||||
`workplans/ADHOC-YYYY-MM-DD.md`, workstream slug `adhoc-YYYY-MM-DD`, and task ids
|
||||
`ADHOC-YYYY-MM-DD-T01`, `T02`, etc. Use adhocs only for low-risk work completed
|
||||
directly. Promote anything requiring analysis, design, approval, dependencies, or
|
||||
multiple planned phases into a normal workplan.
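The three ad hoc names follow mechanically from the date, which a short Python sketch can show. The function name and return shape are hypothetical; the patterns are the ones listed above.

```python
from datetime import date


def adhoc_names(day: date, task_count: int) -> dict:
    """Derive the ad hoc file path, workstream slug, and task ids for a day."""
    stamp = day.isoformat()  # YYYY-MM-DD
    return {
        "file": f"workplans/ADHOC-{stamp}.md",
        "workstream_slug": f"adhoc-{stamp}",
        "task_ids": [f"ADHOC-{stamp}-T{n:02d}" for n in range(1, task_count + 1)],
    }
```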
|
||||
|
||||
Ecosystem todos from other agents arrive as `[repo:markitect-main]` hub tasks —
|
||||
visible at session start. Pick one up by creating the workplan file, then registering
|
||||
the workstream.
|
||||
|
||||
Task blocks use this shape:
|
||||
|
||||
```task
|
||||
id: MARKITECT-WP-NNNN-T01
|
||||
status: wait | todo | progress | done | cancel
|
||||
priority: high | medium | low
|
||||
state_hub_task_id: "<uuid>" # written by fix-consistency — do not edit
|
||||
```
|
||||
|
||||
Status progression is `todo` → `progress` → `done`; use `wait` for waiting or
|
||||
blocked work and `cancel` for stopped work.
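One way to encode the progression is a tiny helper that returns the next status on the main path. This is a sketch, not part of any existing tooling: the function name is hypothetical, and treating `wait` and `cancel` as having no automatic successor is an assumption, since the convention does not say how work leaves those states.

```python
from typing import Optional

TASK_STATUSES = ("wait", "todo", "progress", "done", "cancel")
PROGRESSION = ("todo", "progress", "done")


def next_status(current: str) -> Optional[str]:
    """Return the next status on the main path, or None if there is none."""
    if current not in TASK_STATUSES:
        raise ValueError(f"unknown task status: {current!r}")
    if current in PROGRESSION[:-1]:
        return PROGRESSION[PROGRESSION.index(current) + 1]
    return None  # done, wait, and cancel have no automatic successor here
```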
|
||||
|
||||
<!-- Ralph Loop rules and HEUREKA sequence: ~/.claude/CLAUDE.md — do not duplicate here -->
|
||||
@@ -10,7 +10,7 @@ principles with strict separation of concerns.
|
||||
|
||||
## Directory Structure & Clean Architecture
|
||||
```
|
||||
-markitect_project/
+markitect-main/
|
||||
├── domain/ # Business logic (innermost layer)
|
||||
├── application/ # Use cases and workflows
|
||||
├── infrastructure/ # External interfaces (database, file system)
|
||||
|
||||
18
.custodian-brief.md
Normal file
@@ -0,0 +1,18 @@
|
||||
<!-- custodian-brief: generated by fix-consistency — do not edit manually -->
|
||||
# Custodian Brief — markitect-main
|
||||
|
||||
**Domain:** communication
|
||||
**Last synced:** 2026-06-22 21:32 UTC
|
||||
**State Hub:** http://127.0.0.1:8000 *(adjust if running on a remote machine)*
|
||||
|
||||
## Active Workstreams
|
||||
|
||||
*(none — repo may need first-session setup)*
|
||||
|
||||
---
|
||||
## MCP Orientation (when available)
|
||||
|
||||
If the state-hub MCP server is reachable, call:
|
||||
`get_domain_summary("communication")`
|
||||
This provides richer cross-domain context.
|
||||
If the MCP call fails, use this file as your orientation source.
|
||||
2
.gitignore
vendored
@@ -91,6 +91,8 @@ debug_*.py
|
||||
|
||||
# Claude Code local settings (user-specific permissions)
|
||||
.claude/settings.local.json
|
||||
+# Claude Code runtime session locks (per-session, not content)
+.claude/*.lock
|
||||
|
||||
.aider*
|
||||
|
||||
|
||||
2
.gitmodules
vendored
@@ -1,6 +1,6 @@
|
||||
[submodule "wiki"]
|
||||
path = wiki
|
||||
-	url = http://92.205.130.254:32166/coulomb/markitect_project.wiki.git
+	url = http://92.205.130.254:32166/coulomb/markitect-main.wiki.git
|
||||
branch = main
|
||||
[submodule "capabilities/kaizen-agentic"]
|
||||
path = capabilities/kaizen-agentic
|
||||
|
||||
25
.repo-classification.yaml
Normal file
@@ -0,0 +1,25 @@
|
||||
repo_classification:
|
||||
standard: Repo Classification Standard
|
||||
version: '1.0'
|
||||
classified_at: '2026-06-22'
|
||||
classified_by: human
|
||||
category: product
|
||||
domain: communication
|
||||
secondary_domains:
|
||||
- infotech
|
||||
- agents
|
||||
capability_tags:
|
||||
- knowledge
|
||||
- documentation
|
||||
- product-development
|
||||
- platform
|
||||
business_stake:
|
||||
- product
|
||||
- technology
|
||||
- execution
|
||||
business_mechanics:
|
||||
- intention
|
||||
- coordination
|
||||
- operation
|
||||
- adaptation
|
||||
notes: Markitect successor to archived markitect-project; human confirmed.
|
||||
219
AGENTS.md
Normal file
@@ -0,0 +1,219 @@
|
||||
# Markitect Main — Agent Instructions
|
||||
|
||||
## Repo Identity
|
||||
|
||||
**Purpose:** Markitect Main - (fill in purpose)
|
||||
|
||||
**Domain:** communication
|
||||
**Repo slug:** markitect-main
|
||||
**Topic ID:** `36c7421b-c537-4723-bf75-42a3ebc6a1dc`
|
||||
**Workplan prefix:** `MARKITECT-WP-`
|
||||
|
||||
---
|
||||
|
||||
## State Hub Integration
|
||||
|
||||
The Custodian State Hub tracks work across all domains. Interact via HTTP REST —
|
||||
there is no MCP server for Codex agents.
|
||||
|
||||
| Context | URL |
|
||||
|---------|-----|
|
||||
| Local workstation | `http://127.0.0.1:8000` |
|
||||
| Remote via tunnel | `http://127.0.0.1:18000` |
|
||||
|
||||
### Orient at session start
|
||||
|
||||
```bash
|
||||
# Offline brief — works without hub connection
|
||||
cat .custodian-brief.md
|
||||
|
||||
# Active workstreams for this domain
|
||||
curl -s "http://127.0.0.1:8000/workstreams/?topic_id=36c7421b-c537-4723-bf75-42a3ebc6a1dc&status=active" \
|
||||
| python3 -m json.tool
|
||||
|
||||
# Check inbox
|
||||
curl -s "http://127.0.0.1:8000/messages/?to_agent=markitect-main&unread_only=true" \
|
||||
| python3 -m json.tool
|
||||
```
|
||||
|
||||
Mark a message read:
|
||||
```bash
|
||||
curl -s -X PATCH "http://127.0.0.1:8000/messages/<id>/read" \
|
||||
-H "Content-Type: application/json" -d '{}'
|
||||
```
|
||||
|
||||
### Log progress (required at session close)
|
||||
|
||||
```bash
|
||||
curl -s -X POST http://127.0.0.1:8000/progress/ \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"summary": "what was done",
|
||||
"event_type": "note",
|
||||
"author": "codex",
|
||||
"workstream_id": "<uuid>",
|
||||
"task_id": "<uuid>"
|
||||
}'
|
||||
```
|
||||
|
||||
Omit `workstream_id` / `task_id` when not applicable.
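The omission rule is easy to get wrong when the body is assembled by hand, so here is a minimal Python sketch that builds the JSON body and drops the optional ids when they are unset. The function name and defaults are hypothetical; the field names are the ones in the curl example.

```python
import json
from typing import Optional


def progress_payload(
    summary: str,
    author: str = "codex",
    event_type: str = "note",
    workstream_id: Optional[str] = None,
    task_id: Optional[str] = None,
) -> str:
    """Return the JSON body for POST /progress/, dropping unset optional ids."""
    body = {
        "summary": summary,
        "event_type": event_type,
        "author": author,
        "workstream_id": workstream_id,
        "task_id": task_id,
    }
    return json.dumps({k: v for k, v in body.items() if v is not None})
```

The returned string can be passed to `curl -d` or sent with any HTTP client.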
|
||||
|
||||
### Update task status
|
||||
|
||||
```bash
|
||||
curl -s -X PATCH "http://127.0.0.1:8000/tasks/<task_id>" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"status": "progress"}'
|
||||
# values: wait | todo | progress | done | cancel
|
||||
```
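A Python equivalent can reject an invalid status before any request leaves the machine. This is a sketch with a hypothetical helper name; it builds the `PATCH /tasks/<task_id>` request from the example above but does not send it.

```python
import json
import urllib.request

TASK_STATUSES = {"wait", "todo", "progress", "done", "cancel"}


def task_status_request(
    task_id: str, status: str, base: str = "http://127.0.0.1:8000"
) -> urllib.request.Request:
    """Build (but do not send) a PATCH that sets a task's status."""
    if status not in TASK_STATUSES:
        raise ValueError(f"invalid task status: {status!r}")
    return urllib.request.Request(
        f"{base}/tasks/{task_id}",
        data=json.dumps({"status": status}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )
```

Workplan statuses such as `finished` are rejected here because they belong to the frontmatter, not to tasks.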
|
||||
|
||||
### Flag a task for human review
|
||||
|
||||
```bash
|
||||
curl -s -X PATCH "http://127.0.0.1:8000/tasks/<task_id>" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"needs_human": true, "intervention_note": "reason"}'
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Session Protocol
|
||||
|
||||
**Start:**
|
||||
1. `cat .custodian-brief.md` — domain goal and open workstreams (offline-safe)
|
||||
2. Check inbox: `GET /messages/?to_agent=markitect-main&unread_only=true`; mark read
|
||||
3. Scan workplans: `ls workplans/` — note `status: ready`, `active`, or `blocked` files and open tasks
|
||||
4. Check human-needed tasks: `GET /tasks/?needs_human=true`
|
||||
|
||||
**During work:**
|
||||
- Update task statuses in workplan files as tasks progress
|
||||
- Record significant decisions via `POST /decisions/`
|
||||
|
||||
**Close:**
|
||||
1. Update workplan file task statuses to reflect progress
|
||||
2. Log: `POST /progress/` with a summary of what changed
|
||||
3. Note for the custodian operator: after workplan file changes, run from
|
||||
`~/state-hub`:
|
||||
```bash
|
||||
make fix-consistency REPO=markitect-main
|
||||
```
|
||||
This syncs task status from files into the hub DB.
|
||||
|
||||
---
|
||||
|
||||
## Credential and access routing
|
||||
|
||||
**Audience:** Codex, Claude Code, Grok, and custodian agents that call **llm-connect**
|
||||
for inference. Run this check **before** requesting secrets, API keys, SSH access,
|
||||
login tokens, or database passwords — in any repo, not only `ops-warden`.
|
||||
|
||||
ops-warden **issues SSH certificates only** (`warden sign`, `cert_command`). Every
|
||||
other credential need belongs to another subsystem. **Do not** message
|
||||
`ops-warden` on State Hub expecting a secret value; the reply is a pointer, not a key.
|
||||
|
||||
### Lookup (do this first)
|
||||
|
||||
```bash
|
||||
warden route find "<describe your need>" --json
|
||||
warden route show <catalog-id> --json
|
||||
```
|
||||
|
||||
Requires the `warden` CLI from `~/ops-warden` (`uv tool install .` or `uv run warden`).
|
||||
|
||||
| Agent runtime | How to orient |
|
||||
| --- | --- |
|
||||
| **Codex / Grok** (shell, HTTP State Hub) | `warden route` commands above; inbox `to_agent=markitect-main` is for coordination, not secret vending |
|
||||
| **Claude Code** (MCP when available) | `get_domain_summary("custodian")` for workstreams; **still** use `warden route` for credential ownership |
|
||||
| **llm-connect** (inference service) | Never put secret retrieval in prompts; route custody to OpenBao/operator paths surfaced by `warden route` |
|
||||
|
||||
### Quick routing table
|
||||
|
||||
| I need… | Owner | ops-warden executes? |
|
||||
| --- | --- | --- |
|
||||
| SSH cert (`adm`/`agt`/`atm`) | ops-warden | **Yes** — `warden sign` |
|
||||
| API key, DB password, provider token | OpenBao (`railiance-platform`) | No — route only |
|
||||
| Login / OIDC / MFA | key-cape / Keycloak | No — route only |
|
||||
| Authorization decision | flex-auth | No — route only |
|
||||
| activity-core → issue-core emission | activity-core + issue-core | No — `warden route show activity-core-issue-sink` |
|
||||
| SSH tunnel | ops-bridge (+ `cert_command` from warden) | No — route only |
|
||||
|
||||
### Anti-patterns (do not do these)
|
||||
|
||||
- `POST /messages/` to `ops-warden` asking for `ISSUE_CORE_API_KEY`, `OPENROUTER_API_KEY`, etc.
|
||||
- Inventing `warden secret`, `warden login`, `warden bao`, `warden tunnel` — they do not exist
|
||||
- Pasting secrets into Git, State Hub, workplans, logs, or chat
|
||||
|
||||
### Other capabilities (reuse-surface)
|
||||
|
||||
Non-credential capabilities are usually discovered through **reuse-surface** federation
|
||||
(`reuse-surface` registry / `capability.*` indexes). Credential routing is inlined in
|
||||
every repo's agent instructions because it is high-frequency, high-risk, and easy to
|
||||
get wrong.
|
||||
|
||||
**Canon:** `~/ops-warden/wiki/CredentialRouting.md` · catalog `~/ops-warden/registry/routing/catalog.yaml`
|
||||
|
||||
<!-- REPO-AGENTS-EXTENSIONS -->
|
||||
<!-- Append repo-specific agent instructions below this marker.
|
||||
The state-hub template sync preserves content after this line. -->
|
||||
|
||||
---
|
||||
|
||||
## Workplan Convention (ADR-001)
|
||||
|
||||
Work items originate as files in this repo — not in the hub. The hub is a
|
||||
read/cache/index layer that rebuilds from files.
|
||||
|
||||
**File location:** `workplans/MARKITECT-WP-NNNN-<slug>.md`
|
||||
|
||||
**Archived location:** finished workplans may move to
|
||||
`workplans/archived/YYMMDD-MARKITECT-WP-NNNN-<slug>.md`. The `YYMMDD` prefix is
|
||||
the completion/archive date; the frontmatter `id` does not change.
|
||||
|
||||
**Ad Hoc Tasks:** small opportunistic fixes discovered during a session use
|
||||
`workplans/ADHOC-YYYY-MM-DD.md` with task ids `ADHOC-YYYY-MM-DD-T01`, etc. Use
|
||||
this only for low-risk work completed directly; create a normal workplan for
|
||||
anything needing analysis, design, approval, dependencies, or multiple phases.
|
||||
|
||||
**Frontmatter:**
|
||||
|
||||
```yaml
|
||||
---
|
||||
id: MARKITECT-WP-NNNN
|
||||
type: workplan
|
||||
title: "..."
|
||||
domain: communication
|
||||
repo: markitect-main
|
||||
status: proposed | ready | active | blocked | backlog | finished | archived
|
||||
owner: codex
|
||||
topic_slug: ...
|
||||
created: "YYYY-MM-DD"
|
||||
updated: "YYYY-MM-DD"
|
||||
state_hub_workstream_id: "<uuid>" # written by fix-consistency — do not edit
|
||||
---
|
||||
```
|
||||
|
||||
Use `proposed` for a new draft, `ready` after review against current repo
|
||||
state, and `finished` after implementation. `stalled` and `needs_review` are
|
||||
derived health labels, not frontmatter statuses.
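A lightweight check of the stored status could look like the sketch below. It is an illustration only: the function names are hypothetical, and the parser is a deliberately naive line splitter rather than a YAML parser, so it handles flat `key: value` frontmatter and would truncate values that contain ` #`.

```python
WORKPLAN_STATUSES = {
    "proposed", "ready", "active", "blocked", "backlog", "finished", "archived",
}
DERIVED_LABELS = {"stalled", "needs_review"}


def read_frontmatter(text: str) -> dict[str, str]:
    """Read flat key: value pairs between the leading '---' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            # Drop trailing comments and surrounding quotes.
            fields[key.strip()] = value.split(" #", 1)[0].strip().strip('"')
    return fields


def stored_status(text: str) -> str:
    """Return the frontmatter status, rejecting derived labels and unknowns."""
    status = read_frontmatter(text).get("status", "")
    if status in DERIVED_LABELS:
        raise ValueError(f"{status!r} is a derived health label, not a stored status")
    if status not in WORKPLAN_STATUSES:
        raise ValueError(f"unknown workplan status: {status!r}")
    return status
```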
|
||||
|
||||
**Task block format** (one per `##` section):
|
||||
|
||||
```
|
||||
## Task Title
|
||||
|
||||
` ` `task
|
||||
id: MARKITECT-WP-NNNN-T01
|
||||
status: wait | todo | progress | done | cancel
|
||||
priority: high | medium | low
|
||||
state_hub_task_id: "<uuid>" # written by fix-consistency — do not edit
|
||||
` ` `
|
||||
|
||||
Task description text.
|
||||
```
|
||||
|
||||
Status progression: `todo` → `progress` → `done`; use `wait` for waiting/blocked work and `cancel` for stopped work.
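To show how the task block format can be read back, here is a hedged Python sketch that extracts every `task` block from a workplan's text. The function name is hypothetical and this is not the parser that `fix-consistency` uses; the fence marker is built by repetition only so that the example can sit inside a fenced block itself.

```python
import re

FENCE = "`" * 3  # three backticks, built this way to keep this example fence-safe

TASK_BLOCK = re.compile(
    rf"^{FENCE}task\n(.*?)^{FENCE}\s*$", re.MULTILINE | re.DOTALL
)


def parse_task_blocks(markdown: str) -> list[dict[str, str]]:
    """Return one dict of key/value fields per task block."""
    tasks = []
    for match in TASK_BLOCK.finditer(markdown):
        fields = {}
        for line in match.group(1).splitlines():
            line = line.split(" #", 1)[0].strip()  # drop trailing comments
            if ":" in line:
                key, value = line.split(":", 1)
                fields[key.strip()] = value.strip().strip('"')
        tasks.append(fields)
    return tasks
```

Each returned dict carries the `id`, `status`, `priority`, and, once written by the hub tooling, `state_hub_task_id` of one task.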
|
||||
|
||||
To create a new workplan:
|
||||
1. Write the file following the format above
|
||||
2. Notify the custodian operator to run `make fix-consistency REPO=markitect-main`
|
||||
(or send a message to the hub agent via `POST /messages/`)
|
||||
12
CLAUDE.md
Normal file
@@ -0,0 +1,12 @@
|
||||
# Markitect Main — Claude Code Instructions
|
||||
|
||||
@SCOPE.md
|
||||
@.claude/rules/repo-identity.md
|
||||
@.claude/rules/session-protocol.md
|
||||
@.claude/rules/first-session.md
|
||||
@.claude/rules/workplan-convention.md
|
||||
@.claude/rules/stack-and-commands.md
|
||||
@.claude/rules/architecture.md
|
||||
@.claude/rules/repo-boundary.md
|
||||
@.claude/rules/credential-routing.md
|
||||
@.claude/rules/agents.md
|
||||
@@ -457,7 +457,7 @@ Sister projects can reuse these capabilities directly:
|
||||
Install capabilities via local file references:
|
||||
```toml
|
||||
[project.dependencies]
|
||||
-release-management = {path = "../markitect_project/capabilities/release-management"}
+release-management = {path = "../markitect-main/capabilities/release-management"}
|
||||
```
|
||||
|
||||
### Shared Infrastructure
|
||||
|
||||
129
SCOPE.md
Normal file
@@ -0,0 +1,129 @@
|
||||
# SCOPE
|
||||
|
||||
> This file helps you quickly understand what this repository is about,
|
||||
> when it is relevant, and when it is not.
|
||||
> It is intentionally lightweight and may be incomplete.
|
||||
|
||||
---
|
||||
|
||||
## One-liner
|
||||
|
||||
Intelligent markdown engine and information management platform — treats documents as structured, queryable information spaces with schema validation, transclusion, LLM-driven evaluation, and infospace lifecycle management.
|
||||
|
||||
---
|
||||
|
||||
## Core Idea
|
||||
|
||||
MarkiTect turns fragmented knowledge (scattered docs, chats, notes) into structured, versioned, reusable artifacts. The core abstraction is an **infospace**: a curated collection of typed entities (concepts, mechanisms, observations) governed by a YAML config, validated against schemas, and evaluated for quality across five dimensions. The platform automates generation, validation, and transformation at scale, delegating domain-level judgment to LLMs while Python handles structure and evaluation.
|
||||
|
||||
---
|
||||
|
||||
## In Scope
|
||||
|
||||
- Parse, validate, and analyze markdown documents against schemas
|
||||
- Generate schemas from example documents; enforce naming convention `{domain}-schema-v{major}.{minor}.md`
|
||||
- Infospace lifecycle: create, populate, evaluate (per-entity + collection quality scores), compose, export
|
||||
- Transclusion: embed content from one document into another, maintaining single source of truth
|
||||
- LLM-driven prompt execution with dependency resolution and quality gates
|
||||
- Relationship graph export (Mermaid, DOT) and analysis (networkx, FCA)
|
||||
- Batch document processing; CLI (`markitect <command>`) and programmatic API
|
||||
- Rendering: markdown → interactive HTML via plugin system (testdrive-jsui)
|
||||
- Asset management (image embedding, resource handling)
|
||||
|
||||
---
|
||||
|
||||
## Out of Scope
|
||||
|
||||
- Visual/WYSIWYG editing (markdown-first, text-based workflows only)
|
||||
- Real-time collaborative editing (git-based versioning instead)
|
||||
- Financial transactions or external payment integration
|
||||
- Making domain-level judgments in Python code (delegated to LLM via prompt templates)
|
||||
- Storing secrets or credentials in plaintext
|
||||
- Full GraphQL API (structure exists but not fully implemented)
|
||||
- Vendor-specific integrations or lock-in
|
||||
|
||||
---
|
||||
|
||||
## Relevant When
|
||||
|
||||
- Managing large document sets (hundreds to thousands) needing consistent structure and validation
|
||||
- Building or maintaining institutional knowledge bases, technical documentation, or canon releases
|
||||
- Automating document generation from schemas or templates
|
||||
- Tracking relationships and dependencies between knowledge artifacts
|
||||
- Needing programmatic access to document structure (beyond file reading)
|
||||
- Applying quality evaluation to a structured concept collection
|
||||
|
||||
---
|
||||
|
||||
## Not Relevant When
|
||||
|
||||
- Working with a handful of simple, unrelated documents
|
||||
- Visual editor required
|
||||
- Exclusively non-markdown source formats (PDF/Word need conversion first)
|
||||
- No consistency, validation, or automation needed
|
||||
|
||||
---
|
||||
|
||||
## Current State
|
||||
|
||||
- Status: active (v0.13.0-dev, ~90 commits ahead of release)
|
||||
- Implementation: substantial — core modules mature (CLI, parsing, schema management, prompt execution, infospace); infospace S3 close-out in progress; LLM adapter extracted to standalone `llm-connect` package
|
||||
- Stability: stable core; plugin system and infospace tooling evolving; 200+ CHANGELOG entries since v0.6.0
|
||||
- Usage: active personal development; examples with 988 entities and full evaluation pipeline
|
||||
|
||||
---
|
||||
|
||||
## How It Fits
|
||||
|
||||
- Upstream dependencies: `llm-connect` (LLM adapter library, extracted), `testdrive-jsui` (rendering plugin submodule), `markitect-utils` (utility library)
|
||||
- Downstream consumers: Custodian — MarkiTect is the knowledge artifact platform in the canonical dependency order (Railiance → **Markitect** → Coulomb.social → Personhood/Foerster → Custodian)
|
||||
- Often used with: the-custodian (state hub tracks markitect domain workstreams), kaizen-agentic (project-management agent for session workflow)
|
||||
|
||||
---
|
||||
|
||||
## Terminology
|
||||
|
||||
- Preferred terms: infospace, topic, discipline, entity, evaluation, viability, transclusion, schema, quality gates
|
||||
- Also known as: "markitect", "the markdown engine"
|
||||
- Potentially confusing terms: "topic" = the subject matter an infospace explains (not a chat thread); "discipline" = a reusable framework of concepts (itself a viable infospace); "infospace" ≠ filesystem directory (it's a curated conceptual collection with explicit quality thresholds)
|
||||
|
||||
---
|
||||
|
||||
## Related / Overlapping
|
||||
|
||||
- `llm-connect` — standalone LLM adapter extracted from MarkiTect (dependency)
|
||||
- `the-custodian` — tracks markitect workstreams; custodian canon includes a markitect domain charter
|
||||
- `marki-docx` — separate repo (on tegwick machine); relationship: docx export capability for MarkiTect artifacts
|
||||
|
||||
---
|
||||
|
||||
## Provided Capabilities
|
||||
|
||||
```capability
|
||||
type: documentation
|
||||
title: Structured document validation and schema management
|
||||
description: Parse, validate, and enforce schemas on markdown documents — generate schemas from examples, validate entity collections, report naming convention compliance.
|
||||
keywords: [markdown, schema, validation, document, structure, linting]
|
||||
```
|
||||
|
||||
```capability
|
||||
type: documentation
|
||||
title: Infospace lifecycle management
|
||||
description: Create, populate, evaluate (quality scores), compose, and export curated knowledge collections (infospaces) with transclusion and relationship graph analysis.
|
||||
keywords: [infospace, knowledge, curation, evaluation, transclusion, quality, graph]
|
||||
```
|
||||
|
||||
```capability
|
||||
type: data
|
||||
title: LLM-driven knowledge artifact generation
|
||||
description: Execute prompts with dependency resolution and quality gates to generate typed entities — concepts, mechanisms, observations — at scale from schemas and templates.
|
||||
keywords: [llm, generation, prompt, entity, artifact, knowledge, automation]
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Getting Oriented
|
||||
|
||||
- Start with: `CLAUDE.md` (dev commands, LLM config, infospace lifecycle), `INTRODUCTION.md` (use cases, philosophy)
|
||||
- Key files / directories: `markitect/cli.py` (CLI entry point), `markitect/infospace/` (primary active area), `markitect/prompts/` (LLM execution), `roadmap/` (6 active planning tracks), `examples/infospace-with-history/` (988-entity reference implementation)
|
||||
- Entry points: `markitect --help`; `markitect infospace --help`; `pytest tests/unit/` (inner TDD loop)
|
||||
@@ -15,7 +15,7 @@ You are responsible for:
|
||||
|
||||
### Directory Structure
|
||||
```
|
||||
markitect_project/
|
||||
markitect-main/
|
||||
├── Makefile # Main project Makefile
|
||||
├── scripts/
|
||||
│ └── capability_discovery.mk # Auto-discovery and delegation system
|
||||
|
||||
@@ -7,7 +7,7 @@ detachment:
|
||||
capability_name: issue-facade
|
||||
capability_family: issue-tracking
|
||||
integration_pattern: capabilities-directory
|
||||
original_location: /home/worsch/markitect_project/capabilities/issue-facade
|
||||
original_location: /home/worsch/markitect-main/capabilities/issue-facade
|
||||
|
||||
capability_metadata:
|
||||
spec_file: CAPABILITY-issue-tracking.yaml
|
||||
@@ -17,23 +17,23 @@ capability_metadata:
|
||||
|
||||
integration_details:
|
||||
parent_project: capabilities
|
||||
parent_path: /home/worsch/markitect_project/capabilities
|
||||
parent_path: /home/worsch/markitect-main/capabilities
|
||||
|
||||
re_integration_guide: |
|
||||
To re-integrate this capability using the new architecture:
|
||||
|
||||
# Option 1: Git submodule (recommended)
|
||||
cd /home/worsch/markitect_project/capabilities
|
||||
cd /home/worsch/markitect-main/capabilities
|
||||
git submodule add <repo-url> _issue-facade
|
||||
pip install -e _issue-facade/
|
||||
|
||||
# Option 2: Clone directly
|
||||
cd /home/worsch/markitect_project/capabilities
|
||||
cd /home/worsch/markitect-main/capabilities
|
||||
git clone <repo-url> _issue-facade
|
||||
pip install -e _issue-facade/
|
||||
|
||||
# Option 3: Copy into project
|
||||
cd /home/worsch/markitect_project/capabilities
|
||||
cd /home/worsch/markitect-main/capabilities
|
||||
cp -r /path/to/issue-facade _issue-facade
|
||||
pip install -e _issue-facade/
|
||||
|
||||
|
||||
@@ -8,7 +8,7 @@ This test module validates outline mode schema generation improvements including
|
||||
- Content instruction integration
|
||||
- End-to-end workflow from example document to generated drafts
|
||||
|
||||
Created for Issue #46: https://gitea.coulomb.social/coulomb/markitect_project/issues/46
|
||||
Created for Issue #46: https://gitea.coulomb.social/coulomb/markitect-main/issues/46
|
||||
"""
|
||||
|
||||
import pytest
|
||||
|
||||
@@ -209,7 +209,7 @@ tests/
|
||||
## 🎯 Detailed File Structure After Migration
|
||||
|
||||
```
|
||||
markitect_project/
|
||||
markitect-main/
|
||||
├── capabilities/
|
||||
│ └── release-management/
|
||||
│ ├── README.md ✅ CREATED
|
||||
|
||||
@@ -162,7 +162,7 @@ clean_before_build = true
|
||||
[tool.release-management.registries.gitea]
|
||||
url = "http://92.205.130.254:32166"
|
||||
owner = "coulomb"
|
||||
repo = "markitect_project"
|
||||
repo = "markitect-main"
|
||||
auth_token_env = "GITEA_API_TOKEN"
|
||||
|
||||
[tool.release-management.registries.pypi]
|
||||
|
||||
@@ -141,7 +141,7 @@ make release-publish VERSION=0.8.0
|
||||
## Registry Information
|
||||
|
||||
- **Gitea URL**: http://92.205.130.254:32166
|
||||
- **Repository**: coulomb/markitect_project
|
||||
- **Repository**: coulomb/markitect-main
|
||||
- **PyPI Registry URL**: http://92.205.130.254:32166/api/packages/coulomb/pypi
|
||||
- **Package List URL**: http://92.205.130.254:32166/api/v1/packages/coulomb
|
||||
|
||||
|
||||
@@ -8,7 +8,7 @@
|
||||
|
||||
```bash
|
||||
# ❌ WRONG - Don't edit capability files from main repo
|
||||
cd /home/worsch/markitect_project/capabilities/testdrive-jsui
|
||||
cd /home/worsch/markitect-main/capabilities/testdrive-jsui
|
||||
vim src/testdrive_jsui/core.py # DON'T DO THIS!
|
||||
|
||||
# ✅ CORRECT - Use separate Claude instance/session
|
||||
@@ -29,7 +29,7 @@ cd /path/to/work/testdrive-jsui
|
||||
|
||||
| Session | Purpose | Location |
|
||||
|---------|---------|----------|
|
||||
| **Main Repo** | Integration, configuration | `/home/worsch/markitect_project` |
|
||||
| **Main Repo** | Integration, configuration | `/home/worsch/markitect-main` |
|
||||
| **Capability** | Feature development, bugs | Separate clone or `capabilities/capability-name` |
|
||||
|
||||
**Why?** Prevents accidental cross-contamination and respects repository boundaries.
|
||||
@@ -40,7 +40,7 @@ cd /path/to/work/testdrive-jsui
|
||||
|
||||
```bash
|
||||
# After pushing changes to capability repo
|
||||
cd /home/worsch/markitect_project
|
||||
cd /home/worsch/markitect-main
|
||||
git submodule update --remote capabilities/testdrive-jsui
|
||||
git add capabilities/testdrive-jsui
|
||||
git commit -m "chore: update testdrive-jsui to latest"
|
||||
@@ -50,7 +50,7 @@ git push
|
||||
### Add New Capability
|
||||
|
||||
```bash
|
||||
cd /home/worsch/markitect_project
|
||||
cd /home/worsch/markitect-main
|
||||
|
||||
# Add as submodule
|
||||
git submodule add http://gitea/coulomb/new-capability.git capabilities/new-capability
|
||||
@@ -67,7 +67,7 @@ git commit -m "feat: add new-capability submodule"
|
||||
|
||||
```bash
|
||||
# Option 1: In submodule directory (careful!)
|
||||
cd /home/worsch/markitect_project/capabilities/testdrive-jsui
|
||||
cd /home/worsch/markitect-main/capabilities/testdrive-jsui
|
||||
git checkout -b feature-branch
|
||||
# make changes
|
||||
git commit -m "feat: new feature"
|
||||
@@ -86,7 +86,7 @@ git push origin feature-branch
|
||||
### Check Capability Status
|
||||
|
||||
```bash
|
||||
cd /home/worsch/markitect_project
|
||||
cd /home/worsch/markitect-main
|
||||
|
||||
# List all capabilities
|
||||
make capabilities-list
|
||||
|
||||
@@ -9,7 +9,7 @@ MarkiTect is a markdown processing toolkit with transclusion, schema validation,
|
||||
## Current Directory Structure
|
||||
|
||||
```
|
||||
markitect_project/
|
||||
markitect-main/
|
||||
├── markitect/ # Main package
|
||||
│ ├── [34 root-level .py files] # Core functionality (see below)
|
||||
│ ├── assets/ # Asset discovery, management, caching (21 files)
|
||||
|
||||
@@ -8,7 +8,7 @@ MarkiTect uses a **capabilities-based architecture** where functionality is orga
|
||||
|
||||
### 1. **Separation of Concerns**
|
||||
|
||||
**Critical Rule:** The main repository (`markitect_project`) **MUST NOT** directly modify capability code.
|
||||
**Critical Rule:** The main repository (`markitect-main`) **MUST NOT** directly modify capability code.
|
||||
|
||||
- ✅ **DO**: Use capabilities as dependencies
|
||||
- ✅ **DO**: Configure capabilities through documented interfaces
|
||||
@@ -28,7 +28,7 @@ MarkiTect uses a **capabilities-based architecture** where functionality is orga
|
||||
Capabilities are integrated as **git submodules**, not regular directories:
|
||||
|
||||
```
|
||||
markitect_project/
|
||||
markitect-main/
|
||||
├── .gitmodules # Submodule configuration
|
||||
├── capabilities/
|
||||
│ ├── testdrive-jsui/ # Git submodule → separate repo
|
||||
@@ -80,8 +80,8 @@ engine.render_document(content, mode='edit', config=config)
|
||||
|
||||
#### Main Repository Session
|
||||
```bash
|
||||
# In markitect_project/
|
||||
cd /home/worsch/markitect_project
|
||||
# In markitect-main/
|
||||
cd /home/worsch/markitect-main
|
||||
|
||||
# Main repo tasks:
|
||||
# - Integrate capabilities
|
||||
@@ -93,7 +93,7 @@ cd /home/worsch/markitect_project
|
||||
#### Capability Session
|
||||
```bash
|
||||
# In capability repository
|
||||
cd /home/worsch/markitect_project/capabilities/testdrive-jsui
|
||||
cd /home/worsch/markitect-main/capabilities/testdrive-jsui
|
||||
|
||||
# OR clone separately
|
||||
git clone http://gitea/coulomb/testdrive-jsui.git
|
||||
@@ -122,7 +122,7 @@ cd testdrive-jsui
|
||||
|
||||
2. **Update main project** (different Claude instance)
|
||||
```bash
|
||||
cd /home/worsch/markitect_project
|
||||
cd /home/worsch/markitect-main
|
||||
git submodule update --remote capabilities/testdrive-jsui
|
||||
git commit -m "chore: update testdrive-jsui submodule"
|
||||
```
|
||||
@@ -139,7 +139,7 @@ When a capability releases a new version:
|
||||
|
||||
```bash
|
||||
# In main repo
|
||||
cd /home/worsch/markitect_project
|
||||
cd /home/worsch/markitect-main
|
||||
|
||||
# Update specific capability
|
||||
cd capabilities/testdrive-jsui
|
||||
@@ -160,7 +160,7 @@ git commit -am "chore: update all capabilities"
|
||||
# http://gitea/coulomb/new-capability
|
||||
|
||||
# 2. Add as submodule to main repo
|
||||
cd /home/worsch/markitect_project
|
||||
cd /home/worsch/markitect-main
|
||||
git submodule add http://gitea/coulomb/new-capability.git capabilities/new-capability
|
||||
|
||||
# 3. Add dependency to pyproject.toml
|
||||
@@ -324,7 +324,7 @@ def test_testdrive_jsui_integration():
|
||||
1. **Create separate git repo**
|
||||
```bash
|
||||
cd /tmp
|
||||
cp -r markitect_project/capabilities/capability-name capability-name
|
||||
cp -r markitect-main/capabilities/capability-name capability-name
|
||||
cd capability-name
|
||||
git init
|
||||
git add .
|
||||
@@ -335,7 +335,7 @@ def test_testdrive_jsui_integration():
|
||||
|
||||
2. **Remove from main repo**
|
||||
```bash
|
||||
cd markitect_project
|
||||
cd markitect-main
|
||||
git rm -rf capabilities/capability-name
|
||||
git commit -m "chore: remove capability-name for submodule conversion"
|
||||
```
|
||||
|
||||
203
docs/composition-guide.md
Normal file
@@ -0,0 +1,203 @@
|
||||
# Infospace Composition Guide
|
||||
|
||||
One completed, viable infospace can be reused as a **discipline** for
|
||||
another infospace — a lens applied to a different topic. This guide
|
||||
explains how composition works and walks through the live
|
||||
`examples/supply-chain-vsm/` reference.
|
||||
|
||||
---
|
||||
|
||||
## What composition means
|
||||
|
||||
An **infospace** is a directory of typed entities governed by
|
||||
`infospace.yaml`. Its entities and relations describe a specific topic
|
||||
(for example, Adam Smith's *Wealth of Nations*).
|
||||
|
||||
A **discipline** is an infospace declared as a reusable analytical
|
||||
framework by another infospace. When infospace B binds infospace A as a
|
||||
discipline:
|
||||
|
||||
1. B's entities can reference A's entities in `## WoN Concept` (or
|
||||
equivalent) sections.
|
||||
2. Properties A has already computed on its entities — such as VSM system
|
||||
placement — become available to B by transitivity through the mapping.
|
||||
3. B can impose its own viability thresholds independently of A's. The two
|
||||
infospaces each pass or fail viability on their own terms.
|
||||
|
||||
The binding is declarative: a relative path in `infospace.yaml` plus a
|
||||
display name. No code. No import. The discipline is looked up on disk at
|
||||
the declared path when B's commands run.
|
||||
|
||||
---
|
||||
|
||||
## The viability pre-condition
|
||||
|
||||
Binding a non-viable infospace as a discipline is a mistake: a framework
|
||||
that fails its own thresholds is not a stable reference frame. Before
|
||||
binding, confirm the candidate discipline is viable:
|
||||
|
||||
```bash
|
||||
cd examples/infospace-with-history
|
||||
markitect infospace viability
|
||||
```
|
||||
|
||||
```
Metric                    Value     Threshold   Status
---------------------------------------------------------------
redundancy_ratio          0.0061    max=0.1     PASS
coverage_ratio            0.6190    min=0.4     PASS
coherence_components      0.0000    max=3       PASS
consistency_cycles        0.0000    max=0       PASS
granularity_entropy       2.6748    min=1.0     PASS
per_entity_mean           3.9556    min=3.5     PASS

Viable: YES (6/6 thresholds met)
```
|
||||
|
||||
If the discipline is not viable, fix it first (see
|
||||
`examples/infospace-with-history/docs/advanced-usage.md` §4 for triaging
|
||||
low scorers).
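
The viability table can be read as a set of min/max bounds, one per metric. The following is a minimal sketch of that reading, not MarkiTect's actual implementation: the `Threshold` class and `viability_report` function are hypothetical names, and only the metric values and limits are taken from the output shown in this guide.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Threshold:
    """Hypothetical threshold record: a value must lie within [minimum, maximum]."""
    minimum: Optional[float] = None
    maximum: Optional[float] = None

    def passes(self, value: float) -> bool:
        # A missing bound means that side is unconstrained.
        if self.minimum is not None and value < self.minimum:
            return False
        if self.maximum is not None and value > self.maximum:
            return False
        return True


# Values and limits copied from the viability output in this guide.
METRICS: Dict[str, float] = {
    "redundancy_ratio": 0.0061,
    "coverage_ratio": 0.6190,
    "coherence_components": 0.0,
    "consistency_cycles": 0.0,
    "granularity_entropy": 2.6748,
    "per_entity_mean": 3.9556,
}
THRESHOLDS: Dict[str, Threshold] = {
    "redundancy_ratio": Threshold(maximum=0.1),
    "coverage_ratio": Threshold(minimum=0.4),
    "coherence_components": Threshold(maximum=3),
    "consistency_cycles": Threshold(maximum=0),
    "granularity_entropy": Threshold(minimum=1.0),
    "per_entity_mean": Threshold(minimum=3.5),
}


def viability_report(
    metrics: Dict[str, float], thresholds: Dict[str, Threshold]
) -> Tuple[Dict[str, bool], bool]:
    """Return per-metric pass/fail flags and the overall viability flag."""
    status = {name: thresholds[name].passes(value) for name, value in metrics.items()}
    return status, all(status.values())


if __name__ == "__main__":
    status, viable = viability_report(METRICS, THRESHOLDS)
    for name, ok in status.items():
        print(f"{name:<26}{'PASS' if ok else 'FAIL'}")
    met = sum(status.values())
    print(f"Viable: {'YES' if viable else 'NO'} ({met}/{len(status)} thresholds met)")
```

Under this reading a single failing metric makes the whole infospace non-viable, which matches the "6/6 thresholds met" summary line.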
|
||||
|
||||
---
|
||||
|
||||
## Example — how `supply-chain-vsm` binds WoN
|
||||
|
||||
The supply-chain infospace declares WoN as a discipline in its
|
||||
`infospace.yaml`:
|
||||
|
||||
```yaml
|
||||
topic:
|
||||
name: "Modern Supply Chain Management"
|
||||
domain: "Operations Management"
|
||||
sources: artifacts/sources/
|
||||
|
||||
disciplines:
|
||||
- name: "Wealth of Nations"
|
||||
path: ../infospace-with-history
|
||||
```
|
||||
|
||||
The binding is a **relative path**, so the two infospaces travel together
|
||||
(they can be moved as a pair without breaking the link).
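
To illustrate why a relative binding survives a move, here is a small sketch. It assumes the path is resolved against the directory that holds `infospace.yaml`; `resolve_discipline` is a hypothetical helper and the `archive/2026` location is made up for the example.

```python
import posixpath


def resolve_discipline(infospace_dir: str, relative_path: str) -> str:
    """Resolve a discipline path relative to the binding infospace's directory.

    Assumption: resolution is relative to the directory containing
    infospace.yaml. This is an illustration, not MarkiTect's own resolver.
    """
    return posixpath.normpath(posixpath.join(infospace_dir, relative_path))


# Original layout used in this guide.
print(resolve_discipline("examples/supply-chain-vsm", "../infospace-with-history"))

# Both infospaces moved together to a hypothetical new parent directory:
# the same relative path still points at the sibling.
print(resolve_discipline("archive/2026/supply-chain-vsm", "../infospace-with-history"))
```

Moving only one of the two directories would break the link, since the sibling would no longer sit next to the binding infospace.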
|
||||
|
||||
Verify the binding resolves and the discipline is viable:
|
||||
|
||||
```bash
|
||||
cd examples/supply-chain-vsm
|
||||
markitect infospace disciplines
|
||||
```
|
||||
|
||||
```
|
||||
Name Entities Viable Path
|
||||
----------------------------------------------------------------------
|
||||
Wealth of Nations 988 YES ../infospace-with-history
|
||||
```
|
||||
|
||||
Each supply-chain entity then carries a `## WoN Concept` section
|
||||
mapping it to exactly one WoN entity. The consolidated mapping files
|
||||
(`output/mappings/*-mappings.md`) record the pairing, rationale, and a
|
||||
conceptual-continuity rating (Strong / Moderate / Weak):
|
||||
|
||||
| Supply Chain Entity          | WoN Concept                      | Strength | VSM   |
|------------------------------|----------------------------------|----------|-------|
| Demand Signal                | Effectual Demand                 | Strong   | S2    |
| Vendor-Managed Inventory     | Division of Labour               | Strong   | S1/S2 |
| Just-in-Time Inventory       | Circulating Capital              | Strong   | S1/S3 |
| Bullwhip Effect              | Natural Price as Central Price   | Moderate | S2    |
| Safety Stock                 | Accumulation of Stock            | Moderate | S3    |
|
||||
|
||||
Because each WoN entity already has a VSM system placement (S1–S5), the
|
||||
supply-chain entities inherit a VSM position by transitivity through
|
||||
their mapping — without supply-chain-vsm needing its own VSM reference.
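
The inheritance described here amounts to a two-step lookup. The sketch below uses plain dictionaries as stand-ins for the entity files; the names `WON_VSM_PLACEMENT`, `SUPPLY_CHAIN_TO_WON`, and `inherited_vsm` are hypothetical, and the placements assume each WoN concept carries the VSM value shown in the mapping table.

```python
from typing import Dict

# Assumed VSM placement of each WoN entity (taken from the mapping table rows).
WON_VSM_PLACEMENT: Dict[str, str] = {
    "Effectual Demand": "S2",
    "Division of Labour": "S1/S2",
    "Circulating Capital": "S1/S3",
    "Natural Price as Central Price": "S2",
    "Accumulation of Stock": "S3",
}

# Each supply-chain entity maps to exactly one WoN entity.
SUPPLY_CHAIN_TO_WON: Dict[str, str] = {
    "Demand Signal": "Effectual Demand",
    "Vendor-Managed Inventory": "Division of Labour",
    "Just-in-Time Inventory": "Circulating Capital",
    "Bullwhip Effect": "Natural Price as Central Price",
    "Safety Stock": "Accumulation of Stock",
}


def inherited_vsm(
    entity: str, mapping: Dict[str, str], placements: Dict[str, str]
) -> str:
    """Return the VSM placement an entity inherits through its discipline mapping."""
    won_concept = mapping[entity]   # step 1: supply-chain entity -> WoN concept
    return placements[won_concept]  # step 2: WoN concept -> VSM system


for name in SUPPLY_CHAIN_TO_WON:
    print(name, "->", inherited_vsm(name, SUPPLY_CHAIN_TO_WON, WON_VSM_PLACEMENT))
```

Because the supply-chain side stores only the mapping, a corrected placement in WoN would flow through without editing any supply-chain entity.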
|
||||
|
||||
---
|
||||
|
||||
## Creating a new infospace that binds an existing one
|
||||
|
||||
Step-by-step, using WoN as the discipline for a hypothetical "Modern
|
||||
Monetary Policy" infospace:
|
||||
|
||||
### 1. Start from the target topic
|
||||
|
||||
```bash
|
||||
mkdir -p examples/monetary-policy/artifacts/sources
|
||||
cd examples/monetary-policy
|
||||
markitect infospace init
|
||||
```
|
||||
|
||||
### 2. Declare the discipline in `infospace.yaml`
|
||||
|
||||
```yaml
|
||||
topic:
|
||||
name: "Modern Monetary Policy"
|
||||
domain: "Macroeconomics"
|
||||
sources: artifacts/sources/
|
||||
|
||||
disciplines:
|
||||
- name: "Wealth of Nations"
|
||||
path: ../infospace-with-history
|
||||
```
|
||||
|
||||
Alternatively, bind imperatively after `init`:
|
||||
|
||||
```bash
|
||||
markitect infospace bind-discipline ../infospace-with-history --name "Wealth of Nations"
|
||||
```
|
||||
|
||||
### 3. Set your own viability thresholds
|
||||
|
||||
Copy the `viability:` block from a reference infospace and tune the
|
||||
numbers to the scale and maturity of your topic. A smaller infospace
|
||||
(50 entities, not 988) may need laxer `coverage_ratio` and stricter
|
||||
`redundancy_ratio`.
|
||||
|
||||
### 4. Verify the binding
|
||||
|
||||
```bash
|
||||
markitect infospace disciplines
|
||||
```
|
||||
|
||||
If `Viable` is `NO`, stop and fix the discipline before continuing.
|
||||
|
||||
### 5. Reference discipline entities in your own entities
|
||||
|
||||
For each entity in the new infospace, add a `## <Discipline> Concept`
|
||||
section that names the WoN entity the concept maps to, plus a rationale.
|
||||
The exact section heading is configured per schema — see
|
||||
`schemas/won-mapping-schema-v1.0.md` in `supply-chain-vsm` for the
|
||||
template used there.
|
||||
|
||||
### 6. Run checks and evaluate
|
||||
|
||||
```bash
|
||||
markitect infospace check
|
||||
markitect infospace evaluate --provider openrouter
|
||||
markitect infospace eval-summary --update-metrics
|
||||
markitect infospace viability
|
||||
```
|
||||
|
||||
The new infospace passes or fails viability independently of WoN.
|
||||
|
||||
---
|
||||
|
||||
## Why composition, not inclusion?
|
||||
|
||||
An alternative would be to copy WoN entities directly into the target
|
||||
infospace. Composition avoids that by design:
|
||||
|
||||
- **One source of truth** — if WoN is refined, every infospace that binds
|
||||
it picks up the improvement on the next run without a sync step.
|
||||
- **Separation of concerns** — each infospace owns its own schema,
|
||||
thresholds, and entity set. Changing the target topic cannot pollute
|
||||
the discipline.
|
||||
- **Bounded dependency** — the binding is a path, so the coupling is
|
||||
visible in one place (`infospace.yaml`) and easy to remove.
|
||||
|
||||
---
|
||||
|
||||
## See also
|
||||
|
||||
- `examples/supply-chain-vsm/README.md` — the full reference composition.
|
||||
- `examples/supply-chain-vsm/output/mappings/` — consolidated mapping
|
||||
files showing the rationale and strength rating for each pairing.
|
||||
- `examples/infospace-with-history/docs/advanced-usage.md` — patterns for
|
||||
maintaining the discipline once it is in use.
|
||||
141
docs/successor-gap-assessment.md
Normal file
@@ -0,0 +1,141 @@
|
||||
# markitect-main → Successor Repos: Gap Assessment
|
||||
|
||||
**Date:** 2026-05-23
|
||||
**Author:** Claude (custodian session)
|
||||
**Status:** Draft — awaiting Bernd's decisions on items A/B/C below
|
||||
|
||||
## Purpose
|
||||
|
||||
Bernd is retiring `markitect-main` and has transferred most functionality to
|
||||
sibling repos. This document identifies what was provided by `markitect-main`
|
||||
that is **not addressed** in those successors, and flags candidates that may
|
||||
not fit any successor's intent.
|
||||
|
||||
## Successor Ecosystem (5 repos, not 3)
|
||||
|
||||
| Repo | Role |
|---|---|
| `markitect-tool` | Markdown syntax layer + structured-document primitives; defines source-adapter and render-adapter contracts. CLI: `mkt`. |
| `kontextual-engine` | Headless knowledge operations engine: artifacts, collections, persistence, relationships, workflow runs/manifests, query, quality/assessment, API. |
| `infospace-bench` | Application layer — concrete infospaces, evaluation methodology, reference pilots. |
| `markitect-filter` | Source-format ingestion adapters (`source.epub3`, `source.pdf`) implementing the markitect-tool source-adapter contract. |
| `markitect-quarkdown` | Render/export adapter — implements the markitect-tool render-adapter contract via Quarkdown. |
|
||||
|
||||
## Method
|
||||
|
||||
Analysis is grounded in each successor's own assessment docs (recent, May 2026):
|
||||
|
||||
- `markitect-tool/docs/markitect-main-scope-assessment.md`
|
||||
- `kontextual-engine/docs/markitect-main-scope-assessment.md`
|
||||
- `kontextual-engine/docs/system-layer-extraction-inventory.md`
|
||||
- `kontextual-engine/docs/system-layer-migration-backlog.md`
|
||||
- `infospace-bench/docs/markitect-main-scope-assessment.md`
|
||||
- `infospace-bench/docs/legacy-infospace-feature-inventory.md`
|
||||
- `infospace-bench/docs/replacement-acceptance-matrix.md`
|
||||
|
||||
Cross-checked against actual `markitect-main` module sizing (Python LOC) and
|
||||
`__init__.py` docstrings.
|
||||
|
||||
**Confidence:** These successor docs are authoritative on *intent*. They have
|
||||
**not** been line-verified to confirm every "reimplement"-classified item
|
||||
actually landed in the successor. Where verification matters, it's flagged.
|
||||
|
||||
---
|
||||
|
||||
## A. Doesn't fit any successor's intent — needs a new home or explicit retirement
|
||||
|
||||
These are explicitly pushed away by tool/engine/bench and are unrelated to
|
||||
filter/quarkdown.
|
||||
|
||||
| markitect-main area | LOC | What it is | Status |
|---|---|---|---|
| `markitect/finance/` | ~8,100 | Cost-tracking system: cost items, period allocation to issues, financial reports, audit trails | **Orphan.** markitect-main's own SCOPE.md lists "financial transactions" as out-of-scope. Belongs with issue/project-ops, not knowledge tooling. |
| `issue_tracker/` + `_issue-tracking/` + `.issues/` | ~1,200 | Issue tracking (finance allocates costs to these issues) | **Orphan to the five** — but likely already superseded by the `issue-facade` capability / `use-issues` skill. **Verify before retiring.** |
| `markitect/profile/` | ~1,600 | User-profile CRUD, multi-profile, DB-backed | **Orphan.** Unrelated to all five. (Distinct from quarkdown's *render* "profile".) |
| `markitect/production/` | ~3,800 | Deployment-readiness validation, cross-platform checks, perf benchmarking | Engine keeps only "structured error/audit *ideas*". Deployment-validation bulk is orphan. |
| `tools/`, `services/`, gitea/tddai glue | ~5,500 | Project-ops tooling | Out-of-scope everywhere. |
| `markitect/legacy/` + `legacy_compat.py` | ~2,700 | Backward-compat shims | Retire by definition. |
|
||||
|
||||
## B. Rendering / asset / plugin layer — only *partially* covered, real residual gap
|
||||
|
||||
**This is the most consequential gap.** `SCOPE.md` lists "Rendering: markdown
|
||||
→ interactive HTML via plugin system (testdrive-jsui)" as an in-scope
|
||||
capability of markitect-main.
|
||||
|
||||
| Area | LOC | Covered? |
|---|---|---|
| `markitect/plugins/` (generic processor/formatter/validator/exporter plugin system) | ~8,000 | **No.** tool defines a render-adapter *contract* and an *extension* point, but the general plugin runtime isn't carried. |
| `markitect/assets/` (content-addressable asset store, dedup, `.mdpkg` ZIP packaging, symlink handling) + `asset_registry.json` (277 KB) | ~6,000 | **No.** Bench says "leave behind unless a concrete export needs assets." |
| Interactive-HTML / testdrive-jsui rendering, `static/`, `themes/`, `templates/document.html`, JS UI | — | **Partial only.** quarkdown covers a *Quarkdown* export path; the interactive-HTML / JS-UI path has no home. |
|
||||
|
||||
**Decision needed:** spin these into a dedicated render/asset repo (sibling to
|
||||
quarkdown), fold the asset store into one of the existing repos, or retire the
|
||||
interactive-HTML path.
|
||||
|
||||
## C. The other "Information Space" lineage — `markitect/spaces/` (~11,000 LOC)
|
||||
|
||||
**Distinct from `markitect/infospace/`** (which infospace-bench inherited).
|
||||
`spaces/` is an older/parallel abstraction with features bench did *not* take:
|
||||
|
||||
- event-driven change tracking & notifications
|
||||
- persistent transclusion context with cross-space references
|
||||
- bidirectional directory synchronization
|
||||
- HTML rendering of spaces with caching/themes
|
||||
|
||||
Engine takes generic persistence concepts and bench takes infospace semantics,
|
||||
but **these specific `spaces/` behaviors (bidirectional sync, event
|
||||
notifications, cross-space transclusion context) aren't mapped anywhere.**
|
||||
|
||||
Likely intended as dead/superseded — but 11k LOC warrants an explicit "retire
|
||||
vs salvage" call.
|
||||
|
||||
## D. Declined-by-design (confirm retirement, don't re-extract)
|
||||
|
||||
| Area | LOC | Disposition |
|---|---|---|
| `markitect/graphql/` | ~4,000 | All three explicitly declined GraphQL ("evidence of API need, not a commitment"). |
| `markitect/query_paradigms/` | ~3,500 | Engine/tool keep the *QueryResult envelope* concept but say "do not port the registry wholesale." |
| `markitect/proxy/` | ~870 | Non-markdown→md proxy with checksum/freshness tracking. **Overlaps markitect-filter.** Freshness/staleness-tracking mechanism may be worth checking against bench's deferred "stale-mappings." |
| `capabilities/` (top-level) | ~8,300 | Capability-packaging architecture; partially maps to tool (schema generation) but the packaging approach itself isn't carried. |
|
||||
|
||||
---
|
||||
|
||||
## What this means
|
||||
|
||||
The successors are, by their own assessments, **near complete for the
|
||||
in-scope core** (parsing/schema → tool; persistence/workflow → engine;
|
||||
infospace lifecycle → bench; ingestion → filter; one render path →
|
||||
quarkdown). The truly unaddressed functionality is almost entirely the stuff
|
||||
markitect-main accreted **beyond** its stated scope: finance, issue tracking,
|
||||
user profiles, production/deployment validation, the asset/plugin/interactive-HTML
|
||||
rendering stack, and the older `spaces/` abstraction.
|
||||
|
||||
## Decisions for Bernd
|
||||
|
||||
Three live decisions, not a long extraction backlog:
|
||||
|
||||
### Decision 1 — Render/asset stack (Section B)
|
||||
The one with genuine product value left.
|
||||
- **Option 1a:** new repo (sibling to quarkdown) for plugin runtime + asset store + interactive-HTML
|
||||
- **Option 1b:** fold the asset store into an existing repo (most likely markitect-tool, behind a flag); retire interactive-HTML
|
||||
- **Option 1c:** retire the interactive-HTML path entirely; trust quarkdown export as the single render story
|
||||
|
||||
### Decision 2 — `markitect/spaces/` (Section C)
|
||||
- **Option 2a:** salvage bidirectional-sync / event-tracking / cross-space transclusion into engine (engine has the persistence story to support it)
|
||||
- **Option 2b:** retire wholesale as superseded by infospace
|
||||
|
||||
### Decision 3 — Project-ops cluster (Section A: finance + issues + profile)
|
||||
- **Option 3a:** confirm `issue-facade` already replaces `issue_tracker/` + `finance/`; retire both
|
||||
- **Option 3b:** identify a home for any pieces worth keeping
|
||||
|
||||
---
|
||||
|
||||
## Suggested verification before deciding
|
||||
|
||||
If verification matters before committing:
|
||||
|
||||
- **For Decision 1:** grep the five repos for any render/asset adapter that already covers the HTML path beyond Quarkdown.
|
||||
- **For Decision 2:** check whether engine's `OperationRun` + collection model can express bidirectional-sync semantics, or whether new primitives would be needed.
|
||||
- **For Decision 3:** confirm whether `issue-facade` truly replaces `issue_tracker/` + `finance/` end-to-end.
|
||||
|
||||
Happy to do any of these focused passes when you're ready to decide.
|
||||
@@ -117,7 +117,7 @@ This graph enables:
|
||||
|
||||
```bash
|
||||
# Ensure MarkiTect is installed
|
||||
cd /path/to/markitect_project
|
||||
cd /path/to/markitect-main
|
||||
pip install -e .
|
||||
```
|
||||
|
||||
|
||||
230
examples/infospace-with-history/docs/advanced-usage.md
Normal file
@@ -0,0 +1,230 @@
|
||||
# Advanced Usage — Wealth of Nations Infospace
|
||||
|
||||
Patterns for working with the WoN infospace (988 entities) after the initial
|
||||
pipeline run. Every command in this file has been run against the actual
|
||||
infospace at the time of writing (2026-04-21); output shapes are excerpted
|
||||
verbatim.
|
||||
|
||||
All commands assume `cwd = examples/infospace-with-history` and the
|
||||
`markitect-venv` Python environment.
|
||||
|
||||
---
|
||||
|
||||
## 1. Incremental evaluation — add entities after the initial run
|
||||
|
||||
`markitect infospace evaluate` writes one file per entity under
|
||||
`output/evaluations/<slug>.md`. It skips any entity whose evaluation file
|
||||
already exists, so re-running after adding a new entity processes only the
|
||||
new one.
|
||||
|
||||
```bash
|
||||
# Add a new entity file
|
||||
vim output/entities/new-concept.md
|
||||
|
||||
# Evaluate only the new entity (explicit)
|
||||
markitect infospace evaluate --entity new-concept --provider openrouter
|
||||
|
||||
# Or re-run the whole pass — existing 988 are skipped, only the new file hits the LLM
|
||||
markitect infospace evaluate --provider openrouter
|
||||
```
|
||||
|
||||
**How skip detection works.** Evaluation filenames use underscore-normalised
slugs in which an apostrophe becomes an underscore, leaving an `_s_` sequence
(the `farmers-capital` entity is evaluated as `farmer_s_capital.md`). If a new
entity's slug collides with an existing evaluation filename under this
normalisation, its evaluation is skipped. To be sure an entity was picked up,
compare the two counts:
|
||||
|
||||
```bash
|
||||
# Count entities vs evaluations
|
||||
ls output/entities/*.md | grep -Ev 'book-[0-9]+-(chapter-[0-9]+|introduction)-' | wc -l
|
||||
ls output/evaluations/*.md | wc -l
|
||||
```
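
The same comparison can be scripted when it needs to run as part of a larger check. This is a rough Python equivalent of the two shell pipelines, reusing the exclusion pattern from the `grep -Ev` call; `count_entities_and_evaluations` is a hypothetical helper, not a MarkiTect API.

```python
import re
from pathlib import Path
from typing import Tuple

# Same exclusion as the grep -Ev pattern: per-chapter aggregate files are not entities.
CHAPTER_FILE = re.compile(r"book-[0-9]+-(chapter-[0-9]+|introduction)-")


def count_entities_and_evaluations(root: Path) -> Tuple[int, int]:
    """Count entity files (minus chapter aggregates) and evaluation files under root."""
    entities = [
        path
        for path in (root / "output" / "entities").glob("*.md")
        if not CHAPTER_FILE.search(path.name)
    ]
    evaluations = list((root / "output" / "evaluations").glob("*.md"))
    return len(entities), len(evaluations)
```

Calling `count_entities_and_evaluations(Path("."))` from the infospace directory returns the two counts as a pair; a gap between them points at entities that have not been evaluated or at slug collisions.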
|
||||
|
||||
---
|
||||
|
||||
## 2. Re-evaluating after guideline changes
|
||||
|
||||
`evaluate` has no `--force` flag; re-evaluation requires deleting the
|
||||
existing file first.
|
||||
|
||||
```bash
|
||||
# Re-evaluate a single entity after updating the evaluation rubric
|
||||
rm output/evaluations/accumulation_of_stock.md
|
||||
markitect infospace evaluate --entity accumulation-of-stock --provider openrouter
|
||||
|
||||
# Re-evaluate a whole chapter
|
||||
ls output/entities/book-1-chapter-06-entities.md # see which entities the chapter produced
|
||||
# Map chapter entities to eval filenames (apostrophe/underscore normalisation) and rm them
|
||||
```
|
||||
|
||||
After re-evaluating, refresh the aggregate:
|
||||
|
||||
```bash
|
||||
markitect infospace eval-summary --update-metrics
|
||||
```
|
||||
|
||||
This merges `per_entity_mean` into `output/metrics/metrics.yaml` so the next
|
||||
`markitect infospace viability` check reflects the new scores.
|
||||
|
||||
---
|
||||
|
||||
## 3. Interpreting per-entity score distributions
|
||||
|
||||
`eval-summary` shows the mean for each of the five evaluation dimensions
|
||||
plus the overall range:
|
||||
|
||||
```
|
||||
$ markitect infospace eval-summary
|
||||
Evaluation summary — 985 entities evaluated
|
||||
|
||||
Dimension Mean
|
||||
--------------------------------------
|
||||
overall 3.956
|
||||
definition_precision 3.620
|
||||
domain_placement 4.559
|
||||
explanatory_value 3.936
|
||||
source_grounding 4.358
|
||||
vsm_relevance 3.305
|
||||
|
||||
Range: 1.00 – 4.80
|
||||
```
|
||||
|
||||
Interpretation:
|
||||
- `overall` above the 3.5 viability threshold → the collection passes
|
||||
`per_entity_mean`.
|
||||
- The lowest dimension (`vsm_relevance` = 3.305) is the weakest signal. If
|
||||
the collection is meant to be VSM-grounded, this is the dimension most
|
||||
worth improving (via sharper entity definitions or schema changes).
|
||||
- A wide range (1.00 – 4.80) tells you there are outliers at both ends —
|
||||
worth triaging (see pattern 4).
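
The per-dimension means can also be recomputed directly from the evaluation frontmatter, which is useful for a subset of entities. This is a sketch of one way to do that, not `eval-summary`'s implementation: it assumes each score is written as a `- name:` line followed directly by a `value:` line, as in the evaluation files in this example. `dimension_means` and `weakest_dimension` are hypothetical names.

```python
import re
from collections import defaultdict
from statistics import mean

# Matches a "- name: <dimension>" line followed by its "value: <number>" line.
SCORE_RE = re.compile(r"-\s+name:\s*(\w+)\s*\n\s*value:\s*([0-9.]+)")


def dimension_means(frontmatters):
    """Average each dimension's value across evaluation frontmatter strings."""
    values = defaultdict(list)
    for text in frontmatters:
        for name, value in SCORE_RE.findall(text):
            values[name].append(float(value))
    return {name: round(mean(vals), 3) for name, vals in values.items()}


def weakest_dimension(means):
    """Return the (name, mean) pair with the lowest mean."""
    return min(means.items(), key=lambda item: item[1])


# Two trimmed frontmatter excerpts standing in for real evaluation files.
sample = [
    "scores:\n- name: vsm_relevance\n  value: 3.0\n- name: source_grounding\n  value: 5.0\n",
    "scores:\n- name: vsm_relevance\n  value: 4.0\n- name: source_grounding\n  value: 3.0\n",
]
means = dimension_means(sample)
print(means)
print(weakest_dimension(means))
```

To run it on a real collection, pass the text of each file in `output/evaluations/` instead of `sample`.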
|
||||
|
||||
---
|
||||
|
||||
## 4. Triaging low scorers
|
||||
|
||||
`markitect infospace entities --by-type` prints each entity's star score
|
||||
in-line:
|
||||
|
||||
```
|
||||
$ markitect infospace entities --by-type | head
|
||||
=== Element (315 entities) ===
|
||||
active_and_productive_stock Accumulation S1 ★4.6
|
||||
advanced_state_of_society General Theory S5
|
||||
agio_of_bank_money Exchange S2 ★4.8
|
||||
```
|
||||
|
||||
Entities with no `★` have no evaluation yet. To list the lowest-scoring
|
||||
entities across the whole collection:
|
||||
|
||||
```bash
|
||||
# Extract overall_score from every evaluation file and sort ascending
|
||||
for f in output/evaluations/*.md; do
|
||||
score=$(awk '/^overall_score:/ {print $2; exit}' "$f")
|
||||
printf "%s\t%s\n" "$score" "$(basename "$f" .md)"
|
||||
done | sort -n | head -20
|
||||
```
|
||||
|
||||
The 20 lowest scorers are the natural triage list — inspect their
|
||||
`output/entities/<slug>.md` and evaluation rationales to decide whether to
|
||||
refine the entity, merge it with a better-formed neighbour, or drop it.
|
||||
|
||||
---
|
||||
|
||||
## 5. Reading and acting on collection-check output
|
||||
|
||||
`markitect infospace check` runs five concerns (C1–C5). Use `--concern` to
|
||||
focus on one and `--json` for machine-readable output:
|
||||
|
||||
```bash
|
||||
# Redundancy — which pairs of entities are suspiciously similar?
|
||||
markitect infospace check --concern redundancy --json
|
||||
```
|
||||
|
||||
```json
|
||||
{
|
||||
"redundancy": {
|
||||
"concern": "C1",
|
||||
"redundancy_ratio": 0.0061,
|
||||
"similar_pairs": [
|
||||
{"entity_a": "bank_economic_contribution_metrics",
|
||||
"entity_b": "bank_economic_development_metrics",
|
||||
"similarity": 1.0, "method": "word_overlap"},
|
||||
{"entity_a": "economic_system_objectives",
|
||||
"entity_b": "economic_system_purpose",
|
||||
"similarity": 0.9394, "method": "word_overlap"}
|
||||
]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Acting on this:
|
||||
- **Similarity = 1.0** is almost certainly a duplicate — pick one slug and
|
||||
merge or delete the other.
|
||||
- **0.85–0.99** usually means two entities genuinely cover the same idea
|
||||
with slight phrasing differences. Merging is the cleanest fix.
|
||||
- **< 0.85** usually represents legitimate adjacent concepts — leave as-is
|
||||
unless the definition rubric says otherwise.
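
The three bands above can be applied mechanically to the `--json` output. A minimal sketch, assuming the report has the shape shown in the example (`redundancy.similar_pairs` with `entity_a`, `entity_b`, `similarity`); `bucket_pairs` and its `merge_floor` parameter are hypothetical names for illustration.

```python
import json


def bucket_pairs(report, merge_floor=0.85):
    """Split similar_pairs into duplicate / merge / adjacent buckets."""
    buckets = {"duplicate": [], "merge": [], "adjacent": []}
    for pair in report["redundancy"]["similar_pairs"]:
        names = (pair["entity_a"], pair["entity_b"])
        if pair["similarity"] >= 1.0:
            buckets["duplicate"].append(names)
        elif pair["similarity"] >= merge_floor:
            buckets["merge"].append(names)
        else:
            buckets["adjacent"].append(names)
    return buckets


# The same report as in the example output above.
raw = """
{"redundancy": {"concern": "C1", "redundancy_ratio": 0.0061, "similar_pairs": [
  {"entity_a": "bank_economic_contribution_metrics",
   "entity_b": "bank_economic_development_metrics",
   "similarity": 1.0, "method": "word_overlap"},
  {"entity_a": "economic_system_objectives",
   "entity_b": "economic_system_purpose",
   "similarity": 0.9394, "method": "word_overlap"}
]}}
"""
buckets = bucket_pairs(json.loads(raw))
for label, pairs in buckets.items():
    print(label, pairs)
```

The buckets only order the review queue; whether two entities really should be merged still needs a read of both definitions.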
|
||||
|
||||
For coverage and coherence, the pattern is the same: the `--json` output
|
||||
surfaces the specific entities / missing links / disconnected components
|
||||
you need to look at, rather than a bare ratio.
|
||||
|
||||
---
|
||||
|
||||
## 6. Systematic processing of long texts
|
||||
|
||||
For long source material (books, multi-chapter specifications, corpora), the
|
||||
pipeline can produce a clean chapter-by-chapter git history on its own if
|
||||
you let it. The pattern:
|
||||
|
||||
```bash
|
||||
# Process all sources in canonical order, eval and classify per chapter,
|
||||
# snapshot metrics after each chapter.
|
||||
markitect infospace process --all \
|
||||
--provider openrouter \
|
||||
--eval-after-source \
|
||||
--classify-after-source \
|
||||
--check-after-each
|
||||
```
|
||||
|
||||
What you get:
|
||||
|
||||
- **One commit per source file**, not per batch run. The commit message body
|
||||
lists counts by bucket (`entities: +23`, `evaluations: +23`,
|
||||
`classifications: +23`) derived from the actual staged diff, so `git log`
|
||||
reads like the story of the infospace growing.
|
||||
- **Chapter-atomic commits.** `--eval-after-source` and
|
||||
`--classify-after-source` evaluate and classify *only the new entities*
|
||||
from the just-processed source before the commit lands, so each commit is
|
||||
a self-contained chapter snapshot.
|
||||
- **Metrics-per-chapter trail.** `--check-after-each` appends a snapshot to
|
||||
`output/metrics/history.yaml` after every chapter, so `markitect infospace
|
||||
history` later shows the metric trajectory rather than just start/end.
|
||||
|
||||
**Cost tradeoff.** `--eval-after-source` pays LLM latency per chapter rather
|
||||
than amortising it across one bulk batch. It's worth it when you care about
|
||||
the git history or want early quality signal, not when you're bulk-backfilling
|
||||
a known-good corpus.
|
||||
|
||||
**Triage during the run.** While processing, use `markitect infospace
|
||||
chapters` in another shell to see per-source entity/eval/classify counts and
|
||||
mean scores — handy for spotting chapters that under-extracted or evaluated
|
||||
poorly.
|
||||
|
||||
```
|
||||
$ markitect infospace chapters
|
||||
source entities evaluated classified mean_score
|
||||
------------------- -------- --------- ---------- ----------
|
||||
book-1-chapter-01 96 96 79 4.22
|
||||
book-1-chapter-02 16 16 10 4.06
|
||||
…
|
||||
```
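
For scripted triage, `chapters` also accepts `--format json`, which prints one object per source with the keys `source_id`, `entities`, `evaluated`, `classified`, and `mean_score`. The sketch below flags chapters worth a second look. The 3.5 cut-off reuses the viability threshold and applying it per chapter is an assumption; `flag_chapters` is a hypothetical helper, and the third sample row is invented for illustration.

```python
import json


def flag_chapters(rows, min_mean=3.5):
    """Return (source_id, reason) pairs for chapters that need a closer look."""
    flagged = []
    for row in rows:
        if row["entities"] == 0:
            flagged.append((row["source_id"], "no entities extracted"))
        elif row["evaluated"] < row["entities"]:
            flagged.append((row["source_id"], "evaluation incomplete"))
        elif row["mean_score"] is not None and row["mean_score"] < min_mean:
            flagged.append((row["source_id"], "low mean score"))
    return flagged


# Stand-in for the output of `markitect infospace chapters --format json`.
raw = """
[
  {"source_id": "book-1-chapter-01", "entities": 96, "evaluated": 96,
   "classified": 79, "mean_score": 4.22},
  {"source_id": "book-1-chapter-02", "entities": 16, "evaluated": 16,
   "classified": 10, "mean_score": 4.06},
  {"source_id": "book-1-chapter-03", "entities": 12, "evaluated": 9,
   "classified": 0, "mean_score": 3.9}
]
"""
flagged = flag_chapters(json.loads(raw))
print(flagged)
```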
|
||||
|
||||
---
|
||||
|
||||
## See also
|
||||
|
||||
- `METRICS-METHODOLOGY.md` — how each metric is computed.
|
||||
- `docs/composition-guide.md` — using this infospace as a discipline for a
|
||||
different domain.
|
||||
- `docs/performance-notes.md` — observed timings and provider choices.
|
||||
106
examples/infospace-with-history/docs/performance-notes.md
Normal file
@@ -0,0 +1,106 @@
|
||||
# Performance Notes — Wealth of Nations Infospace
|
||||
|
||||
Observed timings, file sizes, and provider choices from the 988-entity WoN
|
||||
example. These are **operational notes**, not a benchmark — numbers come
|
||||
from the actual S3.3 evaluation run (2026-02-23) rather than a controlled
|
||||
experiment.
|
||||
|
||||
---
|
||||
|
||||
## Evaluation batch duration
|
||||
|
||||
The initial evaluation pass produced 985 `output/evaluations/*.md` files:
|
||||
|
||||
- First `evaluated_at`: `2026-02-23T00:11:52`
|
||||
- Last `evaluated_at`: `2026-02-23T06:39:45`
|
||||
- **Total wall time: ~6h 28m**
|
||||
- **Effective throughput: ~2.5 entities/min** (~152 entities/hour)
|
||||
|
||||
Extracted from evaluation frontmatter:
|
||||
```bash
|
||||
grep -h '^evaluated_at:' output/evaluations/*.md | sort | sed -n '1p;$p'
|
||||
```
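
The wall time and throughput figures follow from those two timestamps and the file count. A small sketch of the arithmetic, where `throughput` is a hypothetical helper:

```python
from datetime import datetime


def throughput(first, last, count):
    """Return (elapsed timedelta, entities per minute) for a batch."""
    elapsed = datetime.fromisoformat(last) - datetime.fromisoformat(first)
    per_minute = count / (elapsed.total_seconds() / 60)
    return elapsed, per_minute


elapsed, per_minute = throughput(
    "2026-02-23T00:11:52", "2026-02-23T06:39:45", 985
)
print(elapsed)                            # 6:27:53
print(f"{per_minute:.2f} per minute")     # 2.54 per minute
print(f"{per_minute * 60:.0f} per hour")  # 152 per hour
```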
|
||||
|
||||
Caveats:
|
||||
- This was against OpenRouter's free tier, which applies implicit
|
||||
rate-limiting and occasional retries.
|
||||
- Throughput is not constant — gaps between bursts show up as plateaus
|
||||
when you plot the timestamps.
|
||||
- The batch was not fully parallelised; a tuned concurrent client could
|
||||
likely achieve 2–4× this throughput on a paid OpenRouter tier.
|
||||
|
||||
---
|
||||
|
||||
## Tokens per entity (estimate)
|
||||
|
||||
Direct token counts are not logged in the evaluation files, but the
|
||||
inputs and outputs are on disk:
|
||||
|
||||
- **Input per request**: evaluation schema (~3.7 KB) + entity file
|
||||
(~0.7 KB median) + fixed system prompt ≈ **~1500–2500 tokens in**
|
||||
- **Output per request**: structured evaluation with 5 dimensions and
|
||||
rationales, median eval file 3.6 KB ≈ **~600–800 tokens out**
|
||||
- **Round-trip total**: **~2000–3000 tokens per entity**
|
||||
- **Batch total estimate**: 985 entities × ~2500 tokens ≈ **~2.5M tokens**
|
||||
for the full pass
|
||||
|
||||
The constant per-entity input means the cheapest way to reduce spend on a
|
||||
re-run is to narrow the targeted entities (`--entity <slug>` or
|
||||
`--chapter <n>`), not to shorten the schema.
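
Because the per-entity cost is roughly constant, the batch total scales linearly with the number of entities targeted. A short sketch of that estimate, using the per-entity range above; `batch_tokens` is a hypothetical helper and the 20-entity re-run is an illustrative size, not a measured one.

```python
def batch_tokens(entities, tokens_per_entity):
    """Total tokens for a pass at a given per-entity round-trip cost."""
    return entities * tokens_per_entity


FULL_PASS = 985
for label, per_entity in [("low", 2000), ("mid", 2500), ("high", 3000)]:
    total = batch_tokens(FULL_PASS, per_entity)
    print(f"full pass ({label}): {total:,} tokens")

# Illustrative targeted re-run: 20 entities at the mid estimate.
print(f"20-entity re-run: {batch_tokens(20, 2500):,} tokens")
```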
|
||||
|
||||
---
|
||||
|
||||
## Embedding cache and collection checks
|
||||
|
||||
`markitect infospace check --concern redundancy` supports two similarity
|
||||
backends (see `markitect/infospace/checks/redundancy.py`):
|
||||
|
||||
- **`word_overlap`** — the default, used when no embeddings are provided.
|
||||
Pure-Python set intersection over tokenised entity text. **No LLM calls,
|
||||
no cache needed.** This is what the current WoN check runs.
|
||||
- **`embedding`** — active when a pre-computed `{slug: vector}` mapping is
|
||||
passed in. No persistent on-disk embedding cache exists today; the
|
||||
caller is responsible for computing and supplying the vectors.
|
||||
|
||||
Implication: the 988-entity `check` runs in seconds because it's all
|
||||
word-overlap. Switching to embedding similarity would add an embedding
|
||||
API pass (another ~988 requests) which is currently a manual step
|
||||
outside the CLI.
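
To make the word-overlap idea concrete, here is a minimal sketch that scores pairs by Jaccard similarity over lowercase word sets. The exact tokenisation and formula in `redundancy.py` are not shown here, so Jaccard is an assumption, and `tokens`, `word_overlap`, and `similar_pairs` are hypothetical names. The sample texts are invented.

```python
import re


def tokens(text):
    """Lowercase word tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def word_overlap(a, b):
    """Jaccard similarity of the two token sets (assumed formula)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def similar_pairs(entities, threshold=0.85):
    """Yield (slug_a, slug_b, similarity) for pairs at or above threshold."""
    slugs = sorted(entities)
    for i, a in enumerate(slugs):
        for b in slugs[i + 1:]:
            score = word_overlap(entities[a], entities[b])
            if score >= threshold:
                yield a, b, round(score, 4)


entities = {
    "a": "the purpose of an economic system",
    "b": "The purpose of an economic system.",
    "c": "bank notes payable on demand",
}
pairs = list(similar_pairs(entities))
print(pairs)
```

An all-pairs comparison over 988 entities is 487,578 set operations, which is consistent with a check that needs no cache and no API calls.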
|
||||
|
||||
---
|
||||
|
||||
## Provider choice — recommendation
|
||||
|
||||
For the WoN dataset specifically (text-heavy entities, 5-dimension
|
||||
rubric):
|
||||
|
||||
| Scale | Recommended provider | Rationale |
|
||||
|-----------------------|----------------------------------|-----------|
|
||||
| < 50 entities | `gemini/gemini-2.5-flash` | Fast default; free tier is generous enough; consistent with `markitect llm-check` out of the box. |
|
||||
| 50 – 1000 entities | `openrouter` with a `:free` model (e.g. `arcee-ai/trinity-large-preview:free`) | What the S3.3 batch used; gets through 988 entities in one overnight run without cost. |
|
||||
| > 1000 entities | `openrouter` with a paid small-context model, or `openai` | Free-tier rate limits start to dominate wall time; paying for higher concurrency is cheaper than calendar time. |
|
||||
|
||||
All providers are accepted by `markitect infospace evaluate --provider`.
|
||||
The evaluation schema doesn't assume any provider-specific features.
|
||||
|
||||
Note on provider mixing: if part of a collection is evaluated under one
|
||||
provider/model and the rest under another, `per_entity_mean` can drift
|
||||
slightly (different models calibrate scores differently). For the
|
||||
viability threshold of 3.5 the drift is usually negligible, but for
|
||||
fine-grained outlier analysis prefer a single provider per batch.
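
One way to check for this kind of drift is to group `overall_score` by the `evaluator` field in each evaluation's frontmatter. The sketch below assumes both fields appear as top-level `key: value` lines, as they do in the evaluation files in this example; `means_by_evaluator` is a hypothetical helper, and a missing evaluator is grouped under `"null"`.

```python
import re
from collections import defaultdict
from statistics import mean

FIELD_RE = re.compile(r"^(evaluator|overall_score):\s*(.+)$", re.MULTILINE)


def means_by_evaluator(frontmatters):
    """Mean overall_score per evaluator across frontmatter strings."""
    groups = defaultdict(list)
    for text in frontmatters:
        fields = dict(FIELD_RE.findall(text))
        if "overall_score" in fields:
            evaluator = fields.get("evaluator", "null").strip()
            groups[evaluator].append(float(fields["overall_score"]))
    return {name: round(mean(scores), 3) for name, scores in groups.items()}


# Trimmed frontmatter excerpts standing in for real evaluation files.
sample = [
    "entity_slug: advanced_state_of_society\nevaluator: gemini-2.5-flash\noverall_score: 4.5\n",
    "entity_slug: bank_notes\nevaluator: null\noverall_score: 4.4\n",
    "entity_slug: bank_systemic_risk_management\nevaluator: gemini-2.5-flash-lite\noverall_score: 4.0\n",
]
by_evaluator = means_by_evaluator(sample)
print(by_evaluator)
```

A gap between groups is only suggestive, since different evaluators may also have been given different entities.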
|
||||
|
||||
---
|
||||
|
||||
## What is *not* measured here
|
||||
|
||||
- **End-to-end pipeline time** (entity extraction from raw chapters,
|
||||
classification, relation graph) — only the evaluation phase is timed.
|
||||
- **Memory footprint** — the full in-memory state for 988 entities is
|
||||
small (< 200 MB observed), but not systematically measured.
|
||||
- **Failure/retry rates** — the 985 vs 988 gap is three entities the
|
||||
original run missed (plus one added later); no structured retry log
|
||||
was kept.
|
||||
|
||||
Expanding any of these into a proper benchmark is **out of scope** for
|
||||
the WoN example and should live alongside a synthetic corpus that can be
|
||||
regenerated deterministically.
|
||||
@@ -0,0 +1,28 @@
|
||||
---
|
||||
entity_slug: advanced_state_of_society
|
||||
evaluator: gemini-2.5-flash
|
||||
evaluated_at: '2026-04-21T21:32:17.135192'
|
||||
overall_score: 4.5
|
||||
scores:
|
||||
- name: definition_precision
|
||||
value: 4.0
|
||||
max_value: 5.0
|
||||
rationale: The definition is precise, listing key characteristics like accumulated
|
||||
stock and private property. It clearly distinguishes the concept by contrasting
|
||||
it with earlier economic conditions.
|
||||
- name: source_grounding
|
||||
value: 5.0
|
||||
max_value: 5.0
|
||||
rationale: This entity is deeply grounded in Smith's work, particularly in Book
|
||||
I
|
||||
---
|
||||
|
||||
# Evaluation: Advanced State Of Society
|
||||
|
||||
## definition_precision — 4.0 / 5.0
|
||||
|
||||
The definition is precise, listing key characteristics like accumulated stock and private property. It clearly distinguishes the concept by contrasting it with earlier economic conditions.
|
||||
|
||||
## source_grounding — 5.0 / 5.0
|
||||
|
||||
This entity is deeply grounded in Smith's work, particularly in Book I
|
||||
@@ -0,0 +1,61 @@
|
||||
---
|
||||
entity_slug: bank_notes
|
||||
evaluator: null
|
||||
evaluated_at: '2026-04-21T21:33:16.736926'
|
||||
overall_score: 4.4
|
||||
scores:
|
||||
- name: definition_precision
|
||||
value: 5.0
|
||||
max_value: 5.0
|
||||
rationale: The definition is precise, clearly distinguishing bank notes by their
|
||||
issuer, form, and key characteristics (payable on demand, confidence-based). It
|
||||
avoids circularity and captures a distinct concept.
|
||||
- name: source_grounding
|
||||
value: 5.0
|
||||
max_value: 5.0
|
||||
rationale: The entity is excellently grounded in "The Wealth of Nations," specifically
|
||||
Book II, Chapter 2, where Smith extensively discusses bank notes' role in economizing
|
||||
precious metals and their reliance on public confidence.
|
||||
- name: domain_placement
|
||||
value: 4.0
|
||||
max_value: 5.0
|
||||
rationale: '"Exchange" is an appropriate domain as bank notes primarily function
|
||||
as a medium for facilitating transactions. While "Money" or "Finance" could also
|
||||
fit, "Exchange" accurately reflects their operational role in the economy.'
|
||||
- name: vsm_relevance
|
||||
value: 3.0
|
||||
max_value: 5.0
|
||||
rationale: Bank notes are a critical *medium* or *tool* that enables the primary
|
||||
operations (S1) of an economy (i.e., exchange of goods and services). However,
|
||||
they are not a VSM system or management function themselves, making their direct
|
||||
mapping somewhat abstract.
|
||||
- name: explanatory_value
|
||||
value: 5.0
|
||||
max_value: 5.0
|
||||
rationale: This entity offers significant explanatory power by detailing how paper
|
||||
money functions, its reliance on confidence, and its role in reducing the need
|
||||
for precious metals, thereby illuminating a key mechanism in Smith's economic
|
||||
theory.
|
||||
---
|
||||
|
||||
# Evaluation: Bank Notes
|
||||
|
||||
## definition_precision — 5.0 / 5.0
|
||||
|
||||
The definition is precise, clearly distinguishing bank notes by their issuer, form, and key characteristics (payable on demand, confidence-based). It avoids circularity and captures a distinct concept.
|
||||
|
||||
## source_grounding — 5.0 / 5.0
|
||||
|
||||
The entity is excellently grounded in "The Wealth of Nations," specifically Book II, Chapter 2, where Smith extensively discusses bank notes' role in economizing precious metals and their reliance on public confidence.
|
||||
|
||||
## domain_placement — 4.0 / 5.0
|
||||
|
||||
"Exchange" is an appropriate domain as bank notes primarily function as a medium for facilitating transactions. While "Money" or "Finance" could also fit, "Exchange" accurately reflects their operational role in the economy.
|
||||
|
||||
## vsm_relevance — 3.0 / 5.0
|
||||
|
||||
Bank notes are a critical *medium* or *tool* that enables the primary operations (S1) of an economy (i.e., exchange of goods and services). However, they are not a VSM system or management function themselves, making their direct mapping somewhat abstract.
|
||||
|
||||
## explanatory_value — 5.0 / 5.0
|
||||
|
||||
This entity offers significant explanatory power by detailing how paper money functions, its reliance on confidence, and its role in reducing the need for precious metals, thereby illuminating a key mechanism in Smith's economic theory.
|
||||
@@ -0,0 +1,60 @@
|
||||
---
|
||||
entity_slug: bank_systemic_risk_management
|
||||
evaluator: gemini-2.5-flash-lite
|
||||
evaluated_at: '2026-04-21T21:49:35.222637'
|
||||
overall_score: 4.0
|
||||
scores:
|
||||
- name: definition_precision
|
||||
value: 4.0
|
||||
max_value: 5.0
|
||||
rationale: The definition is precise and clearly outlines the purpose of bank systemic
|
||||
risk management. It avoids being an overly broad umbrella term.
|
||||
- name: source_grounding
|
||||
value: 3.0
|
||||
max_value: 5.0
|
||||
rationale: While the concept of managing risks to the banking system is present
|
||||
in Book II, Chapter 2, the explicit framing of "systemic risk management" as a
|
||||
distinct entity with specific practices might be a slight abstraction beyond Smith's
|
||||
direct terminology.
|
||||
- name: domain_placement
|
||||
value: 5.0
|
||||
max_value: 5.0
|
||||
rationale: The "Regulation" domain is highly appropriate. Managing systemic risk
|
||||
is fundamentally a regulatory concern aimed at ensuring the stability of the financial
|
||||
system.
|
||||
- name: vsm_relevance
|
||||
value: 4.0
|
||||
max_value: 5.0
|
||||
rationale: This entity strongly maps to VSM System 3 (Internal Regulation/Audit)
|
||||
as it involves monitoring and controlling internal operations to prevent systemic
|
||||
failures. It also has elements of System 5 (Policy) in setting overall stability
|
||||
goals.
|
||||
- name: explanatory_value
|
||||
value: 4.0
|
||||
max_value: 5.0
|
||||
rationale: The entity provides good explanatory value by highlighting a crucial
|
||||
mechanism for maintaining financial stability. It explains *how* the banking system
|
||||
can be protected from cascading failures.
|
||||
---
|
||||
|
||||
# Evaluation: Bank Systemic Risk Management
|
||||
|
||||
## definition_precision — 4.0 / 5.0
|
||||
|
||||
The definition is precise and clearly outlines the purpose of bank systemic risk management. It avoids being an overly broad umbrella term.
|
||||
|
||||
## source_grounding — 3.0 / 5.0
|
||||
|
||||
While the concept of managing risks to the banking system is present in Book II, Chapter 2, the explicit framing of "systemic risk management" as a distinct entity with specific practices might be a slight abstraction beyond Smith's direct terminology.
|
||||
|
||||
## domain_placement — 5.0 / 5.0
|
||||
|
||||
The "Regulation" domain is highly appropriate. Managing systemic risk is fundamentally a regulatory concern aimed at ensuring the stability of the financial system.
|
||||
|
||||
## vsm_relevance — 4.0 / 5.0
|
||||
|
||||
This entity strongly maps to VSM System 3 (Internal Regulation/Audit) as it involves monitoring and controlling internal operations to prevent systemic failures. It also has elements of System 5 (Policy) in setting overall stability goals.
|
||||
|
||||
## explanatory_value — 4.0 / 5.0
|
||||
|
||||
The entity provides good explanatory value by highlighting a crucial mechanism for maintaining financial stability. It explains *how* the banking system can be protected from cascading failures.
|
||||
@@ -3,7 +3,7 @@ consistency_cycles: 0.0
|
||||
coverage_ratio: 0.619048
|
||||
granularity_entropy: 2.674752
|
||||
modularity: 0.0
|
||||
per_entity_mean: 3.955635
|
||||
per_entity_mean: 3.95668
|
||||
redundancy_ratio: 0.006073
|
||||
type_distribution:
|
||||
Element: 315
|
||||
|
||||
@@ -240,8 +240,14 @@ def llm_catalog(output_format):
|
||||
)
|
||||
def llm_check(provider, model):
|
||||
"""Send a minimal prompt to verify a provider is reachable and responding."""
|
||||
import os
|
||||
|
||||
from markitect.llm import create_adapter
|
||||
from markitect.llm.exceptions import LLMConfigurationError, LLMError
|
||||
from markitect.llm.exceptions import (
|
||||
LLMAPIError,
|
||||
LLMConfigurationError,
|
||||
LLMError,
|
||||
)
|
||||
from markitect.prompts.execution.models import RunConfig
|
||||
|
||||
resolved = resolve_llm(cli_provider=provider, cli_model=model)
|
||||
@@ -252,6 +258,17 @@ def llm_check(provider, model):
|
||||
f" model from: {resolved.model_source}"
|
||||
)
|
||||
|
||||
# Advisory: OPENROUTER_API_KEY is set but this call won't use it. Common
|
||||
# source of "works for me, fails for agents" when the env var holds a
|
||||
# stale key that overrides a clean config entry.
|
||||
if resolved.provider != "openrouter" and os.environ.get("OPENROUTER_API_KEY"):
|
||||
click.echo(
|
||||
" note: OPENROUTER_API_KEY is set but won't be used for this "
|
||||
"provider. If OpenRouter calls fail elsewhere with 401, the env "
|
||||
"var may be stale — unset or update it.",
|
||||
err=True,
|
||||
)
|
||||
|
||||
try:
|
||||
adapter = create_adapter(
|
||||
provider=resolved.provider,
|
||||
@@ -273,6 +290,19 @@ def llm_check(provider, model):
|
||||
except LLMError as exc:
|
||||
elapsed = time.monotonic() - start
|
||||
click.echo(f"ERROR \u2014 LLM error after {elapsed:.1f}s: {exc}", err=True)
|
||||
# Targeted hint: 401 on openrouter almost always means a stale key.
|
||||
if (
|
||||
resolved.provider == "openrouter"
|
||||
and isinstance(exc, LLMAPIError)
|
||||
and exc.status_code == 401
|
||||
):
|
||||
click.echo(
|
||||
" hint: OpenRouter returned 401 (unauthorized). Check whether "
|
||||
"OPENROUTER_API_KEY is stale (`unset OPENROUTER_API_KEY` to "
|
||||
"fall back to the key in ~/.config/markitect/config.toml, or "
|
||||
"update the env var).",
|
||||
err=True,
|
||||
)
|
||||
sys.exit(1)
|
||||
except Exception as exc:
|
||||
elapsed = time.monotonic() - start
|
||||
|
||||
@@ -7,8 +7,9 @@ inspecting, and evaluating infospaces.
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import re
|
||||
from pathlib import Path
|
||||
from typing import Optional
|
||||
from typing import Dict, Optional
|
||||
|
||||
import click
|
||||
|
||||
@@ -228,6 +229,227 @@ def _entities_by_type(cfg, root: "Path", entity_list: list) -> None:
|
||||
click.echo(f"\nTotal: {total} entities")
|
||||
|
||||
|
||||
# ── chapters (per-source triage view) ────────────────────────────────
|
||||
|
||||
|
||||
@infospace_commands.command()
|
||||
@click.option("--config", "config_path", default=None, help="Path to infospace.yaml.")
|
||||
@click.option(
|
||||
"--format", "output_format",
|
||||
type=click.Choice(["text", "json"]),
|
||||
default="text",
|
||||
help="Output format.",
|
||||
)
|
||||
def chapters(config_path: Optional[str], output_format: str):
|
||||
"""List source files in canonical order with per-source stats.
|
||||
|
||||
For each source file in the sources directory, reports entity count,
|
||||
mean per-entity score (if evaluated), classification coverage, and
|
||||
processing status. Useful for triaging long-text infospaces.
|
||||
"""
|
||||
cfg, cfg_path = _load_config_or_exit(config_path)
|
||||
root = cfg_path.parent
|
||||
|
||||
sources_dir = root / cfg.topic.sources if cfg.topic.sources else root
|
||||
if not sources_dir.is_dir():
|
||||
click.echo(f"No sources directory at {sources_dir}.", err=True)
|
||||
raise SystemExit(1)
|
||||
|
||||
source_files = sorted(sources_dir.glob("*.md"))
|
||||
if not source_files:
|
||||
click.echo(f"No source files in {sources_dir}.", err=True)
|
||||
raise SystemExit(1)
|
||||
|
||||
entities_dir = root / cfg.entities_dir
|
||||
entity_list = (
|
||||
parse_entity_directory(entities_dir) if entities_dir.is_dir() else []
|
||||
)
|
||||
|
||||
# Build a source_id → [entities] map using the source_chapter field.
|
||||
# Matching is lenient: entities with a source_chapter substring-equal
|
||||
# to a normalized form of the source stem count as belonging to it.
|
||||
def _chapter_keys(source_id: str) -> list:
|
||||
"""Return strings an entity's source_chapter might contain."""
|
||||
keys = [source_id, source_id.replace("-", " ")]
|
||||
m = re.match(r"book-(\d+)-chapter-(\d+)", source_id)
|
||||
if m:
|
||||
book, chap = m.group(1), m.group(2)
|
||||
roman = {"1": "I", "2": "II", "3": "III", "4": "IV", "5": "V"}
|
||||
if book in roman:
|
||||
keys.append(f"Book {roman[book]}, Chapter {int(chap)}")
|
||||
keys.append(f"Book {roman[book]} Chapter {int(chap)}")
|
||||
return keys
|
||||
|
||||
# Precompute evaluation scores and classification slugs once.
|
||||
evals_dir = root / cfg.evaluations_dir
|
||||
cls_dir = root / cfg.classifications_dir
|
||||
eval_scores: Dict[str, float] = {}
|
||||
if evals_dir.is_dir():
|
||||
from markitect.infospace.evaluation_io import read_entity_evaluation
|
||||
for ev_path in evals_dir.glob("*.md"):
|
||||
try:
|
||||
ev = read_entity_evaluation(ev_path)
|
||||
if ev.overall_score is not None:
|
||||
eval_scores[ev_path.stem] = ev.overall_score
|
||||
except Exception:
|
||||
continue
|
||||
classified_slugs = (
|
||||
{p.stem for p in cls_dir.glob("*.md")} if cls_dir.is_dir() else set()
|
||||
)
|
||||
|
||||
rows = []
|
||||
for source_file in source_files:
|
||||
source_id = source_file.stem
|
||||
keys = _chapter_keys(source_id)
|
||||
matched = [
|
||||
e for e in entity_list
|
||||
if any(k.lower() in (e.source_chapter or "").lower() for k in keys)
|
||||
]
|
||||
slugs = {e.slug for e in matched}
|
||||
evaluated = slugs & set(eval_scores)
|
||||
classified = slugs & classified_slugs
|
||||
mean = (
|
||||
sum(eval_scores[s] for s in evaluated) / len(evaluated)
|
||||
if evaluated else None
|
||||
)
|
||||
rows.append({
|
||||
"source_id": source_id,
|
||||
"entities": len(matched),
|
||||
"evaluated": len(evaluated),
|
||||
"classified": len(classified),
|
||||
"mean_score": round(mean, 2) if mean is not None else None,
|
||||
})
|
||||
|
||||
if output_format == "json":
|
||||
import json
|
||||
click.echo(json.dumps(rows, indent=2))
|
||||
return
|
||||
|
||||
# Text: aligned table.
|
||||
headers = ("source", "entities", "evaluated", "classified", "mean_score")
|
||||
widths = [
|
||||
max(len(h), max((len(str(r[h.replace(' ', '_')])) if h != "source"
|
||||
else len(r["source_id"]))
|
||||
for r in rows)) if rows else len(h)
|
||||
for h in headers
|
||||
]
|
||||
fmt = " ".join(f"{{:<{w}}}" for w in widths)
|
||||
click.echo(fmt.format(*headers))
|
||||
click.echo(fmt.format(*("-" * w for w in widths)))
|
||||
for r in rows:
|
||||
click.echo(fmt.format(
|
||||
r["source_id"],
|
||||
r["entities"],
|
||||
r["evaluated"],
|
||||
r["classified"],
|
||||
"-" if r["mean_score"] is None else f"{r['mean_score']:.2f}",
|
||||
))
|
||||
totals = {
|
||||
"entities": sum(r["entities"] for r in rows),
|
||||
"evaluated": sum(r["evaluated"] for r in rows),
|
||||
"classified": sum(r["classified"] for r in rows),
|
||||
}
|
||||
click.echo(
|
||||
f"\n{len(rows)} source file(s); "
|
||||
f"{totals['entities']} entities, "
|
||||
f"{totals['evaluated']} evaluated, "
|
||||
f"{totals['classified']} classified."
|
||||
)
|
||||
|
||||
|
||||
# ── entity (single lookup) ───────────────────────────────────────────
|
||||
|
||||
|
||||
@infospace_commands.command()
|
||||
@click.argument("name")
|
||||
@click.option("--config", "config_path", default=None, help="Path to infospace.yaml.")
|
||||
def entity(name: str, config_path: Optional[str]):
|
||||
"""Look up one entity by name, tolerating case / hyphens / underscores.
|
||||
|
||||
Prints slug, source path, domain, chapter, word count, overall score,
|
||||
VSM system (if classified), and evaluation-file path.
|
||||
"""
|
||||
cfg, cfg_path = _load_config_or_exit(config_path)
|
||||
root = cfg_path.parent
|
||||
entities_dir = root / cfg.entities_dir
|
||||
|
||||
if not entities_dir.is_dir():
|
||||
click.echo("No entities directory found.", err=True)
|
||||
raise SystemExit(1)
|
||||
|
||||
entity_list = parse_entity_directory(entities_dir)
|
||||
if not entity_list:
|
||||
click.echo("No entities found.", err=True)
|
||||
raise SystemExit(1)
|
||||
|
||||
# Normalize: lowercase, underscores.
|
||||
def norm(s: str) -> str:
|
||||
return s.lower().replace("-", "_").replace(" ", "_")
|
||||
|
||||
target = norm(name)
|
||||
by_slug = {e.slug: e for e in entity_list}
|
||||
|
||||
match = by_slug.get(target)
|
||||
if match is None:
|
||||
# Substring fallback for partial input.
|
||||
candidates = [e for e in entity_list if target in norm(e.slug)]
|
||||
if len(candidates) == 1:
|
||||
match = candidates[0]
|
||||
elif len(candidates) > 1:
|
||||
click.echo(f"Ambiguous — '{name}' matches multiple entities:", err=True)
|
||||
for c in sorted(candidates, key=lambda e: e.slug)[:10]:
|
||||
click.echo(f" {c.slug}", err=True)
|
||||
if len(candidates) > 10:
|
||||
click.echo(f" … and {len(candidates) - 10} more", err=True)
|
||||
raise SystemExit(1)
|
||||
else:
|
||||
click.echo(f"No entity matching '{name}'.", err=True)
|
||||
near = sorted(
|
||||
e.slug for e in entity_list
|
||||
if target.split("_", 1)[0] in e.slug
|
||||
)[:5]
|
||||
if near:
|
||||
click.echo(f" Near matches: {', '.join(near)}", err=True)
|
||||
raise SystemExit(1)
|
||||
|
||||
# Load score + classification (best-effort).
|
||||
score: Optional[float] = None
|
||||
evaluator: Optional[str] = None
|
||||
eval_file = root / cfg.evaluations_dir / f"{match.slug}.md"
|
||||
if eval_file.is_file():
|
||||
try:
|
||||
from markitect.infospace.evaluation_io import read_entity_evaluation
|
||||
ev = read_entity_evaluation(eval_file)
|
||||
score = ev.overall_score
|
||||
evaluator = ev.evaluator
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
vsm: Optional[str] = None
|
||||
cls_file = root / cfg.classifications_dir / f"{match.slug}.md"
|
||||
if cls_file.is_file():
|
||||
try:
|
||||
from markitect.infospace.classification_io import read_entity_classification
|
||||
cls = read_entity_classification(cls_file)
|
||||
vsm = cls.vsm_system
|
||||
except Exception:
|
||||
pass
|
||||
|
||||
# Output — one field per line so it's easy to grep or pipe.
|
||||
click.echo(f"slug: {match.slug}")
|
||||
click.echo(f"source_path: {match.source_path}")
|
||||
click.echo(f"domain: {match.domain or '-'}")
|
||||
click.echo(f"chapter: {match.source_chapter or '-'}")
|
||||
click.echo(f"word_count: {match.total_word_count}")
|
||||
click.echo(f"vsm_system: {vsm or '-'}")
|
||||
if score is not None:
|
||||
click.echo(f"overall_score: {score:.2f}")
|
||||
click.echo(f"evaluator: {evaluator or '-'}")
|
||||
click.echo(f"evaluation: {eval_file}")
|
||||
else:
|
||||
click.echo("evaluation: (not yet evaluated)")
|
||||
|
||||
|
||||
# ── evaluate ─────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
@@ -237,7 +459,14 @@ def _entities_by_type(cfg, root: "Path", entity_list: list) -> None:
|
||||
@click.option("--model", default=None, help="LLM model name.")
|
||||
@click.option("--entity", "entity_slug", default=None, help="Evaluate a single entity by slug.")
|
||||
@click.option("--chapter", default=None, help="Evaluate entities from a specific chapter.")
|
||||
def evaluate(config_path, provider, model, entity_slug, chapter):
|
||||
@click.option("--force", is_flag=True, default=False,
|
||||
help="Re-evaluate entities whose evaluation file already exists.")
|
||||
@click.option("--model-fallback", "model_fallback", default=None,
|
||||
help="If the primary model hits a rate limit (429), retry the "
|
||||
"failed entities once with this model. Useful on free tiers "
|
||||
"where models have separate quota buckets (e.g. "
|
||||
"gemini-2.5-flash → gemini-2.5-flash-lite).")
|
||||
def evaluate(config_path, provider, model, entity_slug, chapter, force, model_fallback):
|
||||
"""Evaluate entities using LLM-based quality assessment."""
|
||||
cfg, cfg_path = _load_config_or_exit(config_path)
|
||||
root = cfg_path.parent
|
||||
@@ -252,32 +481,44 @@ def evaluate(config_path, provider, model, entity_slug, chapter):
|
||||
click.echo("No entities to evaluate.")
|
||||
return
|
||||
|
||||
# Filter
|
||||
# Filter. Accept hyphenated input for --entity by normalizing to the
|
||||
# underscore slug format produced by parse_entity_directory.
|
||||
if entity_slug:
|
||||
entity_list = [e for e in entity_list if e.slug == entity_slug]
|
||||
if not entity_list:
|
||||
click.echo(f"Error: Entity '{entity_slug}' not found.", err=True)
|
||||
normalized = entity_slug.replace("-", "_")
|
||||
matches = [e for e in entity_list if e.slug == normalized]
|
||||
if not matches:
|
||||
# Build a short "did you mean…" list from entities sharing a stem.
|
||||
stem = normalized.split("_", 1)[0]
|
||||
near = sorted(e.slug for e in entity_list if e.slug.startswith(stem))[:5]
|
||||
msg = f"Error: Entity '{entity_slug}' not found."
|
||||
if near:
|
||||
msg += f" Did you mean: {', '.join(near)} ?"
|
||||
click.echo(msg, err=True)
|
||||
raise SystemExit(1)
|
||||
entity_list = matches
|
||||
elif chapter:
|
||||
entity_list = [e for e in entity_list if chapter in e.source_chapter]
|
||||
if not entity_list:
|
||||
click.echo(f"No entities found for chapter '{chapter}'.")
|
||||
return
|
||||
|
||||
# Skip entities that already have evaluation files (incremental resume)
|
||||
# Skip entities that already have evaluation files (incremental resume).
|
||||
# Applies uniformly to full-pass, --entity, and --chapter runs unless
|
||||
# --force is set.
|
||||
from markitect.infospace.evaluate import run_entity_evaluation
|
||||
output_dir = root / cfg.evaluations_dir
|
||||
if not entity_slug and not chapter and output_dir.is_dir():
|
||||
previous_digests = {
|
||||
p.stem: "" # non-empty sentinel → triggers skip in BatchEvaluator
|
||||
for p in output_dir.glob("*.md")
|
||||
}
|
||||
entity_list = [e for e in entity_list if e.slug not in previous_digests]
|
||||
if not force and output_dir.is_dir():
|
||||
existing = {p.stem for p in output_dir.glob("*.md")}
|
||||
before = len(entity_list)
|
||||
entity_list = [e for e in entity_list if e.slug not in existing]
|
||||
skipped = before - len(entity_list)
|
||||
if not entity_list:
|
||||
click.echo("All entities already evaluated. Nothing to do.")
|
||||
click.echo("All selected entities already evaluated. "
|
||||
"Re-run with --force to overwrite.")
|
||||
return
|
||||
if previous_digests:
|
||||
click.echo(f"Skipping {len(previous_digests)} already-evaluated entities.")
|
||||
if skipped:
|
||||
click.echo(f"Skipping {skipped} already-evaluated entities. "
|
||||
"Use --force to re-evaluate.")
|
||||
|
||||
# Create adapter
|
||||
from markitect.llm import create_adapter
|
||||
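The `--entity` lookup in the hunk above normalizes hyphens to underscores and, on a miss, suggests slugs that share the first underscore-separated segment. Below is a minimal standalone sketch of that logic, assuming slugs are plain strings. The function name `resolve_slug` and the sample slugs are hypothetical and not part of markitect.

```python
from typing import List, Tuple


def resolve_slug(
    known_slugs: List[str], user_input: str, limit: int = 5
) -> Tuple[List[str], List[str]]:
    """Return (matches, suggestions) for a user-supplied entity slug.

    Hyphens are normalized to underscores before the exact match. When
    nothing matches, up to `limit` slugs starting with the first
    underscore-separated segment are offered as suggestions.
    """
    normalized = user_input.replace("-", "_")
    matches = [s for s in known_slugs if s == normalized]
    if matches:
        return matches, []
    stem = normalized.split("_", 1)[0]
    suggestions = sorted(s for s in known_slugs if s.startswith(stem))[:limit]
    return [], suggestions


# Hypothetical slugs, for illustration only.
slugs = ["viable_system", "viable_system_model", "variety", "recursion"]
print(resolve_slug(slugs, "viable-system"))
print(resolve_slug(slugs, "viable-sys"))
```

Because the stem is only the first segment, a short stem can produce broad suggestions; the cap of five keeps the message readable.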
@@ -285,10 +526,14 @@ def evaluate(config_path, provider, model, entity_slug, chapter):
|
||||
adapter = create_adapter(provider, model=model)
|
||||
run_config = RunConfig(model_name=model, temperature=0.3, max_tokens=2000)
|
||||
|
||||
# Progress callback
|
||||
# Progress callback — surface error detail so agents don't have to
|
||||
# drop into Python to see whether an ERROR was 429, 503, or auth.
|
||||
def on_progress(done, total, result):
|
||||
status = result.status.upper()
|
||||
click.echo(f" [{done}/{total}] {result.key}: {status}")
|
||||
if status == "ERROR" and result.error:
|
||||
click.echo(f" [{done}/{total}] {result.key}: ERROR — {result.error}")
|
||||
else:
|
||||
click.echo(f" [{done}/{total}] {result.key}: {status}")
|
||||
|
||||
click.echo(f"Evaluating {len(entity_list)} entities via {provider}...")
|
||||
|
||||
@@ -301,6 +546,42 @@ def evaluate(config_path, provider, model, entity_slug, chapter):
|
||||
progress_callback=on_progress,
|
||||
)
|
||||
|
||||
# Model fallback: if any entities failed with a rate-limit-looking
|
||||
# error and the user opted in with --model-fallback, retry them once
|
||||
# with a fresh adapter on the fallback model. Different free-tier
|
||||
# models have separate quota buckets, so this often succeeds when
|
||||
# the primary is exhausted.
|
||||
if model_fallback and summary.failed > 0:
|
||||
rate_limited = [
|
||||
r for r in summary.results
|
||||
if r.status == "error"
|
||||
and r.error
|
||||
and ("429" in r.error or "rate" in r.error.lower())
|
||||
]
|
||||
if rate_limited:
|
||||
retry_slugs = {r.key for r in rate_limited}
|
||||
retry_entities = [e for e in entity_list if e.slug in retry_slugs]
|
||||
click.echo(
|
||||
f"\n{len(retry_entities)} rate-limited entities — "
|
||||
f"retrying with --model-fallback {model_fallback}..."
|
||||
)
|
||||
fb_adapter = create_adapter(provider, model=model_fallback)
|
||||
fb_run_config = RunConfig(
|
||||
model_name=model_fallback, temperature=0.3, max_tokens=2000
|
||||
)
|
||||
fb_summary = run_entity_evaluation(
|
||||
config=cfg,
|
||||
entities=retry_entities,
|
||||
adapter=fb_adapter,
|
||||
run_config=fb_run_config,
|
||||
output_dir=output_dir,
|
||||
progress_callback=on_progress,
|
||||
)
|
||||
summary.succeeded += fb_summary.succeeded
|
||||
summary.failed = (summary.failed - len(retry_entities)) + fb_summary.failed
|
||||
summary.total_prompt_tokens += fb_summary.total_prompt_tokens
|
||||
summary.total_completion_tokens += fb_summary.total_completion_tokens
|
||||
|
||||
click.echo(f"\nDone: {summary.succeeded} succeeded, {summary.failed} failed, {summary.skipped} skipped")
|
||||
if summary.total_tokens > 0:
|
||||
click.echo(f"Tokens used: {summary.total_tokens}")
|
||||
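The counter arithmetic in the fallback branch is easy to get wrong, so here is a small sketch of it in isolation. `Result`, `Summary`, `rate_limited_keys`, and `merge_fallback` are simplified, hypothetical stand-ins for the real summary objects; only the filter expression and the merge formula are taken from the diff.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Result:
    key: str
    status: str                  # "success" or "error"
    error: Optional[str] = None


@dataclass
class Summary:
    succeeded: int = 0
    failed: int = 0
    results: List[Result] = field(default_factory=list)


def rate_limited_keys(summary: Summary) -> List[str]:
    """Keys whose error text looks like a rate limit (same heuristic as the diff)."""
    return [
        r.key for r in summary.results
        if r.status == "error" and r.error
        and ("429" in r.error or "rate" in r.error.lower())
    ]


def merge_fallback(primary: Summary, fallback: Summary, retried: int) -> None:
    """Fold a retry pass into the primary counters, in place."""
    primary.succeeded += fallback.succeeded
    # The retried entities leave the failed count; those that failed again return.
    primary.failed = (primary.failed - retried) + fallback.failed


primary = Summary(
    succeeded=1,
    failed=3,
    results=[
        Result("a", "success"),
        Result("b", "error", "HTTP 429 Too Many Requests"),
        Result("c", "error", "Rate limit exceeded"),
        Result("d", "error", "401 Unauthorized"),
    ],
)
retry = rate_limited_keys(primary)
fallback = Summary(succeeded=1, failed=1)  # pretend one of the two retries worked
merge_fallback(primary, fallback, retried=len(retry))
print(retry, primary.succeeded, primary.failed)
```

One caveat of the substring heuristic: `"rate"` also occurs inside words such as "generate", so an unrelated error message can be classified as rate-limited and retried.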
@@ -1015,6 +1296,18 @@ def disciplines(config_path: Optional[str]):
|
||||
help="Run collection checks (C1–C5) after each source file.",
|
||||
)
|
||||
@click.option("--no-commit", is_flag=True, help="Skip git commits.")
|
||||
@click.option(
|
||||
"--eval-after-source",
|
||||
is_flag=True,
|
||||
help="After each source's stages succeed, evaluate just the newly-"
|
||||
"added entities so the per-source commit is self-contained.",
|
||||
)
|
||||
@click.option(
|
||||
"--classify-after-source",
|
||||
is_flag=True,
|
||||
help="After each source's stages succeed, classify just the newly-"
|
||||
"added entities so the per-source commit is self-contained.",
|
||||
)
|
||||
def process(
|
||||
glob_pattern: Optional[str],
|
||||
process_all: bool,
|
||||
@@ -1023,6 +1316,8 @@ def process(
|
||||
model: Optional[str],
|
||||
check_after_each: bool,
|
||||
no_commit: bool,
|
||||
eval_after_source: bool,
|
||||
classify_after_source: bool,
|
||||
):
|
||||
"""Process source files through the pipeline defined in infospace.yaml.
|
||||
|
||||
@@ -1096,12 +1391,22 @@ def process(
|
||||
# Run pipeline
|
||||
from markitect.infospace.pipeline import SourcePipeline
|
||||
|
||||
if (eval_after_source or classify_after_source) and adapter is None:
|
||||
click.echo(
|
||||
"Error: --eval-after-source / --classify-after-source require "
|
||||
"--provider (they call the LLM).",
|
||||
err=True,
|
||||
)
|
||||
raise SystemExit(1)
|
||||
|
||||
pipeline = SourcePipeline(
|
||||
cfg, root,
|
||||
adapter=adapter,
|
||||
provider=provider or "",
|
||||
model=(model or _PROVIDER_DEFAULTS.get(provider or "", "")) if provider else "",
|
||||
no_commit=no_commit,
|
||||
eval_after_source=eval_after_source,
|
||||
classify_after_source=classify_after_source,
|
||||
)
|
||||
|
||||
total = len(source_files)
|
||||
|
||||
@@ -195,12 +195,23 @@ def run_entity_evaluation(
|
||||
"""
|
||||
topic = config.topic.name
|
||||
evaluations_path = output_dir or Path(config.evaluations_dir)
|
||||
evaluator_name = (run_config.model_name if run_config else "unknown")
|
||||
# Fall back from run_config.model_name (may be None if the CLI user did
|
||||
# not pass --model) to the adapter's resolved model, and only then to
|
||||
# "unknown". Keeps the evaluator field in the written frontmatter
|
||||
# informative for later audits.
|
||||
default_evaluator = (
|
||||
(run_config.model_name if run_config else None)
|
||||
or getattr(adapter, "_model", None)
|
||||
or "unknown"
|
||||
)
|
||||
|
||||
def _write_and_notify(done: int, total: int, result) -> None:
|
||||
# Write file immediately on success (incremental — run is resumable)
|
||||
if result.status == "success" and result.response is not None:
|
||||
scores = parse_evaluation_response(result.response.content, dimensions)
|
||||
# Prefer the model name the adapter actually echoed back — it
|
||||
# reflects post-resolution fallbacks (e.g. flash → flash-lite).
|
||||
evaluator_name = result.response.model or default_evaluator
|
||||
evaluation = EntityEvaluation(
|
||||
entity_slug=result.key,
|
||||
evaluator=evaluator_name,
|
||||
|
||||
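The evaluator name written to the frontmatter is chosen through a chain of `or` fallbacks. A minimal sketch of that chain follows, with a hypothetical helper name `pick_evaluator`; the order mirrors the hunk above (response model, then configured model, then the adapter's model, then `"unknown"`).

```python
from typing import Optional


def pick_evaluator(
    response_model: Optional[str],
    configured_model: Optional[str],
    adapter_model: Optional[str],
) -> str:
    """Return the first non-empty model name, else "unknown"."""
    return response_model or configured_model or adapter_model or "unknown"


print(pick_evaluator("gemini-2.5-flash-lite", "gemini-2.5-flash", None))
print(pick_evaluator(None, None, "gemini-2.5-flash"))
print(pick_evaluator("", None, None))
```

Since `or` tests truthiness, an empty string is treated the same as `None` and falls through to the next candidate.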
@@ -81,17 +81,26 @@ def snapshot_from_checks(
|
||||
# ── Metrics file I/O ────────────────────────────────────────────────
|
||||
|
||||
|
||||
def write_metrics_file(metrics: Dict[str, float], path: Path) -> None:
|
||||
def write_metrics_file(metrics: Dict[str, Any], path: Path) -> None:
|
||||
"""Write the latest metrics to a simple YAML file.
|
||||
|
||||
This file is used by ``markitect infospace viability`` for quick
|
||||
threshold checking.
|
||||
threshold checking. Non-numeric values (e.g. ``type_distribution``)
|
||||
are passed through unchanged; floats are rounded to 6 dp; ints are
|
||||
preserved as ints so external consumers don't see ``29`` silently
|
||||
become ``29.0`` on every round-trip.
|
||||
"""
|
||||
def _normalize(v: Any) -> Any:
|
||||
if isinstance(v, bool):
|
||||
return v
|
||||
if isinstance(v, float):
|
||||
return round(v, 6)
|
||||
return v
|
||||
|
||||
path.parent.mkdir(parents=True, exist_ok=True)
|
||||
path.write_text(
|
||||
yaml.safe_dump(
|
||||
{k: round(v, 6) if isinstance(v, float) else v
|
||||
for k, v in sorted(metrics.items())},
|
||||
{k: _normalize(v) for k, v in sorted(metrics.items())},
|
||||
default_flow_style=False,
|
||||
sort_keys=True,
|
||||
),
|
||||
@@ -99,14 +108,20 @@ def write_metrics_file(metrics: Dict[str, float], path: Path) -> None:
|
||||
)
|
||||
|
||||
|
||||
def read_metrics_file(path: Path) -> Dict[str, float]:
|
||||
"""Read the latest metrics from a YAML file."""
|
||||
def read_metrics_file(path: Path) -> Dict[str, Any]:
|
||||
"""Read the latest metrics from a YAML file.
|
||||
|
||||
Returns all keys as written on disk, preserving types verbatim so a
|
||||
round-trip via :func:`write_metrics_file` does not silently drop
|
||||
structured values (e.g. ``type_distribution``) or flatten ints to
|
||||
floats.
|
||||
"""
|
||||
if not path.is_file():
|
||||
return {}
|
||||
raw = yaml.safe_load(path.read_text(encoding="utf-8"))
|
||||
if not isinstance(raw, dict):
|
||||
return {}
|
||||
return {k: float(v) for k, v in raw.items() if isinstance(v, (int, float))}
|
||||
return raw
|
||||
|
||||
|
||||
# ── History operations ───────────────────────────────────────────────
|
||||
|
||||
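To illustrate the type-preserving behaviour described in the new docstrings, here is a sketch of the normalization step on its own, without the YAML I/O. The helper name `normalize_metrics` and the metric keys other than `type_distribution` are assumptions made for the example.

```python
from typing import Any, Dict


def normalize_metrics(metrics: Dict[str, Any], ndigits: int = 6) -> Dict[str, Any]:
    """Round floats; leave every other value (int, bool, dict, ...) untouched."""
    out: Dict[str, Any] = {}
    for key in sorted(metrics):
        value = metrics[key]
        # bool and int are not float instances, so they pass through unchanged.
        out[key] = round(value, ndigits) if isinstance(value, float) else value
    return out


metrics = {
    "entity_count": 29,
    "coverage": 0.123456789,
    "type_distribution": {"concept": 12, "process": 17},
    "passed": True,
}
normalized = normalize_metrics(metrics)
print(normalized)
print(type(normalized["entity_count"]).__name__)
```

The old reader coerced every numeric value with `float(...)` and dropped everything else, which is why an integer count came back as `29.0` and structured values disappeared on a round-trip.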
@@ -62,6 +62,8 @@ class SourcePipeline:
|
||||
provider: str = "",
|
||||
model: str = "",
|
||||
no_commit: bool = False,
|
||||
eval_after_source: bool = False,
|
||||
classify_after_source: bool = False,
|
||||
) -> None:
|
||||
self.config = config
|
||||
self.root = root
|
||||
@@ -69,6 +71,8 @@ class SourcePipeline:
|
||||
self.provider = provider
|
||||
self.model = model
|
||||
self.no_commit = no_commit
|
||||
self.eval_after_source = eval_after_source
|
||||
self.classify_after_source = classify_after_source
|
||||
|
||||
# ── Public API ────────────────────────────────────────────────────
|
||||
|
||||
@@ -110,6 +114,12 @@ class SourcePipeline:
|
||||
stage_outputs: Dict[str, str] = {}
|
||||
stage_logs: List[Dict[str, Any]] = []
|
||||
|
||||
# Snapshot entity slugs before any stage runs so we can identify
|
||||
# which entities were newly produced by this source. Used to scope
|
||||
# --eval-after-source / --classify-after-source to only the new
|
||||
# entities.
|
||||
pre_entity_slugs = self._current_entity_slugs()
|
||||
|
||||
print(f"\nProcessing: {source_id}")
|
||||
print("=" * 60)
|
||||
|
||||
@@ -133,6 +143,14 @@ class SourcePipeline:
|
||||
|
||||
print(f"\n {source_id}: all stages complete.")
|
||||
self._write_processing_log(source_id, stage_logs, success=True)
|
||||
|
||||
# Per-source follow-ups: evaluate and/or classify just the new
|
||||
# entities this source produced, so the next commit contains a
|
||||
# fully-processed chapter.
|
||||
new_slugs = self._current_entity_slugs() - pre_entity_slugs
|
||||
if new_slugs and (self.eval_after_source or self.classify_after_source):
|
||||
self._run_per_source_followups(new_slugs)
|
||||
|
||||
if not self.no_commit:
|
||||
self._git_commit(source_id)
|
||||
|
||||
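The "new entities" scope is computed as a set difference between two directory snapshots. The sketch below reproduces that idea against a temporary directory; `entity_slugs` is a hypothetical free-function version of `_current_entity_slugs`, and the file names are invented.

```python
import tempfile
from pathlib import Path
from typing import Set


def entity_slugs(entities_dir: Path) -> Set[str]:
    """Return the stems of all *.md files, or an empty set if the dir is missing."""
    if not entities_dir.is_dir():
        return set()
    return {p.stem for p in entities_dir.glob("*.md")}


with tempfile.TemporaryDirectory() as tmp:
    entities = Path(tmp) / "entities"
    missing = entity_slugs(entities)  # directory does not exist yet
    entities.mkdir()
    (entities / "variety.md").write_text("# Variety\n", encoding="utf-8")
    before = entity_slugs(entities)   # snapshot taken before the "pipeline" runs
    (entities / "recursion.md").write_text("# Recursion\n", encoding="utf-8")
    (entities / "notes.txt").write_text("ignored\n", encoding="utf-8")
    new_slugs = entity_slugs(entities) - before

print(sorted(new_slugs))
```

Two consequences of this approach are worth keeping in mind: an entity file that already existed and was merely rewritten does not count as new, and the later filter `e.slug in new_slugs` assumes a parsed entity's slug equals its file stem.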
@@ -636,7 +654,13 @@ class SourcePipeline:
|
||||
# ── Git Integration ───────────────────────────────────────────────
|
||||
|
||||
def _git_commit(self, source_id: str) -> None:
|
||||
"""Stage all output changes and commit them for *source_id*."""
|
||||
"""Stage all output changes and commit them for *source_id*.
|
||||
|
||||
The commit message body summarises what actually changed — counts
|
||||
of entities / evaluations / classifications / analyses added — so
|
||||
``git log`` reads like the chapter-by-chapter story of the
|
||||
infospace growing, not a wall of identical messages.
|
||||
"""
|
||||
output_dir = self.root / "output"
|
||||
try:
|
||||
subprocess.run(
|
||||
@@ -645,11 +669,11 @@ class SourcePipeline:
|
||||
check=True,
|
||||
capture_output=True,
|
||||
)
|
||||
body = self._compose_commit_body(source_id)
|
||||
result = subprocess.run(
|
||||
[
|
||||
"git", "commit", "-m",
|
||||
f"infospace: process {source_id}\n\n"
|
||||
f"Extract entities, map to VSM, and synthesize analysis.",
|
||||
f"infospace: process {source_id}\n\n{body}",
|
||||
],
|
||||
cwd=str(self.root),
|
||||
capture_output=True,
|
||||
@@ -666,3 +690,146 @@ class SourcePipeline:
|
||||
except subprocess.CalledProcessError as e:
|
||||
stderr = e.stderr.decode() if isinstance(e.stderr, bytes) else (e.stderr or "")
|
||||
print(f" Warning: Git error: {stderr.strip()}")
|
||||
|
||||
# ── Per-source helpers ────────────────────────────────────────────
|
||||
|
||||
def _current_entity_slugs(self) -> set:
|
||||
"""Return the set of entity file stems currently on disk."""
|
||||
entities_dir = self.root / self.config.entities_dir
|
||||
if not entities_dir.is_dir():
|
||||
return set()
|
||||
return {p.stem for p in entities_dir.glob("*.md")}
|
||||
|
||||
def _run_per_source_followups(self, new_slugs: set) -> None:
|
||||
"""Run per-source evaluation and/or classification on *new_slugs*.
|
||||
|
||||
Called after a source's pipeline stages succeed, before the git
|
||||
commit, so each chapter's commit contains the full set of
|
||||
artefacts derived from it.
|
||||
"""
|
||||
from markitect.infospace.entity_parser import parse_entity_directory
|
||||
|
||||
entities_dir = self.root / self.config.entities_dir
|
||||
all_entities = parse_entity_directory(entities_dir)
|
||||
new_entities = [e for e in all_entities if e.slug in new_slugs]
|
||||
if not new_entities:
|
||||
return
|
||||
|
||||
if self.adapter is None:
|
||||
print(
|
||||
" Skipping per-source eval/classify: no LLM adapter "
|
||||
"configured (run with --provider)."
|
||||
)
|
||||
return
|
||||
|
||||
from markitect.prompts.execution.models import RunConfig
|
||||
|
||||
run_config = RunConfig(
|
||||
model_name=self.model or None, temperature=0.3, max_tokens=2000
|
||||
)
|
||||
|
||||
if self.eval_after_source:
|
||||
from markitect.infospace.evaluate import run_entity_evaluation
|
||||
|
||||
print(f" Evaluating {len(new_entities)} new entity/entities…")
|
||||
try:
|
||||
run_entity_evaluation(
|
||||
config=self.config,
|
||||
entities=new_entities,
|
||||
adapter=self.adapter,
|
||||
run_config=run_config,
|
||||
output_dir=self.root / self.config.evaluations_dir,
|
||||
)
|
||||
except Exception as exc:
|
||||
print(f" Warning: per-source evaluation failed: {exc}")
|
||||
|
||||
if self.classify_after_source:
|
||||
from markitect.infospace.classifier import run_entity_classification
|
||||
|
||||
print(f" Classifying {len(new_entities)} new entity/entities…")
|
||||
try:
|
||||
run_entity_classification(
|
||||
config=self.config,
|
||||
entities=new_entities,
|
||||
adapter=self.adapter,
|
||||
run_config=run_config,
|
||||
output_dir=self.root / self.config.classifications_dir,
|
||||
)
|
||||
except Exception as exc:
|
||||
print(f" Warning: per-source classification failed: {exc}")
|
||||
|
||||
def _compose_commit_body(self, source_id: str) -> str:
|
||||
"""Summarise staged output changes into a commit-message body.
|
||||
|
||||
Counts added files per output subdirectory (entities, evaluations,
|
||||
classifications, analyses, mappings…) and produces one line per
|
||||
bucket that actually saw additions. Modified/deleted files are
|
||||
noted separately for auditability.
|
||||
"""
|
||||
default = "Extract entities, map to VSM, and synthesize analysis."
|
||||
try:
|
||||
result = subprocess.run(
|
||||
["git", "diff", "--cached", "--name-status", "--", "output"],
|
||||
cwd=str(self.root),
|
||||
check=True,
|
||||
capture_output=True,
|
||||
text=True,
|
||||
)
|
||||
except subprocess.CalledProcessError:
|
||||
return default
|
||||
|
||||
added_by_bucket: Dict[str, int] = {}
|
||||
modified = 0
|
||||
deleted = 0
|
||||
for line in result.stdout.splitlines():
|
||||
parts = line.split("\t")
|
||||
if len(parts) < 2:
|
||||
continue
|
||||
status = parts[0]
|
||||
path = parts[-1]
|
||||
if status.startswith("A"):
|
||||
bucket = self._bucket_for(path)
|
||||
if bucket:
|
||||
added_by_bucket[bucket] = added_by_bucket.get(bucket, 0) + 1
|
||||
elif status.startswith("M"):
|
||||
modified += 1
|
||||
elif status.startswith("D"):
|
||||
deleted += 1
|
||||
|
||||
if not added_by_bucket and not modified and not deleted:
|
||||
return default
|
||||
|
||||
# Emit buckets in a deterministic, reader-friendly order.
|
||||
order = ["entities", "mappings", "analyses", "evaluations",
|
||||
"classifications", "metrics", "logs", "other"]
|
||||
lines: List[str] = []
|
||||
for bucket in order:
|
||||
n = added_by_bucket.get(bucket, 0)
|
||||
if n:
|
||||
lines.append(f"- {bucket}: +{n}")
|
||||
if modified:
|
||||
lines.append(f"- modified: {modified}")
|
||||
if deleted:
|
||||
lines.append(f"- deleted: {deleted}")
|
||||
return "\n".join(lines) if lines else default
|
||||
|
||||
def _bucket_for(self, path: str) -> Optional[str]:
|
||||
"""Map an ``output/...`` path to a commit-summary bucket name."""
|
||||
# Use configured directory basenames where possible so non-default
|
||||
# layouts still bucket correctly.
|
||||
buckets = {
|
||||
Path(self.config.entities_dir).name: "entities",
|
||||
Path(self.config.evaluations_dir).name: "evaluations",
|
||||
Path(self.config.classifications_dir).name: "classifications",
|
||||
}
|
||||
parts = Path(path).parts
|
||||
if len(parts) < 2 or parts[0] != "output":
|
||||
return None
|
||||
sub = parts[1]
|
||||
if sub in buckets:
|
||||
return buckets[sub]
|
||||
# Heuristic fallback for common additional output subdirectories.
|
||||
known = {"mappings", "analyses", "metrics", "logs"}
|
||||
if sub in known:
|
||||
return sub
|
||||
return "other"
|
||||
|
||||
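The commit-body summary is driven by the tab-separated output of `git diff --cached --name-status`. The following sketch parses a hard-coded sample of that output so the bucketing can be checked without a repository. The `KNOWN` set is an assumption that replaces the config-derived directory names, and the sample paths are invented.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional, Tuple

# Assumption: default directory names; the real code derives some from config.
KNOWN = {"entities", "evaluations", "classifications",
         "mappings", "analyses", "metrics", "logs"}


def bucket_for(path: str) -> Optional[str]:
    """Map an output/<subdir>/... path to a bucket, or None if outside output/."""
    parts = PurePosixPath(path).parts
    if len(parts) < 2 or parts[0] != "output":
        return None
    return parts[1] if parts[1] in KNOWN else "other"


def summarize(name_status: str) -> Tuple[Dict[str, int], int, int]:
    """Count added files per bucket plus modified and deleted totals."""
    added: Dict[str, int] = {}
    modified = deleted = 0
    for line in name_status.splitlines():
        parts = line.split("\t")
        if len(parts) < 2:
            continue
        status, path = parts[0], parts[-1]
        if status.startswith("A"):
            bucket = bucket_for(path)
            if bucket:
                added[bucket] = added.get(bucket, 0) + 1
        elif status.startswith("M"):
            modified += 1
        elif status.startswith("D"):
            deleted += 1
    return added, modified, deleted


sample = (
    "A\toutput/entities/variety.md\n"
    "A\toutput/entities/recursion.md\n"
    "A\toutput/evaluations/variety.md\n"
    "M\toutput/metrics/latest.yaml\n"
    "D\toutput/notes/old.md\n"
)
print(summarize(sample))
```

As in the diff, only `A`, `M`, and `D` statuses are counted, so renamed or copied entries (status `R…` or `C…`) do not appear in the summary.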
@@ -131,6 +131,12 @@ def build_state(
|
||||
This is a convenience function that assembles the state object
|
||||
and optionally runs viability checks if *metrics* are provided.
|
||||
"""
|
||||
if not isinstance(config, InfospaceConfig):
|
||||
raise TypeError(
|
||||
"build_state(config=...) expects an InfospaceConfig instance, "
|
||||
f"got {type(config).__name__}. If you have a path, load the "
|
||||
"config first with load_infospace_config(path)."
|
||||
)
|
||||
state = InfospaceState(
|
||||
config=config,
|
||||
entities=entities or [],
|
||||
|
||||
@@ -12,6 +12,8 @@ Quick start::
|
||||
response = adapter.execute_prompt(prompt, run_config)
|
||||
"""
|
||||
|
||||
from markitect.llm.models import RunConfig, LLMResponse
|
||||
from markitect.llm.adapter import LLMAdapter, MockLLMAdapter, ErrorLLMAdapter
|
||||
from markitect.llm.factory import create_adapter
|
||||
from markitect.llm.openrouter import OpenRouterAdapter
|
||||
from markitect.llm.claude_code import ClaudeCodeAdapter
|
||||
@@ -37,6 +39,11 @@ from markitect.llm.similarity import (
|
||||
)
|
||||
|
||||
__all__ = [
|
||||
"RunConfig",
|
||||
"LLMResponse",
|
||||
"LLMAdapter",
|
||||
"MockLLMAdapter",
|
||||
"ErrorLLMAdapter",
|
||||
"create_adapter",
|
||||
"OpenRouterAdapter",
|
||||
"ClaudeCodeAdapter",
|
||||
|
||||
169
markitect/llm/adapter.py
Normal file
@@ -0,0 +1,169 @@
|
||||
"""
|
||||
LLM adapter interface for pluggable model providers.
|
||||
|
||||
Implements an abstraction layer for LLM integration, supporting
|
||||
multiple providers (OpenAI, Anthropic, local models, etc.).
|
||||
"""
|
||||
|
||||
from abc import ABC, abstractmethod
|
||||
from typing import Dict, Any
|
||||
|
||||
from markitect.llm.models import RunConfig, LLMResponse
|
||||
|
||||
|
||||
class LLMAdapter(ABC):
|
||||
"""
|
||||
Abstract base class for LLM providers.
|
||||
|
||||
Enables pluggable LLM backends without prescribing implementation.
|
||||
Implementations can wrap OpenAI, Anthropic, or other APIs.
|
||||
"""
|
||||
|
||||
@abstractmethod
|
||||
def execute_prompt(
|
||||
self,
|
||||
prompt: str,
|
||||
config: RunConfig,
|
||||
) -> LLMResponse:
|
||||
"""
|
||||
Execute a prompt with the LLM.
|
||||
|
||||
Args:
|
||||
prompt: Compiled prompt text
|
||||
config: Execution configuration
|
||||
|
||||
Returns:
|
||||
LLMResponse with generated content
|
||||
|
||||
Raises:
|
||||
Exception: On LLM API errors
|
||||
"""
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def validate_config(self, config: RunConfig) -> bool:
|
||||
"""
|
||||
Validate that configuration is supported.
|
||||
|
||||
Args:
|
||||
config: Configuration to validate
|
||||
|
||||
Returns:
|
||||
True if valid, False otherwise
|
||||
"""
|
||||
pass
|
||||
|
||||
|
||||
class MockLLMAdapter(LLMAdapter):
|
||||
"""
|
||||
Mock LLM adapter for testing.
|
||||
|
||||
Returns deterministic responses without calling external APIs.
|
||||
"""
|
||||
|
||||
def __init__(self, mock_response: str = "Mock LLM response"):
|
||||
"""
|
||||
Initialize mock adapter.
|
||||
|
||||
Args:
|
||||
mock_response: Response to return
|
||||
"""
|
||||
self.mock_response = mock_response
|
||||
self.call_count = 0
|
||||
self.last_prompt = None
|
||||
self.last_config = None
|
||||
|
||||
def execute_prompt(
|
||||
self,
|
||||
prompt: str,
|
||||
config: RunConfig,
|
||||
) -> LLMResponse:
|
||||
"""
|
||||
Return mock response.
|
||||
|
||||
Args:
|
||||
prompt: Prompt (stored for inspection)
|
||||
config: Config (stored for inspection)
|
||||
|
||||
Returns:
|
||||
Mock LLMResponse
|
||||
"""
|
||||
self.call_count += 1
|
||||
self.last_prompt = prompt
|
||||
self.last_config = config
|
||||
|
||||
return LLMResponse(
|
||||
content=self.mock_response,
|
||||
model=config.model_name,
|
||||
usage={
|
||||
"prompt_tokens": len(prompt.split()),
|
||||
"completion_tokens": len(self.mock_response.split()),
|
||||
"total_tokens": len(prompt.split()) + len(self.mock_response.split()),
|
||||
},
|
||||
finish_reason="stop",
|
||||
metadata={"mock": True},
|
||||
)
|
||||
|
||||
def validate_config(self, config: RunConfig) -> bool:
|
||||
"""
|
||||
Mock validation always succeeds.
|
||||
|
||||
Args:
|
||||
config: Configuration
|
||||
|
||||
Returns:
|
||||
Always True
|
||||
"""
|
||||
return True
|
||||
|
||||
def reset(self) -> None:
|
||||
"""Reset mock state."""
|
||||
self.call_count = 0
|
||||
self.last_prompt = None
|
||||
self.last_config = None
|
||||
|
||||
|
||||
class ErrorLLMAdapter(LLMAdapter):
|
||||
"""
|
||||
Mock adapter that always raises an error.
|
||||
|
||||
Useful for testing error handling.
|
||||
"""
|
||||
|
||||
def __init__(self, error_message: str = "Mock LLM error"):
|
||||
"""
|
||||
Initialize error adapter.
|
||||
|
||||
Args:
|
||||
error_message: Error message to raise
|
||||
"""
|
||||
self.error_message = error_message
|
||||
|
||||
def execute_prompt(
|
||||
self,
|
||||
prompt: str,
|
||||
config: RunConfig,
|
||||
) -> LLMResponse:
|
||||
"""
|
||||
Raise error.
|
||||
|
||||
Args:
|
||||
prompt: Prompt
|
||||
config: Config
|
||||
|
||||
Raises:
|
||||
RuntimeError: Always
|
||||
"""
|
||||
raise RuntimeError(self.error_message)
|
||||
|
||||
def validate_config(self, config: RunConfig) -> bool:
|
||||
"""
|
||||
Validation succeeds.
|
||||
|
||||
Args:
|
||||
config: Configuration
|
||||
|
||||
Returns:
|
||||
True
|
||||
"""
|
||||
return True
|
||||
@@ -5,8 +5,8 @@ Claude Code CLI adapter — runs the ``claude`` CLI as a subprocess.
|
||||
import subprocess
|
||||
from typing import Optional
|
||||
|
||||
from markitect.prompts.execution.llm_adapter import LLMAdapter
|
||||
from markitect.prompts.execution.models import RunConfig, LLMResponse
|
||||
from markitect.llm.adapter import LLMAdapter
|
||||
from markitect.llm.models import RunConfig, LLMResponse
|
||||
from markitect.llm.config import LLMConfig
|
||||
from markitect.llm._token_estimator import estimate_tokens
|
||||
from markitect.llm.exceptions import (
|
||||
|
||||
@@ -4,7 +4,7 @@ Factory for creating LLM adapters by provider name.
|
||||
|
||||
from typing import Optional, Dict, Any
|
||||
|
||||
from markitect.prompts.execution.llm_adapter import LLMAdapter
|
||||
from markitect.llm.adapter import LLMAdapter
|
||||
from markitect.llm.exceptions import LLMConfigurationError
|
||||
|
||||
# Lazy imports to avoid pulling in every adapter at module load time.
|
||||
|
||||
@@ -5,11 +5,15 @@ Google Gemini adapter — calls the Generative Language REST API directly.
|
||||
import time
|
||||
from typing import Optional, Dict, Any
|
||||
|
||||
from markitect.prompts.execution.llm_adapter import LLMAdapter
|
||||
from markitect.prompts.execution.models import RunConfig, LLMResponse
|
||||
from markitect.llm.adapter import LLMAdapter
|
||||
from markitect.llm.models import RunConfig, LLMResponse
|
||||
from markitect.llm.config import resolve_api_key, find_project_root
|
||||
from markitect.llm._http import post_json
|
||||
from markitect.llm.exceptions import LLMConfigurationError
|
||||
from markitect.llm.exceptions import (
|
||||
LLMConfigurationError,
|
||||
LLMAPIError,
|
||||
LLMRateLimitError,
|
||||
)
|
||||
|
||||
_DEFAULT_MODEL = "gemini-2.5-flash"
|
||||
_API_BASE = "https://generativelanguage.googleapis.com/v1beta"
|
||||
@@ -26,10 +30,12 @@ class GeminiAdapter(LLMAdapter):
|
||||
model: Optional[str] = None,
|
||||
api_key: Optional[str] = None,
|
||||
system_prompt: Optional[str] = None,
|
||||
max_retries: int = 3,
|
||||
**_kwargs: Any,
|
||||
):
|
||||
self._model = model or _DEFAULT_MODEL
|
||||
self._system_prompt = system_prompt
|
||||
self._max_retries = max_retries
|
||||
|
||||
root = find_project_root()
|
||||
key_file_paths = [root / "apikey-geminifree.txt"] if root else []
|
||||
@@ -77,7 +83,7 @@ class GeminiAdapter(LLMAdapter):
|
||||
url = f"{_API_BASE}/models/{model}:generateContent?key={self._api_key}"
|
||||
|
||||
start = time.time()
|
||||
data = post_json(url, payload, timeout=config.timeout_seconds)
|
||||
data = self._post_with_retries(url, payload, timeout=config.timeout_seconds)
|
||||
latency = time.time() - start
|
||||
|
||||
# Parse Gemini response
|
||||
@@ -113,3 +119,27 @@ class GeminiAdapter(LLMAdapter):
|
||||
if not (0.0 <= config.temperature <= 2.0):
|
||||
return False
|
||||
return True
|
||||
|
||||
# ── Internals ───────────────────────────────────────────────────
|
||||
|
||||
def _post_with_retries(
|
||||
self,
|
||||
url: str,
|
||||
payload: Dict[str, Any],
|
||||
timeout: int,
|
||||
) -> Dict[str, Any]:
|
||||
last_exc: Optional[Exception] = None
|
||||
for attempt in range(self._max_retries + 1):
|
||||
try:
|
||||
return post_json(url, payload, timeout=timeout)
|
||||
except LLMRateLimitError as exc:
|
||||
last_exc = exc
|
||||
if attempt < self._max_retries:
|
||||
time.sleep(2 ** attempt)
|
||||
except LLMAPIError as exc:
|
||||
if exc.status_code in (502, 503, 504) and attempt < self._max_retries:
|
||||
last_exc = exc
|
||||
time.sleep(2 ** attempt)
|
||||
else:
|
||||
raise
|
||||
raise last_exc # type: ignore[misc]
|
||||
|
||||
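The retry loop added to the Gemini adapter uses exponential backoff (1 s, 2 s, 4 s, and so on). Here is a self-contained sketch of the same pattern with an injectable `sleep`, so it can be exercised without waiting. `RateLimitError` and `call_with_retries` are hypothetical names, not markitect's classes, and the explicit guard for a negative `max_retries` is an addition of this sketch.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Stand-in for a provider's 429 error (hypothetical)."""


def call_with_retries(
    fn: Callable[[], T],
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimitError with delays of 2**attempt seconds."""
    last_exc: Optional[Exception] = None
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError as exc:
            last_exc = exc
            if attempt < max_retries:
                sleep(2 ** attempt)
    if last_exc is None:
        # Only reachable when max_retries is negative and fn was never called.
        raise ValueError("max_retries must be >= 0")
    raise last_exc


calls = {"n": 0}
delays = []


def flaky() -> str:
    """Fail twice with a rate limit, then succeed."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError("429")
    return "ok"


result = call_with_retries(flaky, max_retries=3, sleep=delays.append)
print(result, calls["n"], delays)
```

With `max_retries=3` the worst case waits 1 + 2 + 4 = 7 seconds before the final attempt, which is short compared with typical per-minute quota windows; that may be one reason the CLI also offers `--model-fallback`.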
86
markitect/llm/models.py
Normal file
@@ -0,0 +1,86 @@
|
||||
"""
|
||||
Shared data models for LLM execution.
|
||||
|
||||
These classes are the canonical definitions; they are re-exported by
|
||||
markitect.prompts.execution.models for backward compatibility.
|
||||
"""
|
||||
|
||||
from dataclasses import dataclass, field
|
||||
from typing import Dict, Any
|
||||
|
||||
|
||||
@dataclass
|
||||
class RunConfig:
|
||||
"""
|
||||
Configuration for prompt execution.
|
||||
|
||||
Attributes:
|
||||
model_name: LLM model to use
|
||||
temperature: Model temperature (typically 0.0-1.0; the valid range is provider-specific)
|
||||
max_tokens: Maximum tokens to generate
|
||||
model_params: Additional model parameters
|
||||
max_depth: Maximum generation depth for nested runs
|
||||
skip_if_exists: Skip if identical InputBundleHash exists
|
||||
timeout_seconds: Execution timeout
|
||||
"""
|
||||
model_name: str = "gpt-4"
|
||||
temperature: float = 0.7
|
||||
max_tokens: int = 2000
|
||||
model_params: Dict[str, Any] = field(default_factory=dict)
|
||||
max_depth: int = 3
|
||||
skip_if_exists: bool = True
|
||||
timeout_seconds: int = 300
|
||||
|
||||
def to_dict(self) -> Dict[str, Any]:
|
||||
"""Convert to dictionary."""
|
||||
return {
|
||||
"model_name": self.model_name,
|
||||
"temperature": self.temperature,
|
||||
"max_tokens": self.max_tokens,
|
||||
"model_params": self.model_params,
|
||||
"max_depth": self.max_depth,
|
||||
"skip_if_exists": self.skip_if_exists,
|
||||
"timeout_seconds": self.timeout_seconds,
|
||||
}
|
||||
|
||||
@classmethod
|
||||
def from_dict(cls, data: Dict[str, Any]) -> "RunConfig":
|
||||
"""Create from dictionary."""
|
||||
return cls(
|
||||
model_name=data.get("model_name", "gpt-4"),
|
||||
temperature=data.get("temperature", 0.7),
|
||||
max_tokens=data.get("max_tokens", 2000),
|
||||
model_params=data.get("model_params", {}),
|
||||
max_depth=data.get("max_depth", 3),
|
||||
skip_if_exists=data.get("skip_if_exists", True),
|
||||
timeout_seconds=data.get("timeout_seconds", 300),
|
||||
)
|
||||
|
||||
|
||||
@dataclass
|
||||
class LLMResponse:
|
||||
"""
|
||||
Response from LLM execution.
|
||||
|
||||
Attributes:
|
||||
content: Generated content
|
||||
model: Model used
|
||||
usage: Token usage statistics
|
||||
finish_reason: Why generation stopped
|
||||
metadata: Additional response metadata
|
||||
"""
|
||||
content: str
|
||||
model: str
|
||||
usage: Dict[str, int] = field(default_factory=dict)
|
||||
finish_reason: str = "stop"
|
||||
metadata: Dict[str, Any] = field(default_factory=dict)
|
||||
|
||||
def to_dict(self) -> Dict[str, Any]:
|
||||
"""Convert to dictionary."""
|
||||
return {
|
||||
"content": self.content,
|
||||
"model": self.model,
|
||||
"usage": self.usage,
|
||||
"finish_reason": self.finish_reason,
|
||||
"metadata": self.metadata,
|
||||
}
|
||||
@@ -5,8 +5,8 @@ OpenAI (ChatGPT) adapter — calls the OpenAI chat completions API.
|
||||
import time
|
||||
from typing import Optional, Dict, Any
|
||||
|
||||
from markitect.prompts.execution.llm_adapter import LLMAdapter
|
||||
from markitect.prompts.execution.models import RunConfig, LLMResponse
|
||||
from markitect.llm.adapter import LLMAdapter
|
||||
from markitect.llm.models import RunConfig, LLMResponse
|
||||
from markitect.llm.config import resolve_api_key, find_project_root
|
||||
from markitect.llm._http import post_json
|
||||
from markitect.llm.exceptions import (
|
||||
|
||||
@@ -5,8 +5,8 @@ OpenRouter adapter — calls the OpenAI-compatible chat completions API.
|
||||
import time
|
||||
from typing import Optional, Dict, Any
|
||||
|
||||
from markitect.prompts.execution.llm_adapter import LLMAdapter
|
||||
from markitect.prompts.execution.models import RunConfig, LLMResponse
|
||||
from markitect.llm.adapter import LLMAdapter
|
||||
from markitect.llm.models import RunConfig, LLMResponse
|
||||
from markitect.llm.config import LLMConfig, resolve_api_key, find_project_root
|
||||
from markitect.llm._http import post_json
|
||||
from markitect.llm.exceptions import (
|
||||
|
||||
@@ -28,13 +28,28 @@ from markitect.llm.config import find_project_root
|
||||
|
||||
HARDCODED_PROVIDER = "gemini"
|
||||
HARDCODED_MODEL = "gemini-2.5-flash"
|
||||
MODEL_ENV_VAR = "MARKITECT_HELPER_MODEL"
|
||||
|
||||
# Default (markitect) values kept for backward compatibility.
|
||||
MODEL_ENV_VAR = "MARKITECT_HELPER_MODEL"
|
||||
USER_CONFIG_DIR = Path.home() / ".config" / "markitect"
|
||||
USER_CONFIG_PATH = USER_CONFIG_DIR / "config.toml"
|
||||
DIR_CONFIG_NAME = ".markitect.toml"
|
||||
|
||||
|
||||
# ── App-name helpers ───────────────────────────────────────────────────────
|
||||
|
||||
def _model_env_var(app_name: str) -> str:
|
||||
return f"{app_name.upper()}_HELPER_MODEL"
|
||||
|
||||
|
||||
def _user_config_path(app_name: str) -> Path:
|
||||
return Path.home() / ".config" / app_name / "config.toml"
|
||||
|
||||
|
||||
def _dir_config_name(app_name: str) -> str:
|
||||
return f".{app_name}.toml"
|
||||
|
||||
|
||||
# ── Data classes ──────────────────────────────────────────────────────────
|
||||
|
||||
@dataclass
|
||||
@@ -114,11 +129,11 @@ def _clear_llm_section(path: Path, section: str) -> bool:
|
||||
|
||||
# ── Directory config path helper ─────────────────────────────────────────
|
||||
|
||||
def _dir_config_path() -> Optional[Path]:
|
||||
def _dir_config_path(app_name: str = "markitect") -> Optional[Path]:
|
||||
root = find_project_root()
|
||||
if root is None:
|
||||
return None
|
||||
return root / DIR_CONFIG_NAME
|
||||
return root / _dir_config_name(app_name)
|
||||
|
||||
|
||||
# ── Resolution ───────────────────────────────────────────────────────────
|
||||
@@ -126,13 +141,23 @@ def _dir_config_path() -> Optional[Path]:
|
||||
def resolve_llm(
|
||||
cli_provider: Optional[str] = None,
|
||||
cli_model: Optional[str] = None,
|
||||
app_name: str = "markitect",
|
||||
) -> ResolvedLLM:
|
||||
"""Walk the 7-level priority chain and return a fully resolved config.
|
||||
|
||||
Provider and model are resolved independently — each takes the value
|
||||
from its highest-priority source.
|
||||
|
||||
Args:
|
||||
cli_provider: Provider override from CLI.
|
||||
cli_model: Model override from CLI.
|
||||
app_name: Application name used to derive config paths and the
|
||||
env-var prefix (e.g. ``"railiance"`` → ``RAILIANCE_HELPER_MODEL``
|
||||
and ``~/.config/railiance/config.toml``).
|
||||
"""
|
||||
dir_path = _dir_config_path()
|
||||
dir_path = _dir_config_path(app_name)
|
||||
user_cfg = _user_config_path(app_name)
|
||||
env_var = _model_env_var(app_name)
|
||||
|
||||
# Build the layers (highest priority first).
|
||||
layers: list[tuple[str, LLMLayer]] = []
|
||||
@@ -141,13 +166,13 @@ def resolve_llm(
|
||||
layers.append(("CLI flag", LLMLayer(provider=cli_provider, model=cli_model)))
|
||||
|
||||
# 2. Env var (model only)
|
||||
env_model = os.environ.get(MODEL_ENV_VAR) or None
|
||||
layers.append(("env MARKITECT_HELPER_MODEL", LLMLayer(model=env_model)))
|
||||
env_model = os.environ.get(env_var) or None
|
||||
layers.append((f"env {env_var}", LLMLayer(model=env_model)))
|
||||
|
||||
# 3. User preference
|
||||
layers.append((
|
||||
"user preference",
|
||||
_read_llm_section(USER_CONFIG_PATH, "preference"),
|
||||
_read_llm_section(user_cfg, "preference"),
|
||||
))
|
||||
|
||||
# 4. Directory preference
|
||||
@@ -167,7 +192,7 @@ def resolve_llm(
|
||||
# 6. User default
|
||||
layers.append((
|
||||
"user default",
|
||||
_read_llm_section(USER_CONFIG_PATH, "default"),
|
||||
_read_llm_section(user_cfg, "default"),
|
||||
))
|
||||
|
||||
# 7. Hardcoded
|
||||
@@ -199,20 +224,22 @@ def resolve_llm(
|
||||
)
|
||||
|
||||
|
||||
def get_default_layers() -> list[tuple[str, LLMLayer]]:
|
||||
def get_default_layers(app_name: str = "markitect") -> list[tuple[str, LLMLayer]]:
|
||||
"""Return only the default layers for display."""
|
||||
dir_path = _dir_config_path()
|
||||
dir_path = _dir_config_path(app_name)
|
||||
user_cfg = _user_config_path(app_name)
|
||||
dir_cfg_name = _dir_config_name(app_name)
|
||||
layers: list[tuple[str, LLMLayer]] = []
|
||||
|
||||
if dir_path:
|
||||
layers.append((
|
||||
f"Directory default ({DIR_CONFIG_NAME})",
|
||||
f"Directory default ({dir_cfg_name})",
|
||||
_read_llm_section(dir_path, "default"),
|
||||
))
|
||||
|
||||
layers.append((
|
||||
f"User default ({USER_CONFIG_PATH})",
|
||||
_read_llm_section(USER_CONFIG_PATH, "default"),
|
||||
f"User default ({user_cfg})",
|
||||
_read_llm_section(user_cfg, "default"),
|
||||
))
|
||||
|
||||
layers.append((
|
||||
@@ -223,19 +250,21 @@ def get_default_layers() -> list[tuple[str, LLMLayer]]:
|
||||
return layers
|
||||
|
||||
|
||||
def get_preference_layers() -> list[tuple[str, LLMLayer]]:
|
||||
def get_preference_layers(app_name: str = "markitect") -> list[tuple[str, LLMLayer]]:
|
||||
"""Return only the preference layers for display."""
|
||||
dir_path = _dir_config_path()
|
||||
dir_path = _dir_config_path(app_name)
|
||||
user_cfg = _user_config_path(app_name)
|
||||
dir_cfg_name = _dir_config_name(app_name)
|
||||
layers: list[tuple[str, LLMLayer]] = []
|
||||
|
||||
layers.append((
|
||||
f"User preference ({USER_CONFIG_PATH})",
|
||||
_read_llm_section(USER_CONFIG_PATH, "preference"),
|
||||
f"User preference ({user_cfg})",
|
||||
_read_llm_section(user_cfg, "preference"),
|
||||
))
|
||||
|
||||
if dir_path:
|
||||
layers.append((
|
||||
f"Directory preference ({DIR_CONFIG_NAME})",
|
||||
f"Directory preference ({dir_cfg_name})",
|
||||
_read_llm_section(dir_path, "preference"),
|
||||
))
|
||||
|
||||
|
||||
@@ -1,169 +1,9 @@
|
||||
"""
|
||||
LLM adapter interface for pluggable model providers.
|
||||
Re-exports from markitect.llm.adapter for backward compatibility.
|
||||
|
||||
Implements abstraction layer for LLM integration, supporting
|
||||
multiple providers (OpenAI, Anthropic, local models, etc.).
|
||||
The LLM adapter interface was moved to markitect.llm.adapter in v1.1.
|
||||
"""
|
||||
|
||||
from abc import ABC, abstractmethod
|
||||
from typing import Dict, Any
|
||||
from markitect.llm.adapter import LLMAdapter, MockLLMAdapter, ErrorLLMAdapter
|
||||
|
||||
from markitect.prompts.execution.models import RunConfig, LLMResponse
|
||||
|
||||
|
||||
class LLMAdapter(ABC):
|
||||
"""
|
||||
Abstract base class for LLM providers.
|
||||
|
||||
Enables pluggable LLM backends without prescribing implementation.
|
||||
Implementations can wrap OpenAI, Anthropic, or other APIs.
|
||||
"""
|
||||
|
||||
@abstractmethod
|
||||
def execute_prompt(
|
||||
self,
|
||||
prompt: str,
|
||||
config: RunConfig,
|
||||
) -> LLMResponse:
|
||||
"""
|
||||
Execute a prompt with the LLM.
|
||||
|
||||
Args:
|
||||
prompt: Compiled prompt text
|
||||
config: Execution configuration
|
||||
|
||||
Returns:
|
||||
LLMResponse with generated content
|
||||
|
||||
Raises:
|
||||
Exception: On LLM API errors
|
||||
"""
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def validate_config(self, config: RunConfig) -> bool:
|
||||
"""
|
||||
Validate that configuration is supported.
|
||||
|
||||
Args:
|
||||
config: Configuration to validate
|
||||
|
||||
Returns:
|
||||
True if valid, False otherwise
|
||||
"""
|
||||
pass
|
||||
|
||||
|
||||
class MockLLMAdapter(LLMAdapter):
|
||||
"""
|
||||
Mock LLM adapter for testing.
|
||||
|
||||
Returns deterministic responses without calling external APIs.
|
||||
"""
|
||||
|
||||
def __init__(self, mock_response: str = "Mock LLM response"):
|
||||
"""
|
||||
Initialize mock adapter.
|
||||
|
||||
Args:
|
||||
mock_response: Response to return
|
||||
"""
|
||||
self.mock_response = mock_response
|
||||
self.call_count = 0
|
||||
self.last_prompt = None
|
||||
self.last_config = None
|
||||
|
||||
def execute_prompt(
|
||||
self,
|
||||
prompt: str,
|
||||
config: RunConfig,
|
||||
) -> LLMResponse:
|
||||
"""
|
||||
Return mock response.
|
||||
|
||||
Args:
|
||||
prompt: Prompt (stored for inspection)
|
||||
config: Config (stored for inspection)
|
||||
|
||||
Returns:
|
||||
Mock LLMResponse
|
||||
"""
|
||||
self.call_count += 1
|
||||
self.last_prompt = prompt
|
||||
self.last_config = config
|
||||
|
||||
return LLMResponse(
|
||||
content=self.mock_response,
|
||||
model=config.model_name,
|
||||
usage={
|
||||
"prompt_tokens": len(prompt.split()),
|
||||
"completion_tokens": len(self.mock_response.split()),
|
||||
"total_tokens": len(prompt.split()) + len(self.mock_response.split()),
|
||||
},
|
||||
finish_reason="stop",
|
||||
metadata={"mock": True},
|
||||
)
|
||||
|
||||
def validate_config(self, config: RunConfig) -> bool:
|
||||
"""
|
||||
Mock validation always succeeds.
|
||||
|
||||
Args:
|
||||
config: Configuration
|
||||
|
||||
Returns:
|
||||
Always True
|
||||
"""
|
||||
return True
|
||||
|
||||
def reset(self) -> None:
|
||||
"""Reset mock state."""
|
||||
self.call_count = 0
|
||||
self.last_prompt = None
|
||||
self.last_config = None
|
||||
|
||||
|
||||
class ErrorLLMAdapter(LLMAdapter):
|
||||
"""
|
||||
Mock adapter that always raises an error.
|
||||
|
||||
Useful for testing error handling.
|
||||
"""
|
||||
|
||||
def __init__(self, error_message: str = "Mock LLM error"):
|
||||
"""
|
||||
Initialize error adapter.
|
||||
|
||||
Args:
|
||||
error_message: Error message to raise
|
||||
"""
|
||||
self.error_message = error_message
|
||||
|
||||
def execute_prompt(
|
||||
self,
|
||||
prompt: str,
|
||||
config: RunConfig,
|
||||
) -> LLMResponse:
|
||||
"""
|
||||
Raise error.
|
||||
|
||||
Args:
|
||||
prompt: Prompt
|
||||
config: Config
|
||||
|
||||
Raises:
|
||||
RuntimeError: Always
|
||||
"""
|
||||
raise RuntimeError(self.error_message)
|
||||
|
||||
def validate_config(self, config: RunConfig) -> bool:
|
||||
"""
|
||||
Validation succeeds.
|
||||
|
||||
Args:
|
||||
config: Configuration
|
||||
|
||||
Returns:
|
||||
True
|
||||
"""
|
||||
return True
|
||||
__all__ = ["LLMAdapter", "MockLLMAdapter", "ErrorLLMAdapter"]
|
||||
|
||||
@@ -12,6 +12,7 @@ from typing import Dict, Any, List, Optional
|
||||
from enum import Enum
|
||||
|
||||
from markitect.prompts.models import calculate_bundle_digest
|
||||
from markitect.llm.models import RunConfig, LLMResponse # canonical; re-exported here
|
||||
|
||||
|
||||
class ExecutionStage(Enum):
|
||||
@@ -37,54 +38,6 @@ class RunStatus(Enum):
|
||||
SKIPPED = "skipped" # Skipped due to identical InputBundleHash
|
||||
|
||||
|
||||
@dataclass
|
||||
class RunConfig:
|
||||
"""
|
||||
Configuration for prompt execution.
|
||||
|
||||
Attributes:
|
||||
model_name: LLM model to use
|
||||
temperature: Model temperature (0.0-1.0)
|
||||
max_tokens: Maximum tokens to generate
|
||||
model_params: Additional model parameters
|
||||
max_depth: Maximum generation depth for nested runs
|
||||
skip_if_exists: Skip if identical InputBundleHash exists (FR-4.4)
|
||||
timeout_seconds: Execution timeout
|
||||
"""
|
||||
model_name: str = "gpt-4"
|
||||
temperature: float = 0.7
|
||||
max_tokens: int = 2000
|
||||
model_params: Dict[str, Any] = field(default_factory=dict)
|
||||
max_depth: int = 3
|
||||
skip_if_exists: bool = True
|
||||
timeout_seconds: int = 300
|
||||
|
||||
def to_dict(self) -> Dict[str, Any]:
|
||||
"""Convert to dictionary."""
|
||||
return {
|
||||
"model_name": self.model_name,
|
||||
"temperature": self.temperature,
|
||||
"max_tokens": self.max_tokens,
|
||||
"model_params": self.model_params,
|
||||
"max_depth": self.max_depth,
|
||||
"skip_if_exists": self.skip_if_exists,
|
||||
"timeout_seconds": self.timeout_seconds,
|
||||
}
|
||||
|
||||
@classmethod
|
||||
def from_dict(cls, data: Dict[str, Any]) -> "RunConfig":
|
||||
"""Create from dictionary."""
|
||||
return cls(
|
||||
model_name=data.get("model_name", "gpt-4"),
|
||||
temperature=data.get("temperature", 0.7),
|
||||
max_tokens=data.get("max_tokens", 2000),
|
||||
model_params=data.get("model_params", {}),
|
||||
max_depth=data.get("max_depth", 3),
|
||||
skip_if_exists=data.get("skip_if_exists", True),
|
||||
timeout_seconds=data.get("timeout_seconds", 300),
|
||||
)
|
||||
|
||||
|
||||
@dataclass
|
||||
class InputBundle:
|
||||
"""
|
||||
@@ -151,35 +104,6 @@ class InputBundle:
|
||||
}
|
||||
|
||||
|
||||
@dataclass
|
||||
class LLMResponse:
|
||||
"""
|
||||
Response from LLM execution.
|
||||
|
||||
Attributes:
|
||||
content: Generated content
|
||||
model: Model used
|
||||
usage: Token usage statistics
|
||||
finish_reason: Why generation stopped
|
||||
metadata: Additional response metadata
|
||||
"""
|
||||
content: str
|
||||
model: str
|
||||
usage: Dict[str, int] = field(default_factory=dict)
|
||||
finish_reason: str = "stop"
|
||||
metadata: Dict[str, Any] = field(default_factory=dict)
|
||||
|
||||
def to_dict(self) -> Dict[str, Any]:
|
||||
"""Convert to dictionary."""
|
||||
return {
|
||||
"content": self.content,
|
||||
"model": self.model,
|
||||
"usage": self.usage,
|
||||
"finish_reason": self.finish_reason,
|
||||
"metadata": self.metadata,
|
||||
}
|
||||
|
||||
|
||||
@dataclass
|
||||
class PromptRun:
|
||||
"""
|
||||
|
||||
4
package-lock.json
generated
@@ -1,11 +1,11 @@
|
||||
{
|
||||
"name": "markitect_project",
|
||||
"name": "markitect-main",
|
||||
"version": "1.0.0",
|
||||
"lockfileVersion": 3,
|
||||
"requires": true,
|
||||
"packages": {
|
||||
"": {
|
||||
"name": "markitect_project",
|
||||
"name": "markitect-main",
|
||||
"version": "1.0.0",
|
||||
"license": "ISC",
|
||||
"dependencies": {
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
{
|
||||
"name": "markitect_project",
|
||||
"name": "markitect-main",
|
||||
"version": "1.0.0",
|
||||
"description": "",
|
||||
"main": "index.js",
|
||||
@@ -14,7 +14,7 @@
|
||||
},
|
||||
"repository": {
|
||||
"type": "git",
|
||||
"url": "http://92.205.130.254:32166/coulomb/markitect_project"
|
||||
"url": "http://92.205.130.254:32166/coulomb/markitect-main"
|
||||
},
|
||||
"keywords": [],
|
||||
"author": "",
|
||||
|
||||
@@ -18,6 +18,9 @@ dependencies = [
|
||||
"aiohttp>=3.8.0",
|
||||
"toml",
|
||||
|
||||
# Extracted LLM adapter library (standalone repo)
|
||||
"llm-connect @ file:///home/worsch/llm-connect",
|
||||
|
||||
# Core capabilities (required for basic functionality)
|
||||
"release-management @ file:./capabilities/release-management",
|
||||
"testdrive-jsui @ file:./capabilities/testdrive-jsui",
|
||||
|
||||
12
registry/README.md
Normal file
@@ -0,0 +1,12 @@
|
||||
# Capability Registry
|
||||
|
||||
Markdown-first capability index for federation and reuse planning.
|
||||
|
||||
## Authoring
|
||||
|
||||
1. Copy a capability entry template (see reuse-surface `templates/capability-entry.template.md`).
|
||||
2. Add the row to `indexes/capabilities.yaml`.
|
||||
3. Run `reuse-surface validate` from a checkout with the CLI installed.
|
||||
4. Merge to `main` and verify publish with `reuse-surface establish --publish-check`.
|
||||
|
||||
Federation contract: reuse-surface `docs/RegistryFederation.md`.
|
||||
0
registry/capabilities/.gitkeep
Normal file
4
registry/indexes/capabilities.yaml
Normal file
@@ -0,0 +1,4 @@
|
||||
version: 1
|
||||
updated: '2026-06-16'
|
||||
domain: helix_forge
|
||||
capabilities: []
|
||||
202
roadmap/infospace-s3-closeout/PLAN.md
Normal file
@@ -0,0 +1,202 @@
|
||||
# Infospace Tooling — Stage 3 Close-out
|
||||
|
||||
## Context
|
||||
|
||||
Stages 1 and 2 of the infospace tooling roadmap are complete. Stage 3 used the
|
||||
Wealth of Nations / VSM example to validate the tooling end-to-end. Most of S3
|
||||
is done; this workstream finishes the remaining tasks, addresses deferred cleanup,
|
||||
and formally closes the roadmap.
|
||||
|
||||
**Parent roadmap:** `roadmap/infospace-tooling/PLAN.md`
|
||||
**Example location:** `examples/infospace-with-history/`
|
||||
|
||||
**Status: CLOSED (2026-04-22).** All acceptance criteria except the cosmetic
|
||||
per-chapter history (C.7) are met. Final metrics: 988 entities, 988 evaluations,
|
||||
6/6 viability thresholds PASS (`per_entity_mean = 3.957`). Tooling work that
|
||||
came out of this close-out landed as commits `c0615c2d` (gemini retry,
|
||||
unified skip-existing, non-destructive metrics I/O) and `d44a4cd3`
|
||||
(`infospace entity` lookup, `evaluate --model-fallback`, `llm-check`
|
||||
stale-key advisory, `build_state` type guard).
|
||||
|
||||
### State at workstream open (2026-02-26)
|
||||
|
||||
| Item | Status |
|
||||
|------|--------|
|
||||
| S3.1 Migrate example to infospace config | ✅ Done |
|
||||
| S3.3 Per-entity eval batch | ✅ 985/988 complete; metrics.yaml updated |
|
||||
| S3.4 Tutorial rewrite | ✅ Done |
|
||||
| S3.5 Supply-chain-vsm composition demo | ✅ Done |
|
||||
| S3.2 Clean per-chapter git history | ⏳ Deferred — included here |
|
||||
| 3 missing evaluations | ⏳ Outstanding |
|
||||
| 4 follow-up items (commit b055c8d7) | ⏳ Outstanding |
|
||||
|
||||
### State at workstream close (2026-04-22)
|
||||
|
||||
| Task | Status |
|
||||
|------|--------|
|
||||
| C.1 Complete 3 missing entity evaluations | ✅ Done (commit f325f89d) |
|
||||
| C.2 Run eval-summary and verify viability | ✅ Done — 6/6 PASS |
|
||||
| C.3 Refresh metrics report (988 entities) | ✅ Done — snapshot `090bb961` |
|
||||
| C.4 Document advanced usage patterns | ✅ Done — `examples/infospace-with-history/docs/advanced-usage.md` |
|
||||
| C.5 Composition-examples documentation | ✅ Done — `docs/composition-guide.md` |
|
||||
| C.6 Performance benchmarking note | ✅ Done — `examples/infospace-with-history/docs/performance-notes.md` |
|
||||
| C.7 Clean per-chapter git history | ⏭️ Deferred indefinitely — see note below |
|
||||
| C.8 Formally close S3 roadmap | ✅ This commit |
|
||||
|
||||
**C.7 disposition.** The task assumed a pre-existing `clean-example-history`
|
||||
branch with chapters 1–8 already committed; that branch no longer exists in
|
||||
the repo. The task is explicitly cosmetic ("does not change output files"),
|
||||
and the output files themselves are canonical. Reconstructing a 35-commit
|
||||
per-chapter history from scratch would be archaeological rather than useful.
|
||||
Closing as "won't do" unless a specific archival need surfaces. If revisited,
|
||||
entities can be grouped by their `## Source Chapter` markdown section to
|
||||
reconstruct chapter membership.
|
||||
|
||||
---
|
||||
|
||||
## Tasks
|
||||
|
||||
### C.1 — Complete the 3 missing entity evaluations
|
||||
|
||||
985 of 988 entities have evaluation files. Identify and evaluate the remaining 3.
|
||||
|
||||
```bash
|
||||
cd examples/infospace-with-history
|
||||
# Identify missing slugs
|
||||
comm -23 \
|
||||
<(ls output/entities/*.md | xargs -I{} basename {} .md | sort) \
|
||||
<(ls output/evaluations/*.md | xargs -I{} basename {} .md | sort)
|
||||
# Evaluate each missing entity individually
|
||||
markitect infospace evaluate --entity <slug> --provider openrouter
|
||||
```
|
||||
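The `comm -23` pipeline above is a set difference over file stems. A minimal Python sketch of the same check follows, assuming the `output/entities/` and `output/evaluations/` layout shown in the shell snippet. The function name `missing_evaluations` is hypothetical and is not part of the markitect CLI.

```python
from pathlib import Path


def missing_evaluations(root: Path) -> list[str]:
    """Return entity slugs under `root` that have no matching evaluation file."""
    # A slug is the file name without its .md suffix.
    entities = {p.stem for p in (root / "output" / "entities").glob("*.md")}
    evaluations = {p.stem for p in (root / "output" / "evaluations").glob("*.md")}
    # Entities present on the left but absent on the right, sorted for stable output.
    return sorted(entities - evaluations)
```

Calling `missing_evaluations(Path("examples/infospace-with-history"))` from the repository root would list the slugs still to be passed to `markitect infospace evaluate --entity <slug>`.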
|
||||
**Acceptance:** `ls output/evaluations/*.md | wc -l` returns 988.
|
||||
|
||||
---
|
||||
|
||||
### C.2 — Run eval-summary and verify viability
|
||||
|
||||
Run the aggregation command to update `per_entity_mean` from all 988 evaluations,
|
||||
then check that all 6 viability gates pass.
|
||||
|
||||
```bash
|
||||
cd examples/infospace-with-history
|
||||
unset OPENROUTER_API_KEY # stale env var guard
|
||||
markitect infospace eval-summary --update-metrics
|
||||
markitect infospace viability
|
||||
```
|
||||
|
||||
Current sample reading (985 entities): `per_entity_mean = 3.956` against threshold 3.5.
|
||||
Expected: all 6 metrics pass.
|
||||
|
||||
**Acceptance:** `markitect infospace viability` exits 0 and shows 6/6 PASS.
|
||||
|
||||
---
|
||||
|
||||
### C.3 — Refresh the metrics report
|
||||
|
||||
The metrics report was generated from chapters 1–4 only. Regenerate it from
|
||||
the full 988-entity set.
|
||||
|
||||
```bash
|
||||
cd examples/infospace-with-history
|
||||
markitect infospace check --provider openrouter # or reuse existing check outputs
|
||||
markitect infospace history # confirm snapshot recorded
|
||||
```
|
||||
|
||||
**Acceptance:** `output/metrics/metrics.yaml` reflects all 988 entities; a dated
|
||||
snapshot exists in the metrics history.
|
||||
|
||||
---
|
||||
|
||||
### C.4 — Document advanced usage patterns
|
||||
|
||||
Write `examples/infospace-with-history/docs/advanced-usage.md` covering:
|
||||
|
||||
- Incremental evaluation (adding entities after initial run, skip-if-exists behaviour)
|
||||
- Re-evaluating after guideline changes (`--force` flag)
|
||||
- Interpreting per-entity score distributions and identifying outliers
|
||||
- Using `markitect infospace entities --sort-by score` to triage low scorers
|
||||
- Reading and acting on collection check outputs (redundancy pairs, coverage gaps)
|
||||
|
||||
**Acceptance:** File exists with ≥ 4 documented patterns, each with a worked command example.
|
||||
|
||||
---
|
||||
|
||||
### C.5 — Add composition examples to documentation
|
||||
|
||||
Document how the supply-chain-vsm example (`examples/supply-chain-vsm/`) demonstrates
|
||||
composition. Add a `docs/composition-guide.md` covering:
|
||||
|
||||
- What composition means (discipline binding)
|
||||
- How supply-chain-vsm binds WoN as a discipline
|
||||
- How to create a new infospace that uses an existing one as a discipline
|
||||
- Viability requirement: the discipline must pass its own thresholds before binding
|
||||
|
||||
Reference `examples/supply-chain-vsm/` throughout.
|
||||
|
||||
**Acceptance:** `docs/composition-guide.md` exists and links to supply-chain-vsm.
|
||||
|
||||
---
|
||||
|
||||
### C.6 — Performance benchmarking note
|
||||
|
||||
Rather than a full benchmarking guide (out of scope for a 988-entity example),
|
||||
record observed timings in a `docs/performance-notes.md`:
|
||||
|
||||
- Eval batch duration (~4 hrs for 988 entities via OpenRouter)
|
||||
- Tokens per entity (rough estimate from usage logs)
|
||||
- Embedding cache hit rate after first run
|
||||
- Recommendation: provider choice (OpenRouter vs Gemini) for different dataset sizes
|
||||
|
||||
**Acceptance:** File exists with at least 4 concrete measurements or estimates.
|
||||
|
||||
---
|
||||
|
||||
### C.7 — S3.2: Clean per-chapter git history (deferred cleanup)
|
||||
|
||||
Create a clean branch where each of the 35 processed chapters has its own commit.
|
||||
Chapters 1–8 are already done on branch `clean-example-history`; 27 remain.
|
||||
|
||||
This is a cosmetic/archival task — it does not change output files.
|
||||
|
||||
```bash
|
||||
git checkout clean-example-history
|
||||
# For each remaining chapter (9–35):
|
||||
# cherry-pick or re-commit the chapter output files with a per-chapter message
|
||||
git log --oneline clean-example-history # verify 35 chapter commits
|
||||
```
|
||||
|
||||
**Acceptance:** Branch `clean-example-history` has exactly 35 chapter commits
|
||||
(one per chapter), rebased onto current main.
|
||||
|
||||
**Note:** This task can be done independently of C.1–C.6. Low urgency — do last.
|
||||
|
||||
---
|
||||
|
||||
### C.8 — Formally close the S3 roadmap
|
||||
|
||||
Update `roadmap/infospace-tooling/PLAN.md` to mark all S3 tasks as complete.
|
||||
Add a close-out summary at the top of the file with final metrics and date.
|
||||
Commit with a `docs(roadmap)` message.
|
||||
|
||||
**Acceptance:** PLAN.md header shows all stages complete; committed to main.
|
||||
|
||||
---
|
||||
|
||||
## Task order
|
||||
|
||||
```
|
||||
C.1 → C.2 → C.3
|
||||
↓
|
||||
C.4, C.5, C.6 (parallel)
|
||||
↓
|
||||
C.8
|
||||
C.7 (independent, do last)
|
||||
```
|
||||
|
||||
## Out of scope
|
||||
|
||||
- Adding new entities or chapters (the WoN example is complete at 988 entities)
|
||||
- Re-running collection checks from scratch (existing results are valid)
|
||||
- Publishing the example as a standalone dataset
|
||||
@@ -1,5 +1,31 @@
|
||||
# Viable Infospace Tooling — Roadmap
|
||||
|
||||
## Status: CLOSED (2026-04-22)
|
||||
|
||||
All three stages complete.
|
||||
|
||||
| Stage | Status | Notes |
|
||||
|-------|--------|-------|
|
||||
| Stage 1 — Platform additions (S1.1–S1.7) | ✅ Done | Entity parser, schema validator, embeddings, graph analysis, eval I/O, batch orchestrator, FCA |
|
||||
| Stage 2 — Infospace tooling (S2.1–S2.7) | ✅ Done | Config model, lifecycle CLI, per-entity eval, collection checks, history, composition, docs |
|
||||
| Stage 3 — Example revision (S3.1–S3.5) | ✅ Done (except cosmetic S3.2) | See `roadmap/infospace-s3-closeout/PLAN.md` |
|
||||
|
||||
**Final validation (Wealth of Nations / VSM example, 988 entities):**
|
||||
- 988 per-entity evaluations landed
|
||||
- Collection checks pass 6/6 viability thresholds (`per_entity_mean = 3.957`
|
||||
against threshold 3.5; `redundancy_ratio = 0.006`; `coverage_ratio = 0.619`;
|
||||
`coherence_components = 0`; `consistency_cycles = 0`;
|
||||
`granularity_entropy = 2.675`)
|
||||
- Composition demonstrated via `examples/supply-chain-vsm/`
|
||||
- S3.2 (clean per-chapter git history) deferred as cosmetic-only; rationale
|
||||
in the close-out plan
|
||||
|
||||
See `roadmap/infospace-s3-closeout/PLAN.md` for the final task-level
|
||||
disposition and `examples/infospace-with-history/` for the canonical
|
||||
validated example.
|
||||
|
||||
---
|
||||
|
||||
## Vision
|
||||
|
||||
An **infospace** is a structured, evaluable, composable collection of
|
||||
|
||||
214
roadmap/llm-shared-library/PLAN.md
Normal file
@@ -0,0 +1,214 @@
|
||||
# LLM Adapter Layer — Extract as Shared Library
|
||||
|
||||
## Vision
|
||||
|
||||
The `markitect.llm` module is a clean, stdlib-only adapter layer for calling
|
||||
LLMs via OpenRouter, Gemini, OpenAI, and the Claude Code CLI. It implements a
|
||||
uniform interface, a 7-layer TOML config chain, embedding support with caching,
|
||||
and typed exceptions. It should be usable by all projects in the Bernd Worsch
|
||||
ecosystem without pulling in all of markitect.
|
||||
|
||||
This roadmap tracks extracting it into a standalone installable library.
|
||||
|
||||
---
|
||||
|
||||
## Current State
|
||||
|
||||
The module lives at `markitect/llm/` (~16 files, ~1500 LOC, stdlib-only) and
|
||||
provides:
|
||||
- **4 text adapters**: OpenRouter, Gemini, OpenAI, Claude Code CLI
|
||||
- **2 embedding adapters**: OpenAI-compatible (OpenAI + OpenRouter)
|
||||
- **Embedding cache**: JSON-backed, content-digest validated
|
||||
- **Similarity utilities**: pure-Python cosine similarity, matrix, pair-finding
|
||||
- **7-layer TOML config chain**: CLI > env > user/dir preference/default > hardcoded
|
||||
- **Typed exceptions**: LLMError hierarchy
|
||||
- **HTTP wrapper**: urllib-only, typed exception translation
|
||||
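To make the "similarity utilities" item concrete, here is a minimal stdlib-only sketch of cosine similarity and threshold-based pair-finding. It is an illustration under assumptions, not the code in `markitect/llm/similarity.py`: the function names, signatures, and the zero-vector convention are hypothetical.

```python
import math
from itertools import combinations


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Assumed convention: a zero vector is similar to nothing.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_pairs(
    vectors: dict[str, list[float]], threshold: float
) -> list[tuple[str, str, float]]:
    """Return (key_a, key_b, score) for every pair scoring at or above threshold."""
    pairs = []
    # Sorting the items keeps the pair order deterministic.
    for (key_a, vec_a), (key_b, vec_b) in combinations(sorted(vectors.items()), 2):
        score = cosine_similarity(vec_a, vec_b)
        if score >= threshold:
            pairs.append((key_a, key_b, score))
    return pairs
```

The pairwise loop is quadratic in the number of vectors, which is usually acceptable for collections of around a thousand entities.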
|
||||
### Two Coupling Issues Blocking Clean Extraction
|
||||
|
||||
| Issue | Location | Severity |
|
||||
|-------|----------|----------|
|
||||
| `RunConfig` and `LLMResponse` are defined in `markitect.prompts.execution.models`, not in `markitect.llm` | `markitect/prompts/execution/models.py` | High — creates cross-module import for all consumers |
|
||||
| TOML config chain hardcodes `"markitect"` as app name (paths: `~/.config/markitect/`, env prefix `MARKITECT_`, files: `.markitect.toml`) | `markitect/llm/toml_config.py` | Medium — consumers either accept markitect config or can't use the chain |
|
||||
|
||||
---
|
||||
|
||||
## Terminology
|
||||
|
||||
- **adapter**: concrete implementation of `LLMAdapter` for a single provider
|
||||
- **factory**: `create_adapter()` / `create_embedding_adapter()` — provider-agnostic entry points
|
||||
- **config chain**: 7-layer resolution of provider + model (CLI → env → TOML → hardcoded)
|
||||
- **standalone library**: a Python package installable with `pip install` from a git URL or local path, without PyPI
|
||||
- **consumer**: any project that imports and uses the library (markitect itself, custodian, railiance, etc.)
|
||||
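The config chain resolves provider and model independently, each from its highest-priority source. A minimal sketch of that rule follows. The names `Layer` and `resolve` are hypothetical stand-ins and do not reproduce the real `LLMLayer` / `resolve_llm()` API. The model name `example-model` is a placeholder; `gemini` / `gemini-2.5-flash` mirror the hardcoded fallback in `toml_config.py`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Layer:
    """One configuration source; either field may be unset."""
    provider: Optional[str] = None
    model: Optional[str] = None


def resolve(layers: list[tuple[str, Layer]]) -> dict[str, tuple[str, str]]:
    """Pick provider and model separately from the first layer that sets each.

    `layers` is ordered highest priority first. The result maps each field
    name to a (value, source label) pair.
    """
    resolved: dict[str, tuple[str, str]] = {}
    for field_name in ("provider", "model"):
        for source, layer in layers:
            value = getattr(layer, field_name)
            if value:
                resolved[field_name] = (value, source)
                break
    return resolved


layers = [
    ("CLI flag", Layer(provider="openrouter")),        # sets the provider only
    ("env", Layer(model="example-model")),             # sets the model only
    ("hardcoded", Layer(provider="gemini", model="gemini-2.5-flash")),
]
result = resolve(layers)
```

Here the provider comes from the CLI layer and the model from the env layer, so a single resolved configuration can mix sources.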
|
||||
---
|
||||
|
||||
## Packaging Decision (Pending)
|
||||
|
||||
Before Phase 2 starts, one architectural decision must be resolved:
|
||||
|
||||
> **D1: Where does the extracted library live?**
|
||||
>
|
||||
> **Option A — Standalone repo** (`~/bw-llm` or similar):
|
||||
> - Clean separation, versioned independently, installable via `pip install git+file:///...` or git URL
|
||||
> - Adds a repo to maintain; changes require bumping version in dependents
|
||||
>
|
||||
> **Option B — Subfolder of markitect with own `pyproject.toml`** (monorepo-lite):
|
||||
> - Stays co-located with the main codebase that will use it most
|
||||
> - Less friction for iteration; single git history
|
||||
> - Slightly unorthodox but valid for personal infrastructure
|
||||
>
|
||||
> **Option C — Just `pip install markitect` in other projects**:
|
||||
> - Zero extraction work; reuse today
|
||||
> - Pulls all of markitect (prompts, infospace, CLI, etc.) as transitive deps
|
||||
> - Acceptable short-term if other projects are small
|
||||
|
||||
---
|
||||
|
||||
## Stages
|
||||
|
||||
### Stage 1 — Decouple (within markitect)
|
||||
|
||||
Prepare the module for extraction without changing its public API.
|
||||
|
||||
#### S1.1 — Move RunConfig + LLMResponse into markitect.llm
|
||||
|
||||
`RunConfig` and `LLMResponse` are currently in `markitect.prompts.execution.models`.
|
||||
The LLM adapters import from there, creating a hard dependency on the prompt system.
|
||||
|
||||
**Work:**
|
||||
- Move both dataclasses to `markitect/llm/models.py`
|
||||
- Update all imports in `markitect.llm` and `markitect.prompts`
|
||||
- Keep a re-export shim in `markitect.prompts.execution.models` for backward compatibility
|
||||
|
||||
**Acceptance:** `markitect/llm/` has zero imports from `markitect.prompts.*`
|
||||
|
||||
#### S1.2 — Parameterize the TOML config chain
|
||||
|
||||
Replace the hardcoded `"markitect"` app name with a configurable `app_name` parameter.
|
||||
|
||||
**Work:**
|
||||
- Add `app_name: str = "markitect"` parameter to `resolve_llm()` and the config
|
||||
path helpers in `toml_config.py`
|
||||
- Derive config file path (`~/.config/{app_name}/config.toml`), env prefix
|
||||
(`{APP_NAME}_HELPER_MODEL`), and local config file (`.{app_name}.toml`) from it
|
||||
- All existing behaviour is preserved when `app_name="markitect"` (default)
|
||||
|
||||
**Acceptance:** A consumer can call `resolve_llm(app_name="railiance")` and get
|
||||
config from `~/.config/railiance/config.toml` and `RAILIANCE_HELPER_MODEL`.
|
||||
|
||||
#### S1.3 — Isolation tests
|
||||
|
||||
Write a test file that imports only from `markitect.llm.*` and verifies no
|
||||
accidental coupling remains.
|
||||
|
||||
**Acceptance:** `pytest tests/test_llm_isolation.py` passes; no import of
|
||||
`markitect.prompts` or `markitect.infospace` in the LLM module tree.
|
||||
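One way to check the "no import" criterion statically is to parse each module with `ast` and look for forbidden import targets. The sketch below assumes that approach; the helper names are hypothetical and this is not the content of `tests/test_llm_isolation.py`. It only inspects absolute `import x` and `from x import y` statements, so relative imports, dynamic imports, and `from markitect import prompts` would need extra handling.

```python
import ast
from pathlib import Path

FORBIDDEN_PREFIXES = ("markitect.prompts", "markitect.infospace")


def forbidden_imports(source: str) -> list[str]:
    """Return imported module names in `source` that fall under a forbidden prefix."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        found.extend(
            name
            for name in names
            if any(name == p or name.startswith(p + ".") for p in FORBIDDEN_PREFIXES)
        )
    return found


def scan_tree(package_dir: Path) -> dict[str, list[str]]:
    """Map each offending .py file under `package_dir` to its forbidden imports."""
    offenders = {}
    for path in sorted(package_dir.rglob("*.py")):
        hits = forbidden_imports(path.read_text(encoding="utf-8"))
        if hits:
            offenders[str(path)] = hits
    return offenders
```

A test could then assert that `scan_tree(Path("markitect/llm"))` returns an empty dict.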
|
||||
---
|
||||
|
||||
### Stage 2 — Extract
|
||||
|
||||
#### S2.1 — Resolve D1: packaging location
|
||||
|
||||
Record the decision and create the package scaffold.
|
||||
|
||||
**Acceptance:** D1 resolved, `pyproject.toml` for the library exists at the
|
||||
chosen location with name, version `0.1.0`, and declared dependencies.
|
||||
|
||||
#### S2.2 — Create standalone package
|
||||
|
||||
Move (or symlink) the llm module into the new package structure. Wire up
|
||||
the `pyproject.toml` entry points. Verify `pip install -e <path>` works.
|
||||
|
||||
**Files to carry over:**
|
||||
```
|
||||
llm/
|
||||
__init__.py # re-exports: create_adapter, create_embedding_adapter,
|
||||
# LLMAdapter, EmbeddingAdapter, LLMConfig, exceptions
|
||||
models.py # RunConfig, LLMResponse (moved from S1.1)
|
||||
config.py # load_config, resolve_api_key
|
||||
toml_config.py # resolve_llm (parameterized from S1.2)
|
||||
factory.py # create_adapter
|
||||
exceptions.py # LLM exception hierarchy
|
||||
openrouter.py
|
||||
claude_code.py
|
||||
gemini.py
|
||||
openai.py
|
||||
embedding_adapter.py
|
||||
embedding_openai.py
|
||||
embedding_factory.py # create_embedding_adapter
|
||||
embedding_cache.py
|
||||
similarity.py
|
||||
_http.py
|
||||
_token_estimator.py
|
||||
```
|
||||
|
||||
**Acceptance:** `python -c "from bw_llm import create_adapter; print('ok')"` works
|
||||
in a fresh venv with only the new package installed.
|
||||
|
||||
#### S2.3 — Update markitect to depend on extracted package
|
||||
|
||||
Replace `markitect/llm/` with an import alias pointing to the new package, or
|
||||
add the package as a path dependency in markitect's `pyproject.toml`.
|
||||
|
||||
**Acceptance:** All markitect tests pass; `markitect/llm/__init__.py` is either
|
||||
removed or becomes a thin re-export of `bw_llm`.
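
The "thin re-export" option can be sketched as below. This is an illustration only: `bw_llm` is not installed here, so a stand-in module is registered in `sys.modules`, and the shim text is a guess at what `markitect/llm/__init__.py` might reduce to. The point it shows is that a re-export keeps object identity, so existing `is` checks and `isinstance` checks keep working.

```python
import sys
import types

# Stand-in for the extracted package (hypothetical contents).
bw_llm = types.ModuleType("bw_llm")


def create_adapter(provider: str) -> str:
    """Placeholder factory; the real one would return an adapter object."""
    return f"adapter:{provider}"


bw_llm.create_adapter = create_adapter
bw_llm.__all__ = ["create_adapter"]
sys.modules["bw_llm"] = bw_llm

# What the shim module could contain after extraction (assumption).
SHIM_SOURCE = "from bw_llm import *\nfrom bw_llm import __all__\n"

shim = types.ModuleType("markitect_llm_shim")
exec(SHIM_SOURCE, shim.__dict__)

# The shim exposes the very same object, not a copy.
same_object = shim.create_adapter is bw_llm.create_adapter
```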
|
||||
|
||||
#### S2.4 — Integration smoke test
|
||||
|
||||
Run the full markitect infospace pipeline (entity extraction + evaluation) end-to-end
|
||||
against a small fixture to confirm nothing broke.
|
||||
|
||||
**Acceptance:** `markitect infospace evaluate --dry-run` succeeds on a 3-entity fixture.
|
||||
|
||||
---
|
||||
|
||||
### Stage 3 — Adopt in First Consumer
|
||||
|
||||
#### S3.1 — Integrate in one other project
|
||||
|
||||
Pick the first real consumer (likely the custodian state-hub, for LLM-assisted
|
||||
state summaries or decision rationale generation) and wire up the library.
|
||||
|
||||
**Work:**
|
||||
- Add `bw-llm` (or equivalent) as a dependency
|
||||
- Write a small usage example (e.g., `llm_helper.py`)
|
||||
- Confirm config chain works with the consumer's own app name
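
A minimal sketch of what "config chain with the consumer's own app name" could mean in practice. The naming rules mirror what the isolation tests in this change assert (`RAILIANCE_HELPER_MODEL`, `~/.config/railiance/config.toml`, `.railiance.toml`). The function names, the `"env"` source label, and `DEFAULT_MODEL` are assumptions, and only the environment-variable and fallback links of the chain are shown.

```python
import os
from pathlib import Path

DEFAULT_MODEL = "example-model"  # hypothetical hardcoded fallback


def model_env_var(app_name: str) -> str:
    """Environment variable that overrides the model for *app_name*."""
    return f"{app_name.upper()}_HELPER_MODEL"


def user_config_path(app_name: str) -> Path:
    """Per-user config file: ~/.config/<app_name>/config.toml."""
    return Path.home() / ".config" / app_name / "config.toml"


def dir_config_name(app_name: str) -> str:
    """Name of the per-directory config file, e.g. .railiance.toml."""
    return f".{app_name}.toml"


def resolve_model(app_name: str, env=None) -> tuple[str, str]:
    """Return (model, source) using the env var, else the fallback."""
    env = os.environ if env is None else env
    value = env.get(model_env_var(app_name))
    if value:
        return value, "env"
    return DEFAULT_MODEL, "hardcoded"
```

A consumer would pass its own name, for example `resolve_model("railiance")`, and never touch markitect's variables or files.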
|
||||
|
||||
#### S3.2 — Usage guide
|
||||
|
||||
Write `README.md` for the library covering:
|
||||
- Installation (local path / git URL)
|
||||
- Supported providers and env vars
|
||||
- TOML config file locations and format
|
||||
- `create_adapter()` / `create_embedding_adapter()` quick-start
|
||||
- Error handling
|
||||
|
||||
**Acceptance:** Another developer (or agent) can follow the README to use the library
|
||||
in a new project without reading source code.
|
||||
|
||||
---
|
||||
|
||||
## Stage Summary
|
||||
|
||||
| Stage | Description | Key Deliverable | Blocks |
|-------|-------------|-----------------|--------|
| S1.1 | Move RunConfig/LLMResponse to llm | Zero cross-module deps | S2.2 |
| S1.2 | Parameterize app name | Configurable config chain | S2.2 |
| S1.3 | Isolation tests | Green test suite | S2.1 |
| S2.1 | Resolve packaging decision (D1) | pyproject.toml scaffold | S2.2 |
| S2.2 | Create standalone package | `pip install` works | S2.3 |
| S2.3 | Update markitect | markitect uses extracted lib | S2.4 |
| S2.4 | Integration smoke test | Full pipeline passes | S3.1 |
| S3.1 | First consumer integration | Library used in real project | S3.2 |
| S3.2 | Usage guide | README published | — |
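
The "Blocks" column is a dependency graph, so a valid execution order can be derived mechanically. The sketch below uses the standard-library `graphlib` module (Python 3.9+); the `BLOCKS` mapping is transcribed from the table, and `build_sorter` is a hypothetical helper.

```python
from graphlib import TopologicalSorter

# "Blocks" column of the table: stage -> stages it unblocks.
BLOCKS = {
    "S1.1": ["S2.2"],
    "S1.2": ["S2.2"],
    "S1.3": ["S2.1"],
    "S2.1": ["S2.2"],
    "S2.2": ["S2.3"],
    "S2.3": ["S2.4"],
    "S2.4": ["S3.1"],
    "S3.1": ["S3.2"],
    "S3.2": [],
}


def build_sorter(blocks: dict[str, list[str]]) -> TopologicalSorter:
    """Build a sorter in which each blocked stage depends on its blocker."""
    sorter = TopologicalSorter()
    for stage, unblocked in blocks.items():
        sorter.add(stage)
        for later in unblocked:
            sorter.add(later, stage)
    return sorter


# One valid linear order of all stages.
order = list(build_sorter(BLOCKS).static_order())

# Stages that can start immediately (nothing blocks them).
ready = build_sorter(BLOCKS)
ready.prepare()
startable = set(ready.get_ready())
```

With the table as written, the three Stage 1 tasks are the only ones that can start right away, and they can run in parallel.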
|
||||
|
||||
---
|
||||
|
||||
## Out of Scope
|
||||
|
||||
- Publishing to PyPI (unnecessary for personal infrastructure; git/local installs suffice)
|
||||
- Adding new LLM providers (separate concern)
|
||||
- Porting the helper CLI to the library (the CLI is markitect-specific)
|
||||
- Async adapters (current sync interface is sufficient; can be added later)
|
||||
176
roadmap/testdrive-jsui-publication/PLAN.md
Normal file
@@ -0,0 +1,176 @@
|
||||
# TestDrive-JSUI — npm Publication
|
||||
|
||||
## Context
|
||||
|
||||
TestDrive-JSUI is a JavaScript-first markdown editor library living at
|
||||
`capabilities/testdrive-jsui/`. Phases 1–6 (build system, bundling, testing,
|
||||
migration) are complete. 84 tests pass (68 JS + 15 Python + 1 fixes).
|
||||
Single source of truth: `capabilities/testdrive-jsui/js/`.
|
||||
|
||||
This workstream covers the remaining work to publish the library to npm and
|
||||
close out the capability.
|
||||
|
||||
**Source:** `capabilities/testdrive-jsui/TODO.md` (Phases 7–9)
|
||||
**Package name:** `testdrive-jsui` (to be confirmed in P.1)
|
||||
**Current version:** 1.0.0
|
||||
|
||||
---
|
||||
|
||||
## Tasks
|
||||
|
||||
### P.1 — Pre-publication: decide repository structure
|
||||
|
||||
The library currently lives inside the markitect monorepo. Before publishing to
|
||||
npm, decide whether it ships from here or from a dedicated repo.
|
||||
|
||||
**Options:**
|
||||
- A: Publish directly from `capabilities/testdrive-jsui/` — simpler, no repo split
|
||||
- B: Extract to a standalone `testdrive-jsui` repo — cleaner for npm consumers
|
||||
|
||||
Record the decision and proceed accordingly.
|
||||
|
||||
**Acceptance:** Decision recorded; if B, standalone repo created and code copied.
|
||||
|
||||
---
|
||||
|
||||
### P.2 — Pre-publication: verify Markitect integration
|
||||
|
||||
Confirm the main Markitect application still works correctly with the current
|
||||
capability code before publishing.
|
||||
|
||||
```bash
cd /home/worsch/markitect-main
make testdrive-jsui-test-all   # 84 tests must pass
# Manually verify view and edit modes in the running Markitect app
```
|
||||
|
||||
**Acceptance:** All 84 tests pass; view and edit modes confirmed working.
|
||||
|
||||
---
|
||||
|
||||
### P.3 — Pre-publication: decide STANDALONE_PLAN.md
|
||||
|
||||
`STANDALONE_PLAN.md` exists in the capability but its status is unclear. Either:
|
||||
- Implement it (if it describes meaningful standalone work)
|
||||
- Explicitly archive it with a note that the standalone use case is covered by the npm package
|
||||
|
||||
**Acceptance:** File updated with a clear status note; or deleted if obsolete.
|
||||
|
||||
---
|
||||
|
||||
### P.4 — Pre-publication: pack and dry-run
|
||||
|
||||
Run the full pre-publish checklist.
|
||||
|
||||
```bash
cd capabilities/testdrive-jsui
npm run lint                                        # zero errors
npm test                                            # all 84 tests pass
npm run build:prod                                  # clean production build
npm pack                                            # creates testdrive-jsui-1.0.0.tgz
npm install ./testdrive-jsui-1.0.0.tgz --dry-run    # verify install
npm publish --dry-run                               # verify what will be published
```
|
||||
|
||||
Review `--dry-run` output: confirm only intended files are included (check
|
||||
`.npmignore` or `files` field in `package.json`).
|
||||
|
||||
**Acceptance:** `npm publish --dry-run` succeeds with expected file list; no
|
||||
test files, source maps, or internal docs included unintentionally.
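
The file-list review can also be scripted against the tarball that `npm pack` produces. The sketch below is one possible check, not part of the project: npm places packed files under a `package/` prefix, while the suffix and directory patterns treated as unwanted here are assumptions to adapt to the actual layout.

```python
import io
import tarfile

# Assumed patterns for files that should not ship; adjust as needed.
UNWANTED_SUFFIXES = (".map", ".test.js", ".spec.js")
UNWANTED_DIRS = ("package/tests/", "package/test/", "package/docs/")


def unexpected_files(tgz_bytes: bytes) -> list[str]:
    """Return the sorted tarball members that match an unwanted pattern."""
    with tarfile.open(fileobj=io.BytesIO(tgz_bytes), mode="r:gz") as tar:
        names = [member.name for member in tar.getmembers() if member.isfile()]
    return sorted(
        name
        for name in names
        if name.endswith(UNWANTED_SUFFIXES) or name.startswith(UNWANTED_DIRS)
    )
```

Usage would be along the lines of `unexpected_files(Path("testdrive-jsui-1.0.0.tgz").read_bytes())`, with an empty list as the passing result.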
|
||||
|
||||
---
|
||||
|
||||
### P.5 — Pre-publication: create release tag
|
||||
|
||||
```bash
git tag -a v1.0.0 -m "Release testdrive-jsui v1.0.0"
# (push tag to remote when ready)
```
|
||||
|
||||
**Acceptance:** Tag `v1.0.0` exists on main; CHANGELOG.md entry present for 1.0.0.
|
||||
|
||||
---
|
||||
|
||||
### P.6 — Publication: publish to npm
|
||||
|
||||
```bash
cd capabilities/testdrive-jsui
npm login      # if not already logged in
npm publish
```
|
||||
|
||||
Then verify:
|
||||
- Package visible at `https://www.npmjs.com/package/testdrive-jsui`
|
||||
- Wait 5–10 minutes, then check CDN availability:
|
||||
- `https://cdn.jsdelivr.net/npm/testdrive-jsui@1.0.0/dist/testdrive-jsui.min.js`
|
||||
- `https://unpkg.com/testdrive-jsui@1.0.0/dist/testdrive-jsui.min.js`
|
||||
|
||||
**Acceptance:** Package installable via `npm install testdrive-jsui`.
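
The CDN check can be repeated from a script while waiting for propagation. This is a rough sketch using only the standard library: the URL shapes are the two listed above, and `cdn_urls` and `is_available` are hypothetical helper names.

```python
import urllib.error
import urllib.request


def cdn_urls(package: str, version: str, path: str) -> list[str]:
    """Build the jsDelivr and unpkg URLs for one file of a published version."""
    spec = f"{package}@{version}/{path}"
    return [
        f"https://cdn.jsdelivr.net/npm/{spec}",
        f"https://unpkg.com/{spec}",
    ]


def is_available(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers a HEAD request with status 200."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, TimeoutError):
        # Covers HTTP errors (404 before propagation) and network failures.
        return False
```

Calling `is_available` on each result of `cdn_urls("testdrive-jsui", "1.0.0", "dist/testdrive-jsui.min.js")` reports which CDN has picked up the release.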
|
||||
|
||||
---
|
||||
|
||||
### P.7 — Publication: fresh install test
|
||||
|
||||
In a clean temporary directory, install from npm and verify the library works
|
||||
with a minimal HTML file.
|
||||
|
||||
```bash
mkdir /tmp/testdrive-test && cd /tmp/testdrive-test
npm install testdrive-jsui marked
# Open standalone.html equivalent, confirm editor initialises
```
|
||||
|
||||
**Acceptance:** `new TestDriveJSUI({...})` works in a fresh install with no
|
||||
reference to the capability source directory.
|
||||
|
||||
---
|
||||
|
||||
### P.8 — Publication: GitHub release
|
||||
|
||||
Create a GitHub release from the v1.0.0 tag with:
|
||||
- Release notes (summary from CHANGELOG.md 1.0.0 entry)
|
||||
- Link to npm package
|
||||
- Link to CDN URLs (jsdelivr, unpkg)
|
||||
|
||||
**Acceptance:** GitHub release published and visible.
|
||||
|
||||
---
|
||||
|
||||
### P.9 — Post-publication: README badges and monitoring
|
||||
|
||||
Add npm badges to `capabilities/testdrive-jsui/README.md`:
|
||||
|
||||
```markdown
|
||||
[](...)
|
||||
[](...)
|
||||
```
|
||||
|
||||
Set a reminder to check download stats after 1 week.
|
||||
Demo page and GitHub Pages are optional — do only if there's a specific audience
|
||||
to point at it.
|
||||
|
||||
**Acceptance:** README has version and download count badges; committed.
|
||||
|
||||
---
|
||||
|
||||
## Task order
|
||||
|
||||
```
P.1 (repo decision)
P.2 (Markitect integration check)   ← can run in parallel with P.1
P.3 (STANDALONE_PLAN decision)      ← can run in parallel
        ↓
P.4 (pack + dry-run)                ← needs P.1, P.2, P.3 all done
P.5 (release tag)                   ← can run with P.4
        ↓
P.6 (publish)
P.7 (fresh install test)
P.8 (GitHub release)
P.9 (badges + monitoring)
```
|
||||
|
||||
## Out of scope
|
||||
|
||||
- Adding new features before publication (ship what's there)
|
||||
- Ruby or Java adapters (optional integrations, not blocking publication)
|
||||
- Paid npm features (keep on free tier)
|
||||
@@ -30,7 +30,7 @@ class TestActualRoundtripBehavior:
|
||||
         cmd = ["python", "-m", "markitect.cli"] + args
         result = subprocess.run(
             cmd,
-            cwd="/home/worsch/markitect_project",
+            cwd="/home/worsch/markitect-main",
             capture_output=True,
             text=True
         )
|
||||
|
||||
@@ -5,7 +5,7 @@ This test implements the requirements for initializing a SQLite database
|
||||
 and storing markdown files with front matter parsing.

 Issue #1: Initialize Database and Store Example Markdown File
-https://gitea.coulomb.social/coulomb/markitect_project/issues/1
+https://gitea.coulomb.social/coulomb/markitect-main/issues/1
 """

 import pytest
|
||||
|
||||
159
tests/test_llm_isolation.py
Normal file
@@ -0,0 +1,159 @@
|
||||
"""
|
||||
S1.3 — LLM isolation gate.

Confirms that markitect.llm.* has zero imports from markitect.prompts.*
or markitect.infospace.*, making the module safe to extract into a
standalone llm-connect library.

These tests must pass before extraction (S2).
"""

import importlib
import pkgutil
import sys
from pathlib import Path


def _collect_llm_modules() -> list[str]:
    """Return fully-qualified names of all modules under markitect.llm."""
    import markitect.llm as pkg
    pkg_path = Path(pkg.__file__).parent
    names = []
    for info in pkgutil.walk_packages([str(pkg_path)], prefix="markitect.llm."):
        names.append(info.name)
    # Include the package itself
    names.insert(0, "markitect.llm")
    return names


def _direct_imports(module_name: str) -> set[str]:
    """Return set of top-level module names imported by *module_name*."""
    mod = importlib.import_module(module_name)
    src_file = getattr(mod, "__file__", None)
    if not src_file or not src_file.endswith(".py"):
        return set()

    imports: set[str] = set()
    with open(src_file) as f:
        for line in f:
            stripped = line.strip()
            if stripped.startswith("from ") or stripped.startswith("import "):
                # Extract the root package of the imported name
                parts = stripped.split()
                if parts[0] == "from" and len(parts) >= 2:
                    imports.add(parts[1].split(".")[0] + "." + parts[1].split(".")[1]
                                if "." in parts[1] else parts[1])
                    # Also capture full dotted path for cross-module check
                    imports.add(parts[1])
    return imports


def _import_lines(src_file: str) -> list[str]:
    """Return only import-statement lines from a Python source file."""
    lines = []
    with open(src_file) as f:
        for line in f:
            stripped = line.strip()
            if stripped.startswith("from ") or stripped.startswith("import "):
                lines.append(stripped)
    return lines


def test_no_prompts_import_in_llm_tree():
    """markitect.llm must not import anything from markitect.prompts.*"""
    violations = []
    for mod_name in _collect_llm_modules():
        try:
            mod = importlib.import_module(mod_name)
        except ImportError:
            continue
        src_file = getattr(mod, "__file__", None)
        if not src_file or not src_file.endswith(".py"):
            continue
        for line in _import_lines(src_file):
            if "markitect.prompts" in line:
                violations.append(mod_name)
                break

    assert violations == [], (
        f"These llm modules still import from markitect.prompts: {violations}"
    )


def test_no_infospace_import_in_llm_tree():
    """markitect.llm must not import anything from markitect.infospace.*"""
    violations = []
    for mod_name in _collect_llm_modules():
        try:
            mod = importlib.import_module(mod_name)
        except ImportError:
            continue
        src_file = getattr(mod, "__file__", None)
        if not src_file or not src_file.endswith(".py"):
            continue
        for line in _import_lines(src_file):
            if "markitect.infospace" in line:
                violations.append(mod_name)
                break

    assert violations == [], (
        f"These llm modules still import from markitect.infospace: {violations}"
    )


def test_runconfig_and_llmresponse_canonical_in_llm():
    """RunConfig and LLMResponse must be defined in markitect.llm.models."""
    from markitect.llm.models import RunConfig, LLMResponse

    assert RunConfig.__module__ == "markitect.llm.models", (
        f"RunConfig.module = {RunConfig.__module__!r}, expected 'markitect.llm.models'"
    )
    assert LLMResponse.__module__ == "markitect.llm.models", (
        f"LLMResponse.module = {LLMResponse.__module__!r}, expected 'markitect.llm.models'"
    )


def test_llmadapter_canonical_in_llm():
    """LLMAdapter must be defined in markitect.llm.adapter."""
    from markitect.llm.adapter import LLMAdapter

    assert LLMAdapter.__module__ == "markitect.llm.adapter", (
        f"LLMAdapter.module = {LLMAdapter.__module__!r}, expected 'markitect.llm.adapter'"
    )


def test_backward_compat_prompts_reexport():
    """markitect.prompts.execution.models must still export RunConfig/LLMResponse."""
    from markitect.prompts.execution.models import RunConfig, LLMResponse
    from markitect.llm.models import RunConfig as RC, LLMResponse as LR

    assert RunConfig is RC, "prompts re-export RunConfig must be the same object as llm.models.RunConfig"
    assert LLMResponse is LR, "prompts re-export LLMResponse must be the same object as llm.models.LLMResponse"


def test_backward_compat_llmadapter_reexport():
    """markitect.prompts.execution.llm_adapter must still export LLMAdapter."""
    from markitect.prompts.execution.llm_adapter import LLMAdapter
    from markitect.llm.adapter import LLMAdapter as LA

    assert LLMAdapter is LA, "prompts re-export LLMAdapter must be the same object as llm.adapter.LLMAdapter"


def test_app_name_parameterization():
    """resolve_llm(app_name=X) uses ~/.config/X/config.toml and X_HELPER_MODEL."""
    from markitect.llm.toml_config import (
        _model_env_var,
        _user_config_path,
        _dir_config_name,
        resolve_llm,
    )

    assert _model_env_var("railiance") == "RAILIANCE_HELPER_MODEL"
    assert _model_env_var("markitect") == "MARKITECT_HELPER_MODEL"
    assert str(_user_config_path("railiance")).endswith(".config/railiance/config.toml")
    assert _dir_config_name("railiance") == ".railiance.toml"

    # Smoke: resolve falls back to hardcoded for unknown app
    r = resolve_llm(app_name="nonexistent_app_xyz")
    assert r.provider_source == "hardcoded"
    assert r.model_source == "hardcoded"
|
||||
@@ -33,7 +33,7 @@ class TestRoundtripBase:
|
||||
             cmd,
             capture_output=True,
             text=True,
-            cwd="/home/worsch/markitect_project"
+            cwd="/home/worsch/markitect-main"
         )

     def validate_basic_structure_preservation(self, original: str, reconstructed: str) -> Dict[str, Any]:
|
||||
|
||||
@@ -223,3 +223,129 @@ class TestViabilityCommand:
|
||||
        )
        assert result.exit_code == 0
        assert "No viability thresholds" in result.output


# ── chapters (per-source triage view) ────────────────────────────────


class TestChaptersCommand:
    @pytest.fixture
    def chapters_dir(self, tmp_path):
        """Infospace with 2 source files and matching entities."""
        config_yaml = """\
topic:
  name: "WoN"
  domain: "Economics"
  sources: artifacts/sources
"""
        (tmp_path / "infospace.yaml").write_text(config_yaml)

        sources = tmp_path / "artifacts" / "sources"
        sources.mkdir(parents=True)
        (sources / "book-1-chapter-01.md").write_text("# Chapter 1\n\nText.\n")
        (sources / "book-1-chapter-02.md").write_text("# Chapter 2\n\nText.\n")

        entities = tmp_path / "output" / "entities"
        entities.mkdir(parents=True)
        (entities / "alpha.md").write_text(
            "# Alpha\n\n## Definition\n\nX.\n\n"
            "## Source Chapter\n\nBook I, Chapter 1\n"
        )
        (entities / "beta.md").write_text(
            "# Beta\n\n## Definition\n\nY.\n\n"
            "## Source Chapter\n\nBook I, Chapter 2\n"
        )
        (entities / "gamma.md").write_text(
            "# Gamma\n\n## Definition\n\nZ.\n\n"
            "## Source Chapter\n\nBook I, Chapter 2\n"
        )
        return tmp_path

    def test_lists_sources_with_counts(self, runner, chapters_dir):
        result = runner.invoke(
            infospace_commands,
            ["chapters", "--config", str(chapters_dir / "infospace.yaml")],
        )
        assert result.exit_code == 0
        assert "book-1-chapter-01" in result.output
        assert "book-1-chapter-02" in result.output
        # ch 1 -> 1 entity, ch 2 -> 2 entities
        assert "2 source file(s); 3 entities" in result.output

    def test_json_format(self, runner, chapters_dir):
        result = runner.invoke(
            infospace_commands,
            ["chapters", "--config", str(chapters_dir / "infospace.yaml"),
             "--format", "json"],
        )
        assert result.exit_code == 0
        import json
        rows = json.loads(result.output)
        by_id = {r["source_id"]: r for r in rows}
        assert by_id["book-1-chapter-01"]["entities"] == 1
        assert by_id["book-1-chapter-02"]["entities"] == 2

    def test_no_sources_dir(self, runner, tmp_path):
        (tmp_path / "infospace.yaml").write_text(
            "topic:\n name: X\n sources: missing\n"
        )
        result = runner.invoke(
            infospace_commands,
            ["chapters", "--config", str(tmp_path / "infospace.yaml")],
        )
        assert result.exit_code == 1


# ── process: eval-after-source / classify-after-source flags ─────────


class TestProcessAfterSourceFlags:
    def test_flags_registered_in_help(self, runner):
        result = runner.invoke(infospace_commands, ["process", "--help"])
        assert result.exit_code == 0
        assert "--eval-after-source" in result.output
        assert "--classify-after-source" in result.output

    def test_flags_require_provider(self, runner, tmp_path):
        (tmp_path / "infospace.yaml").write_text(
            "topic:\n name: X\n sources: sources\n"
            "pipeline:\n stages:\n - template: extract-entities\n"
        )
        sources = tmp_path / "sources"
        sources.mkdir()
        (sources / "s1.md").write_text("source")
        result = runner.invoke(
            infospace_commands,
            ["process", "--all",
             "--config", str(tmp_path / "infospace.yaml"),
             "--eval-after-source"],
        )
        assert result.exit_code == 1
        assert "require --provider" in result.output


# ── pipeline: commit body composition ────────────────────────────────


class TestCommitBodyComposition:
    def test_bucket_for(self, tmp_path):
        from markitect.infospace.config import InfospaceConfig, TopicConfig
        from markitect.infospace.pipeline import SourcePipeline
        cfg = InfospaceConfig(topic=TopicConfig(name="T", domain="D"))
        p = SourcePipeline(cfg, tmp_path)
        assert p._bucket_for("output/entities/x.md") == "entities"
        assert p._bucket_for("output/evaluations/x.md") == "evaluations"
        assert p._bucket_for("output/classifications/x.md") == "classifications"
        assert p._bucket_for("output/mappings/x.md") == "mappings"
        assert p._bucket_for("output/notes/x.md") == "other"
        assert p._bucket_for("README.md") is None  # not under output/

    def test_compose_body_uses_default_on_no_diff(self, tmp_path):
        """When git diff fails or returns empty, fall back to the default blurb."""
        from markitect.infospace.config import InfospaceConfig, TopicConfig
        from markitect.infospace.pipeline import SourcePipeline
        cfg = InfospaceConfig(topic=TopicConfig(name="T", domain="D"))
        # Not a git repo, so `git diff --cached` will raise CalledProcessError.
        p = SourcePipeline(cfg, tmp_path)
        body = p._compose_commit_body("some-source")
        assert "Extract entities" in body
|
||||
|
||||
@@ -124,6 +124,33 @@ class TestMetricsFileIO:
|
||||
        path.write_text("just a string", encoding="utf-8")
        assert read_metrics_file(path) == {}

    def test_round_trip_preserves_structured_values(self, tmp_path):
        """Non-numeric values like type_distribution must survive a round-trip.

        Regression: eval-summary --update-metrics used to drop any key
        whose value wasn't a bare number, silently erasing type_distribution
        from the file on every run.
        """
        path = tmp_path / "metrics.yaml"
        metrics = {
            "per_entity_mean": 3.9567,
            "vsm_type_matrix_cells": 29,
            "type_distribution": {
                "Element": 315,
                "Institution": 122,
                "Principle": 102,
            },
        }
        write_metrics_file(metrics, path)
        loaded = read_metrics_file(path)
        assert loaded["type_distribution"] == {
            "Element": 315, "Institution": 122, "Principle": 102,
        }
        # And the int stayed an int on disk, not 29.0.
        raw = path.read_text(encoding="utf-8")
        assert "vsm_type_matrix_cells: 29\n" in raw
        assert "vsm_type_matrix_cells: 29.0" not in raw


# ── record_check_results ────────────────────────────────────────────
|
||||
|
||||
|
||||
82
tests/unit/llm/test_gemini.py
Normal file
@@ -0,0 +1,82 @@
|
||||
"""Tests for markitect.llm.gemini — retry behavior + happy path."""
|
||||
|
||||
from unittest import mock

import pytest

from markitect.llm.gemini import GeminiAdapter
from markitect.llm.exceptions import LLMAPIError, LLMRateLimitError
from markitect.prompts.execution.models import RunConfig, LLMResponse


def _api_response(text="hello", model="gemini-2.5-flash"):
    return {
        "candidates": [
            {
                "content": {"parts": [{"text": text}], "role": "model"},
                "finishReason": "STOP",
            }
        ],
        "modelVersion": model,
        "usageMetadata": {
            "promptTokenCount": 3,
            "candidatesTokenCount": 2,
            "totalTokenCount": 5,
        },
    }


class TestGeminiAdapter:
    def _adapter(self, **kwargs):
        defaults = {"api_key": "AIza-test"}
        defaults.update(kwargs)
        return GeminiAdapter(**defaults)

    @mock.patch("markitect.llm.gemini.post_json")
    def test_success(self, mock_post):
        mock_post.return_value = _api_response("generated")
        adapter = self._adapter()
        resp = adapter.execute_prompt("hi", RunConfig())
        assert isinstance(resp, LLMResponse)
        assert resp.content == "generated"
        assert resp.metadata["provider"] == "gemini"

    @mock.patch("markitect.llm.gemini.post_json")
    @mock.patch("markitect.llm.gemini.time.sleep")
    def test_retry_on_429(self, mock_sleep, mock_post):
        mock_post.side_effect = [
            LLMRateLimitError("rate limited", status_code=429),
            _api_response("recovered"),
        ]
        adapter = self._adapter(max_retries=2)
        resp = adapter.execute_prompt("hi", RunConfig())
        assert resp.content == "recovered"
        assert mock_sleep.call_count == 1

    @mock.patch("markitect.llm.gemini.post_json")
    @mock.patch("markitect.llm.gemini.time.sleep")
    def test_retry_on_503(self, mock_sleep, mock_post):
        mock_post.side_effect = [
            LLMAPIError("unavailable", status_code=503),
            _api_response("back"),
        ]
        adapter = self._adapter(max_retries=2)
        resp = adapter.execute_prompt("hi", RunConfig())
        assert resp.content == "back"

    @mock.patch("markitect.llm.gemini.post_json")
    def test_no_retry_on_400(self, mock_post):
        mock_post.side_effect = LLMAPIError("bad request", status_code=400)
        adapter = self._adapter(max_retries=2)
        with pytest.raises(LLMAPIError) as exc_info:
            adapter.execute_prompt("hi", RunConfig())
        assert exc_info.value.status_code == 400

    @mock.patch("markitect.llm.gemini.post_json")
    @mock.patch("markitect.llm.gemini.time.sleep")
    def test_exhausted_retries_raises(self, mock_sleep, mock_post):
        mock_post.side_effect = LLMRateLimitError("rate limited", status_code=429)
        adapter = self._adapter(max_retries=1)
        with pytest.raises(LLMRateLimitError):
            adapter.execute_prompt("hi", RunConfig())
        assert mock_sleep.call_count == 1  # 1 retry before giving up
|
||||
67
workplans/MARKITECT-WP-0001-statehub-bootstrap.md
Normal file
@@ -0,0 +1,67 @@
|
||||
---
|
||||
id: MARKITECT-WP-0001
|
||||
type: workplan
|
||||
title: "Bootstrap State Hub integration"
|
||||
domain: communication
|
||||
repo: markitect-main
|
||||
status: finished
|
||||
owner: codex
|
||||
topic_slug: communication
|
||||
created: "2026-06-22"
|
||||
updated: "2026-06-22"
|
||||
state_hub_workstream_id: "dfc40b03-fe8e-49fe-b8d4-86eb1fe26b4a"
|
||||
---
|
||||
|
||||
# Bootstrap State Hub integration
|
||||
|
||||
Knowledge artifact management and markdown engine platform.
|
||||
|
||||
## Review Generated Integration Files
|
||||
|
||||
```task
|
||||
id: MARKITECT-WP-0001-T01
|
||||
status: done
|
||||
priority: high
|
||||
state_hub_task_id: "7455a381-a93d-4220-8f80-3b6ccf953cff"
|
||||
|
||||
```
|
||||
|
||||
Result 2026-06-22: SCOPE.md and INTRODUCTION.md reviewed; AGENTS.md confirmed.
|
||||
|
||||
Review `INTENT.md`, `SCOPE.md`, `AGENTS.md`, and `.custodian-brief.md`.
|
||||
Replace generated placeholders with repo-specific facts where needed.
|
||||
|
||||
## Verify Local Developer Workflow
|
||||
|
||||
```task
|
||||
id: MARKITECT-WP-0001-T02
|
||||
status: done
|
||||
priority: high
|
||||
state_hub_task_id: "7e34bdab-aa49-49ca-b28a-b254725dd8db"
|
||||
|
||||
```
|
||||
|
||||
Result 2026-06-22: Documented make-based Python/JS workflow.
|
||||
|
||||
Identify the repo's install, test, lint, build, and run commands. Add or refine
|
||||
those commands in the agent instructions so future coding sessions can verify
|
||||
changes confidently.
|
||||
|
||||
## Seed First Real Workplan
|
||||
|
||||
```task
|
||||
id: MARKITECT-WP-0001-T03
|
||||
status: done
|
||||
priority: medium
|
||||
state_hub_task_id: "35a64da7-dda9-4315-901d-88c6827432d9"
|
||||
|
||||
```
|
||||
|
||||
Result 2026-06-22: MARKITECT-WP-0002 already exists (TestDrive npm publication).
|
||||
|
||||
Create the first implementation workplan for the repository's most important
|
||||
next change. After workplan file updates, run from `~/state-hub`:
|
||||
|
||||
```bash
|
||||
make fix-consistency REPO=markitect-main
|
||||
```
|
||||
28
workplans/MARKITECT-WP-0002-testdrive-jsui-publication.md
Normal file
@@ -0,0 +1,28 @@
|
||||
---
|
||||
id: MARKITECT-WP-0002
|
||||
type: workplan
|
||||
title: "TestDrive-JSUI — npm Publication"
|
||||
domain: communication
|
||||
repo: markitect-main
|
||||
status: backlog
|
||||
owner: codex
|
||||
topic_slug: communication
|
||||
created: "2026-06-22"
|
||||
updated: "2026-06-22"
|
||||
state_hub_workstream_id: "e203d487-01f1-494a-b14d-a436241a4c01"
|
||||
---
|
||||
|
||||
# TestDrive-JSUI — npm Publication
|
||||
|
||||
Backlog workstream for publishing the TestDrive JSUI package to npm.
|
||||
|
||||
## Publication Readiness
|
||||
|
||||
```task
|
||||
id: MARKITECT-WP-0002-T01
|
||||
status: todo
|
||||
priority: medium
|
||||
state_hub_task_id: "88b3c206-4d45-4bb3-bbb3-47443cdf2123"
|
||||
```
|
||||
|
||||
Define package scope, versioning, and publication checklist for TestDrive-JSUI.
|
||||