Compare commits

21 Commits

Author SHA1 Message Date
43bea485aa established rules 2026-06-22 23:06:36 +02:00
63eb431db9 Add .repo-classification.yaml (CUST-WP-0050 T11 agent first-pass) 2026-06-22 17:47:34 +02:00
3250a1746f chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-06-21:
  - update .custodian-brief.md for agentic-resources
2026-06-21 16:09:45 +02:00
41bfb6e0f3 workplan: finish AGENTIC-WP-0011 and sync State Hub IDs
Mark kaizen correlation follow-up finished; add workstream and task IDs
written by fix-consistency so hub and file stay aligned.
2026-06-21 16:09:34 +02:00
d2e50cf96a chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-06-19:
  - update .custodian-brief.md for agentic-resources
2026-06-19 20:37:50 +02:00
01d2affc3b Implement AGENTIC-WP-0011 kaizen correlation follow-up
Add bidirectional doc links (PRD §9.1, README, DESIGN §11), session-close
HELIX_* env convention, stable digest JSON contract, and digest_lookup CLI
for read-only correlate lookups. All tasks done; 163 tests green.
2026-06-19 20:27:00 +02:00
292b656952 workplan: AGENTIC-WP-0011 kaizen correlation follow-up
File ready workplan for bidirectional doc links, session-close env export
convention, and stable digest read path per kaizen-agentic coordination.
2026-06-19 20:24:39 +02:00
0a5ba5c24a docs: add credential routing guidance for agent runtimes
Inline ops-warden CredentialRouting canon into AGENTS.md and mirror it
as a Claude Code rule so agents route secret and access requests correctly.
2026-06-19 20:24:35 +02:00
a66d502b95 docs: add kaizen-agentic project metrics correlation (WP-0005 T16)
Link Helix Forge fleet session memory to kaizen-agentic ADR-004 project
metrics via helix_session_uid. Reciprocal reference to the cross-repo
correlation contract.
2026-06-16 07:13:07 +02:00
f9f91a0ca8 Add capability registry scaffold (REUSE-WP-0014-T03 B01)
Empty helix_forge registry layout for federation publishing.
2026-06-16 01:50:07 +02:00
06bcfdc1d9 session-memory: refresh published retro report artifacts
Latest retro publish (30-day window) regenerated last_retro.{json,md} — 30
ranked suggestions across 13 repos with catalog-sourced recommendations. This is
the read model published to the hub to unblock activity-core ACTIVITY-WP-0008.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 21:48:18 +02:00
e237dcc622 session-memory: map signals to catalog recommendations via covers (WP-0010 follow-up)
Closes the gap where recurring_error suggestions showed generic 'Investigate'
instead of the curated recommendation. Added a covers[] field to SolutionPattern
(lowercase substrings a pattern's recommendation also applies to) + Catalog.find_for
(exact key first, then covers match against signal key+locus). Retro now resolves
recommendations through find_for. Tagged the read-before-edit pattern with
covers=['file has not been read','modified since read','file_not_read'] (v1.0.1).
Live: file-not-read suggestions across all repos now inherit 'Read the file before
Edit/Write'. 6 new tests; suite 158/158.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 21:09:44 +02:00
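A minimal sketch of the lookup order this commit describes: exact key first, then a `covers` substring match against the signal key plus locus. Only `covers`, `find_for`, the pattern id and the `covers` values come from this log. The dataclass shapes and the `key` field are assumptions, not the repo's actual `SolutionPattern`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SolutionPattern:
    # Hypothetical minimal shape; only `covers` is named in the commit message.
    key: str
    recommendation: str
    covers: list[str] = field(default_factory=list)  # lowercase substrings


class Catalog:
    def __init__(self, patterns: list[SolutionPattern]) -> None:
        self.patterns = patterns

    def find_for(self, signal_key: str, locus: str = "") -> Optional[SolutionPattern]:
        # 1) An exact key match always wins.
        for p in self.patterns:
            if p.key == signal_key:
                return p
        # 2) Otherwise try each pattern's `covers` substrings against key + locus.
        haystack = f"{signal_key} {locus}".lower()
        for p in self.patterns:
            if any(sub in haystack for sub in p.covers):
                return p
        return None


catalog = Catalog([
    SolutionPattern(
        key="sp-problem-file_not_read-edit",
        recommendation="Read the file before Edit/Write",
        covers=["file has not been read", "modified since read", "file_not_read"],
    )
])
hit = catalog.find_for("recurring_error", "Edit: File has not been read yet")
print(hit.recommendation if hit else "Investigate")  # Read the file before Edit/Write
```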
0d05dfcc5d session-memory: weekly retro entrypoint + hub publish (AGENTIC-WP-0010)
The analysis half of the weekly coding retrospection. retro/build.py: windowed
detect+measure -> top-3 improvement suggestions per repo (cross-flavor first,
recommendations pulled from the Pattern Catalog) + fleet snapshot. retro/publish.py:
publishes the report to the hub as the coding_retro read model (event_type=
coding_retro progress event) + local JSON/md, graceful degrade. retro entrypoint
with --window-days/--publish/--json. Live verify over real sessions surfaced
per-repo suggestions with catalog recommendations. 13 new tests; suite 152/152.

Consumed by activity-core ACTIVITY-WP-0008 (Weekly Coding Retrospection, Sat 19:00).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 19:17:24 +02:00
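The ranking step (windowed signals, top three per repo, cross-flavor first) could look roughly like this. The `Signal` shape and the tie-break on raw count are assumptions for illustration. The real `retro/build.py` also pulls recommendations from the Pattern Catalog, which is omitted here.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Signal:
    # Hypothetical shape of one detected friction signal.
    repo: str
    key: str
    flavors: frozenset[str]  # agent runtimes the signal was seen in
    count: int
    last_seen: datetime


def _rank(s: Signal) -> tuple[bool, int]:
    # Cross-flavor signals first, then by frequency.
    return (len(s.flavors) > 1, s.count)


def top_suggestions(signals, window_days, per_repo=3, now=None):
    """Keep signals inside the window, group them by repo, keep the top N per repo."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=window_days)
    by_repo = defaultdict(list)
    for s in signals:
        if s.last_seen >= cutoff:
            by_repo[s.repo].append(s)
    return {repo: sorted(items, key=_rank, reverse=True)[:per_repo]
            for repo, items in by_repo.items()}


now = datetime(2026, 6, 7, tzinfo=timezone.utc)
signals = [
    Signal("agentic-resources", "file_not_read", frozenset({"claude", "codex"}), 4, now),
    Signal("agentic-resources", "schema_thrash", frozenset({"claude"}), 9, now),
    Signal("agentic-resources", "tool_thrash", frozenset({"claude"}), 2, now),
    Signal("agentic-resources", "budget_overrun", frozenset({"claude"}), 5, now),
    Signal("state-hub", "stale_read", frozenset({"codex"}), 3, now - timedelta(days=40)),
]
report = top_suggestions(signals, window_days=30, now=now)
print([s.key for s in report["agentic-resources"]])
# ['file_not_read', 'schema_thrash', 'budget_overrun']
```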
15ba625351 session-memory: fill real resolutions into auto-approved catalog stubs
Replaced the placeholder 'TODO: capture the recommended resolution' in the five
auto-approved patterns with grounded problem descriptions + concrete resolutions
drawn from the friction assessment: budget_overrun (read narrowly / checkpoint),
infra_overhead (batch hub writes / orient once), schema_thrash (front-load tool
schemas), tool_thrash (batch shell + larger edits), clean_pass (tests gate done).
Each versioned 1.0.0 -> 1.0.1 with the stub archived to <id>.history.jsonl.
Proposals regenerate with real content (0 TODO). Suite 139/139.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 16:26:56 +02:00
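Here is a sketch of the versioning step. The current record is appended to `<id>.history.jsonl` before the revised pattern gets a patch bump (1.0.0 to 1.0.1). The record fields and the `sp-budget_overrun` id are hypothetical, and the catalog's real on-disk format may differ.

```python
import json
import tempfile
from pathlib import Path


def bump_patch(version: str) -> str:
    """'1.0.0' -> '1.0.1'"""
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor}.{patch + 1}"


def revise_pattern(catalog_dir: Path, pattern: dict, **changes) -> dict:
    """Archive the current record to <id>.history.jsonl, return the next version."""
    history = catalog_dir / f"{pattern['id']}.history.jsonl"
    with history.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(pattern, sort_keys=True) + "\n")
    return {**pattern, **changes, "version": bump_patch(pattern["version"])}


with tempfile.TemporaryDirectory() as tmp:
    stub = {"id": "sp-budget_overrun", "version": "1.0.0",
            "resolution": "TODO: capture the recommended resolution"}
    revised = revise_pattern(Path(tmp), stub, resolution="Read narrowly; checkpoint.")
    history_lines = (Path(tmp) / "sp-budget_overrun.history.jsonl").read_text().splitlines()

print(revised["version"], len(history_lines))  # 1.0.1 1
```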
4f28cd67cf session-memory: Phase 4 Measure — baseline, effectiveness, trend (WP-0009)
Closes the loop. metrics.py: fleet metrics (infra-overhead share, error rate,
schema-thrash, token percentiles, success) + persisted baseline trend. effect.py:
before/after per-pattern effectiveness with an improved verdict per metric.
measure entrypoint with trend + --since effectiveness + JSON. Recorded pre-fix
baseline: 27 sessions, overhead median 11.7%, error rate 0.96, schema-thrash 8.
13 new tests; suite 139/139. Capture->Detect->Curate->Distribute->Measure complete.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 15:49:22 +02:00
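One way to express a per-metric "improved" verdict is to compare medians, with each metric assigned a fixed direction. The direction table, the `success_rate` name, and median aggregation are all assumptions; `effect.py` may aggregate differently.

```python
from statistics import median

# Assumed metric directions: True means a lower value counts as an improvement.
LOWER_IS_BETTER = {
    "infra_overhead_share": True,
    "error_rate": True,
    "schema_thrash": True,
    "success_rate": False,
}


def effectiveness(before: dict[str, list[float]],
                  after: dict[str, list[float]]) -> dict[str, dict]:
    """Compare per-metric medians before and after a pattern was distributed."""
    report = {}
    for metric, lower_better in LOWER_IS_BETTER.items():
        if not before.get(metric) or not after.get(metric):
            continue  # no data on one side of the cutoff
        b, a = median(before[metric]), median(after[metric])
        report[metric] = {"before": b, "after": a,
                          "improved": a < b if lower_better else a > b}
    return report


before = {"infra_overhead_share": [0.10, 0.117, 0.20], "error_rate": [1.0, 0.9, 1.1]}
after = {"infra_overhead_share": [0.08, 0.09, 0.12], "error_rate": [1.2, 1.0, 1.3]}
for metric, row in effectiveness(before, after).items():
    verdict = "improved" if row["improved"] else "not improved"
    print(metric, row["before"], "->", row["after"], verdict)
```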
035c7a20d3 session-memory: Read-before-Edit reflex + curated pattern (WP-0008)
Acts on the #1 friction finding. T01: added a data-cited Read-before-Edit /
re-read-on-stale reflex to AGENTS.md (top error: 'File has not been read yet',
12/27 sessions). T02: captured it as a curated SolutionPattern
(sp-problem-file_not_read-edit, approved/distribution_ready) with real
resolutions + per-flavor hints, so Distribute proposes it across repos/flavors —
closing assess->curate->distribute on a real pattern. Suite 126/126.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 15:27:22 +02:00
59632e94db session-memory: distribute entrypoint + live verify (WP-0007 T05)
python -m session_memory.distribute: reads approved catalog patterns, builds
targets from repo->domain map x flavors, renders scoped per-flavor proposals
(HITL) + active registry. Live verify against the real catalog: 12 renders
across 5 repos, idempotent, provisional skipped. proposals/ gitignored
(regenerated); active_patterns.json committed. README documents detect->curate->
distribute. Phase 3 finished; suite 126/126.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 15:25:20 +02:00
00e8958540 session-memory: scoping + proposals + active registry (WP-0007 T04)
distribute/proposals.py: Scope-aware targeting (FR-X2, empty axis = any), render
distributable (approved+distribution_ready) patterns into a proposals/ tree
mirroring target paths — proposed not applied (FR-X3, HITL), idempotent on re-run.
ActiveRegistry (FR-X4) records which pattern+version is proposed in which
(repo,flavor). 6 new tests; suite 123/123.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 15:09:40 +02:00
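The sketch below combines two rules from this log: the "empty axis = any" scoping rule, and the repo->domain map x flavors target expansion from the distribute entrypoint. The `Scope` class and the `state-hub` domain mapping are hypothetical stand-ins.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Scope:
    # Hypothetical scope axes; an empty set on an axis means "any value".
    repos: frozenset[str] = frozenset()
    domains: frozenset[str] = frozenset()
    flavors: frozenset[str] = frozenset()


def _axis_ok(allowed: frozenset[str], value: str) -> bool:
    return not allowed or value in allowed


def in_scope(scope: Scope, repo: str, domain: str, flavor: str) -> bool:
    return (_axis_ok(scope.repos, repo)
            and _axis_ok(scope.domains, domain)
            and _axis_ok(scope.flavors, flavor))


# Targets = repo->domain map x flavors, filtered by the pattern's scope.
repo_domains = {"agentic-resources": "infotech", "state-hub": "custodian"}
flavors = ["claude", "codex", "grok"]
scope = Scope(domains=frozenset({"infotech"}))  # repos and flavors left empty: any

targets = [(repo, flavor)
           for (repo, domain), flavor in product(repo_domains.items(), flavors)
           if in_scope(scope, repo, domain, flavor)]
print(targets)
```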
9e28b1b806 session-memory: Claude + Codex + Grok distributors + registry (WP-0007 T02/T03)
Thin per-flavor distributors over the shared base: Claude (CLAUDE.md, optional
skill-stub mode), Codex (AGENTS.md), Grok (.grok/instructions.md). registry maps
flavor->distributor — adding a flavor is one entry + one module. Same agnostic
body renders to distinct per-flavor targets (FR-A3). 7 new tests; suite 117/117.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 15:06:15 +02:00
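A simplified picture of the flavor registry: one dictionary entry per flavor, each rendering the same agnostic body to its own target file. The `FlavorDistributor` dataclass is a stand-in for the repo's `Distributor` protocol and per-flavor modules. Only the target paths come from the commit message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlavorDistributor:
    # Simplified stand-in for the repo's Distributor protocol and flavor modules.
    flavor: str
    target: str  # file the agnostic body is rendered into

    def render(self, body: str) -> tuple[str, str]:
        return self.target, body


# flavor -> distributor: adding a flavor means adding one entry (plus its module).
REGISTRY = {
    "claude": FlavorDistributor("claude", "CLAUDE.md"),
    "codex": FlavorDistributor("codex", "AGENTS.md"),
    "grok": FlavorDistributor("grok", ".grok/instructions.md"),
}


def distribute(body: str, flavors: list[str]) -> dict[str, str]:
    """Render one agnostic body to each requested flavor's own target path."""
    return dict(REGISTRY[flavor].render(body) for flavor in flavors)


print(sorted(distribute("Read the file before Edit/Write.", ["claude", "codex", "grok"])))
# ['.grok/instructions.md', 'AGENTS.md', 'CLAUDE.md']
```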
7646cbc358 session-memory: distributor base + Artifact (WP-0007 T01)
distribute/base.py: Artifact dataclass + Distributor protocol + idempotent
BEGIN/END snippet markers (upsert_block replaces a pattern's block in place so
re-distribution doesn't duplicate) + agnostic markdown body rendering from
SolutionPattern fields. BaseDistributor honours per-flavor body/target hints.
8 new tests; suite 110/110.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 15:02:47 +02:00
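A minimal sketch of an idempotent `upsert_block`. If a pattern's BEGIN/END markers are already present, the block between them is replaced in place; otherwise the block is appended once. The HTML-comment marker format is an assumption, because the commit does not show the actual marker text.

```python
import re


def upsert_block(text: str, pattern_id: str, body: str) -> str:
    """Replace this pattern's marked block in place, or append it once."""
    begin = f"<!-- BEGIN {pattern_id} -->"  # marker format is an assumption
    end = f"<!-- END {pattern_id} -->"
    block = f"{begin}\n{body.rstrip()}\n{end}"
    rx = re.compile(re.escape(begin) + r".*?" + re.escape(end), re.DOTALL)
    if rx.search(text):
        # A function replacement avoids backslash escapes being interpreted in `block`.
        return rx.sub(lambda _m: block, text, count=1)
    sep = "" if not text or text.endswith("\n") else "\n"
    return f"{text}{sep}{block}\n"


doc = "# CLAUDE.md\n"
once = upsert_block(doc, "sp-x", "Read before Edit.")
twice = upsert_block(once, "sp-x", "Read before Edit.")
updated = upsert_block(twice, "sp-x", "Re-read if stale.")
print(once == twice, updated.count("BEGIN sp-x"))  # True 1
```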
9e6f8a6e08 Register WP-0007 (Distribute), WP-0008 (Read-before-Edit), WP-0009 (Measure)
Three workplans queued and registered with the State Hub (via REST — MCP write
layer is erroring this session):
- AGENTIC-WP-0007 Phase 3 Distribute: per-flavor distributor adapters render
  approved catalog patterns into proposed (HITL) artifacts, scoped by repo/domain.
- AGENTIC-WP-0008 Read-before-Edit reflex: act on the #1 friction finding.
- AGENTIC-WP-0009 Phase 4 Measure: baseline + before/after effectiveness + trend.
Proceeding in that order.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 14:58:03 +02:00
74 changed files with 3749 additions and 69 deletions

.claude/rules/agents.md (Normal file, 20 lines changed)

@@ -0,0 +1,20 @@
## Kaizen Agents
Specialized agent personas available on demand via the state-hub MCP.
**Discover:** `list_kaizen_agents()` — returns all agents with name, description, category
**Load:** `get_kaizen_agent("tdd-workflow")` — returns full instructions; read and follow them
Common agents:
| Agent | Category | When to use |
|-------|----------|-------------|
| `tdd-workflow` | testing | Step-by-step TDD8 workflow for any feature |
| `code-refactoring` | quality | Code quality analysis and safe refactoring |
| `test-maintenance` | testing | Diagnose and fix failing tests |
| `requirements-engineering` | process | Prevent interface/mock mismatches upfront |
| `keepaTodofile` | process | Maintain TODO.md during work |
| `project-management` | process | Track status, determine next steps |
| `datamodel-optimization` | quality | Optimize dataclasses and data structures |
All 17 agents: call `list_kaizen_agents()` for the full list.

@@ -0,0 +1,8 @@
## Architecture
<!-- TODO: Describe the key design decisions and component structure.
Key modules, data flows, external integrations, state machines, etc. -->
## Quick Reference
`~/state-hub/mcp_server/TOOLS.md` — MCP tool reference

@@ -0,0 +1,50 @@
# Credential and access routing
**Audience:** Codex, Claude Code, Grok, and custodian agents that call **llm-connect**
for inference. Run this check **before** requesting secrets, API keys, SSH access,
login tokens, or database passwords — in any repo, not only `ops-warden`.
ops-warden **issues SSH certificates only** (`warden sign`, `cert_command`). Every
other credential need belongs to another subsystem. **Do not** message
`ops-warden` on State Hub expecting a secret value; the reply is a pointer, not a key.
### Lookup (do this first)
```bash
warden route find "<describe your need>" --json
warden route show <catalog-id> --json
```
Requires the `warden` CLI from `~/ops-warden` (`uv tool install .` or `uv run warden`).
| Agent runtime | How to orient |
| --- | --- |
| **Codex / Grok** (shell, HTTP State Hub) | `warden route` commands above; inbox `to_agent=agentic-resources` is for coordination, not secret vending |
| **Claude Code** (MCP when available) | `get_domain_summary("custodian")` for workstreams; **still** use `warden route` for credential ownership |
| **llm-connect** (inference service) | Never put secret retrieval in prompts; route custody to OpenBao/operator paths surfaced by `warden route` |
### Quick routing table
| I need… | Owner | ops-warden executes? |
| --- | --- | --- |
| SSH cert (`adm`/`agt`/`atm`) | ops-warden | **Yes**: `warden sign` |
| API key, DB password, provider token | OpenBao (`railiance-platform`) | No — route only |
| Login / OIDC / MFA | key-cape / Keycloak | No — route only |
| Authorization decision | flex-auth | No — route only |
| activity-core → issue-core emission | activity-core + issue-core | No — `warden route show activity-core-issue-sink` |
| SSH tunnel | ops-bridge (+ `cert_command` from warden) | No — route only |
### Anti-patterns (do not do these)
- `POST /messages/` to `ops-warden` asking for `ISSUE_CORE_API_KEY`, `OPENROUTER_API_KEY`, etc.
- Inventing `warden secret`, `warden login`, `warden bao`, `warden tunnel` — they do not exist
- Pasting secrets into Git, State Hub, workplans, logs, or chat
### Other capabilities (reuse-surface)
Non-credential capabilities are usually discovered through **reuse-surface** federation
(`reuse-surface` registry / `capability.*` indexes). Credential routing is inlined in
every repo's agent instructions because it is high-frequency, high-risk, and easy to
get wrong.
**Canon:** `~/ops-warden/wiki/CredentialRouting.md` · catalog `~/ops-warden/registry/routing/catalog.yaml`

@@ -0,0 +1,38 @@
## First Session Protocol
Triggered when `get_domain_summary("infotech")` shows **no workstreams**.
The project is registered but work has not yet been structured.
**Step 1 — Read, don't write**
- `~/the-custodian/canon/projects/infotech/project_charter_v0.1.md` — purpose, scope
- `~/the-custodian/canon/projects/infotech/roadmap_v0.1.md` — planned phases
- Scan repo root: README, directory structure, existing code or docs
**Step 2 — Survey in-progress work**
Look for TODOs, open branches, half-finished files. Note done vs. started but incomplete.
**Step 3 — Propose workstreams to Bernd**
Propose 1–3 workstreams, each a coherent strand of weeks to months, anchored to a
roadmap phase. **Wait for approval before creating.**
**Step 4 — Create workplan file first, then DB record (ADR-001)**
```
workplans/AGENTIC-WP-NNNN-<slug>.md ← write this first
```
Then register in the hub:
```
create_workstream(topic_id="f39fa2a3-c491-414c-a91b-b4c5fcc6139c", title="...", owner="...", description="...")
create_task(workstream_id="<id>", title="...", priority="high|medium|low")
```
**Step 5 — Record the setup**
```
add_progress_event(
summary="First session: structured infotech into N workstreams, M tasks",
event_type="milestone",
topic_id="f39fa2a3-c491-414c-a91b-b4c5fcc6139c",
detail={"workstreams": [...], "tasks_created": M}
)
```
<!-- Delete or archive this file once past first session -->

@@ -0,0 +1,8 @@
## Repo boundary
This repo owns **agentic-resources** only. It does not own:
<!-- TODO: List what belongs in adjacent repos, e.g.:
- SSH key management → railiance-infra/
- State hub code → state-hub/
-->

@@ -0,0 +1,5 @@
**Purpose:** Iterating towards optimal agentic performance.
**Domain:** infotech
**Repo slug:** agentic-resources
**Topic ID:** f39fa2a3-c491-414c-a91b-b4c5fcc6139c

@@ -0,0 +1,85 @@
## Session Protocol
Dev Hub (State Hub API): http://127.0.0.1:8000
MCP server name in `~/.claude.json`: `dev-hub`
**Step 1 — Orient**
Read the offline-safe brief first — it works without a live hub connection:
```bash
cat .custodian-brief.md
```
Then call the MCP tool for richer cross-domain context when MCP tools are exposed:
```
get_domain_summary("infotech")
```
If MCP tools are unavailable in the current agent session, use the REST API:
```bash
curl -s "http://127.0.0.1:8000/state/summary" | python3 -m json.tool
```
If the hub is offline: `cd ~/state-hub && make api`
**Step 2 — Check inbox**
With MCP tools:
```
get_messages(to_agent="agentic-resources", unread_only=True)
```
Mark read with `mark_message_read(message_id)`. Reply or act on coordination
requests before proceeding.
Without MCP tools:
```bash
curl -s "http://127.0.0.1:8000/messages/?to_agent=agentic-resources&unread_only=true" \
| python3 -m json.tool
curl -s -X PATCH "http://127.0.0.1:8000/messages/<id>/read" \
-H "Content-Type: application/json" -d '{}'
```
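Agents that script the inbox check in Python rather than curl can build the same two calls with the standard library. This sketch mirrors the curl examples above. The hub's response schema is not documented here, so `fetch_json` only returns the parsed JSON.

```python
import json
import urllib.parse
import urllib.request

HUB = "http://127.0.0.1:8000"


def unread_messages_request(agent: str) -> urllib.request.Request:
    # Same query as the curl example: GET /messages/?to_agent=...&unread_only=true
    query = urllib.parse.urlencode({"to_agent": agent, "unread_only": "true"})
    return urllib.request.Request(f"{HUB}/messages/?{query}")


def mark_read_request(message_id: str) -> urllib.request.Request:
    # PATCH /messages/<id>/read with an empty JSON body, as in the curl example.
    return urllib.request.Request(
        f"{HUB}/messages/{urllib.parse.quote(message_id)}/read",
        data=b"{}",
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )


def fetch_json(req: urllib.request.Request):
    """Send a request to the hub and parse the JSON reply."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

With the hub running, `fetch_json(unread_messages_request("agentic-resources"))` lists unread messages. Pass `mark_read_request(message_id)` to `urllib.request.urlopen` to mark one as read.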
**Step 3 — Scan workplans**
```bash
ls workplans/
```
For each file with `status: ready`, `active`, or `blocked`, note pending
`wait`/`todo`/`progress` tasks.
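Here is a rough sketch of this scan. It assumes the frontmatter `status:` line is the first one in each file, and that tasks use the `task` fenced blocks from the workplan convention. The regex parsing is deliberately simplified, and the demo workplan id is hypothetical.

```python
import re
import tempfile
from pathlib import Path

FENCE = "`" * 3  # built at runtime so no literal fence appears in this example
OPEN_PLANS = {"ready", "active", "blocked"}
PENDING = {"wait", "todo", "progress"}
PLAN_STATUS = re.compile(r"^status:\s*(\S+)", re.MULTILINE)
TASK_BLOCK = re.compile(FENCE + r"task\n(.*?)" + FENCE, re.DOTALL)
FIELD = re.compile(r"^(\w+):\s*(\S+)", re.MULTILINE)


def pending_tasks(workplans: Path) -> dict[str, list[str]]:
    """Map each ready/active/blocked workplan to its pending task ids."""
    result = {}
    for path in sorted(workplans.glob("*.md")):
        text = path.read_text(encoding="utf-8")
        plan = PLAN_STATUS.search(text)  # assumes the frontmatter status comes first
        if not plan or plan.group(1) not in OPEN_PLANS:
            continue
        pending = []
        for block in TASK_BLOCK.findall(text):
            fields = dict(FIELD.findall(block))
            if fields.get("status") in PENDING:
                pending.append(fields.get("id", "?"))
        result[path.name] = pending
    return result


with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "AGENTIC-WP-0012-demo.md"
    demo.write_text(
        "---\nid: AGENTIC-WP-0012\nstatus: active\n---\n"
        f"{FENCE}task\nid: AGENTIC-WP-0012-T01\nstatus: done\n{FENCE}\n"
        f"{FENCE}task\nid: AGENTIC-WP-0012-T02\nstatus: todo\n{FENCE}\n",
        encoding="utf-8",
    )
    found = pending_tasks(Path(tmp))
print(found)  # {'AGENTIC-WP-0012-demo.md': ['AGENTIC-WP-0012-T02']}
```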
**Step 4 — Present brief**
1. **Active workstreams** for `infotech` — title, task counts, blocking decisions
2. **Pending tasks** from `workplans/` + any `[repo:agentic-resources]` hub tasks
3. **Goal guidance** — if `goal_guidance` in summary:
- `needs_workplan`: surface as top action — *"Repo goal '{title}' has no workplan yet"*
- `alignment_warnings`: flag if active work is not aligned with current goal
4. **Suggested next action** — highest-priority open item
5. **SBOM status** — flag if `last_sbom_at` is unset for this repo
If no workstreams: follow First Session Protocol (`first-session.md`).
**During work:** `record_decision()` · `add_progress_event()` · `resolve_decision()`
> State Hub is a *read model*. Bootstrap tools (`create_workstream`, `create_task`)
> are First Session Protocol only. Work structure belongs in repo files (ADR-001).
**Session close:**
With MCP tools:
```
add_progress_event(summary="...", topic_id="f39fa2a3-c491-414c-a91b-b4c5fcc6139c", workstream_id="<uuid>")
```
Without MCP tools:
```bash
curl -s -X POST http://127.0.0.1:8000/progress/ \
-H "Content-Type: application/json" \
-d '{"topic_id":"f39fa2a3-c491-414c-a91b-b4c5fcc6139c","workstream_id":"<uuid>","event_type":"note","summary":"what changed","author":"codex"}'
```
If workplan files were modified, ensure the local copy is up to date first:
```bash
git -C <repo_path> pull --ff-only
cd ~/state-hub && make fix-consistency REPO=agentic-resources
```
For repos where implementation runs on a remote machine (e.g. CoulombCore),
use the combined target which pulls before fixing:
```bash
cd ~/state-hub && make fix-consistency-remote REPO=agentic-resources
```
**C-15** (DB task ahead of file) is normal in multi-machine workflows — writeback
will sync the file to match DB. **C-16** (repo behind remote) blocks all writes
until you pull — intentional to prevent clobbering remote progress.

@@ -0,0 +1,19 @@
## Stack
<!-- TODO: Fill in language, frameworks, and key dependencies -->
- **Language:**
- **Key deps:**
## Dev Commands
```bash
# TODO: Fill in the standard commands for this repo
# Install dependencies
# Run tests
# Lint / type check
# Build / package (if applicable)
```

@@ -0,0 +1,40 @@
## Workplan Convention (ADR-001)
File location: `workplans/AGENTIC-WP-NNNN-<slug>.md`
ID prefix: `AGENTIC-WP-`
Work items originate as files in this repo **before** being registered in the hub.
Canonical workplan/workstream frontmatter statuses are:
`proposed`, `ready`, `active`, `blocked`, `backlog`, `finished`, `archived`.
Use `proposed` for a newly drafted plan, `ready` after review against current
repo state, and `finished` when implementation is complete. `stalled` and
`needs_review` are derived health labels, not stored statuses.
Closed workplans may be moved to `workplans/archived/` with a completion-date
prefix: `YYMMDD-AGENTIC-WP-NNNN-<slug>.md`. The frontmatter id remains
unchanged; the prefix is only for quick visual reference.
Small opportunistic tasks discovered during another session use **Ad Hoc Tasks**:
`workplans/ADHOC-YYYY-MM-DD.md`, workstream slug `adhoc-YYYY-MM-DD`, and task ids
`ADHOC-YYYY-MM-DD-T01`, `T02`, etc. Use adhocs only for low-risk work completed
directly. Promote anything requiring analysis, design, approval, dependencies, or
multiple planned phases into a normal workplan.
Ecosystem todos from other agents arrive as `[repo:agentic-resources]` hub tasks —
visible at session start. Pick one up by creating the workplan file, then registering
the workstream.
Task blocks use this shape:
```task
id: AGENTIC-WP-NNNN-T01
status: wait | todo | progress | done | cancel
priority: high | medium | low
state_hub_task_id: "<uuid>" # written by fix-consistency — do not edit
```
Status progression is `todo` → `progress` → `done`; use `wait` for waiting or
blocked work and `cancel` for stopped work.
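Here is a sketch of checking a task block against these allowed values. The parser is simplified: it reads one `key: value` per line and drops trailing `#` comments. It is not the parser that `fix-consistency` uses.

```python
import re

TASK_STATUSES = {"wait", "todo", "progress", "done", "cancel"}
PRIORITIES = {"high", "medium", "low"}


def parse_task_block(block: str) -> dict[str, str]:
    """Parse `key: value` lines of a task block and check the allowed values."""
    fields = {}
    for raw in block.splitlines():
        line = raw.split(" #", 1)[0].strip()  # drop trailing comments
        m = re.match(r"(\w+):\s*(.+)", line)
        if m:
            fields[m.group(1)] = m.group(2).strip().strip('"')
    if not re.fullmatch(r"AGENTIC-WP-\d{4}-T\d{2}", fields.get("id", "")):
        raise ValueError(f"unexpected task id: {fields.get('id')!r}")
    if fields.get("status") not in TASK_STATUSES:
        raise ValueError(f"unexpected status: {fields.get('status')!r}")
    if "priority" in fields and fields["priority"] not in PRIORITIES:
        raise ValueError(f"unexpected priority: {fields['priority']!r}")
    return fields


task = parse_task_block(
    "id: AGENTIC-WP-0011-T01\n"
    "status: progress\n"
    "priority: high\n"
    'state_hub_task_id: "1234"  # written by fix-consistency\n'
)
print(task["status"], task["state_hub_task_id"])  # progress 1234
```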
<!-- Ralph Loop rules and HEUREKA sequence: ~/.claude/CLAUDE.md — do not duplicate here -->

@@ -2,7 +2,7 @@
# Custodian Brief — agentic-resources
**Domain:** helix_forge
**Last synced:** 2026-06-07 11:46 UTC
**Last synced:** 2026-06-21 14:09 UTC
**State Hub:** http://127.0.0.1:8000 *(adjust if running on a remote machine)*
## Active Workstreams

.gitignore (vendored, 2 lines changed)

@@ -177,6 +177,8 @@ cython_debug/
# session-memory local store
session_memory/.store/
# generated per-flavor distribution proposals (HITL, regenerated each run)
session_memory/proposals/
__pycache__/
*.pyc
.pytest_cache/

.repo-classification.yaml (Normal file, 18 lines changed)

@@ -0,0 +1,18 @@
repo_classification:
standard: Repo Classification Standard
version: '1.0'
classified_at: '2026-06-22'
classified_by: agent
category: project
domain: infotech
secondary_domains: []
capability_tags:
- automation
- orchestration
business_stake:
- technology
- product
- operations
business_mechanics:
- coordination
- operation

@@ -4,38 +4,13 @@
**Purpose:** Iterating towards optimal agentic performance.
**Domain:** helix_forge
**Domain:** infotech
**Repo slug:** agentic-resources
**Topic ID:** `f39fa2a3-c491-414c-a91b-b4c5fcc6139c`
**Workplan prefix:** `AGENTIC-WP-`
---
## Dev Workflow
The deliverable code lives in `session_memory/` (the Helix Forge coding-session
memory system). It is **pure-stdlib Python 3.11+**: `tomllib`, `sqlite3`,
`dataclasses`; no third-party runtime dependencies and no build step. `pytest` is
the only dev dependency. Run everything from the repo root.
| Need | Command |
|------|---------|
| Python | `python3` (3.11+ required for `tomllib`; developed on 3.12) |
| Install deps | none at runtime; for tests: `pip install pytest` (or `uv pip install pytest`) |
| Test | `python3 -m pytest` (full suite) · `python3 -m pytest tests/test_curate_review.py` (one file) · `-q` for quiet |
| Lint / build | none configured — keep changes matching surrounding style |
| Run: ingest sweep | `python3 -m session_memory.ingest` (`--dry-run`, `--config PATH`) |
| Run: detect | `python3 -m session_memory.detect` (`--json`, `--min-frequency N`) |
| Run: curate | `python3 -m session_memory.curate` (`--auto-approve`, `--json`) |
| Config | `session_memory/config.toml`; local store under `session_memory/.store/` (gitignored) |
**Verify a change before declaring it done:** run `python3 -m pytest` (expect all
green), and for pipeline changes do a live `ingest → detect → curate` pass against
the local store. See `session_memory/README.md` for the full layout and the
detect → curate → distribute flow.
---
## State Hub Integration
The Custodian State Hub tracks work across all domains. Interact via HTTP REST —
@@ -126,6 +101,63 @@ curl -s -X PATCH "http://127.0.0.1:8000/tasks/<task_id>" \
---
## Credential and access routing
**Audience:** Codex, Claude Code, Grok, and custodian agents that call **llm-connect**
for inference. Run this check **before** requesting secrets, API keys, SSH access,
login tokens, or database passwords — in any repo, not only `ops-warden`.
ops-warden **issues SSH certificates only** (`warden sign`, `cert_command`). Every
other credential need belongs to another subsystem. **Do not** message
`ops-warden` on State Hub expecting a secret value; the reply is a pointer, not a key.
### Lookup (do this first)
```bash
warden route find "<describe your need>" --json
warden route show <catalog-id> --json
```
Requires the `warden` CLI from `~/ops-warden` (`uv tool install .` or `uv run warden`).
| Agent runtime | How to orient |
| --- | --- |
| **Codex / Grok** (shell, HTTP State Hub) | `warden route` commands above; inbox `to_agent=agentic-resources` is for coordination, not secret vending |
| **Claude Code** (MCP when available) | `get_domain_summary("custodian")` for workstreams; **still** use `warden route` for credential ownership |
| **llm-connect** (inference service) | Never put secret retrieval in prompts; route custody to OpenBao/operator paths surfaced by `warden route` |
### Quick routing table
| I need… | Owner | ops-warden executes? |
| --- | --- | --- |
| SSH cert (`adm`/`agt`/`atm`) | ops-warden | **Yes** — `warden sign` |
| API key, DB password, provider token | OpenBao (`railiance-platform`) | No — route only |
| Login / OIDC / MFA | key-cape / Keycloak | No — route only |
| Authorization decision | flex-auth | No — route only |
| activity-core → issue-core emission | activity-core + issue-core | No — `warden route show activity-core-issue-sink` |
| SSH tunnel | ops-bridge (+ `cert_command` from warden) | No — route only |
### Anti-patterns (do not do these)
- `POST /messages/` to `ops-warden` asking for `ISSUE_CORE_API_KEY`, `OPENROUTER_API_KEY`, etc.
- Inventing `warden secret`, `warden login`, `warden bao`, `warden tunnel` — they do not exist
- Pasting secrets into Git, State Hub, workplans, logs, or chat
### Other capabilities (reuse-surface)
Non-credential capabilities are usually discovered through **reuse-surface** federation
(`reuse-surface` registry / `capability.*` indexes). Credential routing is inlined in
every repo's agent instructions because it is high-frequency, high-risk, and easy to
get wrong.
**Canon:** `~/ops-warden/wiki/CredentialRouting.md` · catalog `~/ops-warden/registry/routing/catalog.yaml`
<!-- REPO-AGENTS-EXTENSIONS -->
<!-- Append repo-specific agent instructions below this marker.
The state-hub template sync preserves content after this line. -->
---
## Workplan Convention (ADR-001)
Work items originate as files in this repo — not in the hub. The hub is a
@@ -149,7 +181,7 @@ anything needing analysis, design, approval, dependencies, or multiple phases.
id: AGENTIC-WP-NNNN
type: workplan
title: "..."
domain: helix_forge
domain: infotech
repo: agentic-resources
status: proposed | ready | active | blocked | backlog | finished | archived
owner: codex

CLAUDE.md (Normal file, 12 lines changed)

@@ -0,0 +1,12 @@
# agentic-resources — Claude Code Instructions
@SCOPE.md
@.claude/rules/repo-identity.md
@.claude/rules/session-protocol.md
@.claude/rules/first-session.md
@.claude/rules/workplan-convention.md
@.claude/rules/stack-and-commands.md
@.claude/rules/architecture.md
@.claude/rules/repo-boundary.md
@.claude/rules/credential-routing.md
@.claude/rules/agents.md

@@ -370,8 +370,89 @@ hub indexes).
---
*Next step: [AGENTIC-WP-0002] implements Phase 0 — the schema, the Claude
collector, the Tier1/Tier2 store, and the budget-based eviction sweep.*
## 11. Project metrics correlation (kaizen-agentic)
Helix Forge owns **fleet-level** session capture and digests (this repo). The
**kaizen-agentic** framework owns **project-scoped** agent execution metrics
(ADR-004: `.kaizen/metrics/<agent>/executions.jsonl`). The two layers correlate
by optional `helix_session_uid` on project records — link-by-reference, no
duplicate ingestion in either repo.
| Layer | Owner | Storage |
|-------|-------|---------|
| Fleet | agentic-resources (Helix Forge) | digest store (`digests` table) |
| Project | kaizen-agentic | `.kaizen/metrics/<agent>/executions.jsonl` |
**Cross-repo contract:** [Helix Forge Correlation Contract](https://gitea.coulomb.social/coulomb/kaizen-agentic/src/branch/main/docs/integrations/helix-forge-correlation.md)
(kaizen-agentic). Field mapping from `Session.session_uid` → `helix_session_uid`,
`digest.cost` → `tokens`, `tool_histogram` MCP share → `infra_overhead_share`.
**Read path:** `kaizen-agentic metrics correlate <uid>` looks up a digest via
`HELIX_STORE_DB` (this repo's session store). No write path from kaizen-agentic
into Helix Forge.
**Related kaizen-agentic docs:** [ADR-004 project metrics convention](https://gitea.coulomb.social/coulomb/kaizen-agentic/src/branch/main/docs/adr/ADR-004-project-metrics-convention.md),
[wiki/EcosystemIntegration.md](https://gitea.coulomb.social/coulomb/kaizen-agentic/src/branch/main/wiki/EcosystemIntegration.md).
### 11.1 Session-close env export (dual-layer agents)
Agents that run **both** Helix Forge capture and kaizen `metrics record` should
export the following **after** the ingest sweep has written the session digest
(`python -m session_memory.ingest` or an equivalent Stop/SessionEnd hook). Names
match kaizen-agentic ADR-004 — do not invent parallel aliases.
| Variable | Source in Helix Forge | Purpose |
|----------|----------------------|---------|
| `HELIX_SESSION_UID` | `Session.session_uid` | Primary correlation key → `helix_session_uid` |
| `HELIX_REPO` | `digest.repo` | Project/repo scoping |
| `HELIX_FLAVOR` | `digest.flavor` | Agent runtime (`claude` / `codex` / `grok`) |
| `HELIX_TOKENS` | `digest.cost.input_tokens + digest.cost.output_tokens` | Token rollup → `tokens` |
| `HELIX_INFRA_OVERHEAD_SHARE` | infra bucket share over `tool_histogram` (see `measure.metrics.session_metrics`) | MCP/plumbing overhead → `infra_overhead_share` |
Example (after digest exists):
```bash
export HELIX_SESSION_UID="claude:abc-123"
export HELIX_REPO="agentic-resources"
export HELIX_FLAVOR="claude"
export HELIX_TOKENS=125000
export HELIX_INFRA_OVERHEAD_SHARE=0.117
# optional — lets kaizen correlate without guessing the store location:
export HELIX_STORE_DB="$(pwd)/session_memory/.store/mem.db"
kaizen-agentic metrics record # merges HELIX_* when present
```
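The same variables can be derived from a digest dict instead of being typed by hand. In this sketch the infra bucket is approximated as tools whose names start with `mcp__`. That is an assumption, since the real rule lives in `measure.metrics.session_metrics`. The `helix_env` and `_is_infra` names are hypothetical.

```python
import shlex
from typing import Callable


def _is_infra(tool: str) -> bool:
    # Assumption: MCP tools ("mcp__<server>__<tool>") stand in for the infra bucket.
    return tool.startswith("mcp__")


def helix_env(digest: dict, is_infra: Callable[[str], bool] = _is_infra) -> dict[str, str]:
    """Derive the HELIX_* session-close variables from one digest dict."""
    cost = digest.get("cost", {})
    hist = digest.get("tool_histogram", {})
    total = sum(hist.values())
    infra = sum(n for tool, n in hist.items() if is_infra(tool))
    share = infra / total if total else 0.0
    return {
        "HELIX_SESSION_UID": digest["session_uid"],
        "HELIX_REPO": digest["repo"],
        "HELIX_FLAVOR": digest["flavor"],
        "HELIX_TOKENS": str(cost.get("input_tokens", 0) + cost.get("output_tokens", 0)),
        "HELIX_INFRA_OVERHEAD_SHARE": f"{share:.3f}",
    }


digest = {
    "session_uid": "claude:abc-123", "repo": "agentic-resources", "flavor": "claude",
    "cost": {"input_tokens": 100000, "output_tokens": 25000},
    "tool_histogram": {"Read": 40, "Edit": 20, "Bash": 28, "mcp__dev-hub__get_messages": 12},
}
for name, value in helix_env(digest).items():
    print(f"export {name}={shlex.quote(value)}")
```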
### 11.2 Digest store location and read API
- **`HELIX_STORE_DB`** — absolute path to the SQLite file holding Tier 2 digests.
Defaults to `config.toml` `[store].db_path` (`session_memory/.store/mem.db` relative
to the repo root). Export as an absolute path when setting the variable on session
close so `metrics correlate` works across hosts and working directories.
- **Thin CLI** — `python -m session_memory.digest_lookup <session_uid> [--json]`
prints one digest without running ingest. Exit `0` on hit, `1` when missing.
- **Programmatic** — `Store.get_digest(session_uid)` returns the JSON blob written
by `build_digest` / `analyze`.
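Here is a minimal read-only lookup in the spirit of `digest_lookup`. It resolves the store path from `HELIX_STORE_DB` and falls back to the documented default, resolved against the current directory. The `digests` table name comes from §11. The `session_uid` and `digest_json` column names are assumptions, so check the actual store schema before relying on this.

```python
import json
import os
import sqlite3
from pathlib import Path
from typing import Optional

# Documented default ([store].db_path in config.toml), relative to the repo root.
DEFAULT_DB = "session_memory/.store/mem.db"


def lookup_digest(session_uid: str, db_path: Optional[str] = None) -> Optional[dict]:
    """Return one digest as a dict, or None; never writes to the store."""
    path = Path(db_path or os.environ.get("HELIX_STORE_DB") or DEFAULT_DB).resolve()
    conn = sqlite3.connect(f"{path.as_uri()}?mode=ro", uri=True)  # read-only open
    try:
        row = conn.execute(
            # Assumed columns: session_uid (key) and digest_json (the JSON blob).
            "SELECT digest_json FROM digests WHERE session_uid = ?", (session_uid,)
        ).fetchone()
    finally:
        conn.close()
    return json.loads(row[0]) if row else None


def main(argv: list[str]) -> int:
    """CLI-style wrapper: print the digest as JSON, return 0 on hit and 1 on miss."""
    digest = lookup_digest(argv[0])
    if digest is None:
        return 1
    print(json.dumps(digest, indent=2, sort_keys=True))
    return 0
```

To run it as a command, call `sys.exit(main(sys.argv[1:]))` under a `__main__` guard. The return codes follow the 0-on-hit, 1-on-miss convention above.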
**Stable digest JSON shape** (fields consumers may rely on):
| Field | Type | Notes |
|-------|------|-------|
| `session_uid` | string | Normalized uid (`<flavor>:<native-id>`) |
| `flavor`, `repo`, `domain` | string | Session attribution |
| `model` | string | Model id when known |
| `started_at`, `ended_at` | string | ISO timestamps |
| `outcome` | string | `success` / `fail` / `abandoned` / `unknown` |
| `cost` | object | `input_tokens`, `output_tokens`, `cache_tokens`, `wall_clock_s`, `turns`, `retries` |
| `tool_histogram` | object | Tool name → call count |
| `event_count`, `kind_counts`, `markers` | object/int | Compact activity summary |
| `first_prompt`, `last_assistant` | string | Short text snippets |
| `error_snippets` | array | `{fingerprint, sample, count, tool}` entries |
| `schema_version` | int | Digest schema version |
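Consumers that want to fail fast on schema drift could encode the table as `TypedDict`s and report what is missing. Where the table does not pin a type down, the types below are assumptions: the float `wall_clock_s`, the `dict[str, int]` `kind_counts`, and the loose `markers`. The `digest_problems` helper is hypothetical.

```python
from typing import TypedDict


class DigestCost(TypedDict):
    input_tokens: int
    output_tokens: int
    cache_tokens: int
    wall_clock_s: float  # assumed numeric type
    turns: int
    retries: int


class Digest(TypedDict):
    session_uid: str  # "<flavor>:<native-id>"
    flavor: str
    repo: str
    domain: str
    model: str
    started_at: str  # ISO timestamp
    ended_at: str
    outcome: str  # success / fail / abandoned / unknown
    cost: DigestCost
    tool_histogram: dict[str, int]
    event_count: int
    kind_counts: dict[str, int]  # assumed shape
    markers: dict  # shape not pinned down by the table
    first_prompt: str
    last_assistant: str
    error_snippets: list[dict]  # {fingerprint, sample, count, tool}
    schema_version: int


OUTCOMES = {"success", "fail", "abandoned", "unknown"}


def digest_problems(digest: dict) -> list[str]:
    """List missing stable fields, plus an unexpected outcome value if any."""
    problems = [f"missing {key}" for key in Digest.__annotations__ if key not in digest]
    cost = digest.get("cost") or {}
    problems += [f"missing cost.{key}" for key in DigestCost.__annotations__ if key not in cost]
    if "outcome" in digest and digest["outcome"] not in OUTCOMES:
        problems.append(f"unexpected outcome {digest['outcome']!r}")
    return problems


print(digest_problems({"session_uid": "claude:abc-123", "outcome": "ok"})[:3])
# ['missing flavor', 'missing repo', 'missing domain']
```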
---
*Implemented:* Phases 0–4, weekly retro ([AGENTIC-WP-0002]–[AGENTIC-WP-0010]);
kaizen correlation follow-up ([AGENTIC-WP-0011]).
## Sources

@@ -5,7 +5,7 @@
**Status:** Draft v0.1
**Author:** Claude (drafted with Bernd Worsch)
**Created:** 2026-06-06
**Updated:** 2026-06-06
**Updated:** 2026-06-19
---
@@ -223,6 +223,32 @@ record:
- The hub remains a **read model**; Helix Forge writes its durable artifacts as files
and lets the hub index them.
### 9.1 Downstream: kaizen-agentic project metrics correlation
Helix Forge is a **fleet-level** producer of normalized session digests. The
**kaizen-agentic** framework is a **project-scoped** consumer of optional
correlation fields on its execution metrics (ADR-004). The two layers link
**by reference** — kaizen-agentic does not re-implement JSONL ingestion or write
into the Helix Forge store.
| Layer | Owner | What it stores |
|-------|-------|----------------|
| Fleet | agentic-resources (`session_memory`) | Per-session digests in the local SQLite store |
| Project | kaizen-agentic | `.kaizen/metrics/<agent>/executions.jsonl` |
**Canonical spec in this repo:** [DESIGN-session-memory.md §11](DESIGN-session-memory.md#11-project-metrics-correlation-kaizen-agentic)
(session-close env export, digest read path, stable JSON shape).
**Authoritative cross-repo contract (kaizen-agentic):**
[Helix Forge Correlation Contract](https://gitea.coulomb.social/coulomb/kaizen-agentic/src/branch/main/docs/integrations/helix-forge-correlation.md).
Field mapping: `Session.session_uid` → `helix_session_uid`; digest token totals →
`tokens`; MCP/tool overhead share → `infra_overhead_share`.
**Read path for consumers:** `HELIX_STORE_DB` points at the digest SQLite file
(default `session_memory/.store/mem.db`); `python -m session_memory.digest_lookup
<uid> --json` or `kaizen-agentic metrics correlate <uid>` performs a read-only
lookup. No ingestion code belongs in kaizen-agentic.
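A minimal sketch of the field mapping above, assuming the digest is the JSON printed by `digest_lookup --json`, that "token totals" means input plus output tokens (the sum `digest_lookup` prints in text mode), and that the overhead share sits under a top-level `infra_overhead_share` key. That key and the helper name `to_correlation_fields` are hypothetical, not part of either repo.

```python
from typing import Any


def to_correlation_fields(digest: dict[str, Any]) -> dict[str, Any]:
    """Map a Helix Forge digest onto kaizen-agentic's optional correlation fields.

    Hypothetical helper: the digest-side key for the overhead share is assumed.
    """
    cost = digest.get("cost") or {}
    # Token total = input + output, mirroring digest_lookup's text output.
    tokens = (cost.get("input_tokens") or 0) + (cost.get("output_tokens") or 0)
    fields: dict[str, Any] = {
        "helix_session_uid": digest["session_uid"],
        "tokens": tokens,
    }
    # Assumed key; leave the optional field out rather than guess a value.
    share = digest.get("infra_overhead_share")
    if share is not None:
        fields["infra_overhead_share"] = share
    return fields


example = {
    "session_uid": "claude:abc-123",
    "cost": {"input_tokens": 1200, "output_tokens": 300},
    "infra_overhead_share": 0.117,
}
print(to_correlation_fields(example))
```

Because the fields are optional on the kaizen side, the sketch omits a value it cannot source instead of writing a default.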
## 10. Success Metrics
| Metric | Meaning | Target (directional, v1) |

registry/README.md

@@ -0,0 +1,12 @@
# Capability Registry
Markdown-first capability index for federation and reuse planning.
## Authoring
1. Copy a capability entry template (see reuse-surface `templates/capability-entry.template.md`).
2. Add the row to `indexes/capabilities.yaml`.
3. Run `reuse-surface validate` from a checkout with the CLI installed.
4. Merge to `main` and verify publish with `reuse-surface establish --publish-check`.
Federation contract: reuse-surface `docs/RegistryFederation.md`.

@@ -0,0 +1,4 @@
version: 1
updated: '2026-06-16'
domain: helix_forge
capabilities: []

@@ -33,6 +33,19 @@ session_memory/
curate/decisions.py # hub decision audit trail (graceful local-queue fallback)
curate/__main__.py # python -m session_memory.curate (interactive / --auto-approve)
catalog/ # the committed Pattern Catalog (source of truth)
distribute/base.py # Artifact + Distributor protocol + idempotent snippet markers
distribute/claude.py # CLAUDE.md (or skill) renderer } per-flavor edges
distribute/codex.py # AGENTS.md renderer } (agnostic body,
distribute/grok.py # native instruction renderer } different targets)
distribute/proposals.py # scoping + proposed-not-applied output + active registry
distribute/__main__.py # python -m session_memory.distribute
measure/metrics.py # fleet metrics + persisted baseline snapshots
measure/effect.py # before/after per-pattern effectiveness
measure/__main__.py # python -m session_memory.measure
retro/build.py # windowed top-3-per-repo suggestions
retro/publish.py # hub coding_retro read model + local report
retro/__main__.py # python -m session_memory.retro
digest_lookup.py # python -m session_memory.digest_lookup (read one digest, no ingest)
config.toml # store paths, retention caps, sources, repo->domain map, curate gate
```
@@ -114,6 +127,97 @@ python -m session_memory.curate --json # machine-readable result
| `dist_require_cross_flavor` | require cross-flavor evidence to be distribution-eligible |
| `dist_min_frequency` / `dist_min_cost_impact` | stricter floor for `distribution_ready` |
## Distribute patterns as per-flavor proposals
Render approved catalog patterns into per-flavor artifacts — **proposed, never
auto-applied** (HITL). Completes the loop: **detect → curate → distribute**.
```bash
python -m session_memory.distribute # proposals for all repos/flavors
python -m session_memory.distribute --repo state-hub --flavor claude
python -m session_memory.distribute --json
```
- Only `approved` + `distribution_ready` patterns are rendered; each pattern's
`Scope` (repos/domains/flavors) decides where it lands (FR-X2).
- Each flavor renders the **same agnostic body** to its own target (Claude →
`CLAUDE.md`/skill, Codex → `AGENTS.md`, Grok → native) via `rendering_hints`
(FR-A3); blocks carry stable `BEGIN/END` markers so re-running updates in place.
- Output goes to `session_memory/proposals/<repo>/<target>` (gitignored,
regenerated) — a reviewable diff a human applies (FR-X3). The committed
`distribute/active_patterns.json` records which pattern+version is proposed in
which `(repo, flavor)` (FR-X4).
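The in-place update that the `BEGIN/END` markers allow can be sketched as follows. The marker strings (`<!-- BEGIN helix:<id> -->` / `<!-- END helix:<id> -->`) and the `upsert_block` name are hypothetical; the real markers live in `distribute/base.py`.

```python
import re


def upsert_block(doc: str, pattern_id: str, body: str) -> str:
    """Insert or replace one marked block so re-running is idempotent.

    Hypothetical marker format; not the exact strings used by distribute/base.py.
    """
    begin = f"<!-- BEGIN helix:{pattern_id} -->"
    end = f"<!-- END helix:{pattern_id} -->"
    block = f"{begin}\n{body.strip()}\n{end}"
    # Match this pattern's existing block, markers included, across lines.
    existing = re.compile(re.escape(begin) + r".*?" + re.escape(end), re.DOTALL)
    if existing.search(doc):
        # Replace in place; a lambda stops re.sub from interpreting backslashes.
        return existing.sub(lambda _m: block, doc, count=1)
    # No block yet: append it, separated from prior content by a blank line.
    sep = "" if not doc else ("\n" if doc.endswith("\n") else "\n\n")
    return f"{doc}{sep}{block}\n"


once = upsert_block("# CLAUDE.md\n", "sp-x", "Read before you Edit.")
twice = upsert_block(once, "sp-x", "Read before you Edit.")
print(once == twice)
```

Keying the markers on the pattern id means each pattern owns exactly one block per target file, so a new pattern version replaces its old text instead of appending a duplicate.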
## Measure effectiveness (closing the loop)
Track whether the fleet is getting cheaper / more reliable, and whether a
distributed pattern actually helped.
```bash
python -m session_memory.measure --label "baseline" # snapshot + trend
python -m session_memory.measure --since 2026-06-07 # before/after a change
python -m session_memory.measure --no-save --json
```
- A **snapshot** (infra-overhead share, error rate, schema-thrash, token
percentiles, success rate) is appended to `measure/baselines.jsonl` to build a
trend (FR-M3).
- `--since DATE` splits sessions before/after a change and diffs the metrics, with
an `improved` verdict per metric (FR-M1/FR-M2) — so ineffective patterns can be
retired. Recorded pre-fix baseline (2026-06-07): 27 sessions, infra-overhead
median 11.7 %, error rate 0.96, schema-thrash 8 sessions.
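A minimal sketch of the `--since` split and per-metric verdict. It assumes each session is a dict with an ISO `started_at` timestamp and numeric metric fields, that medians are compared, and that a small direction table says which metrics improve downward. `before_after` and `LOWER_IS_BETTER` are illustrative names, not the API of `measure/effect.py`.

```python
from statistics import median

# Assumed direction per metric: True when a lower value counts as an improvement.
LOWER_IS_BETTER = {"infra_overhead_share": True, "error_rate": True, "success": False}


def before_after(sessions: list[dict], since: str, metrics: list[str]) -> dict:
    """Split sessions at `since` (YYYY-MM-DD) and diff the median of each metric."""
    # ISO-8601 date prefixes compare correctly as plain strings.
    before = [s for s in sessions if s["started_at"][:10] < since]
    after = [s for s in sessions if s["started_at"][:10] >= since]
    report = {}
    if not before or not after:
        return report  # nothing to compare on one side of the split
    for m in metrics:
        b = median(s[m] for s in before)
        a = median(s[m] for s in after)
        lower_better = LOWER_IS_BETTER.get(m, True)
        report[m] = {"before": b, "after": a,
                     "improved": a < b if lower_better else a > b}
    return report


sessions = [
    {"started_at": "2026-06-01T10:00:00Z", "error_rate": 1.0, "infra_overhead_share": 0.1},
    {"started_at": "2026-06-05T10:00:00Z", "error_rate": 0.8, "infra_overhead_share": 0.3},
    {"started_at": "2026-06-08T10:00:00Z", "error_rate": 0.4, "infra_overhead_share": 0.25},
]
print(before_after(sessions, "2026-06-07", ["error_rate", "infra_overhead_share"]))
```

Medians keep one runaway session from flipping a verdict, which matters with corpora as small as the 27-session baseline above.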
## Weekly retro (the input to the scheduled retrospection)
A windowed roll-up: detect + measure over the last N days → the **top-3
improvement suggestions per repo** (cross-flavor first; recommendations pulled
from the Pattern Catalog) → published to the hub as the `coding_retro` read model.
```bash
python -m session_memory.retro # last 7 days, local report
python -m session_memory.retro --window-days 30 --json
python -m session_memory.retro --publish # also post coding_retro to the hub
```
Writes `retro/last_retro.{json,md}` and (with `--publish`) posts an
`event_type=coding_retro` progress event. This is consumed by activity-core's
**Weekly Coding Retrospection** schedule (ACTIVITY-WP-0008, Saturday 19:00 Berlin),
which emits one improvement task per relevant repo. Hub publish degrades
gracefully when the hub is unreachable.
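The "top-3 per repo, cross-flavor first" ranking can be sketched like this. It assumes each signal carries the `repos` list, `cross_flavor` flag, and `score` seen in the catalog evidence records; `top3_per_repo` is a hypothetical name, not the API of `retro/build.py`.

```python
from collections import defaultdict


def top3_per_repo(signals: list[dict], k: int = 3) -> dict[str, list[dict]]:
    """Group signals by repo; rank cross-flavor first, then by score descending."""
    by_repo: dict[str, list[dict]] = defaultdict(list)
    for sig in signals:
        for repo in sig["repos"]:  # assumes a list of repo names
            by_repo[repo].append(sig)
    return {
        # False sorts before True, so `not cross_flavor` puts cross-flavor first.
        repo: sorted(sigs, key=lambda s: (not s["cross_flavor"], -s["score"]))[:k]
        for repo, sigs in sorted(by_repo.items())
    }


signals = [
    {"key": "a", "repos": ["state-hub"], "cross_flavor": False, "score": 100.0},
    {"key": "b", "repos": ["state-hub", "ops-bridge"], "cross_flavor": True, "score": 10.0},
    {"key": "c", "repos": ["state-hub"], "cross_flavor": False, "score": 50.0},
    {"key": "d", "repos": ["state-hub"], "cross_flavor": False, "score": 5.0},
]
print({repo: [s["key"] for s in top] for repo, top in top3_per_repo(signals).items()})
```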
## Correlation with kaizen-agentic
Helix Forge owns **fleet-level** session digests; **kaizen-agentic** owns
**project-scoped** execution metrics (ADR-004). The two layers correlate by
optional `helix_session_uid` on project records — **link-by-reference only**;
kaizen-agentic does not ingest JSONL into this store.
| Layer | Storage |
|-------|---------|
| Fleet (here) | `session_memory/.store/mem.db` → `digests` table |
| Project (kaizen) | `.kaizen/metrics/<agent>/executions.jsonl` |
- **Spec:** [DESIGN-session-memory.md §11](../docs/DESIGN-session-memory.md#11-project-metrics-correlation-kaizen-agentic)
- **Contract (kaizen-agentic):** [Helix Forge Correlation Contract](https://gitea.coulomb.social/coulomb/kaizen-agentic/src/branch/main/docs/integrations/helix-forge-correlation.md)
### Session-close env export
After ingest has written the digest, agents using both layers export `HELIX_*`
vars for `kaizen-agentic metrics record` to merge (names match ADR-004):
`HELIX_SESSION_UID`, `HELIX_REPO`, `HELIX_FLAVOR`, `HELIX_TOKENS`,
`HELIX_INFRA_OVERHEAD_SHARE`, and optionally `HELIX_STORE_DB` (absolute path to
`mem.db`). See DESIGN §11.1 for field sources.
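A sketch of building the session-close exports, assuming the digest shape printed by `digest_lookup --json` (`session_uid`, `repo`, `flavor`, `cost.input_tokens`/`output_tokens`) plus a hypothetical top-level `infra_overhead_share` key. DESIGN §11.1 remains the authority on field sources; `helix_env` and `as_exports` are illustrative names.

```python
import os
import shlex
from typing import Optional


def helix_env(digest: dict, store_db: Optional[str] = None) -> dict[str, str]:
    """Build the HELIX_* variables from one digest (field sources are assumptions)."""
    cost = digest.get("cost") or {}
    tokens = (cost.get("input_tokens") or 0) + (cost.get("output_tokens") or 0)
    env = {
        "HELIX_SESSION_UID": digest["session_uid"],
        "HELIX_REPO": digest.get("repo") or "",
        "HELIX_FLAVOR": digest.get("flavor") or "",
        "HELIX_TOKENS": str(tokens),
    }
    share = digest.get("infra_overhead_share")  # hypothetical digest key
    if share is not None:
        env["HELIX_INFRA_OVERHEAD_SHARE"] = str(share)
    if store_db:
        # The convention asks for an absolute path to mem.db.
        env["HELIX_STORE_DB"] = os.path.abspath(store_db)
    return env


def as_exports(env: dict[str, str]) -> str:
    """Render shell `export` lines with values quoted for POSIX shells."""
    return "\n".join(f"export {k}={shlex.quote(v)}" for k, v in sorted(env.items()))


digest = {"session_uid": "claude:abc-123", "repo": "state-hub", "flavor": "claude",
          "cost": {"input_tokens": 10, "output_tokens": 5}, "infra_overhead_share": 0.25}
print(as_exports(helix_env(digest, "session_memory/.store/mem.db")))
```

The sketch skips `HELIX_INFRA_OVERHEAD_SHARE` when its assumed source key is missing; a real exporter should follow whatever DESIGN §11.1 specifies for that case.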
### Read one digest (for `metrics correlate`)
```bash
python -m session_memory.digest_lookup claude:abc-123 --json
HELIX_STORE_DB=/abs/path/to/mem.db python -m session_memory.digest_lookup <uid>
```
Defaults to `[store].db_path` in `config.toml`. Read-only — does not run ingest.
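For consumers that cannot import `session_memory`, one way to guarantee the lookup stays read-only is to open the SQLite file in `mode=ro`. This is a sketch only: it assumes a `digests` table with `session_uid` and a JSON `body` column, which is a hypothetical schema (the real layout is defined in `session_memory/core/store.py`), and `read_digest` is an illustrative name.

```python
import json
import sqlite3
from pathlib import Path
from typing import Optional


def read_digest(db_path: str, session_uid: str) -> Optional[dict]:
    """Fetch one digest with the database opened read-only.

    Assumes a `digests(session_uid, body)` table with JSON in `body` (hypothetical).
    """
    # file:///abs/path?mode=ro makes SQLite reject any write on this connection.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        row = conn.execute(
            "SELECT body FROM digests WHERE session_uid = ?", (session_uid,)
        ).fetchone()
    finally:
        conn.close()
    return json.loads(row[0]) if row else None
```

Opening with `mode=ro` also fails fast when the file is missing instead of silently creating an empty database, which matches the "does not run ingest" contract.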
## Retention knobs (`[retention]` in config.toml)
| Key | Meaning |
@@ -141,4 +245,16 @@ python -m pytest # schema, adapters, store, digest, retention, ingest,
- **Phase 2** (AGENTIC-WP-0004): Curate — Solution Pattern schema, versioned
files-first Pattern Catalog, discuss/approve/reject review with an evidence bar +
bloat guard, and hub-decision audit trail.
- **Next — Phase 3 (Distribute) / Phase 4 (Measure)** follow per the PRD.
- **Detect hardening** (AGENTIC-WP-0005): session-quality filter + tool-mix /
infra-overhead signals. **Error mining** (AGENTIC-WP-0006): recurring error
fingerprints → root-cause patterns.
- **Phase 3** (AGENTIC-WP-0007): Distribute — per-flavor distributor adapters
render approved patterns into proposed (HITL) artifacts, scoped by repo/domain,
with an active-pattern registry.
- **Phase 4** (AGENTIC-WP-0009): Measure — fleet baseline/trend + before/after
per-pattern effectiveness. The Capture → Detect → Curate → Distribute → Measure
loop is closed.
- **Weekly retro** (AGENTIC-WP-0010): windowed top-3-per-repo + hub `coding_retro`
publish.
- **Kaizen correlation** (AGENTIC-WP-0011): bidirectional doc links, session-close
`HELIX_*` env convention, `digest_lookup` read path.

@@ -0,0 +1 @@
{"created_at": "2026-06-07T09:13:20Z", "distribution_ready": true, "id": "sp-problem-budget_overrun-tokens", "name": "problem: budget overrun", "polarity": "problem", "problem": "problem: budget overrun", "provenance": {"detected_at": null, "evidence": {"cost_impact": 10.667, "cross_flavor": false, "flavors": ["claude"], "frequency": 3, "key": "problem:budget_overrun:tokens", "locus": "tokens", "polarity": "problem", "repos": ["artifact-store", "citation-evidence", "infospace-bench"], "score": 32.001, "sessions": ["claude:0ef1b45c-5c27-4e20-88b3-37daeaa24eca", "claude:6e0d3d68-872b-4d93-bb09-0691e091314b", "claude:8fabd5ce-6a20-4412-9a8b-0f0763394a78"], "signal_type": "budget_overrun", "title": "problem: budget overrun"}, "promoted_at": "2026-06-07T09:13:20Z", "source_key": "problem:budget_overrun:tokens"}, "rendering_hints": {"claude": {"note": "TODO: refine rendering", "target": "CLAUDE.md"}}, "resolutions": [{"detail": "", "steps": [], "summary": "TODO: capture the recommended resolution"}], "schema_version": 1, "scope": {"domains": [], "flavors": ["claude"], "repos": ["artifact-store", "citation-evidence", "infospace-bench"]}, "status": "superseded", "updated_at": "2026-06-07T09:13:20Z", "version": "1.0.0"}

@@ -2,9 +2,9 @@
"created_at": "2026-06-07T09:13:20Z",
"distribution_ready": true,
"id": "sp-problem-budget_overrun-tokens",
"name": "problem: budget overrun",
"name": "Budget overrun: token cost above peers",
"polarity": "problem",
"problem": "problem: budget overrun",
"problem": "A session's token cost lands well above its peers (>p90). Usually driven by re-reading large files or tool outputs, carrying redundant context, or long exploratory loops without checkpoints.",
"provenance": {
"detected_at": null,
"evidence": {
@@ -36,15 +36,27 @@
},
"rendering_hints": {
"claude": {
"note": "TODO: refine rendering",
"target": "CLAUDE.md"
}
},
"resolutions": [
{
"detail": "",
"detail": "Use offset/limit; don't re-Read a file already in the transcript.",
"steps": [
"Locate with grep/glob first",
"Read only the relevant span"
],
"summary": "Read narrowly \u2014 target the region you need, not whole large files"
},
{
"detail": "Summarize progress; avoid re-pulling outputs already shown.",
"steps": [],
"summary": "TODO: capture the recommended resolution"
"summary": "Checkpoint and prune context instead of re-fetching it"
},
{
"detail": "grep/glob narrows scope far cheaper than reading whole trees.",
"steps": [],
"summary": "Prefer targeted search over broad reads to locate code"
}
],
"schema_version": 1,
@@ -60,6 +72,6 @@
]
},
"status": "approved",
"updated_at": "2026-06-07T09:13:20Z",
"version": "1.0.0"
"updated_at": "2026-06-07T14:21:06Z",
"version": "1.0.1"
}

@@ -0,0 +1 @@
{"covers": [], "created_at": "2026-06-07T13:26:25Z", "distribution_ready": true, "id": "sp-problem-file_not_read-edit", "name": "Read before you Edit", "polarity": "problem", "problem": "Agents call Edit/Write on a file they have not read in the current session, or after it changed under them. The edit tools reject this ('File has not been read yet' / 'File has been modified since read'), and the retry burns a turn. Top recurring error in the corpus (12/27 sessions, 8 repos).", "provenance": {"detected_at": null, "evidence": {"frequency": 32, "origin": "AGENTIC-WP-0006 error mining / ASSESSMENT-infra-friction.md", "polarity": "problem", "repos": 8, "sessions": 12}, "promoted_at": null, "source_key": "problem:file_not_read:edit"}, "rendering_hints": {"claude": {"target": "CLAUDE.md"}, "codex": {"target": "AGENTS.md"}, "grok": {"target": ".grok/instructions.md"}}, "resolutions": [{"detail": "Never blind-write a file you haven't read this session.", "steps": ["Read the target file", "Then Edit/Write"], "summary": "Read the file (or the region you'll touch) before Edit/Write"}, {"detail": "A stale read means the file changed under you; refresh, don't loop.", "steps": ["Re-Read the file", "Re-apply the Edit"], "summary": "On 'modified since read', re-Read then re-Edit"}], "schema_version": 1, "scope": {"domains": [], "flavors": [], "repos": []}, "status": "superseded", "updated_at": "2026-06-07T13:26:25Z", "version": "1.0.0"}

@@ -0,0 +1,63 @@
{
"covers": [
"file has not been read",
"modified since read",
"file_not_read"
],
"created_at": "2026-06-07T13:26:25Z",
"distribution_ready": true,
"id": "sp-problem-file_not_read-edit",
"name": "Read before you Edit",
"polarity": "problem",
"problem": "Agents call Edit/Write on a file they have not read in the current session, or after it changed under them. The edit tools reject this ('File has not been read yet' / 'File has been modified since read'), and the retry burns a turn. Top recurring error in the corpus (12/27 sessions, 8 repos).",
"provenance": {
"detected_at": null,
"evidence": {
"frequency": 32,
"origin": "AGENTIC-WP-0006 error mining / ASSESSMENT-infra-friction.md",
"polarity": "problem",
"repos": 8,
"sessions": 12
},
"promoted_at": null,
"source_key": "problem:file_not_read:edit"
},
"rendering_hints": {
"claude": {
"target": "CLAUDE.md"
},
"codex": {
"target": "AGENTS.md"
},
"grok": {
"target": ".grok/instructions.md"
}
},
"resolutions": [
{
"detail": "Never blind-write a file you haven't read this session.",
"steps": [
"Read the target file",
"Then Edit/Write"
],
"summary": "Read the file (or the region you'll touch) before Edit/Write"
},
{
"detail": "A stale read means the file changed under you; refresh, don't loop.",
"steps": [
"Re-Read the file",
"Re-apply the Edit"
],
"summary": "On 'modified since read', re-Read then re-Edit"
}
],
"schema_version": 1,
"scope": {
"domains": [],
"flavors": [],
"repos": []
},
"status": "approved",
"updated_at": "2026-06-07T19:06:45Z",
"version": "1.0.1"
}

@@ -0,0 +1 @@
{"created_at": "2026-06-07T09:13:20Z", "distribution_ready": false, "id": "sp-problem-infra_overhead-infra_overhead", "name": "problem: infra overhead", "polarity": "problem", "problem": "problem: infra overhead", "provenance": {"detected_at": null, "evidence": {"cost_impact": 0.801, "cross_flavor": false, "flavors": ["claude"], "frequency": 2, "key": "problem:infra_overhead:infra_overhead", "locus": "infra_overhead", "polarity": "problem", "repos": ["markitect-main", "vergabe-teilnahme"], "score": 1.602, "sessions": ["claude:135002f9-98d2-4d1b-b8fb-543b20388782", "claude:b4ae9631-a7eb-42a6-acb1-c65b660c4b74"], "signal_type": "infra_overhead", "title": "problem: infra overhead"}, "promoted_at": "2026-06-07T09:13:20Z", "source_key": "problem:infra_overhead:infra_overhead"}, "rendering_hints": {"claude": {"note": "TODO: refine rendering", "target": "CLAUDE.md"}}, "resolutions": [{"detail": "", "steps": [], "summary": "TODO: capture the recommended resolution"}], "schema_version": 1, "scope": {"domains": [], "flavors": ["claude"], "repos": ["markitect-main", "vergabe-teilnahme"]}, "status": "superseded", "updated_at": "2026-06-07T09:13:20Z", "version": "1.0.0"}

@@ -2,9 +2,9 @@
"created_at": "2026-06-07T09:13:20Z",
"distribution_ready": false,
"id": "sp-problem-infra_overhead-infra_overhead",
"name": "problem: infra overhead",
"name": "Infrastructure overhead: too much coordination plumbing",
"polarity": "problem",
"problem": "problem: infra overhead",
"problem": "A large share of the session's tool calls are State Hub / task-management / schema-loading plumbing rather than touching the repo (corpus median 11.7%, up to 43% in the worst sessions; one session made 231 hub calls).",
"provenance": {
"detected_at": null,
"evidence": {
@@ -34,15 +34,27 @@
},
"rendering_hints": {
"claude": {
"note": "TODO: refine rendering",
"target": "CLAUDE.md"
}
},
"resolutions": [
{
"detail": "",
"detail": "Update several task statuses together; emit fewer, coarser progress events.",
"steps": [
"Do a chunk of work",
"Then sync statuses in one pass"
],
"summary": "Batch hub writes \u2014 sync at checkpoints, not per event"
},
{
"detail": "One scoped summary at session start beats many broad reads.",
"steps": [],
"summary": "TODO: capture the recommended resolution"
"summary": "Orient once with get_domain_summary, don't re-query repeatedly"
},
{
"detail": "See STATE-WP-0058 \u2014 stops the repeated ToolSearch for hub tools.",
"steps": [],
"summary": "Front-load hub tool knowledge via the State Hub skill"
}
],
"schema_version": 1,
@@ -57,6 +69,6 @@
]
},
"status": "provisional",
"updated_at": "2026-06-07T09:13:20Z",
"version": "1.0.0"
"updated_at": "2026-06-07T14:21:06Z",
"version": "1.0.1"
}

@@ -0,0 +1 @@
{"created_at": "2026-06-07T09:13:20Z", "distribution_ready": true, "id": "sp-problem-schema_thrash-schema_load", "name": "problem: schema thrash", "polarity": "problem", "problem": "problem: schema thrash", "provenance": {"detected_at": null, "evidence": {"cost_impact": 79.0, "cross_flavor": false, "flavors": ["claude"], "frequency": 8, "key": "problem:schema_thrash:schema_load", "locus": "schema_load", "polarity": "problem", "repos": ["activity-core", "citation-evidence", "flex-auth", "infospace-bench", "ops-bridge", "vergabe-teilnahme"], "score": 632.0, "sessions": ["claude:0ef1b45c-5c27-4e20-88b3-37daeaa24eca", "claude:30dbad62-c042-41f2-80c1-5953a1100e7f", "claude:4340b160-2fb6-47d0-897c-3cac0a8855d8", "claude:63fd4df2-5add-4748-af21-c1544825e006", "claude:8313f946-f008-4e98-9915-31950380e39e", "claude:8fabd5ce-6a20-4412-9a8b-0f0763394a78", "claude:b4ae9631-a7eb-42a6-acb1-c65b660c4b74", "claude:bbcf1c2b-14be-40e4-826b-4b2b49b9d212"], "signal_type": "schema_thrash", "title": "problem: schema thrash"}, "promoted_at": "2026-06-07T09:13:20Z", "source_key": "problem:schema_thrash:schema_load"}, "rendering_hints": {"claude": {"note": "TODO: refine rendering", "target": "CLAUDE.md"}}, "resolutions": [{"detail": "", "steps": [], "summary": "TODO: capture the recommended resolution"}], "schema_version": 1, "scope": {"domains": [], "flavors": ["claude"], "repos": ["activity-core", "citation-evidence", "flex-auth", "infospace-bench", "ops-bridge", "vergabe-teilnahme"]}, "status": "superseded", "updated_at": "2026-06-07T09:13:20Z", "version": "1.0.0"}

@@ -2,9 +2,9 @@
"created_at": "2026-06-07T09:13:20Z",
"distribution_ready": true,
"id": "sp-problem-schema_thrash-schema_load",
"name": "problem: schema thrash",
"name": "Schema thrash: repeated ToolSearch",
"polarity": "problem",
"problem": "problem: schema thrash",
"problem": "ToolSearch fires repeatedly within a session (seen in 81% of sessions) because the State Hub MCP tools are deferred and their schemas get re-loaded each time they are needed \u2014 pure overhead with no work value.",
"provenance": {
"detected_at": null,
"evidence": {
@@ -44,15 +44,22 @@
},
"rendering_hints": {
"claude": {
"note": "TODO: refine rendering",
"target": "CLAUDE.md"
}
},
"resolutions": [
{
"detail": "",
"detail": "Resolve them by name in one ToolSearch (select:...) rather than searching ad hoc.",
"steps": [
"List the hub tools the session needs",
"Load them once at the start"
],
"summary": "Load the tool schemas you'll need once, up front"
},
{
"detail": "The skill carries the schemas so no per-use discovery is needed.",
"steps": [],
"summary": "TODO: capture the recommended resolution"
"summary": "Adopt the State Hub skill that front-loads common hub tool signatures"
}
],
"schema_version": 1,
@@ -71,6 +78,6 @@
]
},
"status": "approved",
"updated_at": "2026-06-07T09:13:20Z",
"version": "1.0.0"
"updated_at": "2026-06-07T14:21:06Z",
"version": "1.0.1"
}

@@ -0,0 +1 @@
{"created_at": "2026-06-07T09:13:20Z", "distribution_ready": true, "id": "sp-problem-tool_thrash-tool-bash", "name": "problem: tool thrash", "polarity": "problem", "problem": "problem: tool thrash", "provenance": {"detected_at": null, "evidence": {"cost_impact": 1990.0, "cross_flavor": false, "flavors": ["claude"], "frequency": 11, "key": "problem:tool_thrash:tool:Bash", "locus": "tool:Bash", "polarity": "problem", "repos": ["activity-core", "artifact-store", "citation-evidence", "ihp-railiance-probe", "infospace-bench", "railiance-apps", "state-hub", "vergabe-teilnahme"], "score": 21890.0, "sessions": ["claude:0ef1b45c-5c27-4e20-88b3-37daeaa24eca", "claude:2c0d14e1-d089-4076-bf35-b134737a261d", "claude:30dbad62-c042-41f2-80c1-5953a1100e7f", "claude:4307eff6-cd39-4189-be58-79a3acb69d6c", "claude:4340b160-2fb6-47d0-897c-3cac0a8855d8", "claude:6e0d3d68-872b-4d93-bb09-0691e091314b", "claude:8313f946-f008-4e98-9915-31950380e39e", "claude:8fabd5ce-6a20-4412-9a8b-0f0763394a78", "claude:a9483f07-c9dc-4f71-9fa0-831790ea965e", "claude:b1dfbcfa-91f9-4540-823a-26fcfaab7fc8", "claude:b4ae9631-a7eb-42a6-acb1-c65b660c4b74"], "signal_type": "tool_thrash", "title": "problem: tool thrash"}, "promoted_at": "2026-06-07T09:13:20Z", "source_key": "problem:tool_thrash:tool:Bash"}, "rendering_hints": {"claude": {"note": "TODO: refine rendering", "target": "CLAUDE.md"}}, "resolutions": [{"detail": "", "steps": [], "summary": "TODO: capture the recommended resolution"}], "schema_version": 1, "scope": {"domains": [], "flavors": ["claude"], "repos": ["activity-core", "artifact-store", "citation-evidence", "ihp-railiance-probe", "infospace-bench", "railiance-apps", "state-hub", "vergabe-teilnahme"]}, "status": "superseded", "updated_at": "2026-06-07T09:13:20Z", "version": "1.0.0"}

@@ -2,9 +2,9 @@
"created_at": "2026-06-07T09:13:20Z",
"distribution_ready": true,
"id": "sp-problem-tool_thrash-tool-bash",
"name": "problem: tool thrash",
"name": "Tool thrash: one tool hammered",
"polarity": "problem",
"problem": "problem: tool thrash",
"problem": "A single tool (often Bash or Edit) is invoked far more than any other in a session \u2014 a sign of trial-and-error churn or missing higher-level tooling.",
"provenance": {
"detected_at": null,
"evidence": {
@@ -49,15 +49,27 @@
},
"rendering_hints": {
"claude": {
"note": "TODO: refine rendering",
"target": "CLAUDE.md"
}
},
"resolutions": [
{
"detail": "",
"detail": "Compose a single command/script; run independent calls in parallel.",
"steps": [
"Group the steps",
"Run them as one block"
],
"summary": "Batch related shell work into one script, not many small Bash calls"
},
{
"detail": "Read the region, then one substantive Edit beats many tiny ones.",
"steps": [],
"summary": "TODO: capture the recommended resolution"
"summary": "Make fewer, larger edits with full context"
},
{
"detail": "If the same invocation recurs, wrap it once.",
"steps": [],
"summary": "Factor a repeated command pattern into a helper"
}
],
"schema_version": 1,
@@ -78,6 +90,6 @@
]
},
"status": "approved",
"updated_at": "2026-06-07T09:13:20Z",
"version": "1.0.0"
"updated_at": "2026-06-07T14:21:06Z",
"version": "1.0.1"
}

@@ -0,0 +1 @@
{"created_at": "2026-06-07T09:13:20Z", "distribution_ready": true, "id": "sp-success-clean_pass-outcome", "name": "cross-flavor success: clean pass", "polarity": "success", "problem": "cross-flavor success: clean pass", "provenance": {"detected_at": null, "evidence": {"cost_impact": 17.0, "cross_flavor": true, "flavors": ["claude", "grok"], "frequency": 17, "key": "success:clean_pass:outcome", "locus": "outcome", "polarity": "success", "repos": ["activity-core", "agentic-resources", "artifact-store", "can-you-assist", "citation-evidence", "infospace-bench", "issue-facade", "ops-bridge", "railiance-apps", "state-hub", "the-custodian", "vergabe-teilnahme"], "score": 433.5, "sessions": ["claude:0ef1b45c-5c27-4e20-88b3-37daeaa24eca", "claude:16bdbec4-b018-4902-9fb5-336f8f3d61c8", "claude:2c0d14e1-d089-4076-bf35-b134737a261d", "claude:30dbad62-c042-41f2-80c1-5953a1100e7f", "claude:4307eff6-cd39-4189-be58-79a3acb69d6c", "claude:4340b160-2fb6-47d0-897c-3cac0a8855d8", "claude:631de76e-fdee-43b5-b091-7b7675467ad1", "claude:63fd4df2-5add-4748-af21-c1544825e006", "claude:6e0d3d68-872b-4d93-bb09-0691e091314b", "claude:8313f946-f008-4e98-9915-31950380e39e", "claude:8fabd5ce-6a20-4412-9a8b-0f0763394a78", "claude:a9483f07-c9dc-4f71-9fa0-831790ea965e", "claude:b4ae9631-a7eb-42a6-acb1-c65b660c4b74", "claude:eb837dd1-5b8e-472e-b9e1-4537b10e03e6", "claude:ee9e84f2-bc35-4eb5-a7ad-aaec5f31d965", "claude:f1b25697-0e5f-45f0-81d1-af0f1762c438", "grok:019e6122-00c0-79f3-b4e5-9c70b77c015d"], "signal_type": "clean_pass", "title": "cross-flavor success: clean pass"}, "promoted_at": "2026-06-07T09:13:20Z", "source_key": "success:clean_pass:outcome"}, "rendering_hints": {"claude": {"note": "TODO: refine rendering", "target": "CLAUDE.md"}, "grok": {"note": "TODO: refine rendering", "target": "instructions"}}, "resolutions": [{"detail": "", "steps": [], "summary": "TODO: capture the recommended resolution"}], "schema_version": 1, "scope": {"domains": [], "flavors": ["claude", "grok"], "repos": 
["activity-core", "agentic-resources", "artifact-store", "can-you-assist", "citation-evidence", "infospace-bench", "issue-facade", "ops-bridge", "railiance-apps", "state-hub", "the-custodian", "vergabe-teilnahme"]}, "status": "superseded", "updated_at": "2026-06-07T09:13:20Z", "version": "1.0.0"}

@@ -2,9 +2,9 @@
"created_at": "2026-06-07T09:13:20Z",
"distribution_ready": true,
"id": "sp-success-clean_pass-outcome",
"name": "cross-flavor success: clean pass",
"name": "Clean pass: tests green, no retries",
"polarity": "success",
"problem": "cross-flavor success: clean pass",
"problem": "The target session shape: ends in success, runs the test suite, with no errors and no retries \u2014 resolves cheaply and reliably. Seen across many sessions and both Claude and Grok (the highest-value pattern to reinforce).",
"provenance": {
"detected_at": null,
"evidence": {
@@ -60,19 +60,26 @@
},
"rendering_hints": {
"claude": {
"note": "TODO: refine rendering",
"target": "CLAUDE.md"
},
"grok": {
"note": "TODO: refine rendering",
"target": "instructions"
}
},
"resolutions": [
{
"detail": "",
"detail": "A passing suite is the cheapest proof the change works.",
"steps": [
"Make the change",
"Run the suite",
"Only then report done"
],
"summary": "Run the test suite before declaring done; let green gate completion"
},
{
"detail": "Small verified steps beat large unverified ones that bounce.",
"steps": [],
"summary": "TODO: capture the recommended resolution"
"summary": "Work incrementally and verify as you go to avoid retries"
}
],
"schema_version": 1,
@@ -98,6 +105,6 @@
]
},
"status": "approved",
"updated_at": "2026-06-07T09:13:20Z",
"version": "1.0.0"
"updated_at": "2026-06-07T14:21:06Z",
"version": "1.0.1"
}

@@ -39,6 +39,24 @@ min_substantive = 3 # require >= this many substantive (edit/read/shell) tool
min_prompt_len = 25 # first prompt shorter than this is treated as trivial
# Curate phase (AGENTIC-WP-0004): catalog location + promotion evidence bar.
# Measure phase (AGENTIC-WP-0009): persisted baseline/trend of fleet metrics.
[measure]
baselines = "session_memory/measure/baselines.jsonl" # timestamped metric snapshots (committed)
# Weekly retro (AGENTIC-WP-0010): windowed top-3-per-repo report, published to the
# hub as the coding_retro read model that activity-core's weekly schedule consumes.
[retro]
window_days = 7
report_json = "session_memory/retro/last_retro.json" # latest report (committed)
report_md = "session_memory/retro/last_retro.md" # human-readable mirror
hub_url = "http://127.0.0.1:8000" # for --publish (best-effort)
# Distribute phase (AGENTIC-WP-0007): where per-flavor proposals + the active
# registry are written. Proposals are HITL — reviewed, never auto-applied.
[distribute]
proposals_dir = "session_memory/proposals" # reviewable proposals (gitignored, regenerated)
active_registry = "session_memory/distribute/active_patterns.json" # what's proposed/active where (committed)
[curate]
catalog_dir = "session_memory/catalog" # files-first Pattern Catalog (committed)
review_log = "session_memory/.store/reviews.jsonl" # remembered decisions (gitignored)

@@ -30,7 +30,7 @@ from .schema import SolutionPattern
# Content fields that define a pattern's substance. Version, timestamps, status,
# and distribution_ready are metadata — changes to them never bump the version.
_CONTENT_KEYS = ("name", "polarity", "problem", "resolutions", "scope",
                 "provenance", "rendering_hints")
                 "provenance", "rendering_hints", "covers")
ADDED = "added"
UNCHANGED = "unchanged"
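The rule stated in the comment above (content changes bump the version; status, timestamps, and `distribution_ready` never do) can be sketched as follows. The patch-level bump is an assumption that matches the `1.0.0` → `1.0.1` transitions in the catalog diffs on this page; `next_version` is a hypothetical helper, not the catalog's real `upsert`.

```python
CONTENT_KEYS = ("name", "polarity", "problem", "resolutions", "scope",
                "provenance", "rendering_hints", "covers")


def next_version(old: dict, new: dict) -> str:
    """Return the version `new` should carry: patch-bumped only if content changed."""
    changed = any(old.get(k) != new.get(k) for k in CONTENT_KEYS)
    if not changed:
        return old["version"]  # metadata-only change keeps the version
    major, minor, patch = (int(x) for x in old["version"].split("."))
    return f"{major}.{minor}.{patch + 1}"


old = {"version": "1.0.0", "name": "problem: tool thrash", "status": "provisional"}
print(next_version(old, {**old, "status": "approved"}))
print(next_version(old, {**old, "name": "Tool thrash: one tool hammered"}))
```

Adding `covers` to the content keys, as this diff does, means editing a pattern's coverage list is treated as a substantive change and produces a new version.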
@@ -86,6 +86,22 @@ class Catalog:
        with open(path, encoding="utf-8") as fh:
            return [json.loads(line) for line in fh if line.strip()]

    def find_for(self, signal_key: str, locus: str = "") -> Optional[SolutionPattern]:
        """Best catalog pattern for a detect signal: exact id first, then ``covers``.

        Lets a signal that doesn't share a pattern's exact key (e.g. a
        ``recurring_error`` fingerprint) inherit the curated recommendation when a
        pattern declares it covers that text.
        """
        exact = self.load(SolutionPattern.make_id(signal_key))
        if exact is not None:
            return exact
        hay = f"{signal_key} {locus}".lower()
        for p in self.list():  # sorted by id -> deterministic
            if any(c.lower() in hay for c in p.covers):
                return p
        return None

    # --- the single write path ---------------------------------------------
    def upsert(self, pattern: SolutionPattern) -> str:

@@ -81,6 +81,11 @@ class SolutionPattern:
    # per-flavor rendering hints, kept OUT of the agnostic core (OQ4):
    # {"claude": {...}, "codex": {...}, "grok": {...}}
    rendering_hints: dict[str, dict[str, Any]] = field(default_factory=dict)
    # other signal keys/loci this pattern's recommendation also applies to —
    # lowercase substrings matched against a candidate signal's key+locus, so a
    # detect signal that doesn't share this pattern's exact key (e.g. a
    # recurring_error fingerprint) can still inherit the curated resolution.
    covers: list[str] = field(default_factory=list)
    status: str = "provisional"
    distribution_ready: bool = False
    created_at: Optional[str] = None
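To make the `covers` matching concrete, here is a standalone sketch of the fallback step in `Catalog.find_for`, applied to plain dicts. The `covers` list is copied from the "Read before you Edit" catalog entry on this page; the `recurring_error` signal key and locus are hypothetical inputs, and `first_covering` is an illustrative name.

```python
from typing import Optional


def first_covering(patterns: list[dict], signal_key: str, locus: str = "") -> Optional[str]:
    """Return the id of the first pattern (by id) whose `covers` matches the signal."""
    hay = f"{signal_key} {locus}".lower()
    for p in sorted(patterns, key=lambda p: p["id"]):  # deterministic order
        if any(c.lower() in hay for c in p.get("covers", [])):
            return p["id"]
    return None


patterns = [
    {"id": "sp-problem-schema_thrash-schema_load", "covers": []},
    {"id": "sp-problem-file_not_read-edit",
     "covers": ["file has not been read", "modified since read", "file_not_read"]},
]
# Hypothetical recurring_error signal whose key shares nothing with the pattern id.
print(first_covering(patterns, "problem:recurring_error:3f9a",
                     "Edit: File has not been read yet"))
```

Substring matching on lowercased text keeps `covers` easy to author by hand, at the cost of possible over-matching if an entry is too short.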

@@ -0,0 +1,76 @@
"""Read a single session digest from the local store (AGENTIC-WP-0011 T03).

Thin read path for ``kaizen-agentic metrics correlate`` and other consumers.
Does not run ingest.

Usage:
    python -m session_memory.digest_lookup <session_uid> [--json]
    HELIX_STORE_DB=/abs/path/to/mem.db python -m session_memory.digest_lookup <uid>
"""

from __future__ import annotations

import argparse
import json
import os
import sys

from .core.store import Store
from .ingest import _expand, load_config


def resolve_store_paths(*, config_path: str | None = None) -> tuple[str, str]:
    """Resolve db + blob paths from HELIX_STORE_DB or config.toml [store]."""
    env_db = os.environ.get("HELIX_STORE_DB")
    if env_db:
        db_path = _expand(env_db)
        blob_dir = os.path.join(os.path.dirname(db_path), "blobs")
        return db_path, blob_dir
    here = os.path.dirname(os.path.abspath(__file__))
    cfg_path = config_path or os.path.join(here, "config.toml")
    store_cfg = load_config(cfg_path).get("store", {})
    return _expand(store_cfg.get("db_path", "session_memory/.store/mem.db")), _expand(
        store_cfg.get("blob_dir", "session_memory/.store/blobs")
    )


def lookup_digest(session_uid: str, *, config_path: str | None = None) -> dict | None:
    db_path, blob_dir = resolve_store_paths(config_path=config_path)
    store = Store(db_path, blob_dir)
    try:
        return store.get_digest(session_uid)
    finally:
        store.close()


def main(argv: list[str] | None = None) -> int:
    here = os.path.dirname(os.path.abspath(__file__))
    ap = argparse.ArgumentParser(
        description="Read one session digest from the Helix Forge store (no ingest)."
    )
    ap.add_argument("session_uid", help="Normalized session uid, e.g. claude:abc-123")
    ap.add_argument("--config", default=os.path.join(here, "config.toml"),
                    help="config.toml when HELIX_STORE_DB is unset")
    ap.add_argument("--json", action="store_true", help="print digest JSON to stdout")
    args = ap.parse_args(argv)

    digest = lookup_digest(args.session_uid, config_path=args.config)
    if digest is None:
        print(f"digest not found: {args.session_uid}", file=sys.stderr)
        return 1
    if args.json:
        print(json.dumps(digest, indent=2, sort_keys=True))
    else:
        cost = digest.get("cost") or {}
        tokens = cost.get("input_tokens", 0) + cost.get("output_tokens", 0)
        print(f"session_uid: {digest.get('session_uid')}")
        print(f"repo: {digest.get('repo')}  flavor: {digest.get('flavor')}")
        print(f"outcome: {digest.get('outcome')}  tokens: {tokens}")
        print(f"started_at: {digest.get('started_at')}  ended_at: {digest.get('ended_at')}")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())

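The precedence in `resolve_store_paths` is easy to miss: when `HELIX_STORE_DB` is set, the `[store]` table is not consulted at all, so a configured `blob_dir` is ignored and blobs are expected in a `blobs` directory next to the database. The sketch below restates that branch as a pure function over an environment dict and a `[store]` dict so it can be exercised without a config file; `resolve_paths_sketch` is a hypothetical name, and `_expand` (path expansion) is deliberately left out.

```python
import os


def resolve_paths_sketch(env: dict[str, str], store_cfg: dict[str, str]) -> tuple[str, str]:
    """Hypothetical mirror of resolve_store_paths' precedence, without _expand.

    A non-empty HELIX_STORE_DB wins and its blob dir is always the sibling
    "blobs" directory; otherwise the [store] table is used, with repo defaults.
    """
    env_db = env.get("HELIX_STORE_DB")
    if env_db:
        return env_db, os.path.join(os.path.dirname(env_db), "blobs")
    return (store_cfg.get("db_path", "session_memory/.store/mem.db"),
            store_cfg.get("blob_dir", "session_memory/.store/blobs"))


# The env var overrides config; the configured blob_dir is not used in that case.
print(resolve_paths_sketch({"HELIX_STORE_DB": "/data/helix/mem.db"},
                           {"blob_dir": "/elsewhere/blobs"}))
# No env var and an empty [store] table: fall back to the defaults.
print(resolve_paths_sketch({}, {}))
```

An empty `HELIX_STORE_DB` is falsy, so it behaves like an unset variable and the config path is used.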
@@ -0,0 +1,9 @@
"""Distribute phase (PRD §6.4) — render approved Solution Patterns into per-flavor
artifacts. Mirror of the collector design: agnostic core, thin distributor edges.
  base.py      Artifact + Distributor protocol + idempotent snippet markers (T01)
  claude.py    CLAUDE.md snippet distributor (T02)
  codex.py     AGENTS.md snippet distributor (T03)
  grok.py      native instruction distributor (T03)
  registry.py  flavor -> distributor registry (T03)
  proposals.py scoping, proposed-not-applied output, active-pattern registry (T04)
  __main__.py  `python -m session_memory.distribute` (T05)
"""

@@ -0,0 +1,89 @@
"""Distribute entrypoint (T05): catalog -> per-flavor proposals (HITL).
python -m session_memory.distribute [--config PATH] [--repo R] [--flavor F] [--json]
Reads approved / distribution-ready Solution Patterns from the Pattern Catalog and
renders them into per-flavor **proposals** (never auto-applied) scoped by
repo/domain, recording what is proposed where in the active-pattern registry.
Targets are the repo->domain map in ``config.toml`` crossed with the known
distributor flavors; each pattern's own ``Scope`` filters where it actually lands.
"""
from __future__ import annotations
import argparse
import json
import os
from ..curate.catalog import Catalog
from ..ingest import _expand, load_config
from .proposals import ActiveRegistry, Target, propose
from .registry import all_flavors


def build_targets(config: dict, repo_filter=None, flavor_filter=None) -> list[Target]:
    repo_map = config.get("repo_domain_map", {})
    flavors = [flavor_filter] if flavor_filter else all_flavors()
    targets = []
    for repo, domain in repo_map.items():
        if repo_filter and repo != repo_filter:
            continue
        for flavor in flavors:
            targets.append(Target(repo=repo, domain=domain, flavor=flavor))
    return targets


def run_distribute(config: dict, *, repo_filter=None, flavor_filter=None):
    cur = config.get("curate", {})
    dist = config.get("distribute", {})
    catalog = Catalog(_expand(cur.get("catalog_dir", "session_memory/catalog")))
    patterns = catalog.list()
    targets = build_targets(config, repo_filter, flavor_filter)
    registry = ActiveRegistry(_expand(dist.get("active_registry",
                                               "session_memory/distribute/active_patterns.json")))
    out_dir = _expand(dist.get("proposals_dir", "session_memory/proposals"))
    return propose(patterns, targets, out_dir, registry)


def _summary(res) -> str:
    by_repo = {}
    for repo, flavor, pid, _ in res.proposals:
        by_repo.setdefault(repo, []).append(f"{pid}[{flavor}]")
    lines = [f"# Distribute proposals ({len(res.proposals)} renders, "
             f"{len(res.files_written)} files)"]
    for repo in sorted(by_repo):
        lines.append(f" {repo}: {', '.join(sorted(by_repo[repo]))}")
    if res.skipped_not_distributable:
        lines.append(f" skipped (not distribution-ready): "
                     f"{len(set(res.skipped_not_distributable))} pattern(s)")
    if not res.proposals:
        lines.append(" (no approved/distribution-ready patterns matched any target)")
    return "\n".join(lines)


def main(argv=None) -> int:
    here = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
    ap = argparse.ArgumentParser(description="Distribute approved patterns as per-flavor proposals.")
    ap.add_argument("--config", default=os.path.join(here, "config.toml"))
    ap.add_argument("--repo", default=None, help="limit to one target repo")
    ap.add_argument("--flavor", default=None, help="limit to one flavor")
    ap.add_argument("--json", action="store_true")
    args = ap.parse_args(argv)
    config = load_config(args.config)
    res = run_distribute(config, repo_filter=args.repo, flavor_filter=args.flavor)
    if args.json:
        print(json.dumps({
            "proposals": [{"repo": r, "flavor": f, "pattern_id": p, "path": path}
                          for r, f, p, path in res.proposals],
            "files_written": res.files_written,
            "skipped": sorted(set(res.skipped_not_distributable)),
        }, indent=2))
    else:
        print(_summary(res))
    return 0


if __name__ == "__main__":
    raise SystemExit(main())

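`build_targets` produces the cross product of the configured `repo_domain_map` and the registered flavors; a pattern's `Scope` only narrows that set later, in `applies()`. A minimal standalone sketch of the expansion, with `Target` copied from the proposals module, the maps passed in directly instead of read from `config.toml`, and a made-up two-repo map (the domain names are assumptions):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    repo: str
    domain: str = ""
    flavor: str = "claude"


def build_targets(repo_domain_map, flavors, repo_filter=None, flavor_filter=None):
    """Every (repo, flavor) pair that survives the CLI filters, carrying the repo's domain."""
    chosen = [flavor_filter] if flavor_filter else flavors
    return [Target(repo, domain, flavor)
            for repo, domain in repo_domain_map.items()
            if not repo_filter or repo == repo_filter
            for flavor in chosen]


repo_map = {"state-hub": "platform", "ops-bridge": "ops"}  # hypothetical domains
flavors = ["claude", "codex", "grok"]
print(len(build_targets(repo_map, flavors)))  # 2 repos x 3 flavors = 6
print(build_targets(repo_map, flavors, repo_filter="ops-bridge", flavor_filter="grok"))
```

With `--repo` and `--flavor` both set, the cross product collapses to a single target.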
@@ -0,0 +1,242 @@
[
{
"flavor": "claude",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "agentic-resources",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "codex",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "agentic-resources",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "grok",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "agentic-resources",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "claude",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "can-you-assist",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "codex",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "can-you-assist",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "grok",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "can-you-assist",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "claude",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "net-kingdom",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "codex",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "net-kingdom",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "grok",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "net-kingdom",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "claude",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "ops-bridge",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "codex",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "ops-bridge",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "grok",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "ops-bridge",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "claude",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "state-hub",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "codex",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "state-hub",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "grok",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "state-hub",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "claude",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "the-custodian",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "codex",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "the-custodian",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "grok",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "the-custodian",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "claude",
"pattern_id": "sp-problem-schema_thrash-schema_load",
"repo": "ops-bridge",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.1"
},
{
"flavor": "claude",
"pattern_id": "sp-problem-tool_thrash-tool-bash",
"repo": "state-hub",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.1"
},
{
"flavor": "claude",
"pattern_id": "sp-success-clean_pass-outcome",
"repo": "agentic-resources",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.1"
},
{
"flavor": "grok",
"pattern_id": "sp-success-clean_pass-outcome",
"repo": "agentic-resources",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.1"
},
{
"flavor": "claude",
"pattern_id": "sp-success-clean_pass-outcome",
"repo": "can-you-assist",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.1"
},
{
"flavor": "grok",
"pattern_id": "sp-success-clean_pass-outcome",
"repo": "can-you-assist",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.1"
},
{
"flavor": "claude",
"pattern_id": "sp-success-clean_pass-outcome",
"repo": "ops-bridge",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.1"
},
{
"flavor": "grok",
"pattern_id": "sp-success-clean_pass-outcome",
"repo": "ops-bridge",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.1"
},
{
"flavor": "claude",
"pattern_id": "sp-success-clean_pass-outcome",
"repo": "state-hub",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.1"
},
{
"flavor": "grok",
"pattern_id": "sp-success-clean_pass-outcome",
"repo": "state-hub",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.1"
},
{
"flavor": "claude",
"pattern_id": "sp-success-clean_pass-outcome",
"repo": "the-custodian",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.1"
},
{
"flavor": "grok",
"pattern_id": "sp-success-clean_pass-outcome",
"repo": "the-custodian",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.1"
}
]

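Each registry entry is keyed by `(pattern_id, repo, flavor)`, so the file reads most easily when grouped per pattern. A small, hypothetical helper for that (`coverage` is not part of the module), run on two entries taken from the file above with `status` and `updated_at` omitted:

```python
import json
from collections import defaultdict


def coverage(entries: list[dict]) -> dict[str, dict[str, list[str]]]:
    """pattern_id -> repo -> sorted flavors the pattern is proposed/active for."""
    out: dict = defaultdict(lambda: defaultdict(list))
    for e in entries:
        out[e["pattern_id"]][e["repo"]].append(e["flavor"])
    return {pid: {repo: sorted(fl) for repo, fl in repos.items()}
            for pid, repos in out.items()}


sample = json.loads("""[
  {"flavor": "claude", "pattern_id": "sp-success-clean_pass-outcome", "repo": "state-hub", "version": "1.0.1"},
  {"flavor": "grok", "pattern_id": "sp-success-clean_pass-outcome", "repo": "state-hub", "version": "1.0.1"}
]""")
print(coverage(sample))
```

Grouped this way, the file shows `sp-success-clean_pass-outcome` proposed for `claude` and `grok` but never for `codex`. That is consistent with a pattern whose `Scope` restricts flavors, since a `codex` distributor is registered and `applies()` is then the only per-target filter.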
@@ -0,0 +1,115 @@
"""Distributor base — Artifact, the Distributor protocol, and idempotent markers
(PRD §6.4 FR-X1; T01).
A **distributor** turns one agnostic :class:`SolutionPattern` into a per-flavor
:class:`Artifact` (a target path + a snippet of content). Everything flavor-neutral
lives here; each flavor adapter (T02/T03) only supplies its target filename and may
override the rendered body using the pattern's ``rendering_hints``.
Snippets carry stable ``BEGIN/END`` markers keyed on the pattern id, so
re-distributing a pattern **updates its block in place** instead of duplicating it
— the property that lets Distribute run repeatedly (HITL) without drift.
"""
from __future__ import annotations
import re
from dataclasses import dataclass
from typing import Any, Optional, Protocol, runtime_checkable
from ..curate.schema import SolutionPattern


@dataclass
class Artifact:
    """A proposed per-flavor rendering of a pattern (FR-X1/FR-X3: proposed, not applied)."""

    flavor: str
    target_path: str  # repo-relative file the snippet belongs in (e.g. "CLAUDE.md")
    pattern_id: str
    content: str  # the marker-wrapped snippet block


@runtime_checkable
class Distributor(Protocol):
    flavor: str
    target_path: str

    def render(self, pattern: SolutionPattern) -> Artifact: ...


# --- idempotent snippet markers ---------------------------------------------
_MARK = "helix-forge pattern"


def begin_marker(pattern_id: str) -> str:
    return f"<!-- BEGIN {_MARK}:{pattern_id} -->"


def end_marker(pattern_id: str) -> str:
    return f"<!-- END {_MARK}:{pattern_id} -->"


def wrap_block(pattern_id: str, body: str, version: str = "") -> str:
    """Wrap a rendered body in stable BEGIN/END markers."""
    ver = f" v{version}" if version else ""
    return f"{begin_marker(pattern_id)}{ver}\n{body.strip()}\n{end_marker(pattern_id)}"


def upsert_block(doc_text: str, pattern_id: str, block: str) -> str:
    """Insert or replace a pattern's marked block within a document (idempotent)."""
    pat = re.compile(
        re.escape(begin_marker(pattern_id)) + r".*?" + re.escape(end_marker(pattern_id)),
        re.DOTALL,
    )
    if pat.search(doc_text):
        # A function replacement inserts ``block`` literally; a string replacement
        # would parse backslash escapes in the snippet (e.g. "\d" raises re.error).
        return pat.sub(lambda _m: block, doc_text)
    sep = "" if doc_text.endswith("\n\n") or not doc_text else "\n\n"
    return f"{doc_text}{sep}{block}\n"


# --- agnostic body rendering ------------------------------------------------
def render_markdown_body(pattern: SolutionPattern) -> str:
    """Default flavor-neutral snippet body from the agnostic pattern fields."""
    label = "Avoid" if pattern.polarity == "problem" else "Prefer"
    lines = [f"### {pattern.name}", "", pattern.problem.strip(), ""]
    if pattern.resolutions:
        lines.append(f"**{label}:**")
        for r in pattern.resolutions:
            detail = f": {r.detail}" if r.detail else ""
            lines.append(f"- {r.summary}{detail}")
            for step in r.steps:
                lines.append(f"  - {step}")  # nested under its resolution
    return "\n".join(lines).strip()


def hint(pattern: SolutionPattern, flavor: str, key: str, default: Any = None) -> Any:
    """Read a per-flavor rendering hint, falling back to ``default``."""
    return (pattern.rendering_hints.get(flavor) or {}).get(key, default)


class BaseDistributor:
    """Shared distributor: renders the agnostic body, honouring a ``body`` hint
    override and a ``target`` hint, then wraps it in idempotent markers."""

    flavor: str = ""
    target_path: str = ""

    def __init__(self, flavor: Optional[str] = None, target_path: Optional[str] = None) -> None:
        if flavor is not None:
            self.flavor = flavor
        if target_path is not None:
            self.target_path = target_path

    def body(self, pattern: SolutionPattern) -> str:
        return hint(pattern, self.flavor, "body") or render_markdown_body(pattern)

    def target(self, pattern: SolutionPattern) -> str:
        return hint(pattern, self.flavor, "target") or self.target_path

    def render(self, pattern: SolutionPattern) -> Artifact:
        block = wrap_block(pattern.id, self.body(pattern), pattern.version)
        return Artifact(flavor=self.flavor, target_path=self.target(pattern),
                        pattern_id=pattern.id, content=block)

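To see the idempotency property the module docstring promises, here is a self-contained copy of the marker helpers run against one made-up pattern id (`sp-demo`): upserting the same block twice yields identical text, and upserting a new version replaces the block instead of appending a second one. This copy passes a function to `pat.sub`, so a body containing backslashes is inserted literally; with a plain string as the replacement, `re.sub` parses escapes such as `\d` in it and raises `re.error`.

```python
import re

_MARK = "helix-forge pattern"


def begin_marker(pid: str) -> str:
    return f"<!-- BEGIN {_MARK}:{pid} -->"


def end_marker(pid: str) -> str:
    return f"<!-- END {_MARK}:{pid} -->"


def wrap_block(pid: str, body: str, version: str = "") -> str:
    ver = f" v{version}" if version else ""
    return f"{begin_marker(pid)}{ver}\n{body.strip()}\n{end_marker(pid)}"


def upsert_block(doc: str, pid: str, block: str) -> str:
    pat = re.compile(re.escape(begin_marker(pid)) + r".*?" + re.escape(end_marker(pid)),
                     re.DOTALL)
    if pat.search(doc):
        return pat.sub(lambda _m: block, doc)  # literal replacement, no escape parsing
    sep = "" if doc.endswith("\n\n") or not doc else "\n\n"
    return f"{doc}{sep}{block}\n"


doc = "# CLAUDE.md\n"
v1 = wrap_block("sp-demo", "Read the file before editing it.", "1.0.0")
once = upsert_block(doc, "sp-demo", v1)
twice = upsert_block(once, "sp-demo", v1)
print(once == twice)  # re-running does not duplicate the block

# A new version with a backslash in the body replaces the old block in place.
v2 = wrap_block("sp-demo", r"Match paths with \d+ only after reading.", "1.0.1")
updated = upsert_block(twice, "sp-demo", v2)
print(updated)
```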
@@ -0,0 +1,42 @@
"""Claude distributor (PRD §6.4 FR-X1; T02).
Renders an approved Solution Pattern into a ``CLAUDE.md`` snippet block. Most logic
is inherited from :class:`BaseDistributor`; the Claude-specific touch is an
optional **skill** rendering mode (``rendering_hints["claude"]["as"] == "skill"``)
that emits a skill-style stub instead of a plain instruction snippet — Claude's
native distribution targets are CLAUDE.md snippets, skills, or hooks.
"""
from __future__ import annotations
from ..curate.schema import SolutionPattern
from .base import BaseDistributor, hint, render_markdown_body


class ClaudeDistributor(BaseDistributor):
    flavor = "claude"
    target_path = "CLAUDE.md"

    def body(self, pattern: SolutionPattern) -> str:
        override = hint(pattern, self.flavor, "body")
        if override:
            return override
        if hint(pattern, self.flavor, "as") == "skill":
            return self._skill_stub(pattern)
        return render_markdown_body(pattern)

    @staticmethod
    def _skill_stub(pattern: SolutionPattern) -> str:
        trigger = "avoid" if pattern.polarity == "problem" else "apply"
        lines = [
            f"## Skill: {pattern.name}",
            "",
            f"**When:** situations where you would {trigger}: {pattern.problem.strip()}",
            "",
            "**Steps:**",
        ]
        for r in pattern.resolutions:
            lines.append(f"- {r.summary}" + (f": {r.detail}" if r.detail else ""))
            for step in r.steps:
                lines.append(f"  - {step}")  # nested under its resolution
        return "\n".join(lines).strip()

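`ClaudeDistributor.body` resolves rendering hints in a fixed order: an explicit `body` hint wins, then `as = "skill"`, then the default snippet. A sketch of just that dispatch, assuming `rendering_hints` is a dict of per-flavor dicts (which is how `hint()` reads it); `claude_body_mode` is a hypothetical helper that returns the branch name instead of rendering anything:

```python
from typing import Any


def hint(hints: dict, flavor: str, key: str, default: Any = None) -> Any:
    # Same lookup as base.hint, but on a plain hints dict instead of a SolutionPattern.
    return (hints.get(flavor) or {}).get(key, default)


def claude_body_mode(hints: dict) -> str:
    """Which branch ClaudeDistributor.body would take for these rendering_hints."""
    if hint(hints, "claude", "body"):
        return "override"
    if hint(hints, "claude", "as") == "skill":
        return "skill"
    return "snippet"


print(claude_body_mode({}))                                        # snippet
print(claude_body_mode({"claude": {"as": "skill"}}))               # skill
print(claude_body_mode({"claude": {"as": "skill", "body": "x"}}))  # override
print(claude_body_mode({"codex": {"as": "skill"}}))                # snippet
```

Hints stored under another flavor's key never affect the Claude rendering, and a `None` entry behaves like no hints because `hint()` falls back to `{}`.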
@@ -0,0 +1,15 @@
"""Codex distributor (PRD §6.4 FR-X1; T03).
Renders an approved Solution Pattern into an ``AGENTS.md`` snippet — Codex's native
repo-convention surface. Identical agnostic body to the other flavors (FR-A3: one
pattern, expressible everywhere); only the target file differs.
"""
from __future__ import annotations
from .base import BaseDistributor
class CodexDistributor(BaseDistributor):
    flavor = "codex"
    target_path = "AGENTS.md"

@@ -0,0 +1,15 @@
"""Grok distributor (PRD §6.4 FR-X1; T03).
Renders an approved Solution Pattern into Grok's native instruction format. Defaults
to a ``.grok/instructions.md`` snippet; the same agnostic body as the other flavors
(FR-A3), overridable via ``rendering_hints["grok"]``.
"""
from __future__ import annotations
from .base import BaseDistributor
class GrokDistributor(BaseDistributor):
    flavor = "grok"
    target_path = ".grok/instructions.md"

@@ -0,0 +1,136 @@
"""Scoping, proposed-not-applied output, and the active-pattern registry
(PRD §6.4 FR-X2/FR-X3/FR-X4; T04).
* **Scope (FR-X2):** a pattern lands in a target environment only if the target's
repo/domain/flavor are within the pattern's :class:`Scope` (an empty scope list
means "unrestricted on that axis").
* **Proposed, not applied (FR-X3):** rendered artifacts are written under a
``proposals/`` tree mirroring the target path — a reviewable diff a human applies,
never auto-written into the live file. Re-running upserts each pattern's block in
place (idempotent), so proposals don't accumulate duplicates.
* **Active-pattern registry (FR-X4):** a JSON record of which pattern (and version)
is proposed/active in which (repo, flavor) environment.
"""
from __future__ import annotations
import json
import os
from dataclasses import dataclass, field
from datetime import datetime, timezone
from ..curate.schema import SolutionPattern
from .base import upsert_block
from .registry import get_distributor


def _now() -> str:
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


@dataclass(frozen=True)
class Target:
    """An environment a pattern could be distributed to."""

    repo: str
    domain: str = ""
    flavor: str = "claude"


def applies(pattern: SolutionPattern, target: Target) -> bool:
    """True if ``target`` is within the pattern's scope (empty axis == any)."""
    sc = pattern.scope
    if sc.repos and target.repo not in sc.repos:
        return False
    if sc.domains and target.domain and target.domain not in sc.domains:
        return False
    if sc.flavors and target.flavor not in sc.flavors:
        return False
    return True


def is_distributable(pattern: SolutionPattern) -> bool:
    return pattern.status == "approved" and pattern.distribution_ready


class ActiveRegistry:
    """JSON record of patterns proposed/active per (repo, flavor) (FR-X4)."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._entries: dict[str, dict] = {}
        if os.path.exists(path):
            with open(path, encoding="utf-8") as fh:
                for e in json.load(fh):
                    self._entries[self._key(e["pattern_id"], e["repo"], e["flavor"])] = e

    @staticmethod
    def _key(pid: str, repo: str, flavor: str) -> str:
        return f"{pid}|{repo}|{flavor}"

    def record(self, pid: str, repo: str, flavor: str, version: str,
               status: str = "proposed") -> None:
        self._entries[self._key(pid, repo, flavor)] = {
            "pattern_id": pid, "repo": repo, "flavor": flavor,
            "version": version, "status": status, "updated_at": _now(),
        }

    def entries(self) -> list[dict]:
        return [self._entries[k] for k in sorted(self._entries)]

    def save(self) -> None:
        os.makedirs(os.path.dirname(self.path) or ".", exist_ok=True)
        with open(self.path, "w", encoding="utf-8") as fh:
            json.dump(self.entries(), fh, indent=2, sort_keys=True)
            fh.write("\n")


@dataclass
class ProposalResult:
    # (repo, flavor, pattern_id, proposal_path) per rendered pattern/target pair
    proposals: list[tuple[str, str, str, str]] = field(default_factory=list)
    files_written: list[str] = field(default_factory=list)  # absolute proposal paths
    skipped_not_distributable: list[str] = field(default_factory=list)  # pattern ids


def propose(patterns: list[SolutionPattern], targets: list[Target], out_dir: str,
            registry: ActiveRegistry) -> ProposalResult:
    """Render in-scope, distributable patterns into per-target proposal files."""
    result = ProposalResult()
    pending: dict[str, str] = {}  # proposal path -> accumulated content
    for p in patterns:
        if not is_distributable(p):
            result.skipped_not_distributable.append(p.id)
            continue
        for t in targets:
            dist = get_distributor(t.flavor)
            if dist is None or not applies(p, t):
                continue
            art = dist.render(p)
            path = os.path.join(out_dir, t.repo, art.target_path)
            if path not in pending:
                pending[path] = _read(path)
            pending[path] = upsert_block(pending[path], p.id, art.content)
            registry.record(p.id, t.repo, t.flavor, p.version)
            result.proposals.append((t.repo, t.flavor, p.id, path))
    for path, content in pending.items():
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(content if content.endswith("\n") else content + "\n")
        result.files_written.append(path)
    registry.save()
    return result


def _read(path: str) -> str:
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            return fh.read()
    return ""

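The scope rule in `applies()` has one asymmetry worth seeing concretely: a target with an empty `domain` passes a domain-restricted scope, while repo and flavor restrictions always apply. A standalone sketch, assuming `Scope` exposes `repos`, `domains` and `flavors` lists as `applies()` reads them (the real `Scope` lives in `curate.schema`, which is not shown here, and this copy takes the scope directly instead of a pattern):

```python
from dataclasses import dataclass, field


@dataclass
class Scope:  # assumed shape: three list axes, empty meaning "any"
    repos: list[str] = field(default_factory=list)
    domains: list[str] = field(default_factory=list)
    flavors: list[str] = field(default_factory=list)


@dataclass(frozen=True)
class Target:
    repo: str
    domain: str = ""
    flavor: str = "claude"


def applies(sc: Scope, t: Target) -> bool:
    """Same three checks as proposals.applies."""
    if sc.repos and t.repo not in sc.repos:
        return False
    if sc.domains and t.domain and t.domain not in sc.domains:
        return False
    if sc.flavors and t.flavor not in sc.flavors:
        return False
    return True


claude_and_grok = Scope(flavors=["claude", "grok"])
print(applies(claude_and_grok, Target("state-hub", flavor="codex")))    # False
print(applies(claude_and_grok, Target("state-hub", flavor="grok")))     # True
# A target with no domain is not excluded by a domain restriction.
print(applies(Scope(domains=["ops"]), Target("state-hub")))             # True
print(applies(Scope(domains=["ops"]), Target("state-hub", "platform"))) # False
```

In practice `build_targets` fills `domain` from `repo_domain_map`, so the empty-domain case only arises for repos mapped to an empty domain.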
@@ -0,0 +1,26 @@
"""Distributor registry (T03) — flavor -> distributor, the one place that knows
about all flavor edges. Adding a flavor = one entry here + one adapter module.
"""
from __future__ import annotations
from typing import Optional
from .base import BaseDistributor
from .claude import ClaudeDistributor
from .codex import CodexDistributor
from .grok import GrokDistributor


_REGISTRY: dict[str, BaseDistributor] = {
    "claude": ClaudeDistributor(),
    "codex": CodexDistributor(),
    "grok": GrokDistributor(),
}


def get_distributor(flavor: str) -> Optional[BaseDistributor]:
    return _REGISTRY.get(flavor)


def all_flavors() -> list[str]:
    return list(_REGISTRY)

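Because `build_targets` crosses repos with `all_flavors()` and `propose()` skips any target whose flavor has no distributor, registering the adapter in this dict is all the wiring a new flavor needs beyond the adapter module itself. A reduced sketch of the registry with a hypothetical `example` flavor (`ExampleDistributor` and `EXAMPLE.md` are invented for illustration, and the distributor classes are stripped to their two attributes):

```python
from typing import Optional


class BaseDistributor:
    flavor: str = ""
    target_path: str = ""


class ClaudeDistributor(BaseDistributor):
    flavor, target_path = "claude", "CLAUDE.md"


class ExampleDistributor(BaseDistributor):  # hypothetical new flavor
    flavor, target_path = "example", "EXAMPLE.md"


_REGISTRY: dict[str, BaseDistributor] = {
    "claude": ClaudeDistributor(),
    "example": ExampleDistributor(),  # the one-line registration
}


def get_distributor(flavor: str) -> Optional[BaseDistributor]:
    return _REGISTRY.get(flavor)


def all_flavors() -> list[str]:
    return list(_REGISTRY)


print(all_flavors())                           # ['claude', 'example']
print(get_distributor("example").target_path)  # EXAMPLE.md
print(get_distributor("unknown"))              # None
```

`all_flavors()` follows dict insertion order, so the registration order is also the order in which flavors are crossed with repos.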
@@ -0,0 +1,9 @@
"""Measure phase (PRD §6.5) — the loop-closer.
  metrics.py   fleet metrics + persisted baseline snapshots (T01)
  effect.py    before/after per-pattern effectiveness (T02)
  __main__.py  python -m session_memory.measure (T03)
Computation over existing digests (reusing WP-0005 tool buckets + WP-0006 error
mining); no new capture.
"""

@@ -0,0 +1,101 @@
"""Measure entrypoint (T03): fleet trend + per-pattern effectiveness.
    python -m session_memory.measure [--config PATH] [--label L] [--since DATE]
                                     [--no-save] [--json]
Computes current fleet metrics over the real (quality-filtered) sessions, appends
them to the baseline trend, and reports whether the fleet is getting cheaper /
more reliable over time (FR-M3). With ``--since DATE`` it also reports before/after
effectiveness around a change (FR-M1/FR-M2).
"""
from __future__ import annotations
import argparse
import json
import os
from ..core.store import Store
from ..detect.quality import filter_real, quality_config
from ..ingest import _expand, load_config
from .effect import effectiveness
from .metrics import load_baselines, save_baseline, snapshot
_TREND_KEYS = ("infra_overhead_share_median", "error_rate", "schema_thrash_sessions",
               "tokens_p50", "success_rate")


def real_digests(config: dict) -> list[dict]:
    s = config.get("store", {})
    store = Store(_expand(s["db_path"]), _expand(s["blob_dir"]))
    out = filter_real(store.list_digests(), quality_config(config))
    store.close()
    return out


def _fmt_trend(baselines: list[dict]) -> str:
    if not baselines:
        return " (no prior snapshots)"
    lines = []
    recent = baselines[-5:]
    for b in recent:
        when = (b.get("captured_at") or "")[:10]
        lbl = f" {b['label']}" if b.get("label") else ""
        lines.append(f" {when}{lbl}: overhead_med={b.get('infra_overhead_share_median')} "
                     f"err_rate={b.get('error_rate')} schema_thrash={b.get('schema_thrash_sessions')} "
                     f"tok_p50={b.get('tokens_p50')} success={b.get('success_rate')} "
                     f"(n={b.get('n_sessions')})")
    return "\n".join(lines)


def _report(current: dict, baselines: list[dict], eff: dict | None) -> str:
    lines = [f"# Fleet metrics (n={current.get('n_sessions')} real sessions)"]
    for k in _TREND_KEYS:
        lines.append(f" {k} = {current.get(k)}")
    lines.append("\n## Trend (recent snapshots)")
    lines.append(_fmt_trend(baselines))
    if eff is not None:
        lines.append(f"\n## Effectiveness since {eff['applied_at']} "
                     f"(before={eff['n_before']}, after={eff['n_after']})")
        if eff["insufficient_data"]:
            lines.append(" insufficient data on one side of the date")
        else:
            for k in _TREND_KEYS:
                d = eff["deltas"].get(k, {})
                mark = {True: "improved", False: "worse", None: ""}[d.get("improved")]
                lines.append(f" {k}: {d.get('before')} -> {d.get('after')} "
                             f"({d.get('change'):+}) {mark}")
    return "\n".join(lines)


def main(argv=None) -> int:
    here = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
    ap = argparse.ArgumentParser(description="Measure fleet metrics + per-pattern effectiveness.")
    ap.add_argument("--config", default=os.path.join(here, "config.toml"))
    ap.add_argument("--label", default="")
    ap.add_argument("--since", default=None, help="ISO date for before/after effectiveness")
    ap.add_argument("--no-save", action="store_true", help="don't append to the baseline trend")
    ap.add_argument("--json", action="store_true")
    args = ap.parse_args(argv)
    config = load_config(args.config)
    digests = real_digests(config)
    current = snapshot(digests, label=args.label)
    path = _expand(config.get("measure", {}).get("baselines",
                                                 "session_memory/measure/baselines.jsonl"))
    prior = load_baselines(path)
    if not args.no_save:
        save_baseline(current, path)
    eff = effectiveness(digests, args.since, label=args.label) if args.since else None
    if args.json:
        print(json.dumps({"current": current, "trend": prior + [current], "effectiveness": eff},
                         indent=2))
    else:
        print(_report(current, prior + [current], eff))
    return 0


if __name__ == "__main__":
    raise SystemExit(main())

@@ -0,0 +1 @@
{"captured_at": "2026-06-07T13:30:14Z", "error_rate": 0.963, "infra_overhead_share_median": 0.117, "infra_overhead_share_p90": 0.261, "label": "phase4-baseline (pre-fixes)", "n_sessions": 27, "recurring_error_occurrences": 505, "schema_thrash_sessions": 8, "success_rate": 1.0, "tokens_p50": 250725, "tokens_p90": 1423966}

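The snapshot fields are fleet-level ratios, so they convert back to counts. In particular, `error_rate` is the share of sessions with at least one error snippet (as computed in `aggregate()`), not an error rate per tool call. A quick check against a subset of the fields in the baseline line above:

```python
import json

line = ('{"captured_at": "2026-06-07T13:30:14Z", "error_rate": 0.963, '
        '"n_sessions": 27, "recurring_error_occurrences": 505, "success_rate": 1.0}')
snap = json.loads(line)

# error_rate = sessions_with_any_error / n_sessions, rounded to 3 places.
sessions_with_errors = round(snap["error_rate"] * snap["n_sessions"])
print(sessions_with_errors)  # 26 of 27 sessions hit at least one error
print(round(sessions_with_errors / snap["n_sessions"], 3) == snap["error_rate"])

# recurring_error_occurrences sums snippet counts across all sessions.
print(round(snap["recurring_error_occurrences"] / snap["n_sessions"], 1))  # 18.7 per session
```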
@@ -0,0 +1,60 @@
"""Before/after per-pattern effectiveness (PRD §6.5 FR-M1/FR-M2; T02).
Given a change/pattern with an ``applied_at`` date, split sessions into *before*
and *after* by their start time, aggregate each side, and diff the headline
metrics — so we can say whether a distributed pattern (e.g. the Read-before-Edit
reflex, or the State Hub skill) actually moved the numbers, and retire it if not.
"""
from __future__ import annotations
from .metrics import aggregate
# Metrics where a *lower* value after the change means improvement.
_LOWER_IS_BETTER = {
    "infra_overhead_share_median", "infra_overhead_share_p90", "error_rate",
    "recurring_error_occurrences", "schema_thrash_sessions", "tokens_p50", "tokens_p90",
}
# Metrics where a *higher* value is improvement.
_HIGHER_IS_BETTER = {"success_rate"}


def split_by_date(digests: list[dict], applied_at: str) -> tuple[list[dict], list[dict]]:
    """Partition digests into (before, after) by ``started_at`` vs ``applied_at``.

    Timestamps are compared as ISO-8601 strings, so both must use the same
    format and zone (UTC); digests without ``started_at`` count as *before*.
    """
    before, after = [], []
    for d in digests:
        ts = d.get("started_at") or ""
        (after if ts and ts >= applied_at else before).append(d)
    return before, after


def _delta(metric: str, before: float, after: float) -> dict:
    change = round(after - before, 3)
    if metric in _LOWER_IS_BETTER:
        improved = change < 0
    elif metric in _HIGHER_IS_BETTER:
        improved = change > 0
    else:
        improved = None
    return {"before": before, "after": after, "change": change, "improved": improved}


def effectiveness(digests: list[dict], applied_at: str, *, label: str = "") -> dict:
    """Compare fleet metrics after ``applied_at`` against the prior period."""
    before, after = split_by_date(digests, applied_at)
    b_agg, a_agg = aggregate(before), aggregate(after)
    metrics = _LOWER_IS_BETTER | _HIGHER_IS_BETTER
    deltas = {}
    if before and after:
        for m in sorted(metrics):  # sorted so the JSON key order is stable
            deltas[m] = _delta(m, b_agg.get(m, 0.0), a_agg.get(m, 0.0))
    return {
        "label": label,
        "applied_at": applied_at,
        "n_before": len(before),
        "n_after": len(after),
        "before": b_agg,
        "after": a_agg,
        "deltas": deltas,
        "insufficient_data": not (before and after),
    }

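`split_by_date` compares `started_at` and `applied_at` as strings. That works because ISO-8601 UTC timestamps sort lexicographically, and a bare date such as `2026-06-07` sorts before every timestamp on that day. The sketch copies the function and shows both the intended split and the assumption it rests on: if a `started_at` were ever written with a non-UTC offset, it would be placed by its local wall-clock text rather than its instant. The sample digests are made up.

```python
def split_by_date(digests, applied_at):
    before, after = [], []
    for d in digests:
        ts = d.get("started_at") or ""
        (after if ts and ts >= applied_at else before).append(d)
    return before, after


digests = [
    {"id": "a", "started_at": "2026-06-06T23:59:00Z"},
    {"id": "b", "started_at": "2026-06-07T00:00:01Z"},
    {"id": "c"},  # no start time: always counted as "before"
]
before, after = split_by_date(digests, "2026-06-07")
print([d["id"] for d in before], [d["id"] for d in after])  # ['a', 'c'] ['b']

# 01:00 at +02:00 is 2026-06-06T23:00Z, yet the string comparison puts it "after".
b2, a2 = split_by_date([{"started_at": "2026-06-07T01:00:00+02:00"}], "2026-06-07")
print(len(b2), len(a2))  # 0 1
```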
@@ -0,0 +1,102 @@
"""Fleet metrics + persisted baselines (PRD §6.5 FR-M3; T01).
Computes the headline health metrics of the captured corpus — the same quantities
the friction assessment reported — so they can be tracked over time and compared
before/after a change. Reuses :func:`detect.signals.tool_bucket` (WP-0005) and the
digest ``error_snippets`` (WP-0006); no new capture.
A **baseline** is a timestamped metrics snapshot appended to a JSONL file, so
successive runs build a trend the entrypoint (T03) can chart.
"""
from __future__ import annotations
import collections
import json
import os
from datetime import datetime, timezone
from ..detect.signals import tool_bucket


def _now() -> str:
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def _pct(values: list[float], q: float) -> float:
    """Percentile without interpolation: element ``int(q * (n - 1))`` of the sorted values."""
    if not values:
        return 0.0
    s = sorted(values)
    return round(s[int(q * (len(s) - 1))], 3)


def _median(values: list[float]) -> float:
    return _pct(values, 0.5)


def _buckets(digest: dict) -> collections.Counter:
    b: collections.Counter = collections.Counter()
    for tool, n in (digest.get("tool_histogram") or {}).items():
        b[tool_bucket(tool)] += n
    return b


def session_metrics(digest: dict) -> dict:
    """Per-session metrics used to build fleet aggregates."""
    b = _buckets(digest)
    total = sum(b.values()) or 1
    overhead = b["statehub_mcp"] + b["task_mgmt"] + b["schema_load"]
    cost = digest.get("cost") or {}  # tolerate an explicit null cost
    tokens = cost.get("input_tokens", 0) + cost.get("output_tokens", 0)
    return {
        "infra_overhead_share": overhead / total,
        "tool_calls": total,
        "schema_load": b["schema_load"],
        "error_occurrences": sum(s.get("count", 1) for s in (digest.get("error_snippets") or [])),
        "has_error": bool(digest.get("error_snippets")),
        "tokens": tokens,
        "success": digest.get("outcome") == "success",
    }


def aggregate(digests: list[dict], *, schema_thrash_threshold: int = 5) -> dict:
    """Fleet-level metrics over a set of (already quality-filtered) digests."""
    per = [session_metrics(d) for d in digests]
    n = len(per)
    if n == 0:
        return {"n_sessions": 0}
    shares = [m["infra_overhead_share"] for m in per]
    tokens = [m["tokens"] for m in per]
    return {
        "n_sessions": n,
        "infra_overhead_share_median": _median(shares),
        "infra_overhead_share_p90": _pct(shares, 0.9),
        "error_rate": round(sum(m["has_error"] for m in per) / n, 3),
        "recurring_error_occurrences": sum(m["error_occurrences"] for m in per),
        "schema_thrash_sessions": sum(1 for m in per if m["schema_load"] >= schema_thrash_threshold),
        "tokens_p50": _pct(tokens, 0.5),
        "tokens_p90": _pct(tokens, 0.9),
        "success_rate": round(sum(m["success"] for m in per) / n, 3),
    }


def snapshot(digests: list[dict], *, label: str = "") -> dict:
    m = aggregate(digests)
    m["captured_at"] = _now()
    m["label"] = label
    return m


def save_baseline(metrics: dict, path: str) -> None:
    """Append a metrics snapshot to the baseline JSONL trend file."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(metrics, sort_keys=True))
        fh.write("\n")


def load_baselines(path: str) -> list[dict]:
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]

@@ -0,0 +1,9 @@
"""Weekly retro (AGENTIC-WP-0010) — the analysis half of the coding retrospection.
  build.py     windowed detect + measure -> ranked top-3 suggestions per repo (T01)
  publish.py   publish the retro to the hub read model + local report (T02)
  __main__.py  python -m session_memory.retro (T03)
Consumed by activity-core's weekly-coding-retro schedule (ACTIVITY-WP-0008) via
the ``event_type=coding_retro`` read model.
"""

@@ -0,0 +1,68 @@
"""Weekly retro entrypoint (AGENTIC-WP-0010 T03).
    python -m session_memory.retro [--window-days 7] [--since D] [--until D]
                                   [--publish] [--json]
Builds the windowed top-3-per-repo retro over the captured sessions, writes a local
JSON + markdown report, and (with ``--publish``) posts it to the hub as the
``coding_retro`` read model that activity-core's weekly schedule consumes.
"""
from __future__ import annotations
import argparse
import json
import os
from ..core.store import Store
from ..curate.catalog import Catalog
from ..ingest import _expand, load_config
from .build import weekly_retro
from .publish import publish_to_hub, render_markdown, write_local
def run_retro(config: dict, *, window_days=None, since=None, until=None):
s = config.get("store", {})
store = Store(_expand(s["db_path"]), _expand(s["blob_dir"]))
digests = store.list_digests()
store.close()
cur = config.get("curate", {})
catalog = Catalog(_expand(cur.get("catalog_dir", "session_memory/catalog")))
rcfg = config.get("retro", {})
return weekly_retro(digests, catalog, since=since, until=until,
window_days=window_days or rcfg.get("window_days", 7))
def main(argv=None) -> int:
here = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
ap = argparse.ArgumentParser(description="Build (and optionally publish) the weekly coding retro.")
ap.add_argument("--config", default=os.path.join(here, "config.toml"))
ap.add_argument("--window-days", type=int, default=None)
ap.add_argument("--since", default=None)
ap.add_argument("--until", default=None)
ap.add_argument("--publish", action="store_true", help="post to the hub coding_retro read model")
ap.add_argument("--json", action="store_true")
args = ap.parse_args(argv)
config = load_config(args.config)
report = run_retro(config, window_days=args.window_days, since=args.since, until=args.until)
rcfg = config.get("retro", {})
write_local(report, _expand(rcfg.get("report_json", "session_memory/retro/last_retro.json")),
_expand(rcfg.get("report_md", "session_memory/retro/last_retro.md")))
published = None
if args.publish:
published = publish_to_hub(report, base_url=rcfg.get("hub_url", "http://127.0.0.1:8000"))
if args.json:
print(json.dumps({"report": report, "published": published}, indent=2))
else:
print(render_markdown(report))
if args.publish:
print(f"\npublished to hub: {published}")
return 0
if __name__ == "__main__":
raise SystemExit(main())
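`run_retro` resolves the window length as `window_days or rcfg.get("window_days", 7)`: an explicit `--window-days` wins, then the `[retro]` table of `config.toml`, then 7 days. Because `or` treats `0` as "not given", a minimal sketch of the same precedence with an explicit `None` check follows; `resolve_window_days` and `parse_args` are hypothetical helpers for illustration, not part of the package.

```python
import argparse
from typing import Optional


def resolve_window_days(cli_value: Optional[int], retro_cfg: dict) -> int:
    """Hypothetical helper: CLI flag, then [retro].window_days, then 7."""
    if cli_value is not None:  # an explicit flag wins, even if it is 0
        return cli_value
    return retro_cfg.get("window_days", 7)


def parse_args(argv: list) -> argparse.Namespace:
    # Mirrors the entrypoint's --window-days / --publish flags only.
    ap = argparse.ArgumentParser(prog="retro")
    ap.add_argument("--window-days", type=int, default=None)
    ap.add_argument("--publish", action="store_true")
    return ap.parse_args(argv)


cfg = {"window_days": 14}
print(resolve_window_days(parse_args(["--window-days", "30"]).window_days, cfg))  # 30
print(resolve_window_days(parse_args([]).window_days, cfg))  # 14
print(resolve_window_days(parse_args([]).window_days, {}))  # 7
```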

@@ -0,0 +1,99 @@
"""Windowed weekly retro report (AGENTIC-WP-0010 T01).

Runs the existing detect pipeline over a date window, ranks the recurring problem
patterns into **per-repo improvement suggestions** (top 3, cross-flavor first),
attaches a recommendation from the Pattern Catalog where one exists, and bundles a
fleet measure snapshot for context. Pure function over digests — the entrypoint
(T03) handles store/publish.
"""
from __future__ import annotations

import collections
from dataclasses import asdict, dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

from ..detect.cluster import cluster
from ..detect.quality import QualityConfig, filter_real
from ..detect.signals import extract_signals
from ..measure.metrics import aggregate

# score at/above which a suggestion is "high" priority even when single-flavor
_HIGH_SCORE = 100.0


def _parse(ts: str) -> datetime:
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    # naive inputs such as a bare --since 2026-05-08 are taken as UTC so they
    # compare cleanly with the tz-aware session timestamps
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def _iso(dt: datetime) -> str:
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class Suggestion:
    repo: str
    title: str
    recommendation: str
    priority: str  # high | medium
    score: float
    signal_type: str
    cross_flavor: bool
    pattern_key: str


def _recommendation(pattern_key: str, locus: str, catalog) -> Optional[str]:
    if catalog is None:
        return None
    sp = catalog.find_for(pattern_key, locus)
    if sp and sp.resolutions:
        return sp.resolutions[0].summary
    return None


def weekly_retro(digests: list[dict], catalog=None, *, since: Optional[str] = None,
                 until: Optional[str] = None, window_days: int = 7,
                 max_per_repo: int = 3, min_frequency: int = 2,
                 quality: Optional[QualityConfig] = None) -> dict:
    """Build the ranked weekly retro report over a date window."""
    until_dt = _parse(until) if until else _now()
    since_dt = _parse(since) if since else until_dt - timedelta(days=window_days)
    windowed = [d for d in digests
                if d.get("started_at") and since_dt <= _parse(d["started_at"]) < until_dt]
    real = filter_real(windowed, quality or QualityConfig())
    patterns = cluster(extract_signals(real), min_frequency=min_frequency)

    by_repo: dict[str, list[Suggestion]] = collections.defaultdict(list)
    for p in patterns:
        if p.polarity != "problem":
            continue  # improvements come from problems
        rec = (_recommendation(p.key, p.locus, catalog)
               or f"Investigate {p.signal_type.replace('_', ' ')} on {p.locus}")
        priority = "high" if (p.cross_flavor or p.score >= _HIGH_SCORE) else "medium"
        for repo in (p.repos or ["(unknown)"]):
            by_repo[repo].append(Suggestion(
                repo=repo, title=p.title, recommendation=rec, priority=priority,
                score=p.score, signal_type=p.signal_type, cross_flavor=p.cross_flavor,
                pattern_key=p.key))

    suggestions: list[Suggestion] = []
    for repo in sorted(by_repo):
        items = sorted(by_repo[repo], key=lambda s: -s.score)
        suggestions.extend(items[:max_per_repo])
    # cross-flavor first, then by score (global ordering for the report)
    suggestions.sort(key=lambda s: (not s.cross_flavor, -s.score))
    return {
        "window": {"since": _iso(since_dt), "until": _iso(until_dt), "days": window_days},
        "generated_at": _iso(_now()),
        "n_sessions": len(real),
        "suggestions": [asdict(s) for s in suggestions],
        "measure": aggregate(real),
    }
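The selection rules in `weekly_retro` can be exercised without the detect pipeline. Below is a minimal, self-contained sketch, assuming a stand-in `Pattern` record that carries only the fields the ranking reads (the real clustered patterns carry more). Sessions are kept when `started_at` falls in the half-open window `[until - days, until)`; each repo keeps its top `max_per_repo` suggestions by score alone; the merged list is then ordered cross-flavor first, highest score next.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class Pattern:
    # Stand-in for a clustered pattern: only the fields the ranking reads.
    key: str
    score: float
    cross_flavor: bool
    repos: list = field(default_factory=list)


def parse_ts(ts: str) -> datetime:
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def in_window(digests: list, until: str, days: int) -> list:
    # Half-open window [until - days, until); digests without started_at are skipped.
    until_dt = parse_ts(until)
    since_dt = until_dt - timedelta(days=days)
    return [d for d in digests
            if d.get("started_at") and since_dt <= parse_ts(d["started_at"]) < until_dt]


def rank(patterns: list, max_per_repo: int = 3) -> list:
    by_repo = defaultdict(list)
    for p in patterns:
        for repo in (p.repos or ["(unknown)"]):
            by_repo[repo].append((repo, p))
    picked = []
    for repo in sorted(by_repo):
        # the per-repo cut is by score alone
        items = sorted(by_repo[repo], key=lambda rp: -rp[1].score)
        picked.extend(items[:max_per_repo])
    # global order: cross-flavor first, then descending score (sort is stable)
    picked.sort(key=lambda rp: (not rp[1].cross_flavor, -rp[1].score))
    return [(repo, p.key) for repo, p in picked]


patterns = [
    Pattern("tool_thrash", 13128.0, False, ["r1", "r2"]),
    Pattern("make_error", 54.0, True, ["r1"]),
    Pattern("schema_thrash", 441.0, False, ["r2"]),
]
print(rank(patterns))
# [('r1', 'make_error'), ('r1', 'tool_thrash'), ('r2', 'tool_thrash'), ('r2', 'schema_thrash')]
```

This is the same ordering visible in the report that follows: the single cross-flavor `net-kingdom` item (score 54.0) leads, ahead of much higher-scoring single-flavor items.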

@@ -0,0 +1,322 @@
{
"generated_at": "2026-06-07T19:30:56Z",
"measure": {
"error_rate": 0.957,
"infra_overhead_share_median": 0.167,
"infra_overhead_share_p90": 0.23,
"n_sessions": 23,
"recurring_error_occurrences": 463,
"schema_thrash_sessions": 7,
"success_rate": 1.0,
"tokens_p50": 250725,
"tokens_p90": 901422
},
"n_sessions": 23,
"suggestions": [
{
"cross_flavor": true,
"pattern_key": "problem:recurring_error:make: *** [makefile:<n>: fix-consistency] error <n>",
"priority": "high",
"recommendation": "Investigate recurring error on make: *** [makefile:<n>: fix-consistency] error <n>",
"repo": "net-kingdom",
"score": 54.0,
"signal_type": "recurring_error",
"title": "cross-flavor problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:tool_thrash:tool:Bash",
"priority": "high",
"recommendation": "Batch related shell work into one script, not many small Bash calls",
"repo": "activity-core",
"score": 13128.0,
"signal_type": "tool_thrash",
"title": "problem: tool thrash"
},
{
"cross_flavor": false,
"pattern_key": "problem:tool_thrash:tool:Bash",
"priority": "high",
"recommendation": "Batch related shell work into one script, not many small Bash calls",
"repo": "artifact-store",
"score": 13128.0,
"signal_type": "tool_thrash",
"title": "problem: tool thrash"
},
{
"cross_flavor": false,
"pattern_key": "problem:tool_thrash:tool:Bash",
"priority": "high",
"recommendation": "Batch related shell work into one script, not many small Bash calls",
"repo": "citation-evidence",
"score": 13128.0,
"signal_type": "tool_thrash",
"title": "problem: tool thrash"
},
{
"cross_flavor": false,
"pattern_key": "problem:tool_thrash:tool:Bash",
"priority": "high",
"recommendation": "Batch related shell work into one script, not many small Bash calls",
"repo": "infospace-bench",
"score": 13128.0,
"signal_type": "tool_thrash",
"title": "problem: tool thrash"
},
{
"cross_flavor": false,
"pattern_key": "problem:tool_thrash:tool:Bash",
"priority": "high",
"recommendation": "Batch related shell work into one script, not many small Bash calls",
"repo": "railiance-apps",
"score": 13128.0,
"signal_type": "tool_thrash",
"title": "problem: tool thrash"
},
{
"cross_flavor": false,
"pattern_key": "problem:tool_thrash:tool:Bash",
"priority": "high",
"recommendation": "Batch related shell work into one script, not many small Bash calls",
"repo": "state-hub",
"score": 13128.0,
"signal_type": "tool_thrash",
"title": "problem: tool thrash"
},
{
"cross_flavor": false,
"pattern_key": "problem:schema_thrash:schema_load",
"priority": "high",
"recommendation": "Load the tool schemas you'll need once, up front",
"repo": "activity-core",
"score": 441.0,
"signal_type": "schema_thrash",
"title": "problem: schema thrash"
},
{
"cross_flavor": false,
"pattern_key": "problem:schema_thrash:schema_load",
"priority": "high",
"recommendation": "Load the tool schemas you'll need once, up front",
"repo": "citation-evidence",
"score": 441.0,
"signal_type": "schema_thrash",
"title": "problem: schema thrash"
},
{
"cross_flavor": false,
"pattern_key": "problem:schema_thrash:schema_load",
"priority": "high",
"recommendation": "Load the tool schemas you'll need once, up front",
"repo": "flex-auth",
"score": 441.0,
"signal_type": "schema_thrash",
"title": "problem: schema thrash"
},
{
"cross_flavor": false,
"pattern_key": "problem:schema_thrash:schema_load",
"priority": "high",
"recommendation": "Load the tool schemas you'll need once, up front",
"repo": "infospace-bench",
"score": 441.0,
"signal_type": "schema_thrash",
"title": "problem: schema thrash"
},
{
"cross_flavor": false,
"pattern_key": "problem:schema_thrash:schema_load",
"priority": "high",
"recommendation": "Load the tool schemas you'll need once, up front",
"repo": "ops-bridge",
"score": 441.0,
"signal_type": "schema_thrash",
"title": "problem: schema thrash"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:<tool_use_error>file has not been read yet. read it first before writing to it.<<path>>",
"priority": "high",
"recommendation": "Read the file (or the region you'll touch) before Edit/Write",
"repo": "activity-core",
"score": 290.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:<tool_use_error>file has not been read yet. read it first before writing to it.<<path>>",
"priority": "high",
"recommendation": "Read the file (or the region you'll touch) before Edit/Write",
"repo": "citation-evidence",
"score": 290.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:<tool_use_error>file has not been read yet. read it first before writing to it.<<path>>",
"priority": "high",
"recommendation": "Read the file (or the region you'll touch) before Edit/Write",
"repo": "infospace-bench",
"score": 290.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:<tool_use_error>file has not been read yet. read it first before writing to it.<<path>>",
"priority": "high",
"recommendation": "Read the file (or the region you'll touch) before Edit/Write",
"repo": "issue-facade",
"score": 290.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:<tool_use_error>file has not been read yet. read it first before writing to it.<<path>>",
"priority": "high",
"recommendation": "Read the file (or the region you'll touch) before Edit/Write",
"repo": "railiance-apps",
"score": 290.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:<tool_use_error>file has not been read yet. read it first before writing to it.<<path>>",
"priority": "high",
"recommendation": "Read the file (or the region you'll touch) before Edit/Write",
"repo": "state-hub",
"score": 290.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:<tool_use_error>file has not been read yet. read it first before writing to it.<<path>>",
"priority": "high",
"recommendation": "Read the file (or the region you'll touch) before Edit/Write",
"repo": "the-custodian",
"score": 290.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:<tool_use_error>file has not been read yet. read it first before writing to it.<<path>>",
"priority": "high",
"recommendation": "Read the file (or the region you'll touch) before Edit/Write",
"repo": "vergabe-teilnahme",
"score": 290.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:<tool_use_error>file has been modified since read, either by the user or by a linter. read it again before attempting to write it.<<path>>",
"priority": "medium",
"recommendation": "Read the file (or the region you'll touch) before Edit/Write",
"repo": "artifact-store",
"score": 78.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:<tool_use_error>file has been modified since read, either by the user or by a linter. read it again before attempting to write it.<<path>>",
"priority": "medium",
"recommendation": "Read the file (or the region you'll touch) before Edit/Write",
"repo": "issue-facade",
"score": 78.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:<tool_use_error>file has been modified since read, either by the user or by a linter. read it again before attempting to write it.<<path>>",
"priority": "medium",
"recommendation": "Read the file (or the region you'll touch) before Edit/Write",
"repo": "railiance-apps",
"score": 78.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:<tool_use_error>file has been modified since read, either by the user or by a linter. read it again before attempting to write it.<<path>>",
"priority": "medium",
"recommendation": "Read the file (or the region you'll touch) before Edit/Write",
"repo": "state-hub",
"score": 78.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:budget_overrun:tokens",
"priority": "medium",
"recommendation": "Read narrowly \u2014 target the region you need, not whole large files",
"repo": "artifact-store",
"score": 50.55,
"signal_type": "budget_overrun",
"title": "problem: budget overrun"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:{",
"priority": "medium",
"recommendation": "Investigate recurring error on {",
"repo": "vergabe-teilnahme",
"score": 12.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:found <n> errors (<n> fixed, <n> remaining).",
"priority": "medium",
"recommendation": "Investigate recurring error on found <n> errors (<n> fixed, <n> remaining).",
"repo": "ops-bridge",
"score": 10.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:(note: edit also tried swapping \\uxxxx escapes and their characters; neither form matched, so the mismatch is likely elsewhere in old_string. re-read the file a",
"priority": "medium",
"recommendation": "Investigate recurring error on (note: edit also tried swapping \\uxxxx escapes and their characters; neither form matched, so the mismatch is likely elsewhere in old_string. re-read the file a",
"repo": "net-kingdom",
"score": 6.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:found <n> error (<n> fixed, <n> remaining).",
"priority": "medium",
"recommendation": "Investigate recurring error on found <n> error (<n> fixed, <n> remaining).",
"repo": "ops-bridge",
"score": 6.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:<n> failed, <n> passed in <n>.00s",
"priority": "medium",
"recommendation": "Investigate recurring error on <n> failed, <n> passed in <n>.00s",
"repo": "agentic-resources",
"score": 4.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
}
],
"window": {
"days": 30,
"since": "2026-05-08T19:30:56Z",
"until": "2026-06-07T19:30:56Z"
}
}
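A reader of this report, such as the hub read model consumer, mainly needs the `window` and the `suggestions` list. The sketch below is minimal and assumes the report has already been parsed with `json.load`; a trimmed excerpt of the file above is inlined so the snippet runs on its own. It groups suggestions by repo and checks the at-most-three-per-repo rule that the builder enforces. `index_by_repo` is a hypothetical helper.

```python
import json
from collections import defaultdict

# Trimmed excerpt of last_retro.json (only the fields used below).
REPORT = json.loads("""
{
  "n_sessions": 23,
  "window": {"days": 30, "since": "2026-05-08T19:30:56Z", "until": "2026-06-07T19:30:56Z"},
  "suggestions": [
    {"repo": "net-kingdom", "priority": "high", "score": 54.0, "cross_flavor": true,
     "signal_type": "recurring_error"},
    {"repo": "activity-core", "priority": "high", "score": 13128.0, "cross_flavor": false,
     "signal_type": "tool_thrash"},
    {"repo": "activity-core", "priority": "high", "score": 441.0, "cross_flavor": false,
     "signal_type": "schema_thrash"},
    {"repo": "ops-bridge", "priority": "medium", "score": 10.0, "cross_flavor": false,
     "signal_type": "recurring_error"}
  ]
}
""")


def index_by_repo(report: dict) -> dict:
    # Hypothetical helper: repo -> its suggestions, in report order.
    by_repo = defaultdict(list)
    for s in report.get("suggestions", []):
        by_repo[s["repo"]].append(s)
    return dict(by_repo)


index = index_by_repo(REPORT)
assert all(len(items) <= 3 for items in index.values())  # max_per_repo holds
high = sorted({s["repo"] for s in REPORT["suggestions"] if s["priority"] == "high"})
print(high)  # ['activity-core', 'net-kingdom']
print([s["signal_type"] for s in index["activity-core"]])  # ['tool_thrash', 'schema_thrash']
```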

@@ -0,0 +1,39 @@
# Weekly Coding Retro (2026-05-08 → 2026-06-07)
_23 real sessions · generated 2026-06-07T19:30:56Z_
## Top improvement suggestions (cross-flavor first, ≤3 per repo)
- **net-kingdom** (high, score=54.0) [CROSS-FLAVOR]: cross-flavor problem: recurring error — Investigate recurring error on make: *** [makefile:<n>: fix-consistency] error <n>
- **activity-core** (high, score=13128.0): problem: tool thrash — Batch related shell work into one script, not many small Bash calls
- **artifact-store** (high, score=13128.0): problem: tool thrash — Batch related shell work into one script, not many small Bash calls
- **citation-evidence** (high, score=13128.0): problem: tool thrash — Batch related shell work into one script, not many small Bash calls
- **infospace-bench** (high, score=13128.0): problem: tool thrash — Batch related shell work into one script, not many small Bash calls
- **railiance-apps** (high, score=13128.0): problem: tool thrash — Batch related shell work into one script, not many small Bash calls
- **state-hub** (high, score=13128.0): problem: tool thrash — Batch related shell work into one script, not many small Bash calls
- **activity-core** (high, score=441.0): problem: schema thrash — Load the tool schemas you'll need once, up front
- **citation-evidence** (high, score=441.0): problem: schema thrash — Load the tool schemas you'll need once, up front
- **flex-auth** (high, score=441.0): problem: schema thrash — Load the tool schemas you'll need once, up front
- **infospace-bench** (high, score=441.0): problem: schema thrash — Load the tool schemas you'll need once, up front
- **ops-bridge** (high, score=441.0): problem: schema thrash — Load the tool schemas you'll need once, up front
- **activity-core** (high, score=290.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **citation-evidence** (high, score=290.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **infospace-bench** (high, score=290.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **issue-facade** (high, score=290.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **railiance-apps** (high, score=290.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **state-hub** (high, score=290.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **the-custodian** (high, score=290.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **vergabe-teilnahme** (high, score=290.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **artifact-store** (medium, score=78.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **issue-facade** (medium, score=78.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **railiance-apps** (medium, score=78.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **state-hub** (medium, score=78.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **artifact-store** (medium, score=50.55): problem: budget overrun — Read narrowly — target the region you need, not whole large files
- **vergabe-teilnahme** (medium, score=12.0): problem: recurring error — Investigate recurring error on {
- **ops-bridge** (medium, score=10.0): problem: recurring error — Investigate recurring error on found <n> errors (<n> fixed, <n> remaining).
- **net-kingdom** (medium, score=6.0): problem: recurring error — Investigate recurring error on (note: edit also tried swapping \uxxxx escapes and their characters; neither form matched, so the mismatch is likely elsewhere in old_string. re-read the file a
- **ops-bridge** (medium, score=6.0): problem: recurring error — Investigate recurring error on found <n> error (<n> fixed, <n> remaining).
- **agentic-resources** (medium, score=4.0): problem: recurring error — Investigate recurring error on <n> failed, <n> passed in <n>.00s
## Fleet snapshot
- infra-overhead median: 0.167
- error rate: 0.957 · schema-thrash: 7
- success rate: 1.0 · tokens p50: 250725

@@ -0,0 +1,78 @@
"""Publish the weekly retro (AGENTIC-WP-0010 T02).

The retro is published to the State Hub as a **read model** — a progress event of
``event_type=coding_retro`` whose ``detail`` carries the structured report. This is
exactly how ``daily-triage-report`` surfaces, and it is what activity-core's
``coding_retro`` resolver (ACTIVITY-WP-0008) reads. A local JSON + markdown report
is always written; the hub publish is best-effort and **degrades gracefully** when
the hub is unreachable.
"""
from __future__ import annotations

import json
import os
import urllib.request
from typing import Callable, Optional

DEFAULT_HUB = "http://127.0.0.1:8000"


def render_markdown(report: dict) -> str:
    w = report.get("window", {})
    lines = [
        f"# Weekly Coding Retro ({w.get('since', '')[:10]} → {w.get('until', '')[:10]})",
        f"_{report.get('n_sessions', 0)} real sessions · generated {report.get('generated_at', '')}_",
        "",
        "## Top improvement suggestions (cross-flavor first, ≤3 per repo)",
    ]
    if not report.get("suggestions"):
        lines.append("- (no recurring problems above threshold this week)")
    for s in report.get("suggestions", []):
        flag = " [CROSS-FLAVOR]" if s.get("cross_flavor") else ""
        lines.append(f"- **{s['repo']}** ({s['priority']}, score={s['score']}){flag}: "
                     f"{s['title']} — {s['recommendation']}")
    m = report.get("measure", {})
    lines += ["", "## Fleet snapshot",
              f"- infra-overhead median: {m.get('infra_overhead_share_median')}",
              f"- error rate: {m.get('error_rate')} · schema-thrash: {m.get('schema_thrash_sessions')}",
              f"- success rate: {m.get('success_rate')} · tokens p50: {m.get('tokens_p50')}"]
    return "\n".join(lines)


def write_local(report: dict, json_path: str, md_path: Optional[str] = None) -> None:
    os.makedirs(os.path.dirname(json_path) or ".", exist_ok=True)
    with open(json_path, "w", encoding="utf-8") as fh:
        json.dump(report, fh, indent=2, sort_keys=True)
        fh.write("\n")
    if md_path:
        with open(md_path, "w", encoding="utf-8") as fh:
            fh.write(render_markdown(report))
            fh.write("\n")


def _http_post(url: str, payload: dict) -> None:
    req = urllib.request.Request(url, data=json.dumps(payload).encode(),
                                 headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(req, timeout=10) as r:
        r.read()


def publish_to_hub(report: dict, *, base_url: str = DEFAULT_HUB,
                   poster: Optional[Callable[[str, dict], None]] = None) -> bool:
    """POST the retro as an event_type=coding_retro progress event. Best-effort."""
    poster = poster or _http_post
    n = report.get("n_sessions", 0)
    k = len(report.get("suggestions", []))
    payload = {
        "event_type": "coding_retro",
        "author": "helix-forge",
        "summary": f"Weekly coding retro: {k} ranked suggestions across "
                   f"{report.get('window', {}).get('days', 7)} days ({n} sessions).",
        "detail": report,
    }
    try:
        poster(f"{base_url.rstrip('/')}/progress/", payload)
        return True
    except Exception:
        return False
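Because `publish_to_hub` takes an optional `poster`, the HTTP call can be swapped out in tests. The following is a minimal standalone sketch of the same contract. The payload shape, the URL joining, and the "swallow the error and return `False`" behaviour are copied from the function above. `publish`, `RecordingPoster` and `unreachable` are illustrative names, not package API.

```python
from typing import Callable


def publish(report: dict, base_url: str, poster: Callable[[str, dict], None]) -> bool:
    # Same payload shape as publish_to_hub: a coding_retro progress event.
    n = report.get("n_sessions", 0)
    k = len(report.get("suggestions", []))
    days = report.get("window", {}).get("days", 7)
    payload = {
        "event_type": "coding_retro",
        "author": "helix-forge",
        "summary": f"Weekly coding retro: {k} ranked suggestions across {days} days ({n} sessions).",
        "detail": report,
    }
    try:
        poster(f"{base_url.rstrip('/')}/progress/", payload)
        return True
    except Exception:  # best-effort: an unreachable hub is not fatal
        return False


class RecordingPoster:
    """Fake poster that records calls instead of sending HTTP."""

    def __init__(self):
        self.calls = []

    def __call__(self, url: str, payload: dict) -> None:
        self.calls.append((url, payload))


def unreachable(url: str, payload: dict) -> None:
    raise ConnectionRefusedError(url)


report = {"n_sessions": 23, "window": {"days": 30}, "suggestions": [{}, {}]}
rec = RecordingPoster()
print(publish(report, "http://127.0.0.1:8000/", rec))  # True
print(rec.calls[0][0])  # http://127.0.0.1:8000/progress/
print(publish(report, "http://127.0.0.1:8000", unreachable))  # False
```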

@@ -0,0 +1,62 @@
"""find_for / covers tests (AGENTIC-WP-0010 follow-up)."""
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.curate.catalog import Catalog  # noqa: E402
from session_memory.curate.schema import (  # noqa: E402
    Provenance,
    Resolution,
    SolutionPattern,
)


def _pattern(pid, src, covers=None, name="P"):
    return SolutionPattern(
        id=pid, name=name, version="1.0.0", polarity="problem", problem="p",
        resolutions=[Resolution(summary="do x")],
        provenance=Provenance(source_key=src), covers=covers or [])


def test_covers_round_trips(tmp_path):
    cat = Catalog(str(tmp_path))
    cat.upsert(_pattern("sp-a", "problem:file_not_read:edit",
                        covers=["file has not been read"]))
    assert cat.load("sp-a").covers == ["file has not been read"]


def test_find_for_exact_key(tmp_path):
    cat = Catalog(str(tmp_path))
    cat.upsert(_pattern(SolutionPattern.make_id("problem:retry_storm:retries"),
                        "problem:retry_storm:retries"))
    got = cat.find_for("problem:retry_storm:retries")
    assert got is not None and got.id == "sp-problem-retry_storm-retries"


def test_find_for_covers_match(tmp_path):
    cat = Catalog(str(tmp_path))
    cat.upsert(_pattern("sp-rbe", "problem:file_not_read:edit",
                        covers=["file has not been read", "modified since read"]))
    # a recurring_error signal with a different key but matching fingerprint locus
    got = cat.find_for(
        "problem:recurring_error:<tool_use_error>file has not been read yet...",
        locus="<tool_use_error>file has not been read yet. read it first...")
    assert got is not None and got.id == "sp-rbe"


def test_find_for_no_match_returns_none(tmp_path):
    cat = Catalog(str(tmp_path))
    cat.upsert(_pattern("sp-rbe", "problem:file_not_read:edit",
                        covers=["file has not been read"]))
    assert cat.find_for("problem:recurring_error:some unrelated error") is None


def test_covers_change_versions(tmp_path):
    cat = Catalog(str(tmp_path))
    cat.upsert(_pattern("sp-a", "problem:x:y"))
    p = cat.load("sp-a")
    p.covers = ["new coverage"]
    assert cat.upsert(p) == "versioned"  # covers is substantive content
    assert cat.load("sp-a").version == "1.0.1"
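These tests pin down the lookup contract rather than the implementation. An exact match on the pattern's provenance key wins; otherwise the catalog returns a pattern whose `covers` phrases appear in the signal. The sketch below implements that contract over a plain list and is minimal by design. It assumes case-insensitive substring matching against the key and locus, which the real `Catalog.find_for` may not do. `CatalogEntry` and `find_for` here are illustrative stand-ins.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CatalogEntry:
    id: str
    source_key: str
    covers: list = field(default_factory=list)


def find_for(entries: list, key: str, locus: Optional[str] = None) -> Optional[CatalogEntry]:
    # 1) exact provenance-key match
    for e in entries:
        if e.source_key == key:
            return e
    # 2) fallback: any covers phrase found in the key or locus (assumed case-insensitive)
    haystack = f"{key} {locus or ''}".lower()
    for e in entries:
        if any(phrase.lower() in haystack for phrase in e.covers):
            return e
    return None


entries = [
    CatalogEntry("sp-problem-retry_storm-retries", "problem:retry_storm:retries"),
    CatalogEntry("sp-rbe", "problem:file_not_read:edit",
                 covers=["file has not been read", "modified since read"]),
]
hit = find_for(entries, "problem:recurring_error:<tool_use_error>file has not been read yet...",
               locus="<tool_use_error>file has not been read yet. read it first...")
print(hit.id)  # sp-rbe
```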

@@ -0,0 +1,78 @@
"""digest_lookup entrypoint tests (AGENTIC-WP-0011 T03)."""
import json
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.core.store import Store  # noqa: E402
from session_memory.digest_lookup import lookup_digest, main, resolve_store_paths  # noqa: E402


def _write_config(tmp_path) -> tuple[str, str]:
    """Write a minimal config.toml; return (config path, store dir)."""
    store = tmp_path / ".store"
    toml = tmp_path / "config.toml"
    toml.write_text(
        f'[store]\ndb_path = "{store / "m.db"}"\nblob_dir = "{store / "blobs"}"\n'
        f'cursor = "{store / "c.json"}"\n')
    return str(toml), str(store)


def _seed(store_dir, uid="claude:test-uid"):
    st = Store(os.path.join(store_dir, "m.db"), os.path.join(store_dir, "blobs"))
    st.write_digest(uid, {
        "session_uid": uid,
        "flavor": "claude",
        "repo": "agentic-resources",
        "outcome": "success",
        "started_at": "2026-06-19T10:00:00Z",
        "ended_at": "2026-06-19T11:00:00Z",
        "cost": {"input_tokens": 100, "output_tokens": 25},
        "tool_histogram": {"Bash": 10, "Edit": 5},
    })
    st.close()
    return uid


def test_resolve_store_paths_from_config(tmp_path):
    cfg_path, store_dir = _write_config(tmp_path)
    db, blob = resolve_store_paths(config_path=cfg_path)
    assert db.endswith("m.db")
    assert blob.endswith("blobs")
    assert store_dir in db


def test_resolve_store_paths_from_env(tmp_path, monkeypatch):
    db = tmp_path / "custom" / "mem.db"
    db.parent.mkdir(parents=True)
    monkeypatch.setenv("HELIX_STORE_DB", str(db))
    resolved_db, blob = resolve_store_paths()
    assert resolved_db == str(db)
    assert blob == str(tmp_path / "custom" / "blobs")


def test_lookup_digest_found_and_missing(tmp_path):
    cfg_path, store_dir = _write_config(tmp_path)
    uid = _seed(store_dir)
    found = lookup_digest(uid, config_path=cfg_path)
    assert found is not None and found["outcome"] == "success"
    assert lookup_digest("claude:missing", config_path=cfg_path) is None


def test_main_json_success(tmp_path, capsys):
    cfg_path, store_dir = _write_config(tmp_path)
    uid = _seed(store_dir)
    rc = main(["--config", cfg_path, uid, "--json"])
    assert rc == 0
    data = json.loads(capsys.readouterr().out)
    assert data["session_uid"] == uid
    assert data["repo"] == "agentic-resources"


def test_main_not_found(tmp_path, capsys):
    cfg_path, store_dir = _write_config(tmp_path)
    _seed(store_dir)
    rc = main(["--config", cfg_path, "claude:missing"])
    assert rc == 1
    assert "not found" in capsys.readouterr().err.lower()
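The two path tests imply a precedence rule. When `HELIX_STORE_DB` is set, the database path comes from the environment and the blob directory is a `blobs` sibling of it. Otherwise both paths come from the `[store]` table. The sketch below encodes that rule and is minimal. It takes an already-parsed config dict instead of a TOML path. It also assumes the environment wins whenever it is set, although the tests only exercise that case with no config passed. `resolve_paths` is a hypothetical name.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_paths(config: Optional[dict] = None,
                  env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    env = os.environ if env is None else env
    db = env.get("HELIX_STORE_DB")
    if db:
        # assumed rule: blobs live next to the database file
        return db, os.path.join(os.path.dirname(db), "blobs")
    store = (config or {}).get("store", {})
    return (os.path.expanduser(store["db_path"]),
            os.path.expanduser(store["blob_dir"]))


print(resolve_paths(env={"HELIX_STORE_DB": "/tmp/custom/mem.db"}))
print(resolve_paths({"store": {"db_path": "/x/m.db", "blob_dir": "/x/blobs"}}, env={}))
```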

@@ -0,0 +1,88 @@
"""Distributor base tests (WP-0007 T01): markers, idempotent upsert, rendering."""
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.curate.schema import Resolution, SolutionPattern  # noqa: E402
from session_memory.distribute.base import (  # noqa: E402
    Artifact,
    BaseDistributor,
    Distributor,
    render_markdown_body,
    upsert_block,
    wrap_block,
)


def _pattern(pid="sp-x", polarity="problem"):
    return SolutionPattern(
        id=pid, name="Read before edit", version="1.2.0", polarity=polarity,
        problem="Agents edit files they have not read.",
        resolutions=[Resolution(summary="Read the file first", detail="then Edit",
                                steps=["Read", "Edit"])],
        rendering_hints={"claude": {"target": "CLAUDE.md"}},
    )


def test_render_markdown_body_has_problem_and_resolution():
    body = render_markdown_body(_pattern())
    assert "### Read before edit" in body
    assert "Agents edit files" in body
    assert "**Avoid:**" in body  # problem polarity
    assert "- Read the file first — then Edit" in body
    assert " - Read" in body


def test_success_polarity_label():
    assert "**Prefer:**" in render_markdown_body(_pattern(polarity="success"))


def test_wrap_block_has_markers_and_version():
    block = wrap_block("sp-x", "hello", "1.2.0")
    assert block.startswith("<!-- BEGIN helix-forge pattern:sp-x --> v1.2.0")
    assert block.rstrip().endswith("<!-- END helix-forge pattern:sp-x -->")


def test_upsert_inserts_then_replaces_in_place():
    doc = "# Title\n\nsome text\n"
    b1 = wrap_block("sp-x", "first", "1")
    once = upsert_block(doc, "sp-x", b1)
    assert "first" in once and once.count("BEGIN helix-forge pattern:sp-x") == 1
    # re-distributing the same id replaces, does not duplicate
    b2 = wrap_block("sp-x", "second", "2")
    twice = upsert_block(once, "sp-x", b2)
    assert "second" in twice and "first" not in twice
    assert twice.count("BEGIN helix-forge pattern:sp-x") == 1


def test_upsert_keeps_other_patterns():
    doc = upsert_block("", "sp-a", wrap_block("sp-a", "A"))
    doc = upsert_block(doc, "sp-b", wrap_block("sp-b", "B"))
    assert "sp-a" in doc and "sp-b" in doc


def test_base_distributor_renders_artifact():
    d = BaseDistributor(flavor="claude", target_path="CLAUDE.md")
    art = d.render(_pattern())
    assert isinstance(art, Artifact)
    assert isinstance(d, Distributor)  # satisfies the protocol
    assert art.flavor == "claude"
    assert art.target_path == "CLAUDE.md"
    assert "BEGIN helix-forge pattern:sp-x" in art.content
    assert "Read before edit" in art.content


def test_body_hint_overrides_default():
    p = _pattern()
    p.rendering_hints["claude"]["body"] = "custom claude body"
    d = BaseDistributor(flavor="claude", target_path="CLAUDE.md")
    assert "custom claude body" in d.render(p).content


def test_target_hint_overrides_default():
    p = _pattern()
    p.rendering_hints["claude"]["target"] = "docs/CLAUDE.md"
    d = BaseDistributor(flavor="claude", target_path="CLAUDE.md")
    assert d.render(p).target_path == "docs/CLAUDE.md"
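The marker tests describe an idempotent upsert. Each pattern id owns one `BEGIN`/`END` comment pair. Re-distributing that id replaces the text between its markers in place, and an unknown id is appended. The sketch below is minimal and uses a non-greedy regex. It reproduces the behaviour the tests assert and is not the package's code. `wrap_pattern_block` and `upsert_pattern_block` are illustrative names, and the blank-line separator on append is an assumption.

```python
import re
from typing import Optional

BEGIN = "<!-- BEGIN helix-forge pattern:{pid} -->"
END = "<!-- END helix-forge pattern:{pid} -->"


def wrap_pattern_block(pid: str, body: str, version: Optional[str] = None) -> str:
    head = BEGIN.format(pid=pid) + (f" v{version}" if version else "")
    return f"{head}\n{body}\n{END.format(pid=pid)}\n"


def upsert_pattern_block(doc: str, pid: str, block: str) -> str:
    # Non-greedy match from this id's BEGIN marker to its own END marker (plus newline).
    rx = re.compile(re.escape(BEGIN.format(pid=pid)) + r".*?"
                    + re.escape(END.format(pid=pid)) + r"\n?", re.DOTALL)
    if rx.search(doc):
        # replace in place; a function replacement avoids backslash escapes in `block`
        return rx.sub(lambda _m: block, doc, count=1)
    if not doc:
        return block
    # otherwise append, separated from existing text by one blank line (assumed)
    return doc.rstrip("\n") + "\n\n" + block


doc = "# Title\n\nsome text\n"
once = upsert_pattern_block(doc, "sp-x", wrap_pattern_block("sp-x", "first", "1"))
twice = upsert_pattern_block(once, "sp-x", wrap_pattern_block("sp-x", "second", "2"))
print(twice)
```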

@@ -0,0 +1,40 @@
"""Claude distributor tests (WP-0007 T02)."""
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.curate.schema import Resolution, SolutionPattern  # noqa: E402
from session_memory.distribute.claude import ClaudeDistributor  # noqa: E402


def _pattern(hints=None):
    return SolutionPattern(
        id="sp-read-before-edit", name="Read before edit", version="1.0.0",
        polarity="problem", problem="Agents edit files they have not read.",
        resolutions=[Resolution(summary="Read the file first", steps=["Read", "Edit"])],
        rendering_hints=hints or {"claude": {}},
    )


def test_default_targets_claude_md():
    art = ClaudeDistributor().render(_pattern())
    assert art.flavor == "claude"
    assert art.target_path == "CLAUDE.md"
    assert "BEGIN helix-forge pattern:sp-read-before-edit" in art.content
    assert "### Read before edit" in art.content


def test_skill_mode_emits_skill_stub():
    art = ClaudeDistributor().render(_pattern({"claude": {"as": "skill"}}))
    assert "## Skill: Read before edit" in art.content
    assert "**When:**" in art.content
    assert " - Read" in art.content


def test_idempotent_marker_present_for_reupsert():
    art = ClaudeDistributor().render(_pattern())
    # same id in both renders -> caller can upsert in place
    art2 = ClaudeDistributor().render(_pattern())
    assert art.pattern_id == art2.pattern_id == "sp-read-before-edit"
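The skill-mode test checks only three features of the output: a `## Skill:` heading, a `**When:**` line, and indented steps. The sketch below is a minimal renderer that satisfies those checks. `render_skill_stub` and its `**Do:**` line are hypothetical, and the real `ClaudeDistributor` layout may differ.

```python
def render_skill_stub(name: str, problem: str, summary: str, steps: list) -> str:
    lines = [
        f"## Skill: {name}",
        "",
        f"**When:** {problem}",  # trigger condition: the pattern's problem text
        f"**Do:** {summary}",    # hypothetical field: first resolution summary
    ]
    lines += [f"  - {step}" for step in steps]  # steps as an indented list
    return "\n".join(lines) + "\n"


stub = render_skill_stub("Read before edit", "Agents edit files they have not read.",
                         "Read the file first", ["Read", "Edit"])
print(stub)
```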

@@ -0,0 +1,49 @@
"""Codex + Grok distributor + registry tests (WP-0007 T03)."""
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.curate.schema import Resolution, SolutionPattern  # noqa: E402
from session_memory.distribute.codex import CodexDistributor  # noqa: E402
from session_memory.distribute.grok import GrokDistributor  # noqa: E402
from session_memory.distribute.registry import all_flavors, get_distributor  # noqa: E402


def _pattern():
    return SolutionPattern(
        id="sp-x", name="Read before edit", version="1.0.0", polarity="problem",
        problem="Agents edit files they have not read.",
        resolutions=[Resolution(summary="Read the file first")],
    )


def test_codex_targets_agents_md():
    art = CodexDistributor().render(_pattern())
    assert art.flavor == "codex" and art.target_path == "AGENTS.md"
    assert "Read before edit" in art.content


def test_grok_targets_native_instructions():
    art = GrokDistributor().render(_pattern())
    assert art.flavor == "grok" and art.target_path == ".grok/instructions.md"


def test_same_pattern_expressible_for_all_flavors():
    # FR-A3: one pattern, rendered for every flavor (same body, different targets)
    p = _pattern()
    bodies = {}
    for f in all_flavors():
        art = get_distributor(f).render(p)
        # strip markers -> compare agnostic body
        inner = art.content.split("\n", 1)[1].rsplit("\n", 1)[0]
        bodies[f] = inner
    targets = {get_distributor(f).render(p).target_path for f in all_flavors()}
    assert len(targets) == 3  # distinct per-flavor targets
    assert len(set(bodies.values())) == 1  # identical agnostic body


def test_registry_unknown_flavor():
    assert get_distributor("gpt") is None
    assert set(all_flavors()) == {"claude", "codex", "grok"}

@@ -0,0 +1,76 @@
"""Distribute entrypoint tests (WP-0007 T05)."""

import json
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.curate.catalog import Catalog  # noqa: E402
from session_memory.curate.schema import Resolution, Scope, SolutionPattern  # noqa: E402
from session_memory.distribute.__main__ import build_targets, main, run_distribute  # noqa: E402


def _pattern(pid, repos, flavors, status="approved", ready=True):
    return SolutionPattern(
        id=pid, name=pid, version="1.0.0", polarity="problem", problem="p",
        resolutions=[Resolution(summary="do x")],
        scope=Scope(repos=repos, flavors=flavors), status=status, distribution_ready=ready,
    )


def _config(tmp_path):
    return {
        "repo_domain_map": {"agentic-resources": "helix_forge", "state-hub": "custodian"},
        "curate": {"catalog_dir": str(tmp_path / "catalog")},
        "distribute": {"proposals_dir": str(tmp_path / "proposals"),
                       "active_registry": str(tmp_path / "active.json")},
    }


def test_build_targets_crosses_repos_and_flavors():
    cfg = {"repo_domain_map": {"r1": "d1", "r2": "d2"}}
    targets = build_targets(cfg)
    assert len(targets) == 2 * 3  # 2 repos x 3 flavors
    r1_targets = build_targets(cfg, repo_filter="r1")
    assert r1_targets and all(t.repo == "r1" for t in r1_targets)
    assert all(t.flavor == "claude" for t in build_targets(cfg, flavor_filter="claude"))


def test_run_distribute_scopes_to_catalog(tmp_path):
    cfg = _config(tmp_path)
    cat = Catalog(cfg["curate"]["catalog_dir"])
    # in-scope for agentic-resources/claude only
    cat.upsert(_pattern("sp-a", ["agentic-resources"], ["claude"]))
    # provisional -> must be skipped
    cat.upsert(_pattern("sp-prov", [], [], status="provisional", ready=False))
    res = run_distribute(cfg)
    rendered = {pid for _, _, pid, _ in res.proposals}
    assert "sp-a" in rendered
    assert "sp-prov" not in rendered
    assert "sp-prov" in res.skipped_not_distributable
    # landed only in the agentic-resources/CLAUDE.md proposal
    p = os.path.join(cfg["distribute"]["proposals_dir"], "agentic-resources", "CLAUDE.md")
    assert os.path.exists(p)
    assert not os.path.exists(
        os.path.join(cfg["distribute"]["proposals_dir"], "state-hub", "CLAUDE.md"))


def test_main_runs_json(tmp_path, capsys):
    cfg = _config(tmp_path)
    cat = Catalog(cfg["curate"]["catalog_dir"])
    cat.upsert(_pattern("sp-a", [], ["claude"]))  # unrestricted repos
    # main() loads TOML, so write a minimal config.toml pointing at the tmp dirs
    toml = tmp_path / "config.toml"
    toml.write_text(
        f'[repo_domain_map]\nagentic-resources = "helix_forge"\n'
        f'[curate]\ncatalog_dir = "{cfg["curate"]["catalog_dir"]}"\n'
        f'[distribute]\nproposals_dir = "{cfg["distribute"]["proposals_dir"]}"\n'
        f'active_registry = "{cfg["distribute"]["active_registry"]}"\n')
    rc = main(["--config", str(toml), "--json"])
    assert rc == 0
    out = capsys.readouterr().out
    assert "sp-a" in out
    json.loads(out)  # valid JSON

@@ -0,0 +1,79 @@
"""Scoping + proposals + active registry tests (WP-0007 T04)."""

import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.curate.schema import Resolution, Scope, SolutionPattern  # noqa: E402
from session_memory.distribute.proposals import (  # noqa: E402
    ActiveRegistry,
    Target,
    applies,
    propose,
)


def _pattern(pid="sp-x", repos=None, flavors=None, status="approved", ready=True):
    return SolutionPattern(
        id=pid, name="Read before edit", version="1.0.0", polarity="problem",
        problem="edit before read", resolutions=[Resolution(summary="read first")],
        scope=Scope(repos=repos or [], flavors=flavors or []),
        status=status, distribution_ready=ready,
    )


def test_applies_respects_scope():
    p = _pattern(repos=["agentic-resources"], flavors=["claude"])
    assert applies(p, Target("agentic-resources", flavor="claude"))
    assert not applies(p, Target("other-repo", flavor="claude"))
    assert not applies(p, Target("agentic-resources", flavor="codex"))


def test_empty_scope_is_unrestricted():
    assert applies(_pattern(), Target("any", flavor="grok"))


def test_propose_writes_scoped_proposal_files(tmp_path):
    out = str(tmp_path / "proposals")
    reg = ActiveRegistry(str(tmp_path / "active.json"))
    p = _pattern(flavors=["claude"])
    res = propose([p], [Target("agentic-resources", flavor="claude"),
                        Target("agentic-resources", flavor="codex")], out, reg)
    # only claude target is in scope
    assert len(res.proposals) == 1
    path = os.path.join(out, "agentic-resources", "CLAUDE.md")
    assert os.path.exists(path)
    assert "BEGIN helix-forge pattern:sp-x" in open(path).read()


def test_not_distributable_skipped(tmp_path):
    reg = ActiveRegistry(str(tmp_path / "active.json"))
    prov = _pattern(status="provisional", ready=False)
    res = propose([prov], [Target("r", flavor="claude")], str(tmp_path / "p"), reg)
    assert res.proposals == []
    assert "sp-x" in res.skipped_not_distributable


def test_proposals_idempotent_on_rerun(tmp_path):
    out = str(tmp_path / "proposals")
    reg_path = str(tmp_path / "active.json")
    p = _pattern()
    propose([p], [Target("r", flavor="claude")], out, ActiveRegistry(reg_path))
    propose([p], [Target("r", flavor="claude")], out, ActiveRegistry(reg_path))
    content = open(os.path.join(out, "r", "CLAUDE.md")).read()
    assert content.count("BEGIN helix-forge pattern:sp-x") == 1  # no duplication


def test_active_registry_records_environment(tmp_path):
    reg_path = str(tmp_path / "active.json")
    reg = ActiveRegistry(reg_path)
    propose([_pattern()], [Target("r", domain="helix_forge", flavor="claude")],
            str(tmp_path / "p"), reg)
    reg2 = ActiveRegistry(reg_path)  # reload from disk
    entries = reg2.entries()
    assert len(entries) == 1
    assert entries[0]["pattern_id"] == "sp-x"
    assert entries[0]["repo"] == "r"
    assert entries[0]["flavor"] == "claude"
    assert entries[0]["status"] == "proposed"

@@ -0,0 +1,49 @@
"""Before/after effectiveness tests (WP-0009 T02)."""

import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.measure.effect import effectiveness, split_by_date  # noqa: E402


def _digest(ts, tools=None, errors=0, outcome="success"):
    return {
        "started_at": ts, "outcome": outcome,
        "cost": {"input_tokens": 100, "output_tokens": 0},
        "tool_histogram": tools or {"Bash": 10},
        "error_snippets": [{"fingerprint": f"e{i}", "count": 1} for i in range(errors)],
    }


def test_split_by_date():
    digs = [_digest("2026-06-01"), _digest("2026-06-05"), _digest("2026-06-10")]
    before, after = split_by_date(digs, "2026-06-05")
    assert len(before) == 1 and len(after) == 2  # >= applied_at goes to after


def test_effectiveness_detects_improvement():
    # before: lots of errors + hub overhead; after: clean
    before = [_digest("2026-06-01", tools={"mcp__state-hub__x": 8, "Bash": 2}, errors=3)
              for _ in range(3)]
    after = [_digest("2026-06-10", tools={"Bash": 10}, errors=0) for _ in range(3)]
    e = effectiveness(before + after, "2026-06-05", label="read-before-edit")
    assert not e["insufficient_data"]
    assert e["n_before"] == 3 and e["n_after"] == 3
    assert e["deltas"]["error_rate"]["improved"] is True
    assert e["deltas"]["infra_overhead_share_median"]["improved"] is True
    assert e["deltas"]["error_rate"]["change"] < 0


def test_effectiveness_insufficient_data():
    e = effectiveness([_digest("2026-06-01")], "2026-06-05")
    assert e["insufficient_data"] is True
    assert e["deltas"] == {}


def test_success_rate_higher_is_better():
    before = [_digest("2026-06-01", outcome="fail") for _ in range(2)]
    after = [_digest("2026-06-10", outcome="success") for _ in range(2)]
    e = effectiveness(before + after, "2026-06-05")
    assert e["deltas"]["success_rate"]["improved"] is True

@@ -0,0 +1,79 @@
"""Measure entrypoint tests (WP-0009 T03)."""

import json
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.core.store import Store  # noqa: E402
from session_memory.measure.__main__ import main, real_digests  # noqa: E402
from session_memory.measure.metrics import load_baselines  # noqa: E402


def _digest(uid, ts, tools=None):
    return {
        "session_uid": uid, "flavor": "claude", "repo": "agentic-resources",
        "outcome": "success", "started_at": ts,
        "cost": {"input_tokens": 100, "output_tokens": 10},
        "event_count": 40, "first_prompt": "Implement the measure entrypoint cleanly",
        "tool_histogram": tools or {"Bash": 20, "Edit": 12, "Read": 8},
        "error_snippets": [],
    }


def _write_config(tmp_path) -> tuple[str, str]:
    """Write a config.toml under tmp_path; return (config path, store dir)."""
    store = tmp_path / ".store"
    toml = tmp_path / "config.toml"
    toml.write_text(
        f'[store]\ndb_path = "{store / "m.db"}"\nblob_dir = "{store / "blobs"}"\n'
        f'cursor = "{store / "c.json"}"\n'
        f'[measure]\nbaselines = "{tmp_path / "baselines.jsonl"}"\n')
    return str(toml), str(store)


def _seed(store_dir):
    st = Store(os.path.join(store_dir, "m.db"), os.path.join(store_dir, "blobs"))
    st.write_digest("claude:a", _digest("claude:a", "2026-06-01"))
    st.write_digest("claude:b", _digest("claude:b", "2026-06-10",
                                        tools={"mcp__state-hub__x": 18, "Bash": 8, "Edit": 4}))
    st.close()


def test_real_digests_filters_and_loads(tmp_path):
    cfg_path, store_dir = _write_config(tmp_path)
    _seed(store_dir)
    from session_memory.ingest import load_config
    digs = real_digests(load_config(cfg_path))
    assert len(digs) == 2


def test_main_writes_baseline_and_reports(tmp_path, capsys):
    cfg_path, store_dir = _write_config(tmp_path)
    _seed(store_dir)
    rc = main(["--config", cfg_path, "--label", "first"])
    assert rc == 0
    out = capsys.readouterr().out
    assert "Fleet metrics" in out
    rows = load_baselines(str(tmp_path / "baselines.jsonl"))
    assert len(rows) == 1 and rows[0]["label"] == "first"


def test_main_no_save_and_json(tmp_path, capsys):
    cfg_path, store_dir = _write_config(tmp_path)
    _seed(store_dir)
    rc = main(["--config", cfg_path, "--no-save", "--json"])
    assert rc == 0
    data = json.loads(capsys.readouterr().out)
    assert data["current"]["n_sessions"] == 2
    assert not os.path.exists(str(tmp_path / "baselines.jsonl"))


def test_main_effectiveness_since(tmp_path, capsys):
    cfg_path, store_dir = _write_config(tmp_path)
    _seed(store_dir)
    rc = main(["--config", cfg_path, "--no-save", "--since", "2026-06-05", "--json"])
    assert rc == 0
    data = json.loads(capsys.readouterr().out)
    assert data["effectiveness"]["n_before"] == 1
    assert data["effectiveness"]["n_after"] == 1

@@ -0,0 +1,63 @@
"""Fleet metrics + baseline tests (WP-0009 T01)."""

import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.measure.metrics import (  # noqa: E402
    aggregate,
    load_baselines,
    save_baseline,
    session_metrics,
    snapshot,
)


def _digest(tools=None, errors=0, tokens=100, outcome="success"):
    return {
        "outcome": outcome,
        "cost": {"input_tokens": tokens, "output_tokens": 0},
        "tool_histogram": tools or {"Bash": 10, "Edit": 5},
        "error_snippets": [{"fingerprint": f"e{i}", "count": 1} for i in range(errors)],
    }


def test_session_metrics_overhead_and_errors():
    m = session_metrics(_digest(tools={"mcp__state-hub__create_task": 6, "Bash": 4}, errors=2))
    assert abs(m["infra_overhead_share"] - 0.6) < 1e-9
    assert m["error_occurrences"] == 2
    assert m["has_error"] is True


def test_aggregate_rates_and_percentiles():
    digs = [
        _digest(tools={"mcp__state-hub__x": 8, "Bash": 2}, errors=1, tokens=50),  # 80% overhead
        _digest(tools={"Bash": 9, "Edit": 1}, errors=0, tokens=200),  # 0% overhead
        _digest(tools={"ToolSearch": 6, "Bash": 4}, errors=0, tokens=100, outcome="fail"),
    ]
    a = aggregate(digs)
    assert a["n_sessions"] == 3
    assert a["error_rate"] == round(1 / 3, 3)
    assert a["success_rate"] == round(2 / 3, 3)
    assert a["schema_thrash_sessions"] == 1  # the ToolSearch=6 session
    assert 0 <= a["infra_overhead_share_median"] <= 1


def test_aggregate_empty():
    assert aggregate([]) == {"n_sessions": 0}


def test_snapshot_has_timestamp_and_label():
    s = snapshot([_digest()], label="baseline")
    assert s["label"] == "baseline"
    assert "captured_at" in s and s["n_sessions"] == 1


def test_baseline_roundtrip_appends(tmp_path):
    path = str(tmp_path / "baselines.jsonl")
    save_baseline(snapshot([_digest()], label="a"), path)
    save_baseline(snapshot([_digest(), _digest()], label="b"), path)
    rows = load_baselines(path)
    assert [r["label"] for r in rows] == ["a", "b"]
    assert rows[1]["n_sessions"] == 2

tests/test_retro_build.py

@@ -0,0 +1,106 @@
"""Weekly retro report tests (AGENTIC-WP-0010 T01)."""

import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.curate.catalog import Catalog  # noqa: E402
from session_memory.curate.schema import Resolution, SolutionPattern  # noqa: E402
from session_memory.retro.build import weekly_retro  # noqa: E402


def _digest(uid, repo, ts, flavor="claude", retries=5):
    return {
        "session_uid": uid, "flavor": flavor, "repo": repo, "outcome": "fail",
        "started_at": ts, "event_count": 40,
        "first_prompt": "Fix the failing build and retry the suite",
        "cost": {"input_tokens": 100, "output_tokens": 10},
        "tool_histogram": {"Bash": 20, "Edit": 12, "Read": 8},
        "markers": {"errors": 0, "retries": retries, "test_runs": 0},
        "error_snippets": [],
    }


def test_window_excludes_old_sessions():
    digs = [
        _digest("claude:a", "r1", "2026-06-01T10:00:00Z"),
        _digest("claude:b", "r1", "2026-06-02T10:00:00Z"),
        _digest("claude:old", "r1", "2026-01-01T10:00:00Z"),  # outside window
    ]
    r = weekly_retro(digs, since="2026-05-30T00:00:00Z", until="2026-06-08T00:00:00Z")
    assert r["n_sessions"] == 2
    assert r["window"]["days"] == 7


def test_retry_storm_becomes_suggestion():
    digs = [_digest(f"claude:{i}", "r1", "2026-06-0{}T10:00:00Z".format(i + 1))
            for i in range(2)]
    r = weekly_retro(digs, since="2026-05-30T00:00:00Z", until="2026-06-08T00:00:00Z")
    s = r["suggestions"]
    assert s and s[0]["repo"] == "r1"
    assert s[0]["signal_type"] == "retry_storm"
    assert "Investigate" in s[0]["recommendation"]  # no catalog -> default


def test_recommendation_from_catalog(tmp_path):
    cat = Catalog(str(tmp_path / "catalog"))
    key = "problem:retry_storm:retries"
    cat.upsert(SolutionPattern(
        id=SolutionPattern.make_id(key), name="Retry storm", version="1.0.0",
        polarity="problem", problem="repeated retries",
        resolutions=[Resolution(summary="Stop and diagnose before retrying")]))
    digs = [_digest(f"claude:{i}", "r1", "2026-06-0{}T10:00:00Z".format(i + 1)) for i in range(2)]
    r = weekly_retro(digs, catalog=cat, since="2026-05-30T00:00:00Z", until="2026-06-08T00:00:00Z")
    assert r["suggestions"][0]["recommendation"] == "Stop and diagnose before retrying"


def test_recurring_error_inherits_recommendation_via_covers(tmp_path):
    cat = Catalog(str(tmp_path / "catalog"))
    cat.upsert(SolutionPattern(
        id="sp-rbe", name="Read before edit", version="1.0.0", polarity="problem",
        problem="edit before read",
        resolutions=[Resolution(summary="Read the file first before Edit/Write")],
        covers=["file has not been read"]))
    digs = []
    for i in range(2):
        d = _digest(f"claude:{i}", "r1", "2026-06-0{}T10:00:00Z".format(i + 1))
        d["error_snippets"] = [{
            "fingerprint": "<tool_use_error>file has not been read yet. read it first...",
            "sample": "File has not been read yet", "count": 2, "tool": "Edit"}]
        digs.append(d)
    r = weekly_retro(digs, catalog=cat, since="2026-05-30T00:00:00Z", until="2026-06-08T00:00:00Z")
    rec_err = [s for s in r["suggestions"] if s["signal_type"] == "recurring_error"]
    assert rec_err, "expected a recurring_error suggestion"
    assert rec_err[0]["recommendation"] == "Read the file first before Edit/Write"


def test_caps_three_per_repo():
    # five distinct problem signals in one repo -> capped at 3
    digs = []
    for i in range(2):
        d = _digest(f"claude:{i}", "r1", "2026-06-0{}T10:00:00Z".format(i + 1))
        d["markers"] = {"errors": 5, "retries": 5, "test_runs": 0, "human_interventions": 0}
        d["tool_histogram"] = {"Bash": 120, "ToolSearch": 9,
                               "mcp__state-hub__x": 30, "Edit": 5}
        d["outcome"] = "abandoned"
        digs.append(d)
    r = weekly_retro(digs, since="2026-05-30T00:00:00Z", until="2026-06-08T00:00:00Z")
    per_repo = [s for s in r["suggestions"] if s["repo"] == "r1"]
    assert len(per_repo) <= 3


def test_cross_flavor_ranks_first():
    digs = [
        _digest("claude:a", "r1", "2026-06-01T10:00:00Z", flavor="claude"),
        _digest("grok:b", "r2", "2026-06-02T10:00:00Z", flavor="grok"),
    ]
    r = weekly_retro(digs, since="2026-05-30T00:00:00Z", until="2026-06-08T00:00:00Z")
    assert r["suggestions"][0]["cross_flavor"] is True
    assert r["suggestions"][0]["priority"] == "high"


def test_includes_measure_snapshot():
    digs = [_digest(f"claude:{i}", "r1", "2026-06-0{}T10:00:00Z".format(i + 1)) for i in range(2)]
    r = weekly_retro(digs, since="2026-05-30T00:00:00Z", until="2026-06-08T00:00:00Z")
    assert r["measure"]["n_sessions"] == 2

@@ -0,0 +1,63 @@
"""Retro entrypoint tests (AGENTIC-WP-0010 T03)."""

import json
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.core.store import Store  # noqa: E402
from session_memory.retro.__main__ import main, run_retro  # noqa: E402


def _digest(uid, repo, ts, retries=5):
    return {
        "session_uid": uid, "flavor": "claude", "repo": repo, "outcome": "fail",
        "started_at": ts, "event_count": 40,
        "first_prompt": "Fix the failing build and retry the suite repeatedly",
        "cost": {"input_tokens": 100, "output_tokens": 10},
        "tool_histogram": {"Bash": 20, "Edit": 12, "Read": 8},
        "markers": {"errors": 0, "retries": retries, "test_runs": 0},
        "error_snippets": [],
    }


def _config(tmp_path):
    store = tmp_path / ".store"
    toml = tmp_path / "config.toml"
    toml.write_text(
        f'[store]\ndb_path="{store / "m.db"}"\nblob_dir="{store / "blobs"}"\ncursor="{store / "c.json"}"\n'
        f'[curate]\ncatalog_dir="{tmp_path / "catalog"}"\n'
        f'[retro]\nwindow_days=7\nreport_json="{tmp_path / "r.json"}"\nreport_md="{tmp_path / "r.md"}"\n')
    st = Store(str(store / "m.db"), str(store / "blobs"))
    st.write_digest("claude:a", _digest("claude:a", "r1", "2026-06-01T10:00:00Z"))
    st.write_digest("claude:b", _digest("claude:b", "r1", "2026-06-02T10:00:00Z"))
    st.close()
    return str(toml), tmp_path


def test_run_retro_over_store(tmp_path):
    from session_memory.ingest import load_config
    cfg_path, _ = _config(tmp_path)
    rep = run_retro(load_config(cfg_path), since="2026-05-30T00:00:00Z", until="2026-06-08T00:00:00Z")
    assert rep["n_sessions"] == 2
    assert rep["suggestions"]


def test_main_writes_report_files(tmp_path, capsys):
    cfg_path, tp = _config(tmp_path)
    rc = main(["--config", cfg_path, "--since", "2026-05-30T00:00:00Z",
               "--until", "2026-06-08T00:00:00Z"])
    assert rc == 0
    assert os.path.exists(str(tp / "r.json")) and os.path.exists(str(tp / "r.md"))
    assert "Weekly Coding Retro" in capsys.readouterr().out


def test_main_json(tmp_path, capsys):
    cfg_path, _ = _config(tmp_path)
    rc = main(["--config", cfg_path, "--since", "2026-05-30T00:00:00Z",
               "--until", "2026-06-08T00:00:00Z", "--json"])
    assert rc == 0
    data = json.loads(capsys.readouterr().out)
    assert data["report"]["n_sessions"] == 2
    assert data["published"] is None  # no --publish

@@ -0,0 +1,62 @@
"""Retro publish tests (AGENTIC-WP-0010 T02)."""

import json
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.retro.publish import (  # noqa: E402
    publish_to_hub,
    render_markdown,
    write_local,
)


def _report():
    return {
        "window": {"since": "2026-06-01T00:00:00Z", "until": "2026-06-08T00:00:00Z", "days": 7},
        "generated_at": "2026-06-08T19:00:00Z", "n_sessions": 12,
        "suggestions": [
            {"repo": "state-hub", "title": "schema thrash", "recommendation": "front-load schemas",
             "priority": "high", "score": 632.0, "cross_flavor": False, "signal_type": "schema_thrash"},
        ],
        "measure": {"infra_overhead_share_median": 0.117, "error_rate": 0.96,
                    "schema_thrash_sessions": 8, "success_rate": 1.0, "tokens_p50": 250725},
    }


def test_render_markdown():
    md = render_markdown(_report())
    assert "Weekly Coding Retro" in md
    assert "**state-hub**" in md and "front-load schemas" in md
    assert "infra-overhead median: 0.117" in md


def test_write_local_json_and_md(tmp_path):
    jp = str(tmp_path / "out" / "retro.json")
    mp = str(tmp_path / "out" / "retro.md")
    write_local(_report(), jp, mp)
    assert json.load(open(jp))["n_sessions"] == 12
    assert "Weekly Coding Retro" in open(mp).read()


def test_publish_calls_poster_with_coding_retro_event():
    captured = {}

    def poster(url, payload):
        captured["url"] = url
        captured["payload"] = payload

    ok = publish_to_hub(_report(), base_url="http://hub", poster=poster)
    assert ok is True
    assert captured["url"] == "http://hub/progress/"
    assert captured["payload"]["event_type"] == "coding_retro"
    assert captured["payload"]["detail"]["n_sessions"] == 12


def test_publish_degrades_gracefully_on_failure():
    def boom(url, payload):
        raise OSError("hub down")

    assert publish_to_hub(_report(), poster=boom) is False

@@ -0,0 +1,100 @@
---
id: AGENTIC-WP-0007
type: workplan
title: "Coding Session Memory — Phase 3 (Distribute: per-flavor artifacts, HITL)"
domain: helix_forge
repo: agentic-resources
status: finished
owner: codex
topic_slug: helix-forge
created: "2026-06-07"
updated: "2026-06-07"
state_hub_workstream_id: "766c9089-d5de-472a-8c0f-85529028cfb9"
---
# Coding Session Memory — Phase 3 (Distribute)
Implements **Distribute** (PRD §6.4, FR-X1 to FR-X4), continuing
[AGENTIC-WP-0004](AGENTIC-WP-0004-session-memory-phase2.md) (Curate). Distributor
adapters render the **approved / `distribution_ready`** SolutionPatterns from the
Pattern Catalog into per-flavor artifacts, using the `rendering_hints` produced in
Phase 2. Mirror image of the collector design: **agnostic core, thin adapters at
the edges** (FR-A2) — adding a flavor = one collector + one distributor.
Key boundary (FR-X3): output is **proposed, not auto-applied** — artifacts are
written as reviewable proposals (HITL), scoped by repo/domain (FR-X2), with an
active-pattern registry tracking which patterns are live where (FR-X4).
## Distributor Adapter Interface + Artifact Base
```task
id: AGENTIC-WP-0007-T01
status: done
priority: high
state_hub_task_id: "ff618fa6-a78b-4b80-846b-8cde7ad65451"
```
Define a `Distributor` protocol and an `Artifact` dataclass (flavor, target_path,
content, pattern_id) in `session_memory/distribute/`. `render(pattern, scope)`
reads the agnostic `SolutionPattern` plus its per-flavor `rendering_hints`; base
helpers handle idempotent snippet markers. Agnostic core; flavor logic only in
adapters. Unit-tested.
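The adapter contract described above can be sketched as follows. The sketch makes three assumptions: a plain `dict` stands in for `SolutionPattern`, the markers are HTML comments, and `render` takes only the pattern (the tests call it that way, although the task text mentions a `scope` argument). The `Artifact` fields come from the task text. `wrap_with_markers` and `PlainDistributor` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Artifact:
    """One rendered, per-flavor output for a single pattern."""
    flavor: str
    target_path: str
    content: str
    pattern_id: str


def wrap_with_markers(pattern_id: str, body: str) -> str:
    """Surround a body with stable BEGIN/END lines (assumed HTML-comment form)."""
    return (f"<!-- BEGIN helix-forge pattern:{pattern_id} -->\n"
            f"{body}\n"
            f"<!-- END helix-forge pattern:{pattern_id} -->")


class Distributor(Protocol):
    """Structural interface: any class with `flavor` and `render` qualifies."""
    flavor: str

    def render(self, pattern: dict) -> Artifact: ...


class PlainDistributor:
    """Hypothetical adapter: only the target path is flavor-specific."""
    flavor = "plain"

    def render(self, pattern: dict) -> Artifact:
        body = f"### {pattern['name']}\n{pattern['problem']}"
        return Artifact(
            flavor=self.flavor,
            target_path="PLAIN.md",
            content=wrap_with_markers(pattern["id"], body),
            pattern_id=pattern["id"],
        )
```

`Distributor` is a `typing.Protocol`, so adapters do not need a shared base class. A type checker accepts `PlainDistributor` anywhere a `Distributor` is expected, which keeps flavor logic confined to the adapters.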
## Claude Distributor (CLAUDE.md snippet)
```task
id: AGENTIC-WP-0007-T02
status: done
priority: high
state_hub_task_id: "64f50bd4-1fdf-452e-ae14-890253ab9f33"
```
`distribute/claude.py`: render an approved pattern into a `CLAUDE.md` snippet block
(or skill stub) with stable `BEGIN/END` markers so re-distribution updates in
place rather than duplicating. Uses `rendering_hints["claude"]`. Unit-tested.
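Stable markers allow an update in place, which can be sketched as a regex upsert. The sketch assumes HTML-comment markers of the form `<!-- BEGIN helix-forge pattern:<id> -->`; the tests only confirm the `BEGIN helix-forge pattern:<id>` text. `upsert_block` is a hypothetical helper, not the module's API.

```python
import re


def upsert_block(document: str, pattern_id: str, block: str) -> str:
    """Replace the marked block for `pattern_id` in place, or append it.

    `block` must already contain its own BEGIN/END marker lines.
    """
    pid = re.escape(pattern_id)
    marked = re.compile(
        rf"<!-- BEGIN helix-forge pattern:{pid} -->.*?"
        rf"<!-- END helix-forge pattern:{pid} -->",
        re.DOTALL,  # let .*? span the multi-line body
    )
    if marked.search(document):
        # A callable replacement keeps any backslashes in `block` literal.
        return marked.sub(lambda _match: block, document, count=1)
    if not document:
        return block + "\n"
    return document.rstrip("\n") + "\n\n" + block + "\n"
```

Running the upsert twice for the same id still leaves a single block in the file. This is the property the re-run idempotency test checks on proposal files.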
## Codex + Grok Distributors
```task
id: AGENTIC-WP-0007-T03
status: done
priority: high
state_hub_task_id: "382790f5-1fb4-4394-b039-1649cbf3b20a"
```
`distribute/codex.py` (`AGENTS.md` snippet) and `distribute/grok.py` (native
instruction format), each rendering the *same* agnostic pattern via its
`rendering_hints`. Confirms FR-A3: a pattern discovered via one flavor is
expressible for all. Unit-tested.
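FR-A3 can be illustrated with a sketch in which every flavor shares one agnostic body and only the target file differs. The target paths match the ones the tests assert. `TARGETS`, `agnostic_body`, and `render_for` are hypothetical. The sketch ignores `rendering_hints`, so Claude's skill mode, for example, is not modeled.

```python
from typing import Dict, Optional, Tuple

# Target files per flavor, as asserted in the distributor tests.
TARGETS: Dict[str, str] = {
    "claude": "CLAUDE.md",
    "codex": "AGENTS.md",
    "grok": ".grok/instructions.md",
}


def agnostic_body(pattern: dict) -> str:
    """Flavor-independent text: heading, problem, then one bullet per resolution."""
    lines = [f"### {pattern['name']}", pattern["problem"]]
    lines += [f"- {summary}" for summary in pattern["resolutions"]]
    return "\n".join(lines)


def render_for(flavor: str, pattern: dict) -> Optional[Tuple[str, str]]:
    """Return (target_path, content), or None when no adapter exists."""
    target = TARGETS.get(flavor)
    if target is None:
        return None
    pid = pattern["id"]
    content = (f"<!-- BEGIN helix-forge pattern:{pid} -->\n"
               f"{agnostic_body(pattern)}\n"
               f"<!-- END helix-forge pattern:{pid} -->")
    return target, content
```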
## Scoping + Proposed-Not-Applied Output + Active-Pattern Registry
```task
id: AGENTIC-WP-0007-T04
status: done
priority: high
state_hub_task_id: "2c690f29-2aee-460a-b9cd-3566018f6b3c"
```
Filter patterns by `Scope` (repos/domains/flavors) so a pattern only lands where it
applies (FR-X2). Write artifacts as **proposals** under a `proposals/` dir, never
auto-applied (FR-X3, HITL). Track which patterns are active in which environments
in an active-pattern registry (FR-X4). Unit-tested.
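The scope rule can be sketched as follows: an empty list on an axis means that axis is unrestricted. `Scope` and `Target` here are simplified stand-ins. The real `applies` takes a pattern rather than a bare scope. Treating an unknown target domain as out of scope when domains are restricted is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Scope:
    repos: List[str] = field(default_factory=list)
    domains: List[str] = field(default_factory=list)
    flavors: List[str] = field(default_factory=list)


@dataclass
class Target:
    repo: str
    flavor: str
    domain: Optional[str] = None


def _allowed(allowed: List[str], value: Optional[str]) -> bool:
    # An empty list means "no restriction on this axis".
    return not allowed or value in allowed


def applies(scope: Scope, target: Target) -> bool:
    """A pattern lands on a target only if every scope axis admits it."""
    return (_allowed(scope.repos, target.repo)
            and _allowed(scope.domains, target.domain)
            and _allowed(scope.flavors, target.flavor))
```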
## Distribute Entrypoint + Tests + Verify
```task
id: AGENTIC-WP-0007-T05
status: done
priority: medium
state_hub_task_id: "f9e24c13-7049-4c1c-a2d6-3a4dc4e752fd"
```
`python -m session_memory.distribute`: read approved catalog patterns, render
per-flavor proposals scoped by repo/domain, emit a proposal summary + JSON.
Document in `session_memory/README.md`. Verify end-to-end against the real catalog.
After workplan updates, notify the operator to run from `~/state-hub`:
```bash
make fix-consistency REPO=agentic-resources
```
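The entrypoint starts by crossing every repo in `repo_domain_map` with every flavor, then narrowing the result with the optional filters. The sketch below assumes the function takes the map directly (the real `build_targets` receives the whole config dict) and that there are exactly three flavors.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Optional

FLAVORS = ("claude", "codex", "grok")


@dataclass(frozen=True)
class Target:
    repo: str
    domain: str
    flavor: str


def build_targets(repo_domain_map: Dict[str, str],
                  repo_filter: Optional[str] = None,
                  flavor_filter: Optional[str] = None) -> List[Target]:
    """Cross repos x flavors; a None filter keeps every value on that axis."""
    repos = [r for r in repo_domain_map if repo_filter in (None, r)]
    flavors = [f for f in FLAVORS if flavor_filter in (None, f)]
    return [Target(r, repo_domain_map[r], f) for r, f in product(repos, flavors)]
```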

@@ -0,0 +1,56 @@
---
id: AGENTIC-WP-0008
type: workplan
title: "Act on #1 friction — Read-before-Edit reflex"
domain: helix_forge
repo: agentic-resources
status: finished
owner: codex
topic_slug: helix-forge
created: "2026-06-07"
updated: "2026-06-07"
state_hub_workstream_id: "6aac5cfc-4799-4d07-9537-42a203af2d1b"
---
# Act on #1 Friction — Read-before-Edit Reflex
The error-body mining ([AGENTIC-WP-0006](AGENTIC-WP-0006-error-body-mining.md))
found that the single most common error across real coding sessions is
**`File has not been read yet. Read it first before writing to it.`** — Edit/Write
before Read, in **12 of 27 sessions across 8 repos** — followed by the stale-read
**`File has been modified since read`** (6 sessions). See
[ASSESSMENT-infra-friction.md](../docs/ASSESSMENT-infra-friction.md).
This is the cheapest high-value fix surfaced by the whole analysis: a short
behavioural reflex in the agent instructions. We also capture it as a curated
SolutionPattern so Phase 3 Distribute can propose it to other repos/flavors —
closing the assess → curate → distribute loop by hand for one real pattern.
## Add Read-before-Edit Reflex to Agent Instructions
```task
id: AGENTIC-WP-0008-T01
status: done
priority: high
state_hub_task_id: "549c84c1-5bd8-4ff6-b61d-1c72946b8b8e"
```
Add a concise, data-cited **Read-before-Edit / re-read-on-"modified since read"**
reflex to `AGENTS.md` (with a matching note for `CLAUDE.md`), targeting the #1 and #2 recurring
errors. Keep it short to avoid context bloat (cf. PRD OQ6 — pattern bloat degrades
context budgets).
## Capture as Curated SolutionPattern for Distribute
```task
id: AGENTIC-WP-0008-T02
status: done
priority: medium
state_hub_task_id: "c007baf9-db14-40fa-b944-d1f1a71ea28b"
```
Promote the recurring "file not read" problem into a curated `SolutionPattern` in
the Pattern Catalog with per-flavor `rendering_hints`, so Phase 3 Distribute can
render and propose it across repos/flavors. Links assess → curate → distribute end
to end on a real pattern. After updates, notify the operator to run
`make fix-consistency REPO=agentic-resources`.

@@ -0,0 +1,68 @@
---
id: AGENTIC-WP-0009
type: workplan
title: "Coding Session Memory — Phase 4 (Measure: effectiveness + fleet trend)"
domain: helix_forge
repo: agentic-resources
status: finished
owner: codex
topic_slug: helix-forge
created: "2026-06-07"
updated: "2026-06-07"
state_hub_workstream_id: "99f1d836-3be0-40e5-9f17-63d3ecc5fcca"
---
# Coding Session Memory — Phase 4 (Measure)
Implements **Measure** (PRD §6.5, FR-M1 to FR-M3), the loop-closer. After patterns
are distributed (Phase 3) and changes land (e.g. the State Hub skill
[STATE-WP-0058] and the Read-before-Edit reflex
[AGENTIC-WP-0008](AGENTIC-WP-0008-read-before-edit-reflex.md)), Measure answers:
**did it actually help?**
Reuses what is already captured — WP-0005 tool buckets, WP-0006 error mining — so
this is computation over existing digests, not new capture.
## Baseline Metrics Module + Persisted Baseline
```task
id: AGENTIC-WP-0009-T01
status: done
priority: high
state_hub_task_id: "e5c2016a-2d51-4382-a013-7153e053e8ed"
```
`session_memory/measure/metrics.py`: compute fleet metrics over real sessions
(infra-overhead share, error rate, recurring-error count, schema-thrash, cost
percentiles) and persist a **timestamped baseline snapshot**. Reuses
`detect.signals.tool_bucket` and the digest `error_snippets`. Unit-tested.
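Below is a sketch of the per-session and fleet metrics. It makes two assumptions. First, tools whose names start with `mcp__state-hub__` count as infra overhead; the real bucketing comes from `detect.signals.tool_bucket`. Second, `error_occurrences` sums the `count` field of each error snippet. Cost percentiles and schema-thrash detection are omitted.

```python
from statistics import median
from typing import Dict, List

# Assumption for illustration: tools with these prefixes count as infra overhead.
INFRA_PREFIXES = ("mcp__state-hub__",)


def session_metrics(digest: dict) -> dict:
    hist: Dict[str, int] = digest.get("tool_histogram", {})
    total = sum(hist.values())
    infra = sum(n for tool, n in hist.items() if tool.startswith(INFRA_PREFIXES))
    errors = sum(s.get("count", 1) for s in digest.get("error_snippets", []))
    return {
        "infra_overhead_share": infra / total if total else 0.0,
        "error_occurrences": errors,
        "has_error": errors > 0,
    }


def aggregate(digests: List[dict]) -> dict:
    if not digests:
        return {"n_sessions": 0}
    per_session = [session_metrics(d) for d in digests]
    n = len(digests)
    return {
        "n_sessions": n,
        # share of sessions with at least one error, not the total error count
        "error_rate": round(sum(m["has_error"] for m in per_session) / n, 3),
        "success_rate": round(sum(d.get("outcome") == "success" for d in digests) / n, 3),
        "infra_overhead_share_median": round(
            median([m["infra_overhead_share"] for m in per_session]), 3),
    }
```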
## Before/After Per-Pattern Effectiveness
```task
id: AGENTIC-WP-0009-T02
status: done
priority: high
state_hub_task_id: "aa097a00-3462-41da-a137-67e1d61d8d33"
```
Given a change/pattern with an applied-at date, compare sessions **after** it
against the pre-change baseline (cost, error rate, infra-overhead, success) to
surface **per-pattern effectiveness** so ineffective patterns can be revised or
retired (FR-M1/FR-M2). Unit-tested.
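The before/after split and the direction-aware delta can be sketched as below. The sketch assumes ISO-8601 timestamps, which compare chronologically as strings when they share a format. It also assumes a hypothetical minimum of two sessions per side and tracks only two metrics; the real module also compares cost and infra overhead.

```python
from typing import Callable, Dict, List, Tuple

# Assumed direction per metric: True means a lower value is better.
LOWER_IS_BETTER: Dict[str, bool] = {"error_rate": True, "success_rate": False}


def split_by_date(digests: List[dict], applied_at: str) -> Tuple[List[dict], List[dict]]:
    """Sessions started on or after `applied_at` count as "after"."""
    before = [d for d in digests if d["started_at"] < applied_at]
    after = [d for d in digests if d["started_at"] >= applied_at]
    return before, after


def _rate(digests: List[dict], pred: Callable[[dict], bool]) -> float:
    return sum(1 for d in digests if pred(d)) / len(digests)


def effectiveness(digests: List[dict], applied_at: str, min_n: int = 2) -> dict:
    before, after = split_by_date(digests, applied_at)
    result = {"n_before": len(before), "n_after": len(after), "deltas": {}}
    # Guard: too few sessions on either side gives no verdict (and no division by zero).
    result["insufficient_data"] = min(len(before), len(after)) < max(min_n, 1)
    if result["insufficient_data"]:
        return result
    metrics = {
        "error_rate": lambda d: bool(d.get("error_snippets")),
        "success_rate": lambda d: d.get("outcome") == "success",
    }
    for name, pred in metrics.items():
        b, a = _rate(before, pred), _rate(after, pred)
        change = round(a - b, 3)
        improved = change < 0 if LOWER_IS_BETTER[name] else change > 0
        result["deltas"][name] = {"before": b, "after": a,
                                  "change": change, "improved": improved}
    return result
```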
## Fleet-Trend Report + Entrypoint + Tests
```task
id: AGENTIC-WP-0009-T03
status: done
priority: medium
state_hub_task_id: "f1147d59-2fb7-4d35-baec-b8f001bb9d62"
```
`python -m session_memory.measure`: fleet-level trend (is the median session
getting cheaper / more reliable over time, FR-M3) plus per-pattern effectiveness;
markdown + JSON. Document in `session_memory/README.md`. After updates, notify the
operator to run `make fix-consistency REPO=agentic-resources`.
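One way to compute the fleet trend is to keep snapshots in an append-only JSONL file and compare the earliest row with the latest. The function names `save_baseline` and `load_baselines` mirror the names in the tests. Their bodies, the `captured_at` fallback, and the `trend` helper are assumptions for illustration only.

```python
import json
from datetime import datetime, timezone
from typing import List


def save_baseline(snapshot: dict, path: str) -> None:
    """Append one timestamped snapshot as a single JSON line."""
    row = dict(snapshot, captured_at=snapshot.get("captured_at")
               or datetime.now(timezone.utc).isoformat())
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(row) + "\n")


def load_baselines(path: str) -> List[dict]:
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def trend(rows: List[dict], metric: str, lower_is_better: bool = True) -> str:
    """Compare the earliest and latest snapshot on one metric."""
    if len(rows) < 2:
        return "insufficient data"
    first, last = rows[0][metric], rows[-1][metric]
    if first == last:
        return "flat"
    better = last < first if lower_is_better else last > first
    return "improving" if better else "regressing"
```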
[STATE-WP-0058]: handed off to the state-hub repo worker

@@ -0,0 +1,76 @@
---
id: AGENTIC-WP-0010
type: workplan
title: "Coding Session Memory — Weekly Retro entrypoint + hub publish"
domain: helix_forge
repo: agentic-resources
status: finished
owner: codex
topic_slug: helix-forge
created: "2026-06-07"
updated: "2026-06-07"
state_hub_workstream_id: "6b9816e4-65bc-4fc7-b8e1-33f4edd51e7a"
---
# Coding Session Memory — Weekly Retro entrypoint + hub publish
The **analysis half** of a weekly coding retrospection. A windowed retro runs
detect + measure over the previous week, ranks the **top-3 improvement
suggestions per repo** (impact × frequency, cross-flavor first; recommendations
pulled from the Pattern Catalog), and **publishes the ranked result to the State
Hub as a read model** (an `event_type=coding_retro` progress event, mirroring how
`daily-triage-report` publishes).
This is the dependency that activity-core's weekly schedule consumes
(`activity-wp-0008`, *Weekly Coding Retrospection schedule*). Keeping the analysis
here and publishing to the hub keeps activity-core decoupled from the
workstation-local session store.
## Windowed Weekly Retro Report (top-3 per repo)
```task
id: AGENTIC-WP-0010-T01
status: done
priority: high
state_hub_task_id: "34d30250-c0d3-4837-81c7-1c858c2ee801"
```
`retro/build.py`: window digests by date (last N days), run
`extract_signals` + `cluster` over the window, explode problem patterns per repo,
rank by score and cap at **3 per repo**. Attach a recommendation per suggestion
from the Pattern Catalog (lookup by pattern key → first resolution) with a sensible
default. Include a fleet measure snapshot for context. Pure function over digests;
unit-tested.
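The ranking step can be sketched in four steps. Group candidate suggestions by repo, sort cross-flavor signals first and then by descending score, keep the top three, and attach a recommendation. The candidate shape, the `key`-to-summary `catalog` dict, and the default text are all hypothetical. The real module looks up the first resolution of a catalog pattern.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Hypothetical fallback text when the catalog has no matching pattern.
DEFAULT_REC = "Investigate the recurring signal in this repo."


def _rank_key(c: dict):
    # Cross-flavor first (False sorts before True), then higher score first.
    return (not c["cross_flavor"], -c["score"])


def top_suggestions(candidates: List[dict],
                    catalog: Optional[Dict[str, str]] = None,
                    per_repo: int = 3) -> List[dict]:
    catalog = catalog or {}
    by_repo: Dict[str, List[dict]] = defaultdict(list)
    for c in candidates:
        by_repo[c["repo"]].append(c)
    ranked: List[dict] = []
    for items in by_repo.values():
        for c in sorted(items, key=_rank_key)[:per_repo]:
            ranked.append(dict(c, recommendation=catalog.get(c["key"], DEFAULT_REC)))
    return sorted(ranked, key=_rank_key)
```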
## Publish Retro to the Hub + Local Report
```task
id: AGENTIC-WP-0010-T02
status: done
priority: high
state_hub_task_id: "cbe1288a-ce51-48c0-b741-adf4a6cbce3a"
```
Publish the ranked retro to the State Hub as a read model: POST a progress event
(`event_type=coding_retro`) with the structured report (`suggestions[]`, window,
`generated_at`) in `detail`. Also write a local JSON + markdown report. **Degrade
gracefully** when the hub is unreachable (write locally, skip publish). Hub URL under
`[retro]` in `config.toml`.
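Below is a sketch of the publish step with graceful degradation. The `/progress/` path and the `event_type` and `detail` payload fields match what the tests assert. Everything else is an assumption: the default base URL, the 10-second timeout, and the absence of any further payload fields. The injectable `poster` makes the function testable without a live hub.

```python
import json
import urllib.request
from typing import Callable, Optional

Poster = Callable[[str, dict], None]


def _http_post(url: str, payload: dict) -> None:
    req = urllib.request.Request(
        url, data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(req, timeout=10):
        pass  # a non-2xx status raises HTTPError, a subclass of OSError


def publish_to_hub(report: dict, base_url: str = "http://localhost:8000",
                   poster: Optional[Poster] = None) -> bool:
    """POST the report as a coding_retro progress event; never raise."""
    payload = {"event_type": "coding_retro", "detail": report}
    try:
        (poster or _http_post)(base_url.rstrip("/") + "/progress/", payload)
    except (OSError, ValueError):  # hub down, timeout, bad URL: stay local-only
        return False
    return True
```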
## Retro Entrypoint + Tests + Live Verify
```task
id: AGENTIC-WP-0010-T03
status: done
priority: medium
state_hub_task_id: "af540220-58dd-4cf5-a9dc-6db4b995fa08"
```
`python -m session_memory.retro [--window-days 7] [--publish] [--json]`: windowed
retro → ranked top-3 per repo → optional hub publish + local report. Document in
`session_memory/README.md`. Live verify over the real local sessions. After
workplan updates, notify the operator to run from `~/state-hub`:
```bash
make fix-consistency REPO=agentic-resources
```
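The CLI surface named in T03 can be sketched with `argparse`, plus a helper that converts `--window-days` into an ISO `since`/`until` pair. `parse_args` and `window` are hypothetical names. The tests also exercise `--config`, `--since`, and `--until`, which this sketch leaves out.

```python
import argparse
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    ap = argparse.ArgumentParser(prog="python -m session_memory.retro")
    ap.add_argument("--window-days", type=int, default=7)
    ap.add_argument("--publish", action="store_true", help="POST the report to the hub")
    ap.add_argument("--json", action="store_true", help="print JSON instead of markdown")
    return ap.parse_args(argv)


def window(days: int, now: Optional[datetime] = None) -> Tuple[str, str]:
    """Return (since, until) as UTC ISO strings covering the last `days` days."""
    until = now or datetime.now(timezone.utc)
    since = until - timedelta(days=days)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return since.strftime(fmt), until.strftime(fmt)
```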

@@ -0,0 +1,99 @@
---
id: AGENTIC-WP-0011
type: workplan
title: "Helix Forge ↔ kaizen-agentic correlation — doc links and session-close contract"
domain: helix_forge
repo: agentic-resources
status: finished
owner: codex
topic_slug: helix-forge
created: "2026-06-19"
updated: "2026-06-21"
state_hub_workstream_id: "a689a145-645d-4ed4-a16c-00b75263a2c4"
---
# Helix Forge ↔ kaizen-agentic Correlation — Doc Links and Session-Close Contract
**Coordination trigger:** State Hub inbox message from `kaizen-agentic` (2026-06-15),
sent after KAIZEN-WP-0004: the ADR-004 correlation layer is live on the project-metrics side.
This repo already has a partial link in `docs/DESIGN-session-memory.md` §11 (commit
`a66d502`). The remaining work is bidirectional doc links, a session-close env-export
convention for agents using both layers, and a stable read path for
`kaizen-agentic metrics correlate`.
**Boundary:** link by reference only; there is no ingestion or write path from
kaizen-agentic into Helix Forge. The authoritative cross-repo contract stays in
kaizen-agentic: `docs/integrations/helix-forge-correlation.md`.
## Bidirectional Doc Links and Stale Design Footer
```task
id: AGENTIC-WP-0011-T01
status: done
priority: high
state_hub_task_id: "616ec197-58ce-4de0-bb7b-1c4a3f9a1e24"
```
Complete the bidirectional documentation surface requested by kaizen-agentic:
- Add a **Project metrics correlation** subsection to `docs/PRD-helix-forge.md`
(ecosystem integration / downstream consumers) linking to DESIGN §11 and the
kaizen contract doc.
- Add a short **Correlation with kaizen-agentic** section to
`session_memory/README.md` (two-layer model, link to contract, no re-implementation).
- Remove or replace the stale DESIGN §11 footer (*"Next step: AGENTIC-WP-0002"*);
Phase 04 and the retro work are finished.
## Session-Close Env Export Convention
```task
id: AGENTIC-WP-0011-T02
status: done
priority: high
state_hub_task_id: "1c20d5fa-83c2-46c7-b21e-be1ea241bae6"
```
Document the recommended environment export at session close for agents that run
**both** Helix Forge capture and kaizen `metrics record`. Cover at minimum:
| Variable | Source (Helix Forge) | Purpose |
|----------|----------------------|---------|
| `HELIX_SESSION_UID` | `Session.session_uid` after digest write | Primary correlation key |
| `HELIX_REPO` | session `repo` field | Project/repo scoping |
| `HELIX_FLAVOR` | session `flavor` | Agent runtime (claude/codex/grok) |
| `HELIX_TOKENS` | `digest.cost` totals | Token rollup for project metrics |
| `HELIX_INFRA_OVERHEAD_SHARE` | MCP/tool histogram share | Infra overhead attribution |
Put the canonical spec in DESIGN §11 and a concise operator note in
`session_memory/README.md`. Align field names with the kaizen-agentic ADR-004 mapping
table; do not invent parallel names.
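A sketch of deriving the export from a digested session. Plain dicts stand in for `Session` and the digest, and several details are assumptions to be replaced by the ADR-004 mapping: the `cost` keys (every `*_tokens` entry is summed), the `tool_histogram` shape (tool name → call count), and the `mcp__` prefix used to count infra calls, which is the Claude Code MCP naming convention and may differ for other flavors.

```python
import shlex

# Assumption: MCP tools are named mcp__<server>__<tool> (Claude Code convention).
MCP_PREFIX = "mcp__"


def helix_env(session, digest):
    """Map a digested session to the HELIX_* variables listed in the table."""
    cost = digest.get("cost", {})
    tokens = sum(v for k, v in cost.items() if k.endswith("_tokens"))  # hypothetical cost keys
    histogram = digest.get("tool_histogram", {})
    total_calls = sum(histogram.values())
    mcp_calls = sum(n for name, n in histogram.items() if name.startswith(MCP_PREFIX))
    share = mcp_calls / total_calls if total_calls else 0.0
    return {
        "HELIX_SESSION_UID": session["session_uid"],
        "HELIX_REPO": session["repo"],
        "HELIX_FLAVOR": session["flavor"],
        "HELIX_TOKENS": str(tokens),
        "HELIX_INFRA_OVERHEAD_SHARE": f"{share:.4f}",
    }


def export_lines(env):
    """Render shell `export` lines, quoting values so they are safe to eval."""
    return "\n".join(f"export {key}={shlex.quote(value)}" for key, value in env.items())
```

An agent's session-close hook could `eval` the output of `export_lines` so that a subsequent kaizen `metrics record` call sees the same correlation key.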
## Stable Digest Read Path for `metrics correlate`
```task
id: AGENTIC-WP-0011-T03
status: done
priority: medium
state_hub_task_id: "80e35324-d46e-4737-88e7-0316088e7ace"
```
Give `kaizen-agentic metrics correlate <uid>` a stable, documented read convention
across hosts:
- Document `HELIX_STORE_DB`, which defaults to `session_memory/.store/mem.db` (from
`[store].db_path` in `config.toml`) and should hold an absolute path when exported.
- Document the existing `Store.get_digest(session_uid)` JSON shape consumers may
rely on (outcome, cost, tool_histogram, repo, flavor, timestamps).
- Add a thin read entrypoint — e.g. `python -m session_memory.digest_lookup <uid>`
or `python -m session_memory.store --digest <uid> --json` — that prints one
digest without running a full ingest sweep. Unit-test the CLI; no new runtime
dependencies.
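A minimal sketch of a read-only lookup entrypoint. It opens the store through SQLite's `mode=ro` URI so a lookup can never write, and it resolves `HELIX_STORE_DB` before falling back to the default path. The `digests` table, the `digest_json` column, and the exit codes (0 found, 1 missing, 2 unreadable store) are hypothetical placeholders; a real entrypoint would call `Store.get_digest(session_uid)` instead of raw SQL.

```python
import argparse
import json
import os
import sqlite3
import sys
from contextlib import closing
from pathlib import Path

DEFAULT_DB = "session_memory/.store/mem.db"


def resolve_db_path(environ=os.environ):
    """HELIX_STORE_DB wins; otherwise use the documented default, made absolute."""
    return os.path.abspath(environ.get("HELIX_STORE_DB") or DEFAULT_DB)


def lookup_digest(db_path, session_uid):
    """Read one digest without an ingest sweep (hypothetical schema)."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"  # read-only open; fails if the file is missing
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        row = conn.execute(
            "SELECT digest_json FROM digests WHERE session_uid = ?", (session_uid,)
        ).fetchone()
    return json.loads(row[0]) if row else None


def main(argv=None):
    parser = argparse.ArgumentParser(prog="python -m session_memory.digest_lookup")
    parser.add_argument("session_uid")
    args = parser.parse_args(argv)
    try:
        digest = lookup_digest(resolve_db_path(), args.session_uid)
    except sqlite3.OperationalError as exc:
        print(f"cannot read store: {exc}", file=sys.stderr)
        return 2
    if digest is None:
        print(f"no digest for {args.session_uid}", file=sys.stderr)
        return 1
    print(json.dumps(digest, sort_keys=True))
    return 0


if __name__ == "__main__":
    sys.exit(main())
```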
After workplan updates, notify the operator to run from `~/state-hub`:
```bash
make fix-consistency REPO=agentic-resources
```
Mark the kaizen-agentic inbox message as read once T01 is merged or the workplan is
filed (coordination ack).