Compare commits

46 Commits

Author SHA1 Message Date
43bea485aa established rules 2026-06-22 23:06:36 +02:00
63eb431db9 Add .repo-classification.yaml (CUST-WP-0050 T11 agent first-pass) 2026-06-22 17:47:34 +02:00
3250a1746f chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-06-21:
  - update .custodian-brief.md for agentic-resources
2026-06-21 16:09:45 +02:00
41bfb6e0f3 workplan: finish AGENTIC-WP-0011 and sync State Hub IDs
Mark kaizen correlation follow-up finished; add workstream and task IDs
written by fix-consistency so hub and file stay aligned.
2026-06-21 16:09:34 +02:00
d2e50cf96a chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-06-19:
  - update .custodian-brief.md for agentic-resources
2026-06-19 20:37:50 +02:00
01d2affc3b Implement AGENTIC-WP-0011 kaizen correlation follow-up
Add bidirectional doc links (PRD §9.1, README, DESIGN §11), session-close
HELIX_* env convention, stable digest JSON contract, and digest_lookup CLI
for read-only correlate lookups. All tasks done; 163 tests green.
2026-06-19 20:27:00 +02:00
292b656952 workplan: AGENTIC-WP-0011 kaizen correlation follow-up
File ready workplan for bidirectional doc links, session-close env export
convention, and stable digest read path per kaizen-agentic coordination.
2026-06-19 20:24:39 +02:00
0a5ba5c24a docs: add credential routing guidance for agent runtimes
Inline ops-warden CredentialRouting canon into AGENTS.md and mirror it
as a Claude Code rule so agents route secret and access requests correctly.
2026-06-19 20:24:35 +02:00
a66d502b95 docs: add kaizen-agentic project metrics correlation (WP-0005 T16)
Link Helix Forge fleet session memory to kaizen-agentic ADR-004 project
metrics via helix_session_uid. Reciprocal reference to the cross-repo
correlation contract.
2026-06-16 07:13:07 +02:00
f9f91a0ca8 Add capability registry scaffold (REUSE-WP-0014-T03 B01)
Empty helix_forge registry layout for federation publishing.
2026-06-16 01:50:07 +02:00
06bcfdc1d9 session-memory: refresh published retro report artifacts
Latest retro publish (30-day window) regenerated last_retro.{json,md} — 30
ranked suggestions across 13 repos with catalog-sourced recommendations. This is
the read model published to the hub to unblock activity-core ACTIVITY-WP-0008.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 21:48:18 +02:00
e237dcc622 session-memory: map signals to catalog recommendations via covers (WP-0010 follow-up)
Closes the gap where recurring_error suggestions showed generic 'Investigate'
instead of the curated recommendation. Added a covers[] field to SolutionPattern
(lowercase substrings a pattern's recommendation also applies to) + Catalog.find_for
(exact key first, then covers match against signal key+locus). Retro now resolves
recommendations through find_for. Tagged the read-before-edit pattern with
covers=['file has not been read','modified since read','file_not_read'] (v1.0.1).
Live: file-not-read suggestions across all repos now inherit 'Read the file before
Edit/Write'. 6 new tests; suite 158/158.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 21:09:44 +02:00
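The two-step lookup described in e237dcc622 (exact key first, then a `covers` substring match against signal key plus locus) could look roughly like the sketch below. Only `SolutionPattern`, `covers`, `Catalog` and `find_for` are names from the commit message; the field names, the constructor, and the `locus` parameter shape are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SolutionPattern:
    key: str                     # exact signal key the pattern was curated for
    recommendation: str
    covers: list[str] = field(default_factory=list)  # lowercase substrings


class Catalog:
    def __init__(self, patterns):
        self._by_key = {p.key: p for p in patterns}

    def find_for(self, signal_key, locus=""):
        # Step 1: an exact key match always wins.
        exact = self._by_key.get(signal_key)
        if exact is not None:
            return exact
        # Step 2: fall back to a covers substring found in key + locus.
        haystack = f"{signal_key} {locus}".lower()
        for pattern in self._by_key.values():
            if any(sub in haystack for sub in pattern.covers):
                return pattern
        return None
```

With this shape, a `recurring_error` signal whose locus contains "File has not been read yet" resolves to the read-before-edit pattern instead of a generic fallback.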
0d05dfcc5d session-memory: weekly retro entrypoint + hub publish (AGENTIC-WP-0010)
The analysis half of the weekly coding retrospection. retro/build.py: windowed
detect+measure -> top-3 improvement suggestions per repo (cross-flavor first,
recommendations pulled from the Pattern Catalog) + fleet snapshot. retro/publish.py:
publishes the report to the hub as the coding_retro read model (event_type=
coding_retro progress event) + local JSON/md, graceful degrade. retro entrypoint
with --window-days/--publish/--json. Live verify over real sessions surfaced
per-repo suggestions with catalog recommendations. 13 new tests; suite 152/152.

Consumed by activity-core ACTIVITY-WP-0008 (Weekly Coding Retrospection, Sat 19:00).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 19:17:24 +02:00
15ba625351 session-memory: fill real resolutions into auto-approved catalog stubs
Replaced the placeholder 'TODO: capture the recommended resolution' in the five
auto-approved patterns with grounded problem descriptions + concrete resolutions
drawn from the friction assessment: budget_overrun (read narrowly / checkpoint),
infra_overhead (batch hub writes / orient once), schema_thrash (front-load tool
schemas), tool_thrash (batch shell + larger edits), clean_pass (tests gate done).
Each versioned 1.0.0 -> 1.0.1 with the stub archived to <id>.history.jsonl.
Proposals regenerate with real content (0 TODO). Suite 139/139.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 16:26:56 +02:00
4f28cd67cf session-memory: Phase 4 Measure — baseline, effectiveness, trend (WP-0009)
Closes the loop. metrics.py: fleet metrics (infra-overhead share, error rate,
schema-thrash, token percentiles, success) + persisted baseline trend. effect.py:
before/after per-pattern effectiveness with an improved verdict per metric.
measure entrypoint with trend + --since effectiveness + JSON. Recorded pre-fix
baseline: 27 sessions, overhead median 11.7%, error rate 0.96, schema-thrash 8.
13 new tests; suite 139/139. Capture->Detect->Curate->Distribute->Measure complete.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 15:49:22 +02:00
035c7a20d3 session-memory: Read-before-Edit reflex + curated pattern (WP-0008)
Acts on the #1 friction finding. T01: added a data-cited Read-before-Edit /
re-read-on-stale reflex to AGENTS.md (top error: 'File has not been read yet',
12/27 sessions). T02: captured it as a curated SolutionPattern
(sp-problem-file_not_read-edit, approved/distribution_ready) with real
resolutions + per-flavor hints, so Distribute proposes it across repos/flavors —
closing assess->curate->distribute on a real pattern. Suite 126/126.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 15:27:22 +02:00
59632e94db session-memory: distribute entrypoint + live verify (WP-0007 T05)
python -m session_memory.distribute: reads approved catalog patterns, builds
targets from repo->domain map x flavors, renders scoped per-flavor proposals
(HITL) + active registry. Live verify against the real catalog: 12 renders
across 5 repos, idempotent, provisional skipped. proposals/ gitignored
(regenerated); active_patterns.json committed. README documents detect->curate->
distribute. Phase 3 finished; suite 126/126.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 15:25:20 +02:00
00e8958540 session-memory: scoping + proposals + active registry (WP-0007 T04)
distribute/proposals.py: Scope-aware targeting (FR-X2, empty axis = any), render
distributable (approved+distribution_ready) patterns into a proposals/ tree
mirroring target paths — proposed not applied (FR-X3, HITL), idempotent on re-run.
ActiveRegistry (FR-X4) records which pattern+version is proposed in which
(repo,flavor). 6 new tests; suite 123/123.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 15:09:40 +02:00
9e28b1b806 session-memory: Claude + Codex + Grok distributors + registry (WP-0007 T02/T03)
Thin per-flavor distributors over the shared base: Claude (CLAUDE.md, optional
skill-stub mode), Codex (AGENTS.md), Grok (.grok/instructions.md). registry maps
flavor->distributor — adding a flavor is one entry + one module. Same agnostic
body renders to distinct per-flavor targets (FR-A3). 7 new tests; suite 117/117.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 15:06:15 +02:00
7646cbc358 session-memory: distributor base + Artifact (WP-0007 T01)
distribute/base.py: Artifact dataclass + Distributor protocol + idempotent
BEGIN/END snippet markers (upsert_block replaces a pattern's block in place so
re-distribution doesn't duplicate) + agnostic markdown body rendering from
SolutionPattern fields. BaseDistributor honours per-flavor body/target hints.
8 new tests; suite 110/110.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 15:02:47 +02:00
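A minimal sketch of the idempotent marker upsert that 7646cbc358 describes. The function name `upsert_block` comes from the commit message; the HTML-comment marker syntax and the signature are assumptions, not the actual contents of `distribute/base.py`.

```python
import re


def upsert_block(text: str, pattern_id: str, body: str) -> str:
    """Insert or replace the marked block for one pattern."""
    begin = f"<!-- BEGIN pattern:{pattern_id} -->"   # assumed marker format
    end = f"<!-- END pattern:{pattern_id} -->"
    block = f"{begin}\n{body.strip()}\n{end}"
    existing = re.compile(re.escape(begin) + r".*?" + re.escape(end), re.DOTALL)
    if existing.search(text):
        # Replace in place; the lambda keeps backslashes in body literal.
        return existing.sub(lambda _m: block, text, count=1)
    # First distribution: append the block after a blank line.
    if not text or text.endswith("\n\n"):
        sep = ""
    elif text.endswith("\n"):
        sep = "\n"
    else:
        sep = "\n\n"
    return f"{text}{sep}{block}\n"
```

Running the function twice with the same body returns the same text, which is the property that keeps re-distribution from duplicating a snippet.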
9e6f8a6e08 Register WP-0007 (Distribute), WP-0008 (Read-before-Edit), WP-0009 (Measure)
Three workplans queued and registered with the State Hub (via REST — MCP write
layer is erroring this session):
- AGENTIC-WP-0007 Phase 3 Distribute: per-flavor distributor adapters render
  approved catalog patterns into proposed (HITL) artifacts, scoped by repo/domain.
- AGENTIC-WP-0008 Read-before-Edit reflex: act on the #1 friction finding.
- AGENTIC-WP-0009 Phase 4 Measure: baseline + before/after effectiveness + trend.
Proceeding in that order.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 14:58:03 +02:00
ea03cbdd47 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-06-07:
  - update .custodian-brief.md for agentic-resources
2026-06-07 13:46:45 +02:00
1b6081cd88 session-memory: denoise error fingerprints (WP-0006 follow-up)
Tighten _is_failed: exclude successful hub JSON responses (top-level no-error
payloads) and file-read snapshots (numbered cat -n source lines) that were
polluting error_snippets. JSON verdict classifies error vs success payloads
directly. Cuts distinct fingerprints 444 -> 269 (~40%) over the real corpus with
the top errors unchanged. Assessment caveat updated. 5 new tests; suite 102/102.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 13:39:08 +02:00
7cce276d32 session-memory: error root-cause assessment + v2 re-ingest (WP-0006 T03)
Re-ingested under schema v2 (populates error_snippets) and re-ran detect over
27 real sessions. Added a 'content-level root causes' section to
docs/ASSESSMENT-infra-friction.md: top recurring error is Edit/Write-before-Read
(12/27 sessions, 8 repos), then stale-read conflicts, a cross-flavor (claude+grok)
make fix-consistency failure, and State Hub MCP instability. Documented a
fingerprint-noise caveat. WP-0006 finished; suite 98/98.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 13:09:29 +02:00
e022c0f9d6 session-memory: recurring-error signal + clustering (WP-0006 T02)
detect/signals.py sig_recurring_error emits one signal per distinct error
fingerprint per session (magnitude = in-session occurrences), so the same error
recurring across sessions/repos/flavors clusters into a candidate root-cause
problem pattern via the existing clusterer — cross-flavor flagged automatically.
3 new tests; suite 98/98 green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 13:01:29 +02:00
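The per-session signal and the cross-flavor clustering from e022c0f9d6 can be illustrated as follows. `sig_recurring_error` is named in the commit; the dictionary layout of sessions and signals and the `cluster` helper are hypothetical stand-ins for the existing clusterer.

```python
from collections import Counter, defaultdict


def sig_recurring_error(session, fingerprints):
    """One signal per distinct fingerprint; magnitude = in-session count."""
    counts = Counter(fingerprints)
    return [
        {
            "type": "recurring_error",
            "key": fp,
            "magnitude": n,
            "session": session["id"],
            "repo": session["repo"],
            "flavor": session["flavor"],
        }
        for fp, n in sorted(counts.items())
    ]


def cluster(signals):
    """Group signals by (type, key) and flag groups seen in several flavors."""
    groups = defaultdict(
        lambda: {"sessions": set(), "repos": set(), "flavors": set(), "magnitude": 0}
    )
    for sig in signals:
        group = groups[(sig["type"], sig["key"])]
        group["sessions"].add(sig["session"])
        group["repos"].add(sig["repo"])
        group["flavors"].add(sig["flavor"])
        group["magnitude"] += sig["magnitude"]
    return {
        key: {**group, "cross_flavor": len(group["flavors"]) > 1}
        for key, group in groups.items()
    }
```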
2bd6aa3b41 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-06-07:
  - update .custodian-brief.md for agentic-resources
2026-06-07 12:48:18 +02:00
97379e9658 session-memory: error-body mining into digest (WP-0006 T01)
build_digest now extracts normalized error fingerprints + samples from failed
events (error kind + failing tool_result bodies) into a durable error_snippets
list — paths/numbers/uuids/addrs stripped so the same error collapses to one
fingerprint with a count; Python traceback header skipped in favour of the real
exception line. Durable in Tier 2 (survives Tier 1 eviction). SCHEMA_VERSION ->
2 (re-ingest needed to populate). 7 new tests; suite 95/95 green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 12:45:01 +02:00
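To make the normalization in 97379e9658 concrete, here is a hedged sketch of a fingerprint function that strips paths, numbers, UUIDs and addresses and skips a Python traceback header. The regexes, placeholder tokens and the function name `fingerprint` are assumptions; the real `build_digest` logic may differ.

```python
import re
from collections import Counter

# Order matters: UUIDs and addresses go first so the digit rule cannot split them.
_RULES = [
    (re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b", re.I), "<uuid>"),
    (re.compile(r"\b0x[0-9a-f]+\b", re.I), "<addr>"),
    (re.compile(r"(?<!\w)(?:~|\.{1,2})?/[\w.\-/]+"), "<path>"),
    (re.compile(r"\d+"), "<n>"),
]


def fingerprint(error_text: str) -> str:
    lines = [ln.strip() for ln in error_text.strip().splitlines() if ln.strip()]
    if not lines:
        return ""
    if lines[0].startswith("Traceback (most recent call last)"):
        line = lines[-1]   # the real exception line closes the traceback
    else:
        line = lines[0]
    for rx, token in _RULES:
        line = rx.sub(token, line)
    return line.lower()


def count_fingerprints(errors):
    """Collapse raw error bodies into fingerprint -> occurrence count."""
    return Counter(fingerprint(e) for e in errors)
```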
dbd212d2b1 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-06-07:
  - update .custodian-brief.md for agentic-resources
2026-06-07 11:59:38 +02:00
896fde59f0 Register AGENTIC-WP-0006 (error-body mining) workplan
Captures normalized error fingerprints into the durable digest and clusters
recurring root-cause errors across sessions — closes the content-level 'why' gap
called out in the friction assessment. 3 tasks; we implement this in helix_forge.
(State Hub skill handed off to the state-hub worker as STATE-WP-0058.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 11:56:17 +02:00
48618293b0 session-memory: friction assessment + hardened catalog (WP-0005 T03)
Re-ran ingest->detect with the quality filter + infra signals over real local
sessions (72 captured -> 27 real). Purged the false-positive 'abandoned' catalog
entry and re-curated; catalog now carries tool_thrash/schema_thrash/infra_overhead
patterns. docs/ASSESSMENT-infra-friction.md ranks the friction: ~17.6% of real
tool activity is hub/task/schema plumbing (State Hub 10.3%, one session 231 calls;
ToolSearch in 81% of sessions). Validates the CLI/MCP-skill hypothesis as top-2;
recommends a State Hub skill (front-load schemas + batched writes) + bulk hub ops.
Workplan finished; suite 88/88.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 11:18:27 +02:00
21c714e286 session-memory: infra-overhead + thrash signals (WP-0005 T02)
signals.py: tool_bucket helper + three tool_histogram-based extractors that the
outcome/marker signals were blind to — sig_infra_overhead (hub+task+schema share
of tool calls over threshold), sig_schema_thrash (repeated ToolSearch), and
sig_tool_thrash (one tool dominating). Thresholds in build_context. 8 new tests;
suite 88/88 green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 11:12:09 +02:00
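The infra-overhead signal from 21c714e286 boils down to a share of tool calls compared against a threshold. The sketch below assumes a `tool_histogram` shaped as a name-to-count mapping; the bucket rules (the `mcp__` prefix, `TodoWrite`) and the default threshold are illustrative guesses, not the values in `build_context`.

```python
SCHEMA_TOOLS = {"ToolSearch"}
TASK_TOOLS = {"TodoWrite"}   # hypothetical task-tracking tool name
HUB_PREFIX = "mcp__"         # hypothetical prefix for hub MCP tools


def tool_bucket(name: str) -> str:
    if name in SCHEMA_TOOLS:
        return "schema"
    if name in TASK_TOOLS:
        return "task"
    if name.startswith(HUB_PREFIX):
        return "hub"
    return "work"


def infra_share(tool_histogram: dict) -> float:
    """Fraction of tool calls spent on hub, task and schema plumbing."""
    total = sum(tool_histogram.values())
    if total == 0:
        return 0.0
    infra = sum(n for name, n in tool_histogram.items() if tool_bucket(name) != "work")
    return infra / total


def sig_infra_overhead(tool_histogram: dict, threshold: float = 0.15):
    """Emit a signal only when the plumbing share exceeds the threshold."""
    share = infra_share(tool_histogram)
    if share > threshold:
        return {"type": "infra_overhead", "magnitude": share}
    return None
```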
70433cda61 session-memory: session-quality filter (WP-0005 T01)
detect/quality.py: is_real_coding_session drops health-checks / smoke-tests /
interrupted / trivially-short sessions (event floor, repo present, substantive
tool activity, non-trivial prompt). Wired into run_detect so signals only form
over real sessions — fixes the abandoned false-positive. [detect.quality] knobs;
existing detect/curate fixtures made realistic. 8 new tests; suite 80/80.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 11:07:22 +02:00
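As an illustration of the four checks listed in 70433cda61 (event floor, repo present, substantive tool activity, non-trivial prompt), a filter could be sketched like this. The digest field names other than `tool_histogram` and all default thresholds are assumptions; the real knobs live under `[detect.quality]`.

```python
def is_real_coding_session(
    digest: dict,
    *,
    min_events: int = 10,
    min_tool_calls: int = 3,
    min_prompt_chars: int = 20,
) -> bool:
    # Event floor: drops trivially short sessions.
    if digest.get("event_count", 0) < min_events:
        return False
    # Repo present: health checks often run outside any repo.
    if not digest.get("repo"):
        return False
    # Substantive tool activity.
    if sum(digest.get("tool_histogram", {}).values()) < min_tool_calls:
        return False
    # Non-trivial prompt: filters "ping"-style smoke tests.
    if len(digest.get("first_prompt", "").strip()) < min_prompt_chars:
        return False
    return True
```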
56b2f576de AGENTIC-WP-0001: complete T02 + close bootstrap workplan
T02 was the one genuinely-incomplete bootstrap task: AGENTS.md had no
dev-workflow section. Added one documenting the pure-stdlib Python 3.11+
toolchain, pytest, and the session_memory ingest/detect/curate entrypoints so
future sessions can verify changes. T01 (integration files) and T03 (first real
workplan) were already satisfied; reconciled stale ready/todo bookkeeping to
finished/done.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 10:15:23 +02:00
d06791f070 session-memory Phase 2: verify + catalog artifacts (T07)
End-to-end verification over real local sessions: ingest 94->93 -> 72 digests;
detect 3 candidates (2 cross-flavor); curate --auto-approve cataloged 3
SolutionPatterns (2 cross-flavor approved/distribution_ready, 1 Claude-only),
re-run fully idempotent, 3 hub decisions queued (API offline). Commits the 3
catalog artifacts as the source of truth. PRD §12 OQ4/OQ5/OQ6 marked resolved;
README + design refreshed. Workplan finished; suite 72/72.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 10:08:52 +02:00
519e76442a session-memory Phase 2: curate entrypoint + README (T06)
python -m session_memory.curate: refreshes detect candidates, then drives them
through review interactively or with --auto-approve (batch, gate-driven) /
--json. Emits a catalog diff summary; queues hub decisions when offline.
[curate] config gains decision_queue + workstream id. README documents the
detect -> curate -> distribute flow and the gate knobs. 2 new tests; suite 72/72.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 10:00:56 +02:00
4b7a628b6f session-memory Phase 2: hub decision integration (T05)
decisions.py: every final promote/reject becomes a record_decision-shaped
payload (rationale + source key + evidence snapshot). DecisionRecorder degrades
gracefully under a hub outage — pluggable sink with a durable local-queue
fallback and ordered flush/replay (mirrors Phase 1's after-the-fact sync).
Wired into review() via an optional recorder. 6 new tests; suite 70/70 green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 00:31:22 +02:00
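The graceful-degrade behaviour from 4b7a628b6f (pluggable sink, durable local queue, ordered replay) might be sketched as below. `DecisionRecorder` is the name used in the commit; the method names, the JSON-lines queue format and the return values are assumptions for illustration.

```python
import json
from pathlib import Path


class DecisionRecorder:
    def __init__(self, sink, queue_path):
        self.sink = sink                  # callable that sends one payload to the hub
        self.queue_path = Path(queue_path)

    def record(self, payload: dict) -> str:
        try:
            self.flush()                  # replay the backlog first to keep order
            self.sink(payload)
            return "sent"
        except Exception:
            self._enqueue(payload)
            return "queued"

    def flush(self) -> int:
        """Replay queued payloads in order; keep whatever could not be sent."""
        if not self.queue_path.exists():
            return 0
        lines = self.queue_path.read_text().splitlines()
        pending = [json.loads(ln) for ln in lines if ln.strip()]
        sent = 0
        try:
            for item in pending:
                self.sink(item)
                sent += 1
        finally:
            remaining = pending[sent:]
            if remaining:
                self.queue_path.write_text(
                    "".join(json.dumps(r) + "\n" for r in remaining)
                )
            else:
                self.queue_path.unlink()
        return sent

    def _enqueue(self, payload: dict) -> None:
        self.queue_path.parent.mkdir(parents=True, exist_ok=True)
        with self.queue_path.open("a") as fh:
            fh.write(json.dumps(payload) + "\n")
```

Flushing before each send means a decision made during an outage is never overtaken by a later one once the hub is reachable again.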
ab22d22bfb session-memory Phase 2: evidence-bar + bloat guard (T04)
gating.py: two-tier evidence bar (OQ5) — promote floor (frequency/sessions/
cost_impact) plus a stricter distribution-eligibility floor that sets a
promoted pattern to approved+distribution_ready vs provisional. Wired into
review() so thin approvals land provisional. bloat_warnings flags duplicate
and near-duplicate (same signal-type+locus) candidates (OQ6). [curate]/
[curate.gate] knobs in config.toml. 6 new tests; suite 64/64 green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 00:28:34 +02:00
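The two-tier evidence bar from ab22d22bfb can be read as two floors checked in sequence. In this sketch the metric names (`frequency`, `sessions`, `cost_impact`) come from the commit message, while the `Floor` class, the outcome labels and any threshold values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Floor:
    frequency: int
    sessions: int
    cost_impact: float

    def met_by(self, evidence: dict) -> bool:
        return (
            evidence["frequency"] >= self.frequency
            and evidence["sessions"] >= self.sessions
            and evidence["cost_impact"] >= self.cost_impact
        )


def gate(evidence: dict, promote: Floor, distribute: Floor) -> str:
    """Classify a candidate against the promote and distribution floors."""
    if not promote.met_by(evidence):
        return "below_bar"              # not promotable yet
    if distribute.met_by(evidence):
        return "distribution_ready"     # approved and eligible for distribution
    return "provisional"                # promoted, but evidence is still thin
```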
e51fd8154d session-memory Phase 2: review workflow (T03)
UI-free discuss/approve/reject engine driving detect candidates into the
catalog via a decide callback. candidate_to_pattern builds a provisional
SolutionPattern with per-flavor rendering-hint stubs. ReviewLog makes
re-review idempotent: prior rejects remembered, re-surfaced only when the
evidence fingerprint changes. 6 new tests; suite 58/58 green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 00:25:10 +02:00
c6164a82ba session-memory Phase 2: versioned Pattern Catalog store (T02)
Files-first catalog (one JSON per pattern, id = source-key). Single
idempotent upsert path: added / unchanged / updated (status-only, no bump) /
versioned (content change bumps semver + archives prior to <id>.history.jsonl).
Dedup is structural on pattern id. 5 new tests; suite 52/52 green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 00:18:01 +02:00
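The four upsert outcomes named in c6164a82ba (added, unchanged, updated, versioned) can be sketched with plain JSON files. The dict-based pattern shape and the choice to bump the patch component are assumptions; the commit only says a content change bumps semver and archives the prior version to `<id>.history.jsonl`.

```python
import json
from pathlib import Path


def bump_patch(version: str) -> str:
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor}.{patch + 1}"


def _content(record: dict) -> dict:
    # Status and version are bookkeeping, not pattern content.
    return {k: v for k, v in record.items() if k not in ("status", "version")}


def upsert(catalog_dir, pattern: dict) -> str:
    """Return one of: added, unchanged, updated, versioned."""
    catalog_dir = Path(catalog_dir)
    catalog_dir.mkdir(parents=True, exist_ok=True)
    path = catalog_dir / f"{pattern['id']}.json"
    if not path.exists():
        record = {**pattern, "version": pattern.get("version", "1.0.0")}
        path.write_text(json.dumps(record, indent=2, sort_keys=True))
        return "added"
    prior = json.loads(path.read_text())
    if _content(prior) == _content(pattern):
        if prior.get("status") == pattern.get("status"):
            return "unchanged"
        prior["status"] = pattern.get("status")   # status-only change: no bump
        path.write_text(json.dumps(prior, indent=2, sort_keys=True))
        return "updated"
    # Content changed: archive the prior version, then bump.
    with (catalog_dir / f"{pattern['id']}.history.jsonl").open("a") as fh:
        fh.write(json.dumps(prior, sort_keys=True) + "\n")
    record = {**pattern, "version": bump_patch(prior["version"])}
    path.write_text(json.dumps(record, indent=2, sort_keys=True))
    return "versioned"
```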
5f810a6992 session-memory Phase 2: Solution Pattern schema (T01)
Curate package scaffold + flavor-agnostic SolutionPattern artifact with
separate per-flavor rendering hints (OQ4): Resolution/Scope/Provenance
sub-records, stable source-key id, semver bump helper, deterministic
round-trip serialization. 7 new tests; suite 47/47 green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 00:16:46 +02:00
43d76b5cf8 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-06-07:
  - update .custodian-brief.md for agentic-resources
2026-06-07 00:11:12 +02:00
055713aa4f session-memory Phase 1: T08 verify across all three flavors + docs
Marks AGENTIC-WP-0003 finished. Full suite 40/40 green; live pipeline
over real local sessions (Codex via fixtures) surfaced 3 candidate
patterns, 2 cross-flavor (Claude+Grok) — PRD success metric met.
README documents the detect entrypoint and Phase 0/1/next status.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 23:39:37 +02:00
436a96dcd8 session-memory Phase 1: Detect pipeline (T04-T07)
- detect/signals.py: pure extractors over digests (retry storm, repeated
  errors, budget overrun vs corpus p90, abandoned, clean pass, recovery)
- detect/cluster.py: deterministic clustering into candidate Patterns with
  evidence (sessions/repos/flavors/cost impact) + cross-flavor flagging
- detect/__main__.py: python -m session_memory.detect, ranked report
  (cross-flavor first) + --json; persists candidates to Tier 2 patterns table
- core/store.py: list_digests + save_patterns
- tests for signals, cluster, detect entrypoint

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 22:31:13 +02:00
06767ef924 session-memory Phase 1: Grok adapter (T02)
- adapters/grok.py: reads the per-session dir (summary.json + chat_history.jsonl
  + events.jsonl + updates.jsonl); conversation from chat_history, lifecycle/
  turn from events, tool-call names paired in order from updates ACP stream
- registered in ingest dispatch; codex+grok sources enabled in config.toml
- tests/test_grok_adapter.py (synthetic + real local sessions)
- live multi-flavor dry-run discovers 89 sessions across flavors

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 22:12:30 +02:00
bc11cb9aec session-memory Phase 1: Codex adapter (T01) + multi-file merge (T03)
- adapters/common.py: shared Normalized + helpers (resolve_repo, classify_tool,
  jsonl iter, etc.); claude.py refactored to use it (Normalized re-exported)
- adapters/codex.py: rollout {timestamp,type,payload} parser; session_meta/
  response_item/event_msg mapping; flat call_id join; token_count cost;
  registered in ingest dispatch
- core/store.py: ingest() now merges multi-file sessions by content
  fingerprint, appends new events with offset seq (design OQ6); idempotent
- tests/test_codex_adapter.py, tests/test_merge.py

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 21:55:32 +02:00
5aea22f24f Register AGENTIC-WP-0003 (session-memory Phase 1) with State Hub
Codex + Grok adapters, multi-file session merge, and the Detect pipeline
(signals -> clustering -> evidence -> candidate report).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 21:50:23 +02:00
114 changed files with 8016 additions and 121 deletions

20
.claude/rules/agents.md Normal file

@@ -0,0 +1,20 @@
## Kaizen Agents
Specialized agent personas available on demand via the state-hub MCP.
**Discover:** `list_kaizen_agents()` — returns all agents with name, description, category
**Load:** `get_kaizen_agent("tdd-workflow")` — returns full instructions; read and follow them
Common agents:
| Agent | Category | When to use |
|-------|----------|-------------|
| `tdd-workflow` | testing | Step-by-step TDD8 workflow for any feature |
| `code-refactoring` | quality | Code quality analysis and safe refactoring |
| `test-maintenance` | testing | Diagnose and fix failing tests |
| `requirements-engineering` | process | Prevent interface/mock mismatches upfront |
| `keepaTodofile` | process | Maintain TODO.md during work |
| `project-management` | process | Track status, determine next steps |
| `datamodel-optimization` | quality | Optimize dataclasses and data structures |
All 17 agents: call `list_kaizen_agents()` for the full list.


@@ -0,0 +1,8 @@
## Architecture
<!-- TODO: Describe the key design decisions and component structure.
Key modules, data flows, external integrations, state machines, etc. -->
## Quick Reference
`~/state-hub/mcp_server/TOOLS.md` — MCP tool reference


@@ -0,0 +1,50 @@
# Credential and access routing
**Audience:** Codex, Claude Code, Grok, and custodian agents that call **llm-connect**
for inference. Run this check **before** requesting secrets, API keys, SSH access,
login tokens, or database passwords — in any repo, not only `ops-warden`.
ops-warden **issues SSH certificates only** (`warden sign`, `cert_command`). Every
other credential need belongs to another subsystem. **Do not** message
`ops-warden` on State Hub expecting a secret value; the reply is a pointer, not a key.
### Lookup (do this first)
```bash
warden route find "<describe your need>" --json
warden route show <catalog-id> --json
```
Requires the `warden` CLI from `~/ops-warden` (`uv tool install .` or `uv run warden`).
| Agent runtime | How to orient |
| --- | --- |
| **Codex / Grok** (shell, HTTP State Hub) | `warden route` commands above; inbox `to_agent=agentic-resources` is for coordination, not secret vending |
| **Claude Code** (MCP when available) | `get_domain_summary("custodian")` for workstreams; **still** use `warden route` for credential ownership |
| **llm-connect** (inference service) | Never put secret retrieval in prompts; route custody to OpenBao/operator paths surfaced by `warden route` |
### Quick routing table
| I need… | Owner | ops-warden executes? |
| --- | --- | --- |
| SSH cert (`adm`/`agt`/`atm`) | ops-warden | **Yes**: `warden sign` |
| API key, DB password, provider token | OpenBao (`railiance-platform`) | No — route only |
| Login / OIDC / MFA | key-cape / Keycloak | No — route only |
| Authorization decision | flex-auth | No — route only |
| activity-core → issue-core emission | activity-core + issue-core | No — `warden route show activity-core-issue-sink` |
| SSH tunnel | ops-bridge (+ `cert_command` from warden) | No — route only |
### Anti-patterns (do not do these)
- `POST /messages/` to `ops-warden` asking for `ISSUE_CORE_API_KEY`, `OPENROUTER_API_KEY`, etc.
- Inventing `warden secret`, `warden login`, `warden bao`, `warden tunnel` — they do not exist
- Pasting secrets into Git, State Hub, workplans, logs, or chat
### Other capabilities (reuse-surface)
Non-credential capabilities are usually discovered through **reuse-surface** federation
(`reuse-surface` registry / `capability.*` indexes). Credential routing is inlined in
every repo's agent instructions because it is high-frequency, high-risk, and easy to
get wrong.
**Canon:** `~/ops-warden/wiki/CredentialRouting.md` · catalog `~/ops-warden/registry/routing/catalog.yaml`


@@ -0,0 +1,38 @@
## First Session Protocol
Triggered when `get_domain_summary("infotech")` shows **no workstreams**.
The project is registered but work has not yet been structured.
**Step 1 — Read, don't write**
- `~/the-custodian/canon/projects/infotech/project_charter_v0.1.md` — purpose, scope
- `~/the-custodian/canon/projects/infotech/roadmap_v0.1.md` — planned phases
- Scan repo root: README, directory structure, existing code or docs
**Step 2 — Survey in-progress work**
Look for TODOs, open branches, half-finished files. Note done vs. started but incomplete.
**Step 3 — Propose workstreams to Bernd**
Propose 1–3 workstreams — each a coherent strand, weeks to months, anchored to a
roadmap phase. **Wait for approval before creating.**
**Step 4 — Create workplan file first, then DB record (ADR-001)**
```
workplans/AGENTIC-WP-NNNN-<slug>.md ← write this first
```
Then register in the hub:
```
create_workstream(topic_id="f39fa2a3-c491-414c-a91b-b4c5fcc6139c", title="...", owner="...", description="...")
create_task(workstream_id="<id>", title="...", priority="high|medium|low")
```
**Step 5 — Record the setup**
```
add_progress_event(
summary="First session: structured infotech into N workstreams, M tasks",
event_type="milestone",
topic_id="f39fa2a3-c491-414c-a91b-b4c5fcc6139c",
detail={"workstreams": [...], "tasks_created": M}
)
```
<!-- Delete or archive this file once past first session -->


@@ -0,0 +1,8 @@
## Repo boundary
This repo owns **agentic-resources** only. It does not own:
<!-- TODO: List what belongs in adjacent repos, e.g.:
- SSH key management → railiance-infra/
- State hub code → state-hub/
-->


@@ -0,0 +1,5 @@
**Purpose:** Iterating towards optimal agentic performance.
**Domain:** infotech
**Repo slug:** agentic-resources
**Topic ID:** f39fa2a3-c491-414c-a91b-b4c5fcc6139c


@@ -0,0 +1,85 @@
## Session Protocol
Dev Hub (State Hub API): http://127.0.0.1:8000
MCP server name in `~/.claude.json`: `dev-hub`
**Step 1 — Orient**
Read the offline-safe brief first — it works without a live hub connection:
```bash
cat .custodian-brief.md
```
Then call the MCP tool for richer cross-domain context when MCP tools are exposed:
```
get_domain_summary("infotech")
```
If MCP tools are unavailable in the current agent session, use the REST API:
```bash
curl -s "http://127.0.0.1:8000/state/summary" | python3 -m json.tool
```
If the hub is offline: `cd ~/state-hub && make api`
**Step 2 — Check inbox**
With MCP tools:
```
get_messages(to_agent="agentic-resources", unread_only=True)
```
Mark read with `mark_message_read(message_id)`. Reply or act on coordination
requests before proceeding.
Without MCP tools:
```bash
curl -s "http://127.0.0.1:8000/messages/?to_agent=agentic-resources&unread_only=true" \
| python3 -m json.tool
curl -s -X PATCH "http://127.0.0.1:8000/messages/<id>/read" \
-H "Content-Type: application/json" -d '{}'
```
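For agents that prefer Python over `curl`, the same two inbox calls can be sketched with the standard library. The endpoints and query parameters are the ones shown above; the helper names are hypothetical, and the response is returned as raw JSON because its shape is not specified here.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

HUB = "http://127.0.0.1:8000"


def unread_request(agent: str, hub: str = HUB) -> Request:
    """Build the GET request that lists unread messages for an agent."""
    query = urlencode({"to_agent": agent, "unread_only": "true"})
    return Request(f"{hub}/messages/?{query}", method="GET")


def mark_read_request(message_id: str, hub: str = HUB) -> Request:
    """Build the PATCH request that marks one message as read."""
    return Request(
        f"{hub}/messages/{message_id}/read",
        data=b"{}",
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )


def fetch_unread(agent: str, hub: str = HUB, timeout: float = 5.0):
    """Send the inbox request; requires a running hub."""
    with urlopen(unread_request(agent, hub), timeout=timeout) as resp:
        return json.load(resp)
```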
**Step 3 — Scan workplans**
```bash
ls workplans/
```
For each file with `status: ready`, `active`, or `blocked`, note pending
`wait`/`todo`/`progress` tasks.
**Step 4 — Present brief**
1. **Active workstreams** for `infotech` — title, task counts, blocking decisions
2. **Pending tasks** from `workplans/` + any `[repo:agentic-resources]` hub tasks
3. **Goal guidance** — if `goal_guidance` in summary:
- `needs_workplan`: surface as top action — *"Repo goal '{title}' has no workplan yet"*
- `alignment_warnings`: flag if active work is not aligned with current goal
4. **Suggested next action** — highest-priority open item
5. **SBOM status** — flag if `last_sbom_at` is unset for this repo
If no workstreams: follow First Session Protocol (`first-session.md`).
**During work:** `record_decision()` · `add_progress_event()` · `resolve_decision()`
> State Hub is a *read model*. Bootstrap tools (`create_workstream`, `create_task`)
> are First Session Protocol only. Work structure belongs in repo files (ADR-001).
**Session close:**
With MCP tools:
```
add_progress_event(summary="...", topic_id="f39fa2a3-c491-414c-a91b-b4c5fcc6139c", workstream_id="<uuid>")
```
Without MCP tools:
```bash
curl -s -X POST http://127.0.0.1:8000/progress/ \
-H "Content-Type: application/json" \
-d '{"topic_id":"f39fa2a3-c491-414c-a91b-b4c5fcc6139c","workstream_id":"<uuid>","event_type":"note","summary":"what changed","author":"codex"}'
```
If workplan files were modified, ensure the local copy is up to date first:
```bash
git -C <repo_path> pull --ff-only
cd ~/state-hub && make fix-consistency REPO=agentic-resources
```
For repos where implementation runs on a remote machine (e.g. CoulombCore),
use the combined target which pulls before fixing:
```bash
cd ~/state-hub && make fix-consistency-remote REPO=agentic-resources
```
**C-15** (DB task ahead of file) is normal in multi-machine workflows — writeback
will sync the file to match DB. **C-16** (repo behind remote) blocks all writes
until you pull — intentional to prevent clobbering remote progress.


@@ -0,0 +1,19 @@
## Stack
<!-- TODO: Fill in language, frameworks, and key dependencies -->
- **Language:**
- **Key deps:**
## Dev Commands
```bash
# TODO: Fill in the standard commands for this repo
# Install dependencies
# Run tests
# Lint / type check
# Build / package (if applicable)
```


@@ -0,0 +1,40 @@
## Workplan Convention (ADR-001)
File location: `workplans/AGENTIC-WP-NNNN-<slug>.md`
ID prefix: `AGENTIC-WP-`
Work items originate as files in this repo **before** being registered in the hub.
Canonical workplan/workstream frontmatter statuses are:
`proposed`, `ready`, `active`, `blocked`, `backlog`, `finished`, `archived`.
Use `proposed` for a newly drafted plan, `ready` after review against current
repo state, and `finished` when implementation is complete. `stalled` and
`needs_review` are derived health labels, not stored statuses.
Closed workplans may be moved to `workplans/archived/` with a completion-date
prefix: `YYMMDD-AGENTIC-WP-NNNN-<slug>.md`. The frontmatter id remains
unchanged; the prefix is only for quick visual reference.
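The archive naming rule can be expressed in a few lines. This is only a sketch: the helper name `archived_path` is hypothetical and it computes the target path without moving anything.

```python
from datetime import date
from pathlib import Path


def archived_path(workplan, completed: date) -> Path:
    """Return workplans/archived/YYMMDD-<original name> for a closed workplan."""
    workplan = Path(workplan)
    return workplan.parent / "archived" / f"{completed:%y%m%d}-{workplan.name}"
```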
Small opportunistic tasks discovered during another session use **Ad Hoc Tasks**:
`workplans/ADHOC-YYYY-MM-DD.md`, workstream slug `adhoc-YYYY-MM-DD`, and task ids
`ADHOC-YYYY-MM-DD-T01`, `T02`, etc. Use adhocs only for low-risk work completed
directly. Promote anything requiring analysis, design, approval, dependencies, or
multiple planned phases into a normal workplan.
Ecosystem todos from other agents arrive as `[repo:agentic-resources]` hub tasks —
visible at session start. Pick one up by creating the workplan file, then registering
the workstream.
Task blocks use this shape:
```task
id: AGENTIC-WP-NNNN-T01
status: wait | todo | progress | done | cancel
priority: high | medium | low
state_hub_task_id: "<uuid>" # written by fix-consistency — do not edit
```
Status progression is `todo` → `progress` → `done`; use `wait` for waiting or
blocked work and `cancel` for stopped work.
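A task block of this shape is simple enough to validate with a small parser. The sketch below assumes one `key: value` pair per line and a trailing ` #` comment; the function name and the error handling are illustrative, not part of fix-consistency.

```python
VALID_STATUS = {"wait", "todo", "progress", "done", "cancel"}
VALID_PRIORITY = {"high", "medium", "low"}


def parse_task_block(block: str) -> dict:
    """Parse the body of a task block and check status and priority."""
    fields = {}
    for raw in block.strip().splitlines():
        line = raw.split(" #", 1)[0].strip()    # drop trailing comments
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        fields[key.strip()] = value.strip().strip('"')
    if fields.get("status") not in VALID_STATUS:
        raise ValueError(f"invalid status: {fields.get('status')!r}")
    if fields.get("priority") not in VALID_PRIORITY:
        raise ValueError(f"invalid priority: {fields.get('priority')!r}")
    return fields
```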
<!-- Ralph Loop rules and HEUREKA sequence: ~/.claude/CLAUDE.md — do not duplicate here -->


@@ -2,18 +2,12 @@
# Custodian Brief — agentic-resources
**Domain:** helix_forge
-**Last synced:** 2026-06-05 22:10 UTC
+**Last synced:** 2026-06-21 14:09 UTC
**State Hub:** http://127.0.0.1:8000 *(adjust if running on a remote machine)*
## Active Workstreams
-### Bootstrap State Hub integration
-Progress: 0/3 done | workstream_id: `bb9a43a3-a54f-434b-97c2-e1c7142b52f5`
-**Open tasks:**
-- · Review Generated Integration Files `3ad7b7a9`
-- · Verify Local Developer Workflow `db248d57`
-- · Seed First Real Workplan `9cbb7aa5`
+*(none — repo may need first-session setup)*
---
## MCP Orientation (when available)

.gitignore vendored

@@ -177,6 +177,8 @@ cython_debug/
# session-memory local store
session_memory/.store/
# generated per-flavor distribution proposals (HITL, regenerated each run)
session_memory/proposals/
__pycache__/
*.pyc
.pytest_cache/

.repo-classification.yaml Normal file

@@ -0,0 +1,18 @@
repo_classification:
standard: Repo Classification Standard
version: '1.0'
classified_at: '2026-06-22'
classified_by: agent
category: project
domain: infotech
secondary_domains: []
capability_tags:
- automation
- orchestration
business_stake:
- technology
- product
- operations
business_mechanics:
- coordination
- operation

@@ -4,7 +4,7 @@
**Purpose:** Iterating towards optimal agentic performance.
**Domain:** helix_forge
**Domain:** infotech
**Repo slug:** agentic-resources
**Topic ID:** `f39fa2a3-c491-414c-a91b-b4c5fcc6139c`
**Workplan prefix:** `AGENTIC-WP-`
@@ -101,6 +101,63 @@ curl -s -X PATCH "http://127.0.0.1:8000/tasks/<task_id>" \
---
## Credential and access routing
**Audience:** Codex, Claude Code, Grok, and custodian agents that call **llm-connect**
for inference. Run this check **before** requesting secrets, API keys, SSH access,
login tokens, or database passwords — in any repo, not only `ops-warden`.
ops-warden **issues SSH certificates only** (`warden sign`, `cert_command`). Every
other credential need belongs to another subsystem. **Do not** message
`ops-warden` on State Hub expecting a secret value; the reply is a pointer, not a key.
### Lookup (do this first)
```bash
warden route find "<describe your need>" --json
warden route show <catalog-id> --json
```
Requires the `warden` CLI from `~/ops-warden` (`uv tool install .` or `uv run warden`).
| Agent runtime | How to orient |
| --- | --- |
| **Codex / Grok** (shell, HTTP State Hub) | `warden route` commands above; inbox `to_agent=agentic-resources` is for coordination, not secret vending |
| **Claude Code** (MCP when available) | `get_domain_summary("custodian")` for workstreams; **still** use `warden route` for credential ownership |
| **llm-connect** (inference service) | Never put secret retrieval in prompts; route custody to OpenBao/operator paths surfaced by `warden route` |
### Quick routing table
| I need… | Owner | ops-warden executes? |
| --- | --- | --- |
| SSH cert (`adm`/`agt`/`atm`) | ops-warden | **Yes** — `warden sign` |
| API key, DB password, provider token | OpenBao (`railiance-platform`) | No — route only |
| Login / OIDC / MFA | key-cape / Keycloak | No — route only |
| Authorization decision | flex-auth | No — route only |
| activity-core → issue-core emission | activity-core + issue-core | No — `warden route show activity-core-issue-sink` |
| SSH tunnel | ops-bridge (+ `cert_command` from warden) | No — route only |
### Anti-patterns (do not do these)
- `POST /messages/` to `ops-warden` asking for `ISSUE_CORE_API_KEY`, `OPENROUTER_API_KEY`, etc.
- Inventing `warden secret`, `warden login`, `warden bao`, `warden tunnel` — they do not exist
- Pasting secrets into Git, State Hub, workplans, logs, or chat
### Other capabilities (reuse-surface)
Non-credential capabilities are usually discovered through **reuse-surface** federation
(`reuse-surface` registry / `capability.*` indexes). Credential routing is inlined in
every repo's agent instructions because it is high-frequency, high-risk, and easy to
get wrong.
**Canon:** `~/ops-warden/wiki/CredentialRouting.md` · catalog `~/ops-warden/registry/routing/catalog.yaml`
<!-- REPO-AGENTS-EXTENSIONS -->
<!-- Append repo-specific agent instructions below this marker.
The state-hub template sync preserves content after this line. -->
---
## Workplan Convention (ADR-001)
Work items originate as files in this repo — not in the hub. The hub is a
@@ -124,7 +181,7 @@ anything needing analysis, design, approval, dependencies, or multiple phases.
id: AGENTIC-WP-NNNN
type: workplan
title: "..."
domain: helix_forge
domain: infotech
repo: agentic-resources
status: proposed | ready | active | blocked | backlog | finished | archived
owner: codex

CLAUDE.md Normal file

@@ -0,0 +1,12 @@
# agentic-resources — Claude Code Instructions
@SCOPE.md
@.claude/rules/repo-identity.md
@.claude/rules/session-protocol.md
@.claude/rules/first-session.md
@.claude/rules/workplan-convention.md
@.claude/rules/stack-and-commands.md
@.claude/rules/architecture.md
@.claude/rules/repo-boundary.md
@.claude/rules/credential-routing.md
@.claude/rules/agents.md

@@ -0,0 +1,144 @@
# Infrastructure Friction Assessment
*Generated 2026-06-07 from captured coding-session data (Helix Forge session
memory), after the Detect-hardening pass ([AGENTIC-WP-0005]). First data-driven
assessment of where our agentic coding sessions spend effort on plumbing rather
than work.*
## Method & data quality
- **Corpus:** 72 sessions captured across Claude + Grok. A session-quality filter
([detect/quality.py]) drops health-checks, smoke-tests, and interrupted runs
(mostly `llm-connect` *"Say hello in one word"*). **27 are real coding sessions.**
- **Caveat:** the 41 % that were filtered out had been mislabeled `abandoned` by
the outcome heuristic and produced a *false-positive* "cross-flavor abandoned"
pattern in the first catalog — now purged. Treat any pre-hardening finding with
suspicion.
- **Key framing:** all 27 real sessions ended in `success`. So the friction here
is **cost/efficiency, not failure** — sessions get there, but pay an avoidable
tax to do it.
## The headline number
Across the 27 real sessions, tool-call activity breaks down as:
| Bucket | Share |
|--------|------:|
| shell (Bash / run_terminal) | 38.2 % |
| edit | 30.2 % |
| read | 12.9 % |
| **State Hub MCP** | **10.3 %** |
| **task-management plumbing** | **5.8 %** |
| **schema-loading (`ToolSearch`)** | **1.5 %** |
| other | 1.1 % |
**~17.6 % of all tool calls in real coding sessions are coordination plumbing
(hub + task + schema-loading), not touching the repo.** Per-session infra-overhead
share: median **11.7 %**, p90 **26.1 %**, max **43.3 %** — it concentrates badly.
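
The per-session infra-overhead share described above can be sketched as a ratio over a tool histogram. This is an illustration only: the set of plumbing tool names and the `mcp__` prefix used to spot hub calls are assumptions, and the repo's actual bucket classification may differ.

```python
# Assumed plumbing tool names (taken from the examples in this assessment).
INFRA_TOOLS = {"ToolSearch", "TaskUpdate", "TaskCreate", "todo_write", "update_task_status"}


def infra_overhead_share(tool_histogram: dict[str, int], hub_prefix: str = "mcp__") -> float:
    """Fraction of tool calls spent on hub, task, and schema-loading plumbing."""
    total = sum(tool_histogram.values())
    if total == 0:
        return 0.0
    infra = sum(
        count
        for tool, count in tool_histogram.items()
        if tool in INFRA_TOOLS or tool.startswith(hub_prefix)  # prefix is an assumption
    )
    return infra / total
```

Summing the three bucket shares from the table (10.3 + 5.8 + 1.5) gives the 17.6 % fleet-wide figure; the function applies the same idea to a single session.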
## Ranked friction
### 1. State Hub call volume — *highest cost, addressable*
State Hub MCP is 10.3 % of all tool calls and dominates the worst sessions:
| Repo (one session) | total calls | State Hub calls | overhead share |
|--------------------|------:|------:|------:|
| vergabe-teilnahme | 570 | **231** | 43 % |
| activity-core | 488 | 98 | 23 % |
| flex-auth | 236 | 35 (+27 task) | 29 % |
| net-kingdom | 129 | 25 | 22 % |
Root cause: many **fine-grained** calls — per-task status updates, per-event
progress writes, repeated `get_domain_summary`. 231 hub calls in a single session
is coordination overhead, not work.
### 2. Schema-loading thrash (`ToolSearch`) — *low cost, near-zero-effort fix*
**106 `ToolSearch` calls across 22 of 27 sessions (81 %).** The State Hub MCP
tools are *deferred*, so nearly every session re-discovers and re-loads the same
tool schemas before it can call them. This is pure overhead with no work value —
and it is **exactly the CLI/MCP-interface friction hypothesized.**
### 3. Task-management plumbing — 5.8 %
`TaskUpdate` / `TaskCreate` / `todo_write` / `update_task_status`. Overlaps with
(1); much of it is redundant status churn within a session.
### 4. Tool thrash — *session-shape, watch only*
11 sessions hammer a single tool 80–230× (usually Bash or Edit). Less an infra
problem than a sign of missing higher-level tooling; low priority.
### 5. Budget overrun — 3 sessions
Token cost well above peers. Secondary; revisit once (1) and (2) are addressed.
## Recommendations
**The CLI/MCP-interface hypothesis is validated as a top-2 friction, not a minor
issue.** Two high-ROI moves:
- **A. A State Hub skill (highest ROI).** A skill (or a pre-loaded tool manifest)
that (i) **front-loads the common hub tool schemas** so agents stop
`ToolSearch`-ing for them — eliminates finding #2 almost entirely (81 % of
sessions) — and (ii) **teaches batched writes** (sync N task statuses in one
call, fewer progress events) to attack finding #1. Low effort, broad reach.
- **B. Coarser hub operations.** Add bulk endpoints / a single "sync workplan
statuses" op so a session doesn't make 200+ individual hub calls. This is the
structural fix behind the skill's guidance.
- **C. Measure the effect (Phase 4).** After A/B land, compare infra-overhead
share on subsequent sessions against this baseline (median 11.7 %, p90 26.1 %).
This is precisely what the Measure phase is for — the loop closes here.
## Content-level root causes (error-body mining)
*Added 2026-06-07 from [AGENTIC-WP-0006] — `build_digest` now mines normalized
error fingerprints into the durable digest, and `sig_recurring_error` clusters
them. This is the "why" the tool-mix view above could not see.*
**26 of 27 real sessions hit at least one error.** Top recurring error
fingerprints across the corpus (by # sessions affected):
| # sessions | occ | flavors | top sample |
|-----------:|----:|---------|------------|
| **12** | 32 | claude | `<tool_use_error>File has not been read yet. Read it first before writing to it.` |
| **6** | 13 | claude | `<tool_use_error>File has been modified since read …` |
| **4** | 9 | **claude + grok** | `make: *** [Makefile:227: fix-consistency] Error 1` |
| 3 | 21 | claude | `MCP error -32602: Invalid request parameters` |
| 3 | 6 | claude | `Error calling tool 'update_task_status': 'title'` |
| 2 | 6 | claude | `make: *** [Makefile:21: test] Error 1` |
Reading:
- **#1 — Edit/Write-before-Read (12/27 sessions, 8 repos).** The single most
common error is agents trying to edit a file they haven't read into context.
This is a *workflow* friction, highly addressable: a Read-before-Edit reflex in
the agent instructions / a skill, or a harness affordance. (Observed live: the
author hit this exact error twice while writing this workplan.)
- **#2 — stale-read conflicts (6 sessions):** "File has been modified since read"
— same family, a re-read-before-edit discipline fixes both.
- **#3 — cross-flavor `make fix-consistency` failures (claude + grok, 3 repos):**
the consistency tooling itself fails across flavors — a shared infra issue worth
a look on the state-hub side (cf. [STATE-WP-0058]).
- **State Hub MCP instability** (`-32602`, `update_task_status 'title'`) shows up
in 3 sessions each — corroborates the plumbing-overhead story and the live MCP
flakiness seen during this work (REST fallback used).
**Fingerprint noise — mostly handled.** `_is_failed` now excludes successful hub
JSON responses (top-level no-error payloads) and file-read snapshots (numbered
`cat -n` source lines), which cut distinct fingerprints **444 → 269 (~40 %)**
without touching the top entries. Residual low-value items remain in the long tail
(bare structural lines like `{`, linter "N errors" summaries); the *top*
fingerprints are real. Note several entries (`MCP error -32602`,
`update_task_status 'title'`) reflect the State Hub MCP instability hit live during
this work — genuine, if self-referential, friction.
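
To make the idea of a normalized error fingerprint concrete, here is a minimal sketch. It is not the repo's `build_digest` or `sig_recurring_error` implementation; the normalization rules (first line only, paths and numbers collapsed) and both function names are assumptions chosen for illustration.

```python
import re
from collections import defaultdict


def fingerprint(error_text: str) -> str:
    """Reduce an error body to a stable key by masking volatile parts."""
    stripped = error_text.strip()
    line = stripped.splitlines()[0] if stripped else ""
    line = re.sub(r"(/[\w.\-]+)+", "<path>", line)  # collapse file paths
    line = re.sub(r"\d+", "<n>", line)              # collapse numbers
    return re.sub(r"\s+", " ", line).strip().lower()


def recurring(errors_by_session: dict[str, list[str]], min_sessions: int = 2) -> dict[str, int]:
    """Count distinct sessions per fingerprint and keep the recurring ones."""
    seen: dict[str, set[str]] = defaultdict(set)
    for session_uid, errors in errors_by_session.items():
        for err in errors:
            seen[fingerprint(err)].add(session_uid)
    return {fp: len(uids) for fp, uids in seen.items() if len(uids) >= min_sessions}
```

Counting sessions rather than occurrences matches the "# sessions" column above, so one noisy session cannot dominate the ranking.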
## What this assessment still can't see
- ~~**Why** a session was expensive at the content level.~~ **Now addressed**
(error-body mining, above), modulo the fingerprint-noise caveat.
- Repeated *failed approaches* (as opposed to surfaced errors) — e.g. an agent
silently retrying a wrong strategy without an error — are still invisible.
- Grok/Codex are thin in the corpus (4 Grok, 0 Codex sessions), so cross-flavor
friction claims are Claude-weighted for now.
[AGENTIC-WP-0005]: ../workplans/AGENTIC-WP-0005-detect-hardening.md
[AGENTIC-WP-0006]: ../workplans/AGENTIC-WP-0006-error-body-mining.md
[STATE-WP-0058]: handed off to the state-hub repo worker
[detect/quality.py]: ../session_memory/detect/quality.py

@@ -370,8 +370,89 @@ hub indexes).
---
*Next step: [AGENTIC-WP-0002] implements Phase 0 — the schema, the Claude
collector, the Tier1/Tier2 store, and the budget-based eviction sweep.*
## 11. Project metrics correlation (kaizen-agentic)
Helix Forge owns **fleet-level** session capture and digests (this repo). The
**kaizen-agentic** framework owns **project-scoped** agent execution metrics
(ADR-004: `.kaizen/metrics/<agent>/executions.jsonl`). The two layers correlate
by optional `helix_session_uid` on project records — link-by-reference, no
duplicate ingestion in either repo.
| Layer | Owner | Storage |
|-------|-------|---------|
| Fleet | agentic-resources (Helix Forge) | digest store (`digests` table) |
| Project | kaizen-agentic | `.kaizen/metrics/<agent>/executions.jsonl` |
**Cross-repo contract:** [Helix Forge Correlation Contract](https://gitea.coulomb.social/coulomb/kaizen-agentic/src/branch/main/docs/integrations/helix-forge-correlation.md)
(kaizen-agentic). Field mapping from `Session.session_uid` → `helix_session_uid`,
`digest.cost` → `tokens`, `tool_histogram` MCP share → `infra_overhead_share`.
**Read path:** `kaizen-agentic metrics correlate <uid>` looks up a digest via
`HELIX_STORE_DB` (this repo's session store). No write path from kaizen-agentic
into Helix Forge.
**Related kaizen-agentic docs:** [ADR-004 project metrics convention](https://gitea.coulomb.social/coulomb/kaizen-agentic/src/branch/main/docs/adr/ADR-004-project-metrics-convention.md),
[wiki/EcosystemIntegration.md](https://gitea.coulomb.social/coulomb/kaizen-agentic/src/branch/main/wiki/EcosystemIntegration.md).
### 11.1 Session-close env export (dual-layer agents)
Agents that run **both** Helix Forge capture and kaizen `metrics record` should
export the following **after** the ingest sweep has written the session digest
(`python -m session_memory.ingest` or an equivalent Stop/SessionEnd hook). Names
match kaizen-agentic ADR-004 — do not invent parallel aliases.
| Variable | Source in Helix Forge | Purpose |
|----------|----------------------|---------|
| `HELIX_SESSION_UID` | `Session.session_uid` | Primary correlation key → `helix_session_uid` |
| `HELIX_REPO` | `digest.repo` | Project/repo scoping |
| `HELIX_FLAVOR` | `digest.flavor` | Agent runtime (`claude` / `codex` / `grok`) |
| `HELIX_TOKENS` | `digest.cost.input_tokens + digest.cost.output_tokens` | Token rollup → `tokens` |
| `HELIX_INFRA_OVERHEAD_SHARE` | infra bucket share over `tool_histogram` (see `measure.metrics.session_metrics`) | MCP/plumbing overhead → `infra_overhead_share` |
Example (after digest exists):
```bash
export HELIX_SESSION_UID="claude:abc-123"
export HELIX_REPO="agentic-resources"
export HELIX_FLAVOR="claude"
export HELIX_TOKENS=125000
export HELIX_INFRA_OVERHEAD_SHARE=0.117
# optional — lets kaizen correlate without guessing the store location:
export HELIX_STORE_DB="$(pwd)/session_memory/.store/mem.db"
kaizen-agentic metrics record # merges HELIX_* when present
```
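
The same mapping can be sketched in Python for agents that build the environment programmatically. The helper `helix_env` is hypothetical; it assumes the digest is already loaded as a dict with the fields in the table above and that the overhead share is computed separately.

```python
def helix_env(digest: dict, infra_overhead_share: float) -> dict[str, str]:
    """Map a session digest to the HELIX_* variables from the table above."""
    cost = digest.get("cost", {})
    # HELIX_TOKENS is the sum of input and output tokens.
    tokens = cost.get("input_tokens", 0) + cost.get("output_tokens", 0)
    return {
        "HELIX_SESSION_UID": digest["session_uid"],
        "HELIX_REPO": digest["repo"],
        "HELIX_FLAVOR": digest["flavor"],
        "HELIX_TOKENS": str(tokens),
        "HELIX_INFRA_OVERHEAD_SHARE": f"{infra_overhead_share:.3f}",
    }
```

The returned dict can be merged into `os.environ` or passed as `env=` to a subprocess that runs `kaizen-agentic metrics record`.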
### 11.2 Digest store location and read API
- **`HELIX_STORE_DB`** — absolute path to the SQLite file holding Tier 2 digests.
Defaults to `config.toml` `[store].db_path` (`session_memory/.store/mem.db` relative
to the repo root). Export as an absolute path when setting the variable on session
close so `metrics correlate` works across hosts and working directories.
- **Thin CLI** — `python -m session_memory.digest_lookup <session_uid> [--json]`
prints one digest without running ingest. Exit `0` on hit, `1` when missing.
- **Programmatic** — `Store.get_digest(session_uid)` returns the JSON blob written
by `build_digest` / `analyze`.
**Stable digest JSON shape** (fields consumers may rely on):
| Field | Type | Notes |
|-------|------|-------|
| `session_uid` | string | Normalized uid (`<flavor>:<native-id>`) |
| `flavor`, `repo`, `domain` | string | Session attribution |
| `model` | string | Model id when known |
| `started_at`, `ended_at` | string | ISO timestamps |
| `outcome` | string | `success` / `fail` / `abandoned` / `unknown` |
| `cost` | object | `input_tokens`, `output_tokens`, `cache_tokens`, `wall_clock_s`, `turns`, `retries` |
| `tool_histogram` | object | Tool name → call count |
| `event_count`, `kind_counts`, `markers` | object/int | Compact activity summary |
| `first_prompt`, `last_assistant` | string | Short text snippets |
| `error_snippets` | array | `{fingerprint, sample, count, tool}` entries |
| `schema_version` | int | Digest schema version |
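
A consumer might guard against shape drift with a small check like the one below. This is a sketch under stated assumptions: `STABLE_FIELDS` covers only a subset of the table, and `missing_or_mistyped` is a hypothetical helper, not an API of this repo.

```python
# Subset of the stable fields above, with the Python type each should decode to.
STABLE_FIELDS: dict[str, type] = {
    "session_uid": str,
    "flavor": str,
    "repo": str,
    "outcome": str,
    "cost": dict,
    "tool_histogram": dict,
    "error_snippets": list,
    "schema_version": int,
}


def missing_or_mistyped(digest: dict) -> list[str]:
    """Return the names of stable fields that are absent or have the wrong type."""
    return [
        name
        for name, expected in STABLE_FIELDS.items()
        if not isinstance(digest.get(name), expected)
    ]
```

An empty list means the digest satisfies the checked part of the contract; `model` is left out because it is only present when known.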
---
*Implemented:* Phases 0–4, weekly retro ([AGENTIC-WP-0002]–[AGENTIC-WP-0010]);
kaizen correlation follow-up ([AGENTIC-WP-0011]).
## Sources

@@ -5,7 +5,7 @@
**Status:** Draft v0.1
**Author:** Claude (drafted with Bernd Worsch)
**Created:** 2026-06-06
**Updated:** 2026-06-06
**Updated:** 2026-06-19
---
@@ -223,6 +223,32 @@ record:
- The hub remains a **read model**; Helix Forge writes its durable artifacts as files
and lets the hub index them.
### 9.1 Downstream: kaizen-agentic project metrics correlation
Helix Forge is a **fleet-level** producer of normalized session digests. The
**kaizen-agentic** framework is a **project-scoped** consumer of optional
correlation fields on its execution metrics (ADR-004). The two layers link
**by reference** — kaizen-agentic does not re-implement JSONL ingestion or write
into the Helix Forge store.
| Layer | Owner | What it stores |
|-------|-------|----------------|
| Fleet | agentic-resources (`session_memory`) | Per-session digests in the local SQLite store |
| Project | kaizen-agentic | `.kaizen/metrics/<agent>/executions.jsonl` |
**Canonical spec in this repo:** [DESIGN-session-memory.md §11](DESIGN-session-memory.md#11-project-metrics-correlation-kaizen-agentic)
(session-close env export, digest read path, stable JSON shape).
**Authoritative cross-repo contract (kaizen-agentic):**
[Helix Forge Correlation Contract](https://gitea.coulomb.social/coulomb/kaizen-agentic/src/branch/main/docs/integrations/helix-forge-correlation.md).
Field mapping: `Session.session_uid` → `helix_session_uid`; digest token totals →
`tokens`; MCP/tool overhead share → `infra_overhead_share`.
**Read path for consumers:** `HELIX_STORE_DB` points at the digest SQLite file
(default `session_memory/.store/mem.db`); `python -m session_memory.digest_lookup
<uid> --json` or `kaizen-agentic metrics correlate <uid>` performs a read-only
lookup. No ingestion code belongs in kaizen-agentic.
## 10. Success Metrics
| Metric | Meaning | Target (directional, v1) |
@@ -255,12 +281,26 @@ record:
three flavors?
- **OQ3** Where does detection logic run — local batch jobs, hub-side, or a dedicated
service? What volume do we actually expect?
- **OQ4** Pattern format: how do we keep one agnostic representation while giving each
distributor enough to render high-quality native artifacts?
- **OQ5** What's the minimum trustworthy evidence bar before a pattern is allowed to be
distributed to live agent environments?
- **OQ6** How do we prevent pattern bloat — too many low-value instructions degrading
agent context budgets (cf. the token-budget policy in global instructions)?
- ~~**OQ4** Pattern format: how do we keep one agnostic representation while giving each
distributor enough to render high-quality native artifacts?~~ **Resolved (Phase 2,
AGENTIC-WP-0004):** the `SolutionPattern` core is flavor-agnostic (problem,
resolutions, scope, provenance) and carries per-flavor knowledge only in a separate
`rendering_hints` sub-structure keyed by flavor — distributors read the hints, the
core stays neutral. Catalogued as versioned files-first artifacts (FR-U3).
- ~~**OQ5** What's the minimum trustworthy evidence bar before a pattern is allowed to be
distributed to live agent environments?~~ **Resolved (Phase 2):** a two-tier
evidence bar (`[curate.gate]`). A *promote* floor (frequency / distinct sessions /
cost-impact) admits a candidate as `provisional`; a stricter *distribution* floor
(higher frequency, optional cross-flavor requirement, cost-impact) is required to
mark a pattern `approved` + `distribution_ready`. Defaults are conservative and
config-tunable.
- ~~**OQ6** How do we prevent pattern bloat — too many low-value instructions degrading
agent context budgets (cf. the token-budget policy in global instructions)?~~
**Resolved (Phase 2):** a bloat guard flags duplicate (same id) and near-duplicate
(same signal-type+locus) candidates at review time, and the catalog dedups
structurally on the source-candidate key so re-promotion never multiplies entries.
Thin candidates stay `provisional` (not distributed) rather than padding live
context.
## 13. Risks

registry/README.md Normal file

@@ -0,0 +1,12 @@
# Capability Registry
Markdown-first capability index for federation and reuse planning.
## Authoring
1. Copy a capability entry template (see reuse-surface `templates/capability-entry.template.md`).
2. Add the row to `indexes/capabilities.yaml`.
3. Run `reuse-surface validate` from a checkout with the CLI installed.
4. Merge to `main` and verify publish with `reuse-surface establish --publish-check`.
Federation contract: reuse-surface `docs/RegistryFederation.md`.

@@ -0,0 +1,4 @@
version: 1
updated: '2026-06-16'
domain: helix_forge
capabilities: []

@@ -13,14 +13,40 @@ time window.
```
session_memory/
adapters/claude.py # Tier0 -> Tier1 normalizer (Codex/Grok land in Phase 1)
adapters/common.py # shared Normalized bundle + helpers
adapters/claude.py # Tier0 -> Tier1 normalizers, one per flavor
adapters/codex.py # (rollout {timestamp,type,payload}, flat call_id join)
adapters/grok.py # (per-session dir: chat_history + events + updates)
core/schema.py # Session / SessionEvent / Cost
core/store.py # SQLite rows + blob-dir bodies (Tier1) + digests (Tier2)
core/store.py # SQLite rows + blob-dir bodies (Tier1) + digests/patterns (Tier2)
core/cursor.py # incremental ingest cursors
core/digest.py # Tier1 -> Tier2 promotion + outcome heuristic
core/retention.py # budget-based eviction sweep
ingest.py # one sweep: discover -> normalize -> store -> digest -> evict
config.toml # store paths, retention caps, sources, repo->domain map
detect/signals.py # signal extractors over digests
detect/cluster.py # cluster signals -> candidate patterns + cross-flavor flag
detect/__main__.py # python -m session_memory.detect (ranked report)
curate/schema.py # SolutionPattern artifact + per-flavor rendering hints
curate/catalog.py # versioned, files-first Pattern Catalog (dedup on id)
curate/gating.py # promotion evidence bar + bloat guard
curate/review.py # discuss/approve/reject -> promote workflow
curate/decisions.py # hub decision audit trail (graceful local-queue fallback)
curate/__main__.py # python -m session_memory.curate (interactive / --auto-approve)
catalog/ # the committed Pattern Catalog (source of truth)
distribute/base.py # Artifact + Distributor protocol + idempotent snippet markers
distribute/claude.py # CLAUDE.md (or skill) renderer } per-flavor edges
distribute/codex.py # AGENTS.md renderer } (agnostic body,
distribute/grok.py # native instruction renderer } different targets)
distribute/proposals.py # scoping + proposed-not-applied output + active registry
distribute/__main__.py # python -m session_memory.distribute
measure/metrics.py # fleet metrics + persisted baseline snapshots
measure/effect.py # before/after per-pattern effectiveness
measure/__main__.py # python -m session_memory.measure
retro/build.py # windowed top-3-per-repo suggestions
retro/publish.py # hub coding_retro read model + local report
retro/__main__.py # python -m session_memory.retro
digest_lookup.py # python -m session_memory.digest_lookup (read one digest, no ingest)
config.toml # store paths, retention caps, sources, repo->domain map, curate gate
```
The local store lives under `session_memory/.store/` (gitignored).
@@ -51,6 +77,147 @@ the sweep *runs*. Trigger it with the repo scheduler, e.g. daily:
or a cron entry / `/loop` on a timer. Push-capture (agent Stop/SessionEnd hooks)
can also enqueue a sweep; see design §7.
## Detect candidate patterns
After ingesting, mine the digests for recurring problem/success patterns:
```bash
python -m session_memory.detect # ranked report, cross-flavor first
python -m session_memory.detect --json # machine-readable candidates
python -m session_memory.detect --min-frequency 3
```
Candidates are persisted to a Tier 2 `patterns` table and are the input to the
Curate phase (Phase 2). Patterns whose evidence spans more than one agent flavor
are flagged `[CROSS-FLAVOR]` — the highest-value reuse targets.
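
The clustering and cross-flavor flag can be sketched roughly as follows. This is not the code in `detect/cluster.py`; the signal dict keys (`type`, `locus`, `flavor`) and the grouping key are assumptions made for the example.

```python
from collections import defaultdict


def cluster(signals: list[dict], min_frequency: int = 2) -> list[dict]:
    """Group signals by (type, locus) and flag groups spanning several flavors."""
    groups: dict[tuple[str, str], list[dict]] = defaultdict(list)
    for sig in signals:
        groups[(sig["type"], sig["locus"])].append(sig)

    candidates = []
    for (sig_type, locus), members in groups.items():
        if len(members) < min_frequency:
            continue  # below the --min-frequency floor
        flavors = sorted({m["flavor"] for m in members})
        candidates.append({
            "type": sig_type,
            "locus": locus,
            "frequency": len(members),
            "flavors": flavors,
            "cross_flavor": len(flavors) > 1,
        })
    # Cross-flavor candidates first, then by descending frequency.
    candidates.sort(key=lambda c: (not c["cross_flavor"], -c["frequency"]))
    return candidates
```

Sorting on `not cross_flavor` puts flagged candidates at the top, mirroring the "cross-flavor first" ordering of the ranked report.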
## Curate candidates into the Pattern Catalog
Review detect candidates into versioned **Solution Patterns** held in the
files-first catalog (`session_memory/catalog/`). The flow is **detect → curate →
(Phase 3) distribute**; `curate` refreshes candidates by running detect first.
```bash
python -m session_memory.curate # interactive review (a/r/d per candidate)
python -m session_memory.curate --auto-approve # batch: promote all that clear the evidence bar
python -m session_memory.curate --json # machine-readable result
```
- **Promotion** writes a `SolutionPattern` file (id = source candidate key, so
re-promoting the same candidate dedups; content changes bump the semver and
archive the prior version to `<id>.history.jsonl`).
- The **evidence bar** (`[curate.gate]`) sets two floors: a promote floor and a
stricter *distribution* floor. A thin-but-real candidate lands `provisional`;
one clearing the distribution floor lands `approved` + `distribution_ready`.
- A **bloat guard** flags duplicate / near-duplicate candidates so the catalog
stays lean.
- Re-review is **idempotent** — a remembered decision is skipped unless the
candidate's evidence changed; a prior reject is not re-surfaced.
- Each final promote/reject is recorded as a **hub decision**; if the hub is
offline the decision is queued to `[curate].decision_queue` for later sync
(the same after-the-fact pattern used in Phase 1).
### Curate knobs (`[curate]` / `[curate.gate]` in config.toml)
| Key | Meaning |
|-----|---------|
| `catalog_dir` | committed Pattern Catalog dir (source of truth) |
| `review_log` / `decision_queue` | remembered decisions + pending hub decisions (gitignored) |
| `min_frequency` / `min_sessions` / `min_cost_impact` | floor to promote at all |
| `dist_require_cross_flavor` | require cross-flavor evidence to be distribution-eligible |
| `dist_min_frequency` / `dist_min_cost_impact` | stricter floor for `distribution_ready` |
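
The two-floor evidence bar can be illustrated with a short sketch. The key names come from the table above, but the default values, the candidate dict shape, and the returned labels are assumptions, not the behaviour of `curate/gating.py`.

```python
from dataclasses import dataclass


@dataclass
class Gate:
    # Hypothetical defaults; real values live in [curate.gate] in config.toml.
    min_frequency: int = 2
    min_sessions: int = 2
    min_cost_impact: float = 0.0
    dist_min_frequency: int = 4
    dist_min_cost_impact: float = 0.0
    dist_require_cross_flavor: bool = False


def gate_status(candidate: dict, gate: Gate) -> str:
    """Classify a candidate against the promote and distribution floors."""
    promote_ok = (
        candidate["frequency"] >= gate.min_frequency
        and candidate["sessions"] >= gate.min_sessions
        and candidate["cost_impact"] >= gate.min_cost_impact
    )
    if not promote_ok:
        return "below_bar"  # hypothetical label: not promoted at all
    dist_ok = (
        candidate["frequency"] >= gate.dist_min_frequency
        and candidate["cost_impact"] >= gate.dist_min_cost_impact
        and (candidate["cross_flavor"] or not gate.dist_require_cross_flavor)
    )
    return "approved+distribution_ready" if dist_ok else "provisional"
```

Because the distribution floor is only evaluated after the promote floor passes, a candidate can never be distribution-ready without also being promotable.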
## Distribute patterns as per-flavor proposals
Render approved catalog patterns into per-flavor artifacts — **proposed, never
auto-applied** (HITL). Completes the loop: **detect → curate → distribute**.
```bash
python -m session_memory.distribute # proposals for all repos/flavors
python -m session_memory.distribute --repo state-hub --flavor claude
python -m session_memory.distribute --json
```
- Only `approved` + `distribution_ready` patterns are rendered; each pattern's
`Scope` (repos/domains/flavors) decides where it lands (FR-X2).
- Each flavor renders the **same agnostic body** to its own target (Claude →
`CLAUDE.md`/skill, Codex → `AGENTS.md`, Grok → native) via `rendering_hints`
(FR-A3); blocks carry stable `BEGIN/END` markers so re-running updates in place.
- Output goes to `session_memory/proposals/<repo>/<target>` (gitignored,
regenerated) — a reviewable diff a human applies (FR-X3). The committed
`distribute/active_patterns.json` records which pattern+version is proposed in
which `(repo, flavor)` (FR-X4).
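
The in-place update behind stable `BEGIN/END` markers can be sketched as below. The marker syntax (HTML comments carrying the pattern id) and the function `upsert_block` are assumptions for illustration; the markers in `distribute/base.py` may look different.

```python
import re


def upsert_block(document: str, pattern_id: str, body: str) -> str:
    """Insert or replace one marked block so that re-running is idempotent."""
    begin = f"<!-- BEGIN pattern:{pattern_id} -->"  # hypothetical marker format
    end = f"<!-- END pattern:{pattern_id} -->"
    block = f"{begin}\n{body.strip()}\n{end}"
    span = re.compile(re.escape(begin) + r".*?" + re.escape(end), re.DOTALL)
    if span.search(document):
        # A callable replacement avoids backslash escapes being interpreted.
        return span.sub(lambda _match: block, document, count=1)
    sep = "" if not document or document.endswith("\n") else "\n"
    return f"{document}{sep}{block}\n"
```

Running the function twice with the same body leaves the document unchanged, and a changed body replaces the old block rather than appending a second copy.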
## Measure effectiveness (closing the loop)
Track whether the fleet is getting cheaper / more reliable, and whether a
distributed pattern actually helped.
```bash
python -m session_memory.measure --label "baseline" # snapshot + trend
python -m session_memory.measure --since 2026-06-07 # before/after a change
python -m session_memory.measure --no-save --json
```
- A **snapshot** (infra-overhead share, error rate, schema-thrash, token
percentiles, success rate) is appended to `measure/baselines.jsonl` to build a
trend (FR-M3).
- `--since DATE` splits sessions before/after a change and diffs the metrics, with
an `improved` verdict per metric (FR-M1/FR-M2) — so ineffective patterns can be
retired. Recorded pre-fix baseline (2026-06-07): 27 sessions, infra-overhead
median 11.7 %, error rate 0.96, schema-thrash 8 sessions.
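
A before/after split of the kind `--since` performs could look roughly like this. It is a sketch, not the logic in `measure/effect.py`: the session dict keys, the use of the median, and the `lower_is_better` switch are assumptions.

```python
from statistics import median
from typing import Optional


def before_after(
    sessions: list[dict],
    since: str,
    metric: str = "infra_overhead_share",
    lower_is_better: bool = True,
) -> Optional[dict]:
    """Split sessions at an ISO date and compare the median of one metric."""
    # ISO-8601 strings in the same format compare correctly as plain strings.
    before = [s[metric] for s in sessions if s["started_at"] < since]
    after = [s[metric] for s in sessions if s["started_at"] >= since]
    if not before or not after:
        return None  # nothing to compare on one side of the split
    b, a = median(before), median(after)
    improved = a < b if lower_is_better else a > b
    return {"before": b, "after": a, "delta": a - b, "improved": improved}
```

Returning `None` when one side is empty keeps a missing baseline from being read as an improvement.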
## Weekly retro (the input to the scheduled retrospection)
A windowed roll-up: detect + measure over the last N days → the **top-3
improvement suggestions per repo** (cross-flavor first; recommendations pulled
from the Pattern Catalog) → published to the hub as the `coding_retro` read model.
```bash
python -m session_memory.retro # last 7 days, local report
python -m session_memory.retro --window-days 30 --json
python -m session_memory.retro --publish # also post coding_retro to the hub
```
Writes `retro/last_retro.{json,md}` and (with `--publish`) posts an
`event_type=coding_retro` progress event. This is consumed by activity-core's
**Weekly Coding Retrospection** schedule (ACTIVITY-WP-0008, Saturday 19:00 Berlin),
which emits one improvement task per relevant repo. Hub publish degrades
gracefully when the hub is unreachable.
## Correlation with kaizen-agentic
Helix Forge owns **fleet-level** session digests; **kaizen-agentic** owns
**project-scoped** execution metrics (ADR-004). The two layers correlate by
optional `helix_session_uid` on project records — **link-by-reference only**;
kaizen-agentic does not ingest JSONL into this store.
| Layer | Storage |
|-------|---------|
| Fleet (here) | `session_memory/.store/mem.db` → `digests` table |
| Project (kaizen) | `.kaizen/metrics/<agent>/executions.jsonl` |
- **Spec:** [DESIGN-session-memory.md §11](../docs/DESIGN-session-memory.md#11-project-metrics-correlation-kaizen-agentic)
- **Contract (kaizen-agentic):** [Helix Forge Correlation Contract](https://gitea.coulomb.social/coulomb/kaizen-agentic/src/branch/main/docs/integrations/helix-forge-correlation.md)
### Session-close env export
After ingest has written the digest, agents using both layers export `HELIX_*`
vars for `kaizen-agentic metrics record` to merge (names match ADR-004):
`HELIX_SESSION_UID`, `HELIX_REPO`, `HELIX_FLAVOR`, `HELIX_TOKENS`,
`HELIX_INFRA_OVERHEAD_SHARE`, and optionally `HELIX_STORE_DB` (absolute path to
`mem.db`). See DESIGN §11.1 for field sources.
### Read one digest (for `metrics correlate`)
```bash
python -m session_memory.digest_lookup claude:abc-123 --json
HELIX_STORE_DB=/abs/path/to/mem.db python -m session_memory.digest_lookup <uid>
```
Defaults to `[store].db_path` in `config.toml`. Read-only — does not run ingest.
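
From Python, the same read-only lookup can be wrapped around the CLI. This sketch assumes that `--json` prints the digest as a single JSON document on stdout; the wrapper `lookup_digest` and its injectable `run` parameter are hypothetical.

```python
import json
import os
import subprocess
import sys
from typing import Callable, Optional


def lookup_digest(
    session_uid: str,
    store_db: Optional[str] = None,
    run: Callable = subprocess.run,
) -> Optional[dict]:
    """Call the digest_lookup CLI; return the digest, or None when missing."""
    env = dict(os.environ)
    if store_db:
        # Export an absolute path so the lookup works from any directory.
        env["HELIX_STORE_DB"] = os.path.abspath(store_db)
    proc = run(
        [sys.executable, "-m", "session_memory.digest_lookup", session_uid, "--json"],
        capture_output=True,
        text=True,
        env=env,
    )
    if proc.returncode != 0:  # the CLI exits non-zero when the digest is missing
        return None
    return json.loads(proc.stdout)
```

Passing a fake `run` callable makes the wrapper testable without a populated store.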
## Retention knobs (`[retention]` in config.toml)
| Key | Meaning |
@@ -66,10 +233,28 @@ exists, except the explicitly-reported hard-cap overflow path.
## Tests
```bash
python -m pytest # 26 tests: schema, adapter, store, digest, retention, ingest
python -m pytest # schema, adapters, store, digest, retention, ingest, detect, curate
```
## Status
Phase 0 (AGENTIC-WP-0002): Claude adapter only, end to end. Codex and Grok
adapters are designed (schemas confirmed in the design doc) and land in Phase 1.
- **Phase 0** (AGENTIC-WP-0002): schema, store, digest, budget retention, Claude
adapter, ingest sweep.
- **Phase 1** (AGENTIC-WP-0003): Codex + Grok adapters, multi-file session merge,
and the Detect pipeline (signals → clustering → cross-flavor candidate patterns).
- **Phase 2** (AGENTIC-WP-0004): Curate — Solution Pattern schema, versioned
files-first Pattern Catalog, discuss/approve/reject review with an evidence bar +
bloat guard, and hub-decision audit trail.
- **Detect hardening** (AGENTIC-WP-0005): session-quality filter + tool-mix /
infra-overhead signals. **Error mining** (AGENTIC-WP-0006): recurring error
fingerprints → root-cause patterns.
- **Phase 3** (AGENTIC-WP-0007): Distribute — per-flavor distributor adapters
render approved patterns into proposed (HITL) artifacts, scoped by repo/domain,
with an active-pattern registry.
- **Phase 4** (AGENTIC-WP-0009): Measure — fleet baseline/trend + before/after
per-pattern effectiveness. The Capture → Detect → Curate → Distribute → Measure
loop is closed.
- **Weekly retro** (AGENTIC-WP-0010): windowed top-3-per-repo + hub `coding_retro`
publish.
- **Kaizen correlation** (AGENTIC-WP-0011): bidirectional doc links, session-close
`HELIX_*` env convention, `digest_lookup` read path.


@@ -11,54 +11,23 @@ that the store persists out-of-line so Tier 1 rows stay light.
from __future__ import annotations
import json
import os
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Iterable, Optional
from typing import Any, Optional
from ..core.schema import Cost, Session, SessionEvent
from .common import ( # noqa: F401 (Normalized re-exported for back-compat)
Normalized,
classify_tool,
first_line as _first_line,
iter_jsonl as _iter_records,
now_iso as _now,
resolve_repo as _resolve_repo,
seconds_between as _seconds_between,
stringify as _stringify,
)
FLAVOR = "claude"
# tool_use names that mutate files -> kind "edit"
_EDIT_TOOLS = {"Edit", "Write", "NotebookEdit", "MultiEdit"}
# crude test-runner detection inside Bash commands -> kind "test_run"
_TEST_HINTS = ("pytest", "unittest", "npm test", "npm run test", "go test", "cargo test", "jest", "vitest")
@dataclass
class Normalized:
session: Session
events: list[SessionEvent]
blobs: dict[str, str] = field(default_factory=dict)
def _iter_records(path: str) -> Iterable[dict[str, Any]]:
with open(path, "r", encoding="utf-8") as f:
for line in f:
line = line.strip()
if not line:
continue
try:
yield json.loads(line)
except json.JSONDecodeError:
continue # tolerate partial/corrupt trailing lines
def _resolve_repo(cwd: Optional[str], repo_domain_map: dict[str, str]) -> tuple[Optional[str], Optional[str]]:
"""cwd -> (repo, domain). repo is the cwd basename; domain via map."""
if not cwd:
return None, None
repo = os.path.basename(cwd.rstrip("/")) or None
domain = repo_domain_map.get(repo) if repo else None
return repo, domain
def _is_test_command(text: str) -> bool:
low = text.lower()
return any(h in low for h in _TEST_HINTS)
def _content_blocks(message: dict[str, Any]) -> list[dict[str, Any]]:
content = message.get("content")
@@ -159,11 +128,8 @@ def parse_session(path: str, repo_domain_map: Optional[dict[str, str]] = None) -
name = b.get("name", "")
inp = b.get("input", {})
body = _stringify(inp)
kind = "tool_call"
if name in _EDIT_TOOLS:
kind = "edit"
elif name == "Bash" and _is_test_command(_stringify(inp.get("command", ""))):
kind = "test_run"
cmd = inp.get("command", "") if isinstance(inp, dict) else ""
kind = classify_tool(name, _stringify(cmd))
add_event(uuid, parent, ts, kind, role="assistant", tool=name,
summary=f"{name}", body=body, sidechain=sidechain)
@@ -194,35 +160,3 @@ def parse_session(path: str, repo_domain_map: Optional[dict[str, str]] = None) -
discovered_at=_now(),
)
return Normalized(session=session, events=events, blobs=blobs)
# ---- helpers ---------------------------------------------------------------
def _stringify(v: Any) -> str:
if v is None:
return ""
if isinstance(v, str):
return v
try:
return json.dumps(v, ensure_ascii=False)[:20000]
except (TypeError, ValueError):
return str(v)[:20000]
def _first_line(text: str) -> str:
return (text or "").strip().splitlines()[0] if (text or "").strip() else ""
def _seconds_between(start: Optional[str], end: Optional[str]) -> float:
if not start or not end:
return 0.0
try:
a = datetime.fromisoformat(start.replace("Z", "+00:00"))
b = datetime.fromisoformat(end.replace("Z", "+00:00"))
return max(0.0, (b - a).total_seconds())
except ValueError:
return 0.0
def _now() -> str:
return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


@@ -0,0 +1,167 @@
"""OpenAI Codex CLI collector adapter — Tier 0 -> Tier 1 (design §2.2, §4.3).

Reads ``$CODEX_HOME/sessions/YYYY/MM/DD/rollout-*.jsonl``. Each line is a
``RolloutLine`` wrapper ``{timestamp, type, payload}``; ``type`` discriminates
``session_meta`` / ``response_item`` / ``event_msg`` / ``turn_context`` /
``compacted``.

Codex is **flat**: tool calls and outputs are joined only by ``call_id`` with no
parent-ref DAG, so ``seq`` is assigned by temporal (line) order and
``parent_seq`` is set for ``function_call_output`` back to its ``function_call``.
"""
from __future__ import annotations

import os
from typing import Any, Optional

from ..core.schema import Cost, Session, SessionEvent
from .common import (
    Normalized,
    classify_tool,
    first_line,
    iter_jsonl,
    now_iso,
    resolve_repo,
    seconds_between,
    stringify,
)

FLAVOR = "codex"


def _message_text(payload: dict[str, Any]) -> str:
    content = payload.get("content")
    if isinstance(content, str):
        return content
    parts = []
    if isinstance(content, list):
        for b in content:
            if isinstance(b, dict):
                parts.append(b.get("text") or b.get("output_text") or "")
            elif isinstance(b, str):
                parts.append(b)
    return "\n".join(p for p in parts if p)


def _extract_tokens(payload: dict[str, Any]) -> tuple[int, int, int]:
    """Best-effort (input, output, cache) from a token_count payload.

    Field shapes vary across Codex versions; probe known locations in order and
    return zeros if none of them carries token counts.
    """
    for scope in (payload, payload.get("info") or {}, payload.get("usage") or {},
                  (payload.get("info") or {}).get("total_token_usage") or {}):
        if isinstance(scope, dict):
            i = scope.get("input_tokens") or scope.get("prompt_tokens")
            o = scope.get("output_tokens") or scope.get("completion_tokens")
            if i is not None or o is not None:
                cache = scope.get("cached_input_tokens") or scope.get("cache_read_input_tokens") or 0
                return int(i or 0), int(o or 0), int(cache or 0)
    return 0, 0, 0


def parse_session(path: str, repo_domain_map: Optional[dict[str, str]] = None) -> Optional[Normalized]:
    repo_domain_map = repo_domain_map or {}
    records = list(iter_jsonl(path))
    if not records:
        return None

    session_id: Optional[str] = None
    cwd = model = cli_version = None
    timestamps: list[str] = []
    events: list[SessionEvent] = []
    blobs: dict[str, str] = {}
    call_seq: dict[str, int] = {}  # call_id -> seq of its function_call
    cost = Cost()
    seq = 0

    def add_event(ts, kind, *, role=None, tool=None, summary=None, body=None,
                  tokens=0, parent_seq=None) -> int:
        nonlocal seq
        s = seq
        seq += 1
        payload_ref = None
        if body:
            payload_ref = f"blob://{session_id}/{s}"
            blobs[payload_ref] = body
        events.append(SessionEvent(
            session_uid=Session.make_uid(FLAVOR, session_id or "unknown"),
            seq=s, parent_seq=parent_seq, ts=ts, kind=kind, role=role, tool=tool,
            summary=(summary or "")[:300] or None, payload_ref=payload_ref, tokens=tokens,
        ))
        return s

    for rec in records:
        rtype = rec.get("type")
        ts = rec.get("timestamp")
        if ts:
            timestamps.append(ts)
        payload = rec.get("payload") or {}

        if rtype == "session_meta":
            session_id = session_id or payload.get("id")
            cwd = cwd or payload.get("cwd")
            model = model or payload.get("model")
            cli_version = cli_version or payload.get("cli_version")

        elif rtype == "turn_context":
            model = model or payload.get("model")

        elif rtype == "response_item":
            ptype = payload.get("type")
            if ptype == "message":
                role = payload.get("role", "assistant")
                text = _message_text(payload)
                kind = "assistant_msg" if role == "assistant" else "user_msg"
                add_event(ts, kind, role=role, summary=first_line(text), body=text)
            elif ptype == "function_call":
                name = payload.get("name", "")
                args = stringify(payload.get("arguments"))
                kind = classify_tool(name, args)
                s = add_event(ts, kind, role="assistant", tool=name,
                              summary=name, body=args)
                call_id = payload.get("call_id")
                if call_id:
                    call_seq[call_id] = s
            elif ptype == "function_call_output":
                call_id = payload.get("call_id")
                parent = call_seq.get(call_id)
                body = stringify(payload.get("output"))
                add_event(ts, "tool_result", role="tool", tool=None,
                          summary="tool result", body=body, parent_seq=parent)
            elif ptype == "reasoning":
                body = _message_text(payload) or stringify(payload.get("summary"))
                add_event(ts, "thinking", role="assistant", summary="reasoning", body=body)

        elif rtype == "event_msg":
            ptype = payload.get("type")
            if ptype == "task_started":
                add_event(ts, "lifecycle", summary="task_started")
            elif ptype == "task_complete":
                add_event(ts, "completion", summary="task_complete")
            elif ptype == "token_count":
                i, o, c = _extract_tokens(payload)
                cost.input_tokens += i
                cost.output_tokens += o
                cost.cache_tokens += c
            # user_message / agent_message echoes are duplicated by response_item
            # messages on modern Codex; skipped to avoid double counting.

    if session_id is None:
        return None

    cost.turns = sum(1 for e in events if e.kind == "user_msg")
    started = min(timestamps) if timestamps else None
    ended = max(timestamps) if timestamps else None
    cost.wall_clock_s = seconds_between(started, ended)
    repo, domain = resolve_repo(cwd, repo_domain_map)

    session = Session(
        session_uid=Session.make_uid(FLAVOR, session_id),
        flavor=FLAVOR, native_session_id=session_id,
        repo=repo, domain=domain, cwd=cwd, model=model,
        started_at=started, ended_at=ended, outcome="unknown", cost=cost,
        source_path=path, source_bytes=os.path.getsize(path) if os.path.exists(path) else 0,
        discovered_at=now_iso(),
    )
    return Normalized(session=session, events=events, blobs=blobs)
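
The `call_id` join described in the module docstring can be sketched in isolation. The snippet below is a simplified stand-in, not the adapter itself: `link_outputs` is a hypothetical name, and the input is a reduced list of `response_item` payloads carrying only `type` and `call_id`.

```python
def link_outputs(items):
    """Assign seq by line order and point each output at its call via call_id.

    `items` is a simplified list of payload dicts (assumed shape); returns a
    list of (seq, type, parent_seq) tuples.
    """
    call_seq = {}  # call_id -> seq of the function_call that issued it
    linked = []
    for seq, item in enumerate(items):
        parent = None
        call_id = item.get("call_id")
        if item["type"] == "function_call" and call_id:
            call_seq[call_id] = seq
        elif item["type"] == "function_call_output":
            # Unknown or missing call_id leaves parent_seq as None.
            parent = call_seq.get(call_id)
        linked.append((seq, item["type"], parent))
    return linked


rows = link_outputs([
    {"type": "message"},
    {"type": "function_call", "call_id": "c1"},
    {"type": "function_call_output", "call_id": "c1"},
    {"type": "function_call_output", "call_id": "orphan"},
])
```

An output whose `call_id` was never seen keeps `parent_seq=None` rather than failing, which matches the tolerant style of the adapter above.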


@@ -0,0 +1,100 @@
"""Shared adapter helpers (Tier 0 -> Tier 1).

The ``Normalized`` bundle contract and small flavor-agnostic helpers used by every
collector adapter. Per-flavor parsing lives in the individual adapter modules.
"""
from __future__ import annotations

import json
import os
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional

from ..core.schema import Session, SessionEvent

# tool names that mutate files -> kind "edit" (union across flavors)
EDIT_TOOLS = {
    "Edit", "Write", "NotebookEdit", "MultiEdit",  # Claude
    "apply_patch", "write_file", "edit_file",  # Codex / Grok variants
}

# substrings in a shell/tool command that indicate a test run -> kind "test_run"
TEST_HINTS = (
    "pytest", "unittest", "npm test", "npm run test", "go test",
    "cargo test", "jest", "vitest", "make test", "tox",
)


@dataclass
class Normalized:
    session: Session
    events: list[SessionEvent]
    blobs: dict[str, str] = field(default_factory=dict)


def resolve_repo(cwd: Optional[str], repo_domain_map: dict[str, str]) -> tuple[Optional[str], Optional[str]]:
    """cwd -> (repo, domain). repo is the cwd basename; domain via map."""
    if not cwd:
        return None, None
    repo = os.path.basename(cwd.rstrip("/")) or None
    domain = repo_domain_map.get(repo) if repo else None
    return repo, domain


def is_test_command(text: str) -> bool:
    low = (text or "").lower()
    return any(h in low for h in TEST_HINTS)


def classify_tool(name: str, command_text: str = "") -> str:
    """Map a tool invocation to an event kind: edit | test_run | tool_call."""
    if name in EDIT_TOOLS:
        return "edit"
    if is_test_command(command_text) or is_test_command(name):
        return "test_run"
    return "tool_call"


def stringify(v: Any, limit: int = 20000) -> str:
    if v is None:
        return ""
    if isinstance(v, str):
        return v[:limit]
    try:
        return json.dumps(v, ensure_ascii=False)[:limit]
    except (TypeError, ValueError):
        return str(v)[:limit]


def first_line(text: str) -> str:
    t = (text or "").strip()
    return t.splitlines()[0] if t else ""


def seconds_between(start: Optional[str], end: Optional[str]) -> float:
    if not start or not end:
        return 0.0
    try:
        a = datetime.fromisoformat(start.replace("Z", "+00:00"))
        b = datetime.fromisoformat(end.replace("Z", "+00:00"))
        return max(0.0, (b - a).total_seconds())
    except ValueError:
        return 0.0


def iter_jsonl(path: str):
    """Yield parsed JSON objects from a JSONL file, tolerating bad lines."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError:
                continue


def now_iso() -> str:
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
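
A short usage sketch of `classify_tool` may help when reading the adapters. The constants and functions below are copied from the module above so the snippet runs on its own; the sample tool names and commands are illustrative inputs, not taken from real sessions.

```python
EDIT_TOOLS = {
    "Edit", "Write", "NotebookEdit", "MultiEdit",
    "apply_patch", "write_file", "edit_file",
}
TEST_HINTS = (
    "pytest", "unittest", "npm test", "npm run test", "go test",
    "cargo test", "jest", "vitest", "make test", "tox",
)


def is_test_command(text):
    low = (text or "").lower()
    return any(h in low for h in TEST_HINTS)


def classify_tool(name, command_text=""):
    # Edit tools win first; test hints are plain substring matches.
    if name in EDIT_TOOLS:
        return "edit"
    if is_test_command(command_text) or is_test_command(name):
        return "test_run"
    return "tool_call"


samples = {
    "edit": classify_tool("Edit"),
    "pytest": classify_tool("Bash", "python -m pytest -q"),
    "ls": classify_tool("Bash", "ls -la"),
    "patch_with_test_word": classify_tool("apply_patch", "pytest"),
    "tox_ini": classify_tool("Bash", "cat tox.ini"),
}
```

Because the hints are matched as substrings, a command such as `cat tox.ini` is classified as `test_run` even though it runs no tests. That trade-off keeps detection simple at the cost of occasional false positives.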


@@ -0,0 +1,182 @@
"""Grok CLI collector adapter — Tier 0 -> Tier 1 (design §2.3, §4.3).

A Grok session is a *directory* ``~/.grok/sessions/<enc-cwd>/<uuid>/`` containing
``summary.json`` (metadata), ``chat_history.jsonl`` (the canonical transcript),
``events.jsonl`` (explicit lifecycle + ``turn_number``), and ``updates.jsonl``
(ACP ``session/update`` stream, which carries tool-call names/args).

The ingest glob matches ``chat_history.jsonl``; this adapter derives its sibling
files from the same directory. Conversation order is taken from
``chat_history.jsonl``; tool-call names are paired, in order, from
``updates.jsonl`` ``tool_call`` entries to classify edits/test runs.
"""
from __future__ import annotations

import json
import os
from typing import Any, Optional

from ..core.schema import Cost, Session, SessionEvent
from .common import (
    Normalized,
    classify_tool,
    first_line,
    iter_jsonl,
    now_iso,
    resolve_repo,
    seconds_between,
    stringify,
)

FLAVOR = "grok"


def _text_content(content: Any) -> str:
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        return "\n".join(
            (b.get("text") or "") for b in content if isinstance(b, dict)
        )
    return ""


def _tool_calls_in_order(session_dir: str) -> list[dict[str, Any]]:
    """Ordered list of {title, rawInput} from updates.jsonl tool_call entries."""
    calls: list[dict[str, Any]] = []
    upd = os.path.join(session_dir, "updates.jsonl")
    if not os.path.exists(upd):
        return calls
    for rec in iter_jsonl(upd):
        u = (rec.get("params") or {}).get("update") or {}
        if u.get("sessionUpdate") == "tool_call":
            calls.append({"title": u.get("title") or "", "rawInput": u.get("rawInput") or {},
                          "id": u.get("toolCallId")})
    return calls


def _session_meta(session_dir: str) -> dict[str, Any]:
    p = os.path.join(session_dir, "summary.json")
    if not os.path.exists(p):
        return {}
    try:
        with open(p, "r", encoding="utf-8") as f:
            return json.load(f)
    except (OSError, ValueError):
        return {}


def _lifecycle(session_dir: str) -> tuple[list[dict[str, Any]], Optional[str]]:
    """events.jsonl records + the model id seen there."""
    evs, model = [], None
    p = os.path.join(session_dir, "events.jsonl")
    if os.path.exists(p):
        for rec in iter_jsonl(p):
            evs.append(rec)
            model = model or rec.get("model_id")
    return evs, model


def parse_session(path: str, repo_domain_map: Optional[dict[str, str]] = None) -> Optional[Normalized]:
    repo_domain_map = repo_domain_map or {}
    # accept either the chat_history.jsonl path or the session dir
    session_dir = path if os.path.isdir(path) else os.path.dirname(path)
    chat = os.path.join(session_dir, "chat_history.jsonl")
    if not os.path.exists(chat):
        return None

    meta = _session_meta(session_dir)
    info = meta.get("info") or {}
    session_id = info.get("id") or os.path.basename(session_dir.rstrip("/"))
    cwd = info.get("cwd") or meta.get("git_root_dir")
    life_events, life_model = _lifecycle(session_dir)
    model = meta.get("current_model_id") or life_model
    pending_calls = _tool_calls_in_order(session_dir)
    call_idx = 0

    events: list[SessionEvent] = []
    blobs: dict[str, str] = {}
    seq = 0

    def add(kind, *, role=None, tool=None, summary=None, body=None, parent_seq=None) -> int:
        nonlocal seq
        s = seq
        seq += 1
        ref = None
        if body:
            ref = f"blob://{session_id}/{s}"
            blobs[ref] = body
        events.append(SessionEvent(
            session_uid=Session.make_uid(FLAVOR, session_id), seq=s, parent_seq=parent_seq,
            ts=None, kind=kind, role=role, tool=tool,
            summary=(summary or "")[:300] or None, payload_ref=ref,
        ))
        return s

    # explicit lifecycle first (turn_started/turn_ended carry no bodies)
    for le in life_events:
        t = le.get("type")
        if t in ("turn_started", "loop_started", "turn_ended", "phase_changed"):
            add("lifecycle", summary=t)

    for rec in iter_jsonl(chat):
        rtype = rec.get("type")
        content = rec.get("content")
        if rtype == "user":
            text = _text_content(content)
            if text.strip():
                add("user_msg", role="user", summary=first_line(text), body=text)
        elif rtype == "reasoning":
            text = _text_content(content)
            if text.strip():
                add("thinking", role="assistant", summary="reasoning", body=text)
        elif rtype == "assistant":
            text = _text_content(content)
            if text.strip():
                add("assistant_msg", role="assistant", summary=first_line(text), body=text)
        elif rtype == "tool_result":
            # pair with the next tool_call (in order) to recover name/args
            tool = None
            parent = None
            if call_idx < len(pending_calls):
                call = pending_calls[call_idx]
                call_idx += 1
                tool = call["title"]
                cmd = stringify(call["rawInput"])
                kind = classify_tool(tool, cmd)
                parent = add(kind, role="assistant", tool=tool, summary=tool, body=cmd)
            body = _text_content(content) if not isinstance(content, str) else content
            add("tool_result", role="tool", tool=tool, summary="tool result",
                body=stringify(body), parent_seq=parent)

    if not events:
        return None

    cost = Cost(turns=sum(1 for e in events if e.kind == "user_msg"))
    started = info.get("created_at") or meta.get("created_at")
    ended = meta.get("last_active_at") or info.get("updated_at") or meta.get("updated_at")
    cost.wall_clock_s = seconds_between(started, ended)
    repo, domain = resolve_repo(cwd, repo_domain_map)

    session = Session(
        session_uid=Session.make_uid(FLAVOR, session_id), flavor=FLAVOR,
        native_session_id=session_id, repo=repo, domain=domain, cwd=cwd,
        git_branch=meta.get("head_branch"), model=model,
        started_at=started, ended_at=ended, outcome="unknown", cost=cost,
        source_path=chat,
        source_bytes=_dir_bytes(session_dir),
        discovered_at=now_iso(),
    )
    return Normalized(session=session, events=events, blobs=blobs)


def _dir_bytes(d: str) -> int:
    total = 0
    for root, _, files in os.walk(d):
        for f in files:
            try:
                total += os.path.getsize(os.path.join(root, f))
            except OSError:
                pass
    return total
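
The in-order pairing of `tool_result` records with `updates.jsonl` tool calls can be sketched on its own. `pair_in_order` is a hypothetical helper operating on plain lists; it mirrors the `call_idx` cursor in `parse_session` but omits event creation.

```python
def pair_in_order(results, calls):
    """Pair the i-th tool_result with the i-th tool_call; extras get None.

    `results` stands in for chat_history.jsonl tool_result bodies and `calls`
    for the ordered tool_call entries (assumed shapes).
    """
    pairs = []
    idx = 0  # cursor into calls, advanced once per tool_result
    for res in results:
        title = None
        if idx < len(calls):
            title = calls[idx]["title"]
            idx += 1
        pairs.append((title, res))
    return pairs


pairs = pair_in_order(
    ["3 passed", "file written", "late output"],
    [{"title": "run_command"}, {"title": "edit_file"}],
)
```

Positional pairing relies on both files recording calls in the same order with one result per call. If a call never produced a result, every later pair would shift by one, so the recovered tool names are best treated as approximate.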


@@ -0,0 +1 @@
{"created_at": "2026-06-07T09:13:20Z", "distribution_ready": true, "id": "sp-problem-budget_overrun-tokens", "name": "problem: budget overrun", "polarity": "problem", "problem": "problem: budget overrun", "provenance": {"detected_at": null, "evidence": {"cost_impact": 10.667, "cross_flavor": false, "flavors": ["claude"], "frequency": 3, "key": "problem:budget_overrun:tokens", "locus": "tokens", "polarity": "problem", "repos": ["artifact-store", "citation-evidence", "infospace-bench"], "score": 32.001, "sessions": ["claude:0ef1b45c-5c27-4e20-88b3-37daeaa24eca", "claude:6e0d3d68-872b-4d93-bb09-0691e091314b", "claude:8fabd5ce-6a20-4412-9a8b-0f0763394a78"], "signal_type": "budget_overrun", "title": "problem: budget overrun"}, "promoted_at": "2026-06-07T09:13:20Z", "source_key": "problem:budget_overrun:tokens"}, "rendering_hints": {"claude": {"note": "TODO: refine rendering", "target": "CLAUDE.md"}}, "resolutions": [{"detail": "", "steps": [], "summary": "TODO: capture the recommended resolution"}], "schema_version": 1, "scope": {"domains": [], "flavors": ["claude"], "repos": ["artifact-store", "citation-evidence", "infospace-bench"]}, "status": "superseded", "updated_at": "2026-06-07T09:13:20Z", "version": "1.0.0"}


@@ -0,0 +1,77 @@
{
"created_at": "2026-06-07T09:13:20Z",
"distribution_ready": true,
"id": "sp-problem-budget_overrun-tokens",
"name": "Budget overrun: token cost above peers",
"polarity": "problem",
"problem": "A session's token cost lands well above its peers (>p90). Usually driven by re-reading large files or tool outputs, carrying redundant context, or long exploratory loops without checkpoints.",
"provenance": {
"detected_at": null,
"evidence": {
"cost_impact": 10.667,
"cross_flavor": false,
"flavors": [
"claude"
],
"frequency": 3,
"key": "problem:budget_overrun:tokens",
"locus": "tokens",
"polarity": "problem",
"repos": [
"artifact-store",
"citation-evidence",
"infospace-bench"
],
"score": 32.001,
"sessions": [
"claude:0ef1b45c-5c27-4e20-88b3-37daeaa24eca",
"claude:6e0d3d68-872b-4d93-bb09-0691e091314b",
"claude:8fabd5ce-6a20-4412-9a8b-0f0763394a78"
],
"signal_type": "budget_overrun",
"title": "problem: budget overrun"
},
"promoted_at": "2026-06-07T09:13:20Z",
"source_key": "problem:budget_overrun:tokens"
},
"rendering_hints": {
"claude": {
"target": "CLAUDE.md"
}
},
"resolutions": [
{
"detail": "Use offset/limit; don't re-Read a file already in the transcript.",
"steps": [
"Locate with grep/glob first",
"Read only the relevant span"
],
"summary": "Read narrowly \u2014 target the region you need, not whole large files"
},
{
"detail": "Summarize progress; avoid re-pulling outputs already shown.",
"steps": [],
"summary": "Checkpoint and prune context instead of re-fetching it"
},
{
"detail": "grep/glob narrows scope far cheaper than reading whole trees.",
"steps": [],
"summary": "Prefer targeted search over broad reads to locate code"
}
],
"schema_version": 1,
"scope": {
"domains": [],
"flavors": [
"claude"
],
"repos": [
"artifact-store",
"citation-evidence",
"infospace-bench"
]
},
"status": "approved",
"updated_at": "2026-06-07T14:21:06Z",
"version": "1.0.1"
}


@@ -0,0 +1 @@
{"covers": [], "created_at": "2026-06-07T13:26:25Z", "distribution_ready": true, "id": "sp-problem-file_not_read-edit", "name": "Read before you Edit", "polarity": "problem", "problem": "Agents call Edit/Write on a file they have not read in the current session, or after it changed under them. The edit tools reject this ('File has not been read yet' / 'File has been modified since read'), and the retry burns a turn. Top recurring error in the corpus (12/27 sessions, 8 repos).", "provenance": {"detected_at": null, "evidence": {"frequency": 32, "origin": "AGENTIC-WP-0006 error mining / ASSESSMENT-infra-friction.md", "polarity": "problem", "repos": 8, "sessions": 12}, "promoted_at": null, "source_key": "problem:file_not_read:edit"}, "rendering_hints": {"claude": {"target": "CLAUDE.md"}, "codex": {"target": "AGENTS.md"}, "grok": {"target": ".grok/instructions.md"}}, "resolutions": [{"detail": "Never blind-write a file you haven't read this session.", "steps": ["Read the target file", "Then Edit/Write"], "summary": "Read the file (or the region you'll touch) before Edit/Write"}, {"detail": "A stale read means the file changed under you; refresh, don't loop.", "steps": ["Re-Read the file", "Re-apply the Edit"], "summary": "On 'modified since read', re-Read then re-Edit"}], "schema_version": 1, "scope": {"domains": [], "flavors": [], "repos": []}, "status": "superseded", "updated_at": "2026-06-07T13:26:25Z", "version": "1.0.0"}


@@ -0,0 +1,63 @@
{
"covers": [
"file has not been read",
"modified since read",
"file_not_read"
],
"created_at": "2026-06-07T13:26:25Z",
"distribution_ready": true,
"id": "sp-problem-file_not_read-edit",
"name": "Read before you Edit",
"polarity": "problem",
"problem": "Agents call Edit/Write on a file they have not read in the current session, or after it changed under them. The edit tools reject this ('File has not been read yet' / 'File has been modified since read'), and the retry burns a turn. Top recurring error in the corpus (12/27 sessions, 8 repos).",
"provenance": {
"detected_at": null,
"evidence": {
"frequency": 32,
"origin": "AGENTIC-WP-0006 error mining / ASSESSMENT-infra-friction.md",
"polarity": "problem",
"repos": 8,
"sessions": 12
},
"promoted_at": null,
"source_key": "problem:file_not_read:edit"
},
"rendering_hints": {
"claude": {
"target": "CLAUDE.md"
},
"codex": {
"target": "AGENTS.md"
},
"grok": {
"target": ".grok/instructions.md"
}
},
"resolutions": [
{
"detail": "Never blind-write a file you haven't read this session.",
"steps": [
"Read the target file",
"Then Edit/Write"
],
"summary": "Read the file (or the region you'll touch) before Edit/Write"
},
{
"detail": "A stale read means the file changed under you; refresh, don't loop.",
"steps": [
"Re-Read the file",
"Re-apply the Edit"
],
"summary": "On 'modified since read', re-Read then re-Edit"
}
],
"schema_version": 1,
"scope": {
"domains": [],
"flavors": [],
"repos": []
},
"status": "approved",
"updated_at": "2026-06-07T19:06:45Z",
"version": "1.0.1"
}
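
One plausible reading of the `scope` block is that an empty list means "no restriction", which would explain why this fleet-wide pattern ships with empty `repos`, `flavors`, and `domains`. The sketch below encodes that reading; it is an assumption, and `in_scope` is a hypothetical helper rather than the distributor's actual logic.

```python
def in_scope(pattern, *, repo, flavor, domain=None):
    """Return True if a pattern applies to (repo, flavor, domain).

    ASSUMPTION: an empty or missing scope list matches everything.
    """
    scope = pattern.get("scope") or {}

    def allows(values, candidate):
        return not values or candidate in values

    return (
        allows(scope.get("repos"), repo)
        and allows(scope.get("flavors"), flavor)
        and allows(scope.get("domains"), domain)
    )


unscoped = {"scope": {"domains": [], "flavors": [], "repos": []}}
scoped = {"scope": {"domains": [], "flavors": ["claude"],
                    "repos": ["artifact-store", "citation-evidence", "infospace-bench"]}}

everywhere = in_scope(unscoped, repo="state-hub", flavor="codex")
matching = in_scope(scoped, repo="artifact-store", flavor="claude")
other_repo = in_scope(scoped, repo="state-hub", flavor="claude")
```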


@@ -0,0 +1 @@
{"created_at": "2026-06-07T09:13:20Z", "distribution_ready": false, "id": "sp-problem-infra_overhead-infra_overhead", "name": "problem: infra overhead", "polarity": "problem", "problem": "problem: infra overhead", "provenance": {"detected_at": null, "evidence": {"cost_impact": 0.801, "cross_flavor": false, "flavors": ["claude"], "frequency": 2, "key": "problem:infra_overhead:infra_overhead", "locus": "infra_overhead", "polarity": "problem", "repos": ["markitect-main", "vergabe-teilnahme"], "score": 1.602, "sessions": ["claude:135002f9-98d2-4d1b-b8fb-543b20388782", "claude:b4ae9631-a7eb-42a6-acb1-c65b660c4b74"], "signal_type": "infra_overhead", "title": "problem: infra overhead"}, "promoted_at": "2026-06-07T09:13:20Z", "source_key": "problem:infra_overhead:infra_overhead"}, "rendering_hints": {"claude": {"note": "TODO: refine rendering", "target": "CLAUDE.md"}}, "resolutions": [{"detail": "", "steps": [], "summary": "TODO: capture the recommended resolution"}], "schema_version": 1, "scope": {"domains": [], "flavors": ["claude"], "repos": ["markitect-main", "vergabe-teilnahme"]}, "status": "superseded", "updated_at": "2026-06-07T09:13:20Z", "version": "1.0.0"}


@@ -0,0 +1,74 @@
{
"created_at": "2026-06-07T09:13:20Z",
"distribution_ready": false,
"id": "sp-problem-infra_overhead-infra_overhead",
"name": "Infrastructure overhead: too much coordination plumbing",
"polarity": "problem",
"problem": "A large share of the session's tool calls are State Hub / task-management / schema-loading plumbing rather than touching the repo (corpus median 11.7%, up to 43% in the worst sessions; one session made 231 hub calls).",
"provenance": {
"detected_at": null,
"evidence": {
"cost_impact": 0.801,
"cross_flavor": false,
"flavors": [
"claude"
],
"frequency": 2,
"key": "problem:infra_overhead:infra_overhead",
"locus": "infra_overhead",
"polarity": "problem",
"repos": [
"markitect-main",
"vergabe-teilnahme"
],
"score": 1.602,
"sessions": [
"claude:135002f9-98d2-4d1b-b8fb-543b20388782",
"claude:b4ae9631-a7eb-42a6-acb1-c65b660c4b74"
],
"signal_type": "infra_overhead",
"title": "problem: infra overhead"
},
"promoted_at": "2026-06-07T09:13:20Z",
"source_key": "problem:infra_overhead:infra_overhead"
},
"rendering_hints": {
"claude": {
"target": "CLAUDE.md"
}
},
"resolutions": [
{
"detail": "Update several task statuses together; emit fewer, coarser progress events.",
"steps": [
"Do a chunk of work",
"Then sync statuses in one pass"
],
"summary": "Batch hub writes \u2014 sync at checkpoints, not per event"
},
{
"detail": "One scoped summary at session start beats many broad reads.",
"steps": [],
"summary": "Orient once with get_domain_summary, don't re-query repeatedly"
},
{
"detail": "See STATE-WP-0058 \u2014 stops the repeated ToolSearch for hub tools.",
"steps": [],
"summary": "Front-load hub tool knowledge via the State Hub skill"
}
],
"schema_version": 1,
"scope": {
"domains": [],
"flavors": [
"claude"
],
"repos": [
"markitect-main",
"vergabe-teilnahme"
]
},
"status": "provisional",
"updated_at": "2026-06-07T14:21:06Z",
"version": "1.0.1"
}


@@ -0,0 +1 @@
{"created_at": "2026-06-07T09:13:20Z", "distribution_ready": true, "id": "sp-problem-schema_thrash-schema_load", "name": "problem: schema thrash", "polarity": "problem", "problem": "problem: schema thrash", "provenance": {"detected_at": null, "evidence": {"cost_impact": 79.0, "cross_flavor": false, "flavors": ["claude"], "frequency": 8, "key": "problem:schema_thrash:schema_load", "locus": "schema_load", "polarity": "problem", "repos": ["activity-core", "citation-evidence", "flex-auth", "infospace-bench", "ops-bridge", "vergabe-teilnahme"], "score": 632.0, "sessions": ["claude:0ef1b45c-5c27-4e20-88b3-37daeaa24eca", "claude:30dbad62-c042-41f2-80c1-5953a1100e7f", "claude:4340b160-2fb6-47d0-897c-3cac0a8855d8", "claude:63fd4df2-5add-4748-af21-c1544825e006", "claude:8313f946-f008-4e98-9915-31950380e39e", "claude:8fabd5ce-6a20-4412-9a8b-0f0763394a78", "claude:b4ae9631-a7eb-42a6-acb1-c65b660c4b74", "claude:bbcf1c2b-14be-40e4-826b-4b2b49b9d212"], "signal_type": "schema_thrash", "title": "problem: schema thrash"}, "promoted_at": "2026-06-07T09:13:20Z", "source_key": "problem:schema_thrash:schema_load"}, "rendering_hints": {"claude": {"note": "TODO: refine rendering", "target": "CLAUDE.md"}}, "resolutions": [{"detail": "", "steps": [], "summary": "TODO: capture the recommended resolution"}], "schema_version": 1, "scope": {"domains": [], "flavors": ["claude"], "repos": ["activity-core", "citation-evidence", "flex-auth", "infospace-bench", "ops-bridge", "vergabe-teilnahme"]}, "status": "superseded", "updated_at": "2026-06-07T09:13:20Z", "version": "1.0.0"}


@@ -0,0 +1,83 @@
{
"created_at": "2026-06-07T09:13:20Z",
"distribution_ready": true,
"id": "sp-problem-schema_thrash-schema_load",
"name": "Schema thrash: repeated ToolSearch",
"polarity": "problem",
"problem": "ToolSearch fires repeatedly within a session (seen in 81% of sessions) because the State Hub MCP tools are deferred and their schemas get re-loaded each time they are needed \u2014 pure overhead with no work value.",
"provenance": {
"detected_at": null,
"evidence": {
"cost_impact": 79.0,
"cross_flavor": false,
"flavors": [
"claude"
],
"frequency": 8,
"key": "problem:schema_thrash:schema_load",
"locus": "schema_load",
"polarity": "problem",
"repos": [
"activity-core",
"citation-evidence",
"flex-auth",
"infospace-bench",
"ops-bridge",
"vergabe-teilnahme"
],
"score": 632.0,
"sessions": [
"claude:0ef1b45c-5c27-4e20-88b3-37daeaa24eca",
"claude:30dbad62-c042-41f2-80c1-5953a1100e7f",
"claude:4340b160-2fb6-47d0-897c-3cac0a8855d8",
"claude:63fd4df2-5add-4748-af21-c1544825e006",
"claude:8313f946-f008-4e98-9915-31950380e39e",
"claude:8fabd5ce-6a20-4412-9a8b-0f0763394a78",
"claude:b4ae9631-a7eb-42a6-acb1-c65b660c4b74",
"claude:bbcf1c2b-14be-40e4-826b-4b2b49b9d212"
],
"signal_type": "schema_thrash",
"title": "problem: schema thrash"
},
"promoted_at": "2026-06-07T09:13:20Z",
"source_key": "problem:schema_thrash:schema_load"
},
"rendering_hints": {
"claude": {
"target": "CLAUDE.md"
}
},
"resolutions": [
{
"detail": "Resolve them by name in one ToolSearch (select:...) rather than searching ad hoc.",
"steps": [
"List the hub tools the session needs",
"Load them once at the start"
],
"summary": "Load the tool schemas you'll need once, up front"
},
{
"detail": "The skill carries the schemas so no per-use discovery is needed.",
"steps": [],
"summary": "Adopt the State Hub skill that front-loads common hub tool signatures"
}
],
"schema_version": 1,
"scope": {
"domains": [],
"flavors": [
"claude"
],
"repos": [
"activity-core",
"citation-evidence",
"flex-auth",
"infospace-bench",
"ops-bridge",
"vergabe-teilnahme"
]
},
"status": "approved",
"updated_at": "2026-06-07T14:21:06Z",
"version": "1.0.1"
}
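
In the evidence blocks shown so far, `score` appears to equal `frequency` times `cost_impact`. The check below uses only values copied from those records; the formula itself is inferred from the data, not confirmed by the catalog schema, and `score_matches` is a hypothetical helper.

```python
import math

# key -> (frequency, cost_impact, score), copied from the evidence blocks above
EVIDENCE = {
    "problem:budget_overrun:tokens": (3, 10.667, 32.001),
    "problem:infra_overhead:infra_overhead": (2, 0.801, 1.602),
    "problem:schema_thrash:schema_load": (8, 79.0, 632.0),
}


def score_matches(frequency, cost_impact, score, tol=1e-6):
    """True if score is frequency * cost_impact within a small tolerance."""
    return math.isclose(frequency * cost_impact, score, abs_tol=tol)


mismatches = [key for key, row in EVIDENCE.items() if not score_matches(*row)]
```

A tolerance is used because the stored values are rounded to three decimals and floating-point multiplication is not exact.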


@@ -0,0 +1 @@
{"created_at": "2026-06-07T09:13:20Z", "distribution_ready": true, "id": "sp-problem-tool_thrash-tool-bash", "name": "problem: tool thrash", "polarity": "problem", "problem": "problem: tool thrash", "provenance": {"detected_at": null, "evidence": {"cost_impact": 1990.0, "cross_flavor": false, "flavors": ["claude"], "frequency": 11, "key": "problem:tool_thrash:tool:Bash", "locus": "tool:Bash", "polarity": "problem", "repos": ["activity-core", "artifact-store", "citation-evidence", "ihp-railiance-probe", "infospace-bench", "railiance-apps", "state-hub", "vergabe-teilnahme"], "score": 21890.0, "sessions": ["claude:0ef1b45c-5c27-4e20-88b3-37daeaa24eca", "claude:2c0d14e1-d089-4076-bf35-b134737a261d", "claude:30dbad62-c042-41f2-80c1-5953a1100e7f", "claude:4307eff6-cd39-4189-be58-79a3acb69d6c", "claude:4340b160-2fb6-47d0-897c-3cac0a8855d8", "claude:6e0d3d68-872b-4d93-bb09-0691e091314b", "claude:8313f946-f008-4e98-9915-31950380e39e", "claude:8fabd5ce-6a20-4412-9a8b-0f0763394a78", "claude:a9483f07-c9dc-4f71-9fa0-831790ea965e", "claude:b1dfbcfa-91f9-4540-823a-26fcfaab7fc8", "claude:b4ae9631-a7eb-42a6-acb1-c65b660c4b74"], "signal_type": "tool_thrash", "title": "problem: tool thrash"}, "promoted_at": "2026-06-07T09:13:20Z", "source_key": "problem:tool_thrash:tool:Bash"}, "rendering_hints": {"claude": {"note": "TODO: refine rendering", "target": "CLAUDE.md"}}, "resolutions": [{"detail": "", "steps": [], "summary": "TODO: capture the recommended resolution"}], "schema_version": 1, "scope": {"domains": [], "flavors": ["claude"], "repos": ["activity-core", "artifact-store", "citation-evidence", "ihp-railiance-probe", "infospace-bench", "railiance-apps", "state-hub", "vergabe-teilnahme"]}, "status": "superseded", "updated_at": "2026-06-07T09:13:20Z", "version": "1.0.0"}

@@ -0,0 +1,95 @@
{
"created_at": "2026-06-07T09:13:20Z",
"distribution_ready": true,
"id": "sp-problem-tool_thrash-tool-bash",
"name": "Tool thrash: one tool hammered",
"polarity": "problem",
"problem": "A single tool (often Bash or Edit) is invoked far more than any other in a session \u2014 a sign of trial-and-error churn or missing higher-level tooling.",
"provenance": {
"detected_at": null,
"evidence": {
"cost_impact": 1990.0,
"cross_flavor": false,
"flavors": [
"claude"
],
"frequency": 11,
"key": "problem:tool_thrash:tool:Bash",
"locus": "tool:Bash",
"polarity": "problem",
"repos": [
"activity-core",
"artifact-store",
"citation-evidence",
"ihp-railiance-probe",
"infospace-bench",
"railiance-apps",
"state-hub",
"vergabe-teilnahme"
],
"score": 21890.0,
"sessions": [
"claude:0ef1b45c-5c27-4e20-88b3-37daeaa24eca",
"claude:2c0d14e1-d089-4076-bf35-b134737a261d",
"claude:30dbad62-c042-41f2-80c1-5953a1100e7f",
"claude:4307eff6-cd39-4189-be58-79a3acb69d6c",
"claude:4340b160-2fb6-47d0-897c-3cac0a8855d8",
"claude:6e0d3d68-872b-4d93-bb09-0691e091314b",
"claude:8313f946-f008-4e98-9915-31950380e39e",
"claude:8fabd5ce-6a20-4412-9a8b-0f0763394a78",
"claude:a9483f07-c9dc-4f71-9fa0-831790ea965e",
"claude:b1dfbcfa-91f9-4540-823a-26fcfaab7fc8",
"claude:b4ae9631-a7eb-42a6-acb1-c65b660c4b74"
],
"signal_type": "tool_thrash",
"title": "problem: tool thrash"
},
"promoted_at": "2026-06-07T09:13:20Z",
"source_key": "problem:tool_thrash:tool:Bash"
},
"rendering_hints": {
"claude": {
"target": "CLAUDE.md"
}
},
"resolutions": [
{
"detail": "Compose a single command/script; run independent calls in parallel.",
"steps": [
"Group the steps",
"Run them as one block"
],
"summary": "Batch related shell work into one script, not many small Bash calls"
},
{
"detail": "Read the region, then one substantive Edit beats many tiny ones.",
"steps": [],
"summary": "Make fewer, larger edits with full context"
},
{
"detail": "If the same invocation recurs, wrap it once.",
"steps": [],
"summary": "Factor a repeated command pattern into a helper"
}
],
"schema_version": 1,
"scope": {
"domains": [],
"flavors": [
"claude"
],
"repos": [
"activity-core",
"artifact-store",
"citation-evidence",
"ihp-railiance-probe",
"infospace-bench",
"railiance-apps",
"state-hub",
"vergabe-teilnahme"
]
},
"status": "approved",
"updated_at": "2026-06-07T14:21:06Z",
"version": "1.0.1"
}
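The "one tool hammered" signal described in this pattern can be sketched as a dominance check over a session's tool calls. This is only an illustration: `dominant_tool`, the `min_calls` floor, and the `dominance` ratio are hypothetical and are not taken from the repository's detect code.

```python
from collections import Counter


def dominant_tool(tool_calls, min_calls=10, dominance=3.0):
    """Return (tool, count) if one tool dwarfs every other, else None.

    Hypothetical thresholds: the top tool needs at least `min_calls` uses
    and at least `dominance` times the runner-up's count.
    """
    counts = Counter(tool_calls).most_common(2)
    if not counts:
        return None
    top_tool, top_n = counts[0]
    runner_up = counts[1][1] if len(counts) > 1 else 0
    if top_n >= min_calls and top_n >= dominance * max(runner_up, 1):
        return top_tool, top_n
    return None
```

A session with twelve `Bash` calls and two `Edit` calls would be flagged, while an even five/four split would not.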

@@ -0,0 +1 @@
{"created_at": "2026-06-07T09:13:20Z", "distribution_ready": true, "id": "sp-success-clean_pass-outcome", "name": "cross-flavor success: clean pass", "polarity": "success", "problem": "cross-flavor success: clean pass", "provenance": {"detected_at": null, "evidence": {"cost_impact": 17.0, "cross_flavor": true, "flavors": ["claude", "grok"], "frequency": 17, "key": "success:clean_pass:outcome", "locus": "outcome", "polarity": "success", "repos": ["activity-core", "agentic-resources", "artifact-store", "can-you-assist", "citation-evidence", "infospace-bench", "issue-facade", "ops-bridge", "railiance-apps", "state-hub", "the-custodian", "vergabe-teilnahme"], "score": 433.5, "sessions": ["claude:0ef1b45c-5c27-4e20-88b3-37daeaa24eca", "claude:16bdbec4-b018-4902-9fb5-336f8f3d61c8", "claude:2c0d14e1-d089-4076-bf35-b134737a261d", "claude:30dbad62-c042-41f2-80c1-5953a1100e7f", "claude:4307eff6-cd39-4189-be58-79a3acb69d6c", "claude:4340b160-2fb6-47d0-897c-3cac0a8855d8", "claude:631de76e-fdee-43b5-b091-7b7675467ad1", "claude:63fd4df2-5add-4748-af21-c1544825e006", "claude:6e0d3d68-872b-4d93-bb09-0691e091314b", "claude:8313f946-f008-4e98-9915-31950380e39e", "claude:8fabd5ce-6a20-4412-9a8b-0f0763394a78", "claude:a9483f07-c9dc-4f71-9fa0-831790ea965e", "claude:b4ae9631-a7eb-42a6-acb1-c65b660c4b74", "claude:eb837dd1-5b8e-472e-b9e1-4537b10e03e6", "claude:ee9e84f2-bc35-4eb5-a7ad-aaec5f31d965", "claude:f1b25697-0e5f-45f0-81d1-af0f1762c438", "grok:019e6122-00c0-79f3-b4e5-9c70b77c015d"], "signal_type": "clean_pass", "title": "cross-flavor success: clean pass"}, "promoted_at": "2026-06-07T09:13:20Z", "source_key": "success:clean_pass:outcome"}, "rendering_hints": {"claude": {"note": "TODO: refine rendering", "target": "CLAUDE.md"}, "grok": {"note": "TODO: refine rendering", "target": "instructions"}}, "resolutions": [{"detail": "", "steps": [], "summary": "TODO: capture the recommended resolution"}], "schema_version": 1, "scope": {"domains": [], "flavors": ["claude", "grok"], "repos": ["activity-core", "agentic-resources", "artifact-store", "can-you-assist", "citation-evidence", "infospace-bench", "issue-facade", "ops-bridge", "railiance-apps", "state-hub", "the-custodian", "vergabe-teilnahme"]}, "status": "superseded", "updated_at": "2026-06-07T09:13:20Z", "version": "1.0.0"}

@@ -0,0 +1,110 @@
{
"created_at": "2026-06-07T09:13:20Z",
"distribution_ready": true,
"id": "sp-success-clean_pass-outcome",
"name": "Clean pass: tests green, no retries",
"polarity": "success",
"problem": "The target session shape: ends in success, runs the test suite, with no errors and no retries \u2014 resolves cheaply and reliably. Seen across many sessions and both Claude and Grok (the highest-value pattern to reinforce).",
"provenance": {
"detected_at": null,
"evidence": {
"cost_impact": 17.0,
"cross_flavor": true,
"flavors": [
"claude",
"grok"
],
"frequency": 17,
"key": "success:clean_pass:outcome",
"locus": "outcome",
"polarity": "success",
"repos": [
"activity-core",
"agentic-resources",
"artifact-store",
"can-you-assist",
"citation-evidence",
"infospace-bench",
"issue-facade",
"ops-bridge",
"railiance-apps",
"state-hub",
"the-custodian",
"vergabe-teilnahme"
],
"score": 433.5,
"sessions": [
"claude:0ef1b45c-5c27-4e20-88b3-37daeaa24eca",
"claude:16bdbec4-b018-4902-9fb5-336f8f3d61c8",
"claude:2c0d14e1-d089-4076-bf35-b134737a261d",
"claude:30dbad62-c042-41f2-80c1-5953a1100e7f",
"claude:4307eff6-cd39-4189-be58-79a3acb69d6c",
"claude:4340b160-2fb6-47d0-897c-3cac0a8855d8",
"claude:631de76e-fdee-43b5-b091-7b7675467ad1",
"claude:63fd4df2-5add-4748-af21-c1544825e006",
"claude:6e0d3d68-872b-4d93-bb09-0691e091314b",
"claude:8313f946-f008-4e98-9915-31950380e39e",
"claude:8fabd5ce-6a20-4412-9a8b-0f0763394a78",
"claude:a9483f07-c9dc-4f71-9fa0-831790ea965e",
"claude:b4ae9631-a7eb-42a6-acb1-c65b660c4b74",
"claude:eb837dd1-5b8e-472e-b9e1-4537b10e03e6",
"claude:ee9e84f2-bc35-4eb5-a7ad-aaec5f31d965",
"claude:f1b25697-0e5f-45f0-81d1-af0f1762c438",
"grok:019e6122-00c0-79f3-b4e5-9c70b77c015d"
],
"signal_type": "clean_pass",
"title": "cross-flavor success: clean pass"
},
"promoted_at": "2026-06-07T09:13:20Z",
"source_key": "success:clean_pass:outcome"
},
"rendering_hints": {
"claude": {
"target": "CLAUDE.md"
},
"grok": {
"target": "instructions"
}
},
"resolutions": [
{
"detail": "A passing suite is the cheapest proof the change works.",
"steps": [
"Make the change",
"Run the suite",
"Only then report done"
],
"summary": "Run the test suite before declaring done; let green gate completion"
},
{
"detail": "Small verified steps beat large unverified ones that bounce.",
"steps": [],
"summary": "Work incrementally and verify as you go to avoid retries"
}
],
"schema_version": 1,
"scope": {
"domains": [],
"flavors": [
"claude",
"grok"
],
"repos": [
"activity-core",
"agentic-resources",
"artifact-store",
"can-you-assist",
"citation-evidence",
"infospace-bench",
"issue-facade",
"ops-bridge",
"railiance-apps",
"state-hub",
"the-custodian",
"vergabe-teilnahme"
]
},
"status": "approved",
"updated_at": "2026-06-07T14:21:06Z",
"version": "1.0.1"
}
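The `score` values in the two catalog records of this diff happen to be consistent with a simple product of `frequency` and `cost_impact`, scaled by 1.5 when `cross_flavor` is true. The sketch below only checks that arithmetic; the formula and the `bonus` parameter are inferred from these two data points and are not confirmed by the detect code, which is not part of this diff.

```python
def inferred_score(frequency, cost_impact, cross_flavor, bonus=1.5):
    """Reproduce the scores seen in the catalog records.

    ASSUMPTION: formula reverse-engineered from two records only;
    the real ranking function may differ.
    """
    return frequency * cost_impact * (bonus if cross_flavor else 1.0)


# tool thrash record: frequency 11, cost_impact 1990.0, single flavor
tool_thrash = inferred_score(11, 1990.0, cross_flavor=False)
# clean pass record: frequency 17, cost_impact 17.0, claude + grok
clean_pass = inferred_score(17, 17.0, cross_flavor=True)
```

Under this reading, `tool_thrash` matches the recorded 21890.0 and `clean_pass` matches the recorded 433.5.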

@@ -20,20 +20,64 @@ root = "~/.claude/projects"
# glob, relative to root; covers sessions and agent-* sidechains
glob = "*/*.jsonl"
# Codex / Grok adapters land in Phase 1 (schemas confirmed in the design doc).
# Codex / Grok adapters added in Phase 1 (AGENTIC-WP-0003).
[sources.codex]
enabled = false
enabled = true
root = "~/.codex/sessions"
glob = "*/*/*/rollout-*.jsonl"
[sources.grok]
enabled = false
enabled = true
root = "~/.grok/sessions"
glob = "*/*/chat_history.jsonl"
# Detect phase (AGENTIC-WP-0005): quality filter — drop non-coding/trivial sessions
# before signals form, so health-checks don't mint false-positive patterns.
[detect.quality]
min_events = 20 # below this many events, not a real coding session
min_substantive = 3 # require >= this many substantive (edit/read/shell) tool calls
min_prompt_len = 25 # first prompt shorter than this is treated as trivial
# Measure phase (AGENTIC-WP-0009): persisted baseline/trend of fleet metrics.
[measure]
baselines = "session_memory/measure/baselines.jsonl" # timestamped metric snapshots (committed)
# Weekly retro (AGENTIC-WP-0010): windowed top-3-per-repo report, published to the
# hub as the coding_retro read model that activity-core's weekly schedule consumes.
[retro]
window_days = 7
report_json = "session_memory/retro/last_retro.json" # latest report (committed)
report_md = "session_memory/retro/last_retro.md" # human-readable mirror
hub_url = "http://127.0.0.1:8000" # for --publish (best-effort)
# Distribute phase (AGENTIC-WP-0007): where per-flavor proposals + the active
# registry are written. Proposals are HITL: reviewed, never auto-applied.
[distribute]
proposals_dir = "session_memory/proposals" # reviewable proposals (gitignored, regenerated)
active_registry = "session_memory/distribute/active_patterns.json" # what's proposed/active where (committed)
# Curate phase (AGENTIC-WP-0004): catalog location + promotion evidence bar.
[curate]
catalog_dir = "session_memory/catalog" # files-first Pattern Catalog (committed)
review_log = "session_memory/.store/reviews.jsonl" # remembered decisions (gitignored)
decision_queue = "session_memory/.store/decisions.queue.jsonl" # hub decisions pending sync
state_hub_workstream_id = "b3703684-f60e-42f3-b03e-dabe3e8ce3f4" # AGENTIC-WP-0004
# Evidence bar (OQ5): floors to promote at all, and stricter floors to be
# distribution-eligible (status=approved, distribution_ready=true).
[curate.gate]
min_frequency = 2 # >= this many supporting signals to promote
min_sessions = 2 # >= this many distinct sessions
min_cost_impact = 0.0
dist_require_cross_flavor = false # require cross-flavor evidence to distribute
dist_min_frequency = 3
dist_min_cost_impact = 0.0
# cwd basename -> domain slug. Used to tag sessions with their Custodian domain.
[repo_domain_map]
agentic-resources = "helix_forge"
the-custodian = "custodian"
state-hub = "custodian"
ops-bridge = "custodian"
net-kingdom = "netkingdom"
can-you-assist = "coulomb_social"
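The `[detect.quality]` floors above can be read as a three-part filter applied before any signal forms. The sketch below is one plausible reading, not the repository's implementation: `passes_quality` is a hypothetical helper, and whether the real filter strips whitespace from the first prompt is an assumption.

```python
def passes_quality(n_events, n_substantive, first_prompt, config):
    """Return True if a session clears the [detect.quality] floors.

    `config` is the parsed config as a nested dict; the defaults mirror
    the values shown in the diff.
    """
    q = config.get("detect", {}).get("quality", {})
    prompt_len = len((first_prompt or "").strip())  # stripping is an assumption
    return (
        n_events >= q.get("min_events", 20)
        and n_substantive >= q.get("min_substantive", 3)
        and prompt_len >= q.get("min_prompt_len", 25)
    )


config = {"detect": {"quality": {"min_events": 20, "min_substantive": 3,
                                 "min_prompt_len": 25}}}
```

A 40-event session with five substantive tool calls and a full-sentence first prompt passes; a health-check style session opened with "hi" does not.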

@@ -12,6 +12,8 @@ belongs to the Detect phase (PRD §6.2).
from __future__ import annotations
import collections
import json
import re
from typing import Any
from .schema import Session, SessionEvent
@@ -21,6 +23,22 @@ _FAIL_HINTS = ("error", "failed", "exception", "traceback", "fatal", "non-zero")
# Substrings suggesting a clean test pass.
_PASS_HINTS = ("passed", "0 failed", "ok", "success")
# A line that is numbered source content from a Read result (`cat -n` style),
# e.g. "229\t raise InfospaceError(" — code text, never a runtime error.
_NUMBERED_LINE_RE = re.compile(r"^\s*\d+\t")
# Top-level keys that mark a JSON tool-result as an actual error (vs. success).
_JSON_ERROR_KEYS = ("error", "errors", "detail")
# Normalization patterns so the same error collapses to one fingerprint
# regardless of paths / ids / counts (WP-0006 T01).
_UUID_RE = re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b", re.I)
_HEXADDR_RE = re.compile(r"\b0x[0-9a-f]+\b", re.I)
_PATH_RE = re.compile(r"(?:/[\w.\-]+)+/?|[A-Za-z]:\\[\w.\\\-]+")
_NUM_RE = re.compile(r"\b\d+\b")
_WS_RE = re.compile(r"\s+")
_ERR_SAMPLE_MAX = 200
_ERR_FP_MAX = 160
def infer_outcome(events: list[SessionEvent], blobs: dict[str, str] | None = None) -> str:
"""Heuristic outcome label across flavors (design OQ2).
@@ -100,6 +118,7 @@ def build_digest(session: Session, events: list[SessionEvent],
},
"first_prompt": _first_prompt(events, blobs),
"last_assistant": _last_assistant(events, blobs),
"error_snippets": _error_snippets(events, blobs),
"schema_version": session.schema_version,
}
@@ -148,6 +167,114 @@ def _last_assistant(events, blobs):
return None
def _error_line(text: str) -> str:
    """Pick the most error-like line from a body.

    Prefers the *last* line matching a fail hint: in a Python traceback the
    actual exception is the final line, while the bare ``Traceback (most recent
    call last):`` header is just noise and is skipped.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    matches = [ln for ln in lines
               if any(h in ln.lower() for h in _FAIL_HINTS)
               and not ln.lower().startswith("traceback")]
    if matches:
        return matches[-1]
    # fall back to any fail-hint line (e.g. only the traceback header), else first
    any_hint = [ln for ln in lines if any(h in ln.lower() for h in _FAIL_HINTS)]
    return any_hint[-1] if any_hint else (lines[0] if lines else "")
def _error_fingerprint(text: str) -> str:
    """Stable, content-addressable key for an error, paths/ids/numbers removed."""
    s = _error_line(text).lower()
    # UUIDs go first: the generic number pass would otherwise rewrite their
    # all-digit segments and the UUID pattern would no longer match.
    s = _UUID_RE.sub("<uuid>", s)
    s = _HEXADDR_RE.sub("<addr>", s)
    s = _PATH_RE.sub("<path>", s)
    s = _NUM_RE.sub("<n>", s)
    return _WS_RE.sub(" ", s).strip()[:_ERR_FP_MAX]
def _error_body(event: SessionEvent, blobs: dict) -> str:
    """Best available text for a failed event."""
    if event.payload_ref and event.payload_ref in blobs:
        return blobs[event.payload_ref]
    return event.summary or ""

def _looks_like_file_read(body: str) -> bool:
    """True if the body is mostly numbered source lines (a Read result), not an error."""
    lines = [ln for ln in body.splitlines() if ln.strip()]
    if not lines:
        return False
    numbered = sum(1 for ln in lines if _NUMBERED_LINE_RE.match(ln))
    return numbered >= max(3, len(lines) // 2)
def _json_verdict(body: str) -> str | None:
    """Classify a JSON tool-result body: 'error', 'success', or None (not JSON).

    Hub MCP successes look like ``{"result": "..."}``; they may mention 'error'
    deep inside summaries but are not failures, so they classify as 'success'.
    A payload with a top-level error key (``{"detail": ...}`` / ``{"error": ...}``)
    is 'error'. Non-JSON text returns None so the plain fail-hint heuristic
    still applies.
    """
    s = body.strip()
    if not s or s[0] not in "{[":
        return None
    try:
        obj = json.loads(s)
    except (ValueError, TypeError):
        return None
    if isinstance(obj, dict) and any(k in obj for k in _JSON_ERROR_KEYS):
        return "error"
    return "success"
def _is_failed(event: SessionEvent, blobs: dict) -> bool:
    if event.kind == "error":
        return True
    if event.kind == "tool_result":
        body = _error_body(event, blobs)
        if not body.strip():
            return False
        if _looks_like_file_read(body):
            return False
        verdict = _json_verdict(body)
        if verdict is not None:
            return verdict == "error"
        return any(h in body.lower() for h in _FAIL_HINTS)
    return False
def _error_snippets(events: list[SessionEvent], blobs: dict) -> list[dict]:
    """Collapse a session's failures into deduped, normalized error fingerprints.

    Durable in Tier 2 (the raw blobs may be evicted): each entry is
    ``{fingerprint, sample, count, tool}`` with same-fingerprint occurrences
    counted. Ordered by frequency (then first appearance) for stable output.
    """
    # dicts preserve insertion order, so `agg` already records first appearance
    agg: dict[str, dict] = {}
    for e in events:
        if not _is_failed(e, blobs):
            continue
        body = _error_body(e, blobs)
        if not body.strip():
            continue
        fp = _error_fingerprint(body)
        if not fp:
            continue
        if fp not in agg:
            agg[fp] = {"fingerprint": fp, "sample": _error_line(body)[:_ERR_SAMPLE_MAX],
                       "count": 0, "tool": e.tool}
        agg[fp]["count"] += 1
    snippets = list(agg.values())
    # list.sort is stable: equal counts keep their first-appearance order
    snippets.sort(key=lambda s: -s["count"])
    return snippets
def _read_blob(store, ref):
row = store.db.execute("SELECT path FROM blobs WHERE ref=?", (ref,)).fetchone()
if not row:
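To illustrate the normalization that `_error_fingerprint` performs, the sketch below reuses three of the regexes from this hunk and shows two superficially different errors collapsing to the same key. The `normalize` helper and both sample error strings are made up for the example; it omits the hex-address pass and the length cap.

```python
import re

# Patterns copied from the diff above.
_UUID_RE = re.compile(
    r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b", re.I)
_PATH_RE = re.compile(r"(?:/[\w.\-]+)+/?|[A-Za-z]:\\[\w.\\\-]+")
_NUM_RE = re.compile(r"\b\d+\b")
_WS_RE = re.compile(r"\s+")


def normalize(line: str) -> str:
    """Simplified fingerprint: lowercase, then mask ids, paths, and numbers."""
    s = line.lower()
    s = _UUID_RE.sub("<uuid>", s)   # before numbers, see note below
    s = _PATH_RE.sub("<path>", s)
    s = _NUM_RE.sub("<n>", s)
    return _WS_RE.sub(" ", s).strip()


a = normalize("FileNotFoundError: /home/alice/repo/data.json not found (attempt 3)")
b = normalize("FileNotFoundError: /tmp/x/other.json  not found (attempt 12)")
print(a)  # filenotfounderror: <path> not found (attempt <n>)
print(a == b)  # True
```

The substitution order matters: a UUID such as `4307eff6-cd39-4189-be58-79a3acb69d6c` contains an all-digit segment, so running the number pass first would turn it into `...-<n>-...` and the UUID pattern would no longer match.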

@@ -11,7 +11,7 @@ import json
from dataclasses import asdict, dataclass, field, fields
from typing import Any, Optional
SCHEMA_VERSION = 1
SCHEMA_VERSION = 2 # v2: digest carries error_snippets (WP-0006 T01)
# Supported agent flavors. ``session_uid`` is always "<flavor>:<native id>".
FLAVORS = ("claude", "codex", "grok")
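Since `session_uid` is always `"<flavor>:<native id>"`, consumers can split it on the first colon. The helper below is a hypothetical convenience, not a function from `schema.py`.

```python
FLAVORS = ("claude", "codex", "grok")


def split_session_uid(session_uid: str) -> tuple:
    """Split '<flavor>:<native id>' into (flavor, native_id).

    Hypothetical helper; raises ValueError on an unknown flavor or a
    missing native id.
    """
    flavor, sep, native_id = session_uid.partition(":")
    if not sep or flavor not in FLAVORS or not native_id:
        raise ValueError(f"not a valid session_uid: {session_uid!r}")
    return flavor, native_id
```

Using `partition` rather than `split(":")` keeps any colons inside the native id intact.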

@@ -12,6 +12,7 @@ Tier 2 digest — the invariant that makes budget-based retention non-lossy.
from __future__ import annotations
import hashlib
import json
import os
import re
@@ -28,6 +29,18 @@ def _now() -> str:
return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
def _fingerprint(ev: SessionEvent, body: Optional[str]) -> str:
    """Stable content fingerprint, independent of seq/payload_ref, for dedup."""
    h = hashlib.sha1()
    # each field appears once; the unit separator keeps field boundaries unambiguous
    parts = [ev.ts or "", ev.kind, ev.role or "", ev.tool or "", ev.summary or "",
             str(ev.is_sidechain)]
    h.update("\x1f".join(parts).encode("utf-8"))
    if body is not None:
        h.update(b"\x1e")
        h.update(body.encode("utf-8"))
    return h.hexdigest()
class Store:
def __init__(self, db_path: str, blob_dir: str):
self.db_path = db_path
@@ -121,14 +134,75 @@ class Store:
self.db.commit()
return total
def ingest(self, bundle) -> None:
"""Persist a full Normalized bundle (session + events + blobs)."""
def ingest(self, bundle) -> int:
"""Persist a Normalized bundle, merging into any existing session.
Multiple files can map to one ``session_uid`` (Claude resume/sidechains;
Grok multi-file dirs). Events are de-duplicated by content fingerprint and
genuinely-new events are appended with offset ``seq`` (design OQ6 / T03).
Returns the number of new events written. Idempotent: re-ingesting the
same bundle adds nothing.
"""
s = bundle.session
if s.ingested_at is None:
s.ingested_at = _now()
self.upsert_session(s)
self.upsert_events(bundle.events)
self.write_blobs(s.session_uid, bundle.blobs)
existing = self.get_session(s.session_uid)
if existing is None:
if s.ingested_at is None:
s.ingested_at = _now()
self.upsert_session(s)
# known fingerprints + current max seq for this session
seen = self._event_fingerprints(s.session_uid)
next_seq = self._max_seq(s.session_uid) + 1
new_events: list[SessionEvent] = []
new_blobs: dict[str, str] = {}
old_to_new: dict[int, int] = {}
for ev in bundle.events:
body = bundle.blobs.get(ev.payload_ref) if ev.payload_ref else None
fp = _fingerprint(ev, body)
if fp in seen:
continue # already stored (prior file or prior sweep)
new_seq = next_seq
next_seq += 1
old_to_new[ev.seq] = new_seq
# remap parent within this bundle; cross-file parents become None
parent = old_to_new.get(ev.parent_seq) if ev.parent_seq is not None else None
ref = None
if body is not None:
ref = f"blob://{s.session_uid}/{new_seq}"
new_blobs[ref] = body
merged = SessionEvent(
session_uid=s.session_uid, seq=new_seq, parent_seq=parent, ts=ev.ts,
kind=ev.kind, role=ev.role, tool=ev.tool, summary=ev.summary,
payload_ref=ref, tokens=ev.tokens, is_sidechain=ev.is_sidechain,
)
new_events.append(merged)
seen.add(fp)
if new_events:
self.upsert_events(new_events)
self.write_blobs(s.session_uid, new_blobs)
return len(new_events)
    def _max_seq(self, session_uid: str) -> int:
        row = self.db.execute(
            "SELECT COALESCE(MAX(seq), -1) m FROM events WHERE session_uid=?", (session_uid,)
        ).fetchone()
        return int(row["m"])

    def _event_fingerprints(self, session_uid: str) -> set[str]:
        fps: set[str] = set()
        for e in self.get_events(session_uid):
            body = None
            if e.payload_ref:
                r = self.db.execute("SELECT path FROM blobs WHERE ref=?", (e.payload_ref,)).fetchone()
                if r:
                    try:
                        with open(r["path"], "r", encoding="utf-8") as f:
                            body = f.read()
                    except OSError:
                        body = None
            fps.add(_fingerprint(e, body))
        return fps
# ---- Tier 2 (digest) ---------------------------------------------------
@@ -149,6 +223,22 @@ class Store:
row = self.db.execute("SELECT json FROM digests WHERE session_uid=?", (session_uid,)).fetchone()
return json.loads(row["json"]) if row else None
    def list_digests(self) -> list[dict[str, Any]]:
        return [json.loads(r["json"]) for r in self.db.execute("SELECT json FROM digests")]

    def save_patterns(self, patterns: list[dict[str, Any]]) -> None:
        """Persist candidate patterns to a Tier 2 table (replace prior run)."""
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS patterns ("
            "key TEXT PRIMARY KEY, json TEXT NOT NULL, detected_at TEXT NOT NULL)"
        )
        self.db.execute("DELETE FROM patterns")
        self.db.executemany(
            "INSERT INTO patterns(key, json, detected_at) VALUES(?,?,?)",
            [(p["key"], json.dumps(p, sort_keys=True), _now()) for p in patterns],
        )
        self.db.commit()
# ---- reads -------------------------------------------------------------
def get_session(self, session_uid: str) -> Optional[Session]:
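The merge behaviour of the new `ingest` (skip events whose fingerprint is already stored, append the rest with the next free `seq`) can be sketched in memory. `MiniStore` and the reduced fingerprint fields are hypothetical stand-ins for illustration; the real `Store` persists to SQLite and blob files and hashes more fields.

```python
import hashlib


def fingerprint(kind: str, summary: str, body=None) -> str:
    """Content hash over a reduced field set (assumption: real code uses more)."""
    h = hashlib.sha1()
    h.update("\x1f".join([kind, summary]).encode("utf-8"))
    if body is not None:
        h.update(b"\x1e")
        h.update(body.encode("utf-8"))
    return h.hexdigest()


class MiniStore:
    """Hypothetical in-memory stand-in for Store, for illustration only."""

    def __init__(self):
        self.events = []   # (seq, kind, summary, body)
        self.seen = set()  # fingerprints already stored

    def ingest(self, events) -> int:
        """Append only unseen events; return how many were written."""
        written = 0
        for kind, summary, body in events:
            fp = fingerprint(kind, summary, body)
            if fp in self.seen:
                continue  # already stored by a prior file or sweep
            self.events.append((len(self.events), kind, summary, body))
            self.seen.add(fp)
            written += 1
        return written


store = MiniStore()
first_file = [("user", "fix bug", None), ("tool_result", "pytest", "1 failed")]
resumed_file = first_file + [("assistant", "done", None)]
```

Ingesting `first_file` twice writes two events and then none, and ingesting `resumed_file` afterwards appends only the one new event with `seq` 2, which is the idempotence the docstring describes.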

@@ -0,0 +1,9 @@
"""Curate phase (PRD §6.3) — review candidate patterns into versioned Solution
Patterns held in an in-repo Pattern Catalog.
Layout mirrors ``detect/``:
schema.py Solution Pattern artifact + per-flavor rendering hints (T01)
catalog.py versioned, files-first catalog store (T02)
review.py discuss/approve/reject -> promote workflow (T03)
__main__.py `python -m session_memory.curate` entrypoint (T06)
"""

@@ -0,0 +1,130 @@
"""Curate entrypoint (T06): review detect candidates into the Pattern Catalog.
python -m session_memory.curate [--config PATH] [--auto-approve] [--json]
[--workstream-id ID]
Refreshes candidate patterns (runs the detect pipeline), then drives them through
the review workflow — **interactive** by default, or **batch** with
``--auto-approve`` (promote everything clearing the evidence bar, reject the rest)
for kaizen-agent runs. Candidates are presented cross-flavor first (detect's
ranking). Emits a catalog diff summary and, with ``--json``, a machine-readable
result. Approvals land in the files-first catalog; each final decision is logged
as a hub decision (queued if the hub is down).
"""
from __future__ import annotations
import argparse
import json
import os
from ..detect.__main__ import run_detect
from ..ingest import _expand, load_config
from .catalog import Catalog
from .decisions import DecisionRecorder
from .gating import bloat_warnings, evaluate, gate_config
from .review import APPROVE, DISCUSS, REJECT, ReviewLog, review
def _curate_paths(config: dict):
    c = config.get("curate", {})
    catalog_dir = _expand(c.get("catalog_dir", "session_memory/catalog"))
    review_log = _expand(c.get("review_log", "session_memory/.store/reviews.jsonl"))
    queue = _expand(c.get("decision_queue", "session_memory/.store/decisions.queue.jsonl"))
    ws_id = c.get("state_hub_workstream_id")
    return catalog_dir, review_log, queue, ws_id
def _render_candidate(cand: dict, gate, existing) -> str:
    g = evaluate(cand, gate)
    flag = " [CROSS-FLAVOR]" if cand.get("cross_flavor") else ""
    lines = [
        f"\n{cand['title']}{flag}",
        f" key={cand['key']} score={cand.get('score')} freq={cand['frequency']} "
        f"impact={cand.get('cost_impact')}",
        f" flavors={','.join(cand.get('flavors', []))} "
        f"repos={','.join(cand.get('repos', [])) or '-'} sessions={len(cand.get('sessions', []))}",
        f" gate: promotable={g.promotable} distribution_ready={g.distribution_ready}"
        + (f" ({'; '.join(g.reasons)})" if g.reasons else ""),
    ]
    for w in bloat_warnings(cand, existing):
        lines.append(f" bloat: {w}")
    return "\n".join(lines)
def _interactive_decider(gate, catalog):
    def decide(cand):
        print(_render_candidate(cand, gate, catalog.list()))
        # re-prompt until the reviewer gives a recognized answer
        while True:
            choice = input(" [a]pprove / [r]eject / [d]iscuss ? ").strip().lower()
            if choice in ("a", "approve"):
                return (APPROVE, input(" rationale: ").strip() or "approved")
            if choice in ("r", "reject"):
                return (REJECT, input(" rationale: ").strip() or "rejected")
            if choice in ("d", "discuss"):
                return (DISCUSS, "deferred for discussion")
    return decide

def _auto_decider(gate):
    """Batch policy: approve candidates clearing the promote floor, reject the rest."""
    def decide(cand):
        g = evaluate(cand, gate)
        if g.promotable:
            return (APPROVE, "auto-approved: clears evidence bar")
        return (REJECT, "auto-rejected: " + "; ".join(g.reasons))
    return decide
def _summary(result, n_candidates: int) -> str:
    added = [k for k, a in result.approved if a in ("added", "versioned", "updated")]
    lines = [
        f"# Curate summary ({n_candidates} candidates reviewed)",
        f" approved : {len(result.approved)} ({', '.join(f'{k}:{a}' for k, a in result.approved) or '-'})",
        f" rejected : {len(result.rejected)} ({', '.join(result.rejected) or '-'})",
        f" deferred : {len(result.deferred)} ({', '.join(result.deferred) or '-'})",
        f" skipped : {len(result.skipped)} (already decided)",
        f" catalog writes: {len(added)}",
    ]
    return "\n".join(lines)
def main(argv=None) -> int:
    here = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
    ap = argparse.ArgumentParser(description="Curate detect candidates into the Pattern Catalog.")
    ap.add_argument("--config", default=os.path.join(here, "config.toml"))
    ap.add_argument("--auto-approve", action="store_true",
                    help="batch mode: promote everything clearing the evidence bar")
    ap.add_argument("--min-frequency", type=int, default=2)
    ap.add_argument("--workstream-id", default=None, help="hub workstream for decisions")
    ap.add_argument("--json", action="store_true", help="emit machine-readable JSON")
    args = ap.parse_args(argv)
    config = load_config(args.config)
    candidates = run_detect(config, min_frequency=args.min_frequency)
    catalog_dir, review_log_path, queue_path, ws_id = _curate_paths(config)
    gate = gate_config(config)
    catalog = Catalog(catalog_dir)
    log = ReviewLog(review_log_path)
    recorder = DecisionRecorder(queue_path, workstream_id=args.workstream_id or ws_id)
    decide = _auto_decider(gate) if args.auto_approve else _interactive_decider(gate, catalog)
    result = review(candidates, decide, catalog, log, gate=gate, recorder=recorder)
    if args.json:
        print(json.dumps({
            "approved": result.approved, "rejected": result.rejected,
            "deferred": result.deferred, "skipped": result.skipped,
            "decisions_queued": len(recorder.pending()),
        }, indent=2))
    else:
        print(_summary(result, len(candidates)))
        if recorder.pending():
            print(f" decisions queued (hub offline): {len(recorder.pending())} "
                  f"-> {queue_path}")
    return 0

if __name__ == "__main__":
    raise SystemExit(main())
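The batch policy depends on `gating.evaluate`, whose body is not part of this diff. The sketch below shows one way the `[curate.gate]` floors could translate into the `promotable` / `distribution_ready` / `reasons` result that `_auto_decider` and `_render_candidate` consume. `evaluate_sketch`, `GateResult`, and the `"approve"` / `"reject"` action strings are assumptions for illustration only.

```python
from dataclasses import dataclass, field

# Floors copied from the [curate.gate] table in config.toml.
GATE = {
    "min_frequency": 2,
    "min_sessions": 2,
    "min_cost_impact": 0.0,
    "dist_require_cross_flavor": False,
    "dist_min_frequency": 3,
    "dist_min_cost_impact": 0.0,
}


@dataclass
class GateResult:
    promotable: bool
    distribution_ready: bool
    reasons: list = field(default_factory=list)


def evaluate_sketch(cand: dict, gate: dict = GATE) -> GateResult:
    """Hypothetical stand-in for gating.evaluate."""
    freq = cand.get("frequency", 0)
    n_sessions = len(cand.get("sessions", []))
    impact = cand.get("cost_impact", 0.0)
    reasons = []
    if freq < gate["min_frequency"]:
        reasons.append(f"frequency {freq} < {gate['min_frequency']}")
    if n_sessions < gate["min_sessions"]:
        reasons.append(f"sessions {n_sessions} < {gate['min_sessions']}")
    if impact < gate["min_cost_impact"]:
        reasons.append(f"cost_impact {impact} < {gate['min_cost_impact']}")
    promotable = not reasons
    # distribution needs the stricter floors on top of being promotable
    dist_ready = (
        promotable
        and freq >= gate["dist_min_frequency"]
        and impact >= gate["dist_min_cost_impact"]
        and (cand.get("cross_flavor", False) or not gate["dist_require_cross_flavor"])
    )
    return GateResult(promotable, dist_ready, reasons)


def auto_decide(cand: dict) -> tuple:
    """Mirror of _auto_decider's policy using the sketch gate."""
    g = evaluate_sketch(cand)
    if g.promotable:
        return ("approve", "auto-approved: clears evidence bar")
    return ("reject", "auto-rejected: " + "; ".join(g.reasons))
```

With these floors, a candidate seen twice in two sessions is promotable but not yet distribution-ready, which matches the two-tier evidence bar described in the config comments.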

@@ -0,0 +1,148 @@
"""Versioned Pattern Catalog — files-first source of truth (FR-U3; T02).
The catalog is a directory of one JSON file per Solution Pattern
(``<catalog_dir>/<pattern-id>.json``). Files originate the work; the State Hub
indexes them (ADR-001 / PRD §9). Identity is the pattern ``id`` (derived from the
source candidate key), so re-promoting the same detect candidate maps to the same
file — dedup is structural, not heuristic.
:meth:`Catalog.upsert` is the one write path and is **idempotent**:

* new id                        -> written as-is (``added``)
* same id, identical content    -> no write, no version bump (``unchanged``)
* same id, only status/flags    -> updated in place, no bump (``updated``)
* same id, content changed      -> version bumped, prior snapshot
                                   appended to ``<id>.history.jsonl`` (``versioned``)
History is append-only alongside the current file, so the catalog dir stays one
clean current file per pattern while every superseded version is recoverable.
"""
from __future__ import annotations
import json
import os
from datetime import datetime, timezone
from typing import Optional
from .schema import SolutionPattern
# Content fields that define a pattern's substance. Version, timestamps, status,
# and distribution_ready are metadata — changes to them never bump the version.
_CONTENT_KEYS = ("name", "polarity", "problem", "resolutions", "scope",
"provenance", "rendering_hints", "covers")
ADDED = "added"
UNCHANGED = "unchanged"
UPDATED = "updated"
VERSIONED = "versioned"
def _now() -> str:
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

def _content(p: SolutionPattern) -> str:
    # canonical JSON of the content fields only, so metadata changes compare equal
    d = p.to_dict()
    return json.dumps({k: d[k] for k in _CONTENT_KEYS}, sort_keys=True)
class Catalog:
    """File-backed catalog of versioned :class:`SolutionPattern` artifacts."""

    def __init__(self, catalog_dir: str) -> None:
        self.dir = catalog_dir
        os.makedirs(self.dir, exist_ok=True)

    # --- paths --------------------------------------------------------------
    def _path(self, pattern_id: str) -> str:
        return os.path.join(self.dir, f"{pattern_id}.json")

    def _history_path(self, pattern_id: str) -> str:
        return os.path.join(self.dir, f"{pattern_id}.history.jsonl")

    # --- reads --------------------------------------------------------------
    def load(self, pattern_id: str) -> Optional[SolutionPattern]:
        path = self._path(pattern_id)
        if not os.path.exists(path):
            return None
        with open(path, encoding="utf-8") as fh:
            return SolutionPattern.from_json(fh.read())

    def list(self) -> list[SolutionPattern]:
        out: list[SolutionPattern] = []
        for name in sorted(os.listdir(self.dir)):
            if name.endswith(".json") and not name.endswith(".history.jsonl"):
                with open(os.path.join(self.dir, name), encoding="utf-8") as fh:
                    out.append(SolutionPattern.from_json(fh.read()))
        return out

    def history(self, pattern_id: str) -> list[dict]:
        path = self._history_path(pattern_id)
        if not os.path.exists(path):
            return []
        with open(path, encoding="utf-8") as fh:
            return [json.loads(line) for line in fh if line.strip()]
    def find_for(self, signal_key: str, locus: str = "") -> Optional[SolutionPattern]:
        """Best catalog pattern for a detect signal: exact id first, then ``covers``.

        Lets a signal that doesn't share a pattern's exact key (e.g. a
        ``recurring_error`` fingerprint) inherit the curated recommendation when a
        pattern declares it covers that text.
        """
        exact = self.load(SolutionPattern.make_id(signal_key))
        if exact is not None:
            return exact
        hay = f"{signal_key} {locus}".lower()
        for p in self.list():  # sorted by id -> deterministic
            if any(c.lower() in hay for c in p.covers):
                return p
        return None
    # --- the single write path ---------------------------------------------
    def upsert(self, pattern: SolutionPattern) -> str:
        """Insert or version-update a pattern. Returns the action taken."""
        existing = self.load(pattern.id)
        now = _now()
        if existing is None:
            pattern.created_at = pattern.created_at or now
            pattern.updated_at = now
            self._write(pattern)
            return ADDED
        if _content(existing) == _content(pattern):
            # substance unchanged: only persist a metadata (status/flag) change
            if (existing.status == pattern.status
                    and existing.distribution_ready == pattern.distribution_ready):
                return UNCHANGED
            existing.status = pattern.status
            existing.distribution_ready = pattern.distribution_ready
            existing.updated_at = now
            self._write(existing)
            return UPDATED
        # substance changed: archive the old version, bump, write the new one
        self._append_history(existing)
        pattern.version = SolutionPattern.bump_version(existing.version)
        pattern.created_at = existing.created_at or now
        pattern.updated_at = now
        self._write(pattern)
        return VERSIONED
    # --- internals ----------------------------------------------------------
    def _write(self, pattern: SolutionPattern) -> None:
        with open(self._path(pattern.id), "w", encoding="utf-8") as fh:
            fh.write(pattern.to_json())
            fh.write("\n")

    def _append_history(self, superseded: SolutionPattern) -> None:
        superseded.status = "superseded"
        with open(self._history_path(superseded.id), "a", encoding="utf-8") as fh:
            fh.write(json.dumps(superseded.to_dict(), sort_keys=True))
            fh.write("\n")
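The four outcomes of `Catalog.upsert` can be sketched over plain dicts. This is a simplified illustration: `classify` and `bump_patch` are hypothetical names, `CONTENT_KEYS` is a subset of the real `_CONTENT_KEYS`, and the patch-level bump is an assumption based only on the `1.0.0` to `1.0.1` change visible in the catalog records earlier in this diff (`SolutionPattern.bump_version` itself is not shown).

```python
CONTENT_KEYS = ("name", "problem", "resolutions")  # subset, for illustration


def bump_patch(version: str) -> str:
    """Assumed bump rule: increment the last component, e.g. 1.0.0 -> 1.0.1."""
    major, minor, patch = (int(x) for x in version.split("."))
    return f"{major}.{minor}.{patch + 1}"


def classify(existing, incoming) -> str:
    """Name the action upsert would take for an incoming pattern dict."""
    if existing is None:
        return "added"
    same_content = all(existing.get(k) == incoming.get(k) for k in CONTENT_KEYS)
    if not same_content:
        return "versioned"  # archive the old snapshot, bump the version
    same_meta = (
        existing.get("status") == incoming.get("status")
        and existing.get("distribution_ready") == incoming.get("distribution_ready")
    )
    return "unchanged" if same_meta else "updated"


v1 = {"name": "problem: tool thrash", "problem": "problem: tool thrash",
      "resolutions": [], "status": "draft", "distribution_ready": False}
approved = dict(v1, status="approved", distribution_ready=True)
rewritten = dict(approved, name="Tool thrash: one tool hammered")
```

Here `classify(v1, approved)` is a metadata-only change, while `classify(approved, rewritten)` changes substance and would trigger a version bump and a history entry.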

@@ -0,0 +1,114 @@
"""State Hub decision integration (FR-U4; T05).
Every final promote/reject is recorded as an auditable decision so the rationale,
the source candidate key, and an evidence snapshot are traceable. The catalog
file remains the durable artifact (ADR-001); the decision is the audit trail.
The recorder is **graceful under a hub outage** — exactly the condition hit during
Phase 1, where statuses were synced after the fact. A pluggable ``sink`` does the
actual write (HTTP to the hub, or the MCP ``record_decision`` tool driven by the
operator). If the sink is absent or raises, the decision is appended to a local
queue (``decisions.queue.jsonl``) and can be replayed later with :meth:`flush`.
"""
from __future__ import annotations
import json
import os
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Optional
# A sink takes a hub-shaped decision payload and persists it (may raise on failure).
Sink = Callable[[dict], None]
def _now() -> str:
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

def build_decision(candidate: dict, action: str, rationale: str,
                   *, workstream_id: Optional[str] = None,
                   decided_by: str = "curator") -> dict:
    """Shape a curate decision as a State Hub ``record_decision`` payload."""
    key = candidate["key"]
    verb = "Promote" if action == "approve" else "Reject"
    return {
        "title": f"{verb} pattern candidate {key}",
        "decision_type": "made",
        "workstream_id": workstream_id,
        "rationale": rationale,
        "decided_by": decided_by,
        "description": json.dumps({
            "action": action,
            "source_key": key,
            "evidence": candidate,
        }, sort_keys=True),
        "recorded_at": _now(),
    }
@dataclass
class DecisionRecorder:
"""Records decisions through ``sink`` with a durable local-queue fallback."""
queue_path: str
sink: Optional[Sink] = None
workstream_id: Optional[str] = None
decided_by: str = "curator"
_queued: int = field(default=0, init=False)
def record(self, candidate: dict, action: str, rationale: str) -> bool:
"""Record one decision. Returns True if the sink accepted it, else queued."""
payload = build_decision(candidate, action, rationale,
workstream_id=self.workstream_id, decided_by=self.decided_by)
if self.sink is not None:
try:
self.sink(payload)
return True
except Exception: # hub down / transient — fall through to the queue
pass
self._append(payload)
return False
def pending(self) -> list[dict]:
if not os.path.exists(self.queue_path):
return []
with open(self.queue_path, encoding="utf-8") as fh:
return [json.loads(line) for line in fh if line.strip()]
def flush(self, sink: Optional[Sink] = None) -> int:
"""Replay queued decisions through ``sink``. Returns count synced.
Stops at the first failure so ordering is preserved; the unsynced tail is
rewritten back to the queue.
"""
sink = sink or self.sink
if sink is None:
return 0
items = self.pending()
synced = 0
for i, payload in enumerate(items):
try:
sink(payload)
synced += 1
except Exception:
self._rewrite(items[i:])
return synced
self._rewrite([])
return synced
# --- internals ----------------------------------------------------------
def _append(self, payload: dict) -> None:
os.makedirs(os.path.dirname(self.queue_path) or ".", exist_ok=True)
with open(self.queue_path, "a", encoding="utf-8") as fh:
fh.write(json.dumps(payload, sort_keys=True))
fh.write("\n")
self._queued += 1
def _rewrite(self, items: list[dict]) -> None:
with open(self.queue_path, "w", encoding="utf-8") as fh:
for payload in items:
fh.write(json.dumps(payload, sort_keys=True))
fh.write("\n")
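
The sink-or-queue fallback and the ordered replay described in the module docstring can be tried in isolation. The sketch below is a stripped-down stand-in, not the module itself: `record_or_queue`, `replay`, and `hub_down` are hypothetical names, and the queue is an in-memory list rather than `decisions.queue.jsonl`.

```python
from __future__ import annotations

from typing import Callable, Optional

Sink = Callable[[dict], None]


def record_or_queue(payload: dict, sink: Optional[Sink], queue: list) -> bool:
    """Try the sink; if it is absent or raises, keep the payload in the queue."""
    if sink is not None:
        try:
            sink(payload)
            return True
        except Exception:
            pass  # treat any sink failure as "hub unavailable"
    queue.append(payload)
    return False


def replay(queue: list, sink: Sink) -> int:
    """Replay in order; stop at the first failure and keep the unsynced tail."""
    synced = 0
    for payload in list(queue):
        try:
            sink(payload)
        except Exception:
            break
        synced += 1
    del queue[:synced]
    return synced


def hub_down(payload: dict) -> None:
    """Hypothetical sink that simulates an outage."""
    raise ConnectionError("hub unavailable")


delivered: list = []
queue: list = []
first = record_or_queue({"title": "Promote pattern candidate a"}, hub_down, queue)
second = record_or_queue({"title": "Reject pattern candidate b"}, None, queue)
synced = replay(queue, delivered.append)  # the hub is reachable again
```

Stopping at the first failed payload, rather than skipping it, is what keeps the audit trail in decision order once the hub comes back.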


@@ -0,0 +1,117 @@
"""Promotion evidence-bar + bloat guard (design OQ5/OQ6; T04).

Two gates protect the catalog:

* **Evidence bar (OQ5)** — a candidate must clear configurable floors
  (frequency, distinct supporting sessions) before it may be promoted at all.
  A separate, stricter bar decides whether the promoted pattern is
  *distribution-eligible* (``status="approved"``, ``distribution_ready=True``)
  vs. merely ``provisional`` — the minimum trustworthy evidence before a pattern
  is allowed near live agent environments.
* **Bloat guard (OQ6)** — flags candidates that would add little: a duplicate of
  an already-cataloged pattern, or a near-duplicate sharing the same
  signal-type+locus. Keeps the catalog lean so agent context budgets aren't
  degraded by low-value instructions.

Knobs live under ``[curate]`` in ``config.toml``; :func:`gate_config` reads them
with safe defaults so the module also works config-free (tests).
"""
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

from .schema import SolutionPattern


@dataclass
class GateConfig:
    # promotion floor (OQ5)
    min_frequency: int = 2
    min_sessions: int = 2
    min_cost_impact: float = 0.0
    # distribution-eligibility floor (stricter; OQ5)
    dist_require_cross_flavor: bool = False
    dist_min_frequency: int = 3
    dist_min_cost_impact: float = 0.0


def gate_config(config: Optional[dict] = None) -> GateConfig:
    c = (config or {}).get("curate", {}) if config else {}
    g = c.get("gate", {}) if isinstance(c, dict) else {}
    return GateConfig(
        min_frequency=g.get("min_frequency", 2),
        min_sessions=g.get("min_sessions", 2),
        min_cost_impact=g.get("min_cost_impact", 0.0),
        dist_require_cross_flavor=g.get("dist_require_cross_flavor", False),
        dist_min_frequency=g.get("dist_min_frequency", 3),
        dist_min_cost_impact=g.get("dist_min_cost_impact", 0.0),
    )


@dataclass
class GateResult:
    promotable: bool
    distribution_ready: bool
    status: str  # "approved" if distribution-ready else "provisional"
    reasons: list = field(default_factory=list)


def _n_sessions(candidate: dict) -> int:
    return len(candidate.get("sessions", []) or [])


def evaluate(candidate: dict, config: Optional[GateConfig] = None) -> GateResult:
    """Decide whether a candidate may be promoted, and at what trust level."""
    cfg = config or GateConfig()
    reasons: list[str] = []
    freq = candidate.get("frequency", 0)
    sessions = _n_sessions(candidate)
    impact = candidate.get("cost_impact", 0.0)

    promotable = True
    if freq < cfg.min_frequency:
        promotable = False
        reasons.append(f"frequency {freq} < min {cfg.min_frequency}")
    if sessions < cfg.min_sessions:
        promotable = False
        reasons.append(f"sessions {sessions} < min {cfg.min_sessions}")
    if impact < cfg.min_cost_impact:
        promotable = False
        reasons.append(f"cost_impact {impact} < min {cfg.min_cost_impact}")

    dist = promotable
    if cfg.dist_require_cross_flavor and not candidate.get("cross_flavor", False):
        dist = False
        reasons.append("not cross-flavor (required for distribution)")
    if freq < cfg.dist_min_frequency:
        dist = False
        reasons.append(f"frequency {freq} < distribution min {cfg.dist_min_frequency}")
    if impact < cfg.dist_min_cost_impact:
        dist = False
        reasons.append(f"cost_impact {impact} < distribution min {cfg.dist_min_cost_impact}")

    return GateResult(
        promotable=promotable,
        distribution_ready=bool(dist),
        status="approved" if dist else "provisional",
        reasons=reasons,
    )


def bloat_warnings(candidate: dict, existing: list[SolutionPattern]) -> list[str]:
    """Flag low-value adds against what is already catalogued (OQ6)."""
    warnings: list[str] = []
    cand_id = SolutionPattern.make_id(candidate["key"])
    _, sig_type, locus = (candidate["key"].split(":", 2) + ["", ""])[:3]
    for p in existing:
        if p.id == cand_id:
            warnings.append(f"duplicate of catalogued pattern {p.id}")
            continue
        p_parts = (p.provenance.source_key.split(":", 2) + ["", ""])[:3]
        if (p_parts[1], p_parts[2]) == (sig_type, locus):
            warnings.append(f"near-duplicate of {p.id} (same {sig_type}/{locus})")
    return warnings
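
The two-tier evidence bar can be reduced to a few lines to see how the tiers interact. This is a simplified sketch rather than the module's `evaluate`: `two_tier_gate` is a hypothetical helper, it ignores the cost-impact and cross-flavor knobs, and its defaults are copied from `GateConfig`.

```python
def two_tier_gate(freq, sessions, *, min_frequency=2, min_sessions=2,
                  dist_min_frequency=3):
    """Return (promotable, distribution_ready, status) for one candidate."""
    # Tier 1: the promotion floor.
    promotable = freq >= min_frequency and sessions >= min_sessions
    # Tier 2: the stricter distribution floor, reachable only if tier 1 passed.
    distribution_ready = promotable and freq >= dist_min_frequency
    status = "approved" if distribution_ready else "provisional"
    return promotable, distribution_ready, status


thin = two_tier_gate(2, 2)    # clears the promotion floor only
strong = two_tier_gate(5, 3)  # clears both floors
weak = two_tier_gate(1, 1)    # clears neither
```

Note that `status` is `"provisional"` for a candidate that fails the promotion floor too, so callers should check the promotable flag before relying on the status.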


@@ -0,0 +1,158 @@
"""Curation review workflow (FR-U1/FR-U2; T03).

Drives Phase 1 detect candidates through a **discuss / approve / reject** review
and, on approve, promotes the candidate into a :class:`SolutionPattern` written to
the :class:`Catalog`. The actual decision is supplied by a ``decide`` callback so
this engine stays UI-free — the ``__main__`` entrypoint (T06) plugs in interactive
or batch (auto-approve) logic.

Re-review is **idempotent** via a :class:`ReviewLog`: a candidate already decided
is skipped unless its *evidence fingerprint* changed (new sessions/frequency), so
a prior **reject** is remembered and not re-surfaced, and a prior **approve** is
updated in place rather than duplicated (catalog dedup does the rest).
"""
from __future__ import annotations

import hashlib
import json
import os
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Optional

from .catalog import Catalog
from .decisions import DecisionRecorder
from .gating import GateConfig, evaluate
from .schema import Provenance, Resolution, Scope, SolutionPattern

APPROVE = "approve"
REJECT = "reject"
DISCUSS = "discuss"  # defer — no final decision recorded

# Default per-flavor rendering-hint stubs a reviewer can later refine (OQ4).
_DEFAULT_TARGET = {"claude": "CLAUDE.md", "codex": "AGENTS.md", "grok": "instructions"}

# A decision callback: (candidate dict) -> (action, rationale)
Decider = Callable[[dict], tuple]


def _now() -> str:
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def evidence_fingerprint(candidate: dict) -> str:
    """Stable hash of the evidence that would justify (re)reviewing a candidate."""
    keys = ("frequency", "cost_impact", "flavors", "repos", "sessions", "cross_flavor")
    payload = {k: candidate.get(k) for k in keys}
    return hashlib.sha1(json.dumps(payload, sort_keys=True).encode("utf-8")).hexdigest()


def candidate_to_pattern(candidate: dict, *, status: str = "provisional",
                         distribution_ready: bool = False) -> SolutionPattern:
    """Build a Solution Pattern from a detect candidate.

    ``status``/``distribution_ready`` come from the evidence gate (T04); they
    default to a provisional, non-distribution-ready pattern when ungated.
    """
    src = candidate["key"]
    flavors = list(candidate.get("flavors", []))
    hints = {f: {"target": _DEFAULT_TARGET.get(f, ""), "note": "TODO: refine rendering"}
             for f in flavors}
    return SolutionPattern(
        id=SolutionPattern.make_id(src),
        name=candidate.get("title") or src,
        version="1.0.0",
        polarity=candidate.get("polarity", "problem"),
        problem=candidate.get("title") or src,
        resolutions=[Resolution(summary="TODO: capture the recommended resolution")],
        scope=Scope(flavors=flavors, repos=list(candidate.get("repos", []))),
        provenance=Provenance(source_key=src, evidence=dict(candidate), promoted_at=_now()),
        rendering_hints=hints,
        status=status,
        distribution_ready=distribution_ready,
    )


@dataclass
class ReviewLog:
    """Append-only record of final decisions, keyed by candidate source key."""

    path: str
    _by_key: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        if os.path.exists(self.path):
            with open(self.path, encoding="utf-8") as fh:
                for line in fh:
                    if line.strip():
                        rec = json.loads(line)
                        self._by_key[rec["source_key"]] = rec  # last write wins

    def prior(self, source_key: str) -> Optional[dict]:
        return self._by_key.get(source_key)

    def already_decided(self, candidate: dict) -> bool:
        rec = self._by_key.get(candidate["key"])
        return bool(rec) and rec["fingerprint"] == evidence_fingerprint(candidate)

    def record(self, candidate: dict, action: str, rationale: str) -> None:
        rec = {
            "source_key": candidate["key"],
            "action": action,
            "rationale": rationale,
            "fingerprint": evidence_fingerprint(candidate),
            "ts": _now(),
        }
        self._by_key[candidate["key"]] = rec
        os.makedirs(os.path.dirname(self.path) or ".", exist_ok=True)
        with open(self.path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(rec, sort_keys=True))
            fh.write("\n")


@dataclass
class ReviewResult:
    approved: list = field(default_factory=list)  # (source_key, catalog_action)
    rejected: list = field(default_factory=list)  # source_key
    deferred: list = field(default_factory=list)  # source_key (discuss)
    skipped: list = field(default_factory=list)   # source_key (already decided)


def review(candidates: list[dict], decide: Decider, catalog: Catalog,
           log: ReviewLog, gate: Optional[GateConfig] = None,
           recorder: Optional[DecisionRecorder] = None) -> ReviewResult:
    """Run each candidate through ``decide``; promote approvals into ``catalog``.

    When a ``gate`` (T04 evidence bar) is supplied, the promoted pattern's
    ``status``/``distribution_ready`` are set from the gate evaluation, so an
    approved-but-thin candidate lands as ``provisional`` rather than
    distribution-ready. When a ``recorder`` (T05) is supplied, each final
    promote/reject is logged as an auditable hub decision (queued if the hub is
    down).
    """
    result = ReviewResult()
    for cand in candidates:
        key = cand["key"]
        if log.already_decided(cand):
            result.skipped.append(key)
            continue
        action, rationale = decide(cand)
        if action == DISCUSS:
            result.deferred.append(key)
            continue  # not a final decision — leave for a later pass
        if action == APPROVE:
            g = evaluate(cand, gate) if gate is not None else None
            pattern = (candidate_to_pattern(cand, status=g.status,
                                            distribution_ready=g.distribution_ready)
                       if g is not None else candidate_to_pattern(cand))
            cat_action = catalog.upsert(pattern)
            result.approved.append((key, cat_action))
        elif action == REJECT:
            result.rejected.append(key)
        else:
            raise ValueError(f"unknown review action {action!r}")
        log.record(cand, action, rationale)
        if recorder is not None:
            recorder.record(cand, action, rationale)
    return result
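
The idempotent re-review rests on the evidence fingerprint: only evidence fields are hashed, so rewording a candidate does not trigger a new review, while new sessions do. The sketch below mirrors `evidence_fingerprint` in a standalone form; the sample candidates are made-up values for illustration only.

```python
import hashlib
import json

EVIDENCE_KEYS = ("frequency", "cost_impact", "flavors", "repos", "sessions", "cross_flavor")


def fingerprint(candidate: dict) -> str:
    """Hash only the evidence fields, with sorted keys for a stable result."""
    payload = {k: candidate.get(k) for k in EVIDENCE_KEYS}
    return hashlib.sha1(json.dumps(payload, sort_keys=True).encode("utf-8")).hexdigest()


# Hypothetical candidate, shaped like a detect Pattern dict.
base = {"key": "problem:retry_storm:retries", "title": "problem: retry storm",
        "frequency": 3, "sessions": ["s1", "s2"]}
retitled = dict(base, title="reworded title")               # same evidence
grown = dict(base, frequency=4, sessions=["s1", "s2", "s3"])  # new evidence

same_after_retitle = fingerprint(base) == fingerprint(retitled)
changed_after_growth = fingerprint(base) != fingerprint(grown)
```

Because `title` is not among the hashed keys, a previously rejected candidate stays rejected until its frequency, sessions, or another evidence field actually changes.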


@@ -0,0 +1,160 @@
"""Solution Pattern schema (PRD §6.3 FR-U2; design OQ4) — T01.

A **Solution Pattern** is the curated, reviewed artifact a candidate pattern is
promoted into: a named, versioned record pairing a problem (or success) with one
or more recommended resolutions, written **flavor-agnostically**. Everything a
distributor needs to render a native artifact lives in a *separate*
``rendering_hints`` sub-structure, keyed by flavor — so the core stays neutral
(FR-A1/FR-A2) while Phase 3 distributors still get enough to render well (OQ4).

The artifact is the durable unit of the Pattern Catalog (T02): files originate,
the State Hub indexes (ADR-001). Serialization is deterministic (sorted keys) so
catalog files diff cleanly and re-saving an unchanged pattern is a no-op.
"""
from __future__ import annotations

import json
import re
from dataclasses import asdict, dataclass, field, fields
from typing import Any, Optional

from ..core.schema import FLAVORS

SCHEMA_VERSION = 1

# Lifecycle of a catalogued pattern.
#   provisional — promoted but below the distribution evidence bar (OQ5)
#   approved    — meets the bar; distribution-eligible (Phase 3)
#   rejected    — reviewed and declined; remembered so it is not re-surfaced
#   superseded  — replaced by a newer version of the same pattern id
STATUSES = ("provisional", "approved", "rejected", "superseded")
POLARITIES = ("problem", "success")


@dataclass
class Resolution:
    """One recommended resolution for the pattern's problem (FR-U2)."""

    summary: str
    detail: str = ""
    steps: list[str] = field(default_factory=list)


@dataclass
class Scope:
    """Where the pattern applies (FR-X2 input). Empty list == unrestricted."""

    repos: list[str] = field(default_factory=list)
    domains: list[str] = field(default_factory=list)
    flavors: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        bad = [f for f in self.flavors if f not in FLAVORS]
        if bad:
            raise ValueError(f"unknown flavor(s) in scope {bad!r}; expected {FLAVORS}")


@dataclass
class Provenance:
    """Trace back to the detect candidate this pattern was promoted from."""

    source_key: str  # the detect Pattern.key — stable cluster identity
    evidence: dict[str, Any] = field(default_factory=dict)  # snapshot of the candidate
    detected_at: Optional[str] = None
    promoted_at: Optional[str] = None


@dataclass
class SolutionPattern:
    """A curated, versioned solution pattern (PRD §5 / §6.3)."""

    id: str        # stable, derived from provenance.source_key
    name: str
    version: str   # semantic, e.g. "1.0.0"
    polarity: str  # problem | success
    problem: str   # human-readable description of the recurring situation
    resolutions: list[Resolution] = field(default_factory=list)
    scope: Scope = field(default_factory=Scope)
    provenance: Provenance = field(default_factory=lambda: Provenance(source_key=""))
    # per-flavor rendering hints, kept OUT of the agnostic core (OQ4):
    #   {"claude": {...}, "codex": {...}, "grok": {...}}
    rendering_hints: dict[str, dict[str, Any]] = field(default_factory=dict)
    # other signal keys/loci this pattern's recommendation also applies to —
    # lowercase substrings matched against a candidate signal's key+locus, so a
    # detect signal that doesn't share this pattern's exact key (e.g. a
    # recurring_error fingerprint) can still inherit the curated resolution.
    covers: list[str] = field(default_factory=list)
    status: str = "provisional"
    distribution_ready: bool = False
    created_at: Optional[str] = None
    updated_at: Optional[str] = None
    schema_version: int = SCHEMA_VERSION

    def __post_init__(self) -> None:
        if self.polarity not in POLARITIES:
            raise ValueError(f"unknown polarity {self.polarity!r}; expected {POLARITIES}")
        if self.status not in STATUSES:
            raise ValueError(f"unknown status {self.status!r}; expected {STATUSES}")
        bad = [f for f in self.rendering_hints if f not in FLAVORS]
        if bad:
            raise ValueError(f"unknown flavor(s) in rendering_hints {bad!r}; expected {FLAVORS}")

    # --- identity / versioning helpers -------------------------------------

    @staticmethod
    def make_id(source_key: str) -> str:
        """Stable catalog id from a detect candidate key (``polarity:type:locus``).

        Identity is the source key, so re-promoting the same candidate maps to the
        same pattern (dedup in T02), independent of wording or version.
        """
        slug = re.sub(r"[^a-z0-9_]+", "-", source_key.lower()).strip("-")
        return f"sp-{slug}"

    @staticmethod
    def bump_version(version: str, level: str = "patch") -> str:
        """Increment a ``major.minor.patch`` version string."""
        parts = (version.split(".") + ["0", "0", "0"])[:3]
        major, minor, patch = (int(p) for p in parts)
        if level == "major":
            major, minor, patch = major + 1, 0, 0
        elif level == "minor":
            minor, patch = minor + 1, 0
        else:
            patch += 1
        return f"{major}.{minor}.{patch}"

    # --- serialization ------------------------------------------------------

    def to_dict(self) -> dict[str, Any]:
        return asdict(self)

    def to_json(self) -> str:
        return json.dumps(self.to_dict(), sort_keys=True, indent=2)

    @classmethod
    def from_dict(cls, d: dict[str, Any]) -> "SolutionPattern":
        d = dict(d)
        resolutions = [Resolution(**{k: v for k, v in r.items() if k in _RESOLUTION_FIELDS})
                       for r in d.pop("resolutions", [])]
        scope = d.pop("scope", None)
        prov = d.pop("provenance", None)
        obj = cls(**{k: v for k, v in d.items() if k in _PATTERN_FIELDS})
        obj.resolutions = resolutions
        if scope is not None:
            obj.scope = Scope(**{k: v for k, v in scope.items() if k in _SCOPE_FIELDS})
        if prov is not None:
            obj.provenance = Provenance(**{k: v for k, v in prov.items() if k in _PROV_FIELDS})
        return obj

    @classmethod
    def from_json(cls, s: str) -> "SolutionPattern":
        return cls.from_dict(json.loads(s))


_PATTERN_FIELDS = {f.name for f in fields(SolutionPattern)}
_RESOLUTION_FIELDS = {f.name for f in fields(Resolution)}
_SCOPE_FIELDS = {f.name for f in fields(Scope)}
_PROV_FIELDS = {f.name for f in fields(Provenance)}
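
The identity and versioning helpers are small enough to check by hand. The sketch below lifts `make_id` and `bump_version` out of the class as plain functions so they run without the package; the input keys are illustrative values in the `polarity:type:locus` shape.

```python
import re


def make_id(source_key: str) -> str:
    """Lowercase the key and collapse anything outside [a-z0-9_] into '-'."""
    slug = re.sub(r"[^a-z0-9_]+", "-", source_key.lower()).strip("-")
    return f"sp-{slug}"


def bump_version(version: str, level: str = "patch") -> str:
    """Increment major.minor.patch, padding missing parts with zeros."""
    parts = (version.split(".") + ["0", "0", "0"])[:3]
    major, minor, patch = (int(p) for p in parts)
    if level == "major":
        major, minor, patch = major + 1, 0, 0
    elif level == "minor":
        minor, patch = minor + 1, 0
    else:
        patch += 1
    return f"{major}.{minor}.{patch}"


pattern_id = make_id("problem:retry_storm:retries")
# -> "sp-problem-retry_storm-retries" (underscores survive, colons become dashes)
next_patch = bump_version("1.0.0")           # -> "1.0.1"
next_minor = bump_version("1.2.3", "minor")  # -> "1.3.0"
padded = bump_version("2")                   # -> "2.0.1"
```

Since the id depends only on the source key, two promotions of the same cluster always land on the same catalog file, which is what lets the catalog deduplicate on upsert.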


@@ -0,0 +1 @@
"""Detect: extract signals from sessions, cluster into candidate patterns."""


@@ -0,0 +1,72 @@
"""Detect entrypoint (T07): digests -> signals -> clusters -> report.

    python -m session_memory.detect [--config PATH] [--json] [--min-frequency N]

Reads Tier 2 digests from the store, extracts signals, clusters them into
candidate patterns, persists the candidates, and prints a ranked report
(cross-flavor first) — the input to the Curate phase (Phase 2).
"""
from __future__ import annotations

import argparse
import json
import os

from ..core.store import Store
from ..ingest import _expand, load_config
from .cluster import cluster
from .quality import filter_real, quality_config
from .signals import extract_signals


def run_detect(config: dict, *, min_frequency: int = 2) -> list[dict]:
    store_cfg = config.get("store", {})
    store = Store(_expand(store_cfg["db_path"]), _expand(store_cfg["blob_dir"]))
    digests = filter_real(store.list_digests(), quality_config(config))
    signals = extract_signals(digests)
    patterns = [p.to_dict() for p in cluster(signals, min_frequency=min_frequency)]
    store.save_patterns(patterns)
    store.close()
    return patterns


def _format_report(patterns: list[dict], n_digests: int) -> str:
    lines = [f"# Candidate Patterns ({len(patterns)} from {n_digests} sessions)", ""]
    if not patterns:
        lines.append("No recurring patterns above the frequency threshold yet.")
        return "\n".join(lines)
    for i, p in enumerate(patterns, 1):
        flag = " [CROSS-FLAVOR]" if p["cross_flavor"] else ""
        lines.append(f"{i}. {p['title']}{flag}")
        lines.append(f"   score={p['score']} freq={p['frequency']} "
                     f"impact={p['cost_impact']} flavors={','.join(p['flavors'])}")
        lines.append(f"   repos={','.join(p['repos']) or '-'} "
                     f"sessions={len(p['sessions'])}")
        lines.append("")
    return "\n".join(lines)


def main(argv=None) -> int:
    here = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
    ap = argparse.ArgumentParser(description="Detect candidate patterns from session digests.")
    ap.add_argument("--config", default=os.path.join(here, "config.toml"))
    ap.add_argument("--min-frequency", type=int, default=2)
    ap.add_argument("--json", action="store_true", help="emit machine-readable JSON")
    args = ap.parse_args(argv)

    config = load_config(args.config)
    store_cfg = config.get("store", {})
    # Count the quality-filtered digests for the report header, then close this
    # handle; run_detect opens (and closes) its own store.
    store = Store(_expand(store_cfg["db_path"]), _expand(store_cfg["blob_dir"]))
    try:
        n = len(filter_real(store.list_digests(), quality_config(config)))
    finally:
        store.close()
    patterns = run_detect(config, min_frequency=args.min_frequency)
    if args.json:
        print(json.dumps(patterns, indent=2))
    else:
        print(_format_report(patterns, n))
    return 0


if __name__ == "__main__":
    raise SystemExit(main())


@@ -0,0 +1,78 @@
"""Pattern clusterer + evidence (PRD §5, §6.2; T05/T06).

Groups recurring :class:`Signal`s into candidate ``Pattern`` records. Clustering
is deterministic and keyed on ``(polarity, signal-type, locus)`` — enough to
surface "the same thing keeps happening" without embeddings (a later option).

Each candidate carries evidence (FR-D3): supporting sessions, frequency, affected
repos, affected **flavors**, and an estimated cost-impact score. Candidates whose
evidence spans more than one flavor are flagged ``cross_flavor`` (FR-D4) — the
highest-value reuse targets.
"""
from __future__ import annotations

import collections
from dataclasses import asdict, dataclass, field
from typing import Any

from .signals import PROBLEM, Signal


@dataclass
class Pattern:
    key: str            # stable cluster key
    polarity: str       # problem | success
    signal_type: str
    locus: str
    frequency: int      # number of supporting signals
    sessions: list[str] = field(default_factory=list)
    repos: list[str] = field(default_factory=list)
    flavors: list[str] = field(default_factory=list)
    cross_flavor: bool = False
    cost_impact: float = 0.0  # frequency-weighted magnitude
    score: float = 0.0        # ranking score (impact x frequency)
    title: str = ""

    def to_dict(self) -> dict[str, Any]:
        return asdict(self)


def _key(s: Signal) -> str:
    return f"{s.polarity}:{s.type}:{s.locus}"


def _title(polarity: str, signal_type: str, n_flavors: int) -> str:
    scope = "cross-flavor " if n_flavors > 1 else ""
    verb = "problem" if polarity == PROBLEM else "success"
    return f"{scope}{verb}: {signal_type.replace('_', ' ')}"


def cluster(signals: list[Signal], *, min_frequency: int = 2) -> list[Pattern]:
    """Group signals into candidate patterns; keep clusters >= min_frequency."""
    groups: dict[str, list[Signal]] = collections.defaultdict(list)
    for s in signals:
        groups[_key(s)].append(s)

    patterns: list[Pattern] = []
    for key, members in groups.items():
        if len(members) < min_frequency:
            continue
        sessions = sorted({m.session_uid for m in members})
        repos = sorted({m.repo for m in members if m.repo})
        flavors = sorted({m.flavor for m in members})
        cost_impact = sum(m.magnitude for m in members)
        first = members[0]
        p = Pattern(
            key=key, polarity=first.polarity, signal_type=first.type, locus=first.locus,
            frequency=len(members), sessions=sessions, repos=repos, flavors=flavors,
            cross_flavor=len(flavors) > 1, cost_impact=round(cost_impact, 3),
            title=_title(first.polarity, first.type, len(flavors)),
        )
        # rank: impact x frequency, with a boost for cross-flavor reuse value
        p.score = round(p.cost_impact * p.frequency * (1.5 if p.cross_flavor else 1.0), 3)
        patterns.append(p)

    # cross-flavor first, then by score
    patterns.sort(key=lambda p: (not p.cross_flavor, -p.score))
    return patterns
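
The ranking rule has a consequence worth seeing on numbers: the sort key puts every cross-flavor candidate ahead of every single-flavor one, regardless of score. The sketch below applies the same score formula and sort key to three made-up candidates; the dict shape and values are assumptions for illustration.

```python
def score(cost_impact: float, frequency: int, cross_flavor: bool) -> float:
    """Impact x frequency, boosted 1.5x for cross-flavor candidates."""
    return round(cost_impact * frequency * (1.5 if cross_flavor else 1.0), 3)


# Hypothetical candidates: "a" is costly but seen in one flavor only.
candidates = [
    {"key": "a", "cost_impact": 10.0, "frequency": 4, "cross_flavor": False},
    {"key": "b", "cost_impact": 2.0, "frequency": 2, "cross_flavor": True},
    {"key": "c", "cost_impact": 3.0, "frequency": 3, "cross_flavor": True},
]
for c in candidates:
    c["score"] = score(c["cost_impact"], c["frequency"], c["cross_flavor"])

# Same key as cluster(): cross-flavor first, then descending score.
ranked = sorted(candidates, key=lambda c: (not c["cross_flavor"], -c["score"]))
order = [c["key"] for c in ranked]
```

Here `a` scores 40.0 yet ranks last, behind `c` (13.5) and `b` (6.0), because the cross-flavor flag is the primary sort key and the 1.5x boost only matters within each group.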


@@ -0,0 +1,75 @@
"""Session-quality filter (T01).

The capture layer ingests *every* session it finds — including API health-checks,
smoke-tests, and interrupted runs (e.g. ``llm-connect`` firing "Say hello in one
word", or a transcript that is just ``[Request interrupted by user]``). These are
not real coding work, but the outcome heuristic labels the short ones ``abandoned``
and the clusterer then mints false-positive "problem" patterns from them.

:func:`is_real_coding_session` gates those out so Detect signals/clusters form only
over genuine coding sessions. It is intentionally conservative — a session counts
as real if it shows substantive activity, and is dropped only on clear trivial
markers. Thresholds come from ``[detect.quality]`` in ``config.toml``.
"""
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Prompt prefixes/markers that indicate a non-coding or interrupted session.
_TRIVIAL_PROMPTS = (
    "say hello", "hello", "[request interrupted", "return only this json",
    "ping", "ok", "<system-reminder>",
)

# Tool buckets that count as "substantive" coding activity.
_SUBSTANTIVE_TOOLS = (
    "Edit", "Write", "Read", "Bash", "search_replace", "write", "read_file",
    "run_terminal_command", "grep", "Grep", "glob", "Glob", "NotebookEdit",
)


@dataclass
class QualityConfig:
    min_events: int = 20       # below this, not a real coding session
    min_substantive: int = 3   # >= this many substantive tool calls required
    min_prompt_len: int = 25   # first prompt shorter than this is suspect


def quality_config(config: Optional[dict] = None) -> QualityConfig:
    d = (config or {}).get("detect", {}).get("quality", {}) if config else {}
    return QualityConfig(
        min_events=d.get("min_events", 20),
        min_substantive=d.get("min_substantive", 3),
        min_prompt_len=d.get("min_prompt_len", 25),
    )


def _substantive_calls(digest: dict) -> int:
    hist = digest.get("tool_histogram") or {}
    return sum(n for t, n in hist.items() if t in _SUBSTANTIVE_TOOLS)


def is_real_coding_session(digest: dict, config: Optional[QualityConfig] = None) -> bool:
    cfg = config or QualityConfig()
    if not digest.get("repo"):
        return False
    if digest.get("event_count", 0) < cfg.min_events:
        return False
    if _substantive_calls(digest) < cfg.min_substantive:
        return False
    prompt = (digest.get("first_prompt") or "").strip().lower()
    if len(prompt) < cfg.min_prompt_len:
        return False
    if any(prompt.startswith(p) for p in _TRIVIAL_PROMPTS):
        return False
    return True


def filter_real(digests: list[dict], config: Optional[QualityConfig] = None) -> list[dict]:
    cfg = config or QualityConfig()
    return [d for d in digests if is_real_coding_session(d, cfg)]
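
To see the filter's checks act on concrete inputs, the sketch below reimplements them in condensed form. It is an illustration, not the module: `looks_real` is a hypothetical name, the tool and prompt lists are shortened subsets, and both digests are invented.

```python
SUBSTANTIVE = {"Edit", "Write", "Read", "Bash"}            # subset of the real list
TRIVIAL = ("say hello", "[request interrupted", "ping")     # subset of the real list


def looks_real(digest: dict, *, min_events=20, min_substantive=3, min_prompt_len=25) -> bool:
    """Apply the same checks, in the same order, as is_real_coding_session."""
    if not digest.get("repo"):
        return False
    if digest.get("event_count", 0) < min_events:
        return False
    hist = digest.get("tool_histogram") or {}
    if sum(n for t, n in hist.items() if t in SUBSTANTIVE) < min_substantive:
        return False
    prompt = (digest.get("first_prompt") or "").strip().lower()
    if len(prompt) < min_prompt_len:
        return False
    return not prompt.startswith(TRIVIAL)  # str.startswith accepts a tuple


health_check = {"repo": None, "event_count": 2,
                "first_prompt": "Say hello in one word"}
real_work = {"repo": "agentic-resources", "event_count": 140,
             "tool_histogram": {"Edit": 12, "Bash": 30},
             "first_prompt": "Refactor the digest lookup CLI to accept --json"}

kept = [d for d in (health_check, real_work) if looks_real(d)]
```

The health-check digest is dropped at the first test (no repo), so its `abandoned` outcome never reaches the clusterer.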


@@ -0,0 +1,205 @@
"""Signal extractors (PRD §6.2; T04).
Pure functions over a session digest (Tier 2) — the compact, durable view. Each
extractor emits zero or more :class:`Signal`s. A signal records its source
session, a *locus* (what it's about), a *polarity* (problem vs. success), and a
*magnitude*. Signals are the atoms the clusterer groups into candidate patterns.
No new capture happens here; everything is derived from digests already written
by the Capture layer, so detection is cheap and re-runnable.
"""
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Any, Callable, Optional
# polarity
PROBLEM = "problem"
SUCCESS = "success"
@dataclass
class Signal:
session_uid: str
flavor: str
repo: Optional[str]
type: str # e.g. "budget_overrun", "clean_pass"
polarity: str # PROBLEM | SUCCESS
locus: str # normalized subject key (tool, marker, ...)
magnitude: float = 1.0 # strength / cost weight
detail: dict[str, Any] = field(default_factory=dict)
# --- individual extractors --------------------------------------------------
# Each takes (digest, ctx) and returns a list[Signal]. ctx carries corpus-level
# stats (e.g. cost percentiles) so extractors can compare a session to its peers.
def _base(digest, type_, polarity, locus, magnitude=1.0, **detail) -> Signal:
return Signal(
session_uid=digest["session_uid"], flavor=digest["flavor"],
repo=digest.get("repo"), type=type_, polarity=polarity, locus=locus,
magnitude=magnitude, detail=detail,
)
def sig_retry_storm(digest, ctx) -> list[Signal]:
retries = digest.get("markers", {}).get("retries", 0)
if retries >= ctx.get("retry_storm_threshold", 3):
return [_base(digest, "retry_storm", PROBLEM, "retries", float(retries), retries=retries)]
return []
def sig_repeated_errors(digest, ctx) -> list[Signal]:
errors = digest.get("markers", {}).get("errors", 0)
if errors >= ctx.get("error_threshold", 3):
return [_base(digest, "repeated_errors", PROBLEM, "errors", float(errors), errors=errors)]
return []
def sig_budget_overrun(digest, ctx) -> list[Signal]:
total = digest.get("cost", {}).get("input_tokens", 0) + digest.get("cost", {}).get("output_tokens", 0)
p90 = ctx.get("tokens_p90", 0)
if p90 and total > p90:
return [_base(digest, "budget_overrun", PROBLEM, "tokens",
float(total) / max(p90, 1), tokens=total, p90=p90)]
return []
def sig_abandoned(digest, ctx) -> list[Signal]:
if digest.get("outcome") == "abandoned":
return [_base(digest, "abandoned", PROBLEM, "outcome", 1.0)]
return []
def sig_clean_pass(digest, ctx) -> list[Signal]:
"""Success: ended success, ran tests, no errors, modest cost."""
m = digest.get("markers", {})
if (digest.get("outcome") == "success" and m.get("test_runs", 0) >= 1
and m.get("errors", 0) == 0 and m.get("retries", 0) == 0):
return [_base(digest, "clean_pass", SUCCESS, "outcome", 1.0,
test_runs=m.get("test_runs"))]
return []
def sig_error_then_recovery(digest, ctx) -> list[Signal]:
"""Success despite hitting errors — a recovery worth learning from."""
m = digest.get("markers", {})
if digest.get("outcome") == "success" and m.get("errors", 0) >= 1:
return [_base(digest, "error_then_recovery", SUCCESS, "errors",
float(m.get("errors", 1)), errors=m.get("errors"))]
return []
# --- tool-mix / infrastructure-overhead signals (WP-0005 T02) ----------------
# These read the captured ``tool_histogram`` — friction that the outcome+marker
# signals above are blind to (sessions still "succeed", just expensively).
def tool_bucket(tool: str) -> str:
"""Group a tool name into a coarse activity bucket (flavor-agnostic)."""
if tool.startswith("mcp__state-hub"):
return "statehub_mcp"
if tool in ("TaskUpdate", "TaskCreate", "TaskGet", "TaskList", "TaskOutput",
"TaskStop", "todo_write", "update_task_status"):
return "task_mgmt"
if tool == "ToolSearch":
return "schema_load"
if tool in ("Bash", "run_terminal_command"):
return "shell"
if tool in ("Edit", "Write", "search_replace", "write", "NotebookEdit"):
return "edit"
if tool in ("Read", "read_file", "grep", "Grep", "glob", "Glob"):
return "read"
return "other"
def _bucketed(digest) -> tuple[dict, int]:
    buckets: dict[str, int] = {}
    for tool, n in (digest.get("tool_histogram") or {}).items():
        bucket = tool_bucket(tool)
        buckets[bucket] = buckets.get(bucket, 0) + n
    return buckets, sum(buckets.values())


def sig_infra_overhead(digest, ctx) -> list[Signal]:
    """Problem: a large share of tool calls is hub/task/schema plumbing, not work."""
    buckets, total = _bucketed(digest)
    if total < ctx.get("infra_min_calls", 20):
        return []
    overhead = buckets.get("statehub_mcp", 0) + buckets.get("task_mgmt", 0) + buckets.get("schema_load", 0)
    share = overhead / total
    if share >= ctx.get("infra_overhead_threshold", 0.30):
        return [_base(digest, "infra_overhead", PROBLEM, "infra_overhead", round(share, 3),
                      overhead_calls=overhead, total_calls=total,
                      statehub=buckets.get("statehub_mcp", 0),
                      task_mgmt=buckets.get("task_mgmt", 0),
                      schema_load=buckets.get("schema_load", 0))]
    return []


def sig_schema_thrash(digest, ctx) -> list[Signal]:
    """Problem: repeated ToolSearch, i.e. deferred-tool schemas reloaded over and over."""
    buckets, _ = _bucketed(digest)
    n = buckets.get("schema_load", 0)
    if n >= ctx.get("schema_thrash_threshold", 5):
        return [_base(digest, "schema_thrash", PROBLEM, "schema_load", float(n), tool_searches=n)]
    return []


def sig_tool_thrash(digest, ctx) -> list[Signal]:
    """Problem: the most-used tool reaches ``tool_thrash_threshold`` calls, likely churn."""
    hist = digest.get("tool_histogram") or {}
    if not hist:
        return []
    tool, n = max(hist.items(), key=lambda kv: kv[1])
    if n >= ctx.get("tool_thrash_threshold", 80):
        return [_base(digest, "tool_thrash", PROBLEM, f"tool:{tool}", float(n), tool=tool, calls=n)]
    return []


def sig_recurring_error(digest, ctx) -> list[Signal]:
    """Problem: a normalized error fingerprint (WP-0006). Emits one signal per distinct
    error in the session, so the same error across sessions/repos/flavors clusters
    into a candidate root-cause pattern (locus = fingerprint, magnitude = in-session
    occurrences). This is the content-level 'why', not just a coarse error count.
    """
    out: list[Signal] = []
    for snip in digest.get("error_snippets", []) or []:
        fp = snip.get("fingerprint")
        if not fp:
            continue
        out.append(_base(digest, "recurring_error", PROBLEM, fp, float(snip.get("count", 1)),
                         sample=snip.get("sample", ""), tool=snip.get("tool"),
                         occurrences=snip.get("count", 1)))
    return out


EXTRACTORS: list[Callable] = [
    sig_retry_storm, sig_repeated_errors, sig_budget_overrun, sig_abandoned,
    sig_clean_pass, sig_error_then_recovery,
    sig_infra_overhead, sig_schema_thrash, sig_tool_thrash,
    sig_recurring_error,
]


def build_context(digests: list[dict]) -> dict[str, Any]:
    """Corpus-level stats so extractors can compare a session to its peers."""
    totals = sorted(
        d.get("cost", {}).get("input_tokens", 0) + d.get("cost", {}).get("output_tokens", 0)
        for d in digests
    )
    p90 = totals[int(0.9 * (len(totals) - 1))] if totals else 0
    return {
        "tokens_p90": p90, "retry_storm_threshold": 3, "error_threshold": 3,
        # tool-mix / infra-overhead thresholds (WP-0005 T02)
        "infra_min_calls": 20, "infra_overhead_threshold": 0.30,
        "schema_thrash_threshold": 5, "tool_thrash_threshold": 80,
    }


def extract_signals(digests: list[dict], ctx: Optional[dict] = None) -> list[Signal]:
    ctx = ctx or build_context(digests)
    out: list[Signal] = []
    for d in digests:
        for ex in EXTRACTORS:
            out.extend(ex(d, ctx))
    return out
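
As a rough illustration of the `infra_overhead` rule above, the sketch below recomputes the overhead share from an already-bucketed histogram. The bucket names and the 20-call / 0.30 defaults are taken from the code above; the helper name `overhead_share` and the sample counts are hypothetical and exist only for this example.

```python
from typing import Optional

# Buckets that sig_infra_overhead above counts as plumbing.
OVERHEAD_BUCKETS = ("statehub_mcp", "task_mgmt", "schema_load")


def overhead_share(buckets: dict[str, int], min_calls: int = 20) -> Optional[float]:
    """Return the plumbing share of all tool calls, or None below min_calls."""
    total = sum(buckets.values())
    if total < min_calls:
        return None  # too few calls for the ratio to be meaningful
    overhead = sum(buckets.get(name, 0) for name in OVERHEAD_BUCKETS)
    return overhead / total


# Hypothetical session: 40 tool calls, 14 of them plumbing.
sample = {"shell": 12, "edit": 8, "read": 6,
          "statehub_mcp": 9, "task_mgmt": 3, "schema_load": 2}
share = overhead_share(sample)
flagged = share is not None and share >= 0.30  # default infra_overhead_threshold
```

Returning `None` for short sessions mirrors the early `return []` in the extractor: a session with a handful of calls would otherwise produce noisy ratios.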


@@ -0,0 +1,76 @@
"""Read a single session digest from the local store (AGENTIC-WP-0011 T03).

Thin read path for ``kaizen-agentic metrics correlate`` and other consumers.
Does not run ingest.

Usage:
    python -m session_memory.digest_lookup <session_uid> [--json]
    HELIX_STORE_DB=/abs/path/to/mem.db python -m session_memory.digest_lookup <uid>
"""
from __future__ import annotations

import argparse
import json
import os
import sys

from .core.store import Store
from .ingest import _expand, load_config


def resolve_store_paths(*, config_path: str | None = None) -> tuple[str, str]:
    """Resolve db + blob paths from HELIX_STORE_DB or config.toml [store]."""
    env_db = os.environ.get("HELIX_STORE_DB")
    if env_db:
        db_path = _expand(env_db)
        blob_dir = os.path.join(os.path.dirname(db_path), "blobs")
        return db_path, blob_dir
    here = os.path.dirname(os.path.abspath(__file__))
    cfg_path = config_path or os.path.join(here, "config.toml")
    store_cfg = load_config(cfg_path).get("store", {})
    return _expand(store_cfg.get("db_path", "session_memory/.store/mem.db")), _expand(
        store_cfg.get("blob_dir", "session_memory/.store/blobs")
    )


def lookup_digest(session_uid: str, *, config_path: str | None = None) -> dict | None:
    db_path, blob_dir = resolve_store_paths(config_path=config_path)
    store = Store(db_path, blob_dir)
    try:
        return store.get_digest(session_uid)
    finally:
        store.close()


def main(argv: list[str] | None = None) -> int:
    here = os.path.dirname(os.path.abspath(__file__))
    ap = argparse.ArgumentParser(
        description="Read one session digest from the Helix Forge store (no ingest)."
    )
    ap.add_argument("session_uid", help="Normalized session uid, e.g. claude:abc-123")
    ap.add_argument("--config", default=os.path.join(here, "config.toml"),
                    help="config.toml when HELIX_STORE_DB is unset")
    ap.add_argument("--json", action="store_true", help="print digest JSON to stdout")
    args = ap.parse_args(argv)
    digest = lookup_digest(args.session_uid, config_path=args.config)
    if digest is None:
        print(f"digest not found: {args.session_uid}", file=sys.stderr)
        return 1
    if args.json:
        print(json.dumps(digest, indent=2, sort_keys=True))
    else:
        cost = digest.get("cost") or {}
        tokens = cost.get("input_tokens", 0) + cost.get("output_tokens", 0)
        print(f"session_uid: {digest.get('session_uid')}")
        print(f"repo: {digest.get('repo')} flavor: {digest.get('flavor')}")
        print(f"outcome: {digest.get('outcome')} tokens: {tokens}")
        print(f"started_at: {digest.get('started_at')} ended_at: {digest.get('ended_at')}")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
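
The lookup path above resolves the store location in a fixed order: the `HELIX_STORE_DB` environment variable wins, otherwise the `[store]` table from `config.toml` is used, otherwise the built-in defaults. The sketch below shows only that precedence. `resolve_db_path` is a hypothetical stand-in that takes plain dicts instead of reading the environment or a file, and it skips the `_expand` step, whose behavior is not shown in this diff.

```python
import os


def resolve_db_path(env: dict[str, str], config: dict) -> tuple[str, str]:
    """Hypothetical stand-in for resolve_store_paths: env override first, then config."""
    env_db = env.get("HELIX_STORE_DB")
    if env_db:
        # With the env override, blobs are taken to sit next to the database file.
        return env_db, os.path.join(os.path.dirname(env_db), "blobs")
    store = config.get("store", {})
    return (
        store.get("db_path", "session_memory/.store/mem.db"),
        store.get("blob_dir", "session_memory/.store/blobs"),
    )


from_env = resolve_db_path({"HELIX_STORE_DB": "/data/helix/mem.db"}, {})
from_cfg = resolve_db_path({}, {"store": {"db_path": "/srv/mem.db", "blob_dir": "/srv/blobs"}})
from_default = resolve_db_path({}, {})
```

Note that the environment override ignores any `blob_dir` in the config: the blob directory is always derived from the database path in that branch.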


@@ -0,0 +1,9 @@
"""Distribute phase (PRD §6.4): render approved Solution Patterns into per-flavor
artifacts. Mirror of the collector design: agnostic core, thin distributor edges.

  base.py       Artifact + Distributor protocol + idempotent snippet markers (T01)
  claude.py     CLAUDE.md snippet distributor (T02)
  codex.py      AGENTS.md snippet distributor (T03)
  grok.py       native instruction distributor (T03)
  registry.py   flavor -> distributor lookup (T03)
  proposals.py  scoping, proposed-not-applied output, active-pattern registry (T04)
  __main__.py   `python -m session_memory.distribute` (T05)
"""


@@ -0,0 +1,89 @@
"""Distribute entrypoint (T05): catalog -> per-flavor proposals (HITL).

    python -m session_memory.distribute [--config PATH] [--repo R] [--flavor F] [--json]

Reads approved / distribution-ready Solution Patterns from the Pattern Catalog and
renders them into per-flavor **proposals** (never auto-applied) scoped by
repo/domain, recording what is proposed where in the active-pattern registry.

Targets are the repo->domain map in ``config.toml`` crossed with the known
distributor flavors; each pattern's own ``Scope`` filters where it actually lands.
"""
from __future__ import annotations

import argparse
import json
import os

from ..curate.catalog import Catalog
from ..ingest import _expand, load_config
from .proposals import ActiveRegistry, Target, propose
from .registry import all_flavors


def build_targets(config: dict, repo_filter=None, flavor_filter=None) -> list[Target]:
    repo_map = config.get("repo_domain_map", {})
    flavors = [flavor_filter] if flavor_filter else all_flavors()
    targets = []
    for repo, domain in repo_map.items():
        if repo_filter and repo != repo_filter:
            continue
        for flavor in flavors:
            targets.append(Target(repo=repo, domain=domain, flavor=flavor))
    return targets


def run_distribute(config: dict, *, repo_filter=None, flavor_filter=None):
    cur = config.get("curate", {})
    dist = config.get("distribute", {})
    catalog = Catalog(_expand(cur.get("catalog_dir", "session_memory/catalog")))
    patterns = catalog.list()
    targets = build_targets(config, repo_filter, flavor_filter)
    registry = ActiveRegistry(_expand(dist.get("active_registry",
                                               "session_memory/distribute/active_patterns.json")))
    out_dir = _expand(dist.get("proposals_dir", "session_memory/proposals"))
    return propose(patterns, targets, out_dir, registry)


def _summary(res) -> str:
    by_repo = {}
    for repo, flavor, pid, _ in res.proposals:
        by_repo.setdefault(repo, []).append(f"{pid}[{flavor}]")
    lines = [f"# Distribute proposals ({len(res.proposals)} renders, "
             f"{len(res.files_written)} files)"]
    for repo in sorted(by_repo):
        lines.append(f" {repo}: {', '.join(sorted(by_repo[repo]))}")
    if res.skipped_not_distributable:
        lines.append(f" skipped (not distribution-ready): "
                     f"{len(set(res.skipped_not_distributable))} pattern(s)")
    if not res.proposals:
        lines.append(" (no approved/distribution-ready patterns matched any target)")
    return "\n".join(lines)


def main(argv=None) -> int:
    here = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
    ap = argparse.ArgumentParser(description="Distribute approved patterns as per-flavor proposals.")
    ap.add_argument("--config", default=os.path.join(here, "config.toml"))
    ap.add_argument("--repo", default=None, help="limit to one target repo")
    ap.add_argument("--flavor", default=None, help="limit to one flavor")
    ap.add_argument("--json", action="store_true")
    args = ap.parse_args(argv)
    config = load_config(args.config)
    res = run_distribute(config, repo_filter=args.repo, flavor_filter=args.flavor)
    if args.json:
        print(json.dumps({
            "proposals": [{"repo": r, "flavor": f, "pattern_id": p, "path": path}
                          for r, f, p, path in res.proposals],
            "files_written": res.files_written,
            "skipped": sorted(set(res.skipped_not_distributable)),
        }, indent=2))
    else:
        print(_summary(res))
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
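
The target list built by the entrypoint above is a plain cross product of the repo->domain map and the known flavors. A minimal sketch, assuming a hypothetical two-repo map (the domain values `infra` and `ops` are invented; the real map lives in `config.toml`) and a local `Target` stand-in with the same three fields:

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Target:
    """Local stand-in for the Target used by the distribute entrypoint."""
    repo: str
    domain: str
    flavor: str


# Hypothetical repo -> domain map, for illustration only.
repo_domain_map = {"state-hub": "infra", "ops-bridge": "ops"}
flavors = ["claude", "codex", "grok"]

# Every repo is paired with every flavor; scope filtering happens later.
targets = [
    Target(repo=repo, domain=domain, flavor=flavor)
    for (repo, domain), flavor in product(repo_domain_map.items(), flavors)
]
only_claude = [t for t in targets if t.flavor == "claude"]
```

Two repos and three flavors give six candidate targets; passing `--flavor claude` corresponds to keeping only the two `claude` entries.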


@@ -0,0 +1,242 @@
[
{
"flavor": "claude",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "agentic-resources",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "codex",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "agentic-resources",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "grok",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "agentic-resources",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "claude",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "can-you-assist",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "codex",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "can-you-assist",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "grok",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "can-you-assist",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "claude",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "net-kingdom",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "codex",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "net-kingdom",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "grok",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "net-kingdom",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "claude",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "ops-bridge",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "codex",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "ops-bridge",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "grok",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "ops-bridge",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "claude",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "state-hub",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "codex",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "state-hub",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "grok",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "state-hub",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "claude",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "the-custodian",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "codex",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "the-custodian",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "grok",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "the-custodian",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "claude",
"pattern_id": "sp-problem-schema_thrash-schema_load",
"repo": "ops-bridge",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.1"
},
{
"flavor": "claude",
"pattern_id": "sp-problem-tool_thrash-tool-bash",
"repo": "state-hub",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.1"
},
{
"flavor": "claude",
"pattern_id": "sp-success-clean_pass-outcome",
"repo": "agentic-resources",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.1"
},
{
"flavor": "grok",
"pattern_id": "sp-success-clean_pass-outcome",
"repo": "agentic-resources",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.1"
},
{
"flavor": "claude",
"pattern_id": "sp-success-clean_pass-outcome",
"repo": "can-you-assist",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.1"
},
{
"flavor": "grok",
"pattern_id": "sp-success-clean_pass-outcome",
"repo": "can-you-assist",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.1"
},
{
"flavor": "claude",
"pattern_id": "sp-success-clean_pass-outcome",
"repo": "ops-bridge",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.1"
},
{
"flavor": "grok",
"pattern_id": "sp-success-clean_pass-outcome",
"repo": "ops-bridge",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.1"
},
{
"flavor": "claude",
"pattern_id": "sp-success-clean_pass-outcome",
"repo": "state-hub",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.1"
},
{
"flavor": "grok",
"pattern_id": "sp-success-clean_pass-outcome",
"repo": "state-hub",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.1"
},
{
"flavor": "claude",
"pattern_id": "sp-success-clean_pass-outcome",
"repo": "the-custodian",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.1"
},
{
"flavor": "grok",
"pattern_id": "sp-success-clean_pass-outcome",
"repo": "the-custodian",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.1"
}
]


@@ -0,0 +1,115 @@
"""Distributor base: Artifact, the Distributor protocol, and idempotent markers
(PRD §6.4 FR-X1; T01).

A **distributor** turns one agnostic :class:`SolutionPattern` into a per-flavor
:class:`Artifact` (a target path + a snippet of content). Everything flavor-neutral
lives here; each flavor adapter (T02/T03) only supplies its target filename and may
override the rendered body using the pattern's ``rendering_hints``.

Snippets carry stable ``BEGIN/END`` markers keyed on the pattern id, so
re-distributing a pattern **updates its block in place** instead of duplicating it.
That property lets Distribute run repeatedly (HITL) without drift.
"""
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Any, Optional, Protocol, runtime_checkable

from ..curate.schema import SolutionPattern


@dataclass
class Artifact:
    """A proposed per-flavor rendering of a pattern (FR-X1/FR-X3: proposed, not applied)."""
    flavor: str
    target_path: str  # repo-relative file the snippet belongs in (e.g. "CLAUDE.md")
    pattern_id: str
    content: str  # the marker-wrapped snippet block


@runtime_checkable
class Distributor(Protocol):
    flavor: str
    target_path: str

    def render(self, pattern: SolutionPattern) -> Artifact: ...


# --- idempotent snippet markers ---------------------------------------------
_MARK = "helix-forge pattern"


def begin_marker(pattern_id: str) -> str:
    return f"<!-- BEGIN {_MARK}:{pattern_id} -->"


def end_marker(pattern_id: str) -> str:
    return f"<!-- END {_MARK}:{pattern_id} -->"


def wrap_block(pattern_id: str, body: str, version: str = "") -> str:
    """Wrap a rendered body in stable BEGIN/END markers."""
    ver = f" v{version}" if version else ""
    return f"{begin_marker(pattern_id)}{ver}\n{body.strip()}\n{end_marker(pattern_id)}"


def upsert_block(doc_text: str, pattern_id: str, block: str) -> str:
    """Insert or replace a pattern's marked block within a document (idempotent)."""
    pat = re.compile(
        re.escape(begin_marker(pattern_id)) + r".*?" + re.escape(end_marker(pattern_id)),
        re.DOTALL,
    )
    if pat.search(doc_text):
        # Callable replacement: backslashes in ``block`` are inserted literally
        # instead of being parsed as regex replacement escapes.
        return pat.sub(lambda _m: block, doc_text)
    sep = "" if doc_text.endswith("\n\n") or not doc_text else "\n\n"
    return f"{doc_text}{sep}{block}\n"


# --- agnostic body rendering ------------------------------------------------
def render_markdown_body(pattern: SolutionPattern) -> str:
    """Default flavor-neutral snippet body from the agnostic pattern fields."""
    label = "Avoid" if pattern.polarity == "problem" else "Prefer"
    lines = [f"### {pattern.name}", "", pattern.problem.strip(), ""]
    if pattern.resolutions:
        lines.append(f"**{label}:**")
        for r in pattern.resolutions:
            detail = f": {r.detail}" if r.detail else ""
            lines.append(f"- {r.summary}{detail}")
            for step in r.steps:
                # Two-space indent so steps nest under their resolution bullet.
                lines.append(f"  - {step}")
    return "\n".join(lines).strip()


def hint(pattern: SolutionPattern, flavor: str, key: str, default: Any = None) -> Any:
    """Read a per-flavor rendering hint, falling back to ``default``."""
    return (pattern.rendering_hints.get(flavor) or {}).get(key, default)


class BaseDistributor:
    """Shared distributor: renders the agnostic body, honouring a ``body`` hint
    override and a ``target`` hint, then wraps it in idempotent markers."""
    flavor: str = ""
    target_path: str = ""

    def __init__(self, flavor: Optional[str] = None, target_path: Optional[str] = None) -> None:
        if flavor is not None:
            self.flavor = flavor
        if target_path is not None:
            self.target_path = target_path

    def body(self, pattern: SolutionPattern) -> str:
        return hint(pattern, self.flavor, "body") or render_markdown_body(pattern)

    def target(self, pattern: SolutionPattern) -> str:
        return hint(pattern, self.flavor, "target") or self.target_path

    def render(self, pattern: SolutionPattern) -> Artifact:
        block = wrap_block(pattern.id, self.body(pattern), pattern.version)
        return Artifact(flavor=self.flavor, target_path=self.target(pattern),
                        pattern_id=pattern.id, content=block)
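
The idempotency claim in the module docstring can be checked with a small standalone sketch. It re-implements the marker wrap and upsert with local names (`wrap` and `upsert` are stand-ins for this example, not the module's functions), and it passes a callable to `re.sub` so that a body containing backslashes is inserted literally. The pattern id `sp-demo` and both bodies are invented.

```python
import re

MARK = "helix-forge pattern"  # same marker text as _MARK in the module above


def wrap(pattern_id: str, body: str) -> str:
    """Wrap a body in BEGIN/END markers keyed on the pattern id."""
    return (f"<!-- BEGIN {MARK}:{pattern_id} -->\n"
            f"{body.strip()}\n"
            f"<!-- END {MARK}:{pattern_id} -->")


def upsert(doc: str, pattern_id: str, block: str) -> str:
    """Replace the pattern's existing block, or append it if absent."""
    begin = re.escape(f"<!-- BEGIN {MARK}:{pattern_id} -->")
    end = re.escape(f"<!-- END {MARK}:{pattern_id} -->")
    pat = re.compile(begin + r".*?" + end, re.DOTALL)
    if pat.search(doc):
        # A callable replacement keeps backslashes in the block literal.
        return pat.sub(lambda _m: block, doc)
    sep = "" if doc.endswith("\n\n") or not doc else "\n\n"
    return f"{doc}{sep}{block}\n"


doc = "# CLAUDE.md\n"
v1 = upsert(doc, "sp-demo", wrap("sp-demo", "Read a file before editing it."))
v1_again = upsert(v1, "sp-demo", wrap("sp-demo", "Read a file before editing it."))
v2 = upsert(v1, "sp-demo", wrap("sp-demo", r"Match digits with \d+ before editing."))
```

Applying the same block twice leaves the document unchanged, and applying a new version swaps the block in place, so the document never holds more than one block per pattern id.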


@@ -0,0 +1,42 @@
"""Claude distributor (PRD §6.4 FR-X1; T02).

Renders an approved Solution Pattern into a ``CLAUDE.md`` snippet block. Most logic
is inherited from :class:`BaseDistributor`; the Claude-specific touch is an
optional **skill** rendering mode (``rendering_hints["claude"]["as"] == "skill"``)
that emits a skill-style stub instead of a plain instruction snippet. Claude's
native distribution targets are CLAUDE.md snippets, skills, or hooks.
"""
from __future__ import annotations

from ..curate.schema import SolutionPattern
from .base import BaseDistributor, hint, render_markdown_body


class ClaudeDistributor(BaseDistributor):
    flavor = "claude"
    target_path = "CLAUDE.md"

    def body(self, pattern: SolutionPattern) -> str:
        # Precedence: explicit body hint, then skill mode, then the agnostic default.
        override = hint(pattern, self.flavor, "body")
        if override:
            return override
        if hint(pattern, self.flavor, "as") == "skill":
            return self._skill_stub(pattern)
        return render_markdown_body(pattern)

    @staticmethod
    def _skill_stub(pattern: SolutionPattern) -> str:
        trigger = "avoid" if pattern.polarity == "problem" else "apply"
        lines = [
            f"## Skill: {pattern.name}",
            "",
            f"**When:** situations where you would {trigger}: {pattern.problem.strip()}",
            "",
            "**Steps:**",
        ]
        for r in pattern.resolutions:
            lines.append(f"- {r.summary}" + (f": {r.detail}" if r.detail else ""))
            for step in r.steps:
                # Two-space indent so steps nest under their resolution bullet.
                lines.append(f"  - {step}")
        return "\n".join(lines).strip()


@@ -0,0 +1,15 @@
"""Codex distributor (PRD §6.4 FR-X1; T03).

Renders an approved Solution Pattern into an ``AGENTS.md`` snippet, Codex's native
repo-convention surface. Identical agnostic body to the other flavors (FR-A3: one
pattern, expressible everywhere); only the target file differs.
"""
from __future__ import annotations

from .base import BaseDistributor


class CodexDistributor(BaseDistributor):
    flavor = "codex"
    target_path = "AGENTS.md"


@@ -0,0 +1,15 @@
"""Grok distributor (PRD §6.4 FR-X1; T03).

Renders an approved Solution Pattern into Grok's native instruction format. Defaults
to a ``.grok/instructions.md`` snippet; the same agnostic body as the other flavors
(FR-A3), overridable via ``rendering_hints["grok"]``.
"""
from __future__ import annotations

from .base import BaseDistributor


class GrokDistributor(BaseDistributor):
    flavor = "grok"
    target_path = ".grok/instructions.md"


@@ -0,0 +1,136 @@
"""Scoping, proposed-not-applied output, and the active-pattern registry
(PRD §6.4 FR-X2/FR-X3/FR-X4; T04).

* **Scope (FR-X2):** a pattern lands in a target environment only if the target's
  repo/domain/flavor are within the pattern's :class:`Scope` (an empty scope list
  means "unrestricted on that axis").
* **Proposed, not applied (FR-X3):** rendered artifacts are written under a
  ``proposals/`` tree mirroring the target path: a reviewable diff a human applies,
  never auto-written into the live file. Re-running upserts each pattern's block in
  place (idempotent), so proposals don't accumulate duplicates.
* **Active-pattern registry (FR-X4):** a JSON record of which pattern (and version)
  is proposed/active in which (repo, flavor) environment.
"""
from __future__ import annotations

import json
import os
from dataclasses import dataclass
from datetime import datetime, timezone

from ..curate.schema import SolutionPattern
from .base import upsert_block
from .registry import get_distributor


def _now() -> str:
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


@dataclass(frozen=True)
class Target:
    """An environment a pattern could be distributed to."""
    repo: str
    domain: str = ""
    flavor: str = "claude"


def applies(pattern: SolutionPattern, target: Target) -> bool:
    """True if ``target`` is within the pattern's scope (empty axis == any)."""
    sc = pattern.scope
    if sc.repos and target.repo not in sc.repos:
        return False
    if sc.domains and target.domain and target.domain not in sc.domains:
        return False
    if sc.flavors and target.flavor not in sc.flavors:
        return False
    return True


def is_distributable(pattern: SolutionPattern) -> bool:
    return pattern.status == "approved" and pattern.distribution_ready


class ActiveRegistry:
    """JSON record of patterns proposed/active per (repo, flavor) (FR-X4)."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._entries: dict[str, dict] = {}
        if os.path.exists(path):
            with open(path, encoding="utf-8") as fh:
                for e in json.load(fh):
                    self._entries[self._key(e["pattern_id"], e["repo"], e["flavor"])] = e

    @staticmethod
    def _key(pid: str, repo: str, flavor: str) -> str:
        return f"{pid}|{repo}|{flavor}"

    def record(self, pid: str, repo: str, flavor: str, version: str,
               status: str = "proposed") -> None:
        self._entries[self._key(pid, repo, flavor)] = {
            "pattern_id": pid, "repo": repo, "flavor": flavor,
            "version": version, "status": status, "updated_at": _now(),
        }

    def entries(self) -> list[dict]:
        return [self._entries[k] for k in sorted(self._entries)]

    def save(self) -> None:
        os.makedirs(os.path.dirname(self.path) or ".", exist_ok=True)
        with open(self.path, "w", encoding="utf-8") as fh:
            json.dump(self.entries(), fh, indent=2, sort_keys=True)
            fh.write("\n")


@dataclass
class ProposalResult:
    proposals: list = None  # (repo, flavor, pattern_id, proposal_path)
    files_written: list = None  # absolute proposal paths
    skipped_not_distributable: list = None  # pattern ids

    def __post_init__(self):
        self.proposals = self.proposals or []
        self.files_written = self.files_written or []
        self.skipped_not_distributable = self.skipped_not_distributable or []


def propose(patterns: list[SolutionPattern], targets: list[Target], out_dir: str,
            registry: ActiveRegistry) -> ProposalResult:
    """Render in-scope, distributable patterns into per-target proposal files."""
    result = ProposalResult()
    pending: dict[str, str] = {}  # proposal path -> accumulated content
    for p in patterns:
        if not is_distributable(p):
            result.skipped_not_distributable.append(p.id)
            continue
        for t in targets:
            dist = get_distributor(t.flavor)
            if dist is None or not applies(p, t):
                continue
            art = dist.render(p)
            path = os.path.join(out_dir, t.repo, art.target_path)
            if path not in pending:
                pending[path] = _read(path)
            pending[path] = upsert_block(pending[path], p.id, art.content)
            registry.record(p.id, t.repo, t.flavor, p.version)
            result.proposals.append((t.repo, t.flavor, p.id, path))
    for path, content in pending.items():
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(content if content.endswith("\n") else content + "\n")
        result.files_written.append(path)
    registry.save()
    return result


def _read(path: str) -> str:
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            return fh.read()
    return ""
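
The scope rule (FR-X2) is easiest to see with concrete cases. The sketch below assumes a hypothetical `Scope` shape with three tuple fields, since the real class lives in `curate/schema.py` and is not shown in this diff; `in_scope` copies the checks from `applies` above. One detail worth noticing is that a target with an empty domain passes a domain-restricted scope, because the domain check only fires when the target declares a domain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    """Hypothetical shape: an empty tuple means unrestricted on that axis."""
    repos: tuple[str, ...] = ()
    domains: tuple[str, ...] = ()
    flavors: tuple[str, ...] = ()


@dataclass(frozen=True)
class Target:
    repo: str
    domain: str = ""
    flavor: str = "claude"


def in_scope(scope: Scope, target: Target) -> bool:
    """Same three checks as applies(): repo, then domain, then flavor."""
    if scope.repos and target.repo not in scope.repos:
        return False
    if scope.domains and target.domain and target.domain not in scope.domains:
        return False
    if scope.flavors and target.flavor not in scope.flavors:
        return False
    return True


claude_only = Scope(flavors=("claude",))
ops_only = Scope(domains=("ops",))  # "ops" is an invented domain name
```

If a domain-restricted pattern should never reach repos without a domain mapping, the middle check would need to drop the `target.domain and` guard; whether that is desired is a design decision for the authors.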


@@ -0,0 +1,26 @@
"""Distributor registry (T03): flavor -> distributor, the one place that knows
about all flavor edges. Adding a flavor = one entry here + one adapter module.
"""
from __future__ import annotations

from typing import Optional

from .base import BaseDistributor
from .claude import ClaudeDistributor
from .codex import CodexDistributor
from .grok import GrokDistributor

_REGISTRY: dict[str, BaseDistributor] = {
    "claude": ClaudeDistributor(),
    "codex": CodexDistributor(),
    "grok": GrokDistributor(),
}


def get_distributor(flavor: str) -> Optional[BaseDistributor]:
    return _REGISTRY.get(flavor)


def all_flavors() -> list[str]:
    return list(_REGISTRY)


@@ -19,13 +19,19 @@ from dataclasses import dataclass, field
 from typing import Any
 from .adapters import claude as claude_adapter
+from .adapters import codex as codex_adapter
+from .adapters import grok as grok_adapter
 from .core import digest as digest_mod
 from .core.cursor import Cursors
 from .core.retention import RetentionConfig, sweep as retention_sweep
 from .core.store import Store
 # adapter dispatch by source name
-_ADAPTERS = {"claude": claude_adapter.parse_session}
+_ADAPTERS = {
+    "claude": claude_adapter.parse_session,
+    "codex": codex_adapter.parse_session,
+    "grok": grok_adapter.parse_session,
+}
 @dataclass


@@ -0,0 +1,9 @@
"""Measure phase (PRD §6.5): the loop-closer.

  metrics.py   fleet metrics + persisted baseline snapshots (T01)
  effect.py    before/after per-pattern effectiveness (T02)
  __main__.py  python -m session_memory.measure (T03)

Computation over existing digests (reusing WP-0005 tool buckets + WP-0006 error
mining); no new capture.
"""


@@ -0,0 +1,101 @@
"""Measure entrypoint (T03): fleet trend + per-pattern effectiveness.

    python -m session_memory.measure [--config PATH] [--label L] [--since DATE]
                                     [--no-save] [--json]

Computes current fleet metrics over the real (quality-filtered) sessions, appends
them to the baseline trend, and reports whether the fleet is getting cheaper /
more reliable over time (FR-M3). With ``--since DATE`` it also reports before/after
effectiveness around a change (FR-M1/FR-M2).
"""
from __future__ import annotations

import argparse
import json
import os

from ..core.store import Store
from ..detect.quality import filter_real, quality_config
from ..ingest import _expand, load_config
from .effect import effectiveness
from .metrics import load_baselines, save_baseline, snapshot

_TREND_KEYS = ("infra_overhead_share_median", "error_rate", "schema_thrash_sessions",
               "tokens_p50", "success_rate")


def real_digests(config: dict) -> list[dict]:
    s = config.get("store", {})
    store = Store(_expand(s["db_path"]), _expand(s["blob_dir"]))
    try:
        return filter_real(store.list_digests(), quality_config(config))
    finally:
        # Close the store even if listing or filtering raises.
        store.close()


def _fmt_trend(baselines: list[dict]) -> str:
    if not baselines:
        return " (no prior snapshots)"
    lines = []
    recent = baselines[-5:]
    for b in recent:
        when = (b.get("captured_at") or "")[:10]
        lbl = f" {b['label']}" if b.get("label") else ""
        lines.append(f" {when}{lbl}: overhead_med={b.get('infra_overhead_share_median')} "
                     f"err_rate={b.get('error_rate')} schema_thrash={b.get('schema_thrash_sessions')} "
                     f"tok_p50={b.get('tokens_p50')} success={b.get('success_rate')} "
                     f"(n={b.get('n_sessions')})")
    return "\n".join(lines)


def _report(current: dict, baselines: list[dict], eff: dict | None) -> str:
    lines = [f"# Fleet metrics (n={current.get('n_sessions')} real sessions)"]
    for k in _TREND_KEYS:
        lines.append(f" {k} = {current.get(k)}")
    lines.append("\n## Trend (recent snapshots)")
    lines.append(_fmt_trend(baselines))
    if eff is not None:
        lines.append(f"\n## Effectiveness since {eff['applied_at']} "
                     f"(before={eff['n_before']}, after={eff['n_after']})")
        if eff["insufficient_data"]:
            lines.append(" insufficient data on one side of the date")
        else:
            for k in _TREND_KEYS:
                d = eff["deltas"].get(k, {})
                mark = {True: "improved", False: "worse", None: ""}[d.get("improved")]
                lines.append(f" {k}: {d.get('before')} -> {d.get('after')} "
                             f"({d.get('change'):+}) {mark}")
    return "\n".join(lines)


def main(argv=None) -> int:
    here = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
    ap = argparse.ArgumentParser(description="Measure fleet metrics + per-pattern effectiveness.")
    ap.add_argument("--config", default=os.path.join(here, "config.toml"))
    ap.add_argument("--label", default="")
    ap.add_argument("--since", default=None, help="ISO date for before/after effectiveness")
    ap.add_argument("--no-save", action="store_true", help="don't append to the baseline trend")
    ap.add_argument("--json", action="store_true")
    args = ap.parse_args(argv)
    config = load_config(args.config)
    digests = real_digests(config)
    current = snapshot(digests, label=args.label)
    path = _expand(config.get("measure", {}).get("baselines", "session_memory/measure/baselines.jsonl"))
    prior = load_baselines(path)
    if not args.no_save:
        save_baseline(current, path)
    eff = effectiveness(digests, args.since, label=args.label) if args.since else None
    if args.json:
        print(json.dumps({"current": current, "trend": prior + [current], "effectiveness": eff},
                         indent=2))
    else:
        print(_report(current, prior + [current], eff))
    return 0


if __name__ == "__main__":
    raise SystemExit(main())


@@ -0,0 +1 @@
{"captured_at": "2026-06-07T13:30:14Z", "error_rate": 0.963, "infra_overhead_share_median": 0.117, "infra_overhead_share_p90": 0.261, "label": "phase4-baseline (pre-fixes)", "n_sessions": 27, "recurring_error_occurrences": 505, "schema_thrash_sessions": 8, "success_rate": 1.0, "tokens_p50": 250725, "tokens_p90": 1423966}
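
The record above is one line of a JSONL trend file: each run appends one snapshot, and readers load every line back in order. The bodies of `save_baseline` and `load_baselines` are not part of this chunk, so the sketch below is only an assumption about how such an append/read pair could look. `append_snapshot` and `read_snapshots` are hypothetical names, the first record reuses three fields from the line above, and the second record is invented to give the trend two points.

```python
import json
import os
import tempfile


def append_snapshot(path: str, snap: dict) -> None:
    """Append one snapshot as a single JSON line."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(snap, sort_keys=True) + "\n")


def read_snapshots(path: str) -> list[dict]:
    """Read every non-empty line back; a missing file means no history yet."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "baselines.jsonl")
    missing = read_snapshots(path)
    append_snapshot(path, {"captured_at": "2026-06-07T13:30:14Z",
                           "error_rate": 0.963, "n_sessions": 27})
    # Invented second record, present only to show a two-point trend.
    append_snapshot(path, {"captured_at": "2026-06-14T09:00:00Z",
                           "error_rate": 0.9, "n_sessions": 30})
    trend = read_snapshots(path)
```

Append-only JSONL keeps each run's write independent of earlier records, which suits a trend that only ever grows.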


@@ -0,0 +1,60 @@
"""Before/after per-pattern effectiveness (PRD §6.5 FR-M1/FR-M2; T02).

Given a change/pattern with an ``applied_at`` date, split sessions into *before*
and *after* by their start time, aggregate each side, and diff the headline
metrics, so we can say whether a distributed pattern (e.g. the Read-before-Edit
reflex, or the State Hub skill) actually moved the numbers, and retire it if not.
"""
from __future__ import annotations

from .metrics import aggregate

# Metrics where a *lower* value after the change means improvement.
_LOWER_IS_BETTER = {
    "infra_overhead_share_median", "infra_overhead_share_p90", "error_rate",
    "recurring_error_occurrences", "schema_thrash_sessions", "tokens_p50", "tokens_p90",
}
# Metrics where a *higher* value is improvement.
_HIGHER_IS_BETTER = {"success_rate"}


def split_by_date(digests: list[dict], applied_at: str) -> tuple[list[dict], list[dict]]:
    """Partition digests into (before, after) by ``started_at`` vs ``applied_at``."""
    before, after = [], []
    for d in digests:
        ts = d.get("started_at") or ""
        (after if ts and ts >= applied_at else before).append(d)
    return before, after


def _delta(metric: str, before: float, after: float) -> dict:
    change = round(after - before, 3)
    if metric in _LOWER_IS_BETTER:
        improved = change < 0
    elif metric in _HIGHER_IS_BETTER:
        improved = change > 0
    else:
        improved = None
    return {"before": before, "after": after, "change": change, "improved": improved}


def effectiveness(digests: list[dict], applied_at: str, *, label: str = "") -> dict:
    """Compare fleet metrics after ``applied_at`` against the prior period."""
    before, after = split_by_date(digests, applied_at)
    b_agg, a_agg = aggregate(before), aggregate(after)
    metrics = (_LOWER_IS_BETTER | _HIGHER_IS_BETTER)
    deltas = {}
    if before and after:
        for m in metrics:
            deltas[m] = _delta(m, b_agg.get(m, 0.0), a_agg.get(m, 0.0))
    return {
        "label": label,
        "applied_at": applied_at,
        "n_before": len(before),
        "n_after": len(after),
        "before": b_agg,
        "after": a_agg,
        "deltas": deltas,
        "insufficient_data": not (before and after),
    }
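
A short worked case may help with the before/after split. The comparison in `split_by_date` is a plain string comparison, which works as a date comparison only because the timestamps are ISO-8601 strings in one consistent UTC format; a bare date such as `2026-06-07` then sorts before every timestamp on that day. The sketch below restates the two helpers with local names and feeds them invented sessions. The `0.963` value echoes the baseline error rate recorded earlier in this diff, while the `0.8` "after" value is made up.

```python
LOWER_IS_BETTER = {"error_rate", "tokens_p50"}
HIGHER_IS_BETTER = {"success_rate"}


def split_by_date(sessions: list[dict], applied_at: str) -> tuple[list[dict], list[dict]]:
    """Sessions starting at or after applied_at go to 'after'; the rest to 'before'."""
    before, after = [], []
    for s in sessions:
        ts = s.get("started_at") or ""
        (after if ts and ts >= applied_at else before).append(s)
    return before, after


def delta(metric: str, before: float, after: float) -> dict:
    """Signed change plus a direction-aware 'improved' flag."""
    change = round(after - before, 3)
    if metric in LOWER_IS_BETTER:
        improved = change < 0
    elif metric in HIGHER_IS_BETTER:
        improved = change > 0
    else:
        improved = None
    return {"before": before, "after": after, "change": change, "improved": improved}


# Invented sessions; only started_at matters for the split.
sessions = [
    {"started_at": "2026-06-05T10:00:00Z"},
    {"started_at": "2026-06-07T09:00:00Z"},
    {"started_at": "2026-06-08T12:00:00Z"},
    {},  # no timestamp: counted as "before"
]
before, after = split_by_date(sessions, "2026-06-07")
d = delta("error_rate", 0.963, 0.8)
```

Sessions without a `started_at` fall into the "before" group, which can inflate the pre-change sample if many digests lack timestamps.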


@@ -0,0 +1,102 @@
"""Fleet metrics + persisted baselines (PRD §6.5 FR-M3; T01).

Computes the headline health metrics of the captured corpus (the same quantities
the friction assessment reported) so they can be tracked over time and compared
before/after a change. Reuses :func:`detect.signals.tool_bucket` (WP-0005) and the
digest ``error_snippets`` (WP-0006); no new capture.

A **baseline** is a timestamped metrics snapshot appended to a JSONL file, so
successive runs build a trend the entrypoint (T03) can chart.
"""
from __future__ import annotations

import collections
import json
import os
from datetime import datetime, timezone

from ..detect.signals import tool_bucket


def _now() -> str:
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def _pct(values: list[float], q: float) -> float:
    if not values:
        return 0.0
    s = sorted(values)
    return round(s[int(q * (len(s) - 1))], 3)


def _median(values: list[float]) -> float:
    return _pct(values, 0.5)


def _buckets(digest: dict) -> collections.Counter:
    b: collections.Counter = collections.Counter()
    for tool, n in (digest.get("tool_histogram") or {}).items():
        b[tool_bucket(tool)] += n
    return b


def session_metrics(digest: dict) -> dict:
    """Per-session metrics used to build fleet aggregates."""
    b = _buckets(digest)
    total = sum(b.values()) or 1
    overhead = b["statehub_mcp"] + b["task_mgmt"] + b["schema_load"]
    cost = digest.get("cost", {})
    tokens = cost.get("input_tokens", 0) + cost.get("output_tokens", 0)
    return {
        "infra_overhead_share": overhead / total,
        "tool_calls": total,
        "schema_load": b["schema_load"],
        "error_occurrences": sum(s.get("count", 1) for s in (digest.get("error_snippets") or [])),
        "has_error": bool(digest.get("error_snippets")),
        "tokens": tokens,
        "success": digest.get("outcome") == "success",
    }


def aggregate(digests: list[dict], *, schema_thrash_threshold: int = 5) -> dict:
    """Fleet-level metrics over a set of (already quality-filtered) digests."""
    per = [session_metrics(d) for d in digests]
    n = len(per)
    if n == 0:
        return {"n_sessions": 0}
    shares = [m["infra_overhead_share"] for m in per]
    tokens = [m["tokens"] for m in per]
    return {
        "n_sessions": n,
        "infra_overhead_share_median": _median(shares),
        "infra_overhead_share_p90": _pct(shares, 0.9),
        "error_rate": round(sum(m["has_error"] for m in per) / n, 3),
        "recurring_error_occurrences": sum(m["error_occurrences"] for m in per),
        "schema_thrash_sessions": sum(1 for m in per if m["schema_load"] >= schema_thrash_threshold),
        "tokens_p50": _pct(tokens, 0.5),
        "tokens_p90": _pct(tokens, 0.9),
        "success_rate": round(sum(m["success"] for m in per) / n, 3),
    }
def snapshot(digests: list[dict], *, label: str = "") -> dict:
m = aggregate(digests)
m["captured_at"] = _now()
m["label"] = label
return m
def save_baseline(metrics: dict, path: str) -> None:
"""Append a metrics snapshot to the baseline JSONL trend file."""
os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
with open(path, "a", encoding="utf-8") as fh:
fh.write(json.dumps(metrics, sort_keys=True))
fh.write("\n")
def load_baselines(path: str) -> list[dict]:
if not os.path.exists(path):
return []
with open(path, encoding="utf-8") as fh:
return [json.loads(line) for line in fh if line.strip()]
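The percentile helper and the JSONL baseline file are easy to misread, so a small standalone sketch may help. It copies the indexing rule of `_pct` and the append/load round trip; the file name `fleet.jsonl` and the snapshot values are hypothetical sample data, not taken from a real baseline.

```python
import json
import os
import tempfile


def pct(values, q):
    """Lower nearest-rank percentile: picks a sample, never interpolates."""
    if not values:
        return 0.0
    s = sorted(values)
    return round(s[int(q * (len(s) - 1))], 3)


def save_baseline(metrics, path):
    """Append one snapshot as a single JSON line."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(metrics, sort_keys=True) + "\n")


def load_baselines(path):
    """Read every non-empty line back as one snapshot, oldest first."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


# int(0.5 * 3) == 1, so the "median" of four values is the lower middle one.
median_even = pct([10, 20, 30, 40], 0.5)
# int(0.9 * 9) == 8, so p90 of 1..10 is the ninth value.
p90 = pct(list(range(1, 11)), 0.9)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "baselines", "fleet.jsonl")  # hypothetical path
    save_baseline({"label": "run-1", "error_rate": 0.9}, path)
    save_baseline({"label": "run-2", "error_rate": 0.5}, path)
    trend = load_baselines(path)
```

Because the index is truncated rather than interpolated, the median of an even-sized sample is the lower of the two middle values. That keeps the metric cheap and deterministic, at the cost of a slight downward bias on small fleets.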


@@ -0,0 +1,9 @@
"""Weekly retro (AGENTIC-WP-0010) — the analysis half of the coding retrospection.

    build.py     windowed detect + measure -> ranked top-3 suggestions per repo (T01)
    publish.py   publish the retro to the hub read model + local report (T02)
    __main__.py  python -m session_memory.retro (T03)

Consumed by activity-core's weekly-coding-retro schedule (ACTIVITY-WP-0008) via
the ``event_type=coding_retro`` read model.
"""


@@ -0,0 +1,68 @@
"""Weekly retro entrypoint (AGENTIC-WP-0010 T03).

    python -m session_memory.retro [--window-days 7] [--since D] [--until D]
                                   [--publish] [--json]

Builds the windowed top-3-per-repo retro over the captured sessions, writes a local
JSON + markdown report, and (with ``--publish``) posts it to the hub as the
``coding_retro`` read model that activity-core's weekly schedule consumes.
"""
from __future__ import annotations

import argparse
import json
import os

from ..core.store import Store
from ..curate.catalog import Catalog
from ..ingest import _expand, load_config
from .build import weekly_retro
from .publish import publish_to_hub, render_markdown, write_local


def run_retro(config: dict, *, window_days=None, since=None, until=None):
    s = config.get("store", {})
    store = Store(_expand(s["db_path"]), _expand(s["blob_dir"]))
    digests = store.list_digests()
    store.close()
    cur = config.get("curate", {})
    catalog = Catalog(_expand(cur.get("catalog_dir", "session_memory/catalog")))
    rcfg = config.get("retro", {})
    return weekly_retro(digests, catalog, since=since, until=until,
                        window_days=window_days or rcfg.get("window_days", 7))


def main(argv=None) -> int:
    here = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
    ap = argparse.ArgumentParser(description="Build (and optionally publish) the weekly coding retro.")
    ap.add_argument("--config", default=os.path.join(here, "config.toml"))
    ap.add_argument("--window-days", type=int, default=None)
    ap.add_argument("--since", default=None)
    ap.add_argument("--until", default=None)
    ap.add_argument("--publish", action="store_true", help="post to the hub coding_retro read model")
    ap.add_argument("--json", action="store_true")
    args = ap.parse_args(argv)
    config = load_config(args.config)
    report = run_retro(config, window_days=args.window_days, since=args.since, until=args.until)
    rcfg = config.get("retro", {})
    write_local(report, _expand(rcfg.get("report_json", "session_memory/retro/last_retro.json")),
                _expand(rcfg.get("report_md", "session_memory/retro/last_retro.md")))
    published = None
    if args.publish:
        published = publish_to_hub(report, base_url=rcfg.get("hub_url", "http://127.0.0.1:8000"))
    if args.json:
        print(json.dumps({"report": report, "published": published}, indent=2))
    else:
        print(render_markdown(report))
        if args.publish:
            print(f"\npublished to hub: {published}")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
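The precedence between the `--window-days` flag and the `[retro]` config section comes from a single `or` expression in `run_retro`. The sketch below isolates that rule; `resolve_window_days` is a hypothetical helper name introduced only for illustration, and the config dicts are sample values.

```python
import argparse


def resolve_window_days(cli_value, retro_cfg):
    """CLI value wins; otherwise the config value; otherwise 7."""
    return cli_value or retro_cfg.get("window_days", 7)


ap = argparse.ArgumentParser()
ap.add_argument("--window-days", type=int, default=None)

cfg = {"window_days": 14}  # sample [retro] section
from_cli = resolve_window_days(ap.parse_args(["--window-days", "30"]).window_days, cfg)
from_cfg = resolve_window_days(ap.parse_args([]).window_days, cfg)
fallback = resolve_window_days(ap.parse_args([]).window_days, {})
# 0 is falsy, so an explicit zero is treated like "not given".
zero = resolve_window_days(ap.parse_args(["--window-days", "0"]).window_days, cfg)
```

The last case is a side effect of using `or` instead of an `is None` check. It is harmless here, since a zero-day window would select no sessions anyway.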


@@ -0,0 +1,99 @@
"""Windowed weekly retro report (AGENTIC-WP-0010 T01).

Runs the existing detect pipeline over a date window, ranks the recurring problem
patterns into **per-repo improvement suggestions** (top 3, cross-flavor first),
attaches a recommendation from the Pattern Catalog where one exists, and bundles a
fleet measure snapshot for context. Pure function over digests — the entrypoint
(T03) handles store/publish.
"""
from __future__ import annotations

import collections
from dataclasses import asdict, dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

from ..detect.cluster import cluster
from ..detect.quality import QualityConfig, filter_real
from ..detect.signals import extract_signals
from ..measure.metrics import aggregate

# score at/above which a suggestion is "high" priority even when single-flavor
_HIGH_SCORE = 100.0


def _parse(ts: str) -> datetime:
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def _iso(dt: datetime) -> str:
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class Suggestion:
    repo: str
    title: str
    recommendation: str
    priority: str  # high | medium
    score: float
    signal_type: str
    cross_flavor: bool
    pattern_key: str


def _recommendation(pattern_key: str, locus: str, catalog) -> Optional[str]:
    if catalog is None:
        return None
    sp = catalog.find_for(pattern_key, locus)
    if sp and sp.resolutions:
        return sp.resolutions[0].summary
    return None


def weekly_retro(digests: list[dict], catalog=None, *, since: Optional[str] = None,
                 until: Optional[str] = None, window_days: int = 7,
                 max_per_repo: int = 3, min_frequency: int = 2,
                 quality: Optional[QualityConfig] = None) -> dict:
    """Build the ranked weekly retro report over a date window."""
    until_dt = _parse(until) if until else _now()
    since_dt = _parse(since) if since else until_dt - timedelta(days=window_days)
    windowed = [d for d in digests
                if d.get("started_at") and since_dt <= _parse(d["started_at"]) < until_dt]
    real = filter_real(windowed, quality or QualityConfig())
    patterns = cluster(extract_signals(real), min_frequency=min_frequency)

    by_repo: dict[str, list[Suggestion]] = collections.defaultdict(list)
    for p in patterns:
        if p.polarity != "problem":
            continue  # improvements come from problems
        rec = (_recommendation(p.key, p.locus, catalog)
               or f"Investigate {p.signal_type.replace('_', ' ')} on {p.locus}")
        priority = "high" if (p.cross_flavor or p.score >= _HIGH_SCORE) else "medium"
        for repo in (p.repos or ["(unknown)"]):
            by_repo[repo].append(Suggestion(
                repo=repo, title=p.title, recommendation=rec, priority=priority,
                score=p.score, signal_type=p.signal_type, cross_flavor=p.cross_flavor,
                pattern_key=p.key))

    suggestions: list[Suggestion] = []
    for repo in sorted(by_repo):
        items = sorted(by_repo[repo], key=lambda s: -s.score)
        suggestions.extend(items[:max_per_repo])
    # cross-flavor first, then by score (global ordering for the report)
    suggestions.sort(key=lambda s: (not s.cross_flavor, -s.score))
    return {
        "window": {"since": _iso(since_dt), "until": _iso(until_dt), "days": window_days},
        "generated_at": _iso(_now()),
        "n_sessions": len(real),
        "suggestions": [asdict(s) for s in suggestions],
        "measure": aggregate(real),
    }
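Two steps of `weekly_retro` carry most of the behavior: the half-open date window and the per-repo cap followed by a global sort. The standalone sketch below reproduces just those two steps on plain dicts. The function names `in_window` and `top_per_repo` are hypothetical, and the sessions and scores are sample inputs.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def parse(ts):
    """Parse an ISO-8601 UTC timestamp with a trailing 'Z'."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def in_window(digests, until, window_days=7):
    """Keep digests with since <= started_at < until (half-open interval)."""
    until_dt = parse(until)
    since_dt = until_dt - timedelta(days=window_days)
    return [d for d in digests
            if d.get("started_at") and since_dt <= parse(d["started_at"]) < until_dt]


def top_per_repo(suggestions, max_per_repo=3):
    """Cap each repo at its highest-scoring items, then order globally."""
    by_repo = defaultdict(list)
    for s in suggestions:
        by_repo[s["repo"]].append(s)
    picked = []
    for repo in sorted(by_repo):
        picked.extend(sorted(by_repo[repo], key=lambda s: -s["score"])[:max_per_repo])
    # Cross-flavor items sort first regardless of score.
    picked.sort(key=lambda s: (not s["cross_flavor"], -s["score"]))
    return picked


sessions = [
    {"id": "at-since", "started_at": "2026-06-01T00:00:00Z"},   # included
    {"id": "at-until", "started_at": "2026-06-08T00:00:00Z"},   # excluded
    {"id": "no-timestamp"},                                      # excluded
]
kept = in_window(sessions, "2026-06-08T00:00:00Z", window_days=7)

candidates = [{"repo": "a", "score": float(n), "cross_flavor": False} for n in (2, 5, 3, 4)]
candidates.append({"repo": "b", "score": 1.0, "cross_flavor": True})
ranked = top_per_repo(candidates)
```

The global sort explains why, in a generated report, a cross-flavor suggestion with a low score can appear above single-flavor suggestions with much higher scores.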


@@ -0,0 +1,322 @@
{
"generated_at": "2026-06-07T19:30:56Z",
"measure": {
"error_rate": 0.957,
"infra_overhead_share_median": 0.167,
"infra_overhead_share_p90": 0.23,
"n_sessions": 23,
"recurring_error_occurrences": 463,
"schema_thrash_sessions": 7,
"success_rate": 1.0,
"tokens_p50": 250725,
"tokens_p90": 901422
},
"n_sessions": 23,
"suggestions": [
{
"cross_flavor": true,
"pattern_key": "problem:recurring_error:make: *** [makefile:<n>: fix-consistency] error <n>",
"priority": "high",
"recommendation": "Investigate recurring error on make: *** [makefile:<n>: fix-consistency] error <n>",
"repo": "net-kingdom",
"score": 54.0,
"signal_type": "recurring_error",
"title": "cross-flavor problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:tool_thrash:tool:Bash",
"priority": "high",
"recommendation": "Batch related shell work into one script, not many small Bash calls",
"repo": "activity-core",
"score": 13128.0,
"signal_type": "tool_thrash",
"title": "problem: tool thrash"
},
{
"cross_flavor": false,
"pattern_key": "problem:tool_thrash:tool:Bash",
"priority": "high",
"recommendation": "Batch related shell work into one script, not many small Bash calls",
"repo": "artifact-store",
"score": 13128.0,
"signal_type": "tool_thrash",
"title": "problem: tool thrash"
},
{
"cross_flavor": false,
"pattern_key": "problem:tool_thrash:tool:Bash",
"priority": "high",
"recommendation": "Batch related shell work into one script, not many small Bash calls",
"repo": "citation-evidence",
"score": 13128.0,
"signal_type": "tool_thrash",
"title": "problem: tool thrash"
},
{
"cross_flavor": false,
"pattern_key": "problem:tool_thrash:tool:Bash",
"priority": "high",
"recommendation": "Batch related shell work into one script, not many small Bash calls",
"repo": "infospace-bench",
"score": 13128.0,
"signal_type": "tool_thrash",
"title": "problem: tool thrash"
},
{
"cross_flavor": false,
"pattern_key": "problem:tool_thrash:tool:Bash",
"priority": "high",
"recommendation": "Batch related shell work into one script, not many small Bash calls",
"repo": "railiance-apps",
"score": 13128.0,
"signal_type": "tool_thrash",
"title": "problem: tool thrash"
},
{
"cross_flavor": false,
"pattern_key": "problem:tool_thrash:tool:Bash",
"priority": "high",
"recommendation": "Batch related shell work into one script, not many small Bash calls",
"repo": "state-hub",
"score": 13128.0,
"signal_type": "tool_thrash",
"title": "problem: tool thrash"
},
{
"cross_flavor": false,
"pattern_key": "problem:schema_thrash:schema_load",
"priority": "high",
"recommendation": "Load the tool schemas you'll need once, up front",
"repo": "activity-core",
"score": 441.0,
"signal_type": "schema_thrash",
"title": "problem: schema thrash"
},
{
"cross_flavor": false,
"pattern_key": "problem:schema_thrash:schema_load",
"priority": "high",
"recommendation": "Load the tool schemas you'll need once, up front",
"repo": "citation-evidence",
"score": 441.0,
"signal_type": "schema_thrash",
"title": "problem: schema thrash"
},
{
"cross_flavor": false,
"pattern_key": "problem:schema_thrash:schema_load",
"priority": "high",
"recommendation": "Load the tool schemas you'll need once, up front",
"repo": "flex-auth",
"score": 441.0,
"signal_type": "schema_thrash",
"title": "problem: schema thrash"
},
{
"cross_flavor": false,
"pattern_key": "problem:schema_thrash:schema_load",
"priority": "high",
"recommendation": "Load the tool schemas you'll need once, up front",
"repo": "infospace-bench",
"score": 441.0,
"signal_type": "schema_thrash",
"title": "problem: schema thrash"
},
{
"cross_flavor": false,
"pattern_key": "problem:schema_thrash:schema_load",
"priority": "high",
"recommendation": "Load the tool schemas you'll need once, up front",
"repo": "ops-bridge",
"score": 441.0,
"signal_type": "schema_thrash",
"title": "problem: schema thrash"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:<tool_use_error>file has not been read yet. read it first before writing to it.<<path>>",
"priority": "high",
"recommendation": "Read the file (or the region you'll touch) before Edit/Write",
"repo": "activity-core",
"score": 290.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:<tool_use_error>file has not been read yet. read it first before writing to it.<<path>>",
"priority": "high",
"recommendation": "Read the file (or the region you'll touch) before Edit/Write",
"repo": "citation-evidence",
"score": 290.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:<tool_use_error>file has not been read yet. read it first before writing to it.<<path>>",
"priority": "high",
"recommendation": "Read the file (or the region you'll touch) before Edit/Write",
"repo": "infospace-bench",
"score": 290.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:<tool_use_error>file has not been read yet. read it first before writing to it.<<path>>",
"priority": "high",
"recommendation": "Read the file (or the region you'll touch) before Edit/Write",
"repo": "issue-facade",
"score": 290.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:<tool_use_error>file has not been read yet. read it first before writing to it.<<path>>",
"priority": "high",
"recommendation": "Read the file (or the region you'll touch) before Edit/Write",
"repo": "railiance-apps",
"score": 290.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:<tool_use_error>file has not been read yet. read it first before writing to it.<<path>>",
"priority": "high",
"recommendation": "Read the file (or the region you'll touch) before Edit/Write",
"repo": "state-hub",
"score": 290.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:<tool_use_error>file has not been read yet. read it first before writing to it.<<path>>",
"priority": "high",
"recommendation": "Read the file (or the region you'll touch) before Edit/Write",
"repo": "the-custodian",
"score": 290.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:<tool_use_error>file has not been read yet. read it first before writing to it.<<path>>",
"priority": "high",
"recommendation": "Read the file (or the region you'll touch) before Edit/Write",
"repo": "vergabe-teilnahme",
"score": 290.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:<tool_use_error>file has been modified since read, either by the user or by a linter. read it again before attempting to write it.<<path>>",
"priority": "medium",
"recommendation": "Read the file (or the region you'll touch) before Edit/Write",
"repo": "artifact-store",
"score": 78.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:<tool_use_error>file has been modified since read, either by the user or by a linter. read it again before attempting to write it.<<path>>",
"priority": "medium",
"recommendation": "Read the file (or the region you'll touch) before Edit/Write",
"repo": "issue-facade",
"score": 78.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:<tool_use_error>file has been modified since read, either by the user or by a linter. read it again before attempting to write it.<<path>>",
"priority": "medium",
"recommendation": "Read the file (or the region you'll touch) before Edit/Write",
"repo": "railiance-apps",
"score": 78.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:<tool_use_error>file has been modified since read, either by the user or by a linter. read it again before attempting to write it.<<path>>",
"priority": "medium",
"recommendation": "Read the file (or the region you'll touch) before Edit/Write",
"repo": "state-hub",
"score": 78.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:budget_overrun:tokens",
"priority": "medium",
"recommendation": "Read narrowly \u2014 target the region you need, not whole large files",
"repo": "artifact-store",
"score": 50.55,
"signal_type": "budget_overrun",
"title": "problem: budget overrun"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:{",
"priority": "medium",
"recommendation": "Investigate recurring error on {",
"repo": "vergabe-teilnahme",
"score": 12.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:found <n> errors (<n> fixed, <n> remaining).",
"priority": "medium",
"recommendation": "Investigate recurring error on found <n> errors (<n> fixed, <n> remaining).",
"repo": "ops-bridge",
"score": 10.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:(note: edit also tried swapping \\uxxxx escapes and their characters; neither form matched, so the mismatch is likely elsewhere in old_string. re-read the file a",
"priority": "medium",
"recommendation": "Investigate recurring error on (note: edit also tried swapping \\uxxxx escapes and their characters; neither form matched, so the mismatch is likely elsewhere in old_string. re-read the file a",
"repo": "net-kingdom",
"score": 6.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:found <n> error (<n> fixed, <n> remaining).",
"priority": "medium",
"recommendation": "Investigate recurring error on found <n> error (<n> fixed, <n> remaining).",
"repo": "ops-bridge",
"score": 6.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:<n> failed, <n> passed in <n>.00s",
"priority": "medium",
"recommendation": "Investigate recurring error on <n> failed, <n> passed in <n>.00s",
"repo": "agentic-resources",
"score": 4.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
}
],
"window": {
"days": 30,
"since": "2026-05-08T19:30:56Z",
"until": "2026-06-07T19:30:56Z"
}
}


@@ -0,0 +1,39 @@
# Weekly Coding Retro (2026-05-08 → 2026-06-07)
_23 real sessions · generated 2026-06-07T19:30:56Z_
## Top improvement suggestions (cross-flavor first, ≤3 per repo)
- **net-kingdom** (high, score=54.0) [CROSS-FLAVOR]: cross-flavor problem: recurring error — Investigate recurring error on make: *** [makefile:<n>: fix-consistency] error <n>
- **activity-core** (high, score=13128.0): problem: tool thrash — Batch related shell work into one script, not many small Bash calls
- **artifact-store** (high, score=13128.0): problem: tool thrash — Batch related shell work into one script, not many small Bash calls
- **citation-evidence** (high, score=13128.0): problem: tool thrash — Batch related shell work into one script, not many small Bash calls
- **infospace-bench** (high, score=13128.0): problem: tool thrash — Batch related shell work into one script, not many small Bash calls
- **railiance-apps** (high, score=13128.0): problem: tool thrash — Batch related shell work into one script, not many small Bash calls
- **state-hub** (high, score=13128.0): problem: tool thrash — Batch related shell work into one script, not many small Bash calls
- **activity-core** (high, score=441.0): problem: schema thrash — Load the tool schemas you'll need once, up front
- **citation-evidence** (high, score=441.0): problem: schema thrash — Load the tool schemas you'll need once, up front
- **flex-auth** (high, score=441.0): problem: schema thrash — Load the tool schemas you'll need once, up front
- **infospace-bench** (high, score=441.0): problem: schema thrash — Load the tool schemas you'll need once, up front
- **ops-bridge** (high, score=441.0): problem: schema thrash — Load the tool schemas you'll need once, up front
- **activity-core** (high, score=290.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **citation-evidence** (high, score=290.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **infospace-bench** (high, score=290.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **issue-facade** (high, score=290.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **railiance-apps** (high, score=290.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **state-hub** (high, score=290.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **the-custodian** (high, score=290.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **vergabe-teilnahme** (high, score=290.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **artifact-store** (medium, score=78.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **issue-facade** (medium, score=78.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **railiance-apps** (medium, score=78.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **state-hub** (medium, score=78.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **artifact-store** (medium, score=50.55): problem: budget overrun — Read narrowly — target the region you need, not whole large files
- **vergabe-teilnahme** (medium, score=12.0): problem: recurring error — Investigate recurring error on {
- **ops-bridge** (medium, score=10.0): problem: recurring error — Investigate recurring error on found <n> errors (<n> fixed, <n> remaining).
- **net-kingdom** (medium, score=6.0): problem: recurring error — Investigate recurring error on (note: edit also tried swapping \uxxxx escapes and their characters; neither form matched, so the mismatch is likely elsewhere in old_string. re-read the file a
- **ops-bridge** (medium, score=6.0): problem: recurring error — Investigate recurring error on found <n> error (<n> fixed, <n> remaining).
- **agentic-resources** (medium, score=4.0): problem: recurring error — Investigate recurring error on <n> failed, <n> passed in <n>.00s
## Fleet snapshot
- infra-overhead median: 0.167
- error rate: 0.957 · schema-thrash: 7
- success rate: 1.0 · tokens p50: 250725


@@ -0,0 +1,78 @@
"""Publish the weekly retro (AGENTIC-WP-0010 T02).

The retro is published to the State Hub as a **read model** — a progress event of
``event_type=coding_retro`` whose ``detail`` carries the structured report. This is
exactly how ``daily-triage-report`` surfaces, and it is what activity-core's
``coding_retro`` resolver (ACTIVITY-WP-0008) reads. A local JSON + markdown report
is always written; the hub publish is best-effort and **degrades gracefully** when
the hub is unreachable.
"""
from __future__ import annotations

import json
import os
import urllib.request
from typing import Callable, Optional

DEFAULT_HUB = "http://127.0.0.1:8000"


def render_markdown(report: dict) -> str:
    w = report.get("window", {})
    lines = [
        f"# Weekly Coding Retro ({w.get('since', '')[:10]} → {w.get('until', '')[:10]})",
        f"_{report.get('n_sessions', 0)} real sessions · generated {report.get('generated_at', '')}_",
        "",
        "## Top improvement suggestions (cross-flavor first, ≤3 per repo)",
    ]
    if not report.get("suggestions"):
        lines.append("- (no recurring problems above threshold this week)")
    for s in report.get("suggestions", []):
        flag = " [CROSS-FLAVOR]" if s.get("cross_flavor") else ""
        lines.append(f"- **{s['repo']}** ({s['priority']}, score={s['score']}){flag}: "
                     f"{s['title']} — {s['recommendation']}")
    m = report.get("measure", {})
    lines += ["", "## Fleet snapshot",
              f"- infra-overhead median: {m.get('infra_overhead_share_median')}",
              f"- error rate: {m.get('error_rate')} · schema-thrash: {m.get('schema_thrash_sessions')}",
              f"- success rate: {m.get('success_rate')} · tokens p50: {m.get('tokens_p50')}"]
    return "\n".join(lines)


def write_local(report: dict, json_path: str, md_path: Optional[str] = None) -> None:
    os.makedirs(os.path.dirname(json_path) or ".", exist_ok=True)
    with open(json_path, "w", encoding="utf-8") as fh:
        json.dump(report, fh, indent=2, sort_keys=True)
        fh.write("\n")
    if md_path:
        with open(md_path, "w", encoding="utf-8") as fh:
            fh.write(render_markdown(report))
            fh.write("\n")


def _http_post(url: str, payload: dict) -> None:
    req = urllib.request.Request(url, data=json.dumps(payload).encode(),
                                 headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(req, timeout=10) as r:
        r.read()


def publish_to_hub(report: dict, *, base_url: str = DEFAULT_HUB,
                   poster: Optional[Callable[[str, dict], None]] = None) -> bool:
    """POST the retro as an event_type=coding_retro progress event. Best-effort."""
    poster = poster or _http_post
    n = report.get("n_sessions", 0)
    k = len(report.get("suggestions", []))
    payload = {
        "event_type": "coding_retro",
        "author": "helix-forge",
        "summary": f"Weekly coding retro: {k} ranked suggestions across "
                   f"{report.get('window', {}).get('days', 7)} days ({n} sessions).",
        "detail": report,
    }
    try:
        poster(f"{base_url.rstrip('/')}/progress/", payload)
        return True
    except Exception:
        return False
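The `poster` parameter is what makes the best-effort publish testable without a running hub. A minimal sketch of that pattern follows. It is a simplified stand-in, not `publish_to_hub` itself: the `publish` function has no network default, `recording_poster` and `failing_poster` are hypothetical test doubles, and `hub.example` is a placeholder host.

```python
from typing import Callable


def publish(report: dict, *, base_url: str,
            poster: Callable[[str, dict], None]) -> bool:
    """Return True if the poster succeeded, False on any exception."""
    payload = {"event_type": "coding_retro", "detail": report}
    try:
        # rstrip avoids a double slash when base_url already ends with '/'.
        poster(f"{base_url.rstrip('/')}/progress/", payload)
        return True
    except Exception:
        return False


calls = []


def recording_poster(url, payload):
    """Test double: remember what would have been sent."""
    calls.append((url, payload))


def failing_poster(url, payload):
    """Test double: simulate an unreachable hub."""
    raise ConnectionError("hub unreachable")


ok = publish({"n_sessions": 23}, base_url="http://hub.example/", poster=recording_poster)
down = publish({"n_sessions": 23}, base_url="http://hub.example", poster=failing_poster)
```

Swallowing every exception keeps the retro run from failing when the hub is down, which fits the "local report is always written" contract. The trade-off is that the caller only learns that the publish failed, not why.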


@@ -0,0 +1,62 @@
"""find_for / covers tests (AGENTIC-WP-0010 follow-up)."""
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.curate.catalog import Catalog  # noqa: E402
from session_memory.curate.schema import (  # noqa: E402
    Provenance,
    Resolution,
    SolutionPattern,
)


def _pattern(pid, src, covers=None, name="P"):
    return SolutionPattern(
        id=pid, name=name, version="1.0.0", polarity="problem", problem="p",
        resolutions=[Resolution(summary="do x")],
        provenance=Provenance(source_key=src), covers=covers or [])


def test_covers_round_trips(tmp_path):
    cat = Catalog(str(tmp_path))
    cat.upsert(_pattern("sp-a", "problem:file_not_read:edit",
                        covers=["file has not been read"]))
    assert cat.load("sp-a").covers == ["file has not been read"]


def test_find_for_exact_key(tmp_path):
    cat = Catalog(str(tmp_path))
    cat.upsert(_pattern(SolutionPattern.make_id("problem:retry_storm:retries"),
                        "problem:retry_storm:retries"))
    got = cat.find_for("problem:retry_storm:retries")
    assert got is not None and got.id == "sp-problem-retry_storm-retries"


def test_find_for_covers_match(tmp_path):
    cat = Catalog(str(tmp_path))
    cat.upsert(_pattern("sp-rbe", "problem:file_not_read:edit",
                        covers=["file has not been read", "modified since read"]))
    # a recurring_error signal with a different key but matching fingerprint locus
    got = cat.find_for(
        "problem:recurring_error:<tool_use_error>file has not been read yet...",
        locus="<tool_use_error>file has not been read yet. read it first...")
    assert got is not None and got.id == "sp-rbe"


def test_find_for_no_match_returns_none(tmp_path):
    cat = Catalog(str(tmp_path))
    cat.upsert(_pattern("sp-rbe", "problem:file_not_read:edit",
                        covers=["file has not been read"]))
    assert cat.find_for("problem:recurring_error:some unrelated error") is None


def test_covers_change_versions(tmp_path):
    cat = Catalog(str(tmp_path))
    cat.upsert(_pattern("sp-a", "problem:x:y"))
    p = cat.load("sp-a")
    p.covers = ["new coverage"]
    assert cat.upsert(p) == "versioned"  # covers is substantive content
    assert cat.load("sp-a").version == "1.0.1"
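These tests suggest a two-step lookup: an exact source-key match first, then a `covers` fragment match against the signal's locus. How `Catalog.find_for` actually implements this is not shown in this excerpt, so the sketch below is only one plausible reading. `Pattern` is a hypothetical stand-in for `SolutionPattern`, and the case-insensitive substring match is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Pattern:  # hypothetical stand-in for SolutionPattern
    id: str
    source_key: str
    covers: List[str] = field(default_factory=list)


def find_for(patterns, key: str, locus: str = "") -> Optional[Pattern]:
    # 1) An exact source-key match wins.
    for p in patterns:
        if p.source_key == key:
            return p
    # 2) ASSUMPTION: otherwise match any non-empty covers fragment
    #    as a case-insensitive substring of the locus.
    haystack = locus.lower()
    for p in patterns:
        if any(frag and frag.lower() in haystack for frag in p.covers):
            return p
    return None


catalog = [Pattern("sp-rbe", "problem:file_not_read:edit",
                   ["file has not been read", "modified since read"])]
exact = find_for(catalog, "problem:file_not_read:edit")
covered = find_for(catalog, "problem:recurring_error:other-key",
                   locus="File has not been read yet. Read it first...")
missing = find_for(catalog, "problem:recurring_error:some unrelated error")
```

A fragment list of this kind lets one catalog entry serve several error fingerprints, which matches the generated report where two different "file has ..." errors share one recommendation.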

tests/test_cluster.py Normal file

@@ -0,0 +1,54 @@
"""Clusterer + evidence + cross-flavor tests (T05/T06)."""
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.detect.cluster import cluster  # noqa: E402
from session_memory.detect.signals import PROBLEM, SUCCESS, Signal  # noqa: E402


def _sig(uid, flavor, repo, type_, polarity, locus, mag=1.0):
    return Signal(session_uid=uid, flavor=flavor, repo=repo, type=type_,
                  polarity=polarity, locus=locus, magnitude=mag)


def test_min_frequency_filters_singletons():
    sigs = [_sig("claude:a", "claude", "r1", "retry_storm", PROBLEM, "retries")]
    assert cluster(sigs, min_frequency=2) == []


def test_clusters_recurring_signal_with_evidence():
    sigs = [
        _sig("claude:a", "claude", "r1", "retry_storm", PROBLEM, "retries", 5),
        _sig("claude:b", "claude", "r2", "retry_storm", PROBLEM, "retries", 3),
    ]
    pats = cluster(sigs, min_frequency=2)
    assert len(pats) == 1
    p = pats[0]
    assert p.frequency == 2
    assert p.sessions == ["claude:a", "claude:b"]
    assert sorted(p.repos) == ["r1", "r2"]
    assert p.flavors == ["claude"]
    assert p.cross_flavor is False
    assert p.cost_impact == 8.0


def test_cross_flavor_flagged_and_ranked_first():
    sigs = [
        # cross-flavor problem (claude + codex)
        _sig("claude:a", "claude", "r1", "repeated_errors", PROBLEM, "errors", 3),
        _sig("codex:b", "codex", "r2", "repeated_errors", PROBLEM, "errors", 3),
        # single-flavor success cluster with higher raw impact
        _sig("grok:c", "grok", "r3", "clean_pass", SUCCESS, "outcome", 5),
        _sig("grok:d", "grok", "r4", "clean_pass", SUCCESS, "outcome", 5),
    ]
    pats = cluster(sigs, min_frequency=2)
    assert len(pats) == 2
    xf = next(p for p in pats if p.signal_type == "repeated_errors")
    assert xf.cross_flavor is True
    assert sorted(xf.flavors) == ["claude", "codex"]
    # cross-flavor pattern is ranked first even if another has higher raw impact
    assert pats[0].cross_flavor is True
    assert "cross-flavor" in pats[0].title
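The behavior these tests pin down can be approximated with a short grouping routine. The real `cluster` implementation is not part of this excerpt, so everything below is a sketch under assumptions: signals are grouped by `(polarity, type, locus)`, frequency counts distinct sessions, `cost_impact` sums magnitudes, and ranking puts cross-flavor patterns first. The dict-based pattern shape is hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Signal:  # mirrors the fields used by the tests above
    session_uid: str
    flavor: str
    repo: str
    type: str
    polarity: str
    locus: str
    magnitude: float = 1.0


def cluster(signals, min_frequency=2):
    """Group signals into patterns; ASSUMED key is polarity:type:locus."""
    groups = defaultdict(list)
    for s in signals:
        groups[(s.polarity, s.type, s.locus)].append(s)
    patterns = []
    for (polarity, type_, locus), sigs in groups.items():
        sessions = sorted({s.session_uid for s in sigs})
        if len(sessions) < min_frequency:
            continue  # drop singletons
        flavors = sorted({s.flavor for s in sigs})
        patterns.append({
            "key": f"{polarity}:{type_}:{locus}",
            "frequency": len(sessions),
            "sessions": sessions,
            "repos": sorted({s.repo for s in sigs}),
            "flavors": flavors,
            "cross_flavor": len(flavors) > 1,
            "cost_impact": float(sum(s.magnitude for s in sigs)),
        })
    # ASSUMED ranking: cross-flavor first, then by summed magnitude.
    patterns.sort(key=lambda p: (not p["cross_flavor"], -p["cost_impact"]))
    return patterns


single = cluster([Signal("claude:a", "claude", "r1", "retry_storm", "problem", "retries")])
mixed = cluster([
    Signal("claude:a", "claude", "r1", "repeated_errors", "problem", "errors", 3),
    Signal("codex:b", "codex", "r2", "repeated_errors", "problem", "errors", 3),
    Signal("grok:c", "grok", "r3", "clean_pass", "success", "outcome", 5),
    Signal("grok:d", "grok", "r4", "clean_pass", "success", "outcome", 5),
])
```

Ranking on the cross-flavor flag before raw impact reflects the idea that a problem seen across different agent flavors is less likely to be a quirk of one tool.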


@@ -0,0 +1,86 @@
"""Codex adapter tests (T01): synthetic rollout fixture."""
import json
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.adapters.codex import parse_session  # noqa: E402

REPO_MAP = {"agentic-resources": "helix_forge"}


def _rollout(path, lines):
    with open(path, "w", encoding="utf-8") as f:
        for ln in lines:
            f.write(json.dumps(ln) + "\n")


def test_codex_rollout_parse(tmp_path):
    p = tmp_path / "rollout-2026-06-06-abc.jsonl"
    _rollout(p, [
        {"timestamp": "2026-06-06T10:00:00Z", "type": "session_meta",
         "payload": {"id": "cdx-1", "cwd": "/home/worsch/agentic-resources",
                     "model_provider": "openai", "cli_version": "0.44.0", "model": "gpt-5-codex"}},
        {"timestamp": "2026-06-06T10:00:01Z", "type": "turn_context",
         "payload": {"model": "gpt-5-codex", "approval_policy": "on-request"}},
        {"timestamp": "2026-06-06T10:00:02Z", "type": "event_msg",
         "payload": {"type": "task_started"}},
        {"timestamp": "2026-06-06T10:00:03Z", "type": "response_item",
         "payload": {"type": "message", "role": "user",
                     "content": [{"type": "input_text", "text": "fix the bug"}]}},
        {"timestamp": "2026-06-06T10:00:04Z", "type": "response_item",
         "payload": {"type": "reasoning", "summary": "think about it"}},
        {"timestamp": "2026-06-06T10:00:05Z", "type": "response_item",
         "payload": {"type": "function_call", "name": "apply_patch",
                     "arguments": "{\"path\":\"x.py\"}", "call_id": "call_1"}},
        {"timestamp": "2026-06-06T10:00:06Z", "type": "response_item",
         "payload": {"type": "function_call", "name": "shell",
                     "arguments": "{\"command\":\"pytest -q\"}", "call_id": "call_2"}},
        {"timestamp": "2026-06-06T10:00:07Z", "type": "response_item",
         "payload": {"type": "function_call_output", "call_id": "call_2", "output": "2 passed"}},
        {"timestamp": "2026-06-06T10:00:08Z", "type": "response_item",
         "payload": {"type": "message", "role": "assistant",
                     "content": [{"type": "output_text", "text": "done"}]}},
        {"timestamp": "2026-06-06T10:00:09Z", "type": "event_msg",
         "payload": {"type": "token_count",
                     "info": {"total_token_usage": {"input_tokens": 200, "output_tokens": 30,
                                                    "cached_input_tokens": 15}}}},
        {"timestamp": "2026-06-06T10:00:10Z", "type": "event_msg",
         "payload": {"type": "task_complete"}},
    ])
    norm = parse_session(str(p), REPO_MAP)
    assert norm is not None
    s = norm.session
    assert s.session_uid == "codex:cdx-1"
    assert s.flavor == "codex"
    assert s.repo == "agentic-resources" and s.domain == "helix_forge"
    assert s.model == "gpt-5-codex"
    assert s.cost.input_tokens == 200 and s.cost.output_tokens == 30 and s.cost.cache_tokens == 15
    assert s.cost.turns == 1
    assert s.cost.wall_clock_s == 10.0
    kinds = [e.kind for e in norm.events]
    assert kinds == ["lifecycle", "user_msg", "thinking", "edit", "test_run",
                     "tool_result", "assistant_msg", "completion"]
    # flat linkage: function_call_output links to its function_call by call_id
    out = next(e for e in norm.events if e.kind == "tool_result")
    test_call = next(e for e in norm.events if e.kind == "test_run")
    assert out.parent_seq == test_call.seq
    # apply_patch classified as edit; pytest as test_run
    edit = next(e for e in norm.events if e.kind == "edit")
    assert edit.tool == "apply_patch"


def test_codex_empty_or_no_meta_returns_none(tmp_path):
    p = tmp_path / "rollout-empty.jsonl"
    p.write_text("")
    assert parse_session(str(p), REPO_MAP) is None
    p2 = tmp_path / "rollout-nometa.jsonl"
    _rollout(p2, [{"timestamp": "t", "type": "event_msg", "payload": {"type": "task_started"}}])
    assert parse_session(str(p2), REPO_MAP) is None  # no session_meta -> no id

@@ -0,0 +1,86 @@
"""Versioned Pattern Catalog tests (T02): round-trip, dedup, idempotent upsert."""
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.curate.catalog import (  # noqa: E402
    ADDED,
    UNCHANGED,
    UPDATED,
    VERSIONED,
    Catalog,
)
from session_memory.curate.schema import (  # noqa: E402
    Provenance,
    Resolution,
    Scope,
    SolutionPattern,
)


def _pattern(src="success:clean_pass:outcome", problem="ran tests, clean finish"):
    return SolutionPattern(
        id=SolutionPattern.make_id(src),
        name="Run tests before declaring success",
        version="1.0.0",
        polarity="success",
        problem=problem,
        resolutions=[Resolution(summary="run the suite")],
        scope=Scope(flavors=["claude", "grok"]),
        provenance=Provenance(source_key=src, evidence={"frequency": 18}),
    )


def test_add_then_load_round_trips(tmp_path):
    cat = Catalog(str(tmp_path))
    assert cat.upsert(_pattern()) == ADDED
    loaded = cat.load(SolutionPattern.make_id("success:clean_pass:outcome"))
    assert loaded is not None
    assert loaded.problem == "ran tests, clean finish"
    assert loaded.created_at and loaded.updated_at
    assert [p.id for p in cat.list()] == [loaded.id]


def test_resave_identical_is_noop(tmp_path):
    cat = Catalog(str(tmp_path))
    cat.upsert(_pattern())
    assert cat.upsert(_pattern()) == UNCHANGED
    # version not bumped, no history written
    assert cat.load(_pattern().id).version == "1.0.0"
    assert cat.history(_pattern().id) == []


def test_dedup_on_source_key(tmp_path):
    cat = Catalog(str(tmp_path))
    cat.upsert(_pattern())
    cat.upsert(_pattern())  # same source key -> same id -> one file
    assert len(cat.list()) == 1


def test_content_change_bumps_version_and_archives(tmp_path):
    cat = Catalog(str(tmp_path))
    cat.upsert(_pattern())
    assert cat.upsert(_pattern(problem="now with more nuance")) == VERSIONED
    current = cat.load(_pattern().id)
    assert current.version == "1.0.1"
    assert current.problem == "now with more nuance"
    hist = cat.history(_pattern().id)
    assert len(hist) == 1
    assert hist[0]["version"] == "1.0.0"
    assert hist[0]["status"] == "superseded"


def test_status_only_change_updates_without_bump(tmp_path):
    cat = Catalog(str(tmp_path))
    cat.upsert(_pattern())
    p = _pattern()
    p.status = "approved"
    p.distribution_ready = True
    assert cat.upsert(p) == UPDATED
    current = cat.load(p.id)
    assert current.status == "approved"
    assert current.distribution_ready is True
    assert current.version == "1.0.0"  # metadata change, no bump
    assert cat.history(p.id) == []
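
The four upsert outcomes exercised above can be sketched with an in-memory stand-in. This is only an illustration of the behaviour the tests describe, not the real `Catalog`: the class `MemoryCatalog`, the dict-based pattern shape, the string values of the outcome constants, and the choice of which fields count as metadata are all assumptions.

```python
import copy

# Outcome constants; the real string values in session_memory are not shown here.
ADDED, UNCHANGED, UPDATED, VERSIONED = "added", "unchanged", "updated", "versioned"

# Assumption: these fields are metadata, everything else is content.
METADATA_FIELDS = ("status", "distribution_ready", "version")


def _content(pattern: dict) -> dict:
    """Return the pattern without its metadata fields."""
    return {k: v for k, v in pattern.items() if k not in METADATA_FIELDS}


def _bump_patch(version: str) -> str:
    major, minor, patch = (int(x) for x in version.split("."))
    return f"{major}.{minor}.{patch + 1}"


class MemoryCatalog:
    """Hypothetical in-memory catalog keyed by pattern id."""

    def __init__(self):
        self.current = {}   # id -> latest pattern
        self.archive = {}   # id -> list of superseded patterns

    def upsert(self, pattern: dict) -> str:
        pid = pattern["id"]
        old = self.current.get(pid)
        if old is None:
            self.current[pid] = copy.deepcopy(pattern)
            return ADDED
        if _content(old) != _content(pattern):
            # Content changed: archive the old version, bump the patch number.
            archived = copy.deepcopy(old)
            archived["status"] = "superseded"
            self.archive.setdefault(pid, []).append(archived)
            new = copy.deepcopy(pattern)
            new["version"] = _bump_patch(old["version"])
            self.current[pid] = new
            return VERSIONED
        if any(old.get(k) != pattern.get(k) for k in ("status", "distribution_ready")):
            # Metadata-only change: overwrite in place, keep the version.
            new = copy.deepcopy(pattern)
            new["version"] = old["version"]
            self.current[pid] = new
            return UPDATED
        return UNCHANGED
```

Checking content before metadata means a call that changes both is treated as a new version, which is one plausible reading of the tests rather than something they confirm.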

@@ -0,0 +1,70 @@
"""Hub decision integration tests (T05): payload shape + graceful queue/flush."""
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.curate.catalog import Catalog  # noqa: E402
from session_memory.curate.decisions import DecisionRecorder, build_decision  # noqa: E402
from session_memory.curate.review import APPROVE, REJECT, ReviewLog, review  # noqa: E402


def _candidate(key="success:clean_pass:outcome"):
    return {"key": key, "frequency": 18, "sessions": ["a", "b"],
            "cost_impact": 9.0, "cross_flavor": True, "flavors": ["claude", "grok"]}


def test_build_decision_payload_shape():
    d = build_decision(_candidate(), "approve", "looks solid", workstream_id="ws-1")
    assert d["decision_type"] == "made"
    assert d["workstream_id"] == "ws-1"
    assert "Promote" in d["title"]
    assert d["rationale"] == "looks solid"
    assert "success:clean_pass:outcome" in d["description"]


def test_sink_accepts_decision(tmp_path):
    captured = []
    rec = DecisionRecorder(str(tmp_path / "q.jsonl"), sink=captured.append)
    assert rec.record(_candidate(), "approve", "ok") is True
    assert rec.pending() == []
    assert len(captured) == 1


def test_queues_when_sink_down(tmp_path):
    def boom(_):
        raise RuntimeError("hub down")

    rec = DecisionRecorder(str(tmp_path / "q.jsonl"), sink=boom)
    assert rec.record(_candidate(), "reject", "noise") is False
    assert len(rec.pending()) == 1


def test_no_sink_defaults_to_queue(tmp_path):
    rec = DecisionRecorder(str(tmp_path / "q.jsonl"))
    rec.record(_candidate(), "approve", "ok")
    assert len(rec.pending()) == 1


def test_flush_replays_queue(tmp_path):
    rec = DecisionRecorder(str(tmp_path / "q.jsonl"))  # offline -> queue
    rec.record(_candidate("problem:abandoned:outcome"), "reject", "x")
    rec.record(_candidate("success:clean_pass:outcome"), "approve", "y")
    captured = []
    assert rec.flush(sink=captured.append) == 2
    assert rec.pending() == []
    assert len(captured) == 2


def test_review_records_each_final_decision(tmp_path):
    cat = Catalog(str(tmp_path / "catalog"))
    log = ReviewLog(str(tmp_path / "reviews.jsonl"))
    captured = []
    rec = DecisionRecorder(str(tmp_path / "q.jsonl"), sink=captured.append, workstream_id="ws")
    cands = [_candidate("success:clean_pass:outcome"), _candidate("problem:abandoned:outcome")]
    review(cands, lambda c: (APPROVE if "success" in c["key"] else REJECT, "r"), cat, log,
           recorder=rec)
    assert len(captured) == 2
    actions = sorted("Promote" in d["title"] for d in captured)
    assert actions == [False, True]

@@ -0,0 +1,84 @@
"""Curate entrypoint tests (T06): batch auto-approve end-to-end via the store."""
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.core.store import Store  # noqa: E402
from session_memory.curate.__main__ import main  # noqa: E402
from session_memory.curate.catalog import Catalog  # noqa: E402


def _digest(uid, flavor, repo, **markers):
    return {
        "session_uid": uid, "flavor": flavor, "repo": repo, "outcome": "fail",
        "cost": {"input_tokens": 10, "output_tokens": 1},
        "markers": {"errors": markers.get("errors", 0), "retries": markers.get("retries", 0),
                    "test_runs": 0, "edits": 0, "human_interventions": 0},
        # real coding session per the quality filter (WP-0005 T01)
        "event_count": 40, "first_prompt": "Fix the failing build and retry the suite",
        "tool_histogram": {"Bash": 20, "Edit": 12, "Read": 8},
    }


def _write_config(tmp_path) -> tuple:
    # returns (config path, store dir, catalog dir), all as strings
    store = tmp_path / ".store"
    catalog = tmp_path / "catalog"
    cfg = f"""
[store]
db_path = "{store / 'm.db'}"
blob_dir = "{store / 'blobs'}"
cursor = "{store / 'c.json'}"
[curate]
catalog_dir = "{catalog}"
review_log = "{store / 'reviews.jsonl'}"
decision_queue = "{store / 'decisions.queue.jsonl'}"
[curate.gate]
min_frequency = 2
min_sessions = 2
"""
    path = tmp_path / "config.toml"
    path.write_text(cfg)
    return str(path), str(store), str(catalog)


def test_auto_approve_promotes_cross_flavor(tmp_path, capsys):
    cfg_path, store_dir, catalog_dir = _write_config(tmp_path)
    st = Store(os.path.join(store_dir, "m.db"), os.path.join(store_dir, "blobs"))
    st.write_digest("claude:a", _digest("claude:a", "claude", "r1", retries=5))
    st.write_digest("codex:b", _digest("codex:b", "codex", "r2", retries=4))
    st.close()
    rc = main(["--config", cfg_path, "--auto-approve"])
    assert rc == 0
    cat = Catalog(catalog_dir)
    patterns = cat.list()
    assert len(patterns) == 1
    assert patterns[0].polarity == "problem"
    # clears the promote floor (freq>=2) but below the default distribution
    # floor (freq>=3) -> promoted as provisional, not distribution-ready
    assert patterns[0].status == "provisional"
    assert patterns[0].distribution_ready is False
    out = capsys.readouterr().out
    assert "Curate summary" in out
    # hub offline in tests -> decision queued
    assert "decisions queued" in out


def test_rerun_is_idempotent(tmp_path):
    cfg_path, store_dir, catalog_dir = _write_config(tmp_path)
    st = Store(os.path.join(store_dir, "m.db"), os.path.join(store_dir, "blobs"))
    st.write_digest("claude:a", _digest("claude:a", "claude", "r1", retries=5))
    st.write_digest("codex:b", _digest("codex:b", "codex", "r2", retries=4))
    st.close()
    main(["--config", cfg_path, "--auto-approve"])
    main(["--config", cfg_path, "--auto-approve"])  # second pass: already decided
    cat = Catalog(catalog_dir)
    assert len(cat.list()) == 1
    assert cat.load(cat.list()[0].id).version == "1.0.0"  # no spurious bump

@@ -0,0 +1,76 @@
"""Evidence-bar + bloat-guard tests (T04)."""
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.curate.catalog import Catalog  # noqa: E402
from session_memory.curate.gating import (  # noqa: E402
    GateConfig,
    bloat_warnings,
    evaluate,
    gate_config,
)
from session_memory.curate.review import candidate_to_pattern  # noqa: E402


def _candidate(key="success:clean_pass:outcome", freq=5, sessions=5, impact=10.0,
               cross=True, flavors=("claude", "grok")):
    return {
        "key": key,
        "frequency": freq,
        "sessions": [f"s{i}" for i in range(sessions)],
        "cost_impact": impact,
        "cross_flavor": cross,
        "flavors": list(flavors),
    }


def test_clears_bar_and_distribution_ready():
    r = evaluate(_candidate(), GateConfig(dist_min_frequency=3))
    assert r.promotable and r.distribution_ready
    assert r.status == "approved"


def test_thin_candidate_promotable_but_provisional():
    # meets promote floor (freq>=2) but below distribution floor (freq<3)
    r = evaluate(_candidate(freq=2, sessions=2), GateConfig(dist_min_frequency=3))
    assert r.promotable
    assert not r.distribution_ready
    assert r.status == "provisional"


def test_below_promote_floor_not_promotable():
    r = evaluate(_candidate(freq=1, sessions=1))
    assert not r.promotable
    assert any("frequency" in reason for reason in r.reasons)


def test_cross_flavor_required_for_distribution():
    r = evaluate(_candidate(cross=False), GateConfig(dist_require_cross_flavor=True))
    assert r.promotable
    assert not r.distribution_ready
    assert any("cross-flavor" in reason for reason in r.reasons)


def test_gate_config_reads_toml_dict():
    cfg = gate_config({"curate": {"gate": {"min_frequency": 9, "dist_require_cross_flavor": True}}})
    assert cfg.min_frequency == 9
    assert cfg.dist_require_cross_flavor is True
    # defaults preserved for unspecified keys
    assert cfg.dist_min_frequency == 3


def test_bloat_flags_duplicate_and_near_duplicate(tmp_path):
    cat = Catalog(str(tmp_path))
    cat.upsert(candidate_to_pattern(_candidate(key="success:clean_pass:outcome")))
    existing = cat.list()
    # exact same key -> duplicate
    dup = bloat_warnings(_candidate(key="success:clean_pass:outcome"), existing)
    assert any("duplicate" in w for w in dup)
    # different polarity, same signal_type+locus -> near-duplicate
    near = bloat_warnings(_candidate(key="problem:clean_pass:outcome"), existing)
    assert any("near-duplicate" in w for w in near)
    # unrelated -> no warnings
    assert bloat_warnings(_candidate(key="problem:retry_storm:retries"), existing) == []
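
The two-tier evidence bar these tests describe (a promote floor, then a stricter distribution floor) can be sketched as follows. This is a guess at the shape of the logic, not the code in `session_memory.curate.gating`: the defaults for `min_frequency`, `min_sessions` and `dist_require_cross_flavor`, the `GateResult` container, the reason strings, and the `None` status for non-promotable candidates are all assumptions. Only `dist_min_frequency == 3` is confirmed as a default by the tests.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GateConfig:
    min_frequency: int = 2                   # promote floor (assumed default)
    min_sessions: int = 2                    # promote floor (assumed default)
    dist_min_frequency: int = 3              # distribution floor (default per the tests)
    dist_require_cross_flavor: bool = False  # assumed default


@dataclass
class GateResult:
    promotable: bool
    distribution_ready: bool
    status: Optional[str]                    # None when not promotable (assumption)
    reasons: List[str] = field(default_factory=list)


def evaluate(candidate: dict, cfg: Optional[GateConfig] = None) -> GateResult:
    cfg = cfg or GateConfig()
    freq = candidate["frequency"]
    n_sessions = len(candidate["sessions"])

    # Tier 1: promote floor. Failing it stops evaluation.
    reasons = []
    if freq < cfg.min_frequency:
        reasons.append(f"frequency {freq} below promote floor {cfg.min_frequency}")
    if n_sessions < cfg.min_sessions:
        reasons.append(f"sessions {n_sessions} below promote floor {cfg.min_sessions}")
    if reasons:
        return GateResult(False, False, None, reasons)

    # Tier 2: distribution floor. Failing it yields a provisional pattern.
    if freq < cfg.dist_min_frequency:
        reasons.append(f"frequency {freq} below distribution floor {cfg.dist_min_frequency}")
    if cfg.dist_require_cross_flavor and not candidate["cross_flavor"]:
        reasons.append("cross-flavor evidence required for distribution")
    ready = not reasons
    return GateResult(True, ready, "approved" if ready else "provisional", reasons)
```

Keeping the reasons list on the result lets a reviewer see why a promotable candidate was held back from distribution.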

@@ -0,0 +1,93 @@
"""Review workflow tests (T03): promote/reject/discuss + idempotent re-review."""
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.curate.catalog import Catalog  # noqa: E402
from session_memory.curate.review import (  # noqa: E402
    APPROVE,
    DISCUSS,
    REJECT,
    ReviewLog,
    candidate_to_pattern,
    review,
)
from session_memory.curate.schema import SolutionPattern  # noqa: E402


def _candidate(key="success:clean_pass:outcome", freq=18, flavors=("claude", "grok")):
    return {
        "key": key,
        "polarity": key.split(":")[0],
        "signal_type": key.split(":")[1],
        "locus": key.split(":")[2],
        "title": "cross-flavor success: clean pass",
        "frequency": freq,
        "flavors": list(flavors),
        "repos": ["agentic-resources"],
        "sessions": [f"s{i}" for i in range(freq)],
        "cross_flavor": len(flavors) > 1,
        "cost_impact": 12.5,
    }


def _decider(action, rationale="because"):
    return lambda cand: (action, rationale)


def test_approve_promotes_to_catalog(tmp_path):
    cat = Catalog(str(tmp_path / "catalog"))
    log = ReviewLog(str(tmp_path / "reviews.jsonl"))
    res = review([_candidate()], _decider(APPROVE), cat, log)
    assert len(res.approved) == 1
    p = cat.load(SolutionPattern.make_id("success:clean_pass:outcome"))
    assert p is not None
    assert p.scope.flavors == ["claude", "grok"]
    assert set(p.rendering_hints) == {"claude", "grok"}
    assert p.provenance.evidence["frequency"] == 18


def test_reject_records_no_catalog_write(tmp_path):
    cat = Catalog(str(tmp_path / "catalog"))
    log = ReviewLog(str(tmp_path / "reviews.jsonl"))
    res = review([_candidate()], _decider(REJECT), cat, log)
    assert res.rejected == ["success:clean_pass:outcome"]
    assert cat.list() == []


def test_discuss_defers_and_is_not_final(tmp_path):
    cat = Catalog(str(tmp_path / "catalog"))
    log = ReviewLog(str(tmp_path / "reviews.jsonl"))
    res = review([_candidate()], _decider(DISCUSS), cat, log)
    assert res.deferred == ["success:clean_pass:outcome"]
    # not recorded as final -> a later pass re-surfaces it
    res2 = review([_candidate()], _decider(APPROVE), cat, log)
    assert len(res2.approved) == 1


def test_prior_reject_remembered_same_evidence(tmp_path):
    cat = Catalog(str(tmp_path / "catalog"))
    log_path = str(tmp_path / "reviews.jsonl")
    review([_candidate()], _decider(REJECT), cat, ReviewLog(log_path))
    # fresh log instance (reloads from disk) + same evidence -> skipped
    res = review([_candidate()], _decider(APPROVE), cat, ReviewLog(log_path))
    assert res.skipped == ["success:clean_pass:outcome"]
    assert cat.list() == []


def test_changed_evidence_resurfaces(tmp_path):
    cat = Catalog(str(tmp_path / "catalog"))
    log_path = str(tmp_path / "reviews.jsonl")
    review([_candidate(freq=18)], _decider(REJECT), cat, ReviewLog(log_path))
    # more evidence now -> not skipped, gets re-reviewed
    res = review([_candidate(freq=40)], _decider(APPROVE), cat, ReviewLog(log_path))
    assert len(res.approved) == 1


def test_candidate_to_pattern_defaults():
    p = candidate_to_pattern(_candidate(flavors=("claude",)))
    assert p.status == "provisional"
    assert p.rendering_hints["claude"]["target"] == "CLAUDE.md"
    assert p.polarity == "success"

@@ -0,0 +1,80 @@
"""Round-trip + validation tests for the Solution Pattern schema (T01)."""
import os
import sys

import pytest

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.curate.schema import (  # noqa: E402
    Provenance,
    Resolution,
    Scope,
    SolutionPattern,
)


def _sample() -> SolutionPattern:
    src = "success:clean_pass:outcome"
    return SolutionPattern(
        id=SolutionPattern.make_id(src),
        name="Run tests before declaring success",
        version="1.0.0",
        polarity="success",
        problem="Sessions that run tests and finish with no retries resolve cheaply.",
        resolutions=[Resolution(summary="Always run the suite", steps=["edit", "test", "commit"])],
        scope=Scope(flavors=["claude", "grok"]),
        provenance=Provenance(source_key=src, evidence={"frequency": 18, "cross_flavor": True}),
        rendering_hints={"claude": {"target": "CLAUDE.md"}, "codex": {"target": "AGENTS.md"}},
        status="approved",
        distribution_ready=True,
    )


def test_round_trip_is_lossless():
    p = _sample()
    again = SolutionPattern.from_json(p.to_json())
    assert again.to_dict() == p.to_dict()
    assert again.resolutions[0].steps == ["edit", "test", "commit"]
    assert again.scope.flavors == ["claude", "grok"]
    assert again.provenance.evidence["cross_flavor"] is True


def test_serialization_is_deterministic():
    p = _sample()
    assert p.to_json() == p.to_json()
    assert SolutionPattern.from_json(p.to_json()).to_json() == p.to_json()


def test_make_id_is_stable_and_slugged():
    assert SolutionPattern.make_id("success:clean_pass:outcome") == "sp-success-clean_pass-outcome"
    # same source key -> same id regardless of later wording
    assert SolutionPattern.make_id("problem:abandoned:outcome") == SolutionPattern.make_id(
        "problem:abandoned:outcome"
    )


def test_bump_version():
    assert SolutionPattern.bump_version("1.0.0") == "1.0.1"
    assert SolutionPattern.bump_version("1.2.3", "minor") == "1.3.0"
    assert SolutionPattern.bump_version("1.2.3", "major") == "2.0.0"


def test_rejects_unknown_polarity():
    with pytest.raises(ValueError):
        SolutionPattern(id="x", name="n", version="1.0.0", polarity="meh", problem="p")


def test_rejects_unknown_status():
    with pytest.raises(ValueError):
        SolutionPattern(id="x", name="n", version="1.0.0", polarity="problem",
                        problem="p", status="bogus")


def test_rejects_unknown_flavor_in_hints_and_scope():
    with pytest.raises(ValueError):
        SolutionPattern(id="x", name="n", version="1.0.0", polarity="problem",
                        problem="p", rendering_hints={"gpt": {}})
    with pytest.raises(ValueError):
        Scope(flavors=["gpt"])
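
The id and version helpers pinned down by `test_make_id_is_stable_and_slugged` and `test_bump_version` could look roughly like this. The real methods live on `SolutionPattern` and are not shown in this diff; the free functions below, the slug regex, and the default `part="patch"` are assumptions chosen only to satisfy the assertions above.

```python
import re


def make_id(source_key: str) -> str:
    """Slug a 'polarity:signal_type:locus' source key into a stable pattern id.

    Assumption: anything other than lowercase letters, digits and underscores
    collapses to a single hyphen.
    """
    slug = re.sub(r"[^a-z0-9_]+", "-", source_key.lower()).strip("-")
    return f"sp-{slug}"


def bump_version(version: str, part: str = "patch") -> str:
    """Bump one component of a MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")
```

Deriving the id from the source key alone, and never from the wording, is what lets a later rewording of the same pattern land on the same catalog entry.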

@@ -0,0 +1,47 @@
"""Detect entrypoint tests (T07): end-to-end digests -> patterns, persisted."""
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.core.store import Store  # noqa: E402
from session_memory.detect.__main__ import run_detect  # noqa: E402


def _digest(uid, flavor, repo, **markers):
    return {
        "session_uid": uid, "flavor": flavor, "repo": repo, "outcome": "fail",
        "cost": {"input_tokens": 10, "output_tokens": 1},
        "markers": {"errors": markers.get("errors", 0), "retries": markers.get("retries", 0),
                    "test_runs": 0, "edits": 0, "human_interventions": 0},
        # fields the quality filter (WP-0005 T01) checks: real coding session
        "event_count": 40, "first_prompt": "Fix the failing build and retry the suite",
        "tool_histogram": {"Bash": 20, "Edit": 12, "Read": 8},
    }


def _config(tmp_path):
    return {"store": {"db_path": str(tmp_path / ".store/m.db"),
                      "blob_dir": str(tmp_path / ".store/blobs"),
                      "cursor": str(tmp_path / ".store/c.json")}}


def test_run_detect_persists_cross_flavor_pattern(tmp_path):
    cfg = _config(tmp_path)
    st = Store(cfg["store"]["db_path"], cfg["store"]["blob_dir"])
    # same problem (retry_storm) across two flavors -> cross-flavor candidate
    st.write_digest("claude:a", _digest("claude:a", "claude", "r1", retries=5))
    st.write_digest("codex:b", _digest("codex:b", "codex", "r2", retries=4))
    st.close()
    patterns = run_detect(cfg, min_frequency=2)
    assert len(patterns) == 1
    assert patterns[0]["cross_flavor"] is True
    assert patterns[0]["signal_type"] == "retry_storm"
    # persisted to the Tier 2 patterns table
    st2 = Store(cfg["store"]["db_path"], cfg["store"]["blob_dir"])
    rows = st2.db.execute("SELECT key FROM patterns").fetchall()
    assert len(rows) == 1
    st2.close()

@@ -0,0 +1,80 @@
"""Infra-overhead + thrash signal tests (WP-0005 T02)."""
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.detect.signals import (  # noqa: E402
    build_context,
    extract_signals,
    sig_infra_overhead,
    sig_schema_thrash,
    sig_tool_thrash,
    tool_bucket,
)


def _digest(uid="claude:a", repo="r1", tools=None):
    return {"session_uid": uid, "flavor": "claude", "repo": repo, "outcome": "success",
            "cost": {"input_tokens": 1, "output_tokens": 1},
            "markers": {"errors": 0, "retries": 0, "test_runs": 0},
            "tool_histogram": tools or {}}


CTX = {"infra_min_calls": 20, "infra_overhead_threshold": 0.30,
       "schema_thrash_threshold": 5, "tool_thrash_threshold": 80}


def test_tool_bucket_mapping():
    assert tool_bucket("mcp__state-hub__update_task_status") == "statehub_mcp"
    assert tool_bucket("ToolSearch") == "schema_load"
    assert tool_bucket("TaskUpdate") == "task_mgmt"
    assert tool_bucket("Bash") == "shell"
    assert tool_bucket("Edit") == "edit"


def test_infra_overhead_fires_above_share():
    # 18 statehub of 30 total = 60% overhead
    d = _digest(tools={"mcp__state-hub__create_task": 18, "Bash": 8, "Edit": 4})
    sig = sig_infra_overhead(d, CTX)
    assert sig and sig[0].type == "infra_overhead"
    assert sig[0].magnitude >= 0.30
    assert sig[0].detail["statehub"] == 18


def test_infra_overhead_quiet_when_mostly_work():
    d = _digest(tools={"mcp__state-hub__create_task": 3, "Bash": 40, "Edit": 30})
    assert sig_infra_overhead(d, CTX) == []


def test_infra_overhead_ignores_tiny_sessions():
    d = _digest(tools={"mcp__state-hub__create_task": 5})  # below infra_min_calls
    assert sig_infra_overhead(d, CTX) == []


def test_schema_thrash_fires():
    d = _digest(tools={"ToolSearch": 9, "Bash": 5})
    sig = sig_schema_thrash(d, CTX)
    assert sig and sig[0].type == "schema_thrash"
    assert sig[0].detail["tool_searches"] == 9


def test_tool_thrash_fires_on_dominant_tool():
    d = _digest(tools={"Bash": 120, "Edit": 5})
    sig = sig_tool_thrash(d, CTX)
    assert sig and sig[0].locus == "tool:Bash"


def test_extract_signals_includes_infra():
    d = _digest(tools={"mcp__state-hub__create_task": 18, "Bash": 8, "Edit": 4,
                       "ToolSearch": 6})
    types = {s.type for s in extract_signals([d])}
    assert "infra_overhead" in types
    assert "schema_thrash" in types


def test_build_context_has_infra_defaults():
    ctx = build_context([])
    assert ctx["infra_overhead_threshold"] == 0.30
    assert ctx["schema_thrash_threshold"] == 5
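
As a rough illustration of the bucketing and overhead share these tests rely on, here is a minimal sketch. It is not the code in `session_memory.detect.signals`: the prefix rules, the `"other"` fallback, the set of buckets counted as overhead, and the helper name `infra_overhead_share` are assumptions. Only the five tool-to-bucket mappings and the 20-call / 0.30 thresholds come from the tests.

```python
from typing import Dict, Optional

# Assumption: these buckets count as infrastructure overhead rather than work.
OVERHEAD_BUCKETS = {"statehub_mcp", "schema_load", "task_mgmt"}


def tool_bucket(name: str) -> str:
    """Map a tool name to a coarse bucket (rules inferred from the tests)."""
    if name.startswith("mcp__state-hub__"):
        return "statehub_mcp"
    if name == "ToolSearch":
        return "schema_load"
    if name.startswith("Task"):
        return "task_mgmt"
    return {"Bash": "shell", "Edit": "edit"}.get(name, "other")


def infra_overhead_share(tool_histogram: Dict[str, int],
                         min_calls: int = 20) -> Optional[float]:
    """Return the fraction of calls spent on overhead, or None for tiny sessions."""
    total = sum(tool_histogram.values())
    if total < min_calls:
        return None  # too few calls for the share to be meaningful
    overhead = sum(n for tool, n in tool_histogram.items()
                   if tool_bucket(tool) in OVERHEAD_BUCKETS)
    return overhead / total
```

A caller would compare the returned share against `infra_overhead_threshold` (0.30 in the test context) to decide whether to emit a signal.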

@@ -0,0 +1,61 @@
"""Session-quality filter tests (T01)."""
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.detect.quality import (  # noqa: E402
    QualityConfig,
    filter_real,
    is_real_coding_session,
    quality_config,
)


def _digest(repo="agentic-resources", events=60, prompt="Implement the curate entrypoint",
            tools=None):
    return {
        "session_uid": "claude:x", "flavor": "claude", "repo": repo,
        "event_count": events, "first_prompt": prompt,
        "tool_histogram": tools if tools is not None else {"Bash": 20, "Edit": 15, "Read": 8},
    }


def test_real_session_passes():
    assert is_real_coding_session(_digest()) is True


def test_healthcheck_prompt_dropped():
    assert is_real_coding_session(_digest(events=3, prompt="Say hello in one word.",
                                          tools={})) is False


def test_interrupted_dropped():
    assert is_real_coding_session(_digest(events=1, prompt="[Request interrupted by user]",
                                          tools={})) is False


def test_too_short_dropped():
    assert is_real_coding_session(_digest(events=5)) is False


def test_no_repo_dropped():
    assert is_real_coding_session(_digest(repo=None)) is False


def test_no_substantive_tools_dropped():
    # plenty of events but only plumbing calls -> not real coding
    assert is_real_coding_session(
        _digest(tools={"mcp__state-hub__update_task_status": 40})) is False


def test_filter_real_keeps_only_real():
    digs = [_digest(), _digest(events=3, prompt="hello", tools={}), _digest(repo=None)]
    assert len(filter_real(digs)) == 1


def test_quality_config_from_toml():
    cfg = quality_config({"detect": {"quality": {"min_events": 50}}})
    assert cfg.min_events == 50
    assert cfg.min_substantive == 3  # default preserved

@@ -0,0 +1,59 @@
"""Recurring-error signal + clustering (WP-0006 T02)."""
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.detect.cluster import cluster  # noqa: E402
from session_memory.detect.signals import (  # noqa: E402
    extract_signals,
    sig_recurring_error,
)


def _digest(uid, repo, flavor="claude", snippets=None):
    return {
        "session_uid": uid, "flavor": flavor, "repo": repo, "outcome": "success",
        "cost": {"input_tokens": 1, "output_tokens": 1},
        "markers": {"errors": 0, "retries": 0, "test_runs": 0},
        "tool_histogram": {}, "error_snippets": snippets or [],
    }


_FP = "modulenotfounderror: no module named 'foo' at <path>:<n>"


def test_signal_per_distinct_fingerprint():
    d = _digest("claude:a", "r1", snippets=[
        {"fingerprint": _FP, "sample": "ModuleNotFoundError ...", "count": 3, "tool": "Bash"},
        {"fingerprint": "keyerror: <str>", "sample": "KeyError", "count": 1, "tool": None},
    ])
    sigs = sig_recurring_error(d, {})
    assert len(sigs) == 2
    top = [s for s in sigs if s.locus == _FP][0]
    assert top.type == "recurring_error"
    assert top.magnitude == 3.0
    assert top.detail["sample"].startswith("ModuleNotFound")


def test_clusters_across_sessions_and_flavors():
    # same fingerprint in a claude and a grok session -> cross-flavor candidate
    digs = [
        _digest("claude:a", "r1", "claude",
                [{"fingerprint": _FP, "sample": "ModuleNotFoundError", "count": 2, "tool": "Bash"}]),
        _digest("grok:b", "r2", "grok",
                [{"fingerprint": _FP, "sample": "ModuleNotFoundError", "count": 1, "tool": None}]),
    ]
    signals = extract_signals(digs)
    pats = cluster([s for s in signals if s.type == "recurring_error"], min_frequency=2)
    assert len(pats) == 1
    p = pats[0]
    assert p.signal_type == "recurring_error"
    assert p.cross_flavor is True
    assert sorted(p.flavors) == ["claude", "grok"]
    assert p.frequency == 2


def test_no_snippets_no_signal():
    assert sig_recurring_error(_digest("claude:a", "r1"), {}) == []

tests/test_digest_errors.py

@@ -0,0 +1,101 @@
"""Error-body mining into the digest (WP-0006 T01)."""
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.core.digest import (  # noqa: E402
    _error_fingerprint,
    _error_snippets,
    build_digest,
)
from session_memory.core.schema import SCHEMA_VERSION, Session, SessionEvent  # noqa: E402


def _ev(seq, kind, **kw):
    return SessionEvent(session_uid="claude:s", seq=seq, kind=kind, **kw)


def test_fingerprint_normalizes_paths_numbers_ids():
    a = _error_fingerprint("ModuleNotFoundError: No module named 'foo' at /home/x/a.py:42")
    b = _error_fingerprint("ModuleNotFoundError: No module named 'foo' at /srv/y/b.py:9991")
    assert a == b  # paths + line numbers stripped -> same fingerprint
    assert "<path>" in a and "<n>" in a


def test_fingerprint_uuid_and_addr():
    fp = _error_fingerprint("connection 0xDEADBEEF to 1972d1d9-fc35-4912-8126-1fe64cc51425 failed")
    assert "<addr>" in fp and "<uuid>" in fp


def test_snippets_dedup_and_count():
    blobs = {"b1": "Traceback...\nValueError: bad thing at /p/x.py:10",
             "b2": "Traceback...\nValueError: bad thing at /q/y.py:99",
             "b3": "KeyError: 'id'"}
    events = [
        _ev(0, "error", payload_ref="b1"),
        _ev(1, "error", payload_ref="b2"),  # same fingerprint as b1
        _ev(2, "error", payload_ref="b3"),
    ]
    snips = _error_snippets(events, blobs)
    assert len(snips) == 2
    top = snips[0]
    assert top["count"] == 2  # the ValueError collapsed
    assert "ValueError" in top["sample"]


def test_failed_tool_result_mined():
    blobs = {"b1": "npm ERR! something failed with non-zero exit"}
    events = [_ev(0, "tool_result", tool="Bash", payload_ref="b1")]
    snips = _error_snippets(events, blobs)
    assert len(snips) == 1
    assert snips[0]["tool"] == "Bash"


def test_clean_tool_result_not_mined():
    blobs = {"b1": "6 passed in 0.4s"}
    events = [_ev(0, "tool_result", tool="Bash", payload_ref="b1")]
    assert _error_snippets(events, blobs) == []


def test_success_json_not_mined():
    # a hub MCP success payload mentioning 'error' deep inside is NOT a failure
    blobs = {"b1": '{"result": "{\\"domain\\": \\"custodian\\", \\"note\\": \\"no errors\\"}"}'}
    events = [_ev(0, "tool_result", tool="mcp__state-hub__get_domain_summary", payload_ref="b1")]
    assert _error_snippets(events, blobs) == []


def test_error_json_still_mined():
    blobs = {"b1": '{"detail": "Invalid request parameters"}'}
    events = [_ev(0, "tool_result", tool="Bash", payload_ref="b1")]
    snips = _error_snippets(events, blobs)
    assert len(snips) == 1


def test_plain_mcp_error_still_mined():
    blobs = {"b1": "MCP error -32602: Invalid request parameters"}
    events = [_ev(0, "tool_result", tool="Bash", payload_ref="b1")]
    assert len(_error_snippets(events, blobs)) == 1


def test_file_read_snapshot_not_mined():
    # a Read result of source code containing 'raise ...Error' is not a runtime error
    blobs = {"b1": "227\t def f():\n228\t x = 1\n229\t raise InfospaceError()\n"}
    events = [_ev(0, "tool_result", tool="Read", payload_ref="b1")]
    assert _error_snippets(events, blobs) == []


def test_build_digest_includes_error_snippets_and_v2():
    s = Session(session_uid="claude:s", flavor="claude", native_session_id="s", repo="r")
    events = [_ev(0, "user_msg"), _ev(1, "error", payload_ref="b1"), _ev(2, "assistant_msg")]
    d = build_digest(s, events, {"b1": "RuntimeError: kaboom at /a/b.py:3"})
    assert d["schema_version"] == SCHEMA_VERSION == 2
    assert d["error_snippets"][0]["count"] == 1
    assert "RuntimeError" in d["error_snippets"][0]["sample"]


def test_no_errors_empty_list():
    s = Session(session_uid="claude:s", flavor="claude", native_session_id="s", repo="r")
    d = build_digest(s, [_ev(0, "user_msg"), _ev(1, "assistant_msg")])
    assert d["error_snippets"] == []
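
The normalization that the fingerprint tests check (paths, line numbers, hex addresses and UUIDs replaced by placeholders) can be sketched with a few regular expressions. This is an illustrative stand-in for `_error_fingerprint`, whose real implementation is not part of this diff: the function name `error_fingerprint`, the exact patterns, lowercasing, and the substitution order are assumptions.

```python
import re

# Order matters: UUIDs and hex addresses are replaced before bare numbers,
# otherwise their digits would be rewritten to <n> first.
_UUID = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}", re.I)
_ADDR = re.compile(r"0x[0-9a-f]+", re.I)
_PATH = re.compile(r"(?:/[\w.\-]+)+")   # POSIX-style absolute paths only (assumption)
_NUM = re.compile(r"\d+")


def error_fingerprint(text: str) -> str:
    """Reduce an error message to a form that is stable across sessions."""
    fp = text.strip().lower()
    fp = _UUID.sub("<uuid>", fp)
    fp = _ADDR.sub("<addr>", fp)
    fp = _PATH.sub("<path>", fp)
    fp = _NUM.sub("<n>", fp)
    return fp
```

Two messages that differ only in file location then share one fingerprint, which is what allows the same error to be counted across sessions and flavors.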

@@ -0,0 +1,78 @@
"""digest_lookup entrypoint tests (AGENTIC-WP-0011 T03)."""
import json
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.core.store import Store  # noqa: E402
from session_memory.digest_lookup import lookup_digest, main, resolve_store_paths  # noqa: E402


def _write_config(tmp_path) -> tuple:
    # returns (config path, store dir), both as strings
    store = tmp_path / ".store"
    toml = tmp_path / "config.toml"
    toml.write_text(
        f'[store]\ndb_path = "{store / "m.db"}"\nblob_dir = "{store / "blobs"}"\n'
        f'cursor = "{store / "c.json"}"\n')
    return str(toml), str(store)


def _seed(store_dir, uid="claude:test-uid"):
    st = Store(os.path.join(store_dir, "m.db"), os.path.join(store_dir, "blobs"))
    st.write_digest(uid, {
        "session_uid": uid,
        "flavor": "claude",
        "repo": "agentic-resources",
        "outcome": "success",
        "started_at": "2026-06-19T10:00:00Z",
        "ended_at": "2026-06-19T11:00:00Z",
        "cost": {"input_tokens": 100, "output_tokens": 25},
        "tool_histogram": {"Bash": 10, "Edit": 5},
    })
    st.close()
    return uid


def test_resolve_store_paths_from_config(tmp_path):
    cfg_path, store_dir = _write_config(tmp_path)
    db, blob = resolve_store_paths(config_path=cfg_path)
    assert db.endswith("m.db")
    assert blob.endswith("blobs")
    assert store_dir in db


def test_resolve_store_paths_from_env(tmp_path, monkeypatch):
    db = tmp_path / "custom" / "mem.db"
    db.parent.mkdir(parents=True)
    monkeypatch.setenv("HELIX_STORE_DB", str(db))
    resolved_db, blob = resolve_store_paths()
    assert resolved_db == str(db)
    assert blob == str(tmp_path / "custom" / "blobs")


def test_lookup_digest_found_and_missing(tmp_path):
    cfg_path, store_dir = _write_config(tmp_path)
    uid = _seed(store_dir)
    found = lookup_digest(uid, config_path=cfg_path)
    assert found is not None and found["outcome"] == "success"
    assert lookup_digest("claude:missing", config_path=cfg_path) is None


def test_main_json_success(tmp_path, capsys):
    cfg_path, store_dir = _write_config(tmp_path)
    uid = _seed(store_dir)
    rc = main(["--config", cfg_path, uid, "--json"])
    assert rc == 0
    data = json.loads(capsys.readouterr().out)
    assert data["session_uid"] == uid
    assert data["repo"] == "agentic-resources"


def test_main_not_found(tmp_path, capsys):
    cfg_path, store_dir = _write_config(tmp_path)
    _seed(store_dir)
    rc = main(["--config", cfg_path, "claude:missing"])
    assert rc == 1
    assert "not found" in capsys.readouterr().err.lower()

@@ -0,0 +1,88 @@
"""Distributor base tests (WP-0007 T01): markers, idempotent upsert, rendering."""
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.curate.schema import Resolution, SolutionPattern  # noqa: E402
from session_memory.distribute.base import (  # noqa: E402
    Artifact,
    BaseDistributor,
    Distributor,
    render_markdown_body,
    upsert_block,
    wrap_block,
)


def _pattern(pid="sp-x", polarity="problem"):
    return SolutionPattern(
        id=pid, name="Read before edit", version="1.2.0", polarity=polarity,
        problem="Agents edit files they have not read.",
        resolutions=[Resolution(summary="Read the file first", detail="then Edit",
                                steps=["Read", "Edit"])],
        rendering_hints={"claude": {"target": "CLAUDE.md"}},
    )


def test_render_markdown_body_has_problem_and_resolution():
    body = render_markdown_body(_pattern())
    assert "### Read before edit" in body
    assert "Agents edit files" in body
    assert "**Avoid:**" in body  # problem polarity
    assert "- Read the file first — then Edit" in body
    assert " - Read" in body


def test_success_polarity_label():
    assert "**Prefer:**" in render_markdown_body(_pattern(polarity="success"))


def test_wrap_block_has_markers_and_version():
    block = wrap_block("sp-x", "hello", "1.2.0")
    assert block.startswith("<!-- BEGIN helix-forge pattern:sp-x --> v1.2.0")
    assert block.rstrip().endswith("<!-- END helix-forge pattern:sp-x -->")


def test_upsert_inserts_then_replaces_in_place():
    doc = "# Title\n\nsome text\n"
    b1 = wrap_block("sp-x", "first", "1")
    once = upsert_block(doc, "sp-x", b1)
    assert "first" in once and once.count("BEGIN helix-forge pattern:sp-x") == 1
    # re-distributing the same id replaces, does not duplicate
    b2 = wrap_block("sp-x", "second", "2")
    twice = upsert_block(once, "sp-x", b2)
    assert "second" in twice and "first" not in twice
    assert twice.count("BEGIN helix-forge pattern:sp-x") == 1


def test_upsert_keeps_other_patterns():
    doc = upsert_block("", "sp-a", wrap_block("sp-a", "A"))
    doc = upsert_block(doc, "sp-b", wrap_block("sp-b", "B"))
    assert "sp-a" in doc and "sp-b" in doc


def test_base_distributor_renders_artifact():
    d = BaseDistributor(flavor="claude", target_path="CLAUDE.md")
    art = d.render(_pattern())
    assert isinstance(art, Artifact)
    assert isinstance(d, Distributor)  # satisfies the protocol
    assert art.flavor == "claude"
    assert art.target_path == "CLAUDE.md"
    assert "BEGIN helix-forge pattern:sp-x" in art.content
    assert "Read before edit" in art.content


def test_body_hint_overrides_default():
    p = _pattern()
    p.rendering_hints["claude"]["body"] = "custom claude body"
    d = BaseDistributor(flavor="claude", target_path="CLAUDE.md")
    assert "custom claude body" in d.render(p).content


def test_target_hint_overrides_default():
    p = _pattern()
    p.rendering_hints["claude"]["target"] = "docs/CLAUDE.md"
    d = BaseDistributor(flavor="claude", target_path="CLAUDE.md")
    assert d.render(p).target_path == "docs/CLAUDE.md"
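
The marker and upsert tests above pin down a contract: a block is delimited by BEGIN/END HTML comments keyed on the pattern id, and re-distributing the same id replaces the block instead of duplicating it. The sketch below is one way such a contract could be met. It is not the code in `session_memory.distribute.base`, which this diff does not show; the names `sketch_wrap_block` and `sketch_upsert_block`, the regex approach, and the trailing-newline handling are all assumptions.

```python
import re

# Marker shapes copied from the assertions in the tests above.
BEGIN = "<!-- BEGIN helix-forge pattern:{pid} -->"
END = "<!-- END helix-forge pattern:{pid} -->"


def sketch_wrap_block(pattern_id: str, body: str, version: str = "") -> str:
    """Wrap body in BEGIN/END markers; the version rides on the BEGIN line."""
    head = BEGIN.format(pid=pattern_id)
    if version:
        head += f" v{version}"
    return f"{head}\n{body}\n{END.format(pid=pattern_id)}\n"


def sketch_upsert_block(doc: str, pattern_id: str, block: str) -> str:
    """Replace the existing block for pattern_id, or append it if absent."""
    span = re.compile(
        re.escape(BEGIN.format(pid=pattern_id))
        + r".*?"
        + re.escape(END.format(pid=pattern_id))
        + r"\n?",
        re.DOTALL,
    )
    if span.search(doc):
        # A function replacement keeps backslashes in `block` literal.
        return span.sub(lambda _m: block, doc, count=1)
    if not doc:
        return block
    # Hypothetical layout choice: one blank line before an appended block.
    sep = "" if doc.endswith("\n\n") else ("\n" if doc.endswith("\n") else "\n\n")
    return doc + sep + block
```

Because the BEGIN marker includes the closing ` -->`, an id such as `sp-x` cannot accidentally match a block for `sp-xy`.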

@@ -0,0 +1,40 @@
"""Claude distributor tests (WP-0007 T02)."""
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.curate.schema import Resolution, SolutionPattern  # noqa: E402
from session_memory.distribute.claude import ClaudeDistributor  # noqa: E402


def _pattern(hints=None):
    return SolutionPattern(
        id="sp-read-before-edit", name="Read before edit", version="1.0.0",
        polarity="problem", problem="Agents edit files they have not read.",
        resolutions=[Resolution(summary="Read the file first", steps=["Read", "Edit"])],
        rendering_hints=hints or {"claude": {}},
    )


def test_default_targets_claude_md():
    art = ClaudeDistributor().render(_pattern())
    assert art.flavor == "claude"
    assert art.target_path == "CLAUDE.md"
    assert "BEGIN helix-forge pattern:sp-read-before-edit" in art.content
    assert "### Read before edit" in art.content


def test_skill_mode_emits_skill_stub():
    art = ClaudeDistributor().render(_pattern({"claude": {"as": "skill"}}))
    assert "## Skill: Read before edit" in art.content
    assert "**When:**" in art.content
    assert " - Read" in art.content


def test_idempotent_marker_present_for_reupsert():
    art = ClaudeDistributor().render(_pattern())
    # same id in both renders -> caller can upsert in place
    art2 = ClaudeDistributor().render(_pattern())
    assert art.pattern_id == art2.pattern_id == "sp-read-before-edit"

@@ -0,0 +1,49 @@
"""Codex + Grok distributor + registry tests (WP-0007 T03)."""
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.curate.schema import Resolution, SolutionPattern  # noqa: E402
from session_memory.distribute.codex import CodexDistributor  # noqa: E402
from session_memory.distribute.grok import GrokDistributor  # noqa: E402
from session_memory.distribute.registry import all_flavors, get_distributor  # noqa: E402


def _pattern():
    return SolutionPattern(
        id="sp-x", name="Read before edit", version="1.0.0", polarity="problem",
        problem="Agents edit files they have not read.",
        resolutions=[Resolution(summary="Read the file first")],
    )


def test_codex_targets_agents_md():
    art = CodexDistributor().render(_pattern())
    assert art.flavor == "codex" and art.target_path == "AGENTS.md"
    assert "Read before edit" in art.content


def test_grok_targets_native_instructions():
    art = GrokDistributor().render(_pattern())
    assert art.flavor == "grok" and art.target_path == ".grok/instructions.md"


def test_same_pattern_expressible_for_all_flavors():
    # FR-A3: one pattern, rendered for every flavor (same body, different targets)
    p = _pattern()
    bodies = {}
    for f in all_flavors():
        art = get_distributor(f).render(p)
        # strip markers -> compare agnostic body
        inner = art.content.split("\n", 1)[1].rsplit("\n", 1)[0]
        bodies[f] = inner
    targets = {get_distributor(f).render(p).target_path for f in all_flavors()}
    assert len(targets) == 3  # distinct per-flavor targets
    assert len(set(bodies.values())) == 1  # identical agnostic body


def test_registry_unknown_flavor():
    assert get_distributor("gpt") is None
    assert set(all_flavors()) == {"claude", "codex", "grok"}

@@ -0,0 +1,76 @@
"""Distribute entrypoint tests (WP-0007 T05)."""
import json
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.curate.catalog import Catalog  # noqa: E402
from session_memory.curate.schema import Resolution, Scope, SolutionPattern  # noqa: E402
from session_memory.distribute.__main__ import build_targets, main, run_distribute  # noqa: E402


def _pattern(pid, repos, flavors, status="approved", ready=True):
    return SolutionPattern(
        id=pid, name=pid, version="1.0.0", polarity="problem", problem="p",
        resolutions=[Resolution(summary="do x")],
        scope=Scope(repos=repos, flavors=flavors), status=status, distribution_ready=ready,
    )


def _config(tmp_path):
    return {
        "repo_domain_map": {"agentic-resources": "helix_forge", "state-hub": "custodian"},
        "curate": {"catalog_dir": str(tmp_path / "catalog")},
        "distribute": {"proposals_dir": str(tmp_path / "proposals"),
                       "active_registry": str(tmp_path / "active.json")},
    }


def test_build_targets_crosses_repos_and_flavors():
    cfg = {"repo_domain_map": {"r1": "d1", "r2": "d2"}}
    targets = build_targets(cfg)
    assert len(targets) == 2 * 3  # 2 repos x 3 flavors
    # the repo filter must return a non-empty list containing only that repo
    r1_targets = build_targets(cfg, repo_filter="r1")
    assert r1_targets and all(t.repo == "r1" for t in r1_targets)
    assert all(t.flavor == "claude" for t in build_targets(cfg, flavor_filter="claude"))


def test_run_distribute_scopes_to_catalog(tmp_path):
    cfg = _config(tmp_path)
    cat = Catalog(cfg["curate"]["catalog_dir"])
    # in-scope for agentic-resources/claude only
    cat.upsert(_pattern("sp-a", ["agentic-resources"], ["claude"]))
    # provisional -> must be skipped
    cat.upsert(_pattern("sp-prov", [], [], status="provisional", ready=False))
    res = run_distribute(cfg)
    rendered = {pid for _, _, pid, _ in res.proposals}
    assert "sp-a" in rendered
    assert "sp-prov" not in rendered
    assert "sp-prov" in res.skipped_not_distributable
    # landed only in the agentic-resources/CLAUDE.md proposal
    p = os.path.join(cfg["distribute"]["proposals_dir"], "agentic-resources", "CLAUDE.md")
    assert os.path.exists(p)
    assert not os.path.exists(
        os.path.join(cfg["distribute"]["proposals_dir"], "state-hub", "CLAUDE.md"))


def test_main_runs_json(tmp_path, capsys):
    cfg = _config(tmp_path)
    cat = Catalog(cfg["curate"]["catalog_dir"])
    cat.upsert(_pattern("sp-a", [], ["claude"]))  # unrestricted repos
    # main() loads TOML, so mirror the relevant cfg keys in a small config.toml
    toml = tmp_path / "config.toml"
    toml.write_text(
        '[repo_domain_map]\nagentic-resources = "helix_forge"\n'
        f'[curate]\ncatalog_dir = "{cfg["curate"]["catalog_dir"]}"\n'
        f'[distribute]\nproposals_dir = "{cfg["distribute"]["proposals_dir"]}"\n'
        f'active_registry = "{cfg["distribute"]["active_registry"]}"\n')
    rc = main(["--config", str(toml), "--json"])
    assert rc == 0
    out = capsys.readouterr().out
    assert "sp-a" in out
    json.loads(out)  # valid JSON

@@ -0,0 +1,79 @@
"""Scoping + proposals + active registry tests (WP-0007 T04)."""
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.curate.schema import Resolution, Scope, SolutionPattern  # noqa: E402
from session_memory.distribute.proposals import (  # noqa: E402
    ActiveRegistry,
    Target,
    applies,
    propose,
)


def _pattern(pid="sp-x", repos=None, flavors=None, status="approved", ready=True):
    return SolutionPattern(
        id=pid, name="Read before edit", version="1.0.0", polarity="problem",
        problem="edit before read", resolutions=[Resolution(summary="read first")],
        scope=Scope(repos=repos or [], flavors=flavors or []),
        status=status, distribution_ready=ready,
    )


def test_applies_respects_scope():
    p = _pattern(repos=["agentic-resources"], flavors=["claude"])
    assert applies(p, Target("agentic-resources", flavor="claude"))
    assert not applies(p, Target("other-repo", flavor="claude"))
    assert not applies(p, Target("agentic-resources", flavor="codex"))


def test_empty_scope_is_unrestricted():
    assert applies(_pattern(), Target("any", flavor="grok"))


def test_propose_writes_scoped_proposal_files(tmp_path):
    out = str(tmp_path / "proposals")
    reg = ActiveRegistry(str(tmp_path / "active.json"))
    p = _pattern(flavors=["claude"])
    res = propose([p], [Target("agentic-resources", flavor="claude"),
                        Target("agentic-resources", flavor="codex")], out, reg)
    # only claude target is in scope
    assert len(res.proposals) == 1
    path = os.path.join(out, "agentic-resources", "CLAUDE.md")
    assert os.path.exists(path)
    assert "BEGIN helix-forge pattern:sp-x" in open(path).read()


def test_not_distributable_skipped(tmp_path):
    reg = ActiveRegistry(str(tmp_path / "active.json"))
    prov = _pattern(status="provisional", ready=False)
    res = propose([prov], [Target("r", flavor="claude")], str(tmp_path / "p"), reg)
    assert res.proposals == []
    assert "sp-x" in res.skipped_not_distributable


def test_proposals_idempotent_on_rerun(tmp_path):
    out = str(tmp_path / "proposals")
    reg_path = str(tmp_path / "active.json")
    p = _pattern()
    propose([p], [Target("r", flavor="claude")], out, ActiveRegistry(reg_path))
    propose([p], [Target("r", flavor="claude")], out, ActiveRegistry(reg_path))
    content = open(os.path.join(out, "r", "CLAUDE.md")).read()
    assert content.count("BEGIN helix-forge pattern:sp-x") == 1  # no duplication


def test_active_registry_records_environment(tmp_path):
    reg_path = str(tmp_path / "active.json")
    reg = ActiveRegistry(reg_path)
    propose([_pattern()], [Target("r", domain="helix_forge", flavor="claude")],
            str(tmp_path / "p"), reg)
    reg2 = ActiveRegistry(reg_path)  # reload from disk
    entries = reg2.entries()
    assert len(entries) == 1
    assert entries[0]["pattern_id"] == "sp-x"
    assert entries[0]["repo"] == "r"
    assert entries[0]["flavor"] == "claude"
    assert entries[0]["status"] == "proposed"
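
The scope tests above imply a simple rule: a pattern applies to a target when the target's repo and flavor are each either listed in the pattern's scope or the corresponding scope list is empty. A minimal sketch of that rule follows. `SketchScope`, `SketchTarget`, and `sketch_applies` are hypothetical stand-ins; the real `applies` takes a `SolutionPattern`, and its actual logic is not visible in this diff.

```python
from dataclasses import dataclass, field


@dataclass
class SketchScope:
    # Empty list = no restriction on that axis (assumed from the tests above).
    repos: list = field(default_factory=list)
    flavors: list = field(default_factory=list)


@dataclass
class SketchTarget:
    repo: str
    flavor: str


def sketch_applies(scope: SketchScope, target: SketchTarget) -> bool:
    """True when the target passes both the repo and the flavor restriction."""
    repo_ok = not scope.repos or target.repo in scope.repos
    flavor_ok = not scope.flavors or target.flavor in scope.flavors
    return repo_ok and flavor_ok
```

Treating the empty list as "unrestricted" keeps fleet-wide patterns cheap to declare, at the cost that a scope can never express "applies nowhere"; in the tests that role is played by `status` and `distribution_ready` instead.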

@@ -0,0 +1,92 @@
"""Grok adapter tests (T02): synthetic session dir + real local sessions."""
import glob
import json
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.adapters.grok import parse_session  # noqa: E402

REPO_MAP = {"agentic-resources": "helix_forge", "net-kingdom": "netkingdom",
            "can-you-assist": "coulomb_social"}


def _mk_session(dir_path, sid):
    """Write the four files of a synthetic Grok session directory."""
    os.makedirs(dir_path, exist_ok=True)
    with open(os.path.join(dir_path, "summary.json"), "w") as f:
        json.dump({"info": {"id": sid, "cwd": "/home/worsch/agentic-resources"},
                   "created_at": "2026-06-06T10:00:00Z",
                   "last_active_at": "2026-06-06T10:05:00Z",
                   "current_model_id": "grok-build", "head_branch": "main"}, f)
    with open(os.path.join(dir_path, "events.jsonl"), "w") as f:
        f.write(json.dumps({"ts": "2026-06-06T10:00:00Z", "type": "turn_started",
                            "turn_number": 0, "model_id": "grok-build"}) + "\n")
        f.write(json.dumps({"ts": "2026-06-06T10:05:00Z", "type": "turn_ended",
                            "turn_number": 0}) + "\n")
    with open(os.path.join(dir_path, "chat_history.jsonl"), "w") as f:
        for rec in [
            {"type": "system", "content": "sys prompt"},
            {"type": "user", "content": [{"type": "text", "text": "fix the bug"}]},
            {"type": "reasoning", "content": [{"type": "text", "text": "thinking..."}]},
            {"type": "assistant", "content": ""},  # empty -> skipped
            {"type": "tool_result", "content": "The file x.py has been updated"},
            {"type": "assistant", "content": "done"},
            {"type": "tool_result", "content": "6 passed"},
        ]:
            f.write(json.dumps(rec) + "\n")
    with open(os.path.join(dir_path, "updates.jsonl"), "w") as f:
        for u in [
            {"sessionUpdate": "tool_call", "toolCallId": "c1", "title": "edit_file",
             "rawInput": {"target_file": "x.py"}},
            {"sessionUpdate": "tool_call", "toolCallId": "c2", "title": "shell",
             "rawInput": {"command": "pytest -q"}},
        ]:
            f.write(json.dumps({"timestamp": "t", "method": "session/update",
                                "params": {"sessionId": sid, "update": u}}) + "\n")


def test_grok_synthetic_dir(tmp_path):
    d = tmp_path / "%2Fhome%2Fworsch%2Fagentic-resources" / "sid-1"
    _mk_session(str(d), "sid-1")
    norm = parse_session(str(d / "chat_history.jsonl"), REPO_MAP)
    assert norm is not None
    s = norm.session
    assert s.session_uid == "grok:sid-1"
    assert s.flavor == "grok"
    assert s.repo == "agentic-resources" and s.domain == "helix_forge"
    assert s.model == "grok-build"
    assert s.git_branch == "main"
    assert s.cost.turns == 1
    assert s.cost.wall_clock_s == 300.0
    kinds = [e.kind for e in norm.events]
    # events.jsonl contributes exactly two lifecycle events: turn_started + turn_ended
    assert kinds.count("lifecycle") == 2
    assert "user_msg" in kinds and "thinking" in kinds and "assistant_msg" in kinds
    # tool calls from updates.jsonl recover their names -> edit + test_run
    assert "edit" in kinds and "test_run" in kinds
    edit = next(e for e in norm.events if e.kind == "edit")
    assert edit.tool == "edit_file"
    # both tool_result records from chat_history.jsonl are kept
    tr = [e for e in norm.events if e.kind == "tool_result"]
    assert len(tr) == 2


def test_real_local_grok_sessions_if_available():
    base = os.path.expanduser("~/.grok/sessions")
    chats = glob.glob(os.path.join(base, "*", "*", "chat_history.jsonl"))
    if not chats:
        # no local Grok sessions on this machine: nothing to check
        return
    parsed = 0
    for c in chats:
        norm = parse_session(c, REPO_MAP)
        if norm is None:
            continue
        parsed += 1
        assert norm.session.session_uid.startswith("grok:")
        seqs = [e.seq for e in norm.events]
        assert seqs == sorted(seqs) and len(seqs) == len(set(seqs))
    assert parsed >= 1

@@ -0,0 +1,49 @@
"""Before/after effectiveness tests (WP-0009 T02)."""
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.measure.effect import effectiveness, split_by_date  # noqa: E402


def _digest(ts, tools=None, errors=0, outcome="success"):
    return {
        "started_at": ts, "outcome": outcome,
        "cost": {"input_tokens": 100, "output_tokens": 0},
        "tool_histogram": tools or {"Bash": 10},
        "error_snippets": [{"fingerprint": f"e{i}", "count": 1} for i in range(errors)],
    }


def test_split_by_date():
    digs = [_digest("2026-06-01"), _digest("2026-06-05"), _digest("2026-06-10")]
    before, after = split_by_date(digs, "2026-06-05")
    assert len(before) == 1 and len(after) == 2  # >= applied_at goes to after


def test_effectiveness_detects_improvement():
    # before: lots of errors + hub overhead; after: clean
    before = [_digest("2026-06-01", tools={"mcp__state-hub__x": 8, "Bash": 2}, errors=3)
              for _ in range(3)]
    after = [_digest("2026-06-10", tools={"Bash": 10}, errors=0) for _ in range(3)]
    e = effectiveness(before + after, "2026-06-05", label="read-before-edit")
    assert not e["insufficient_data"]
    assert e["n_before"] == 3 and e["n_after"] == 3
    assert e["deltas"]["error_rate"]["improved"] is True
    assert e["deltas"]["infra_overhead_share_median"]["improved"] is True
    assert e["deltas"]["error_rate"]["change"] < 0


def test_effectiveness_insufficient_data():
    e = effectiveness([_digest("2026-06-01")], "2026-06-05")
    assert e["insufficient_data"] is True
    assert e["deltas"] == {}


def test_success_rate_higher_is_better():
    before = [_digest("2026-06-01", outcome="fail") for _ in range(2)]
    after = [_digest("2026-06-10", outcome="success") for _ in range(2)]
    e = effectiveness(before + after, "2026-06-05")
    assert e["deltas"]["success_rate"]["improved"] is True
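
The effectiveness tests above rest on two ideas: sessions are split at `applied_at` with the boundary date counted as "after", and a metric where lower is better (such as error rate) has improved when its change is negative. The sketch below illustrates both under stated assumptions. The `sketch_*` names are hypothetical, timestamps are assumed to be ISO-8601 strings in one consistent format so that plain string comparison orders them, and the real `effectiveness` returns a richer structure than shown here.

```python
def sketch_split_by_date(digests, applied_at):
    """Sessions starting on or after applied_at count as 'after'."""
    before = [d for d in digests if d["started_at"] < applied_at]
    after = [d for d in digests if d["started_at"] >= applied_at]
    return before, after


def sketch_error_rate(digests):
    """Share of sessions that recorded at least one error snippet."""
    if not digests:
        return 0.0
    return sum(1 for d in digests if d.get("error_snippets")) / len(digests)


def sketch_error_delta(digests, applied_at):
    """Before/after change in error rate; lower is better for this metric."""
    before, after = sketch_split_by_date(digests, applied_at)
    if not before or not after:
        # One empty side means there is nothing to compare against.
        return {"insufficient_data": True}
    change = sketch_error_rate(after) - sketch_error_rate(before)
    return {"insufficient_data": False, "change": change, "improved": change < 0}
```

A metric where higher is better, such as success rate, would flip the final comparison to `change > 0`.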

@@ -0,0 +1,79 @@
"""Measure entrypoint tests (WP-0009 T03)."""
import json
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.core.store import Store  # noqa: E402
from session_memory.measure.__main__ import main, real_digests  # noqa: E402
from session_memory.measure.metrics import load_baselines  # noqa: E402


def _digest(uid, ts, tools=None):
    return {
        "session_uid": uid, "flavor": "claude", "repo": "agentic-resources",
        "outcome": "success", "started_at": ts,
        "cost": {"input_tokens": 100, "output_tokens": 10},
        "event_count": 40, "first_prompt": "Implement the measure entrypoint cleanly",
        "tool_histogram": tools or {"Bash": 20, "Edit": 12, "Read": 8},
        "error_snippets": [],
    }


def _write_config(tmp_path) -> tuple[str, str]:
    """Write a minimal config.toml; return (config path, store directory)."""
    store = tmp_path / ".store"
    toml = tmp_path / "config.toml"
    toml.write_text(
        f'[store]\ndb_path = "{store / "m.db"}"\nblob_dir = "{store / "blobs"}"\n'
        f'cursor = "{store / "c.json"}"\n'
        f'[measure]\nbaselines = "{tmp_path / "baselines.jsonl"}"\n')
    return str(toml), str(store)


def _seed(store_dir):
    """Seed two digests, one on each side of 2026-06-05."""
    st = Store(os.path.join(store_dir, "m.db"), os.path.join(store_dir, "blobs"))
    st.write_digest("claude:a", _digest("claude:a", "2026-06-01"))
    st.write_digest("claude:b", _digest("claude:b", "2026-06-10",
                                        tools={"mcp__state-hub__x": 18, "Bash": 8, "Edit": 4}))
    st.close()


def test_real_digests_filters_and_loads(tmp_path):
    cfg_path, store_dir = _write_config(tmp_path)
    _seed(store_dir)
    from session_memory.ingest import load_config
    digs = real_digests(load_config(cfg_path))
    assert len(digs) == 2


def test_main_writes_baseline_and_reports(tmp_path, capsys):
    cfg_path, store_dir = _write_config(tmp_path)
    _seed(store_dir)
    rc = main(["--config", cfg_path, "--label", "first"])
    assert rc == 0
    out = capsys.readouterr().out
    assert "Fleet metrics" in out
    rows = load_baselines(str(tmp_path / "baselines.jsonl"))
    assert len(rows) == 1 and rows[0]["label"] == "first"


def test_main_no_save_and_json(tmp_path, capsys):
    cfg_path, store_dir = _write_config(tmp_path)
    _seed(store_dir)
    rc = main(["--config", cfg_path, "--no-save", "--json"])
    assert rc == 0
    data = json.loads(capsys.readouterr().out)
    assert data["current"]["n_sessions"] == 2
    assert not os.path.exists(str(tmp_path / "baselines.jsonl"))


def test_main_effectiveness_since(tmp_path, capsys):
    cfg_path, store_dir = _write_config(tmp_path)
    _seed(store_dir)
    rc = main(["--config", cfg_path, "--no-save", "--since", "2026-06-05", "--json"])
    assert rc == 0
    data = json.loads(capsys.readouterr().out)
    assert data["effectiveness"]["n_before"] == 1
    assert data["effectiveness"]["n_after"] == 1

@@ -0,0 +1,63 @@
"""Fleet metrics + baseline tests (WP-0009 T01)."""
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.measure.metrics import (  # noqa: E402
    aggregate,
    load_baselines,
    save_baseline,
    session_metrics,
    snapshot,
)


def _digest(tools=None, errors=0, tokens=100, outcome="success"):
    return {
        "outcome": outcome,
        "cost": {"input_tokens": tokens, "output_tokens": 0},
        "tool_histogram": tools or {"Bash": 10, "Edit": 5},
        "error_snippets": [{"fingerprint": f"e{i}", "count": 1} for i in range(errors)],
    }


def test_session_metrics_overhead_and_errors():
    m = session_metrics(_digest(tools={"mcp__state-hub__create_task": 6, "Bash": 4}, errors=2))
    assert abs(m["infra_overhead_share"] - 0.6) < 1e-9
    assert m["error_occurrences"] == 2
    assert m["has_error"] is True


def test_aggregate_rates_and_percentiles():
    digs = [
        _digest(tools={"mcp__state-hub__x": 8, "Bash": 2}, errors=1, tokens=50),  # 80% overhead
        _digest(tools={"Bash": 9, "Edit": 1}, errors=0, tokens=200),  # 0% overhead
        _digest(tools={"ToolSearch": 6, "Bash": 4}, errors=0, tokens=100, outcome="fail"),
    ]
    a = aggregate(digs)
    assert a["n_sessions"] == 3
    assert a["error_rate"] == round(1 / 3, 3)
    assert a["success_rate"] == round(2 / 3, 3)
    assert a["schema_thrash_sessions"] == 1  # the ToolSearch=6 session
    assert 0 <= a["infra_overhead_share_median"] <= 1


def test_aggregate_empty():
    assert aggregate([]) == {"n_sessions": 0}


def test_snapshot_has_timestamp_and_label():
    s = snapshot([_digest()], label="baseline")
    assert s["label"] == "baseline"
    assert "captured_at" in s and s["n_sessions"] == 1


def test_baseline_roundtrip_appends(tmp_path):
    path = str(tmp_path / "baselines.jsonl")
    save_baseline(snapshot([_digest()], label="a"), path)
    save_baseline(snapshot([_digest(), _digest()], label="b"), path)
    rows = load_baselines(path)
    assert [r["label"] for r in rows] == ["a", "b"]
    assert rows[1]["n_sessions"] == 2
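
The first metrics test above expects a session with 6 `mcp__state-hub__*` calls out of 10 total tool calls to have an `infra_overhead_share` of 0.6. One reading of that expectation is sketched below: the share is the fraction of tool calls whose name carries an infrastructure prefix. The prefix tuple and the function name are assumptions; which tools the real `session_metrics` classifies as infrastructure is not shown in this diff.

```python
# Hypothetical classification: only the State Hub MCP prefix counts as infra.
SKETCH_INFRA_PREFIXES = ("mcp__state-hub__",)


def sketch_infra_overhead_share(tool_histogram):
    """Fraction of tool calls whose name starts with an infra prefix."""
    total = sum(tool_histogram.values())
    if total == 0:
        return 0.0  # no tool calls: define the share as zero
    infra = sum(count for name, count in tool_histogram.items()
                if name.startswith(SKETCH_INFRA_PREFIXES))
    return infra / total
```

Weighting by call count rather than by distinct tool name means one tool invoked many times dominates the share, which matches the 80% and 0% annotations in `test_aggregate_rates_and_percentiles`.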

tests/test_merge.py Normal file

@@ -0,0 +1,66 @@
"""Multi-file session merge tests (T03)."""
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.adapters.common import Normalized  # noqa: E402
from session_memory.core.schema import Session, SessionEvent  # noqa: E402
from session_memory.core.store import Store  # noqa: E402


def _part(native, kinds, base_blob="b"):
    """Build one file's worth of a session: events seq 0..n-1, each chained to the previous."""
    uid = Session.make_uid("claude", native)
    s = Session(session_uid=uid, flavor="claude", native_session_id=native)
    events, blobs = [], {}
    for i, k in enumerate(kinds):
        ref = f"blob://{native}/{i}"
        events.append(SessionEvent(session_uid=uid, seq=i, parent_seq=(i - 1 if i else None),
                                   kind=k, ts=f"2026-06-06T10:0{i}:00Z", payload_ref=ref))
        blobs[ref] = f"{base_blob}-{k}-{i}"
    return Normalized(session=s, events=events, blobs=blobs)


def test_second_file_appends_not_overwrites(tmp_path):
    st = Store(str(tmp_path / "m.db"), str(tmp_path / "blobs"))
    uid = Session.make_uid("claude", "s1")
    # file 1: 3 events (seq 0..2)
    n1 = _part("s1", ["user_msg", "assistant_msg", "tool_call"])
    added1 = st.ingest(n1)
    assert added1 == 3
    assert st.count_events(uid) == 3
    # file 2 for the SAME session: repeats event 0 + adds 2 new (continuation)
    n2 = _part("s1", ["user_msg", "edit", "completion"])
    # make the first event identical to file1's first event so it dedups
    n2.events[0].kind = "user_msg"
    n2.events[0].ts = "2026-06-06T10:00:00Z"
    n2.blobs[n2.events[0].payload_ref] = "b-user_msg-0"
    added2 = st.ingest(n2)
    # only the 2 genuinely-new events appended; total grows additively
    assert added2 == 2
    assert st.count_events(uid) == 5
    seqs = [e.seq for e in st.get_events(uid)]
    assert seqs == [0, 1, 2, 3, 4]  # contiguous, offset


def test_reingest_same_bundle_is_idempotent(tmp_path):
    st = Store(str(tmp_path / "m.db"), str(tmp_path / "blobs"))
    uid = Session.make_uid("claude", "s2")
    n = _part("s2", ["user_msg", "assistant_msg"])
    assert st.ingest(n) == 2
    assert st.ingest(n) == 0  # nothing new on re-run
    assert st.count_events(uid) == 2


def test_appended_event_parent_remapped_within_part(tmp_path):
    st = Store(str(tmp_path / "m.db"), str(tmp_path / "blobs"))
    uid = Session.make_uid("claude", "s3")
    st.ingest(_part("s3", ["user_msg", "assistant_msg"]))  # seq 0,1
    st.ingest(_part("s3", ["thinking", "edit"]))  # new 2,3
    events = {e.seq: e for e in st.get_events(uid)}
    # the 'edit' (seq 3) had parent_seq=0 within its part -> remapped to its part's first new seq (2)
    assert events[3].parent_seq == 2
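
The merge tests above describe three behaviours of `Store.ingest`: events already stored are skipped, new events get sequence numbers continuing after the stored ones, and a new event's `parent_seq` is translated from part-local numbering into stored numbering. The sketch below is an in-memory illustration of that idea, not the `Store` implementation, which this diff does not include. The `SketchEvent` type, the `(kind, ts, payload)` dedup key, and mapping a duplicate parent back to its stored seq are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchEvent:
    seq: int
    parent_seq: Optional[int]
    kind: str
    ts: str
    payload: str


def _key(e: SketchEvent):
    # Hypothetical identity of an event for dedup purposes.
    return (e.kind, e.ts, e.payload)


def sketch_merge(stored: list, part: list) -> int:
    """Append events from `part` not already in `stored`; return how many were added."""
    seen = {_key(e): e.seq for e in stored}
    next_seq = max((e.seq for e in stored), default=-1) + 1
    remap = {}   # part-local seq -> seq in the merged session
    fresh = []
    for e in sorted(part, key=lambda ev: ev.seq):
        k = _key(e)
        if k in seen:
            remap[e.seq] = seen[k]      # duplicate: point at the stored copy
            continue
        remap[e.seq] = next_seq         # new: continue the stored numbering
        seen[k] = next_seq
        fresh.append(e)
        next_seq += 1
    for e in fresh:
        parent = None if e.parent_seq is None else remap.get(e.parent_seq)
        stored.append(SketchEvent(remap[e.seq], parent, e.kind, e.ts, e.payload))
    return len(fresh)
```

Calling `sketch_merge` twice with the same part adds nothing the second time, because every key is already in `seen`; that is the idempotence property `test_reingest_same_bundle_is_idempotent` checks on the real store.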

Some files were not shown because too many files have changed in this diff