Compare commits

53 Commits

Author SHA1 Message Date
43bea485aa established rules 2026-06-22 23:06:36 +02:00
63eb431db9 Add .repo-classification.yaml (CUST-WP-0050 T11 agent first-pass) 2026-06-22 17:47:34 +02:00
3250a1746f chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-06-21:
  - update .custodian-brief.md for agentic-resources
2026-06-21 16:09:45 +02:00
41bfb6e0f3 workplan: finish AGENTIC-WP-0011 and sync State Hub IDs
Mark kaizen correlation follow-up finished; add workstream and task IDs
written by fix-consistency so hub and file stay aligned.
2026-06-21 16:09:34 +02:00
d2e50cf96a chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-06-19:
  - update .custodian-brief.md for agentic-resources
2026-06-19 20:37:50 +02:00
01d2affc3b Implement AGENTIC-WP-0011 kaizen correlation follow-up
Add bidirectional doc links (PRD §9.1, README, DESIGN §11), session-close
HELIX_* env convention, stable digest JSON contract, and digest_lookup CLI
for read-only correlate lookups. All tasks done; 163 tests green.
2026-06-19 20:27:00 +02:00
292b656952 workplan: AGENTIC-WP-0011 kaizen correlation follow-up
File ready workplan for bidirectional doc links, session-close env export
convention, and stable digest read path per kaizen-agentic coordination.
2026-06-19 20:24:39 +02:00
0a5ba5c24a docs: add credential routing guidance for agent runtimes
Inline ops-warden CredentialRouting canon into AGENTS.md and mirror it
as a Claude Code rule so agents route secret and access requests correctly.
2026-06-19 20:24:35 +02:00
a66d502b95 docs: add kaizen-agentic project metrics correlation (WP-0005 T16)
Link Helix Forge fleet session memory to kaizen-agentic ADR-004 project
metrics via helix_session_uid. Reciprocal reference to the cross-repo
correlation contract.
2026-06-16 07:13:07 +02:00
f9f91a0ca8 Add capability registry scaffold (REUSE-WP-0014-T03 B01)
Empty helix_forge registry layout for federation publishing.
2026-06-16 01:50:07 +02:00
06bcfdc1d9 session-memory: refresh published retro report artifacts
Latest retro publish (30-day window) regenerated last_retro.{json,md} — 30
ranked suggestions across 13 repos with catalog-sourced recommendations. This is
the read model published to the hub to unblock activity-core ACTIVITY-WP-0008.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 21:48:18 +02:00
e237dcc622 session-memory: map signals to catalog recommendations via covers (WP-0010 follow-up)
Closes the gap where recurring_error suggestions showed generic 'Investigate'
instead of the curated recommendation. Added a covers[] field to SolutionPattern
(lowercase substrings a pattern's recommendation also applies to) + Catalog.find_for
(exact key first, then covers match against signal key+locus). Retro now resolves
recommendations through find_for. Tagged the read-before-edit pattern with
covers=['file has not been read','modified since read','file_not_read'] (v1.0.1).
Live: file-not-read suggestions across all repos now inherit 'Read the file before
Edit/Write'. 6 new tests; suite 158/158.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 21:09:44 +02:00
0d05dfcc5d session-memory: weekly retro entrypoint + hub publish (AGENTIC-WP-0010)
The analysis half of the weekly coding retrospection. retro/build.py: windowed
detect+measure -> top-3 improvement suggestions per repo (cross-flavor first,
recommendations pulled from the Pattern Catalog) + fleet snapshot. retro/publish.py:
publishes the report to the hub as the coding_retro read model (event_type=
coding_retro progress event) + local JSON/md, graceful degrade. retro entrypoint
with --window-days/--publish/--json. Live verify over real sessions surfaced
per-repo suggestions with catalog recommendations. 13 new tests; suite 152/152.

Consumed by activity-core ACTIVITY-WP-0008 (Weekly Coding Retrospection, Sat 19:00).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 19:17:24 +02:00
15ba625351 session-memory: fill real resolutions into auto-approved catalog stubs
Replaced the placeholder 'TODO: capture the recommended resolution' in the five
auto-approved patterns with grounded problem descriptions + concrete resolutions
drawn from the friction assessment: budget_overrun (read narrowly / checkpoint),
infra_overhead (batch hub writes / orient once), schema_thrash (front-load tool
schemas), tool_thrash (batch shell + larger edits), clean_pass (tests gate done).
Each versioned 1.0.0 -> 1.0.1 with the stub archived to <id>.history.jsonl.
Proposals regenerate with real content (0 TODO). Suite 139/139.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 16:26:56 +02:00
4f28cd67cf session-memory: Phase 4 Measure — baseline, effectiveness, trend (WP-0009)
Closes the loop. metrics.py: fleet metrics (infra-overhead share, error rate,
schema-thrash, token percentiles, success) + persisted baseline trend. effect.py:
before/after per-pattern effectiveness with an improved verdict per metric.
measure entrypoint with trend + --since effectiveness + JSON. Recorded pre-fix
baseline: 27 sessions, overhead median 11.7%, error rate 0.96, schema-thrash 8.
13 new tests; suite 139/139. Capture->Detect->Curate->Distribute->Measure complete.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 15:49:22 +02:00
035c7a20d3 session-memory: Read-before-Edit reflex + curated pattern (WP-0008)
Acts on the #1 friction finding. T01: added a data-cited Read-before-Edit /
re-read-on-stale reflex to AGENTS.md (top error: 'File has not been read yet',
12/27 sessions). T02: captured it as a curated SolutionPattern
(sp-problem-file_not_read-edit, approved/distribution_ready) with real
resolutions + per-flavor hints, so Distribute proposes it across repos/flavors —
closing assess->curate->distribute on a real pattern. Suite 126/126.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 15:27:22 +02:00
59632e94db session-memory: distribute entrypoint + live verify (WP-0007 T05)
python -m session_memory.distribute: reads approved catalog patterns, builds
targets from repo->domain map x flavors, renders scoped per-flavor proposals
(HITL) + active registry. Live verify against the real catalog: 12 renders
across 5 repos, idempotent, provisional skipped. proposals/ gitignored
(regenerated); active_patterns.json committed. README documents detect->curate->
distribute. Phase 3 finished; suite 126/126.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 15:25:20 +02:00
00e8958540 session-memory: scoping + proposals + active registry (WP-0007 T04)
distribute/proposals.py: Scope-aware targeting (FR-X2, empty axis = any), render
distributable (approved+distribution_ready) patterns into a proposals/ tree
mirroring target paths — proposed not applied (FR-X3, HITL), idempotent on re-run.
ActiveRegistry (FR-X4) records which pattern+version is proposed in which
(repo,flavor). 6 new tests; suite 123/123.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 15:09:40 +02:00
9e28b1b806 session-memory: Claude + Codex + Grok distributors + registry (WP-0007 T02/T03)
Thin per-flavor distributors over the shared base: Claude (CLAUDE.md, optional
skill-stub mode), Codex (AGENTS.md), Grok (.grok/instructions.md). registry maps
flavor->distributor — adding a flavor is one entry + one module. Same agnostic
body renders to distinct per-flavor targets (FR-A3). 7 new tests; suite 117/117.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 15:06:15 +02:00
7646cbc358 session-memory: distributor base + Artifact (WP-0007 T01)
distribute/base.py: Artifact dataclass + Distributor protocol + idempotent
BEGIN/END snippet markers (upsert_block replaces a pattern's block in place so
re-distribution doesn't duplicate) + agnostic markdown body rendering from
SolutionPattern fields. BaseDistributor honours per-flavor body/target hints.
8 new tests; suite 110/110.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 15:02:47 +02:00
9e6f8a6e08 Register WP-0007 (Distribute), WP-0008 (Read-before-Edit), WP-0009 (Measure)
Three workplans queued and registered with the State Hub (via REST — MCP write
layer is erroring this session):
- AGENTIC-WP-0007 Phase 3 Distribute: per-flavor distributor adapters render
  approved catalog patterns into proposed (HITL) artifacts, scoped by repo/domain.
- AGENTIC-WP-0008 Read-before-Edit reflex: act on the #1 friction finding.
- AGENTIC-WP-0009 Phase 4 Measure: baseline + before/after effectiveness + trend.
Proceeding in that order.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 14:58:03 +02:00
ea03cbdd47 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-06-07:
  - update .custodian-brief.md for agentic-resources
2026-06-07 13:46:45 +02:00
1b6081cd88 session-memory: denoise error fingerprints (WP-0006 follow-up)
Tighten _is_failed: exclude successful hub JSON responses (top-level no-error
payloads) and file-read snapshots (numbered cat -n source lines) that were
polluting error_snippets. JSON verdict classifies error vs success payloads
directly. Cuts distinct fingerprints 444 -> 269 (~40%) over the real corpus with
the top errors unchanged. Assessment caveat updated. 5 new tests; suite 102/102.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 13:39:08 +02:00
7cce276d32 session-memory: error root-cause assessment + v2 re-ingest (WP-0006 T03)
Re-ingested under schema v2 (populates error_snippets) and re-ran detect over
27 real sessions. Added a 'content-level root causes' section to
docs/ASSESSMENT-infra-friction.md: top recurring error is Edit/Write-before-Read
(12/27 sessions, 8 repos), then stale-read conflicts, a cross-flavor (claude+grok)
make fix-consistency failure, and State Hub MCP instability. Documented a
fingerprint-noise caveat. WP-0006 finished; suite 98/98.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 13:09:29 +02:00
e022c0f9d6 session-memory: recurring-error signal + clustering (WP-0006 T02)
detect/signals.py sig_recurring_error emits one signal per distinct error
fingerprint per session (magnitude = in-session occurrences), so the same error
recurring across sessions/repos/flavors clusters into a candidate root-cause
problem pattern via the existing clusterer — cross-flavor flagged automatically.
3 new tests; suite 98/98 green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 13:01:29 +02:00
2bd6aa3b41 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-06-07:
  - update .custodian-brief.md for agentic-resources
2026-06-07 12:48:18 +02:00
97379e9658 session-memory: error-body mining into digest (WP-0006 T01)
build_digest now extracts normalized error fingerprints + samples from failed
events (error kind + failing tool_result bodies) into a durable error_snippets
list — paths/numbers/uuids/addrs stripped so the same error collapses to one
fingerprint with a count; Python traceback header skipped in favour of the real
exception line. Durable in Tier 2 (survives Tier 1 eviction). SCHEMA_VERSION ->
2 (re-ingest needed to populate). 7 new tests; suite 95/95 green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 12:45:01 +02:00
dbd212d2b1 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-06-07:
  - update .custodian-brief.md for agentic-resources
2026-06-07 11:59:38 +02:00
896fde59f0 Register AGENTIC-WP-0006 (error-body mining) workplan
Captures normalized error fingerprints into the durable digest and clusters
recurring root-cause errors across sessions — closes the content-level 'why' gap
called out in the friction assessment. 3 tasks; we implement this in helix_forge.
(State Hub skill handed off to the state-hub worker as STATE-WP-0058.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 11:56:17 +02:00
48618293b0 session-memory: friction assessment + hardened catalog (WP-0005 T03)
Re-ran ingest->detect with the quality filter + infra signals over real local
sessions (72 captured -> 27 real). Purged the false-positive 'abandoned' catalog
entry and re-curated; catalog now carries tool_thrash/schema_thrash/infra_overhead
patterns. docs/ASSESSMENT-infra-friction.md ranks the friction: ~17.6% of real
tool activity is hub/task/schema plumbing (State Hub 10.3%, one session 231 calls;
ToolSearch in 81% of sessions). Validates the CLI/MCP-skill hypothesis as top-2;
recommends a State Hub skill (front-load schemas + batched writes) + bulk hub ops.
Workplan finished; suite 88/88.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 11:18:27 +02:00
21c714e286 session-memory: infra-overhead + thrash signals (WP-0005 T02)
signals.py: tool_bucket helper + three tool_histogram-based extractors that the
outcome/marker signals were blind to — sig_infra_overhead (hub+task+schema share
of tool calls over threshold), sig_schema_thrash (repeated ToolSearch), and
sig_tool_thrash (one tool dominating). Thresholds in build_context. 8 new tests;
suite 88/88 green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 11:12:09 +02:00
70433cda61 session-memory: session-quality filter (WP-0005 T01)
detect/quality.py: is_real_coding_session drops health-checks / smoke-tests /
interrupted / trivially-short sessions (event floor, repo present, substantive
tool activity, non-trivial prompt). Wired into run_detect so signals only form
over real sessions — fixes the abandoned false-positive. [detect.quality] knobs;
existing detect/curate fixtures made realistic. 8 new tests; suite 80/80.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 11:07:22 +02:00
56b2f576de AGENTIC-WP-0001: complete T02 + close bootstrap workplan
T02 was the one genuinely-incomplete bootstrap task: AGENTS.md had no
dev-workflow section. Added one documenting the pure-stdlib Python 3.11+
toolchain, pytest, and the session_memory ingest/detect/curate entrypoints so
future sessions can verify changes. T01 (integration files) and T03 (first real
workplan) were already satisfied; reconciled stale ready/todo bookkeeping to
finished/done.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 10:15:23 +02:00
d06791f070 session-memory Phase 2: verify + catalog artifacts (T07)
End-to-end verification over real local sessions: ingest 94->93 -> 72 digests;
detect 3 candidates (2 cross-flavor); curate --auto-approve cataloged 3
SolutionPatterns (2 cross-flavor approved/distribution_ready, 1 Claude-only),
re-run fully idempotent, 3 hub decisions queued (API offline). Commits the 3
catalog artifacts as the source of truth. PRD §12 OQ4/OQ5/OQ6 marked resolved;
README + design refreshed. Workplan finished; suite 72/72.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 10:08:52 +02:00
519e76442a session-memory Phase 2: curate entrypoint + README (T06)
python -m session_memory.curate: refreshes detect candidates, then drives them
through review interactively or with --auto-approve (batch, gate-driven) /
--json. Emits a catalog diff summary; queues hub decisions when offline.
[curate] config gains decision_queue + workstream id. README documents the
detect -> curate -> distribute flow and the gate knobs. 2 new tests; suite 72/72.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 10:00:56 +02:00
4b7a628b6f session-memory Phase 2: hub decision integration (T05)
decisions.py: every final promote/reject becomes a record_decision-shaped
payload (rationale + source key + evidence snapshot). DecisionRecorder degrades
gracefully under a hub outage — pluggable sink with a durable local-queue
fallback and ordered flush/replay (mirrors Phase 1's after-the-fact sync).
Wired into review() via an optional recorder. 6 new tests; suite 70/70 green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 00:31:22 +02:00
ab22d22bfb session-memory Phase 2: evidence-bar + bloat guard (T04)
gating.py: two-tier evidence bar (OQ5) — promote floor (frequency/sessions/
cost_impact) plus a stricter distribution-eligibility floor that sets a
promoted pattern to approved+distribution_ready vs provisional. Wired into
review() so thin approvals land provisional. bloat_warnings flags duplicate
and near-duplicate (same signal-type+locus) candidates (OQ6). [curate]/
[curate.gate] knobs in config.toml. 6 new tests; suite 64/64 green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 00:28:34 +02:00
e51fd8154d session-memory Phase 2: review workflow (T03)
UI-free discuss/approve/reject engine driving detect candidates into the
catalog via a decide callback. candidate_to_pattern builds a provisional
SolutionPattern with per-flavor rendering-hint stubs. ReviewLog makes
re-review idempotent: prior rejects remembered, re-surfaced only when the
evidence fingerprint changes. 6 new tests; suite 58/58 green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 00:25:10 +02:00
c6164a82ba session-memory Phase 2: versioned Pattern Catalog store (T02)
Files-first catalog (one JSON per pattern, id = source-key). Single
idempotent upsert path: added / unchanged / updated (status-only, no bump) /
versioned (content change bumps semver + archives prior to <id>.history.jsonl).
Dedup is structural on pattern id. 5 new tests; suite 52/52 green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 00:18:01 +02:00
5f810a6992 session-memory Phase 2: Solution Pattern schema (T01)
Curate package scaffold + flavor-agnostic SolutionPattern artifact with
separate per-flavor rendering hints (OQ4): Resolution/Scope/Provenance
sub-records, stable source-key id, semver bump helper, deterministic
round-trip serialization. 7 new tests; suite 47/47 green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 00:16:46 +02:00
43d76b5cf8 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-06-07:
  - update .custodian-brief.md for agentic-resources
2026-06-07 00:11:12 +02:00
055713aa4f session-memory Phase 1: T08 verify across all three flavors + docs
Marks AGENTIC-WP-0003 finished. Full suite 40/40 green; live pipeline
over real local sessions (Codex via fixtures) surfaced 3 candidate
patterns, 2 cross-flavor (Claude+Grok) — PRD success metric met.
README documents the detect entrypoint and Phase 0/1/next status.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 23:39:37 +02:00
436a96dcd8 session-memory Phase 1: Detect pipeline (T04-T07)
- detect/signals.py: pure extractors over digests (retry storm, repeated
  errors, budget overrun vs corpus p90, abandoned, clean pass, recovery)
- detect/cluster.py: deterministic clustering into candidate Patterns with
  evidence (sessions/repos/flavors/cost impact) + cross-flavor flagging
- detect/__main__.py: python -m session_memory.detect, ranked report
  (cross-flavor first) + --json; persists candidates to Tier 2 patterns table
- core/store.py: list_digests + save_patterns
- tests for signals, cluster, detect entrypoint

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 22:31:13 +02:00
06767ef924 session-memory Phase 1: Grok adapter (T02)
- adapters/grok.py: reads the per-session dir (summary.json + chat_history.jsonl
  + events.jsonl + updates.jsonl); conversation from chat_history, lifecycle/
  turn from events, tool-call names paired in order from updates ACP stream
- registered in ingest dispatch; codex+grok sources enabled in config.toml
- tests/test_grok_adapter.py (synthetic + real local sessions)
- live multi-flavor dry-run discovers 89 sessions across flavors

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 22:12:30 +02:00
bc11cb9aec session-memory Phase 1: Codex adapter (T01) + multi-file merge (T03)
- adapters/common.py: shared Normalized + helpers (resolve_repo, classify_tool,
  jsonl iter, etc.); claude.py refactored to use it (Normalized re-exported)
- adapters/codex.py: rollout {timestamp,type,payload} parser; session_meta/
  response_item/event_msg mapping; flat call_id join; token_count cost;
  registered in ingest dispatch
- core/store.py: ingest() now merges multi-file sessions by content
  fingerprint, appends new events with offset seq (design OQ6); idempotent
- tests/test_codex_adapter.py, tests/test_merge.py

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 21:55:32 +02:00
5aea22f24f Register AGENTIC-WP-0003 (session-memory Phase 1) with State Hub
Codex + Grok adapters, multi-file session merge, and the Detect pipeline
(signals -> clustering -> evidence -> candidate report).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 21:50:23 +02:00
7c6f4358ee session-memory Phase 0: end-to-end verification + docs (T07)
- verified full sweep over 85 real local Claude transcripts: 63 sessions
  ingested+analyzed, eviction under tiny cap freed 26MB with zero data loss,
  digest-preservation invariant holds, idempotent re-run
- session_memory/README.md: usage, scheduling, retention knobs
- design doc: OQ4 resolved (median ~49KB/session), OQ6 (multi-file sessions)
- workplan AGENTIC-WP-0002 finished

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 21:44:46 +02:00
586ed90948 session-memory Phase 0: ingest cursor + sweep entrypoint + config (T06)
- session_memory/core/cursor.py: size/mtime change detection sidecar
- session_memory/config.toml: store paths, retention caps, per-source
  globs (claude on, codex/grok off for Phase 1), repo->domain map
- session_memory/ingest.py: discover->normalize->store->digest->evict;
  --dry-run creates/writes nothing; python -m session_memory.ingest
- tests/test_ingest.py; live dry-run parsed 84/85 real local sessions

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 21:41:59 +02:00
451fb8f1f3 session-memory Phase 0: budget-based retention sweep (T05)
- session_memory/core/retention.py: RetentionConfig + sweep() with backstop,
  budget (oldest-analyzed-first, never touches un-analyzed), and hard-cap
  overflow (analyze-now then reported last-resort data_loss); EvictionReport
- tests/test_retention.py covers all four branches

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 21:37:40 +02:00
abb888f3ef session-memory Phase 0: session digest + outcome heuristic (T04)
- session_memory/core/digest.py: build_digest (cost totals, kind/tool
  histograms, markers, snippets) + cross-flavor infer_outcome heuristic;
  analyze() promotes Tier1->Tier2 and sets analyzed_at (-> evictable)
- tests/test_digest.py

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 20:03:04 +02:00
29fc211a14 session-memory Phase 0: Tier1/Tier2 store (T03)
- session_memory/core/store.py: SQLite rows + blob-dir bodies, idempotent
  ingest on (session_uid,seq), Tier1/Tier2 usage accounting, evict_raw that
  drops raw but preserves the digest; watermark columns authoritative
- tests/test_store.py: ingest idempotency, accounting, eviction invariant

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 19:10:02 +02:00
1c29a94fa9 session-memory Phase 0: normalized schema (T01) + Claude adapter (T02)
- session_memory/core/schema.py: Session/SessionEvent/Cost dataclasses,
  flavor-prefixed uids, watermarks, kind/outcome validation (T01)
- session_memory/adapters/claude.py: JSONL -> Normalized bundle, turn DAG
  via uuid/parentUuid, kind mapping, cost from message.usage (T02)
- tests: schema round-trip + adapter (synthetic + real local session)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 19:06:10 +02:00
ffe191d44e Add Helix Forge PRD, session-memory design, and Phase 0 workplan
- docs/PRD-helix-forge.md: Capture→Detect→Curate→Distribute→Measure loop
- docs/DESIGN-session-memory.md: tiered store + budget-based eviction;
  verified session-log schemas for Claude/Codex/Grok
- workplans/AGENTIC-WP-0002: Phase 0 (registered with State Hub)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 19:00:30 +02:00
126 changed files with 10478 additions and 15 deletions

20
.claude/rules/agents.md Normal file

@@ -0,0 +1,20 @@
## Kaizen Agents
Specialized agent personas available on demand via the state-hub MCP.
**Discover:** `list_kaizen_agents()` — returns all agents with name, description, category
**Load:** `get_kaizen_agent("tdd-workflow")` — returns full instructions; read and follow them
Common agents:
| Agent | Category | When to use |
|-------|----------|-------------|
| `tdd-workflow` | testing | Step-by-step TDD8 workflow for any feature |
| `code-refactoring` | quality | Code quality analysis and safe refactoring |
| `test-maintenance` | testing | Diagnose and fix failing tests |
| `requirements-engineering` | process | Prevent interface/mock mismatches upfront |
| `keepaTodofile` | process | Maintain TODO.md during work |
| `project-management` | process | Track status, determine next steps |
| `datamodel-optimization` | quality | Optimize dataclasses and data structures |
All 17 agents: call `list_kaizen_agents()` for the full list.


@@ -0,0 +1,8 @@
## Architecture
<!-- TODO: Describe the key design decisions and component structure.
Key modules, data flows, external integrations, state machines, etc. -->
## Quick Reference
`~/state-hub/mcp_server/TOOLS.md` — MCP tool reference


@@ -0,0 +1,50 @@
# Credential and access routing
**Audience:** Codex, Claude Code, Grok, and custodian agents that call **llm-connect**
for inference. Run this check **before** requesting secrets, API keys, SSH access,
login tokens, or database passwords — in any repo, not only `ops-warden`.
ops-warden **issues SSH certificates only** (`warden sign`, `cert_command`). Every
other credential need belongs to another subsystem. **Do not** message
`ops-warden` on State Hub expecting a secret value; the reply is a pointer, not a key.
### Lookup (do this first)
```bash
warden route find "<describe your need>" --json
warden route show <catalog-id> --json
```
Requires the `warden` CLI from `~/ops-warden` (`uv tool install .` or `uv run warden`).
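For agents that script the lookup rather than typing it, the `warden route find` command above could be wrapped roughly as follows. This is a minimal sketch: the helper names `route_find_argv` and `run_route_find` are hypothetical, and because the shape of the `--json` output is not described here, the sketch returns the parsed document without assuming any fields.

```python
import json
import subprocess


def route_find_argv(need: str) -> list[str]:
    """Build the argv for `warden route find "<need>" --json`.

    Passing a list (not a shell string) avoids quoting problems in `need`.
    """
    return ["warden", "route", "find", need, "--json"]


def run_route_find(need: str, runner=subprocess.run):
    """Run the lookup and parse stdout as JSON.

    Assumption: the command exits 0 and prints one JSON document.
    `runner` is injectable so the function can be exercised without the CLI.
    """
    result = runner(route_find_argv(need), capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

The result only says who owns the credential. It is a pointer, never a secret value, so the caller still has to follow the owner's own procedure.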
| Agent runtime | How to orient |
| --- | --- |
| **Codex / Grok** (shell, HTTP State Hub) | `warden route` commands above; inbox `to_agent=agentic-resources` is for coordination, not secret vending |
| **Claude Code** (MCP when available) | `get_domain_summary("custodian")` for workstreams; **still** use `warden route` for credential ownership |
| **llm-connect** (inference service) | Never put secret retrieval in prompts; route custody to OpenBao/operator paths surfaced by `warden route` |
### Quick routing table
| I need… | Owner | ops-warden executes? |
| --- | --- | --- |
| SSH cert (`adm`/`agt`/`atm`) | ops-warden | **Yes** (`warden sign`) |
| API key, DB password, provider token | OpenBao (`railiance-platform`) | No — route only |
| Login / OIDC / MFA | key-cape / Keycloak | No — route only |
| Authorization decision | flex-auth | No — route only |
| activity-core → issue-core emission | activity-core + issue-core | No — `warden route show activity-core-issue-sink` |
| SSH tunnel | ops-bridge (+ `cert_command` from warden) | No — route only |
### Anti-patterns (do not do these)
- `POST /messages/` to `ops-warden` asking for `ISSUE_CORE_API_KEY`, `OPENROUTER_API_KEY`, etc.
- Inventing `warden secret`, `warden login`, `warden bao`, `warden tunnel` — they do not exist
- Pasting secrets into Git, State Hub, workplans, logs, or chat
### Other capabilities (reuse-surface)
Non-credential capabilities are usually discovered through **reuse-surface** federation
(`reuse-surface` registry / `capability.*` indexes). Credential routing is inlined in
every repo's agent instructions because it is high-frequency, high-risk, and easy to
get wrong.
**Canon:** `~/ops-warden/wiki/CredentialRouting.md` · catalog `~/ops-warden/registry/routing/catalog.yaml`


@@ -0,0 +1,38 @@
## First Session Protocol
Triggered when `get_domain_summary("infotech")` shows **no workstreams**.
The project is registered but work has not yet been structured.
**Step 1 — Read, don't write**
- `~/the-custodian/canon/projects/infotech/project_charter_v0.1.md` — purpose, scope
- `~/the-custodian/canon/projects/infotech/roadmap_v0.1.md` — planned phases
- Scan repo root: README, directory structure, existing code or docs
**Step 2 — Survey in-progress work**
Look for TODOs, open branches, half-finished files. Note done vs. started but incomplete.
**Step 3 — Propose workstreams to Bernd**
Propose 1-3 workstreams, each a coherent strand lasting weeks to months and anchored to a
roadmap phase. **Wait for approval before creating.**
**Step 4 — Create workplan file first, then DB record (ADR-001)**
```
workplans/AGENTIC-WP-NNNN-<slug>.md ← write this first
```
Then register in the hub:
```
create_workstream(topic_id="f39fa2a3-c491-414c-a91b-b4c5fcc6139c", title="...", owner="...", description="...")
create_task(workstream_id="<id>", title="...", priority="high|medium|low")
```
**Step 5 — Record the setup**
```
add_progress_event(
    summary="First session: structured infotech into N workstreams, M tasks",
    event_type="milestone",
    topic_id="f39fa2a3-c491-414c-a91b-b4c5fcc6139c",
    detail={"workstreams": [...], "tasks_created": M}
)
```
<!-- Delete or archive this file once past first session -->


@@ -0,0 +1,8 @@
## Repo boundary
This repo owns **agentic-resources** only. It does not own:
<!-- TODO: List what belongs in adjacent repos, e.g.:
- SSH key management → railiance-infra/
- State hub code → state-hub/
-->


@@ -0,0 +1,5 @@
**Purpose:** Iterating towards optimal agentic performance.
**Domain:** infotech
**Repo slug:** agentic-resources
**Topic ID:** f39fa2a3-c491-414c-a91b-b4c5fcc6139c


@@ -0,0 +1,85 @@
## Session Protocol
Dev Hub (State Hub API): http://127.0.0.1:8000
MCP server name in `~/.claude.json`: `dev-hub`
**Step 1 — Orient**
Read the offline-safe brief first — it works without a live hub connection:
```bash
cat .custodian-brief.md
```
Then call the MCP tool for richer cross-domain context when MCP tools are exposed:
```
get_domain_summary("infotech")
```
If MCP tools are unavailable in the current agent session, use the REST API:
```bash
curl -s "http://127.0.0.1:8000/state/summary" | python3 -m json.tool
```
If the hub is offline: `cd ~/state-hub && make api`
**Step 2 — Check inbox**
With MCP tools:
```
get_messages(to_agent="agentic-resources", unread_only=True)
```
Mark read with `mark_message_read(message_id)`. Reply or act on coordination
requests before proceeding.
Without MCP tools:
```bash
curl -s "http://127.0.0.1:8000/messages/?to_agent=agentic-resources&unread_only=true" \
| python3 -m json.tool
curl -s -X PATCH "http://127.0.0.1:8000/messages/<id>/read" \
-H "Content-Type: application/json" -d '{}'
```
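The two REST calls above can also be made from Python's standard library. The sketch below only mirrors the URLs, method, and body of the curl commands. The function names are hypothetical, and the response schema is not documented here, so `fetch_unread` returns whatever JSON the hub sends back.

```python
import json
import urllib.parse
import urllib.request

HUB = "http://127.0.0.1:8000"  # Dev Hub address given at the top of this file


def unread_url(agent: str, base: str = HUB) -> str:
    """URL for the unread-inbox query used in the curl example."""
    query = urllib.parse.urlencode({"to_agent": agent, "unread_only": "true"})
    return f"{base}/messages/?{query}"


def mark_read_request(message_id: str, base: str = HUB) -> urllib.request.Request:
    """Build PATCH /messages/<id>/read with an empty JSON body."""
    return urllib.request.Request(
        f"{base}/messages/{message_id}/read",
        data=b"{}",
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )


def fetch_unread(agent: str, base: str = HUB, timeout: float = 5.0):
    """Return the parsed response body; its schema is not assumed here."""
    with urllib.request.urlopen(unread_url(agent, base), timeout=timeout) as resp:
        return json.load(resp)
```

To mark a message read, pass the built request to `urllib.request.urlopen(mark_read_request(message_id))`.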
**Step 3 — Scan workplans**
```bash
ls workplans/
```
For each file with `status: ready`, `active`, or `blocked`, note pending
`wait`/`todo`/`progress` tasks.
**Step 4 — Present brief**
1. **Active workstreams** for `infotech` — title, task counts, blocking decisions
2. **Pending tasks** from `workplans/` + any `[repo:agentic-resources]` hub tasks
3. **Goal guidance** — if `goal_guidance` in summary:
   - `needs_workplan`: surface as top action, *"Repo goal '{title}' has no workplan yet"*
   - `alignment_warnings`: flag if active work is not aligned with current goal
4. **Suggested next action** — highest-priority open item
5. **SBOM status** — flag if `last_sbom_at` is unset for this repo
If no workstreams: follow First Session Protocol (`first-session.md`).
**During work:** `record_decision()` · `add_progress_event()` · `resolve_decision()`
> State Hub is a *read model*. Bootstrap tools (`create_workstream`, `create_task`)
> are First Session Protocol only. Work structure belongs in repo files (ADR-001).
**Session close:**
With MCP tools:
```
add_progress_event(summary="...", topic_id="f39fa2a3-c491-414c-a91b-b4c5fcc6139c", workstream_id="<uuid>")
```
Without MCP tools:
```bash
curl -s -X POST http://127.0.0.1:8000/progress/ \
-H "Content-Type: application/json" \
-d '{"topic_id":"f39fa2a3-c491-414c-a91b-b4c5fcc6139c","workstream_id":"<uuid>","event_type":"note","summary":"what changed","author":"codex"}'
```
If workplan files were modified, ensure the local copy is up to date first:
```bash
git -C <repo_path> pull --ff-only
cd ~/state-hub && make fix-consistency REPO=agentic-resources
```
For repos where implementation runs on a remote machine (e.g. CoulombCore),
use the combined target which pulls before fixing:
```bash
cd ~/state-hub && make fix-consistency-remote REPO=agentic-resources
```
**C-15** (DB task ahead of file) is normal in multi-machine workflows — writeback
will sync the file to match DB. **C-16** (repo behind remote) blocks all writes
until you pull — intentional to prevent clobbering remote progress.


@@ -0,0 +1,19 @@
## Stack
<!-- TODO: Fill in language, frameworks, and key dependencies -->
- **Language:**
- **Key deps:**
## Dev Commands
```bash
# TODO: Fill in the standard commands for this repo
# Install dependencies
# Run tests
# Lint / type check
# Build / package (if applicable)
```


@@ -0,0 +1,40 @@
## Workplan Convention (ADR-001)
File location: `workplans/AGENTIC-WP-NNNN-<slug>.md`
ID prefix: `AGENTIC-WP-`
Work items originate as files in this repo **before** being registered in the hub.
Canonical workplan/workstream frontmatter statuses are:
`proposed`, `ready`, `active`, `blocked`, `backlog`, `finished`, `archived`.
Use `proposed` for a newly drafted plan, `ready` after review against current
repo state, and `finished` when implementation is complete. `stalled` and
`needs_review` are derived health labels, not stored statuses.
Closed workplans may be moved to `workplans/archived/` with a completion-date
prefix: `YYMMDD-AGENTIC-WP-NNNN-<slug>.md`. The frontmatter id remains
unchanged; the prefix is only for quick visual reference.
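The archive naming rule can be sketched in a few lines. This is an illustration only: `archived_path` is a hypothetical helper, not part of the state-hub tooling, and the example slug is made up.

```python
from datetime import date
from pathlib import PurePosixPath

def archived_path(workplan: str, completed: date) -> str:
    """Build the archived location: workplans/archived/YYMMDD-<original name>."""
    p = PurePosixPath(workplan)
    # The frontmatter id is untouched; only the filename gains a date prefix.
    return str(p.parent / "archived" / f"{completed:%y%m%d}-{p.name}")

# Hypothetical slug, used only to show the shape of the result.
example = archived_path("workplans/AGENTIC-WP-0001-example.md", date(2026, 6, 21))
print(example)
```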
Small opportunistic tasks discovered during another session use **Ad Hoc Tasks**:
`workplans/ADHOC-YYYY-MM-DD.md`, workstream slug `adhoc-YYYY-MM-DD`, and task ids
`ADHOC-YYYY-MM-DD-T01`, `T02`, etc. Use adhocs only for low-risk work completed
directly. Promote anything requiring analysis, design, approval, dependencies, or
multiple planned phases into a normal workplan.
Ecosystem todos from other agents arrive as `[repo:agentic-resources]` hub tasks —
visible at session start. Pick one up by creating the workplan file, then registering
the workstream.
Task blocks use this shape:
```task
id: AGENTIC-WP-NNNN-T01
status: wait | todo | progress | done | cancel
priority: high | medium | low
state_hub_task_id: "<uuid>" # written by fix-consistency — do not edit
```
Status progression is `todo` → `progress` → `done`; use `wait` for waiting or
blocked work and `cancel` for stopped work.
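A minimal sketch of how a task block and its status rules could be checked. `parse_task_block`, `can_transition` and the `ALLOWED` table are hypothetical names; the transition table is an assumption that treats `wait` as reachable from, and leaving to, any open status, and `done`/`cancel` as terminal.

```python
# Assumed transition table; the convention itself only names the main progression.
ALLOWED = {
    "todo": {"progress", "wait", "cancel"},
    "progress": {"done", "wait", "cancel"},
    "wait": {"todo", "progress", "cancel"},
    "done": set(),    # terminal
    "cancel": set(),  # terminal
}

def parse_task_block(text: str) -> dict[str, str]:
    """Parse the `key: value` lines of a task block into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        line = line.split(" #", 1)[0]          # drop trailing comments
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip('"')
    return fields

def can_transition(old: str, new: str) -> bool:
    """True if moving a task from `old` to `new` is allowed by ALLOWED."""
    return new in ALLOWED.get(old, set())

task = parse_task_block('''
id: AGENTIC-WP-NNNN-T01
status: todo
priority: high
state_hub_task_id: "<uuid>" # written by fix-consistency
''')
```

For example, `can_transition(task["status"], "progress")` is true, while reopening a `done` task is rejected.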
<!-- Ralph Loop rules and HEUREKA sequence: ~/.claude/CLAUDE.md — do not duplicate here -->


@@ -2,18 +2,12 @@
# Custodian Brief — agentic-resources
**Domain:** helix_forge
-**Last synced:** 2026-06-05 22:10 UTC
+**Last synced:** 2026-06-21 14:09 UTC
**State Hub:** http://127.0.0.1:8000 *(adjust if running on a remote machine)*
## Active Workstreams
-### Bootstrap State Hub integration
-Progress: 0/3 done | workstream_id: `bb9a43a3-a54f-434b-97c2-e1c7142b52f5`
-**Open tasks:**
-- · Review Generated Integration Files `3ad7b7a9`
-- · Verify Local Developer Workflow `db248d57`
-- · Seed First Real Workplan `9cbb7aa5`
+*(none — repo may need first-session setup)*
---
## MCP Orientation (when available)

.gitignore vendored

@@ -174,3 +174,11 @@ cython_debug/
# PyPI configuration file
.pypirc
# session-memory local store
session_memory/.store/
# generated per-flavor distribution proposals (HITL, regenerated each run)
session_memory/proposals/
__pycache__/
*.pyc
.pytest_cache/

.repo-classification.yaml Normal file

@@ -0,0 +1,18 @@
repo_classification:
  standard: Repo Classification Standard
  version: '1.0'
  classified_at: '2026-06-22'
  classified_by: agent
  category: project
  domain: infotech
  secondary_domains: []
  capability_tags:
  - automation
  - orchestration
  business_stake:
  - technology
  - product
  - operations
  business_mechanics:
  - coordination
  - operation


@@ -4,7 +4,7 @@
**Purpose:** Iterating towards optimal agentic performance.
-**Domain:** helix_forge
+**Domain:** infotech
**Repo slug:** agentic-resources
**Topic ID:** `f39fa2a3-c491-414c-a91b-b4c5fcc6139c`
**Workplan prefix:** `AGENTIC-WP-`
@@ -101,6 +101,63 @@ curl -s -X PATCH "http://127.0.0.1:8000/tasks/<task_id>" \
---
## Credential and access routing
**Audience:** Codex, Claude Code, Grok, and custodian agents that call **llm-connect**
for inference. Run this check **before** requesting secrets, API keys, SSH access,
login tokens, or database passwords — in any repo, not only `ops-warden`.
ops-warden **issues SSH certificates only** (`warden sign`, `cert_command`). Every
other credential need belongs to another subsystem. **Do not** message
`ops-warden` on State Hub expecting a secret value; the reply is a pointer, not a key.
### Lookup (do this first)
```bash
warden route find "<describe your need>" --json
warden route show <catalog-id> --json
```
Requires the `warden` CLI from `~/ops-warden` (`uv tool install .` or `uv run warden`).
| Agent runtime | How to orient |
| --- | --- |
| **Codex / Grok** (shell, HTTP State Hub) | `warden route` commands above; inbox `to_agent=agentic-resources` is for coordination, not secret vending |
| **Claude Code** (MCP when available) | `get_domain_summary("custodian")` for workstreams; **still** use `warden route` for credential ownership |
| **llm-connect** (inference service) | Never put secret retrieval in prompts; route custody to OpenBao/operator paths surfaced by `warden route` |
### Quick routing table
| I need… | Owner | ops-warden executes? |
| --- | --- | --- |
| SSH cert (`adm`/`agt`/`atm`) | ops-warden | **Yes** — `warden sign` |
| API key, DB password, provider token | OpenBao (`railiance-platform`) | No — route only |
| Login / OIDC / MFA | key-cape / Keycloak | No — route only |
| Authorization decision | flex-auth | No — route only |
| activity-core → issue-core emission | activity-core + issue-core | No — `warden route show activity-core-issue-sink` |
| SSH tunnel | ops-bridge (+ `cert_command` from warden) | No — route only |
### Anti-patterns (do not do these)
- `POST /messages/` to `ops-warden` asking for `ISSUE_CORE_API_KEY`, `OPENROUTER_API_KEY`, etc.
- Inventing `warden secret`, `warden login`, `warden bao`, `warden tunnel` — they do not exist
- Pasting secrets into Git, State Hub, workplans, logs, or chat
### Other capabilities (reuse-surface)
Non-credential capabilities are usually discovered through **reuse-surface** federation
(`reuse-surface` registry / `capability.*` indexes). Credential routing is inlined in
every repo's agent instructions because it is high-frequency, high-risk, and easy to
get wrong.
**Canon:** `~/ops-warden/wiki/CredentialRouting.md` · catalog `~/ops-warden/registry/routing/catalog.yaml`
<!-- REPO-AGENTS-EXTENSIONS -->
<!-- Append repo-specific agent instructions below this marker.
The state-hub template sync preserves content after this line. -->
---
## Workplan Convention (ADR-001)
Work items originate as files in this repo — not in the hub. The hub is a
@@ -124,7 +181,7 @@ anything needing analysis, design, approval, dependencies, or multiple phases.
id: AGENTIC-WP-NNNN
type: workplan
title: "..."
-domain: helix_forge
+domain: infotech
repo: agentic-resources
status: proposed | ready | active | blocked | backlog | finished | archived
owner: codex

CLAUDE.md Normal file

@@ -0,0 +1,12 @@
# agentic-resources — Claude Code Instructions
@SCOPE.md
@.claude/rules/repo-identity.md
@.claude/rules/session-protocol.md
@.claude/rules/first-session.md
@.claude/rules/workplan-convention.md
@.claude/rules/stack-and-commands.md
@.claude/rules/architecture.md
@.claude/rules/repo-boundary.md
@.claude/rules/credential-routing.md
@.claude/rules/agents.md


@@ -0,0 +1,144 @@
# Infrastructure Friction Assessment
*Generated 2026-06-07 from captured coding-session data (Helix Forge session
memory), after the Detect-hardening pass ([AGENTIC-WP-0005]). First data-driven
assessment of where our agentic coding sessions spend effort on plumbing rather
than work.*
## Method & data quality
- **Corpus:** 72 sessions captured across Claude + Grok. A session-quality filter
([detect/quality.py]) drops health-checks, smoke-tests, and interrupted runs
(mostly `llm-connect` *"Say hello in one word"*). **27 are real coding sessions.**
- **Caveat:** the 41 % that were filtered out had been mislabeled `abandoned` by
the outcome heuristic and produced a *false-positive* "cross-flavor abandoned"
pattern in the first catalog — now purged. Treat any pre-hardening finding with
suspicion.
- **Key framing:** all 27 real sessions ended in `success`. So the friction here
is **cost/efficiency, not failure** — sessions get there, but pay an avoidable
tax to do it.
## The headline number
Across the 27 real sessions, tool-call activity breaks down as:
| Bucket | Share |
|--------|------:|
| shell (Bash / run_terminal) | 38.2 % |
| edit | 30.2 % |
| read | 12.9 % |
| **State Hub MCP** | **10.3 %** |
| **task-management plumbing** | **5.8 %** |
| **schema-loading (`ToolSearch`)** | **1.5 %** |
| other | 1.1 % |
**~17.6 % of all tool calls in real coding sessions are coordination plumbing
(hub + task + schema-loading), not touching the repo.** Per-session infra-overhead
share: median **11.7 %**, p90 **26.1 %**, max **43.3 %** — it concentrates badly.
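The headline figure is the sum of the three bolded buckets. A short check of that arithmetic, with the shares copied from the table above (the dictionary keys are illustrative labels, not identifiers from the codebase):

```python
# Percent of all tool calls per bucket, copied from the table above.
shares = {
    "shell": 38.2, "edit": 30.2, "read": 12.9,
    "state_hub_mcp": 10.3, "task_plumbing": 5.8,
    "schema_loading": 1.5, "other": 1.1,
}

# The three buckets counted as coordination plumbing.
PLUMBING = ("state_hub_mcp", "task_plumbing", "schema_loading")

plumbing_share = round(sum(shares[k] for k in PLUMBING), 1)
total = round(sum(shares.values()), 1)
print(plumbing_share, total)
```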
## Ranked friction
### 1. State Hub call volume — *highest cost, addressable*
State Hub MCP is 10.3 % of all tool calls and dominates the worst sessions:
| Repo (one session) | total calls | State Hub calls | overhead share |
|--------------------|------:|------:|------:|
| vergabe-teilnahme | 570 | **231** | 43 % |
| activity-core | 488 | 98 | 23 % |
| flex-auth | 236 | 35 (+27 task) | 29 % |
| net-kingdom | 129 | 25 | 22 % |
Root cause: many **fine-grained** calls — per-task status updates, per-event
progress writes, repeated `get_domain_summary`. 231 hub calls in a single session
is coordination overhead, not work.
### 2. Schema-loading thrash (`ToolSearch`) — *low cost, near-zero-effort fix*
**106 `ToolSearch` calls across 22 of 27 sessions (81 %).** The State Hub MCP
tools are *deferred*, so nearly every session re-discovers and re-loads the same
tool schemas before it can call them. This is pure overhead with no work value —
and it is **exactly the CLI/MCP-interface friction hypothesized.**
### 3. Task-management plumbing — 5.8 %
`TaskUpdate` / `TaskCreate` / `todo_write` / `update_task_status`. Overlaps with
(1); much of it is redundant status churn within a session.
### 4. Tool thrash — *session-shape, watch only*
11 sessions hammer a single tool 80–230× (usually Bash or Edit). Less an infra
problem than a sign of missing higher-level tooling; low priority.
### 5. Budget overrun — 3 sessions
Token cost well above peers. Secondary; revisit once (1)–(2) are addressed.
## Recommendations
**The CLI/MCP-interface hypothesis is validated as a top-2 friction, not a minor
issue.** Two high-ROI moves:
- **A. A State Hub skill (highest ROI).** A skill (or a pre-loaded tool manifest)
that (i) **front-loads the common hub tool schemas** so agents stop
`ToolSearch`-ing for them — eliminates finding #2 almost entirely (81 % of
sessions) — and (ii) **teaches batched writes** (sync N task statuses in one
call, fewer progress events) to attack finding #1. Low effort, broad reach.
- **B. Coarser hub operations.** Add bulk endpoints / a single "sync workplan
statuses" op so a session doesn't make 200+ individual hub calls. This is the
structural fix behind the skill's guidance.
- **C. Measure the effect (Phase 4).** After A/B land, compare infra-overhead
share on subsequent sessions against this baseline (median 11.7 %, p90 26.1 %).
This is precisely what the Measure phase is for — the loop closes here.
## Content-level root causes (error-body mining)
*Added 2026-06-07 from [AGENTIC-WP-0006] — `build_digest` now mines normalized
error fingerprints into the durable digest, and `sig_recurring_error` clusters
them. This is the "why" the tool-mix view above could not see.*
**26 of 27 real sessions hit at least one error.** Top recurring error
fingerprints across the corpus (by # sessions affected):
| # sessions | occ | flavors | top sample |
|-----------:|----:|---------|------------|
| **12** | 32 | claude | `<tool_use_error>File has not been read yet. Read it first before writing to it.` |
| **6** | 13 | claude | `<tool_use_error>File has been modified since read …` |
| **4** | 9 | **claude + grok** | `make: *** [Makefile:227: fix-consistency] Error 1` |
| 3 | 21 | claude | `MCP error -32602: Invalid request parameters` |
| 3 | 6 | claude | `Error calling tool 'update_task_status': 'title'` |
| 2 | 6 | claude | `make: *** [Makefile:21: test] Error 1` |
Reading:
- **#1 — Edit/Write-before-Read (12/27 sessions, 8 repos).** The single most
common error is agents trying to edit a file they haven't read into context.
This is a *workflow* friction, highly addressable: a Read-before-Edit reflex in
the agent instructions / a skill, or a harness affordance. (Observed live: the
author hit this exact error twice while writing this workplan.)
- **#2 — stale-read conflicts (6 sessions):** "File has been modified since read"
— same family, a re-read-before-edit discipline fixes both.
- **#3 — cross-flavor `make fix-consistency` failures (claude + grok, 3 repos):**
the consistency tooling itself fails across flavors — a shared infra issue worth
a look on the state-hub side (cf. [STATE-WP-0058]).
- **State Hub MCP instability** (`-32602`, `update_task_status 'title'`) shows up
in 3 sessions each — corroborates the plumbing-overhead story and the live MCP
flakiness seen during this work (REST fallback used).
**Fingerprint noise — mostly handled.** `_is_failed` now excludes successful hub
JSON responses (top-level no-error payloads) and file-read snapshots (numbered
`cat -n` source lines), which cut distinct fingerprints **444 → 269 (~40 %)**
without touching the top entries. Residual low-value items remain in the long tail
(bare structural lines like `{`, linter "N errors" summaries); the *top*
fingerprints are real. Note several entries (`MCP error -32602`,
`update_task_status 'title'`) reflect the State Hub MCP instability hit live during
this work — genuine, if self-referential, friction.
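To make "normalized error fingerprint" concrete, here is one way such normalization could work. This is a sketch under stated assumptions, not the actual `build_digest` logic: the `fingerprint` function, its placeholder rules, and the 12-character hash prefix are all hypothetical.

```python
import hashlib
import re

# Assumed normalization rules: volatile tokens become placeholders so that
# the same error at a different line number or path clusters together.
_RULES = [
    (re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"), "<uuid>"),
    (re.compile(r"(?:/[\w.\-]+)+"), "<path>"),
    (re.compile(r"\d+"), "<n>"),
]

def fingerprint(error_text: str) -> str:
    """Hash the first line of an error body after masking volatile tokens."""
    stripped = error_text.strip()
    first = stripped.splitlines()[0] if stripped else ""
    for pattern, placeholder in _RULES:
        first = pattern.sub(placeholder, first)
    normalized = " ".join(first.lower().split())  # collapse whitespace
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()[:12]

a = fingerprint("make: *** [Makefile:227: fix-consistency] Error 1")
b = fingerprint("make: *** [Makefile:231: fix-consistency] Error 1")  # same error, shifted line
c = fingerprint("make: *** [Makefile:21: test] Error 1")              # different target
```

Here `a` and `b` collapse to one fingerprint while `c` stays distinct, which is the clustering behavior the table above relies on.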
## What this assessment still can't see
- ~~**Why** a session was expensive at the content level.~~ **Now addressed**
(error-body mining, above), modulo the fingerprint-noise caveat.
- Repeated *failed approaches* (as opposed to surfaced errors) — e.g. an agent
silently retrying a wrong strategy without an error — are still invisible.
- Grok/Codex are thin in the corpus (4 Grok, 0 Codex sessions), so cross-flavor
friction claims are Claude-weighted for now.
[AGENTIC-WP-0005]: ../workplans/AGENTIC-WP-0005-detect-hardening.md
[AGENTIC-WP-0006]: ../workplans/AGENTIC-WP-0006-error-body-mining.md
[STATE-WP-0058]: handed off to the state-hub repo worker
[detect/quality.py]: ../session_memory/detect/quality.py


@@ -0,0 +1,461 @@
# Design Document — Coding Session Memory
**Domain:** helix_forge
**Repo:** agentic-resources
**Status:** Draft v0.1
**Author:** Claude (drafted with Bernd Worsch)
**Created:** 2026-06-06
**Updated:** 2026-06-06
**Related:** [PRD-helix-forge.md](./PRD-helix-forge.md) (this is the Capture + storage layer, FR-C* / §8)
---
## 1. Purpose
Helix Forge's loop (Capture → Detect → Curate → Distribute → Measure) needs a
durable, bounded **memory of coding sessions**. This document specifies that
memory: how we **access** each coding agent's session protocol, how we
**normalize** those protocols into one schema, where we **store** the result, and
how we **age it out** — preferring a *storage-budget-based* eviction that drops
old raw content once it has been analyzed or no longer fits, rather than a naive
fixed time window.
The guiding asymmetry: **raw transcripts are bulky and re-derivable; the distilled
analysis is small and precious.** So we keep a *bounded cache* of raw sessions and
a *durable, compact* layer of extracted digests/signals. Eviction targets the
former, never the latter.
## 2. Research — How to Access Each Agent's Session Protocol
All three families persist sessions to the local filesystem as JSONL (plus, for
Grok, a per-session directory). The Claude Code and Grok findings below were
verified against the live installs on this workstation (`~/.claude`, `~/.grok`);
the Codex findings come from public docs and source, since Codex is not installed here.
### 2.1 Claude Code ✅ verified on disk
| Aspect | Finding |
|--------|---------|
| Session transcripts | `~/.claude/projects/<url-encoded-cwd>/<session-uuid>.jsonl` — one JSONL per session |
| Subagent sidechains | same dir, `agent-<id>.jsonl`; records carry `isSidechain: true` |
| Global prompt history | `~/.claude/history.jsonl` |
| Record format | one JSON object per line; **`type`** discriminates: `user`, `assistant`, `attachment`, `queue-operation`, `ai-title`, `last-prompt`, `summary`, plus tool-result records |
| Key fields | `type`, `timestamp`, `sessionId`, `uuid`, `parentUuid` (turn DAG), `message` (`role` + content blocks: `text`/`thinking`/`tool_use`/`tool_result`), `cwd`, `gitBranch`, `version`, `requestId`, `toolUseResult`, `userType` |
| Token usage | inside assistant `message.usage` (input/output/cache tokens) |
| Model | `message.model` (e.g. `claude-opus-4-8`) |
| Side data | `~/.claude/todos/`, `~/.claude/tasks/`, `~/.claude/file-history/`, `~/.claude/shell-snapshots/` |
| Live capture hook | Claude Code **SessionEnd / Stop / SessionStart hooks** can fire our ingest on session close (push), in addition to batch scanning (pull) |
The turn DAG (`uuid`/`parentUuid`) lets us reconstruct branching, retries, and
sidechains exactly.
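A minimal sketch of rebuilding that DAG from a transcript, assuming only the `uuid` and `parentUuid` fields listed above. `build_turn_dag` is a hypothetical helper, and the sample records are synthetic, not real transcript lines.

```python
import json
from collections import defaultdict

def build_turn_dag(jsonl_text: str):
    """Return (records_by_uuid, children_by_parent) for one transcript."""
    records, children = {}, defaultdict(list)
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        rec = json.loads(line)
        uid = rec.get("uuid")
        if uid is None:  # assumption: some record types carry no uuid
            continue
        records[uid] = rec
        children[rec.get("parentUuid")].append(uid)
    return records, dict(children)

# Synthetic three-record transcript: two assistant turns share one parent.
sample = "\n".join(json.dumps(r) for r in [
    {"type": "user", "uuid": "a", "parentUuid": None},
    {"type": "assistant", "uuid": "b", "parentUuid": "a"},
    {"type": "assistant", "uuid": "c", "parentUuid": "a"},
])
records, children = build_turn_dag(sample)

# A parent with more than one child marks a branch (for example a retry).
branch_points = [p for p, kids in children.items() if p is not None and len(kids) > 1]
```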
### 2.2 OpenAI Codex CLI ✅ schema confirmed from source (not installed locally)
Schema confirmed from the openai/codex source (`codex-rs/protocol/src/protocol.rs`
via DeepWiki) and a reverse-engineering writeup with real example lines — the two
cross-agree.
| Aspect | Finding |
|--------|---------|
| Session ("rollout") files | `$CODEX_HOME/sessions/YYYY/MM/DD/rollout-*.jsonl` (default `$CODEX_HOME = ~/.codex`) |
| Line wrapper (`RolloutLine`) | every line: **`{timestamp, type, payload}`** (UTC ts + a `RolloutItem`) |
| `type` discriminator | `session_meta` · `response_item` · `event_msg` · `turn_context` · `compacted` |
| `session_meta` | `{id, source, cwd, model_provider, cli_version}` (+ model) — restores env |
| `turn_context` | `{model, approval_policy, sandbox_policy}` — per-turn settings snapshot |
| `response_item` | raw model output / tool calls; `payload.type` ∈ `message` · `function_call` · `function_call_output` · `reasoning` |
| → `message` | `{role: developer\|user\|assistant, content:[{type:"output_text"\|…, text}]}` |
| → `function_call` | `{name, arguments (JSON string), call_id}` |
| → `function_call_output` | `{call_id, output}` |
| `event_msg` | protocol events; `payload.type` ∈ `task_started` · `task_complete` · `user_message` · `agent_message` · `token_count` · lifecycle |
| Token usage | `event_msg` with `payload.type = token_count`, interspersed (no fixed cadence) |
| Turn linkage | **flat — tool calls/outputs linked by `call_id`, no parent-ref DAG**; causality inferred from temporal order (unlike Claude's `uuid`/`parentUuid`) |
| Schema versions | older installs differ ("new ≥0.44 / mid / oldest 2025/08"); adapter version-detects on `session_meta.cli_version` |
| Naming / resume | filenames + `session_id` auto-generated; `codex resume --last`; `codex exec` for headless (trajectory-JSON is gh issue #2288) |
| Override location | `CODEX_HOME` env var |
**Adapter notes:** map `event_msg/task_started|task_complete` → `lifecycle`
events and outcome; `response_item/message` → `user_msg`/`assistant_msg`;
`function_call`+`function_call_output` → `tool_call`/`tool_result` joined on
`call_id`; `response_item/reasoning` → `thinking`; `event_msg/token_count` → cost
block. Because there is no parent-ref DAG, the adapter assigns `seq`/`parent_seq`
from temporal order rather than native links.
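The mapping in the adapter notes could look roughly like the sketch below. It is not the real adapter: `normalize_rollout` is a hypothetical name, the sample lines are synthetic, and two choices are assumptions, namely sorting on the ISO timestamp string and pointing a tool result's `parent_seq` at its originating call.

```python
import json

def normalize_rollout(lines):
    """Assign seq by temporal order and pair calls with outputs on call_id."""
    # Assumption: timestamps are uniform ISO-8601 UTC strings, so string sort is temporal.
    items = sorted((json.loads(l) for l in lines), key=lambda r: r["timestamp"])
    events, call_seq = [], {}
    for seq, item in enumerate(items):
        payload = item.get("payload", {})
        kind, parent = None, (seq - 1 if seq else None)
        if item["type"] == "response_item":
            ptype = payload.get("type")
            if ptype == "function_call":
                kind = "tool_call"
                call_seq[payload["call_id"]] = seq
            elif ptype == "function_call_output":
                kind = "tool_result"
                parent = call_seq.get(payload["call_id"], parent)  # join on call_id
            elif ptype == "reasoning":
                kind = "thinking"
            elif ptype == "message":
                kind = "assistant_msg" if payload.get("role") == "assistant" else "user_msg"
        elif item["type"] == "event_msg" and payload.get("type") in ("task_started", "task_complete"):
            kind = "lifecycle"
        if kind:
            events.append({"seq": seq, "parent_seq": parent, "kind": kind})
    return events

rollout = [
    '{"timestamp":"2026-06-05T22:00:00Z","type":"event_msg","payload":{"type":"task_started"}}',
    '{"timestamp":"2026-06-05T22:00:01Z","type":"response_item","payload":{"type":"function_call","name":"shell","arguments":"{}","call_id":"c1"}}',
    '{"timestamp":"2026-06-05T22:00:02Z","type":"response_item","payload":{"type":"reasoning"}}',
    '{"timestamp":"2026-06-05T22:00:03Z","type":"response_item","payload":{"type":"function_call_output","call_id":"c1","output":"ok"}}',
]
events = normalize_rollout(rollout)
```

Note that the tool result at `seq` 3 links back to the call at `seq` 1 even though a reasoning item sits between them; that is the information a purely temporal parent would lose.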
### 2.3 Grok CLI (xAI) ✅ verified on disk
Grok stores **a directory per session**, which is the richest source of the three.
| Aspect | Finding |
|--------|---------|
| Session dir | `~/.grok/sessions/<url-encoded-cwd>/<session-uuid>/` |
| `chat_history.jsonl` | full conversation; `type` = `system`/`user`/`assistant` + content |
| `events.jsonl` | **structured lifecycle events**: `{ts, type, session_id, turn_number, model_id, yolo_mode, conversation_message_count, session_relationship, schema_version}`; types like `turn_started`, `loop_started` |
| `updates.jsonl` | streaming incremental updates |
| `summary.json` | `{id, cwd, session_summary, created_at, updated_at}` |
| `prompt_context.json` | injected context, incl. which AGENTS.md/CLAUDE.md files were loaded |
| `system_prompt.txt` | exact system prompt for the session |
| `rewind_points.jsonl`, `plan_mode.json` | rewind/plan-mode state |
| Per-cwd prompt history | `~/.grok/sessions/<cwd>/prompt_history.jsonl`: `{timestamp, session_id, prompt, is_bash}` |
| Global structured log | `~/.grok/logs/unified.jsonl`: `{ts, src, pid, lvl, msg, ctx, sid, ver}` |
| Search index | `~/.grok/sessions/session_search.sqlite``session_docs(session_id, cwd, updated_at, title)` + FTS5 (`session_docs_fts`) we can query directly |
| Integration surfaces | Grok exposes **ACP (Agent Client Protocol)**, **headless mode** (`grok -p`), and **hooks** (`~/.grok/docs/user-guide/10-hooks.md`) — push-capture options |
### 2.4 Cross-family summary
| | Claude Code | Codex CLI | Grok CLI |
|--|--|--|--|
| Root | `~/.claude/projects/` | `~/.codex/sessions/` | `~/.grok/sessions/` |
| Unit | one `.jsonl`/session | one `rollout-*.jsonl`/session | one **dir**/session |
| Layout | flat per-cwd dir | date-partitioned `YYYY/MM/DD` | per-cwd, per-session dir |
| Discriminator | `type` | `type` (version-dependent) | `type` (in `chat_history`/`events`) |
| Lifecycle events | inferred from records | inferred from records | **explicit** `events.jsonl` |
| Token usage | `message.usage` | `event_msg` / `token_count` | from events/updates |
| Push capture | Stop/SessionEnd hooks | `codex exec` wrappers | hooks / ACP |
| Pull capture | scan dir by mtime | scan date partitions | scan dirs / query FTS sqlite |
**Implication:** the common denominator is *"JSONL records discriminated by a
`type` field, with a session id, timestamps, turn linkage, tool calls, and token
usage."* That maps cleanly onto one normalized schema (§4). Per-family quirks
(Grok's explicit `events.jsonl`, Codex's schema versions, Claude's sidechains) are
handled inside each adapter.
## 3. Tiered Storage Model
```
Tier 0 SOURCE (agents' own logs) read-only, never mutated
~/.claude/projects ~/.codex/sessions ~/.grok/sessions
│ collector adapters (per family) + ingest cursor
Tier 1 RAW CACHE (bounded, EVICTABLE) normalized Session + Event records
│ signal extractors / digesters
Tier 2 DISTILLED MEMORY (durable, small) session digests + signals + pattern evidence
```
- **Tier 0 — Source.** The agents' own logs. We treat them as read-only. We keep a
small **ingest cursor** per source so re-scans are incremental (see §6).
- **Tier 1 — Raw cache.** Normalized copies of sessions/events. This is the bulky
tier and the *only* tier subject to budget eviction.
- **Tier 2 — Distilled memory.** Per-session **digest** (outcome, costs, tool
histogram, error/retry/intervention markers, key snippets) plus extracted
**signals** and **pattern evidence pointers**. Compact and durable. A session can
be fully evicted from Tier 1 once its Tier 2 digest exists.
This is what makes "drop old content once it has been analyzed" safe: analysis
*promotes* the valuable bits into Tier 2 before the raw bytes are dropped.
### 3.1 Per-session lifecycle / watermarks
Each session row carries timestamps that drive eviction:
```
discovered_at → ingested_at → analyzed_at → [evictable] → evicted_at
```
- `ingested_at` set when normalized into Tier 1.
- `analyzed_at` set when the Tier 2 digest is written. **A session is evictable iff
`analyzed_at` is set.**
- `evicted_at` set when raw bytes are dropped from Tier 1 (Tier 2 digest remains).
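The watermark rules reduce to a small state function. A minimal sketch, assuming timestamps are stored as optional strings; `lifecycle_state` is a hypothetical name.

```python
from typing import Optional

def lifecycle_state(ingested_at: Optional[str],
                    analyzed_at: Optional[str],
                    evicted_at: Optional[str]) -> str:
    """Derive a session's lifecycle state from its watermarks."""
    if evicted_at:
        return "evicted"      # raw bytes dropped, Tier 2 digest remains
    if analyzed_at:
        return "evictable"    # digest exists, raw bytes may be dropped
    if ingested_at:
        return "ingested"     # normalized into Tier 1, not yet analyzed
    return "discovered"

state = lifecycle_state("2026-06-06T08:00:00Z", "2026-06-06T08:05:00Z", None)
```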
## 4. Normalized Schema (Tier 1)
Two record kinds. Field names are stable across all adapters.
### 4.1 `Session`
```jsonc
{
  "session_uid": "claude:17092961-…", // "<flavor>:<native id>", globally unique
  "flavor": "claude|codex|grok",
  "native_session_id": "17092961-…",
  "repo": "agentic-resources", // resolved from cwd
  "domain": "helix_forge", // resolved from repo→domain map
  "cwd": "/home/worsch/agentic-resources",
  "git_branch": "main",
  "model": "claude-opus-4-8",
  "started_at": "2026-06-05T21:59:30Z",
  "ended_at": "2026-06-05T22:14:00Z",
  "outcome": "success|fail|abandoned|unknown",
  "cost": { "input_tokens": 0, "output_tokens": 0, "cache_tokens": 0,
            "wall_clock_s": 0, "turns": 0, "retries": 0 },
  "task_ref": "AGENTIC-WP-0002-T01", // if derivable; else null
  "source_path": "~/.claude/projects/…/….jsonl",
  "source_bytes": 0,
  "schema_version": 1,
  "ingested_at": "…", "analyzed_at": null, "evicted_at": null
}
```
### 4.2 `SessionEvent`
```jsonc
{
  "session_uid": "claude:17092961-…",
  "seq": 12, // monotonic within session
  "parent_seq": 11, // turn DAG (Claude uuid/parentUuid)
  "ts": "2026-06-05T22:01:13Z",
  // kind is exactly one of the values listed in this string
  "kind": "user_msg|assistant_msg|thinking|tool_call|tool_result|error|test_run|edit|retry|human_intervention|decision|lifecycle|completion",
  "role": "user|assistant|system|tool",
  "tool": "Bash|Edit|Read|…", // when kind=tool_call/result
  "summary": "ran pytest -q", // short, human-readable
  "payload_ref": "blob://…", // pointer to full content in Tier 1 blob store
  "tokens": 0,
  "is_sidechain": false
}
```
Adapters map native records onto `kind`. Grok's `events.jsonl` populates
`lifecycle`/`turn` events directly; Claude/Codex lifecycle is inferred from the
record stream. Bulky bodies live behind `payload_ref` so Tier 1 rows stay light
and blobs can be evicted independently.
### 4.3 Native → `kind` mapping (all three families)
Each cell is the native record/discriminator an adapter reads to emit that
`SessionEvent.kind`. `—` = not natively present; the adapter synthesizes or omits.
| `kind` | Claude Code (`type` / `message`) | Codex CLI (`type` → `payload.type`) | Grok CLI (file → `type`) |
|--------|----------------------------------|--------------------------------------|---------------------------|
| `user_msg` | `user`, `message.role=user` | `response_item` → `message` `role=user`/`developer` | `chat_history` → `user` |
| `assistant_msg` | `assistant`, `message.role=assistant`, content `text` | `response_item` → `message` `role=assistant` (`output_text`) | `chat_history` → `assistant` |
| `thinking` | `assistant` content block `type=thinking` | `response_item` → `reasoning` | `chat_history`/`updates` reasoning block |
| `tool_call` | `assistant` content block `type=tool_use` (`name`,`input`) | `response_item` → `function_call` (`name`,`arguments`,`call_id`) | `chat_history`/`updates` tool-call entry |
| `tool_result` | `user`/tool record `type=tool_result` + `toolUseResult` | `response_item` → `function_call_output` (join on `call_id`) | `updates` tool-result entry |
| `test_run` | derived from `tool_call` (Bash running tests) | derived from `function_call` (`exec_command`) | derived from tool-call entry |
| `edit` | `tool_use` where `name` ∈ Edit/Write/NotebookEdit | `function_call` apply-patch/file-write tool | tool-call entry (edit/write) |
| `error` | `toolUseResult` error / non-zero result | `function_call_output` error / `event_msg` error | `events.jsonl` error / failed update |
| `retry` | repeated `tool_use` after error (inferred via DAG) | repeated `function_call` after error (inferred, temporal) | `events.jsonl` loop/retry event |
| `human_intervention` | `user` record mid-turn (interrupt), `userType` | `event_msg` → `user_message` mid-task | `prompt_history` mid-session / `events.jsonl` |
| `decision` | recorded out-of-band (State Hub `/decisions`) | recorded out-of-band (State Hub) | recorded out-of-band (State Hub) |
| `lifecycle` | inferred: first/last record, `summary`, `queue-operation` | `event_msg` → `task_started` / `task_complete` | **`events.jsonl`** → `turn_started`/`loop_started`/… (explicit) |
| `completion` | inferred: last `assistant` + `Stop`/`SessionEnd` hook | `event_msg` → `task_complete` | `events.jsonl` turn end + `summary.json` |
**Linkage note (drives `seq`/`parent_seq`):** Claude has a true turn DAG
(`uuid`/`parentUuid`) — preserve it directly. Codex is **flat**, joined only by
`call_id`; assign `seq` by temporal order. Grok carries explicit `turn_number` in
`events.jsonl`; key `seq` off that plus record order.
**Cost block sources:** Claude `message.usage`; Codex `event_msg/token_count`;
Grok `events.jsonl` / `updates.jsonl` token fields.
## 5. Retention & Eviction
The user's stated preference: **storage-budget-based**, dropping old content once
it has been analyzed or once it no longer fits — *better than* a fixed daily/weekly
window. We implement budget-based as primary, with a time backstop and a scheduled
cadence as the trigger.
### 5.1 Configurable knobs
```toml
[session_memory.retention]
raw_soft_cap_bytes = "4GiB" # begin evicting analyzed sessions above this
raw_hard_cap_bytes = "6GiB" # absolute ceiling for Tier 1
raw_max_age_days = 45 # backstop: analyzed raw older than this is evictable regardless of space
distilled_cap_bytes = "1GiB" # Tier 2 ceiling (should grow slowly; alert, don't auto-drop)
cadence = "daily" # ingest+analyze+evict sweep: daily | weekly | on-hook
```
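The size knobs are written as strings such as `"4GiB"`, so something has to turn them into byte counts. A minimal sketch of one way to do that, assuming binary (IEC) units; `parse_size` is a hypothetical helper, and the document does not specify how the real loader parses these values.

```python
import re

# Assumed unit table: binary (IEC) multiples only.
_UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}

def parse_size(value: str) -> int:
    """Convert a string like '4GiB' into a byte count."""
    m = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGT]iB|B)\s*", value)
    if not m:
        raise ValueError(f"unrecognized size: {value!r}")
    return int(float(m.group(1)) * _UNITS[m.group(2)])

retention = {
    "raw_soft_cap_bytes": "4GiB",
    "raw_hard_cap_bytes": "6GiB",
    "distilled_cap_bytes": "1GiB",
}
caps = {key: parse_size(text) for key, text in retention.items()}

# Sanity check worth running at load time: the soft cap must sit below the hard cap.
caps_are_ordered = caps["raw_soft_cap_bytes"] < caps["raw_hard_cap_bytes"]
```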
### 5.2 Eviction algorithm (runs after each ingest+analyze sweep)
1. **Compute** current Tier 1 usage.
2. **Backstop pass:** evict any session where `analyzed_at` is set AND
`age > raw_max_age_days`.
3. **Budget pass:** while `usage > raw_soft_cap_bytes`:
- pick the **oldest `analyzed_at`** session that is not yet evicted;
- drop its Tier 1 raw rows + blobs (Tier 2 digest is kept), set `evicted_at`;
- if **no analyzed-but-unevicted session remains**, stop the budget pass
(we will not destroy un-analyzed data to free space) and go to step 4.
4. **Back-pressure / overflow:** if `usage > raw_hard_cap_bytes` and the only
remaining bulk is **un-analyzed**:
- first try to **analyze now** (run extraction) to make those sessions
evictable, then re-run the budget pass;
- if still over hard cap (analysis can't keep up or fails), evict the **oldest
un-analyzed** sessions as a last resort and emit a
`session_memory.data_loss` warning event + a State Hub progress note. This is
the only path that loses un-analyzed data, and it is always reported.
5. **Tier 2 guard:** if distilled usage > `distilled_cap_bytes`, **do not
auto-drop**; flag for human/curation review (digests are the product).
**Invariant:** *no session's raw bytes are dropped before its Tier 2 digest
exists, except the explicitly-reported hard-cap overflow path.*
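Steps 1 to 4 can be sketched in memory as follows. This is an illustration of the algorithm's shape, not the planned `retention.py`: `Row`, `sweep`, and the `analyze` callback are hypothetical, timestamps are plain integers, and the Tier 2 guard (step 5) is left out because it only raises a flag.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Row:
    uid: str
    raw_bytes: int
    age_days: int
    analyzed_at: Optional[int] = None  # any sortable timestamp; None = not analyzed
    evicted: bool = False

def sweep(rows, soft_cap, hard_cap, max_age_days, analyze=None):
    """Run backstop, budget and overflow passes; return data-loss warnings."""
    warnings = []

    def usage():  # step 1: current Tier 1 usage
        return sum(r.raw_bytes for r in rows if not r.evicted)

    def budget_pass():  # step 3: oldest analyzed first, never un-analyzed
        while usage() > soft_cap:
            ready = [r for r in rows if r.analyzed_at is not None and not r.evicted]
            if not ready:
                break
            min(ready, key=lambda r: r.analyzed_at).evicted = True

    for r in rows:  # step 2: time backstop for analyzed sessions
        if r.analyzed_at is not None and r.age_days > max_age_days:
            r.evicted = True
    budget_pass()

    if usage() > hard_cap:  # step 4: back-pressure / overflow
        if analyze is not None:
            for r in rows:
                if r.analyzed_at is None and not r.evicted:
                    analyze(r)  # expected to set r.analyzed_at
            budget_pass()
        while usage() > hard_cap:
            pending = [r for r in rows if r.analyzed_at is None and not r.evicted]
            if not pending:
                break
            victim = max(pending, key=lambda r: r.age_days)  # oldest un-analyzed
            victim.evicted = True
            warnings.append(("session_memory.data_loss", victim.uid))
    return warnings

rows = [
    Row("a", 3, age_days=50, analyzed_at=1),
    Row("b", 3, age_days=10, analyzed_at=2),
    Row("c", 3, age_days=5),
    Row("d", 3, age_days=20),
]
warnings = sweep(rows, soft_cap=4, hard_cap=5, max_age_days=45)
survivors = [r.uid for r in rows if not r.evicted]
```

In this run `a` falls to the backstop, `b` to the budget pass, and `d` is the single reported loss; passing an `analyze` callback that succeeds would avoid the loss entirely.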
### 5.3 Why budget-based beats fixed-window
A fixed daily/weekly drop either deletes data we never analyzed (lossy) or hoards
data we already distilled (wasteful). Budget + `analyzed_at` watermark ties
deletion to **two** real conditions the user named — *"once it has been analyzed"*
(promoted to Tier 2) and *"doesn't fit any longer"* (over budget) — and only falls
back to time as a backstop.
## 6. Ingest Cursors (incremental, idempotent)
Per source, persist a small cursor so sweeps are cheap and re-runnable:
- **Claude / Grok (per-cwd dirs):** track `(file_path, size, mtime)` and last
parsed line offset; re-ingest only grown/changed files. `session_uid` dedupes.
- **Codex (date partitions):** track last-seen `YYYY/MM/DD` + per-file offset.
- Ingest is **idempotent** keyed on `(session_uid, seq)` — safe to re-run after a
crash or partial sweep.
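
A minimal sketch of the two ideas above, assuming a SQLite table keyed on `(session_uid, seq)`: a cursor check that skips unchanged files, and an upsert that makes re-ingest harmless. The `events` table layout and both function names are illustrative assumptions, not the schema in `core/store.py`.

```python
import sqlite3


def needs_ingest(cursor_row, size, mtime):
    """True when a file is new or its (size, mtime) differs from the stored cursor."""
    if cursor_row is None:
        return True
    return (cursor_row["size"], cursor_row["mtime"]) != (size, mtime)


def upsert_events(conn, session_uid, events):
    """Write (seq, body) pairs; re-running with the same keys replaces, never duplicates."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        " session_uid TEXT NOT NULL,"
        " seq INTEGER NOT NULL,"
        " body TEXT,"
        " PRIMARY KEY (session_uid, seq))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO events (session_uid, seq, body) VALUES (?, ?, ?)",
        [(session_uid, seq, body) for seq, body in events],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
upsert_events(conn, "claude:abc-123", [(0, "first"), (1, "second")])
upsert_events(conn, "claude:abc-123", [(0, "first"), (1, "second")])  # safe re-run
```

`INSERT OR REPLACE` gives last-writer-wins semantics per key, which is why a crashed sweep can simply be started again from the last persisted cursor.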
## 7. Capture Modes
- **Pull (default, portable):** scheduled sweep scans Tier 0 by mtime/partition.
Works for all three families with zero coupling to the agent. Triggered on the
configured `cadence` via the repo's scheduler (`/schedule`, cron, or `/loop`).
- **Push (optional, low-latency):** wire the agent's own hooks to ping the ingester
on session close — Claude `Stop`/`SessionEnd` hooks, Grok hooks/ACP, Codex
`exec` wrappers. Push just enqueues; the same idempotent pull pipeline does the
work.
Capture must be **non-blocking** (PRD FR-C5): we read copies of logs out-of-band;
we never sit in the agent's critical path.
## 8. Component Layout (proposed, in-repo)
```
session-memory/
adapters/
claude.py # Tier0→Tier1 normalizer (verified schema)
codex.py # version-detecting normalizer (confirm against real rollout)
grok.py # reads session dir incl. events.jsonl
core/
schema.py # Session / SessionEvent dataclasses + versioning
store.py # Tier1 (rows+blobs) and Tier2 (digests) — SQLite to start
cursor.py # per-source ingest cursors
retention.py # §5 eviction algorithm
digest.py # Tier1→Tier2 session digest + signal stubs
ingest.py # one sweep: discover → normalize → analyze → evict
config.toml # §5.1 knobs + repo→domain map + source paths
```
Storage starts as **SQLite + a blob dir** (rows in SQLite, bulky payloads as files
under `payload_ref`); graduate to Postgres alongside the State Hub only if volume
demands. Digests/decisions are also surfaced to the hub per ADR-001 (files-first;
hub indexes).
## 9. Privacy / Safety
- Tier 0 logs can contain secrets (the Grok `auth.json` and Claude `.credentials`
live in the same trees). The ingester reads **only** session transcripts, never
credential files, and **redacts** obvious secret patterns before writing
`payload_ref` blobs.
- All data is local; nothing leaves the workstation. Eviction of Tier 1 is a real
delete (not just an index drop) so the bounded cache is also a privacy bound.
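
To make "obvious secret patterns" concrete, here is a small redaction sketch. The two regular expressions and the `redact` helper are assumptions chosen for illustration; the patterns the ingester actually applies are not specified in this document, and a real list would likely be longer.

```python
import re

# Hypothetical patterns: key/value style secrets and HTTP bearer tokens.
_KEY_VALUE = re.compile(
    r"\b(api[_-]?key|token|secret|password)\b(\s*[=:]\s*)(\S+)",
    re.IGNORECASE,
)
_BEARER = re.compile(r"\bBearer\s+[A-Za-z0-9._~+/-]+=*")


def redact(text: str) -> str:
    """Replace the secret value while keeping the key, so logs stay readable."""
    text = _BEARER.sub("Bearer [REDACTED]", text)
    return _KEY_VALUE.sub(lambda m: m.group(1) + m.group(2) + "[REDACTED]", text)


sample = redact("API_KEY=sk-12345 ok")
```

Pattern-based redaction only catches secrets that look like secrets; it complements, and does not replace, the rule that credential files are never read in the first place.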
## 10. Open Questions
- ~~**OQ1** Confirm Codex `rollout-*.jsonl` per-line schema.~~ **Resolved** (§2.2):
`{timestamp,type,payload}` lines, `type` ∈ `session_meta`/`response_item`/`event_msg`/`turn_context`/`compacted`,
tool calls flat-linked by `call_id`, tokens via `event_msg/token_count`. Remaining
sub-item: verify the `token_count` payload field names against a real install when
Codex is present (older-version variance only).
- **OQ2** Outcome inference: how do we reliably label `success/fail/abandoned`
across flavors (exit signals differ)? Start heuristic (last-turn + test results +
human-intervention markers), refine in Detect phase.
- **OQ3** `task_ref` resolution — can we always map a session to a workplan task
(via cwd + branch + state-hub), or only sometimes?
- ~~**OQ4** Right default for `raw_soft_cap_bytes`.~~ **Measured** (Phase 0, 85
real local Claude files / 63 distinct sessions): source bytes per session
min 396 · **median ~49 KB** · max 48 MB (one outlier) · ~103 MB total. Claude
defaults (4 GiB soft / 6 GiB hard) leave ample headroom; revisit once Grok dirs
(heavier, multi-file) are ingested in Phase 1.
- **OQ6 (new, found in Phase 0)** Multi-file sessions: ~84 transcript files mapped
to ~63 `session_uid`s — some sessions span multiple files (resume/sidechain
sharing a `sessionId`). Current behavior upserts (last file wins per
`(session_uid, seq)`); a future refinement is to *merge* events across files of
one session rather than overwrite. Acceptable for Phase 0.
- **OQ5** Should push-hooks be opt-in per machine to avoid surprising the agents?
---
## 11. Project metrics correlation (kaizen-agentic)
Helix Forge owns **fleet-level** session capture and digests (this repo). The
**kaizen-agentic** framework owns **project-scoped** agent execution metrics
(ADR-004: `.kaizen/metrics/<agent>/executions.jsonl`). The two layers correlate
by optional `helix_session_uid` on project records — link-by-reference, no
duplicate ingestion in either repo.
| Layer | Owner | Storage |
|-------|-------|---------|
| Fleet | agentic-resources (Helix Forge) | digest store (`digests` table) |
| Project | kaizen-agentic | `.kaizen/metrics/<agent>/executions.jsonl` |
**Cross-repo contract:** [Helix Forge Correlation Contract](https://gitea.coulomb.social/coulomb/kaizen-agentic/src/branch/main/docs/integrations/helix-forge-correlation.md)
(kaizen-agentic). Field mapping: `Session.session_uid` → `helix_session_uid`,
`digest.cost` → `tokens`, `tool_histogram` MCP share → `infra_overhead_share`.
**Read path:** `kaizen-agentic metrics correlate <uid>` looks up a digest via
`HELIX_STORE_DB` (this repo's session store). No write path from kaizen-agentic
into Helix Forge.
**Related kaizen-agentic docs:** [ADR-004 project metrics convention](https://gitea.coulomb.social/coulomb/kaizen-agentic/src/branch/main/docs/adr/ADR-004-project-metrics-convention.md),
[wiki/EcosystemIntegration.md](https://gitea.coulomb.social/coulomb/kaizen-agentic/src/branch/main/wiki/EcosystemIntegration.md).
### 11.1 Session-close env export (dual-layer agents)
Agents that run **both** Helix Forge capture and kaizen `metrics record` should
export the following **after** the ingest sweep has written the session digest
(`python -m session_memory.ingest` or an equivalent Stop/SessionEnd hook). Names
match kaizen-agentic ADR-004 — do not invent parallel aliases.
| Variable | Source in Helix Forge | Purpose |
|----------|----------------------|---------|
| `HELIX_SESSION_UID` | `Session.session_uid` | Primary correlation key → `helix_session_uid` |
| `HELIX_REPO` | `digest.repo` | Project/repo scoping |
| `HELIX_FLAVOR` | `digest.flavor` | Agent runtime (`claude` / `codex` / `grok`) |
| `HELIX_TOKENS` | `digest.cost.input_tokens + digest.cost.output_tokens` | Token rollup → `tokens` |
| `HELIX_INFRA_OVERHEAD_SHARE` | infra bucket share over `tool_histogram` (see `measure.metrics.session_metrics`) | MCP/plumbing overhead → `infra_overhead_share` |
Example (after digest exists):
```bash
export HELIX_SESSION_UID="claude:abc-123"
export HELIX_REPO="agentic-resources"
export HELIX_FLAVOR="claude"
export HELIX_TOKENS=125000
export HELIX_INFRA_OVERHEAD_SHARE=0.117
# optional — lets kaizen correlate without guessing the store location:
export HELIX_STORE_DB="$(pwd)/session_memory/.store/mem.db"
kaizen-agentic metrics record # merges HELIX_* when present
```
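
The values in the example can be derived from a digest rather than typed by hand. The sketch below assumes a digest dict with the fields listed in the table above; `helix_env`, `export_lines`, and the `mcp__` prefix used to decide what counts as "infra" are illustrative assumptions (the real bucketing lives in `measure.metrics.session_metrics`).

```python
import shlex


def helix_env(digest, infra_prefixes=("mcp__",)):
    """Build the HELIX_* variables for one session digest."""
    histogram = digest.get("tool_histogram", {})
    total_calls = sum(histogram.values())
    infra_calls = sum(
        count for name, count in histogram.items()
        if name.startswith(tuple(infra_prefixes))  # assumed infra bucket rule
    )
    cost = digest.get("cost", {})
    share = infra_calls / total_calls if total_calls else 0.0
    return {
        "HELIX_SESSION_UID": digest["session_uid"],
        "HELIX_REPO": digest["repo"],
        "HELIX_FLAVOR": digest["flavor"],
        "HELIX_TOKENS": str(cost.get("input_tokens", 0) + cost.get("output_tokens", 0)),
        "HELIX_INFRA_OVERHEAD_SHARE": f"{share:.3f}",
    }


def export_lines(env):
    """Render shell `export` lines with safe quoting."""
    return [f"export {key}={shlex.quote(value)}" for key, value in env.items()]


example_digest = {
    "session_uid": "claude:abc-123",
    "repo": "agentic-resources",
    "flavor": "claude",
    "cost": {"input_tokens": 100000, "output_tokens": 25000},
    "tool_histogram": {"Bash": 15, "mcp__hub__note": 2},
}
env = helix_env(example_digest)
```

A Stop/SessionEnd hook could `eval` the output of `export_lines(env)` before calling `kaizen-agentic metrics record`, so the variable names stay in one place.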
### 11.2 Digest store location and read API
- **`HELIX_STORE_DB`** — absolute path to the SQLite file holding Tier 2 digests.
Defaults to `config.toml` `[store].db_path` (`session_memory/.store/mem.db` relative
to the repo root). Export as an absolute path when setting the variable on session
close so `metrics correlate` works across hosts and working directories.
- **Thin CLI** — `python -m session_memory.digest_lookup <session_uid> [--json]`
prints one digest without running ingest. Exit `0` on hit, `1` when missing.
- **Programmatic** — `Store.get_digest(session_uid)` returns the JSON blob written
by `build_digest` / `analyze`.
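
A read-only lookup along these lines could look as follows. It is a sketch, not the shipped `digest_lookup` module: the `digests` table is named earlier in this section, but the column names (`session_uid`, `body`) and both function names are assumptions.

```python
import json
import os
import sqlite3
from pathlib import Path

DEFAULT_DB = "session_memory/.store/mem.db"


def get_digest(session_uid, db_path=None):
    """Return the digest dict for one session, or None when it is missing."""
    path = Path(db_path or os.environ.get("HELIX_STORE_DB", DEFAULT_DB)).resolve()
    # mode=ro guarantees the consumer cannot write into the store.
    conn = sqlite3.connect(path.as_uri() + "?mode=ro", uri=True)
    try:
        row = conn.execute(
            "SELECT body FROM digests WHERE session_uid = ?",  # assumed columns
            (session_uid,),
        ).fetchone()
    finally:
        conn.close()
    return json.loads(row[0]) if row else None


def lookup_exit_code(session_uid, db_path=None):
    """Mirror the CLI contract: print JSON and return 0 on hit, 1 when missing."""
    digest = get_digest(session_uid, db_path)
    if digest is None:
        return 1
    print(json.dumps(digest))
    return 0
```

Opening the database through a `mode=ro` URI enforces the "no write path" rule at the connection level instead of relying on callers to behave.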
**Stable digest JSON shape** (fields consumers may rely on):
| Field | Type | Notes |
|-------|------|-------|
| `session_uid` | string | Normalized uid (`<flavor>:<native-id>`) |
| `flavor`, `repo`, `domain` | string | Session attribution |
| `model` | string | Model id when known |
| `started_at`, `ended_at` | string | ISO timestamps |
| `outcome` | string | `success` / `fail` / `abandoned` / `unknown` |
| `cost` | object | `input_tokens`, `output_tokens`, `cache_tokens`, `wall_clock_s`, `turns`, `retries` |
| `tool_histogram` | object | Tool name → call count |
| `event_count`, `kind_counts`, `markers` | object/int | Compact activity summary |
| `first_prompt`, `last_assistant` | string | Short text snippets |
| `error_snippets` | array | `{fingerprint, sample, count, tool}` entries |
| `schema_version` | int | Digest schema version |
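
Consumers that rely on this shape may want a cheap check before using a digest. The sketch below validates only the fields whose types are unambiguous in the table; `shape_problems` is a hypothetical helper, and fields that may be absent or loosely typed (`model`, `markers`, the text snippets) are deliberately left out.

```python
EXPECTED_TYPES = {
    "session_uid": str,
    "flavor": str,
    "repo": str,
    "domain": str,
    "outcome": str,
    "cost": dict,
    "tool_histogram": dict,
    "error_snippets": list,
    "schema_version": int,
}
OUTCOMES = {"success", "fail", "abandoned", "unknown"}


def shape_problems(digest):
    """Return human-readable problems; an empty list means the shape looks fine."""
    problems = []
    for name, expected in EXPECTED_TYPES.items():
        if name not in digest:
            problems.append(f"missing: {name}")
        elif not isinstance(digest[name], expected):
            problems.append(f"wrong type: {name}")
    outcome = digest.get("outcome")
    if isinstance(outcome, str) and outcome not in OUTCOMES:
        problems.append(f"unknown outcome: {outcome}")
    return problems


good = {
    "session_uid": "claude:abc-123", "flavor": "claude", "repo": "agentic-resources",
    "domain": "helix_forge", "outcome": "success", "cost": {}, "tool_histogram": {},
    "error_snippets": [], "schema_version": 1,
}
```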
---
*Implemented:* Phases 0–4, weekly retro ([AGENTIC-WP-0002]–[AGENTIC-WP-0010]);
kaizen correlation follow-up ([AGENTIC-WP-0011]).
## Sources
- Claude Code session format — verified on disk: `~/.claude/projects/*/*.jsonl`, `~/.claude/history.jsonl`.
- Grok CLI session format — verified on disk: `~/.grok/sessions/`, `~/.grok/logs/unified.jsonl`, `~/.grok/sessions/session_search.sqlite`; `~/.grok/README.md` (ACP/headless/hooks).
- Codex CLI session format — [ccusage Codex guide](https://ccusage.com/guide/codex/), [Codex advanced config](https://developers.openai.com/codex/config-advanced), [codex-trace](https://github.com/PixelPaw-Labs/codex-trace), [codex-logs](https://github.com/wondercoms/codex-logs), [Session/Rollout Files discussion #3827](https://github.com/openai/codex/discussions/3827), [trajectory-JSON issue #2288](https://github.com/openai/codex/issues/2288).

docs/PRD-helix-forge.md Normal file

@@ -0,0 +1,319 @@
# Product Requirements Document — Helix Forge
**Domain:** helix_forge
**Repo:** agentic-resources
**Status:** Draft v0.1
**Author:** Claude (drafted with Bernd Worsch)
**Created:** 2026-06-06
**Updated:** 2026-06-19
---
## 1. Summary
Helix Forge is a system for **handling a collection of repositories and evolving
the utility of what those repositories provide**, by treating the coding sessions
run against them as a first-class data source.
Concretely: across a fleet of repos worked on by multiple coding agents (Claude,
Codex, GrokBuild), Helix Forge **inspects the sessions**, **collects data about the
problems agents hit and the moves that resolved them**, and turns that data into
**reusable solution patterns** that can be discussed, implemented, and re-applied —
across every agent flavor, not just the one that discovered the pattern.
The name is the metaphor: a *helix* of repeated turns (session → pattern → improved
session) feeding a *forge* where the tooling, environments, and instructions for our
agents are hammered into better shape over time. This is the operational engine
behind the INTENT.md goal of an *antifragile, continuously-optimizing agentic
ecosystem*.
## 2. Problem Statement
We run many coding sessions, across many repos, with several different agents. Today
the value of each session is **trapped in that session**:
- When an agent solves a tricky problem, the solution is not captured in a form
another agent (or the same agent next week) can reuse.
- When an agent fails, struggles, or burns excess budget on a problem, that failure
signal is lost — we re-encounter the same friction repeatedly.
- Each agent flavor (Claude, Codex, GrokBuild) has its own environment, instruction
format, and extension mechanism, so a fix discovered for one is **not portable** to
the others without manual translation.
- We have no systematic, evidence-based answer to "what is actually slowing our
agents down, and what consistently makes them faster?" — decisions about tooling,
prompts, and environments are made on anecdote.
**The cost:** repeated mistakes, non-transferable wins, slow and uneven improvement
of agent performance, and no feedback loop from real session data back into the
tools/environments/instructions that shape future sessions.
## 3. Goals & Non-Goals
### 3.1 Goals
| # | Goal |
|---|------|
| G1 | **Capture** coding sessions from Claude, Codex, and GrokBuild in a normalized, comparable form. |
| G2 | **Detect** recurring *problem patterns* (failure, friction, wasted budget) and *success patterns* (efficient resolutions) from that data. |
| G3 | **Curate** detected patterns into a reviewed catalog of *solution patterns* that humans and agents can discuss and approve. |
| G4 | **Distribute** approved patterns back into agent environments — as instructions, tools, or extensions — in a per-flavor-appropriate form. |
| G5 | **Measure** whether distributed patterns actually improved subsequent sessions (close the loop). |
| G6 | Keep the whole loop **agent-flavor-agnostic at the core**, with thin per-flavor adapters at the edges. |
### 3.2 Non-Goals (initial release)
- Not a replacement for the coding agents themselves; Helix Forge observes and
improves them, it does not execute coding tasks.
- Not a general APM/observability product; scope is coding-session improvement, not
arbitrary infrastructure monitoring.
- Not an autonomous self-modifying system — pattern promotion into live agent
environments requires human approval (HITL) for the first release.
- Not building new model training/fine-tuning pipelines; we optimize *context,
tooling, and environment*, not model weights.
- Not replacing the Custodian State Hub; Helix Forge is a producer/consumer of hub
state, not a competing system of record. (See §9.)
## 4. Users & Personas
| Persona | Description | What they need from Helix Forge |
|---------|-------------|----------------------------------|
| **Operator (Bernd)** | Owns the agentic ecosystem; decides which patterns become standards. | A reviewable catalog of patterns with evidence; control over what ships to agents. |
| **Coding agent (Claude / Codex / GrokBuild)** | Runs tasks in a repo; both the *source* of session data and the *consumer* of patterns. | To emit session data cheaply; to receive applicable patterns in its native format at session start. |
| **Repo maintainer agent** | The per-repo agent persona (e.g. `agentic-resources`) following AGENTS.md conventions. | Patterns scoped to its repo/domain; integration via existing workplan + state-hub flow. |
| **Reviewer (human or kaizen agent)** | Evaluates candidate patterns before they become standards. | Clear pattern proposals, supporting evidence, and a discuss/approve/reject workflow. |
## 5. Core Concepts (Domain Model)
- **Session** — one bounded run of a coding agent against a repo. Has an agent flavor,
repo, task reference, timeline of events, outcome, and cost (tokens/time).
- **Session Event** — a normalized atomic record within a session: tool call, edit,
test run, error, retry, human intervention, decision, completion.
- **Signal** — a derived indicator extracted from sessions: e.g. *repeated test
failure on same file*, *budget overrun*, *fast clean resolution*, *retry storm*,
*human escalation*.
- **Problem Pattern** — a recurring negative signal cluster ("agents repeatedly fail
X because Y").
- **Success Pattern** — a recurring positive resolution ("doing Z reliably resolves X
cheaply").
- **Solution Pattern** — a curated, reviewed artifact pairing a problem with one or
more recommended resolutions, written agent-flavor-agnostically, with per-flavor
rendering hints.
- **Pattern Application** — the act of distributing a solution pattern into a specific
agent environment (an instruction snippet, a tool, an extension), plus the record of
its effect on later sessions.
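
The first two concepts map naturally onto plain dataclasses. The sketch below is illustrative only: the field names are assumptions based on the descriptions above, not the definitions in `core/schema.py`.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Cost:
    input_tokens: int = 0
    output_tokens: int = 0
    wall_clock_s: float = 0.0
    retries: int = 0


@dataclass
class SessionEvent:
    seq: int            # position within the session timeline
    kind: str           # e.g. "tool_call", "edit", "test_run", "error", "retry"
    summary: str = ""


@dataclass
class Session:
    session_uid: str                # assumed form: "<flavor>:<native-id>"
    flavor: str                     # "claude" / "codex" / "grok"
    repo: str
    task_ref: Optional[str] = None  # workplan task, when it can be resolved
    outcome: str = "unknown"        # success / fail / abandoned / unknown
    cost: Cost = field(default_factory=Cost)
    events: list[SessionEvent] = field(default_factory=list)

    def kind_counts(self) -> dict:
        """Compact activity summary: how many events of each kind occurred."""
        return dict(Counter(event.kind for event in self.events))


session = Session("claude:abc-123", "claude", "agentic-resources")
session.events += [
    SessionEvent(0, "tool_call"),
    SessionEvent(1, "error"),
    SessionEvent(2, "tool_call"),
]
```

Signals and patterns are then derived data over many such sessions rather than additional fields on the session itself.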
## 6. Functional Requirements
### 6.1 Capture (G1)
- **FR-C1** Ingest session transcripts/logs from each supported agent flavor via a
per-flavor **collector adapter**.
- **FR-C2** Normalize raw logs into the common `Session` + `Session Event` schema,
regardless of source flavor.
- **FR-C3** Tag every session with: agent flavor, repo, domain, task/workplan id (if
any), outcome (success/fail/abandoned), and cost metrics (tokens, wall-clock,
retries).
- **FR-C4** Support both **batch import** (historical logs) and **incremental ingest**
(new sessions as they close).
- **FR-C5** Collection must be low-friction and non-blocking — an agent emitting
session data must never slow or break the actual coding task.
### 6.2 Detect (G2)
- **FR-D1** Run signal extractors over normalized sessions to surface problem and
success signals.
- **FR-D2** Cluster recurring signals across sessions/repos/flavors into candidate
Problem Patterns and Success Patterns.
- **FR-D3** For each candidate pattern, attach **evidence**: the supporting sessions,
frequency, affected repos, affected flavors, and estimated cost impact.
- **FR-D4** Flag **cross-flavor** patterns explicitly (a problem seen in Claude that
Codex also hits) — these are the highest-value reuse targets.
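
FR-D2 to FR-D4 can be illustrated with a deliberately simple grouping: signals sharing a type and a locus form one candidate, and a candidate whose evidence spans more than one flavor is flagged. The signal dict layout and the `cluster` function are assumptions for illustration; real clustering may be fuzzier than exact key matching.

```python
from collections import defaultdict


def cluster(signals, min_frequency=2):
    """Group signals by (type, locus) and return ranked candidate patterns."""
    groups = defaultdict(list)
    for signal in signals:
        groups[(signal["type"], signal["locus"])].append(signal)

    candidates = []
    for (signal_type, locus), members in groups.items():
        if len(members) < min_frequency:
            continue  # too rare to count as a recurring pattern
        flavors = sorted({m["flavor"] for m in members})
        candidates.append({
            "type": signal_type,
            "locus": locus,
            "frequency": len(members),
            "sessions": sorted({m["session_uid"] for m in members}),  # evidence
            "flavors": flavors,
            "cross_flavor": len(flavors) > 1,
        })
    # Cross-flavor candidates first, then by descending frequency.
    candidates.sort(key=lambda c: (not c["cross_flavor"], -c["frequency"]))
    return candidates


signals = [
    {"type": "retry_storm", "locus": "tests/test_api.py", "flavor": "claude", "session_uid": "claude:1"},
    {"type": "retry_storm", "locus": "tests/test_api.py", "flavor": "codex", "session_uid": "codex:7"},
    {"type": "budget_overrun", "locus": "build", "flavor": "claude", "session_uid": "claude:2"},
    {"type": "budget_overrun", "locus": "build", "flavor": "claude", "session_uid": "claude:3"},
    {"type": "budget_overrun", "locus": "build", "flavor": "claude", "session_uid": "claude:4"},
    {"type": "human_escalation", "locus": "deploy", "flavor": "grok", "session_uid": "grok:9"},
]
ranked = cluster(signals)
```

Ranking cross-flavor candidates ahead of more frequent single-flavor ones reflects the stated priority: a problem two agents share is a stronger reuse target than one agent's frequent habit.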
### 6.3 Curate (G3)
- **FR-U1** Present candidate patterns for review with their evidence in a
discuss/approve/reject workflow.
- **FR-U2** Allow a reviewer (human or kaizen agent) to promote a candidate into a
**Solution Pattern**: a named, versioned artifact with problem description,
recommended resolution(s), applicability scope, and per-flavor rendering hints.
- **FR-U3** Maintain a **Pattern Catalog** as the source of truth for approved
solution patterns, versioned and stored as files in-repo (consistent with ADR-001:
files originate work, the hub indexes them).
- **FR-U4** Record pattern decisions through the State Hub decision mechanism so
rationale is auditable.
### 6.4 Distribute (G4)
- **FR-X1** Render each approved solution pattern into per-flavor artifacts via
**distributor adapters**:
- Claude → `CLAUDE.md` snippets, skills, or settings/hooks.
- Codex → `AGENTS.md` snippets / repo conventions.
- GrokBuild → its native instruction/extension format.
- **FR-X2** Scope distribution by repo and domain, so a pattern only lands where it
applies.
- **FR-X3** Distribution is **proposed, not auto-applied** in v1 — output is a
reviewable change (e.g. a workplan or PR), gated by human approval.
- **FR-X4** Track which patterns are currently active in which environments.
### 6.5 Measure (G5)
- **FR-M1** After a pattern is applied, compare subsequent sessions touching the same
signal against the pre-application baseline (cost, retry rate, success rate,
human-intervention rate).
- **FR-M2** Surface per-pattern **effectiveness** so ineffective patterns can be
revised or retired.
- **FR-M3** Provide a fleet-level view: are sessions across the collection getting
cheaper / more reliable over time? (the helix turning.)
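
A before/after comparison in the spirit of FR-M1 can be sketched as below. The per-session dict layout (`tokens`, `outcome`, `retries`) and the `effect` function are illustrative assumptions; a real measurement would also need to control for which sessions actually touched the pattern's signal.

```python
from statistics import median


def summarize(sessions):
    """Aggregate one group of sessions into comparable metrics."""
    count = len(sessions)
    return {
        "median_tokens": median(s["tokens"] for s in sessions),
        "success_rate": sum(s["outcome"] == "success" for s in sessions) / count,
        "retry_rate": sum(s["retries"] for s in sessions) / count,
    }


def effect(before, after):
    """Return after-minus-before deltas; negative cost deltas are improvements."""
    base, post = summarize(before), summarize(after)
    return {metric: post[metric] - base[metric] for metric in base}


before = [
    {"tokens": 100, "outcome": "success", "retries": 2},
    {"tokens": 200, "outcome": "fail", "retries": 4},
    {"tokens": 300, "outcome": "fail", "retries": 2},
    {"tokens": 400, "outcome": "success", "retries": 0},
]
after = [
    {"tokens": 100, "outcome": "success", "retries": 1},
    {"tokens": 100, "outcome": "success", "retries": 1},
    {"tokens": 200, "outcome": "fail", "retries": 2},
    {"tokens": 200, "outcome": "success", "retries": 0},
]
delta = effect(before, after)
```

Using the median for cost keeps a single outlier session (such as the 48 MB one noted in the design doc) from dominating the comparison.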
### 6.6 Multi-Agent Support (G6)
- **FR-A1** The core schema, detection, catalog, and measurement are **flavor-agnostic**.
- **FR-A2** All flavor-specific knowledge lives in **collector adapters** (input) and
**distributor adapters** (output). Adding a fourth agent = adding one collector +
one distributor, no core changes.
- **FR-A3** A successful pattern discovered via one flavor MUST be expressible for all
other supported flavors.
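
One way to express FR-A2 in code is a pair of structural interfaces plus a registry, so that adding a flavor never touches core logic. Everything named below (`Collector`, `Distributor`, `Registry`, and the toy adapter classes) is a hypothetical sketch, not the repo's actual adapter API.

```python
import json
from typing import Iterable, Protocol


class Collector(Protocol):
    flavor: str
    def collect(self, raw_lines: Iterable[str]) -> list[dict]: ...


class Distributor(Protocol):
    flavor: str
    def render(self, pattern: dict) -> str: ...


class Registry:
    """Core-side lookup; it knows nothing flavor-specific."""

    def __init__(self):
        self.collectors = {}
        self.distributors = {}

    def register(self, collector: Collector, distributor: Distributor) -> None:
        if collector.flavor != distributor.flavor:
            raise ValueError("collector and distributor must share a flavor")
        self.collectors[collector.flavor] = collector
        self.distributors[distributor.flavor] = distributor

    def render_everywhere(self, pattern: dict) -> dict:
        """FR-A3: one agnostic pattern, one rendering per registered flavor."""
        return {flavor: d.render(pattern) for flavor, d in self.distributors.items()}


class JsonlCollector:  # toy adapter: each line is one JSON event
    def __init__(self, flavor):
        self.flavor = flavor

    def collect(self, raw_lines):
        return [json.loads(line) for line in raw_lines if line.strip()]


class MarkdownDistributor:  # toy adapter: renders a pattern as a Markdown snippet
    def __init__(self, flavor, target):
        self.flavor, self.target = flavor, target

    def render(self, pattern):
        return f"<!-- {self.target} -->\n## {pattern['title']}\n{pattern['resolution']}\n"


registry = Registry()
registry.register(JsonlCollector("claude"), MarkdownDistributor("claude", "CLAUDE.md"))
registry.register(JsonlCollector("codex"), MarkdownDistributor("codex", "AGENTS.md"))
rendered = registry.render_everywhere({"title": "Pin test deps", "resolution": "Use the lockfile."})
```

A fourth agent would be two more small classes and one `register` call, which is the property FR-A2 asks for.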
## 7. Architecture Overview
```
┌──────────── per-flavor edges ────────────┐ ┌──── flavor-agnostic core ────┐
│ │ │ │
Claude ─┐ │ │ │
Codex ─┼─► Collector Adapters ──► Normalizer ─┼────────►│ Session + Event Store │
Grok ─┘ │ │ │ │
│ │ ▼ │
│ │ Signal Extractors │
│ │ │ │
│ │ ▼ │
│ │ Pattern Detector / Clusterer│
│ │ │ │
│ │ ▼ │
│ │ Curation + Pattern Catalog │ ◄─ reviewer (human/kaizen)
│ │ │ │
Claude ◄┐ │ │ ▼ │
Codex ◄┼── Distributor Adapters ◄────────────┼─────────│ Effectiveness Measurement │
Grok ◄┘ │ │ │
└───────────────────────────────────────────┘ └──────────────────────────────┘
▲ feeds back into ▲ tools / environments / instructions
```
**Design principle:** *agnostic core, thin adapters at the edges.* The expensive,
reusable intelligence (normalized sessions, detection, catalog, measurement) is built
once; each agent flavor only needs an input adapter and an output adapter.
## 8. Data & Storage
- **Pattern Catalog** and **workplans**: files in `agentic-resources` (per ADR-001 in
AGENTS.md — files are the source of truth, the hub indexes them).
- **Session/event data**: a local store (start simple: structured files / SQLite;
graduate to Postgres alongside the State Hub if volume warrants).
- **Decisions & progress**: recorded through the Custodian State Hub so the broader
ecosystem stays aware of Helix Forge's activity.
## 9. Integration with the Custodian State Hub
Helix Forge runs inside the `helix_forge` domain and is **not** a competing system of
record:
- Work originates as **workplans** in this repo (`AGENTIC-WP-NNNN`), synced via
`make fix-consistency REPO=agentic-resources`.
- Pattern-promotion and distribution decisions are logged via the hub's decision API.
- Each Helix Forge run logs at least one `add_progress_event()` / `POST /progress/`.
- The hub remains a **read model**; Helix Forge writes its durable artifacts as files
and lets the hub index them.
### 9.1 Downstream: kaizen-agentic project metrics correlation
Helix Forge is a **fleet-level** producer of normalized session digests. The
**kaizen-agentic** framework is a **project-scoped** consumer of optional
correlation fields on its execution metrics (ADR-004). The two layers link
**by reference** — kaizen-agentic does not re-implement JSONL ingestion or write
into the Helix Forge store.
| Layer | Owner | What it stores |
|-------|-------|----------------|
| Fleet | agentic-resources (`session_memory`) | Per-session digests in the local SQLite store |
| Project | kaizen-agentic | `.kaizen/metrics/<agent>/executions.jsonl` |
**Canonical spec in this repo:** [DESIGN-session-memory.md §11](DESIGN-session-memory.md#11-project-metrics-correlation-kaizen-agentic)
(session-close env export, digest read path, stable JSON shape).
**Authoritative cross-repo contract (kaizen-agentic):**
[Helix Forge Correlation Contract](https://gitea.coulomb.social/coulomb/kaizen-agentic/src/branch/main/docs/integrations/helix-forge-correlation.md).
Field mapping: `Session.session_uid` → `helix_session_uid`; digest token totals →
`tokens`; MCP/tool overhead share → `infra_overhead_share`.
**Read path for consumers:** `HELIX_STORE_DB` points at the digest SQLite file
(default `session_memory/.store/mem.db`); `python -m session_memory.digest_lookup
<uid> --json` or `kaizen-agentic metrics correlate <uid>` performs a read-only
lookup. No ingestion code belongs in kaizen-agentic.
## 10. Success Metrics
| Metric | Meaning | Target (directional, v1) |
|--------|---------|--------------------------|
| Sessions captured | Coverage of real work | ≥ 90% of sessions across the 3 flavors normalized |
| Patterns cataloged | Knowledge made reusable | A growing, non-trivial catalog of reviewed solution patterns |
| Cross-flavor patterns | Reuse leverage | ≥ 1 pattern proven to transfer across flavors |
| Pattern effectiveness | Loop is closing | Applied patterns show measurable cost/reliability improvement vs. baseline |
| Fleet trend | The helix turns | Median session cost ↓ and success rate ↑ over time |
| Repeated-failure rate | Friction eliminated | Known problem patterns recur less after distribution |
## 11. Phasing / Roadmap
- **Phase 0 — Foundations.** Define the Session/Event schema and Pattern Catalog
format. One collector adapter (Claude) + batch import. Manual inspection only.
- **Phase 1 — Detect.** Signal extractors + pattern clustering over captured sessions;
candidate patterns surfaced with evidence. Add Codex + GrokBuild collectors.
- **Phase 2 — Curate.** Review workflow + versioned Pattern Catalog, wired to hub
decisions.
- **Phase 3 — Distribute.** Distributor adapters for all three flavors; patterns ship
as reviewable workplans/PRs (HITL).
- **Phase 4 — Measure.** Baseline-vs-after effectiveness and fleet-level trend
reporting; retire ineffective patterns. Loop is closed.
## 12. Open Questions
- **OQ1** What is the canonical raw log format available from each of Claude, Codex,
and GrokBuild today, and how lossy is normalization from each?
- **OQ2** How are sessions reliably bounded and attributed to a repo/task across the
three flavors?
- **OQ3** Where does detection logic run — local batch jobs, hub-side, or a dedicated
service? What volume do we actually expect?
- ~~**OQ4** Pattern format: how do we keep one agnostic representation while giving each
distributor enough to render high-quality native artifacts?~~ **Resolved (Phase 2,
AGENTIC-WP-0004):** the `SolutionPattern` core is flavor-agnostic (problem,
resolutions, scope, provenance) and carries per-flavor knowledge only in a separate
`rendering_hints` sub-structure keyed by flavor — distributors read the hints, the
core stays neutral. Catalogued as versioned files-first artifacts (FR-U3).
- ~~**OQ5** What's the minimum trustworthy evidence bar before a pattern is allowed to be
distributed to live agent environments?~~ **Resolved (Phase 2):** a two-tier
evidence bar (`[curate.gate]`). A *promote* floor (frequency / distinct sessions /
cost-impact) admits a candidate as `provisional`; a stricter *distribution* floor
(higher frequency, optional cross-flavor requirement, cost-impact) is required to
mark a pattern `approved` + `distribution_ready`. Defaults are conservative and
config-tunable.
- ~~**OQ6** How do we prevent pattern bloat — too many low-value instructions degrading
agent context budgets (cf. the token-budget policy in global instructions)?~~
**Resolved (Phase 2):** a bloat guard flags duplicate (same id) and near-duplicate
(same signal-type+locus) candidates at review time, and the catalog dedups
structurally on the source-candidate key so re-promotion never multiplies entries.
Thin candidates stay `provisional` (not distributed) rather than padding live
context.
## 13. Risks
| Risk | Mitigation |
|------|------------|
| Capture overhead slows real coding sessions | Async, non-blocking collection (FR-C5); never in the agent's critical path. |
| Patterns become noise / context bloat | Effectiveness gating (FR-M2) + retirement; measure before broad distribution. |
| Over-fitting to one flavor | Agnostic core + explicit cross-flavor flagging (FR-D4, FR-A3). |
| Bad pattern degrades agents | HITL approval before distribution (FR-X3); baseline measurement to catch regressions. |
| Drift from State Hub conventions | Files-first per ADR-001; log via hub; no competing source of record. |
---
*This PRD is a draft for discussion. Next step: a `proposed` workplan
(`AGENTIC-WP-0002`) scoping Phase 0 — the Session/Event schema and the first
(Claude) collector adapter.*

registry/README.md Normal file

@@ -0,0 +1,12 @@
# Capability Registry
Markdown-first capability index for federation and reuse planning.
## Authoring
1. Copy a capability entry template (see reuse-surface `templates/capability-entry.template.md`).
2. Add the row to `indexes/capabilities.yaml`.
3. Run `reuse-surface validate` from a checkout with the CLI installed.
4. Merge to `main` and verify publish with `reuse-surface establish --publish-check`.
Federation contract: reuse-surface `docs/RegistryFederation.md`.


@@ -0,0 +1,4 @@
version: 1
updated: '2026-06-16'
domain: helix_forge
capabilities: []

session_memory/README.md Normal file

@@ -0,0 +1,260 @@
# session_memory
Capture + retention layer for Helix Forge — the **Capture** stage of the loop in
[../docs/PRD-helix-forge.md](../docs/PRD-helix-forge.md), built to the
[../docs/DESIGN-session-memory.md](../docs/DESIGN-session-memory.md) spec.
It scans coding-agent session logs, normalizes them into one schema, distills a
compact per-session digest, and ages out raw bulk under a **storage budget**
(dropping sessions once analyzed and once space is needed) rather than a fixed
time window.
## Layout
```
session_memory/
adapters/common.py # shared Normalized bundle + helpers
adapters/claude.py # Tier0 -> Tier1 normalizers, one per flavor
adapters/codex.py # (rollout {timestamp,type,payload}, flat call_id join)
adapters/grok.py # (per-session dir: chat_history + events + updates)
core/schema.py # Session / SessionEvent / Cost
core/store.py # SQLite rows + blob-dir bodies (Tier1) + digests/patterns (Tier2)
core/cursor.py # incremental ingest cursors
core/digest.py # Tier1 -> Tier2 promotion + outcome heuristic
core/retention.py # budget-based eviction sweep
ingest.py # one sweep: discover -> normalize -> store -> digest -> evict
detect/signals.py # signal extractors over digests
detect/cluster.py # cluster signals -> candidate patterns + cross-flavor flag
detect/__main__.py # python -m session_memory.detect (ranked report)
curate/schema.py # SolutionPattern artifact + per-flavor rendering hints
curate/catalog.py # versioned, files-first Pattern Catalog (dedup on id)
curate/gating.py # promotion evidence bar + bloat guard
curate/review.py # discuss/approve/reject -> promote workflow
curate/decisions.py # hub decision audit trail (graceful local-queue fallback)
curate/__main__.py # python -m session_memory.curate (interactive / --auto-approve)
catalog/ # the committed Pattern Catalog (source of truth)
distribute/base.py # Artifact + Distributor protocol + idempotent snippet markers
distribute/claude.py # CLAUDE.md (or skill) renderer } per-flavor edges
distribute/codex.py # AGENTS.md renderer } (agnostic body,
distribute/grok.py # native instruction renderer } different targets)
distribute/proposals.py # scoping + proposed-not-applied output + active registry
distribute/__main__.py # python -m session_memory.distribute
measure/metrics.py # fleet metrics + persisted baseline snapshots
measure/effect.py # before/after per-pattern effectiveness
measure/__main__.py # python -m session_memory.measure
retro/build.py # windowed top-3-per-repo suggestions
retro/publish.py # hub coding_retro read model + local report
retro/__main__.py # python -m session_memory.retro
digest_lookup.py # python -m session_memory.digest_lookup (read one digest, no ingest)
config.toml # store paths, retention caps, sources, repo->domain map, curate gate
```
The local store lives under `session_memory/.store/` (gitignored).
## Run a sweep
```bash
# from the repo root
python -m session_memory.ingest # ingest + analyze + evict
python -m session_memory.ingest --dry-run # discover + parse only, writes nothing
python -m session_memory.ingest --config path/to/config.toml
```
Output reports `discovered / ingested / skipped_unchanged / analyzed` and a
retention line (`freed`, `final_usage`, and per-pass eviction counts). Sweeps are
idempotent — re-running skips unchanged files via the cursor.
## Scheduling (cadence)
Retention is budget-based; the `cadence` in `config.toml` only decides how often
the sweep *runs*. Trigger it with the repo scheduler, e.g. daily:
```bash
# Claude Code: schedule a daily routine that runs the sweep
/schedule "daily session-memory sweep" -- python -m session_memory.ingest
```
or a cron entry / `/loop` on a timer. Push-capture (agent Stop/SessionEnd hooks)
can also enqueue a sweep; see design §7.
## Detect candidate patterns
After ingesting, mine the digests for recurring problem/success patterns:
```bash
python -m session_memory.detect # ranked report, cross-flavor first
python -m session_memory.detect --json # machine-readable candidates
python -m session_memory.detect --min-frequency 3
```
Candidates are persisted to a Tier 2 `patterns` table and are the input to the
Curate phase (Phase 2). Patterns whose evidence spans more than one agent flavor
are flagged `[CROSS-FLAVOR]` — the highest-value reuse targets.
## Curate candidates into the Pattern Catalog
Review detect candidates into versioned **Solution Patterns** held in the
files-first catalog (`session_memory/catalog/`). The flow is **detect → curate →
(Phase 3) distribute**; `curate` refreshes candidates by running detect first.
```bash
python -m session_memory.curate # interactive review (a/r/d per candidate)
python -m session_memory.curate --auto-approve # batch: promote all that clear the evidence bar
python -m session_memory.curate --json # machine-readable result
```
- **Promotion** writes a `SolutionPattern` file (id = source candidate key, so
re-promoting the same candidate dedups; content changes bump the semver and
archive the prior version to `<id>.history.jsonl`).
- The **evidence bar** (`[curate.gate]`) sets two floors: a promote floor and a
stricter *distribution* floor. A thin-but-real candidate lands `provisional`;
one clearing the distribution floor lands `approved` + `distribution_ready`.
- A **bloat guard** flags duplicate / near-duplicate candidates so the catalog
stays lean.
- Re-review is **idempotent** — a remembered decision is skipped unless the
candidate's evidence changed; a prior reject is not re-surfaced.
- Each final promote/reject is recorded as a **hub decision**; if the hub is
offline the decision is queued to `[curate].decision_queue` for later sync
(the same after-the-fact pattern used in Phase 1).
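The promotion rule (dedup on id, bump and archive on content change) could look roughly like this. It is a sketch only: the `<id>.json` file name, the patch-level bump, and the `promote` helper are assumptions; the catalog files in this change set only show that a superseded `1.0.0` sits in history next to an approved `1.0.1`.

```python
import json
import os


def bump_patch(version):
    """'1.0.0' -> '1.0.1' (assumes a plain MAJOR.MINOR.PATCH string)."""
    major, minor, patch = (int(x) for x in version.split("."))
    return f"{major}.{minor}.{patch + 1}"


def _content(pattern):
    # compare everything except bookkeeping fields
    return {k: v for k, v in pattern.items() if k not in ("version", "status")}


def promote(catalog_dir, pattern):
    """Write the pattern; return the version that is now current."""
    path = os.path.join(catalog_dir, f"{pattern['id']}.json")
    history = os.path.join(catalog_dir, f"{pattern['id']}.history.jsonl")
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            prior = json.load(f)
        if _content(prior) == _content(pattern):
            return prior["version"]            # same candidate again: dedup
        with open(history, "a", encoding="utf-8") as f:
            archived = {**prior, "status": "superseded"}
            f.write(json.dumps(archived, sort_keys=True) + "\n")
        pattern = {**pattern, "version": bump_patch(prior["version"])}
    else:
        pattern = {**pattern, "version": pattern.get("version", "1.0.0")}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(pattern, f, indent=2, sort_keys=True)
    return pattern["version"]
```

Keeping history as append-only JSONL means a promotion never rewrites older versions; it only adds one line per superseded version.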
### Curate knobs (`[curate]` / `[curate.gate]` in config.toml)
| Key | Meaning |
|-----|---------|
| `catalog_dir` | committed Pattern Catalog dir (source of truth) |
| `review_log` / `decision_queue` | remembered decisions + pending hub decisions (gitignored) |
| `min_frequency` / `min_sessions` / `min_cost_impact` | floor to promote at all |
| `dist_require_cross_flavor` | require cross-flavor evidence to be distribution-eligible |
| `dist_min_frequency` / `dist_min_cost_impact` | stricter floor for `distribution_ready` |
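To make the two floors concrete, here is a small sketch of how the knobs above might combine. The `Gate` class, the `evaluate` function, and every default value are illustrative assumptions, not the shipped configuration.

```python
from dataclasses import dataclass


@dataclass
class Gate:
    # promote floor (illustrative values)
    min_frequency: int = 2
    min_sessions: int = 2
    min_cost_impact: float = 0.0
    # stricter distribution floor (illustrative values)
    dist_require_cross_flavor: bool = False
    dist_min_frequency: int = 3
    dist_min_cost_impact: float = 1.0


def evaluate(ev, gate):
    """Return (status, distribution_ready); status None means 'not promotable'."""
    below_promote_floor = (
        ev["frequency"] < gate.min_frequency
        or len(ev["sessions"]) < gate.min_sessions
        or ev["cost_impact"] < gate.min_cost_impact
    )
    if below_promote_floor:
        return None, False
    clears_dist_floor = (
        ev["frequency"] >= gate.dist_min_frequency
        and ev["cost_impact"] >= gate.dist_min_cost_impact
        and (ev["cross_flavor"] or not gate.dist_require_cross_flavor)
    )
    return ("approved", True) if clears_dist_floor else ("provisional", False)
```

With these example thresholds, evidence shaped like the budget-overrun candidate (frequency 3, cost impact 10.667) lands `approved`, while the infra-overhead candidate (frequency 2, cost impact 0.801) stays `provisional`.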
## Distribute patterns as per-flavor proposals
Render approved catalog patterns into per-flavor artifacts — **proposed, never
auto-applied** (HITL). Completes the loop: **detect → curate → distribute**.
```bash
python -m session_memory.distribute # proposals for all repos/flavors
python -m session_memory.distribute --repo state-hub --flavor claude
python -m session_memory.distribute --json
```
- Only `approved` + `distribution_ready` patterns are rendered; each pattern's
`Scope` (repos/domains/flavors) decides where it lands (FR-X2).
- Each flavor renders the **same agnostic body** to its own target (Claude →
`CLAUDE.md`/skill, Codex → `AGENTS.md`, Grok → native) via `rendering_hints`
(FR-A3); blocks carry stable `BEGIN/END` markers so re-running updates in place.
- Output goes to `session_memory/proposals/<repo>/<target>` (gitignored,
regenerated) — a reviewable diff a human applies (FR-X3). The committed
`distribute/active_patterns.json` records which pattern+version is proposed in
which `(repo, flavor)` (FR-X4).
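The in-place update via stable markers can be sketched as follows. The exact marker syntax is not given here, so the HTML-comment form below is an assumption, as is the `upsert_block` helper.

```python
import re


def upsert_block(text, pattern_id, body):
    """Insert or replace the marked block for one pattern inside a target file."""
    begin = f"<!-- BEGIN pattern:{pattern_id} -->"   # assumed marker syntax
    end = f"<!-- END pattern:{pattern_id} -->"
    block = f"{begin}\n{body.rstrip()}\n{end}"
    rx = re.compile(re.escape(begin) + r".*?" + re.escape(end), re.DOTALL)
    if rx.search(text):
        # a callable replacement avoids backslash interpretation in `block`
        return rx.sub(lambda _m: block, text, count=1)
    if not text or text.endswith("\n\n"):
        sep = ""
    elif text.endswith("\n"):
        sep = "\n"
    else:
        sep = "\n\n"
    return f"{text}{sep}{block}\n"
```

Because the block is located by its markers rather than by position, re-rendering an unchanged pattern produces an identical file, so the proposal diff stays empty until the pattern itself changes.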
## Measure effectiveness (closing the loop)
Track whether the fleet is getting cheaper / more reliable, and whether a
distributed pattern actually helped.
```bash
python -m session_memory.measure --label "baseline" # snapshot + trend
python -m session_memory.measure --since 2026-06-07 # before/after a change
python -m session_memory.measure --no-save --json
```
- A **snapshot** (infra-overhead share, error rate, schema-thrash, token
percentiles, success rate) is appended to `measure/baselines.jsonl` to build a
trend (FR-M3).
- `--since DATE` splits sessions before/after a change and diffs the metrics, with
an `improved` verdict per metric (FR-M1/FR-M2) — so ineffective patterns can be
retired. Recorded pre-fix baseline (2026-06-07): 27 sessions, infra-overhead
median 11.7 %, error rate 0.96, schema-thrash 8 sessions.
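A before/after comparison of this kind might be computed as in the sketch below. The per-session field names, the choice of aggregate per metric, and the `before_after` helper are assumptions made for illustration; the actual measure module is not shown here.

```python
from statistics import mean, median

# metric -> (aggregate, lower_is_better); illustrative choices
METRICS = {
    "infra_overhead_share": (median, True),
    "error_rate": (mean, True),
}


def before_after(sessions, since):
    """Split sessions on the ISO date `since` and diff each metric."""
    before = [s for s in sessions if s["started_at"][:10] < since]
    after = [s for s in sessions if s["started_at"][:10] >= since]
    report = {}
    for name, (agg, lower_is_better) in METRICS.items():
        if not before or not after:
            # no verdict without data on both sides of the change
            report[name] = {"before": None, "after": None, "improved": None}
            continue
        b = agg([s[name] for s in before])
        a = agg([s[name] for s in after])
        improved = a < b if lower_is_better else a > b
        report[name] = {"before": b, "after": a, "improved": improved}
    return report
```

Comparing ISO date prefixes as strings works because `YYYY-MM-DD` sorts lexicographically in date order.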
## Weekly retro (the input to the scheduled retrospection)
A windowed roll-up: detect + measure over the last N days → the **top-3
improvement suggestions per repo** (cross-flavor first; recommendations pulled
from the Pattern Catalog) → published to the hub as the `coding_retro` read model.
```bash
python -m session_memory.retro # last 7 days, local report
python -m session_memory.retro --window-days 30 --json
python -m session_memory.retro --publish # also post coding_retro to the hub
```
Writes `retro/last_retro.{json,md}` and (with `--publish`) posts an
`event_type=coding_retro` progress event. This is consumed by activity-core's
**Weekly Coding Retrospection** schedule (ACTIVITY-WP-0008, Saturday 19:00 Berlin),
which emits one improvement task per relevant repo. Hub publish degrades
gracefully when the hub is unreachable.
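The "top-3 per repo, cross-flavor first" roll-up can be sketched like this, assuming each candidate dict carries `key`, `repos`, `cross_flavor`, and `score`. The `top_suggestions` helper is hypothetical, and the real retro also folds in measure output and catalog recommendations, which this sketch leaves out.

```python
from collections import defaultdict


def top_suggestions(candidates, per_repo=3):
    """Group candidates by repo and keep the best `per_repo` keys for each."""
    by_repo = defaultdict(list)
    for c in candidates:
        for repo in c["repos"]:
            by_repo[repo].append(c)
    out = {}
    for repo, items in by_repo.items():
        # cross-flavor evidence outranks score; score breaks ties within a group
        ordered = sorted(items, key=lambda c: (not c["cross_flavor"], -c["score"]))
        out[repo] = [c["key"] for c in ordered[:per_repo]]
    return out
```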
## Correlation with kaizen-agentic
Helix Forge owns **fleet-level** session digests; **kaizen-agentic** owns
**project-scoped** execution metrics (ADR-004). The two layers correlate by
optional `helix_session_uid` on project records — **link-by-reference only**;
kaizen-agentic does not ingest JSONL into this store.
| Layer | Storage |
|-------|---------|
| Fleet (here) | `session_memory/.store/mem.db` (`digests` table) |
| Project (kaizen) | `.kaizen/metrics/<agent>/executions.jsonl` |
- **Spec:** [DESIGN-session-memory.md §11](../docs/DESIGN-session-memory.md#11-project-metrics-correlation-kaizen-agentic)
- **Contract (kaizen-agentic):** [Helix Forge Correlation Contract](https://gitea.coulomb.social/coulomb/kaizen-agentic/src/branch/main/docs/integrations/helix-forge-correlation.md)
### Session-close env export
After ingest has written the digest, agents using both layers export `HELIX_*`
vars for `kaizen-agentic metrics record` to merge (names match ADR-004):
`HELIX_SESSION_UID`, `HELIX_REPO`, `HELIX_FLAVOR`, `HELIX_TOKENS`,
`HELIX_INFRA_OVERHEAD_SHARE`, and optionally `HELIX_STORE_DB` (absolute path to
`mem.db`). See DESIGN §11.1 for field sources.
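As an illustration of the convention, the sketch below builds the variables from a digest-like dict and renders shell `export` lines. The variable names come from the list above; the digest keys (`session_uid`, `tokens`, and so on) and both helper functions are assumptions, since DESIGN §11.1 is the authority on field sources.

```python
import shlex


def helix_env(digest, store_db=None):
    """Map a digest-like dict to the HELIX_* variables (digest keys assumed)."""
    env = {
        "HELIX_SESSION_UID": digest["session_uid"],
        "HELIX_REPO": digest["repo"],
        "HELIX_FLAVOR": digest["flavor"],
        "HELIX_TOKENS": str(digest["tokens"]),
        "HELIX_INFRA_OVERHEAD_SHARE": str(digest["infra_overhead_share"]),
    }
    if store_db:
        env["HELIX_STORE_DB"] = store_db  # optional absolute path to mem.db
    return env


def export_lines(env):
    """Render shell-safe `export` statements for a session-close hook."""
    return [f"export {k}={shlex.quote(v)}" for k, v in env.items()]
```

Quoting with `shlex.quote` keeps the output safe to `eval` even when a repo name or path contains spaces.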
### Read one digest (for `metrics correlate`)
```bash
python -m session_memory.digest_lookup claude:abc-123 --json
HELIX_STORE_DB=/abs/path/to/mem.db python -m session_memory.digest_lookup <uid>
```
Defaults to `[store].db_path` in `config.toml`. Read-only — does not run ingest.
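For callers that prefer a library call over the CLI, a read-only lookup could be sketched with the standard `sqlite3` module. It assumes the `digests` table is keyed by a `session_uid` column, which is not confirmed here; `lookup_digest` is a hypothetical helper, not part of the package.

```python
import sqlite3
from pathlib import Path


def lookup_digest(db_path, session_uid):
    """Fetch one digest row as a dict, or None; never writes to the store."""
    # SQLite URI with mode=ro opens the database read-only
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    con = sqlite3.connect(uri, uri=True)
    try:
        con.row_factory = sqlite3.Row
        row = con.execute(
            "SELECT * FROM digests WHERE session_uid = ?",  # column name assumed
            (session_uid,),
        ).fetchone()
        return dict(row) if row else None
    finally:
        con.close()
```

Opening with `mode=ro` makes any accidental write fail at the SQLite level, which fits the read-only guarantee stated above.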
## Retention knobs (`[retention]` in config.toml)
| Key | Meaning |
|-----|---------|
| `raw_soft_cap_bytes` | begin evicting **analyzed** sessions above this (oldest first) |
| `raw_hard_cap_bytes` | absolute Tier 1 ceiling; overflow path may, as a last resort, evict un-analyzed sessions and report `data_loss` |
| `raw_max_age_days` | backstop: analyzed raw older than this is evictable regardless of space |
| `distilled_cap_bytes` | Tier 2 ceiling — **alert only**, never auto-dropped |
**Invariant:** a session's raw bytes are never dropped before its Tier 2 digest
exists, except the explicitly-reported hard-cap overflow path.
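The two-pass budget logic and the invariant can be sketched as below. This is a simplification under assumptions: sessions are dicts with `uid`, `bytes`, `analyzed`, and `started_at`, the `raw_max_age_days` backstop is omitted, and `plan_eviction` is a hypothetical name.

```python
def plan_eviction(sessions, usage, soft_cap, hard_cap):
    """Return (evicted_uids, data_loss_uids, final_usage)."""
    evicted, data_loss = [], []
    oldest_first = sorted(sessions, key=lambda s: s["started_at"])

    # Pass 1: above the soft cap, drop raw bytes of analyzed sessions only.
    for s in oldest_first:
        if usage <= soft_cap:
            break
        if s["analyzed"]:
            evicted.append(s["uid"])
            usage -= s["bytes"]

    # Pass 2 (last resort): still above the hard cap, so un-analyzed sessions
    # are evicted too and reported as data loss.
    for s in oldest_first:
        if usage <= hard_cap:
            break
        if not s["analyzed"]:
            data_loss.append(s["uid"])
            usage -= s["bytes"]

    return evicted, data_loss, usage
```

A non-empty second list is the only case in which raw bytes are dropped before a digest exists, which is why it is returned separately rather than merged into the normal eviction count.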
## Tests
```bash
python -m pytest # schema, adapters, store, digest, retention, ingest, detect, curate
```
## Status
- **Phase 0** (AGENTIC-WP-0002): schema, store, digest, budget retention, Claude
adapter, ingest sweep.
- **Phase 1** (AGENTIC-WP-0003): Codex + Grok adapters, multi-file session merge,
and the Detect pipeline (signals → clustering → cross-flavor candidate patterns).
- **Phase 2** (AGENTIC-WP-0004): Curate — Solution Pattern schema, versioned
files-first Pattern Catalog, discuss/approve/reject review with an evidence bar +
bloat guard, and hub-decision audit trail.
- **Detect hardening** (AGENTIC-WP-0005): session-quality filter + tool-mix /
infra-overhead signals. **Error mining** (AGENTIC-WP-0006): recurring error
fingerprints → root-cause patterns.
- **Phase 3** (AGENTIC-WP-0007): Distribute — per-flavor distributor adapters
render approved patterns into proposed (HITL) artifacts, scoped by repo/domain,
with an active-pattern registry.
- **Phase 4** (AGENTIC-WP-0009): Measure — fleet baseline/trend + before/after
per-pattern effectiveness. The Capture → Detect → Curate → Distribute → Measure
loop is closed.
- **Weekly retro** (AGENTIC-WP-0010): windowed top-3-per-repo + hub `coding_retro`
publish.
- **Kaizen correlation** (AGENTIC-WP-0011): bidirectional doc links, session-close
`HELIX_*` env convention, `digest_lookup` read path.


@@ -0,0 +1,7 @@
"""Coding Session Memory — Helix Forge capture + retention layer.
See docs/DESIGN-session-memory.md. Importable package name uses an underscore
(``session_memory``) where the design doc writes ``session-memory/``.
"""
__all__ = ["core", "adapters"]


@@ -0,0 +1 @@
"""Per-flavor collector adapters (Tier 0 -> Tier 1 normalization)."""


@@ -0,0 +1,162 @@
"""Claude Code collector adapter — Tier 0 -> Tier 1 (design §2.1, §4.3).

Reads ``~/.claude/projects/<url-encoded-cwd>/<session-uuid>.jsonl`` (and
``agent-*.jsonl`` sidechains), discriminates on the record ``type``, reconstructs
the turn DAG via ``uuid``/``parentUuid``, and emits normalized records.

Returns a :class:`Normalized` bundle: the ``Session``, its ordered
``SessionEvent`` list, and a ``blobs`` map (``payload_ref -> full text body``)
that the store persists out-of-line so Tier 1 rows stay light.
"""
from __future__ import annotations

import os
from typing import Any, Optional

from ..core.schema import Cost, Session, SessionEvent
from .common import (  # noqa: F401 (Normalized re-exported for back-compat)
    Normalized,
    classify_tool,
    first_line as _first_line,
    iter_jsonl as _iter_records,
    now_iso as _now,
    resolve_repo as _resolve_repo,
    seconds_between as _seconds_between,
    stringify as _stringify,
)

FLAVOR = "claude"


def _content_blocks(message: dict[str, Any]) -> list[dict[str, Any]]:
    content = message.get("content")
    if isinstance(content, str):
        return [{"type": "text", "text": content}]
    if isinstance(content, list):
        return [b for b in content if isinstance(b, dict)]
    return []


def parse_session(path: str, repo_domain_map: Optional[dict[str, str]] = None) -> Optional[Normalized]:
    """Parse one Claude transcript file into a Normalized bundle.

    Returns None if the file has no usable session records.
    """
    repo_domain_map = repo_domain_map or {}
    records = list(_iter_records(path))
    if not records:
        return None

    session_id: Optional[str] = None
    cwd = git_branch = version = model = None
    timestamps: list[str] = []
    file_is_sidechain = os.path.basename(path).startswith("agent-")
    events: list[SessionEvent] = []
    blobs: dict[str, str] = {}
    uuid_to_seq: dict[str, int] = {}
    cost = Cost()
    seq = 0

    def add_event(uuid: Optional[str], parent_uuid: Optional[str], ts, kind, *,
                  role=None, tool=None, summary=None, body=None, tokens=0, sidechain=False):
        nonlocal seq
        s = seq
        seq += 1
        if uuid:
            uuid_to_seq[uuid] = s
        parent_seq = uuid_to_seq.get(parent_uuid) if parent_uuid else None
        payload_ref = None
        if body:
            payload_ref = f"blob://{session_id}/{s}"
            blobs[payload_ref] = body
        events.append(SessionEvent(
            session_uid=Session.make_uid(FLAVOR, session_id or "unknown"),
            seq=s, parent_seq=parent_seq, ts=ts, kind=kind, role=role, tool=tool,
            summary=(summary or "")[:300] or None, payload_ref=payload_ref,
            tokens=tokens, is_sidechain=sidechain or file_is_sidechain,
        ))

    for rec in records:
        rtype = rec.get("type")
        ts = rec.get("timestamp")
        if ts:
            timestamps.append(ts)
        session_id = session_id or rec.get("sessionId")
        cwd = cwd or rec.get("cwd")
        git_branch = git_branch or rec.get("gitBranch")
        version = version or rec.get("version")
        uuid = rec.get("uuid")
        parent = rec.get("parentUuid")
        sidechain = bool(rec.get("isSidechain"))

        if rtype == "user":
            msg = rec.get("message", {})
            for b in _content_blocks(msg):
                bt = b.get("type")
                if bt == "tool_result":
                    body = _stringify(b.get("content"))
                    add_event(uuid, parent, ts, "tool_result", role="tool",
                              summary="tool result", body=body, sidechain=sidechain)
                else:
                    text = b.get("text", "")
                    add_event(uuid, parent, ts, "user_msg", role="user",
                              summary=_first_line(text), body=text, sidechain=sidechain)
        elif rtype == "assistant":
            msg = rec.get("message", {})
            model = model or msg.get("model")
            usage = msg.get("usage") or {}
            cost.input_tokens += int(usage.get("input_tokens", 0) or 0)
            cost.output_tokens += int(usage.get("output_tokens", 0) or 0)
            cost.cache_tokens += int(
                (usage.get("cache_read_input_tokens", 0) or 0)
                + (usage.get("cache_creation_input_tokens", 0) or 0)
            )
            out_tokens = int(usage.get("output_tokens", 0) or 0)
            for b in _content_blocks(msg):
                bt = b.get("type")
                if bt == "thinking":
                    add_event(uuid, parent, ts, "thinking", role="assistant",
                              summary="thinking", body=b.get("thinking", ""), sidechain=sidechain)
                elif bt == "text":
                    text = b.get("text", "")
                    add_event(uuid, parent, ts, "assistant_msg", role="assistant",
                              summary=_first_line(text), body=text, tokens=out_tokens, sidechain=sidechain)
                elif bt == "tool_use":
                    name = b.get("name", "")
                    inp = b.get("input", {})
                    body = _stringify(inp)
                    cmd = inp.get("command", "") if isinstance(inp, dict) else ""
                    kind = classify_tool(name, _stringify(cmd))
                    add_event(uuid, parent, ts, kind, role="assistant", tool=name,
                              summary=f"{name}", body=body, sidechain=sidechain)
        elif rtype == "summary":
            add_event(uuid, parent, ts, "lifecycle", summary="summary",
                      body=_stringify(rec.get("summary")), sidechain=sidechain)
        # queue-operation / ai-title / last-prompt / attachment: skipped as events

    if session_id is None:
        return None

    cost.turns = sum(1 for e in events if e.kind == "user_msg")
    started = min(timestamps) if timestamps else None
    ended = max(timestamps) if timestamps else None
    cost.wall_clock_s = _seconds_between(started, ended)
    repo, domain = _resolve_repo(cwd, repo_domain_map)

    session = Session(
        session_uid=Session.make_uid(FLAVOR, session_id),
        flavor=FLAVOR,
        native_session_id=session_id,
        repo=repo, domain=domain, cwd=cwd, git_branch=git_branch,
        model=model, started_at=started, ended_at=ended,
        outcome="unknown",  # outcome inference happens in the digest step (T04)
        cost=cost,
        source_path=path,
        source_bytes=os.path.getsize(path) if os.path.exists(path) else 0,
        discovered_at=_now(),
    )
    return Normalized(session=session, events=events, blobs=blobs)


@@ -0,0 +1,167 @@
"""OpenAI Codex CLI collector adapter — Tier 0 -> Tier 1 (design §2.2, §4.3).

Reads ``$CODEX_HOME/sessions/YYYY/MM/DD/rollout-*.jsonl``. Each line is a
``RolloutLine`` wrapper ``{timestamp, type, payload}``; ``type`` discriminates
``session_meta`` / ``response_item`` / ``event_msg`` / ``turn_context`` /
``compacted``.

Codex is **flat** — tool calls and outputs are joined only by ``call_id`` with no
parent-ref DAG — so ``seq`` is assigned by temporal (line) order and
``parent_seq`` is set for ``function_call_output`` back to its ``function_call``.
"""
from __future__ import annotations

import os
from typing import Any, Optional

from ..core.schema import Cost, Session, SessionEvent
from .common import (
    Normalized,
    classify_tool,
    first_line,
    iter_jsonl,
    now_iso,
    resolve_repo,
    seconds_between,
    stringify,
)

FLAVOR = "codex"


def _message_text(payload: dict[str, Any]) -> str:
    content = payload.get("content")
    if isinstance(content, str):
        return content
    parts = []
    if isinstance(content, list):
        for b in content:
            if isinstance(b, dict):
                parts.append(b.get("text") or b.get("output_text") or "")
            elif isinstance(b, str):
                parts.append(b)
    return "\n".join(p for p in parts if p)


def _extract_tokens(payload: dict[str, Any]) -> tuple[int, int, int]:
    """Best-effort (input, output, cache) from a token_count payload.

    Field shapes vary across Codex versions; probe the known locations in order
    and return zeros if none of them carries token counts.
    """
    for scope in (payload, payload.get("info") or {}, payload.get("usage") or {},
                  (payload.get("info") or {}).get("total_token_usage") or {}):
        if isinstance(scope, dict):
            i = scope.get("input_tokens") or scope.get("prompt_tokens")
            o = scope.get("output_tokens") or scope.get("completion_tokens")
            if i is not None or o is not None:
                cache = scope.get("cached_input_tokens") or scope.get("cache_read_input_tokens") or 0
                return int(i or 0), int(o or 0), int(cache or 0)
    return 0, 0, 0


def parse_session(path: str, repo_domain_map: Optional[dict[str, str]] = None) -> Optional[Normalized]:
    repo_domain_map = repo_domain_map or {}
    records = list(iter_jsonl(path))
    if not records:
        return None

    session_id: Optional[str] = None
    cwd = model = cli_version = None
    timestamps: list[str] = []
    events: list[SessionEvent] = []
    blobs: dict[str, str] = {}
    call_seq: dict[str, int] = {}  # call_id -> seq of its function_call
    cost = Cost()
    seq = 0

    def add_event(ts, kind, *, role=None, tool=None, summary=None, body=None,
                  tokens=0, parent_seq=None) -> int:
        nonlocal seq
        s = seq
        seq += 1
        payload_ref = None
        if body:
            payload_ref = f"blob://{session_id}/{s}"
            blobs[payload_ref] = body
        events.append(SessionEvent(
            session_uid=Session.make_uid(FLAVOR, session_id or "unknown"),
            seq=s, parent_seq=parent_seq, ts=ts, kind=kind, role=role, tool=tool,
            summary=(summary or "")[:300] or None, payload_ref=payload_ref, tokens=tokens,
        ))
        return s

    for rec in records:
        rtype = rec.get("type")
        ts = rec.get("timestamp")
        if ts:
            timestamps.append(ts)
        payload = rec.get("payload") or {}

        if rtype == "session_meta":
            session_id = session_id or payload.get("id")
            cwd = cwd or payload.get("cwd")
            model = model or payload.get("model")
            cli_version = cli_version or payload.get("cli_version")
        elif rtype == "turn_context":
            model = model or payload.get("model")
        elif rtype == "response_item":
            ptype = payload.get("type")
            if ptype == "message":
                role = payload.get("role", "assistant")
                text = _message_text(payload)
                kind = "assistant_msg" if role == "assistant" else "user_msg"
                add_event(ts, kind, role=role, summary=first_line(text), body=text)
            elif ptype == "function_call":
                name = payload.get("name", "")
                args = stringify(payload.get("arguments"))
                kind = classify_tool(name, args)
                s = add_event(ts, kind, role="assistant", tool=name,
                              summary=name, body=args)
                call_id = payload.get("call_id")
                if call_id:
                    call_seq[call_id] = s
            elif ptype == "function_call_output":
                call_id = payload.get("call_id")
                parent = call_seq.get(call_id)
                body = stringify(payload.get("output"))
                add_event(ts, "tool_result", role="tool", tool=None,
                          summary="tool result", body=body, parent_seq=parent)
            elif ptype == "reasoning":
                body = _message_text(payload) or stringify(payload.get("summary"))
                add_event(ts, "thinking", role="assistant", summary="reasoning", body=body)
        elif rtype == "event_msg":
            ptype = payload.get("type")
            if ptype == "task_started":
                add_event(ts, "lifecycle", summary="task_started")
            elif ptype == "task_complete":
                add_event(ts, "completion", summary="task_complete")
            elif ptype == "token_count":
                i, o, c = _extract_tokens(payload)
                cost.input_tokens += i
                cost.output_tokens += o
                cost.cache_tokens += c
            # user_message / agent_message echoes are duplicated by response_item
            # messages on modern Codex; skipped to avoid double counting.

    if session_id is None:
        return None

    cost.turns = sum(1 for e in events if e.kind == "user_msg")
    started = min(timestamps) if timestamps else None
    ended = max(timestamps) if timestamps else None
    cost.wall_clock_s = seconds_between(started, ended)
    repo, domain = resolve_repo(cwd, repo_domain_map)

    session = Session(
        session_uid=Session.make_uid(FLAVOR, session_id),
        flavor=FLAVOR, native_session_id=session_id,
        repo=repo, domain=domain, cwd=cwd, model=model,
        started_at=started, ended_at=ended, outcome="unknown", cost=cost,
        source_path=path, source_bytes=os.path.getsize(path) if os.path.exists(path) else 0,
        discovered_at=now_iso(),
    )
    return Normalized(session=session, events=events, blobs=blobs)


@@ -0,0 +1,100 @@
"""Shared adapter helpers (Tier 0 -> Tier 1).

The ``Normalized`` bundle contract and small flavor-agnostic helpers used by every
collector adapter. Per-flavor parsing lives in the individual adapter modules.
"""
from __future__ import annotations

import json
import os
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional

from ..core.schema import Session, SessionEvent

# tool names that mutate files -> kind "edit" (union across flavors)
EDIT_TOOLS = {
    "Edit", "Write", "NotebookEdit", "MultiEdit",  # Claude
    "apply_patch", "write_file", "edit_file",  # Codex / Grok variants
}

# substrings in a shell/tool command that indicate a test run -> kind "test_run"
TEST_HINTS = (
    "pytest", "unittest", "npm test", "npm run test", "go test",
    "cargo test", "jest", "vitest", "make test", "tox",
)


@dataclass
class Normalized:
    session: Session
    events: list[SessionEvent]
    blobs: dict[str, str] = field(default_factory=dict)


def resolve_repo(cwd: Optional[str], repo_domain_map: dict[str, str]) -> tuple[Optional[str], Optional[str]]:
    """cwd -> (repo, domain). repo is the cwd basename; domain via map."""
    if not cwd:
        return None, None
    repo = os.path.basename(cwd.rstrip("/")) or None
    domain = repo_domain_map.get(repo) if repo else None
    return repo, domain


def is_test_command(text: str) -> bool:
    low = (text or "").lower()
    return any(h in low for h in TEST_HINTS)


def classify_tool(name: str, command_text: str = "") -> str:
    """Map a tool invocation to an event kind: edit | test_run | tool_call."""
    if name in EDIT_TOOLS:
        return "edit"
    if is_test_command(command_text) or is_test_command(name):
        return "test_run"
    return "tool_call"


def stringify(v: Any, limit: int = 20000) -> str:
    if v is None:
        return ""
    if isinstance(v, str):
        return v[:limit]
    try:
        return json.dumps(v, ensure_ascii=False)[:limit]
    except (TypeError, ValueError):
        return str(v)[:limit]


def first_line(text: str) -> str:
    t = (text or "").strip()
    return t.splitlines()[0] if t else ""


def seconds_between(start: Optional[str], end: Optional[str]) -> float:
    if not start or not end:
        return 0.0
    try:
        a = datetime.fromisoformat(start.replace("Z", "+00:00"))
        b = datetime.fromisoformat(end.replace("Z", "+00:00"))
        return max(0.0, (b - a).total_seconds())
    except ValueError:
        return 0.0


def iter_jsonl(path: str):
    """Yield parsed JSON objects from a JSONL file, tolerating bad lines."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError:
                continue


def now_iso() -> str:
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


@@ -0,0 +1,182 @@
"""Grok CLI collector adapter — Tier 0 -> Tier 1 (design §2.3, §4.3).

A Grok session is a *directory* ``~/.grok/sessions/<enc-cwd>/<uuid>/`` containing
``summary.json`` (metadata), ``chat_history.jsonl`` (the canonical transcript),
``events.jsonl`` (explicit lifecycle + ``turn_number``), and ``updates.jsonl``
(ACP ``session/update`` stream, which carries tool-call names/args).

The ingest glob matches ``chat_history.jsonl``; this adapter derives its sibling
files from the same directory. Conversation order is taken from
``chat_history.jsonl``; tool-call names are paired, in order, from
``updates.jsonl`` ``tool_call`` entries to classify edits/test runs.
"""
from __future__ import annotations

import json
import os
from typing import Any, Optional

from ..core.schema import Cost, Session, SessionEvent
from .common import (
    Normalized,
    classify_tool,
    first_line,
    iter_jsonl,
    now_iso,
    resolve_repo,
    seconds_between,
    stringify,
)

FLAVOR = "grok"


def _text_content(content: Any) -> str:
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        return "\n".join(
            (b.get("text") or "") for b in content if isinstance(b, dict)
        )
    return ""


def _tool_calls_in_order(session_dir: str) -> list[dict[str, Any]]:
    """Ordered list of {title, rawInput} from updates.jsonl tool_call entries."""
    calls: list[dict[str, Any]] = []
    upd = os.path.join(session_dir, "updates.jsonl")
    if not os.path.exists(upd):
        return calls
    for rec in iter_jsonl(upd):
        u = (rec.get("params") or {}).get("update") or {}
        if u.get("sessionUpdate") == "tool_call":
            calls.append({"title": u.get("title") or "", "rawInput": u.get("rawInput") or {},
                          "id": u.get("toolCallId")})
    return calls


def _session_meta(session_dir: str) -> dict[str, Any]:
    p = os.path.join(session_dir, "summary.json")
    if not os.path.exists(p):
        return {}
    try:
        with open(p, "r", encoding="utf-8") as f:
            return json.load(f)
    except (OSError, ValueError):
        return {}


def _lifecycle(session_dir: str) -> tuple[list[dict[str, Any]], Optional[str]]:
    """events.jsonl records + the model id seen there."""
    evs, model = [], None
    p = os.path.join(session_dir, "events.jsonl")
    if os.path.exists(p):
        for rec in iter_jsonl(p):
            evs.append(rec)
            model = model or rec.get("model_id")
    return evs, model


def parse_session(path: str, repo_domain_map: Optional[dict[str, str]] = None) -> Optional[Normalized]:
    repo_domain_map = repo_domain_map or {}
    # accept either the chat_history.jsonl path or the session dir
    session_dir = path if os.path.isdir(path) else os.path.dirname(path)
    chat = os.path.join(session_dir, "chat_history.jsonl")
    if not os.path.exists(chat):
        return None

    meta = _session_meta(session_dir)
    info = meta.get("info") or {}
    session_id = info.get("id") or os.path.basename(session_dir.rstrip("/"))
    cwd = info.get("cwd") or meta.get("git_root_dir")
    life_events, life_model = _lifecycle(session_dir)
    model = meta.get("current_model_id") or life_model
    pending_calls = _tool_calls_in_order(session_dir)
    call_idx = 0
    events: list[SessionEvent] = []
    blobs: dict[str, str] = {}
    seq = 0

    def add(kind, *, role=None, tool=None, summary=None, body=None, parent_seq=None) -> int:
        nonlocal seq
        s = seq
        seq += 1
        ref = None
        if body:
            ref = f"blob://{session_id}/{s}"
            blobs[ref] = body
        events.append(SessionEvent(
            session_uid=Session.make_uid(FLAVOR, session_id), seq=s, parent_seq=parent_seq,
            ts=None, kind=kind, role=role, tool=tool,
            summary=(summary or "")[:300] or None, payload_ref=ref,
        ))
        return s

    # explicit lifecycle first (turn_started/turn_ended carry no bodies)
    for le in life_events:
        t = le.get("type")
        if t in ("turn_started", "loop_started", "turn_ended", "phase_changed"):
            add("lifecycle", summary=t)

    for rec in iter_jsonl(chat):
        rtype = rec.get("type")
        content = rec.get("content")
        if rtype == "user":
            text = _text_content(content)
            if text.strip():
                add("user_msg", role="user", summary=first_line(text), body=text)
        elif rtype == "reasoning":
            text = _text_content(content)
            if text.strip():
                add("thinking", role="assistant", summary="reasoning", body=text)
        elif rtype == "assistant":
            text = _text_content(content)
            if text.strip():
                add("assistant_msg", role="assistant", summary=first_line(text), body=text)
        elif rtype == "tool_result":
            # pair with the next tool_call (in order) to recover name/args
            tool = None
            parent = None
            if call_idx < len(pending_calls):
                call = pending_calls[call_idx]
                call_idx += 1
                tool = call["title"]
                cmd = stringify(call["rawInput"])
                kind = classify_tool(tool, cmd)
                parent = add(kind, role="assistant", tool=tool, summary=tool, body=cmd)
            body = _text_content(content) if not isinstance(content, str) else content
            add("tool_result", role="tool", tool=tool, summary="tool result",
                body=stringify(body), parent_seq=parent)

    if not events:
        return None

    cost = Cost(turns=sum(1 for e in events if e.kind == "user_msg"))
    started = info.get("created_at") or meta.get("created_at")
    ended = meta.get("last_active_at") or info.get("updated_at") or meta.get("updated_at")
    cost.wall_clock_s = seconds_between(started, ended)
    repo, domain = resolve_repo(cwd, repo_domain_map)

    session = Session(
        session_uid=Session.make_uid(FLAVOR, session_id), flavor=FLAVOR,
        native_session_id=session_id, repo=repo, domain=domain, cwd=cwd,
        git_branch=meta.get("head_branch"), model=model,
        started_at=started, ended_at=ended, outcome="unknown", cost=cost,
        source_path=chat,
        source_bytes=_dir_bytes(session_dir),
        discovered_at=now_iso(),
    )
    return Normalized(session=session, events=events, blobs=blobs)


def _dir_bytes(d: str) -> int:
    total = 0
    for root, _, files in os.walk(d):
        for f in files:
            try:
                total += os.path.getsize(os.path.join(root, f))
            except OSError:
                pass
    return total


@@ -0,0 +1 @@
{"created_at": "2026-06-07T09:13:20Z", "distribution_ready": true, "id": "sp-problem-budget_overrun-tokens", "name": "problem: budget overrun", "polarity": "problem", "problem": "problem: budget overrun", "provenance": {"detected_at": null, "evidence": {"cost_impact": 10.667, "cross_flavor": false, "flavors": ["claude"], "frequency": 3, "key": "problem:budget_overrun:tokens", "locus": "tokens", "polarity": "problem", "repos": ["artifact-store", "citation-evidence", "infospace-bench"], "score": 32.001, "sessions": ["claude:0ef1b45c-5c27-4e20-88b3-37daeaa24eca", "claude:6e0d3d68-872b-4d93-bb09-0691e091314b", "claude:8fabd5ce-6a20-4412-9a8b-0f0763394a78"], "signal_type": "budget_overrun", "title": "problem: budget overrun"}, "promoted_at": "2026-06-07T09:13:20Z", "source_key": "problem:budget_overrun:tokens"}, "rendering_hints": {"claude": {"note": "TODO: refine rendering", "target": "CLAUDE.md"}}, "resolutions": [{"detail": "", "steps": [], "summary": "TODO: capture the recommended resolution"}], "schema_version": 1, "scope": {"domains": [], "flavors": ["claude"], "repos": ["artifact-store", "citation-evidence", "infospace-bench"]}, "status": "superseded", "updated_at": "2026-06-07T09:13:20Z", "version": "1.0.0"}


@@ -0,0 +1,77 @@
{
"created_at": "2026-06-07T09:13:20Z",
"distribution_ready": true,
"id": "sp-problem-budget_overrun-tokens",
"name": "Budget overrun: token cost above peers",
"polarity": "problem",
"problem": "A session's token cost lands well above its peers (>p90). Usually driven by re-reading large files or tool outputs, carrying redundant context, or long exploratory loops without checkpoints.",
"provenance": {
"detected_at": null,
"evidence": {
"cost_impact": 10.667,
"cross_flavor": false,
"flavors": [
"claude"
],
"frequency": 3,
"key": "problem:budget_overrun:tokens",
"locus": "tokens",
"polarity": "problem",
"repos": [
"artifact-store",
"citation-evidence",
"infospace-bench"
],
"score": 32.001,
"sessions": [
"claude:0ef1b45c-5c27-4e20-88b3-37daeaa24eca",
"claude:6e0d3d68-872b-4d93-bb09-0691e091314b",
"claude:8fabd5ce-6a20-4412-9a8b-0f0763394a78"
],
"signal_type": "budget_overrun",
"title": "problem: budget overrun"
},
"promoted_at": "2026-06-07T09:13:20Z",
"source_key": "problem:budget_overrun:tokens"
},
"rendering_hints": {
"claude": {
"target": "CLAUDE.md"
}
},
"resolutions": [
{
"detail": "Use offset/limit; don't re-Read a file already in the transcript.",
"steps": [
"Locate with grep/glob first",
"Read only the relevant span"
],
"summary": "Read narrowly \u2014 target the region you need, not whole large files"
},
{
"detail": "Summarize progress; avoid re-pulling outputs already shown.",
"steps": [],
"summary": "Checkpoint and prune context instead of re-fetching it"
},
{
"detail": "grep/glob narrows scope far cheaper than reading whole trees.",
"steps": [],
"summary": "Prefer targeted search over broad reads to locate code"
}
],
"schema_version": 1,
"scope": {
"domains": [],
"flavors": [
"claude"
],
"repos": [
"artifact-store",
"citation-evidence",
"infospace-bench"
]
},
"status": "approved",
"updated_at": "2026-06-07T14:21:06Z",
"version": "1.0.1"
}


@@ -0,0 +1 @@
{"covers": [], "created_at": "2026-06-07T13:26:25Z", "distribution_ready": true, "id": "sp-problem-file_not_read-edit", "name": "Read before you Edit", "polarity": "problem", "problem": "Agents call Edit/Write on a file they have not read in the current session, or after it changed under them. The edit tools reject this ('File has not been read yet' / 'File has been modified since read'), and the retry burns a turn. Top recurring error in the corpus (12/27 sessions, 8 repos).", "provenance": {"detected_at": null, "evidence": {"frequency": 32, "origin": "AGENTIC-WP-0006 error mining / ASSESSMENT-infra-friction.md", "polarity": "problem", "repos": 8, "sessions": 12}, "promoted_at": null, "source_key": "problem:file_not_read:edit"}, "rendering_hints": {"claude": {"target": "CLAUDE.md"}, "codex": {"target": "AGENTS.md"}, "grok": {"target": ".grok/instructions.md"}}, "resolutions": [{"detail": "Never blind-write a file you haven't read this session.", "steps": ["Read the target file", "Then Edit/Write"], "summary": "Read the file (or the region you'll touch) before Edit/Write"}, {"detail": "A stale read means the file changed under you; refresh, don't loop.", "steps": ["Re-Read the file", "Re-apply the Edit"], "summary": "On 'modified since read', re-Read then re-Edit"}], "schema_version": 1, "scope": {"domains": [], "flavors": [], "repos": []}, "status": "superseded", "updated_at": "2026-06-07T13:26:25Z", "version": "1.0.0"}


@@ -0,0 +1,63 @@
{
"covers": [
"file has not been read",
"modified since read",
"file_not_read"
],
"created_at": "2026-06-07T13:26:25Z",
"distribution_ready": true,
"id": "sp-problem-file_not_read-edit",
"name": "Read before you Edit",
"polarity": "problem",
"problem": "Agents call Edit/Write on a file they have not read in the current session, or after it changed under them. The edit tools reject this ('File has not been read yet' / 'File has been modified since read'), and the retry burns a turn. Top recurring error in the corpus (12/27 sessions, 8 repos).",
"provenance": {
"detected_at": null,
"evidence": {
"frequency": 32,
"origin": "AGENTIC-WP-0006 error mining / ASSESSMENT-infra-friction.md",
"polarity": "problem",
"repos": 8,
"sessions": 12
},
"promoted_at": null,
"source_key": "problem:file_not_read:edit"
},
"rendering_hints": {
"claude": {
"target": "CLAUDE.md"
},
"codex": {
"target": "AGENTS.md"
},
"grok": {
"target": ".grok/instructions.md"
}
},
"resolutions": [
{
"detail": "Never blind-write a file you haven't read this session.",
"steps": [
"Read the target file",
"Then Edit/Write"
],
"summary": "Read the file (or the region you'll touch) before Edit/Write"
},
{
"detail": "A stale read means the file changed under you; refresh, don't loop.",
"steps": [
"Re-Read the file",
"Re-apply the Edit"
],
"summary": "On 'modified since read', re-Read then re-Edit"
}
],
"schema_version": 1,
"scope": {
"domains": [],
"flavors": [],
"repos": []
},
"status": "approved",
"updated_at": "2026-06-07T19:06:45Z",
"version": "1.0.1"
}


@@ -0,0 +1 @@
{"created_at": "2026-06-07T09:13:20Z", "distribution_ready": false, "id": "sp-problem-infra_overhead-infra_overhead", "name": "problem: infra overhead", "polarity": "problem", "problem": "problem: infra overhead", "provenance": {"detected_at": null, "evidence": {"cost_impact": 0.801, "cross_flavor": false, "flavors": ["claude"], "frequency": 2, "key": "problem:infra_overhead:infra_overhead", "locus": "infra_overhead", "polarity": "problem", "repos": ["markitect-main", "vergabe-teilnahme"], "score": 1.602, "sessions": ["claude:135002f9-98d2-4d1b-b8fb-543b20388782", "claude:b4ae9631-a7eb-42a6-acb1-c65b660c4b74"], "signal_type": "infra_overhead", "title": "problem: infra overhead"}, "promoted_at": "2026-06-07T09:13:20Z", "source_key": "problem:infra_overhead:infra_overhead"}, "rendering_hints": {"claude": {"note": "TODO: refine rendering", "target": "CLAUDE.md"}}, "resolutions": [{"detail": "", "steps": [], "summary": "TODO: capture the recommended resolution"}], "schema_version": 1, "scope": {"domains": [], "flavors": ["claude"], "repos": ["markitect-main", "vergabe-teilnahme"]}, "status": "superseded", "updated_at": "2026-06-07T09:13:20Z", "version": "1.0.0"}


@@ -0,0 +1,74 @@
{
"created_at": "2026-06-07T09:13:20Z",
"distribution_ready": false,
"id": "sp-problem-infra_overhead-infra_overhead",
"name": "Infrastructure overhead: too much coordination plumbing",
"polarity": "problem",
"problem": "A large share of the session's tool calls are State Hub / task-management / schema-loading plumbing rather than touching the repo (corpus median 11.7%, up to 43% in the worst sessions; one session made 231 hub calls).",
"provenance": {
"detected_at": null,
"evidence": {
"cost_impact": 0.801,
"cross_flavor": false,
"flavors": [
"claude"
],
"frequency": 2,
"key": "problem:infra_overhead:infra_overhead",
"locus": "infra_overhead",
"polarity": "problem",
"repos": [
"markitect-main",
"vergabe-teilnahme"
],
"score": 1.602,
"sessions": [
"claude:135002f9-98d2-4d1b-b8fb-543b20388782",
"claude:b4ae9631-a7eb-42a6-acb1-c65b660c4b74"
],
"signal_type": "infra_overhead",
"title": "problem: infra overhead"
},
"promoted_at": "2026-06-07T09:13:20Z",
"source_key": "problem:infra_overhead:infra_overhead"
},
"rendering_hints": {
"claude": {
"target": "CLAUDE.md"
}
},
"resolutions": [
{
"detail": "Update several task statuses together; emit fewer, coarser progress events.",
"steps": [
"Do a chunk of work",
"Then sync statuses in one pass"
],
"summary": "Batch hub writes \u2014 sync at checkpoints, not per event"
},
{
"detail": "One scoped summary at session start beats many broad reads.",
"steps": [],
"summary": "Orient once with get_domain_summary, don't re-query repeatedly"
},
{
"detail": "See STATE-WP-0058 \u2014 stops the repeated ToolSearch for hub tools.",
"steps": [],
"summary": "Front-load hub tool knowledge via the State Hub skill"
}
],
"schema_version": 1,
"scope": {
"domains": [],
"flavors": [
"claude"
],
"repos": [
"markitect-main",
"vergabe-teilnahme"
]
},
"status": "provisional",
"updated_at": "2026-06-07T14:21:06Z",
"version": "1.0.1"
}


@@ -0,0 +1 @@
{"created_at": "2026-06-07T09:13:20Z", "distribution_ready": true, "id": "sp-problem-schema_thrash-schema_load", "name": "problem: schema thrash", "polarity": "problem", "problem": "problem: schema thrash", "provenance": {"detected_at": null, "evidence": {"cost_impact": 79.0, "cross_flavor": false, "flavors": ["claude"], "frequency": 8, "key": "problem:schema_thrash:schema_load", "locus": "schema_load", "polarity": "problem", "repos": ["activity-core", "citation-evidence", "flex-auth", "infospace-bench", "ops-bridge", "vergabe-teilnahme"], "score": 632.0, "sessions": ["claude:0ef1b45c-5c27-4e20-88b3-37daeaa24eca", "claude:30dbad62-c042-41f2-80c1-5953a1100e7f", "claude:4340b160-2fb6-47d0-897c-3cac0a8855d8", "claude:63fd4df2-5add-4748-af21-c1544825e006", "claude:8313f946-f008-4e98-9915-31950380e39e", "claude:8fabd5ce-6a20-4412-9a8b-0f0763394a78", "claude:b4ae9631-a7eb-42a6-acb1-c65b660c4b74", "claude:bbcf1c2b-14be-40e4-826b-4b2b49b9d212"], "signal_type": "schema_thrash", "title": "problem: schema thrash"}, "promoted_at": "2026-06-07T09:13:20Z", "source_key": "problem:schema_thrash:schema_load"}, "rendering_hints": {"claude": {"note": "TODO: refine rendering", "target": "CLAUDE.md"}}, "resolutions": [{"detail": "", "steps": [], "summary": "TODO: capture the recommended resolution"}], "schema_version": 1, "scope": {"domains": [], "flavors": ["claude"], "repos": ["activity-core", "citation-evidence", "flex-auth", "infospace-bench", "ops-bridge", "vergabe-teilnahme"]}, "status": "superseded", "updated_at": "2026-06-07T09:13:20Z", "version": "1.0.0"}


@@ -0,0 +1,83 @@
{
"created_at": "2026-06-07T09:13:20Z",
"distribution_ready": true,
"id": "sp-problem-schema_thrash-schema_load",
"name": "Schema thrash: repeated ToolSearch",
"polarity": "problem",
"problem": "ToolSearch fires repeatedly within a session (seen in 81% of sessions) because the State Hub MCP tools are deferred and their schemas get re-loaded each time they are needed \u2014 pure overhead with no work value.",
"provenance": {
"detected_at": null,
"evidence": {
"cost_impact": 79.0,
"cross_flavor": false,
"flavors": [
"claude"
],
"frequency": 8,
"key": "problem:schema_thrash:schema_load",
"locus": "schema_load",
"polarity": "problem",
"repos": [
"activity-core",
"citation-evidence",
"flex-auth",
"infospace-bench",
"ops-bridge",
"vergabe-teilnahme"
],
"score": 632.0,
"sessions": [
"claude:0ef1b45c-5c27-4e20-88b3-37daeaa24eca",
"claude:30dbad62-c042-41f2-80c1-5953a1100e7f",
"claude:4340b160-2fb6-47d0-897c-3cac0a8855d8",
"claude:63fd4df2-5add-4748-af21-c1544825e006",
"claude:8313f946-f008-4e98-9915-31950380e39e",
"claude:8fabd5ce-6a20-4412-9a8b-0f0763394a78",
"claude:b4ae9631-a7eb-42a6-acb1-c65b660c4b74",
"claude:bbcf1c2b-14be-40e4-826b-4b2b49b9d212"
],
"signal_type": "schema_thrash",
"title": "problem: schema thrash"
},
"promoted_at": "2026-06-07T09:13:20Z",
"source_key": "problem:schema_thrash:schema_load"
},
"rendering_hints": {
"claude": {
"target": "CLAUDE.md"
}
},
"resolutions": [
{
"detail": "Resolve them by name in one ToolSearch (select:...) rather than searching ad hoc.",
"steps": [
"List the hub tools the session needs",
"Load them once at the start"
],
"summary": "Load the tool schemas you'll need once, up front"
},
{
"detail": "The skill carries the schemas so no per-use discovery is needed.",
"steps": [],
"summary": "Adopt the State Hub skill that front-loads common hub tool signatures"
}
],
"schema_version": 1,
"scope": {
"domains": [],
"flavors": [
"claude"
],
"repos": [
"activity-core",
"citation-evidence",
"flex-auth",
"infospace-bench",
"ops-bridge",
"vergabe-teilnahme"
]
},
"status": "approved",
"updated_at": "2026-06-07T14:21:06Z",
"version": "1.0.1"
}


@@ -0,0 +1 @@
{"created_at": "2026-06-07T09:13:20Z", "distribution_ready": true, "id": "sp-problem-tool_thrash-tool-bash", "name": "problem: tool thrash", "polarity": "problem", "problem": "problem: tool thrash", "provenance": {"detected_at": null, "evidence": {"cost_impact": 1990.0, "cross_flavor": false, "flavors": ["claude"], "frequency": 11, "key": "problem:tool_thrash:tool:Bash", "locus": "tool:Bash", "polarity": "problem", "repos": ["activity-core", "artifact-store", "citation-evidence", "ihp-railiance-probe", "infospace-bench", "railiance-apps", "state-hub", "vergabe-teilnahme"], "score": 21890.0, "sessions": ["claude:0ef1b45c-5c27-4e20-88b3-37daeaa24eca", "claude:2c0d14e1-d089-4076-bf35-b134737a261d", "claude:30dbad62-c042-41f2-80c1-5953a1100e7f", "claude:4307eff6-cd39-4189-be58-79a3acb69d6c", "claude:4340b160-2fb6-47d0-897c-3cac0a8855d8", "claude:6e0d3d68-872b-4d93-bb09-0691e091314b", "claude:8313f946-f008-4e98-9915-31950380e39e", "claude:8fabd5ce-6a20-4412-9a8b-0f0763394a78", "claude:a9483f07-c9dc-4f71-9fa0-831790ea965e", "claude:b1dfbcfa-91f9-4540-823a-26fcfaab7fc8", "claude:b4ae9631-a7eb-42a6-acb1-c65b660c4b74"], "signal_type": "tool_thrash", "title": "problem: tool thrash"}, "promoted_at": "2026-06-07T09:13:20Z", "source_key": "problem:tool_thrash:tool:Bash"}, "rendering_hints": {"claude": {"note": "TODO: refine rendering", "target": "CLAUDE.md"}}, "resolutions": [{"detail": "", "steps": [], "summary": "TODO: capture the recommended resolution"}], "schema_version": 1, "scope": {"domains": [], "flavors": ["claude"], "repos": ["activity-core", "artifact-store", "citation-evidence", "ihp-railiance-probe", "infospace-bench", "railiance-apps", "state-hub", "vergabe-teilnahme"]}, "status": "superseded", "updated_at": "2026-06-07T09:13:20Z", "version": "1.0.0"}


@@ -0,0 +1,95 @@
{
"created_at": "2026-06-07T09:13:20Z",
"distribution_ready": true,
"id": "sp-problem-tool_thrash-tool-bash",
"name": "Tool thrash: one tool hammered",
"polarity": "problem",
"problem": "A single tool (often Bash or Edit) is invoked far more than any other in a session \u2014 a sign of trial-and-error churn or missing higher-level tooling.",
"provenance": {
"detected_at": null,
"evidence": {
"cost_impact": 1990.0,
"cross_flavor": false,
"flavors": [
"claude"
],
"frequency": 11,
"key": "problem:tool_thrash:tool:Bash",
"locus": "tool:Bash",
"polarity": "problem",
"repos": [
"activity-core",
"artifact-store",
"citation-evidence",
"ihp-railiance-probe",
"infospace-bench",
"railiance-apps",
"state-hub",
"vergabe-teilnahme"
],
"score": 21890.0,
"sessions": [
"claude:0ef1b45c-5c27-4e20-88b3-37daeaa24eca",
"claude:2c0d14e1-d089-4076-bf35-b134737a261d",
"claude:30dbad62-c042-41f2-80c1-5953a1100e7f",
"claude:4307eff6-cd39-4189-be58-79a3acb69d6c",
"claude:4340b160-2fb6-47d0-897c-3cac0a8855d8",
"claude:6e0d3d68-872b-4d93-bb09-0691e091314b",
"claude:8313f946-f008-4e98-9915-31950380e39e",
"claude:8fabd5ce-6a20-4412-9a8b-0f0763394a78",
"claude:a9483f07-c9dc-4f71-9fa0-831790ea965e",
"claude:b1dfbcfa-91f9-4540-823a-26fcfaab7fc8",
"claude:b4ae9631-a7eb-42a6-acb1-c65b660c4b74"
],
"signal_type": "tool_thrash",
"title": "problem: tool thrash"
},
"promoted_at": "2026-06-07T09:13:20Z",
"source_key": "problem:tool_thrash:tool:Bash"
},
"rendering_hints": {
"claude": {
"target": "CLAUDE.md"
}
},
"resolutions": [
{
"detail": "Compose a single command/script; run independent calls in parallel.",
"steps": [
"Group the steps",
"Run them as one block"
],
"summary": "Batch related shell work into one script, not many small Bash calls"
},
{
"detail": "Read the region, then one substantive Edit beats many tiny ones.",
"steps": [],
"summary": "Make fewer, larger edits with full context"
},
{
"detail": "If the same invocation recurs, wrap it once.",
"steps": [],
"summary": "Factor a repeated command pattern into a helper"
}
],
"schema_version": 1,
"scope": {
"domains": [],
"flavors": [
"claude"
],
"repos": [
"activity-core",
"artifact-store",
"citation-evidence",
"ihp-railiance-probe",
"infospace-bench",
"railiance-apps",
"state-hub",
"vergabe-teilnahme"
]
},
"status": "approved",
"updated_at": "2026-06-07T14:21:06Z",
"version": "1.0.1"
}


@@ -0,0 +1 @@
{"created_at": "2026-06-07T09:13:20Z", "distribution_ready": true, "id": "sp-success-clean_pass-outcome", "name": "cross-flavor success: clean pass", "polarity": "success", "problem": "cross-flavor success: clean pass", "provenance": {"detected_at": null, "evidence": {"cost_impact": 17.0, "cross_flavor": true, "flavors": ["claude", "grok"], "frequency": 17, "key": "success:clean_pass:outcome", "locus": "outcome", "polarity": "success", "repos": ["activity-core", "agentic-resources", "artifact-store", "can-you-assist", "citation-evidence", "infospace-bench", "issue-facade", "ops-bridge", "railiance-apps", "state-hub", "the-custodian", "vergabe-teilnahme"], "score": 433.5, "sessions": ["claude:0ef1b45c-5c27-4e20-88b3-37daeaa24eca", "claude:16bdbec4-b018-4902-9fb5-336f8f3d61c8", "claude:2c0d14e1-d089-4076-bf35-b134737a261d", "claude:30dbad62-c042-41f2-80c1-5953a1100e7f", "claude:4307eff6-cd39-4189-be58-79a3acb69d6c", "claude:4340b160-2fb6-47d0-897c-3cac0a8855d8", "claude:631de76e-fdee-43b5-b091-7b7675467ad1", "claude:63fd4df2-5add-4748-af21-c1544825e006", "claude:6e0d3d68-872b-4d93-bb09-0691e091314b", "claude:8313f946-f008-4e98-9915-31950380e39e", "claude:8fabd5ce-6a20-4412-9a8b-0f0763394a78", "claude:a9483f07-c9dc-4f71-9fa0-831790ea965e", "claude:b4ae9631-a7eb-42a6-acb1-c65b660c4b74", "claude:eb837dd1-5b8e-472e-b9e1-4537b10e03e6", "claude:ee9e84f2-bc35-4eb5-a7ad-aaec5f31d965", "claude:f1b25697-0e5f-45f0-81d1-af0f1762c438", "grok:019e6122-00c0-79f3-b4e5-9c70b77c015d"], "signal_type": "clean_pass", "title": "cross-flavor success: clean pass"}, "promoted_at": "2026-06-07T09:13:20Z", "source_key": "success:clean_pass:outcome"}, "rendering_hints": {"claude": {"note": "TODO: refine rendering", "target": "CLAUDE.md"}, "grok": {"note": "TODO: refine rendering", "target": "instructions"}}, "resolutions": [{"detail": "", "steps": [], "summary": "TODO: capture the recommended resolution"}], "schema_version": 1, "scope": {"domains": [], "flavors": ["claude", "grok"], "repos": 
["activity-core", "agentic-resources", "artifact-store", "can-you-assist", "citation-evidence", "infospace-bench", "issue-facade", "ops-bridge", "railiance-apps", "state-hub", "the-custodian", "vergabe-teilnahme"]}, "status": "superseded", "updated_at": "2026-06-07T09:13:20Z", "version": "1.0.0"}


@@ -0,0 +1,110 @@
{
"created_at": "2026-06-07T09:13:20Z",
"distribution_ready": true,
"id": "sp-success-clean_pass-outcome",
"name": "Clean pass: tests green, no retries",
"polarity": "success",
"problem": "The target session shape: ends in success, runs the test suite, with no errors and no retries \u2014 resolves cheaply and reliably. Seen across many sessions and both Claude and Grok (the highest-value pattern to reinforce).",
"provenance": {
"detected_at": null,
"evidence": {
"cost_impact": 17.0,
"cross_flavor": true,
"flavors": [
"claude",
"grok"
],
"frequency": 17,
"key": "success:clean_pass:outcome",
"locus": "outcome",
"polarity": "success",
"repos": [
"activity-core",
"agentic-resources",
"artifact-store",
"can-you-assist",
"citation-evidence",
"infospace-bench",
"issue-facade",
"ops-bridge",
"railiance-apps",
"state-hub",
"the-custodian",
"vergabe-teilnahme"
],
"score": 433.5,
"sessions": [
"claude:0ef1b45c-5c27-4e20-88b3-37daeaa24eca",
"claude:16bdbec4-b018-4902-9fb5-336f8f3d61c8",
"claude:2c0d14e1-d089-4076-bf35-b134737a261d",
"claude:30dbad62-c042-41f2-80c1-5953a1100e7f",
"claude:4307eff6-cd39-4189-be58-79a3acb69d6c",
"claude:4340b160-2fb6-47d0-897c-3cac0a8855d8",
"claude:631de76e-fdee-43b5-b091-7b7675467ad1",
"claude:63fd4df2-5add-4748-af21-c1544825e006",
"claude:6e0d3d68-872b-4d93-bb09-0691e091314b",
"claude:8313f946-f008-4e98-9915-31950380e39e",
"claude:8fabd5ce-6a20-4412-9a8b-0f0763394a78",
"claude:a9483f07-c9dc-4f71-9fa0-831790ea965e",
"claude:b4ae9631-a7eb-42a6-acb1-c65b660c4b74",
"claude:eb837dd1-5b8e-472e-b9e1-4537b10e03e6",
"claude:ee9e84f2-bc35-4eb5-a7ad-aaec5f31d965",
"claude:f1b25697-0e5f-45f0-81d1-af0f1762c438",
"grok:019e6122-00c0-79f3-b4e5-9c70b77c015d"
],
"signal_type": "clean_pass",
"title": "cross-flavor success: clean pass"
},
"promoted_at": "2026-06-07T09:13:20Z",
"source_key": "success:clean_pass:outcome"
},
"rendering_hints": {
"claude": {
"target": "CLAUDE.md"
},
"grok": {
"target": "instructions"
}
},
"resolutions": [
{
"detail": "A passing suite is the cheapest proof the change works.",
"steps": [
"Make the change",
"Run the suite",
"Only then report done"
],
"summary": "Run the test suite before declaring done; let green gate completion"
},
{
"detail": "Small verified steps beat large unverified ones that bounce.",
"steps": [],
"summary": "Work incrementally and verify as you go to avoid retries"
}
],
"schema_version": 1,
"scope": {
"domains": [],
"flavors": [
"claude",
"grok"
],
"repos": [
"activity-core",
"agentic-resources",
"artifact-store",
"can-you-assist",
"citation-evidence",
"infospace-bench",
"issue-facade",
"ops-bridge",
"railiance-apps",
"state-hub",
"the-custodian",
"vergabe-teilnahme"
]
},
"status": "approved",
"updated_at": "2026-06-07T14:21:06Z",
"version": "1.0.1"
}
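
The catalog entries above all share one shape, so a consumer can filter them mechanically. The following is a minimal sketch, assuming the catalog is stored as one JSON object per `*.json` file in a single directory; the helper name `distributable_patterns` and that directory layout are hypothetical and not taken from the repository. It keeps only entries whose `status` is `approved` and whose `distribution_ready` flag is true.

```python
import json
from pathlib import Path


def distributable_patterns(catalog_dir: str) -> list[dict]:
    """Return catalog entries that are approved and marked distribution-ready.

    Hypothetical helper: assumes one JSON object per ``*.json`` file.
    """
    selected = []
    for path in sorted(Path(catalog_dir).glob("*.json")):
        try:
            entry = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # skip unreadable or malformed files
        if not isinstance(entry, dict):
            continue  # a catalog entry is expected to be a JSON object
        if entry.get("status") == "approved" and entry.get("distribution_ready") is True:
            selected.append(entry)
    return selected
```

Under this filter, a `superseded` 1.0.0 entry and a `provisional` entry such as the infra-overhead pattern would both be skipped, while the `approved` 1.0.1 entries would pass.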


@@ -0,0 +1,83 @@
# Coding Session Memory — configuration (design §5.1, §8).
# Paths support ~ expansion. Edit caps to taste; see docs/DESIGN-session-memory.md.
[store]
# Local store lives under the repo by default (gitignored).
db_path = "session_memory/.store/mem.db"
blob_dir = "session_memory/.store/blobs"
cursor = "session_memory/.store/cursors.json"
[retention]
raw_soft_cap_bytes = 4294967296 # 4 GiB — begin evicting analyzed sessions above this
raw_hard_cap_bytes = 6442450944 # 6 GiB — absolute Tier 1 ceiling
raw_max_age_days = 45 # backstop: analyzed raw older than this is evictable
distilled_cap_bytes = 1073741824 # 1 GiB — Tier 2 ceiling (alert, never auto-drop)
cadence = "daily" # sweep trigger: daily | weekly | on-hook
[sources.claude]
enabled = true
root = "~/.claude/projects"
# glob, relative to root; covers sessions and agent-* sidechains
glob = "*/*.jsonl"
# Codex / Grok adapters added in Phase 1 (AGENTIC-WP-0003).
[sources.codex]
enabled = true
root = "~/.codex/sessions"
glob = "*/*/*/rollout-*.jsonl"
[sources.grok]
enabled = true
root = "~/.grok/sessions"
glob = "*/*/chat_history.jsonl"
# Detect phase (AGENTIC-WP-0005): quality filter — drop non-coding/trivial sessions
# before signals form, so health-checks don't mint false-positive patterns.
[detect.quality]
min_events = 20 # below this many events, not a real coding session
min_substantive = 3 # require >= this many substantive (edit/read/shell) tool calls
min_prompt_len = 25 # first prompt shorter than this is treated as trivial
# Curate phase (AGENTIC-WP-0004): catalog location + promotion evidence bar.
# Measure phase (AGENTIC-WP-0009): persisted baseline/trend of fleet metrics.
[measure]
baselines = "session_memory/measure/baselines.jsonl" # timestamped metric snapshots (committed)
# Weekly retro (AGENTIC-WP-0010): windowed top-3-per-repo report, published to the
# hub as the coding_retro read model that activity-core's weekly schedule consumes.
[retro]
window_days = 7
report_json = "session_memory/retro/last_retro.json" # latest report (committed)
report_md = "session_memory/retro/last_retro.md" # human-readable mirror
hub_url = "http://127.0.0.1:8000" # for --publish (best-effort)
# Distribute phase (AGENTIC-WP-0007): where per-flavor proposals + the active
# registry are written. Proposals are HITL — reviewed, never auto-applied.
[distribute]
proposals_dir = "session_memory/proposals" # reviewable proposals (gitignored, regenerated)
active_registry = "session_memory/distribute/active_patterns.json" # what's proposed/active where (committed)
[curate]
catalog_dir = "session_memory/catalog" # files-first Pattern Catalog (committed)
review_log = "session_memory/.store/reviews.jsonl" # remembered decisions (gitignored)
decision_queue = "session_memory/.store/decisions.queue.jsonl" # hub decisions pending sync
state_hub_workstream_id = "b3703684-f60e-42f3-b03e-dabe3e8ce3f4" # AGENTIC-WP-0004
# Evidence bar (OQ5): floors to promote at all, and stricter floors to be
# distribution-eligible (status=approved, distribution_ready=true).
[curate.gate]
min_frequency = 2 # >= this many supporting signals to promote
min_sessions = 2 # >= this many distinct sessions
min_cost_impact = 0.0
dist_require_cross_flavor = false # require cross-flavor evidence to distribute
dist_min_frequency = 3
dist_min_cost_impact = 0.0
# cwd basename -> domain slug. Used to tag sessions with their Custodian domain.
[repo_domain_map]
agentic-resources = "helix_forge"
the-custodian = "custodian"
state-hub = "custodian"
ops-bridge = "custodian"
net-kingdom = "netkingdom"
can-you-assist = "coulomb_social"
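
The `[curate.gate]` floors above can be read as a two-stage check: one set of floors to promote a pattern at all, and stricter floors for distribution eligibility. The sketch below is an assumption about how such a gate could be evaluated; the `Gate` dataclass, the `evaluate` function, and the exact comparisons are hypothetical and are not the repository's implementation. Only the field names and default values come from the config.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    # Defaults mirror the [curate.gate] values shown in the config.
    min_frequency: int = 2
    min_sessions: int = 2
    min_cost_impact: float = 0.0
    dist_require_cross_flavor: bool = False
    dist_min_frequency: int = 3
    dist_min_cost_impact: float = 0.0


def evaluate(evidence: dict, gate: Gate = Gate()) -> tuple[bool, bool]:
    """Return ``(promotable, distribution_eligible)`` for an evidence dict.

    Assumed semantics: every floor is an inclusive lower bound.
    """
    freq = evidence.get("frequency", 0)
    sessions = evidence.get("sessions", [])
    # Catalog evidence stores sessions either as a list of ids or as a count.
    n_sessions = sessions if isinstance(sessions, int) else len(sessions)
    cost = evidence.get("cost_impact", 0.0)

    promotable = (freq >= gate.min_frequency
                  and n_sessions >= gate.min_sessions
                  and cost >= gate.min_cost_impact)
    cross_ok = evidence.get("cross_flavor", False) or not gate.dist_require_cross_flavor
    eligible = (promotable
                and freq >= gate.dist_min_frequency
                and cost >= gate.dist_min_cost_impact
                and cross_ok)
    return promotable, eligible
```

With these assumed semantics, evidence with `frequency` 2 across two sessions clears the promotion floor but not the distribution floor of 3, while evidence with `frequency` 8 clears both.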


@@ -0,0 +1 @@
"""Flavor-agnostic core: schema, store, cursor, digest, retention."""


@@ -0,0 +1,49 @@
"""Per-source ingest cursors (design §6; T06).
Tracks ``(path -> size, mtime)`` so sweeps re-ingest only changed/grown files.
Persisted as a small JSON sidecar. Ingest itself is idempotent on
``(session_uid, seq)`` in the store, so the cursor is an optimization, not a
correctness requirement — a lost cursor just means a full (still-idempotent)
re-scan.
"""
from __future__ import annotations

import json
import os
from typing import Optional


class Cursors:
    def __init__(self, path: str):
        self.path = path
        self._data: dict[str, dict] = {}
        if os.path.exists(path):
            try:
                with open(path, "r", encoding="utf-8") as f:
                    self._data = json.load(f)
            except (OSError, ValueError):
                self._data = {}

    def is_changed(self, file_path: str) -> bool:
        """True if the file is new or has changed size/mtime since last seen."""
        try:
            stat = os.stat(file_path)
        except OSError:
            return False
        prev = self._data.get(file_path)
        return prev is None or prev.get("size") != stat.st_size or prev.get("mtime") != stat.st_mtime

    def mark(self, file_path: str) -> None:
        try:
            stat = os.stat(file_path)
        except OSError:
            return
        self._data[file_path] = {"size": stat.st_size, "mtime": stat.st_mtime}

    def save(self) -> None:
        os.makedirs(os.path.dirname(self.path) or ".", exist_ok=True)
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(self._data, f)
        os.replace(tmp, self.path)
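
Because the cursor is described as an optimization over idempotent ingest, a sweep can use it as a cheap pre-filter. The sketch below shows one way a driver loop might call `is_changed`, `mark`, and `save`; the function `sweep_once`, the `CursorLike` protocol, and the `ingest` callback are hypothetical names introduced for illustration, not part of the module above.

```python
from typing import Callable, Iterable, Protocol


class CursorLike(Protocol):
    """Structural type for anything exposing the three cursor methods."""

    def is_changed(self, file_path: str) -> bool: ...
    def mark(self, file_path: str) -> None: ...
    def save(self) -> None: ...


def sweep_once(cursors: CursorLike, paths: Iterable[str],
               ingest: Callable[[str], None]) -> list[str]:
    """Ingest only files the cursor reports as new or changed.

    Hypothetical driver: a file is marked only after ``ingest`` returns, so a
    crash mid-ingest leaves that file eligible for the next sweep.
    """
    ingested = []
    for path in paths:
        if not cursors.is_changed(path):
            continue  # unchanged since last sweep (or no longer readable)
        ingest(path)
        cursors.mark(path)
        ingested.append(path)
    cursors.save()  # persist once per sweep rather than once per file
    return ingested
```

Marking after ingest rather than before is a design choice in this sketch: it trades an occasional redundant re-ingest, which the store's idempotency absorbs, for never skipping a file that failed partway.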


@@ -0,0 +1,286 @@
"""Session digest — Tier 1 -> Tier 2 promotion (design §3, §4; T04).
Compresses a session's events into a small, durable digest: outcome heuristic,
cost totals, tool histogram, and counts of error/retry/test/edit/human markers,
plus a few key snippets. Writing the digest sets ``analyzed_at``, which is what
makes a session evictable under budget-based retention (design §5).
Signal extraction beyond this digest is intentionally out of scope here — it
belongs to the Detect phase (PRD §6.2).
"""
from __future__ import annotations
import collections
import json
import re
from typing import Any
from .schema import Session, SessionEvent
# Substrings in tool_result bodies / summaries that suggest a failure.
_FAIL_HINTS = ("error", "failed", "exception", "traceback", "fatal", "non-zero")
# Substrings suggesting a clean test pass.
_PASS_HINTS = ("passed", "0 failed", "ok", "success")
# A line that is numbered source content from a Read result (`cat -n` style),
# e.g. "229\t raise InfospaceError(" — code text, never a runtime error.
_NUMBERED_LINE_RE = re.compile(r"^\s*\d+\t")
# Top-level keys that mark a JSON tool-result as an actual error (vs. success).
_JSON_ERROR_KEYS = ("error", "errors", "detail")
# Normalization patterns so the same error collapses to one fingerprint
# regardless of paths / ids / counts (WP-0006 T01).
_UUID_RE = re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b", re.I)
_HEXADDR_RE = re.compile(r"\b0x[0-9a-f]+\b", re.I)
_PATH_RE = re.compile(r"(?:/[\w.\-]+)+/?|[A-Za-z]:\\[\w.\\\-]+")
_NUM_RE = re.compile(r"\b\d+\b")
_WS_RE = re.compile(r"\s+")
_ERR_SAMPLE_MAX = 200
_ERR_FP_MAX = 160
def infer_outcome(events: list[SessionEvent], blobs: dict[str, str] | None = None) -> str:
    """Heuristic outcome label across flavors (design OQ2).

    - ``abandoned`` if the session has no assistant output at all.
    - ``fail`` if the last substantive signal is an error / failing test.
    - ``success`` if it ends on assistant output or a passing test.
    - ``unknown`` otherwise.
    """
    blobs = blobs or {}
    assistant = [e for e in events if e.kind == "assistant_msg"]
    if not assistant:
        return "abandoned"

    # Look at error and test signals; weight the latest ones.
    last_fail = _last_index(events, lambda e: e.kind == "error")
    last_test = _last_index(events, lambda e: e.kind == "test_run")
    last_completion = _last_index(events, lambda e: e.kind in ("completion", "assistant_msg"))

    test_passed = None
    if last_test is not None:
        # inspect the nearest following tool_result body for pass/fail hints
        body = _nearby_result_body(events, last_test, blobs)
        if body:
            low = body.lower()
            if any(h in low for h in _FAIL_HINTS):
                test_passed = False
            elif any(h in low for h in _PASS_HINTS):
                test_passed = True

    if test_passed is False and (last_test or 0) >= (last_completion or 0):
        return "fail"
    if last_fail is not None and last_completion is not None and last_fail > last_completion:
        return "fail"
    if test_passed is True:
        return "success"
    if last_completion is not None:
        return "success"
    return "unknown"
def build_digest(session: Session, events: list[SessionEvent],
                 blobs: dict[str, str] | None = None) -> dict[str, Any]:
    """Produce the compact Tier 2 digest dict for a session."""
    blobs = blobs or {}
    kind_counts = collections.Counter(e.kind for e in events)
    tool_hist = collections.Counter(e.tool for e in events if e.tool)
    retries = kind_counts.get("retry", 0)
    outcome = infer_outcome(events, blobs)

    return {
        "session_uid": session.session_uid,
        "flavor": session.flavor,
        "repo": session.repo,
        "domain": session.domain,
        "model": session.model,
        "started_at": session.started_at,
        "ended_at": session.ended_at,
        "outcome": outcome,
        "cost": {
            "input_tokens": session.cost.input_tokens,
            "output_tokens": session.cost.output_tokens,
            "cache_tokens": session.cost.cache_tokens,
            "wall_clock_s": session.cost.wall_clock_s,
            "turns": session.cost.turns,
            "retries": retries,
        },
        "event_count": len(events),
        "kind_counts": dict(kind_counts),
        "tool_histogram": dict(tool_hist),
        "markers": {
            "errors": kind_counts.get("error", 0),
            "retries": retries,
            "test_runs": kind_counts.get("test_run", 0),
            "edits": kind_counts.get("edit", 0),
            "human_interventions": kind_counts.get("human_intervention", 0),
        },
        "first_prompt": _first_prompt(events, blobs),
        "last_assistant": _last_assistant(events, blobs),
        "error_snippets": _error_snippets(events, blobs),
        "schema_version": session.schema_version,
    }
def analyze(store, session_uid: str) -> dict[str, Any]:
    """Read a session from the store, write its digest, return the digest."""
    session = store.get_session(session_uid)
    if session is None:
        raise KeyError(session_uid)
    events = store.get_events(session_uid)
    blobs = {e.payload_ref: _read_blob(store, e.payload_ref)
             for e in events if e.payload_ref}
    digest = build_digest(session, events, blobs)
    store.write_digest(session_uid, digest)
    return digest
# ---- helpers ---------------------------------------------------------------

def _last_index(events, pred):
    idx = None
    for i, e in enumerate(events):
        if pred(e):
            idx = i
    return idx


def _nearby_result_body(events, idx, blobs):
    for e in events[idx + 1: idx + 4]:
        if e.kind == "tool_result" and e.payload_ref in blobs:
            return blobs[e.payload_ref]
    return None


def _first_prompt(events, blobs):
    for e in events:
        if e.kind == "user_msg":
            return (blobs.get(e.payload_ref) or e.summary or "")[:280]
    return None


def _last_assistant(events, blobs):
    for e in reversed(events):
        if e.kind == "assistant_msg":
            return (blobs.get(e.payload_ref) or e.summary or "")[:280]
    return None
def _error_line(text: str) -> str:
    """Pick the most error-like line from a body.

    Prefers the *last* line matching a fail hint: in a Python traceback the
    actual exception is the final line, while the bare ``Traceback (most recent
    call last):`` header is just noise and is skipped.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    matches = [ln for ln in lines
               if any(h in ln.lower() for h in _FAIL_HINTS)
               and not ln.lower().startswith("traceback")]
    if matches:
        return matches[-1]
    # fall back to any fail-hint line (e.g. only the traceback header), else first
    any_hint = [ln for ln in lines if any(h in ln.lower() for h in _FAIL_HINTS)]
    return any_hint[-1] if any_hint else (lines[0] if lines else "")


def _error_fingerprint(text: str) -> str:
    """Stable, content-addressable key for an error, paths/ids/numbers removed."""
    s = _error_line(text).lower()
    s = _UUID_RE.sub("<uuid>", s)
    s = _HEXADDR_RE.sub("<addr>", s)
    s = _PATH_RE.sub("<path>", s)
    s = _NUM_RE.sub("<n>", s)
    return _WS_RE.sub(" ", s).strip()[:_ERR_FP_MAX]


def _error_body(event: SessionEvent, blobs: dict) -> str:
    """Best available text for a failed event."""
    if event.payload_ref and event.payload_ref in blobs:
        return blobs[event.payload_ref]
    return event.summary or ""
def _looks_like_file_read(body: str) -> bool:
    """True if the body is mostly numbered source lines (a Read result), not an error."""
    lines = [ln for ln in body.splitlines() if ln.strip()]
    if not lines:
        return False
    numbered = sum(1 for ln in lines if _NUMBERED_LINE_RE.match(ln))
    return numbered >= max(3, len(lines) // 2)


def _json_verdict(body: str):
    """Classify a JSON tool-result body: 'error', 'success', or None (not JSON).

    Hub MCP successes look like ``{"result": "..."}`` and mention 'error' deep
    inside summaries but are not failures ('success'). A payload with a top-level
    error key (``{"detail": ...}`` / ``{"error": ...}``) is 'error'. Non-JSON text
    returns None so the plain fail-hint heuristic still applies.
    """
    s = body.strip()
    if not s or s[0] not in "{[":
        return None
    try:
        obj = json.loads(s)
    except (ValueError, TypeError):
        return None
    if isinstance(obj, dict) and any(k in obj for k in _JSON_ERROR_KEYS):
        return "error"
    return "success"


def _is_failed(event: SessionEvent, blobs: dict) -> bool:
    if event.kind == "error":
        return True
    if event.kind == "tool_result":
        body = _error_body(event, blobs)
        if not body.strip():
            return False
        if _looks_like_file_read(body):
            return False
        verdict = _json_verdict(body)
        if verdict is not None:
            return verdict == "error"
        return any(h in body.lower() for h in _FAIL_HINTS)
    return False
def _error_snippets(events: list[SessionEvent], blobs: dict) -> list[dict]:
    """Collapse a session's failures into deduped, normalized error fingerprints.

    Durable in Tier 2 (the raw blobs may be evicted): each entry is
    ``{fingerprint, sample, count, tool}`` with same-fingerprint occurrences
    counted. Ordered by frequency (then first appearance) for stable output.
    """
    agg: dict[str, dict] = {}
    order: list[str] = []
    for e in events:
        if not _is_failed(e, blobs):
            continue
        body = _error_body(e, blobs)
        if not body.strip():
            continue
        fp = _error_fingerprint(body)
        if not fp:
            continue
        if fp not in agg:
            agg[fp] = {"fingerprint": fp, "sample": _error_line(body)[:_ERR_SAMPLE_MAX],
                       "count": 0, "tool": e.tool}
            order.append(fp)
        agg[fp]["count"] += 1
    snippets = [agg[fp] for fp in order]
    snippets.sort(key=lambda s: (-s["count"], order.index(s["fingerprint"])))
    return snippets


def _read_blob(store, ref):
    row = store.db.execute("SELECT path FROM blobs WHERE ref=?", (ref,)).fetchone()
    if not row:
        return ""
    try:
        with open(row["path"], "r", encoding="utf-8") as f:
            return f.read()
    except OSError:
        return ""
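
The fingerprinting step is easiest to follow on a concrete pair of errors. The sketch below copies the five normalization regexes from the module and applies them in the same order. The standalone function `normalize` is a hypothetical name, and it skips the `_error_line` selection step, so it assumes the input is already a single error line.

```python
import re

# Same patterns as in the digest module, repeated so the sketch runs alone.
_UUID_RE = re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b", re.I)
_HEXADDR_RE = re.compile(r"\b0x[0-9a-f]+\b", re.I)
_PATH_RE = re.compile(r"(?:/[\w.\-]+)+/?|[A-Za-z]:\\[\w.\\\-]+")
_NUM_RE = re.compile(r"\b\d+\b")
_WS_RE = re.compile(r"\s+")


def normalize(line: str, max_len: int = 160) -> str:
    """Collapse ids, addresses, paths and numbers so similar errors match."""
    s = line.lower()
    s = _UUID_RE.sub("<uuid>", s)     # UUIDs first, before digits are rewritten
    s = _HEXADDR_RE.sub("<addr>", s)
    s = _PATH_RE.sub("<path>", s)
    s = _NUM_RE.sub("<n>", s)         # bare numbers last
    return _WS_RE.sub(" ", s).strip()[:max_len]


a = normalize("FileNotFoundError: /tmp/run-17/out.json not found (attempt 3)")
b = normalize("FileNotFoundError: /var/data/x.json   not found (attempt 12)")
print(a)
print(a == b)
```

Both inputs reduce to the same key, which is what lets occurrences from different sessions and repos be counted as one error. The substitution order matters: replacing numbers before UUIDs would break the UUID pattern apart.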


@@ -0,0 +1,144 @@
"""Budget-based retention sweep (design §5; T05).

Eviction is tied to the two conditions the design names: a session is dropped
from Tier 1 once it has been *analyzed* (its digest is in Tier 2) **and** space is
needed, with a max-age backstop. The invariant: raw bytes are never dropped
before the Tier 2 digest exists, except the explicitly-reported hard-cap overflow
path.

Order of passes per sweep:
  1. backstop: evict analyzed sessions older than ``raw_max_age_days``
  2. budget:   while over ``raw_soft_cap_bytes``, evict oldest-analyzed first
  3. overflow: if still over ``raw_hard_cap_bytes`` and only un-analyzed bulk
               remains: analyze-now, retry budget; last resort evict oldest
               un-analyzed and emit a reported ``data_loss`` event.
"""
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Optional

from .schema import Session


@dataclass
class RetentionConfig:
    raw_soft_cap_bytes: int = 4 * 1024**3   # 4 GiB
    raw_hard_cap_bytes: int = 6 * 1024**3   # 6 GiB
    raw_max_age_days: int = 45
    distilled_cap_bytes: int = 1 * 1024**3  # 1 GiB (alert only, never auto-drop)


@dataclass
class EvictionReport:
    backstop_evicted: list[str] = field(default_factory=list)
    budget_evicted: list[str] = field(default_factory=list)
    overflow_analyzed: list[str] = field(default_factory=list)
    overflow_data_loss: list[str] = field(default_factory=list)
    bytes_freed: int = 0
    final_usage_bytes: int = 0
    over_hard_cap: bool = False
    tier2_over_cap: bool = False
    warnings: list[str] = field(default_factory=list)

    @property
    def lost_data(self) -> bool:
        return bool(self.overflow_data_loss)


def _parse_ts(ts: Optional[str]) -> Optional[datetime]:
    if not ts:
        return None
    try:
        return datetime.fromisoformat(ts.replace("Z", "+00:00"))
    except ValueError:
        return None


def _age_days(s: Session, now: datetime) -> Optional[float]:
    ref = _parse_ts(s.ended_at) or _parse_ts(s.started_at) or _parse_ts(s.ingested_at)
    if ref is None:
        return None
    if ref.tzinfo is None:
        ref = ref.replace(tzinfo=timezone.utc)
    return (now - ref).total_seconds() / 86400.0


def _sort_key(s: Session) -> str:
    # oldest-analyzed-first; fall back through timestamps
    return s.analyzed_at or s.ended_at or s.ingested_at or ""
def sweep(store, config: RetentionConfig, *,
          analyze_fn: Optional[Callable[[object, str], object]] = None,
          now: Optional[datetime] = None) -> EvictionReport:
    """Run one retention sweep against ``store``. Returns an EvictionReport.

    ``analyze_fn(store, session_uid)`` is used by the overflow path to make
    un-analyzed sessions evictable; pass ``digest.analyze``.
    """
    now = now or datetime.now(timezone.utc)
    report = EvictionReport()

    def live_sessions() -> list[Session]:
        return [s for s in store.list_sessions() if s.evicted_at is None]

    # 1. backstop pass: analyzed + older than max age
    for s in sorted(live_sessions(), key=_sort_key):
        age = _age_days(s, now)
        if s.is_evictable and age is not None and age > config.raw_max_age_days:
            report.bytes_freed += store.evict_raw(s.session_uid)
            report.backstop_evicted.append(s.session_uid)

    # 2. budget pass: evict oldest analyzed while over soft cap
    while store.tier1_usage_bytes() > config.raw_soft_cap_bytes:
        candidates = [s for s in live_sessions() if s.is_evictable]
        if not candidates:
            break  # will not destroy un-analyzed data for space
        victim = min(candidates, key=_sort_key)
        report.bytes_freed += store.evict_raw(victim.session_uid)
        report.budget_evicted.append(victim.session_uid)

    # 3. overflow path: only if still over HARD cap with un-analyzed bulk left
    if store.tier1_usage_bytes() > config.raw_hard_cap_bytes:
        # 3a. try to analyze now so those sessions become evictable
        if analyze_fn is not None:
            for s in sorted(live_sessions(), key=_sort_key):
                if not s.is_evictable:
                    try:
                        analyze_fn(store, s.session_uid)
                        report.overflow_analyzed.append(s.session_uid)
                    except Exception as e:  # analysis may fail; keep going
                        report.warnings.append(f"analyze failed for {s.session_uid}: {e}")
            # retry budget pass on the freshly-analyzed sessions
            while store.tier1_usage_bytes() > config.raw_soft_cap_bytes:
                candidates = [s for s in live_sessions() if s.is_evictable]
                if not candidates:
                    break
                victim = min(candidates, key=_sort_key)
                report.bytes_freed += store.evict_raw(victim.session_uid)
                report.budget_evicted.append(victim.session_uid)
        # 3b. last resort: evict oldest un-analyzed, REPORTED as data loss
        while store.tier1_usage_bytes() > config.raw_hard_cap_bytes:
            remaining = [s for s in live_sessions() if not s.is_evictable]
            if not remaining:
                break
            victim = min(remaining, key=_sort_key)
            report.bytes_freed += store.evict_raw(victim.session_uid)
            report.overflow_data_loss.append(victim.session_uid)
            report.warnings.append(
                f"data_loss: evicted un-analyzed {victim.session_uid} to stay under hard cap"
            )

    usage = store.tier1_usage_bytes()
    report.final_usage_bytes = usage
    report.over_hard_cap = usage > config.raw_hard_cap_bytes
    report.tier2_over_cap = store.tier2_usage_bytes() > config.distilled_cap_bytes
    if report.tier2_over_cap:
        report.warnings.append(
            "tier2 distilled store over cap — flag for curation review (do not auto-drop)"
        )
    return report
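The pass ordering of `sweep` can be traced on a toy data set. The sketch below is a simplified illustration, not the module's code: `Raw` and `sketch_sweep` are hypothetical names, sessions are assumed to arrive ordered oldest first, the soft cap is assumed to be no larger than the hard cap, and the analyze-now step (3a) is left out so the last-resort path is visible.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Raw:
    """Hypothetical stand-in for one session's Tier 1 footprint."""
    uid: str
    nbytes: int
    analyzed: bool
    age_days: float


def sketch_sweep(sessions: list[Raw], soft_cap: int, hard_cap: int, max_age: float) -> dict:
    """Apply the backstop, budget, and last-resort passes to an in-memory list."""
    live = list(sessions)  # assumed ordered oldest first
    report = {"backstop": [], "budget": [], "data_loss": []}

    def usage() -> int:
        return sum(s.nbytes for s in live)

    # 1. backstop: analyzed sessions past the age limit go regardless of space
    for s in [s for s in live if s.analyzed and s.age_days > max_age]:
        live.remove(s)
        report["backstop"].append(s.uid)

    # 2. budget: over the soft cap, evict the oldest analyzed session first
    while usage() > soft_cap:
        victim = next((s for s in live if s.analyzed), None)
        if victim is None:
            break  # never drop un-analyzed data merely to reach the soft cap
        live.remove(victim)
        report["budget"].append(victim.uid)

    # 3. overflow: only above the hard cap is un-analyzed data evicted, and reported
    while usage() > hard_cap and live:
        victim = live.pop(0)
        report["data_loss"].append(victim.uid)

    report["final_usage"] = usage()
    return report


sessions = [
    Raw("claude:a", 40, analyzed=True, age_days=60),
    Raw("codex:b", 30, analyzed=True, age_days=10),
    Raw("grok:c", 50, analyzed=False, age_days=5),
    Raw("claude:d", 50, analyzed=False, age_days=1),
]
report = sketch_sweep(sessions, soft_cap=60, hard_cap=90, max_age=45)
print(report)
```

With these numbers the budget pass stops at 100 bytes because nothing analyzed is left, and only the hard cap forces the single reported loss.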

@@ -0,0 +1,156 @@
"""Normalized session schema (Tier 1) — design doc §4.
Two record kinds, ``Session`` and ``SessionEvent``, plus the small enums every
adapter targets. Field names here are the stable contract; per-flavor quirks are
absorbed inside each adapter (see design §4.3 native -> kind mapping).
"""
from __future__ import annotations
import json
from dataclasses import asdict, dataclass, field, fields
from typing import Any, Optional
SCHEMA_VERSION = 2 # v2: digest carries error_snippets (WP-0006 T01)
# Supported agent flavors. ``session_uid`` is always "<flavor>:<native id>".
FLAVORS = ("claude", "codex", "grok")
# SessionEvent.kind universe (design §4.2 / §4.3).
KINDS = (
"user_msg",
"assistant_msg",
"thinking",
"tool_call",
"tool_result",
"error",
"test_run",
"edit",
"retry",
"human_intervention",
"decision",
"lifecycle",
"completion",
)
# Session.outcome universe.
OUTCOMES = ("success", "fail", "abandoned", "unknown")
@dataclass
class Cost:
"""Token + effort accounting for a session."""
input_tokens: int = 0
output_tokens: int = 0
cache_tokens: int = 0
wall_clock_s: float = 0.0
turns: int = 0
retries: int = 0
@dataclass
class Session:
"""One bounded run of a coding agent against a repo (design §4.1)."""
session_uid: str # "<flavor>:<native id>" — globally unique
flavor: str
native_session_id: str
repo: Optional[str] = None
domain: Optional[str] = None
cwd: Optional[str] = None
git_branch: Optional[str] = None
model: Optional[str] = None
started_at: Optional[str] = None # ISO-8601 UTC
ended_at: Optional[str] = None
outcome: str = "unknown"
cost: Cost = field(default_factory=Cost)
task_ref: Optional[str] = None
source_path: Optional[str] = None
source_bytes: int = 0
schema_version: int = SCHEMA_VERSION
# watermarks (design §3.1): discovered -> ingested -> analyzed -> evicted
discovered_at: Optional[str] = None
ingested_at: Optional[str] = None
analyzed_at: Optional[str] = None
evicted_at: Optional[str] = None
def __post_init__(self) -> None:
if self.flavor not in FLAVORS:
raise ValueError(f"unknown flavor {self.flavor!r}; expected one of {FLAVORS}")
if self.outcome not in OUTCOMES:
raise ValueError(f"unknown outcome {self.outcome!r}; expected one of {OUTCOMES}")
expected_prefix = f"{self.flavor}:"
if not self.session_uid.startswith(expected_prefix):
raise ValueError(
f"session_uid {self.session_uid!r} must start with {expected_prefix!r}"
)
@property
def is_evictable(self) -> bool:
"""A session may be evicted from Tier 1 only once analyzed (design §3.1)."""
return self.analyzed_at is not None and self.evicted_at is None
@staticmethod
def make_uid(flavor: str, native_session_id: str) -> str:
return f"{flavor}:{native_session_id}"
def to_dict(self) -> dict[str, Any]:
d = asdict(self)
return d
def to_json(self) -> str:
return json.dumps(self.to_dict(), sort_keys=True)
@classmethod
def from_dict(cls, d: dict[str, Any]) -> "Session":
d = dict(d)
cost = d.pop("cost", None)
obj = cls(**{k: v for k, v in d.items() if k in _SESSION_FIELDS})
if cost is not None:
obj.cost = Cost(**{k: v for k, v in cost.items() if k in _COST_FIELDS})
return obj
@classmethod
def from_json(cls, s: str) -> "Session":
return cls.from_dict(json.loads(s))
@dataclass
class SessionEvent:
"""One atomic record within a session (design §4.2)."""
session_uid: str
seq: int # monotonic within session
ts: Optional[str] = None
kind: str = "lifecycle"
parent_seq: Optional[int] = None # turn DAG (Claude); None for flat flavors
role: Optional[str] = None # user|assistant|system|tool
tool: Optional[str] = None # when kind in {tool_call, tool_result}
summary: Optional[str] = None # short, human-readable
payload_ref: Optional[str] = None # pointer to full body in Tier 1 blob store
tokens: int = 0
is_sidechain: bool = False
def __post_init__(self) -> None:
if self.kind not in KINDS:
raise ValueError(f"unknown kind {self.kind!r}; expected one of {KINDS}")
def to_dict(self) -> dict[str, Any]:
return asdict(self)
def to_json(self) -> str:
return json.dumps(self.to_dict(), sort_keys=True)
@classmethod
def from_dict(cls, d: dict[str, Any]) -> "SessionEvent":
return cls(**{k: v for k, v in d.items() if k in _EVENT_FIELDS})
@classmethod
def from_json(cls, s: str) -> "SessionEvent":
return cls.from_dict(json.loads(s))
_SESSION_FIELDS = {f.name for f in fields(Session)}
_COST_FIELDS = {f.name for f in fields(Cost)}
_EVENT_FIELDS = {f.name for f in fields(SessionEvent)}
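The `from_dict` methods above filter incoming keys against the dataclass fields, so a payload from a newer schema version loads without raising. A minimal sketch of that technique follows; `Record` and `Usage` are hypothetical classes standing in for `Session` and `Cost`, and the extra keys in the payload are invented for the example.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, field, fields
from typing import Any


@dataclass
class Usage:
    """Hypothetical nested record, playing the role of ``Cost``."""
    turns: int = 0
    retries: int = 0


@dataclass
class Record:
    """Hypothetical top-level record, playing the role of ``Session``."""
    uid: str
    outcome: str = "unknown"
    usage: Usage = field(default_factory=Usage)

    @classmethod
    def from_dict(cls, d: dict[str, Any]) -> "Record":
        d = dict(d)  # copy so the caller's dict is not mutated
        usage = d.pop("usage", None)
        known = {f.name for f in fields(cls)}
        obj = cls(**{k: v for k, v in d.items() if k in known})
        if usage is not None:
            usage_known = {f.name for f in fields(Usage)}
            obj.usage = Usage(**{k: v for k, v in usage.items() if k in usage_known})
        return obj


# A payload written by a newer schema version, carrying fields this reader lacks.
newer = {"uid": "claude:abc", "outcome": "success", "added_in_v3": True,
         "usage": {"turns": 4, "tokens_per_turn": 120}}
rec = Record.from_dict(newer)
round_trip = Record.from_dict(asdict(rec))
print(rec)
```

Unknown keys are dropped silently, which keeps old readers working but also means a misspelled field name goes unnoticed.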

@@ -0,0 +1,315 @@
"""Two-tier store (design §3, §8).
Tier 1 (bulky, evictable): ``Session`` + ``SessionEvent`` rows in SQLite, with
event bodies written out-of-line as files under a blob dir (referenced by
``payload_ref``). Tier 2 (compact, durable): per-session ``digest`` rows.
Writes are idempotent on ``(session_uid, seq)`` for events and on
``session_uid`` for sessions/digests, so sweeps are safely re-runnable. Eviction
(:meth:`evict_raw`) deletes a session's Tier 1 event rows and blob files but keeps
the session row and its Tier 2 digest; that invariant is what makes budget-based
retention non-lossy.
"""
from __future__ import annotations
import hashlib
import json
import os
import re
import sqlite3
from datetime import datetime, timezone
from typing import Any, Optional
from .schema import Cost, Session, SessionEvent
_SAFE = re.compile(r"[^A-Za-z0-9._-]+")
def _now() -> str:
return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
def _fingerprint(ev: SessionEvent, body: Optional[str]) -> str:
    """Stable content fingerprint, independent of seq/payload_ref, for dedup."""
    h = hashlib.sha1()
    # each identifying field appears once; \x1f separates fields, \x1e the body
    parts = [ev.ts or "", ev.kind, ev.role or "", ev.tool or "", ev.summary or "",
             str(ev.is_sidechain)]
    h.update("\x1f".join(parts).encode("utf-8"))
    if body is not None:
        h.update(b"\x1e")
        h.update(body.encode("utf-8"))
    return h.hexdigest()
class Store:
def __init__(self, db_path: str, blob_dir: str):
self.db_path = db_path
self.blob_dir = blob_dir
os.makedirs(os.path.dirname(db_path) or ".", exist_ok=True)
os.makedirs(blob_dir, exist_ok=True)
self.db = sqlite3.connect(db_path)
self.db.row_factory = sqlite3.Row
self.db.execute("PRAGMA journal_mode=WAL")
self._init_schema()
def close(self) -> None:
self.db.close()
def __enter__(self) -> "Store":
return self
def __exit__(self, *exc) -> None:
self.close()
def _init_schema(self) -> None:
self.db.executescript(
"""
CREATE TABLE IF NOT EXISTS sessions (
session_uid TEXT PRIMARY KEY,
json TEXT NOT NULL,
analyzed_at TEXT,
evicted_at TEXT
);
CREATE TABLE IF NOT EXISTS events (
session_uid TEXT NOT NULL,
seq INTEGER NOT NULL,
json TEXT NOT NULL,
PRIMARY KEY (session_uid, seq)
);
CREATE TABLE IF NOT EXISTS blobs (
ref TEXT PRIMARY KEY,
session_uid TEXT NOT NULL,
path TEXT NOT NULL,
nbytes INTEGER NOT NULL
);
CREATE TABLE IF NOT EXISTS digests (
session_uid TEXT PRIMARY KEY,
json TEXT NOT NULL,
nbytes INTEGER NOT NULL
);
CREATE INDEX IF NOT EXISTS ix_events_uid ON events(session_uid);
CREATE INDEX IF NOT EXISTS ix_blobs_uid ON blobs(session_uid);
"""
)
self.db.commit()
# ---- Tier 1 writes -----------------------------------------------------
def upsert_session(self, s: Session) -> None:
self.db.execute(
"INSERT INTO sessions(session_uid, json, analyzed_at, evicted_at) "
"VALUES(?,?,?,?) ON CONFLICT(session_uid) DO UPDATE SET "
"json=excluded.json, analyzed_at=excluded.analyzed_at, evicted_at=excluded.evicted_at",
(s.session_uid, s.to_json(), s.analyzed_at, s.evicted_at),
)
self.db.commit()
def upsert_events(self, events: list[SessionEvent]) -> int:
rows = [(e.session_uid, e.seq, e.to_json()) for e in events]
self.db.executemany(
"INSERT INTO events(session_uid, seq, json) VALUES(?,?,?) "
"ON CONFLICT(session_uid, seq) DO UPDATE SET json=excluded.json",
rows,
)
self.db.commit()
return len(rows)
def write_blobs(self, session_uid: str, blobs: dict[str, str]) -> int:
"""Write event bodies as files; record path + size. Returns bytes written."""
total = 0
sub = os.path.join(self.blob_dir, _SAFE.sub("_", session_uid))
os.makedirs(sub, exist_ok=True)
for ref, body in blobs.items():
data = body.encode("utf-8")
fname = _SAFE.sub("_", ref) + ".txt"
path = os.path.join(sub, fname)
# newline="" writes the body verbatim, so nbytes matches the bytes on disk
with open(path, "w", encoding="utf-8", newline="") as f:
f.write(body)
self.db.execute(
"INSERT INTO blobs(ref, session_uid, path, nbytes) VALUES(?,?,?,?) "
"ON CONFLICT(ref) DO UPDATE SET path=excluded.path, nbytes=excluded.nbytes",
(ref, session_uid, path, len(data)),
)
total += len(data)
self.db.commit()
return total
def ingest(self, bundle) -> int:
"""Persist a Normalized bundle, merging into any existing session.
Multiple files can map to one ``session_uid`` (Claude resume/sidechains;
Grok multi-file dirs). Events are de-duplicated by content fingerprint and
genuinely-new events are appended with offset ``seq`` (design OQ6 / T03).
Returns the number of new events written. Idempotent: re-ingesting the
same bundle adds nothing.
"""
s = bundle.session
existing = self.get_session(s.session_uid)
if existing is None:
if s.ingested_at is None:
s.ingested_at = _now()
self.upsert_session(s)
# known fingerprints + current max seq for this session
seen = self._event_fingerprints(s.session_uid)
next_seq = self._max_seq(s.session_uid) + 1
new_events: list[SessionEvent] = []
new_blobs: dict[str, str] = {}
old_to_new: dict[int, int] = {}
for ev in bundle.events:
body = bundle.blobs.get(ev.payload_ref) if ev.payload_ref else None
fp = _fingerprint(ev, body)
if fp in seen:
continue # already stored (prior file or prior sweep)
new_seq = next_seq
next_seq += 1
old_to_new[ev.seq] = new_seq
# remap parent within this bundle; cross-file parents become None
parent = old_to_new.get(ev.parent_seq) if ev.parent_seq is not None else None
ref = None
if body is not None:
ref = f"blob://{s.session_uid}/{new_seq}"
new_blobs[ref] = body
merged = SessionEvent(
session_uid=s.session_uid, seq=new_seq, parent_seq=parent, ts=ev.ts,
kind=ev.kind, role=ev.role, tool=ev.tool, summary=ev.summary,
payload_ref=ref, tokens=ev.tokens, is_sidechain=ev.is_sidechain,
)
new_events.append(merged)
seen.add(fp)
if new_events:
self.upsert_events(new_events)
self.write_blobs(s.session_uid, new_blobs)
return len(new_events)
def _max_seq(self, session_uid: str) -> int:
row = self.db.execute(
"SELECT COALESCE(MAX(seq), -1) m FROM events WHERE session_uid=?", (session_uid,)
).fetchone()
return int(row["m"])
def _event_fingerprints(self, session_uid: str) -> set[str]:
fps: set[str] = set()
for e in self.get_events(session_uid):
body = None
if e.payload_ref:
r = self.db.execute("SELECT path FROM blobs WHERE ref=?", (e.payload_ref,)).fetchone()
if r:
try:
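# newline="" disables newline translation, so a body containing \r\n
# hashes to the same fingerprint it had at ingest time
with open(r["path"], "r", encoding="utf-8", newline="") as f: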
body = f.read()
except OSError:
body = None
fps.add(_fingerprint(e, body))
return fps
# ---- Tier 2 (digest) ---------------------------------------------------
def write_digest(self, session_uid: str, digest: dict[str, Any], analyzed_at: Optional[str] = None) -> None:
payload = json.dumps(digest, sort_keys=True)
self.db.execute(
"INSERT INTO digests(session_uid, json, nbytes) VALUES(?,?,?) "
"ON CONFLICT(session_uid) DO UPDATE SET json=excluded.json, nbytes=excluded.nbytes",
(session_uid, payload, len(payload.encode("utf-8"))),
)
self.db.execute(
"UPDATE sessions SET analyzed_at=? WHERE session_uid=?",
(analyzed_at or _now(), session_uid),
)
self.db.commit()
def get_digest(self, session_uid: str) -> Optional[dict[str, Any]]:
row = self.db.execute("SELECT json FROM digests WHERE session_uid=?", (session_uid,)).fetchone()
return json.loads(row["json"]) if row else None
def list_digests(self) -> list[dict[str, Any]]:
return [json.loads(r["json"]) for r in self.db.execute("SELECT json FROM digests")]
def save_patterns(self, patterns: list[dict[str, Any]]) -> None:
"""Persist candidate patterns to a Tier 2 table (replace prior run)."""
self.db.execute(
"CREATE TABLE IF NOT EXISTS patterns ("
"key TEXT PRIMARY KEY, json TEXT NOT NULL, detected_at TEXT NOT NULL)"
)
self.db.execute("DELETE FROM patterns")
self.db.executemany(
"INSERT INTO patterns(key, json, detected_at) VALUES(?,?,?)",
[(p["key"], json.dumps(p, sort_keys=True), _now()) for p in patterns],
)
self.db.commit()
# ---- reads -------------------------------------------------------------
def get_session(self, session_uid: str) -> Optional[Session]:
row = self.db.execute(
"SELECT json, analyzed_at, evicted_at FROM sessions WHERE session_uid=?", (session_uid,)
).fetchone()
return self._row_to_session(row) if row else None
def list_sessions(self) -> list[Session]:
rows = self.db.execute("SELECT json, analyzed_at, evicted_at FROM sessions")
return [self._row_to_session(r) for r in rows]
@staticmethod
def _row_to_session(row) -> Session:
"""Rebuild a Session, treating the watermark columns as authoritative."""
s = Session.from_json(row["json"])
s.analyzed_at = row["analyzed_at"]
s.evicted_at = row["evicted_at"]
return s
def get_events(self, session_uid: str) -> list[SessionEvent]:
rows = self.db.execute(
"SELECT json FROM events WHERE session_uid=? ORDER BY seq", (session_uid,)
).fetchall()
return [SessionEvent.from_json(r["json"]) for r in rows]
def count_events(self, session_uid: str) -> int:
return self.db.execute(
"SELECT COUNT(*) c FROM events WHERE session_uid=?", (session_uid,)
).fetchone()["c"]
# ---- usage accounting (drives retention) -------------------------------
def tier1_usage_bytes(self) -> int:
"""Bytes held in Tier 1: event-row JSON + blob bytes for non-evicted sessions."""
row = self.db.execute(
"SELECT COALESCE(SUM(LENGTH(json)),0) b FROM events e "
"WHERE NOT EXISTS (SELECT 1 FROM sessions s "
"WHERE s.session_uid=e.session_uid AND s.evicted_at IS NOT NULL)"
).fetchone()
blob = self.db.execute("SELECT COALESCE(SUM(nbytes),0) b FROM blobs").fetchone()
return int(row["b"]) + int(blob["b"])
def session_tier1_bytes(self, session_uid: str) -> int:
ev = self.db.execute(
"SELECT COALESCE(SUM(LENGTH(json)),0) b FROM events WHERE session_uid=?", (session_uid,)
).fetchone()["b"]
bl = self.db.execute(
"SELECT COALESCE(SUM(nbytes),0) b FROM blobs WHERE session_uid=?", (session_uid,)
).fetchone()["b"]
return int(ev) + int(bl)
def tier2_usage_bytes(self) -> int:
return int(self.db.execute("SELECT COALESCE(SUM(nbytes),0) b FROM digests").fetchone()["b"])
# ---- eviction ----------------------------------------------------------
def evict_raw(self, session_uid: str) -> int:
"""Drop Tier 1 raw (events + blob files) for a session; keep digest + row.
Sets ``evicted_at``. Returns bytes freed. Safe to call on an
already-evicted session: nothing is left to delete and 0 is returned,
though ``evicted_at`` is refreshed.
"""
freed = self.session_tier1_bytes(session_uid)
for r in self.db.execute("SELECT path FROM blobs WHERE session_uid=?", (session_uid,)).fetchall():
try:
os.remove(r["path"])
except FileNotFoundError:
pass
self.db.execute("DELETE FROM blobs WHERE session_uid=?", (session_uid,))
self.db.execute("DELETE FROM events WHERE session_uid=?", (session_uid,))
self.db.execute("UPDATE sessions SET evicted_at=? WHERE session_uid=?", (_now(), session_uid))
self.db.commit()
return freed
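The merge that `Store.ingest` describes (dedup by content fingerprint, then append new events with offset `seq`) can be sketched without SQLite. This is a simplified illustration under stated assumptions: events are plain dicts, `fingerprint` hashes only three hypothetical fields, and `merge` is an invented name, not part of the store.

```python
from __future__ import annotations

import hashlib


def fingerprint(event: dict) -> str:
    """Hash content only; ``seq`` is excluded so renumbering cannot defeat dedup."""
    parts = [event.get("ts") or "", event["kind"], event.get("body") or ""]
    return hashlib.sha1("\x1f".join(parts).encode("utf-8")).hexdigest()


def merge(stored: list[dict], incoming: list[dict]) -> int:
    """Append unseen events to ``stored`` with fresh ``seq`` values; return count added."""
    seen = {fingerprint(e) for e in stored}
    next_seq = max((e["seq"] for e in stored), default=-1) + 1
    old_to_new: dict[int, int] = {}
    added = 0
    for ev in incoming:
        fp = fingerprint(ev)
        if fp in seen:
            continue  # already stored by an earlier file or an earlier run
        old_to_new[ev["seq"]] = next_seq
        # parents that were not newly written from this bundle resolve to None
        parent = old_to_new.get(ev.get("parent_seq"))
        stored.append({**ev, "seq": next_seq, "parent_seq": parent})
        seen.add(fp)
        next_seq += 1
        added += 1
    return added


stored: list[dict] = []
file_a = [
    {"seq": 0, "kind": "user_msg", "ts": "t0", "body": "fix the test"},
    {"seq": 1, "kind": "tool_call", "ts": "t1", "body": "pytest", "parent_seq": 0},
]
# A resumed session file: repeats the first event, then adds one more.
file_b = [
    {"seq": 0, "kind": "user_msg", "ts": "t0", "body": "fix the test"},
    {"seq": 1, "kind": "tool_result", "ts": "t2", "body": "1 failed", "parent_seq": 0},
]
first = merge(stored, file_a)
second = merge(stored, file_b)
again = merge(stored, file_b)
print(first, second, again)
```

Re-running the second file adds nothing, which is the idempotency property the docstring promises for repeated sweeps.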

@@ -0,0 +1,9 @@
"""Curate phase (PRD §6.3) — review candidate patterns into versioned Solution
Patterns held in an in-repo Pattern Catalog.
Layout mirrors ``detect/``:
schema.py Solution Pattern artifact + per-flavor rendering hints (T01)
catalog.py versioned, files-first catalog store (T02)
review.py discuss/approve/reject -> promote workflow (T03)
__main__.py `python -m session_memory.curate` entrypoint (T06)
"""

@@ -0,0 +1,130 @@
"""Curate entrypoint (T06): review detect candidates into the Pattern Catalog.
python -m session_memory.curate [--config PATH] [--auto-approve] [--json]
[--min-frequency N] [--workstream-id ID]
Refreshes candidate patterns (runs the detect pipeline), then drives them through
the review workflow — **interactive** by default, or **batch** with
``--auto-approve`` (promote everything clearing the evidence bar, reject the rest)
for kaizen-agent runs. Candidates are presented cross-flavor first (detect's
ranking). Emits a catalog diff summary and, with ``--json``, a machine-readable
result. Approvals land in the files-first catalog; each final decision is logged
as a hub decision (queued if the hub is down).
"""
from __future__ import annotations
import argparse
import json
import os
from ..detect.__main__ import run_detect
from ..ingest import _expand, load_config
from .catalog import Catalog
from .decisions import DecisionRecorder
from .gating import bloat_warnings, evaluate, gate_config
from .review import APPROVE, DISCUSS, REJECT, ReviewLog, review
def _curate_paths(config: dict):
c = config.get("curate", {})
catalog_dir = _expand(c.get("catalog_dir", "session_memory/catalog"))
review_log = _expand(c.get("review_log", "session_memory/.store/reviews.jsonl"))
queue = _expand(c.get("decision_queue", "session_memory/.store/decisions.queue.jsonl"))
ws_id = c.get("state_hub_workstream_id")
return catalog_dir, review_log, queue, ws_id
def _render_candidate(cand: dict, gate, existing) -> str:
g = evaluate(cand, gate)
flag = " [CROSS-FLAVOR]" if cand.get("cross_flavor") else ""
lines = [
f"\n{cand['title']}{flag}",
f" key={cand['key']} score={cand.get('score')} freq={cand['frequency']} "
f"impact={cand.get('cost_impact')}",
f" flavors={','.join(cand.get('flavors', []))} "
f"repos={','.join(cand.get('repos', [])) or '-'} sessions={len(cand.get('sessions', []))}",
f" gate: promotable={g.promotable} distribution_ready={g.distribution_ready}"
+ (f" ({'; '.join(g.reasons)})" if g.reasons else ""),
]
for w in bloat_warnings(cand, existing):
lines.append(f" bloat: {w}")
return "\n".join(lines)
def _interactive_decider(gate, catalog):
def decide(cand):
print(_render_candidate(cand, gate, catalog.list()))
while True:
choice = input(" [a]pprove / [r]eject / [d]iscuss ? ").strip().lower()
if choice in ("a", "approve"):
return (APPROVE, input(" rationale: ").strip() or "approved")
if choice in ("r", "reject"):
return (REJECT, input(" rationale: ").strip() or "rejected")
if choice in ("d", "discuss"):
return (DISCUSS, "deferred for discussion")
return decide
def _auto_decider(gate):
"""Batch policy: approve candidates clearing the promote floor, reject the rest."""
def decide(cand):
g = evaluate(cand, gate)
if g.promotable:
return (APPROVE, "auto-approved: clears evidence bar")
return (REJECT, "auto-rejected: " + "; ".join(g.reasons))
return decide
def _summary(result, n_candidates: int) -> str:
added = [k for k, a in result.approved if a in ("added", "versioned", "updated")]
lines = [
f"# Curate summary ({n_candidates} candidates reviewed)",
f" approved : {len(result.approved)} ({', '.join(f'{k}:{a}' for k, a in result.approved) or '-'})",
f" rejected : {len(result.rejected)} ({', '.join(result.rejected) or '-'})",
f" deferred : {len(result.deferred)} ({', '.join(result.deferred) or '-'})",
f" skipped : {len(result.skipped)} (already decided)",
f" catalog writes: {len(added)}",
]
return "\n".join(lines)
def main(argv=None) -> int:
here = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
ap = argparse.ArgumentParser(description="Curate detect candidates into the Pattern Catalog.")
ap.add_argument("--config", default=os.path.join(here, "config.toml"))
ap.add_argument("--auto-approve", action="store_true",
help="batch mode: promote everything clearing the evidence bar")
ap.add_argument("--min-frequency", type=int, default=2)
ap.add_argument("--workstream-id", default=None, help="hub workstream for decisions")
ap.add_argument("--json", action="store_true", help="emit machine-readable JSON")
args = ap.parse_args(argv)
config = load_config(args.config)
candidates = run_detect(config, min_frequency=args.min_frequency)
catalog_dir, review_log_path, queue_path, ws_id = _curate_paths(config)
gate = gate_config(config)
catalog = Catalog(catalog_dir)
log = ReviewLog(review_log_path)
recorder = DecisionRecorder(queue_path, workstream_id=args.workstream_id or ws_id)
decide = _auto_decider(gate) if args.auto_approve else _interactive_decider(gate, catalog)
result = review(candidates, decide, catalog, log, gate=gate, recorder=recorder)
if args.json:
print(json.dumps({
"approved": result.approved, "rejected": result.rejected,
"deferred": result.deferred, "skipped": result.skipped,
"decisions_queued": len(recorder.pending()),
}, indent=2))
else:
print(_summary(result, len(candidates)))
if recorder.pending():
print(f" decisions queued (hub offline): {len(recorder.pending())} "
f"-> {queue_path}")
return 0
if __name__ == "__main__":
raise SystemExit(main())

@@ -0,0 +1,148 @@
"""Versioned Pattern Catalog — files-first source of truth (FR-U3; T02).
The catalog is a directory of one JSON file per Solution Pattern
(``<catalog_dir>/<pattern-id>.json``). Files originate the work; the State Hub
indexes them (ADR-001 / PRD §9). Identity is the pattern ``id`` (derived from the
source candidate key), so re-promoting the same detect candidate maps to the same
file — dedup is structural, not heuristic.
:meth:`Catalog.upsert` is the one write path and is **idempotent**:
* new id -> written as-is (``added``)
* same id, identical content -> no write, no version bump (``unchanged``)
* same id, only status/flags -> updated in place, no bump (``updated``)
* same id, content changed -> version bumped, prior snapshot
appended to ``<id>.history.jsonl`` (``versioned``)
History is append-only alongside the current file, so the catalog dir stays one
clean current file per pattern while every superseded version is recoverable.
"""
from __future__ import annotations
import json
import os
from datetime import datetime, timezone
from typing import Optional
from .schema import SolutionPattern
# Content fields that define a pattern's substance. Version, timestamps, status,
# and distribution_ready are metadata — changes to them never bump the version.
_CONTENT_KEYS = ("name", "polarity", "problem", "resolutions", "scope",
"provenance", "rendering_hints", "covers")
ADDED = "added"
UNCHANGED = "unchanged"
UPDATED = "updated"
VERSIONED = "versioned"
def _now() -> str:
return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
def _content(p: SolutionPattern) -> str:
d = p.to_dict()
return json.dumps({k: d[k] for k in _CONTENT_KEYS}, sort_keys=True)
class Catalog:
"""File-backed catalog of versioned :class:`SolutionPattern` artifacts."""
def __init__(self, catalog_dir: str) -> None:
self.dir = catalog_dir
os.makedirs(self.dir, exist_ok=True)
# --- paths --------------------------------------------------------------
def _path(self, pattern_id: str) -> str:
return os.path.join(self.dir, f"{pattern_id}.json")
def _history_path(self, pattern_id: str) -> str:
return os.path.join(self.dir, f"{pattern_id}.history.jsonl")
# --- reads --------------------------------------------------------------
def load(self, pattern_id: str) -> Optional[SolutionPattern]:
path = self._path(pattern_id)
if not os.path.exists(path):
return None
with open(path, encoding="utf-8") as fh:
return SolutionPattern.from_json(fh.read())
def list(self) -> list[SolutionPattern]:
out: list[SolutionPattern] = []
for name in sorted(os.listdir(self.dir)):
if name.endswith(".json") and not name.endswith(".history.jsonl"):
with open(os.path.join(self.dir, name), encoding="utf-8") as fh:
out.append(SolutionPattern.from_json(fh.read()))
return out
def history(self, pattern_id: str) -> list[dict]:
path = self._history_path(pattern_id)
if not os.path.exists(path):
return []
with open(path, encoding="utf-8") as fh:
return [json.loads(line) for line in fh if line.strip()]
def find_for(self, signal_key: str, locus: str = "") -> Optional[SolutionPattern]:
"""Best catalog pattern for a detect signal: exact id first, then ``covers``.
Lets a signal that doesn't share a pattern's exact key (e.g. a
``recurring_error`` fingerprint) inherit the curated recommendation when a
pattern declares it covers that text.
"""
exact = self.load(SolutionPattern.make_id(signal_key))
if exact is not None:
return exact
hay = f"{signal_key} {locus}".lower()
for p in self.list(): # sorted by id -> deterministic
if any(c.lower() in hay for c in p.covers):
return p
return None
# --- the single write path ---------------------------------------------
def upsert(self, pattern: SolutionPattern) -> str:
"""Insert or version-update a pattern. Returns the action taken."""
existing = self.load(pattern.id)
now = _now()
if existing is None:
pattern.created_at = pattern.created_at or now
pattern.updated_at = now
self._write(pattern)
return ADDED
if _content(existing) == _content(pattern):
# substance unchanged — only persist a metadata (status/flag) change
if (existing.status == pattern.status
and existing.distribution_ready == pattern.distribution_ready):
return UNCHANGED
existing.status = pattern.status
existing.distribution_ready = pattern.distribution_ready
existing.updated_at = now
self._write(existing)
return UPDATED
# substance changed: archive the old version, bump, write the new one
self._append_history(existing)
pattern.version = SolutionPattern.bump_version(existing.version)
pattern.created_at = existing.created_at or now
pattern.updated_at = now
self._write(pattern)
return VERSIONED
# --- internals ----------------------------------------------------------
def _write(self, pattern: SolutionPattern) -> None:
with open(self._path(pattern.id), "w", encoding="utf-8") as fh:
fh.write(pattern.to_json())
fh.write("\n")
def _append_history(self, superseded: SolutionPattern) -> None:
superseded.status = "superseded"
with open(self._history_path(superseded.id), "a", encoding="utf-8") as fh:
fh.write(json.dumps(superseded.to_dict(), sort_keys=True))
fh.write("\n")
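The four outcomes of `Catalog.upsert` can be exercised against an in-memory catalog. The sketch below is an illustration under assumptions, not the catalog's code: patterns are plain dicts, `CONTENT_KEYS` is a hypothetical two-key subset, and versions are plain integers because the scheme behind `SolutionPattern.bump_version` is not shown in this diff.

```python
from __future__ import annotations

import json

CONTENT_KEYS = ("name", "problem")  # hypothetical subset of the real content keys


def content(p: dict) -> str:
    """Canonical JSON of the substance-defining keys only."""
    return json.dumps({k: p[k] for k in CONTENT_KEYS}, sort_keys=True)


def upsert(catalog: dict[str, dict], history: dict[str, list], pattern: dict) -> str:
    """Classify and apply one write against an in-memory catalog."""
    existing = catalog.get(pattern["id"])
    if existing is None:
        catalog[pattern["id"]] = dict(pattern, version=1)
        return "added"
    if content(existing) == content(pattern):
        if existing["status"] == pattern["status"]:
            return "unchanged"
        existing["status"] = pattern["status"]  # metadata only: no version bump
        return "updated"
    # substance changed: archive the old version, then bump
    history.setdefault(pattern["id"], []).append(dict(existing, status="superseded"))
    catalog[pattern["id"]] = dict(pattern, version=existing["version"] + 1)
    return "versioned"


catalog: dict[str, dict] = {}
history: dict[str, list] = {}
base = {"id": "retry-loop", "name": "Retry loop", "problem": "flaky test",
        "status": "candidate"}
actions = [
    upsert(catalog, history, dict(base)),
    upsert(catalog, history, dict(base)),
    upsert(catalog, history, dict(base, status="approved")),
    upsert(catalog, history, dict(base, status="approved", problem="flaky network test")),
]
print(actions)
```

Comparing canonical JSON of the content keys is what lets a status change pass through without a version bump.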

@@ -0,0 +1,114 @@
"""State Hub decision integration (FR-U4; T05).

Every final promote/reject is recorded as an auditable decision so the rationale,
the source candidate key, and an evidence snapshot are traceable. The catalog
file remains the durable artifact (ADR-001); the decision is the audit trail.

The recorder is **graceful under a hub outage**, exactly the condition hit during
Phase 1, where statuses were synced after the fact. A pluggable ``sink`` does the
actual write (HTTP to the hub, or the MCP ``record_decision`` tool driven by the
operator). If the sink is absent or raises, the decision is appended to a local
queue (``decisions.queue.jsonl``) and can be replayed later with :meth:`flush`.
"""
from __future__ import annotations

import json
import os
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Optional

# A sink takes a hub-shaped decision payload and persists it (may raise on failure).
Sink = Callable[[dict], None]


def _now() -> str:
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_decision(candidate: dict, action: str, rationale: str,
                   *, workstream_id: Optional[str] = None,
                   decided_by: str = "curator") -> dict:
    """Shape a curate decision as a State Hub ``record_decision`` payload."""
    key = candidate["key"]
    verb = "Promote" if action == "approve" else "Reject"
    return {
        "title": f"{verb} pattern candidate {key}",
        "decision_type": "made",
        "workstream_id": workstream_id,
        "rationale": rationale,
        "decided_by": decided_by,
        "description": json.dumps({
            "action": action,
            "source_key": key,
            "evidence": candidate,
        }, sort_keys=True),
        "recorded_at": _now(),
    }


@dataclass
class DecisionRecorder:
    """Records decisions through ``sink`` with a durable local-queue fallback."""

    queue_path: str
    sink: Optional[Sink] = None
    workstream_id: Optional[str] = None
    decided_by: str = "curator"
    _queued: int = field(default=0, init=False)

    def record(self, candidate: dict, action: str, rationale: str) -> bool:
        """Record one decision. Returns True if the sink accepted it, False if it was queued."""
        payload = build_decision(candidate, action, rationale,
                                 workstream_id=self.workstream_id, decided_by=self.decided_by)
        if self.sink is not None:
            try:
                self.sink(payload)
                return True
            except Exception:  # hub down / transient: fall through to the queue
                pass
        self._append(payload)
        return False

    def pending(self) -> list[dict]:
        """Return the queued (not yet synced) decisions, oldest first."""
        if not os.path.exists(self.queue_path):
            return []
        with open(self.queue_path, encoding="utf-8") as fh:
            return [json.loads(line) for line in fh if line.strip()]

    def flush(self, sink: Optional[Sink] = None) -> int:
        """Replay queued decisions through ``sink``. Returns count synced.

        Stops at the first failure so ordering is preserved; the unsynced tail is
        rewritten back to the queue.
        """
        sink = sink or self.sink
        if sink is None:
            return 0
        items = self.pending()
        if not items:
            # Nothing queued: return early so no empty queue file is created
            # (and no error is raised when the queue directory does not exist).
            return 0
        synced = 0
        for i, payload in enumerate(items):
            try:
                sink(payload)
                synced += 1
            except Exception:
                self._rewrite(items[i:])
                return synced
        self._rewrite([])
        return synced

    # --- internals ----------------------------------------------------------
    def _append(self, payload: dict) -> None:
        os.makedirs(os.path.dirname(self.queue_path) or ".", exist_ok=True)
        with open(self.queue_path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(payload, sort_keys=True))
            fh.write("\n")
        self._queued += 1

    def _rewrite(self, items: list[dict]) -> None:
        with open(self.queue_path, "w", encoding="utf-8") as fh:
            for payload in items:
                fh.write(json.dumps(payload, sort_keys=True))
                fh.write("\n")
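The queue fallback described in the module docstring can be illustrated with a small standalone sketch. It does not import the repository module: `record_or_queue`, `flush_queue`, `hub_down`, and `demo` are hypothetical names chosen for illustration, and the payloads are made-up placeholders rather than real hub decisions.

```python
import json
import os
import tempfile


def record_or_queue(payload, sink, queue_path):
    """Try the sink first; on absence or failure append to a JSONL queue."""
    if sink is not None:
        try:
            sink(payload)
            return True
        except Exception:
            pass  # treat any sink error as "hub unavailable"
    with open(queue_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(payload, sort_keys=True) + "\n")
    return False


def flush_queue(sink, queue_path):
    """Replay queued payloads in order; keep the unsynced tail on failure."""
    if not os.path.exists(queue_path):
        return 0
    with open(queue_path, encoding="utf-8") as fh:
        items = [json.loads(line) for line in fh if line.strip()]
    synced = 0
    for payload in items:
        try:
            sink(payload)
            synced += 1
        except Exception:
            break  # stop at the first failure so ordering is preserved
    with open(queue_path, "w", encoding="utf-8") as fh:
        for payload in items[synced:]:
            fh.write(json.dumps(payload, sort_keys=True) + "\n")
    return synced


def hub_down(payload):
    """Hypothetical sink that simulates an outage."""
    raise ConnectionError("hub unreachable")


def demo():
    received = []
    with tempfile.TemporaryDirectory() as tmp:
        queue = os.path.join(tmp, "decisions.queue.jsonl")
        first = record_or_queue({"title": "Promote A"}, hub_down, queue)
        second = record_or_queue({"title": "Reject B"}, None, queue)
        synced = flush_queue(received.append, queue)
        with open(queue, encoding="utf-8") as fh:
            remaining = fh.read()
    return first, second, synced, [p["title"] for p in received], remaining
```

Calling `demo()` queues both decisions while the sink is failing or absent, then replays them in their original order once a working sink is supplied.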


@@ -0,0 +1,117 @@
"""Promotion evidence-bar + bloat guard (design OQ5/OQ6; T04).

Two gates protect the catalog:

* **Evidence bar (OQ5)**: a candidate must clear configurable floors
  (frequency, distinct supporting sessions, cost impact) before it may be
  promoted at all. A separate, stricter bar decides whether the promoted
  pattern is *distribution-eligible* (``status="approved"``,
  ``distribution_ready=True``) vs. merely ``provisional``: the minimum
  trustworthy evidence before a pattern is allowed near live agent environments.
* **Bloat guard (OQ6)**: flags candidates that would add little: a duplicate of
  an already-cataloged pattern, or a near-duplicate sharing the same
  signal-type+locus. Keeps the catalog lean so agent context budgets aren't
  degraded by low-value instructions.

Knobs live under ``[curate.gate]`` in ``config.toml``; :func:`gate_config` reads
them with safe defaults so the module also works config-free (tests).
"""
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

from .schema import SolutionPattern


@dataclass
class GateConfig:
    # promotion floor (OQ5)
    min_frequency: int = 2
    min_sessions: int = 2
    min_cost_impact: float = 0.0
    # distribution-eligibility floor (stricter; OQ5)
    dist_require_cross_flavor: bool = False
    dist_min_frequency: int = 3
    dist_min_cost_impact: float = 0.0


def gate_config(config: Optional[dict] = None) -> GateConfig:
    """Build a :class:`GateConfig` from the ``curate.gate`` table, with defaults."""
    c = (config or {}).get("curate", {}) if config else {}
    g = c.get("gate", {}) if isinstance(c, dict) else {}
    return GateConfig(
        min_frequency=g.get("min_frequency", 2),
        min_sessions=g.get("min_sessions", 2),
        min_cost_impact=g.get("min_cost_impact", 0.0),
        dist_require_cross_flavor=g.get("dist_require_cross_flavor", False),
        dist_min_frequency=g.get("dist_min_frequency", 3),
        dist_min_cost_impact=g.get("dist_min_cost_impact", 0.0),
    )


@dataclass
class GateResult:
    promotable: bool
    distribution_ready: bool
    status: str  # "approved" if distribution-ready else "provisional"
    reasons: list = field(default_factory=list)


def _n_sessions(candidate: dict) -> int:
    return len(candidate.get("sessions", []) or [])


def evaluate(candidate: dict, config: Optional[GateConfig] = None) -> GateResult:
    """Decide whether a candidate may be promoted, and at what trust level."""
    cfg = config or GateConfig()
    reasons: list[str] = []
    freq = candidate.get("frequency", 0)
    sessions = _n_sessions(candidate)
    impact = candidate.get("cost_impact", 0.0)

    # Promotion floor: every check must pass.
    promotable = True
    if freq < cfg.min_frequency:
        promotable = False
        reasons.append(f"frequency {freq} < min {cfg.min_frequency}")
    if sessions < cfg.min_sessions:
        promotable = False
        reasons.append(f"sessions {sessions} < min {cfg.min_sessions}")
    if impact < cfg.min_cost_impact:
        promotable = False
        reasons.append(f"cost_impact {impact} < min {cfg.min_cost_impact}")

    # Distribution floor: stricter, and only reachable when promotable.
    dist = promotable
    if cfg.dist_require_cross_flavor and not candidate.get("cross_flavor", False):
        dist = False
        reasons.append("not cross-flavor (required for distribution)")
    if freq < cfg.dist_min_frequency:
        dist = False
        reasons.append(f"frequency {freq} < distribution min {cfg.dist_min_frequency}")
    if impact < cfg.dist_min_cost_impact:
        dist = False
        reasons.append(f"cost_impact {impact} < distribution min {cfg.dist_min_cost_impact}")

    return GateResult(
        promotable=promotable,
        distribution_ready=bool(dist),
        status="approved" if dist else "provisional",
        reasons=reasons,
    )


def bloat_warnings(candidate: dict, existing: list[SolutionPattern]) -> list[str]:
    """Flag low-value adds against what is already catalogued (OQ6)."""
    warnings: list[str] = []
    cand_id = SolutionPattern.make_id(candidate["key"])
    # Keys have the shape ``polarity:type:locus``; pad so short keys still unpack.
    _, sig_type, locus = (candidate["key"].split(":", 2) + ["", ""])[:3]
    for p in existing:
        if p.id == cand_id:
            warnings.append(f"duplicate of catalogued pattern {p.id}")
            continue
        p_parts = (p.provenance.source_key.split(":", 2) + ["", ""])[:3]
        if (p_parts[1], p_parts[2]) == (sig_type, locus):
            warnings.append(f"near-duplicate of {p.id} (same {sig_type}/{locus})")
    return warnings
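The two-tier evidence bar can be sketched on its own, assuming only the frequency and session floors. `SketchGate` and `evaluate_sketch` are hypothetical stand-ins written for this illustration, not the module's `GateConfig` and `evaluate`, and the candidate dictionaries are invented.

```python
from dataclasses import dataclass


@dataclass
class SketchGate:
    """Simplified thresholds; defaults mirror the values shown in GateConfig."""
    min_frequency: int = 2
    min_sessions: int = 2
    dist_min_frequency: int = 3


def evaluate_sketch(candidate: dict, gate: SketchGate) -> dict:
    """Return promotability plus the trust level a promoted pattern would get."""
    freq = candidate.get("frequency", 0)
    sessions = len(candidate.get("sessions") or [])
    reasons = []

    promotable = True
    if freq < gate.min_frequency:
        promotable = False
        reasons.append(f"frequency {freq} < min {gate.min_frequency}")
    if sessions < gate.min_sessions:
        promotable = False
        reasons.append(f"sessions {sessions} < min {gate.min_sessions}")

    # The distribution bar is stricter and can only be met by promotable candidates.
    dist = promotable
    if freq < gate.dist_min_frequency:
        dist = False
        reasons.append(f"frequency {freq} < distribution min {gate.dist_min_frequency}")

    return {
        "promotable": promotable,
        "status": "approved" if dist else "provisional",
        "reasons": reasons,
    }


gate = SketchGate()
thin = evaluate_sketch({"frequency": 2, "sessions": ["s1", "s2"]}, gate)
strong = evaluate_sketch({"frequency": 3, "sessions": ["s1", "s2", "s3"]}, gate)
weak = evaluate_sketch({"frequency": 1, "sessions": ["s1"]}, gate)
```

Here `thin` clears the promotion floor but lands as `provisional`, `strong` is `approved`, and `weak` is not promotable at all.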


@@ -0,0 +1,158 @@
"""Curation review workflow (FR-U1/FR-U2; T03).

Drives Phase 1 detect candidates through a **discuss / approve / reject** review
and, on approve, promotes the candidate into a :class:`SolutionPattern` written to
the :class:`Catalog`. The actual decision is supplied by a ``decide`` callback so
this engine stays UI-free; the ``__main__`` entrypoint (T06) plugs in interactive
or batch (auto-approve) logic.

Re-review is **idempotent** via a :class:`ReviewLog`: a candidate already decided
is skipped unless its *evidence fingerprint* changed (new sessions/frequency), so
a prior **reject** is remembered and not re-surfaced, and a prior **approve** is
updated in place rather than duplicated (catalog dedup does the rest).
"""
from __future__ import annotations

import hashlib
import json
import os
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Optional

from .catalog import Catalog
from .decisions import DecisionRecorder
from .gating import GateConfig, evaluate
from .schema import Provenance, Resolution, Scope, SolutionPattern

APPROVE = "approve"
REJECT = "reject"
DISCUSS = "discuss"  # defer: no final decision recorded

# Default per-flavor rendering-hint stubs a reviewer can later refine (OQ4).
_DEFAULT_TARGET = {"claude": "CLAUDE.md", "codex": "AGENTS.md", "grok": "instructions"}

# A decision callback: (candidate dict) -> (action, rationale)
Decider = Callable[[dict], tuple]


def _now() -> str:
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def evidence_fingerprint(candidate: dict) -> str:
    """Stable hash of the evidence that would justify (re)reviewing a candidate."""
    keys = ("frequency", "cost_impact", "flavors", "repos", "sessions", "cross_flavor")
    payload = {k: candidate.get(k) for k in keys}
    return hashlib.sha1(json.dumps(payload, sort_keys=True).encode("utf-8")).hexdigest()


def candidate_to_pattern(candidate: dict, *, status: str = "provisional",
                         distribution_ready: bool = False) -> SolutionPattern:
    """Build a Solution Pattern from a detect candidate.

    ``status``/``distribution_ready`` come from the evidence gate (T04); they
    default to a provisional, non-distribution-ready pattern when ungated.
    """
    src = candidate["key"]
    flavors = list(candidate.get("flavors", []))
    hints = {f: {"target": _DEFAULT_TARGET.get(f, ""), "note": "TODO: refine rendering"}
             for f in flavors}
    return SolutionPattern(
        id=SolutionPattern.make_id(src),
        name=candidate.get("title") or src,
        version="1.0.0",
        polarity=candidate.get("polarity", "problem"),
        problem=candidate.get("title") or src,
        resolutions=[Resolution(summary="TODO: capture the recommended resolution")],
        scope=Scope(flavors=flavors, repos=list(candidate.get("repos", []))),
        provenance=Provenance(source_key=src, evidence=dict(candidate), promoted_at=_now()),
        rendering_hints=hints,
        status=status,
        distribution_ready=distribution_ready,
    )


@dataclass
class ReviewLog:
    """Append-only record of final decisions, keyed by candidate source key."""

    path: str
    _by_key: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        if os.path.exists(self.path):
            with open(self.path, encoding="utf-8") as fh:
                for line in fh:
                    if line.strip():
                        rec = json.loads(line)
                        self._by_key[rec["source_key"]] = rec  # last write wins

    def prior(self, source_key: str) -> Optional[dict]:
        return self._by_key.get(source_key)

    def already_decided(self, candidate: dict) -> bool:
        rec = self._by_key.get(candidate["key"])
        return bool(rec) and rec["fingerprint"] == evidence_fingerprint(candidate)

    def record(self, candidate: dict, action: str, rationale: str) -> None:
        rec = {
            "source_key": candidate["key"],
            "action": action,
            "rationale": rationale,
            "fingerprint": evidence_fingerprint(candidate),
            "ts": _now(),
        }
        self._by_key[candidate["key"]] = rec
        os.makedirs(os.path.dirname(self.path) or ".", exist_ok=True)
        with open(self.path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(rec, sort_keys=True))
            fh.write("\n")


@dataclass
class ReviewResult:
    approved: list = field(default_factory=list)  # (source_key, catalog_action)
    rejected: list = field(default_factory=list)  # source_key
    deferred: list = field(default_factory=list)  # source_key (discuss)
    skipped: list = field(default_factory=list)   # source_key (already decided)


def review(candidates: list[dict], decide: Decider, catalog: Catalog,
           log: ReviewLog, gate: Optional[GateConfig] = None,
           recorder: Optional[DecisionRecorder] = None) -> ReviewResult:
    """Run each candidate through ``decide``; promote approvals into ``catalog``.

    When a ``gate`` (T04 evidence bar) is supplied, the promoted pattern's
    ``status``/``distribution_ready`` are set from the gate evaluation, so an
    approved-but-thin candidate lands as ``provisional`` rather than
    distribution-ready. When a ``recorder`` (T05) is supplied, each final
    promote/reject is logged as an auditable hub decision (queued if the hub is
    down).
    """
    result = ReviewResult()
    for cand in candidates:
        key = cand["key"]
        if log.already_decided(cand):
            result.skipped.append(key)
            continue
        action, rationale = decide(cand)
        if action == DISCUSS:
            result.deferred.append(key)
            continue  # not a final decision; leave for a later pass
        if action == APPROVE:
            g = evaluate(cand, gate) if gate is not None else None
            pattern = (candidate_to_pattern(cand, status=g.status,
                                            distribution_ready=g.distribution_ready)
                       if g is not None else candidate_to_pattern(cand))
            cat_action = catalog.upsert(pattern)
            result.approved.append((key, cat_action))
        elif action == REJECT:
            result.rejected.append(key)
        else:
            raise ValueError(f"unknown review action {action!r}")
        log.record(cand, action, rationale)
        if recorder is not None:
            recorder.record(cand, action, rationale)
    return result
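The idempotent re-review rule rests on the evidence fingerprint, which a short standalone sketch can make concrete. The hashing below follows the approach shown in `evidence_fingerprint`; `fingerprint`, `needs_review`, the in-memory `log` dictionary, and the sample candidate are hypothetical names and data used only for illustration.

```python
import hashlib
import json

EVIDENCE_KEYS = ("frequency", "cost_impact", "flavors", "repos", "sessions", "cross_flavor")


def fingerprint(candidate: dict) -> str:
    """Hash only the evidence fields, with sorted keys for a stable digest."""
    payload = {k: candidate.get(k) for k in EVIDENCE_KEYS}
    return hashlib.sha1(json.dumps(payload, sort_keys=True).encode("utf-8")).hexdigest()


def needs_review(candidate: dict, log: dict) -> bool:
    """A candidate is re-surfaced only if it is new or its evidence changed."""
    rec = log.get(candidate["key"])
    return rec is None or rec["fingerprint"] != fingerprint(candidate)


base = {
    "key": "problem:retry_storm:retries",
    "title": "problem: retry storm",
    "frequency": 2,
    "cost_impact": 7.0,
    "flavors": ["claude"],
    "repos": ["agentic-resources"],
    "sessions": ["s1", "s2"],
    "cross_flavor": False,
}

# Simulate a prior reject being remembered in an in-memory log.
log = {base["key"]: {"action": "reject", "fingerprint": fingerprint(base)}}

# Rewording the title does not touch the evidence, so the decision stands.
retitled = dict(base, title="retry storms keep recurring")

# A new supporting session changes the evidence and reopens the review.
grown = dict(base, frequency=3, sessions=["s1", "s2", "s3"])
```

With this data, `needs_review(base, log)` and `needs_review(retitled, log)` are both false, while `needs_review(grown, log)` is true.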


@@ -0,0 +1,160 @@
"""Solution Pattern schema (PRD §6.3 FR-U2; design OQ4), T01.

A **Solution Pattern** is the curated, reviewed artifact a candidate pattern is
promoted into: a named, versioned record pairing a problem (or success) with one
or more recommended resolutions, written **flavor-agnostically**. Everything a
distributor needs to render a native artifact lives in a *separate*
``rendering_hints`` sub-structure, keyed by flavor, so the core stays neutral
(FR-A1/FR-A2) while Phase 3 distributors still get enough to render well (OQ4).

The artifact is the durable unit of the Pattern Catalog (T02): files originate,
the State Hub indexes (ADR-001). Serialization is deterministic (sorted keys) so
catalog files diff cleanly and re-saving an unchanged pattern is a no-op.
"""
from __future__ import annotations

import json
import re
from dataclasses import asdict, dataclass, field, fields
from typing import Any, Optional

from ..core.schema import FLAVORS

SCHEMA_VERSION = 1

# Lifecycle of a catalogued pattern.
#   provisional: promoted but below the distribution evidence bar (OQ5)
#   approved:    meets the bar; distribution-eligible (Phase 3)
#   rejected:    reviewed and declined; remembered so it is not re-surfaced
#   superseded:  replaced by a newer version of the same pattern id
STATUSES = ("provisional", "approved", "rejected", "superseded")
POLARITIES = ("problem", "success")


@dataclass
class Resolution:
    """One recommended resolution for the pattern's problem (FR-U2)."""

    summary: str
    detail: str = ""
    steps: list[str] = field(default_factory=list)


@dataclass
class Scope:
    """Where the pattern applies (FR-X2 input). Empty list == unrestricted."""

    repos: list[str] = field(default_factory=list)
    domains: list[str] = field(default_factory=list)
    flavors: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        bad = [f for f in self.flavors if f not in FLAVORS]
        if bad:
            raise ValueError(f"unknown flavor(s) in scope {bad!r}; expected {FLAVORS}")


@dataclass
class Provenance:
    """Trace back to the detect candidate this pattern was promoted from."""

    source_key: str  # the detect Pattern.key: stable cluster identity
    evidence: dict[str, Any] = field(default_factory=dict)  # snapshot of the candidate
    detected_at: Optional[str] = None
    promoted_at: Optional[str] = None


@dataclass
class SolutionPattern:
    """A curated, versioned solution pattern (PRD §5 / §6.3)."""

    id: str        # stable, derived from provenance.source_key
    name: str
    version: str   # semantic, e.g. "1.0.0"
    polarity: str  # problem | success
    problem: str   # human-readable description of the recurring situation
    resolutions: list[Resolution] = field(default_factory=list)
    scope: Scope = field(default_factory=Scope)
    provenance: Provenance = field(default_factory=lambda: Provenance(source_key=""))
    # per-flavor rendering hints, kept OUT of the agnostic core (OQ4):
    #   {"claude": {...}, "codex": {...}, "grok": {...}}
    rendering_hints: dict[str, dict[str, Any]] = field(default_factory=dict)
    # other signal keys/loci this pattern's recommendation also applies to:
    # lowercase substrings matched against a candidate signal's key+locus, so a
    # detect signal that doesn't share this pattern's exact key (e.g. a
    # recurring_error fingerprint) can still inherit the curated resolution.
    covers: list[str] = field(default_factory=list)
    status: str = "provisional"
    distribution_ready: bool = False
    created_at: Optional[str] = None
    updated_at: Optional[str] = None
    schema_version: int = SCHEMA_VERSION

    def __post_init__(self) -> None:
        if self.polarity not in POLARITIES:
            raise ValueError(f"unknown polarity {self.polarity!r}; expected {POLARITIES}")
        if self.status not in STATUSES:
            raise ValueError(f"unknown status {self.status!r}; expected {STATUSES}")
        bad = [f for f in self.rendering_hints if f not in FLAVORS]
        if bad:
            raise ValueError(f"unknown flavor(s) in rendering_hints {bad!r}; expected {FLAVORS}")

    # --- identity / versioning helpers -------------------------------------
    @staticmethod
    def make_id(source_key: str) -> str:
        """Stable catalog id from a detect candidate key (``polarity:type:locus``).

        Identity is the source key, so re-promoting the same candidate maps to the
        same pattern (dedup in T02), independent of wording or version.
        """
        slug = re.sub(r"[^a-z0-9_]+", "-", source_key.lower()).strip("-")
        return f"sp-{slug}"

    @staticmethod
    def bump_version(version: str, level: str = "patch") -> str:
        """Increment a ``major.minor.patch`` version string."""
        parts = (version.split(".") + ["0", "0", "0"])[:3]
        major, minor, patch = (int(p) for p in parts)
        if level == "major":
            major, minor, patch = major + 1, 0, 0
        elif level == "minor":
            minor, patch = minor + 1, 0
        else:
            patch += 1
        return f"{major}.{minor}.{patch}"

    # --- serialization ------------------------------------------------------
    def to_dict(self) -> dict[str, Any]:
        return asdict(self)

    def to_json(self) -> str:
        return json.dumps(self.to_dict(), sort_keys=True, indent=2)

    @classmethod
    def from_dict(cls, d: dict[str, Any]) -> "SolutionPattern":
        # Unknown keys are dropped so files written by a newer schema still load.
        d = dict(d)
        resolutions = [Resolution(**{k: v for k, v in r.items() if k in _RESOLUTION_FIELDS})
                       for r in d.pop("resolutions", [])]
        scope = d.pop("scope", None)
        prov = d.pop("provenance", None)
        obj = cls(**{k: v for k, v in d.items() if k in _PATTERN_FIELDS})
        obj.resolutions = resolutions
        if scope is not None:
            obj.scope = Scope(**{k: v for k, v in scope.items() if k in _SCOPE_FIELDS})
        if prov is not None:
            obj.provenance = Provenance(**{k: v for k, v in prov.items() if k in _PROV_FIELDS})
        return obj

    @classmethod
    def from_json(cls, s: str) -> "SolutionPattern":
        return cls.from_dict(json.loads(s))


_PATTERN_FIELDS = {f.name for f in fields(SolutionPattern)}
_RESOLUTION_FIELDS = {f.name for f in fields(Resolution)}
_SCOPE_FIELDS = {f.name for f in fields(Scope)}
_PROV_FIELDS = {f.name for f in fields(Provenance)}
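The identity and versioning helpers are easiest to follow with concrete inputs. The sketch below re-implements them as free functions so it runs standalone: `make_id_sketch` and `bump_version_sketch` are hypothetical names, and the candidate key is an assumed example in the `polarity:type:locus` shape.

```python
import re


def make_id_sketch(source_key: str) -> str:
    """Slugify a candidate key: runs of characters outside [a-z0-9_] become '-'."""
    slug = re.sub(r"[^a-z0-9_]+", "-", source_key.lower()).strip("-")
    return f"sp-{slug}"


def bump_version_sketch(version: str, level: str = "patch") -> str:
    """Increment major.minor.patch, padding missing components with zero."""
    parts = (version.split(".") + ["0", "0", "0"])[:3]
    major, minor, patch = (int(p) for p in parts)
    if level == "major":
        major, minor, patch = major + 1, 0, 0
    elif level == "minor":
        minor, patch = minor + 1, 0
    else:
        patch += 1
    return f"{major}.{minor}.{patch}"


pattern_id = make_id_sketch("problem:retry_storm:retries")
same_id = make_id_sketch("Problem:Retry_Storm:Retries")  # case-insensitive identity
versions = [
    bump_version_sketch("1.0.0"),           # patch bump
    bump_version_sketch("1.2.3", "minor"),  # minor bump resets patch
    bump_version_sketch("1.2.3", "major"),  # major bump resets minor and patch
    bump_version_sketch("1.2"),             # short input is padded before bumping
]
```

Because the id depends only on the source key, re-promoting the same candidate maps back to the same catalog entry, while the version string records how the entry changed.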


@@ -0,0 +1 @@
"""Detect: extract signals from sessions, cluster into candidate patterns."""


@@ -0,0 +1,72 @@
"""Detect entrypoint (T07): digests -> signals -> clusters -> report.

    python -m session_memory.detect [--config PATH] [--json] [--min-frequency N]

Reads Tier 2 digests from the store, extracts signals, clusters them into
candidate patterns, persists the candidates, and prints a ranked report
(cross-flavor first), which is the input to the Curate phase (Phase 2).
"""
from __future__ import annotations

import argparse
import json
import os

from ..core.store import Store
from ..ingest import _expand, load_config
from .cluster import cluster
from .quality import filter_real, quality_config
from .signals import extract_signals


def run_detect(config: dict, *, min_frequency: int = 2) -> list[dict]:
    store_cfg = config.get("store", {})
    store = Store(_expand(store_cfg["db_path"]), _expand(store_cfg["blob_dir"]))
    try:
        digests = filter_real(store.list_digests(), quality_config(config))
        signals = extract_signals(digests)
        patterns = [p.to_dict() for p in cluster(signals, min_frequency=min_frequency)]
        store.save_patterns(patterns)
    finally:
        store.close()  # release the store even if detection fails part-way
    return patterns


def _format_report(patterns: list[dict], n_digests: int) -> str:
    lines = [f"# Candidate Patterns ({len(patterns)} from {n_digests} sessions)", ""]
    if not patterns:
        lines.append("No recurring patterns above the frequency threshold yet.")
        return "\n".join(lines)
    for i, p in enumerate(patterns, 1):
        flag = " [CROSS-FLAVOR]" if p["cross_flavor"] else ""
        lines.append(f"{i}. {p['title']}{flag}")
        lines.append(f"   score={p['score']} freq={p['frequency']} "
                     f"impact={p['cost_impact']} flavors={','.join(p['flavors'])}")
        lines.append(f"   repos={','.join(p['repos']) or '-'} "
                     f"sessions={len(p['sessions'])}")
        lines.append("")
    return "\n".join(lines)


def main(argv=None) -> int:
    here = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
    ap = argparse.ArgumentParser(description="Detect candidate patterns from session digests.")
    ap.add_argument("--config", default=os.path.join(here, "config.toml"))
    ap.add_argument("--min-frequency", type=int, default=2)
    ap.add_argument("--json", action="store_true", help="emit machine-readable JSON")
    args = ap.parse_args(argv)
    config = load_config(args.config)
    store_cfg = config.get("store", {})
    # Count the sessions that pass the quality filter for the report header,
    # closing this short-lived store handle instead of leaking it.
    store = Store(_expand(store_cfg["db_path"]), _expand(store_cfg["blob_dir"]))
    try:
        all_digests = store.list_digests()
    finally:
        store.close()
    n = len(filter_real(all_digests, quality_config(config)))
    patterns = run_detect(config, min_frequency=args.min_frequency)
    if args.json:
        print(json.dumps(patterns, indent=2))
    else:
        print(_format_report(patterns, n))
    return 0


if __name__ == "__main__":
    raise SystemExit(main())


@@ -0,0 +1,78 @@
"""Pattern clusterer + evidence (PRD §5, §6.2; T05/T06).

Groups recurring :class:`Signal`s into candidate ``Pattern`` records. Clustering
is deterministic and keyed on ``(polarity, signal-type, locus)``, which is enough
to surface "the same thing keeps happening" without embeddings (a later option).

Each candidate carries evidence (FR-D3): supporting sessions, frequency, affected
repos, affected **flavors**, and an estimated cost-impact score. Candidates whose
evidence spans more than one flavor are flagged ``cross_flavor`` (FR-D4); these
are the highest-value reuse targets.
"""
from __future__ import annotations

import collections
from dataclasses import asdict, dataclass, field
from typing import Any

from .signals import PROBLEM, Signal


@dataclass
class Pattern:
    key: str            # stable cluster key
    polarity: str       # problem | success
    signal_type: str
    locus: str
    frequency: int      # number of supporting signals
    sessions: list[str] = field(default_factory=list)
    repos: list[str] = field(default_factory=list)
    flavors: list[str] = field(default_factory=list)
    cross_flavor: bool = False
    cost_impact: float = 0.0  # frequency-weighted magnitude
    score: float = 0.0        # ranking score (impact x frequency)
    title: str = ""

    def to_dict(self) -> dict[str, Any]:
        return asdict(self)


def _key(s: Signal) -> str:
    return f"{s.polarity}:{s.type}:{s.locus}"


def _title(polarity: str, signal_type: str, n_flavors: int) -> str:
    scope = "cross-flavor " if n_flavors > 1 else ""
    verb = "problem" if polarity == PROBLEM else "success"
    return f"{scope}{verb}: {signal_type.replace('_', ' ')}"


def cluster(signals: list[Signal], *, min_frequency: int = 2) -> list[Pattern]:
    """Group signals into candidate patterns; keep clusters >= min_frequency."""
    groups: dict[str, list[Signal]] = collections.defaultdict(list)
    for s in signals:
        groups[_key(s)].append(s)
    patterns: list[Pattern] = []
    for key, members in groups.items():
        if len(members) < min_frequency:
            continue
        sessions = sorted({m.session_uid for m in members})
        repos = sorted({m.repo for m in members if m.repo})
        flavors = sorted({m.flavor for m in members})
        cost_impact = sum(m.magnitude for m in members)
        first = members[0]
        p = Pattern(
            key=key, polarity=first.polarity, signal_type=first.type, locus=first.locus,
            frequency=len(members), sessions=sessions, repos=repos, flavors=flavors,
            cross_flavor=len(flavors) > 1, cost_impact=round(cost_impact, 3),
            title=_title(first.polarity, first.type, len(flavors)),
        )
        # rank: impact x frequency, with a boost for cross-flavor reuse value
        p.score = round(p.cost_impact * p.frequency * (1.5 if p.cross_flavor else 1.0), 3)
        patterns.append(p)
    # cross-flavor first, then by score
    patterns.sort(key=lambda p: (not p.cross_flavor, -p.score))
    return patterns
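The clustering key and the ranking formula can be traced with a small worked example. The sketch below uses plain dictionaries in place of `Signal` and `Pattern` so it runs standalone; `cluster_sketch` and the three sample signals are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import collections


def cluster_sketch(signals, min_frequency=2):
    """Group by (polarity, type, locus) and score impact x frequency x boost."""
    groups = collections.defaultdict(list)
    for s in signals:
        groups[f"{s['polarity']}:{s['type']}:{s['locus']}"].append(s)

    out = []
    for key, members in groups.items():
        if len(members) < min_frequency:
            continue  # a one-off is not a recurring pattern
        flavors = sorted({m["flavor"] for m in members})
        cost_impact = round(sum(m["magnitude"] for m in members), 3)
        cross_flavor = len(flavors) > 1
        boost = 1.5 if cross_flavor else 1.0
        out.append({
            "key": key,
            "frequency": len(members),
            "flavors": flavors,
            "cross_flavor": cross_flavor,
            "cost_impact": cost_impact,
            "score": round(cost_impact * len(members) * boost, 3),
        })
    # cross-flavor clusters first, then descending score
    out.sort(key=lambda p: (not p["cross_flavor"], -p["score"]))
    return out


signals = [
    {"flavor": "claude", "polarity": "problem", "type": "retry_storm",
     "locus": "retries", "magnitude": 3.0},
    {"flavor": "codex", "polarity": "problem", "type": "retry_storm",
     "locus": "retries", "magnitude": 4.0},
    {"flavor": "claude", "polarity": "problem", "type": "abandoned",
     "locus": "outcome", "magnitude": 1.0},
]
clusters = cluster_sketch(signals)
```

Only the two retry-storm signals meet the frequency floor. Their summed magnitude is 7.0 across two flavors, so the score works out to 7.0 x 2 x 1.5 = 21.0.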


@@ -0,0 +1,75 @@
"""Session-quality filter (T01).

The capture layer ingests *every* session it finds, including API health-checks,
smoke-tests, and interrupted runs (e.g. ``llm-connect`` firing "Say hello in one
word", or a transcript that is just ``[Request interrupted by user]``). These are
not real coding work, but the outcome heuristic labels the short ones ``abandoned``
and the clusterer then mints false-positive "problem" patterns from them.

:func:`is_real_coding_session` gates those out so Detect signals/clusters form only
over genuine coding sessions. It is intentionally conservative: a session counts
as real if it shows substantive activity, and is dropped only on clear trivial
markers. Thresholds come from ``[detect.quality]`` in ``config.toml``.
"""
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Prompt prefixes/markers that indicate a non-coding or interrupted session.
_TRIVIAL_PROMPTS = (
    "say hello", "hello", "[request interrupted", "return only this json",
    "ping", "ok", "<system-reminder>",
)

# Tool buckets that count as "substantive" coding activity.
_SUBSTANTIVE_TOOLS = (
    "Edit", "Write", "Read", "Bash", "search_replace", "write", "read_file",
    "run_terminal_command", "grep", "Grep", "glob", "Glob", "NotebookEdit",
)


@dataclass
class QualityConfig:
    min_events: int = 20       # below this, not a real coding session
    min_substantive: int = 3   # >= this many substantive tool calls required
    min_prompt_len: int = 25   # first prompt shorter than this is suspect


def quality_config(config: Optional[dict] = None) -> QualityConfig:
    d = (config or {}).get("detect", {}).get("quality", {}) if config else {}
    return QualityConfig(
        min_events=d.get("min_events", 20),
        min_substantive=d.get("min_substantive", 3),
        min_prompt_len=d.get("min_prompt_len", 25),
    )


def _substantive_calls(digest: dict) -> int:
    hist = digest.get("tool_histogram") or {}
    return sum(n for t, n in hist.items() if t in _SUBSTANTIVE_TOOLS)


def is_real_coding_session(digest: dict, config: Optional[QualityConfig] = None) -> bool:
    cfg = config or QualityConfig()
    if not digest.get("repo"):
        return False
    if digest.get("event_count", 0) < cfg.min_events:
        return False
    if _substantive_calls(digest) < cfg.min_substantive:
        return False
    prompt = (digest.get("first_prompt") or "").strip().lower()
    if len(prompt) < cfg.min_prompt_len:
        return False
    if any(prompt.startswith(p) for p in _TRIVIAL_PROMPTS):
        return False
    return True


def filter_real(digests: list[dict], config: Optional[QualityConfig] = None) -> list[dict]:
    cfg = config or QualityConfig()
    return [d for d in digests if is_real_coding_session(d, cfg)]
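To see how the quality gate separates a health-check from real work, here is a minimal standalone sketch. It uses the digest field names that appear in the filter (`repo`, `event_count`, `tool_histogram`, `first_prompt`) but a reduced tool set and prefix list; `looks_real` and both sample digests are hypothetical.

```python
SUBSTANTIVE = {"Edit", "Write", "Read", "Bash"}
TRIVIAL_PREFIXES = ("say hello", "[request interrupted", "ping")


def looks_real(digest, min_events=20, min_substantive=3, min_prompt_len=25):
    """Apply the checks in order: repo, event volume, tool activity, prompt."""
    if not digest.get("repo"):
        return False
    if digest.get("event_count", 0) < min_events:
        return False
    hist = digest.get("tool_histogram") or {}
    if sum(n for tool, n in hist.items() if tool in SUBSTANTIVE) < min_substantive:
        return False
    prompt = (digest.get("first_prompt") or "").strip().lower()
    if len(prompt) < min_prompt_len:
        return False
    # str.startswith accepts a tuple of candidate prefixes
    return not prompt.startswith(TRIVIAL_PREFIXES)


health_check = {
    "repo": "llm-connect",
    "event_count": 4,
    "tool_histogram": {},
    "first_prompt": "Say hello in one word",
}
coding_session = {
    "repo": "agentic-resources",
    "event_count": 120,
    "tool_histogram": {"Edit": 12, "Bash": 30, "ToolSearch": 2},
    "first_prompt": "Refactor the digest lookup so the store is closed on error",
}
kept = [d["repo"] for d in (health_check, coding_session) if looks_real(d)]
```

The health-check fails on event count before the prompt is even inspected, so only the coding session survives the filter.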


@@ -0,0 +1,205 @@
"""Signal extractors (PRD §6.2; T04).
Pure functions over a session digest (Tier 2) — the compact, durable view. Each
extractor emits zero or more :class:`Signal`s. A signal records its source
session, a *locus* (what it's about), a *polarity* (problem vs. success), and a
*magnitude*. Signals are the atoms the clusterer groups into candidate patterns.
No new capture happens here; everything is derived from digests already written
by the Capture layer, so detection is cheap and re-runnable.
"""
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Any, Callable, Optional
# polarity
PROBLEM = "problem"
SUCCESS = "success"
@dataclass
class Signal:
session_uid: str
flavor: str
repo: Optional[str]
type: str # e.g. "budget_overrun", "clean_pass"
polarity: str # PROBLEM | SUCCESS
locus: str # normalized subject key (tool, marker, ...)
magnitude: float = 1.0 # strength / cost weight
detail: dict[str, Any] = field(default_factory=dict)
# --- individual extractors --------------------------------------------------
# Each takes (digest, ctx) and returns a list[Signal]. ctx carries corpus-level
# stats (e.g. cost percentiles) so extractors can compare a session to its peers.
def _base(digest, type_, polarity, locus, magnitude=1.0, **detail) -> Signal:
return Signal(
session_uid=digest["session_uid"], flavor=digest["flavor"],
repo=digest.get("repo"), type=type_, polarity=polarity, locus=locus,
magnitude=magnitude, detail=detail,
)
def sig_retry_storm(digest, ctx) -> list[Signal]:
retries = digest.get("markers", {}).get("retries", 0)
if retries >= ctx.get("retry_storm_threshold", 3):
return [_base(digest, "retry_storm", PROBLEM, "retries", float(retries), retries=retries)]
return []
def sig_repeated_errors(digest, ctx) -> list[Signal]:
errors = digest.get("markers", {}).get("errors", 0)
if errors >= ctx.get("error_threshold", 3):
return [_base(digest, "repeated_errors", PROBLEM, "errors", float(errors), errors=errors)]
return []
def sig_budget_overrun(digest, ctx) -> list[Signal]:
total = digest.get("cost", {}).get("input_tokens", 0) + digest.get("cost", {}).get("output_tokens", 0)
p90 = ctx.get("tokens_p90", 0)
if p90 and total > p90:
return [_base(digest, "budget_overrun", PROBLEM, "tokens",
float(total) / max(p90, 1), tokens=total, p90=p90)]
return []
def sig_abandoned(digest, ctx) -> list[Signal]:
if digest.get("outcome") == "abandoned":
return [_base(digest, "abandoned", PROBLEM, "outcome", 1.0)]
return []
def sig_clean_pass(digest, ctx) -> list[Signal]:
"""Success: ended success, ran tests, no errors, modest cost."""
m = digest.get("markers", {})
if (digest.get("outcome") == "success" and m.get("test_runs", 0) >= 1
and m.get("errors", 0) == 0 and m.get("retries", 0) == 0):
return [_base(digest, "clean_pass", SUCCESS, "outcome", 1.0,
test_runs=m.get("test_runs"))]
return []
def sig_error_then_recovery(digest, ctx) -> list[Signal]:
"""Success despite hitting errors — a recovery worth learning from."""
m = digest.get("markers", {})
if digest.get("outcome") == "success" and m.get("errors", 0) >= 1:
return [_base(digest, "error_then_recovery", SUCCESS, "errors",
float(m.get("errors", 1)), errors=m.get("errors"))]
return []
# --- tool-mix / infrastructure-overhead signals (WP-0005 T02) ----------------
# These read the captured ``tool_histogram`` — friction that the outcome+marker
# signals above are blind to (sessions still "succeed", just expensively).
def tool_bucket(tool: str) -> str:
"""Group a tool name into a coarse activity bucket (flavor-agnostic)."""
if tool.startswith("mcp__state-hub"):
return "statehub_mcp"
if tool in ("TaskUpdate", "TaskCreate", "TaskGet", "TaskList", "TaskOutput",
"TaskStop", "todo_write", "update_task_status"):
return "task_mgmt"
if tool == "ToolSearch":
return "schema_load"
if tool in ("Bash", "run_terminal_command"):
return "shell"
if tool in ("Edit", "Write", "search_replace", "write", "NotebookEdit"):
return "edit"
if tool in ("Read", "read_file", "grep", "Grep", "glob", "Glob"):
return "read"
return "other"
def _bucketed(digest) -> tuple[dict, int]:
buckets: dict[str, int] = {}
for tool, n in (digest.get("tool_histogram") or {}).items():
buckets[tool_bucket(tool)] = buckets.get(tool_bucket(tool), 0) + n
return buckets, sum(buckets.values())
def sig_infra_overhead(digest, ctx) -> list[Signal]:
"""Problem: a large share of tool calls is hub/task/schema plumbing, not work."""
buckets, total = _bucketed(digest)
if total < ctx.get("infra_min_calls", 20):
return []
overhead = buckets.get("statehub_mcp", 0) + buckets.get("task_mgmt", 0) + buckets.get("schema_load", 0)
share = overhead / total
if share >= ctx.get("infra_overhead_threshold", 0.30):
return [_base(digest, "infra_overhead", PROBLEM, "infra_overhead", round(share, 3),
overhead_calls=overhead, total_calls=total,
statehub=buckets.get("statehub_mcp", 0),
task_mgmt=buckets.get("task_mgmt", 0),
schema_load=buckets.get("schema_load", 0))]
return []
def sig_schema_thrash(digest, ctx) -> list[Signal]:
"""Problem: repeated ToolSearch — deferred-tool schemas reloaded over and over."""
buckets, _ = _bucketed(digest)
n = buckets.get("schema_load", 0)
if n >= ctx.get("schema_thrash_threshold", 5):
return [_base(digest, "schema_thrash", PROBLEM, "schema_load", float(n), tool_searches=n)]
return []
def sig_tool_thrash(digest, ctx) -> list[Signal]:
"""Problem: a single tool is hammered far more than any other — likely churn."""
hist = digest.get("tool_histogram") or {}
if not hist:
return []
tool, n = max(hist.items(), key=lambda kv: kv[1])
if n >= ctx.get("tool_thrash_threshold", 80):
return [_base(digest, "tool_thrash", PROBLEM, f"tool:{tool}", float(n), tool=tool, calls=n)]
return []
def sig_recurring_error(digest, ctx) -> list[Signal]:
"""Problem: a normalized error fingerprint (WP-0006) — one signal per distinct
error in the session, so the same error across sessions/repos/flavors clusters
into a candidate root-cause pattern (locus = fingerprint, magnitude = in-session
occurrences). This is the content-level 'why', not just a coarse error count.
"""
out: list[Signal] = []
for snip in digest.get("error_snippets", []) or []:
fp = snip.get("fingerprint")
if not fp:
continue
out.append(_base(digest, "recurring_error", PROBLEM, fp, float(snip.get("count", 1)),
sample=snip.get("sample", ""), tool=snip.get("tool"),
occurrences=snip.get("count", 1)))
return out
EXTRACTORS: list[Callable] = [
sig_retry_storm, sig_repeated_errors, sig_budget_overrun, sig_abandoned,
sig_clean_pass, sig_error_then_recovery,
sig_infra_overhead, sig_schema_thrash, sig_tool_thrash,
sig_recurring_error,
]
def build_context(digests: list[dict]) -> dict[str, Any]:
"""Corpus-level stats so extractors can compare a session to its peers."""
totals = sorted(
d.get("cost", {}).get("input_tokens", 0) + d.get("cost", {}).get("output_tokens", 0)
for d in digests
)
p90 = totals[int(0.9 * (len(totals) - 1))] if totals else 0
return {
"tokens_p90": p90, "retry_storm_threshold": 3, "error_threshold": 3,
# tool-mix / infra-overhead thresholds (WP-0005 T02)
"infra_min_calls": 20, "infra_overhead_threshold": 0.30,
"schema_thrash_threshold": 5, "tool_thrash_threshold": 80,
}
def extract_signals(digests: list[dict], ctx: Optional[dict] = None) -> list[Signal]:
ctx = ctx or build_context(digests)
out: list[Signal] = []
for d in digests:
for ex in EXTRACTORS:
out.extend(ex(d, ctx))
return out
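
A minimal sketch of how the corpus-level context feeds a threshold extractor. It assumes plain dicts stand in for the `Signal` type, whose real fields are not shown here. `toy_build_context` and `toy_tool_thrash` are hypothetical names that mirror `build_context` and `sig_tool_thrash` above.

```python
from typing import Any


def toy_build_context(digests: list[dict]) -> dict[str, Any]:
    """Index-based p90 over total tokens, following build_context above."""
    totals = sorted(
        d.get("cost", {}).get("input_tokens", 0) + d.get("cost", {}).get("output_tokens", 0)
        for d in digests
    )
    p90 = totals[int(0.9 * (len(totals) - 1))] if totals else 0
    return {"tokens_p90": p90, "tool_thrash_threshold": 80}


def toy_tool_thrash(digest: dict, ctx: dict) -> list[dict]:
    """Emit one dict-shaped signal when the busiest tool reaches the threshold."""
    hist = digest.get("tool_histogram") or {}
    if not hist:
        return []
    tool, n = max(hist.items(), key=lambda kv: kv[1])
    if n >= ctx.get("tool_thrash_threshold", 80):
        # Hypothetical signal shape; the real Signal is built by _base(...).
        return [{"kind": "tool_thrash", "locus": f"tool:{tool}", "magnitude": float(n)}]
    return []


# Ten made-up sessions with growing token totals and Bash call counts.
digests = [
    {"cost": {"input_tokens": 100 * i, "output_tokens": 0},
     "tool_histogram": {"Bash": 10 * i, "Read": 5}}
    for i in range(1, 11)
]
ctx = toy_build_context(digests)
signals = [s for d in digests for s in toy_tool_thrash(d, ctx)]
```

Because every extractor reads its threshold through `ctx.get(...)` with a default, a caller can pass a custom `ctx` to tighten or loosen one signal without touching the others.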


@@ -0,0 +1,76 @@
"""Read a single session digest from the local store (AGENTIC-WP-0011 T03).

Thin read path for ``kaizen-agentic metrics correlate`` and other consumers.
Does not run ingest.

Usage:
    python -m session_memory.digest_lookup <session_uid> [--json]
    HELIX_STORE_DB=/abs/path/to/mem.db python -m session_memory.digest_lookup <uid>
"""
from __future__ import annotations
import argparse
import json
import os
import sys
from .core.store import Store
from .ingest import _expand, load_config
def resolve_store_paths(*, config_path: str | None = None) -> tuple[str, str]:
"""Resolve db + blob paths from HELIX_STORE_DB or config.toml [store]."""
env_db = os.environ.get("HELIX_STORE_DB")
if env_db:
db_path = _expand(env_db)
blob_dir = os.path.join(os.path.dirname(db_path), "blobs")
return db_path, blob_dir
here = os.path.dirname(os.path.abspath(__file__))
cfg_path = config_path or os.path.join(here, "config.toml")
store_cfg = load_config(cfg_path).get("store", {})
return _expand(store_cfg.get("db_path", "session_memory/.store/mem.db")), _expand(
store_cfg.get("blob_dir", "session_memory/.store/blobs")
)
def lookup_digest(session_uid: str, *, config_path: str | None = None) -> dict | None:
db_path, blob_dir = resolve_store_paths(config_path=config_path)
store = Store(db_path, blob_dir)
try:
return store.get_digest(session_uid)
finally:
store.close()
def main(argv: list[str] | None = None) -> int:
here = os.path.dirname(os.path.abspath(__file__))
ap = argparse.ArgumentParser(
description="Read one session digest from the Helix Forge store (no ingest)."
)
ap.add_argument("session_uid", help="Normalized session uid, e.g. claude:abc-123")
ap.add_argument("--config", default=os.path.join(here, "config.toml"),
help="config.toml when HELIX_STORE_DB is unset")
ap.add_argument("--json", action="store_true", help="print digest JSON to stdout")
args = ap.parse_args(argv)
digest = lookup_digest(args.session_uid, config_path=args.config)
if digest is None:
print(f"digest not found: {args.session_uid}", file=sys.stderr)
return 1
if args.json:
print(json.dumps(digest, indent=2, sort_keys=True))
else:
cost = digest.get("cost") or {}
tokens = cost.get("input_tokens", 0) + cost.get("output_tokens", 0)
print(f"session_uid: {digest.get('session_uid')}")
print(f"repo: {digest.get('repo')} flavor: {digest.get('flavor')}")
print(f"outcome: {digest.get('outcome')} tokens: {tokens}")
print(f"started_at: {digest.get('started_at')} ended_at: {digest.get('ended_at')}")
return 0
if __name__ == "__main__":
raise SystemExit(main())
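
The path resolution above gives the `HELIX_STORE_DB` environment variable precedence over `config.toml`. A small sketch of that precedence rule follows, assuming the environment and the `[store]` table are passed in as plain mappings so it runs without a real store. `resolve_paths` is a hypothetical stand-in for `resolve_store_paths`, and the example paths are made up.

```python
import os


def resolve_paths(env: dict, store_cfg: dict) -> tuple[str, str]:
    """Return (db_path, blob_dir); the env override wins over config values."""
    env_db = env.get("HELIX_STORE_DB")
    if env_db:
        db_path = os.path.expanduser(env_db)
        # With the override set, blobs are assumed to sit next to the database.
        return db_path, os.path.join(os.path.dirname(db_path), "blobs")
    return (
        os.path.expanduser(store_cfg.get("db_path", "session_memory/.store/mem.db")),
        os.path.expanduser(store_cfg.get("blob_dir", "session_memory/.store/blobs")),
    )


env_paths = resolve_paths({"HELIX_STORE_DB": "/data/helix/mem.db"}, {"db_path": "ignored.db"})
cfg_paths = resolve_paths({}, {"db_path": "/srv/mem.db", "blob_dir": "/srv/blobs"})
default_paths = resolve_paths({}, {})
```

Note that a `blob_dir` set in the config is ignored when the override is present, which matches the code above.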


@@ -0,0 +1,9 @@
"""Distribute phase (PRD §6.4): render approved Solution Patterns into per-flavor
artifacts. Mirror of the collector design: agnostic core, thin distributor edges.

    base.py       Artifact + Distributor protocol + idempotent snippet markers (T01)
    claude.py     CLAUDE.md snippet distributor (T02)
    codex.py      AGENTS.md snippet distributor (T03)
    grok.py       native instruction distributor (T03)
    __main__.py   `python -m session_memory.distribute` (T05)
"""


@@ -0,0 +1,89 @@
"""Distribute entrypoint (T05): catalog -> per-flavor proposals (HITL).
python -m session_memory.distribute [--config PATH] [--repo R] [--flavor F] [--json]
Reads approved / distribution-ready Solution Patterns from the Pattern Catalog and
renders them into per-flavor **proposals** (never auto-applied) scoped by
repo/domain, recording what is proposed where in the active-pattern registry.
Targets are the repo->domain map in ``config.toml`` crossed with the known
distributor flavors; each pattern's own ``Scope`` filters where it actually lands.
"""
from __future__ import annotations
import argparse
import json
import os
from ..curate.catalog import Catalog
from ..ingest import _expand, load_config
from .proposals import ActiveRegistry, Target, propose
from .registry import all_flavors
def build_targets(config: dict, repo_filter=None, flavor_filter=None) -> list[Target]:
repo_map = config.get("repo_domain_map", {})
flavors = [flavor_filter] if flavor_filter else all_flavors()
targets = []
for repo, domain in repo_map.items():
if repo_filter and repo != repo_filter:
continue
for flavor in flavors:
targets.append(Target(repo=repo, domain=domain, flavor=flavor))
return targets
def run_distribute(config: dict, *, repo_filter=None, flavor_filter=None):
cur = config.get("curate", {})
dist = config.get("distribute", {})
catalog = Catalog(_expand(cur.get("catalog_dir", "session_memory/catalog")))
patterns = catalog.list()
targets = build_targets(config, repo_filter, flavor_filter)
registry = ActiveRegistry(_expand(dist.get("active_registry",
"session_memory/distribute/active_patterns.json")))
out_dir = _expand(dist.get("proposals_dir", "session_memory/proposals"))
return propose(patterns, targets, out_dir, registry)
def _summary(res) -> str:
by_repo = {}
for repo, flavor, pid, _ in res.proposals:
by_repo.setdefault(repo, []).append(f"{pid}[{flavor}]")
lines = [f"# Distribute proposals ({len(res.proposals)} renders, "
f"{len(res.files_written)} files)"]
for repo in sorted(by_repo):
lines.append(f" {repo}: {', '.join(sorted(by_repo[repo]))}")
if res.skipped_not_distributable:
lines.append(f" skipped (not distribution-ready): "
f"{len(set(res.skipped_not_distributable))} pattern(s)")
if not res.proposals:
lines.append(" (no approved/distribution-ready patterns matched any target)")
return "\n".join(lines)
def main(argv=None) -> int:
here = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
ap = argparse.ArgumentParser(description="Distribute approved patterns as per-flavor proposals.")
ap.add_argument("--config", default=os.path.join(here, "config.toml"))
ap.add_argument("--repo", default=None, help="limit to one target repo")
ap.add_argument("--flavor", default=None, help="limit to one flavor")
ap.add_argument("--json", action="store_true")
args = ap.parse_args(argv)
config = load_config(args.config)
res = run_distribute(config, repo_filter=args.repo, flavor_filter=args.flavor)
if args.json:
print(json.dumps({
"proposals": [{"repo": r, "flavor": f, "pattern_id": p, "path": path}
for r, f, p, path in res.proposals],
"files_written": res.files_written,
"skipped": sorted(set(res.skipped_not_distributable)),
}, indent=2))
else:
print(_summary(res))
return 0
if __name__ == "__main__":
raise SystemExit(main())


@@ -0,0 +1,242 @@
[
{
"flavor": "claude",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "agentic-resources",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "codex",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "agentic-resources",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "grok",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "agentic-resources",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "claude",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "can-you-assist",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "codex",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "can-you-assist",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "grok",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "can-you-assist",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "claude",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "net-kingdom",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "codex",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "net-kingdom",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "grok",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "net-kingdom",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "claude",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "ops-bridge",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "codex",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "ops-bridge",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "grok",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "ops-bridge",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "claude",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "state-hub",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "codex",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "state-hub",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "grok",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "state-hub",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "claude",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "the-custodian",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "codex",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "the-custodian",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "grok",
"pattern_id": "sp-problem-file_not_read-edit",
"repo": "the-custodian",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.0"
},
{
"flavor": "claude",
"pattern_id": "sp-problem-schema_thrash-schema_load",
"repo": "ops-bridge",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.1"
},
{
"flavor": "claude",
"pattern_id": "sp-problem-tool_thrash-tool-bash",
"repo": "state-hub",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.1"
},
{
"flavor": "claude",
"pattern_id": "sp-success-clean_pass-outcome",
"repo": "agentic-resources",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.1"
},
{
"flavor": "grok",
"pattern_id": "sp-success-clean_pass-outcome",
"repo": "agentic-resources",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.1"
},
{
"flavor": "claude",
"pattern_id": "sp-success-clean_pass-outcome",
"repo": "can-you-assist",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.1"
},
{
"flavor": "grok",
"pattern_id": "sp-success-clean_pass-outcome",
"repo": "can-you-assist",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.1"
},
{
"flavor": "claude",
"pattern_id": "sp-success-clean_pass-outcome",
"repo": "ops-bridge",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.1"
},
{
"flavor": "grok",
"pattern_id": "sp-success-clean_pass-outcome",
"repo": "ops-bridge",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.1"
},
{
"flavor": "claude",
"pattern_id": "sp-success-clean_pass-outcome",
"repo": "state-hub",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.1"
},
{
"flavor": "grok",
"pattern_id": "sp-success-clean_pass-outcome",
"repo": "state-hub",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.1"
},
{
"flavor": "claude",
"pattern_id": "sp-success-clean_pass-outcome",
"repo": "the-custodian",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.1"
},
{
"flavor": "grok",
"pattern_id": "sp-success-clean_pass-outcome",
"repo": "the-custodian",
"status": "proposed",
"updated_at": "2026-06-07T14:25:34Z",
"version": "1.0.1"
}
]


@@ -0,0 +1,115 @@
"""Distributor base — Artifact, the Distributor protocol, and idempotent markers
(PRD §6.4 FR-X1; T01).
A **distributor** turns one agnostic :class:`SolutionPattern` into a per-flavor
:class:`Artifact` (a target path + a snippet of content). Everything flavor-neutral
lives here; each flavor adapter (T02/T03) only supplies its target filename and may
override the rendered body using the pattern's ``rendering_hints``.
Snippets carry stable ``BEGIN/END`` markers keyed on the pattern id, so
re-distributing a pattern **updates its block in place** instead of duplicating it
— the property that lets Distribute run repeatedly (HITL) without drift.
"""
from __future__ import annotations
import re
from dataclasses import dataclass
from typing import Any, Optional, Protocol, runtime_checkable
from ..curate.schema import SolutionPattern
@dataclass
class Artifact:
"""A proposed per-flavor rendering of a pattern (FR-X1/FR-X3 — proposed, not applied)."""
flavor: str
target_path: str # repo-relative file the snippet belongs in (e.g. "CLAUDE.md")
pattern_id: str
content: str # the marker-wrapped snippet block
@runtime_checkable
class Distributor(Protocol):
flavor: str
target_path: str
def render(self, pattern: SolutionPattern) -> Artifact: ...
# --- idempotent snippet markers ---------------------------------------------
_MARK = "helix-forge pattern"
def begin_marker(pattern_id: str) -> str:
return f"<!-- BEGIN {_MARK}:{pattern_id} -->"
def end_marker(pattern_id: str) -> str:
return f"<!-- END {_MARK}:{pattern_id} -->"
def wrap_block(pattern_id: str, body: str, version: str = "") -> str:
"""Wrap a rendered body in stable BEGIN/END markers."""
ver = f" v{version}" if version else ""
return f"{begin_marker(pattern_id)}{ver}\n{body.strip()}\n{end_marker(pattern_id)}"
def upsert_block(doc_text: str, pattern_id: str, block: str) -> str:
    """Insert or replace a pattern's marked block within a document (idempotent)."""
    pat = re.compile(
        re.escape(begin_marker(pattern_id)) + r".*?" + re.escape(end_marker(pattern_id)),
        re.DOTALL,
    )
    if pat.search(doc_text):
        # Use a function replacement: a plain string is treated as a regex template,
        # so backslashes in the block (paths, regexes) would be mangled or rejected.
        return pat.sub(lambda _m: block, doc_text)
    sep = "" if doc_text.endswith("\n\n") or not doc_text else "\n\n"
    return f"{doc_text}{sep}{block}\n"
# --- agnostic body rendering ------------------------------------------------
def render_markdown_body(pattern: SolutionPattern) -> str:
    """Default flavor-neutral snippet body from the agnostic pattern fields."""
    label = "Avoid" if pattern.polarity == "problem" else "Prefer"
    lines = [f"### {pattern.name}", "", pattern.problem.strip(), ""]
    if pattern.resolutions:
        lines.append(f"**{label}:**")
        for r in pattern.resolutions:
            # Separate summary and detail so they do not run together.
            detail = f": {r.detail}" if r.detail else ""
            lines.append(f"- {r.summary}{detail}")
            for step in r.steps:
                lines.append(f" - {step}")
    return "\n".join(lines).strip()
def hint(pattern: SolutionPattern, flavor: str, key: str, default: Any = None) -> Any:
"""Read a per-flavor rendering hint, falling back to ``default``."""
return (pattern.rendering_hints.get(flavor) or {}).get(key, default)
class BaseDistributor:
"""Shared distributor: renders the agnostic body, honouring a ``body`` hint
override and a ``target`` hint, then wraps it in idempotent markers."""
flavor: str = ""
target_path: str = ""
def __init__(self, flavor: Optional[str] = None, target_path: Optional[str] = None) -> None:
if flavor is not None:
self.flavor = flavor
if target_path is not None:
self.target_path = target_path
def body(self, pattern: SolutionPattern) -> str:
return hint(pattern, self.flavor, "body") or render_markdown_body(pattern)
def target(self, pattern: SolutionPattern) -> str:
return hint(pattern, self.flavor, "target") or self.target_path
def render(self, pattern: SolutionPattern) -> Artifact:
block = wrap_block(pattern.id, self.body(pattern), pattern.version)
return Artifact(flavor=self.flavor, target_path=self.target(pattern),
pattern_id=pattern.id, content=block)
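
The idempotent-marker behaviour described in the module docstring can be exercised in isolation. The sketch below re-declares the marker helpers so it runs on its own. It passes a function to `re.sub` so that backslashes in a block are inserted literally, which is a choice made for this sketch. The pattern id `sp-demo` and the document text are made up.

```python
import re

MARK = "helix-forge pattern"


def begin_marker(pid: str) -> str:
    return f"<!-- BEGIN {MARK}:{pid} -->"


def end_marker(pid: str) -> str:
    return f"<!-- END {MARK}:{pid} -->"


def wrap_block(pid: str, body: str, version: str = "") -> str:
    ver = f" v{version}" if version else ""
    return f"{begin_marker(pid)}{ver}\n{body.strip()}\n{end_marker(pid)}"


def upsert_block(doc_text: str, pid: str, block: str) -> str:
    pat = re.compile(
        re.escape(begin_marker(pid)) + r".*?" + re.escape(end_marker(pid)),
        re.DOTALL,
    )
    if pat.search(doc_text):
        # Function replacement: the block is inserted verbatim, backslashes included.
        return pat.sub(lambda _m: block, doc_text)
    sep = "" if doc_text.endswith("\n\n") or not doc_text else "\n\n"
    return f"{doc_text}{sep}{block}\n"


doc = "# Project notes\n"
v1 = wrap_block("sp-demo", "Read a file before editing it.", "1.0.0")
v2 = wrap_block("sp-demo", r"Read a file before editing it (e.g. C:\repo\notes.md).", "1.0.1")

once = upsert_block(doc, "sp-demo", v1)       # first run appends the block
twice = upsert_block(once, "sp-demo", v1)     # second run changes nothing
updated = upsert_block(twice, "sp-demo", v2)  # new version replaces in place
```

Running the same proposal twice leaves the document unchanged, and a new version replaces the old block rather than adding a second one.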


@@ -0,0 +1,42 @@
"""Claude distributor (PRD §6.4 FR-X1; T02).
Renders an approved Solution Pattern into a ``CLAUDE.md`` snippet block. Most logic
is inherited from :class:`BaseDistributor`; the Claude-specific touch is an
optional **skill** rendering mode (``rendering_hints["claude"]["as"] == "skill"``)
that emits a skill-style stub instead of a plain instruction snippet — Claude's
native distribution targets are CLAUDE.md snippets, skills, or hooks.
"""
from __future__ import annotations
from ..curate.schema import SolutionPattern
from .base import BaseDistributor, hint, render_markdown_body
class ClaudeDistributor(BaseDistributor):
    flavor = "claude"
    target_path = "CLAUDE.md"

    def body(self, pattern: SolutionPattern) -> str:
        override = hint(pattern, self.flavor, "body")
        if override:
            return override
        if hint(pattern, self.flavor, "as") == "skill":
            return self._skill_stub(pattern)
        return render_markdown_body(pattern)

    @staticmethod
    def _skill_stub(pattern: SolutionPattern) -> str:
        trigger = "avoid" if pattern.polarity == "problem" else "apply"
        lines = [
            f"## Skill: {pattern.name}",
            "",
            # Separator keeps the trigger verb from running into the problem text.
            f"**When:** situations where you would {trigger}: {pattern.problem.strip()}",
            "",
            "**Steps:**",
        ]
        for r in pattern.resolutions:
            lines.append(f"- {r.summary}" + (f": {r.detail}" if r.detail else ""))
            for step in r.steps:
                lines.append(f" - {step}")
        return "\n".join(lines).strip()


@@ -0,0 +1,15 @@
"""Codex distributor (PRD §6.4 FR-X1; T03).
Renders an approved Solution Pattern into an ``AGENTS.md`` snippet — Codex's native
repo-convention surface. Identical agnostic body to the other flavors (FR-A3: one
pattern, expressible everywhere); only the target file differs.
"""
from __future__ import annotations
from .base import BaseDistributor
class CodexDistributor(BaseDistributor):
flavor = "codex"
target_path = "AGENTS.md"


@@ -0,0 +1,15 @@
"""Grok distributor (PRD §6.4 FR-X1; T03).
Renders an approved Solution Pattern into Grok's native instruction format. Defaults
to a ``.grok/instructions.md`` snippet; the same agnostic body as the other flavors
(FR-A3), overridable via ``rendering_hints["grok"]``.
"""
from __future__ import annotations
from .base import BaseDistributor
class GrokDistributor(BaseDistributor):
flavor = "grok"
target_path = ".grok/instructions.md"


@@ -0,0 +1,136 @@
"""Scoping, proposed-not-applied output, and the active-pattern registry
(PRD §6.4 FR-X2/FR-X3/FR-X4; T04).
* **Scope (FR-X2):** a pattern lands in a target environment only if the target's
repo/domain/flavor are within the pattern's :class:`Scope` (an empty scope list
means "unrestricted on that axis").
* **Proposed, not applied (FR-X3):** rendered artifacts are written under a
``proposals/`` tree mirroring the target path — a reviewable diff a human applies,
never auto-written into the live file. Re-running upserts each pattern's block in
place (idempotent), so proposals don't accumulate duplicates.
* **Active-pattern registry (FR-X4):** a JSON record of which pattern (and version)
is proposed/active in which (repo, flavor) environment.
"""
from __future__ import annotations
import json
import os
from dataclasses import dataclass
from datetime import datetime, timezone
from ..curate.schema import SolutionPattern
from .base import upsert_block
from .registry import get_distributor
def _now() -> str:
return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
@dataclass(frozen=True)
class Target:
"""An environment a pattern could be distributed to."""
repo: str
domain: str = ""
flavor: str = "claude"
def applies(pattern: SolutionPattern, target: Target) -> bool:
    """True if ``target`` is within the pattern's scope.

    An empty scope axis matches anything. A target with no domain set is not
    filtered on the domain axis.
    """
    sc = pattern.scope
    if sc.repos and target.repo not in sc.repos:
        return False
    if sc.domains and target.domain and target.domain not in sc.domains:
        return False
    if sc.flavors and target.flavor not in sc.flavors:
        return False
    return True
def is_distributable(pattern: SolutionPattern) -> bool:
return pattern.status == "approved" and pattern.distribution_ready
class ActiveRegistry:
"""JSON record of patterns proposed/active per (repo, flavor) — FR-X4."""
def __init__(self, path: str) -> None:
self.path = path
self._entries: dict[str, dict] = {}
if os.path.exists(path):
with open(path, encoding="utf-8") as fh:
for e in json.load(fh):
self._entries[self._key(e["pattern_id"], e["repo"], e["flavor"])] = e
@staticmethod
def _key(pid: str, repo: str, flavor: str) -> str:
return f"{pid}|{repo}|{flavor}"
def record(self, pid: str, repo: str, flavor: str, version: str,
status: str = "proposed") -> None:
self._entries[self._key(pid, repo, flavor)] = {
"pattern_id": pid, "repo": repo, "flavor": flavor,
"version": version, "status": status, "updated_at": _now(),
}
def entries(self) -> list[dict]:
return [self._entries[k] for k in sorted(self._entries)]
def save(self) -> None:
os.makedirs(os.path.dirname(self.path) or ".", exist_ok=True)
with open(self.path, "w", encoding="utf-8") as fh:
json.dump(self.entries(), fh, indent=2, sort_keys=True)
fh.write("\n")
@dataclass
class ProposalResult:
proposals: list = None # (repo, flavor, pattern_id, proposal_path)
files_written: list = None # absolute proposal paths
skipped_not_distributable: list = None # pattern ids
def __post_init__(self):
self.proposals = self.proposals or []
self.files_written = self.files_written or []
self.skipped_not_distributable = self.skipped_not_distributable or []
def propose(patterns: list[SolutionPattern], targets: list[Target], out_dir: str,
registry: ActiveRegistry) -> ProposalResult:
"""Render in-scope, distributable patterns into per-target proposal files."""
result = ProposalResult()
pending: dict[str, str] = {} # proposal path -> accumulated content
for p in patterns:
if not is_distributable(p):
result.skipped_not_distributable.append(p.id)
continue
for t in targets:
dist = get_distributor(t.flavor)
if dist is None or not applies(p, t):
continue
art = dist.render(p)
path = os.path.join(out_dir, t.repo, art.target_path)
if path not in pending:
pending[path] = _read(path)
pending[path] = upsert_block(pending[path], p.id, art.content)
registry.record(p.id, t.repo, t.flavor, p.version)
result.proposals.append((t.repo, t.flavor, p.id, path))
for path, content in pending.items():
os.makedirs(os.path.dirname(path), exist_ok=True)
with open(path, "w", encoding="utf-8") as fh:
fh.write(content if content.endswith("\n") else content + "\n")
result.files_written.append(path)
registry.save()
return result
def _read(path: str) -> str:
if os.path.exists(path):
with open(path, encoding="utf-8") as fh:
return fh.read()
return ""
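
The scope rule (FR-X2) from the module docstring can be tried on its own. The sketch below assumes a minimal `Scope` dataclass with `repos`, `domains`, and `flavors` tuples. The real `Scope` lives in `curate.schema` and is not shown here, so its shape is an assumption, and the domain values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    # Hypothetical shape: an empty tuple means "unrestricted on that axis".
    repos: tuple[str, ...] = ()
    domains: tuple[str, ...] = ()
    flavors: tuple[str, ...] = ()


@dataclass(frozen=True)
class Target:
    repo: str
    domain: str = ""
    flavor: str = "claude"


def applies(scope: Scope, target: Target) -> bool:
    """Mirror of the scope check above, taking the scope directly."""
    if scope.repos and target.repo not in scope.repos:
        return False
    # A target without a domain is not filtered on the domain axis.
    if scope.domains and target.domain and target.domain not in scope.domains:
        return False
    if scope.flavors and target.flavor not in scope.flavors:
        return False
    return True


narrow = Scope(repos=("ops-bridge",), flavors=("claude",))
by_domain = Scope(domains=("infra",))

in_scope = applies(narrow, Target("ops-bridge", "infra", "claude"))
wrong_flavor = applies(narrow, Target("ops-bridge", "infra", "grok"))
wrong_repo = applies(narrow, Target("state-hub", "infra", "claude"))
unknown_domain = applies(by_domain, Target("state-hub", "", "codex"))
other_domain = applies(by_domain, Target("state-hub", "web", "codex"))
unrestricted = applies(Scope(), Target("any-repo", "any-domain", "grok"))
```

This matches the shape of the registry entries: a pattern scoped to one repo and one flavor yields a single entry, while an unrestricted pattern yields one per repo and flavor.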


@@ -0,0 +1,26 @@
"""Distributor registry (T03) — flavor -> distributor, the one place that knows
about all flavor edges. Adding a flavor = one entry here + one adapter module.
"""
from __future__ import annotations
from typing import Optional
from .base import BaseDistributor
from .claude import ClaudeDistributor
from .codex import CodexDistributor
from .grok import GrokDistributor
_REGISTRY: dict[str, BaseDistributor] = {
"claude": ClaudeDistributor(),
"codex": CodexDistributor(),
"grok": GrokDistributor(),
}
def get_distributor(flavor: str) -> Optional[BaseDistributor]:
return _REGISTRY.get(flavor)
def all_flavors() -> list[str]:
return list(_REGISTRY)

session_memory/ingest.py

@@ -0,0 +1,134 @@
"""Session-memory sweep entrypoint (design §7; T06).
One sweep: discover (per enabled source) -> normalize (adapter) -> store ->
digest -> retention-evict. Idempotent and re-runnable; intended to be triggered
on the configured cadence (``/schedule`` daily/weekly) or by an agent hook.
Usage:
python -m session_memory.ingest [--config PATH] [--once] [--dry-run]
"""
from __future__ import annotations
import argparse
import glob
import os
import sys
import tomllib
from dataclasses import dataclass, field
from typing import Any
from .adapters import claude as claude_adapter
from .adapters import codex as codex_adapter
from .adapters import grok as grok_adapter
from .core import digest as digest_mod
from .core.cursor import Cursors
from .core.retention import RetentionConfig, sweep as retention_sweep
from .core.store import Store
# adapter dispatch by source name
_ADAPTERS = {
"claude": claude_adapter.parse_session,
"codex": codex_adapter.parse_session,
"grok": grok_adapter.parse_session,
}
@dataclass
class SweepResult:
discovered: int = 0
ingested: int = 0
skipped_unchanged: int = 0
analyzed: int = 0
warnings: list[str] = field(default_factory=list)
retention: Any = None
def _expand(p: str) -> str:
return os.path.expanduser(p)
def load_config(path: str) -> dict[str, Any]:
with open(path, "rb") as f:
return tomllib.load(f)
def run_sweep(config: dict[str, Any], *, dry_run: bool = False) -> SweepResult:
    store_cfg = config.get("store", {})
    ret_cfg = config.get("retention", {})
    repo_map = config.get("repo_domain_map", {})
    res = SweepResult()
    # In dry-run we only discover + parse: no store is created or written.
    store = None if dry_run else Store(_expand(store_cfg["db_path"]), _expand(store_cfg["blob_dir"]))
    cursors = Cursors(_expand(store_cfg["cursor"]))
    for name, src in config.get("sources", {}).items():
        if not src.get("enabled"):
            continue
        parse = _ADAPTERS.get(name)
        if parse is None:
            res.warnings.append(f"no adapter for source {name!r} (Phase 1)")
            continue
        root = _expand(src["root"])
        for fp in sorted(glob.glob(os.path.join(root, src["glob"]))):
            res.discovered += 1
            if not cursors.is_changed(fp):
                res.skipped_unchanged += 1
                continue
            try:
                bundle = parse(fp, repo_map)
            except Exception as e:  # one bad file must not abort the sweep
                res.warnings.append(f"parse failed {fp}: {e}")
                continue
            if bundle is None:
                cursors.mark(fp)
                continue
            if not dry_run:
                store.ingest(bundle)
                digest_mod.analyze(store, bundle.session.session_uid)
                res.analyzed += 1
            res.ingested += 1
            cursors.mark(fp)
    if not dry_run and store is not None:
        cursors.save()
        rc = RetentionConfig(
            raw_soft_cap_bytes=int(ret_cfg.get("raw_soft_cap_bytes", RetentionConfig.raw_soft_cap_bytes)),
            raw_hard_cap_bytes=int(ret_cfg.get("raw_hard_cap_bytes", RetentionConfig.raw_hard_cap_bytes)),
            raw_max_age_days=int(ret_cfg.get("raw_max_age_days", RetentionConfig.raw_max_age_days)),
            distilled_cap_bytes=int(ret_cfg.get("distilled_cap_bytes", RetentionConfig.distilled_cap_bytes)),
        )
        res.retention = retention_sweep(store, rc, analyze_fn=digest_mod.analyze)
        res.warnings.extend(res.retention.warnings)
    if store is not None:
        store.close()
    return res
def main(argv: list[str] | None = None) -> int:
here = os.path.dirname(os.path.abspath(__file__))
ap = argparse.ArgumentParser(description="Run one coding-session-memory sweep.")
ap.add_argument("--config", default=os.path.join(here, "config.toml"))
ap.add_argument("--dry-run", action="store_true", help="discover + parse, but do not write or evict")
ap.add_argument("--once", action="store_true", help="(default) run a single sweep")
args = ap.parse_args(argv)
config = load_config(args.config)
res = run_sweep(config, dry_run=args.dry_run)
print(f"discovered={res.discovered} ingested={res.ingested} "
f"skipped_unchanged={res.skipped_unchanged} analyzed={res.analyzed}")
if res.retention is not None:
r = res.retention
print(f"retention: freed={r.bytes_freed}B final_usage={r.final_usage_bytes}B "
f"backstop={len(r.backstop_evicted)} budget={len(r.budget_evicted)} "
f"overflow_analyzed={len(r.overflow_analyzed)} data_loss={len(r.overflow_data_loss)}")
for w in res.warnings:
print(f" WARN: {w}", file=sys.stderr)
return 0
if __name__ == "__main__":
raise SystemExit(main())


@@ -0,0 +1,9 @@
"""Measure phase (PRD §6.5): the loop-closer.

    metrics.py    fleet metrics + persisted baseline snapshots (T01)
    effect.py     before/after per-pattern effectiveness (T02)
    __main__.py   python -m session_memory.measure (T03)

Computation over existing digests (reusing WP-0005 tool buckets + WP-0006 error
mining); no new capture.
"""


@@ -0,0 +1,101 @@
"""Measure entrypoint (T03): fleet trend + per-pattern effectiveness.
python -m session_memory.measure [--config PATH] [--label L] [--since DATE]
[--no-save] [--json]
Computes current fleet metrics over the real (quality-filtered) sessions, appends
them to the baseline trend, and reports whether the fleet is getting cheaper /
more reliable over time (FR-M3). With ``--since DATE`` it also reports before/after
effectiveness around a change (FR-M1/FR-M2).
"""
from __future__ import annotations
import argparse
import json
import os
from ..core.store import Store
from ..detect.quality import filter_real, quality_config
from ..ingest import _expand, load_config
from .effect import effectiveness
from .metrics import load_baselines, save_baseline, snapshot
_TREND_KEYS = ("infra_overhead_share_median", "error_rate", "schema_thrash_sessions",
"tokens_p50", "success_rate")
def real_digests(config: dict) -> list[dict]:
    s = config.get("store", {})
    store = Store(_expand(s["db_path"]), _expand(s["blob_dir"]))
    try:
        # Close the store even if listing or filtering raises.
        return filter_real(store.list_digests(), quality_config(config))
    finally:
        store.close()
def _fmt_trend(baselines: list[dict]) -> str:
if not baselines:
return " (no prior snapshots)"
lines = []
recent = baselines[-5:]
for b in recent:
when = (b.get("captured_at") or "")[:10]
lbl = f" {b['label']}" if b.get("label") else ""
lines.append(f" {when}{lbl}: overhead_med={b.get('infra_overhead_share_median')} "
f"err_rate={b.get('error_rate')} schema_thrash={b.get('schema_thrash_sessions')} "
f"tok_p50={b.get('tokens_p50')} success={b.get('success_rate')} "
f"(n={b.get('n_sessions')})")
return "\n".join(lines)
def _report(current: dict, baselines: list[dict], eff: dict | None) -> str:
lines = [f"# Fleet metrics (n={current.get('n_sessions')} real sessions)"]
for k in _TREND_KEYS:
lines.append(f" {k} = {current.get(k)}")
lines.append("\n## Trend (recent snapshots)")
lines.append(_fmt_trend(baselines))
if eff is not None:
lines.append(f"\n## Effectiveness since {eff['applied_at']} "
f"(before={eff['n_before']}, after={eff['n_after']})")
if eff["insufficient_data"]:
lines.append(" insufficient data on one side of the date")
else:
for k in _TREND_KEYS:
d = eff["deltas"].get(k, {})
mark = {True: "improved", False: "worse", None: ""}[d.get("improved")]
lines.append(f" {k}: {d.get('before')} -> {d.get('after')} "
f"({d.get('change'):+}) {mark}")
return "\n".join(lines)
def main(argv=None) -> int:
here = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
ap = argparse.ArgumentParser(description="Measure fleet metrics + per-pattern effectiveness.")
ap.add_argument("--config", default=os.path.join(here, "config.toml"))
ap.add_argument("--label", default="")
ap.add_argument("--since", default=None, help="ISO date for before/after effectiveness")
ap.add_argument("--no-save", action="store_true", help="don't append to the baseline trend")
ap.add_argument("--json", action="store_true")
args = ap.parse_args(argv)
config = load_config(args.config)
digests = real_digests(config)
current = snapshot(digests, label=args.label)
path = _expand(config.get("measure", {}).get("baselines", "session_memory/measure/baselines.jsonl"))
prior = load_baselines(path)
if not args.no_save:
save_baseline(current, path)
eff = effectiveness(digests, args.since, label=args.label) if args.since else None
if args.json:
print(json.dumps({"current": current, "trend": prior + [current], "effectiveness": eff},
indent=2))
else:
print(_report(current, prior + [current], eff))
return 0
if __name__ == "__main__":
raise SystemExit(main())
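
The `--since` comparison rests on two small steps: split sessions around a date, then judge each metric's change by its direction. A self-contained sketch under the assumption that timestamps are ISO 8601 strings in one timezone convention. The metric sets are trimmed to three names, and the sample values are made up.

```python
LOWER_IS_BETTER = {"error_rate", "tokens_p50"}
HIGHER_IS_BETTER = {"success_rate"}


def split_by_date(digests: list[dict], applied_at: str) -> tuple[list[dict], list[dict]]:
    """Partition by string comparison; digests without a start time count as before."""
    before, after = [], []
    for d in digests:
        ts = d.get("started_at") or ""
        (after if ts and ts >= applied_at else before).append(d)
    return before, after


def delta(metric: str, before: float, after: float) -> dict:
    """Report the change and whether it moved in the good direction (None if unknown)."""
    change = round(after - before, 3)
    if metric in LOWER_IS_BETTER:
        improved = change < 0
    elif metric in HIGHER_IS_BETTER:
        improved = change > 0
    else:
        improved = None
    return {"before": before, "after": after, "change": change, "improved": improved}


sessions = [
    {"session_uid": "a", "started_at": "2026-06-01T10:00:00Z"},
    {"session_uid": "b", "started_at": "2026-06-07T00:00:00Z"},
    {"session_uid": "c", "started_at": "2026-06-08T09:00:00Z"},
    {"session_uid": "d"},  # no start time recorded
]
before, after = split_by_date(sessions, "2026-06-07")

err = delta("error_rate", 0.9, 0.6)
ok = delta("success_rate", 0.8, 0.7)
other = delta("n_sessions", 10, 12)
```

A session that starts exactly on the `--since` date falls on the after side, and sessions with no recorded start time are counted as before.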


@@ -0,0 +1 @@
{"captured_at": "2026-06-07T13:30:14Z", "error_rate": 0.963, "infra_overhead_share_median": 0.117, "infra_overhead_share_p90": 0.261, "label": "phase4-baseline (pre-fixes)", "n_sessions": 27, "recurring_error_occurrences": 505, "schema_thrash_sessions": 8, "success_rate": 1.0, "tokens_p50": 250725, "tokens_p90": 1423966}


@@ -0,0 +1,60 @@
"""Before/after per-pattern effectiveness (PRD §6.5 FR-M1/FR-M2; T02).

Given a change/pattern with an ``applied_at`` date, split sessions into *before*
and *after* by their start time, aggregate each side, and diff the headline
metrics — so we can say whether a distributed pattern (e.g. the Read-before-Edit
reflex, or the State Hub skill) actually moved the numbers, and retire it if not.
"""
from __future__ import annotations

from .metrics import aggregate

# Metrics where a *lower* value after the change means improvement.
_LOWER_IS_BETTER = {
    "infra_overhead_share_median", "infra_overhead_share_p90", "error_rate",
    "recurring_error_occurrences", "schema_thrash_sessions", "tokens_p50", "tokens_p90",
}
# Metrics where a *higher* value is improvement.
_HIGHER_IS_BETTER = {"success_rate"}


def split_by_date(digests: list[dict], applied_at: str) -> tuple[list[dict], list[dict]]:
    """Partition digests into (before, after) by ``started_at`` vs ``applied_at``."""
    before, after = [], []
    for d in digests:
        ts = d.get("started_at") or ""
        (after if ts and ts >= applied_at else before).append(d)
    return before, after


def _delta(metric: str, before: float, after: float) -> dict:
    change = round(after - before, 3)
    if metric in _LOWER_IS_BETTER:
        improved = change < 0
    elif metric in _HIGHER_IS_BETTER:
        improved = change > 0
    else:
        improved = None
    return {"before": before, "after": after, "change": change, "improved": improved}


def effectiveness(digests: list[dict], applied_at: str, *, label: str = "") -> dict:
    """Compare fleet metrics after ``applied_at`` against the prior period."""
    before, after = split_by_date(digests, applied_at)
    b_agg, a_agg = aggregate(before), aggregate(after)
    metrics = (_LOWER_IS_BETTER | _HIGHER_IS_BETTER)
    deltas = {}
    if before and after:
        for m in metrics:
            deltas[m] = _delta(m, b_agg.get(m, 0.0), a_agg.get(m, 0.0))
    return {
        "label": label,
        "applied_at": applied_at,
        "n_before": len(before),
        "n_after": len(after),
        "before": b_agg,
        "after": a_agg,
        "deltas": deltas,
        "insufficient_data": not (before and after),
    }
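
To make the before/after split concrete, here is a small self-contained sketch that mirrors the split and delta logic of this module on hypothetical digests. The session ids and the "after" metric values are made up for illustration; only the comparison rules follow the module.

```python
# Hypothetical digests: only ``started_at`` matters for the split.
digests = [
    {"id": "s1", "started_at": "2026-06-01T09:00:00Z"},
    {"id": "s2", "started_at": "2026-06-07T00:00:00Z"},
    {"id": "s3", "started_at": "2026-06-09T12:30:00Z"},
    {"id": "s4"},  # no timestamp: lands on the "before" side
]

LOWER_IS_BETTER = {"error_rate"}
HIGHER_IS_BETTER = {"success_rate"}


def split_by_date(digests, applied_at):
    """Lexicographic comparison works because the timestamps are ISO 8601 UTC strings."""
    before, after = [], []
    for d in digests:
        ts = d.get("started_at") or ""
        (after if ts and ts >= applied_at else before).append(d)
    return before, after


def delta(metric, before, after):
    """Signed change plus a direction-aware verdict (None for untracked metrics)."""
    change = round(after - before, 3)
    if metric in LOWER_IS_BETTER:
        improved = change < 0
    elif metric in HIGHER_IS_BETTER:
        improved = change > 0
    else:
        improved = None
    return {"before": before, "after": after, "change": change, "improved": improved}


before, after = split_by_date(digests, "2026-06-07")
before_ids = [d["id"] for d in before]
after_ids = [d["id"] for d in after]

error_delta = delta("error_rate", 0.963, 0.5)    # lower is better
success_delta = delta("success_rate", 1.0, 1.0)  # unchanged, so not "improved"
print(before_ids, after_ids, error_delta, success_delta)
```

Two behaviours are worth noticing in this sketch: a session without `started_at` counts as "before", and an unchanged metric is reported as not improved rather than as neutral.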


@@ -0,0 +1,102 @@
"""Fleet metrics + persisted baselines (PRD §6.5 FR-M3; T01).

Computes the headline health metrics of the captured corpus — the same quantities
the friction assessment reported — so they can be tracked over time and compared
before/after a change. Reuses :func:`detect.signals.tool_bucket` (WP-0005) and the
digest ``error_snippets`` (WP-0006); no new capture.

A **baseline** is a timestamped metrics snapshot appended to a JSONL file, so
successive runs build a trend the entrypoint (T03) can chart.
"""
from __future__ import annotations

import collections
import json
import os
from datetime import datetime, timezone

from ..detect.signals import tool_bucket


def _now() -> str:
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def _pct(values: list[float], q: float) -> float:
    if not values:
        return 0.0
    s = sorted(values)
    return round(s[int(q * (len(s) - 1))], 3)


def _median(values: list[float]) -> float:
    return _pct(values, 0.5)


def _buckets(digest: dict) -> collections.Counter:
    b: collections.Counter = collections.Counter()
    for tool, n in (digest.get("tool_histogram") or {}).items():
        b[tool_bucket(tool)] += n
    return b


def session_metrics(digest: dict) -> dict:
    """Per-session metrics used to build fleet aggregates."""
    b = _buckets(digest)
    total = sum(b.values()) or 1
    overhead = b["statehub_mcp"] + b["task_mgmt"] + b["schema_load"]
    cost = digest.get("cost", {})
    tokens = cost.get("input_tokens", 0) + cost.get("output_tokens", 0)
    return {
        "infra_overhead_share": overhead / total,
        "tool_calls": total,
        "schema_load": b["schema_load"],
        "error_occurrences": sum(s.get("count", 1) for s in (digest.get("error_snippets") or [])),
        "has_error": bool(digest.get("error_snippets")),
        "tokens": tokens,
        "success": digest.get("outcome") == "success",
    }


def aggregate(digests: list[dict], *, schema_thrash_threshold: int = 5) -> dict:
    """Fleet-level metrics over a set of (already quality-filtered) digests."""
    per = [session_metrics(d) for d in digests]
    n = len(per)
    if n == 0:
        return {"n_sessions": 0}
    shares = [m["infra_overhead_share"] for m in per]
    tokens = [m["tokens"] for m in per]
    return {
        "n_sessions": n,
        "infra_overhead_share_median": _median(shares),
        "infra_overhead_share_p90": _pct(shares, 0.9),
        "error_rate": round(sum(m["has_error"] for m in per) / n, 3),
        "recurring_error_occurrences": sum(m["error_occurrences"] for m in per),
        "schema_thrash_sessions": sum(1 for m in per if m["schema_load"] >= schema_thrash_threshold),
        "tokens_p50": _pct(tokens, 0.5),
        "tokens_p90": _pct(tokens, 0.9),
        "success_rate": round(sum(m["success"] for m in per) / n, 3),
    }


def snapshot(digests: list[dict], *, label: str = "") -> dict:
    m = aggregate(digests)
    m["captured_at"] = _now()
    m["label"] = label
    return m


def save_baseline(metrics: dict, path: str) -> None:
    """Append a metrics snapshot to the baseline JSONL trend file."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(metrics, sort_keys=True))
        fh.write("\n")


def load_baselines(path: str) -> list[dict]:
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
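
The percentile helper in this module picks an existing element by truncated index rather than interpolating between neighbours. A short sketch of that rule on hypothetical token counts (the function name `pct` and the sample values are illustrative):

```python
def pct(values, q):
    """Percentile by truncated index: no interpolation between neighbours."""
    if not values:
        return 0.0
    s = sorted(values)
    return round(s[int(q * (len(s) - 1))], 3)


tokens = [120, 900, 300, 50, 1500]  # sorted: [50, 120, 300, 900, 1500]

p50 = pct(tokens, 0.5)  # index int(0.5 * 4) = 2
p90 = pct(tokens, 0.9)  # index int(0.9 * 4) = int(3.6) = 3

# With an even number of values the "median" is the lower middle element,
# not the average of the two middle elements.
even_median = pct([1, 2, 3, 4], 0.5)  # index int(1.5) = 1

print(p50, p90, even_median)
```

On small samples this rounds percentiles downward, so a p90 over a handful of sessions can sit well below the maximum; that is a property of the truncation, not of the data.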


@@ -0,0 +1,9 @@
"""Weekly retro (AGENTIC-WP-0010) — the analysis half of the coding retrospection.

    build.py     windowed detect + measure -> ranked top-3 suggestions per repo (T01)
    publish.py   publish the retro to the hub read model + local report (T02)
    __main__.py  python -m session_memory.retro (T03)

Consumed by activity-core's weekly-coding-retro schedule (ACTIVITY-WP-0008) via
the ``event_type=coding_retro`` read model.
"""


@@ -0,0 +1,68 @@
"""Weekly retro entrypoint (AGENTIC-WP-0010 T03).

    python -m session_memory.retro [--window-days 7] [--since D] [--until D]
                                   [--publish] [--json]

Builds the windowed top-3-per-repo retro over the captured sessions, writes a local
JSON + markdown report, and (with ``--publish``) posts it to the hub as the
``coding_retro`` read model that activity-core's weekly schedule consumes.
"""
from __future__ import annotations

import argparse
import json
import os

from ..core.store import Store
from ..curate.catalog import Catalog
from ..ingest import _expand, load_config
from .build import weekly_retro
from .publish import publish_to_hub, render_markdown, write_local


def run_retro(config: dict, *, window_days=None, since=None, until=None):
    s = config.get("store", {})
    store = Store(_expand(s["db_path"]), _expand(s["blob_dir"]))
    digests = store.list_digests()
    store.close()
    cur = config.get("curate", {})
    catalog = Catalog(_expand(cur.get("catalog_dir", "session_memory/catalog")))
    rcfg = config.get("retro", {})
    return weekly_retro(digests, catalog, since=since, until=until,
                        window_days=window_days or rcfg.get("window_days", 7))


def main(argv=None) -> int:
    here = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
    ap = argparse.ArgumentParser(description="Build (and optionally publish) the weekly coding retro.")
    ap.add_argument("--config", default=os.path.join(here, "config.toml"))
    ap.add_argument("--window-days", type=int, default=None)
    ap.add_argument("--since", default=None)
    ap.add_argument("--until", default=None)
    ap.add_argument("--publish", action="store_true", help="post to the hub coding_retro read model")
    ap.add_argument("--json", action="store_true")
    args = ap.parse_args(argv)

    config = load_config(args.config)
    report = run_retro(config, window_days=args.window_days, since=args.since, until=args.until)
    rcfg = config.get("retro", {})
    write_local(report, _expand(rcfg.get("report_json", "session_memory/retro/last_retro.json")),
                _expand(rcfg.get("report_md", "session_memory/retro/last_retro.md")))
    published = None
    if args.publish:
        published = publish_to_hub(report, base_url=rcfg.get("hub_url", "http://127.0.0.1:8000"))

    if args.json:
        print(json.dumps({"report": report, "published": published}, indent=2))
    else:
        print(render_markdown(report))
        if args.publish:
            print(f"\npublished to hub: {published}")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())


@@ -0,0 +1,99 @@
"""Windowed weekly retro report (AGENTIC-WP-0010 T01).

Runs the existing detect pipeline over a date window, ranks the recurring problem
patterns into **per-repo improvement suggestions** (top 3, cross-flavor first),
attaches a recommendation from the Pattern Catalog where one exists, and bundles a
fleet measure snapshot for context. Pure function over digests — the entrypoint
(T03) handles store/publish.
"""
from __future__ import annotations

import collections
from dataclasses import asdict, dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

from ..detect.cluster import cluster
from ..detect.quality import QualityConfig, filter_real
from ..detect.signals import extract_signals
from ..measure.metrics import aggregate

# score at/above which a suggestion is "high" priority even when single-flavor
_HIGH_SCORE = 100.0


def _parse(ts: str) -> datetime:
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def _iso(dt: datetime) -> str:
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class Suggestion:
    repo: str
    title: str
    recommendation: str
    priority: str  # high | medium
    score: float
    signal_type: str
    cross_flavor: bool
    pattern_key: str


def _recommendation(pattern_key: str, locus: str, catalog) -> Optional[str]:
    if catalog is None:
        return None
    sp = catalog.find_for(pattern_key, locus)
    if sp and sp.resolutions:
        return sp.resolutions[0].summary
    return None


def weekly_retro(digests: list[dict], catalog=None, *, since: Optional[str] = None,
                 until: Optional[str] = None, window_days: int = 7,
                 max_per_repo: int = 3, min_frequency: int = 2,
                 quality: Optional[QualityConfig] = None) -> dict:
    """Build the ranked weekly retro report over a date window."""
    until_dt = _parse(until) if until else _now()
    since_dt = _parse(since) if since else until_dt - timedelta(days=window_days)

    windowed = [d for d in digests
                if d.get("started_at") and since_dt <= _parse(d["started_at"]) < until_dt]
    real = filter_real(windowed, quality or QualityConfig())
    patterns = cluster(extract_signals(real), min_frequency=min_frequency)

    by_repo: dict[str, list[Suggestion]] = collections.defaultdict(list)
    for p in patterns:
        if p.polarity != "problem":
            continue  # improvements come from problems
        rec = (_recommendation(p.key, p.locus, catalog)
               or f"Investigate {p.signal_type.replace('_', ' ')} on {p.locus}")
        priority = "high" if (p.cross_flavor or p.score >= _HIGH_SCORE) else "medium"
        for repo in (p.repos or ["(unknown)"]):
            by_repo[repo].append(Suggestion(
                repo=repo, title=p.title, recommendation=rec, priority=priority,
                score=p.score, signal_type=p.signal_type, cross_flavor=p.cross_flavor,
                pattern_key=p.key))

    suggestions: list[Suggestion] = []
    for repo in sorted(by_repo):
        items = sorted(by_repo[repo], key=lambda s: -s.score)
        suggestions.extend(items[:max_per_repo])
    # cross-flavor first, then by score (global ordering for the report)
    suggestions.sort(key=lambda s: (not s.cross_flavor, -s.score))

    return {
        "window": {"since": _iso(since_dt), "until": _iso(until_dt), "days": window_days},
        "generated_at": _iso(_now()),
        "n_sessions": len(real),
        "suggestions": [asdict(s) for s in suggestions],
        "measure": aggregate(real),
    }
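
The two-stage ordering used here (cap each repo by score, then sort globally with cross-flavor first) can be sketched in isolation. The `Item` class, repo names, and scores below are hypothetical; only the sort keys follow the module.

```python
import collections
from dataclasses import dataclass


@dataclass
class Item:  # hypothetical stand-in for a suggestion
    repo: str
    title: str
    score: float
    cross_flavor: bool


items = [
    Item("repo-a", "tool thrash", 500.0, False),
    Item("repo-a", "schema thrash", 40.0, False),
    Item("repo-a", "recurring error", 30.0, True),
    Item("repo-a", "budget overrun", 5.0, False),
    Item("repo-b", "recurring error", 30.0, True),
]


def rank(items, max_per_repo=3):
    by_repo = collections.defaultdict(list)
    for it in items:
        by_repo[it.repo].append(it)
    kept = []
    for repo in sorted(by_repo):
        # Stage 1: per-repo cap, by score only.
        kept.extend(sorted(by_repo[repo], key=lambda s: -s.score)[:max_per_repo])
    # Stage 2: global order. False sorts before True, so cross-flavor items lead.
    kept.sort(key=lambda s: (not s.cross_flavor, -s.score))
    return kept


order = [(i.repo, i.title) for i in rank(items)]
capped = [(i.repo, i.title) for i in rank(items, max_per_repo=2)]
print(order)
print(capped)
```

This ordering explains why, in a generated report, a cross-flavor suggestion with score 54.0 can be listed ahead of single-flavor suggestions scoring 13128.0. The sketch also shows a consequence of applying the cap before the cross-flavor sort: with `max_per_repo=2`, the cross-flavor item for `repo-a` is dropped because two single-flavor items outscore it.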


@@ -0,0 +1,322 @@
{
"generated_at": "2026-06-07T19:30:56Z",
"measure": {
"error_rate": 0.957,
"infra_overhead_share_median": 0.167,
"infra_overhead_share_p90": 0.23,
"n_sessions": 23,
"recurring_error_occurrences": 463,
"schema_thrash_sessions": 7,
"success_rate": 1.0,
"tokens_p50": 250725,
"tokens_p90": 901422
},
"n_sessions": 23,
"suggestions": [
{
"cross_flavor": true,
"pattern_key": "problem:recurring_error:make: *** [makefile:<n>: fix-consistency] error <n>",
"priority": "high",
"recommendation": "Investigate recurring error on make: *** [makefile:<n>: fix-consistency] error <n>",
"repo": "net-kingdom",
"score": 54.0,
"signal_type": "recurring_error",
"title": "cross-flavor problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:tool_thrash:tool:Bash",
"priority": "high",
"recommendation": "Batch related shell work into one script, not many small Bash calls",
"repo": "activity-core",
"score": 13128.0,
"signal_type": "tool_thrash",
"title": "problem: tool thrash"
},
{
"cross_flavor": false,
"pattern_key": "problem:tool_thrash:tool:Bash",
"priority": "high",
"recommendation": "Batch related shell work into one script, not many small Bash calls",
"repo": "artifact-store",
"score": 13128.0,
"signal_type": "tool_thrash",
"title": "problem: tool thrash"
},
{
"cross_flavor": false,
"pattern_key": "problem:tool_thrash:tool:Bash",
"priority": "high",
"recommendation": "Batch related shell work into one script, not many small Bash calls",
"repo": "citation-evidence",
"score": 13128.0,
"signal_type": "tool_thrash",
"title": "problem: tool thrash"
},
{
"cross_flavor": false,
"pattern_key": "problem:tool_thrash:tool:Bash",
"priority": "high",
"recommendation": "Batch related shell work into one script, not many small Bash calls",
"repo": "infospace-bench",
"score": 13128.0,
"signal_type": "tool_thrash",
"title": "problem: tool thrash"
},
{
"cross_flavor": false,
"pattern_key": "problem:tool_thrash:tool:Bash",
"priority": "high",
"recommendation": "Batch related shell work into one script, not many small Bash calls",
"repo": "railiance-apps",
"score": 13128.0,
"signal_type": "tool_thrash",
"title": "problem: tool thrash"
},
{
"cross_flavor": false,
"pattern_key": "problem:tool_thrash:tool:Bash",
"priority": "high",
"recommendation": "Batch related shell work into one script, not many small Bash calls",
"repo": "state-hub",
"score": 13128.0,
"signal_type": "tool_thrash",
"title": "problem: tool thrash"
},
{
"cross_flavor": false,
"pattern_key": "problem:schema_thrash:schema_load",
"priority": "high",
"recommendation": "Load the tool schemas you'll need once, up front",
"repo": "activity-core",
"score": 441.0,
"signal_type": "schema_thrash",
"title": "problem: schema thrash"
},
{
"cross_flavor": false,
"pattern_key": "problem:schema_thrash:schema_load",
"priority": "high",
"recommendation": "Load the tool schemas you'll need once, up front",
"repo": "citation-evidence",
"score": 441.0,
"signal_type": "schema_thrash",
"title": "problem: schema thrash"
},
{
"cross_flavor": false,
"pattern_key": "problem:schema_thrash:schema_load",
"priority": "high",
"recommendation": "Load the tool schemas you'll need once, up front",
"repo": "flex-auth",
"score": 441.0,
"signal_type": "schema_thrash",
"title": "problem: schema thrash"
},
{
"cross_flavor": false,
"pattern_key": "problem:schema_thrash:schema_load",
"priority": "high",
"recommendation": "Load the tool schemas you'll need once, up front",
"repo": "infospace-bench",
"score": 441.0,
"signal_type": "schema_thrash",
"title": "problem: schema thrash"
},
{
"cross_flavor": false,
"pattern_key": "problem:schema_thrash:schema_load",
"priority": "high",
"recommendation": "Load the tool schemas you'll need once, up front",
"repo": "ops-bridge",
"score": 441.0,
"signal_type": "schema_thrash",
"title": "problem: schema thrash"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:<tool_use_error>file has not been read yet. read it first before writing to it.<<path>>",
"priority": "high",
"recommendation": "Read the file (or the region you'll touch) before Edit/Write",
"repo": "activity-core",
"score": 290.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:<tool_use_error>file has not been read yet. read it first before writing to it.<<path>>",
"priority": "high",
"recommendation": "Read the file (or the region you'll touch) before Edit/Write",
"repo": "citation-evidence",
"score": 290.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:<tool_use_error>file has not been read yet. read it first before writing to it.<<path>>",
"priority": "high",
"recommendation": "Read the file (or the region you'll touch) before Edit/Write",
"repo": "infospace-bench",
"score": 290.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:<tool_use_error>file has not been read yet. read it first before writing to it.<<path>>",
"priority": "high",
"recommendation": "Read the file (or the region you'll touch) before Edit/Write",
"repo": "issue-facade",
"score": 290.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:<tool_use_error>file has not been read yet. read it first before writing to it.<<path>>",
"priority": "high",
"recommendation": "Read the file (or the region you'll touch) before Edit/Write",
"repo": "railiance-apps",
"score": 290.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:<tool_use_error>file has not been read yet. read it first before writing to it.<<path>>",
"priority": "high",
"recommendation": "Read the file (or the region you'll touch) before Edit/Write",
"repo": "state-hub",
"score": 290.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:<tool_use_error>file has not been read yet. read it first before writing to it.<<path>>",
"priority": "high",
"recommendation": "Read the file (or the region you'll touch) before Edit/Write",
"repo": "the-custodian",
"score": 290.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:<tool_use_error>file has not been read yet. read it first before writing to it.<<path>>",
"priority": "high",
"recommendation": "Read the file (or the region you'll touch) before Edit/Write",
"repo": "vergabe-teilnahme",
"score": 290.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:<tool_use_error>file has been modified since read, either by the user or by a linter. read it again before attempting to write it.<<path>>",
"priority": "medium",
"recommendation": "Read the file (or the region you'll touch) before Edit/Write",
"repo": "artifact-store",
"score": 78.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:<tool_use_error>file has been modified since read, either by the user or by a linter. read it again before attempting to write it.<<path>>",
"priority": "medium",
"recommendation": "Read the file (or the region you'll touch) before Edit/Write",
"repo": "issue-facade",
"score": 78.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:<tool_use_error>file has been modified since read, either by the user or by a linter. read it again before attempting to write it.<<path>>",
"priority": "medium",
"recommendation": "Read the file (or the region you'll touch) before Edit/Write",
"repo": "railiance-apps",
"score": 78.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:<tool_use_error>file has been modified since read, either by the user or by a linter. read it again before attempting to write it.<<path>>",
"priority": "medium",
"recommendation": "Read the file (or the region you'll touch) before Edit/Write",
"repo": "state-hub",
"score": 78.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:budget_overrun:tokens",
"priority": "medium",
"recommendation": "Read narrowly \u2014 target the region you need, not whole large files",
"repo": "artifact-store",
"score": 50.55,
"signal_type": "budget_overrun",
"title": "problem: budget overrun"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:{",
"priority": "medium",
"recommendation": "Investigate recurring error on {",
"repo": "vergabe-teilnahme",
"score": 12.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:found <n> errors (<n> fixed, <n> remaining).",
"priority": "medium",
"recommendation": "Investigate recurring error on found <n> errors (<n> fixed, <n> remaining).",
"repo": "ops-bridge",
"score": 10.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:(note: edit also tried swapping \\uxxxx escapes and their characters; neither form matched, so the mismatch is likely elsewhere in old_string. re-read the file a",
"priority": "medium",
"recommendation": "Investigate recurring error on (note: edit also tried swapping \\uxxxx escapes and their characters; neither form matched, so the mismatch is likely elsewhere in old_string. re-read the file a",
"repo": "net-kingdom",
"score": 6.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:found <n> error (<n> fixed, <n> remaining).",
"priority": "medium",
"recommendation": "Investigate recurring error on found <n> error (<n> fixed, <n> remaining).",
"repo": "ops-bridge",
"score": 6.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
},
{
"cross_flavor": false,
"pattern_key": "problem:recurring_error:<n> failed, <n> passed in <n>.00s",
"priority": "medium",
"recommendation": "Investigate recurring error on <n> failed, <n> passed in <n>.00s",
"repo": "agentic-resources",
"score": 4.0,
"signal_type": "recurring_error",
"title": "problem: recurring error"
}
],
"window": {
"days": 30,
"since": "2026-05-08T19:30:56Z",
"until": "2026-06-07T19:30:56Z"
}
}


@@ -0,0 +1,39 @@
# Weekly Coding Retro (2026-05-08 → 2026-06-07)
_23 real sessions · generated 2026-06-07T19:30:56Z_
## Top improvement suggestions (cross-flavor first, ≤3 per repo)
- **net-kingdom** (high, score=54.0) [CROSS-FLAVOR]: cross-flavor problem: recurring error — Investigate recurring error on make: *** [makefile:<n>: fix-consistency] error <n>
- **activity-core** (high, score=13128.0): problem: tool thrash — Batch related shell work into one script, not many small Bash calls
- **artifact-store** (high, score=13128.0): problem: tool thrash — Batch related shell work into one script, not many small Bash calls
- **citation-evidence** (high, score=13128.0): problem: tool thrash — Batch related shell work into one script, not many small Bash calls
- **infospace-bench** (high, score=13128.0): problem: tool thrash — Batch related shell work into one script, not many small Bash calls
- **railiance-apps** (high, score=13128.0): problem: tool thrash — Batch related shell work into one script, not many small Bash calls
- **state-hub** (high, score=13128.0): problem: tool thrash — Batch related shell work into one script, not many small Bash calls
- **activity-core** (high, score=441.0): problem: schema thrash — Load the tool schemas you'll need once, up front
- **citation-evidence** (high, score=441.0): problem: schema thrash — Load the tool schemas you'll need once, up front
- **flex-auth** (high, score=441.0): problem: schema thrash — Load the tool schemas you'll need once, up front
- **infospace-bench** (high, score=441.0): problem: schema thrash — Load the tool schemas you'll need once, up front
- **ops-bridge** (high, score=441.0): problem: schema thrash — Load the tool schemas you'll need once, up front
- **activity-core** (high, score=290.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **citation-evidence** (high, score=290.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **infospace-bench** (high, score=290.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **issue-facade** (high, score=290.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **railiance-apps** (high, score=290.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **state-hub** (high, score=290.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **the-custodian** (high, score=290.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **vergabe-teilnahme** (high, score=290.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **artifact-store** (medium, score=78.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **issue-facade** (medium, score=78.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **railiance-apps** (medium, score=78.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **state-hub** (medium, score=78.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **artifact-store** (medium, score=50.55): problem: budget overrun — Read narrowly — target the region you need, not whole large files
- **vergabe-teilnahme** (medium, score=12.0): problem: recurring error — Investigate recurring error on {
- **ops-bridge** (medium, score=10.0): problem: recurring error — Investigate recurring error on found <n> errors (<n> fixed, <n> remaining).
- **net-kingdom** (medium, score=6.0): problem: recurring error — Investigate recurring error on (note: edit also tried swapping \uxxxx escapes and their characters; neither form matched, so the mismatch is likely elsewhere in old_string. re-read the file a
- **ops-bridge** (medium, score=6.0): problem: recurring error — Investigate recurring error on found <n> error (<n> fixed, <n> remaining).
- **agentic-resources** (medium, score=4.0): problem: recurring error — Investigate recurring error on <n> failed, <n> passed in <n>.00s
## Fleet snapshot
- infra-overhead median: 0.167
- error rate: 0.957 · schema-thrash: 7
- success rate: 1.0 · tokens p50: 250725


@@ -0,0 +1,78 @@
"""Publish the weekly retro (AGENTIC-WP-0010 T02).

The retro is published to the State Hub as a **read model** — a progress event of
``event_type=coding_retro`` whose ``detail`` carries the structured report. This is
exactly how ``daily-triage-report`` surfaces, and it is what activity-core's
``coding_retro`` resolver (ACTIVITY-WP-0008) reads. A local JSON + markdown report
is always written; the hub publish is best-effort and **degrades gracefully** when
the hub is unreachable.
"""
from __future__ import annotations

import json
import os
import urllib.request
from typing import Callable, Optional

DEFAULT_HUB = "http://127.0.0.1:8000"


def render_markdown(report: dict) -> str:
    w = report.get("window", {})
    lines = [
        f"# Weekly Coding Retro ({w.get('since', '')[:10]} → {w.get('until', '')[:10]})",
        f"_{report.get('n_sessions', 0)} real sessions · generated {report.get('generated_at', '')}_",
        "",
        "## Top improvement suggestions (cross-flavor first, ≤3 per repo)",
    ]
    if not report.get("suggestions"):
        lines.append("- (no recurring problems above threshold this week)")
    for s in report.get("suggestions", []):
        flag = " [CROSS-FLAVOR]" if s.get("cross_flavor") else ""
        lines.append(f"- **{s['repo']}** ({s['priority']}, score={s['score']}){flag}: "
                     f"{s['title']} \u2014 {s['recommendation']}")
    m = report.get("measure", {})
    lines += ["", "## Fleet snapshot",
              f"- infra-overhead median: {m.get('infra_overhead_share_median')}",
              f"- error rate: {m.get('error_rate')} · schema-thrash: {m.get('schema_thrash_sessions')}",
              f"- success rate: {m.get('success_rate')} · tokens p50: {m.get('tokens_p50')}"]
    return "\n".join(lines)


def write_local(report: dict, json_path: str, md_path: Optional[str] = None) -> None:
    os.makedirs(os.path.dirname(json_path) or ".", exist_ok=True)
    with open(json_path, "w", encoding="utf-8") as fh:
        json.dump(report, fh, indent=2, sort_keys=True)
        fh.write("\n")
    if md_path:
        with open(md_path, "w", encoding="utf-8") as fh:
            fh.write(render_markdown(report))
            fh.write("\n")


def _http_post(url: str, payload: dict) -> None:
    req = urllib.request.Request(url, data=json.dumps(payload).encode(),
                                 headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(req, timeout=10) as r:
        r.read()


def publish_to_hub(report: dict, *, base_url: str = DEFAULT_HUB,
                   poster: Optional[Callable[[str, dict], None]] = None) -> bool:
    """POST the retro as an event_type=coding_retro progress event. Best-effort."""
    poster = poster or _http_post
    n = report.get("n_sessions", 0)
    k = len(report.get("suggestions", []))
    payload = {
        "event_type": "coding_retro",
        "author": "helix-forge",
        "summary": f"Weekly coding retro: {k} ranked suggestions across "
                   f"{report.get('window', {}).get('days', 7)} days ({n} sessions).",
        "detail": report,
    }
    try:
        poster(f"{base_url.rstrip('/')}/progress/", payload)
        return True
    except Exception:
        return False
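
Because the transport is an injectable callable, the best-effort behaviour can be exercised without a running hub. The sketch below is a simplified stand-in for the publish function, not the module itself: `publish`, `capture`, `unreachable`, and the `hub.example` URL are hypothetical names used only for illustration.

```python
from typing import Callable


def publish(report: dict, poster: Callable[[str, dict], None],
            base_url: str = "http://127.0.0.1:8000") -> bool:
    """Simplified stand-in: build the event, hand it to ``poster``, never raise."""
    payload = {
        "event_type": "coding_retro",
        "author": "helix-forge",
        "summary": f"Weekly coding retro: {len(report.get('suggestions', []))} ranked suggestions.",
        "detail": report,
    }
    try:
        poster(f"{base_url.rstrip('/')}/progress/", payload)
        return True
    except Exception:
        return False


sent = []


def capture(url: str, payload: dict) -> None:
    """Fake transport that records what would have been posted."""
    sent.append((url, payload))


def unreachable(url: str, payload: dict) -> None:
    """Fake transport that simulates a hub that is down."""
    raise ConnectionError("hub down")


report = {"n_sessions": 23, "suggestions": [], "window": {"days": 30}}
ok = publish(report, capture, base_url="http://hub.example/")
failed = publish(report, unreachable)
print(ok, failed, sent[0][0])
```

The trailing slash on the base URL is stripped before the path is appended, so both `http://hub.example` and `http://hub.example/` resolve to the same endpoint. The caller sees only a boolean, which is what lets the local report be written regardless of hub availability.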


@@ -0,0 +1,62 @@
"""find_for / covers tests (AGENTIC-WP-0010 follow-up)."""
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.curate.catalog import Catalog  # noqa: E402
from session_memory.curate.schema import (  # noqa: E402
    Provenance,
    Resolution,
    SolutionPattern,
)


def _pattern(pid, src, covers=None, name="P"):
    return SolutionPattern(
        id=pid, name=name, version="1.0.0", polarity="problem", problem="p",
        resolutions=[Resolution(summary="do x")],
        provenance=Provenance(source_key=src), covers=covers or [])


def test_covers_round_trips(tmp_path):
    cat = Catalog(str(tmp_path))
    cat.upsert(_pattern("sp-a", "problem:file_not_read:edit",
                        covers=["file has not been read"]))
    assert cat.load("sp-a").covers == ["file has not been read"]


def test_find_for_exact_key(tmp_path):
    cat = Catalog(str(tmp_path))
    cat.upsert(_pattern(SolutionPattern.make_id("problem:retry_storm:retries"),
                        "problem:retry_storm:retries"))
    got = cat.find_for("problem:retry_storm:retries")
    assert got is not None and got.id == "sp-problem-retry_storm-retries"


def test_find_for_covers_match(tmp_path):
    cat = Catalog(str(tmp_path))
    cat.upsert(_pattern("sp-rbe", "problem:file_not_read:edit",
                        covers=["file has not been read", "modified since read"]))
    # a recurring_error signal with a different key but matching fingerprint locus
    got = cat.find_for(
        "problem:recurring_error:<tool_use_error>file has not been read yet...",
        locus="<tool_use_error>file has not been read yet. read it first...")
    assert got is not None and got.id == "sp-rbe"


def test_find_for_no_match_returns_none(tmp_path):
    cat = Catalog(str(tmp_path))
    cat.upsert(_pattern("sp-rbe", "problem:file_not_read:edit",
                        covers=["file has not been read"]))
    assert cat.find_for("problem:recurring_error:some unrelated error") is None


def test_covers_change_versions(tmp_path):
    cat = Catalog(str(tmp_path))
    cat.upsert(_pattern("sp-a", "problem:x:y"))
    p = cat.load("sp-a")
    p.covers = ["new coverage"]
    assert cat.upsert(p) == "versioned"  # covers is substantive content
    assert cat.load("sp-a").version == "1.0.1"
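
The catalog's `find_for` implementation is not part of this diff, so the following is only a guess at one lookup rule that would satisfy the tests above: try an exact source-key match first, then fall back to a case-insensitive substring match of each `covers` entry against the key and locus. The `Pattern` class and the standalone `find_for` function are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Pattern:  # hypothetical, much smaller than the real SolutionPattern
    id: str
    source_key: str
    covers: list[str] = field(default_factory=list)


def find_for(patterns: list[Pattern], pattern_key: str, locus: str = "") -> Optional[Pattern]:
    """Assumed rule: exact key match wins, then the first ``covers`` substring hit."""
    for p in patterns:
        if p.source_key == pattern_key:
            return p
    haystack = f"{pattern_key} {locus}".lower()
    for p in patterns:
        if any(c.lower() in haystack for c in p.covers):
            return p
    return None


catalog = [
    Pattern("sp-retry", "problem:retry_storm:retries"),
    Pattern("sp-rbe", "problem:file_not_read:edit",
            covers=["file has not been read", "modified since read"]),
]

exact = find_for(catalog, "problem:retry_storm:retries")
covered = find_for(
    catalog,
    "problem:recurring_error:<tool_use_error>file has not been read yet...",
    locus="<tool_use_error>file has not been read yet. read it first...",
)
missing = find_for(catalog, "problem:recurring_error:some unrelated error")
print(exact, covered, missing)
```

Under this assumed rule, several differently keyed error fingerprints can share one curated pattern, which matches the generated report above, where both the "not been read yet" and the "modified since read" errors carry the same recommendation.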


@@ -0,0 +1,99 @@
"""Claude adapter tests (T02): synthetic fixture + a real on-disk session."""
import glob
import json
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.adapters.claude import parse_session  # noqa: E402

REPO_MAP = {"agentic-resources": "helix_forge"}


def _write_jsonl(path, records):
    with open(path, "w", encoding="utf-8") as f:
        for r in records:
            f.write(json.dumps(r) + "\n")


def test_synthetic_session(tmp_path):
    p = tmp_path / "11111111-2222-3333-4444-555555555555.jsonl"
    _write_jsonl(p, [
        {"type": "user", "uuid": "u1", "parentUuid": None,
         "timestamp": "2026-06-06T10:00:00Z", "sessionId": "sess-1",
         "cwd": "/home/worsch/agentic-resources", "gitBranch": "main",
         "version": "1.0", "message": {"role": "user", "content": "fix the bug"}},
        {"type": "assistant", "uuid": "a1", "parentUuid": "u1",
         "timestamp": "2026-06-06T10:00:05Z", "sessionId": "sess-1",
         "message": {"role": "assistant", "model": "claude-opus-4-8",
                     "usage": {"input_tokens": 100, "output_tokens": 20,
                               "cache_read_input_tokens": 10},
                     "content": [
                         {"type": "thinking", "thinking": "let me look"},
                         {"type": "text", "text": "I'll edit the file."},
                         {"type": "tool_use", "name": "Edit",
                          "input": {"file_path": "x.py", "old_string": "a", "new_string": "b"}},
                         {"type": "tool_use", "name": "Bash",
                          "input": {"command": "pytest -q"}},
                     ]}},
        {"type": "user", "uuid": "u2", "parentUuid": "a1",
         "timestamp": "2026-06-06T10:00:10Z", "sessionId": "sess-1",
         "message": {"role": "user",
                     "content": [{"type": "tool_result", "content": "6 passed"}]}},
    ])
    norm = parse_session(str(p), REPO_MAP)
    assert norm is not None
    s = norm.session
    assert s.session_uid == "claude:sess-1"
    assert s.repo == "agentic-resources" and s.domain == "helix_forge"
    assert s.model == "claude-opus-4-8"
    assert s.cost.input_tokens == 100 and s.cost.output_tokens == 20
    assert s.cost.cache_tokens == 10
    assert s.cost.turns == 1
    assert s.cost.wall_clock_s == 10.0
    kinds = [e.kind for e in norm.events]
    assert kinds == ["user_msg", "thinking", "assistant_msg", "edit", "test_run", "tool_result"]
    # turn DAG: assistant events link back to the first user msg (seq 0)
    edit_ev = next(e for e in norm.events if e.kind == "edit")
    assert edit_ev.parent_seq == 0
    assert edit_ev.tool == "Edit"
    # bodies captured as blobs, referenced by payload_ref
    assert edit_ev.payload_ref in norm.blobs
    assert "x.py" in norm.blobs[edit_ev.payload_ref]


def test_sidechain_filename_marks_events(tmp_path):
    p = tmp_path / "agent-deadbeef.jsonl"
    _write_jsonl(p, [
        {"type": "assistant", "uuid": "a1", "sessionId": "side-1",
         "timestamp": "2026-06-06T10:00:00Z",
         "message": {"role": "assistant", "content": [{"type": "text", "text": "hi"}]}},
    ])
    norm = parse_session(str(p), REPO_MAP)
    assert norm.events[0].is_sidechain is True


def test_real_local_session_if_available():
    """Smoke-parse a real Claude transcript on this workstation, if present."""
    base = os.path.expanduser("~/.claude/projects/-home-worsch-agentic-resources")
    files = sorted(glob.glob(os.path.join(base, "*.jsonl")))
    if not files:
        return  # environment without local sessions; synthetic tests cover logic
    parsed = 0
    for fp in files:
        norm = parse_session(fp, REPO_MAP)
        if norm is None:
            continue
        parsed += 1
        assert norm.session.session_uid.startswith("claude:")
        # seq is monotonic and unique
        seqs = [e.seq for e in norm.events]
        assert seqs == sorted(seqs)
        assert len(seqs) == len(set(seqs))
    assert parsed >= 1

tests/test_cluster.py Normal file

@@ -0,0 +1,54 @@
"""Clusterer + evidence + cross-flavor tests (T05/T06)."""
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.detect.cluster import cluster  # noqa: E402
from session_memory.detect.signals import PROBLEM, SUCCESS, Signal  # noqa: E402


def _sig(uid, flavor, repo, type_, polarity, locus, mag=1.0):
    return Signal(session_uid=uid, flavor=flavor, repo=repo, type=type_,
                  polarity=polarity, locus=locus, magnitude=mag)


def test_min_frequency_filters_singletons():
    sigs = [_sig("claude:a", "claude", "r1", "retry_storm", PROBLEM, "retries")]
    assert cluster(sigs, min_frequency=2) == []


def test_clusters_recurring_signal_with_evidence():
    sigs = [
        _sig("claude:a", "claude", "r1", "retry_storm", PROBLEM, "retries", 5),
        _sig("claude:b", "claude", "r2", "retry_storm", PROBLEM, "retries", 3),
    ]
    pats = cluster(sigs, min_frequency=2)
    assert len(pats) == 1
    p = pats[0]
    assert p.frequency == 2
    assert p.sessions == ["claude:a", "claude:b"]
    assert sorted(p.repos) == ["r1", "r2"]
    assert p.flavors == ["claude"]
    assert p.cross_flavor is False
    assert p.cost_impact == 8.0


def test_cross_flavor_flagged_and_ranked_first():
    sigs = [
        # cross-flavor problem (claude + codex)
        _sig("claude:a", "claude", "r1", "repeated_errors", PROBLEM, "errors", 3),
        _sig("codex:b", "codex", "r2", "repeated_errors", PROBLEM, "errors", 3),
        # single-flavor success cluster with higher raw impact
        _sig("grok:c", "grok", "r3", "clean_pass", SUCCESS, "outcome", 5),
        _sig("grok:d", "grok", "r4", "clean_pass", SUCCESS, "outcome", 5),
    ]
    pats = cluster(sigs, min_frequency=2)
    assert len(pats) == 2
    xf = next(p for p in pats if p.signal_type == "repeated_errors")
    assert xf.cross_flavor is True
    assert sorted(xf.flavors) == ["claude", "codex"]
    # cross-flavor pattern is ranked first even if another has higher raw impact
    assert pats[0].cross_flavor is True
    assert "cross-flavor" in pats[0].title
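
The grouping and ranking rule exercised above can be sketched as follows. This is a simplified illustration rather than the `cluster` implementation: `Sig`, `cluster_sketch`, the dict output, and the choice to count one signal as one unit of frequency are all assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Sig:
    """Hypothetical, trimmed-down signal record."""
    session_uid: str
    flavor: str
    type: str
    polarity: str
    locus: str
    magnitude: float = 1.0


def cluster_sketch(signals: List[Sig], min_frequency: int = 2) -> List[dict]:
    # Group signals that share polarity, type and locus.
    groups: Dict[Tuple[str, str, str], List[Sig]] = defaultdict(list)
    for s in signals:
        groups[(s.polarity, s.type, s.locus)].append(s)

    patterns = []
    for (polarity, type_, locus), members in groups.items():
        if len(members) < min_frequency:
            continue  # singletons are noise, not patterns
        flavors = sorted({m.flavor for m in members})
        patterns.append({
            "key": f"{polarity}:{type_}:{locus}",
            "frequency": len(members),
            "flavors": flavors,
            "cross_flavor": len(flavors) > 1,
            "cost_impact": sum(m.magnitude for m in members),
        })
    # Cross-flavor clusters sort first; impact breaks ties within each group.
    patterns.sort(key=lambda p: (not p["cross_flavor"], -p["cost_impact"]))
    return patterns


sigs = [
    Sig("claude:a", "claude", "repeated_errors", "problem", "errors", 3),
    Sig("codex:b", "codex", "repeated_errors", "problem", "errors", 3),
    Sig("grok:c", "grok", "clean_pass", "success", "outcome", 5),
    Sig("grok:d", "grok", "clean_pass", "success", "outcome", 5),
]
ranked = cluster_sketch(sigs)
```

Sorting on `(not cross_flavor, -cost_impact)` keeps the cross-flavor cluster ahead of the single-flavor one even though the latter has the larger summed magnitude.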


@@ -0,0 +1,86 @@
"""Codex adapter tests (T01): synthetic rollout fixture."""
import json
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.adapters.codex import parse_session  # noqa: E402

REPO_MAP = {"agentic-resources": "helix_forge"}


def _rollout(path, lines):
    with open(path, "w", encoding="utf-8") as f:
        for ln in lines:
            f.write(json.dumps(ln) + "\n")


def test_codex_rollout_parse(tmp_path):
    p = tmp_path / "rollout-2026-06-06-abc.jsonl"
    _rollout(p, [
        {"timestamp": "2026-06-06T10:00:00Z", "type": "session_meta",
         "payload": {"id": "cdx-1", "cwd": "/home/worsch/agentic-resources",
                     "model_provider": "openai", "cli_version": "0.44.0", "model": "gpt-5-codex"}},
        {"timestamp": "2026-06-06T10:00:01Z", "type": "turn_context",
         "payload": {"model": "gpt-5-codex", "approval_policy": "on-request"}},
        {"timestamp": "2026-06-06T10:00:02Z", "type": "event_msg",
         "payload": {"type": "task_started"}},
        {"timestamp": "2026-06-06T10:00:03Z", "type": "response_item",
         "payload": {"type": "message", "role": "user",
                     "content": [{"type": "input_text", "text": "fix the bug"}]}},
        {"timestamp": "2026-06-06T10:00:04Z", "type": "response_item",
         "payload": {"type": "reasoning", "summary": "think about it"}},
        {"timestamp": "2026-06-06T10:00:05Z", "type": "response_item",
         "payload": {"type": "function_call", "name": "apply_patch",
                     "arguments": "{\"path\":\"x.py\"}", "call_id": "call_1"}},
        {"timestamp": "2026-06-06T10:00:06Z", "type": "response_item",
         "payload": {"type": "function_call", "name": "shell",
                     "arguments": "{\"command\":\"pytest -q\"}", "call_id": "call_2"}},
        {"timestamp": "2026-06-06T10:00:07Z", "type": "response_item",
         "payload": {"type": "function_call_output", "call_id": "call_2", "output": "2 passed"}},
        {"timestamp": "2026-06-06T10:00:08Z", "type": "response_item",
         "payload": {"type": "message", "role": "assistant",
                     "content": [{"type": "output_text", "text": "done"}]}},
        {"timestamp": "2026-06-06T10:00:09Z", "type": "event_msg",
         "payload": {"type": "token_count",
                     "info": {"total_token_usage": {"input_tokens": 200, "output_tokens": 30,
                                                    "cached_input_tokens": 15}}}},
        {"timestamp": "2026-06-06T10:00:10Z", "type": "event_msg",
         "payload": {"type": "task_complete"}},
    ])
    norm = parse_session(str(p), REPO_MAP)
    assert norm is not None
    s = norm.session
    assert s.session_uid == "codex:cdx-1"
    assert s.flavor == "codex"
    assert s.repo == "agentic-resources" and s.domain == "helix_forge"
    assert s.model == "gpt-5-codex"
    assert s.cost.input_tokens == 200 and s.cost.output_tokens == 30 and s.cost.cache_tokens == 15
    assert s.cost.turns == 1
    assert s.cost.wall_clock_s == 10.0
    kinds = [e.kind for e in norm.events]
    assert kinds == ["lifecycle", "user_msg", "thinking", "edit", "test_run",
                     "tool_result", "assistant_msg", "completion"]
    # flat linkage: function_call_output links to its function_call by call_id
    out = next(e for e in norm.events if e.kind == "tool_result")
    test_call = next(e for e in norm.events if e.kind == "test_run")
    assert out.parent_seq == test_call.seq
    # apply_patch classified as edit; pytest as test_run
    edit = next(e for e in norm.events if e.kind == "edit")
    assert edit.tool == "apply_patch"


def test_codex_empty_or_no_meta_returns_none(tmp_path):
    p = tmp_path / "rollout-empty.jsonl"
    p.write_text("")
    assert parse_session(str(p), REPO_MAP) is None
    p2 = tmp_path / "rollout-nometa.jsonl"
    _rollout(p2, [{"timestamp": "t", "type": "event_msg", "payload": {"type": "task_started"}}])
    assert parse_session(str(p2), REPO_MAP) is None  # no session_meta -> no id


@@ -0,0 +1,86 @@
"""Versioned Pattern Catalog tests (T02): round-trip, dedup, idempotent upsert."""
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.curate.catalog import (  # noqa: E402
    ADDED,
    UNCHANGED,
    UPDATED,
    VERSIONED,
    Catalog,
)
from session_memory.curate.schema import (  # noqa: E402
    Provenance,
    Resolution,
    Scope,
    SolutionPattern,
)


def _pattern(src="success:clean_pass:outcome", problem="ran tests, clean finish"):
    return SolutionPattern(
        id=SolutionPattern.make_id(src),
        name="Run tests before declaring success",
        version="1.0.0",
        polarity="success",
        problem=problem,
        resolutions=[Resolution(summary="run the suite")],
        scope=Scope(flavors=["claude", "grok"]),
        provenance=Provenance(source_key=src, evidence={"frequency": 18}),
    )


def test_add_then_load_round_trips(tmp_path):
    cat = Catalog(str(tmp_path))
    assert cat.upsert(_pattern()) == ADDED
    loaded = cat.load(SolutionPattern.make_id("success:clean_pass:outcome"))
    assert loaded is not None
    assert loaded.problem == "ran tests, clean finish"
    assert loaded.created_at and loaded.updated_at
    assert [p.id for p in cat.list()] == [loaded.id]


def test_resave_identical_is_noop(tmp_path):
    cat = Catalog(str(tmp_path))
    cat.upsert(_pattern())
    assert cat.upsert(_pattern()) == UNCHANGED
    # version not bumped, no history written
    assert cat.load(_pattern().id).version == "1.0.0"
    assert cat.history(_pattern().id) == []


def test_dedup_on_source_key(tmp_path):
    cat = Catalog(str(tmp_path))
    cat.upsert(_pattern())
    cat.upsert(_pattern())  # same source key -> same id -> one file
    assert len(cat.list()) == 1


def test_content_change_bumps_version_and_archives(tmp_path):
    cat = Catalog(str(tmp_path))
    cat.upsert(_pattern())
    assert cat.upsert(_pattern(problem="now with more nuance")) == VERSIONED
    current = cat.load(_pattern().id)
    assert current.version == "1.0.1"
    assert current.problem == "now with more nuance"
    hist = cat.history(_pattern().id)
    assert len(hist) == 1
    assert hist[0]["version"] == "1.0.0"
    assert hist[0]["status"] == "superseded"


def test_status_only_change_updates_without_bump(tmp_path):
    cat = Catalog(str(tmp_path))
    cat.upsert(_pattern())
    p = _pattern()
    p.status = "approved"
    p.distribution_ready = True
    assert cat.upsert(p) == UPDATED
    current = cat.load(p.id)
    assert current.status == "approved"
    assert current.distribution_ready is True
    assert current.version == "1.0.0"  # metadata change, no bump
    assert cat.history(p.id) == []
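
The four upsert outcomes tested above can be summarised in a small in-memory sketch. It is not the `Catalog` code: the constant values (other than `"versioned"`, which an earlier test compares against directly), the `METADATA_FIELDS` set, and the dict-based pattern shape are assumptions for illustration.

```python
import copy
from typing import Dict, List

# Only "versioned" is confirmed by the tests; the other values are assumed.
ADDED, UNCHANGED, UPDATED, VERSIONED = "added", "unchanged", "updated", "versioned"

# Assumed split: these fields are metadata, everything else is substantive content.
METADATA_FIELDS = {"status", "distribution_ready"}


def bump_patch(version: str) -> str:
    major, minor, patch = (int(x) for x in version.split("."))
    return f"{major}.{minor}.{patch + 1}"


def _content(pattern: dict) -> dict:
    skip = METADATA_FIELDS | {"version"}
    return {k: v for k, v in pattern.items() if k not in skip}


class VersionedStore:
    """Hypothetical in-memory store keyed by pattern id."""

    def __init__(self) -> None:
        self.current: Dict[str, dict] = {}
        self.history: Dict[str, List[dict]] = {}

    def upsert(self, pattern: dict) -> str:
        pid = pattern["id"]
        old = self.current.get(pid)
        if old is None:
            self.current[pid] = copy.deepcopy(pattern)
            return ADDED
        if _content(old) != _content(pattern):
            # Substantive change: archive the old copy and bump the patch version.
            archived = copy.deepcopy(old)
            archived["status"] = "superseded"
            self.history.setdefault(pid, []).append(archived)
            new = copy.deepcopy(pattern)
            new["version"] = bump_patch(old["version"])
            self.current[pid] = new
            return VERSIONED
        if any(old.get(f) != pattern.get(f) for f in METADATA_FIELDS):
            # Metadata-only change: overwrite in place, keep the version.
            self.current[pid] = {**copy.deepcopy(pattern), "version": old["version"]}
            return UPDATED
        return UNCHANGED


store = VersionedStore()
base = {"id": "sp-a", "version": "1.0.0", "problem": "ran tests",
        "status": "provisional", "distribution_ready": False}
results = [
    store.upsert(base),
    store.upsert(dict(base)),
    store.upsert({**base, "status": "approved"}),
    store.upsert({**base, "problem": "now with more nuance"}),
]
```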


@@ -0,0 +1,70 @@
"""Hub decision integration tests (T05): payload shape + graceful queue/flush."""
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.curate.catalog import Catalog  # noqa: E402
from session_memory.curate.decisions import DecisionRecorder, build_decision  # noqa: E402
from session_memory.curate.review import APPROVE, REJECT, ReviewLog, review  # noqa: E402


def _candidate(key="success:clean_pass:outcome"):
    return {"key": key, "frequency": 18, "sessions": ["a", "b"],
            "cost_impact": 9.0, "cross_flavor": True, "flavors": ["claude", "grok"]}


def test_build_decision_payload_shape():
    d = build_decision(_candidate(), "approve", "looks solid", workstream_id="ws-1")
    assert d["decision_type"] == "made"
    assert d["workstream_id"] == "ws-1"
    assert "Promote" in d["title"]
    assert d["rationale"] == "looks solid"
    assert "success:clean_pass:outcome" in d["description"]


def test_sink_accepts_decision(tmp_path):
    captured = []
    rec = DecisionRecorder(str(tmp_path / "q.jsonl"), sink=captured.append)
    assert rec.record(_candidate(), "approve", "ok") is True
    assert rec.pending() == []
    assert len(captured) == 1


def test_queues_when_sink_down(tmp_path):
    def boom(_):
        raise RuntimeError("hub down")

    rec = DecisionRecorder(str(tmp_path / "q.jsonl"), sink=boom)
    assert rec.record(_candidate(), "reject", "noise") is False
    assert len(rec.pending()) == 1


def test_no_sink_defaults_to_queue(tmp_path):
    rec = DecisionRecorder(str(tmp_path / "q.jsonl"))
    rec.record(_candidate(), "approve", "ok")
    assert len(rec.pending()) == 1


def test_flush_replays_queue(tmp_path):
    rec = DecisionRecorder(str(tmp_path / "q.jsonl"))  # offline -> queue
    rec.record(_candidate("problem:abandoned:outcome"), "reject", "x")
    rec.record(_candidate("success:clean_pass:outcome"), "approve", "y")
    captured = []
    assert rec.flush(sink=captured.append) == 2
    assert rec.pending() == []
    assert len(captured) == 2


def test_review_records_each_final_decision(tmp_path):
    cat = Catalog(str(tmp_path / "catalog"))
    log = ReviewLog(str(tmp_path / "reviews.jsonl"))
    captured = []
    rec = DecisionRecorder(str(tmp_path / "q.jsonl"), sink=captured.append, workstream_id="ws")
    cands = [_candidate("success:clean_pass:outcome"), _candidate("problem:abandoned:outcome")]
    review(cands, lambda c: (APPROVE if "success" in c["key"] else REJECT, "r"), cat, log,
           recorder=rec)
    assert len(captured) == 2
    actions = sorted("Promote" in d["title"] for d in captured)
    assert actions == [False, True]
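
The queue-and-flush behaviour these tests describe (deliver to the hub when possible, otherwise append to a JSONL queue and replay later) might look roughly like the sketch below. `QueueingRecorder` is a hypothetical class, not `DecisionRecorder`; its `record` takes a ready-made payload, and the handling of partial flush failures is an assumption.

```python
import json
import os
import tempfile
from typing import Callable, List, Optional

Sink = Callable[[dict], None]


class QueueingRecorder:
    """Hypothetical recorder: try the sink, fall back to a JSONL queue."""

    def __init__(self, queue_path: str, sink: Optional[Sink] = None) -> None:
        self.queue_path = queue_path
        self.sink = sink

    def pending(self) -> List[dict]:
        if not os.path.exists(self.queue_path):
            return []
        with open(self.queue_path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    def record(self, decision: dict) -> bool:
        """Return True if delivered immediately, False if queued."""
        if self.sink is not None:
            try:
                self.sink(decision)
                return True
            except Exception:
                pass  # sink unavailable: fall through to the queue
        with open(self.queue_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(decision) + "\n")
        return False

    def flush(self, sink: Sink) -> int:
        """Replay queued decisions; keep any that still fail. Returns the number delivered."""
        delivered, remaining = 0, []
        for decision in self.pending():
            try:
                sink(decision)
                delivered += 1
            except Exception:
                remaining.append(decision)
        with open(self.queue_path, "w", encoding="utf-8") as f:
            for decision in remaining:
                f.write(json.dumps(decision) + "\n")
        return delivered


def hub_down(_decision: dict) -> None:
    raise RuntimeError("hub down")


with tempfile.TemporaryDirectory() as tmp:
    rec = QueueingRecorder(os.path.join(tmp, "q.jsonl"), sink=hub_down)
    delivered_now = rec.record({"key": "problem:abandoned:outcome", "action": "reject"})
    queued = len(rec.pending())
    received: List[dict] = []
    flushed = rec.flush(received.append)
    left = len(rec.pending())
```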


@@ -0,0 +1,84 @@
"""Curate entrypoint tests (T06): batch auto-approve end-to-end via the store."""
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.core.store import Store  # noqa: E402
from session_memory.curate.__main__ import main  # noqa: E402
from session_memory.curate.catalog import Catalog  # noqa: E402


def _digest(uid, flavor, repo, **markers):
    return {
        "session_uid": uid, "flavor": flavor, "repo": repo, "outcome": "fail",
        "cost": {"input_tokens": 10, "output_tokens": 1},
        "markers": {"errors": markers.get("errors", 0), "retries": markers.get("retries", 0),
                    "test_runs": 0, "edits": 0, "human_interventions": 0},
        # real coding session per the quality filter (WP-0005 T01)
        "event_count": 40, "first_prompt": "Fix the failing build and retry the suite",
        "tool_histogram": {"Bash": 20, "Edit": 12, "Read": 8},
    }


def _write_config(tmp_path) -> tuple[str, str, str]:
    """Write a minimal config; return (config path, store dir, catalog dir)."""
    store = tmp_path / ".store"
    catalog = tmp_path / "catalog"
    cfg = f"""
[store]
db_path = "{store / 'm.db'}"
blob_dir = "{store / 'blobs'}"
cursor = "{store / 'c.json'}"
[curate]
catalog_dir = "{catalog}"
review_log = "{store / 'reviews.jsonl'}"
decision_queue = "{store / 'decisions.queue.jsonl'}"
[curate.gate]
min_frequency = 2
min_sessions = 2
"""
    path = tmp_path / "config.toml"
    path.write_text(cfg)
    return str(path), str(store), str(catalog)


def test_auto_approve_promotes_cross_flavor(tmp_path, capsys):
    cfg_path, store_dir, catalog_dir = _write_config(tmp_path)
    st = Store(os.path.join(store_dir, "m.db"), os.path.join(store_dir, "blobs"))
    st.write_digest("claude:a", _digest("claude:a", "claude", "r1", retries=5))
    st.write_digest("codex:b", _digest("codex:b", "codex", "r2", retries=4))
    st.close()
    rc = main(["--config", cfg_path, "--auto-approve"])
    assert rc == 0
    cat = Catalog(catalog_dir)
    patterns = cat.list()
    assert len(patterns) == 1
    assert patterns[0].polarity == "problem"
    # clears the promote floor (freq>=2) but below the default distribution
    # floor (freq>=3) -> promoted as provisional, not distribution-ready
    assert patterns[0].status == "provisional"
    assert patterns[0].distribution_ready is False
    out = capsys.readouterr().out
    assert "Curate summary" in out
    # hub offline in tests -> decision queued
    assert "decisions queued" in out


def test_rerun_is_idempotent(tmp_path):
    cfg_path, store_dir, catalog_dir = _write_config(tmp_path)
    st = Store(os.path.join(store_dir, "m.db"), os.path.join(store_dir, "blobs"))
    st.write_digest("claude:a", _digest("claude:a", "claude", "r1", retries=5))
    st.write_digest("codex:b", _digest("codex:b", "codex", "r2", retries=4))
    st.close()
    main(["--config", cfg_path, "--auto-approve"])
    main(["--config", cfg_path, "--auto-approve"])  # second pass: already decided
    cat = Catalog(catalog_dir)
    assert len(cat.list()) == 1
    assert cat.load(cat.list()[0].id).version == "1.0.0"  # no spurious bump


@@ -0,0 +1,76 @@
"""Evidence-bar + bloat-guard tests (T04)."""
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.curate.catalog import Catalog  # noqa: E402
from session_memory.curate.gating import (  # noqa: E402
    GateConfig,
    bloat_warnings,
    evaluate,
    gate_config,
)
from session_memory.curate.review import candidate_to_pattern  # noqa: E402


def _candidate(key="success:clean_pass:outcome", freq=5, sessions=5, impact=10.0,
               cross=True, flavors=("claude", "grok")):
    return {
        "key": key,
        "frequency": freq,
        "sessions": [f"s{i}" for i in range(sessions)],
        "cost_impact": impact,
        "cross_flavor": cross,
        "flavors": list(flavors),
    }


def test_clears_bar_and_distribution_ready():
    r = evaluate(_candidate(), GateConfig(dist_min_frequency=3))
    assert r.promotable and r.distribution_ready
    assert r.status == "approved"


def test_thin_candidate_promotable_but_provisional():
    # meets promote floor (freq>=2) but below distribution floor (freq<3)
    r = evaluate(_candidate(freq=2, sessions=2), GateConfig(dist_min_frequency=3))
    assert r.promotable
    assert not r.distribution_ready
    assert r.status == "provisional"


def test_below_promote_floor_not_promotable():
    r = evaluate(_candidate(freq=1, sessions=1))
    assert not r.promotable
    assert any("frequency" in reason for reason in r.reasons)


def test_cross_flavor_required_for_distribution():
    r = evaluate(_candidate(cross=False), GateConfig(dist_require_cross_flavor=True))
    assert r.promotable
    assert not r.distribution_ready
    assert any("cross-flavor" in reason for reason in r.reasons)


def test_gate_config_reads_toml_dict():
    cfg = gate_config({"curate": {"gate": {"min_frequency": 9, "dist_require_cross_flavor": True}}})
    assert cfg.min_frequency == 9
    assert cfg.dist_require_cross_flavor is True
    # defaults preserved for unspecified keys
    assert cfg.dist_min_frequency == 3


def test_bloat_flags_duplicate_and_near_duplicate(tmp_path):
    cat = Catalog(str(tmp_path))
    cat.upsert(candidate_to_pattern(_candidate(key="success:clean_pass:outcome")))
    existing = cat.list()
    # exact same key -> duplicate
    dup = bloat_warnings(_candidate(key="success:clean_pass:outcome"), existing)
    assert any("duplicate" in w for w in dup)
    # different polarity, same signal_type+locus -> near-duplicate
    near = bloat_warnings(_candidate(key="problem:clean_pass:outcome"), existing)
    assert any("near-duplicate" in w for w in near)
    # unrelated -> no warnings
    assert bloat_warnings(_candidate(key="problem:retry_storm:retries"), existing) == []
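
The two-tier evidence bar these tests exercise (a promote floor, then a stricter distribution floor) can be sketched as below. `GateSketch` and `evaluate_sketch` are hypothetical; only the `dist_min_frequency` default of 3 and the promote floor of 2 come from the tests, while the `min_sessions` default, the `dist_require_cross_flavor` default, and the `None` status for non-promotable candidates are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GateSketch:
    min_frequency: int = 2                    # promote floor
    min_sessions: int = 2                     # assumed default
    dist_min_frequency: int = 3               # distribution floor
    dist_require_cross_flavor: bool = False   # assumed default


@dataclass
class GateResult:
    promotable: bool
    distribution_ready: bool
    status: Optional[str]
    reasons: List[str] = field(default_factory=list)


def evaluate_sketch(cand: dict, cfg: Optional[GateSketch] = None) -> GateResult:
    cfg = cfg or GateSketch()
    freq, n_sessions = cand["frequency"], len(cand["sessions"])

    # Tier 1: is there enough evidence to enter the catalog at all?
    reasons: List[str] = []
    if freq < cfg.min_frequency:
        reasons.append(f"frequency {freq} < {cfg.min_frequency}")
    if n_sessions < cfg.min_sessions:
        reasons.append(f"sessions {n_sessions} < {cfg.min_sessions}")
    if reasons:
        return GateResult(False, False, None, reasons)

    # Tier 2: is there enough evidence to distribute it to agents?
    if freq < cfg.dist_min_frequency:
        reasons.append(f"frequency {freq} below distribution floor {cfg.dist_min_frequency}")
    if cfg.dist_require_cross_flavor and not cand.get("cross_flavor"):
        reasons.append("cross-flavor evidence required for distribution")
    ready = not reasons
    return GateResult(True, ready, "approved" if ready else "provisional", reasons)


def _cand(freq: int, cross: bool = True) -> dict:
    return {"frequency": freq, "sessions": [f"s{i}" for i in range(freq)],
            "cross_flavor": cross}


strong = evaluate_sketch(_cand(5))
thin = evaluate_sketch(_cand(2))
weak = evaluate_sketch(_cand(1))
single = evaluate_sketch(_cand(5, cross=False), GateSketch(dist_require_cross_flavor=True))
```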


@@ -0,0 +1,93 @@
"""Review workflow tests (T03): promote/reject/discuss + idempotent re-review."""
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.curate.catalog import Catalog  # noqa: E402
from session_memory.curate.review import (  # noqa: E402
    APPROVE,
    DISCUSS,
    REJECT,
    ReviewLog,
    candidate_to_pattern,
    review,
)
from session_memory.curate.schema import SolutionPattern  # noqa: E402


def _candidate(key="success:clean_pass:outcome", freq=18, flavors=("claude", "grok")):
    return {
        "key": key,
        "polarity": key.split(":")[0],
        "signal_type": key.split(":")[1],
        "locus": key.split(":")[2],
        "title": "cross-flavor success: clean pass",
        "frequency": freq,
        "flavors": list(flavors),
        "repos": ["agentic-resources"],
        "sessions": [f"s{i}" for i in range(freq)],
        "cross_flavor": len(flavors) > 1,
        "cost_impact": 12.5,
    }


def _decider(action, rationale="because"):
    return lambda cand: (action, rationale)


def test_approve_promotes_to_catalog(tmp_path):
    cat = Catalog(str(tmp_path / "catalog"))
    log = ReviewLog(str(tmp_path / "reviews.jsonl"))
    res = review([_candidate()], _decider(APPROVE), cat, log)
    assert len(res.approved) == 1
    p = cat.load(SolutionPattern.make_id("success:clean_pass:outcome"))
    assert p is not None
    assert p.scope.flavors == ["claude", "grok"]
    assert set(p.rendering_hints) == {"claude", "grok"}
    assert p.provenance.evidence["frequency"] == 18


def test_reject_records_no_catalog_write(tmp_path):
    cat = Catalog(str(tmp_path / "catalog"))
    log = ReviewLog(str(tmp_path / "reviews.jsonl"))
    res = review([_candidate()], _decider(REJECT), cat, log)
    assert res.rejected == ["success:clean_pass:outcome"]
    assert cat.list() == []


def test_discuss_defers_and_is_not_final(tmp_path):
    cat = Catalog(str(tmp_path / "catalog"))
    log = ReviewLog(str(tmp_path / "reviews.jsonl"))
    res = review([_candidate()], _decider(DISCUSS), cat, log)
    assert res.deferred == ["success:clean_pass:outcome"]
    # not recorded as final -> a later pass re-surfaces it
    res2 = review([_candidate()], _decider(APPROVE), cat, log)
    assert len(res2.approved) == 1


def test_prior_reject_remembered_same_evidence(tmp_path):
    cat = Catalog(str(tmp_path / "catalog"))
    log_path = str(tmp_path / "reviews.jsonl")
    review([_candidate()], _decider(REJECT), cat, ReviewLog(log_path))
    # fresh log instance (reloads from disk) + same evidence -> skipped
    res = review([_candidate()], _decider(APPROVE), cat, ReviewLog(log_path))
    assert res.skipped == ["success:clean_pass:outcome"]
    assert cat.list() == []


def test_changed_evidence_resurfaces(tmp_path):
    cat = Catalog(str(tmp_path / "catalog"))
    log_path = str(tmp_path / "reviews.jsonl")
    review([_candidate(freq=18)], _decider(REJECT), cat, ReviewLog(log_path))
    # more evidence now -> not skipped, gets re-reviewed
    res = review([_candidate(freq=40)], _decider(APPROVE), cat, ReviewLog(log_path))
    assert len(res.approved) == 1


def test_candidate_to_pattern_defaults():
    p = candidate_to_pattern(_candidate(flavors=("claude",)))
    assert p.status == "provisional"
    assert p.rendering_hints["claude"]["target"] == "CLAUDE.md"
    assert p.polarity == "success"


@@ -0,0 +1,80 @@
"""Round-trip + validation tests for the Solution Pattern schema (T01)."""
import os
import sys

import pytest

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.curate.schema import (  # noqa: E402
    Provenance,
    Resolution,
    Scope,
    SolutionPattern,
)


def _sample() -> SolutionPattern:
    src = "success:clean_pass:outcome"
    return SolutionPattern(
        id=SolutionPattern.make_id(src),
        name="Run tests before declaring success",
        version="1.0.0",
        polarity="success",
        problem="Sessions that run tests and finish with no retries resolve cheaply.",
        resolutions=[Resolution(summary="Always run the suite", steps=["edit", "test", "commit"])],
        scope=Scope(flavors=["claude", "grok"]),
        provenance=Provenance(source_key=src, evidence={"frequency": 18, "cross_flavor": True}),
        rendering_hints={"claude": {"target": "CLAUDE.md"}, "codex": {"target": "AGENTS.md"}},
        status="approved",
        distribution_ready=True,
    )


def test_round_trip_is_lossless():
    p = _sample()
    again = SolutionPattern.from_json(p.to_json())
    assert again.to_dict() == p.to_dict()
    assert again.resolutions[0].steps == ["edit", "test", "commit"]
    assert again.scope.flavors == ["claude", "grok"]
    assert again.provenance.evidence["cross_flavor"] is True


def test_serialization_is_deterministic():
    p = _sample()
    assert p.to_json() == p.to_json()
    assert SolutionPattern.from_json(p.to_json()).to_json() == p.to_json()


def test_make_id_is_stable_and_slugged():
    assert SolutionPattern.make_id("success:clean_pass:outcome") == "sp-success-clean_pass-outcome"
    # same source key -> same id regardless of later wording
    assert SolutionPattern.make_id("problem:abandoned:outcome") == SolutionPattern.make_id(
        "problem:abandoned:outcome"
    )


def test_bump_version():
    assert SolutionPattern.bump_version("1.0.0") == "1.0.1"
    assert SolutionPattern.bump_version("1.2.3", "minor") == "1.3.0"
    assert SolutionPattern.bump_version("1.2.3", "major") == "2.0.0"


def test_rejects_unknown_polarity():
    with pytest.raises(ValueError):
        SolutionPattern(id="x", name="n", version="1.0.0", polarity="meh", problem="p")


def test_rejects_unknown_status():
    with pytest.raises(ValueError):
        SolutionPattern(id="x", name="n", version="1.0.0", polarity="problem",
                        problem="p", status="bogus")


def test_rejects_unknown_flavor_in_hints_and_scope():
    with pytest.raises(ValueError):
        SolutionPattern(id="x", name="n", version="1.0.0", polarity="problem",
                        problem="p", rendering_hints={"gpt": {}})
    with pytest.raises(ValueError):
        Scope(flavors=["gpt"])
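
The id and version helpers covered by these tests are small enough to sketch as free functions. The names `make_id_sketch` and `bump_version_sketch` are hypothetical, and the real slugging may handle characters beyond `:`; only the input/output pairs asserted in the tests are taken from the source.

```python
def make_id_sketch(source_key: str) -> str:
    """Derive a stable pattern id from a `polarity:signal_type:locus` key."""
    # Assumption: slugging only needs to replace the ":" separators.
    return "sp-" + source_key.replace(":", "-")


def bump_version_sketch(version: str, part: str = "patch") -> str:
    """Bump one component of a MAJOR.MINOR.PATCH version, resetting the lower ones."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")
```

Deriving the id purely from the source key is what makes the catalog's dedup work: re-detecting the same pattern with different wording lands on the same file.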


@@ -0,0 +1,47 @@
"""Detect entrypoint tests (T07): end-to-end digests -> patterns, persisted."""
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.core.store import Store  # noqa: E402
from session_memory.detect.__main__ import run_detect  # noqa: E402


def _digest(uid, flavor, repo, **markers):
    return {
        "session_uid": uid, "flavor": flavor, "repo": repo, "outcome": "fail",
        "cost": {"input_tokens": 10, "output_tokens": 1},
        "markers": {"errors": markers.get("errors", 0), "retries": markers.get("retries", 0),
                    "test_runs": 0, "edits": 0, "human_interventions": 0},
        # fields the quality filter (WP-0005 T01) checks: a real coding session
        "event_count": 40, "first_prompt": "Fix the failing build and retry the suite",
        "tool_histogram": {"Bash": 20, "Edit": 12, "Read": 8},
    }


def _config(tmp_path):
    return {"store": {"db_path": str(tmp_path / ".store/m.db"),
                      "blob_dir": str(tmp_path / ".store/blobs"),
                      "cursor": str(tmp_path / ".store/c.json")}}


def test_run_detect_persists_cross_flavor_pattern(tmp_path):
    cfg = _config(tmp_path)
    st = Store(cfg["store"]["db_path"], cfg["store"]["blob_dir"])
    # same problem (retry_storm) across two flavors -> cross-flavor candidate
    st.write_digest("claude:a", _digest("claude:a", "claude", "r1", retries=5))
    st.write_digest("codex:b", _digest("codex:b", "codex", "r2", retries=4))
    st.close()
    patterns = run_detect(cfg, min_frequency=2)
    assert len(patterns) == 1
    assert patterns[0]["cross_flavor"] is True
    assert patterns[0]["signal_type"] == "retry_storm"
    # persisted to the Tier 2 patterns table
    st2 = Store(cfg["store"]["db_path"], cfg["store"]["blob_dir"])
    rows = st2.db.execute("SELECT key FROM patterns").fetchall()
    assert len(rows) == 1
    st2.close()


@@ -0,0 +1,80 @@
"""Infra-overhead + thrash signal tests (WP-0005 T02)."""
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.detect.signals import (  # noqa: E402
    build_context,
    extract_signals,
    sig_infra_overhead,
    sig_schema_thrash,
    sig_tool_thrash,
    tool_bucket,
)


def _digest(uid="claude:a", repo="r1", tools=None):
    return {"session_uid": uid, "flavor": "claude", "repo": repo, "outcome": "success",
            "cost": {"input_tokens": 1, "output_tokens": 1},
            "markers": {"errors": 0, "retries": 0, "test_runs": 0},
            "tool_histogram": tools or {}}


CTX = {"infra_min_calls": 20, "infra_overhead_threshold": 0.30,
       "schema_thrash_threshold": 5, "tool_thrash_threshold": 80}


def test_tool_bucket_mapping():
    assert tool_bucket("mcp__state-hub__update_task_status") == "statehub_mcp"
    assert tool_bucket("ToolSearch") == "schema_load"
    assert tool_bucket("TaskUpdate") == "task_mgmt"
    assert tool_bucket("Bash") == "shell"
    assert tool_bucket("Edit") == "edit"


def test_infra_overhead_fires_above_share():
    # 18 statehub of 30 total = 60% overhead
    d = _digest(tools={"mcp__state-hub__create_task": 18, "Bash": 8, "Edit": 4})
    sig = sig_infra_overhead(d, CTX)
    assert sig and sig[0].type == "infra_overhead"
    assert sig[0].magnitude >= 0.30
    assert sig[0].detail["statehub"] == 18


def test_infra_overhead_quiet_when_mostly_work():
    d = _digest(tools={"mcp__state-hub__create_task": 3, "Bash": 40, "Edit": 30})
    assert sig_infra_overhead(d, CTX) == []


def test_infra_overhead_ignores_tiny_sessions():
    d = _digest(tools={"mcp__state-hub__create_task": 5})  # below infra_min_calls
    assert sig_infra_overhead(d, CTX) == []


def test_schema_thrash_fires():
    d = _digest(tools={"ToolSearch": 9, "Bash": 5})
    sig = sig_schema_thrash(d, CTX)
    assert sig and sig[0].type == "schema_thrash"
    assert sig[0].detail["tool_searches"] == 9


def test_tool_thrash_fires_on_dominant_tool():
    d = _digest(tools={"Bash": 120, "Edit": 5})
    sig = sig_tool_thrash(d, CTX)
    assert sig and sig[0].locus == "tool:Bash"


def test_extract_signals_includes_infra():
    d = _digest(tools={"mcp__state-hub__create_task": 18, "Bash": 8, "Edit": 4,
                       "ToolSearch": 6})
    types = {s.type for s in extract_signals([d])}
    assert "infra_overhead" in types
    assert "schema_thrash" in types


def test_build_context_has_infra_defaults():
    ctx = build_context([])
    assert ctx["infra_overhead_threshold"] == 0.30
    assert ctx["schema_thrash_threshold"] == 5
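
The "18 statehub of 30 total = 60% overhead" arithmetic in these tests can be made concrete with a short sketch. `bucket_sketch`, `overhead_share`, and in particular the choice of which buckets count as overhead are assumptions; the thresholds mirror the `CTX` values used in the tests.

```python
from typing import Dict, Optional

# Assumption: these buckets are plumbing rather than coding work.
OVERHEAD_BUCKETS = {"statehub_mcp", "schema_load", "task_mgmt"}


def bucket_sketch(tool: str) -> str:
    """Hypothetical bucketing that reproduces the five mappings in the tests."""
    if tool.startswith("mcp__state-hub__"):
        return "statehub_mcp"
    if tool == "ToolSearch":
        return "schema_load"
    if tool.startswith("Task"):
        return "task_mgmt"
    return {"Bash": "shell", "Edit": "edit"}.get(tool, "other")


def overhead_share(histogram: Dict[str, int], min_calls: int = 20,
                   threshold: float = 0.30) -> Optional[float]:
    """Return the overhead share if the signal should fire, else None."""
    total = sum(histogram.values())
    if total < min_calls:
        return None  # too few calls for the ratio to be meaningful
    overhead = sum(n for tool, n in histogram.items()
                   if bucket_sketch(tool) in OVERHEAD_BUCKETS)
    share = overhead / total
    return share if share >= threshold else None


heavy = overhead_share({"mcp__state-hub__create_task": 18, "Bash": 8, "Edit": 4})
light = overhead_share({"mcp__state-hub__create_task": 3, "Bash": 40, "Edit": 30})
tiny = overhead_share({"mcp__state-hub__create_task": 5})
```

The minimum-call guard matters: without it a five-call session made up entirely of hub calls would report 100% overhead.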


@@ -0,0 +1,61 @@
"""Session-quality filter tests (T01)."""
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.detect.quality import (  # noqa: E402
    QualityConfig,
    filter_real,
    is_real_coding_session,
    quality_config,
)


def _digest(repo="agentic-resources", events=60, prompt="Implement the curate entrypoint",
            tools=None):
    return {
        "session_uid": "claude:x", "flavor": "claude", "repo": repo,
        "event_count": events, "first_prompt": prompt,
        "tool_histogram": tools if tools is not None else {"Bash": 20, "Edit": 15, "Read": 8},
    }


def test_real_session_passes():
    assert is_real_coding_session(_digest()) is True


def test_healthcheck_prompt_dropped():
    assert is_real_coding_session(_digest(events=3, prompt="Say hello in one word.",
                                          tools={})) is False


def test_interrupted_dropped():
    assert is_real_coding_session(_digest(events=1, prompt="[Request interrupted by user]",
                                          tools={})) is False


def test_too_short_dropped():
    assert is_real_coding_session(_digest(events=5)) is False


def test_no_repo_dropped():
    assert is_real_coding_session(_digest(repo=None)) is False


def test_no_substantive_tools_dropped():
    # plenty of events but only plumbing calls -> not real coding
    assert is_real_coding_session(
        _digest(tools={"mcp__state-hub__update_task_status": 40})) is False


def test_filter_real_keeps_only_real():
    digs = [_digest(), _digest(events=3, prompt="hello", tools={}), _digest(repo=None)]
    assert len(filter_real(digs)) == 1


def test_quality_config_from_toml():
    cfg = quality_config({"detect": {"quality": {"min_events": 50}}})
    assert cfg.min_events == 50
    assert cfg.min_substantive == 3  # default preserved


@@ -0,0 +1,59 @@
"""Recurring-error signal + clustering (WP-0006 T02)."""
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.detect.cluster import cluster  # noqa: E402
from session_memory.detect.signals import (  # noqa: E402
    extract_signals,
    sig_recurring_error,
)


def _digest(uid, repo, flavor="claude", snippets=None):
    return {
        "session_uid": uid, "flavor": flavor, "repo": repo, "outcome": "success",
        "cost": {"input_tokens": 1, "output_tokens": 1},
        "markers": {"errors": 0, "retries": 0, "test_runs": 0},
        "tool_histogram": {}, "error_snippets": snippets or [],
    }


_FP = "modulenotfounderror: no module named 'foo' at <path>:<n>"


def test_signal_per_distinct_fingerprint():
    d = _digest("claude:a", "r1", snippets=[
        {"fingerprint": _FP, "sample": "ModuleNotFoundError ...", "count": 3, "tool": "Bash"},
        {"fingerprint": "keyerror: <str>", "sample": "KeyError", "count": 1, "tool": None},
    ])
    sigs = sig_recurring_error(d, {})
    assert len(sigs) == 2
    top = [s for s in sigs if s.locus == _FP][0]
    assert top.type == "recurring_error"
    assert top.magnitude == 3.0
    assert top.detail["sample"].startswith("ModuleNotFound")


def test_clusters_across_sessions_and_flavors():
    # same fingerprint in a claude and a grok session -> cross-flavor candidate
    digs = [
        _digest("claude:a", "r1", "claude",
                [{"fingerprint": _FP, "sample": "ModuleNotFoundError", "count": 2, "tool": "Bash"}]),
        _digest("grok:b", "r2", "grok",
                [{"fingerprint": _FP, "sample": "ModuleNotFoundError", "count": 1, "tool": None}]),
    ]
    signals = extract_signals(digs)
    pats = cluster([s for s in signals if s.type == "recurring_error"], min_frequency=2)
    assert len(pats) == 1
    p = pats[0]
    assert p.signal_type == "recurring_error"
    assert p.cross_flavor is True
    assert sorted(p.flavors) == ["claude", "grok"]
    assert p.frequency == 2


def test_no_snippets_no_signal():
    assert sig_recurring_error(_digest("claude:a", "r1"), {}) == []

82
tests/test_digest.py Normal file

@@ -0,0 +1,82 @@
"""Digest tests (T04): outcome heuristic + Tier 2 promotion."""
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.adapters.claude import Normalized  # noqa: E402
from session_memory.core.digest import analyze, build_digest, infer_outcome  # noqa: E402
from session_memory.core.schema import Cost, Session, SessionEvent  # noqa: E402
from session_memory.core.store import Store  # noqa: E402


def _ev(uid, seq, kind, **kw):
    return SessionEvent(session_uid=uid, seq=seq, kind=kind, **kw)


def test_infer_outcome_abandoned():
    uid = "claude:s"
    assert infer_outcome([_ev(uid, 0, "user_msg")]) == "abandoned"


def test_infer_outcome_success_on_passing_test():
    uid = "claude:s"
    events = [
        _ev(uid, 0, "user_msg"),
        _ev(uid, 1, "assistant_msg"),
        _ev(uid, 2, "test_run", tool="Bash"),
        _ev(uid, 3, "tool_result", payload_ref="b3"),
    ]
    assert infer_outcome(events, {"b3": "6 passed in 0.4s"}) == "success"


def test_infer_outcome_fail_on_failing_test():
    uid = "claude:s"
    events = [
        _ev(uid, 0, "user_msg"),
        _ev(uid, 1, "assistant_msg"),
        _ev(uid, 2, "test_run", tool="Bash"),
        _ev(uid, 3, "tool_result", payload_ref="b3"),
    ]
    assert infer_outcome(events, {"b3": "1 failed, traceback ..."}) == "fail"


def test_build_digest_histograms_and_markers():
    uid = "claude:s"
    s = Session(session_uid=uid, flavor="claude", native_session_id="s",
                repo="agentic-resources", cost=Cost(input_tokens=100, output_tokens=40, turns=2))
    events = [
        _ev(uid, 0, "user_msg"),
        _ev(uid, 1, "edit", tool="Edit"),
        _ev(uid, 2, "edit", tool="Write"),
        _ev(uid, 3, "test_run", tool="Bash"),
        _ev(uid, 4, "error"),
        _ev(uid, 5, "assistant_msg"),
    ]
    d = build_digest(s, events)
    assert d["tool_histogram"] == {"Edit": 1, "Write": 1, "Bash": 1}
    assert d["markers"]["edits"] == 2
    assert d["markers"]["errors"] == 1
    assert d["markers"]["test_runs"] == 1
    assert d["event_count"] == 6
    assert d["cost"]["input_tokens"] == 100


def test_analyze_writes_digest_and_sets_analyzed(tmp_path):
    st = Store(str(tmp_path / "m.db"), str(tmp_path / "blobs"))
    uid = Session.make_uid("claude", "s1")
    s = Session(session_uid=uid, flavor="claude", native_session_id="s1")
    events = [
        SessionEvent(session_uid=uid, seq=0, kind="user_msg", payload_ref="b0"),
        SessionEvent(session_uid=uid, seq=1, kind="assistant_msg", payload_ref="b1"),
    ]
    blobs = {"b0": "please help", "b1": "done"}
    st.ingest(Normalized(session=s, events=events, blobs=blobs))
    assert st.get_session(uid).is_evictable is False
    d = analyze(st, uid)
    assert d["outcome"] == "success"
    assert d["first_prompt"] == "please help"
    assert st.get_session(uid).analyzed_at is not None
    assert st.get_session(uid).is_evictable is True  # now promoted -> evictable

101
tests/test_digest_errors.py Normal file

@@ -0,0 +1,101 @@
"""Error-body mining into the digest (WP-0006 T01)."""
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.core.digest import (  # noqa: E402
    _error_fingerprint,
    _error_snippets,
    build_digest,
)
from session_memory.core.schema import SCHEMA_VERSION, Session, SessionEvent  # noqa: E402


def _ev(seq, kind, **kw):
    return SessionEvent(session_uid="claude:s", seq=seq, kind=kind, **kw)


def test_fingerprint_normalizes_paths_numbers_ids():
    a = _error_fingerprint("ModuleNotFoundError: No module named 'foo' at /home/x/a.py:42")
    b = _error_fingerprint("ModuleNotFoundError: No module named 'foo' at /srv/y/b.py:9991")
    assert a == b  # paths + line numbers stripped -> same fingerprint
    assert "<path>" in a and "<n>" in a


def test_fingerprint_uuid_and_addr():
    fp = _error_fingerprint("connection 0xDEADBEEF to 1972d1d9-fc35-4912-8126-1fe64cc51425 failed")
    assert "<addr>" in fp and "<uuid>" in fp


def test_snippets_dedup_and_count():
    blobs = {"b1": "Traceback...\nValueError: bad thing at /p/x.py:10",
             "b2": "Traceback...\nValueError: bad thing at /q/y.py:99",
             "b3": "KeyError: 'id'"}
    events = [
        _ev(0, "error", payload_ref="b1"),
        _ev(1, "error", payload_ref="b2"),  # same fingerprint as b1
        _ev(2, "error", payload_ref="b3"),
    ]
    snips = _error_snippets(events, blobs)
    assert len(snips) == 2
    top = snips[0]
    assert top["count"] == 2  # the ValueError collapsed
    assert "ValueError" in top["sample"]


def test_failed_tool_result_mined():
    blobs = {"b1": "npm ERR! something failed with non-zero exit"}
    events = [_ev(0, "tool_result", tool="Bash", payload_ref="b1")]
    snips = _error_snippets(events, blobs)
    assert len(snips) == 1
    assert snips[0]["tool"] == "Bash"


def test_clean_tool_result_not_mined():
    blobs = {"b1": "6 passed in 0.4s"}
    events = [_ev(0, "tool_result", tool="Bash", payload_ref="b1")]
    assert _error_snippets(events, blobs) == []


def test_success_json_not_mined():
    # a hub MCP success payload mentioning 'error' deep inside is NOT a failure
    blobs = {"b1": '{"result": "{\\"domain\\": \\"custodian\\", \\"note\\": \\"no errors\\"}"}'}
    events = [_ev(0, "tool_result", tool="mcp__state-hub__get_domain_summary", payload_ref="b1")]
    assert _error_snippets(events, blobs) == []


def test_error_json_still_mined():
    blobs = {"b1": '{"detail": "Invalid request parameters"}'}
    events = [_ev(0, "tool_result", tool="Bash", payload_ref="b1")]
    snips = _error_snippets(events, blobs)
    assert len(snips) == 1


def test_plain_mcp_error_still_mined():
    blobs = {"b1": "MCP error -32602: Invalid request parameters"}
    events = [_ev(0, "tool_result", tool="Bash", payload_ref="b1")]
    assert len(_error_snippets(events, blobs)) == 1


def test_file_read_snapshot_not_mined():
    # a Read result of source code containing 'raise ...Error' is not a runtime error
    blobs = {"b1": "227\t def f():\n228\t x = 1\n229\t raise InfospaceError()\n"}
    events = [_ev(0, "tool_result", tool="Read", payload_ref="b1")]
    assert _error_snippets(events, blobs) == []


def test_build_digest_includes_error_snippets_and_v2():
    s = Session(session_uid="claude:s", flavor="claude", native_session_id="s", repo="r")
    events = [_ev(0, "user_msg"), _ev(1, "error", payload_ref="b1"), _ev(2, "assistant_msg")]
    d = build_digest(s, events, {"b1": "RuntimeError: kaboom at /a/b.py:3"})
    assert d["schema_version"] == SCHEMA_VERSION == 2
    assert d["error_snippets"][0]["count"] == 1
    assert "RuntimeError" in d["error_snippets"][0]["sample"]


def test_no_errors_empty_list():
    s = Session(session_uid="claude:s", flavor="claude", native_session_id="s", repo="r")
    d = build_digest(s, [_ev(0, "user_msg"), _ev(1, "assistant_msg")])
    assert d["error_snippets"] == []
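
The fingerprint tests above expect paths, line numbers, UUIDs and hex addresses to collapse into placeholders so that the same error from different locations deduplicates. A minimal sketch of that kind of normalization, assuming a regex-based approach; `sketch_fingerprint` and its patterns are hypothetical and are not the repository's `_error_fingerprint`:

```python
import re

# Hypothetical patterns, chosen only to illustrate the idea.
_UUID = re.compile(
    r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b", re.I)
_ADDR = re.compile(r"\b0x[0-9a-f]+\b", re.I)
_PATH = re.compile(r"(?:/[\w.\-]+)+")
_NUM = re.compile(r"\d+")


def sketch_fingerprint(text: str) -> str:
    """Replace volatile tokens with placeholders, then lowercase.

    UUIDs and addresses are substituted before bare numbers so their
    digits are not rewritten piecemeal.
    """
    out = _UUID.sub("<uuid>", text)
    out = _ADDR.sub("<addr>", out)
    out = _PATH.sub("<path>", out)
    out = _NUM.sub("<n>", out)
    return out.lower()
```

The substitution order matters in this sketch: running the number pattern first would break up a UUID before the UUID pattern could match it.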


@@ -0,0 +1,78 @@
"""digest_lookup entrypoint tests (AGENTIC-WP-0011 T03)."""
import json
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.core.store import Store  # noqa: E402
from session_memory.digest_lookup import lookup_digest, main, resolve_store_paths  # noqa: E402


def _write_config(tmp_path) -> tuple[str, str]:
    # returns (config file path, store directory path)
    store = tmp_path / ".store"
    toml = tmp_path / "config.toml"
    toml.write_text(
        f'[store]\ndb_path = "{store / "m.db"}"\nblob_dir = "{store / "blobs"}"\n'
        f'cursor = "{store / "c.json"}"\n')
    return str(toml), str(store)


def _seed(store_dir, uid="claude:test-uid"):
    st = Store(os.path.join(store_dir, "m.db"), os.path.join(store_dir, "blobs"))
    st.write_digest(uid, {
        "session_uid": uid,
        "flavor": "claude",
        "repo": "agentic-resources",
        "outcome": "success",
        "started_at": "2026-06-19T10:00:00Z",
        "ended_at": "2026-06-19T11:00:00Z",
        "cost": {"input_tokens": 100, "output_tokens": 25},
        "tool_histogram": {"Bash": 10, "Edit": 5},
    })
    st.close()
    return uid


def test_resolve_store_paths_from_config(tmp_path):
    cfg_path, store_dir = _write_config(tmp_path)
    db, blob = resolve_store_paths(config_path=cfg_path)
    assert db.endswith("m.db")
    assert blob.endswith("blobs")
    assert store_dir in db


def test_resolve_store_paths_from_env(tmp_path, monkeypatch):
    db = tmp_path / "custom" / "mem.db"
    db.parent.mkdir(parents=True)
    monkeypatch.setenv("HELIX_STORE_DB", str(db))
    resolved_db, blob = resolve_store_paths()
    assert resolved_db == str(db)
    assert blob == str(tmp_path / "custom" / "blobs")


def test_lookup_digest_found_and_missing(tmp_path):
    cfg_path, store_dir = _write_config(tmp_path)
    uid = _seed(store_dir)
    found = lookup_digest(uid, config_path=cfg_path)
    assert found is not None and found["outcome"] == "success"
    assert lookup_digest("claude:missing", config_path=cfg_path) is None


def test_main_json_success(tmp_path, capsys):
    cfg_path, store_dir = _write_config(tmp_path)
    uid = _seed(store_dir)
    rc = main(["--config", cfg_path, uid, "--json"])
    assert rc == 0
    data = json.loads(capsys.readouterr().out)
    assert data["session_uid"] == uid
    assert data["repo"] == "agentic-resources"


def test_main_not_found(tmp_path, capsys):
    cfg_path, store_dir = _write_config(tmp_path)
    _seed(store_dir)
    rc = main(["--config", cfg_path, "claude:missing"])
    assert rc == 1
    assert "not found" in capsys.readouterr().err.lower()
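
`test_resolve_store_paths_from_env` above implies that when `HELIX_STORE_DB` is set, the blob directory is taken to be a `blobs` folder next to the database file. A minimal sketch of just that fallback, assuming nothing about how the real `resolve_store_paths` ranks the config file against the environment; `sketch_paths_from_env` is a hypothetical helper:

```python
import os
from typing import Mapping, Optional, Tuple


def sketch_paths_from_env(environ: Mapping[str, str]) -> Optional[Tuple[str, str]]:
    """Derive (db_path, blob_dir) from HELIX_STORE_DB, or None if unset.

    Assumption for illustration: blob_dir is a sibling "blobs" directory
    of the database file.
    """
    db = environ.get("HELIX_STORE_DB")
    if not db:
        return None
    return db, os.path.join(os.path.dirname(db), "blobs")
```

Passing the environment as a mapping keeps the helper easy to test without patching `os.environ`; a caller would typically pass `os.environ` itself.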


@@ -0,0 +1,88 @@
"""Distributor base tests (WP-0007 T01): markers, idempotent upsert, rendering."""
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.curate.schema import Resolution, SolutionPattern  # noqa: E402
from session_memory.distribute.base import (  # noqa: E402
    Artifact,
    BaseDistributor,
    Distributor,
    render_markdown_body,
    upsert_block,
    wrap_block,
)


def _pattern(pid="sp-x", polarity="problem"):
    return SolutionPattern(
        id=pid, name="Read before edit", version="1.2.0", polarity=polarity,
        problem="Agents edit files they have not read.",
        resolutions=[Resolution(summary="Read the file first", detail="then Edit",
                                steps=["Read", "Edit"])],
        rendering_hints={"claude": {"target": "CLAUDE.md"}},
    )


def test_render_markdown_body_has_problem_and_resolution():
    body = render_markdown_body(_pattern())
    assert "### Read before edit" in body
    assert "Agents edit files" in body
    assert "**Avoid:**" in body  # problem polarity
    assert "- Read the file first — then Edit" in body
    assert " - Read" in body


def test_success_polarity_label():
    assert "**Prefer:**" in render_markdown_body(_pattern(polarity="success"))


def test_wrap_block_has_markers_and_version():
    block = wrap_block("sp-x", "hello", "1.2.0")
    assert block.startswith("<!-- BEGIN helix-forge pattern:sp-x --> v1.2.0")
    assert block.rstrip().endswith("<!-- END helix-forge pattern:sp-x -->")


def test_upsert_inserts_then_replaces_in_place():
    doc = "# Title\n\nsome text\n"
    b1 = wrap_block("sp-x", "first", "1")
    once = upsert_block(doc, "sp-x", b1)
    assert "first" in once and once.count("BEGIN helix-forge pattern:sp-x") == 1
    # re-distributing the same id replaces, does not duplicate
    b2 = wrap_block("sp-x", "second", "2")
    twice = upsert_block(once, "sp-x", b2)
    assert "second" in twice and "first" not in twice
    assert twice.count("BEGIN helix-forge pattern:sp-x") == 1


def test_upsert_keeps_other_patterns():
    doc = upsert_block("", "sp-a", wrap_block("sp-a", "A"))
    doc = upsert_block(doc, "sp-b", wrap_block("sp-b", "B"))
    assert "sp-a" in doc and "sp-b" in doc


def test_base_distributor_renders_artifact():
    d = BaseDistributor(flavor="claude", target_path="CLAUDE.md")
    art = d.render(_pattern())
    assert isinstance(art, Artifact)
    assert isinstance(d, Distributor)  # satisfies the protocol
    assert art.flavor == "claude"
    assert art.target_path == "CLAUDE.md"
    assert "BEGIN helix-forge pattern:sp-x" in art.content
    assert "Read before edit" in art.content


def test_body_hint_overrides_default():
    p = _pattern()
    p.rendering_hints["claude"]["body"] = "custom claude body"
    d = BaseDistributor(flavor="claude", target_path="CLAUDE.md")
    assert "custom claude body" in d.render(p).content


def test_target_hint_overrides_default():
    p = _pattern()
    p.rendering_hints["claude"]["target"] = "docs/CLAUDE.md"
    d = BaseDistributor(flavor="claude", target_path="CLAUDE.md")
    assert d.render(p).target_path == "docs/CLAUDE.md"
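
The upsert tests above describe an idempotent, marker-delimited update: a block is appended when its id is absent and replaced in place when it is already present. A minimal sketch of that technique, assuming the marker strings seen in the assertions; `sketch_wrap` and `sketch_upsert` are hypothetical stand-ins and may differ from the repository's `wrap_block` and `upsert_block`:

```python
import re


def _markers(pattern_id: str) -> tuple:
    begin = f"<!-- BEGIN helix-forge pattern:{pattern_id} -->"
    end = f"<!-- END helix-forge pattern:{pattern_id} -->"
    return begin, end


def sketch_wrap(pattern_id: str, body: str, version: str = "") -> str:
    """Surround body with BEGIN/END markers; version follows the BEGIN marker."""
    begin, end = _markers(pattern_id)
    head = f"{begin} v{version}" if version else begin
    return f"{head}\n{body}\n{end}\n"


def sketch_upsert(doc: str, pattern_id: str, block: str) -> str:
    """Replace the existing block for pattern_id, or append it if missing."""
    begin, end = _markers(pattern_id)
    existing = re.compile(
        re.escape(begin) + r".*?" + re.escape(end) + r"\n?", re.DOTALL)
    if existing.search(doc):
        # A function replacement avoids backslash processing in block text.
        return existing.sub(lambda _m: block, doc, count=1)
    if not doc:
        return block
    sep = "\n" if doc.endswith("\n") else "\n\n"
    return doc + sep + block
```

Because the closing ` -->` is part of each marker, an id such as `sp-x` does not accidentally match a block for `sp-x2` in this sketch.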


@@ -0,0 +1,40 @@
"""Claude distributor tests (WP-0007 T02)."""
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.curate.schema import Resolution, SolutionPattern  # noqa: E402
from session_memory.distribute.claude import ClaudeDistributor  # noqa: E402


def _pattern(hints=None):
    return SolutionPattern(
        id="sp-read-before-edit", name="Read before edit", version="1.0.0",
        polarity="problem", problem="Agents edit files they have not read.",
        resolutions=[Resolution(summary="Read the file first", steps=["Read", "Edit"])],
        rendering_hints=hints or {"claude": {}},
    )


def test_default_targets_claude_md():
    art = ClaudeDistributor().render(_pattern())
    assert art.flavor == "claude"
    assert art.target_path == "CLAUDE.md"
    assert "BEGIN helix-forge pattern:sp-read-before-edit" in art.content
    assert "### Read before edit" in art.content


def test_skill_mode_emits_skill_stub():
    art = ClaudeDistributor().render(_pattern({"claude": {"as": "skill"}}))
    assert "## Skill: Read before edit" in art.content
    assert "**When:**" in art.content
    assert " - Read" in art.content


def test_idempotent_marker_present_for_reupsert():
    art = ClaudeDistributor().render(_pattern())
    # same id in both renders -> caller can upsert in place
    art2 = ClaudeDistributor().render(_pattern())
    assert art.pattern_id == art2.pattern_id == "sp-read-before-edit"


@@ -0,0 +1,49 @@
"""Codex + Grok distributor + registry tests (WP-0007 T03)."""
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.curate.schema import Resolution, SolutionPattern  # noqa: E402
from session_memory.distribute.codex import CodexDistributor  # noqa: E402
from session_memory.distribute.grok import GrokDistributor  # noqa: E402
from session_memory.distribute.registry import all_flavors, get_distributor  # noqa: E402


def _pattern():
    return SolutionPattern(
        id="sp-x", name="Read before edit", version="1.0.0", polarity="problem",
        problem="Agents edit files they have not read.",
        resolutions=[Resolution(summary="Read the file first")],
    )


def test_codex_targets_agents_md():
    art = CodexDistributor().render(_pattern())
    assert art.flavor == "codex" and art.target_path == "AGENTS.md"
    assert "Read before edit" in art.content


def test_grok_targets_native_instructions():
    art = GrokDistributor().render(_pattern())
    assert art.flavor == "grok" and art.target_path == ".grok/instructions.md"


def test_same_pattern_expressible_for_all_flavors():
    # FR-A3: one pattern, rendered for every flavor (same body, different targets)
    p = _pattern()
    bodies = {}
    for f in all_flavors():
        art = get_distributor(f).render(p)
        # strip markers -> compare agnostic body
        inner = art.content.split("\n", 1)[1].rsplit("\n", 1)[0]
        bodies[f] = inner
    targets = {get_distributor(f).render(p).target_path for f in all_flavors()}
    assert len(targets) == 3  # distinct per-flavor targets
    assert len(set(bodies.values())) == 1  # identical agnostic body


def test_registry_unknown_flavor():
    assert get_distributor("gpt") is None
    assert set(all_flavors()) == {"claude", "codex", "grok"}

Some files were not shown because too many files have changed in this diff.