33 Commits

Author SHA1 Message Date
06bcfdc1d9 session-memory: refresh published retro report artifacts
Latest retro publish (30-day window) regenerated last_retro.{json,md} — 30
ranked suggestions across 13 repos with catalog-sourced recommendations. This is
the read model published to the hub to unblock activity-core ACTIVITY-WP-0008.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 21:48:18 +02:00
e237dcc622 session-memory: map signals to catalog recommendations via covers (WP-0010 follow-up)
Closes the gap where recurring_error suggestions showed a generic 'Investigate'
instead of the curated recommendation. Added a covers[] field to SolutionPattern
(lowercase substrings a pattern's recommendation also applies to) + Catalog.find_for
(exact key first, then covers match against signal key+locus). Retro now resolves
recommendations through find_for. Tagged the read-before-edit pattern with
covers=['file has not been read','modified since read','file_not_read'] (v1.0.1).
Live: file-not-read suggestions across all repos now inherit 'Read the file before
Edit/Write'. 6 new tests; suite 158/158.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 21:09:44 +02:00
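A minimal sketch of the find_for lookup described in e237dcc622: exact key first, then a covers[] substring match against the signal key + locus. The SolutionPattern and Catalog shapes here are simplified stand-ins, not the real classes, and the first-match-wins order is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SolutionPattern:
    # Simplified stand-in: only the fields this lookup needs.
    key: str
    recommendation: str
    covers: list = field(default_factory=list)  # lowercase substrings


class Catalog:
    def __init__(self, patterns):
        self._by_key = {p.key: p for p in patterns}

    def find_for(self, signal_key: str, locus: str = "") -> Optional[SolutionPattern]:
        """Exact key match first, then the first pattern whose covers[] hits."""
        exact = self._by_key.get(signal_key)
        if exact is not None:
            return exact
        haystack = f"{signal_key} {locus}".lower()
        for pattern in self._by_key.values():  # insertion order: first match wins
            if any(sub in haystack for sub in pattern.covers):
                return pattern
        return None


catalog = Catalog([
    SolutionPattern(
        key="sp-problem-file_not_read-edit",
        recommendation="Read the file before Edit/Write",
        covers=["file has not been read", "modified since read", "file_not_read"],
    ),
])
hit = catalog.find_for("recurring_error", "Edit: File has not been read yet")
```

A covers match only runs when no exact key exists, so a pattern keyed on the signal itself always wins.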
0d05dfcc5d session-memory: weekly retro entrypoint + hub publish (AGENTIC-WP-0010)
The analysis half of the weekly coding retrospection. retro/build.py: windowed
detect+measure -> top-3 improvement suggestions per repo (cross-flavor first,
recommendations pulled from the Pattern Catalog) + fleet snapshot. retro/publish.py:
publishes the report to the hub as the coding_retro read model (a progress event
with event_type=coding_retro) + local JSON/md, degrading gracefully. retro
entrypoint with --window-days/--publish/--json. Live verification over real
sessions surfaced per-repo suggestions with catalog recommendations. 13 new tests;
suite 152/152.

Consumed by activity-core ACTIVITY-WP-0008 (Weekly Coding Retrospection, Sat 19:00).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 19:17:24 +02:00
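The per-repo ranking in 0d05dfcc5d (top 3 per repo, cross-flavor first) might look like the sketch below. The Suggestion fields and the cost-impact tie-break are assumptions for illustration, not the actual retro/build.py records.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Suggestion:
    # Hypothetical fields; the real retro records may differ.
    repo: str
    title: str
    cross_flavor: bool
    cost_impact: float


def top_suggestions(suggestions, per_repo=3):
    """Group by repo, then rank cross-flavor first and by cost impact (descending)."""
    by_repo = defaultdict(list)
    for s in suggestions:
        by_repo[s.repo].append(s)
    return {
        # not cross_flavor sorts True-first; -cost_impact sorts highest cost first.
        repo: sorted(items, key=lambda s: (not s.cross_flavor, -s.cost_impact))[:per_repo]
        for repo, items in sorted(by_repo.items())
    }
```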
15ba625351 session-memory: fill real resolutions into auto-approved catalog stubs
Replaced the placeholder 'TODO: capture the recommended resolution' in the five
auto-approved patterns with grounded problem descriptions + concrete resolutions
drawn from the friction assessment: budget_overrun (read narrowly / checkpoint),
infra_overhead (batch hub writes / orient once), schema_thrash (front-load tool
schemas), tool_thrash (batch shell + larger edits), clean_pass (tests gate done).
Each versioned 1.0.0 -> 1.0.1 with the stub archived to <id>.history.jsonl.
Proposals regenerate with real content (0 TODO). Suite 139/139.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 16:26:56 +02:00
4f28cd67cf session-memory: Phase 4 Measure — baseline, effectiveness, trend (WP-0009)
Closes the loop. metrics.py: fleet metrics (infra-overhead share, error rate,
schema-thrash, token percentiles, success) + persisted baseline trend. effect.py:
before/after per-pattern effectiveness with an improved verdict per metric.
measure entrypoint with trend + --since effectiveness + JSON. Recorded pre-fix
baseline: 27 sessions, overhead median 11.7%, error rate 0.96, schema-thrash 8.
13 new tests; suite 139/139. Capture->Detect->Curate->Distribute->Measure complete.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 15:49:22 +02:00
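A sketch of the before/after effectiveness verdict from 4f28cd67cf. The commit only names the metrics and a per-metric improved verdict; comparing medians and the lower-is-better direction assigned to each metric are assumptions, and effect.py may aggregate differently.

```python
from statistics import median

# Assumed direction per metric: True means a lower value is an improvement.
LOWER_IS_BETTER = {
    "infra_overhead": True,
    "error_rate": True,
    "schema_thrash": True,
    "success": False,
}


def effectiveness(before, after):
    """Per-metric medians before/after a pattern landed, with an improved flag."""
    report = {}
    for metric, lower_better in LOWER_IS_BETTER.items():
        b, a = median(before[metric]), median(after[metric])
        improved = a < b if lower_better else a > b  # ties count as not improved
        report[metric] = {"before": b, "after": a, "improved": improved}
    return report
```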
035c7a20d3 session-memory: Read-before-Edit reflex + curated pattern (WP-0008)
Acts on the #1 friction finding. T01: added a data-cited Read-before-Edit /
re-read-on-stale reflex to AGENTS.md (top error: 'File has not been read yet',
12/27 sessions). T02: captured it as a curated SolutionPattern
(sp-problem-file_not_read-edit, approved/distribution_ready) with real
resolutions + per-flavor hints, so Distribute proposes it across repos/flavors —
closing assess->curate->distribute on a real pattern. Suite 126/126.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 15:27:22 +02:00
59632e94db session-memory: distribute entrypoint + live verify (WP-0007 T05)
python -m session_memory.distribute: reads approved catalog patterns, builds
targets from repo->domain map x flavors, renders scoped per-flavor proposals
(HITL) + active registry. Live verify against the real catalog: 12 renders
across 5 repos, idempotent, provisional skipped. proposals/ gitignored
(regenerated); active_patterns.json committed. README documents detect->curate->
distribute. Phase 3 finished; suite 126/126.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 15:25:20 +02:00
00e8958540 session-memory: scoping + proposals + active registry (WP-0007 T04)
distribute/proposals.py: Scope-aware targeting (FR-X2, empty axis = any), render
distributable (approved+distribution_ready) patterns into a proposals/ tree
mirroring target paths — proposed not applied (FR-X3, HITL), idempotent on re-run.
ActiveRegistry (FR-X4) records which pattern+version is proposed in which
(repo,flavor). 6 new tests; suite 123/123.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 15:09:40 +02:00
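The FR-X2 rule in 00e8958540 (empty axis = any) reduces to a small predicate. The expansion over a repo->domain map x flavors follows the commit wording; the Scope axis names and helper functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    # Hypothetical axis names; an empty tuple on an axis means "any".
    repos: tuple = ()
    domains: tuple = ()
    flavors: tuple = ()


def axis_ok(allowed, value):
    return not allowed or value in allowed


def in_scope(scope, repo, domain, flavor):
    return (axis_ok(scope.repos, repo)
            and axis_ok(scope.domains, domain)
            and axis_ok(scope.flavors, flavor))


def targets(scope, repo_domains, flavors):
    """Expand repo->domain map x flavors, keeping only in-scope (repo, flavor) pairs."""
    return [(repo, flavor)
            for repo, domain in sorted(repo_domains.items())
            for flavor in flavors
            if in_scope(scope, repo, domain, flavor)]
```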
9e28b1b806 session-memory: Claude + Codex + Grok distributors + registry (WP-0007 T02/T03)
Thin per-flavor distributors over the shared base: Claude (CLAUDE.md, optional
skill-stub mode), Codex (AGENTS.md), Grok (.grok/instructions.md). registry maps
flavor->distributor — adding a flavor is one entry + one module. Same agnostic
body renders to distinct per-flavor targets (FR-A3). 7 new tests; suite 117/117.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 15:06:15 +02:00
7646cbc358 session-memory: distributor base + Artifact (WP-0007 T01)
distribute/base.py: Artifact dataclass + Distributor protocol + idempotent
BEGIN/END snippet markers (upsert_block replaces a pattern's block in place so
re-distribution doesn't duplicate) + agnostic markdown body rendering from
SolutionPattern fields. BaseDistributor honours per-flavor body/target hints.
8 new tests; suite 110/110.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 15:02:47 +02:00
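The idempotent BEGIN/END upsert in 7646cbc358 can be sketched with a non-greedy regex over the marked block. The HTML-comment marker format is an assumption; the real markers are defined in distribute/base.py.

```python
import re


def markers(pattern_id):
    # Hypothetical marker format for illustration only.
    return (f"<!-- BEGIN session-memory:{pattern_id} -->",
            f"<!-- END session-memory:{pattern_id} -->")


def upsert_block(text, pattern_id, body):
    """Replace this pattern's BEGIN/END block in place, or append it once."""
    begin, end = markers(pattern_id)
    block = f"{begin}\n{body.strip()}\n{end}"
    existing = re.compile(re.escape(begin) + r".*?" + re.escape(end), re.DOTALL)
    if existing.search(text):
        # A callable replacement keeps backslashes in the body literal.
        return existing.sub(lambda _match: block, text, count=1)
    prefix = text.rstrip("\n")
    return f"{prefix}\n\n{block}\n" if prefix else f"{block}\n"
```

Re-running with the same body returns the text unchanged, which is what makes re-distribution safe.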
1b6081cd88 session-memory: denoise error fingerprints (WP-0006 follow-up)
Tighten _is_failed: exclude successful hub JSON responses (top-level no-error
payloads) and file-read snapshots (numbered cat -n source lines) that were
polluting error_snippets. JSON verdict classifies error vs success payloads
directly. Cuts distinct fingerprints 444 -> 269 (~40%) over the real corpus with
the top errors unchanged. Assessment caveat updated. 5 new tests; suite 102/102.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 13:39:08 +02:00
e022c0f9d6 session-memory: recurring-error signal + clustering (WP-0006 T02)
detect/signals.py: sig_recurring_error emits one signal per distinct error
fingerprint per session (magnitude = in-session occurrences), so the same error
recurring across sessions/repos/flavors clusters into a candidate root-cause
problem pattern via the existing clusterer — cross-flavor flagged automatically.
3 new tests; suite 98/98 green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 13:01:29 +02:00
97379e9658 session-memory: error-body mining into digest (WP-0006 T01)
build_digest now extracts normalized error fingerprints + samples from failed
events (error kind + failing tool_result bodies) into a durable error_snippets
list — paths/numbers/uuids/addrs stripped so the same error collapses to one
fingerprint with a count; Python traceback header skipped in favour of the real
exception line. Durable in Tier 2 (survives Tier 1 eviction). SCHEMA_VERSION ->
2 (re-ingest needed to populate). 7 new tests; suite 95/95 green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 12:45:01 +02:00
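A sketch of the error fingerprinting in 97379e9658: take the real exception line (skipping a Python traceback header), strip volatile tokens, and hash what is left. The specific regexes, placeholder tokens and truncated SHA-1 are assumptions, not the rules build_digest actually uses.

```python
import hashlib
import re

# Hypothetical normalization rules, applied in order (uuids/addresses before bare numbers).
RULES = [
    (re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b", re.I), "<uuid>"),
    (re.compile(r"\b0x[0-9a-f]+\b", re.I), "<addr>"),
    (re.compile(r"(?:/[\w.@-]+){2,}"), "<path>"),  # two or more /segments
    (re.compile(r"\d+"), "<n>"),
]


def exception_line(text):
    """Skip a Python traceback header in favour of the final exception line."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines:
        return ""
    if lines[0].startswith("Traceback (most recent call last)"):
        return lines[-1]
    return lines[0]


def fingerprint(text):
    """Return (normalized line, short hash) so repeats collapse to one key."""
    line = exception_line(text)
    for pattern, token in RULES:
        line = pattern.sub(token, line)
    return line, hashlib.sha1(line.encode()).hexdigest()[:12]
```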
48618293b0 session-memory: friction assessment + hardened catalog (WP-0005 T03)
Re-ran ingest->detect with the quality filter + infra signals over real local
sessions (72 captured -> 27 real). Purged the false-positive 'abandoned' catalog
entry and re-curated; catalog now carries tool_thrash/schema_thrash/infra_overhead
patterns. docs/ASSESSMENT-infra-friction.md ranks the friction: ~17.6% of real
tool activity is hub/task/schema plumbing (State Hub 10.3%, one session 231 calls;
ToolSearch in 81% of sessions). Validates the CLI/MCP-skill hypothesis as top-2;
recommends a State Hub skill (front-load schemas + batched writes) + bulk hub ops.
Workplan finished; suite 88/88.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 11:18:27 +02:00
21c714e286 session-memory: infra-overhead + thrash signals (WP-0005 T02)
signals.py: tool_bucket helper + three tool_histogram-based extractors for friction
the outcome/marker signals were blind to: sig_infra_overhead (hub+task+schema share
of tool calls over threshold), sig_schema_thrash (repeated ToolSearch), and
sig_tool_thrash (one tool dominating). Thresholds live in build_context. 8 new tests;
suite 88/88 green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 11:12:09 +02:00
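The three histogram-based signals from 21c714e286, sketched over a single session's tool histogram. The tool->bucket mapping, the "hub_" prefix and all three thresholds are hypothetical; the real values come from tool_bucket and build_context.

```python
from dataclasses import dataclass

# Hypothetical mapping and bucket set for illustration only.
SCHEMA_TOOLS = {"ToolSearch"}
TASK_TOOLS = {"TaskCreate", "TaskUpdate"}
INFRA_BUCKETS = {"hub", "task", "schema"}


def tool_bucket(tool):
    if tool.startswith("hub_"):
        return "hub"
    if tool in TASK_TOOLS:
        return "task"
    if tool in SCHEMA_TOOLS:
        return "schema"
    return "work"


@dataclass
class Signal:
    type: str
    locus: str
    magnitude: float


def tool_signals(hist, overhead_share=0.25, max_toolsearch=3, dominance=0.6):
    """Derive infra_overhead / schema_thrash / tool_thrash from one tool histogram."""
    total = sum(hist.values())
    if total == 0:
        return []
    out = []
    infra = sum(n for tool, n in hist.items() if tool_bucket(tool) in INFRA_BUCKETS)
    if infra / total > overhead_share:
        out.append(Signal("infra_overhead", "hub/task/schema", infra / total))
    searches = hist.get("ToolSearch", 0)
    if searches > max_toolsearch:
        out.append(Signal("schema_thrash", "ToolSearch", float(searches)))
    top_tool, top_n = max(hist.items(), key=lambda kv: kv[1])
    if top_n / total > dominance:
        out.append(Signal("tool_thrash", top_tool, top_n / total))
    return out
```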
70433cda61 session-memory: session-quality filter (WP-0005 T01)
detect/quality.py: is_real_coding_session drops health-checks / smoke-tests /
interrupted / trivially-short sessions (event floor, repo present, substantive
tool activity, non-trivial prompt). Wired into run_detect so signals only form
over real sessions, fixing the 'abandoned' false positive. [detect.quality] knobs;
existing detect/curate fixtures made realistic. 8 new tests; suite 80/80.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 11:07:22 +02:00
d06791f070 session-memory Phase 2: verify + catalog artifacts (T07)
End-to-end verification over real local sessions: ingest 94->93 -> 72 digests;
detect 3 candidates (2 cross-flavor); curate --auto-approve cataloged 3
SolutionPatterns (2 cross-flavor approved/distribution_ready, 1 Claude-only),
re-run fully idempotent, 3 hub decisions queued (API offline). Commits the 3
catalog artifacts as the source of truth. PRD §12 OQ4/OQ5/OQ6 marked resolved;
README + design refreshed. Workplan finished; suite 72/72.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 10:08:52 +02:00
519e76442a session-memory Phase 2: curate entrypoint + README (T06)
python -m session_memory.curate: refreshes detect candidates, then drives them
through review interactively or with --auto-approve (batch, gate-driven) /
--json. Emits a catalog diff summary; queues hub decisions when offline.
[curate] config gains decision_queue + workstream id. README documents the
detect -> curate -> distribute flow and the gate knobs. 2 new tests; suite 72/72.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 10:00:56 +02:00
4b7a628b6f session-memory Phase 2: hub decision integration (T05)
decisions.py: every final promote/reject becomes a record_decision-shaped
payload (rationale + source key + evidence snapshot). DecisionRecorder degrades
gracefully under a hub outage — pluggable sink with a durable local-queue
fallback and ordered flush/replay (mirrors Phase 1's after-the-fact sync).
Wired into review() via an optional recorder. 6 new tests; suite 70/70 green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 00:31:22 +02:00
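A sketch of the graceful-degrade recorder in 4b7a628b6f: try the sink, fall back to a durable append-only queue, and replay in order on flush. Treating ConnectionError as the outage signal, the JSONL queue layout, and queueing new decisions behind pending ones to keep ordering are all assumptions.

```python
import json
from pathlib import Path


class DecisionRecorder:
    """Send decisions to a sink; on outage, append them to a local JSONL queue."""

    def __init__(self, sink, queue_path):
        self.sink = sink  # callable(payload); assumed to raise ConnectionError when the hub is down
        self.queue_path = Path(queue_path)

    def _queued(self):
        if not self.queue_path.exists():
            return []
        return [json.loads(line) for line in self.queue_path.read_text().splitlines() if line]

    def _enqueue(self, payload):
        with self.queue_path.open("a") as fh:
            fh.write(json.dumps(payload, sort_keys=True) + "\n")

    def record(self, payload):
        """Return True if delivered now, False if queued for later replay."""
        if self._queued():
            self._enqueue(payload)  # never jump ahead of already-queued decisions
            return False
        try:
            self.sink(payload)
            return True
        except ConnectionError:
            self._enqueue(payload)
            return False

    def flush(self):
        """Replay queued decisions in order; stop at the first failure."""
        pending = self._queued()
        sent = 0
        for payload in pending:
            try:
                self.sink(payload)
            except ConnectionError:
                break
            sent += 1
        rest = pending[sent:]
        self.queue_path.write_text("".join(json.dumps(p, sort_keys=True) + "\n" for p in rest))
        return sent
```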
ab22d22bfb session-memory Phase 2: evidence-bar + bloat guard (T04)
gating.py: two-tier evidence bar (OQ5) — promote floor (frequency/sessions/
cost_impact) plus a stricter distribution-eligibility floor that sets a
promoted pattern to approved+distribution_ready vs provisional. Wired into
review() so thin approvals land provisional. bloat_warnings flags duplicate
and near-duplicate (same signal-type+locus) candidates (OQ6). [curate]/
[curate.gate] knobs in config.toml. 6 new tests; suite 64/64 green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 00:28:34 +02:00
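The two-tier evidence bar and bloat guard from ab22d22bfb, sketched with made-up floors. The Gate fields mirror the frequency/sessions/cost_impact criteria named in the commit; the numbers, status strings and helper names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    # Hypothetical floors; the real knobs live under [curate.gate] in config.toml.
    min_frequency: int
    min_sessions: int
    min_cost_impact: float


PROMOTE = Gate(min_frequency=3, min_sessions=2, min_cost_impact=0.0)
DISTRIBUTE = Gate(min_frequency=6, min_sessions=4, min_cost_impact=1.0)


def passes(gate, ev):
    return (ev["frequency"] >= gate.min_frequency
            and ev["sessions"] >= gate.min_sessions
            and ev["cost_impact"] >= gate.min_cost_impact)


def gate_status(ev):
    """Below the promote floor: not promotable; above both floors: distributable."""
    if not passes(PROMOTE, ev):
        return "below_promote_floor"
    if passes(DISTRIBUTE, ev):
        return "approved+distribution_ready"
    return "provisional"  # thin approvals land here


def bloat_warnings(candidates):
    """Flag (signal_type, locus) pairs shared by more than one candidate."""
    counts = Counter((c["signal_type"], c["locus"]) for c in candidates)
    return sorted(key for key, n in counts.items() if n > 1)
```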
e51fd8154d session-memory Phase 2: review workflow (T03)
UI-free discuss/approve/reject engine driving detect candidates into the
catalog via a decide callback. candidate_to_pattern builds a provisional
SolutionPattern with per-flavor rendering-hint stubs. ReviewLog makes
re-review idempotent: prior rejects remembered, re-surfaced only when the
evidence fingerprint changes. 6 new tests; suite 58/58 green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 00:25:10 +02:00
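The re-review rule in e51fd8154d (prior rejects stay hidden until the evidence fingerprint changes) can be sketched with an in-memory log. Hashing the JSON-serialized evidence with sorted keys is an assumption about how the fingerprint is built; the real ReviewLog is persisted.

```python
import hashlib
import json


def evidence_fingerprint(evidence):
    """Stable hash of an evidence snapshot; sort_keys makes key order irrelevant."""
    return hashlib.sha256(json.dumps(evidence, sort_keys=True).encode()).hexdigest()


class ReviewLog:
    """In-memory stand-in for the persisted review log."""

    def __init__(self):
        self._rejected = {}  # candidate key -> evidence fingerprint at rejection

    def reject(self, key, evidence):
        self._rejected[key] = evidence_fingerprint(evidence)

    def should_surface(self, key, evidence):
        """New candidates always surface; prior rejects only if evidence changed."""
        prior = self._rejected.get(key)
        return prior is None or prior != evidence_fingerprint(evidence)
```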
c6164a82ba session-memory Phase 2: versioned Pattern Catalog store (T02)
Files-first catalog (one JSON per pattern, id = source-key). Single
idempotent upsert path: added / unchanged / updated (status-only, no bump) /
versioned (content change bumps semver + archives prior to <id>.history.jsonl).
Dedup is structural on pattern id. 5 new tests; suite 52/52 green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 00:18:01 +02:00
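The single upsert path from c6164a82ba (added / unchanged / updated / versioned), sketched over plain dicts. A dict stands in for the one-JSON-per-pattern files and a list for <id>.history.jsonl; treating every field except version and status as content, and bumping the patch number (matching the 1.0.0 -> 1.0.1 bumps in 15ba625351), are assumptions.

```python
def bump_semver(version, part="patch"):
    """Bump a MAJOR.MINOR.PATCH string."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


def _content(pattern):
    # Everything except bookkeeping fields counts as content.
    return {k: v for k, v in pattern.items() if k not in ("version", "status")}


def upsert(store, history, pattern):
    """Return 'added' | 'unchanged' | 'updated' | 'versioned' for one pattern dict."""
    pid = pattern["id"]
    current = store.get(pid)
    if current is None:
        store[pid] = {**pattern, "version": pattern.get("version", "1.0.0")}
        return "added"
    if _content(current) == _content(pattern):
        if current["status"] == pattern["status"]:
            return "unchanged"
        store[pid] = {**current, "status": pattern["status"]}  # status-only: no bump
        return "updated"
    history.setdefault(pid, []).append(current)  # archive the prior version
    store[pid] = {**pattern, "version": bump_semver(current["version"])}
    return "versioned"
```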
5f810a6992 session-memory Phase 2: Solution Pattern schema (T01)
Curate package scaffold + flavor-agnostic SolutionPattern artifact with
separate per-flavor rendering hints (OQ4): Resolution/Scope/Provenance
sub-records, stable source-key id, semver bump helper, deterministic
round-trip serialization. 7 new tests; suite 47/47 green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 00:16:46 +02:00
055713aa4f session-memory Phase 1: T08 verify across all three flavors + docs
Marks AGENTIC-WP-0003 finished. Full suite 40/40 green; live pipeline
over real local sessions (Codex via fixtures) surfaced 3 candidate
patterns, 2 cross-flavor (Claude+Grok) — PRD success metric met.
README documents the detect entrypoint and Phase 0/1/next status.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 23:39:37 +02:00
436a96dcd8 session-memory Phase 1: Detect pipeline (T04-T07)
- detect/signals.py: pure extractors over digests (retry storm, repeated
  errors, budget overrun vs corpus p90, abandoned, clean pass, recovery)
- detect/cluster.py: deterministic clustering into candidate Patterns with
  evidence (sessions/repos/flavors/cost impact) + cross-flavor flagging
- detect/__main__.py: python -m session_memory.detect, ranked report
  (cross-flavor first) + --json; persists candidates to Tier 2 patterns table
- core/store.py: list_digests + save_patterns
- tests for signals, cluster, detect entrypoint

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 22:31:13 +02:00
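Two pieces of the Detect pipeline in 436a96dcd8, sketched: a budget-overrun check against the corpus p90, and deterministic clustering by (signal type, locus) with cross-flavor flagging. Nearest-rank percentiles, the "tokens"/"cost" field names and the grouping key are assumptions, not the actual signals.py/cluster.py logic.

```python
import math


def nearest_rank(values, q):
    """Nearest-rank percentile, q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def budget_overruns(digests, q=90):
    """Session uids whose token total exceeds the corpus percentile q."""
    cutoff = nearest_rank([d["tokens"] for d in digests], q)
    return [d["session_uid"] for d in digests if d["tokens"] > cutoff]


def cluster(signals):
    """Group signals by (type, locus) into candidates, flagging cross-flavor ones."""
    groups = {}
    for s in sorted(signals, key=lambda s: (s["type"], s["locus"], s["session_uid"])):
        g = groups.setdefault((s["type"], s["locus"]), {
            "sessions": set(), "repos": set(), "flavors": set(), "cost_impact": 0.0})
        g["sessions"].add(s["session_uid"])
        g["repos"].add(s["repo"])
        g["flavors"].add(s["flavor"])
        g["cost_impact"] += s.get("cost", 0.0)
    return [
        {"type": t, "locus": loc, "sessions": len(g["sessions"]),
         "repos": sorted(g["repos"]), "flavors": sorted(g["flavors"]),
         "cost_impact": g["cost_impact"], "cross_flavor": len(g["flavors"]) > 1}
        for (t, loc), g in sorted(groups.items())
    ]
```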
06767ef924 session-memory Phase 1: Grok adapter (T02)
- adapters/grok.py: reads the per-session dir (summary.json + chat_history.jsonl
  + events.jsonl + updates.jsonl); conversation from chat_history, lifecycle/
  turn from events, tool-call names paired in order from updates ACP stream
- registered in ingest dispatch; codex+grok sources enabled in config.toml
- tests/test_grok_adapter.py (synthetic + real local sessions)
- live multi-flavor dry-run discovers 89 sessions across flavors

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 22:12:30 +02:00
bc11cb9aec session-memory Phase 1: Codex adapter (T01) + multi-file merge (T03)
- adapters/common.py: shared Normalized + helpers (resolve_repo, classify_tool,
  jsonl iter, etc.); claude.py refactored to use it (Normalized re-exported)
- adapters/codex.py: rollout {timestamp,type,payload} parser; session_meta/
  response_item/event_msg mapping; flat call_id join; token_count cost;
  registered in ingest dispatch
- core/store.py: ingest() now merges multi-file sessions by content
  fingerprint, appends new events with offset seq (design OQ6); idempotent
- tests/test_codex_adapter.py, tests/test_merge.py

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 21:55:32 +02:00
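One possible reading of the multi-file merge in bc11cb9aec: deduplicate incoming events by a content fingerprint and append only unseen ones, with seq numbers offset past the stored tail. Hashing every field except seq is an assumption; the real fingerprint may cover different fields.

```python
import hashlib
import json


def event_fingerprint(event):
    """Hash the event content, ignoring its file-local seq number."""
    payload = {k: v for k, v in event.items() if k != "seq"}
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()


def merge_events(stored, incoming):
    """Append only events not already stored, renumbering seq after the last one."""
    seen = {event_fingerprint(e) for e in stored}
    next_seq = max((e["seq"] for e in stored), default=-1) + 1
    merged = list(stored)
    for event in incoming:
        fp = event_fingerprint(event)
        if fp in seen:
            continue
        merged.append({**event, "seq": next_seq})
        seen.add(fp)
        next_seq += 1
    return merged
```

Merging the same file twice adds nothing the second time, which keeps ingest idempotent.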
7c6f4358ee session-memory Phase 0: end-to-end verification + docs (T07)
- verified full sweep over 85 real local Claude transcripts: 63 sessions
  ingested+analyzed, eviction under tiny cap freed 26MB with zero data loss,
  digest-preservation invariant holds, idempotent re-run
- session_memory/README.md: usage, scheduling, retention knobs
- design doc: OQ4 resolved (median ~49KB/session), OQ6 (multi-file sessions)
- workplan AGENTIC-WP-0002 finished

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 21:44:46 +02:00
586ed90948 session-memory Phase 0: ingest cursor + sweep entrypoint + config (T06)
- session_memory/core/cursor.py: size/mtime change detection sidecar
- session_memory/config.toml: store paths, retention caps, per-source
  globs (claude on, codex/grok off until Phase 1), repo->domain map
- session_memory/ingest.py: discover->normalize->store->digest->evict;
  --dry-run creates/writes nothing; python -m session_memory.ingest
- tests/test_ingest.py; live dry-run parsed 84/85 real local sessions

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 21:41:59 +02:00
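The size/mtime change-detection sidecar from 586ed90948 can be sketched as a JSON file keyed by source path. The Cursor name, the sidecar layout and the use of st_mtime_ns are assumptions; a dry run would simply skip save().

```python
import json
import os
from pathlib import Path


class Cursor:
    """Sidecar JSON remembering [size, mtime_ns] per source file."""

    def __init__(self, sidecar):
        self.sidecar = Path(sidecar)
        self.state = json.loads(self.sidecar.read_text()) if self.sidecar.exists() else {}

    def changed(self, path):
        st = os.stat(path)
        return self.state.get(str(path)) != [st.st_size, st.st_mtime_ns]

    def mark(self, path):
        st = os.stat(path)
        self.state[str(path)] = [st.st_size, st.st_mtime_ns]

    def save(self):
        self.sidecar.write_text(json.dumps(self.state, sort_keys=True))
```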
451fb8f1f3 session-memory Phase 0: budget-based retention sweep (T05)
- session_memory/core/retention.py: RetentionConfig + sweep() with backstop,
  budget (oldest-analyzed-first, never touches un-analyzed), and hard-cap
  overflow (analyze now, then last-resort eviction reported as data_loss); EvictionReport
- tests/test_retention.py covers all four branches

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 21:37:40 +02:00
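The budget branch of the sweep in 451fb8f1f3, sketched on its own: evict raw bodies oldest-analyzed-first and never touch un-analyzed sessions. The RawSession/EvictionReport fields are simplified stand-ins; the backstop branch is omitted, and a still_over_budget result is where the hard-cap path described in the commit would take over.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RawSession:
    uid: str
    size: int                     # Tier 1 raw bytes
    analyzed_at: Optional[float]  # None: digest not built yet, so never evict


@dataclass
class EvictionReport:
    evicted: list = field(default_factory=list)
    freed: int = 0
    still_over_budget: bool = False


def budget_sweep(sessions, budget):
    """Evict raw bodies oldest-analyzed-first until Tier 1 usage fits the budget."""
    report = EvictionReport()
    usage = sum(s.size for s in sessions)
    analyzed = sorted((s for s in sessions if s.analyzed_at is not None),
                      key=lambda s: s.analyzed_at)
    for s in analyzed:
        if usage <= budget:
            break
        usage -= s.size
        report.evicted.append(s.uid)
        report.freed += s.size
    report.still_over_budget = usage > budget
    return report
```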
abb888f3ef session-memory Phase 0: session digest + outcome heuristic (T04)
- session_memory/core/digest.py: build_digest (cost totals, kind/tool
  histograms, markers, snippets) + cross-flavor infer_outcome heuristic;
  analyze() promotes Tier1->Tier2 and sets analyzed_at (-> evictable)
- tests/test_digest.py

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 20:03:04 +02:00
29fc211a14 session-memory Phase 0: Tier1/Tier2 store (T03)
- session_memory/core/store.py: SQLite rows + blob-dir bodies, idempotent
  ingest on (session_uid,seq), Tier1/Tier2 usage accounting, evict_raw that
  drops raw but preserves the digest; watermark columns authoritative
- tests/test_store.py: ingest idempotency, accounting, eviction invariant

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 19:10:02 +02:00
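Idempotent ingest on (session_uid, seq), as in 29fc211a14, falls out of a composite primary key plus INSERT OR IGNORE. The table layout and column names are hypothetical; only the key comes from the commit.

```python
import sqlite3


def open_store(path=":memory:"):
    """Hypothetical minimal events table; raw bodies would live in the blob dir."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        " session_uid TEXT NOT NULL,"
        " seq INTEGER NOT NULL,"
        " kind TEXT NOT NULL,"
        " body_path TEXT,"
        " PRIMARY KEY (session_uid, seq))"
    )
    return db


def ingest(db, session_uid, events):
    """Insert events; rows whose (session_uid, seq) already exist are skipped."""
    before = db.total_changes
    with db:  # commits on success, rolls back on error
        db.executemany(
            "INSERT OR IGNORE INTO events (session_uid, seq, kind, body_path)"
            " VALUES (?, ?, ?, ?)",
            [(session_uid, e["seq"], e["kind"], e.get("body_path")) for e in events],
        )
    return db.total_changes - before  # rows actually inserted
```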
1c29a94fa9 session-memory Phase 0: normalized schema (T01) + Claude adapter (T02)
- session_memory/core/schema.py: Session/SessionEvent/Cost dataclasses,
  flavor-prefixed uids, watermarks, kind/outcome validation (T01)
- session_memory/adapters/claude.py: JSONL -> Normalized bundle, turn DAG
  via uuid/parentUuid, kind mapping, cost from message.usage (T02)
- tests: schema round-trip + adapter (synthetic + real local session)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 19:06:10 +02:00
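Rebuilding the turn order from uuid/parentUuid links, as the Claude adapter in 1c29a94fa9 does, can be sketched as a depth-first walk from the roots. Treating records whose parent is missing from the file as extra roots, and visiting children in file order, are assumptions.

```python
import json
from collections import defaultdict


def turn_order(lines):
    """Order transcript records by walking the uuid/parentUuid links from the roots."""
    records = [json.loads(line) for line in lines if line.strip()]
    known = {r["uuid"] for r in records}
    children = defaultdict(list)
    roots = []
    for r in records:
        parent = r.get("parentUuid")
        if parent in known:
            children[parent].append(r)
        else:
            roots.append(r)  # no parent, or parent missing from this file
    ordered, stack = [], list(reversed(roots))
    while stack:  # iterative depth-first walk, children in file order
        node = stack.pop()
        ordered.append(node["uuid"])
        stack.extend(reversed(children[node["uuid"]]))
    return ordered
```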