# session_memory

Capture + retention layer for Helix Forge: the **Capture** stage of the loop in
[../docs/PRD-helix-forge.md](../docs/PRD-helix-forge.md), built to the
[../docs/DESIGN-session-memory.md](../docs/DESIGN-session-memory.md) spec.

It scans coding-agent session logs, normalizes them into one schema, distills a
compact per-session digest, and ages out raw bulk under a **storage budget**
(dropping sessions once analyzed and once space is needed) rather than a fixed
time window.

## Layout

```
session_memory/
  adapters/common.py      # shared Normalized bundle + helpers
  adapters/claude.py      # Tier0 -> Tier1 normalizers, one per flavor
  adapters/codex.py       #   (rollout {timestamp,type,payload}, flat call_id join)
  adapters/grok.py        #   (per-session dir: chat_history + events + updates)
  core/schema.py          # Session / SessionEvent / Cost
  core/store.py           # SQLite rows + blob-dir bodies (Tier1) + digests/patterns (Tier2)
  core/cursor.py          # incremental ingest cursors
  core/digest.py          # Tier1 -> Tier2 promotion + outcome heuristic
  core/retention.py       # budget-based eviction sweep
  ingest.py               # one sweep: discover -> normalize -> store -> digest -> evict
  detect/signals.py       # signal extractors over digests
  detect/cluster.py       # cluster signals -> candidate patterns + cross-flavor flag
  detect/__main__.py      # python -m session_memory.detect  (ranked report)
  curate/schema.py        # SolutionPattern artifact + per-flavor rendering hints
  curate/catalog.py       # versioned, files-first Pattern Catalog (dedup on id)
  curate/gating.py        # promotion evidence bar + bloat guard
  curate/review.py        # discuss/approve/reject -> promote workflow
  curate/decisions.py     # hub decision audit trail (graceful local-queue fallback)
  curate/__main__.py      # python -m session_memory.curate  (interactive / --auto-approve)
  catalog/                # the committed Pattern Catalog (source of truth)
  distribute/base.py      # Artifact + Distributor protocol + idempotent snippet markers
  distribute/claude.py    # CLAUDE.md (or skill) renderer  } per-flavor edges
  distribute/codex.py     # AGENTS.md renderer             } (agnostic body,
  distribute/grok.py      # native instruction renderer    }  different targets)
  distribute/proposals.py # scoping + proposed-not-applied output + active registry
  distribute/__main__.py  # python -m session_memory.distribute
  measure/metrics.py      # fleet metrics + persisted baseline snapshots
  measure/effect.py       # before/after per-pattern effectiveness
  measure/__main__.py     # python -m session_memory.measure
  config.toml             # store paths, retention caps, sources, repo->domain map, curate gate
```

The local store lives under `session_memory/.store/` (gitignored).

## Run a sweep

```bash
# from the repo root
python -m session_memory.ingest              # ingest + analyze + evict
python -m session_memory.ingest --dry-run    # discover + parse only, writes nothing
python -m session_memory.ingest --config path/to/config.toml
```

Output reports `discovered / ingested / skipped_unchanged / analyzed` and a
retention line (`freed`, `final_usage`, and per-pass eviction counts). Sweeps are
idempotent: re-running skips unchanged files via the cursor.

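
To illustrate how a cursor can make re-runs skip unchanged files, here is a minimal sketch. It assumes the cursor stores a `(size, mtime_ns)` fingerprint per log file; `fingerprint` and `plan_sweep` are hypothetical names, and the actual `core/cursor.py` may track changes differently (for example with byte offsets or content hashes).

```python
import os


def fingerprint(path):
    """Cheap change detector (assumption): file size plus mtime in nanoseconds."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)


def plan_sweep(discovered, cursor):
    """Decide which files need ingesting.

    discovered: dict mapping path -> current fingerprint
    cursor:     dict mapping path -> fingerprint recorded by the previous sweep
    Returns (to_ingest, skipped_unchanged, new_cursor).
    """
    to_ingest, skipped = [], []
    new_cursor = dict(cursor)
    for path in sorted(discovered):
        fp = discovered[path]
        if cursor.get(path) == fp:
            skipped.append(path)      # same fingerprint as last time: nothing to do
        else:
            to_ingest.append(path)    # new or modified file
            new_cursor[path] = fp
    return to_ingest, skipped, new_cursor
```

Running `plan_sweep` twice over the same files with the returned cursor yields an empty `to_ingest` list the second time, which is the property the `skipped_unchanged` counter reflects.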
## Scheduling (cadence)

Retention is budget-based; the `cadence` in `config.toml` only decides how often
the sweep *runs*. Trigger it with the repo scheduler, e.g. daily:

```bash
# Claude Code: schedule a daily routine that runs the sweep
/schedule "daily session-memory sweep" -- python -m session_memory.ingest
```

A cron entry or `/loop` on a timer works as well. Push-capture (agent
Stop/SessionEnd hooks) can also enqueue a sweep; see design §7.

## Detect candidate patterns

After ingesting, mine the digests for recurring problem/success patterns:

```bash
python -m session_memory.detect                 # ranked report, cross-flavor first
python -m session_memory.detect --json          # machine-readable candidates
python -m session_memory.detect --min-frequency 3
```

Candidates are persisted to a Tier 2 `patterns` table and are the input to the
Curate phase (Phase 2). Patterns whose evidence spans more than one agent flavor
are flagged `[CROSS-FLAVOR]`: the highest-value reuse targets.

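
The clustering and cross-flavor flag can be sketched roughly as follows. This is an illustration only: the signal shape (`key`, `session_id`, `flavor`), the function name `cluster_signals`, and the default threshold are assumptions, not the contents of `detect/cluster.py`.

```python
from collections import defaultdict


def cluster_signals(signals, min_frequency=2):
    """Group signals by key and rank the resulting candidates.

    signals: iterable of dicts with 'key', 'session_id', 'flavor' (assumed shape).
    Candidates seen fewer than min_frequency times are dropped.
    """
    groups = defaultdict(list)
    for sig in signals:
        groups[sig["key"]].append(sig)

    candidates = []
    for key, members in groups.items():
        if len(members) < min_frequency:
            continue
        flavors = sorted({m["flavor"] for m in members})
        candidates.append({
            "key": key,
            "frequency": len(members),
            "sessions": len({m["session_id"] for m in members}),
            "flavors": flavors,
            "cross_flavor": len(flavors) > 1,  # evidence from more than one agent flavor
        })

    # Cross-flavor candidates first, then by descending frequency, then by key.
    candidates.sort(key=lambda c: (not c["cross_flavor"], -c["frequency"], c["key"]))
    return candidates
```

Sorting on `not c["cross_flavor"]` puts flagged candidates at the top even when a single-flavor candidate has a higher raw frequency, which mirrors the "cross-flavor first" ordering of the ranked report.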
## Curate candidates into the Pattern Catalog

Review Detect candidates and promote them into versioned **Solution Patterns**
held in the files-first catalog (`session_memory/catalog/`). The flow is
**detect → curate → (Phase 3) distribute**; `curate` refreshes candidates by
running detect first.

```bash
python -m session_memory.curate                 # interactive review (a/r/d per candidate)
python -m session_memory.curate --auto-approve  # batch: promote all that clear the evidence bar
python -m session_memory.curate --json          # machine-readable result
```

- **Promotion** writes a `SolutionPattern` file (id = source candidate key, so
  re-promoting the same candidate dedups; content changes bump the semver and
  archive the prior version to `<id>.history.jsonl`).
- The **evidence bar** (`[curate.gate]`) sets two floors: a promote floor and a
  stricter *distribution* floor. A thin-but-real candidate lands `provisional`;
  one clearing the distribution floor lands `approved` + `distribution_ready`.
- A **bloat guard** flags duplicate / near-duplicate candidates so the catalog
  stays lean.
- Re-review is **idempotent**: a remembered decision is skipped unless the
  candidate's evidence changed, and a prior reject is not re-surfaced.
- Each final promote/reject is recorded as a **hub decision**; if the hub is
  offline the decision is queued to `[curate].decision_queue` for later sync
  (the same after-the-fact pattern used in Phase 1).

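
As an illustration of the promotion behaviour described above (dedup on id, version bump on content change, prior version archived), here is a small sketch. The JSON file layout, the `promote` function, the `1.0.0` starting version, and the choice to bump the minor component are all assumptions; only the `<id>.history.jsonl` name comes from this README.

```python
import json
from pathlib import Path


def promote(catalog_dir, pattern_id, body):
    """Write or update <id>.json in the catalog; return the version now stored."""
    catalog = Path(catalog_dir)
    catalog.mkdir(parents=True, exist_ok=True)
    path = catalog / f"{pattern_id}.json"

    if path.exists():
        current = json.loads(path.read_text())
        if current["body"] == body:
            return current["version"]  # same candidate, same content: dedup, no rewrite
        # Content changed: archive the prior version before overwriting it.
        with (catalog / f"{pattern_id}.history.jsonl").open("a") as fh:
            fh.write(json.dumps(current) + "\n")
        major, minor, _patch = (int(p) for p in current["version"].split("."))
        version = f"{major}.{minor + 1}.0"  # assumed policy: bump the minor component
    else:
        version = "1.0.0"  # assumed initial version

    record = {"id": pattern_id, "version": version, "body": body}
    path.write_text(json.dumps(record, indent=2))
    return version
```

Keeping history in an append-only JSONL file next to the pattern keeps the catalog files-first: every prior version stays reviewable in plain text and in version control.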
### Curate knobs (`[curate]` / `[curate.gate]` in config.toml)

| Key | Meaning |
|-----|---------|
| `catalog_dir` | committed Pattern Catalog dir (source of truth) |
| `review_log` / `decision_queue` | remembered decisions + pending hub decisions (gitignored) |
| `min_frequency` / `min_sessions` / `min_cost_impact` | floor to promote at all |
| `dist_require_cross_flavor` | require cross-flavor evidence to be distribution-eligible |
| `dist_min_frequency` / `dist_min_cost_impact` | stricter floor for `distribution_ready` |

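
The two-floor evidence bar could be evaluated along these lines. The field names match the `[curate.gate]` keys in the table, but the default values, the candidate dict shape, the `below_bar` label, and the `evaluate` function are hypothetical and are not taken from `curate/gating.py`.

```python
from dataclasses import dataclass


@dataclass
class Gate:
    # Promote floor (default values are placeholders, not the project's settings).
    min_frequency: int = 2
    min_sessions: int = 2
    min_cost_impact: float = 0.0
    # Stricter distribution floor.
    dist_require_cross_flavor: bool = True
    dist_min_frequency: int = 5
    dist_min_cost_impact: float = 0.0


def evaluate(candidate, gate):
    """Return (status, distribution_ready) for a candidate dict.

    candidate needs 'frequency', 'sessions', 'cost_impact', 'cross_flavor'.
    """
    clears_promote = (
        candidate["frequency"] >= gate.min_frequency
        and candidate["sessions"] >= gate.min_sessions
        and candidate["cost_impact"] >= gate.min_cost_impact
    )
    if not clears_promote:
        return ("below_bar", False)  # hypothetical label for "not promotable"

    clears_distribution = (
        candidate["frequency"] >= gate.dist_min_frequency
        and candidate["cost_impact"] >= gate.dist_min_cost_impact
        and (candidate["cross_flavor"] or not gate.dist_require_cross_flavor)
    )
    if clears_distribution:
        return ("approved", True)
    return ("provisional", False)  # thin but real: promoted, not yet distributable
```

For example, under these placeholder defaults a candidate with frequency 3 across 2 sessions and single-flavor evidence would be `provisional`, while one with frequency 6 and cross-flavor evidence would be `approved` and `distribution_ready`.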
## Distribute patterns as per-flavor proposals

Render approved catalog patterns into per-flavor artifacts: **proposed, never
auto-applied** (HITL). This extends the flow to **detect → curate → distribute**.

```bash
python -m session_memory.distribute                       # proposals for all repos/flavors
python -m session_memory.distribute --repo state-hub --flavor claude
python -m session_memory.distribute --json
```

- Only `approved` + `distribution_ready` patterns are rendered; each pattern's
  `Scope` (repos/domains/flavors) decides where it lands (FR-X2).
- Each flavor renders the **same agnostic body** to its own target (Claude →
  `CLAUDE.md`/skill, Codex → `AGENTS.md`, Grok → native) via `rendering_hints`
  (FR-A3); blocks carry stable `BEGIN/END` markers so re-running updates in place.
- Output goes to `session_memory/proposals/<repo>/<target>` (gitignored,
  regenerated): a reviewable diff that a human applies (FR-X3). The committed
  `distribute/active_patterns.json` records which pattern+version is proposed in
  which `(repo, flavor)` (FR-X4).

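
The in-place update via `BEGIN/END` markers can be sketched as below. The exact marker syntax is not given in this README, so the HTML-comment form and the `upsert_block` helper are assumptions made for illustration.

```python
import re


def upsert_block(text, pattern_id, body):
    """Insert or replace the marked block for pattern_id inside text.

    Re-running with the same body returns the text unchanged (idempotent).
    """
    begin = f"<!-- BEGIN pattern:{pattern_id} -->"  # assumed marker format
    end = f"<!-- END pattern:{pattern_id} -->"
    block = f"{begin}\n{body.rstrip()}\n{end}"

    span = re.compile(re.escape(begin) + r".*?" + re.escape(end), re.DOTALL)
    if span.search(text):
        # A callable replacement avoids backslash-escape handling in the body.
        return span.sub(lambda _match: block, text, count=1)

    # No existing block: append it, separated from prior content by a blank line.
    if not text or text.endswith("\n\n"):
        sep = ""
    elif text.endswith("\n"):
        sep = "\n"
    else:
        sep = "\n\n"
    return f"{text}{sep}{block}\n"
```

Because only the text between the markers is rewritten, hand-written content elsewhere in a target such as `CLAUDE.md` or `AGENTS.md` is left alone, and the resulting diff shows just the pattern block.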
## Measure effectiveness (closing the loop)

Track whether the fleet is getting cheaper / more reliable, and whether a
distributed pattern actually helped.

```bash
python -m session_memory.measure --label "baseline"     # snapshot + trend
python -m session_memory.measure --since 2026-06-07     # before/after a change
python -m session_memory.measure --no-save --json
```

- A **snapshot** (infra-overhead share, error rate, schema-thrash, token
  percentiles, success rate) is appended to `measure/baselines.jsonl` to build a
  trend (FR-M3).
- `--since DATE` splits sessions into those before and those after a change and
  diffs the metrics, with an `improved` verdict per metric (FR-M1/FR-M2), so
  ineffective patterns can be retired. Recorded pre-fix baseline (2026-06-07):
  27 sessions, infra-overhead median 11.7%, error rate 0.96, schema-thrash in
  8 sessions.

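
A rough sketch of the before/after comparison follows. The metric names, the session dict shape, and in particular the definition of error rate (share of sessions with at least one error) are assumptions for illustration; `measure/effect.py` may define them differently. Dates are ISO strings, which compare correctly as plain text.

```python
from statistics import median

# Direction of "better" per metric (hypothetical metric names).
LOWER_IS_BETTER = {
    "overhead_median": True,
    "error_rate": True,
    "success_rate": False,
}


def fleet_metrics(sessions):
    """Aggregate a few fleet metrics over a non-empty list of session dicts."""
    n = len(sessions)
    return {
        "overhead_median": median(s["overhead_share"] for s in sessions),
        # Assumed definition: share of sessions with at least one error.
        "error_rate": sum(1 for s in sessions if s["errors"] > 0) / n,
        "success_rate": sum(1 for s in sessions if s["success"]) / n,
    }


def effectiveness(sessions, since):
    """Split sessions at `since` (ISO date string) and diff the metrics.

    Returns None when either side of the split is empty.
    """
    before = [s for s in sessions if s["started"] < since]
    after = [s for s in sessions if s["started"] >= since]
    if not before or not after:
        return None

    m_before, m_after = fleet_metrics(before), fleet_metrics(after)
    report = {}
    for name, lower_is_better in LOWER_IS_BETTER.items():
        b, a = m_before[name], m_after[name]
        improved = a < b if lower_is_better else a > b
        report[name] = {"before": b, "after": a, "improved": improved}
    return report
```

A per-metric verdict, rather than one overall score, makes it visible when a pattern lowers the error rate but raises overhead, which is the kind of trade-off a retire-or-keep decision needs.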
## Retention knobs (`[retention]` in config.toml)

| Key | Meaning |
|-----|---------|
| `raw_soft_cap_bytes` | begin evicting **analyzed** sessions above this (oldest first) |
| `raw_hard_cap_bytes` | absolute Tier 1 ceiling; overflow path may, as a last resort, evict un-analyzed sessions and report `data_loss` |
| `raw_max_age_days` | backstop: analyzed raw older than this is evictable regardless of space |
| `distilled_cap_bytes` | Tier 2 ceiling, **alert only**, never auto-dropped |

**Invariant:** a session's raw bytes are never dropped before its Tier 2 digest
exists, except the explicitly-reported hard-cap overflow path.

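
The knobs and the invariant above can be read as a three-pass eviction plan, sketched here. The pass order (age backstop, then soft cap, then hard-cap overflow), the session dict shape, and the `plan_eviction` name are assumptions; `core/retention.py` may order or report its passes differently.

```python
def plan_eviction(sessions, soft_cap, hard_cap, max_age_days):
    """Plan which Tier 1 sessions to evict.

    sessions: list of dicts with 'id', 'raw_bytes', 'analyzed', 'age_days'.
    Un-analyzed sessions are only touched by the hard-cap overflow pass,
    and every such eviction is reported under 'data_loss'.
    """
    total = sum(s["raw_bytes"] for s in sessions)
    usage = total
    evicted, data_loss = [], []
    oldest_first = sorted(sessions, key=lambda s: s["age_days"], reverse=True)

    def drop(session):
        nonlocal usage
        evicted.append(session["id"])
        usage -= session["raw_bytes"]

    # Pass 1: age backstop, analyzed sessions only, regardless of space.
    for s in oldest_first:
        if s["analyzed"] and s["age_days"] > max_age_days:
            drop(s)

    # Pass 2: soft cap, analyzed sessions only, oldest first.
    for s in oldest_first:
        if usage <= soft_cap:
            break
        if s["analyzed"] and s["id"] not in evicted:
            drop(s)

    # Pass 3: hard-cap overflow, last resort, evicts un-analyzed sessions.
    for s in oldest_first:
        if usage <= hard_cap:
            break
        if not s["analyzed"]:
            data_loss.append(s["id"])
            drop(s)

    return {
        "evicted": evicted,
        "data_loss": data_loss,
        "freed": total - usage,
        "final_usage": usage,
    }
```

Keeping the overflow pass separate means `data_loss` stays empty in normal operation, so any non-empty value is a clear signal that the hard cap is too tight or analysis is falling behind.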
## Tests

```bash
python -m pytest        # schema, adapters, store, digest, retention, ingest, detect, curate
```

## Status

- **Phase 0** (AGENTIC-WP-0002): schema, store, digest, budget retention, Claude
  adapter, ingest sweep.
- **Phase 1** (AGENTIC-WP-0003): Codex + Grok adapters, multi-file session merge,
  and the Detect pipeline (signals → clustering → cross-flavor candidate patterns).
- **Phase 2** (AGENTIC-WP-0004), Curate: Solution Pattern schema, versioned
  files-first Pattern Catalog, discuss/approve/reject review with an evidence bar +
  bloat guard, and hub-decision audit trail.
- **Detect hardening** (AGENTIC-WP-0005): session-quality filter + tool-mix /
  infra-overhead signals.
- **Error mining** (AGENTIC-WP-0006): recurring error fingerprints → root-cause
  patterns.
- **Phase 3** (AGENTIC-WP-0007), Distribute: per-flavor distributor adapters
  render approved patterns into proposed (HITL) artifacts, scoped by repo/domain,
  with an active-pattern registry.
- **Phase 4** (AGENTIC-WP-0009), Measure: fleet baseline/trend + before/after
  per-pattern effectiveness. The Capture → Detect → Curate → Distribute → Measure
  loop is closed.