# session_memory

Capture + retention layer for Helix Forge: the **Capture** stage of the loop in
[../docs/PRD-helix-forge.md](../docs/PRD-helix-forge.md), built to the
[../docs/DESIGN-session-memory.md](../docs/DESIGN-session-memory.md) spec.

It scans coding-agent session logs, normalizes them into one schema, distills a
compact per-session digest, and ages out raw bulk under a **storage budget**
(dropping sessions once analyzed and once space is needed) rather than a fixed
time window.

## Layout

```
session_memory/
  adapters/common.py    # shared Normalized bundle + helpers
  adapters/claude.py    # Tier0 -> Tier1 normalizers, one per flavor
  adapters/codex.py     #   (rollout {timestamp,type,payload}, flat call_id join)
  adapters/grok.py      #   (per-session dir: chat_history + events + updates)
  core/schema.py        # Session / SessionEvent / Cost
  core/store.py         # SQLite rows + blob-dir bodies (Tier1) + digests/patterns (Tier2)
  core/cursor.py        # incremental ingest cursors
  core/digest.py        # Tier1 -> Tier2 promotion + outcome heuristic
  core/retention.py     # budget-based eviction sweep
  ingest.py             # one sweep: discover -> normalize -> store -> digest -> evict
  detect/signals.py     # signal extractors over digests
  detect/cluster.py     # cluster signals -> candidate patterns + cross-flavor flag
  detect/__main__.py    # python -m session_memory.detect (ranked report)
  config.toml           # store paths, retention caps, sources, repo->domain map
```

The local store lives under `session_memory/.store/` (gitignored).

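The "flat call_id join" noted for the Codex adapter can be pictured as a single pass that pairs each tool call with its output. The sketch below is illustrative only: it assumes rollout records are JSON lines shaped `{timestamp, type, payload}` (as the layout notes), and that a call and its output share a `call_id` inside `payload`. The payload type names `function_call` / `function_call_output`, the `name` and `output` fields, and the helper `join_calls` are assumptions, not the adapter's actual code.

```python
import json
from typing import Iterable


def join_calls(lines: Iterable[str]) -> list[dict]:
    """Pair each call record with its output record by call_id in one flat pass."""
    pending: dict[str, dict] = {}  # call_id -> joined entry still awaiting output
    joined: list[dict] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        payload = record.get("payload") or {}
        call_id = payload.get("call_id")
        if call_id is None:
            continue  # record is not part of a call/output pair
        kind = payload.get("type")  # assumed discriminator inside the payload
        if kind == "function_call":
            entry = {
                "call_id": call_id,
                "name": payload.get("name"),
                "started": record.get("timestamp"),
                "output": None,
                "finished": None,
            }
            pending[call_id] = entry
            joined.append(entry)
        elif kind == "function_call_output" and call_id in pending:
            entry = pending.pop(call_id)
            entry["output"] = payload.get("output")
            entry["finished"] = record.get("timestamp")
    return joined
```

Calls that never receive an output stay in the result with `output` set to `None`, so a truncated rollout still yields usable events.
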
## Run a sweep

```bash
# from the repo root
python -m session_memory.ingest            # ingest + analyze + evict
python -m session_memory.ingest --dry-run  # discover + parse only, writes nothing
python -m session_memory.ingest --config path/to/config.toml
```

Output reports `discovered / ingested / skipped_unchanged / analyzed` and a
retention line (`freed`, `final_usage`, and per-pass eviction counts). Sweeps are
idempotent: re-running skips unchanged files via the cursor.

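To make the idempotency claim concrete, here is a minimal sketch of a cursor that skips unchanged files. It assumes a file counts as unchanged when its size and modification time match the last recorded values; the real `core/cursor.py` may track something else (a byte offset or content hash, for instance). `Cursor`, `fingerprint`, `is_unchanged`, `mark`, and `sweep` are hypothetical names.

```python
import os
import sqlite3


class Cursor:
    """Remembers a (size, mtime_ns) fingerprint per ingested file."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cursor ("
            "path TEXT PRIMARY KEY, size INTEGER, mtime_ns INTEGER)"
        )

    @staticmethod
    def fingerprint(path: str) -> tuple[int, int]:
        st = os.stat(path)
        return st.st_size, st.st_mtime_ns

    def is_unchanged(self, path: str) -> bool:
        row = self.db.execute(
            "SELECT size, mtime_ns FROM cursor WHERE path = ?", (path,)
        ).fetchone()
        return row is not None and tuple(row) == self.fingerprint(path)

    def mark(self, path: str) -> None:
        size, mtime_ns = self.fingerprint(path)
        self.db.execute(
            "INSERT OR REPLACE INTO cursor (path, size, mtime_ns) VALUES (?, ?, ?)",
            (path, size, mtime_ns),
        )
        self.db.commit()


def sweep(paths: list[str], cursor: Cursor) -> dict[str, int]:
    """Count what one sweep would ingest versus skip."""
    counts = {"discovered": len(paths), "ingested": 0, "skipped_unchanged": 0}
    for path in paths:
        if cursor.is_unchanged(path):
            counts["skipped_unchanged"] += 1
            continue
        # The real normalize -> store -> digest work would happen here.
        cursor.mark(path)
        counts["ingested"] += 1
    return counts
```

Running `sweep` twice over the same untouched files ingests them once and reports them as `skipped_unchanged` the second time; appending to a file changes its fingerprint and triggers re-ingest.
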
## Scheduling (cadence)

Retention is budget-based; the `cadence` in `config.toml` only decides how often
the sweep *runs*. Trigger it with the repo scheduler, e.g. daily:

```bash
# Claude Code: schedule a daily routine that runs the sweep
/schedule "daily session-memory sweep" -- python -m session_memory.ingest
```

Alternatively, use a cron entry or `/loop` on a timer. Push-capture (agent
Stop/SessionEnd hooks) can also enqueue a sweep; see design §7.

## Detect candidate patterns

After ingesting, mine the digests for recurring problem/success patterns:

```bash
python -m session_memory.detect                    # ranked report, cross-flavor first
python -m session_memory.detect --json             # machine-readable candidates
python -m session_memory.detect --min-frequency 3
```

Candidates are persisted to a Tier 2 `patterns` table and are the input to the
Curate phase (Phase 2). Patterns whose evidence spans more than one agent flavor
are flagged `[CROSS-FLAVOR]`; these are the highest-value reuse targets.

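The ranking and cross-flavor flag described above could be sketched as follows. This is a simplified illustration, not the contents of `detect/cluster.py`: it assumes signals have already been reduced to an exact-match key, counts frequency as the number of distinct sessions, and uses a default `min_frequency` of 2. `Signal` and `cluster_signals` are hypothetical names, and the flavor strings are examples.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    session_id: str
    flavor: str  # e.g. "claude", "codex", "grok"
    key: str     # normalized signal key (assumed exact-match here)


def cluster_signals(signals: list[Signal], min_frequency: int = 2) -> list[dict]:
    """Group signals by key, keep frequent ones, rank cross-flavor first."""
    sessions: dict[str, set[str]] = defaultdict(set)
    flavors: dict[str, set[str]] = defaultdict(set)
    for s in signals:
        sessions[s.key].add(s.session_id)
        flavors[s.key].add(s.flavor)

    candidates = [
        {
            "key": key,
            "frequency": len(sessions[key]),
            "flavors": sorted(flavors[key]),
            "cross_flavor": len(flavors[key]) > 1,
        }
        for key in sessions
        if len(sessions[key]) >= min_frequency
    ]
    # Cross-flavor candidates sort first, then by descending frequency, then key.
    candidates.sort(key=lambda c: (not c["cross_flavor"], -c["frequency"], c["key"]))
    return candidates
```

A real clusterer would likely need fuzzier grouping than exact keys, but the ordering rule (cross-flavor before single-flavor, then frequency) matches the report described in this section.
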
## Retention knobs (`[retention]` in config.toml)

| Key | Meaning |
|-----|---------|
| `raw_soft_cap_bytes` | begin evicting **analyzed** sessions above this (oldest first) |
| `raw_hard_cap_bytes` | absolute Tier 1 ceiling; overflow path may, as a last resort, evict un-analyzed sessions and report `data_loss` |
| `raw_max_age_days` | backstop: analyzed raw older than this is evictable regardless of space |
| `distilled_cap_bytes` | Tier 2 ceiling; **alert only**, never auto-dropped |

**Invariant:** a session's raw bytes are never dropped before its Tier 2 digest
exists, except on the explicitly reported hard-cap overflow path.

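One way the three raw-tier knobs and the invariant could interact is shown below. It is a sketch under assumptions: the pass order (age backstop, then soft cap, then hard-cap overflow) is a guess at how `core/retention.py` sequences its passes, and `RawSession` and `plan_eviction` are hypothetical names. `distilled_cap_bytes` is left out because it only raises an alert.

```python
from dataclasses import dataclass


@dataclass
class RawSession:
    session_id: str
    size_bytes: int
    age_days: float
    analyzed: bool  # True once the session's Tier 2 digest exists


def plan_eviction(sessions, soft_cap, hard_cap, max_age_days):
    """Return (evicted_ids, data_loss_ids, final_usage) for one sweep."""
    usage = sum(s.size_bytes for s in sessions)
    evicted: list[str] = []
    data_loss: list[str] = []
    gone: set[str] = set()
    oldest_first = sorted(sessions, key=lambda s: s.age_days, reverse=True)

    def drop(s: RawSession) -> None:
        nonlocal usage
        evicted.append(s.session_id)
        gone.add(s.session_id)
        usage -= s.size_bytes

    # Pass 1: age backstop, analyzed sessions only, regardless of space.
    for s in oldest_first:
        if s.analyzed and s.age_days > max_age_days:
            drop(s)

    # Pass 2: above the soft cap, evict analyzed sessions oldest first.
    for s in oldest_first:
        if usage <= soft_cap:
            break
        if s.analyzed and s.session_id not in gone:
            drop(s)

    # Pass 3: still above the hard cap, so only un-analyzed sessions remain.
    # Evicting them breaks the invariant, so each one is reported as data loss.
    for s in oldest_first:
        if usage <= hard_cap:
            break
        if not s.analyzed:
            drop(s)
            data_loss.append(s.session_id)

    return evicted, data_loss, usage
```

Because pass 2 stops only once usage is at or below the soft cap or no analyzed sessions are left, pass 3 can only ever touch un-analyzed sessions, which keeps the data-loss path explicit and separately reported.
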
## Tests

```bash
python -m pytest   # 26 tests: schema, adapter, store, digest, retention, ingest
```

## Status

- **Phase 0** (AGENTIC-WP-0002): schema, store, digest, budget retention, Claude
  adapter, ingest sweep.
- **Phase 1** (AGENTIC-WP-0003): Codex + Grok adapters, multi-file session merge,
  and the Detect pipeline (signals → clustering → cross-flavor candidate patterns).
- **Next, Phase 2 (Curate):** review/approve candidates into a versioned pattern
  catalog. **Phase 3 (Distribute) / Phase 4 (Measure)** follow per the PRD.