Files
agentic-resources/session_memory/README.md
tegwick 7c6f4358ee session-memory Phase 0: end-to-end verification + docs (T07)
- verified full sweep over 85 real local Claude transcripts: 63 sessions
  ingested+analyzed, eviction under tiny cap freed 26MB with zero data loss,
  digest-preservation invariant holds, idempotent re-run
- session_memory/README.md: usage, scheduling, retention knobs
- design doc: OQ4 resolved (median ~49KB/session), OQ6 (multi-file sessions)
- workplan AGENTIC-WP-0002 finished

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 21:44:46 +02:00

76 lines
2.9 KiB
Markdown

# session_memory
Capture + retention layer for Helix Forge — the **Capture** stage of the loop in
[../docs/PRD-helix-forge.md](../docs/PRD-helix-forge.md), built to the
[../docs/DESIGN-session-memory.md](../docs/DESIGN-session-memory.md) spec.
It scans coding-agent session logs, normalizes them into one schema, distills a
compact per-session digest, and ages out raw bulk under a **storage budget**
(dropping sessions once analyzed and once space is needed) rather than a fixed
time window.
## Layout
```
session_memory/
adapters/claude.py # Tier0 -> Tier1 normalizer (Codex/Grok land in Phase 1)
core/schema.py # Session / SessionEvent / Cost
core/store.py # SQLite rows + blob-dir bodies (Tier1) + digests (Tier2)
core/cursor.py # incremental ingest cursors
core/digest.py # Tier1 -> Tier2 promotion + outcome heuristic
core/retention.py # budget-based eviction sweep
ingest.py # one sweep: discover -> normalize -> store -> digest -> evict
config.toml # store paths, retention caps, sources, repo->domain map
```
The local store lives under `session_memory/.store/` (gitignored).
## Run a sweep
```bash
# from the repo root
python -m session_memory.ingest # ingest + analyze + evict
python -m session_memory.ingest --dry-run # discover + parse only, writes nothing
python -m session_memory.ingest --config path/to/config.toml
```
Output reports `discovered / ingested / skipped_unchanged / analyzed` and a
retention line (`freed`, `final_usage`, and per-pass eviction counts). Sweeps are
idempotent — re-running skips unchanged files via the cursor.
## Scheduling (cadence)
Retention is budget-based; the `cadence` in `config.toml` only decides how often
the sweep *runs*. Trigger it with the repo scheduler, e.g. daily:
```bash
# Claude Code: schedule a daily routine that runs the sweep
/schedule "daily session-memory sweep" -- python -m session_memory.ingest
```
or a cron entry / `/loop` on a timer. Push-capture (agent Stop/SessionEnd hooks)
can also enqueue a sweep; see design §7.
## Retention knobs (`[retention]` in config.toml)
| Key | Meaning |
|-----|---------|
| `raw_soft_cap_bytes` | begin evicting **analyzed** sessions above this (oldest first) |
| `raw_hard_cap_bytes` | absolute Tier 1 ceiling; overflow path may, as a last resort, evict un-analyzed sessions and report `data_loss` |
| `raw_max_age_days` | backstop: analyzed raw older than this is evictable regardless of space |
| `distilled_cap_bytes` | Tier 2 ceiling — **alert only**, never auto-dropped |
**Invariant:** a session's raw bytes are never dropped before its Tier 2 digest
exists, except the explicitly-reported hard-cap overflow path.
## Tests
```bash
python -m pytest # 26 tests: schema, adapter, store, digest, retention, ingest
```
## Status
Phase 0 (AGENTIC-WP-0002): Claude adapter only, end to end. Codex and Grok
adapters are designed (schemas confirmed in the design doc) and land in Phase 1.