# session_memory

Capture + retention layer for Helix Forge: the **Capture** stage of the loop in [../docs/PRD-helix-forge.md](../docs/PRD-helix-forge.md), built to the [../docs/DESIGN-session-memory.md](../docs/DESIGN-session-memory.md) spec. It scans coding-agent session logs, normalizes them into one schema, distills a compact per-session digest, and ages out raw bulk under a **storage budget** (dropping sessions once analyzed and once space is needed) rather than a fixed time window.

## Layout

```
session_memory/
  adapters/claude.py   # Tier0 -> Tier1 normalizer (Codex/Grok land in Phase 1)
  core/schema.py       # Session / SessionEvent / Cost
  core/store.py        # SQLite rows + blob-dir bodies (Tier1) + digests (Tier2)
  core/cursor.py       # incremental ingest cursors
  core/digest.py       # Tier1 -> Tier2 promotion + outcome heuristic
  core/retention.py    # budget-based eviction sweep
  ingest.py            # one sweep: discover -> normalize -> store -> digest -> evict
  config.toml          # store paths, retention caps, sources, repo->domain map
```

The local store lives under `session_memory/.store/` (gitignored).

## Run a sweep

```bash
# from the repo root
python -m session_memory.ingest                 # ingest + analyze + evict
python -m session_memory.ingest --dry-run       # discover + parse only, writes nothing
python -m session_memory.ingest --config path/to/config.toml
```

Output reports `discovered / ingested / skipped_unchanged / analyzed` and a retention line (`freed`, `final_usage`, and per-pass eviction counts). Sweeps are idempotent: re-running skips unchanged files via the cursor.

## Scheduling (cadence)

Retention is budget-based; the `cadence` in `config.toml` only decides how often the sweep *runs*. Trigger it with the repo scheduler, e.g. daily:

```bash
# Claude Code: schedule a daily routine that runs the sweep
/schedule "daily session-memory sweep" -- python -m session_memory.ingest
```

or a cron entry / `/loop` on a timer. Push-capture (agent Stop/SessionEnd hooks) can also enqueue a sweep; see design §7.
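Because a scheduled sweep and a hook-enqueued sweep may both fire for the same logs, the cursor-based skip is what keeps repeat runs cheap. The following is a minimal sketch of that idea, not the contents of `core/cursor.py`: it assumes a cursor keyed on file size and modification time, and the names `FileCursor`, `is_unchanged`, and `record` are hypothetical.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class FileCursor:
    """Hypothetical cursor record: what a log looked like when last ingested."""
    size: int
    mtime_ns: int


def is_unchanged(path: str, cursors: dict[str, FileCursor]) -> bool:
    """Return True when the file matches its stored cursor and can be skipped."""
    st = os.stat(path)
    return cursors.get(path) == FileCursor(st.st_size, st.st_mtime_ns)


def record(path: str, cursors: dict[str, FileCursor]) -> None:
    """Store the current size and mtime after a successful ingest."""
    st = os.stat(path)
    cursors[path] = FileCursor(st.st_size, st.st_mtime_ns)
```

A sweep under this assumption would call `is_unchanged` before parsing each discovered file and `record` only after the file's events are safely stored, so an interrupted run is retried rather than silently skipped.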
## Retention knobs (`[retention]` in config.toml)

| Key | Meaning |
|-----|---------|
| `raw_soft_cap_bytes` | begin evicting **analyzed** sessions above this (oldest first) |
| `raw_hard_cap_bytes` | absolute Tier 1 ceiling; overflow path may, as a last resort, evict un-analyzed sessions and report `data_loss` |
| `raw_max_age_days` | backstop: analyzed raw older than this is evictable regardless of space |
| `distilled_cap_bytes` | Tier 2 ceiling; **alert only**, never auto-dropped |

**Invariant:** a session's raw bytes are never dropped before its Tier 2 digest exists, except the explicitly-reported hard-cap overflow path.

## Tests

```bash
python -m pytest    # 26 tests: schema, adapter, store, digest, retention, ingest
```

## Status

Phase 0 (AGENTIC-WP-0002): Claude adapter only, end to end. Codex and Grok adapters are designed (schemas confirmed in the design doc) and land in Phase 1.
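For readers who want a concrete picture of how the retention knobs and the invariant interact, here is a rough sketch of a budget-based eviction plan. It is an illustration under assumptions, not the code in `core/retention.py`: the `RawSession` type, the `plan_eviction` function, and the exact pass order are hypothetical, and the real sweep may order or count its passes differently.

```python
from dataclasses import dataclass


@dataclass
class RawSession:
    """Hypothetical view of one Tier 1 session for eviction planning."""
    session_id: str
    size_bytes: int
    age_days: float
    analyzed: bool  # True once a Tier 2 digest exists


def plan_eviction(sessions, soft_cap, hard_cap, max_age_days):
    """Return (evicted_ids, data_loss_ids, final_usage) for one sweep."""
    usage = sum(s.size_bytes for s in sessions)
    evicted, data_loss = [], []
    oldest_first = sorted(sessions, key=lambda s: s.age_days, reverse=True)

    # Analyzed sessions only: evict when past the age backstop,
    # or while usage is still above the soft cap (oldest first).
    for s in oldest_first:
        if not s.analyzed:
            continue
        if s.age_days > max_age_days or usage > soft_cap:
            evicted.append(s.session_id)
            usage -= s.size_bytes

    # Last resort: still above the hard cap, so un-analyzed sessions are
    # dropped and reported separately as data loss.
    for s in oldest_first:
        if usage <= hard_cap:
            break
        if not s.analyzed:
            data_loss.append(s.session_id)
            usage -= s.size_bytes

    return evicted, data_loss, usage
```

In this sketch the invariant shows up as the `analyzed` check in the first loop; un-analyzed sessions can only leave through the second loop, where each one is recorded in `data_loss`.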