- adapters/common.py: shared Normalized + helpers (resolve_repo, classify_tool,
jsonl iter, etc.); claude.py refactored to use it (Normalized re-exported)
- adapters/codex.py: rollout {timestamp,type,payload} parser; session_meta/
response_item/event_msg mapping; flat call_id join; token_count cost;
registered in ingest dispatch
- core/store.py: ingest() now merges multi-file sessions by content
fingerprint, appends new events with offset seq (design OQ6); idempotent
- tests/test_codex_adapter.py, tests/test_merge.py
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
session_memory
Capture + retention layer for Helix Forge — the Capture stage of the loop in ../docs/PRD-helix-forge.md, built to the ../docs/DESIGN-session-memory.md spec.
It scans coding-agent session logs, normalizes them into one schema, distills a compact per-session digest, and ages out raw bulk under a storage budget rather than a fixed time window: raw sessions become eligible for eviction once analyzed and are dropped when space is needed.
Layout
session_memory/
  adapters/claude.py   # Tier0 -> Tier1 normalizer (Codex/Grok land in Phase 1)
  core/schema.py       # Session / SessionEvent / Cost
  core/store.py        # SQLite rows + blob-dir bodies (Tier1) + digests (Tier2)
  core/cursor.py       # incremental ingest cursors
  core/digest.py       # Tier1 -> Tier2 promotion + outcome heuristic
  core/retention.py    # budget-based eviction sweep
  ingest.py            # one sweep: discover -> normalize -> store -> digest -> evict
  config.toml          # store paths, retention caps, sources, repo->domain map
The local store lives under session_memory/.store/ (gitignored).
Run a sweep
# from the repo root
python -m session_memory.ingest # ingest + analyze + evict
python -m session_memory.ingest --dry-run # discover + parse only, writes nothing
python -m session_memory.ingest --config path/to/config.toml
Output reports discovered / ingested / skipped_unchanged / analyzed and a
retention line (freed, final_usage, and per-pass eviction counts). Sweeps are
idempotent — re-running skips unchanged files via the cursor.
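As a rough illustration of how a cursor can make re-runs skip unchanged files, here is a minimal sketch. It is not the code in core/cursor.py: the `CursorStore` class, the `sweep` function, and the choice of (size, mtime) as the change signal are all assumptions made for the example, and only the report field names come from the text above.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class CursorStore:
    """Hypothetical in-memory cursor: path -> (size, mtime_ns) at last ingest."""

    seen: dict = field(default_factory=dict)

    def is_unchanged(self, path: Path) -> bool:
        st = path.stat()
        return self.seen.get(str(path)) == (st.st_size, st.st_mtime_ns)

    def mark(self, path: Path) -> None:
        st = path.stat()
        self.seen[str(path)] = (st.st_size, st.st_mtime_ns)


def sweep(paths, cursor):
    """Count discovered, ingested, and skipped files for one pass."""
    report = {"discovered": 0, "ingested": 0, "skipped_unchanged": 0}
    for path in paths:
        report["discovered"] += 1
        if cursor.is_unchanged(path):
            # Same size and mtime as the last sweep: nothing new to read.
            report["skipped_unchanged"] += 1
            continue
        # A real sweep would normalize and store the session here.
        cursor.mark(path)
        report["ingested"] += 1
    return report
```

Running `sweep` twice over the same untouched files should report them as ingested the first time and as skipped_unchanged the second, which is the idempotence property described above. A persistent cursor would keep `seen` in the store rather than in memory.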
Scheduling (cadence)
Retention is budget-based; the cadence in config.toml only decides how often
the sweep runs. Trigger it with the repo scheduler, e.g. daily:
# Claude Code: schedule a daily routine that runs the sweep
/schedule "daily session-memory sweep" -- python -m session_memory.ingest
A cron entry or /loop on a timer works as well. Push-capture (agent Stop/SessionEnd hooks)
can also enqueue a sweep; see design §7.
Retention knobs ([retention] in config.toml)
| Key | Meaning |
|---|---|
| raw_soft_cap_bytes | begin evicting analyzed sessions above this (oldest first) |
| raw_hard_cap_bytes | absolute Tier 1 ceiling; overflow path may, as a last resort, evict un-analyzed sessions and report data_loss |
| raw_max_age_days | backstop: analyzed raw older than this is evictable regardless of space |
| distilled_cap_bytes | Tier 2 ceiling; alert only, never auto-dropped |
Invariant: a session's raw bytes are never dropped before its Tier 2 digest exists, except the explicitly-reported hard-cap overflow path.
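The way the three raw-tier knobs and the invariant interact can be sketched as a small planning function. This is an illustration under stated assumptions, not the logic in core/retention.py: the `RawSession` record, the `plan_eviction` name, and the pass order (age backstop, then soft cap, then hard-cap overflow) are hypothetical, and the sketch assumes the soft cap does not exceed the hard cap.

```python
from dataclasses import dataclass


@dataclass
class RawSession:
    """Hypothetical view of one Tier 1 session for retention planning."""

    session_id: str
    raw_bytes: int
    age_days: float
    analyzed: bool  # True once a Tier 2 digest exists


def plan_eviction(sessions, soft_cap, hard_cap, max_age_days):
    """Return (evicted_ids, data_loss_ids) for one sweep."""
    live = {s.session_id for s in sessions}
    usage = sum(s.raw_bytes for s in sessions)
    evicted, data_loss = [], []

    def drop(s, lossy=False):
        nonlocal usage
        usage -= s.raw_bytes
        live.discard(s.session_id)
        (data_loss if lossy else evicted).append(s.session_id)

    oldest_first = sorted(sessions, key=lambda s: s.age_days, reverse=True)

    # Pass 1 (assumed order): age backstop, analyzed sessions only.
    for s in oldest_first:
        if s.analyzed and s.age_days > max_age_days:
            drop(s)

    # Pass 2: above the soft cap, evict analyzed sessions oldest first.
    for s in oldest_first:
        if usage <= soft_cap:
            break
        if s.analyzed and s.session_id in live:
            drop(s)

    # Pass 3: hard-cap overflow. Only un-analyzed sessions can remain
    # here, so every drop is reported as data loss.
    for s in oldest_first:
        if usage <= hard_cap:
            break
        if not s.analyzed and s.session_id in live:
            drop(s, lossy=True)

    return evicted, data_loss
```

In this sketch an un-analyzed session is only ever dropped in the last pass, and its id lands in the separate data_loss list, which mirrors the invariant: raw bytes go before the digest exists only on the explicitly reported overflow path.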
Tests
python -m pytest # 26 tests: schema, adapter, store, digest, retention, ingest
Status
Phase 0 (AGENTIC-WP-0002): Claude adapter only, end to end. Codex and Grok adapters are designed (schemas confirmed in the design doc) and land in Phase 1.