tegwick 436a96dcd8 session-memory Phase 1: Detect pipeline (T04-T07)
- detect/signals.py: pure extractors over digests (retry storm, repeated
  errors, budget overrun vs corpus p90, abandoned, clean pass, recovery)
- detect/cluster.py: deterministic clustering into candidate Patterns with
  evidence (sessions/repos/flavors/cost impact) + cross-flavor flagging
- detect/__main__.py: python -m session_memory.detect, ranked report
  (cross-flavor first) + --json; persists candidates to Tier 2 patterns table
- core/store.py: list_digests + save_patterns
- tests for signals, cluster, detect entrypoint

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 22:31:13 +02:00

session_memory

Capture + retention layer for Helix Forge — the Capture stage of the loop in ../docs/PRD-helix-forge.md, built to the ../docs/DESIGN-session-memory.md spec.

It scans coding-agent session logs, normalizes them into one schema, distills a compact per-session digest, and ages out raw bulk under a storage budget rather than a fixed time window: a raw session becomes evictable once it has been analyzed and space is needed.

Layout

session_memory/
  adapters/claude.py   # Tier0 -> Tier1 normalizer (Codex/Grok land in Phase 1)
  core/schema.py       # Session / SessionEvent / Cost
  core/store.py        # SQLite rows + blob-dir bodies (Tier1) + digests (Tier2)
  core/cursor.py       # incremental ingest cursors
  core/digest.py       # Tier1 -> Tier2 promotion + outcome heuristic
  core/retention.py    # budget-based eviction sweep
  ingest.py            # one sweep: discover -> normalize -> store -> digest -> evict
  config.toml          # store paths, retention caps, sources, repo->domain map

The local store lives under session_memory/.store/ (gitignored).

Run a sweep

# from the repo root
python -m session_memory.ingest                 # ingest + analyze + evict
python -m session_memory.ingest --dry-run       # discover + parse only, writes nothing
python -m session_memory.ingest --config path/to/config.toml

The output reports discovered / ingested / skipped_unchanged / analyzed counts and a retention line (freed, final_usage, and per-pass eviction counts). Sweeps are idempotent: re-running skips unchanged files via the cursor.
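
The skip-unchanged behaviour can be pictured with a small sketch. This is not the code in core/cursor.py; the Cursor class, the (size, mtime) fingerprint, and the sweep() helper are all hypothetical names chosen for illustration, and the real cursor may track offsets or hashes instead.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Cursor:
    """Hypothetical cursor: remembers (size, mtime_ns) per ingested file."""

    seen: dict = field(default_factory=dict)

    def fingerprint(self, path: Path) -> tuple:
        st = path.stat()
        return (st.st_size, st.st_mtime_ns)

    def is_unchanged(self, path: Path) -> bool:
        return self.seen.get(str(path)) == self.fingerprint(path)

    def mark(self, path: Path) -> None:
        self.seen[str(path)] = self.fingerprint(path)


def sweep(paths, cursor: Cursor, dry_run: bool = False) -> dict:
    """Count what one sweep does; normalizing and storing are elided."""
    report = {"discovered": 0, "ingested": 0, "skipped_unchanged": 0}
    for path in paths:
        report["discovered"] += 1
        if cursor.is_unchanged(path):
            report["skipped_unchanged"] += 1
            continue
        # In a dry run this counts files that would be ingested.
        report["ingested"] += 1
        if not dry_run:
            cursor.mark(path)  # a dry run writes nothing, cursor included
    return report
```

Under these assumptions, a second sweep over the same files reports them all as skipped_unchanged, and a dry run leaves the cursor untouched so the next real sweep still ingests everything.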

Scheduling (cadence)

Retention is budget-based; the cadence in config.toml only decides how often the sweep runs. Trigger it with the repo scheduler, e.g. daily:

# Claude Code: schedule a daily routine that runs the sweep
/schedule "daily session-memory sweep" -- python -m session_memory.ingest

A cron entry or /loop on a timer works as well. Push-capture (agent Stop/SessionEnd hooks) can also enqueue a sweep; see design §7.

Retention knobs ([retention] in config.toml)

Key                  Meaning
raw_soft_cap_bytes   begin evicting analyzed sessions above this (oldest first)
raw_hard_cap_bytes   absolute Tier 1 ceiling; overflow path may, as a last resort, evict un-analyzed sessions and report data_loss
raw_max_age_days     backstop: analyzed raw older than this is evictable regardless of space
distilled_cap_bytes  Tier 2 ceiling; alert only, never auto-dropped

Invariant: a session's raw bytes are never dropped before its Tier 2 digest exists, except on the hard-cap overflow path, which is explicitly reported as data_loss.
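
How the knobs and the invariant interact can be sketched as a three-pass plan. This is an illustration, not the code in core/retention.py: RawSession, plan_eviction, and the pass order (age backstop, then soft cap, then hard cap) are assumptions, and the sketch presumes the hard cap is at least as large as the soft cap.

```python
from dataclasses import dataclass


@dataclass
class RawSession:
    # Hypothetical record; the real schema lives in core/schema.py.
    session_id: str
    size_bytes: int
    age_days: float
    analyzed: bool  # True once a Tier 2 digest exists


def plan_eviction(sessions, soft_cap, hard_cap, max_age_days):
    """Return (evicted_ids, data_loss_ids) for one sweep."""
    usage = sum(s.size_bytes for s in sessions)
    oldest_first = sorted(sessions, key=lambda s: s.age_days, reverse=True)
    evicted, data_loss = [], []

    # Pass 1 (age backstop): analyzed sessions only, regardless of space.
    for s in oldest_first:
        if s.analyzed and s.age_days > max_age_days:
            evicted.append(s.session_id)
            usage -= s.size_bytes

    # Pass 2 (soft cap): analyzed sessions only, oldest first.
    for s in oldest_first:
        if usage <= soft_cap:
            break
        if s.analyzed and s.session_id not in evicted:
            evicted.append(s.session_id)
            usage -= s.size_bytes

    # Pass 3 (hard cap): last resort; un-analyzed sessions may be dropped,
    # and each such drop is reported as data loss.
    for s in oldest_first:
        if usage <= hard_cap:
            break
        if s.session_id in evicted:
            continue
        evicted.append(s.session_id)
        if not s.analyzed:
            data_loss.append(s.session_id)
        usage -= s.size_bytes

    return evicted, data_loss
```

In this sketch an un-analyzed session can only appear in the result through pass 3, and whenever it does it is also listed in data_loss, which mirrors the invariant stated above.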

Tests

python -m pytest          # 26 tests: schema, adapter, store, digest, retention, ingest

Status

Phase 0 (AGENTIC-WP-0002): Claude adapter only, end to end. Codex and Grok adapters are designed (schemas confirmed in the design doc) and land in Phase 1.