--- id: AGENTIC-WP-0002 type: workplan title: "Coding Session Memory — Phase 0 (Capture + budget-based retention)" domain: helix_forge repo: agentic-resources status: finished owner: codex topic_slug: helix-forge created: "2026-06-06" updated: "2026-06-06" state_hub_workstream_id: "06e6726d-057d-47d8-84f4-0974858f6288" --- # Coding Session Memory — Phase 0 Implements Phase 0 of [PRD-helix-forge](../docs/PRD-helix-forge.md) per the [session-memory design](../docs/DESIGN-session-memory.md): a normalized session schema, the first (Claude) collector, a two-tier store, and a budget-based eviction sweep that drops analyzed/over-budget raw content while preserving compact digests. Scope is deliberately one agent flavor (Claude, schema verified on disk) end to end, so the agnostic-core / thin-adapter boundary is proven before Codex and Grok adapters land in Phase 1. ## Define Normalized Session Schema ```task id: AGENTIC-WP-0002-T01 status: done priority: high state_hub_task_id: "61297a16-257c-4579-bd1f-3db035781258" ``` Implement `core/schema.py` with the `Session` and `SessionEvent` dataclasses from design §4, including `schema_version`, the `flavor`-prefixed `session_uid`, the cost block, and the `discovered/ingested/analyzed/evicted` watermarks. Add round-trip (de)serialization tests. This is the contract every adapter targets. ## Claude Collector Adapter ```task id: AGENTIC-WP-0002-T02 status: done priority: high state_hub_task_id: "3b4e6b35-b4f3-40dc-a845-7ac78aa20d62" ``` Implement `adapters/claude.py`: read `~/.claude/projects//.jsonl`, discriminate on `type`, reconstruct the turn DAG via `uuid`/`parentUuid`, map records onto `SessionEvent.kind`, capture `message.usage` into the cost block, handle `agent-*.jsonl` sidechains (`is_sidechain`), and resolve `repo`/`domain` from `cwd`. Verify against real local sessions in this repo's project dir. No Codex/Grok work in Phase 0 (designed for, not built). ## Tier 1 / Tier 2 Store ```task id: AGENTIC-WP-0002-T03 status: done priority: high state_hub_task_id: "2387258e-ba6d-4a41-919e-f2f4e0822110" ``` Implement `core/store.py`: SQLite for `Session`/`SessionEvent` rows plus a blob dir for `payload_ref` bodies (Tier 1), and a compact `digest` table (Tier 2). Writes are idempotent on `(session_uid, seq)`. Provide usage-bytes accounting for Tier 1 (rows + blobs) and Tier 2, used by retention. ## Session Digest (Tier 1 → Tier 2) ```task id: AGENTIC-WP-0002-T04 status: done priority: medium state_hub_task_id: "017d8e90-633a-49f2-b342-8690938798cd" ``` Implement `core/digest.py`: produce a per-session digest (outcome heuristic, cost totals, tool histogram, error/retry/human-intervention markers, key snippets) and set `analyzed_at`. This is the promotion step that makes a session evictable. Signal extraction beyond the digest stays stubbed for the Detect phase. ## Budget-Based Retention Sweep ```task id: AGENTIC-WP-0002-T05 status: done priority: high state_hub_task_id: "89177c79-528e-4023-a7eb-67f8e0276ba9" ``` Implement `core/retention.py` per design §5: backstop pass (`raw_max_age_days`), budget pass (evict oldest `analyzed_at` first while over `raw_soft_cap_bytes`, never touching un-analyzed sessions), and the hard-cap overflow path (analyze-now, then last-resort evict oldest un-analyzed with a reported `data_loss` event). Enforce the invariant: raw bytes are never dropped before the Tier 2 digest exists (except the reported overflow path). Cover each branch with tests using synthetic sessions and tiny caps. ## Ingest Cursor + Sweep Entrypoint ```task id: AGENTIC-WP-0002-T06 status: done priority: medium state_hub_task_id: "a4b35c76-154d-4e99-b6d0-61cb6e47ecc0" ``` Implement `core/cursor.py` (per-source `(path,size,mtime,offset)` cursors, idempotent re-runs) and `ingest.py` wiring one sweep: discover → normalize (Claude) → store → digest → evict. Add `config.toml` with the §5.1 retention knobs, source paths, and repo→domain map. Document running a sweep and the intended `cadence` trigger (`/schedule` daily/weekly) in the repo docs. ## Verify End-to-End on Real Sessions ```task id: AGENTIC-WP-0002-T07 status: done priority: medium state_hub_task_id: "98d5cc7c-c285-4556-91a3-a85e0a2bb6df" ``` Run the full sweep against this workstation's real Claude sessions; confirm normalized rows, digests, idempotent re-run, and an eviction cycle under a small test cap (analyzed dropped, un-analyzed preserved, overflow reported). Record results and update the design doc's open questions (esp. OQ4 real per-session sizes). After workplan file updates, notify the custodian operator to run from `~/state-hub`: ```bash make fix-consistency REPO=agentic-resources ``` **Verification results (2026-06-06):** full suite 26/26 green. Live sweep over 85 real local Claude transcripts → 84 parsed (1 empty, tolerated) → 63 distinct sessions ingested + analyzed. Eviction under a 2 MB soft cap freed 26 MB, ended at 1.34 MB (under cap), zero data loss, and every evicted session kept its Tier 2 digest (invariant holds). Idempotent re-run skipped 84 unchanged, re-ingested 1 (the live session). Outcomes 52 success / 11 abandoned. OQ4 sizes recorded in the design doc; OQ6 (multi-file sessions) noted as a Phase-1 refinement.