- verified full sweep over 85 real local Claude transcripts: 63 sessions ingested+analyzed, eviction under tiny cap freed 26MB with zero data loss, digest-preservation invariant holds, idempotent re-run - session_memory/README.md: usage, scheduling, retention knobs - design doc: OQ4 resolved (median ~49KB/session), OQ6 (multi-file sessions) - workplan AGENTIC-WP-0002 finished Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
5.2 KiB
id, type, title, domain, repo, status, owner, topic_slug, created, updated, state_hub_workstream_id
| id | type | title | domain | repo | status | owner | topic_slug | created | updated | state_hub_workstream_id |
|---|---|---|---|---|---|---|---|---|---|---|
| AGENTIC-WP-0002 | workplan | Coding Session Memory — Phase 0 (Capture + budget-based retention) | helix_forge | agentic-resources | finished | codex | helix-forge | 2026-06-06 | 2026-06-06 | 06e6726d-057d-47d8-84f4-0974858f6288 |
Coding Session Memory — Phase 0
Implements Phase 0 of PRD-helix-forge per the session-memory design: a normalized session schema, the first (Claude) collector, a two-tier store, and a budget-based eviction sweep that drops analyzed/over-budget raw content while preserving compact digests.
Scope is deliberately one agent flavor (Claude, schema verified on disk) end to end, so the agnostic-core / thin-adapter boundary is proven before Codex and Grok adapters land in Phase 1.
Define Normalized Session Schema
id: AGENTIC-WP-0002-T01
status: done
priority: high
state_hub_task_id: "61297a16-257c-4579-bd1f-3db035781258"
Implement core/schema.py with the Session and SessionEvent dataclasses from
design §4, including schema_version, the flavor-prefixed session_uid, the
cost block, and the discovered/ingested/analyzed/evicted watermarks. Add
round-trip (de)serialization tests. This is the contract every adapter targets.
Claude Collector Adapter
id: AGENTIC-WP-0002-T02
status: done
priority: high
state_hub_task_id: "3b4e6b35-b4f3-40dc-a845-7ac78aa20d62"
Implement adapters/claude.py: read ~/.claude/projects/<cwd>/<uuid>.jsonl,
discriminate on type, reconstruct the turn DAG via uuid/parentUuid, map
records onto SessionEvent.kind, capture message.usage into the cost block,
handle agent-*.jsonl sidechains (is_sidechain), and resolve repo/domain
from cwd. Verify against real local sessions in this repo's project dir. No
Codex/Grok work in Phase 0 (designed for, not built).
Tier 1 / Tier 2 Store
id: AGENTIC-WP-0002-T03
status: done
priority: high
state_hub_task_id: "2387258e-ba6d-4a41-919e-f2f4e0822110"
Implement core/store.py: SQLite for Session/SessionEvent rows plus a blob
dir for payload_ref bodies (Tier 1), and a compact digest table (Tier 2).
Writes are idempotent on (session_uid, seq). Provide usage-bytes accounting for
Tier 1 (rows + blobs) and Tier 2, used by retention.
Session Digest (Tier 1 → Tier 2)
id: AGENTIC-WP-0002-T04
status: done
priority: medium
state_hub_task_id: "017d8e90-633a-49f2-b342-8690938798cd"
Implement core/digest.py: produce a per-session digest (outcome heuristic, cost
totals, tool histogram, error/retry/human-intervention markers, key snippets) and
set analyzed_at. This is the promotion step that makes a session evictable.
Signal extraction beyond the digest stays stubbed for the Detect phase.
Budget-Based Retention Sweep
id: AGENTIC-WP-0002-T05
status: done
priority: high
state_hub_task_id: "89177c79-528e-4023-a7eb-67f8e0276ba9"
Implement core/retention.py per design §5: backstop pass (raw_max_age_days),
budget pass (evict oldest analyzed_at first while over raw_soft_cap_bytes,
never touching un-analyzed sessions), and the hard-cap overflow path (analyze-now,
then last-resort evict oldest un-analyzed with a reported data_loss event).
Enforce the invariant: raw bytes are never dropped before the Tier 2 digest
exists (except the reported overflow path). Cover each branch with tests using
synthetic sessions and tiny caps.
Ingest Cursor + Sweep Entrypoint
id: AGENTIC-WP-0002-T06
status: done
priority: medium
state_hub_task_id: "a4b35c76-154d-4e99-b6d0-61cb6e47ecc0"
Implement core/cursor.py (per-source (path,size,mtime,offset) cursors,
idempotent re-runs) and ingest.py wiring one sweep: discover → normalize
(Claude) → store → digest → evict. Add config.toml with the §5.1 retention
knobs, source paths, and repo→domain map. Document running a sweep and the
intended cadence trigger (/schedule daily/weekly) in the repo docs.
Verify End-to-End on Real Sessions
id: AGENTIC-WP-0002-T07
status: done
priority: medium
state_hub_task_id: "98d5cc7c-c285-4556-91a3-a85e0a2bb6df"
Run the full sweep against this workstation's real Claude sessions; confirm
normalized rows, digests, idempotent re-run, and an eviction cycle under a small
test cap (analyzed dropped, un-analyzed preserved, overflow reported). Record
results and update the design doc's open questions (esp. OQ4 real per-session
sizes). After workplan file updates, notify the custodian operator to run from
~/state-hub:
make fix-consistency REPO=agentic-resources
Verification results (2026-06-06): full suite 26/26 green. Live sweep over 85 real local Claude transcripts → 84 parsed (1 empty, tolerated) → 63 distinct sessions ingested + analyzed. Eviction under a 2 MB soft cap freed 26 MB, ended at 1.34 MB (under cap), zero data loss, and every evicted session kept its Tier 2 digest (invariant holds). Idempotent re-run skipped 84 unchanged, re-ingested 1 (the live session). Outcomes 52 success / 11 abandoned. OQ4 sizes recorded in the design doc; OQ6 (multi-file sessions) noted as a Phase-1 refinement.