Files
agentic-resources/workplans/AGENTIC-WP-0002-session-memory-phase0.md
tegwick 586ed90948 session-memory Phase 0: ingest cursor + sweep entrypoint + config (T06)
- session_memory/core/cursor.py: size/mtime change detection sidecar
- session_memory/config.toml: store paths, retention caps, per-source
  globs (claude on, codex/grok off for Phase 1), repo->domain map
- session_memory/ingest.py: discover->normalize->store->digest->evict;
  --dry-run creates/writes nothing; python -m session_memory.ingest
- tests/test_ingest.py; live dry-run parsed 84/85 real local sessions

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 21:41:59 +02:00

4.6 KiB

id, type, title, domain, repo, status, owner, topic_slug, created, updated, state_hub_workstream_id
id type title domain repo status owner topic_slug created updated state_hub_workstream_id
AGENTIC-WP-0002 workplan Coding Session Memory — Phase 0 (Capture + budget-based retention) helix_forge agentic-resources active codex helix-forge 2026-06-06 2026-06-06 06e6726d-057d-47d8-84f4-0974858f6288

Coding Session Memory — Phase 0

Implements Phase 0 of PRD-helix-forge per the session-memory design: a normalized session schema, the first (Claude) collector, a two-tier store, and a budget-based eviction sweep that drops analyzed/over-budget raw content while preserving compact digests.

Scope is deliberately one agent flavor (Claude, schema verified on disk) end to end, so the agnostic-core / thin-adapter boundary is proven before Codex and Grok adapters land in Phase 1.

Define Normalized Session Schema

id: AGENTIC-WP-0002-T01
status: done
priority: high
state_hub_task_id: "61297a16-257c-4579-bd1f-3db035781258"

Implement core/schema.py with the Session and SessionEvent dataclasses from design §4, including schema_version, the flavor-prefixed session_uid, the cost block, and the discovered/ingested/analyzed/evicted watermarks. Add round-trip (de)serialization tests. This is the contract every adapter targets.

Claude Collector Adapter

id: AGENTIC-WP-0002-T02
status: done
priority: high
state_hub_task_id: "3b4e6b35-b4f3-40dc-a845-7ac78aa20d62"

Implement adapters/claude.py: read ~/.claude/projects/<cwd>/<uuid>.jsonl, discriminate on type, reconstruct the turn DAG via uuid/parentUuid, map records onto SessionEvent.kind, capture message.usage into the cost block, handle agent-*.jsonl sidechains (is_sidechain), and resolve repo/domain from cwd. Verify against real local sessions in this repo's project dir. No Codex/Grok work in Phase 0 (designed for, not built).

Tier 1 / Tier 2 Store

id: AGENTIC-WP-0002-T03
status: done
priority: high
state_hub_task_id: "2387258e-ba6d-4a41-919e-f2f4e0822110"

Implement core/store.py: SQLite for Session/SessionEvent rows plus a blob dir for payload_ref bodies (Tier 1), and a compact digest table (Tier 2). Writes are idempotent on (session_uid, seq). Provide usage-bytes accounting for Tier 1 (rows + blobs) and Tier 2, used by retention.

Session Digest (Tier 1 → Tier 2)

id: AGENTIC-WP-0002-T04
status: done
priority: medium
state_hub_task_id: "017d8e90-633a-49f2-b342-8690938798cd"

Implement core/digest.py: produce a per-session digest (outcome heuristic, cost totals, tool histogram, error/retry/human-intervention markers, key snippets) and set analyzed_at. This is the promotion step that makes a session evictable. Signal extraction beyond the digest stays stubbed for the Detect phase.

Budget-Based Retention Sweep

id: AGENTIC-WP-0002-T05
status: done
priority: high
state_hub_task_id: "89177c79-528e-4023-a7eb-67f8e0276ba9"

Implement core/retention.py per design §5: backstop pass (raw_max_age_days), budget pass (evict oldest analyzed_at first while over raw_soft_cap_bytes, never touching un-analyzed sessions), and the hard-cap overflow path (analyze-now, then last-resort evict oldest un-analyzed with a reported data_loss event). Enforce the invariant: raw bytes are never dropped before the Tier 2 digest exists (except the reported overflow path). Cover each branch with tests using synthetic sessions and tiny caps.

Ingest Cursor + Sweep Entrypoint

id: AGENTIC-WP-0002-T06
status: done
priority: medium
state_hub_task_id: "a4b35c76-154d-4e99-b6d0-61cb6e47ecc0"

Implement core/cursor.py (per-source (path,size,mtime,offset) cursors, idempotent re-runs) and ingest.py wiring one sweep: discover → normalize (Claude) → store → digest → evict. Add config.toml with the §5.1 retention knobs, source paths, and repo→domain map. Document running a sweep and the intended cadence trigger (/schedule daily/weekly) in the repo docs.

Verify End-to-End on Real Sessions

id: AGENTIC-WP-0002-T07
status: progress
priority: medium
state_hub_task_id: "98d5cc7c-c285-4556-91a3-a85e0a2bb6df"

Run the full sweep against this workstation's real Claude sessions; confirm normalized rows, digests, idempotent re-run, and an eviction cycle under a small test cap (analyzed dropped, un-analyzed preserved, overflow reported). Record results and update the design doc's open questions (esp. OQ4 real per-session sizes). After workplan file updates, notify the custodian operator to run from ~/state-hub:

make fix-consistency REPO=agentic-resources