---
id: AGENTIC-WP-0002
type: workplan
title: "Coding Session Memory — Phase 0 (Capture + budget-based retention)"
domain: helix_forge
repo: agentic-resources
status: active
owner: codex
topic_slug: helix-forge
created: "2026-06-06"
updated: "2026-06-06"
state_hub_workstream_id: "06e6726d-057d-47d8-84f4-0974858f6288"
---

# Coding Session Memory — Phase 0

Implements Phase 0 of [PRD-helix-forge](../docs/PRD-helix-forge.md) per the
[session-memory design](../docs/DESIGN-session-memory.md): a normalized session
schema, the first (Claude) collector, a two-tier store, and a budget-based
eviction sweep that drops analyzed/over-budget raw content while preserving
compact digests.

Scope is deliberately one agent flavor (Claude, schema verified on disk) end to
end, so the agnostic-core / thin-adapter boundary is proven before Codex and Grok
adapters land in Phase 1.

## Define Normalized Session Schema

```task
id: AGENTIC-WP-0002-T01
status: done
priority: high
state_hub_task_id: "61297a16-257c-4579-bd1f-3db035781258"
```

Implement `core/schema.py` with the `Session` and `SessionEvent` dataclasses from
design §4, including `schema_version`, the `flavor`-prefixed `session_uid`, the
cost block, and the `discovered/ingested/analyzed/evicted` watermarks. Add
round-trip (de)serialization tests. This is the contract every adapter targets.
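
A minimal sketch of what such a contract could look like, assuming JSON as the wire format. Everything beyond the names used above is hypothetical: the `Cost` field names, the `native_id` attribute, the `:` separator in `session_uid`, and the `_at` suffix on the watermark fields are illustrative choices. Design §4 remains the authority.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from typing import Optional

SCHEMA_VERSION = 1  # assumed value; design §4 defines the real one


@dataclass
class Cost:
    # Hypothetical cost block; field names are not taken from the design doc.
    input_tokens: int = 0
    output_tokens: int = 0


@dataclass
class SessionEvent:
    seq: int
    kind: str
    payload_ref: Optional[str] = None


@dataclass
class Session:
    flavor: str
    native_id: str  # hypothetical name for the agent's own session id
    schema_version: int = SCHEMA_VERSION
    cost: Cost = field(default_factory=Cost)
    events: list[SessionEvent] = field(default_factory=list)
    # Watermarks: ISO-8601 strings, None until the stage has run.
    discovered_at: Optional[str] = None
    ingested_at: Optional[str] = None
    analyzed_at: Optional[str] = None
    evicted_at: Optional[str] = None

    @property
    def session_uid(self) -> str:
        # Flavor-prefixed uid; the ":" separator is an assumption.
        return f"{self.flavor}:{self.native_id}"

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "Session":
        data = json.loads(text)
        # Rebuild nested dataclasses, which asdict() flattened to plain dicts.
        data["cost"] = Cost(**data["cost"])
        data["events"] = [SessionEvent(**e) for e in data["events"]]
        return cls(**data)
```

A round-trip test then reduces to checking that `Session.from_json(s.to_json()) == s` for a representative session, since dataclass equality compares every field.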

## Claude Collector Adapter

```task
id: AGENTIC-WP-0002-T02
status: done
priority: high
state_hub_task_id: "3b4e6b35-b4f3-40dc-a845-7ac78aa20d62"
```

Implement `adapters/claude.py`: read `~/.claude/projects/<cwd>/<uuid>.jsonl`,
discriminate on `type`, reconstruct the turn DAG via `uuid`/`parentUuid`, map
records onto `SessionEvent.kind`, capture `message.usage` into the cost block,
handle `agent-*.jsonl` sidechains (`is_sidechain`), and resolve `repo`/`domain`
from `cwd`. Verify against real local sessions in this repo's project dir. No
Codex/Grok work in Phase 0 (designed for, not built).
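
The DAG reconstruction and `type` discrimination could be sketched roughly as follows. This is an illustration under assumptions, not the adapter itself: the `KIND_BY_TYPE` mapping, the kind names, and the `input_tokens`/`output_tokens` keys read from `message.usage` are hypothetical, and records without a `uuid` are simply skipped here.

```python
import json
from collections import defaultdict

# Hypothetical mapping from a record's `type` to SessionEvent.kind.
KIND_BY_TYPE = {"user": "user_message", "assistant": "assistant_message"}


def order_turns(lines):
    """Return records in parent-before-child order using uuid/parentUuid."""
    records = [json.loads(line) for line in lines if line.strip()]
    by_uuid = {r["uuid"]: r for r in records if "uuid" in r}
    children = defaultdict(list)
    roots = []
    for r in by_uuid.values():
        parent = r.get("parentUuid")
        if parent in by_uuid:
            children[parent].append(r)
        else:
            roots.append(r)  # no parent, or parent not present in this file
    ordered = []
    stack = list(reversed(roots))
    while stack:  # iterative depth-first walk, so deep sessions cannot overflow
        r = stack.pop()
        ordered.append(r)
        stack.extend(reversed(children[r["uuid"]]))
    return ordered


def to_events(lines):
    """Map ordered records onto plain event dicts with a dense seq."""
    events = []
    for seq, r in enumerate(order_turns(lines)):
        usage = (r.get("message") or {}).get("usage") or {}
        events.append(
            {
                "seq": seq,
                "kind": KIND_BY_TYPE.get(r.get("type"), "other"),
                "input_tokens": usage.get("input_tokens", 0),
                "output_tokens": usage.get("output_tokens", 0),
            }
        )
    return events
```

Assigning `seq` from the walk order rather than from file order keeps the numbering stable even if records were appended out of order.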

## Tier 1 / Tier 2 Store

```task
id: AGENTIC-WP-0002-T03
status: done
priority: high
state_hub_task_id: "2387258e-ba6d-4a41-919e-f2f4e0822110"
```

Implement `core/store.py`: SQLite for `Session`/`SessionEvent` rows plus a blob
dir for `payload_ref` bodies (Tier 1), and a compact `digest` table (Tier 2).
Writes are idempotent on `(session_uid, seq)`. Provide usage-bytes accounting for
Tier 1 (rows + blobs) and Tier 2, used by retention.
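
One way to get idempotent writes is a composite primary key combined with `INSERT OR IGNORE`. The sketch below assumes that approach and, for brevity, keeps bodies inline in a `BLOB` column instead of a separate blob dir. The table name, column names, and function names are all hypothetical.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a store with a composite key on (session_uid, seq)."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS session_event (
               session_uid TEXT    NOT NULL,
               seq         INTEGER NOT NULL,
               kind        TEXT    NOT NULL,
               body        BLOB,
               PRIMARY KEY (session_uid, seq)
           )"""
    )
    return db


def ingest(db, session_uid, events):
    """Insert (seq, kind, body) tuples; return how many rows were new.

    Rows whose (session_uid, seq) already exist are silently skipped,
    so re-running the same ingest changes nothing.
    """
    before = db.total_changes
    with db:  # commits on success, rolls back on error
        db.executemany(
            "INSERT OR IGNORE INTO session_event VALUES (?, ?, ?, ?)",
            [(session_uid, seq, kind, body) for seq, kind, body in events],
        )
    return db.total_changes - before


def tier1_bytes(db):
    """Sum of stored body sizes; LENGTH() of a BLOB is its byte count."""
    (n,) = db.execute(
        "SELECT COALESCE(SUM(LENGTH(body)), 0) FROM session_event"
    ).fetchone()
    return n
```

Calling `ingest` twice with the same events returns the row count the first time and `0` the second. A real accounting pass would also add row overhead and on-disk blob sizes, which this sketch leaves out.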

## Session Digest (Tier 1 → Tier 2)

```task
id: AGENTIC-WP-0002-T04
status: progress
priority: medium
state_hub_task_id: "017d8e90-633a-49f2-b342-8690938798cd"
```

Implement `core/digest.py`: produce a per-session digest (outcome heuristic, cost
totals, tool histogram, error/retry/human-intervention markers, key snippets) and
set `analyzed_at`. This is the promotion step that makes a session evictable.
Signal extraction beyond the digest stays stubbed for the Detect phase.
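
As a rough illustration of the digest shape, the sketch below aggregates a list of event dicts. The event keys (`kind`, `tool`, `is_error`, token counts), the digest keys, and the last-event outcome heuristic are placeholders chosen for this example; the actual heuristic and markers are defined by the design, not here.

```python
from collections import Counter
from datetime import datetime, timezone


def build_digest(events):
    """Summarize a session's events into a small, raw-free dict."""
    tools = Counter(
        e.get("tool", "unknown") for e in events if e.get("kind") == "tool_call"
    )
    # Placeholder heuristic: judge the session by its final event only.
    if not events:
        outcome = "unknown"
    elif events[-1].get("is_error"):
        outcome = "failed"
    else:
        outcome = "ok"
    return {
        "event_count": len(events),
        "tool_histogram": dict(tools),
        "error_count": sum(1 for e in events if e.get("is_error")),
        "input_tokens": sum(e.get("input_tokens", 0) for e in events),
        "output_tokens": sum(e.get("output_tokens", 0) for e in events),
        "outcome": outcome,
        # Setting this timestamp is what marks the session as evictable.
        "analyzed_at": datetime.now(timezone.utc).isoformat(),
    }
```

The digest holds only counts and totals, so it stays small regardless of how large the raw session was.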

## Budget-Based Retention Sweep

```task
id: AGENTIC-WP-0002-T05
status: todo
priority: high
state_hub_task_id: "89177c79-528e-4023-a7eb-67f8e0276ba9"
```

Implement `core/retention.py` per design §5: backstop pass (`raw_max_age_days`),
budget pass (evict oldest `analyzed_at` first while over `raw_soft_cap_bytes`,
never touching un-analyzed sessions), and the hard-cap overflow path (analyze-now,
then last-resort evict oldest un-analyzed with a reported `data_loss` event).
Enforce the invariant: raw bytes are never dropped before the Tier 2 digest
exists (except the reported overflow path). Cover each branch with tests using
synthetic sessions and tiny caps.
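
The three passes can be sketched as a pure planning function, which also makes the branches easy to test with tiny caps. Several details are assumptions made for this example: the `Raw` record, the `raw_hard_cap_bytes` name, measuring age from `ingested_at`, applying the backstop only to analyzed sessions, and skipping the analyze-now step (treated here as already attempted).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Raw:
    uid: str
    size: int
    ingested_at: float                   # epoch seconds
    analyzed_at: Optional[float] = None  # None means no Tier 2 digest yet


def plan_evictions(sessions, now, raw_max_age_days,
                   raw_soft_cap_bytes, raw_hard_cap_bytes):
    """Return (evict_uids, data_loss_uids) without touching any storage."""
    live = {s.uid: s for s in sessions}
    evict, data_loss = [], []

    def total():
        return sum(s.size for s in live.values())

    def drop(s, lost=False):
        evict.append(s.uid)
        if lost:
            data_loss.append(s.uid)
        del live[s.uid]

    # 1. Backstop pass: analyzed raw older than the age limit.
    cutoff = now - raw_max_age_days * 86400
    for s in list(live.values()):
        if s.analyzed_at is not None and s.ingested_at < cutoff:
            drop(s)

    # 2. Budget pass: oldest analyzed_at first, analyzed sessions only.
    analyzed = [s for s in live.values() if s.analyzed_at is not None]
    for s in sorted(analyzed, key=lambda s: s.analyzed_at):
        if total() <= raw_soft_cap_bytes:
            break
        drop(s)

    # 3. Overflow: still above the hard cap, so un-analyzed raw is lost.
    pending = [s for s in live.values() if s.analyzed_at is None]
    for s in sorted(pending, key=lambda s: s.ingested_at):
        if total() <= raw_hard_cap_bytes:
            break
        drop(s, lost=True)  # the caller should report a data_loss event

    return evict, data_loss
```

Every uid in `data_loss` also appears in `evict`, so the invariant check reduces to asserting that any evicted session without a digest is listed in `data_loss`.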

## Ingest Cursor + Sweep Entrypoint

```task
id: AGENTIC-WP-0002-T06
status: todo
priority: medium
state_hub_task_id: "a4b35c76-154d-4e99-b6d0-61cb6e47ecc0"
```

Implement `core/cursor.py` (per-source `(path,size,mtime,offset)` cursors,
idempotent re-runs) and `ingest.py` wiring one sweep: discover → normalize
(Claude) → store → digest → evict. Add `config.toml` with the §5.1 retention
knobs, source paths, and repo→domain map. Document running a sweep and the
intended `cadence` trigger (`/schedule` daily/weekly) in the repo docs.
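
A per-source cursor over an append-only JSONL file might look like the sketch below. The `Cursor` class and `read_new_lines` function are hypothetical names, and both restarting from offset 0 when the file shrinks and holding back a trailing partial line are assumptions made for this example.

```python
import os
from dataclasses import dataclass


@dataclass
class Cursor:
    path: str
    size: int = 0
    mtime: float = 0.0
    offset: int = 0  # bytes already consumed


def read_new_lines(cursor):
    """Return (complete new lines, updated cursor).

    An unchanged (size, mtime) pair makes the call a no-op, which is
    what keeps re-runs idempotent.
    """
    st = os.stat(cursor.path)
    if (st.st_size, st.st_mtime) == (cursor.size, cursor.mtime):
        return [], cursor
    # A file smaller than our offset was rewritten: start over.
    offset = cursor.offset if st.st_size >= cursor.offset else 0
    with open(cursor.path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Consume only through the last newline, so a line that is still
    # being written is picked up whole on a later sweep.
    end = data.rfind(b"\n") + 1
    lines = [raw.decode("utf-8") for raw in data[:end].split(b"\n")[:-1]]
    return lines, Cursor(cursor.path, st.st_size, st.st_mtime, offset + end)
```

Working on bytes and splitting on `b"\n"` keeps the offset arithmetic exact, which splitting decoded text would not guarantee for multi-byte characters.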

## Verify End-to-End on Real Sessions

```task
id: AGENTIC-WP-0002-T07
status: todo
priority: medium
state_hub_task_id: "98d5cc7c-c285-4556-91a3-a85e0a2bb6df"
```

Run the full sweep against this workstation's real Claude sessions; confirm
normalized rows, digests, idempotent re-run, and an eviction cycle under a small
test cap (analyzed dropped, un-analyzed preserved, overflow reported). Record
results and update the design doc's open questions (esp. OQ4 real per-session
sizes). After workplan file updates, notify the custodian operator to run from
`~/state-hub`:

```bash
make fix-consistency REPO=agentic-resources
```