agentic-resources/workplans/AGENTIC-WP-0003-session-memory-phase1.md
tegwick 055713aa4f session-memory Phase 1: T08 verify across all three flavors + docs
Marks AGENTIC-WP-0003 finished. Full suite 40/40 green; live pipeline
over real local sessions (Codex via fixtures) surfaced 3 candidate
patterns, 2 cross-flavor (Claude+Grok) — PRD success metric met.
README documents the detect entrypoint and Phase 0/1/next status.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 23:39:37 +02:00

---
id: AGENTIC-WP-0003
type: workplan
title: "Coding Session Memory — Phase 1 (Codex + Grok adapters, Detect)"
domain: helix_forge
repo: agentic-resources
status: finished
owner: codex
topic_slug: helix-forge
created: "2026-06-06"
updated: "2026-06-06"
state_hub_workstream_id: "88c75b47-1c89-43bc-bb3e-739ec3c8f7d4"
---
# Coding Session Memory — Phase 1
Extends Phase 0 ([AGENTIC-WP-0002](AGENTIC-WP-0002-session-memory-phase0.md)) along
two axes of [PRD-helix-forge](../docs/PRD-helix-forge.md):
1. **Multi-flavor capture (G1/G6):** add the Codex and Grok collector adapters so
the agnostic core ingests all three families through thin edges.
2. **Detect (PRD §6.2):** run signal extractors over normalized sessions, cluster
recurring signals into candidate problem/success patterns, attach evidence, and
flag cross-flavor patterns.
Both flavors' on-disk schemas are already confirmed in
[DESIGN-session-memory.md](../docs/DESIGN-session-memory.md) §2.2 (Codex) and §2.3
(Grok), with the native→`kind` mapping in §4.3 — so the adapters are written
against known structures, not discovered ones.
## Codex Collector Adapter
```task
id: AGENTIC-WP-0003-T01
status: done
priority: high
state_hub_task_id: "91264fd4-ba99-4add-b317-e2320c3c932c"
```
Implement `adapters/codex.py` reading `~/.codex/sessions/YYYY/MM/DD/rollout-*.jsonl`
per design §2.2: line wrapper `{timestamp,type,payload}`; map `session_meta`→Session
fields, `turn_context`→model, `response_item/message`→`user_msg`/`assistant_msg`,
`function_call`+`function_call_output` (joined on `call_id`)→`tool_call`/`tool_result`,
`reasoning`→`thinking`, `event_msg/task_*`→`lifecycle`/`completion`,
`event_msg/token_count`→cost. Codex is flat: assign `seq`/`parent_seq` by temporal
order (no native DAG). Version-detect on `session_meta.cli_version`. Reuse the
`Normalized` bundle contract. Tests use synthetic rollout fixtures; confirm the
`token_count` payload field names against a real install if Codex is present
(design OQ1 residual).
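A minimal sketch of the `call_id` join and the temporal `seq`/`parent_seq` assignment described above. It is not the shipped adapter: the helper name `normalize_rollout`, the output field `tool_name`, and the assumption that the item subtype sits in `payload["type"]` are illustrative. It also assumes timestamps are ISO-8601 strings in one format, so string order matches time order.

```python
import json
from typing import Any, Iterable


def normalize_rollout(lines: Iterable[str]) -> list[dict[str, Any]]:
    """Flatten rollout lines into ordered tool events (hypothetical helper)."""
    records = [json.loads(line) for line in lines if line.strip()]
    # sorted() is stable, so records sharing a timestamp keep their file order.
    records = sorted(records, key=lambda r: r["timestamp"])

    events: list[dict[str, Any]] = []
    tool_names: dict[str, str] = {}  # call_id -> tool name seen on the function_call
    for record in records:
        payload = record.get("payload", {})
        subtype = payload.get("type")  # ASSUMPTION: subtype lives in payload["type"]
        if subtype == "function_call":
            kind = "tool_call"
            tool_names[payload["call_id"]] = payload.get("name", "")
        elif subtype == "function_call_output":
            kind = "tool_result"
        else:
            continue  # messages, reasoning, lifecycle are out of scope for this sketch
        seq = len(events)
        events.append({
            "seq": seq,
            # Codex has no native DAG: the parent is simply the previous event.
            "parent_seq": seq - 1 if seq else None,
            "kind": kind,
            "call_id": payload["call_id"],
            # The join: a result inherits the tool name from its matching call.
            "tool_name": tool_names.get(payload["call_id"]),
            "timestamp": record["timestamp"],
        })
    return events
```

A result whose call was never seen keeps `tool_name=None` rather than raising, which seems the safer default for truncated rollouts.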
## Grok Collector Adapter
```task
id: AGENTIC-WP-0003-T02
status: done
priority: high
state_hub_task_id: "fe3d7d1c-110e-4f16-8d56-062fa4a651aa"
```
Implement `adapters/grok.py` reading the per-session directory
`~/.grok/sessions/<cwd>/<uuid>/` per design §2.3: `summary.json`→Session
id/cwd/timestamps, `chat_history.jsonl`→messages, `events.jsonl`→explicit
`lifecycle` events and `turn_number` (key `seq` off it), tool calls/results from
`chat_history`/`updates.jsonl`, token fields from events/updates. Resolve the
URL-encoded cwd directory name back to a path. Tests run against the real local
Grok sessions on this workstation plus a synthetic directory fixture.
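Resolving the encoded directory name could look like the sketch below. It assumes the `<cwd>` component uses standard percent-encoding, which is how "url-encoded" is read here; the helper name `decode_session_dir` is hypothetical.

```python
from pathlib import Path
from urllib.parse import unquote


def decode_session_dir(session_dir: Path) -> tuple[Path, str]:
    """Split .../sessions/<cwd>/<uuid> into (decoded cwd, session uuid)."""
    encoded_cwd = session_dir.parent.name  # the <cwd> path component
    # ASSUMPTION: plain percent-encoding, so "%2F" decodes to "/".
    return Path(unquote(encoded_cwd)), session_dir.name
```

For example, a directory named `%2Fhome%2Fu%2Fproj` would decode to `/home/u/proj` under this assumption.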
## Multi-File / Multi-Part Session Merge
```task
id: AGENTIC-WP-0003-T03
status: done
priority: medium
state_hub_task_id: "c4acfb63-84cd-4299-a44d-91bb6857fa88"
```
Address design OQ6 (surfaced in Phase 0): several files can map to one
`session_uid` (resume, sidechains; Grok dirs are inherently multi-file). Change
the store/ingest path to **merge** events across parts of one session rather than
last-file-wins upsert — stable event ordering and de-duplication keyed on native
identity. Verify event counts are additive and idempotent on re-run.
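The merge semantics can be sketched as follows, under the assumption that every event carries a `native_id` (standing in for "native identity") and a sortable `timestamp`. Both field names and the function `merge_events` are illustrative, not the store's actual schema.

```python
from typing import Any, Iterable

Event = dict[str, Any]


def merge_events(existing: Iterable[Event], incoming: Iterable[Event]) -> list[Event]:
    """Merge two parts of one session: de-duplicate, then order stably."""
    merged: dict[str, Event] = {}
    for event in [*existing, *incoming]:
        # First occurrence wins, so re-ingesting a part changes nothing.
        merged.setdefault(event["native_id"], event)
    # Tie-break on native_id so the order never depends on ingest order.
    ordered = sorted(merged.values(), key=lambda e: (e["timestamp"], e["native_id"]))
    # Reassign seq from the merged order instead of trusting per-file values.
    return [{**event, "seq": i} for i, event in enumerate(ordered)]
```

Merging a part a second time returns the same list, which is the idempotence property the task asks to verify; new events from another part only add to the count.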
## Signal Extractors
```task
id: AGENTIC-WP-0003-T04
status: done
priority: high
state_hub_task_id: "20920c5d-16f7-43bb-9ed7-9afbfeaf7207"
```
Implement `detect/signals.py`: derive `Signal`s from normalized sessions/digests —
e.g. repeated test failure on the same target, budget overrun (cost vs. peers),
retry storm, fast clean resolution, human escalation, error-then-recovery. Each
signal carries its source `session_uid`, locus (file/tool/task), polarity
(problem|success), and magnitude. Pure functions over Tier 1 events + Tier 2
digests; no new capture. Unit-tested on synthetic sessions.
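As an illustration of the "pure function" shape, here is one possible extractor for repeated failures on the same target. The `Signal` fields follow the description above, but the event keys (`kind`, `ok`, `target`), the `signal_type` field, and the `min_repeats` threshold are assumptions made for this sketch.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Signal:
    signal_type: str
    session_uid: str
    locus: str       # file, tool, or task the signal points at
    polarity: str    # "problem" or "success"
    magnitude: float


def repeated_failures(
    session_uid: str, events: list[dict[str, Any]], min_repeats: int = 2
) -> list[Signal]:
    """Emit one problem signal per target that failed at least min_repeats times."""
    failures = Counter(
        e["target"]
        for e in events
        # ASSUMPTION: tool results expose a boolean "ok" and a "target" locus.
        if e.get("kind") == "tool_result" and not e.get("ok", True) and "target" in e
    )
    return [
        Signal("repeated_failure", session_uid, target, "problem", float(count))
        for target, count in sorted(failures.items())
        if count >= min_repeats
    ]
```

The function reads only its arguments and returns new values, so it can be unit-tested on synthetic event lists without touching the store.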
## Pattern Clusterer
```task
id: AGENTIC-WP-0003-T05
status: done
priority: high
state_hub_task_id: "f42d57f6-34dc-4a92-bf6a-4d8eab572467"
```
Implement `detect/cluster.py`: group recurring signals across sessions/repos/
flavors into candidate `ProblemPattern`/`SuccessPattern` records (PRD §5). Start
with deterministic keyed clustering (locus + signal-type + normalized message);
leave embedding-based similarity as a later option. Output candidates with
frequency and member session lists.
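Deterministic keyed clustering might be sketched as below. The normalization rule (lowercase, digits replaced by a placeholder, whitespace collapsed) is an assumption chosen for illustration, as are the dict keys on the input signals and the function names.

```python
import re
from collections import defaultdict
from typing import Any


def normalize_message(message: str) -> str:
    """Reduce a message to a stable form so near-identical texts share a key."""
    message = re.sub(r"\d+", "<n>", message.lower())  # mask counts, line numbers
    return re.sub(r"\s+", " ", message).strip()


def cluster_signals(signals: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Group signals by (locus, signal_type, normalized message)."""
    groups: dict[tuple[str, str, str], list[dict[str, Any]]] = defaultdict(list)
    for signal in signals:
        key = (signal["locus"], signal["signal_type"], normalize_message(signal["message"]))
        groups[key].append(signal)
    candidates = [
        {
            "key": key,
            "frequency": len(members),
            "sessions": sorted({m["session_uid"] for m in members}),
        }
        for key, members in groups.items()
    ]
    # Most frequent first; the key tie-break keeps output order reproducible.
    candidates.sort(key=lambda c: (-c["frequency"], c["key"]))
    return candidates
```

Because the key is an exact tuple, the same input always yields the same clusters, which keeps results comparable between runs before any embedding-based similarity is introduced.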
## Pattern Evidence + Cross-Flavor Flagging
```task
id: AGENTIC-WP-0003-T06
status: done
priority: medium
state_hub_task_id: "8fd502d6-d138-4a42-acd5-6f5921859605"
```
For each candidate pattern (PRD §6.2 FR-D3/FR-D4) attach evidence: supporting
sessions, frequency, affected repos, affected **flavors**, and estimated cost
impact (token/retry deltas vs. baseline). Explicitly flag candidates whose
evidence spans more than one flavor as `cross_flavor: true` — the highest-value
reuse targets. Persist candidates to a Tier 2 `patterns` store/table.
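A sketch of evidence attachment and the `cross_flavor` flag, assuming a candidate lists its member `sessions` and that a session index maps each `session_uid` to `flavor`, `repo`, and `tokens`. The index layout, the `token_delta` field, and the single-number baseline are hypothetical simplifications of "token/retry deltas vs. baseline".

```python
from statistics import mean
from typing import Any


def attach_evidence(
    candidate: dict[str, Any],
    sessions: dict[str, dict[str, Any]],
    baseline_tokens: float,
) -> dict[str, Any]:
    """Return a copy of the candidate enriched with evidence fields."""
    members = [sessions[uid] for uid in candidate["sessions"]]
    flavors = sorted({m["flavor"] for m in members})
    return {
        **candidate,
        "frequency": len(members),
        "repos": sorted({m["repo"] for m in members}),
        "flavors": flavors,
        # Mean token usage of member sessions relative to the baseline.
        "token_delta": mean(m["tokens"] for m in members) - baseline_tokens,
        # Evidence spanning more than one flavor marks a cross-flavor pattern.
        "cross_flavor": len(flavors) > 1,
    }
```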
## Candidate Pattern Report
```task
id: AGENTIC-WP-0003-T07
status: done
priority: medium
state_hub_task_id: "34a96d5d-9165-4761-b91e-3643b0401410"
```
Add a `detect` entrypoint (`python -m session_memory.detect`) that runs extractors
→ clusterer → evidence and emits a human-readable candidate report (ranked by
cost impact × frequency, cross-flavor first), plus machine-readable JSON. This is
the input to the Curate phase (Phase 2) review workflow. Document usage in the
session_memory README.
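The ranking rule (cross-flavor first, then cost impact × frequency) can be expressed as a single sort key. The sketch below assumes each candidate is a dict with `name`, `cross_flavor`, `cost_impact`, and `frequency`; these field names and the report line format are illustrative and not the entrypoint's actual output.

```python
import json
from typing import Any


def rank_candidates(candidates: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Cross-flavor candidates first, then by descending cost_impact * frequency."""
    return sorted(
        candidates,
        # False sorts before True, so cross-flavor candidates lead the list.
        key=lambda c: (not c["cross_flavor"], -(c["cost_impact"] * c["frequency"])),
    )


def render_report(candidates: list[dict[str, Any]]) -> str:
    """Human-readable report, one ranked line per candidate."""
    lines = []
    for rank, c in enumerate(rank_candidates(candidates), start=1):
        tag = " [cross-flavor]" if c["cross_flavor"] else ""
        lines.append(
            f"{rank}. {c['name']}{tag}: "
            f"frequency={c['frequency']}, cost_impact={c['cost_impact']}"
        )
    return "\n".join(lines)


def render_json(candidates: list[dict[str, Any]]) -> str:
    """Machine-readable form of the same ranking."""
    return json.dumps(rank_candidates(candidates), indent=2)
```

Sharing `rank_candidates` between both renderers keeps the text report and the JSON in the same order.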
## Verify Across All Three Flavors
```task
id: AGENTIC-WP-0003-T08
status: done
priority: medium
state_hub_task_id: "b272c3fa-af81-4a6c-9ed9-7b42173efa81"
```
Run the full pipeline (ingest all enabled sources → digest → detect) against the
real local Claude and Grok sessions on this workstation (Codex via fixtures if not
installed). Confirm: normalized rows for each flavor, at least one candidate
pattern surfaced, and at least one **cross-flavor** pattern detected if the data
supports it (PRD success metric). Record results and refresh design open
questions. After workplan file updates, notify the custodian operator to run from
`~/state-hub`:
```bash
make fix-consistency REPO=agentic-resources
```
**Verification results (2026-06-06):** full suite 40/40 green. Live pipeline over
real local sessions (Codex not installed → fixtures): ingested 88 → 67 digests
(63 Claude + 4 Grok); detect surfaced 3 candidate patterns, **2 cross-flavor**
(Claude+Grok) — "clean pass" success across 18 sessions and "abandoned" problem
across 13 — plus a Claude-only budget-overrun. PRD cross-flavor success metric met.