Marks AGENTIC-WP-0003 finished. Full suite 40/40 green; live pipeline over real local sessions (Codex via fixtures) surfaced 3 candidate patterns, 2 cross-flavor (Claude+Grok) — PRD success metric met. README documents the detect entrypoint and Phase 0/1/next status. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---
id: AGENTIC-WP-0003
type: workplan
title: "Coding Session Memory — Phase 1 (Codex + Grok adapters, Detect)"
domain: helix_forge
repo: agentic-resources
status: finished
owner: codex
topic_slug: helix-forge
created: "2026-06-06"
updated: "2026-06-06"
state_hub_workstream_id: "88c75b47-1c89-43bc-bb3e-739ec3c8f7d4"
---

# Coding Session Memory — Phase 1

Extends Phase 0 ([AGENTIC-WP-0002](AGENTIC-WP-0002-session-memory-phase0.md)) along
two axes of [PRD-helix-forge](../docs/PRD-helix-forge.md):

1. **Multi-flavor capture (G1/G6):** add the Codex and Grok collector adapters so
   the agnostic core ingests all three families through thin edges.
2. **Detect (PRD §6.2):** run signal extractors over normalized sessions, cluster
   recurring signals into candidate problem/success patterns, attach evidence, and
   flag cross-flavor patterns.

Both flavors' on-disk schemas are already confirmed in
[DESIGN-session-memory.md](../docs/DESIGN-session-memory.md) §2.2 (Codex) and §2.3
(Grok), with the native→`kind` mapping in §4.3, so the adapters are written
against known structures, not discovered ones.

## Codex Collector Adapter

```task
id: AGENTIC-WP-0003-T01
status: done
priority: high
state_hub_task_id: "91264fd4-ba99-4add-b317-e2320c3c932c"
```

Implement `adapters/codex.py` reading `~/.codex/sessions/YYYY/MM/DD/rollout-*.jsonl`
per design §2.2: line wrapper `{timestamp,type,payload}`; map `session_meta`→Session
fields, `turn_context`→model, `response_item/message`→`user_msg`/`assistant_msg`,
`function_call`+`function_call_output` (joined on `call_id`)→`tool_call`/`tool_result`,
`reasoning`→`thinking`, `event_msg/task_*`→`lifecycle`/`completion`,
`event_msg/token_count`→cost. Codex is flat: assign `seq`/`parent_seq` by temporal
order (no native DAG). Version-detect on `session_meta.cli_version`. Reuse the
`Normalized` bundle contract. Tests use synthetic rollout fixtures; confirm the
`token_count` payload field names against a real install if Codex is present
(design OQ1 residual).

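A minimal sketch of the mapping and the `call_id` join described above, not the actual `adapters/codex.py`. It assumes the sub-type lives at `payload.type`, that messages carry `payload.role`, that `reasoning` arrives as a `response_item`, and that file order is temporal order; these details and the function name `normalize_rollout` are illustrative assumptions to be checked against design §2.2.

```python
import json
from typing import Iterable, Optional

# Assumed payload.type -> normalized kind (messages are resolved by role below).
KIND_BY_PAYLOAD_TYPE = {
    "function_call": "tool_call",
    "function_call_output": "tool_result",
    "reasoning": "thinking",
}


def normalize_rollout(lines: Iterable[str]) -> list[dict]:
    """Map response_item lines to flat events with seq/parent_seq."""
    events: list[dict] = []
    call_seq: dict[str, int] = {}  # call_id -> seq of the originating tool_call
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        rec = json.loads(raw)
        if rec.get("type") != "response_item":
            continue  # session_meta, turn_context, event_msg are handled elsewhere
        payload = rec.get("payload", {})
        ptype = payload.get("type")
        if ptype == "message":
            kind = "user_msg" if payload.get("role") == "user" else "assistant_msg"
        elif ptype in KIND_BY_PAYLOAD_TYPE:
            kind = KIND_BY_PAYLOAD_TYPE[ptype]
        else:
            continue
        seq = len(events)
        # Codex has no native DAG: default parent is the previous event.
        parent: Optional[int] = seq - 1 if seq else None
        call_id = payload.get("call_id")
        if kind == "tool_call" and call_id:
            call_seq[call_id] = seq
        elif kind == "tool_result" and call_id in call_seq:
            parent = call_seq[call_id]  # join the result back to its call
        events.append(
            {"seq": seq, "parent_seq": parent, "kind": kind, "ts": rec.get("timestamp")}
        )
    return events
```

Pointing a `tool_result` at its `tool_call` rather than at the preceding event keeps the pair linked even when reasoning or messages are interleaved between them.
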
## Grok Collector Adapter

```task
id: AGENTIC-WP-0003-T02
status: done
priority: high
state_hub_task_id: "fe3d7d1c-110e-4f16-8d56-062fa4a651aa"
```

Implement `adapters/grok.py` reading the per-session directory
`~/.grok/sessions/<cwd>/<uuid>/` per design §2.3: `summary.json`→Session
id/cwd/timestamps, `chat_history.jsonl`→messages, `events.jsonl`→explicit
`lifecycle` events and `turn_number` (key `seq` off it), tool calls/results from
`chat_history`/`updates.jsonl`, token fields from events/updates. Resolve the
url-encoded cwd dir name back to a path. Tests run against the real local Grok
sessions on this workstation plus a synthetic dir fixture.

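Resolving the url-encoded cwd directory name can be sketched as below. It assumes standard percent-encoding (so `urllib.parse.unquote` reverses it) and that the encoded cwd is the parent directory of the `<uuid>` session directory; the helper name `session_cwd` is hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote


def session_cwd(session_dir: PurePosixPath) -> PurePosixPath:
    """Recover the working directory from .../<encoded-cwd>/<uuid>/.

    Assumes the cwd was percent-encoded into a single path component.
    """
    return PurePosixPath(unquote(session_dir.parent.name))
```

If `summary.json` also records the cwd, comparing the two values would be a cheap consistency check for the adapter tests.
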
## Multi-File / Multi-Part Session Merge

```task
id: AGENTIC-WP-0003-T03
status: done
priority: medium
state_hub_task_id: "c4acfb63-84cd-4299-a44d-91bb6857fa88"
```

Address design OQ6 (surfaced in Phase 0): several files can map to one
`session_uid` (resume, sidechains; Grok dirs are inherently multi-file). Change
the store/ingest path to **merge** events across parts of one session rather than
apply a last-file-wins upsert, with stable event ordering and de-duplication keyed
on native identity. Verify event counts are additive and idempotent on re-run.

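The merge semantics can be illustrated with a small in-memory sketch. The field names `native_id` and `ts` are placeholders for whatever native identity and timestamp the store actually keys on, and `merge_parts` is a hypothetical name; the real change lives in the store/ingest path.

```python
def merge_parts(existing: list[dict], incoming: list[dict]) -> list[dict]:
    """Merge events from another part of the same session.

    De-duplicates on native identity (first occurrence wins) and returns a
    stable ordering by (timestamp, native identity).
    """
    by_id = {e["native_id"]: e for e in existing}
    for event in incoming:
        by_id.setdefault(event["native_id"], event)
    return sorted(by_id.values(), key=lambda e: (e["ts"], e["native_id"]))
```

Merging the same part twice returns the same list, which is the idempotence property the task asks to verify; merging a disjoint part grows the count by exactly the number of new events.
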
## Signal Extractors

```task
id: AGENTIC-WP-0003-T04
status: done
priority: high
state_hub_task_id: "20920c5d-16f7-43bb-9ed7-9afbfeaf7207"
```

Implement `detect/signals.py`: derive `Signal`s from normalized sessions/digests,
e.g. repeated test failure on the same target, budget overrun (cost vs. peers),
retry storm, fast clean resolution, human escalation, error-then-recovery. Each
signal carries its source `session_uid`, locus (file/tool/task), polarity
(problem|success), and magnitude. Pure functions over Tier 1 events + Tier 2
digests; no new capture. Unit-tested on synthetic sessions.

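As an illustration of one extractor as a pure function, here is a hedged sketch of a retry-storm signal. The `Signal` fields follow the description above, but the event keys `tool` and `is_error`, the threshold of 3, and the definition of a retry storm as consecutive failing results from the same tool are all assumptions, not the shipped `detect/signals.py`.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Signal:
    session_uid: str
    signal_type: str
    locus: str       # file, tool, or task the signal is attached to
    polarity: str    # "problem" or "success"
    magnitude: float


def retry_storm(session_uid: str, events: list[dict], threshold: int = 3) -> list[Signal]:
    """Emit a problem signal per run of >= threshold consecutive failures of one tool."""
    results = [e for e in events if e["kind"] == "tool_result"]
    signals = []
    # groupby only groups adjacent items, so each group is one uninterrupted run.
    for (tool, failed), run in groupby(results, key=lambda e: (e["tool"], e["is_error"])):
        length = len(list(run))
        if failed and length >= threshold:
            signals.append(Signal(session_uid, "retry_storm", tool, "problem", float(length)))
    return signals
```

Because the function only reads its arguments and returns new `Signal` objects, it can be unit-tested on synthetic event lists without touching the store.
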
## Pattern Clusterer

```task
id: AGENTIC-WP-0003-T05
status: done
priority: high
state_hub_task_id: "f42d57f6-34dc-4a92-bf6a-4d8eab572467"
```

Implement `detect/cluster.py`: group recurring signals across sessions/repos/
flavors into candidate `ProblemPattern`/`SuccessPattern` records (PRD §5). Start
with deterministic keyed clustering (locus + signal-type + normalized message);
leave embedding-based similarity as a later option. Output candidates with
frequency and member session lists.

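Deterministic keyed clustering can be sketched as follows. The normalization rule (lowercase, digits replaced by a placeholder, whitespace collapsed), the dict-shaped signals, and the `min_sessions` cutoff are illustrative assumptions; frequency is counted here as the number of distinct member sessions.

```python
import re
from collections import defaultdict


def normalize_message(msg: str) -> str:
    """Collapse incidental variation so equivalent messages share a key."""
    msg = re.sub(r"\d+", "<n>", msg.lower())
    return re.sub(r"\s+", " ", msg).strip()


def cluster(signals: list[dict], min_sessions: int = 2) -> list[dict]:
    """Group signals by (locus, signal type, normalized message)."""
    groups: dict[tuple, set] = defaultdict(set)
    for s in signals:
        key = (s["locus"], s["type"], normalize_message(s["message"]))
        groups[key].add(s["session_uid"])
    return [
        {"key": key, "frequency": len(members), "sessions": sorted(members)}
        for key, members in sorted(groups.items(), key=lambda kv: kv[0])
        if len(members) >= min_sessions
    ]
```

Sorting both keys and member lists keeps the output identical across runs, which makes candidate reports diffable.
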
## Pattern Evidence + Cross-Flavor Flagging

```task
id: AGENTIC-WP-0003-T06
status: done
priority: medium
state_hub_task_id: "8fd502d6-d138-4a42-acd5-6f5921859605"
```

For each candidate pattern (PRD §6.2 FR-D3/FR-D4) attach evidence: supporting
sessions, frequency, affected repos, affected **flavors**, and estimated cost
impact (token/retry deltas vs. baseline). Explicitly flag candidates whose
evidence spans more than one flavor as `cross_flavor: true`, since these are the
highest-value reuse targets. Persist candidates to a Tier 2 `patterns` store/table.

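A hedged sketch of evidence attachment and the cross-flavor flag. The session fields `flavor`, `repo`, and `tokens`, the helper name `attach_evidence`, and the use of a mean-token delta against all sessions as the baseline are assumptions made for illustration; only the `cross_flavor` rule (more than one flavor in the evidence) comes from the text above.

```python
from statistics import mean


def attach_evidence(candidate: dict, sessions: dict[str, dict]) -> dict:
    """Return the candidate enriched with repos, flavors, and a token delta."""
    members = [sessions[uid] for uid in candidate["sessions"]]
    flavors = sorted({m["flavor"] for m in members})
    repos = sorted({m["repo"] for m in members})
    # Assumed baseline: mean token count over every known session.
    baseline = mean(s["tokens"] for s in sessions.values())
    return {
        **candidate,
        "frequency": len(members),
        "repos": repos,
        "flavors": flavors,
        "token_delta": mean(m["tokens"] for m in members) - baseline,
        "cross_flavor": len(flavors) > 1,
    }
```
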
## Candidate Pattern Report

```task
id: AGENTIC-WP-0003-T07
status: done
priority: medium
state_hub_task_id: "34a96d5d-9165-4761-b91e-3643b0401410"
```

Add a `detect` entrypoint (`python -m session_memory.detect`) that runs extractors
→ clusterer → evidence and emits a human-readable candidate report (ranked by
cost impact × frequency, cross-flavor first), plus machine-readable JSON. This is
the input to the Curate phase (Phase 2) review workflow. Document usage in the
session_memory README.

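The ranking rule (cross-flavor first, then cost impact × frequency, descending) can be expressed as a single sort key. This is a sketch only: the candidate keys `cost_impact`, `frequency`, and `cross_flavor` and the function names are assumed, and the real entrypoint's report format is not reproduced here.

```python
import json


def rank(candidates: list[dict]) -> list[dict]:
    """Order candidates: cross-flavor first, then by cost impact x frequency."""
    return sorted(
        candidates,
        # False sorts before True, so negate cross_flavor to put it first.
        key=lambda c: (not c["cross_flavor"], -(c["cost_impact"] * c["frequency"])),
    )


def to_json(candidates: list[dict]) -> str:
    """Machine-readable form of the ranked report."""
    return json.dumps(rank(candidates), indent=2)
```
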
## Verify Across All Three Flavors

```task
id: AGENTIC-WP-0003-T08
status: done
priority: medium
state_hub_task_id: "b272c3fa-af81-4a6c-9ed9-7b42173efa81"
```

Run the full pipeline (ingest all enabled sources → digest → detect) against the
real local Claude and Grok sessions on this workstation (Codex via fixtures if not
installed). Confirm: normalized rows for each flavor, at least one candidate
pattern surfaced, and at least one **cross-flavor** pattern detected if the data
supports it (PRD success metric). Record results and refresh design open
questions. After workplan file updates, notify the custodian operator to run from
`~/state-hub`:

```bash
make fix-consistency REPO=agentic-resources
```

**Verification results (2026-06-06):** full suite 40/40 green. Live pipeline over
real local sessions (Codex not installed → fixtures): ingested 88 → 67 digests
(63 Claude + 4 Grok); detect surfaced 3 candidate patterns, **2 cross-flavor**
(Claude+Grok): a "clean pass" success across 18 sessions and an "abandoned" problem
across 13, plus a Claude-only budget overrun. PRD cross-flavor success metric met.