agentic-resources/workplans/AGENTIC-WP-0003-session-memory-phase1.md
tegwick 06767ef924 session-memory Phase 1: Grok adapter (T02)
- adapters/grok.py: reads the per-session dir (summary.json + chat_history.jsonl
  + events.jsonl + updates.jsonl); conversation from chat_history, lifecycle/
  turn from events, tool-call names paired in order from updates ACP stream
- registered in ingest dispatch; codex+grok sources enabled in config.toml
- tests/test_grok_adapter.py (synthetic + real local sessions)
- live multi-flavor dry-run discovers 89 sessions across flavors

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 22:12:30 +02:00


id: AGENTIC-WP-0003
type: workplan
title: Coding Session Memory — Phase 1 (Codex + Grok adapters, Detect)
domain: helix_forge
repo: agentic-resources
status: active
owner: codex
topic_slug: helix-forge
created: 2026-06-06
updated: 2026-06-06
state_hub_workstream_id: 88c75b47-1c89-43bc-bb3e-739ec3c8f7d4

Coding Session Memory — Phase 1

Extends Phase 0 (AGENTIC-WP-0002) along two axes of PRD-helix-forge:

  1. Multi-flavor capture (G1/G6): add the Codex and Grok collector adapters so the agnostic core ingests all three families through thin edges.
  2. Detect (PRD §6.2): run signal extractors over normalized sessions, cluster recurring signals into candidate problem/success patterns, attach evidence, and flag cross-flavor patterns.

Both flavors' on-disk schemas are already confirmed in DESIGN-session-memory.md §2.2 (Codex) and §2.3 (Grok), with the native→kind mapping in §4.3 — so the adapters are written against known structures, not discovered ones.

Codex Collector Adapter

id: AGENTIC-WP-0003-T01
status: done
priority: high
state_hub_task_id: "91264fd4-ba99-4add-b317-e2320c3c932c"

Implement adapters/codex.py reading ~/.codex/sessions/YYYY/MM/DD/rollout-*.jsonl per design §2.2: line wrapper {timestamp,type,payload}; map session_meta→Session fields, turn_context→model, response_item/message→user_msg/assistant_msg, function_call+function_call_output (joined on call_id)→tool_call/tool_result, reasoning→thinking, event_msg/task_*→lifecycle/completion, event_msg/token_count→cost. Codex is flat: assign seq/parent_seq by temporal order (no native DAG). Version-detect on session_meta.cli_version. Reuse the Normalized bundle contract. Tests use synthetic rollout fixtures; confirm the token_count payload field names against a real install if Codex is present (design OQ1 residual).
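The call_id join described above can be sketched as follows. This is a minimal illustration, not the adapter itself: it handles only the function_call/function_call_output pair, and it assumes the item kind is stored in payload["type"] and the tool name in payload["name"]. Those nested field names, the call_seq link field, and the function name pair_tool_calls are assumptions, not confirmed by the design.

```python
import json
from typing import Iterable, Iterator


def pair_tool_calls(lines: Iterable[str]) -> Iterator[dict]:
    """Yield tool_call/tool_result events in file order, joined on call_id."""
    open_calls: dict[str, int] = {}  # call_id -> seq of its tool_call event
    seq = 0
    for raw in lines:
        if not raw.strip():
            continue
        record = json.loads(raw)
        if record.get("type") != "response_item":
            continue
        payload = record.get("payload") or {}
        # Assumed layout: the item kind sits in payload["type"].
        kind = payload.get("type")
        if kind not in ("function_call", "function_call_output"):
            continue
        event = {
            "seq": seq,
            # Flat log, no native DAG: the parent is the previous event.
            "parent_seq": seq - 1 if seq else None,
            "ts": record.get("timestamp"),
        }
        call_id = payload.get("call_id")
        if kind == "function_call":
            open_calls[call_id] = seq
            event.update(kind="tool_call", name=payload.get("name"))
        else:
            # call_seq stays None when an output has no matching call.
            event.update(kind="tool_result",
                         call_seq=open_calls.pop(call_id, None))
        yield event
        seq += 1
```

Keeping the join in a dictionary keyed on call_id means an output that arrives several lines after its call is still paired, and an orphaned output is emitted with call_seq set to None rather than dropped.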

Grok Collector Adapter

id: AGENTIC-WP-0003-T02
status: done
priority: high
state_hub_task_id: "fe3d7d1c-110e-4f16-8d56-062fa4a651aa"

Implement adapters/grok.py reading the per-session directory ~/.grok/sessions/<cwd>/<uuid>/ per design §2.3: summary.json→Session id/cwd/timestamps, chat_history.jsonl→messages, events.jsonl→explicit lifecycle events and turn_number (key seq off it), tool calls/results from chat_history.jsonl and updates.jsonl, token fields from events.jsonl and updates.jsonl. Resolve the URL-encoded cwd dir name back to a path. Tests run against the real local Grok sessions on this workstation plus a synthetic dir fixture.
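Resolving the cwd directory name might look like the sketch below. It assumes "URL-encoded" means standard percent-encoding, which urllib.parse.unquote reverses; the helper name decode_session_dir is hypothetical, and the real encoding should be checked against design §2.3.

```python
from pathlib import Path
from urllib.parse import unquote


def decode_session_dir(session_dir: Path) -> tuple[str, str]:
    """Return (cwd, session_id) for a .../sessions/<cwd>/<uuid> directory."""
    session_id = session_dir.name
    # Assumption: the parent directory name is the percent-encoded cwd.
    cwd = unquote(session_dir.parent.name)
    return cwd, session_id
```

For example, a directory named %2Fhome%2Fu%2Fproj decodes to /home/u/proj. If summary.json also records the cwd, that value can serve as a cross-check on the decoded name.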

Multi-File / Multi-Part Session Merge

id: AGENTIC-WP-0003-T03
status: done
priority: medium
state_hub_task_id: "c4acfb63-84cd-4299-a44d-91bb6857fa88"

Address design OQ6 (surfaced in Phase 0): several files can map to one session_uid (resume, sidechains; Grok dirs are inherently multi-file). Change the store/ingest path to merge events across parts of one session rather than last-file-wins upsert — stable event ordering and de-duplication keyed on native identity. Verify event counts are additive and idempotent on re-run.
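A merge with these properties can be sketched in a few lines. The event fields session_uid, native_id, and ts are assumed names standing in for whatever the Normalized bundle actually uses for native identity and timestamps, and merge_events is a hypothetical helper rather than the store's real API.

```python
from typing import Iterable


def merge_events(stored: Iterable[dict], incoming: Iterable[dict]) -> list[dict]:
    """Merge two parts of one session without duplicating events."""
    by_key: dict[tuple, dict] = {}
    for event in [*stored, *incoming]:
        # De-duplicate on native identity; the first occurrence wins.
        key = (event["session_uid"], event["native_id"])
        by_key.setdefault(key, event)
    # Stable ordering: timestamp first, native id as a tie-breaker.
    return sorted(by_key.values(), key=lambda e: (e["ts"], e["native_id"]))
```

Because the result depends only on the set of keys, merging the same part twice yields the same list, which is the idempotence property the task asks to verify. Event counts grow only by the number of previously unseen native ids.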

Signal Extractors

id: AGENTIC-WP-0003-T04
status: todo
priority: high
state_hub_task_id: "20920c5d-16f7-43bb-9ed7-9afbfeaf7207"

Implement detect/signals.py: derive Signals from normalized sessions/digests — e.g. repeated test failure on the same target, budget overrun (cost vs. peers), retry storm, fast clean resolution, human escalation, error-then-recovery. Each signal carries its source session_uid, locus (file/tool/task), polarity (problem|success), and magnitude. Pure functions over Tier 1 events + Tier 2 digests; no new capture. Unit-tested on synthetic sessions.
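As an illustration of the shape such an extractor could take, the sketch below defines a Signal record with the fields listed above and one pure function for retry storms. The definition used here (a run of consecutive tool calls to the same tool, at or above a threshold) is an assumption made for the example, as are the signal_type field, the event keys kind and name, and the default threshold of 3.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Signal:
    session_uid: str
    signal_type: str
    locus: str       # file, tool, or task the signal points at
    polarity: str    # "problem" or "success"
    magnitude: float


def retry_storms(session_uid: str, events: list[dict],
                 threshold: int = 3) -> list[Signal]:
    """Flag runs of consecutive calls to the same tool (assumed definition)."""
    # Ignore results and messages so they do not break up a run of calls.
    tool_calls = [e for e in events if e.get("kind") == "tool_call"]
    signals = []
    for name, run in groupby(tool_calls, key=lambda e: e.get("name")):
        count = sum(1 for _ in run)
        if count >= threshold:
            signals.append(Signal(session_uid, "retry_storm",
                                  f"tool:{name}", "problem", float(count)))
    return signals
```

The function reads only normalized events and returns new records, so it needs no capture-side changes and is straightforward to unit-test on synthetic sessions.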

Pattern Clusterer

id: AGENTIC-WP-0003-T05
status: todo
priority: high
state_hub_task_id: "f42d57f6-34dc-4a92-bf6a-4d8eab572467"

Implement detect/cluster.py: group recurring signals across sessions/repos/flavors into candidate ProblemPattern/SuccessPattern records (PRD §5). Start with deterministic keyed clustering (locus + signal-type + normalized message); leave embedding-based similarity as a later option. Output candidates with frequency and member session lists.
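Deterministic keyed clustering could be sketched as below. Signals are represented as plain dicts with assumed keys (locus, signal_type, message, session_uid), and the normalization rule (lowercase, replace numbers and hex literals with a placeholder, collapse whitespace) is only one plausible choice, not the one the design prescribes.

```python
import re
from collections import defaultdict


def normalize_message(message: str) -> str:
    """Strip volatile details so similar messages share one key."""
    text = message.lower()
    text = re.sub(r"0x[0-9a-f]+|\d+", "<n>", text)  # hex literals and numbers
    return re.sub(r"\s+", " ", text).strip()


def cluster_signals(signals: list[dict]) -> list[dict]:
    """Group signals by (locus, signal_type, normalized message)."""
    groups: dict[tuple, list[dict]] = defaultdict(list)
    for signal in signals:
        key = (signal["locus"], signal["signal_type"],
               normalize_message(signal.get("message", "")))
        groups[key].append(signal)
    candidates = [
        {
            "locus": locus,
            "signal_type": signal_type,
            "message": message,
            "frequency": len(members),
            "sessions": sorted({m["session_uid"] for m in members}),
        }
        for (locus, signal_type, message), members in groups.items()
    ]
    # Sort on the full key so output order does not depend on input order.
    return sorted(candidates, key=lambda c: (-c["frequency"], c["locus"],
                                             c["signal_type"], c["message"]))
```

An exact-key approach like this will miss near-duplicates that differ in wording, which is the gap an embedding-based similarity pass would later close.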

Pattern Evidence + Cross-Flavor Flagging

id: AGENTIC-WP-0003-T06
status: todo
priority: medium
state_hub_task_id: "8fd502d6-d138-4a42-acd5-6f5921859605"

For each candidate pattern (PRD §6.2 FR-D3/FR-D4) attach evidence: supporting sessions, frequency, affected repos, affected flavors, and estimated cost impact (token/retry deltas vs. baseline). Explicitly flag candidates whose evidence spans more than one flavor as cross_flavor: true — the highest-value reuse targets. Persist candidates to a Tier 2 patterns store/table.
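One way to attach this evidence is sketched below. The session record keys (flavor, repo, tokens), the baseline_tokens parameter, and the choice of a summed signed token delta as the cost estimate are all assumptions for illustration; the PRD's actual cost model may differ.

```python
def attach_evidence(candidate: dict, sessions: dict[str, dict],
                    baseline_tokens: float) -> dict:
    """Return a copy of the candidate enriched with evidence fields."""
    # Keep only member sessions we actually have records for.
    members = [sessions[uid] for uid in candidate["sessions"] if uid in sessions]
    flavors = sorted({m["flavor"] for m in members})
    repos = sorted({m["repo"] for m in members})
    # Assumed estimate: total token delta of member sessions vs. the baseline.
    cost_impact = sum(m["tokens"] - baseline_tokens for m in members)
    return {
        **candidate,
        "flavors": flavors,
        "repos": repos,
        "cost_impact": cost_impact,
        "cross_flavor": len(flavors) > 1,
    }
```

The cross_flavor flag is derived from the evidence rather than stored independently, so it cannot drift out of sync with the member session list.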

Candidate Pattern Report

id: AGENTIC-WP-0003-T07
status: todo
priority: medium
state_hub_task_id: "34a96d5d-9165-4761-b91e-3643b0401410"

Add a detect entrypoint (python -m session_memory.detect) that runs extractors → clusterer → evidence and emits a human-readable candidate report (ranked by cost impact × frequency, cross-flavor first), plus machine-readable JSON. This is the input to the Curate phase (Phase 2) review workflow. Document usage in the session_memory README.
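The ranking rule can be expressed as a single sort key, as in the sketch below. The candidate keys (cross_flavor, cost_impact, frequency, locus, signal_type) and the report line format are assumptions for the example, and render_report is a hypothetical helper rather than the entrypoint's real interface.

```python
import json


def rank_candidates(candidates: list[dict]) -> list[dict]:
    """Cross-flavor candidates first, then cost impact x frequency, descending."""
    return sorted(
        candidates,
        key=lambda c: (
            not c.get("cross_flavor", False),      # False sorts before True
            -(c["cost_impact"] * c["frequency"]),  # larger score first
        ),
    )


def render_report(candidates: list[dict]) -> tuple[str, str]:
    """Return (human-readable text, machine-readable JSON) for the ranking."""
    ranked = rank_candidates(candidates)
    lines = []
    for position, c in enumerate(ranked, start=1):
        tag = " [cross-flavor]" if c.get("cross_flavor") else ""
        lines.append(
            f"{position}. {c['locus']} / {c['signal_type']}{tag}: "
            f"frequency={c['frequency']}, cost_impact={c['cost_impact']}"
        )
    return "\n".join(lines), json.dumps(ranked, indent=2)
```

Producing both outputs from the same ranked list keeps the text report and the JSON consistent, so a Phase 2 reviewer and any downstream tooling see the candidates in the same order.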

Verify Across All Three Flavors

id: AGENTIC-WP-0003-T08
status: todo
priority: medium
state_hub_task_id: "b272c3fa-af81-4a6c-9ed9-7b42173efa81"

Run the full pipeline (ingest all enabled sources → digest → detect) against the real local Claude and Grok sessions on this workstation (Codex via fixtures if not installed). Confirm: normalized rows for each flavor, at least one candidate pattern surfaced, and at least one cross-flavor pattern detected if the data supports it (PRD success metric). Record results and refresh design open questions. After workplan file updates, notify the custodian operator to run from ~/state-hub:

make fix-consistency REPO=agentic-resources