# Design Document — Coding Session Memory

**Domain:** helix_forge
**Repo:** agentic-resources
**Status:** Draft v0.1
**Author:** Claude (drafted with Bernd Worsch)
**Created:** 2026-06-06
**Updated:** 2026-06-06
**Related:** [PRD-helix-forge.md](./PRD-helix-forge.md) (this is the Capture + storage layer, FR-C* / §8)

---

## 1. Purpose

Helix Forge's loop (Capture → Detect → Curate → Distribute → Measure) needs a
durable, bounded **memory of coding sessions**. This document specifies that
memory: how we **access** each coding agent's session protocol, how we
**normalize** those protocols into one schema, where we **store** the result, and
how we **age it out** — preferring a *storage-budget-based* eviction that drops
old raw content once it has been analyzed or no longer fits, rather than a naive
fixed time window.

The guiding asymmetry: **raw transcripts are bulky and re-derivable; the distilled
analysis is small and precious.** So we keep a *bounded cache* of raw sessions and
a *durable, compact* layer of extracted digests/signals. Eviction targets the
former, never the latter.

## 2. Research — How to Access Each Agent's Session Protocol

All three families persist sessions to the local filesystem as JSONL (plus, for
Grok, a per-session directory). The Claude Code and Grok findings below were
verified against the live installs on this workstation (`~/.claude`, `~/.grok`);
the Codex findings come from public docs and source, because Codex is not
installed here.

### 2.1 Claude Code ✅ verified on disk

| Aspect | Finding |
|--------|---------|
| Session transcripts | `~/.claude/projects/<url-encoded-cwd>/<session-uuid>.jsonl` — one JSONL per session |
| Subagent sidechains | same dir, `agent-<id>.jsonl`; records carry `isSidechain: true` |
| Global prompt history | `~/.claude/history.jsonl` |
| Record format | one JSON object per line; **`type`** discriminates: `user`, `assistant`, `attachment`, `queue-operation`, `ai-title`, `last-prompt`, `summary`, plus tool-result records |
| Key fields | `type`, `timestamp`, `sessionId`, `uuid`, `parentUuid` (turn DAG), `message` (`role` + content blocks: `text`/`thinking`/`tool_use`/`tool_result`), `cwd`, `gitBranch`, `version`, `requestId`, `toolUseResult`, `userType` |
| Token usage | inside assistant `message.usage` (input/output/cache tokens) |
| Model | `message.model` (e.g. `claude-opus-4-8`) |
| Side data | `~/.claude/todos/`, `~/.claude/tasks/`, `~/.claude/file-history/`, `~/.claude/shell-snapshots/` |
| Live capture hook | Claude Code **SessionEnd / Stop / SessionStart hooks** can fire our ingest on session close (push), in addition to batch scanning (pull) |

The turn DAG (`uuid`/`parentUuid`) lets us reconstruct branching, retries, and
sidechains exactly.

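As a minimal sketch of how the turn DAG could be rebuilt from a transcript, the snippet below indexes records by `uuid` and groups children under their `parentUuid`. The sample lines, `build_turn_dag`, and `branch_points` are hypothetical illustrations, not part of any adapter; the sketch also assumes that records without a `uuid` can simply be skipped.

```python
import json
from collections import defaultdict

# Hypothetical sample lines, shaped like the records described in the table above.
SAMPLE_JSONL = """\
{"type": "user", "uuid": "a", "parentUuid": null, "sessionId": "s1"}
{"type": "assistant", "uuid": "b", "parentUuid": "a", "sessionId": "s1"}
{"type": "assistant", "uuid": "c", "parentUuid": "a", "sessionId": "s1"}
{"type": "user", "uuid": "d", "parentUuid": "c", "sessionId": "s1"}
"""


def build_turn_dag(lines):
    """Return (roots, children); children maps a uuid to its child uuids."""
    children = defaultdict(list)
    roots = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        uid = rec.get("uuid")
        if uid is None:
            continue  # assumption: records without a uuid are not part of the DAG
        parent = rec.get("parentUuid")
        if parent is None:
            roots.append(uid)
        else:
            children[parent].append(uid)
    return roots, dict(children)


def branch_points(children):
    """Uuids with more than one child, i.e. where a retry or branch happened."""
    return sorted(u for u, kids in children.items() if len(kids) > 1)


roots, children = build_turn_dag(SAMPLE_JSONL.splitlines())
```

In this sample, record `a` has two children (`b` and `c`), which is the shape a retry or branch would leave behind.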
### 2.2 OpenAI Codex CLI ✅ schema confirmed from source (not installed locally)

Schema confirmed from the openai/codex source (`codex-rs/protocol/src/protocol.rs`
via DeepWiki) and a reverse-engineering writeup with real example lines — the two
cross-agree.

| Aspect | Finding |
|--------|---------|
| Session ("rollout") files | `$CODEX_HOME/sessions/YYYY/MM/DD/rollout-*.jsonl` (default `$CODEX_HOME = ~/.codex`) |
| Line wrapper (`RolloutLine`) | every line: **`{timestamp, type, payload}`** (UTC ts + a `RolloutItem`) |
| `type` discriminator | `session_meta` · `response_item` · `event_msg` · `turn_context` · `compacted` |
| `session_meta` | `{id, source, cwd, model_provider, cli_version}` (+ model) — restores env |
| `turn_context` | `{model, approval_policy, sandbox_policy}` — per-turn settings snapshot |
| `response_item` | raw model output / tool calls; `payload.type` ∈ `message` · `function_call` · `function_call_output` · `reasoning` |
| → `message` | `{role: developer\|user\|assistant, content:[{type:"output_text"\|…, text}]}` |
| → `function_call` | `{name, arguments (JSON string), call_id}` |
| → `function_call_output` | `{call_id, output}` |
| `event_msg` | protocol events; `payload.type` ∈ `task_started` · `task_complete` · `user_message` · `agent_message` · `token_count` · lifecycle |
| Token usage | `event_msg` with `payload.type = token_count`, interspersed (no fixed cadence) |
| Turn linkage | **flat — tool calls/outputs linked by `call_id`, no parent-ref DAG**; causality inferred from temporal order (unlike Claude's `uuid`/`parentUuid`) |
| Schema versions | older installs differ ("new ≥0.44 / mid / oldest 2025/08"); adapter version-detects on `session_meta.cli_version` |
| Naming / resume | filenames + `session_id` auto-generated; `codex resume --last`; `codex exec` for headless (trajectory-JSON is gh issue #2288) |
| Override location | `CODEX_HOME` env var |

**Adapter notes:** map `event_msg/task_started|task_complete` → `lifecycle`
events and outcome; `response_item/message` → `user_msg`/`assistant_msg`;
`function_call`+`function_call_output` → `tool_call`/`tool_result` joined on
`call_id`; `response_item/reasoning` → `thinking`; `event_msg/token_count` → cost
block. Because there is no parent-ref DAG, the adapter assigns `seq`/`parent_seq`
from temporal order rather than native links.

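The `call_id` join and temporal `seq` assignment from the adapter notes could look roughly like the sketch below. The sample records, the tool name `shell`, and `normalize_rollout` are hypothetical. Pointing a `tool_result` at its originating `tool_call` (rather than at the previous event) is one possible design choice, not something the Codex format prescribes.

```python
import json

# Hypothetical rollout records in the {timestamp, type, payload} wrapper.
_RECORDS = [
    {"timestamp": "2026-06-05T22:00:01Z", "type": "response_item",
     "payload": {"type": "function_call", "name": "shell",
                 "arguments": json.dumps({"cmd": "ls"}), "call_id": "c1"}},
    {"timestamp": "2026-06-05T22:00:02Z", "type": "response_item",
     "payload": {"type": "function_call", "name": "shell",
                 "arguments": json.dumps({"cmd": "pytest -q"}), "call_id": "c2"}},
    {"timestamp": "2026-06-05T22:00:03Z", "type": "response_item",
     "payload": {"type": "function_call_output", "call_id": "c2", "output": "ok"}},
    {"timestamp": "2026-06-05T22:00:04Z", "type": "response_item",
     "payload": {"type": "function_call_output", "call_id": "c1", "output": "a b"}},
]
SAMPLE_LINES = [json.dumps(r) for r in _RECORDS]


def normalize_rollout(lines):
    """Emit tool_call/tool_result events with seq from temporal order."""
    records = [json.loads(line) for line in lines if line.strip()]
    # Same-format UTC ISO timestamps sort correctly as strings; the sort is
    # stable, so records with equal timestamps keep their file order.
    records.sort(key=lambda r: r["timestamp"])
    events = []
    call_seq = {}  # call_id -> seq of the tool_call event
    for rec in records:
        if rec.get("type") != "response_item":
            continue  # event_msg, session_meta, etc. are handled elsewhere
        payload = rec["payload"]
        seq = len(events)
        prev = seq - 1 if seq else None
        if payload.get("type") == "function_call":
            call_seq[payload["call_id"]] = seq
            events.append({"seq": seq, "parent_seq": prev, "kind": "tool_call",
                           "tool": payload["name"], "ts": rec["timestamp"]})
        elif payload.get("type") == "function_call_output":
            parent = call_seq.get(payload["call_id"], prev)
            events.append({"seq": seq, "parent_seq": parent,
                           "kind": "tool_result", "ts": rec["timestamp"]})
    return events


events = normalize_rollout(SAMPLE_LINES)
```

Here the output for `c2` arrives before the output for `c1`, so the join on `call_id` produces parents that pure temporal adjacency would get wrong.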
### 2.3 Grok CLI (xAI) ✅ verified on disk

Grok stores **a directory per session**, which is the richest source of the three.

| Aspect | Finding |
|--------|---------|
| Session dir | `~/.grok/sessions/<url-encoded-cwd>/<session-uuid>/` |
| `chat_history.jsonl` | full conversation; `type` = `system`/`user`/`assistant` + content |
| `events.jsonl` | **structured lifecycle events** — `{ts, type, session_id, turn_number, model_id, yolo_mode, conversation_message_count, session_relationship, schema_version}`; types like `turn_started`, `loop_started` |
| `updates.jsonl` | streaming incremental updates |
| `summary.json` | `{id, cwd, session_summary, created_at, updated_at}` |
| `prompt_context.json` | injected context, incl. which AGENTS.md/CLAUDE.md files were loaded |
| `system_prompt.txt` | exact system prompt for the session |
| `rewind_points.jsonl`, `plan_mode.json` | rewind/plan-mode state |
| Per-cwd prompt history | `~/.grok/sessions/<cwd>/prompt_history.jsonl` — `{timestamp, session_id, prompt, is_bash}` |
| Global structured log | `~/.grok/logs/unified.jsonl` — `{ts, src, pid, lvl, msg, ctx, sid, ver}` |
| Search index | `~/.grok/sessions/session_search.sqlite` — `session_docs(session_id, cwd, updated_at, title)` + FTS5 (`session_docs_fts`) we can query directly |
| Integration surfaces | Grok exposes **ACP (Agent Client Protocol)**, **headless mode** (`grok -p`), and **hooks** (`~/.grok/docs/user-guide/10-hooks.md`) — push-capture options |

### 2.4 Cross-family summary

| | Claude Code | Codex CLI | Grok CLI |
|--|--|--|--|
| Root | `~/.claude/projects/` | `~/.codex/sessions/` | `~/.grok/sessions/` |
| Unit | one `.jsonl`/session | one `rollout-*.jsonl`/session | one **dir**/session |
| Layout | flat per-cwd dir | date-partitioned `YYYY/MM/DD` | per-cwd, per-session dir |
| Discriminator | `type` | `type` (version-dependent) | `type` (in `chat_history`/`events`) |
| Lifecycle events | inferred from records | `event_msg` (`task_started`/`task_complete`) | **explicit** `events.jsonl` |
| Token usage | `message.usage` | `event_msg` with `token_count` | from events/updates |
| Push capture | Stop/SessionEnd hooks | `codex exec` wrappers | hooks / ACP |
| Pull capture | scan dir by mtime | scan date partitions | scan dirs / query FTS sqlite |

**Implication:** the common denominator is *"JSONL records discriminated by a
`type` field, with a session id, timestamps, turn linkage, tool calls, and token
usage."* That maps cleanly onto one normalized schema (§4). Per-family quirks
(Grok's explicit `events.jsonl`, Codex's schema versions, Claude's sidechains) are
handled inside each adapter.

## 3. Tiered Storage Model

```
Tier 0  SOURCE (agents' own logs)            read-only, never mutated
        ~/.claude/projects  ~/.codex/sessions  ~/.grok/sessions
            │  collector adapters (per family) + ingest cursor
            ▼
Tier 1  RAW CACHE (bounded, EVICTABLE)       normalized Session + Event records
            │  signal extractors / digesters
            ▼
Tier 2  DISTILLED MEMORY (durable, small)    session digests + signals + pattern evidence
```

- **Tier 0 — Source.** The agents' own logs. We treat them as read-only. We keep a
  small **ingest cursor** per source so re-scans are incremental (see §6).
- **Tier 1 — Raw cache.** Normalized copies of sessions/events. This is the bulky
  tier and the *only* tier subject to budget eviction.
- **Tier 2 — Distilled memory.** Per-session **digest** (outcome, costs, tool
  histogram, error/retry/intervention markers, key snippets) plus extracted
  **signals** and **pattern evidence pointers**. Compact and durable. A session can
  be fully evicted from Tier 1 once its Tier 2 digest exists.

This is what makes "drop old content once it has been analyzed" safe: analysis
*promotes* the valuable bits into Tier 2 before the raw bytes are dropped.

### 3.1 Per-session lifecycle / watermarks

Each session row carries timestamps that drive eviction:

```
discovered_at → ingested_at → analyzed_at → [evictable] → evicted_at
```

- `ingested_at` set when normalized into Tier 1.
- `analyzed_at` set when the Tier 2 digest is written. **A session is evictable iff
  `analyzed_at` is set.**
- `evicted_at` set when raw bytes are dropped from Tier 1 (Tier 2 digest remains).

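The watermark rules above can be read as a small state function. The sketch below is illustrative only: `lifecycle_state` and its state names are hypothetical, and it treats a row with no timestamps as merely "discovered".

```python
from typing import Optional


def lifecycle_state(ingested_at: Optional[str],
                    analyzed_at: Optional[str],
                    evicted_at: Optional[str]) -> str:
    """Derive a session's lifecycle state from its watermark timestamps."""
    if evicted_at is not None:
        return "evicted"      # raw bytes gone, Tier 2 digest remains
    if analyzed_at is not None:
        return "evictable"    # digest written, raw bytes may be dropped
    if ingested_at is not None:
        return "ingested"     # in Tier 1, not yet analyzed
    return "discovered"       # seen in Tier 0 only


def is_evictable(analyzed_at: Optional[str], evicted_at: Optional[str]) -> bool:
    """Evictable iff analyzed; an already evicted session has nothing left to drop."""
    return analyzed_at is not None and evicted_at is None
```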
## 4. Normalized Schema (Tier 1)

Two record kinds. Field names are stable across all adapters.

### 4.1 `Session`

```jsonc
{
  "session_uid": "claude:17092961-…",     // "<flavor>:<native id>", globally unique
  "flavor": "claude|codex|grok",
  "native_session_id": "17092961-…",
  "repo": "agentic-resources",            // resolved from cwd
  "domain": "helix_forge",                // resolved from repo→domain map
  "cwd": "/home/worsch/agentic-resources",
  "git_branch": "main",
  "model": "claude-opus-4-8",
  "started_at": "2026-06-05T21:59:30Z",
  "ended_at": "2026-06-05T22:14:00Z",
  "outcome": "success|fail|abandoned|unknown",
  "cost": { "input_tokens": 0, "output_tokens": 0, "cache_tokens": 0,
            "wall_clock_s": 0, "turns": 0, "retries": 0 },
  "task_ref": "AGENTIC-WP-0002-T01",      // if derivable; else null
  "source_path": "~/.claude/projects/…/….jsonl",
  "source_bytes": 0,
  "schema_version": 1,
  "ingested_at": "…", "analyzed_at": null, "evicted_at": null
}
```

### 4.2 `SessionEvent`

```jsonc
{
  "session_uid": "claude:17092961-…",
  "seq": 12,                              // monotonic within session
  "parent_seq": 11,                       // turn DAG (Claude uuid/parentUuid)
  "ts": "2026-06-05T22:01:13Z",
  // "kind" is exactly one of the pipe-separated values below
  "kind": "user_msg|assistant_msg|thinking|tool_call|tool_result|error|test_run|edit|retry|human_intervention|decision|lifecycle|completion",
  "role": "user|assistant|system|tool",
  "tool": "Bash|Edit|Read|…",             // when kind=tool_call/result
  "summary": "ran pytest -q",             // short, human-readable
  "payload_ref": "blob://…",              // pointer to full content in Tier 1 blob store
  "tokens": 0,
  "is_sidechain": false
}
```

Adapters map native records onto `kind`. Grok's `events.jsonl` populates
`lifecycle` events directly, Codex emits them from `event_msg`
(`task_started`/`task_complete`), and Claude lifecycle is inferred from the
record stream. Bulky bodies live behind `payload_ref` so Tier 1 rows stay light
and blobs can be evicted independently.

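A minimal sketch of how the `SessionEvent` record could be expressed as a dataclass, assuming Python for the implementation. The validation in `__post_init__` and the `key()` helper are illustrative additions, not part of the schema itself; `key()` mirrors the `(session_uid, seq)` idempotency key used by ingest.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# The kinds listed in the schema above.
KINDS = frozenset({
    "user_msg", "assistant_msg", "thinking", "tool_call", "tool_result",
    "error", "test_run", "edit", "retry", "human_intervention", "decision",
    "lifecycle", "completion",
})


@dataclass(frozen=True)
class SessionEvent:
    session_uid: str
    seq: int
    ts: str
    kind: str
    parent_seq: Optional[int] = None
    role: Optional[str] = None
    tool: Optional[str] = None          # set when kind is tool_call/tool_result
    summary: str = ""
    payload_ref: Optional[str] = None   # pointer into the Tier 1 blob store
    tokens: int = 0
    is_sidechain: bool = False

    def __post_init__(self):
        # Reject unknown kinds early so adapters cannot drift from the schema.
        if self.kind not in KINDS:
            raise ValueError(f"unknown kind: {self.kind!r}")

    def key(self) -> Tuple[str, int]:
        """Upsert key for idempotent ingest."""
        return (self.session_uid, self.seq)
```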
### 4.3 Native → `kind` mapping (all three families)

Each cell is the native record/discriminator an adapter reads to emit that
`SessionEvent.kind`. `—` = not natively present; the adapter synthesizes or omits.

| `kind` | Claude Code (`type` / `message`) | Codex CLI (`type` → `payload.type`) | Grok CLI (file → `type`) |
|--------|----------------------------------|--------------------------------------|---------------------------|
| `user_msg` | `user`, `message.role=user` | `response_item` → `message` `role=user`/`developer` | `chat_history` → `user` |
| `assistant_msg` | `assistant`, `message.role=assistant`, content `text` | `response_item` → `message` `role=assistant` (`output_text`) | `chat_history` → `assistant` |
| `thinking` | `assistant` content block `type=thinking` | `response_item` → `reasoning` | `chat_history`/`updates` reasoning block |
| `tool_call` | `assistant` content block `type=tool_use` (`name`,`input`) | `response_item` → `function_call` (`name`,`arguments`,`call_id`) | `chat_history`/`updates` tool-call entry |
| `tool_result` | `user`/tool record `type=tool_result` + `toolUseResult` | `response_item` → `function_call_output` (join on `call_id`) | `updates` tool-result entry |
| `test_run` | derived from `tool_call` (Bash running tests) | derived from `function_call` (`exec_command`) | derived from tool-call entry |
| `edit` | `tool_use` where `name` ∈ Edit/Write/NotebookEdit | `function_call` apply-patch/file-write tool | tool-call entry (edit/write) |
| `error` | `toolUseResult` error / non-zero result | `function_call_output` error / `event_msg` error | `events.jsonl` error / failed update |
| `retry` | repeated `tool_use` after error (inferred via DAG) | repeated `function_call` after error (inferred, temporal) | `events.jsonl` loop/retry event |
| `human_intervention` | `user` record mid-turn (interrupt), `userType` | `event_msg` → `user_message` mid-task | `prompt_history` mid-session / `events.jsonl` |
| `decision` | recorded out-of-band (State Hub `/decisions`) | recorded out-of-band (State Hub) | recorded out-of-band (State Hub) |
| `lifecycle` | inferred: first/last record, `summary`, `queue-operation` | `event_msg` → `task_started` / `task_complete` | **`events.jsonl`** → `turn_started`/`loop_started`/… (explicit) |
| `completion` | inferred: last `assistant` + `Stop`/`SessionEnd` hook | `event_msg` → `task_complete` | `events.jsonl` turn end + `summary.json` |

**Linkage note (drives `seq`/`parent_seq`):** Claude has a true turn DAG
(`uuid`/`parentUuid`) — preserve it directly. Codex is **flat**, joined only by
`call_id`; assign `seq` by temporal order. Grok carries explicit `turn_number` in
`events.jsonl`; key `seq` off that plus record order.

**Cost block sources:** Claude `message.usage`; Codex `event_msg/token_count`;
Grok `events.jsonl` / `updates.jsonl` token fields.

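To make the Claude column of the mapping concrete, here is a hedged sketch that classifies the content blocks of one record. `kinds_for_claude_record` is a hypothetical helper. It assumes that an `edit` event replaces (rather than accompanies) the `tool_call` for edit-type tools, and it leaves derived kinds such as `test_run`, `error`, and `retry` to a later pass.

```python
EDIT_TOOLS = {"Edit", "Write", "NotebookEdit"}


def kinds_for_claude_record(rec):
    """Map one Claude Code record to the list of SessionEvent kinds it yields."""
    rtype = rec.get("type")
    if rtype not in ("user", "assistant"):
        return []  # attachment, summary, queue-operation, ... handled elsewhere
    msg_kind = "user_msg" if rtype == "user" else "assistant_msg"
    content = rec.get("message", {}).get("content", [])
    if isinstance(content, str):
        # Assumption: content may be a plain string instead of a block list.
        return [msg_kind]
    kinds = []
    for block in content:
        btype = block.get("type")
        if btype == "text":
            kinds.append(msg_kind)
        elif btype == "thinking":
            kinds.append("thinking")
        elif btype == "tool_use":
            # Assumption: edit-type tools emit "edit" instead of "tool_call".
            kinds.append("edit" if block.get("name") in EDIT_TOOLS else "tool_call")
        elif btype == "tool_result":
            kinds.append("tool_result")
    return kinds
```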
## 5. Retention & Eviction

The user's stated preference: **storage-budget-based**, dropping old content once
it has been analyzed or once it no longer fits — *better than* a fixed daily/weekly
window. We implement budget-based as primary, with a time backstop and a scheduled
cadence as the trigger.

### 5.1 Configurable knobs

```toml
[session_memory.retention]
raw_soft_cap_bytes  = "4GiB"   # begin evicting analyzed sessions above this
raw_hard_cap_bytes  = "6GiB"   # absolute ceiling for Tier 1
raw_max_age_days    = 45       # backstop: analyzed raw older than this is evictable regardless of space
distilled_cap_bytes = "1GiB"   # Tier 2 ceiling (should grow slowly; alert, don't auto-drop)
cadence             = "daily"  # ingest+analyze+evict sweep: daily | weekly | on-hook
```

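The cap values above are strings with binary-unit suffixes, so something has to turn them into byte counts. A minimal sketch, assuming only the `B`/`KiB`/`MiB`/`GiB`/`TiB` suffixes are accepted; `parse_size` is a hypothetical helper, not an existing function in the repo.

```python
import re

_UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}
_SIZE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([KMGT]iB|B)\s*$")


def parse_size(text: str) -> int:
    """Convert a string such as "4GiB" into a number of bytes."""
    match = _SIZE_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])
```

Rejecting decimal suffixes such as `GB` is deliberate in this sketch: it avoids silently mixing 1000-based and 1024-based units in one config.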
### 5.2 Eviction algorithm (runs after each ingest+analyze sweep)

1. **Compute** current Tier 1 usage.
2. **Backstop pass:** evict any session where `analyzed_at` is set AND
   `age > raw_max_age_days`.
3. **Budget pass:** while `usage > raw_soft_cap_bytes`:
   - pick the **oldest `analyzed_at`** session that is not yet evicted;
   - drop its Tier 1 raw rows + blobs (Tier 2 digest is kept), set `evicted_at`;
   - if **no analyzed-but-unevicted session remains**, stop the budget pass
     (we will not destroy un-analyzed data to free space) and go to step 4.
4. **Back-pressure / overflow:** if `usage > raw_hard_cap_bytes` and the only
   remaining bulk is **un-analyzed**:
   - first try to **analyze now** (run extraction) to make those sessions
     evictable, then re-run the budget pass;
   - if still over hard cap (analysis can't keep up or fails), evict the **oldest
     un-analyzed** sessions as a last resort and emit a
     `session_memory.data_loss` warning event + a State Hub progress note. This is
     the only path that loses un-analyzed data, and it is always reported.
5. **Tier 2 guard:** if distilled usage > `distilled_cap_bytes`, **do not
   auto-drop**; flag for human/curation review (digests are the product).

**Invariant:** *no session's raw bytes are dropped before its Tier 2 digest
exists, except the explicitly-reported hard-cap overflow path.*

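Steps 1 to 4 above can be sketched over an in-memory list of session rows. This is an illustration under stated assumptions, not the `retention.py` implementation: `Row`, `sweep`, and the `analyze` callback are hypothetical, `age_days` is assumed to be precomputed, "oldest un-analyzed" is assumed to mean smallest `ingested_at`, and the Tier 2 guard (step 5) is left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Row:
    uid: str
    raw_bytes: int
    age_days: float
    ingested_at: float = 0.0
    analyzed_at: Optional[float] = None
    evicted_at: Optional[float] = None


def sweep(rows: List[Row], soft_cap: int, hard_cap: int, max_age_days: float,
          now: float, analyze: Optional[Callable[[Row], bool]] = None
          ) -> List[Tuple[str, str]]:
    """Run backstop, budget, and overflow passes; return data-loss warnings."""
    warnings: List[Tuple[str, str]] = []

    def live() -> List[Row]:
        return [r for r in rows if r.evicted_at is None]

    def usage() -> int:                       # step 1
        return sum(r.raw_bytes for r in live())

    def budget_pass() -> None:                # step 3
        while usage() > soft_cap:
            analyzed = [r for r in live() if r.analyzed_at is not None]
            if not analyzed:
                break  # never destroy un-analyzed data just to reach the soft cap
            min(analyzed, key=lambda r: r.analyzed_at).evicted_at = now

    for r in live():                          # step 2: time backstop
        if r.analyzed_at is not None and r.age_days > max_age_days:
            r.evicted_at = now
    budget_pass()

    if usage() > hard_cap:                    # step 4: back-pressure / overflow
        if analyze is not None:
            for r in live():
                if r.analyzed_at is None and analyze(r):
                    r.analyzed_at = now
            budget_pass()
        while usage() > hard_cap:
            pending = [r for r in live() if r.analyzed_at is None]
            if not pending:
                break
            victim = min(pending, key=lambda r: r.ingested_at)
            victim.evicted_at = now
            warnings.append(("session_memory.data_loss", victim.uid))
    return warnings
```

The returned warnings list is where a real implementation would emit the `session_memory.data_loss` event and the State Hub note.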
### 5.3 Why budget-based beats fixed-window

A fixed daily/weekly drop either deletes data we never analyzed (lossy) or hoards
data we already distilled (wasteful). Budget + `analyzed_at` watermark ties
deletion to **two** real conditions the user named — *"once it has been analyzed"*
(promoted to Tier 2) and *"doesn't fit any longer"* (over budget) — and only falls
back to time as a backstop.

## 6. Ingest Cursors (incremental, idempotent)

Per source, persist a small cursor so sweeps are cheap and re-runnable:

- **Claude / Grok (per-cwd dirs):** track `(file_path, size, mtime)` and last
  parsed line offset; re-ingest only grown/changed files. `session_uid` dedupes.
- **Codex (date partitions):** track last-seen `YYYY/MM/DD` + per-file offset.
- Ingest is **idempotent** keyed on `(session_uid, seq)` — safe to re-run after a
  crash or partial sweep.

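A per-file cursor of this kind might be sketched as follows. `read_new_lines` is a hypothetical helper. It assumes a byte offset is an acceptable stand-in for the "line offset", that a shrunken file should be re-read from the start (the `(session_uid, seq)` key absorbs the repeats), and that a trailing line without a newline is still being written.

```python
import json
import os


def read_new_lines(path, cursor):
    """Parse only the JSONL records appended since the last sweep.

    cursor is a dict like {"offset": int}; returns (records, new_cursor).
    """
    size = os.path.getsize(path)
    offset = cursor.get("offset", 0)
    if size < offset:
        offset = 0  # file was truncated or replaced: start over
    records = []
    with open(path, "rb") as fh:
        fh.seek(offset)
        for raw in fh:
            if not raw.endswith(b"\n"):
                break  # partial trailing line; pick it up on the next sweep
            offset += len(raw)
            if raw.strip():
                records.append(json.loads(raw))
    new_cursor = {"offset": offset, "size": size,
                  "mtime": os.path.getmtime(path)}
    return records, new_cursor
```

Calling it twice with the returned cursor and no file changes yields an empty list the second time, which is what makes repeated sweeps cheap.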
## 7. Capture Modes

- **Pull (default, portable):** scheduled sweep scans Tier 0 by mtime/partition.
  Works for all three families with zero coupling to the agent. Triggered on the
  configured `cadence` via the repo's scheduler (`/schedule`, cron, or `/loop`).
- **Push (optional, low-latency):** wire the agent's own hooks to ping the ingester
  on session close — Claude `Stop`/`SessionEnd` hooks, Grok hooks/ACP, Codex
  `exec` wrappers. Push just enqueues; the same idempotent pull pipeline does the
  work.

Capture must be **non-blocking** (PRD FR-C5): we read copies of logs out-of-band;
we never sit in the agent's critical path.

## 8. Component Layout (proposed, in-repo)

```
session_memory/
  adapters/
    claude.py        # Tier0→Tier1 normalizer (verified schema)
    codex.py         # version-detecting normalizer (confirm against real rollout)
    grok.py          # reads session dir incl. events.jsonl
  core/
    schema.py        # Session / SessionEvent dataclasses + versioning
    store.py         # Tier1 (rows+blobs) and Tier2 (digests) — SQLite to start
    cursor.py        # per-source ingest cursors
    retention.py     # §5 eviction algorithm
    digest.py        # Tier1→Tier2 session digest + signal stubs
  ingest.py          # one sweep: discover → normalize → analyze → evict
  config.toml        # §5.1 knobs + repo→domain map + source paths
```

Storage starts as **SQLite + a blob dir** (rows in SQLite, bulky payloads as files
under `payload_ref`); graduate to Postgres alongside the State Hub only if volume
demands. Digests/decisions are also surfaced to the hub per ADR-001 (files-first;
hub indexes).

## 9. Privacy / Safety

- Tier 0 logs can contain secrets (the Grok `auth.json` and Claude `.credentials`
  live in the same trees). The ingester reads **only** session transcripts, never
  credential files, and **redacts** obvious secret patterns before content is
  written to `payload_ref` blobs.
- All data is local; nothing leaves the workstation. Eviction of Tier 1 is a real
  delete (not just an index drop) so the bounded cache is also a privacy bound.

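What counts as an "obvious secret pattern" is not specified here. As a rough sketch, redaction could be a list of regular expressions applied before a payload is written. The three patterns below (an `sk-` style API key, an AWS access key id shape, a PEM private key block) are illustrative assumptions, and `redact` is a hypothetical helper.

```python
import re

# Illustrative patterns only; a real list would need review and tuning.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9_-]{20,}"),    # API-key-like token
    re.compile(r"AKIA[0-9A-Z]{16}"),         # AWS access key id shape
    re.compile(
        r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
        re.DOTALL,
    ),                                       # PEM private key block
]


def redact(text: str, placeholder: str = "[REDACTED]") -> str:
    """Replace every match of a known secret pattern with a placeholder."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```

Pattern-based redaction only catches secrets with a recognizable shape, so it complements, rather than replaces, the rule of never reading credential files.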
## 10. Open Questions

- ~~**OQ1** Confirm Codex `rollout-*.jsonl` per-line schema.~~ **Resolved** (§2.2):
  `{timestamp,type,payload}` lines, `type` ∈ `session_meta`/`response_item`/`event_msg`/`turn_context`/`compacted`,
  tool calls flat-linked by `call_id`, tokens via `event_msg/token_count`. Remaining
  sub-item: verify the `token_count` payload field names against a real install when
  Codex is present (older-version variance only).
- **OQ2** Outcome inference: how do we reliably label `success/fail/abandoned`
  across flavors (exit signals differ)? Start heuristic (last-turn + test results +
  human-intervention markers), refine in Detect phase.
- **OQ3** `task_ref` resolution — can we always map a session to a workplan task
  (via cwd + branch + state-hub), or only sometimes?
- ~~**OQ4** Right default for `raw_soft_cap_bytes`.~~ **Measured** (Phase 0, 85
  real local Claude files / 63 distinct sessions): source bytes per session
  min 396 B · **median ~49 KB** · max 48 MB (one outlier) · ~103 MB total. The
  §5.1 defaults (4 GiB soft / 6 GiB hard) leave ample headroom; revisit once Grok
  dirs (heavier, multi-file) are ingested in Phase 1.
- **OQ5** Should push-hooks be opt-in per machine to avoid surprising the agents?
- **OQ6 (new, found in Phase 0)** Multi-file sessions: ~84 transcript files mapped
  to ~63 `session_uid`s — some sessions span multiple files (resume/sidechain
  sharing a `sessionId`). Current behavior upserts (last file wins per
  `(session_uid, seq)`); a future refinement is to *merge* events across files of
  one session rather than overwrite. Acceptable for Phase 0.

---

## 11. Project metrics correlation (kaizen-agentic)

Helix Forge owns **fleet-level** session capture and digests (this repo). The
**kaizen-agentic** framework owns **project-scoped** agent execution metrics
(ADR-004: `.kaizen/metrics/<agent>/executions.jsonl`). The two layers correlate
by optional `helix_session_uid` on project records — link-by-reference, no
duplicate ingestion in either repo.

| Layer | Owner | Storage |
|-------|-------|---------|
| Fleet | agentic-resources (Helix Forge) | digest store (`digests` table) |
| Project | kaizen-agentic | `.kaizen/metrics/<agent>/executions.jsonl` |

**Cross-repo contract:** [Helix Forge Correlation Contract](https://gitea.coulomb.social/coulomb/kaizen-agentic/src/branch/main/docs/integrations/helix-forge-correlation.md)
(kaizen-agentic). It maps `Session.session_uid` → `helix_session_uid`,
`digest.cost` → `tokens`, and the `tool_histogram` MCP share → `infra_overhead_share`.

**Read path:** `kaizen-agentic metrics correlate <uid>` looks up a digest via
`HELIX_STORE_DB` (this repo's session store). No write path from kaizen-agentic
into Helix Forge.

**Related kaizen-agentic docs:** [ADR-004 project metrics convention](https://gitea.coulomb.social/coulomb/kaizen-agentic/src/branch/main/docs/adr/ADR-004-project-metrics-convention.md),
[wiki/EcosystemIntegration.md](https://gitea.coulomb.social/coulomb/kaizen-agentic/src/branch/main/wiki/EcosystemIntegration.md).

### 11.1 Session-close env export (dual-layer agents)

Agents that run **both** Helix Forge capture and kaizen `metrics record` should
export the following **after** the ingest sweep has written the session digest
(`python -m session_memory.ingest` or an equivalent Stop/SessionEnd hook). Names
match kaizen-agentic ADR-004 — do not invent parallel aliases.

| Variable | Source in Helix Forge | Purpose |
|----------|----------------------|---------|
| `HELIX_SESSION_UID` | `Session.session_uid` | Primary correlation key → `helix_session_uid` |
| `HELIX_REPO` | `digest.repo` | Project/repo scoping |
| `HELIX_FLAVOR` | `digest.flavor` | Agent runtime (`claude` / `codex` / `grok`) |
| `HELIX_TOKENS` | `digest.cost.input_tokens + digest.cost.output_tokens` | Token rollup → `tokens` |
| `HELIX_INFRA_OVERHEAD_SHARE` | infra bucket share over `tool_histogram` (see `measure.metrics.session_metrics`) | MCP/plumbing overhead → `infra_overhead_share` |

Example (after digest exists):

```bash
export HELIX_SESSION_UID="claude:abc-123"
export HELIX_REPO="agentic-resources"
export HELIX_FLAVOR="claude"
export HELIX_TOKENS=125000
export HELIX_INFRA_OVERHEAD_SHARE=0.117
# optional — lets kaizen correlate without guessing the store location:
export HELIX_STORE_DB="$(pwd)/session_memory/.store/mem.db"
kaizen-agentic metrics record   # merges HELIX_* when present
```

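For hooks written in Python rather than shell, the same values could be derived from a digest dict. The sketch below is hypothetical: `helix_env` is not an existing function, and treating tools whose names start with `mcp__` as the infra bucket is an assumption standing in for the real rule in `measure.metrics.session_metrics`.

```python
def helix_env(digest, infra_prefixes=("mcp__",)):
    """Build the HELIX_* variables for one session digest."""
    cost = digest["cost"]
    histogram = digest.get("tool_histogram", {})
    total_calls = sum(histogram.values())
    # Assumption: infra tools are recognized by a name prefix.
    infra_calls = sum(count for name, count in histogram.items()
                      if name.startswith(tuple(infra_prefixes)))
    share = infra_calls / total_calls if total_calls else 0.0
    return {
        "HELIX_SESSION_UID": digest["session_uid"],
        "HELIX_REPO": digest["repo"],
        "HELIX_FLAVOR": digest["flavor"],
        "HELIX_TOKENS": str(cost["input_tokens"] + cost["output_tokens"]),
        "HELIX_INFRA_OVERHEAD_SHARE": f"{share:.3f}",
    }
```

A hook could merge the result into `os.environ` before invoking `kaizen-agentic metrics record`; values are strings because environment variables always are.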
### 11.2 Digest store location and read API

- **`HELIX_STORE_DB`** — absolute path to the SQLite file holding Tier 2 digests.
  Defaults to `config.toml` `[store].db_path` (`session_memory/.store/mem.db` relative
  to the repo root). Export as an absolute path when setting the variable on session
  close so `metrics correlate` works across hosts and working directories.
- **Thin CLI** — `python -m session_memory.digest_lookup <session_uid> [--json]`
  prints one digest without running ingest. Exit `0` on hit, `1` when missing.
- **Programmatic** — `Store.get_digest(session_uid)` returns the JSON blob written
  by `build_digest` / `analyze`.

**Stable digest JSON shape** (fields consumers may rely on):

| Field | Type | Notes |
|-------|------|-------|
| `session_uid` | string | Normalized uid (`<flavor>:<native-id>`) |
| `flavor`, `repo`, `domain` | string | Session attribution |
| `model` | string | Model id when known |
| `started_at`, `ended_at` | string | ISO timestamps |
| `outcome` | string | `success` / `fail` / `abandoned` / `unknown` |
| `cost` | object | `input_tokens`, `output_tokens`, `cache_tokens`, `wall_clock_s`, `turns`, `retries` |
| `tool_histogram` | object | Tool name → call count |
| `event_count`, `kind_counts`, `markers` | object/int | Compact activity summary |
| `first_prompt`, `last_assistant` | string | Short text snippets |
| `error_snippets` | array | `{fingerprint, sample, count, tool}` entries |
| `schema_version` | int | Digest schema version |

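A consumer that relies on this shape may want a cheap sanity check before using a digest. The sketch below validates only a subset of the fields in the table; `digest_problems` is a hypothetical helper, and which fields may be null is not stated in this document, so the check deliberately avoids the optional-looking ones.

```python
OUTCOMES = {"success", "fail", "abandoned", "unknown"}
COST_KEYS = {"input_tokens", "output_tokens", "cache_tokens",
             "wall_clock_s", "turns", "retries"}


def digest_problems(digest):
    """Return a list of human-readable problems; empty means the digest looks sane."""
    problems = []
    for name in ("session_uid", "flavor", "repo", "domain"):
        if not isinstance(digest.get(name), str):
            problems.append(f"{name}: expected string")
    uid = digest.get("session_uid")
    if isinstance(uid, str) and ":" not in uid:
        problems.append("session_uid: expected '<flavor>:<native-id>'")
    if digest.get("outcome") not in OUTCOMES:
        problems.append("outcome: unexpected value")
    cost = digest.get("cost")
    if not isinstance(cost, dict):
        problems.append("cost: expected object")
    else:
        missing = sorted(COST_KEYS - cost.keys())
        if missing:
            problems.append("cost: missing " + ", ".join(missing))
    if not isinstance(digest.get("tool_histogram"), dict):
        problems.append("tool_histogram: expected object")
    if not isinstance(digest.get("schema_version"), int):
        problems.append("schema_version: expected int")
    return problems
```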
---

*Implemented:* Phases 0–4, weekly retro ([AGENTIC-WP-0002]–[AGENTIC-WP-0010]);
kaizen correlation follow-up ([AGENTIC-WP-0011]).

## Sources

- Claude Code session format — verified on disk: `~/.claude/projects/*/*.jsonl`, `~/.claude/history.jsonl`.
- Grok CLI session format — verified on disk: `~/.grok/sessions/`, `~/.grok/logs/unified.jsonl`, `~/.grok/sessions/session_search.sqlite`; `~/.grok/README.md` (ACP/headless/hooks).
- Codex CLI session format — [ccusage Codex guide](https://ccusage.com/guide/codex/), [Codex advanced config](https://developers.openai.com/codex/config-advanced), [codex-trace](https://github.com/PixelPaw-Labs/codex-trace), [codex-logs](https://github.com/wondercoms/codex-logs), [Session/Rollout Files discussion #3827](https://github.com/openai/codex/discussions/3827), [trajectory-JSON issue #2288](https://github.com/openai/codex/issues/2288).