# Design Document — Coding Session Memory
**Domain:** helix_forge
**Repo:** agentic-resources
**Status:** Draft v0.1
**Author:** Claude (drafted with Bernd Worsch)
**Created:** 2026-06-06
**Updated:** 2026-06-06
**Related:** [PRD-helix-forge.md](./PRD-helix-forge.md) (this is the Capture + storage layer, FR-C* / §8)
---
## 1. Purpose
Helix Forge's loop (Capture → Detect → Curate → Distribute → Measure) needs a
durable, bounded **memory of coding sessions**. This document specifies that
memory: how we **access** each coding agent's session protocol, how we
**normalize** those protocols into one schema, where we **store** the result, and
how we **age it out** — preferring a *storage-budget-based* eviction that drops
old raw content once it has been analyzed or no longer fits, rather than a naive
fixed time window.
The guiding asymmetry: **raw transcripts are bulky and re-derivable; the distilled
analysis is small and precious.** So we keep a *bounded cache* of raw sessions and
a *durable, compact* layer of extracted digests/signals. Eviction targets the
former, never the latter.
## 2. Research — How to Access Each Agent's Session Protocol
All three families persist sessions to the local filesystem as JSONL (plus, for
Grok, a per-session directory). All findings below were verified against the live
installs on this workstation (`~/.claude`, `~/.grok`) and public docs (Codex; not
installed here).
### 2.1 Claude Code ✅ verified on disk
| Aspect | Finding |
|--------|---------|
| Session transcripts | `~/.claude/projects/<url-encoded-cwd>/<session-uuid>.jsonl` — one JSONL per session |
| Subagent sidechains | same dir, `agent-<id>.jsonl`; records carry `isSidechain: true` |
| Global prompt history | `~/.claude/history.jsonl` |
| Record format | one JSON object per line; **`type`** discriminates: `user`, `assistant`, `attachment`, `queue-operation`, `ai-title`, `last-prompt`, `summary`, plus tool-result records |
| Key fields | `type`, `timestamp`, `sessionId`, `uuid`, `parentUuid` (turn DAG), `message` (`role` + content blocks: `text`/`thinking`/`tool_use`/`tool_result`), `cwd`, `gitBranch`, `version`, `requestId`, `toolUseResult`, `userType` |
| Token usage | inside assistant `message.usage` (input/output/cache tokens) |
| Model | `message.model` (e.g. `claude-opus-4-8`) |
| Side data | `~/.claude/todos/`, `~/.claude/tasks/`, `~/.claude/file-history/`, `~/.claude/shell-snapshots/` |
| Live capture hook | Claude Code **SessionEnd / Stop / SessionStart hooks** can fire our ingest on session close (push), in addition to batch scanning (pull) |
The turn DAG (`uuid`/`parentUuid`) lets us reconstruct branching, retries, and
sidechains exactly.
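A minimal sketch of how an adapter might rebuild that DAG from one transcript, assuming only the `uuid` and `parentUuid` fields listed in the table. The helper names `build_turn_dag` and `branch_points` are hypothetical, and the sample records are invented for illustration.

```python
import json
from collections import defaultdict


def build_turn_dag(lines):
    """Map each parent uuid to the uuids of its child records.

    `lines` is an iterable of JSONL strings. Records without a `uuid`
    are skipped; root records are filed under the key None.
    """
    children = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        uuid = record.get("uuid")
        if uuid is None:
            continue
        children[record.get("parentUuid")].append(uuid)
    return dict(children)


def branch_points(children):
    """Return parent uuids with more than one child (a retry or a branch)."""
    return sorted(
        parent
        for parent, kids in children.items()
        if parent is not None and len(kids) > 1
    )


# Invented sample: one user turn answered twice (for example after a retry).
sample = [
    '{"type": "user", "uuid": "a", "parentUuid": null}',
    '{"type": "assistant", "uuid": "b", "parentUuid": "a"}',
    '{"type": "assistant", "uuid": "c", "parentUuid": "a"}',
]
dag = build_turn_dag(sample)
```

Any parent with two or more children marks a point where the conversation forked, which is the information the `retry` inference in §4.3 relies on.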
### 2.2 OpenAI Codex CLI ✅ schema confirmed from source (not installed locally)
Schema confirmed from the openai/codex source (`codex-rs/protocol/src/protocol.rs`
via DeepWiki) and a reverse-engineering writeup with real example lines; the two
sources agree.
| Aspect | Finding |
|--------|---------|
| Session ("rollout") files | `$CODEX_HOME/sessions/YYYY/MM/DD/rollout-*.jsonl` (default `$CODEX_HOME = ~/.codex`) |
| Line wrapper (`RolloutLine`) | every line: **`{timestamp, type, payload}`** (UTC ts + a `RolloutItem`) |
| `type` discriminator | `session_meta` · `response_item` · `event_msg` · `turn_context` · `compacted` |
| `session_meta` | `{id, source, cwd, model_provider, cli_version}` (+ model) — restores env |
| `turn_context` | `{model, approval_policy, sandbox_policy}` — per-turn settings snapshot |
| `response_item` | raw model output / tool calls; `payload.type` ∈ `message` · `function_call` · `function_call_output` · `reasoning` |
| → `message` | `{role: developer\|user\|assistant, content:[{type:"output_text"\|…, text}]}` |
| → `function_call` | `{name, arguments (JSON string), call_id}` |
| → `function_call_output` | `{call_id, output}` |
| `event_msg` | protocol events; `payload.type` ∈ `task_started` · `task_complete` · `user_message` · `agent_message` · `token_count` · lifecycle |
| Token usage | `event_msg` with `payload.type = token_count`, interspersed (no fixed cadence) |
| Turn linkage | **flat — tool calls/outputs linked by `call_id`, no parent-ref DAG**; causality inferred from temporal order (unlike Claude's `uuid`/`parentUuid`) |
| Schema versions | older installs differ ("new ≥0.44 / mid / oldest 2025/08"); adapter version-detects on `session_meta.cli_version` |
| Naming / resume | filenames + `session_id` auto-generated; `codex resume --last`; `codex exec` for headless (trajectory-JSON is gh issue #2288) |
| Override location | `CODEX_HOME` env var |
**Adapter notes:** map `event_msg/task_started|task_complete` → `lifecycle`
events and outcome; `response_item/message` → `user_msg`/`assistant_msg`;
`function_call`+`function_call_output` → `tool_call`/`tool_result` joined on
`call_id`; `response_item/reasoning` → `thinking`; `event_msg/token_count` → cost
block. Because there is no parent-ref DAG, the adapter assigns `seq`/`parent_seq`
from temporal order rather than native links.
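The `call_id` join can be sketched as follows. This is an illustration under stated assumptions, not the adapter itself: it handles only `function_call` and `function_call_output` items, numbers `seq` by file order, and the function name `pair_tool_calls` and the sample lines are hypothetical.

```python
import json


def pair_tool_calls(lines):
    """Emit tool_call/tool_result events from Codex rollout lines.

    seq follows file (temporal) order. A tool_result's parent_seq is the
    seq of the function_call with the same call_id; a tool_call's
    parent_seq is simply the previous emitted event.
    """
    events = []
    call_seq = {}  # call_id -> seq of the originating function_call
    for line in lines:
        if not line.strip():
            continue
        item = json.loads(line)
        if item.get("type") != "response_item":
            continue
        payload = item.get("payload") or {}
        ptype = payload.get("type")
        if ptype not in ("function_call", "function_call_output"):
            continue
        seq = len(events)
        if ptype == "function_call":
            call_seq[payload.get("call_id")] = seq
            kind = "tool_call"
            parent_seq = seq - 1 if seq > 0 else None
        else:
            kind = "tool_result"
            parent_seq = call_seq.get(payload.get("call_id"))
        events.append(
            {"seq": seq, "parent_seq": parent_seq, "kind": kind,
             "ts": item.get("timestamp")}
        )
    return events


# Invented sample: two calls whose outputs arrive in reverse order.
rollout = [
    '{"timestamp": "t0", "type": "event_msg", "payload": {"type": "task_started"}}',
    '{"timestamp": "t1", "type": "response_item", "payload": {"type": "function_call", "name": "shell", "arguments": "{}", "call_id": "c1"}}',
    '{"timestamp": "t2", "type": "response_item", "payload": {"type": "function_call", "name": "shell", "arguments": "{}", "call_id": "c2"}}',
    '{"timestamp": "t3", "type": "response_item", "payload": {"type": "function_call_output", "call_id": "c2", "output": "ok"}}',
    '{"timestamp": "t4", "type": "response_item", "payload": {"type": "function_call_output", "call_id": "c1", "output": "ok"}}',
]
paired = pair_tool_calls(rollout)
```

Because results are linked through `call_id` rather than position, out-of-order outputs still attach to the right call.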
### 2.3 Grok CLI (xAI) ✅ verified on disk
Grok stores **a directory per session**, which is the richest source of the three.
| Aspect | Finding |
|--------|---------|
| Session dir | `~/.grok/sessions/<url-encoded-cwd>/<session-uuid>/` |
| `chat_history.jsonl` | full conversation; `type` = `system`/`user`/`assistant` + content |
| `events.jsonl` | **structured lifecycle events**: `{ts, type, session_id, turn_number, model_id, yolo_mode, conversation_message_count, session_relationship, schema_version}`; types like `turn_started`, `loop_started` |
| `updates.jsonl` | streaming incremental updates |
| `summary.json` | `{id, cwd, session_summary, created_at, updated_at}` |
| `prompt_context.json` | injected context, incl. which AGENTS.md/CLAUDE.md files were loaded |
| `system_prompt.txt` | exact system prompt for the session |
| `rewind_points.jsonl`, `plan_mode.json` | rewind/plan-mode state |
| Per-cwd prompt history | `~/.grok/sessions/<cwd>/prompt_history.jsonl`: `{timestamp, session_id, prompt, is_bash}` |
| Global structured log | `~/.grok/logs/unified.jsonl`: `{ts, src, pid, lvl, msg, ctx, sid, ver}` |
| Search index | `~/.grok/sessions/session_search.sqlite`: `session_docs(session_id, cwd, updated_at, title)` + FTS5 (`session_docs_fts`) we can query directly |
| Integration surfaces | Grok exposes **ACP (Agent Client Protocol)**, **headless mode** (`grok -p`), and **hooks** (`~/.grok/docs/user-guide/10-hooks.md`) — push-capture options |
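
As a sketch of the pull path through the search index, the query below lists sessions changed since a cursor value. It assumes only the `session_docs` columns named in the table; the column types (in particular that `updated_at` sorts chronologically) are an assumption, and the function names are hypothetical. The demo runs against an in-memory stand-in rather than a real Grok index.

```python
import sqlite3

COLUMNS = ("session_id", "cwd", "updated_at", "title")


def open_readonly(path):
    """Open an existing SQLite file read-only so Tier 0 is never mutated.

    Paths containing URI-special characters would need escaping first.
    """
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)


def sessions_updated_since(conn, cursor_value):
    """Return session_docs rows with updated_at greater than the cursor."""
    rows = conn.execute(
        "SELECT session_id, cwd, updated_at, title FROM session_docs "
        "WHERE updated_at > ? ORDER BY updated_at",
        (cursor_value,),
    )
    return [dict(zip(COLUMNS, row)) for row in rows]


# In-memory stand-in with the documented columns (types are assumed).
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE session_docs "
    "(session_id TEXT, cwd TEXT, updated_at TEXT, title TEXT)"
)
demo.executemany(
    "INSERT INTO session_docs VALUES (?, ?, ?, ?)",
    [
        ("s1", "/repo", "2026-06-01T10:00:00Z", "first"),
        ("s2", "/repo", "2026-06-03T10:00:00Z", "second"),
        ("s3", "/repo", "2026-06-04T10:00:00Z", "third"),
    ],
)
found = sessions_updated_since(demo, "2026-06-02T00:00:00Z")
```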
### 2.4 Cross-family summary
| | Claude Code | Codex CLI | Grok CLI |
|--|--|--|--|
| Root | `~/.claude/projects/` | `~/.codex/sessions/` | `~/.grok/sessions/` |
| Unit | one `.jsonl`/session | one `rollout-*.jsonl`/session | one **dir**/session |
| Layout | flat per-cwd dir | date-partitioned `YYYY/MM/DD` | per-cwd, per-session dir |
| Discriminator | `type` | `type` (version-dependent) | `type` (in `chat_history`/`events`) |
| Lifecycle events | inferred from records | inferred from records | **explicit** `events.jsonl` |
| Token usage | `message.usage` | `event_msg` / `token_count` | from events/updates |
| Push capture | Stop/SessionEnd hooks | `codex exec` wrappers | hooks / ACP |
| Pull capture | scan dir by mtime | scan date partitions | scan dirs / query FTS sqlite |
**Implication:** the common denominator is *"JSONL records discriminated by a
`type` field, with a session id, timestamps, turn linkage, tool calls, and token
usage."* That maps cleanly onto one normalized schema (§4). Per-family quirks
(Grok's explicit `events.jsonl`, Codex's schema versions, Claude's sidechains) are
handled inside each adapter.
## 3. Tiered Storage Model
```
Tier 0  SOURCE (agents' own logs)            read-only, never mutated
        ~/.claude/projects   ~/.codex/sessions   ~/.grok/sessions
          │  collector adapters (per family) + ingest cursor
Tier 1  RAW CACHE (bounded, EVICTABLE)       normalized Session + Event records
          │  signal extractors / digesters
Tier 2  DISTILLED MEMORY (durable, small)    session digests + signals + pattern evidence
```
- **Tier 0 — Source.** The agents' own logs. We treat them as read-only. We keep a
small **ingest cursor** per source so re-scans are incremental (see §6).
- **Tier 1 — Raw cache.** Normalized copies of sessions/events. This is the bulky
tier and the *only* tier subject to budget eviction.
- **Tier 2 — Distilled memory.** Per-session **digest** (outcome, costs, tool
histogram, error/retry/intervention markers, key snippets) plus extracted
**signals** and **pattern evidence pointers**. Compact and durable. A session can
be fully evicted from Tier 1 once its Tier 2 digest exists.
This is what makes "drop old content once it has been analyzed" safe: analysis
*promotes* the valuable bits into Tier 2 before the raw bytes are dropped.
### 3.1 Per-session lifecycle / watermarks
Each session row carries timestamps that drive eviction:
```
discovered_at → ingested_at → analyzed_at → [evictable] → evicted_at
```
- `ingested_at` set when normalized into Tier 1.
- `analyzed_at` set when the Tier 2 digest is written. **A session is evictable iff
`analyzed_at` is set.**
- `evicted_at` set when raw bytes are dropped from Tier 1 (Tier 2 digest remains).
## 4. Normalized Schema (Tier 1)
Two record kinds. Field names are stable across all adapters.
### 4.1 `Session`
```jsonc
{
  "session_uid": "claude:17092961-…",     // "<flavor>:<native id>", globally unique
  "flavor": "claude" | "codex" | "grok",
  "native_session_id": "17092961-…",
  "repo": "agentic-resources",            // resolved from cwd
  "domain": "helix_forge",                // resolved from repo→domain map
  "cwd": "/home/worsch/agentic-resources",
  "git_branch": "main",
  "model": "claude-opus-4-8",
  "started_at": "2026-06-05T21:59:30Z",
  "ended_at": "2026-06-05T22:14:00Z",
  "outcome": "success|fail|abandoned|unknown",
  "cost": { "input_tokens": 0, "output_tokens": 0, "cache_tokens": 0,
            "wall_clock_s": 0, "turns": 0, "retries": 0 },
  "task_ref": "AGENTIC-WP-0002-T01",      // if derivable; else null
  "source_path": "~/.claude/projects/…/….jsonl",
  "source_bytes": 0,
  "schema_version": 1,
  "ingested_at": "…", "analyzed_at": null, "evicted_at": null
}
```
### 4.2 `SessionEvent`
```jsonc
{
  "session_uid": "claude:17092961-…",
  "seq": 12,                              // monotonic within session
  "parent_seq": 11,                       // turn DAG (Claude uuid/parentUuid)
  "ts": "2026-06-05T22:01:13Z",
  // exactly one of the values listed below
  "kind": "user_msg|assistant_msg|thinking|tool_call|tool_result|error|test_run|edit|retry|human_intervention|decision|lifecycle|completion",
  "role": "user|assistant|system|tool",
  "tool": "Bash|Edit|Read|…",             // when kind=tool_call/result
  "summary": "ran pytest -q",             // short, human-readable
  "payload_ref": "blob://…",              // pointer to full content in Tier 1 blob store
  "tokens": 0,
  "is_sidechain": false
}
```
Adapters map native records onto `kind`. Grok's `events.jsonl` populates
`lifecycle`/`turn` events directly; Claude/Codex lifecycle is inferred from the
record stream. Bulky bodies live behind `payload_ref` so Tier 1 rows stay light
and blobs can be evicted independently.
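One way `core/schema.py` could express the event record is sketched below. The field names and `kind` values come from §4.2; the defaults, the validation in `__post_init__`, and the helpers `make_session_uid` and `key` are assumptions added for illustration.

```python
from dataclasses import dataclass
from typing import Optional

FLAVORS = ("claude", "codex", "grok")
KINDS = frozenset({
    "user_msg", "assistant_msg", "thinking", "tool_call", "tool_result",
    "error", "test_run", "edit", "retry", "human_intervention", "decision",
    "lifecycle", "completion",
})


def make_session_uid(flavor, native_id):
    """Build the "<flavor>:<native id>" key used across both record kinds."""
    if flavor not in FLAVORS:
        raise ValueError(f"unknown flavor: {flavor!r}")
    return f"{flavor}:{native_id}"


@dataclass(frozen=True)
class SessionEvent:
    session_uid: str
    seq: int
    ts: str
    kind: str
    parent_seq: Optional[int] = None
    role: Optional[str] = None
    tool: Optional[str] = None
    summary: str = ""
    payload_ref: Optional[str] = None
    tokens: int = 0
    is_sidechain: bool = False

    def __post_init__(self):
        # Reject kinds outside the normalized vocabulary early.
        if self.kind not in KINDS:
            raise ValueError(f"unknown kind: {self.kind!r}")

    def key(self):
        """Idempotency key for upserts: (session_uid, seq)."""
        return (self.session_uid, self.seq)


event = SessionEvent(
    session_uid=make_session_uid("claude", "17092961"),
    seq=12,
    ts="2026-06-05T22:01:13Z",
    kind="tool_call",
    parent_seq=11,
    role="assistant",
    tool="Bash",
    summary="ran pytest -q",
)
```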
### 4.3 Native → `kind` mapping (all three families)
Each cell is the native record/discriminator an adapter reads to emit that
`SessionEvent.kind`. `—` = not natively present; the adapter synthesizes or omits.
| `kind` | Claude Code (`type` / `message`) | Codex CLI (`type` → `payload.type`) | Grok CLI (file → `type`) |
|--------|----------------------------------|--------------------------------------|---------------------------|
| `user_msg` | `user`, `message.role=user` | `response_item` → `message` `role=user`/`developer` | `chat_history` → `user` |
| `assistant_msg` | `assistant`, `message.role=assistant`, content `text` | `response_item` → `message` `role=assistant` (`output_text`) | `chat_history` → `assistant` |
| `thinking` | `assistant` content block `type=thinking` | `response_item` → `reasoning` | `chat_history`/`updates` reasoning block |
| `tool_call` | `assistant` content block `type=tool_use` (`name`,`input`) | `response_item` → `function_call` (`name`,`arguments`,`call_id`) | `chat_history`/`updates` tool-call entry |
| `tool_result` | `user`/tool record `type=tool_result` + `toolUseResult` | `response_item` → `function_call_output` (join on `call_id`) | `updates` tool-result entry |
| `test_run` | derived from `tool_call` (Bash running tests) | derived from `function_call` (`exec_command`) | derived from tool-call entry |
| `edit` | `tool_use` where `name` ∈ Edit/Write/NotebookEdit | `function_call` apply-patch/file-write tool | tool-call entry (edit/write) |
| `error` | `toolUseResult` error / non-zero result | `function_call_output` error / `event_msg` error | `events.jsonl` error / failed update |
| `retry` | repeated `tool_use` after error (inferred via DAG) | repeated `function_call` after error (inferred, temporal) | `events.jsonl` loop/retry event |
| `human_intervention` | `user` record mid-turn (interrupt), `userType` | `event_msg` → `user_message` mid-task | `prompt_history` mid-session / `events.jsonl` |
| `decision` | recorded out-of-band (State Hub `/decisions`) | recorded out-of-band (State Hub) | recorded out-of-band (State Hub) |
| `lifecycle` | inferred: first/last record, `summary`, `queue-operation` | `event_msg` → `task_started` / `task_complete` | **`events.jsonl`** → `turn_started`/`loop_started`/… (explicit) |
| `completion` | inferred: last `assistant` + `Stop`/`SessionEnd` hook | `event_msg` → `task_complete` | `events.jsonl` turn end + `summary.json` |
**Linkage note (drives `seq`/`parent_seq`):** Claude has a true turn DAG
(`uuid`/`parentUuid`) — preserve it directly. Codex is **flat**, joined only by
`call_id`; assign `seq` by temporal order. Grok carries explicit `turn_number` in
`events.jsonl`; key `seq` off that plus record order.
**Cost block sources:** Claude `message.usage`; Codex `event_msg/token_count`;
Grok `events.jsonl` / `updates.jsonl` token fields.
## 5. Retention & Eviction
The user's stated preference: **storage-budget-based**, dropping old content once
it has been analyzed or once it no longer fits — *better than* a fixed daily/weekly
window. We implement budget-based as primary, with a time backstop and a scheduled
cadence as the trigger.
### 5.1 Configurable knobs
```toml
[session_memory.retention]
raw_soft_cap_bytes = "4GiB" # begin evicting analyzed sessions above this
raw_hard_cap_bytes = "6GiB" # absolute ceiling for Tier 1
raw_max_age_days = 45 # backstop: analyzed raw older than this is evictable regardless of space
distilled_cap_bytes = "1GiB" # Tier 2 ceiling (should grow slowly; alert, don't auto-drop)
cadence = "daily" # ingest+analyze+evict sweep: daily | weekly | on-hook
```
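The byte caps above are written as strings with binary suffixes, so the loader has to convert them before comparing against disk usage. A small sketch of that conversion follows; the function name `parse_size` and the accepted set of suffixes are assumptions, not part of the config contract.

```python
import re

_UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}
_SIZE_RE = re.compile(r"\s*(\d+(?:\.\d+)?)\s*(B|KiB|MiB|GiB|TiB)\s*")


def parse_size(text):
    """Convert a string such as "4GiB" into a byte count."""
    match = _SIZE_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])


soft_cap = parse_size("4GiB")
hard_cap = parse_size("6GiB")
```

Rejecting unknown suffixes outright, rather than guessing between decimal and binary units, keeps a typo in `config.toml` from silently shrinking or inflating the budget.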
### 5.2 Eviction algorithm (runs after each ingest+analyze sweep)
1. **Compute** current Tier 1 usage.
2. **Backstop pass:** evict any session where `analyzed_at` is set AND
   `age > raw_max_age_days`.
3. **Budget pass:** while `usage > raw_soft_cap_bytes`:
   - pick the **oldest `analyzed_at`** session that is not yet evicted;
   - drop its Tier 1 raw rows + blobs (Tier 2 digest is kept), set `evicted_at`;
   - if **no analyzed-but-unevicted session remains**, stop the budget pass
     (we will not destroy un-analyzed data to free space) and go to step 4.
4. **Back-pressure / overflow:** if `usage > raw_hard_cap_bytes` and the only
   remaining bulk is **un-analyzed**:
   - first try to **analyze now** (run extraction) to make those sessions
     evictable, then re-run the budget pass;
   - if still over hard cap (analysis can't keep up or fails), evict the **oldest
     un-analyzed** sessions as a last resort and emit a
     `session_memory.data_loss` warning event + a State Hub progress note. This is
     the only path that loses un-analyzed data, and it is always reported.
5. **Tier 2 guard:** if distilled usage > `distilled_cap_bytes`, **do not
   auto-drop**; flag for human/curation review (digests are the product).
**Invariant:** *no session's raw bytes are dropped before its Tier 2 digest
exists, except the explicitly-reported hard-cap overflow path.*
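Steps 1 to 4 can be sketched over in-memory rows as below. This is an illustration under assumptions: the `Row` class, the `age_days` field (the document does not say which timestamp age is measured from), and the `analyze` callback are hypothetical, and a real implementation would delete rows and blobs rather than only stamp `evicted_at`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Row:
    uid: str
    raw_bytes: int
    age_days: float                      # assumed: age of the raw content
    analyzed_at: Optional[float] = None  # set once the Tier 2 digest exists
    evicted_at: Optional[float] = None


def sweep(rows, soft_cap, hard_cap, max_age_days, now, analyze=None):
    """Run the backstop, budget, and overflow passes; return warnings."""
    warnings = []

    def usage():
        return sum(r.raw_bytes for r in rows if r.evicted_at is None)

    def budget_pass():
        while usage() > soft_cap:
            analyzed = [r for r in rows
                        if r.evicted_at is None and r.analyzed_at is not None]
            if not analyzed:
                break  # never drop un-analyzed data in this pass
            min(analyzed, key=lambda r: r.analyzed_at).evicted_at = now

    # Step 2: time backstop for analyzed sessions.
    for r in rows:
        if (r.evicted_at is None and r.analyzed_at is not None
                and r.age_days > max_age_days):
            r.evicted_at = now

    # Step 3: budget pass down to the soft cap.
    budget_pass()

    # Step 4: overflow handling above the hard cap.
    if usage() > hard_cap:
        if analyze is not None:
            for r in rows:
                if r.evicted_at is None and r.analyzed_at is None and analyze(r):
                    r.analyzed_at = now
            budget_pass()
        while usage() > hard_cap:
            pending = [r for r in rows
                       if r.evicted_at is None and r.analyzed_at is None]
            if not pending:
                break
            victim = max(pending, key=lambda r: r.age_days)
            victim.evicted_at = now
            warnings.append(("session_memory.data_loss", victim.uid))
    return warnings


normal = [
    Row("a", 100, 50, analyzed_at=1.0),
    Row("b", 100, 10, analyzed_at=2.0),
    Row("c", 100, 5, analyzed_at=3.0),
    Row("d", 100, 1),
]
normal_warnings = sweep(normal, soft_cap=250, hard_cap=400,
                        max_age_days=45, now=10.0)

overflow = [Row("e", 300, 3), Row("f", 300, 1)]
overflow_warnings = sweep(overflow, soft_cap=100, hard_cap=400,
                          max_age_days=45, now=10.0, analyze=lambda r: False)
```

In the first run the backstop removes `a` and the budget pass removes `b`, with no warning. In the second, analysis fails, so the oldest un-analyzed session is dropped and reported.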
### 5.3 Why budget-based beats fixed-window
A fixed daily/weekly drop either deletes data we never analyzed (lossy) or hoards
data we already distilled (wasteful). Budget + `analyzed_at` watermark ties
deletion to **two** real conditions the user named — *"once it has been analyzed"*
(promoted to Tier 2) and *"doesn't fit any longer"* (over budget) — and only falls
back to time as a backstop.
## 6. Ingest Cursors (incremental, idempotent)
Per source, persist a small cursor so sweeps are cheap and re-runnable:
- **Claude / Grok (per-cwd dirs):** track `(file_path, size, mtime)` and last
parsed line offset; re-ingest only grown/changed files. `session_uid` dedupes.
- **Codex (date partitions):** track last-seen `YYYY/MM/DD` + per-file offset.
- Ingest is **idempotent** keyed on `(session_uid, seq)` — safe to re-run after a
crash or partial sweep.
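A sketch of the per-file offset cursor, assuming the cursor stores a byte offset and that a partially written trailing line should wait for the next sweep. The function name `read_new_lines` and the truncation rule (restart from zero when the file shrinks) are assumptions.

```python
import os
import tempfile


def read_new_lines(path, offset):
    """Return (complete new lines, new offset) starting at byte `offset`."""
    if os.path.getsize(path) < offset:
        offset = 0  # file was truncated or replaced: start over
    with open(path, "rb") as fh:
        fh.seek(offset)
        chunk = fh.read()
    end = chunk.rfind(b"\n") + 1  # leave a trailing partial line for later
    lines = chunk[:end].decode("utf-8").splitlines()
    return lines, offset + end


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "session.jsonl")
    # First sweep: two complete records plus one half-written record.
    with open(path, "wb") as fh:
        fh.write(b'{"seq": 0}\n{"seq": 1}\n{"seq"')
    first, offset = read_new_lines(path, 0)
    # The agent finishes the record; the second sweep picks up only that line.
    with open(path, "ab") as fh:
        fh.write(b": 2}\n")
    second, offset_after = read_new_lines(path, offset)
    # A sweep with nothing new returns no lines and leaves the cursor alone.
    third, offset_idle = read_new_lines(path, offset_after)
```

Because rows are keyed on `(session_uid, seq)`, replaying from an older offset after a crash rewrites the same rows instead of duplicating them.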
## 7. Capture Modes
- **Pull (default, portable):** scheduled sweep scans Tier 0 by mtime/partition.
Works for all three families with zero coupling to the agent. Triggered on the
configured `cadence` via the repo's scheduler (`/schedule`, cron, or `/loop`).
- **Push (optional, low-latency):** wire the agent's own hooks to ping the ingester
on session close — Claude `Stop`/`SessionEnd` hooks, Grok hooks/ACP, Codex
`exec` wrappers. Push just enqueues; the same idempotent pull pipeline does the
work.
Capture must be **non-blocking** (PRD FR-C5): we read copies of logs out-of-band;
we never sit in the agent's critical path.
## 8. Component Layout (proposed, in-repo)
```
session-memory/
  adapters/
    claude.py       # Tier0→Tier1 normalizer (verified schema)
    codex.py        # version-detecting normalizer (confirm against real rollout)
    grok.py         # reads session dir incl. events.jsonl
  core/
    schema.py       # Session / SessionEvent dataclasses + versioning
    store.py        # Tier1 (rows+blobs) and Tier2 (digests) — SQLite to start
    cursor.py       # per-source ingest cursors
    retention.py    # §5 eviction algorithm
    digest.py       # Tier1→Tier2 session digest + signal stubs
  ingest.py         # one sweep: discover → normalize → analyze → evict
  config.toml       # §5.1 knobs + repo→domain map + source paths
```
Storage starts as **SQLite + a blob dir** (rows in SQLite, bulky payloads as files
under `payload_ref`); graduate to Postgres alongside the State Hub only if volume
demands. Digests/decisions are also surfaced to the hub per ADR-001 (files-first;
hub indexes).
## 9. Privacy / Safety
- Tier 0 logs can contain secrets (the Grok `auth.json` and Claude `.credentials`
  live in the same trees). The ingester reads **only** session transcripts, never
  credential files, and **redacts** obvious secret patterns before content is
  written to `payload_ref` blobs.
- All data is local; nothing leaves the workstation. Eviction of Tier 1 is a real
delete (not just an index drop) so the bounded cache is also a privacy bound.
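What "obvious secret patterns" could mean in practice is sketched below. The two patterns (an AWS-style access key id and `name=value` assignments for a few sensitive names) are illustrative assumptions, not the ingester's actual rule set, and a pattern list like this will always miss some secrets.

```python
import re

# Illustrative patterns only; a real deployment would maintain its own list.
_AWS_KEY_ID = re.compile(r"AKIA[0-9A-Z]{16}")
_ASSIGNMENT = re.compile(
    r"\b(api[_-]?key|token|password|secret)\b(\s*[=:]\s*)(\S+)",
    re.IGNORECASE,
)
PLACEHOLDER = "[REDACTED]"


def redact(text):
    """Replace matches of the known secret patterns with a placeholder."""
    text = _AWS_KEY_ID.sub(PLACEHOLDER, text)
    # Keep the variable name and separator, drop only the value.
    return _ASSIGNMENT.sub(
        lambda m: m.group(1) + m.group(2) + PLACEHOLDER, text
    )


cleaned = redact("export API_KEY=abc123 ok")
```

Redacting before the blob is written means the secret never reaches Tier 1, so eviction does not have to be relied on to remove it.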
## 10. Open Questions
- ~~**OQ1** Confirm Codex `rollout-*.jsonl` per-line schema.~~ **Resolved** (§2.2):
`{timestamp,type,payload}` lines, `type` ∈ `session_meta`/`response_item`/`event_msg`/`turn_context`/`compacted`,
tool calls flat-linked by `call_id`, tokens via `event_msg/token_count`. Remaining
sub-item: verify the `token_count` payload field names against a real install when
Codex is present (older-version variance only).
- **OQ2** Outcome inference: how do we reliably label `success/fail/abandoned`
across flavors (exit signals differ)? Start heuristic (last-turn + test results +
human-intervention markers), refine in Detect phase.
- **OQ3** `task_ref` resolution — can we always map a session to a workplan task
(via cwd + branch + state-hub), or only sometimes?
- ~~**OQ4** Right default for `raw_soft_cap_bytes`.~~ **Measured** (Phase 0, 85
real local Claude files / 63 distinct sessions): source bytes per session
min 396 · **median ~49 KB** · max 48 MB (one outlier) · ~103 MB total. Claude
defaults (4 GiB soft / 6 GiB hard) leave ample headroom; revisit once Grok dirs
(heavier, multi-file) are ingested in Phase 1.
- **OQ5** Should push-hooks be opt-in per machine to avoid surprising the agents?
- **OQ6 (new, found in Phase 0)** Multi-file sessions: ~84 transcript files mapped
  to ~63 `session_uid`s, so some sessions span multiple files (resume/sidechain
  sharing a `sessionId`). Current behavior upserts (last file wins per
  `(session_uid, seq)`); a future refinement is to *merge* events across files of
  one session rather than overwrite. Acceptable for Phase 0.
---
## 11. Project metrics correlation (kaizen-agentic)
Helix Forge owns **fleet-level** session capture and digests (this repo). The
**kaizen-agentic** framework owns **project-scoped** agent execution metrics
(ADR-004: `.kaizen/metrics/<agent>/executions.jsonl`). The two layers correlate
by optional `helix_session_uid` on project records — link-by-reference, no
duplicate ingestion in either repo.
| Layer | Owner | Storage |
|-------|-------|---------|
| Fleet | agentic-resources (Helix Forge) | digest store (`digests` table) |
| Project | kaizen-agentic | `.kaizen/metrics/<agent>/executions.jsonl` |
**Cross-repo contract:** [Helix Forge Correlation Contract](https://gitea.coulomb.social/coulomb/kaizen-agentic/src/branch/main/docs/integrations/helix-forge-correlation.md)
(kaizen-agentic). Field mapping: `Session.session_uid` → `helix_session_uid`,
`digest.cost` → `tokens`, `tool_histogram` MCP share → `infra_overhead_share`.
**Read path:** `kaizen-agentic metrics correlate <uid>` looks up a digest via
`HELIX_STORE_DB` (this repo's session store). No write path from kaizen-agentic
into Helix Forge.
**Related kaizen-agentic docs:** [ADR-004 project metrics convention](https://gitea.coulomb.social/coulomb/kaizen-agentic/src/branch/main/docs/adr/ADR-004-project-metrics-convention.md),
[wiki/EcosystemIntegration.md](https://gitea.coulomb.social/coulomb/kaizen-agentic/src/branch/main/wiki/EcosystemIntegration.md).
### 11.1 Session-close env export (dual-layer agents)
Agents that run **both** Helix Forge capture and kaizen `metrics record` should
export the following **after** the ingest sweep has written the session digest
(`python -m session_memory.ingest` or an equivalent Stop/SessionEnd hook). Names
match kaizen-agentic ADR-004 — do not invent parallel aliases.
| Variable | Source in Helix Forge | Purpose |
|----------|----------------------|---------|
| `HELIX_SESSION_UID` | `Session.session_uid` | Primary correlation key → `helix_session_uid` |
| `HELIX_REPO` | `digest.repo` | Project/repo scoping |
| `HELIX_FLAVOR` | `digest.flavor` | Agent runtime (`claude` / `codex` / `grok`) |
| `HELIX_TOKENS` | `digest.cost.input_tokens + digest.cost.output_tokens` | Token rollup → `tokens` |
| `HELIX_INFRA_OVERHEAD_SHARE` | infra bucket share over `tool_histogram` (see `measure.metrics.session_metrics`) | MCP/plumbing overhead → `infra_overhead_share` |
Example (after digest exists):
```bash
export HELIX_SESSION_UID="claude:abc-123"
export HELIX_REPO="agentic-resources"
export HELIX_FLAVOR="claude"
export HELIX_TOKENS=125000
export HELIX_INFRA_OVERHEAD_SHARE=0.117
# optional — lets kaizen correlate without guessing the store location:
export HELIX_STORE_DB="$(pwd)/session_memory/.store/mem.db"
kaizen-agentic metrics record # merges HELIX_* when present
```
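The same exports can be derived from a digest programmatically. The sketch below follows the variable table in this section; the function names `helix_exports` and `to_shell` are hypothetical, and the infra overhead share is passed in as a parameter because its computation lives in `measure.metrics.session_metrics` and is not reproduced here.

```python
import os
import shlex


def helix_exports(digest, infra_overhead_share=None, store_db=None):
    """Build the HELIX_* environment mapping from a Tier 2 digest dict."""
    cost = digest.get("cost", {})
    env = {
        "HELIX_SESSION_UID": digest["session_uid"],
        "HELIX_REPO": digest["repo"],
        "HELIX_FLAVOR": digest["flavor"],
        # Token rollup: input + output only, as in the table above.
        "HELIX_TOKENS": str(
            cost.get("input_tokens", 0) + cost.get("output_tokens", 0)
        ),
    }
    if infra_overhead_share is not None:
        env["HELIX_INFRA_OVERHEAD_SHARE"] = f"{infra_overhead_share:.3f}"
    if store_db is not None:
        env["HELIX_STORE_DB"] = os.path.abspath(store_db)
    return env


def to_shell(env):
    """Render the mapping as shell `export` lines with safe quoting."""
    return "\n".join(f"export {k}={shlex.quote(v)}" for k, v in env.items())


digest = {
    "session_uid": "claude:abc-123",
    "repo": "agentic-resources",
    "flavor": "claude",
    "cost": {"input_tokens": 100000, "output_tokens": 25000, "cache_tokens": 9},
}
env = helix_exports(digest, infra_overhead_share=0.117,
                    store_db="session_memory/.store/mem.db")
script = to_shell(env)
```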
### 11.2 Digest store location and read API
- **`HELIX_STORE_DB`** — absolute path to the SQLite file holding Tier 2 digests.
Defaults to `config.toml` `[store].db_path` (`session_memory/.store/mem.db` relative
to the repo root). Export as an absolute path when setting the variable on session
close so `metrics correlate` works across hosts and working directories.
- **Thin CLI** — `python -m session_memory.digest_lookup <session_uid> [--json]`
prints one digest without running ingest. Exit `0` on hit, `1` when missing.
- **Programmatic** — `Store.get_digest(session_uid)` returns the JSON blob written
by `build_digest` / `analyze`.
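
A read-only lookup with the documented exit codes might look like the sketch below. Only the `digests` table name comes from this document; the column names `session_uid` and `body`, and the function names, are assumptions, and the demo uses an in-memory database rather than the real store.

```python
import json
import sqlite3


def get_digest(conn, session_uid):
    """Return the stored digest as a dict, or None when it is missing."""
    row = conn.execute(
        "SELECT body FROM digests WHERE session_uid = ?", (session_uid,)
    ).fetchone()
    return json.loads(row[0]) if row else None


def lookup(conn, session_uid):
    """Return (exit_code, text): 0 with the digest JSON on a hit, else 1."""
    digest = get_digest(conn, session_uid)
    if digest is None:
        return 1, ""
    return 0, json.dumps(digest, sort_keys=True)


# In-memory stand-in for the digest store (column layout is assumed).
store = sqlite3.connect(":memory:")
store.execute("CREATE TABLE digests (session_uid TEXT PRIMARY KEY, body TEXT)")
store.execute(
    "INSERT INTO digests VALUES (?, ?)",
    ("claude:abc-123", json.dumps({"session_uid": "claude:abc-123",
                                   "outcome": "success"})),
)
hit_code, hit_text = lookup(store, "claude:abc-123")
miss_code, miss_text = lookup(store, "claude:missing")
```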
**Stable digest JSON shape** (fields consumers may rely on):
| Field | Type | Notes |
|-------|------|-------|
| `session_uid` | string | Normalized uid (`<flavor>:<native-id>`) |
| `flavor`, `repo`, `domain` | string | Session attribution |
| `model` | string | Model id when known |
| `started_at`, `ended_at` | string | ISO timestamps |
| `outcome` | string | `success` / `fail` / `abandoned` / `unknown` |
| `cost` | object | `input_tokens`, `output_tokens`, `cache_tokens`, `wall_clock_s`, `turns`, `retries` |
| `tool_histogram` | object | Tool name → call count |
| `event_count`, `kind_counts`, `markers` | object/int | Compact activity summary |
| `first_prompt`, `last_assistant` | string | Short text snippets |
| `error_snippets` | array | `{fingerprint, sample, count, tool}` entries |
| `schema_version` | int | Digest schema version |
---
*Implemented:* Phases 0–4, weekly retro ([AGENTIC-WP-0002]–[AGENTIC-WP-0010]);
kaizen correlation follow-up ([AGENTIC-WP-0011]).
## Sources
- Claude Code session format — verified on disk: `~/.claude/projects/*/*.jsonl`, `~/.claude/history.jsonl`.
- Grok CLI session format — verified on disk: `~/.grok/sessions/`, `~/.grok/logs/unified.jsonl`, `~/.grok/sessions/session_search.sqlite`; `~/.grok/README.md` (ACP/headless/hooks).
- Codex CLI session format — [ccusage Codex guide](https://ccusage.com/guide/codex/), [Codex advanced config](https://developers.openai.com/codex/config-advanced), [codex-trace](https://github.com/PixelPaw-Labs/codex-trace), [codex-logs](https://github.com/wondercoms/codex-logs), [Session/Rollout Files discussion #3827](https://github.com/openai/codex/discussions/3827), [trajectory-JSON issue #2288](https://github.com/openai/codex/issues/2288).