generated from coulomb/repo-seed
Compare commits
46 Commits
7c6f4358ee
...
main
| SHA1 |
|---|
| 43bea485aa |
| 63eb431db9 |
| 3250a1746f |
| 41bfb6e0f3 |
| d2e50cf96a |
| 01d2affc3b |
| 292b656952 |
| 0a5ba5c24a |
| a66d502b95 |
| f9f91a0ca8 |
| 06bcfdc1d9 |
| e237dcc622 |
| 0d05dfcc5d |
| 15ba625351 |
| 4f28cd67cf |
| 035c7a20d3 |
| 59632e94db |
| 00e8958540 |
| 9e28b1b806 |
| 7646cbc358 |
| 9e6f8a6e08 |
| ea03cbdd47 |
| 1b6081cd88 |
| 7cce276d32 |
| e022c0f9d6 |
| 2bd6aa3b41 |
| 97379e9658 |
| dbd212d2b1 |
| 896fde59f0 |
| 48618293b0 |
| 21c714e286 |
| 70433cda61 |
| 56b2f576de |
| d06791f070 |
| 519e76442a |
| 4b7a628b6f |
| ab22d22bfb |
| e51fd8154d |
| c6164a82ba |
| 5f810a6992 |
| 43d76b5cf8 |
| 055713aa4f |
| 436a96dcd8 |
| 06767ef924 |
| bc11cb9aec |
| 5aea22f24f |
20
.claude/rules/agents.md
Normal file
@@ -0,0 +1,20 @@
## Kaizen Agents

Specialized agent personas available on demand via the state-hub MCP.

**Discover:** `list_kaizen_agents()` — returns all agents with name, description, category
**Load:** `get_kaizen_agent("tdd-workflow")` — returns full instructions; read and follow them

Common agents:

| Agent | Category | When to use |
|-------|----------|-------------|
| `tdd-workflow` | testing | Step-by-step TDD8 workflow for any feature |
| `code-refactoring` | quality | Code quality analysis and safe refactoring |
| `test-maintenance` | testing | Diagnose and fix failing tests |
| `requirements-engineering` | process | Prevent interface/mock mismatches upfront |
| `keepaTodofile` | process | Maintain TODO.md during work |
| `project-management` | process | Track status, determine next steps |
| `datamodel-optimization` | quality | Optimize dataclasses and data structures |

All 17 agents: call `list_kaizen_agents()` for the full list.
8
.claude/rules/architecture.md
Normal file
@@ -0,0 +1,8 @@
## Architecture

<!-- TODO: Describe the key design decisions and component structure.
Key modules, data flows, external integrations, state machines, etc. -->

## Quick Reference

`~/state-hub/mcp_server/TOOLS.md` — MCP tool reference
50
.claude/rules/credential-routing.md
Normal file
@@ -0,0 +1,50 @@
# Credential and access routing

**Audience:** Codex, Claude Code, Grok, and custodian agents that call **llm-connect**
for inference. Run this check **before** requesting secrets, API keys, SSH access,
login tokens, or database passwords — in any repo, not only `ops-warden`.

ops-warden **issues SSH certificates only** (`warden sign`, `cert_command`). Every
other credential need belongs to another subsystem. **Do not** message
`ops-warden` on State Hub expecting a secret value; the reply is a pointer, not a key.

### Lookup (do this first)

```bash
warden route find "<describe your need>" --json
warden route show <catalog-id> --json
```

Requires the `warden` CLI from `~/ops-warden` (`uv tool install .` or `uv run warden`).
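The two lookup commands can also be driven from a script. The following is a minimal sketch, not part of the repo: the helper names are hypothetical, only the `warden route find` and `warden route show` invocations quoted in this file are used, and the shape of the `--json` output is deliberately not assumed.

```python
import json
import shlex
import subprocess


def route_find_argv(need: str) -> list[str]:
    """Build the argv for: warden route find "<need>" --json."""
    return ["warden", "route", "find", need, "--json"]


def route_show_argv(catalog_id: str) -> list[str]:
    """Build the argv for: warden route show <catalog-id> --json."""
    return ["warden", "route", "show", catalog_id, "--json"]


def run_json(argv: list[str]):
    """Run a warden command and parse stdout as JSON.

    The structure of the parsed value is not assumed here; inspect it
    before relying on any particular key.
    """
    completed = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)


def as_shell(argv: list[str]) -> str:
    """Render an argv as a copy-pasteable shell line."""
    return shlex.join(argv)
```

Passing the need as a single argv element avoids shell quoting problems when the description contains spaces or quotes.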

| Agent runtime | How to orient |
| --- | --- |
| **Codex / Grok** (shell, HTTP State Hub) | `warden route` commands above; inbox `to_agent=agentic-resources` is for coordination, not secret vending |
| **Claude Code** (MCP when available) | `get_domain_summary("custodian")` for workstreams; **still** use `warden route` for credential ownership |
| **llm-connect** (inference service) | Never put secret retrieval in prompts; route custody to OpenBao/operator paths surfaced by `warden route` |

### Quick routing table

| I need… | Owner | ops-warden executes? |
| --- | --- | --- |
| SSH cert (`adm`/`agt`/`atm`) | ops-warden | **Yes** — `warden sign` |
| API key, DB password, provider token | OpenBao (`railiance-platform`) | No — route only |
| Login / OIDC / MFA | key-cape / Keycloak | No — route only |
| Authorization decision | flex-auth | No — route only |
| activity-core → issue-core emission | activity-core + issue-core | No — `warden route show activity-core-issue-sink` |
| SSH tunnel | ops-bridge (+ `cert_command` from warden) | No — route only |

### Anti-patterns (do not do these)

- `POST /messages/` to `ops-warden` asking for `ISSUE_CORE_API_KEY`, `OPENROUTER_API_KEY`, etc.
- Inventing `warden secret`, `warden login`, `warden bao`, `warden tunnel` — they do not exist
- Pasting secrets into Git, State Hub, workplans, logs, or chat

### Other capabilities (reuse-surface)

Non-credential capabilities are usually discovered through **reuse-surface** federation
(`reuse-surface` registry / `capability.*` indexes). Credential routing is inlined in
every repo's agent instructions because it is high-frequency, high-risk, and easy to
get wrong.

**Canon:** `~/ops-warden/wiki/CredentialRouting.md` · catalog `~/ops-warden/registry/routing/catalog.yaml`
38
.claude/rules/first-session.md
Normal file
@@ -0,0 +1,38 @@
## First Session Protocol

Triggered when `get_domain_summary("infotech")` shows **no workstreams**.
The project is registered but work has not yet been structured.

**Step 1 — Read, don't write**
- `~/the-custodian/canon/projects/infotech/project_charter_v0.1.md` — purpose, scope
- `~/the-custodian/canon/projects/infotech/roadmap_v0.1.md` — planned phases
- Scan repo root: README, directory structure, existing code or docs

**Step 2 — Survey in-progress work**
Look for TODOs, open branches, half-finished files. Note done vs. started but incomplete.

**Step 3 — Propose workstreams to Bernd**
Propose 1–3 workstreams — each a coherent strand, weeks to months, anchored to a
roadmap phase. **Wait for approval before creating.**

**Step 4 — Create workplan file first, then DB record (ADR-001)**
```
workplans/AGENTIC-WP-NNNN-<slug>.md ← write this first
```
Then register in the hub:
```
create_workstream(topic_id="f39fa2a3-c491-414c-a91b-b4c5fcc6139c", title="...", owner="...", description="...")
create_task(workstream_id="<id>", title="...", priority="high|medium|low")
```

**Step 5 — Record the setup**
```
add_progress_event(
    summary="First session: structured infotech into N workstreams, M tasks",
    event_type="milestone",
    topic_id="f39fa2a3-c491-414c-a91b-b4c5fcc6139c",
    detail={"workstreams": [...], "tasks_created": M}
)
```

<!-- Delete or archive this file once past first session -->
8
.claude/rules/repo-boundary.md
Normal file
@@ -0,0 +1,8 @@
## Repo boundary

This repo owns **agentic-resources** only. It does not own:

<!-- TODO: List what belongs in adjacent repos, e.g.:
- SSH key management → railiance-infra/
- State hub code → state-hub/
-->
5
.claude/rules/repo-identity.md
Normal file
@@ -0,0 +1,5 @@
**Purpose:** Iterating towards optimal agentic performance.

**Domain:** infotech
**Repo slug:** agentic-resources
**Topic ID:** f39fa2a3-c491-414c-a91b-b4c5fcc6139c
85
.claude/rules/session-protocol.md
Normal file
@@ -0,0 +1,85 @@
## Session Protocol

Dev Hub (State Hub API): http://127.0.0.1:8000
MCP server name in `~/.claude.json`: `dev-hub`

**Step 1 — Orient**

Read the offline-safe brief first — it works without a live hub connection:
```bash
cat .custodian-brief.md
```
Then call the MCP tool for richer cross-domain context when MCP tools are exposed:
```
get_domain_summary("infotech")
```
If MCP tools are unavailable in the current agent session, use the REST API:
```bash
curl -s "http://127.0.0.1:8000/state/summary" | python3 -m json.tool
```
If the hub is offline: `cd ~/state-hub && make api`

**Step 2 — Check inbox**
With MCP tools:
```
get_messages(to_agent="agentic-resources", unread_only=True)
```
Mark read with `mark_message_read(message_id)`. Reply or act on coordination
requests before proceeding.

Without MCP tools:
```bash
curl -s "http://127.0.0.1:8000/messages/?to_agent=agentic-resources&unread_only=true" \
  | python3 -m json.tool
curl -s -X PATCH "http://127.0.0.1:8000/messages/<id>/read" \
  -H "Content-Type: application/json" -d '{}'
```
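For agents that prefer Python over `curl`, the REST inbox fallback in Step 2 could be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the endpoints and query parameters are the ones quoted in the `curl` commands, and the response body is returned as parsed JSON without assuming its fields.

```python
import json
import urllib.parse
import urllib.request

HUB = "http://127.0.0.1:8000"  # State Hub address quoted in the protocol


def inbox_url(agent: str, unread_only: bool = True, base: str = HUB) -> str:
    """Build the GET URL for an agent's inbox."""
    query = urllib.parse.urlencode(
        {"to_agent": agent, "unread_only": str(unread_only).lower()}
    )
    return f"{base}/messages/?{query}"


def mark_read_request(message_id: str, base: str = HUB) -> urllib.request.Request:
    """Build (but do not send) the PATCH request that marks a message read."""
    return urllib.request.Request(
        f"{base}/messages/{message_id}/read",
        data=b"{}",
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )


def fetch_inbox(agent: str, timeout: float = 5.0):
    """Fetch the inbox; requires a running hub, so it is not called on import."""
    with urllib.request.urlopen(inbox_url(agent), timeout=timeout) as resp:
        return json.load(resp)
```

Keeping URL and request construction separate from the network call makes the plumbing checkable even when the hub is offline.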

**Step 3 — Scan workplans**
```bash
ls workplans/
```
For each file with `status: ready`, `active`, or `blocked`, note pending
`wait`/`todo`/`progress` tasks.

**Step 4 — Present brief**

1. **Active workstreams** for `infotech` — title, task counts, blocking decisions
2. **Pending tasks** from `workplans/` + any `[repo:agentic-resources]` hub tasks
3. **Goal guidance** — if `goal_guidance` in summary:
   - `needs_workplan`: surface as top action — *"Repo goal '{title}' has no workplan yet"*
   - `alignment_warnings`: flag if active work is not aligned with current goal
4. **Suggested next action** — highest-priority open item
5. **SBOM status** — flag if `last_sbom_at` is unset for this repo

If no workstreams: follow First Session Protocol (`first-session.md`).

**During work:** `record_decision()` · `add_progress_event()` · `resolve_decision()`

> State Hub is a *read model*. Bootstrap tools (`create_workstream`, `create_task`)
> are First Session Protocol only. Work structure belongs in repo files (ADR-001).

**Session close:**
With MCP tools:
```
add_progress_event(summary="...", topic_id="f39fa2a3-c491-414c-a91b-b4c5fcc6139c", workstream_id="<uuid>")
```
Without MCP tools:
```bash
curl -s -X POST http://127.0.0.1:8000/progress/ \
  -H "Content-Type: application/json" \
  -d '{"topic_id":"f39fa2a3-c491-414c-a91b-b4c5fcc6139c","workstream_id":"<uuid>","event_type":"note","summary":"what changed","author":"codex"}'
```
If workplan files were modified, ensure the local copy is up to date first:
```bash
git -C <repo_path> pull --ff-only
cd ~/state-hub && make fix-consistency REPO=agentic-resources
```
For repos where implementation runs on a remote machine (e.g. CoulombCore),
use the combined target which pulls before fixing:
```bash
cd ~/state-hub && make fix-consistency-remote REPO=agentic-resources
```
**C-15** (DB task ahead of file) is normal in multi-machine workflows — writeback
will sync the file to match DB. **C-16** (repo behind remote) blocks all writes
until you pull — intentional to prevent clobbering remote progress.
19
.claude/rules/stack-and-commands.md
Normal file
@@ -0,0 +1,19 @@
## Stack

<!-- TODO: Fill in language, frameworks, and key dependencies -->
- **Language:**
- **Key deps:**

## Dev Commands

```bash
# TODO: Fill in the standard commands for this repo

# Install dependencies

# Run tests

# Lint / type check

# Build / package (if applicable)
```
40
.claude/rules/workplan-convention.md
Normal file
@@ -0,0 +1,40 @@
## Workplan Convention (ADR-001)

File location: `workplans/AGENTIC-WP-NNNN-<slug>.md`
ID prefix: `AGENTIC-WP-`

Work items originate as files in this repo **before** being registered in the hub.

Canonical workplan/workstream frontmatter statuses are:
`proposed`, `ready`, `active`, `blocked`, `backlog`, `finished`, `archived`.
Use `proposed` for a newly drafted plan, `ready` after review against current
repo state, and `finished` when implementation is complete. `stalled` and
`needs_review` are derived health labels, not stored statuses.

Closed workplans may be moved to `workplans/archived/` with a completion-date
prefix: `YYMMDD-AGENTIC-WP-NNNN-<slug>.md`. The frontmatter id remains
unchanged; the prefix is only for quick visual reference.

Small opportunistic tasks discovered during another session use **Ad Hoc Tasks**:
`workplans/ADHOC-YYYY-MM-DD.md`, workstream slug `adhoc-YYYY-MM-DD`, and task ids
`ADHOC-YYYY-MM-DD-T01`, `T02`, etc. Use adhocs only for low-risk work completed
directly. Promote anything requiring analysis, design, approval, dependencies, or
multiple planned phases into a normal workplan.

Ecosystem todos from other agents arrive as `[repo:agentic-resources]` hub tasks —
visible at session start. Pick one up by creating the workplan file, then registering
the workstream.

Task blocks use this shape:

```task
id: AGENTIC-WP-NNNN-T01
status: wait | todo | progress | done | cancel
priority: high | medium | low
state_hub_task_id: "<uuid>" # written by fix-consistency — do not edit
```

Status progression is `todo` → `progress` → `done`; use `wait` for waiting or
blocked work and `cancel` for stopped work.
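The naming rules in this convention lend themselves to a small validator. The sketch below is an assumption-laden illustration, not tooling from the repo: it assumes `NNNN` is exactly four digits and that slugs are lowercase kebab-case, neither of which the convention states explicitly, and the names `classify` and `is_valid_task_status` are hypothetical.

```python
import re

# Assumption: NNNN is four digits and <slug> is lowercase kebab-case.
_SLUG = r"[a-z0-9][a-z0-9-]*"
WORKPLAN_RE = re.compile(rf"^AGENTIC-WP-\d{{4}}-{_SLUG}\.md$")
ARCHIVED_RE = re.compile(rf"^\d{{6}}-AGENTIC-WP-\d{{4}}-{_SLUG}\.md$")
ADHOC_RE = re.compile(r"^ADHOC-\d{4}-\d{2}-\d{2}\.md$")

# Task statuses listed in the task block shape.
TASK_STATUSES = {"wait", "todo", "progress", "done", "cancel"}


def classify(filename: str) -> str:
    """Classify a workplans/ filename as workplan, archived, adhoc, or unknown."""
    if WORKPLAN_RE.match(filename):
        return "workplan"
    if ARCHIVED_RE.match(filename):
        return "archived"
    if ADHOC_RE.match(filename):
        return "adhoc"
    return "unknown"


def is_valid_task_status(status: str) -> bool:
    """True if the status is one of the five task statuses."""
    return status in TASK_STATUSES
```

A check like this could run before `make fix-consistency`, so that misnamed files are caught locally rather than by the hub.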

<!-- Ralph Loop rules and HEUREKA sequence: ~/.claude/CLAUDE.md — do not duplicate here -->
@@ -2,18 +2,12 @@
 # Custodian Brief — agentic-resources

 **Domain:** helix_forge
-**Last synced:** 2026-06-05 22:10 UTC
+**Last synced:** 2026-06-21 14:09 UTC
 **State Hub:** http://127.0.0.1:8000 *(adjust if running on a remote machine)*

 ## Active Workstreams

-### Bootstrap State Hub integration
-Progress: 0/3 done | workstream_id: `bb9a43a3-a54f-434b-97c2-e1c7142b52f5`
-
-**Open tasks:**
-- · Review Generated Integration Files `3ad7b7a9`
-- · Verify Local Developer Workflow `db248d57`
-- · Seed First Real Workplan `9cbb7aa5`
+*(none — repo may need first-session setup)*

 ---
 ## MCP Orientation (when available)
2
.gitignore
vendored
@@ -177,6 +177,8 @@ cython_debug/

 # session-memory local store
 session_memory/.store/
+# generated per-flavor distribution proposals (HITL, regenerated each run)
+session_memory/proposals/
 __pycache__/
 *.pyc
 .pytest_cache/
18
.repo-classification.yaml
Normal file
@@ -0,0 +1,18 @@
repo_classification:
  standard: Repo Classification Standard
  version: '1.0'
  classified_at: '2026-06-22'
  classified_by: agent
  category: project
  domain: infotech
  secondary_domains: []
  capability_tags:
  - automation
  - orchestration
  business_stake:
  - technology
  - product
  - operations
  business_mechanics:
  - coordination
  - operation
61
AGENTS.md
@@ -4,7 +4,7 @@

 **Purpose:** Iterating towards optimal agentic performance.

-**Domain:** helix_forge
+**Domain:** infotech
 **Repo slug:** agentic-resources
 **Topic ID:** `f39fa2a3-c491-414c-a91b-b4c5fcc6139c`
 **Workplan prefix:** `AGENTIC-WP-`
@@ -101,6 +101,63 @@ curl -s -X PATCH "http://127.0.0.1:8000/tasks/<task_id>" \

 ---

+## Credential and access routing
+
+**Audience:** Codex, Claude Code, Grok, and custodian agents that call **llm-connect**
+for inference. Run this check **before** requesting secrets, API keys, SSH access,
+login tokens, or database passwords — in any repo, not only `ops-warden`.
+
+ops-warden **issues SSH certificates only** (`warden sign`, `cert_command`). Every
+other credential need belongs to another subsystem. **Do not** message
+`ops-warden` on State Hub expecting a secret value; the reply is a pointer, not a key.
+
+### Lookup (do this first)
+
+```bash
+warden route find "<describe your need>" --json
+warden route show <catalog-id> --json
+```
+
+Requires the `warden` CLI from `~/ops-warden` (`uv tool install .` or `uv run warden`).
+
+| Agent runtime | How to orient |
+| --- | --- |
+| **Codex / Grok** (shell, HTTP State Hub) | `warden route` commands above; inbox `to_agent=agentic-resources` is for coordination, not secret vending |
+| **Claude Code** (MCP when available) | `get_domain_summary("custodian")` for workstreams; **still** use `warden route` for credential ownership |
+| **llm-connect** (inference service) | Never put secret retrieval in prompts; route custody to OpenBao/operator paths surfaced by `warden route` |
+
+### Quick routing table
+
+| I need… | Owner | ops-warden executes? |
+| --- | --- | --- |
+| SSH cert (`adm`/`agt`/`atm`) | ops-warden | **Yes** — `warden sign` |
+| API key, DB password, provider token | OpenBao (`railiance-platform`) | No — route only |
+| Login / OIDC / MFA | key-cape / Keycloak | No — route only |
+| Authorization decision | flex-auth | No — route only |
+| activity-core → issue-core emission | activity-core + issue-core | No — `warden route show activity-core-issue-sink` |
+| SSH tunnel | ops-bridge (+ `cert_command` from warden) | No — route only |
+
+### Anti-patterns (do not do these)
+
+- `POST /messages/` to `ops-warden` asking for `ISSUE_CORE_API_KEY`, `OPENROUTER_API_KEY`, etc.
+- Inventing `warden secret`, `warden login`, `warden bao`, `warden tunnel` — they do not exist
+- Pasting secrets into Git, State Hub, workplans, logs, or chat
+
+### Other capabilities (reuse-surface)
+
+Non-credential capabilities are usually discovered through **reuse-surface** federation
+(`reuse-surface` registry / `capability.*` indexes). Credential routing is inlined in
+every repo's agent instructions because it is high-frequency, high-risk, and easy to
+get wrong.
+
+**Canon:** `~/ops-warden/wiki/CredentialRouting.md` · catalog `~/ops-warden/registry/routing/catalog.yaml`
+
+<!-- REPO-AGENTS-EXTENSIONS -->
+<!-- Append repo-specific agent instructions below this marker.
+The state-hub template sync preserves content after this line. -->
+
+---
+
 ## Workplan Convention (ADR-001)

 Work items originate as files in this repo — not in the hub. The hub is a
@@ -124,7 +181,7 @@ anything needing analysis, design, approval, dependencies, or multiple phases.
 id: AGENTIC-WP-NNNN
 type: workplan
 title: "..."
-domain: helix_forge
+domain: infotech
 repo: agentic-resources
 status: proposed | ready | active | blocked | backlog | finished | archived
 owner: codex
12
CLAUDE.md
Normal file
@@ -0,0 +1,12 @@
# agentic-resources — Claude Code Instructions

@SCOPE.md
@.claude/rules/repo-identity.md
@.claude/rules/session-protocol.md
@.claude/rules/first-session.md
@.claude/rules/workplan-convention.md
@.claude/rules/stack-and-commands.md
@.claude/rules/architecture.md
@.claude/rules/repo-boundary.md
@.claude/rules/credential-routing.md
@.claude/rules/agents.md
144
docs/ASSESSMENT-infra-friction.md
Normal file
@@ -0,0 +1,144 @@
# Infrastructure Friction Assessment

*Generated 2026-06-07 from captured coding-session data (Helix Forge session
memory), after the Detect-hardening pass ([AGENTIC-WP-0005]). First data-driven
assessment of where our agentic coding sessions spend effort on plumbing rather
than work.*

## Method & data quality

- **Corpus:** 72 sessions captured across Claude + Grok. A session-quality filter
  ([detect/quality.py]) drops health-checks, smoke-tests, and interrupted runs
  (mostly `llm-connect` *"Say hello in one word"*). **27 are real coding sessions.**
- **Caveat:** the 41 % that were filtered out had been mislabeled `abandoned` by
  the outcome heuristic and produced a *false-positive* "cross-flavor abandoned"
  pattern in the first catalog — now purged. Treat any pre-hardening finding with
  suspicion.
- **Key framing:** all 27 real sessions ended in `success`. So the friction here
  is **cost/efficiency, not failure** — sessions get there, but pay an avoidable
  tax to do it.

## The headline number

Across the 27 real sessions, tool-call activity breaks down as:

| Bucket | Share |
|--------|------:|
| shell (Bash / run_terminal) | 38.2 % |
| edit | 30.2 % |
| read | 12.9 % |
| **State Hub MCP** | **10.3 %** |
| **task-management plumbing** | **5.8 %** |
| **schema-loading (`ToolSearch`)** | **1.5 %** |
| other | 1.1 % |

**~17.6 % of all tool calls in real coding sessions are coordination plumbing
(hub + task + schema-loading), not touching the repo.** Per-session infra-overhead
share: median **11.7 %**, p90 **26.1 %**, max **43.3 %** — it concentrates badly.
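The headline figure is the sum of the three bolded buckets in the table. A minimal sketch of that arithmetic, plus a per-session variant, is shown here; the bucket key names and both function names are hypothetical and are not taken from `session_memory`.

```python
# Bucket shares in percent, copied from the table above.
SHARES = {
    "shell": 38.2,
    "edit": 30.2,
    "read": 12.9,
    "state_hub_mcp": 10.3,
    "task_management": 5.8,
    "schema_loading": 1.5,
    "other": 1.1,
}

# The three buckets the assessment counts as coordination plumbing.
INFRA_BUCKETS = ("state_hub_mcp", "task_management", "schema_loading")


def infra_overhead_share(shares: dict[str, float]) -> float:
    """Sum the infra buckets of a percentage breakdown, rounded to 0.1."""
    return round(sum(shares[b] for b in INFRA_BUCKETS), 1)


def session_overhead(histogram: dict[str, int]) -> float:
    """Percent of one session's tool calls that fall into infra buckets."""
    total = sum(histogram.values())
    infra = sum(histogram.get(b, 0) for b in INFRA_BUCKETS)
    return 100.0 * infra / total if total else 0.0
```

Applying `session_overhead` to each session's call counts and taking the median and 90th percentile would give statistics of the same kind as the per-session figures quoted above, assuming the same bucket assignment.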

## Ranked friction

### 1. State Hub call volume — *highest cost, addressable*
State Hub MCP is 10.3 % of all tool calls and dominates the worst sessions:

| Repo (one session) | total calls | State Hub calls | overhead share |
|--------------------|------:|------:|------:|
| vergabe-teilnahme | 570 | **231** | 43 % |
| activity-core | 488 | 98 | 23 % |
| flex-auth | 236 | 35 (+27 task) | 29 % |
| net-kingdom | 129 | 25 | 22 % |

Root cause: many **fine-grained** calls — per-task status updates, per-event
progress writes, repeated `get_domain_summary`. 231 hub calls in a single session
is coordination overhead, not work.

### 2. Schema-loading thrash (`ToolSearch`) — *low cost, near-zero-effort fix*
**106 `ToolSearch` calls across 22 of 27 sessions (81 %).** The State Hub MCP
tools are *deferred*, so nearly every session re-discovers and re-loads the same
tool schemas before it can call them. This is pure overhead with no work value —
and it is **exactly the CLI/MCP-interface friction hypothesized.**

### 3. Task-management plumbing — 5.8 %
`TaskUpdate` / `TaskCreate` / `todo_write` / `update_task_status`. Overlaps with
(1); much of it is redundant status churn within a session.

### 4. Tool thrash — *session-shape, watch only*
11 sessions hammer a single tool 80–230× (usually Bash or Edit). Less an infra
problem than a sign of missing higher-level tooling; low priority.

### 5. Budget overrun — 3 sessions
Token cost well above peers. Secondary; revisit once (1)–(2) are addressed.

## Recommendations

**The CLI/MCP-interface hypothesis is validated as a top-2 friction, not a minor
issue.** Two high-ROI moves:

- **A. A State Hub skill (highest ROI).** A skill (or a pre-loaded tool manifest)
  that (i) **front-loads the common hub tool schemas** so agents stop
  `ToolSearch`-ing for them — eliminates finding #2 almost entirely (81 % of
  sessions) — and (ii) **teaches batched writes** (sync N task statuses in one
  call, fewer progress events) to attack finding #1. Low effort, broad reach.
- **B. Coarser hub operations.** Add bulk endpoints / a single "sync workplan
  statuses" op so a session doesn't make 200+ individual hub calls. This is the
  structural fix behind the skill's guidance.
- **C. Measure the effect (Phase 4).** After A/B land, compare infra-overhead
  share on subsequent sessions against this baseline (median 11.7 %, p90 26.1 %).
  This is precisely what the Measure phase is for — the loop closes here.

## Content-level root causes (error-body mining)

*Added 2026-06-07 from [AGENTIC-WP-0006] — `build_digest` now mines normalized
error fingerprints into the durable digest, and `sig_recurring_error` clusters
them. This is the "why" the tool-mix view above could not see.*

**26 of 27 real sessions hit at least one error.** Top recurring error
fingerprints across the corpus (by # sessions affected):

| # sessions | occ | flavors | top sample |
|-----------:|----:|---------|------------|
| **12** | 32 | claude | `<tool_use_error>File has not been read yet. Read it first before writing to it.` |
| **6** | 13 | claude | `<tool_use_error>File has been modified since read …` |
| **4** | 9 | **claude + grok** | `make: *** [Makefile:227: fix-consistency] Error 1` |
| 3 | 21 | claude | `MCP error -32602: Invalid request parameters` |
| 3 | 6 | claude | `Error calling tool 'update_task_status': 'title'` |
| 2 | 6 | claude | `make: *** [Makefile:21: test] Error 1` |

Reading:

- **#1 — Edit/Write-before-Read (12/27 sessions, 8 repos).** The single most
  common error is agents trying to edit a file they haven't read into context.
  This is a *workflow* friction, highly addressable: a Read-before-Edit reflex in
  the agent instructions / a skill, or a harness affordance. (Observed live: the
  author hit this exact error twice while writing this workplan.)
- **#2 — stale-read conflicts (6 sessions):** "File has been modified since read"
  — same family, a re-read-before-edit discipline fixes both.
- **#3 — cross-flavor `make fix-consistency` failures (claude + grok, 3 repos):**
  the consistency tooling itself fails across flavors — a shared infra issue worth
  a look on the state-hub side (cf. [STATE-WP-0058]).
- **State Hub MCP instability** (`-32602`, `update_task_status 'title'`) shows up
  in 3 sessions each — corroborates the plumbing-overhead story and the live MCP
  flakiness seen during this work (REST fallback used).

**Fingerprint noise — mostly handled.** `_is_failed` now excludes successful hub
JSON responses (top-level no-error payloads) and file-read snapshots (numbered
`cat -n` source lines), which cut distinct fingerprints **444 → 269 (~40 %)**
without touching the top entries. Residual low-value items remain in the long tail
(bare structural lines like `{`, linter "N errors" summaries); the *top*
fingerprints are real. Note several entries (`MCP error -32602`,
`update_task_status 'title'`) reflect the State Hub MCP instability hit live during
this work — genuine, if self-referential, friction.
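To make the idea of a normalized error fingerprint concrete, here is one possible sketch. It is not the implementation in `build_digest`: the normalization rules (lowercasing, replacing hex values and digit runs, collapsing whitespace) and the function names are assumptions chosen for illustration.

```python
import hashlib
import re


def normalize_error(line: str) -> str:
    """Reduce an error line to a stable form by masking volatile details."""
    text = line.strip().lower()
    text = re.sub(r"0x[0-9a-f]+", "<hex>", text)  # addresses, hashes
    text = re.sub(r"\d+", "<n>", text)            # line numbers, counts, codes
    text = re.sub(r"\s+", " ", text)              # collapse whitespace runs
    return text


def fingerprint(line: str) -> str:
    """Short hash of the normalized line, usable as a clustering key."""
    digest = hashlib.sha1(normalize_error(line).encode("utf-8"))
    return digest.hexdigest()[:12]
```

Under these rules, two `fix-consistency` failures reported at different Makefile line numbers collapse to one fingerprint, while a failure of the `test` target stays distinct.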

## What this assessment still can't see

- ~~**Why** a session was expensive at the content level.~~ **Now addressed**
  (error-body mining, above), modulo the fingerprint-noise caveat.
- Repeated *failed approaches* (as opposed to surfaced errors) — e.g. an agent
  silently retrying a wrong strategy without an error — are still invisible.
- Grok/Codex are thin in the corpus (4 Grok, 0 Codex sessions), so cross-flavor
  friction claims are Claude-weighted for now.

[AGENTIC-WP-0005]: ../workplans/AGENTIC-WP-0005-detect-hardening.md
[AGENTIC-WP-0006]: ../workplans/AGENTIC-WP-0006-error-body-mining.md
[STATE-WP-0058]: handed off to the state-hub repo worker
[detect/quality.py]: ../session_memory/detect/quality.py
@@ -370,8 +370,89 @@ hub indexes).
|
|||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
*Next step: [AGENTIC-WP-0002] implements Phase 0 — the schema, the Claude
|
## 11. Project metrics correlation (kaizen-agentic)
|
||||||
collector, the Tier1/Tier2 store, and the budget-based eviction sweep.*
|
|
||||||
|
Helix Forge owns **fleet-level** session capture and digests (this repo). The
**kaizen-agentic** framework owns **project-scoped** agent execution metrics
(ADR-004: `.kaizen/metrics/<agent>/executions.jsonl`). The two layers correlate
by optional `helix_session_uid` on project records: link-by-reference, no
duplicate ingestion in either repo.
|
||||||
|
|
||||||
|
| Layer | Owner | Storage |
|-------|-------|---------|
| Fleet | agentic-resources (Helix Forge) | digest store (`digests` table) |
| Project | kaizen-agentic | `.kaizen/metrics/<agent>/executions.jsonl` |
|
||||||
|
|
||||||
|
**Cross-repo contract:** [Helix Forge Correlation Contract](https://gitea.coulomb.social/coulomb/kaizen-agentic/src/branch/main/docs/integrations/helix-forge-correlation.md)
(kaizen-agentic). It maps `Session.session_uid` → `helix_session_uid`,
`digest.cost` → `tokens`, and the `tool_histogram` MCP share → `infra_overhead_share`.
|
||||||
|
|
||||||
|
**Read path:** `kaizen-agentic metrics correlate <uid>` looks up a digest via
|
||||||
|
`HELIX_STORE_DB` (this repo's session store). No write path from kaizen-agentic
|
||||||
|
into Helix Forge.
|
||||||
|
|
||||||
|
**Related kaizen-agentic docs:** [ADR-004 project metrics convention](https://gitea.coulomb.social/coulomb/kaizen-agentic/src/branch/main/docs/adr/ADR-004-project-metrics-convention.md),
|
||||||
|
[wiki/EcosystemIntegration.md](https://gitea.coulomb.social/coulomb/kaizen-agentic/src/branch/main/wiki/EcosystemIntegration.md).
|
||||||
|
|
||||||
|
### 11.1 Session-close env export (dual-layer agents)
|
||||||
|
|
||||||
|
Agents that run **both** Helix Forge capture and kaizen `metrics record` should
export the following **after** the ingest sweep has written the session digest
(`python -m session_memory.ingest` or an equivalent Stop/SessionEnd hook). Names
match kaizen-agentic ADR-004; do not invent parallel aliases.
|
||||||
|
|
||||||
|
| Variable | Source in Helix Forge | Purpose |
|----------|----------------------|---------|
| `HELIX_SESSION_UID` | `Session.session_uid` | Primary correlation key → `helix_session_uid` |
| `HELIX_REPO` | `digest.repo` | Project/repo scoping |
| `HELIX_FLAVOR` | `digest.flavor` | Agent runtime (`claude` / `codex` / `grok`) |
| `HELIX_TOKENS` | `digest.cost.input_tokens + digest.cost.output_tokens` | Token rollup → `tokens` |
| `HELIX_INFRA_OVERHEAD_SHARE` | infra bucket share over `tool_histogram` (see `measure.metrics.session_metrics`) | MCP/plumbing overhead → `infra_overhead_share` |
|
||||||
|
|
||||||
|
Example (after digest exists):
|
||||||
|
|
||||||
|
```bash
export HELIX_SESSION_UID="claude:abc-123"
export HELIX_REPO="agentic-resources"
export HELIX_FLAVOR="claude"
export HELIX_TOKENS=125000
export HELIX_INFRA_OVERHEAD_SHARE=0.117
# optional: lets kaizen correlate without guessing the store location
export HELIX_STORE_DB="$(pwd)/session_memory/.store/mem.db"
kaizen-agentic metrics record   # merges HELIX_* when present
```
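For hooks that prefer to derive these values from a digest rather than hard-code them, a minimal sketch might look like the following. The function names are hypothetical, and treating tools whose names start with `mcp__` as the infra bucket is an assumption for illustration only; the authoritative computation is the one referenced in `measure.metrics.session_metrics`.

```python
import shlex

# Assumption: MCP tool names carry this prefix; the real infra bucket may differ.
INFRA_PREFIXES = ("mcp__",)


def helix_env(digest: dict, infra_prefixes: tuple = INFRA_PREFIXES) -> dict:
    """Derive the HELIX_* variables from one digest dict (hypothetical helper)."""
    cost = digest.get("cost") or {}
    tokens = int(cost.get("input_tokens") or 0) + int(cost.get("output_tokens") or 0)

    hist = digest.get("tool_histogram") or {}
    total = sum(hist.values())
    infra = sum(n for name, n in hist.items() if name.startswith(infra_prefixes))
    share = round(infra / total, 3) if total else 0.0

    return {
        "HELIX_SESSION_UID": str(digest.get("session_uid", "")),
        "HELIX_REPO": str(digest.get("repo") or ""),
        "HELIX_FLAVOR": str(digest.get("flavor") or ""),
        "HELIX_TOKENS": str(tokens),
        "HELIX_INFRA_OVERHEAD_SHARE": str(share),
    }


def export_lines(env: dict) -> str:
    """Render shell `export` lines, quoting values where the shell needs it."""
    return "\n".join(f"export {k}={shlex.quote(v)}" for k, v in env.items())
```

The output of `export_lines(...)` can be evaluated by a session-close hook before it calls `kaizen-agentic metrics record`.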
|
||||||
|
|
||||||
|
### 11.2 Digest store location and read API
|
||||||
|
|
||||||
|
- **`HELIX_STORE_DB`**: absolute path to the SQLite file holding Tier 2 digests.
  Defaults to `config.toml` `[store].db_path` (`session_memory/.store/mem.db` relative
  to the repo root). Export as an absolute path when setting the variable on session
  close so `metrics correlate` works across hosts and working directories.
- **Thin CLI**: `python -m session_memory.digest_lookup <session_uid> [--json]`
  prints one digest without running ingest. Exit `0` on hit, `1` when missing.
- **Programmatic**: `Store.get_digest(session_uid)` returns the JSON blob written
  by `build_digest` / `analyze`.
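A read-only lookup along these lines can be sketched with the standard library alone. This is not the shipped `digest_lookup` module: the `digests` table name comes from this document, but the column names `session_uid` and `body` are assumptions, as are the helper names.

```python
import json
import os
import sqlite3
from pathlib import Path
from typing import Optional

DEFAULT_DB = "session_memory/.store/mem.db"


def open_readonly(db_path: str) -> sqlite3.Connection:
    """Open the store without write access, so a lookup can never mutate it."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def get_digest(conn: sqlite3.Connection, session_uid: str) -> Optional[dict]:
    """Return the digest JSON for one session, or None when it is missing.

    Column names are assumed for this sketch.
    """
    row = conn.execute(
        "SELECT body FROM digests WHERE session_uid = ?", (session_uid,)
    ).fetchone()
    return json.loads(row[0]) if row else None


def main(argv: list) -> int:
    """Mirror the documented exit codes: 0 on hit, 1 when missing."""
    db_path = os.environ.get("HELIX_STORE_DB", DEFAULT_DB)
    conn = open_readonly(db_path)
    try:
        digest = get_digest(conn, argv[0])
    finally:
        conn.close()
    if digest is None:
        return 1
    print(json.dumps(digest))
    return 0
```

Opening the file with `mode=ro` keeps the consumer side honest about the "no write path" rule: any accidental write raises an error instead of touching the store.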
|
||||||
|
|
||||||
|
**Stable digest JSON shape** (fields consumers may rely on):
|
||||||
|
|
||||||
|
| Field | Type | Notes |
|-------|------|-------|
| `session_uid` | string | Normalized uid (`<flavor>:<native-id>`) |
| `flavor`, `repo`, `domain` | string | Session attribution |
| `model` | string | Model id when known |
| `started_at`, `ended_at` | string | ISO timestamps |
| `outcome` | string | `success` / `fail` / `abandoned` / `unknown` |
| `cost` | object | `input_tokens`, `output_tokens`, `cache_tokens`, `wall_clock_s`, `turns`, `retries` |
| `tool_histogram` | object | Tool name → call count |
| `event_count`, `kind_counts`, `markers` | object/int | Compact activity summary |
| `first_prompt`, `last_assistant` | string | Short text snippets |
| `error_snippets` | array | `{fingerprint, sample, count, tool}` entries |
| `schema_version` | int | Digest schema version |
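A consumer that wants to fail fast on an unexpected digest could run a light shape check like the one below. It is a sketch under stated assumptions: only a subset of the fields in the table is checked, the helper name is hypothetical, and nullable fields such as `model` are deliberately left out.

```python
OUTCOMES = {"success", "fail", "abandoned", "unknown"}

# Subset of the stable fields, with the Python types a parsed JSON digest yields.
EXPECTED_TYPES = {
    "session_uid": str,
    "flavor": str,
    "outcome": str,
    "cost": dict,
    "tool_histogram": dict,
    "error_snippets": list,
    "schema_version": int,
}


def shape_problems(digest: dict) -> list:
    """Return human-readable problems; an empty list means the shape looks fine."""
    problems = []
    for name, typ in EXPECTED_TYPES.items():
        if name not in digest:
            problems.append(f"missing: {name}")
        elif not isinstance(digest[name], typ):
            problems.append(f"wrong type: {name}")

    uid = digest.get("session_uid")
    if isinstance(uid, str) and ":" not in uid:
        problems.append("session_uid is not <flavor>:<native-id>")

    outcome = digest.get("outcome")
    if isinstance(outcome, str) and outcome not in OUTCOMES:
        problems.append(f"unknown outcome: {outcome}")
    return problems
```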
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
*Implemented:* Phases 0–4, weekly retro ([AGENTIC-WP-0002]–[AGENTIC-WP-0010]);
|
||||||
|
kaizen correlation follow-up ([AGENTIC-WP-0011]).
|
||||||
|
|
||||||
## Sources
|
## Sources
|
||||||
|
|
||||||
|
|||||||
@@ -5,7 +5,7 @@
|
|||||||
**Status:** Draft v0.1
|
**Status:** Draft v0.1
|
||||||
**Author:** Claude (drafted with Bernd Worsch)
|
**Author:** Claude (drafted with Bernd Worsch)
|
||||||
**Created:** 2026-06-06
|
**Created:** 2026-06-06
|
||||||
**Updated:** 2026-06-06
|
**Updated:** 2026-06-19
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@@ -223,6 +223,32 @@ record:
|
|||||||
- The hub remains a **read model**; Helix Forge writes its durable artifacts as files
|
- The hub remains a **read model**; Helix Forge writes its durable artifacts as files
|
||||||
and lets the hub index them.
|
and lets the hub index them.
|
||||||
|
|
||||||
|
### 9.1 Downstream: kaizen-agentic project metrics correlation
|
||||||
|
|
||||||
|
Helix Forge is a **fleet-level** producer of normalized session digests. The
|
||||||
|
**kaizen-agentic** framework is a **project-scoped** consumer of optional
|
||||||
|
correlation fields on its execution metrics (ADR-004). The two layers link
|
||||||
|
**by reference** — kaizen-agentic does not re-implement JSONL ingestion or write
|
||||||
|
into the Helix Forge store.
|
||||||
|
|
||||||
|
| Layer | Owner | What it stores |
|-------|-------|----------------|
| Fleet | agentic-resources (`session_memory`) | Per-session digests in the local SQLite store |
| Project | kaizen-agentic | `.kaizen/metrics/<agent>/executions.jsonl` |
|
||||||
|
|
||||||
|
**Canonical spec in this repo:** [DESIGN-session-memory.md §11](DESIGN-session-memory.md#11-project-metrics-correlation-kaizen-agentic)
|
||||||
|
(session-close env export, digest read path, stable JSON shape).
|
||||||
|
|
||||||
|
**Authoritative cross-repo contract (kaizen-agentic):**
|
||||||
|
[Helix Forge Correlation Contract](https://gitea.coulomb.social/coulomb/kaizen-agentic/src/branch/main/docs/integrations/helix-forge-correlation.md).
|
||||||
|
Field mapping: `Session.session_uid` → `helix_session_uid`; digest token totals →
|
||||||
|
`tokens`; MCP/tool overhead share → `infra_overhead_share`.
|
||||||
|
|
||||||
|
**Read path for consumers:** `HELIX_STORE_DB` points at the digest SQLite file
|
||||||
|
(default `session_memory/.store/mem.db`); `python -m session_memory.digest_lookup
|
||||||
|
<uid> --json` or `kaizen-agentic metrics correlate <uid>` performs a read-only
|
||||||
|
lookup. No ingestion code belongs in kaizen-agentic.
|
||||||
|
|
||||||
## 10. Success Metrics
|
## 10. Success Metrics
|
||||||
|
|
||||||
| Metric | Meaning | Target (directional, v1) |
|
| Metric | Meaning | Target (directional, v1) |
|
||||||
@@ -255,12 +281,26 @@ record:
|
|||||||
three flavors?
|
three flavors?
|
||||||
- **OQ3** Where does detection logic run — local batch jobs, hub-side, or a dedicated
|
- **OQ3** Where does detection logic run — local batch jobs, hub-side, or a dedicated
|
||||||
service? What volume do we actually expect?
|
service? What volume do we actually expect?
|
||||||
- **OQ4** Pattern format: how do we keep one agnostic representation while giving each
|
- ~~**OQ4** Pattern format: how do we keep one agnostic representation while giving each
|
||||||
distributor enough to render high-quality native artifacts?
|
distributor enough to render high-quality native artifacts?~~ **Resolved (Phase 2,
|
||||||
- **OQ5** What's the minimum trustworthy evidence bar before a pattern is allowed to be
|
AGENTIC-WP-0004):** the `SolutionPattern` core is flavor-agnostic (problem,
|
||||||
distributed to live agent environments?
|
resolutions, scope, provenance) and carries per-flavor knowledge only in a separate
|
||||||
- **OQ6** How do we prevent pattern bloat — too many low-value instructions degrading
|
`rendering_hints` sub-structure keyed by flavor — distributors read the hints, the
|
||||||
agent context budgets (cf. the token-budget policy in global instructions)?
|
core stays neutral. Catalogued as versioned files-first artifacts (FR-U3).
|
||||||
|
- ~~**OQ5** What's the minimum trustworthy evidence bar before a pattern is allowed to be
|
||||||
|
distributed to live agent environments?~~ **Resolved (Phase 2):** a two-tier
|
||||||
|
evidence bar (`[curate.gate]`). A *promote* floor (frequency / distinct sessions /
|
||||||
|
cost-impact) admits a candidate as `provisional`; a stricter *distribution* floor
|
||||||
|
(higher frequency, optional cross-flavor requirement, cost-impact) is required to
|
||||||
|
mark a pattern `approved` + `distribution_ready`. Defaults are conservative and
|
||||||
|
config-tunable.
|
||||||
|
- ~~**OQ6** How do we prevent pattern bloat — too many low-value instructions degrading
|
||||||
|
agent context budgets (cf. the token-budget policy in global instructions)?~~
|
||||||
|
**Resolved (Phase 2):** a bloat guard flags duplicate (same id) and near-duplicate
|
||||||
|
(same signal-type+locus) candidates at review time, and the catalog dedups
|
||||||
|
structurally on the source-candidate key so re-promotion never multiplies entries.
|
||||||
|
Thin candidates stay `provisional` (not distributed) rather than padding live
|
||||||
|
context.
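To make the OQ4 resolution concrete, here is one way a flavor-agnostic core with a separate per-flavor hints structure could be laid out. The field types, the hint keys (`target`, `heading`), and the `render` helper are assumptions for illustration; the actual `SolutionPattern` schema lives in `curate/schema.py` and may differ.

```python
from dataclasses import dataclass, field


@dataclass
class SolutionPattern:
    """Hypothetical shape: neutral core plus hints keyed by flavor."""
    id: str
    problem: str
    resolutions: list
    scope: dict        # e.g. {"repos": [...], "flavors": [...]}
    provenance: dict   # e.g. {"source_candidate": "..."}
    rendering_hints: dict = field(default_factory=dict)


def agnostic_body(p: SolutionPattern) -> str:
    """The body every distributor shares; it never mentions a flavor."""
    lines = [f"Problem: {p.problem}", "Resolutions:"]
    lines += [f"- {r}" for r in p.resolutions]
    return "\n".join(lines)


def render(p: SolutionPattern, flavor: str) -> tuple:
    """Return (target file, text); only the hints vary by flavor."""
    hints = p.rendering_hints.get(flavor, {})
    target = hints.get("target", "INSTRUCTIONS.md")
    heading = hints.get("heading", "## Pattern")
    return target, f"{heading}\n{agnostic_body(p)}"
```

Because `render` reads flavor knowledge only from `rendering_hints`, adding a new flavor means adding a hints entry rather than changing the core.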
|
||||||
|
|
||||||
## 13. Risks
|
## 13. Risks
|
||||||
|
|
||||||
|
|||||||
12
registry/README.md
Normal file
12
registry/README.md
Normal file
@@ -0,0 +1,12 @@
|
|||||||
|
# Capability Registry
|
||||||
|
|
||||||
|
Markdown-first capability index for federation and reuse planning.
|
||||||
|
|
||||||
|
## Authoring
|
||||||
|
|
||||||
|
1. Copy a capability entry template (see reuse-surface `templates/capability-entry.template.md`).
|
||||||
|
2. Add the row to `indexes/capabilities.yaml`.
|
||||||
|
3. Run `reuse-surface validate` from a checkout with the CLI installed.
|
||||||
|
4. Merge to `main` and verify publish with `reuse-surface establish --publish-check`.
|
||||||
|
|
||||||
|
Federation contract: reuse-surface `docs/RegistryFederation.md`.
|
||||||
0
registry/capabilities/.gitkeep
Normal file
0
registry/capabilities/.gitkeep
Normal file
4
registry/indexes/capabilities.yaml
Normal file
4
registry/indexes/capabilities.yaml
Normal file
@@ -0,0 +1,4 @@
|
|||||||
|
version: 1
|
||||||
|
updated: '2026-06-16'
|
||||||
|
domain: helix_forge
|
||||||
|
capabilities: []
|
||||||
@@ -13,14 +13,40 @@ time window.
|
|||||||
|
|
||||||
```
|
```
|
||||||
session_memory/
|
session_memory/
|
||||||
adapters/claude.py # Tier0 -> Tier1 normalizer (Codex/Grok land in Phase 1)
|
adapters/common.py # shared Normalized bundle + helpers
|
||||||
|
adapters/claude.py # Tier0 -> Tier1 normalizers, one per flavor
|
||||||
|
adapters/codex.py # (rollout {timestamp,type,payload}, flat call_id join)
|
||||||
|
adapters/grok.py # (per-session dir: chat_history + events + updates)
|
||||||
core/schema.py # Session / SessionEvent / Cost
|
core/schema.py # Session / SessionEvent / Cost
|
||||||
core/store.py # SQLite rows + blob-dir bodies (Tier1) + digests (Tier2)
|
core/store.py # SQLite rows + blob-dir bodies (Tier1) + digests/patterns (Tier2)
|
||||||
core/cursor.py # incremental ingest cursors
|
core/cursor.py # incremental ingest cursors
|
||||||
core/digest.py # Tier1 -> Tier2 promotion + outcome heuristic
|
core/digest.py # Tier1 -> Tier2 promotion + outcome heuristic
|
||||||
core/retention.py # budget-based eviction sweep
|
core/retention.py # budget-based eviction sweep
|
||||||
ingest.py # one sweep: discover -> normalize -> store -> digest -> evict
|
ingest.py # one sweep: discover -> normalize -> store -> digest -> evict
|
||||||
config.toml # store paths, retention caps, sources, repo->domain map
|
detect/signals.py # signal extractors over digests
|
||||||
|
detect/cluster.py # cluster signals -> candidate patterns + cross-flavor flag
|
||||||
|
detect/__main__.py # python -m session_memory.detect (ranked report)
|
||||||
|
curate/schema.py # SolutionPattern artifact + per-flavor rendering hints
|
||||||
|
curate/catalog.py # versioned, files-first Pattern Catalog (dedup on id)
|
||||||
|
curate/gating.py # promotion evidence bar + bloat guard
|
||||||
|
curate/review.py # discuss/approve/reject -> promote workflow
|
||||||
|
curate/decisions.py # hub decision audit trail (graceful local-queue fallback)
|
||||||
|
curate/__main__.py # python -m session_memory.curate (interactive / --auto-approve)
|
||||||
|
catalog/ # the committed Pattern Catalog (source of truth)
|
||||||
|
distribute/base.py # Artifact + Distributor protocol + idempotent snippet markers
|
||||||
|
distribute/claude.py # CLAUDE.md (or skill) renderer } per-flavor edges
|
||||||
|
distribute/codex.py # AGENTS.md renderer } (agnostic body,
|
||||||
|
distribute/grok.py # native instruction renderer } different targets)
|
||||||
|
distribute/proposals.py # scoping + proposed-not-applied output + active registry
|
||||||
|
distribute/__main__.py # python -m session_memory.distribute
|
||||||
|
measure/metrics.py # fleet metrics + persisted baseline snapshots
|
||||||
|
measure/effect.py # before/after per-pattern effectiveness
|
||||||
|
measure/__main__.py # python -m session_memory.measure
|
||||||
|
retro/build.py # windowed top-3-per-repo suggestions
|
||||||
|
retro/publish.py # hub coding_retro read model + local report
|
||||||
|
retro/__main__.py # python -m session_memory.retro
|
||||||
|
digest_lookup.py # python -m session_memory.digest_lookup (read one digest, no ingest)
|
||||||
|
config.toml # store paths, retention caps, sources, repo->domain map, curate gate
|
||||||
```
|
```
|
||||||
|
|
||||||
The local store lives under `session_memory/.store/` (gitignored).
|
The local store lives under `session_memory/.store/` (gitignored).
|
||||||
@@ -51,6 +77,147 @@ the sweep *runs*. Trigger it with the repo scheduler, e.g. daily:
|
|||||||
or a cron entry / `/loop` on a timer. Push-capture (agent Stop/SessionEnd hooks)
|
or a cron entry / `/loop` on a timer. Push-capture (agent Stop/SessionEnd hooks)
|
||||||
can also enqueue a sweep; see design §7.
|
can also enqueue a sweep; see design §7.
|
||||||
|
|
||||||
|
## Detect candidate patterns
|
||||||
|
|
||||||
|
After ingesting, mine the digests for recurring problem/success patterns:
|
||||||
|
|
||||||
|
```bash
python -m session_memory.detect                   # ranked report, cross-flavor first
python -m session_memory.detect --json            # machine-readable candidates
python -m session_memory.detect --min-frequency 3
```
|
||||||
|
|
||||||
|
Candidates are persisted to a Tier 2 `patterns` table and are the input to the
|
||||||
|
Curate phase (Phase 2). Patterns whose evidence spans more than one agent flavor
|
||||||
|
are flagged `[CROSS-FLAVOR]` — the highest-value reuse targets.
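The clustering and cross-flavor flag can be pictured with a small sketch. It assumes each signal is a dict with `signal_type`, `locus`, `flavor`, and `session_uid` keys, which is a guess at the shape rather than the real `detect/cluster.py` interface.

```python
from collections import defaultdict


def cluster(signals: list, min_frequency: int = 2) -> list:
    """Group signals by (signal_type, locus) and rank cross-flavor groups first."""
    groups = defaultdict(list)
    for s in signals:
        groups[(s["signal_type"], s["locus"])].append(s)

    candidates = []
    for (signal_type, locus), members in groups.items():
        if len(members) < min_frequency:
            continue  # too rare to count as a recurring pattern
        flavors = sorted({m["flavor"] for m in members})
        candidates.append({
            "key": f"{signal_type}:{locus}",
            "frequency": len(members),
            "sessions": len({m["session_uid"] for m in members}),
            "flavors": flavors,
            "cross_flavor": len(flavors) > 1,
        })

    # Cross-flavor first, then most frequent, then a stable tie-break on key.
    candidates.sort(key=lambda c: (not c["cross_flavor"], -c["frequency"], c["key"]))
    return candidates
```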
|
||||||
|
|
||||||
|
## Curate candidates into the Pattern Catalog
|
||||||
|
|
||||||
|
Review detect candidates into versioned **Solution Patterns** held in the
|
||||||
|
files-first catalog (`session_memory/catalog/`). The flow is **detect → curate →
|
||||||
|
(Phase 3) distribute**; `curate` refreshes candidates by running detect first.
|
||||||
|
|
||||||
|
```bash
python -m session_memory.curate                 # interactive review (a/r/d per candidate)
python -m session_memory.curate --auto-approve  # batch: promote all that clear the evidence bar
python -m session_memory.curate --json          # machine-readable result
```
|
||||||
|
|
||||||
|
- **Promotion** writes a `SolutionPattern` file (id = source candidate key, so
|
||||||
|
re-promoting the same candidate dedups; content changes bump the semver and
|
||||||
|
archive the prior version to `<id>.history.jsonl`).
|
||||||
|
- The **evidence bar** (`[curate.gate]`) sets two floors: a promote floor and a
|
||||||
|
stricter *distribution* floor. A thin-but-real candidate lands `provisional`;
|
||||||
|
one clearing the distribution floor lands `approved` + `distribution_ready`.
|
||||||
|
- A **bloat guard** flags duplicate / near-duplicate candidates so the catalog
|
||||||
|
stays lean.
|
||||||
|
- Re-review is **idempotent** — a remembered decision is skipped unless the
|
||||||
|
candidate's evidence changed; a prior reject is not re-surfaced.
|
||||||
|
- Each final promote/reject is recorded as a **hub decision**; if the hub is
|
||||||
|
offline the decision is queued to `[curate].decision_queue` for later sync
|
||||||
|
(the same after-the-fact pattern used in Phase 1).
|
||||||
|
|
||||||
|
### Curate knobs (`[curate]` / `[curate.gate]` in config.toml)
|
||||||
|
|
||||||
|
| Key | Meaning |
|-----|---------|
| `catalog_dir` | committed Pattern Catalog dir (source of truth) |
| `review_log` / `decision_queue` | remembered decisions + pending hub decisions (gitignored) |
| `min_frequency` / `min_sessions` / `min_cost_impact` | floor to promote at all |
| `dist_require_cross_flavor` | require cross-flavor evidence to be distribution-eligible |
| `dist_min_frequency` / `dist_min_cost_impact` | stricter floor for `distribution_ready` |
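How the two floors interact can be shown with a short sketch that reuses the key names from the table. The default values, the candidate dict shape, and the `below_bar` status label are assumptions made for the example, not the shipped defaults in `curate/gating.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    # Promote floor (values are illustrative, not the real defaults).
    min_frequency: int = 2
    min_sessions: int = 2
    min_cost_impact: float = 0.0
    # Stricter distribution floor.
    dist_require_cross_flavor: bool = True
    dist_min_frequency: int = 5
    dist_min_cost_impact: float = 0.1


def evaluate(c: dict, g: Gate) -> dict:
    """Classify a candidate against the promote and distribution floors."""
    promote = (
        c["frequency"] >= g.min_frequency
        and c["sessions"] >= g.min_sessions
        and c["cost_impact"] >= g.min_cost_impact
    )
    if not promote:
        return {"status": "below_bar", "distribution_ready": False}

    distribute = (
        c["frequency"] >= g.dist_min_frequency
        and c["cost_impact"] >= g.dist_min_cost_impact
        and (c["cross_flavor"] or not g.dist_require_cross_flavor)
    )
    if distribute:
        return {"status": "approved", "distribution_ready": True}
    return {"status": "provisional", "distribution_ready": False}
```

A candidate that is real but thin therefore still enters the catalog as `provisional`, while only stronger evidence reaches live agent context.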
|
||||||
|
|
||||||
|
## Distribute patterns as per-flavor proposals
|
||||||
|
|
||||||
|
Render approved catalog patterns into per-flavor artifacts — **proposed, never
|
||||||
|
auto-applied** (HITL). Completes the loop: **detect → curate → distribute**.
|
||||||
|
|
||||||
|
```bash
python -m session_memory.distribute                 # proposals for all repos/flavors
python -m session_memory.distribute --repo state-hub --flavor claude
python -m session_memory.distribute --json
```
|
||||||
|
|
||||||
|
- Only `approved` + `distribution_ready` patterns are rendered; each pattern's
  `Scope` (repos/domains/flavors) decides where it lands (FR-X2).
- Each flavor renders the **same agnostic body** to its own target (Claude →
  `CLAUDE.md`/skill, Codex → `AGENTS.md`, Grok → native) via `rendering_hints`
  (FR-A3); blocks carry stable `BEGIN/END` markers so re-running updates in place.
- Output goes to `session_memory/proposals/<repo>/<target>` (gitignored,
  regenerated): a reviewable diff a human applies (FR-X3). The committed
  `distribute/active_patterns.json` records which pattern+version is proposed in
  which `(repo, flavor)` (FR-X4).
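The in-place update behavior that stable markers enable can be sketched as below. The HTML-comment marker format and the `upsert_block` name are assumptions for illustration; the real marker syntax is defined in `distribute/base.py`.

```python
import re


def upsert_block(text: str, pattern_id: str, body: str) -> str:
    """Insert or replace one marked block so repeated runs are idempotent."""
    begin = f"<!-- BEGIN helix-pattern:{pattern_id} -->"
    end = f"<!-- END helix-pattern:{pattern_id} -->"
    block = f"{begin}\n{body.rstrip()}\n{end}"

    existing = re.compile(re.escape(begin) + r".*?" + re.escape(end), re.DOTALL)
    if existing.search(text):
        # Replace in place; a callable avoids backslash escapes in `body`.
        return existing.sub(lambda _m: block, text, count=1)

    # Otherwise append, separated from prior content by one blank line.
    if not text or text.endswith("\n\n"):
        sep = ""
    elif text.endswith("\n"):
        sep = "\n"
    else:
        sep = "\n\n"
    return f"{text}{sep}{block}\n"
```

Running the function twice with the same body returns the same text, and a changed body rewrites only the region between the markers, which keeps the proposed diff small and reviewable.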
|
||||||
|
|
||||||
|
## Measure effectiveness (closing the loop)
|
||||||
|
|
||||||
|
Track whether the fleet is getting cheaper / more reliable, and whether a
|
||||||
|
distributed pattern actually helped.
|
||||||
|
|
||||||
|
```bash
python -m session_memory.measure --label "baseline"   # snapshot + trend
python -m session_memory.measure --since 2026-06-07   # before/after a change
python -m session_memory.measure --no-save --json
```
|
||||||
|
|
||||||
|
- A **snapshot** (infra-overhead share, error rate, schema-thrash, token
  percentiles, success rate) is appended to `measure/baselines.jsonl` to build a
  trend (FR-M3).
- `--since DATE` splits sessions before/after a change and diffs the metrics, with
  an `improved` verdict per metric (FR-M1/FR-M2), so ineffective patterns can be
  retired. Recorded pre-fix baseline (2026-06-07): 27 sessions, infra-overhead
  median 11.7 %, error rate 0.96, schema-thrash 8 sessions.
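The before/after split behind `--since` can be approximated as follows. This is a simplified sketch, assuming each session is a dict with an ISO `started_at` and one numeric value per metric, and using the median as the aggregate; the real `measure/effect.py` may aggregate differently.

```python
from datetime import date
from statistics import median
from typing import Optional


def before_after(sessions: list, since: str, metric: str,
                 lower_is_better: bool = True) -> Optional[dict]:
    """Compare the median of one metric before and after a cut-off date."""
    cut = date.fromisoformat(since)
    before, after = [], []
    for s in sessions:
        day = date.fromisoformat(s["started_at"][:10])
        (before if day < cut else after).append(s[metric])

    if not before or not after:
        return None  # nothing to compare on one side of the change

    b, a = median(before), median(after)
    improved = a < b if lower_is_better else a > b
    return {"metric": metric, "before": b, "after": a,
            "delta": a - b, "improved": improved}
```

For rates where higher is better, such as a success rate, pass `lower_is_better=False` so the verdict points the right way.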
|
||||||
|
|
||||||
|
## Weekly retro (the input to the scheduled retrospection)
|
||||||
|
|
||||||
|
A windowed roll-up: detect + measure over the last N days → the **top-3
|
||||||
|
improvement suggestions per repo** (cross-flavor first; recommendations pulled
|
||||||
|
from the Pattern Catalog) → published to the hub as the `coding_retro` read model.
|
||||||
|
|
||||||
|
```bash
python -m session_memory.retro                          # last 7 days, local report
python -m session_memory.retro --window-days 30 --json
python -m session_memory.retro --publish                # also post coding_retro to the hub
```
|
||||||
|
|
||||||
|
Writes `retro/last_retro.{json,md}` and (with `--publish`) posts an
|
||||||
|
`event_type=coding_retro` progress event. This is consumed by activity-core's
|
||||||
|
**Weekly Coding Retrospection** schedule (ACTIVITY-WP-0008, Saturday 19:00 Berlin),
|
||||||
|
which emits one improvement task per relevant repo. Hub publish degrades
|
||||||
|
gracefully when the hub is unreachable.
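The "top-3 per repo, cross-flavor first" selection amounts to a grouped sort. A minimal sketch, assuming each candidate is a dict carrying `repo`, `cross_flavor`, and `frequency` (a hypothetical shape, not the `retro/build.py` interface):

```python
from collections import defaultdict


def top_suggestions(candidates: list, per_repo: int = 3) -> dict:
    """Pick the best few candidates per repo, cross-flavor evidence first."""
    by_repo = defaultdict(list)
    for c in candidates:
        by_repo[c["repo"]].append(c)

    return {
        repo: sorted(
            items, key=lambda c: (not c["cross_flavor"], -c["frequency"])
        )[:per_repo]
        for repo, items in sorted(by_repo.items())
    }
```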
|
||||||
|
|
||||||
|
## Correlation with kaizen-agentic
|
||||||
|
|
||||||
|
Helix Forge owns **fleet-level** session digests; **kaizen-agentic** owns
|
||||||
|
**project-scoped** execution metrics (ADR-004). The two layers correlate by
|
||||||
|
optional `helix_session_uid` on project records — **link-by-reference only**;
|
||||||
|
kaizen-agentic does not ingest JSONL into this store.
|
||||||
|
|
||||||
|
| Layer | Storage |
|-------|---------|
| Fleet (here) | `session_memory/.store/mem.db` → `digests` table |
| Project (kaizen) | `.kaizen/metrics/<agent>/executions.jsonl` |
|
||||||
|
|
||||||
|
- **Spec:** [DESIGN-session-memory.md §11](../docs/DESIGN-session-memory.md#11-project-metrics-correlation-kaizen-agentic)
|
||||||
|
- **Contract (kaizen-agentic):** [Helix Forge Correlation Contract](https://gitea.coulomb.social/coulomb/kaizen-agentic/src/branch/main/docs/integrations/helix-forge-correlation.md)
|
||||||
|
|
||||||
|
### Session-close env export
|
||||||
|
|
||||||
|
After ingest has written the digest, agents using both layers export `HELIX_*`
|
||||||
|
vars for `kaizen-agentic metrics record` to merge (names match ADR-004):
|
||||||
|
|
||||||
|
`HELIX_SESSION_UID`, `HELIX_REPO`, `HELIX_FLAVOR`, `HELIX_TOKENS`,
|
||||||
|
`HELIX_INFRA_OVERHEAD_SHARE`, and optionally `HELIX_STORE_DB` (absolute path to
|
||||||
|
`mem.db`). See DESIGN §11.1 for field sources.
|
||||||
|
|
||||||
|
### Read one digest (for `metrics correlate`)
|
||||||
|
|
||||||
|
```bash
python -m session_memory.digest_lookup claude:abc-123 --json
HELIX_STORE_DB=/abs/path/to/mem.db python -m session_memory.digest_lookup <uid>
```
|
||||||
|
|
||||||
|
Defaults to `[store].db_path` in `config.toml`. Read-only — does not run ingest.
|
||||||
|
|
||||||
## Retention knobs (`[retention]` in config.toml)
|
## Retention knobs (`[retention]` in config.toml)
|
||||||
|
|
||||||
| Key | Meaning |
|
| Key | Meaning |
|
||||||
@@ -66,10 +233,28 @@ exists, except the explicitly-reported hard-cap overflow path.
|
|||||||
## Tests
|
## Tests
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
python -m pytest # 26 tests: schema, adapter, store, digest, retention, ingest
|
python -m pytest # schema, adapters, store, digest, retention, ingest, detect, curate
|
||||||
```
|
```
|
||||||
|
|
||||||
## Status
|
## Status
|
||||||
|
|
||||||
Phase 0 (AGENTIC-WP-0002): Claude adapter only, end to end. Codex and Grok
|
- **Phase 0** (AGENTIC-WP-0002): schema, store, digest, budget retention, Claude
|
||||||
adapters are designed (schemas confirmed in the design doc) and land in Phase 1.
|
adapter, ingest sweep.
|
||||||
|
- **Phase 1** (AGENTIC-WP-0003): Codex + Grok adapters, multi-file session merge,
|
||||||
|
and the Detect pipeline (signals → clustering → cross-flavor candidate patterns).
|
||||||
|
- **Phase 2** (AGENTIC-WP-0004): Curate — Solution Pattern schema, versioned
|
||||||
|
files-first Pattern Catalog, discuss/approve/reject review with an evidence bar +
|
||||||
|
bloat guard, and hub-decision audit trail.
|
||||||
|
- **Detect hardening** (AGENTIC-WP-0005): session-quality filter + tool-mix /
|
||||||
|
infra-overhead signals. **Error mining** (AGENTIC-WP-0006): recurring error
|
||||||
|
fingerprints → root-cause patterns.
|
||||||
|
- **Phase 3** (AGENTIC-WP-0007): Distribute — per-flavor distributor adapters
|
||||||
|
render approved patterns into proposed (HITL) artifacts, scoped by repo/domain,
|
||||||
|
with an active-pattern registry.
|
||||||
|
- **Phase 4** (AGENTIC-WP-0009): Measure — fleet baseline/trend + before/after
|
||||||
|
per-pattern effectiveness. The Capture → Detect → Curate → Distribute → Measure
|
||||||
|
loop is closed.
|
||||||
|
- **Weekly retro** (AGENTIC-WP-0010): windowed top-3-per-repo + hub `coding_retro`
|
||||||
|
publish.
|
||||||
|
- **Kaizen correlation** (AGENTIC-WP-0011): bidirectional doc links, session-close
|
||||||
|
`HELIX_*` env convention, `digest_lookup` read path.
|
||||||
|
|||||||
@@ -11,54 +11,23 @@ that the store persists out-of-line so Tier 1 rows stay light.
|
|||||||
|
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
|
||||||
import json
|
|
||||||
import os
|
import os
|
||||||
from dataclasses import dataclass, field
|
from typing import Any, Optional
|
||||||
from datetime import datetime, timezone
|
|
||||||
from typing import Any, Iterable, Optional
|
|
||||||
|
|
||||||
from ..core.schema import Cost, Session, SessionEvent
|
from ..core.schema import Cost, Session, SessionEvent
|
||||||
|
from .common import ( # noqa: F401 (Normalized re-exported for back-compat)
|
||||||
|
Normalized,
|
||||||
|
classify_tool,
|
||||||
|
first_line as _first_line,
|
||||||
|
iter_jsonl as _iter_records,
|
||||||
|
now_iso as _now,
|
||||||
|
resolve_repo as _resolve_repo,
|
||||||
|
seconds_between as _seconds_between,
|
||||||
|
stringify as _stringify,
|
||||||
|
)
|
||||||
|
|
||||||
FLAVOR = "claude"
|
FLAVOR = "claude"
|
||||||
|
|
||||||
# tool_use names that mutate files -> kind "edit"
|
|
||||||
_EDIT_TOOLS = {"Edit", "Write", "NotebookEdit", "MultiEdit"}
|
|
||||||
# crude test-runner detection inside Bash commands -> kind "test_run"
|
|
||||||
_TEST_HINTS = ("pytest", "unittest", "npm test", "npm run test", "go test", "cargo test", "jest", "vitest")
|
|
||||||
|
|
||||||
|
|
||||||
@dataclass
|
|
||||||
class Normalized:
|
|
||||||
session: Session
|
|
||||||
events: list[SessionEvent]
|
|
||||||
blobs: dict[str, str] = field(default_factory=dict)
|
|
||||||
|
|
||||||
|
|
||||||
def _iter_records(path: str) -> Iterable[dict[str, Any]]:
|
|
||||||
with open(path, "r", encoding="utf-8") as f:
|
|
||||||
for line in f:
|
|
||||||
line = line.strip()
|
|
||||||
if not line:
|
|
||||||
continue
|
|
||||||
try:
|
|
||||||
yield json.loads(line)
|
|
||||||
except json.JSONDecodeError:
|
|
||||||
continue # tolerate partial/corrupt trailing lines
|
|
||||||
|
|
||||||
|
|
||||||
def _resolve_repo(cwd: Optional[str], repo_domain_map: dict[str, str]) -> tuple[Optional[str], Optional[str]]:
|
|
||||||
"""cwd -> (repo, domain). repo is the cwd basename; domain via map."""
|
|
||||||
if not cwd:
|
|
||||||
return None, None
|
|
||||||
repo = os.path.basename(cwd.rstrip("/")) or None
|
|
||||||
domain = repo_domain_map.get(repo) if repo else None
|
|
||||||
return repo, domain
|
|
||||||
|
|
||||||
|
|
||||||
def _is_test_command(text: str) -> bool:
|
|
||||||
low = text.lower()
|
|
||||||
return any(h in low for h in _TEST_HINTS)
|
|
||||||
|
|
||||||
|
|
||||||
def _content_blocks(message: dict[str, Any]) -> list[dict[str, Any]]:
|
def _content_blocks(message: dict[str, Any]) -> list[dict[str, Any]]:
|
||||||
content = message.get("content")
|
content = message.get("content")
|
||||||
@@ -159,11 +128,8 @@ def parse_session(path: str, repo_domain_map: Optional[dict[str, str]] = None) -
|
|||||||
name = b.get("name", "")
|
name = b.get("name", "")
|
||||||
inp = b.get("input", {})
|
inp = b.get("input", {})
|
||||||
body = _stringify(inp)
|
body = _stringify(inp)
|
||||||
kind = "tool_call"
|
cmd = inp.get("command", "") if isinstance(inp, dict) else ""
|
||||||
if name in _EDIT_TOOLS:
|
kind = classify_tool(name, _stringify(cmd))
|
||||||
kind = "edit"
|
|
||||||
elif name == "Bash" and _is_test_command(_stringify(inp.get("command", ""))):
|
|
||||||
kind = "test_run"
|
|
||||||
add_event(uuid, parent, ts, kind, role="assistant", tool=name,
|
add_event(uuid, parent, ts, kind, role="assistant", tool=name,
|
||||||
summary=f"{name}", body=body, sidechain=sidechain)
|
summary=f"{name}", body=body, sidechain=sidechain)
|
||||||
|
|
||||||
@@ -194,35 +160,3 @@ def parse_session(path: str, repo_domain_map: Optional[dict[str, str]] = None) -
|
|||||||
discovered_at=_now(),
|
discovered_at=_now(),
|
||||||
)
|
)
|
||||||
return Normalized(session=session, events=events, blobs=blobs)
|
return Normalized(session=session, events=events, blobs=blobs)
|
||||||
|
|
||||||
|
|
||||||
# ---- helpers ---------------------------------------------------------------
|
|
||||||
|
|
||||||
def _stringify(v: Any) -> str:
|
|
||||||
if v is None:
|
|
||||||
return ""
|
|
||||||
if isinstance(v, str):
|
|
||||||
return v
|
|
||||||
try:
|
|
||||||
return json.dumps(v, ensure_ascii=False)[:20000]
|
|
||||||
except (TypeError, ValueError):
|
|
||||||
return str(v)[:20000]
|
|
||||||
|
|
||||||
|
|
||||||
def _first_line(text: str) -> str:
|
|
||||||
return (text or "").strip().splitlines()[0] if (text or "").strip() else ""
|
|
||||||
|
|
||||||
|
|
||||||
def _seconds_between(start: Optional[str], end: Optional[str]) -> float:
|
|
||||||
if not start or not end:
|
|
||||||
return 0.0
|
|
||||||
try:
|
|
||||||
a = datetime.fromisoformat(start.replace("Z", "+00:00"))
|
|
||||||
b = datetime.fromisoformat(end.replace("Z", "+00:00"))
|
|
||||||
return max(0.0, (b - a).total_seconds())
|
|
||||||
except ValueError:
|
|
||||||
return 0.0
|
|
||||||
|
|
||||||
|
|
||||||
def _now() -> str:
|
|
||||||
return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
|
|
||||||
|
|||||||
167
session_memory/adapters/codex.py
Normal file
167
session_memory/adapters/codex.py
Normal file
@@ -0,0 +1,167 @@
"""OpenAI Codex CLI collector adapter: Tier 0 -> Tier 1 (design §2.2, §4.3).

Reads ``$CODEX_HOME/sessions/YYYY/MM/DD/rollout-*.jsonl``. Each line is a
``RolloutLine`` wrapper ``{timestamp, type, payload}``; ``type`` discriminates
``session_meta`` / ``response_item`` / ``event_msg`` / ``turn_context`` /
``compacted``.

Codex is **flat**: tool calls and outputs are joined only by ``call_id``, with no
parent-ref DAG, so ``seq`` is assigned by temporal (line) order and
``parent_seq`` is set for ``function_call_output`` back to its ``function_call``.
"""

from __future__ import annotations

import os
from typing import Any, Optional

from ..core.schema import Cost, Session, SessionEvent
from .common import (
    Normalized,
    classify_tool,
    first_line,
    iter_jsonl,
    now_iso,
    resolve_repo,
    seconds_between,
    stringify,
)

FLAVOR = "codex"


def _message_text(payload: dict[str, Any]) -> str:
    content = payload.get("content")
    if isinstance(content, str):
        return content
    parts = []
    if isinstance(content, list):
        for b in content:
            if isinstance(b, dict):
                parts.append(b.get("text") or b.get("output_text") or "")
            elif isinstance(b, str):
                parts.append(b)
    return "\n".join(p for p in parts if p)


def _extract_tokens(payload: dict[str, Any]) -> tuple[int, int, int]:
    """Best-effort (input, output, cache) from a token_count payload.

    Field shapes vary across Codex versions; probe known locations, else recurse.
    """
    for scope in (payload, payload.get("info") or {}, payload.get("usage") or {},
                  (payload.get("info") or {}).get("total_token_usage") or {}):
        if isinstance(scope, dict):
            i = scope.get("input_tokens") or scope.get("prompt_tokens")
            o = scope.get("output_tokens") or scope.get("completion_tokens")
            if i is not None or o is not None:
                cache = scope.get("cached_input_tokens") or scope.get("cache_read_input_tokens") or 0
                return int(i or 0), int(o or 0), int(cache or 0)
    return 0, 0, 0


def parse_session(path: str, repo_domain_map: Optional[dict[str, str]] = None) -> Optional[Normalized]:
    repo_domain_map = repo_domain_map or {}
    records = list(iter_jsonl(path))
    if not records:
        return None

    session_id: Optional[str] = None
    cwd = model = cli_version = None
    timestamps: list[str] = []
    events: list[SessionEvent] = []
    blobs: dict[str, str] = {}
    call_seq: dict[str, int] = {}  # call_id -> seq of its function_call
    cost = Cost()
    seq = 0

    def add_event(ts, kind, *, role=None, tool=None, summary=None, body=None,
                  tokens=0, parent_seq=None) -> int:
        nonlocal seq
        s = seq
        seq += 1
        payload_ref = None
        if body:
            payload_ref = f"blob://{session_id}/{s}"
            blobs[payload_ref] = body
        events.append(SessionEvent(
            session_uid=Session.make_uid(FLAVOR, session_id or "unknown"),
            seq=s, parent_seq=parent_seq, ts=ts, kind=kind, role=role, tool=tool,
            summary=(summary or "")[:300] or None, payload_ref=payload_ref, tokens=tokens,
        ))
        return s

    for rec in records:
        rtype = rec.get("type")
        ts = rec.get("timestamp")
        if ts:
            timestamps.append(ts)
        payload = rec.get("payload") or {}

        if rtype == "session_meta":
            session_id = session_id or payload.get("id")
            cwd = cwd or payload.get("cwd")
            model = model or payload.get("model")
            cli_version = cli_version or payload.get("cli_version")

        elif rtype == "turn_context":
            model = model or payload.get("model")

        elif rtype == "response_item":
            ptype = payload.get("type")
            if ptype == "message":
                role = payload.get("role", "assistant")
                text = _message_text(payload)
                kind = "assistant_msg" if role == "assistant" else "user_msg"
                add_event(ts, kind, role=role, summary=first_line(text), body=text)
            elif ptype == "function_call":
                name = payload.get("name", "")
                args = stringify(payload.get("arguments"))
                kind = classify_tool(name, args)
                s = add_event(ts, kind, role="assistant", tool=name,
                              summary=name, body=args)
                call_id = payload.get("call_id")
                if call_id:
                    call_seq[call_id] = s
            elif ptype == "function_call_output":
                call_id = payload.get("call_id")
                parent = call_seq.get(call_id)
                body = stringify(payload.get("output"))
                add_event(ts, "tool_result", role="tool", tool=None,
                          summary="tool result", body=body, parent_seq=parent)
            elif ptype == "reasoning":
                body = _message_text(payload) or stringify(payload.get("summary"))
                add_event(ts, "thinking", role="assistant", summary="reasoning", body=body)

        elif rtype == "event_msg":
            ptype = payload.get("type")
            if ptype == "task_started":
                add_event(ts, "lifecycle", summary="task_started")
            elif ptype == "task_complete":
                add_event(ts, "completion", summary="task_complete")
            elif ptype == "token_count":
                i, o, c = _extract_tokens(payload)
                cost.input_tokens += i
                cost.output_tokens += o
                cost.cache_tokens += c
            # user_message / agent_message echoes are duplicated by response_item
            # messages on modern Codex; skipped to avoid double counting.

    if session_id is None:
        return None

    cost.turns = sum(1 for e in events if e.kind == "user_msg")
    started = min(timestamps) if timestamps else None
    ended = max(timestamps) if timestamps else None
    cost.wall_clock_s = seconds_between(started, ended)

    repo, domain = resolve_repo(cwd, repo_domain_map)
    session = Session(
        session_uid=Session.make_uid(FLAVOR, session_id),
        flavor=FLAVOR, native_session_id=session_id,
        repo=repo, domain=domain, cwd=cwd, model=model,
        started_at=started, ended_at=ended, outcome="unknown", cost=cost,
        source_path=path, source_bytes=os.path.getsize(path) if os.path.exists(path) else 0,
        discovered_at=now_iso(),
    )
    return Normalized(session=session, events=events, blobs=blobs)
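The flat `call_id` join described in the codex.py module docstring can be illustrated in isolation. The following is a minimal sketch, not the adapter itself: the `link_calls` helper and the sample records are hypothetical, and only the `type`, `payload`, and `call_id` field names are taken from the code above.

```python
from typing import Any, Optional


def link_calls(records: list[dict[str, Any]]) -> list[tuple[int, str, Optional[int]]]:
    """Assign seq by line order and point each output at its call's seq."""
    call_seq: dict[str, int] = {}  # call_id -> seq of the function_call
    linked: list[tuple[int, str, Optional[int]]] = []
    seq = 0
    for rec in records:
        payload = rec.get("payload") or {}
        ptype = payload.get("type")
        if rec.get("type") != "response_item" or ptype not in (
            "function_call", "function_call_output"
        ):
            continue
        parent = None
        call_id = payload.get("call_id")
        if ptype == "function_call" and call_id:
            call_seq[call_id] = seq
        elif ptype == "function_call_output":
            # An unknown call_id (e.g. a truncated log) leaves parent as None.
            parent = call_seq.get(call_id)
        linked.append((seq, ptype, parent))
        seq += 1
    return linked


# Hypothetical sample: outputs arrive in a different order than the calls.
sample = [
    {"type": "response_item", "payload": {"type": "function_call", "call_id": "a", "name": "shell"}},
    {"type": "response_item", "payload": {"type": "function_call", "call_id": "b", "name": "apply_patch"}},
    {"type": "response_item", "payload": {"type": "function_call_output", "call_id": "b", "output": "ok"}},
    {"type": "response_item", "payload": {"type": "function_call_output", "call_id": "a", "output": "ok"}},
    {"type": "response_item", "payload": {"type": "function_call_output", "call_id": "zz", "output": "?"}},
]
for row in link_calls(sample):
    print(row)
```

Because the lookup is keyed by `call_id` rather than by position, out-of-order outputs still resolve to the right parent, and an output with no recorded call simply keeps `parent_seq` unset.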
100
session_memory/adapters/common.py
Normal file
@@ -0,0 +1,100 @@
"""Shared adapter helpers (Tier 0 -> Tier 1).

The ``Normalized`` bundle contract and small flavor-agnostic helpers used by every
collector adapter. Per-flavor parsing lives in the individual adapter modules.
"""

from __future__ import annotations

import json
import os
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional

from ..core.schema import Session, SessionEvent

# tool names that mutate files -> kind "edit" (union across flavors)
EDIT_TOOLS = {
    "Edit", "Write", "NotebookEdit", "MultiEdit",  # Claude
    "apply_patch", "write_file", "edit_file",  # Codex / Grok variants
}
# substrings in a shell/tool command that indicate a test run -> kind "test_run"
TEST_HINTS = (
    "pytest", "unittest", "npm test", "npm run test", "go test",
    "cargo test", "jest", "vitest", "make test", "tox",
)


@dataclass
class Normalized:
    session: Session
    events: list[SessionEvent]
    blobs: dict[str, str] = field(default_factory=dict)


def resolve_repo(cwd: Optional[str], repo_domain_map: dict[str, str]) -> tuple[Optional[str], Optional[str]]:
    """cwd -> (repo, domain). repo is the cwd basename; domain via map."""
    if not cwd:
        return None, None
    repo = os.path.basename(cwd.rstrip("/")) or None
    domain = repo_domain_map.get(repo) if repo else None
    return repo, domain


def is_test_command(text: str) -> bool:
    low = (text or "").lower()
    return any(h in low for h in TEST_HINTS)


def classify_tool(name: str, command_text: str = "") -> str:
    """Map a tool invocation to an event kind: edit | test_run | tool_call."""
    if name in EDIT_TOOLS:
        return "edit"
    if is_test_command(command_text) or is_test_command(name):
        return "test_run"
    return "tool_call"


def stringify(v: Any, limit: int = 20000) -> str:
    if v is None:
        return ""
    if isinstance(v, str):
        return v[:limit]
    try:
        return json.dumps(v, ensure_ascii=False)[:limit]
    except (TypeError, ValueError):
        return str(v)[:limit]


def first_line(text: str) -> str:
    t = (text or "").strip()
    return t.splitlines()[0] if t else ""


def seconds_between(start: Optional[str], end: Optional[str]) -> float:
    if not start or not end:
        return 0.0
    try:
        a = datetime.fromisoformat(start.replace("Z", "+00:00"))
        b = datetime.fromisoformat(end.replace("Z", "+00:00"))
        return max(0.0, (b - a).total_seconds())
    except (TypeError, ValueError):  # TypeError: naive vs. aware timestamps
        return 0.0


def iter_jsonl(path: str):
    """Yield parsed JSON objects from a JSONL file, tolerating bad lines."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError:
                continue


def now_iso() -> str:
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
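Because `is_test_command` matches plain substrings, a short hint such as `tox` can also match unrelated commands. The sketch below uses a condensed, self-contained copy of the two classifiers (with trimmed `EDIT_TOOLS` and `TEST_HINTS`) to show the behaviour; the sample commands are made up for illustration.

```python
EDIT_TOOLS = {"Edit", "Write", "apply_patch"}            # trimmed copy
TEST_HINTS = ("pytest", "go test", "cargo test", "tox")  # trimmed copy


def is_test_command(text: str) -> bool:
    low = (text or "").lower()
    return any(h in low for h in TEST_HINTS)


def classify_tool(name: str, command_text: str = "") -> str:
    if name in EDIT_TOOLS:
        return "edit"
    if is_test_command(command_text) or is_test_command(name):
        return "test_run"
    return "tool_call"


# The edit check runs first, so an edit tool is never reported as a test run.
print(classify_tool("apply_patch", "pytest -q"))
# A shell command mentioning a test runner is classified as a test run.
print(classify_tool("shell", "python -m pytest"))
# Anything else falls through to the generic kind.
print(classify_tool("shell", "ls -la"))
# Substring pitfall: "tox" occurs inside "detox", so this is also a test_run.
print(classify_tool("shell", "cat detox_notes.md"))
```

If such false positives turn out to matter, matching on word boundaries would be one option; that is a suggestion, not something the source does.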
182
session_memory/adapters/grok.py
Normal file
@@ -0,0 +1,182 @@
"""Grok CLI collector adapter: Tier 0 -> Tier 1 (design §2.3, §4.3).

A Grok session is a *directory* ``~/.grok/sessions/<enc-cwd>/<uuid>/`` containing
``summary.json`` (metadata), ``chat_history.jsonl`` (the canonical transcript),
``events.jsonl`` (explicit lifecycle + ``turn_number``), and ``updates.jsonl``
(ACP ``session/update`` stream, which carries tool-call names/args).

The ingest glob matches ``chat_history.jsonl``; this adapter derives its sibling
files from the same directory. Conversation order is taken from
``chat_history.jsonl``; tool-call names are paired, in order, from
``updates.jsonl`` ``tool_call`` entries to classify edits/test runs.
"""

from __future__ import annotations

import json
import os
from typing import Any, Optional

from ..core.schema import Cost, Session, SessionEvent
from .common import (
    Normalized,
    classify_tool,
    first_line,
    iter_jsonl,
    now_iso,
    resolve_repo,
    seconds_between,
    stringify,
)

FLAVOR = "grok"


def _text_content(content: Any) -> str:
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        return "\n".join(
            (b.get("text") or "") for b in content if isinstance(b, dict)
        )
    return ""


def _tool_calls_in_order(session_dir: str) -> list[dict[str, Any]]:
    """Ordered list of {title, rawInput, id} from updates.jsonl tool_call entries."""
    calls: list[dict[str, Any]] = []
    upd = os.path.join(session_dir, "updates.jsonl")
    if not os.path.exists(upd):
        return calls
    for rec in iter_jsonl(upd):
        u = (rec.get("params") or {}).get("update") or {}
        if u.get("sessionUpdate") == "tool_call":
            calls.append({"title": u.get("title") or "", "rawInput": u.get("rawInput") or {},
                          "id": u.get("toolCallId")})
    return calls


def _session_meta(session_dir: str) -> dict[str, Any]:
    p = os.path.join(session_dir, "summary.json")
    if not os.path.exists(p):
        return {}
    try:
        with open(p, "r", encoding="utf-8") as f:
            return json.load(f)
    except (OSError, ValueError):
        return {}


def _lifecycle(session_dir: str) -> tuple[list[dict[str, Any]], Optional[str]]:
    """events.jsonl records + the model id seen there."""
    evs, model = [], None
    p = os.path.join(session_dir, "events.jsonl")
    if os.path.exists(p):
        for rec in iter_jsonl(p):
            evs.append(rec)
            model = model or rec.get("model_id")
    return evs, model


def parse_session(path: str, repo_domain_map: Optional[dict[str, str]] = None) -> Optional[Normalized]:
    repo_domain_map = repo_domain_map or {}
    # accept either the chat_history.jsonl path or the session dir
    session_dir = path if os.path.isdir(path) else os.path.dirname(path)
    chat = os.path.join(session_dir, "chat_history.jsonl")
    if not os.path.exists(chat):
        return None

    meta = _session_meta(session_dir)
    info = meta.get("info") or {}
    session_id = info.get("id") or os.path.basename(session_dir.rstrip("/"))
    cwd = info.get("cwd") or meta.get("git_root_dir")
    life_events, life_model = _lifecycle(session_dir)
    model = meta.get("current_model_id") or life_model
    pending_calls = _tool_calls_in_order(session_dir)
    call_idx = 0

    events: list[SessionEvent] = []
    blobs: dict[str, str] = {}
    seq = 0

    def add(kind, *, role=None, tool=None, summary=None, body=None, parent_seq=None) -> int:
        nonlocal seq
        s = seq
        seq += 1
        ref = None
        if body:
            ref = f"blob://{session_id}/{s}"
            blobs[ref] = body
        events.append(SessionEvent(
            session_uid=Session.make_uid(FLAVOR, session_id), seq=s, parent_seq=parent_seq,
            ts=None, kind=kind, role=role, tool=tool,
            summary=(summary or "")[:300] or None, payload_ref=ref,
        ))
        return s

    # explicit lifecycle first (turn_started/turn_ended carry no bodies)
    for le in life_events:
        t = le.get("type")
        if t in ("turn_started", "loop_started", "turn_ended", "phase_changed"):
            add("lifecycle", summary=t)

    for rec in iter_jsonl(chat):
        rtype = rec.get("type")
        content = rec.get("content")
        if rtype == "user":
            text = _text_content(content)
            if text.strip():
                add("user_msg", role="user", summary=first_line(text), body=text)
        elif rtype == "reasoning":
            text = _text_content(content)
            if text.strip():
                add("thinking", role="assistant", summary="reasoning", body=text)
        elif rtype == "assistant":
            text = _text_content(content)
            if text.strip():
                add("assistant_msg", role="assistant", summary=first_line(text), body=text)
        elif rtype == "tool_result":
            # pair with the next tool_call (in order) to recover name/args
            tool = None
            parent = None
            if call_idx < len(pending_calls):
                call = pending_calls[call_idx]
                call_idx += 1
                tool = call["title"]
                cmd = stringify(call["rawInput"])
                kind = classify_tool(tool, cmd)
                parent = add(kind, role="assistant", tool=tool, summary=tool, body=cmd)
            body = _text_content(content) if not isinstance(content, str) else content
            add("tool_result", role="tool", tool=tool, summary="tool result",
                body=stringify(body), parent_seq=parent)

    if not events:
        return None

    cost = Cost(turns=sum(1 for e in events if e.kind == "user_msg"))
    started = info.get("created_at") or meta.get("created_at")
    ended = meta.get("last_active_at") or info.get("updated_at") or meta.get("updated_at")
    cost.wall_clock_s = seconds_between(started, ended)

    repo, domain = resolve_repo(cwd, repo_domain_map)
    session = Session(
        session_uid=Session.make_uid(FLAVOR, session_id), flavor=FLAVOR,
        native_session_id=session_id, repo=repo, domain=domain, cwd=cwd,
        git_branch=meta.get("head_branch"), model=model,
        started_at=started, ended_at=ended, outcome="unknown", cost=cost,
        source_path=chat,
        source_bytes=_dir_bytes(session_dir),
        discovered_at=now_iso(),
    )
    return Normalized(session=session, events=events, blobs=blobs)


def _dir_bytes(d: str) -> int:
    total = 0
    for root, _, files in os.walk(d):
        for f in files:
            try:
                total += os.path.getsize(os.path.join(root, f))
            except OSError:
                pass
    return total
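The in-order pairing of `tool_result` records with `updates.jsonl` `tool_call` entries in grok.py can be sketched without touching any files. This is an illustrative reduction that assumes both streams are already loaded as lists; `pair_results`, the tool titles, and the sample data are hypothetical.

```python
from typing import Any, Optional


def pair_results(chat: list[dict[str, Any]],
                 calls: list[dict[str, Any]]) -> list[tuple[str, Optional[str]]]:
    """Pair the n-th tool_result with the n-th tool_call title, if one exists."""
    paired: list[tuple[str, Optional[str]]] = []
    call_idx = 0
    for rec in chat:
        if rec.get("type") != "tool_result":
            continue
        title = None
        if call_idx < len(calls):
            title = calls[call_idx]["title"]
            call_idx += 1
        paired.append((rec.get("content", ""), title))
    return paired


chat = [
    {"type": "user", "content": "run the tests"},
    {"type": "tool_result", "content": "3 passed"},
    {"type": "tool_result", "content": "patched"},
    {"type": "tool_result", "content": "orphan"},  # no tool_call left for this one
]
calls = [{"title": "run_terminal_cmd"}, {"title": "edit_file"}]  # hypothetical titles
print(pair_results(chat, calls))
```

Positional pairing keeps the adapter simple, but it drifts if either stream drops an entry. The adapter already collects `toolCallId`; whether `chat_history.jsonl` records carry a matching id that could replace the positional join is not shown in the source.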
@@ -0,0 +1 @@
{"created_at": "2026-06-07T09:13:20Z", "distribution_ready": true, "id": "sp-problem-budget_overrun-tokens", "name": "problem: budget overrun", "polarity": "problem", "problem": "problem: budget overrun", "provenance": {"detected_at": null, "evidence": {"cost_impact": 10.667, "cross_flavor": false, "flavors": ["claude"], "frequency": 3, "key": "problem:budget_overrun:tokens", "locus": "tokens", "polarity": "problem", "repos": ["artifact-store", "citation-evidence", "infospace-bench"], "score": 32.001, "sessions": ["claude:0ef1b45c-5c27-4e20-88b3-37daeaa24eca", "claude:6e0d3d68-872b-4d93-bb09-0691e091314b", "claude:8fabd5ce-6a20-4412-9a8b-0f0763394a78"], "signal_type": "budget_overrun", "title": "problem: budget overrun"}, "promoted_at": "2026-06-07T09:13:20Z", "source_key": "problem:budget_overrun:tokens"}, "rendering_hints": {"claude": {"note": "TODO: refine rendering", "target": "CLAUDE.md"}}, "resolutions": [{"detail": "", "steps": [], "summary": "TODO: capture the recommended resolution"}], "schema_version": 1, "scope": {"domains": [], "flavors": ["claude"], "repos": ["artifact-store", "citation-evidence", "infospace-bench"]}, "status": "superseded", "updated_at": "2026-06-07T09:13:20Z", "version": "1.0.0"}
77
session_memory/catalog/sp-problem-budget_overrun-tokens.json
Normal file
@@ -0,0 +1,77 @@
{
  "created_at": "2026-06-07T09:13:20Z",
  "distribution_ready": true,
  "id": "sp-problem-budget_overrun-tokens",
  "name": "Budget overrun: token cost above peers",
  "polarity": "problem",
  "problem": "A session's token cost lands well above its peers (>p90). Usually driven by re-reading large files or tool outputs, carrying redundant context, or long exploratory loops without checkpoints.",
  "provenance": {
    "detected_at": null,
    "evidence": {
      "cost_impact": 10.667,
      "cross_flavor": false,
      "flavors": [
        "claude"
      ],
      "frequency": 3,
      "key": "problem:budget_overrun:tokens",
      "locus": "tokens",
      "polarity": "problem",
      "repos": [
        "artifact-store",
        "citation-evidence",
        "infospace-bench"
      ],
      "score": 32.001,
      "sessions": [
        "claude:0ef1b45c-5c27-4e20-88b3-37daeaa24eca",
        "claude:6e0d3d68-872b-4d93-bb09-0691e091314b",
        "claude:8fabd5ce-6a20-4412-9a8b-0f0763394a78"
      ],
      "signal_type": "budget_overrun",
      "title": "problem: budget overrun"
    },
    "promoted_at": "2026-06-07T09:13:20Z",
    "source_key": "problem:budget_overrun:tokens"
  },
  "rendering_hints": {
    "claude": {
      "target": "CLAUDE.md"
    }
  },
  "resolutions": [
    {
      "detail": "Use offset/limit; don't re-Read a file already in the transcript.",
      "steps": [
        "Locate with grep/glob first",
        "Read only the relevant span"
      ],
      "summary": "Read narrowly \u2014 target the region you need, not whole large files"
    },
    {
      "detail": "Summarize progress; avoid re-pulling outputs already shown.",
      "steps": [],
      "summary": "Checkpoint and prune context instead of re-fetching it"
    },
    {
      "detail": "grep/glob narrows scope far cheaper than reading whole trees.",
      "steps": [],
      "summary": "Prefer targeted search over broad reads to locate code"
    }
  ],
  "schema_version": 1,
  "scope": {
    "domains": [],
    "flavors": [
      "claude"
    ],
    "repos": [
      "artifact-store",
      "citation-evidence",
      "infospace-bench"
    ]
  },
  "status": "approved",
  "updated_at": "2026-06-07T14:21:06Z",
  "version": "1.0.1"
}
@@ -0,0 +1 @@
{"covers": [], "created_at": "2026-06-07T13:26:25Z", "distribution_ready": true, "id": "sp-problem-file_not_read-edit", "name": "Read before you Edit", "polarity": "problem", "problem": "Agents call Edit/Write on a file they have not read in the current session, or after it changed under them. The edit tools reject this ('File has not been read yet' / 'File has been modified since read'), and the retry burns a turn. Top recurring error in the corpus (12/27 sessions, 8 repos).", "provenance": {"detected_at": null, "evidence": {"frequency": 32, "origin": "AGENTIC-WP-0006 error mining / ASSESSMENT-infra-friction.md", "polarity": "problem", "repos": 8, "sessions": 12}, "promoted_at": null, "source_key": "problem:file_not_read:edit"}, "rendering_hints": {"claude": {"target": "CLAUDE.md"}, "codex": {"target": "AGENTS.md"}, "grok": {"target": ".grok/instructions.md"}}, "resolutions": [{"detail": "Never blind-write a file you haven't read this session.", "steps": ["Read the target file", "Then Edit/Write"], "summary": "Read the file (or the region you'll touch) before Edit/Write"}, {"detail": "A stale read means the file changed under you; refresh, don't loop.", "steps": ["Re-Read the file", "Re-apply the Edit"], "summary": "On 'modified since read', re-Read then re-Edit"}], "schema_version": 1, "scope": {"domains": [], "flavors": [], "repos": []}, "status": "superseded", "updated_at": "2026-06-07T13:26:25Z", "version": "1.0.0"}
63
session_memory/catalog/sp-problem-file_not_read-edit.json
Normal file
@@ -0,0 +1,63 @@
{
  "covers": [
    "file has not been read",
    "modified since read",
    "file_not_read"
  ],
  "created_at": "2026-06-07T13:26:25Z",
  "distribution_ready": true,
  "id": "sp-problem-file_not_read-edit",
  "name": "Read before you Edit",
  "polarity": "problem",
  "problem": "Agents call Edit/Write on a file they have not read in the current session, or after it changed under them. The edit tools reject this ('File has not been read yet' / 'File has been modified since read'), and the retry burns a turn. Top recurring error in the corpus (12/27 sessions, 8 repos).",
  "provenance": {
    "detected_at": null,
    "evidence": {
      "frequency": 32,
      "origin": "AGENTIC-WP-0006 error mining / ASSESSMENT-infra-friction.md",
      "polarity": "problem",
      "repos": 8,
      "sessions": 12
    },
    "promoted_at": null,
    "source_key": "problem:file_not_read:edit"
  },
  "rendering_hints": {
    "claude": {
      "target": "CLAUDE.md"
    },
    "codex": {
      "target": "AGENTS.md"
    },
    "grok": {
      "target": ".grok/instructions.md"
    }
  },
  "resolutions": [
    {
      "detail": "Never blind-write a file you haven't read this session.",
      "steps": [
        "Read the target file",
        "Then Edit/Write"
      ],
      "summary": "Read the file (or the region you'll touch) before Edit/Write"
    },
    {
      "detail": "A stale read means the file changed under you; refresh, don't loop.",
      "steps": [
        "Re-Read the file",
        "Re-apply the Edit"
      ],
      "summary": "On 'modified since read', re-Read then re-Edit"
    }
  ],
  "schema_version": 1,
  "scope": {
    "domains": [],
    "flavors": [],
    "repos": []
  },
  "status": "approved",
  "updated_at": "2026-06-07T19:06:45Z",
  "version": "1.0.1"
}
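A consumer of these catalog records might select entries for one flavor along the following lines. This is only a sketch under assumptions: the `select_for_flavor` helper is hypothetical, and treating an empty `scope.flavors` list as "applies to all flavors" is a guess based on the record above, which has empty scope lists alongside rendering hints for three flavors. The two sample records are abridged copies of catalog entries in this diff.

```python
from typing import Any


def select_for_flavor(records: list[dict[str, Any]], flavor: str) -> list[tuple[str, str]]:
    """Return (id, target) for approved, distribution-ready records."""
    selected: list[tuple[str, str]] = []
    for rec in records:
        if rec.get("status") != "approved" or not rec.get("distribution_ready"):
            continue
        flavors = (rec.get("scope") or {}).get("flavors") or []
        if flavors and flavor not in flavors:  # assumption: empty list = all flavors
            continue
        hint = (rec.get("rendering_hints") or {}).get(flavor)
        if hint and hint.get("target"):
            selected.append((rec["id"], hint["target"]))
    return selected


records = [
    {"id": "sp-problem-file_not_read-edit", "status": "approved",
     "distribution_ready": True, "scope": {"flavors": []},
     "rendering_hints": {"claude": {"target": "CLAUDE.md"},
                         "codex": {"target": "AGENTS.md"}}},
    {"id": "sp-problem-infra_overhead-infra_overhead", "status": "provisional",
     "distribution_ready": False, "scope": {"flavors": ["claude"]},
     "rendering_hints": {"claude": {"target": "CLAUDE.md"}}},
]
print(select_for_flavor(records, "codex"))
```

The provisional record is skipped on status alone, so its flavor scope never comes into play.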
@@ -0,0 +1 @@
{"created_at": "2026-06-07T09:13:20Z", "distribution_ready": false, "id": "sp-problem-infra_overhead-infra_overhead", "name": "problem: infra overhead", "polarity": "problem", "problem": "problem: infra overhead", "provenance": {"detected_at": null, "evidence": {"cost_impact": 0.801, "cross_flavor": false, "flavors": ["claude"], "frequency": 2, "key": "problem:infra_overhead:infra_overhead", "locus": "infra_overhead", "polarity": "problem", "repos": ["markitect-main", "vergabe-teilnahme"], "score": 1.602, "sessions": ["claude:135002f9-98d2-4d1b-b8fb-543b20388782", "claude:b4ae9631-a7eb-42a6-acb1-c65b660c4b74"], "signal_type": "infra_overhead", "title": "problem: infra overhead"}, "promoted_at": "2026-06-07T09:13:20Z", "source_key": "problem:infra_overhead:infra_overhead"}, "rendering_hints": {"claude": {"note": "TODO: refine rendering", "target": "CLAUDE.md"}}, "resolutions": [{"detail": "", "steps": [], "summary": "TODO: capture the recommended resolution"}], "schema_version": 1, "scope": {"domains": [], "flavors": ["claude"], "repos": ["markitect-main", "vergabe-teilnahme"]}, "status": "superseded", "updated_at": "2026-06-07T09:13:20Z", "version": "1.0.0"}
@@ -0,0 +1,74 @@
{
  "created_at": "2026-06-07T09:13:20Z",
  "distribution_ready": false,
  "id": "sp-problem-infra_overhead-infra_overhead",
  "name": "Infrastructure overhead: too much coordination plumbing",
  "polarity": "problem",
  "problem": "A large share of the session's tool calls are State Hub / task-management / schema-loading plumbing rather than touching the repo (corpus median 11.7%, up to 43% in the worst sessions; one session made 231 hub calls).",
  "provenance": {
    "detected_at": null,
    "evidence": {
      "cost_impact": 0.801,
      "cross_flavor": false,
      "flavors": [
        "claude"
      ],
      "frequency": 2,
      "key": "problem:infra_overhead:infra_overhead",
      "locus": "infra_overhead",
      "polarity": "problem",
      "repos": [
        "markitect-main",
        "vergabe-teilnahme"
      ],
      "score": 1.602,
      "sessions": [
        "claude:135002f9-98d2-4d1b-b8fb-543b20388782",
        "claude:b4ae9631-a7eb-42a6-acb1-c65b660c4b74"
      ],
      "signal_type": "infra_overhead",
      "title": "problem: infra overhead"
    },
    "promoted_at": "2026-06-07T09:13:20Z",
    "source_key": "problem:infra_overhead:infra_overhead"
  },
  "rendering_hints": {
    "claude": {
      "target": "CLAUDE.md"
    }
  },
  "resolutions": [
    {
      "detail": "Update several task statuses together; emit fewer, coarser progress events.",
      "steps": [
        "Do a chunk of work",
        "Then sync statuses in one pass"
      ],
      "summary": "Batch hub writes \u2014 sync at checkpoints, not per event"
    },
    {
      "detail": "One scoped summary at session start beats many broad reads.",
      "steps": [],
      "summary": "Orient once with get_domain_summary, don't re-query repeatedly"
    },
    {
      "detail": "See STATE-WP-0058 \u2014 stops the repeated ToolSearch for hub tools.",
      "steps": [],
      "summary": "Front-load hub tool knowledge via the State Hub skill"
    }
  ],
  "schema_version": 1,
  "scope": {
    "domains": [],
    "flavors": [
      "claude"
    ],
    "repos": [
      "markitect-main",
      "vergabe-teilnahme"
    ]
  },
  "status": "provisional",
  "updated_at": "2026-06-07T14:21:06Z",
  "version": "1.0.1"
}
@@ -0,0 +1 @@
{"created_at": "2026-06-07T09:13:20Z", "distribution_ready": true, "id": "sp-problem-schema_thrash-schema_load", "name": "problem: schema thrash", "polarity": "problem", "problem": "problem: schema thrash", "provenance": {"detected_at": null, "evidence": {"cost_impact": 79.0, "cross_flavor": false, "flavors": ["claude"], "frequency": 8, "key": "problem:schema_thrash:schema_load", "locus": "schema_load", "polarity": "problem", "repos": ["activity-core", "citation-evidence", "flex-auth", "infospace-bench", "ops-bridge", "vergabe-teilnahme"], "score": 632.0, "sessions": ["claude:0ef1b45c-5c27-4e20-88b3-37daeaa24eca", "claude:30dbad62-c042-41f2-80c1-5953a1100e7f", "claude:4340b160-2fb6-47d0-897c-3cac0a8855d8", "claude:63fd4df2-5add-4748-af21-c1544825e006", "claude:8313f946-f008-4e98-9915-31950380e39e", "claude:8fabd5ce-6a20-4412-9a8b-0f0763394a78", "claude:b4ae9631-a7eb-42a6-acb1-c65b660c4b74", "claude:bbcf1c2b-14be-40e4-826b-4b2b49b9d212"], "signal_type": "schema_thrash", "title": "problem: schema thrash"}, "promoted_at": "2026-06-07T09:13:20Z", "source_key": "problem:schema_thrash:schema_load"}, "rendering_hints": {"claude": {"note": "TODO: refine rendering", "target": "CLAUDE.md"}}, "resolutions": [{"detail": "", "steps": [], "summary": "TODO: capture the recommended resolution"}], "schema_version": 1, "scope": {"domains": [], "flavors": ["claude"], "repos": ["activity-core", "citation-evidence", "flex-auth", "infospace-bench", "ops-bridge", "vergabe-teilnahme"]}, "status": "superseded", "updated_at": "2026-06-07T09:13:20Z", "version": "1.0.0"}
@@ -0,0 +1,83 @@
{
  "created_at": "2026-06-07T09:13:20Z",
  "distribution_ready": true,
  "id": "sp-problem-schema_thrash-schema_load",
  "name": "Schema thrash: repeated ToolSearch",
  "polarity": "problem",
  "problem": "ToolSearch fires repeatedly within a session (seen in 81% of sessions) because the State Hub MCP tools are deferred and their schemas get re-loaded each time they are needed \u2014 pure overhead with no work value.",
  "provenance": {
    "detected_at": null,
    "evidence": {
      "cost_impact": 79.0,
      "cross_flavor": false,
      "flavors": [
        "claude"
      ],
      "frequency": 8,
      "key": "problem:schema_thrash:schema_load",
      "locus": "schema_load",
      "polarity": "problem",
      "repos": [
        "activity-core",
        "citation-evidence",
        "flex-auth",
        "infospace-bench",
        "ops-bridge",
        "vergabe-teilnahme"
      ],
      "score": 632.0,
      "sessions": [
        "claude:0ef1b45c-5c27-4e20-88b3-37daeaa24eca",
        "claude:30dbad62-c042-41f2-80c1-5953a1100e7f",
        "claude:4340b160-2fb6-47d0-897c-3cac0a8855d8",
        "claude:63fd4df2-5add-4748-af21-c1544825e006",
        "claude:8313f946-f008-4e98-9915-31950380e39e",
        "claude:8fabd5ce-6a20-4412-9a8b-0f0763394a78",
        "claude:b4ae9631-a7eb-42a6-acb1-c65b660c4b74",
        "claude:bbcf1c2b-14be-40e4-826b-4b2b49b9d212"
      ],
      "signal_type": "schema_thrash",
      "title": "problem: schema thrash"
    },
    "promoted_at": "2026-06-07T09:13:20Z",
    "source_key": "problem:schema_thrash:schema_load"
  },
  "rendering_hints": {
    "claude": {
      "target": "CLAUDE.md"
    }
  },
  "resolutions": [
    {
      "detail": "Resolve them by name in one ToolSearch (select:...) rather than searching ad hoc.",
      "steps": [
        "List the hub tools the session needs",
        "Load them once at the start"
      ],
      "summary": "Load the tool schemas you'll need once, up front"
    },
    {
      "detail": "The skill carries the schemas so no per-use discovery is needed.",
      "steps": [],
      "summary": "Adopt the State Hub skill that front-loads common hub tool signatures"
    }
  ],
  "schema_version": 1,
  "scope": {
    "domains": [],
    "flavors": [
      "claude"
    ],
    "repos": [
      "activity-core",
      "citation-evidence",
      "flex-auth",
      "infospace-bench",
      "ops-bridge",
      "vergabe-teilnahme"
    ]
  },
  "status": "approved",
  "updated_at": "2026-06-07T14:21:06Z",
  "version": "1.0.1"
}
@@ -0,0 +1 @@
|
|||||||
|
{"created_at": "2026-06-07T09:13:20Z", "distribution_ready": true, "id": "sp-problem-tool_thrash-tool-bash", "name": "problem: tool thrash", "polarity": "problem", "problem": "problem: tool thrash", "provenance": {"detected_at": null, "evidence": {"cost_impact": 1990.0, "cross_flavor": false, "flavors": ["claude"], "frequency": 11, "key": "problem:tool_thrash:tool:Bash", "locus": "tool:Bash", "polarity": "problem", "repos": ["activity-core", "artifact-store", "citation-evidence", "ihp-railiance-probe", "infospace-bench", "railiance-apps", "state-hub", "vergabe-teilnahme"], "score": 21890.0, "sessions": ["claude:0ef1b45c-5c27-4e20-88b3-37daeaa24eca", "claude:2c0d14e1-d089-4076-bf35-b134737a261d", "claude:30dbad62-c042-41f2-80c1-5953a1100e7f", "claude:4307eff6-cd39-4189-be58-79a3acb69d6c", "claude:4340b160-2fb6-47d0-897c-3cac0a8855d8", "claude:6e0d3d68-872b-4d93-bb09-0691e091314b", "claude:8313f946-f008-4e98-9915-31950380e39e", "claude:8fabd5ce-6a20-4412-9a8b-0f0763394a78", "claude:a9483f07-c9dc-4f71-9fa0-831790ea965e", "claude:b1dfbcfa-91f9-4540-823a-26fcfaab7fc8", "claude:b4ae9631-a7eb-42a6-acb1-c65b660c4b74"], "signal_type": "tool_thrash", "title": "problem: tool thrash"}, "promoted_at": "2026-06-07T09:13:20Z", "source_key": "problem:tool_thrash:tool:Bash"}, "rendering_hints": {"claude": {"note": "TODO: refine rendering", "target": "CLAUDE.md"}}, "resolutions": [{"detail": "", "steps": [], "summary": "TODO: capture the recommended resolution"}], "schema_version": 1, "scope": {"domains": [], "flavors": ["claude"], "repos": ["activity-core", "artifact-store", "citation-evidence", "ihp-railiance-probe", "infospace-bench", "railiance-apps", "state-hub", "vergabe-teilnahme"]}, "status": "superseded", "updated_at": "2026-06-07T09:13:20Z", "version": "1.0.0"}
|
||||||
95
session_memory/catalog/sp-problem-tool_thrash-tool-bash.json
Normal file
@@ -0,0 +1,95 @@
{
  "created_at": "2026-06-07T09:13:20Z",
  "distribution_ready": true,
  "id": "sp-problem-tool_thrash-tool-bash",
  "name": "Tool thrash: one tool hammered",
  "polarity": "problem",
  "problem": "A single tool (often Bash or Edit) is invoked far more than any other in a session \u2014 a sign of trial-and-error churn or missing higher-level tooling.",
  "provenance": {
    "detected_at": null,
    "evidence": {
      "cost_impact": 1990.0,
      "cross_flavor": false,
      "flavors": [
        "claude"
      ],
      "frequency": 11,
      "key": "problem:tool_thrash:tool:Bash",
      "locus": "tool:Bash",
      "polarity": "problem",
      "repos": [
        "activity-core",
        "artifact-store",
        "citation-evidence",
        "ihp-railiance-probe",
        "infospace-bench",
        "railiance-apps",
        "state-hub",
        "vergabe-teilnahme"
      ],
      "score": 21890.0,
      "sessions": [
        "claude:0ef1b45c-5c27-4e20-88b3-37daeaa24eca",
        "claude:2c0d14e1-d089-4076-bf35-b134737a261d",
        "claude:30dbad62-c042-41f2-80c1-5953a1100e7f",
        "claude:4307eff6-cd39-4189-be58-79a3acb69d6c",
        "claude:4340b160-2fb6-47d0-897c-3cac0a8855d8",
        "claude:6e0d3d68-872b-4d93-bb09-0691e091314b",
        "claude:8313f946-f008-4e98-9915-31950380e39e",
        "claude:8fabd5ce-6a20-4412-9a8b-0f0763394a78",
        "claude:a9483f07-c9dc-4f71-9fa0-831790ea965e",
        "claude:b1dfbcfa-91f9-4540-823a-26fcfaab7fc8",
        "claude:b4ae9631-a7eb-42a6-acb1-c65b660c4b74"
      ],
      "signal_type": "tool_thrash",
      "title": "problem: tool thrash"
    },
    "promoted_at": "2026-06-07T09:13:20Z",
    "source_key": "problem:tool_thrash:tool:Bash"
  },
  "rendering_hints": {
    "claude": {
      "target": "CLAUDE.md"
    }
  },
  "resolutions": [
    {
      "detail": "Compose a single command/script; run independent calls in parallel.",
      "steps": [
        "Group the steps",
        "Run them as one block"
      ],
      "summary": "Batch related shell work into one script, not many small Bash calls"
    },
    {
      "detail": "Read the region, then one substantive Edit beats many tiny ones.",
      "steps": [],
      "summary": "Make fewer, larger edits with full context"
    },
    {
      "detail": "If the same invocation recurs, wrap it once.",
      "steps": [],
      "summary": "Factor a repeated command pattern into a helper"
    }
  ],
  "schema_version": 1,
  "scope": {
    "domains": [],
    "flavors": [
      "claude"
    ],
    "repos": [
      "activity-core",
      "artifact-store",
      "citation-evidence",
      "ihp-railiance-probe",
      "infospace-bench",
      "railiance-apps",
      "state-hub",
      "vergabe-teilnahme"
    ]
  },
  "status": "approved",
  "updated_at": "2026-06-07T14:21:06Z",
  "version": "1.0.1"
}
@@ -0,0 +1 @@
|
|||||||
|
{"created_at": "2026-06-07T09:13:20Z", "distribution_ready": true, "id": "sp-success-clean_pass-outcome", "name": "cross-flavor success: clean pass", "polarity": "success", "problem": "cross-flavor success: clean pass", "provenance": {"detected_at": null, "evidence": {"cost_impact": 17.0, "cross_flavor": true, "flavors": ["claude", "grok"], "frequency": 17, "key": "success:clean_pass:outcome", "locus": "outcome", "polarity": "success", "repos": ["activity-core", "agentic-resources", "artifact-store", "can-you-assist", "citation-evidence", "infospace-bench", "issue-facade", "ops-bridge", "railiance-apps", "state-hub", "the-custodian", "vergabe-teilnahme"], "score": 433.5, "sessions": ["claude:0ef1b45c-5c27-4e20-88b3-37daeaa24eca", "claude:16bdbec4-b018-4902-9fb5-336f8f3d61c8", "claude:2c0d14e1-d089-4076-bf35-b134737a261d", "claude:30dbad62-c042-41f2-80c1-5953a1100e7f", "claude:4307eff6-cd39-4189-be58-79a3acb69d6c", "claude:4340b160-2fb6-47d0-897c-3cac0a8855d8", "claude:631de76e-fdee-43b5-b091-7b7675467ad1", "claude:63fd4df2-5add-4748-af21-c1544825e006", "claude:6e0d3d68-872b-4d93-bb09-0691e091314b", "claude:8313f946-f008-4e98-9915-31950380e39e", "claude:8fabd5ce-6a20-4412-9a8b-0f0763394a78", "claude:a9483f07-c9dc-4f71-9fa0-831790ea965e", "claude:b4ae9631-a7eb-42a6-acb1-c65b660c4b74", "claude:eb837dd1-5b8e-472e-b9e1-4537b10e03e6", "claude:ee9e84f2-bc35-4eb5-a7ad-aaec5f31d965", "claude:f1b25697-0e5f-45f0-81d1-af0f1762c438", "grok:019e6122-00c0-79f3-b4e5-9c70b77c015d"], "signal_type": "clean_pass", "title": "cross-flavor success: clean pass"}, "promoted_at": "2026-06-07T09:13:20Z", "source_key": "success:clean_pass:outcome"}, "rendering_hints": {"claude": {"note": "TODO: refine rendering", "target": "CLAUDE.md"}, "grok": {"note": "TODO: refine rendering", "target": "instructions"}}, "resolutions": [{"detail": "", "steps": [], "summary": "TODO: capture the recommended resolution"}], "schema_version": 1, "scope": {"domains": [], "flavors": ["claude", "grok"], "repos": 
["activity-core", "agentic-resources", "artifact-store", "can-you-assist", "citation-evidence", "infospace-bench", "issue-facade", "ops-bridge", "railiance-apps", "state-hub", "the-custodian", "vergabe-teilnahme"]}, "status": "superseded", "updated_at": "2026-06-07T09:13:20Z", "version": "1.0.0"}
|
||||||
110
session_memory/catalog/sp-success-clean_pass-outcome.json
Normal file
@@ -0,0 +1,110 @@
{
  "created_at": "2026-06-07T09:13:20Z",
  "distribution_ready": true,
  "id": "sp-success-clean_pass-outcome",
  "name": "Clean pass: tests green, no retries",
  "polarity": "success",
  "problem": "The target session shape: ends in success, runs the test suite, with no errors and no retries \u2014 resolves cheaply and reliably. Seen across many sessions and both Claude and Grok (the highest-value pattern to reinforce).",
  "provenance": {
    "detected_at": null,
    "evidence": {
      "cost_impact": 17.0,
      "cross_flavor": true,
      "flavors": [
        "claude",
        "grok"
      ],
      "frequency": 17,
      "key": "success:clean_pass:outcome",
      "locus": "outcome",
      "polarity": "success",
      "repos": [
        "activity-core",
        "agentic-resources",
        "artifact-store",
        "can-you-assist",
        "citation-evidence",
        "infospace-bench",
        "issue-facade",
        "ops-bridge",
        "railiance-apps",
        "state-hub",
        "the-custodian",
        "vergabe-teilnahme"
      ],
      "score": 433.5,
      "sessions": [
        "claude:0ef1b45c-5c27-4e20-88b3-37daeaa24eca",
        "claude:16bdbec4-b018-4902-9fb5-336f8f3d61c8",
        "claude:2c0d14e1-d089-4076-bf35-b134737a261d",
        "claude:30dbad62-c042-41f2-80c1-5953a1100e7f",
        "claude:4307eff6-cd39-4189-be58-79a3acb69d6c",
        "claude:4340b160-2fb6-47d0-897c-3cac0a8855d8",
        "claude:631de76e-fdee-43b5-b091-7b7675467ad1",
        "claude:63fd4df2-5add-4748-af21-c1544825e006",
        "claude:6e0d3d68-872b-4d93-bb09-0691e091314b",
        "claude:8313f946-f008-4e98-9915-31950380e39e",
        "claude:8fabd5ce-6a20-4412-9a8b-0f0763394a78",
        "claude:a9483f07-c9dc-4f71-9fa0-831790ea965e",
        "claude:b4ae9631-a7eb-42a6-acb1-c65b660c4b74",
        "claude:eb837dd1-5b8e-472e-b9e1-4537b10e03e6",
        "claude:ee9e84f2-bc35-4eb5-a7ad-aaec5f31d965",
        "claude:f1b25697-0e5f-45f0-81d1-af0f1762c438",
        "grok:019e6122-00c0-79f3-b4e5-9c70b77c015d"
      ],
      "signal_type": "clean_pass",
      "title": "cross-flavor success: clean pass"
    },
    "promoted_at": "2026-06-07T09:13:20Z",
    "source_key": "success:clean_pass:outcome"
  },
  "rendering_hints": {
    "claude": {
      "target": "CLAUDE.md"
    },
    "grok": {
      "target": "instructions"
    }
  },
  "resolutions": [
    {
      "detail": "A passing suite is the cheapest proof the change works.",
      "steps": [
        "Make the change",
        "Run the suite",
        "Only then report done"
      ],
      "summary": "Run the test suite before declaring done; let green gate completion"
    },
    {
      "detail": "Small verified steps beat large unverified ones that bounce.",
      "steps": [],
      "summary": "Work incrementally and verify as you go to avoid retries"
    }
  ],
  "schema_version": 1,
  "scope": {
    "domains": [],
    "flavors": [
      "claude",
      "grok"
    ],
    "repos": [
      "activity-core",
      "agentic-resources",
      "artifact-store",
      "can-you-assist",
      "citation-evidence",
      "infospace-bench",
      "issue-facade",
      "ops-bridge",
      "railiance-apps",
      "state-hub",
      "the-custodian",
      "vergabe-teilnahme"
    ]
  },
  "status": "approved",
  "updated_at": "2026-06-07T14:21:06Z",
  "version": "1.0.1"
}
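The `evidence` numbers recorded in the three catalog entries (schema thrash, tool thrash, clean pass) are consistent with a simple rule: `score = cost_impact * frequency`, scaled by 1.5 when `cross_flavor` is true. That rule is inferred from the values in this diff, not taken from the detect code, so the sketch below is only a consistency check under that assumption. `inferred_score` and `CROSS_FLAVOR_BONUS` are hypothetical names.

```python
# Consistency check only: the formula is inferred from the catalog entries,
# not copied from the project's scoring code.
CROSS_FLAVOR_BONUS = 1.5  # assumed multiplier when evidence spans several flavors


def inferred_score(cost_impact: float, frequency: int, cross_flavor: bool) -> float:
    """Return cost_impact * frequency, boosted when the pattern is cross-flavor."""
    base = cost_impact * frequency
    return base * CROSS_FLAVOR_BONUS if cross_flavor else base


# (cost_impact, frequency, cross_flavor, recorded score) from each entry's evidence
RECORDED = {
    "sp-problem-schema_thrash-schema_load": (79.0, 8, False, 632.0),
    "sp-problem-tool_thrash-tool-bash": (1990.0, 11, False, 21890.0),
    "sp-success-clean_pass-outcome": (17.0, 17, True, 433.5),
}

for pattern_id, (cost, freq, cross, recorded) in RECORDED.items():
    assert inferred_score(cost, freq, cross) == recorded, pattern_id
```

If the real scoring differs, for example by weighting repos or sessions, the check may still pass on these three entries by coincidence. It should not be read as documentation of the scorer.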
@@ -20,20 +20,64 @@ root = "~/.claude/projects"
|
|||||||
# glob, relative to root; covers sessions and agent-* sidechains
|
# glob, relative to root; covers sessions and agent-* sidechains
|
||||||
glob = "*/*.jsonl"
|
glob = "*/*.jsonl"
|
||||||
|
|
||||||
# Codex / Grok adapters land in Phase 1 (schemas confirmed in the design doc).
|
# Codex / Grok adapters added in Phase 1 (AGENTIC-WP-0003).
|
||||||
[sources.codex]
|
[sources.codex]
|
||||||
enabled = false
|
enabled = true
|
||||||
root = "~/.codex/sessions"
|
root = "~/.codex/sessions"
|
||||||
glob = "*/*/*/rollout-*.jsonl"
|
glob = "*/*/*/rollout-*.jsonl"
|
||||||
|
|
||||||
[sources.grok]
|
[sources.grok]
|
||||||
enabled = false
|
enabled = true
|
||||||
root = "~/.grok/sessions"
|
root = "~/.grok/sessions"
|
||||||
glob = "*/*/chat_history.jsonl"
|
glob = "*/*/chat_history.jsonl"
|
||||||
|
|
||||||
# Detect phase (AGENTIC-WP-0005): quality filter that drops non-coding/trivial sessions
# before signals form, so health-checks don't mint false-positive patterns.
[detect.quality]
min_events = 20 # below this many events, not a real coding session
min_substantive = 3 # require >= this many substantive (edit/read/shell) tool calls
min_prompt_len = 25 # first prompt shorter than this is treated as trivial

# Measure phase (AGENTIC-WP-0009): persisted baseline/trend of fleet metrics.
[measure]
baselines = "session_memory/measure/baselines.jsonl" # timestamped metric snapshots (committed)

# Weekly retro (AGENTIC-WP-0010): windowed top-3-per-repo report, published to the
# hub as the coding_retro read model that activity-core's weekly schedule consumes.
[retro]
window_days = 7
report_json = "session_memory/retro/last_retro.json" # latest report (committed)
report_md = "session_memory/retro/last_retro.md" # human-readable mirror
hub_url = "http://127.0.0.1:8000" # for --publish (best-effort)

# Distribute phase (AGENTIC-WP-0007): where per-flavor proposals + the active
# registry are written. Proposals are HITL: reviewed, never auto-applied.
[distribute]
proposals_dir = "session_memory/proposals" # reviewable proposals (gitignored, regenerated)
active_registry = "session_memory/distribute/active_patterns.json" # what's proposed/active where (committed)

# Curate phase (AGENTIC-WP-0004): catalog location + promotion evidence bar.
[curate]
catalog_dir = "session_memory/catalog" # files-first Pattern Catalog (committed)
review_log = "session_memory/.store/reviews.jsonl" # remembered decisions (gitignored)
decision_queue = "session_memory/.store/decisions.queue.jsonl" # hub decisions pending sync
state_hub_workstream_id = "b3703684-f60e-42f3-b03e-dabe3e8ce3f4" # AGENTIC-WP-0004

# Evidence bar (OQ5): floors to promote at all, and stricter floors to be
# distribution-eligible (status=approved, distribution_ready=true).
[curate.gate]
min_frequency = 2 # >= this many supporting signals to promote
min_sessions = 2 # >= this many distinct sessions
min_cost_impact = 0.0
dist_require_cross_flavor = false # require cross-flavor evidence to distribute
dist_min_frequency = 3
dist_min_cost_impact = 0.0
|
|
||||||
# cwd basename -> domain slug. Used to tag sessions with their Custodian domain.
|
# cwd basename -> domain slug. Used to tag sessions with their Custodian domain.
|
||||||
[repo_domain_map]
|
[repo_domain_map]
|
||||||
agentic-resources = "helix_forge"
|
agentic-resources = "helix_forge"
|
||||||
the-custodian = "custodian"
|
the-custodian = "custodian"
|
||||||
state-hub = "custodian"
|
state-hub = "custodian"
|
||||||
ops-bridge = "custodian"
|
ops-bridge = "custodian"
|
||||||
|
net-kingdom = "netkingdom"
|
||||||
|
can-you-assist = "coulomb_social"
|
||||||
|
|||||||
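The `[curate.gate]` section in the config hunk above describes a two-tier evidence bar: floors that a candidate must clear to be promoted at all, and stricter floors to become distribution-eligible. A minimal sketch of how such a gate could be evaluated follows. The `Gate` dataclass and `evaluate` helper are hypothetical names, and the use of `>=` for every floor is an assumption, since the real implementation is not shown in this diff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    """Defaults mirror the [curate.gate] keys from the config hunk."""
    min_frequency: int = 2
    min_sessions: int = 2
    min_cost_impact: float = 0.0
    dist_require_cross_flavor: bool = False
    dist_min_frequency: int = 3
    dist_min_cost_impact: float = 0.0


def evaluate(evidence: dict, gate: Gate = Gate()) -> dict:
    """Hypothetical helper: decide promotion and distribution eligibility."""
    frequency = evidence.get("frequency", 0)
    sessions = len(set(evidence.get("sessions", [])))  # distinct sessions only
    cost = evidence.get("cost_impact", 0.0)

    # Tier 1: floors to promote at all.
    promote = (frequency >= gate.min_frequency
               and sessions >= gate.min_sessions
               and cost >= gate.min_cost_impact)
    # Tier 2: stricter floors, only meaningful for promoted candidates.
    cross_ok = evidence.get("cross_flavor", False) or not gate.dist_require_cross_flavor
    distribute = (promote
                  and frequency >= gate.dist_min_frequency
                  and cost >= gate.dist_min_cost_impact
                  and cross_ok)
    return {"promote": promote, "distribution_ready": distribute}


# Illustrative evidence; the session ids are placeholders, not real sessions.
strong = {"frequency": 8, "cost_impact": 79.0, "cross_flavor": False,
          "sessions": [f"claude:session-{i}" for i in range(8)]}
weak = {"frequency": 2, "cost_impact": 1.0, "cross_flavor": False,
        "sessions": ["claude:session-a", "claude:session-b"]}
strong_result = evaluate(strong)
weak_result = evaluate(weak)
```

With the defaults shown, `weak` clears the promotion floors but not `dist_min_frequency`, so it would be promoted without being distribution-ready. A stricter `Gate(dist_require_cross_flavor=True)` would also hold back `strong`.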
@@ -12,6 +12,8 @@ belongs to the Detect phase (PRD §6.2).
|
|||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
|
||||||
import collections
|
import collections
|
||||||
|
import json
|
||||||
|
import re
|
||||||
from typing import Any
|
from typing import Any
|
||||||
|
|
||||||
from .schema import Session, SessionEvent
|
from .schema import Session, SessionEvent
|
||||||
@@ -21,6 +23,22 @@ _FAIL_HINTS = ("error", "failed", "exception", "traceback", "fatal", "non-zero")
|
|||||||
# Substrings suggesting a clean test pass.
|
# Substrings suggesting a clean test pass.
|
||||||
_PASS_HINTS = ("passed", "0 failed", "ok", "success")
|
_PASS_HINTS = ("passed", "0 failed", "ok", "success")
|
||||||
|
|
||||||
# A line that is numbered source content from a Read result (`cat -n` style),
# e.g. "229\t raise InfospaceError(" — code text, never a runtime error.
_NUMBERED_LINE_RE = re.compile(r"^\s*\d+\t")
# Top-level keys that mark a JSON tool-result as an actual error (vs. success).
_JSON_ERROR_KEYS = ("error", "errors", "detail")

# Normalization patterns so the same error collapses to one fingerprint
# regardless of paths / ids / counts (WP-0006 T01).
_UUID_RE = re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b", re.I)
_HEXADDR_RE = re.compile(r"\b0x[0-9a-f]+\b", re.I)
_PATH_RE = re.compile(r"(?:/[\w.\-]+)+/?|[A-Za-z]:\\[\w.\\\-]+")
_NUM_RE = re.compile(r"\b\d+\b")
_WS_RE = re.compile(r"\s+")
_ERR_SAMPLE_MAX = 200
_ERR_FP_MAX = 160
|
|
||||||
|
|
||||||
def infer_outcome(events: list[SessionEvent], blobs: dict[str, str] | None = None) -> str:
|
def infer_outcome(events: list[SessionEvent], blobs: dict[str, str] | None = None) -> str:
|
||||||
"""Heuristic outcome label across flavors (design OQ2).
|
"""Heuristic outcome label across flavors (design OQ2).
|
||||||
@@ -100,6 +118,7 @@ def build_digest(session: Session, events: list[SessionEvent],
|
|||||||
},
|
},
|
||||||
"first_prompt": _first_prompt(events, blobs),
|
"first_prompt": _first_prompt(events, blobs),
|
||||||
"last_assistant": _last_assistant(events, blobs),
|
"last_assistant": _last_assistant(events, blobs),
|
||||||
|
"error_snippets": _error_snippets(events, blobs),
|
||||||
"schema_version": session.schema_version,
|
"schema_version": session.schema_version,
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -148,6 +167,114 @@ def _last_assistant(events, blobs):
|
|||||||
return None
|
return None
|
||||||
|
|
||||||
|
|
||||||
def _error_line(text: str) -> str:
    """Pick the most error-like line from a body.

    Prefers the *last* line matching a fail hint: in a Python traceback the
    actual exception is the final line, while the bare ``Traceback (most recent
    call last):`` header is just noise and is skipped.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    matches = [ln for ln in lines
               if any(h in ln.lower() for h in _FAIL_HINTS)
               and not ln.lower().startswith("traceback")]
    if matches:
        return matches[-1]
    # fall back to any fail-hint line (e.g. only the traceback header), else first
    any_hint = [ln for ln in lines if any(h in ln.lower() for h in _FAIL_HINTS)]
    return any_hint[-1] if any_hint else (lines[0] if lines else "")


def _error_fingerprint(text: str) -> str:
    """Stable, content-addressable key for an error, paths/ids/numbers removed."""
    s = _error_line(text).lower()
    s = _UUID_RE.sub("<uuid>", s)
    s = _HEXADDR_RE.sub("<addr>", s)
    s = _PATH_RE.sub("<path>", s)
    s = _NUM_RE.sub("<n>", s)
    return _WS_RE.sub(" ", s).strip()[:_ERR_FP_MAX]


def _error_body(event: SessionEvent, blobs: dict) -> str:
    """Best available text for a failed event."""
    if event.payload_ref and event.payload_ref in blobs:
        return blobs[event.payload_ref]
    return event.summary or ""


def _looks_like_file_read(body: str) -> bool:
    """True if the body is mostly numbered source lines (a Read result), not an error."""
    lines = [ln for ln in body.splitlines() if ln.strip()]
    if not lines:
        return False
    numbered = sum(1 for ln in lines if _NUMBERED_LINE_RE.match(ln))
    return numbered >= max(3, len(lines) // 2)


def _json_verdict(body: str):
    """Classify a JSON tool-result body: 'error', 'success', or None (not JSON).

    Hub MCP successes look like ``{"result": "..."}`` and mention 'error' deep
    inside summaries but are not failures ('success'). A payload with a top-level
    error key (``{"detail": ...}`` / ``{"error": ...}``) is 'error'. Non-JSON text
    returns None so the plain fail-hint heuristic still applies.
    """
    s = body.strip()
    if not s or s[0] not in "{[":
        return None
    try:
        obj = json.loads(s)
    except (ValueError, TypeError):
        return None
    if isinstance(obj, dict) and any(k in obj for k in _JSON_ERROR_KEYS):
        return "error"
    return "success"


def _is_failed(event: SessionEvent, blobs: dict) -> bool:
    if event.kind == "error":
        return True
    if event.kind == "tool_result":
        body = _error_body(event, blobs)
        if not body.strip():
            return False
        if _looks_like_file_read(body):
            return False
        verdict = _json_verdict(body)
        if verdict is not None:
            return verdict == "error"
        return any(h in body.lower() for h in _FAIL_HINTS)
    return False


def _error_snippets(events: list[SessionEvent], blobs: dict) -> list[dict]:
    """Collapse a session's failures into deduped, normalized error fingerprints.

    Durable in Tier 2 (the raw blobs may be evicted): each entry is
    ``{fingerprint, sample, count, tool}`` with same-fingerprint occurrences
    counted. Ordered by frequency (then first appearance) for stable output.
    """
    agg: dict[str, dict] = {}
    order: list[str] = []
    for e in events:
        if not _is_failed(e, blobs):
            continue
        body = _error_body(e, blobs)
        if not body.strip():
            continue
        fp = _error_fingerprint(body)
        if not fp:
            continue
        if fp not in agg:
            agg[fp] = {"fingerprint": fp, "sample": _error_line(body)[:_ERR_SAMPLE_MAX],
                       "count": 0, "tool": e.tool}
            order.append(fp)
        agg[fp]["count"] += 1
    snippets = [agg[fp] for fp in order]
    snippets.sort(key=lambda s: (-s["count"], order.index(s["fingerprint"])))
    return snippets
|
|
||||||
|
|
||||||
def _read_blob(store, ref):
|
def _read_blob(store, ref):
|
||||||
row = store.db.execute("SELECT path FROM blobs WHERE ref=?", (ref,)).fetchone()
|
row = store.db.execute("SELECT path FROM blobs WHERE ref=?", (ref,)).fetchone()
|
||||||
if not row:
|
if not row:
|
||||||
|
|||||||
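The normalization step in `_error_fingerprint` is easiest to see on two errors that differ only in incidental detail. The sketch below reuses the regexes from the hunk above in a standalone `normalize` function (a hypothetical name; the original also selects the error line first via `_error_line`, which is omitted here). The sample error strings are invented for illustration.

```python
import re

# Same normalization patterns as in the digest hunk.
_UUID_RE = re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b", re.I)
_HEXADDR_RE = re.compile(r"\b0x[0-9a-f]+\b", re.I)
_PATH_RE = re.compile(r"(?:/[\w.\-]+)+/?|[A-Za-z]:\\[\w.\\\-]+")
_NUM_RE = re.compile(r"\b\d+\b")
_WS_RE = re.compile(r"\s+")


def normalize(line: str, max_len: int = 160) -> str:
    """Replace ids, addresses, paths and numbers with placeholders.

    Order matters: UUIDs and hex addresses go first so their digit runs are
    not rewritten piecemeal by the generic number pattern.
    """
    s = line.lower()
    s = _UUID_RE.sub("<uuid>", s)
    s = _HEXADDR_RE.sub("<addr>", s)
    s = _PATH_RE.sub("<path>", s)
    s = _NUM_RE.sub("<n>", s)
    return _WS_RE.sub(" ", s).strip()[:max_len]


# Two invented errors that differ only in path, spacing and attempt count.
a = normalize("FileNotFoundError: /tmp/run-17/out.json missing (attempt 3)")
b = normalize("FileNotFoundError: /var/cache/x/out.json  missing (attempt 12)")
assert a == b == "filenotfounderror: <path> missing (attempt <n>)"

# A session id embedded in a message collapses to a single placeholder.
c = normalize("session 0ef1b45c-5c27-4e20-88b3-37daeaa24eca failed")
assert c == "session <uuid> failed"
```

Because both variants map to the same string, `_error_snippets` would count them under one entry rather than reporting two distinct failures.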
@@ -11,7 +11,7 @@ import json
|
|||||||
from dataclasses import asdict, dataclass, field, fields
|
from dataclasses import asdict, dataclass, field, fields
|
||||||
from typing import Any, Optional
|
from typing import Any, Optional
|
||||||
|
|
||||||
SCHEMA_VERSION = 1
|
SCHEMA_VERSION = 2 # v2: digest carries error_snippets (WP-0006 T01)
|
||||||
|
|
||||||
# Supported agent flavors. ``session_uid`` is always "<flavor>:<native id>".
|
# Supported agent flavors. ``session_uid`` is always "<flavor>:<native id>".
|
||||||
FLAVORS = ("claude", "codex", "grok")
|
FLAVORS = ("claude", "codex", "grok")
|
||||||
|
|||||||
@@ -12,6 +12,7 @@ Tier 2 digest — the invariant that makes budget-based retention non-lossy.
|
|||||||
|
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import hashlib
|
||||||
import json
|
import json
|
||||||
import os
|
import os
|
||||||
import re
|
import re
|
||||||
@@ -28,6 +29,18 @@ def _now() -> str:
|
|||||||
return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
|
return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
|
||||||
|
|
||||||
|
|
||||||
def _fingerprint(ev: SessionEvent, body: Optional[str]) -> str:
    """Stable content fingerprint, independent of seq/payload_ref, for dedup."""
    h = hashlib.sha1()
    parts = [ev.ts or "", ev.kind, ev.role or "", ev.tool or "", ev.summary or "",
             ev.role or "", str(ev.is_sidechain)]
    h.update("\x1f".join(parts).encode("utf-8"))
    if body is not None:
        h.update(b"\x1e")
        h.update(body.encode("utf-8"))
    return h.hexdigest()
|
|
||||||
|
|
||||||
class Store:
|
class Store:
|
||||||
def __init__(self, db_path: str, blob_dir: str):
|
def __init__(self, db_path: str, blob_dir: str):
|
||||||
self.db_path = db_path
|
self.db_path = db_path
|
||||||
@@ -121,14 +134,75 @@ class Store:
|
|||||||
self.db.commit()
|
self.db.commit()
|
||||||
return total
|
return total
|
||||||
|
|
||||||
def ingest(self, bundle) -> None:
|
def ingest(self, bundle) -> int:
|
||||||
"""Persist a full Normalized bundle (session + events + blobs)."""
|
"""Persist a Normalized bundle, merging into any existing session.
|
||||||
|
|
||||||
|
Multiple files can map to one ``session_uid`` (Claude resume/sidechains;
|
||||||
|
Grok multi-file dirs). Events are de-duplicated by content fingerprint and
|
||||||
|
genuinely-new events are appended with offset ``seq`` (design OQ6 / T03).
|
||||||
|
Returns the number of new events written. Idempotent: re-ingesting the
|
||||||
|
same bundle adds nothing.
|
||||||
|
"""
|
||||||
s = bundle.session
|
s = bundle.session
|
||||||
if s.ingested_at is None:
|
existing = self.get_session(s.session_uid)
|
||||||
s.ingested_at = _now()
|
if existing is None:
|
||||||
self.upsert_session(s)
|
if s.ingested_at is None:
|
||||||
self.upsert_events(bundle.events)
|
s.ingested_at = _now()
|
||||||
self.write_blobs(s.session_uid, bundle.blobs)
|
self.upsert_session(s)
|
||||||
        # known fingerprints + current max seq for this session
        seen = self._event_fingerprints(s.session_uid)
        next_seq = self._max_seq(s.session_uid) + 1

        new_events: list[SessionEvent] = []
        new_blobs: dict[str, str] = {}
        old_to_new: dict[int, int] = {}
        for ev in bundle.events:
            body = bundle.blobs.get(ev.payload_ref) if ev.payload_ref else None
            fp = _fingerprint(ev, body)
            if fp in seen:
                continue  # already stored (prior file or prior sweep)
            new_seq = next_seq
            next_seq += 1
            old_to_new[ev.seq] = new_seq
            # remap parent within this bundle; cross-file parents become None
            parent = old_to_new.get(ev.parent_seq) if ev.parent_seq is not None else None
            ref = None
            if body is not None:
                ref = f"blob://{s.session_uid}/{new_seq}"
                new_blobs[ref] = body
            merged = SessionEvent(
                session_uid=s.session_uid, seq=new_seq, parent_seq=parent, ts=ev.ts,
                kind=ev.kind, role=ev.role, tool=ev.tool, summary=ev.summary,
                payload_ref=ref, tokens=ev.tokens, is_sidechain=ev.is_sidechain,
            )
            new_events.append(merged)
            seen.add(fp)

        if new_events:
            self.upsert_events(new_events)
            self.write_blobs(s.session_uid, new_blobs)
        return len(new_events)

    def _max_seq(self, session_uid: str) -> int:
        row = self.db.execute(
            "SELECT COALESCE(MAX(seq), -1) m FROM events WHERE session_uid=?", (session_uid,)
        ).fetchone()
        return int(row["m"])

    def _event_fingerprints(self, session_uid: str) -> set[str]:
        fps: set[str] = set()
        for e in self.get_events(session_uid):
            body = None
            if e.payload_ref:
                r = self.db.execute("SELECT path FROM blobs WHERE ref=?", (e.payload_ref,)).fetchone()
                if r:
                    try:
                        with open(r["path"], "r", encoding="utf-8") as f:
                            body = f.read()
                    except OSError:
                        body = None
            fps.add(_fingerprint(e, body))
        return fps
|
|
||||||
# ---- Tier 2 (digest) ---------------------------------------------------
|
# ---- Tier 2 (digest) ---------------------------------------------------
|
||||||
|
|
||||||
@@ -149,6 +223,22 @@ class Store:
|
|||||||
row = self.db.execute("SELECT json FROM digests WHERE session_uid=?", (session_uid,)).fetchone()
|
row = self.db.execute("SELECT json FROM digests WHERE session_uid=?", (session_uid,)).fetchone()
|
||||||
return json.loads(row["json"]) if row else None
|
return json.loads(row["json"]) if row else None
|
||||||
|
|
||||||
    def list_digests(self) -> list[dict[str, Any]]:
        return [json.loads(r["json"]) for r in self.db.execute("SELECT json FROM digests")]

    def save_patterns(self, patterns: list[dict[str, Any]]) -> None:
        """Persist candidate patterns to a Tier 2 table (replace prior run)."""
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS patterns ("
            "key TEXT PRIMARY KEY, json TEXT NOT NULL, detected_at TEXT NOT NULL)"
        )
        self.db.execute("DELETE FROM patterns")
        self.db.executemany(
            "INSERT INTO patterns(key, json, detected_at) VALUES(?,?,?)",
            [(p["key"], json.dumps(p, sort_keys=True), _now()) for p in patterns],
        )
        self.db.commit()
|
|
||||||
# ---- reads -------------------------------------------------------------
|
||||||
|
|
||||||
def get_session(self, session_uid: str) -> Optional[Session]:
|
||||||
|
|||||||
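The `save_patterns` hunk above replaces the previous detect run wholesale instead of merging into it. A minimal sketch of that delete-then-insert behaviour, assuming an in-memory SQLite database and a free-function form of the method (the real one lives on `Store` and uses `self.db`); `stored_keys` is a hypothetical helper added only for inspection:

```python
import json
import sqlite3
from datetime import datetime, timezone


def _now() -> str:
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def save_patterns(db, patterns):
    """Replace the stored candidate set with ``patterns`` (free-function stand-in)."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS patterns ("
        "key TEXT PRIMARY KEY, json TEXT NOT NULL, detected_at TEXT NOT NULL)"
    )
    # Drop the prior run first, so a pattern that detect no longer emits disappears.
    db.execute("DELETE FROM patterns")
    db.executemany(
        "INSERT INTO patterns(key, json, detected_at) VALUES(?,?,?)",
        [(p["key"], json.dumps(p, sort_keys=True), _now()) for p in patterns],
    )
    db.commit()


def stored_keys(db):
    """Hypothetical helper: keys currently in the table, sorted."""
    return [row[0] for row in db.execute("SELECT key FROM patterns ORDER BY key")]
```

Because `key` is the primary key, two patterns sharing a key within one run would raise `sqlite3.IntegrityError`; the sketch assumes detect emits unique keys.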
9
session_memory/curate/__init__.py
Normal file
@@ -0,0 +1,9 @@
|
|||||||
|
"""Curate phase (PRD §6.3) — review candidate patterns into versioned Solution
|
||||||
|
Patterns held in an in-repo Pattern Catalog.
|
||||||
|
|
||||||
|
Layout mirrors ``detect/``:
|
||||||
|
schema.py Solution Pattern artifact + per-flavor rendering hints (T01)
|
||||||
|
catalog.py versioned, files-first catalog store (T02)
|
||||||
|
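review.py discuss/approve/reject -> promote workflow (T03)
gating.py promotion evidence bar + bloat guard (T04)
decisions.py State Hub decision recording, queued when the hub is down (T05)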
|
||||||
|
__main__.py `python -m session_memory.curate` entrypoint (T06)
|
||||||
|
"""
|
||||||
130
session_memory/curate/__main__.py
Normal file
@@ -0,0 +1,130 @@
|
|||||||
|
"""Curate entrypoint (T06): review detect candidates into the Pattern Catalog.
|
||||||
|
|
||||||
|
python -m session_memory.curate [--config PATH] [--auto-approve] [--json]
|
||||||
|
[--workstream-id ID] [--min-frequency N]
|
||||||
|
|
||||||
|
Refreshes candidate patterns (runs the detect pipeline), then drives them through
|
||||||
|
the review workflow — **interactive** by default, or **batch** with
|
||||||
|
``--auto-approve`` (promote everything clearing the evidence bar, reject the rest)
|
||||||
|
for kaizen-agent runs. Candidates are presented cross-flavor first (detect's
|
||||||
|
ranking). Emits a catalog diff summary and, with ``--json``, a machine-readable
|
||||||
|
result. Approvals land in the files-first catalog; each final decision is logged
|
||||||
|
as a hub decision (queued if the hub is down).
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import json
|
||||||
|
import os
|
||||||
|
|
||||||
|
from ..detect.__main__ import run_detect
|
||||||
|
from ..ingest import _expand, load_config
|
||||||
|
from .catalog import Catalog
|
||||||
|
from .decisions import DecisionRecorder
|
||||||
|
from .gating import bloat_warnings, evaluate, gate_config
|
||||||
|
from .review import APPROVE, DISCUSS, REJECT, ReviewLog, review
|
||||||
|
|
||||||
|
|
||||||
|
def _curate_paths(config: dict):
|
||||||
|
c = config.get("curate", {})
|
||||||
|
catalog_dir = _expand(c.get("catalog_dir", "session_memory/catalog"))
|
||||||
|
review_log = _expand(c.get("review_log", "session_memory/.store/reviews.jsonl"))
|
||||||
|
queue = _expand(c.get("decision_queue", "session_memory/.store/decisions.queue.jsonl"))
|
||||||
|
ws_id = c.get("state_hub_workstream_id")
|
||||||
|
return catalog_dir, review_log, queue, ws_id
|
||||||
|
|
||||||
|
|
||||||
|
def _render_candidate(cand: dict, gate, existing) -> str:
|
||||||
|
g = evaluate(cand, gate)
|
||||||
|
flag = " [CROSS-FLAVOR]" if cand.get("cross_flavor") else ""
|
||||||
|
lines = [
|
||||||
|
f"\n{cand['title']}{flag}",
|
||||||
|
f" key={cand['key']} score={cand.get('score')} freq={cand['frequency']} "
|
||||||
|
f"impact={cand.get('cost_impact')}",
|
||||||
|
f" flavors={','.join(cand.get('flavors', []))} "
|
||||||
|
f"repos={','.join(cand.get('repos', [])) or '-'} sessions={len(cand.get('sessions', []))}",
|
||||||
|
f" gate: promotable={g.promotable} distribution_ready={g.distribution_ready}"
|
||||||
|
+ (f" ({'; '.join(g.reasons)})" if g.reasons else ""),
|
||||||
|
]
|
||||||
|
for w in bloat_warnings(cand, existing):
|
||||||
|
lines.append(f" bloat: {w}")
|
||||||
|
return "\n".join(lines)
|
||||||
|
|
||||||
|
|
||||||
|
def _interactive_decider(gate, catalog):
|
||||||
|
def decide(cand):
|
||||||
|
print(_render_candidate(cand, gate, catalog.list()))
|
||||||
|
while True:
|
||||||
|
choice = input(" [a]pprove / [r]eject / [d]iscuss ? ").strip().lower()
|
||||||
|
if choice in ("a", "approve"):
|
||||||
|
return (APPROVE, input(" rationale: ").strip() or "approved")
|
||||||
|
if choice in ("r", "reject"):
|
||||||
|
return (REJECT, input(" rationale: ").strip() or "rejected")
|
||||||
|
if choice in ("d", "discuss"):
|
||||||
|
return (DISCUSS, "deferred for discussion")
|
||||||
|
return decide
|
||||||
|
|
||||||
|
|
||||||
|
def _auto_decider(gate):
|
||||||
|
"""Batch policy: approve candidates clearing the promote floor, reject the rest."""
|
||||||
|
def decide(cand):
|
||||||
|
g = evaluate(cand, gate)
|
||||||
|
if g.promotable:
|
||||||
|
return (APPROVE, "auto-approved: clears evidence bar")
|
||||||
|
return (REJECT, "auto-rejected: " + "; ".join(g.reasons))
|
||||||
|
return decide
|
||||||
|
|
||||||
|
|
||||||
|
def _summary(result, n_candidates: int) -> str:
|
||||||
|
# Every catalog action except "unchanged" writes a file, so all three count as writes.
added = [k for k, a in result.approved if a in ("added", "versioned", "updated")]
|
||||||
|
lines = [
|
||||||
|
f"# Curate summary ({n_candidates} candidates reviewed)",
|
||||||
|
f" approved : {len(result.approved)} ({', '.join(f'{k}:{a}' for k, a in result.approved) or '-'})",
|
||||||
|
f" rejected : {len(result.rejected)} ({', '.join(result.rejected) or '-'})",
|
||||||
|
f" deferred : {len(result.deferred)} ({', '.join(result.deferred) or '-'})",
|
||||||
|
f" skipped : {len(result.skipped)} (already decided)",
|
||||||
|
f" catalog writes: {len(added)}",
|
||||||
|
]
|
||||||
|
return "\n".join(lines)
|
||||||
|
|
||||||
|
|
||||||
|
def main(argv=None) -> int:
|
||||||
|
here = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
|
||||||
|
ap = argparse.ArgumentParser(description="Curate detect candidates into the Pattern Catalog.")
|
||||||
|
ap.add_argument("--config", default=os.path.join(here, "config.toml"))
|
||||||
|
ap.add_argument("--auto-approve", action="store_true",
|
||||||
|
help="batch mode: promote everything clearing the evidence bar")
|
||||||
|
ap.add_argument("--min-frequency", type=int, default=2,
help="minimum frequency passed to the detect pipeline")
|
||||||
|
ap.add_argument("--workstream-id", default=None, help="hub workstream for decisions")
|
||||||
|
ap.add_argument("--json", action="store_true", help="emit machine-readable JSON")
|
||||||
|
args = ap.parse_args(argv)
|
||||||
|
|
||||||
|
config = load_config(args.config)
|
||||||
|
candidates = run_detect(config, min_frequency=args.min_frequency)
|
||||||
|
|
||||||
|
catalog_dir, review_log_path, queue_path, ws_id = _curate_paths(config)
|
||||||
|
gate = gate_config(config)
|
||||||
|
catalog = Catalog(catalog_dir)
|
||||||
|
log = ReviewLog(review_log_path)
|
||||||
|
recorder = DecisionRecorder(queue_path, workstream_id=args.workstream_id or ws_id)
|
||||||
|
|
||||||
|
decide = _auto_decider(gate) if args.auto_approve else _interactive_decider(gate, catalog)
|
||||||
|
result = review(candidates, decide, catalog, log, gate=gate, recorder=recorder)
|
||||||
|
|
||||||
|
if args.json:
|
||||||
|
print(json.dumps({
|
||||||
|
"approved": result.approved, "rejected": result.rejected,
|
||||||
|
"deferred": result.deferred, "skipped": result.skipped,
|
||||||
|
"decisions_queued": len(recorder.pending()),
|
||||||
|
}, indent=2))
|
||||||
|
else:
|
||||||
|
print(_summary(result, len(candidates)))
|
||||||
|
if recorder.pending():
|
||||||
|
print(f" decisions queued (hub offline): {len(recorder.pending())} "
|
||||||
|
f"-> {queue_path}")
|
||||||
|
return 0
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
raise SystemExit(main())
|
||||||
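Both deciders in this entrypoint share one contract: a callable that takes a candidate dict and returns `(action, rationale)`. A minimal sketch of a third, scripted decider that could stand in for `input()` in tests. `scripted_decider` and `run` are hypothetical names, and `run` is only a stand-in for the real `review()` loop:

```python
from typing import Callable

APPROVE, REJECT, DISCUSS = "approve", "reject", "discuss"

# Same shape as the Decider alias in review.py: (candidate dict) -> (action, rationale)
Decider = Callable[[dict], tuple]


def scripted_decider(script: dict, default: str = DISCUSS) -> Decider:
    """Hypothetical helper: look the action up by candidate key, else defer."""
    def decide(cand: dict) -> tuple:
        action = script.get(cand["key"], default)
        return (action, f"scripted: {action}")
    return decide


def run(candidates: list, decide: Decider) -> dict:
    """Stand-in for the review loop: bucket candidate keys by the chosen action."""
    buckets = {APPROVE: [], REJECT: [], DISCUSS: []}
    for cand in candidates:
        action, _rationale = decide(cand)
        buckets[action].append(cand["key"])
    return buckets
```

Keeping the decision behind a callback is what lets the same review engine serve the interactive prompt, `--auto-approve`, and a test harness without branching on mode.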
148
session_memory/curate/catalog.py
Normal file
@@ -0,0 +1,148 @@
|
|||||||
|
"""Versioned Pattern Catalog — files-first source of truth (FR-U3; T02).
|
||||||
|
|
||||||
|
The catalog is a directory of one JSON file per Solution Pattern
|
||||||
|
(``<catalog_dir>/<pattern-id>.json``). Files originate the work; the State Hub
|
||||||
|
indexes them (ADR-001 / PRD §9). Identity is the pattern ``id`` (derived from the
|
||||||
|
source candidate key), so re-promoting the same detect candidate maps to the same
|
||||||
|
file — dedup is structural, not heuristic.
|
||||||
|
|
||||||
|
:meth:`Catalog.upsert` is the one write path and is **idempotent**:
|
||||||
|
|
||||||
|
* new id -> written as-is (``added``)
|
||||||
|
* same id, identical content -> no write, no version bump (``unchanged``)
|
||||||
|
* same id, only status/flags -> updated in place, no bump (``updated``)
|
||||||
|
* same id, content changed -> version bumped, prior snapshot
|
||||||
|
appended to ``<id>.history.jsonl`` (``versioned``)
|
||||||
|
|
||||||
|
History is append-only alongside the current file, so the catalog dir stays one
|
||||||
|
clean current file per pattern while every superseded version is recoverable.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import json
|
||||||
|
import os
|
||||||
|
from datetime import datetime, timezone
|
||||||
|
from typing import Optional
|
||||||
|
|
||||||
|
from .schema import SolutionPattern
|
||||||
|
|
||||||
|
# Content fields that define a pattern's substance. Version, timestamps, status,
|
||||||
|
# and distribution_ready are metadata — changes to them never bump the version.
|
||||||
|
_CONTENT_KEYS = ("name", "polarity", "problem", "resolutions", "scope",
|
||||||
|
"provenance", "rendering_hints", "covers")
|
||||||
|
|
||||||
|
ADDED = "added"
|
||||||
|
UNCHANGED = "unchanged"
|
||||||
|
UPDATED = "updated"
|
||||||
|
VERSIONED = "versioned"
|
||||||
|
|
||||||
|
|
||||||
|
def _now() -> str:
|
||||||
|
return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
|
||||||
|
|
||||||
|
|
||||||
|
def _content(p: SolutionPattern) -> str:
|
||||||
|
d = p.to_dict()
|
||||||
|
return json.dumps({k: d[k] for k in _CONTENT_KEYS}, sort_keys=True)
|
||||||
|
|
||||||
|
|
||||||
|
class Catalog:
|
||||||
|
"""File-backed catalog of versioned :class:`SolutionPattern` artifacts."""
|
||||||
|
|
||||||
|
def __init__(self, catalog_dir: str) -> None:
|
||||||
|
self.dir = catalog_dir
|
||||||
|
os.makedirs(self.dir, exist_ok=True)
|
||||||
|
|
||||||
|
# --- paths --------------------------------------------------------------
|
||||||
|
|
||||||
|
def _path(self, pattern_id: str) -> str:
|
||||||
|
return os.path.join(self.dir, f"{pattern_id}.json")
|
||||||
|
|
||||||
|
def _history_path(self, pattern_id: str) -> str:
|
||||||
|
return os.path.join(self.dir, f"{pattern_id}.history.jsonl")
|
||||||
|
|
||||||
|
# --- reads --------------------------------------------------------------
|
||||||
|
|
||||||
|
def load(self, pattern_id: str) -> Optional[SolutionPattern]:
|
||||||
|
path = self._path(pattern_id)
|
||||||
|
if not os.path.exists(path):
|
||||||
|
return None
|
||||||
|
with open(path, encoding="utf-8") as fh:
|
||||||
|
return SolutionPattern.from_json(fh.read())
|
||||||
|
|
||||||
|
def list(self) -> list[SolutionPattern]:
|
||||||
|
out: list[SolutionPattern] = []
|
||||||
|
for name in sorted(os.listdir(self.dir)):
|
||||||
|
if name.endswith(".json"):  # "<id>.history.jsonl" ends in ".jsonl", so history files never match
|
||||||
|
with open(os.path.join(self.dir, name), encoding="utf-8") as fh:
|
||||||
|
out.append(SolutionPattern.from_json(fh.read()))
|
||||||
|
return out
|
||||||
|
|
||||||
|
def history(self, pattern_id: str) -> list[dict]:
|
||||||
|
path = self._history_path(pattern_id)
|
||||||
|
if not os.path.exists(path):
|
||||||
|
return []
|
||||||
|
with open(path, encoding="utf-8") as fh:
|
||||||
|
return [json.loads(line) for line in fh if line.strip()]
|
||||||
|
|
||||||
|
def find_for(self, signal_key: str, locus: str = "") -> Optional[SolutionPattern]:
|
||||||
|
"""Best catalog pattern for a detect signal: exact id first, then ``covers``.
|
||||||
|
|
||||||
|
Lets a signal that doesn't share a pattern's exact key (e.g. a
|
||||||
|
``recurring_error`` fingerprint) inherit the curated recommendation when a
|
||||||
|
pattern declares it covers that text.
|
||||||
|
"""
|
||||||
|
exact = self.load(SolutionPattern.make_id(signal_key))
|
||||||
|
if exact is not None:
|
||||||
|
return exact
|
||||||
|
hay = f"{signal_key} {locus}".lower()
|
||||||
|
for p in self.list(): # sorted by id -> deterministic
|
||||||
|
if any(c.lower() in hay for c in p.covers):
|
||||||
|
return p
|
||||||
|
return None
|
||||||
|
|
||||||
|
# --- the single write path ---------------------------------------------
|
||||||
|
|
||||||
|
def upsert(self, pattern: SolutionPattern) -> str:
|
||||||
|
"""Insert or version-update a pattern. Returns the action taken."""
|
||||||
|
existing = self.load(pattern.id)
|
||||||
|
now = _now()
|
||||||
|
|
||||||
|
if existing is None:
|
||||||
|
pattern.created_at = pattern.created_at or now
|
||||||
|
pattern.updated_at = now
|
||||||
|
self._write(pattern)
|
||||||
|
return ADDED
|
||||||
|
|
||||||
|
if _content(existing) == _content(pattern):
|
||||||
|
# substance unchanged — only persist a metadata (status/flag) change
|
||||||
|
if (existing.status == pattern.status
|
||||||
|
and existing.distribution_ready == pattern.distribution_ready):
|
||||||
|
return UNCHANGED
|
||||||
|
existing.status = pattern.status
|
||||||
|
existing.distribution_ready = pattern.distribution_ready
|
||||||
|
existing.updated_at = now
|
||||||
|
self._write(existing)
|
||||||
|
return UPDATED
|
||||||
|
|
||||||
|
# substance changed: archive the old version, bump, write the new one
|
||||||
|
self._append_history(existing)
|
||||||
|
pattern.version = SolutionPattern.bump_version(existing.version)
|
||||||
|
pattern.created_at = existing.created_at or now
|
||||||
|
pattern.updated_at = now
|
||||||
|
self._write(pattern)
|
||||||
|
return VERSIONED
|
||||||
|
|
||||||
|
# --- internals ----------------------------------------------------------
|
||||||
|
|
||||||
|
def _write(self, pattern: SolutionPattern) -> None:
|
||||||
|
with open(self._path(pattern.id), "w", encoding="utf-8") as fh:
|
||||||
|
fh.write(pattern.to_json())
|
||||||
|
fh.write("\n")
|
||||||
|
|
||||||
|
def _append_history(self, superseded: SolutionPattern) -> None:
|
||||||
|
superseded.status = "superseded"
|
||||||
|
with open(self._history_path(superseded.id), "a", encoding="utf-8") as fh:
|
||||||
|
fh.write(json.dumps(superseded.to_dict(), sort_keys=True))
|
||||||
|
fh.write("\n")
|
||||||
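The four outcomes of `Catalog.upsert` are easiest to see side by side. The following is a sketch only, under loud assumptions: patterns are plain dicts instead of `SolutionPattern`, storage is two in-memory dicts instead of files, `CONTENT_KEYS` is a trimmed stand-in for `_CONTENT_KEYS`, and the version is a plain integer because `SolutionPattern.bump_version` is not shown in this diff.

```python
# Trimmed stand-in for _CONTENT_KEYS: the fields whose change bumps the version.
CONTENT_KEYS = ("name", "problem")


def upsert(store: dict, history: dict, pattern: dict) -> str:
    """Dict-based sketch of the single write path; returns the action taken."""
    existing = store.get(pattern["id"])
    if existing is None:
        store[pattern["id"]] = dict(pattern)
        return "added"

    same_content = all(existing[k] == pattern[k] for k in CONTENT_KEYS)
    if same_content:
        if existing["status"] == pattern["status"]:
            return "unchanged"          # nothing to persist
        existing["status"] = pattern["status"]
        return "updated"                # metadata only, no version bump

    # Substance changed: archive the old version, then write the bumped one.
    history.setdefault(pattern["id"], []).append(dict(existing, status="superseded"))
    store[pattern["id"]] = dict(pattern, version=existing["version"] + 1)  # assumed integer bump
    return "versioned"
```

Calling it four times with the same id, first unchanged, then with a new status, then with a new `problem`, walks through `added`, `unchanged`, `updated`, and `versioned` in that order.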
114
session_memory/curate/decisions.py
Normal file
@@ -0,0 +1,114 @@
|
|||||||
|
"""State Hub decision integration (FR-U4; T05).
|
||||||
|
|
||||||
|
Every final promote/reject is recorded as an auditable decision so the rationale,
|
||||||
|
the source candidate key, and an evidence snapshot are traceable. The catalog
|
||||||
|
file remains the durable artifact (ADR-001); the decision is the audit trail.
|
||||||
|
|
||||||
|
The recorder is **graceful under a hub outage** — exactly the condition hit during
|
||||||
|
Phase 1, where statuses were synced after the fact. A pluggable ``sink`` does the
|
||||||
|
actual write (HTTP to the hub, or the MCP ``record_decision`` tool driven by the
|
||||||
|
operator). If the sink is absent or raises, the decision is appended to a local
|
||||||
|
queue (``decisions.queue.jsonl``) and can be replayed later with :meth:`flush`.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import json
|
||||||
|
import os
|
||||||
|
from dataclasses import dataclass, field
|
||||||
|
from datetime import datetime, timezone
|
||||||
|
from typing import Callable, Optional
|
||||||
|
|
||||||
|
# A sink takes a hub-shaped decision payload and persists it (may raise on failure).
|
||||||
|
Sink = Callable[[dict], None]
|
||||||
|
|
||||||
|
|
||||||
|
def _now() -> str:
|
||||||
|
return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
|
||||||
|
|
||||||
|
|
||||||
|
def build_decision(candidate: dict, action: str, rationale: str,
|
||||||
|
*, workstream_id: Optional[str] = None,
|
||||||
|
decided_by: str = "curator") -> dict:
|
||||||
|
"""Shape a curate decision as a State Hub ``record_decision`` payload."""
|
||||||
|
key = candidate["key"]
|
||||||
|
verb = "Promote" if action == "approve" else "Reject"
|
||||||
|
return {
|
||||||
|
"title": f"{verb} pattern candidate {key}",
|
||||||
|
"decision_type": "made",
|
||||||
|
"workstream_id": workstream_id,
|
||||||
|
"rationale": rationale,
|
||||||
|
"decided_by": decided_by,
|
||||||
|
"description": json.dumps({
|
||||||
|
"action": action,
|
||||||
|
"source_key": key,
|
||||||
|
"evidence": candidate,
|
||||||
|
}, sort_keys=True),
|
||||||
|
"recorded_at": _now(),
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class DecisionRecorder:
|
||||||
|
"""Records decisions through ``sink`` with a durable local-queue fallback."""
|
||||||
|
|
||||||
|
queue_path: str
|
||||||
|
sink: Optional[Sink] = None
|
||||||
|
workstream_id: Optional[str] = None
|
||||||
|
decided_by: str = "curator"
|
||||||
|
_queued: int = field(default=0, init=False)
|
||||||
|
|
||||||
|
def record(self, candidate: dict, action: str, rationale: str) -> bool:
|
||||||
|
"""Record one decision. Returns True if the sink accepted it, else queued."""
|
||||||
|
payload = build_decision(candidate, action, rationale,
|
||||||
|
workstream_id=self.workstream_id, decided_by=self.decided_by)
|
||||||
|
if self.sink is not None:
|
||||||
|
try:
|
||||||
|
self.sink(payload)
|
||||||
|
return True
|
||||||
|
except Exception: # hub down / transient — fall through to the queue
|
||||||
|
pass
|
||||||
|
self._append(payload)
|
||||||
|
return False
|
||||||
|
|
||||||
|
def pending(self) -> list[dict]:
|
||||||
|
if not os.path.exists(self.queue_path):
|
||||||
|
return []
|
||||||
|
with open(self.queue_path, encoding="utf-8") as fh:
|
||||||
|
return [json.loads(line) for line in fh if line.strip()]
|
||||||
|
|
||||||
|
def flush(self, sink: Optional[Sink] = None) -> int:
|
||||||
|
"""Replay queued decisions through ``sink``. Returns count synced.
|
||||||
|
|
||||||
|
Stops at the first failure so ordering is preserved; the unsynced tail is
|
||||||
|
rewritten back to the queue.
|
||||||
|
"""
|
||||||
|
sink = sink or self.sink
|
||||||
|
if sink is None:
|
||||||
|
return 0
|
||||||
|
items = self.pending()
|
||||||
|
synced = 0
|
||||||
|
for i, payload in enumerate(items):
|
||||||
|
try:
|
||||||
|
sink(payload)
|
||||||
|
synced += 1
|
||||||
|
except Exception:
|
||||||
|
self._rewrite(items[i:])
|
||||||
|
return synced
|
||||||
|
self._rewrite([])
|
||||||
|
return synced
|
||||||
|
|
||||||
|
# --- internals ----------------------------------------------------------
|
||||||
|
|
||||||
|
def _append(self, payload: dict) -> None:
|
||||||
|
os.makedirs(os.path.dirname(self.queue_path) or ".", exist_ok=True)
|
||||||
|
with open(self.queue_path, "a", encoding="utf-8") as fh:
|
||||||
|
fh.write(json.dumps(payload, sort_keys=True))
|
||||||
|
fh.write("\n")
|
||||||
|
self._queued += 1
|
||||||
|
|
||||||
|
def _rewrite(self, items: list[dict]) -> None:
|
||||||
|
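# flush() can reach this before anything was queued, so make sure the directory exists.
os.makedirs(os.path.dirname(self.queue_path) or ".", exist_ok=True)
with open(self.queue_path, "w", encoding="utf-8") as fh: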
|
||||||
|
for payload in items:
|
||||||
|
fh.write(json.dumps(payload, sort_keys=True))
|
||||||
|
fh.write("\n")
|
||||||
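The queue-and-replay behaviour of `DecisionRecorder` can be exercised without a hub. A minimal sketch, assuming a trimmed `MiniRecorder` that keeps only the queue logic (no payload shaping, no workstream id) and a hypothetical flaky sink that rejects one payload:

```python
import json
import os
import tempfile


class MiniRecorder:
    """Trimmed stand-in for DecisionRecorder: sink first, local JSONL queue as fallback."""

    def __init__(self, queue_path, sink=None):
        self.queue_path = queue_path
        self.sink = sink

    def record(self, payload: dict) -> bool:
        if self.sink is not None:
            try:
                self.sink(payload)
                return True
            except Exception:  # hub down: fall through to the queue
                pass
        with open(self.queue_path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(payload, sort_keys=True) + "\n")
        return False

    def pending(self) -> list:
        if not os.path.exists(self.queue_path):
            return []
        with open(self.queue_path, encoding="utf-8") as fh:
            return [json.loads(line) for line in fh if line.strip()]

    def flush(self, sink) -> int:
        items = self.pending()
        synced = 0
        for i, payload in enumerate(items):
            try:
                sink(payload)
                synced += 1
            except Exception:
                self._rewrite(items[i:])  # keep the failed item and everything after it
                return synced
        self._rewrite([])
        return synced

    def _rewrite(self, items: list) -> None:
        with open(self.queue_path, "w", encoding="utf-8") as fh:
            for payload in items:
                fh.write(json.dumps(payload, sort_keys=True) + "\n")


def demo() -> tuple:
    delivered = []

    def flaky(payload):  # hypothetical sink that fails on the second decision
        if payload["n"] == 2:
            raise ConnectionError("hub offline")
        delivered.append(payload["n"])

    with tempfile.TemporaryDirectory() as tmp:
        rec = MiniRecorder(os.path.join(tmp, "decisions.queue.jsonl"))
        for n in (1, 2, 3):
            rec.record({"n": n})  # no sink configured, so all three are queued
        first = rec.flush(flaky)
        left = [p["n"] for p in rec.pending()]
        second = rec.flush(lambda p: delivered.append(p["n"]))
        return first, left, second, rec.pending(), delivered
```

Stopping at the first failure means a later decision is never delivered ahead of an earlier one, at the cost of one bad payload blocking the rest of the queue until it succeeds.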
117
session_memory/curate/gating.py
Normal file
@@ -0,0 +1,117 @@
|
|||||||
|
"""Promotion evidence-bar + bloat guard (design OQ5/OQ6; T04).
|
||||||
|
|
||||||
|
Two gates protect the catalog:
|
||||||
|
|
||||||
|
* **Evidence bar (OQ5)** — a candidate must clear configurable floors
|
||||||
|
(frequency, distinct supporting sessions) before it may be promoted at all.
|
||||||
|
A separate, stricter bar decides whether the promoted pattern is
|
||||||
|
*distribution-eligible* (``status="approved"``, ``distribution_ready=True``)
|
||||||
|
vs. merely ``provisional`` — the minimum trustworthy evidence before a pattern
|
||||||
|
is allowed near live agent environments.
|
||||||
|
|
||||||
|
* **Bloat guard (OQ6)** — flags candidates that would add little: a duplicate of
|
||||||
|
an already-catalogued pattern, or a near-duplicate sharing the same
|
||||||
|
signal-type+locus. Keeps the catalog lean so agent context budgets aren't
|
||||||
|
degraded by low-value instructions.
|
||||||
|
|
||||||
|
Knobs live under ``[curate]`` in ``config.toml``; :func:`gate_config` reads them
|
||||||
|
with safe defaults so the module also works config-free (tests).
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
from dataclasses import dataclass, field
|
||||||
|
from typing import Optional
|
||||||
|
|
||||||
|
from .schema import SolutionPattern
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class GateConfig:
|
||||||
|
# promotion floor (OQ5)
|
||||||
|
min_frequency: int = 2
|
||||||
|
min_sessions: int = 2
|
||||||
|
min_cost_impact: float = 0.0
|
||||||
|
# distribution-eligibility floor (stricter; OQ5)
|
||||||
|
dist_require_cross_flavor: bool = False
|
||||||
|
dist_min_frequency: int = 3
|
||||||
|
dist_min_cost_impact: float = 0.0
|
||||||
|
|
||||||
|
|
||||||
|
def gate_config(config: Optional[dict] = None) -> GateConfig:
|
||||||
|
c = (config or {}).get("curate", {})
|
||||||
|
g = c.get("gate", {}) if isinstance(c, dict) else {}
|
||||||
|
return GateConfig(
|
||||||
|
min_frequency=g.get("min_frequency", 2),
|
||||||
|
min_sessions=g.get("min_sessions", 2),
|
||||||
|
min_cost_impact=g.get("min_cost_impact", 0.0),
|
||||||
|
dist_require_cross_flavor=g.get("dist_require_cross_flavor", False),
|
||||||
|
dist_min_frequency=g.get("dist_min_frequency", 3),
|
||||||
|
dist_min_cost_impact=g.get("dist_min_cost_impact", 0.0),
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class GateResult:
|
||||||
|
promotable: bool
|
||||||
|
distribution_ready: bool
|
||||||
|
status: str # "approved" if distribution-ready else "provisional"
|
||||||
|
reasons: list = field(default_factory=list)
|
||||||
|
|
||||||
|
|
||||||
|
def _n_sessions(candidate: dict) -> int:
|
||||||
|
return len(candidate.get("sessions", []) or [])
|
||||||
|
|
||||||
|
|
||||||
|
def evaluate(candidate: dict, config: Optional[GateConfig] = None) -> GateResult:
|
||||||
|
"""Decide whether a candidate may be promoted, and at what trust level."""
|
||||||
|
cfg = config or GateConfig()
|
||||||
|
reasons: list[str] = []
|
||||||
|
|
||||||
|
freq = candidate.get("frequency", 0)
|
||||||
|
sessions = _n_sessions(candidate)
|
||||||
|
impact = candidate.get("cost_impact") or 0.0  # treat an explicit None like a missing value
|
||||||
|
|
||||||
|
promotable = True
|
||||||
|
if freq < cfg.min_frequency:
|
||||||
|
promotable = False
|
||||||
|
reasons.append(f"frequency {freq} < min {cfg.min_frequency}")
|
||||||
|
if sessions < cfg.min_sessions:
|
||||||
|
promotable = False
|
||||||
|
reasons.append(f"sessions {sessions} < min {cfg.min_sessions}")
|
||||||
|
if impact < cfg.min_cost_impact:
|
||||||
|
promotable = False
|
||||||
|
reasons.append(f"cost_impact {impact} < min {cfg.min_cost_impact}")
|
||||||
|
|
||||||
|
dist = promotable
|
||||||
|
if cfg.dist_require_cross_flavor and not candidate.get("cross_flavor", False):
|
||||||
|
dist = False
|
||||||
|
reasons.append("not cross-flavor (required for distribution)")
|
||||||
|
if freq < cfg.dist_min_frequency:
|
||||||
|
dist = False
|
||||||
|
reasons.append(f"frequency {freq} < distribution min {cfg.dist_min_frequency}")
|
||||||
|
if impact < cfg.dist_min_cost_impact:
|
||||||
|
dist = False
|
||||||
|
reasons.append(f"cost_impact {impact} < distribution min {cfg.dist_min_cost_impact}")
|
||||||
|
|
||||||
|
return GateResult(
|
||||||
|
promotable=promotable,
|
||||||
|
distribution_ready=bool(dist),
|
||||||
|
status="approved" if dist else "provisional",
|
||||||
|
reasons=reasons,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def bloat_warnings(candidate: dict, existing: list[SolutionPattern]) -> list[str]:
|
||||||
|
"""Flag low-value adds against what is already catalogued (OQ6)."""
|
||||||
|
warnings: list[str] = []
|
||||||
|
cand_id = SolutionPattern.make_id(candidate["key"])
|
||||||
|
_, sig_type, locus = (candidate["key"].split(":", 2) + ["", ""])[:3]
|
||||||
|
for p in existing:
|
||||||
|
if p.id == cand_id:
|
||||||
|
warnings.append(f"duplicate of catalogued pattern {p.id}")
|
||||||
|
continue
|
||||||
|
p_parts = (p.provenance.source_key.split(":", 2) + ["", ""])[:3]
|
||||||
|
if (p_parts[1], p_parts[2]) == (sig_type, locus):
|
||||||
|
warnings.append(f"near-duplicate of {p.id} (same {sig_type}/{locus})")
|
||||||
|
return warnings
|
||||||
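The two-level evidence bar is easier to follow with concrete numbers. A sketch using the default floors from `GateConfig` (frequency 2, sessions 2, distribution frequency 3), assuming a trimmed `Gate` dataclass and an `evaluate` that returns a plain tuple instead of `GateResult`; the cost-impact and cross-flavor checks are left out for brevity:

```python
from dataclasses import dataclass


@dataclass
class Gate:
    """Trimmed stand-in for GateConfig, same defaults as in the diff."""
    min_frequency: int = 2
    min_sessions: int = 2
    dist_min_frequency: int = 3


def evaluate(cand: dict, cfg=None) -> tuple:
    """Return (promotable, status) for a candidate; simplified from gating.evaluate."""
    cfg = cfg or Gate()
    freq = cand.get("frequency", 0)
    sessions = len(cand.get("sessions") or [])

    # Promotion floor: every check must pass.
    promotable = freq >= cfg.min_frequency and sessions >= cfg.min_sessions
    # Distribution floor: stricter, and only reachable when already promotable.
    dist = promotable and freq >= cfg.dist_min_frequency
    return promotable, ("approved" if dist else "provisional")
```

With the defaults, a candidate seen twice across two sessions is promotable but stays `provisional`; a third occurrence is what makes it `approved` and distribution-ready.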
158
session_memory/curate/review.py
Normal file
@@ -0,0 +1,158 @@
|
|||||||
|
"""Curation review workflow (FR-U1/FR-U2; T03).
|
||||||
|
|
||||||
|
Drives Phase 1 detect candidates through a **discuss / approve / reject** review
|
||||||
|
and, on approve, promotes the candidate into a :class:`SolutionPattern` written to
|
||||||
|
the :class:`Catalog`. The actual decision is supplied by a ``decide`` callback so
|
||||||
|
this engine stays UI-free — the ``__main__`` entrypoint (T06) plugs in interactive
|
||||||
|
or batch (auto-approve) logic.
|
||||||
|
|
||||||
|
Re-review is **idempotent** via a :class:`ReviewLog`: a candidate already decided
|
||||||
|
is skipped unless its *evidence fingerprint* changed (new sessions/frequency), so
|
||||||
|
a prior **reject** is remembered and not re-surfaced, and a prior **approve** is
|
||||||
|
updated in place rather than duplicated (catalog dedup does the rest).
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import hashlib
|
||||||
|
import json
|
||||||
|
import os
|
||||||
|
from dataclasses import dataclass, field
|
||||||
|
from datetime import datetime, timezone
|
||||||
|
from typing import Callable, Optional
|
||||||
|
|
||||||
|
from .catalog import Catalog
|
||||||
|
from .decisions import DecisionRecorder
|
||||||
|
from .gating import GateConfig, evaluate
|
||||||
|
from .schema import Provenance, Resolution, Scope, SolutionPattern
|
||||||
|
|
||||||
|
APPROVE = "approve"
|
||||||
|
REJECT = "reject"
|
||||||
|
DISCUSS = "discuss" # defer — no final decision recorded
|
||||||
|
|
||||||
|
# Default per-flavor rendering-hint stubs a reviewer can later refine (OQ4).
|
||||||
|
_DEFAULT_TARGET = {"claude": "CLAUDE.md", "codex": "AGENTS.md", "grok": "instructions"}
|
||||||
|
|
||||||
|
# A decision callback: (candidate dict) -> (action, rationale)
|
||||||
|
Decider = Callable[[dict], tuple]
|
||||||
|
|
||||||
|
|
||||||
|
def _now() -> str:
|
||||||
|
return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
|
||||||
|
|
||||||
|
|
||||||
|
def evidence_fingerprint(candidate: dict) -> str:
|
||||||
|
"""Stable hash of the evidence that would justify (re)reviewing a candidate."""
|
||||||
|
keys = ("frequency", "cost_impact", "flavors", "repos", "sessions", "cross_flavor")
|
||||||
|
payload = {k: candidate.get(k) for k in keys}
|
||||||
|
return hashlib.sha1(json.dumps(payload, sort_keys=True).encode("utf-8")).hexdigest()
|
||||||
|
|
||||||
|
|
||||||
|
def candidate_to_pattern(candidate: dict, *, status: str = "provisional",
|
||||||
|
distribution_ready: bool = False) -> SolutionPattern:
|
||||||
|
"""Build a Solution Pattern from a detect candidate.
|
||||||
|
|
||||||
|
``status``/``distribution_ready`` come from the evidence gate (T04); they
|
||||||
|
default to a provisional, non-distribution-ready pattern when ungated.
|
||||||
|
"""
|
||||||
|
src = candidate["key"]
|
||||||
|
flavors = list(candidate.get("flavors", []))
|
||||||
|
hints = {f: {"target": _DEFAULT_TARGET.get(f, ""), "note": "TODO: refine rendering"}
|
||||||
|
for f in flavors}
|
||||||
|
return SolutionPattern(
|
||||||
|
id=SolutionPattern.make_id(src),
|
||||||
|
name=candidate.get("title") or src,
|
||||||
|
version="1.0.0",
|
||||||
|
polarity=candidate.get("polarity", "problem"),
|
||||||
|
problem=candidate.get("title") or src,
|
||||||
|
resolutions=[Resolution(summary="TODO: capture the recommended resolution")],
|
||||||
|
scope=Scope(flavors=flavors, repos=list(candidate.get("repos", []))),
|
||||||
|
provenance=Provenance(source_key=src, evidence=dict(candidate), promoted_at=_now()),
|
||||||
|
rendering_hints=hints,
|
||||||
|
status=status,
|
||||||
|
distribution_ready=distribution_ready,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class ReviewLog:
|
||||||
|
"""Append-only record of final decisions, keyed by candidate source key."""
|
||||||
|
|
||||||
|
path: str
|
||||||
|
_by_key: dict = field(default_factory=dict)
|
||||||
|
|
||||||
|
def __post_init__(self) -> None:
|
||||||
|
if os.path.exists(self.path):
|
||||||
|
with open(self.path, encoding="utf-8") as fh:
|
||||||
|
for line in fh:
|
||||||
|
if line.strip():
|
||||||
|
rec = json.loads(line)
|
||||||
|
self._by_key[rec["source_key"]] = rec # last write wins
|
||||||
|
|
||||||
|
def prior(self, source_key: str) -> Optional[dict]:
|
||||||
|
return self._by_key.get(source_key)
|
||||||
|
|
||||||
|
def already_decided(self, candidate: dict) -> bool:
|
||||||
|
rec = self._by_key.get(candidate["key"])
|
||||||
|
        return bool(rec) and rec["fingerprint"] == evidence_fingerprint(candidate)

    def record(self, candidate: dict, action: str, rationale: str) -> None:
        rec = {
            "source_key": candidate["key"],
            "action": action,
            "rationale": rationale,
            "fingerprint": evidence_fingerprint(candidate),
            "ts": _now(),
        }
        self._by_key[candidate["key"]] = rec
        os.makedirs(os.path.dirname(self.path) or ".", exist_ok=True)
        with open(self.path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(rec, sort_keys=True))
            fh.write("\n")


@dataclass
class ReviewResult:
    approved: list = field(default_factory=list)  # (source_key, catalog_action)
    rejected: list = field(default_factory=list)  # source_key
    deferred: list = field(default_factory=list)  # source_key (discuss)
    skipped: list = field(default_factory=list)  # source_key (already decided)


def review(candidates: list[dict], decide: Decider, catalog: Catalog,
           log: ReviewLog, gate: Optional[GateConfig] = None,
           recorder: Optional[DecisionRecorder] = None) -> ReviewResult:
    """Run each candidate through ``decide``; promote approvals into ``catalog``.

    When a ``gate`` (T04 evidence bar) is supplied, the promoted pattern's
    ``status``/``distribution_ready`` are set from the gate evaluation, so an
    approved-but-thin candidate lands as ``provisional`` rather than
    distribution-ready. When a ``recorder`` (T05) is supplied, each final
    promote/reject is logged as an auditable hub decision (queued if the hub is
    down).
    """
    result = ReviewResult()
    for cand in candidates:
        key = cand["key"]
        if log.already_decided(cand):
            result.skipped.append(key)
            continue
        action, rationale = decide(cand)
        if action == DISCUSS:
            result.deferred.append(key)
            continue  # not a final decision — leave for a later pass
        if action == APPROVE:
            g = evaluate(cand, gate) if gate is not None else None
            pattern = (candidate_to_pattern(cand, status=g.status,
                                            distribution_ready=g.distribution_ready)
                       if g is not None else candidate_to_pattern(cand))
            cat_action = catalog.upsert(pattern)
            result.approved.append((key, cat_action))
        elif action == REJECT:
            result.rejected.append(key)
        else:
            raise ValueError(f"unknown review action {action!r}")
        log.record(cand, action, rationale)
        if recorder is not None:
            recorder.record(cand, action, rationale)
    return result
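The control flow of `review` can be sketched with in-memory stand-ins. This is a simplified illustration, not the module's code: the string values of the action constants, the `triage` function, the `Outcome` class, and the `decided` set are all hypothetical. The real `ReviewLog.already_decided` also compares an evidence fingerprint instead of looking only at the key, and the catalog, gate, and recorder are left out.

```python
from dataclasses import dataclass, field

# Hypothetical string values; the real constants are defined elsewhere in the module.
APPROVE, REJECT, DISCUSS = "approve", "reject", "discuss"


@dataclass
class Outcome:
    approved: list = field(default_factory=list)
    rejected: list = field(default_factory=list)
    deferred: list = field(default_factory=list)
    skipped: list = field(default_factory=list)


def triage(candidates, decide, decided):
    """Skip decided keys, defer DISCUSS, and remember only final decisions."""
    out = Outcome()
    for cand in candidates:
        key = cand["key"]
        if key in decided:
            out.skipped.append(key)
            continue
        action, _rationale = decide(cand)
        if action == DISCUSS:
            out.deferred.append(key)  # not final, so it is not remembered
            continue
        if action == APPROVE:
            out.approved.append(key)
        elif action == REJECT:
            out.rejected.append(key)
        else:
            raise ValueError(f"unknown review action {action!r}")
        decided.add(key)
    return out


def decide(cand):
    # Toy decider: the verdict is carried on the candidate itself.
    return cand["verdict"], "demo rationale"


cands = [
    {"key": "problem:retry_storm:retries", "verdict": APPROVE},
    {"key": "problem:abandoned:outcome", "verdict": REJECT},
    {"key": "success:clean_pass:outcome", "verdict": DISCUSS},
]
decided = set()
first = triage(cands, decide, decided)
second = triage(cands, decide, decided)
print(first)
print(second)
```

On the second pass the approved and rejected candidates are skipped, while the deferred one is offered to the decider again, which matches the "leave for a later pass" comment in the loop.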
160
session_memory/curate/schema.py
Normal file
@@ -0,0 +1,160 @@
"""Solution Pattern schema (PRD §6.3 FR-U2; design OQ4) — T01.

A **Solution Pattern** is the curated, reviewed artifact a candidate pattern is
promoted into: a named, versioned record pairing a problem (or success) with one
or more recommended resolutions, written **flavor-agnostically**. Everything a
distributor needs to render a native artifact lives in a *separate*
``rendering_hints`` sub-structure, keyed by flavor — so the core stays neutral
(FR-A1/FR-A2) while Phase 3 distributors still get enough to render well (OQ4).

The artifact is the durable unit of the Pattern Catalog (T02): files originate,
the State Hub indexes (ADR-001). Serialization is deterministic (sorted keys) so
catalog files diff cleanly and re-saving an unchanged pattern is a no-op.
"""

from __future__ import annotations

import json
import re
from dataclasses import asdict, dataclass, field, fields
from typing import Any, Optional

from ..core.schema import FLAVORS

SCHEMA_VERSION = 1

# Lifecycle of a catalogued pattern.
#   provisional — promoted but below the distribution evidence bar (OQ5)
#   approved    — meets the bar; distribution-eligible (Phase 3)
#   rejected    — reviewed and declined; remembered so it is not re-surfaced
#   superseded  — replaced by a newer version of the same pattern id
STATUSES = ("provisional", "approved", "rejected", "superseded")

POLARITIES = ("problem", "success")


@dataclass
class Resolution:
    """One recommended resolution for the pattern's problem (FR-U2)."""

    summary: str
    detail: str = ""
    steps: list[str] = field(default_factory=list)


@dataclass
class Scope:
    """Where the pattern applies (FR-X2 input). Empty list == unrestricted."""

    repos: list[str] = field(default_factory=list)
    domains: list[str] = field(default_factory=list)
    flavors: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        bad = [f for f in self.flavors if f not in FLAVORS]
        if bad:
            raise ValueError(f"unknown flavor(s) in scope {bad!r}; expected {FLAVORS}")


@dataclass
class Provenance:
    """Trace back to the detect candidate this pattern was promoted from."""

    source_key: str  # the detect Pattern.key — stable cluster identity
    evidence: dict[str, Any] = field(default_factory=dict)  # snapshot of the candidate
    detected_at: Optional[str] = None
    promoted_at: Optional[str] = None


@dataclass
class SolutionPattern:
    """A curated, versioned solution pattern (PRD §5 / §6.3)."""

    id: str  # stable, derived from provenance.source_key
    name: str
    version: str  # semantic, e.g. "1.0.0"
    polarity: str  # problem | success
    problem: str  # human-readable description of the recurring situation
    resolutions: list[Resolution] = field(default_factory=list)
    scope: Scope = field(default_factory=Scope)
    provenance: Provenance = field(default_factory=lambda: Provenance(source_key=""))
    # per-flavor rendering hints, kept OUT of the agnostic core (OQ4):
    #   {"claude": {...}, "codex": {...}, "grok": {...}}
    rendering_hints: dict[str, dict[str, Any]] = field(default_factory=dict)
    # other signal keys/loci this pattern's recommendation also applies to —
    # lowercase substrings matched against a candidate signal's key+locus, so a
    # detect signal that doesn't share this pattern's exact key (e.g. a
    # recurring_error fingerprint) can still inherit the curated resolution.
    covers: list[str] = field(default_factory=list)
    status: str = "provisional"
    distribution_ready: bool = False
    created_at: Optional[str] = None
    updated_at: Optional[str] = None
    schema_version: int = SCHEMA_VERSION

    def __post_init__(self) -> None:
        if self.polarity not in POLARITIES:
            raise ValueError(f"unknown polarity {self.polarity!r}; expected {POLARITIES}")
        if self.status not in STATUSES:
            raise ValueError(f"unknown status {self.status!r}; expected {STATUSES}")
        bad = [f for f in self.rendering_hints if f not in FLAVORS]
        if bad:
            raise ValueError(f"unknown flavor(s) in rendering_hints {bad!r}; expected {FLAVORS}")

    # --- identity / versioning helpers -------------------------------------

    @staticmethod
    def make_id(source_key: str) -> str:
        """Stable catalog id from a detect candidate key (``polarity:type:locus``).

        Identity is the source key, so re-promoting the same candidate maps to the
        same pattern (dedup in T02), independent of wording or version.
        """
        slug = re.sub(r"[^a-z0-9_]+", "-", source_key.lower()).strip("-")
        return f"sp-{slug}"

    @staticmethod
    def bump_version(version: str, level: str = "patch") -> str:
        """Increment a ``major.minor.patch`` version string."""
        parts = (version.split(".") + ["0", "0", "0"])[:3]
        major, minor, patch = (int(p) for p in parts)
        if level == "major":
            major, minor, patch = major + 1, 0, 0
        elif level == "minor":
            minor, patch = minor + 1, 0
        else:
            patch += 1
        return f"{major}.{minor}.{patch}"

    # --- serialization ------------------------------------------------------

    def to_dict(self) -> dict[str, Any]:
        return asdict(self)

    def to_json(self) -> str:
        return json.dumps(self.to_dict(), sort_keys=True, indent=2)

    @classmethod
    def from_dict(cls, d: dict[str, Any]) -> "SolutionPattern":
        d = dict(d)
        resolutions = [Resolution(**{k: v for k, v in r.items() if k in _RESOLUTION_FIELDS})
                       for r in d.pop("resolutions", [])]
        scope = d.pop("scope", None)
        prov = d.pop("provenance", None)
        obj = cls(**{k: v for k, v in d.items() if k in _PATTERN_FIELDS})
        obj.resolutions = resolutions
        if scope is not None:
            obj.scope = Scope(**{k: v for k, v in scope.items() if k in _SCOPE_FIELDS})
        if prov is not None:
            obj.provenance = Provenance(**{k: v for k, v in prov.items() if k in _PROV_FIELDS})
        return obj

    @classmethod
    def from_json(cls, s: str) -> "SolutionPattern":
        return cls.from_dict(json.loads(s))


_PATTERN_FIELDS = {f.name for f in fields(SolutionPattern)}
_RESOLUTION_FIELDS = {f.name for f in fields(Resolution)}
_SCOPE_FIELDS = {f.name for f in fields(Scope)}
_PROV_FIELDS = {f.name for f in fields(Provenance)}
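To make the identity and versioning helpers concrete, here is a small worked example. The two functions are copied out of `SolutionPattern` as standalone functions so the block runs on its own; the sample keys and version strings are made up for illustration.

```python
import json
import re


def make_id(source_key: str) -> str:
    # Lowercase, then collapse every run of characters outside [a-z0-9_] into "-".
    slug = re.sub(r"[^a-z0-9_]+", "-", source_key.lower()).strip("-")
    return f"sp-{slug}"


def bump_version(version: str, level: str = "patch") -> str:
    # Missing components are padded with "0", so "2" is read as 2.0.0.
    parts = (version.split(".") + ["0", "0", "0"])[:3]
    major, minor, patch = (int(p) for p in parts)
    if level == "major":
        major, minor, patch = major + 1, 0, 0
    elif level == "minor":
        minor, patch = minor + 1, 0
    else:
        patch += 1
    return f"{major}.{minor}.{patch}"


print(make_id("problem:retry_storm:retries"))    # sp-problem-retry_storm-retries
print(make_id("Problem:tool_thrash:tool:Bash"))  # sp-problem-tool_thrash-tool-bash
print(bump_version("1.4.2"))                     # 1.4.3
print(bump_version("1.4.2", "minor"))            # 1.5.0
print(bump_version("1.4.2", "major"))            # 2.0.0
print(bump_version("2"))                         # 2.0.1

# Sorted keys make the serialized form independent of insertion order.
a = json.dumps({"name": "x", "id": "sp-x"}, sort_keys=True, indent=2)
b = json.dumps({"id": "sp-x", "name": "x"}, sort_keys=True, indent=2)
print(a == b)                                    # True
```

Two properties are worth noting. Any unrecognised `level` falls through to a patch bump, and a component that is not a plain integer (for example a pre-release suffix such as `1.0.0-rc1`) makes `int()` raise `ValueError`, so callers that need pre-release tags would have to strip them first.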
1
session_memory/detect/__init__.py
Normal file
@@ -0,0 +1 @@
"""Detect: extract signals from sessions, cluster into candidate patterns."""
72
session_memory/detect/__main__.py
Normal file
@@ -0,0 +1,72 @@
"""Detect entrypoint (T07): digests -> signals -> clusters -> report.

    python -m session_memory.detect [--config PATH] [--json] [--min-frequency N]

Reads Tier 2 digests from the store, extracts signals, clusters them into
candidate patterns, persists the candidates, and prints a ranked report
(cross-flavor first) — the input to the Curate phase (Phase 2).
"""

from __future__ import annotations

import argparse
import json
import os

from ..core.store import Store
from ..ingest import _expand, load_config
from .cluster import cluster
from .quality import filter_real, quality_config
from .signals import extract_signals


def run_detect(config: dict, *, min_frequency: int = 2) -> list[dict]:
    store_cfg = config.get("store", {})
    store = Store(_expand(store_cfg["db_path"]), _expand(store_cfg["blob_dir"]))
    digests = filter_real(store.list_digests(), quality_config(config))
    signals = extract_signals(digests)
    patterns = [p.to_dict() for p in cluster(signals, min_frequency=min_frequency)]
    store.save_patterns(patterns)
    store.close()
    return patterns


def _format_report(patterns: list[dict], n_digests: int) -> str:
    lines = [f"# Candidate Patterns ({len(patterns)} from {n_digests} sessions)", ""]
    if not patterns:
        lines.append("No recurring patterns above the frequency threshold yet.")
        return "\n".join(lines)
    for i, p in enumerate(patterns, 1):
        flag = " [CROSS-FLAVOR]" if p["cross_flavor"] else ""
        lines.append(f"{i}. {p['title']}{flag}")
        lines.append(f" score={p['score']} freq={p['frequency']} "
                     f"impact={p['cost_impact']} flavors={','.join(p['flavors'])}")
        lines.append(f" repos={','.join(p['repos']) or '-'} "
                     f"sessions={len(p['sessions'])}")
        lines.append("")
    return "\n".join(lines)


def main(argv=None) -> int:
    here = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
    ap = argparse.ArgumentParser(description="Detect candidate patterns from session digests.")
    ap.add_argument("--config", default=os.path.join(here, "config.toml"))
    ap.add_argument("--min-frequency", type=int, default=2)
    ap.add_argument("--json", action="store_true", help="emit machine-readable JSON")
    args = ap.parse_args(argv)

    config = load_config(args.config)
    store_cfg = config.get("store", {})
    all_digests = Store(_expand(store_cfg["db_path"]), _expand(store_cfg["blob_dir"])).list_digests()
    n = len(filter_real(all_digests, quality_config(config)))
    patterns = run_detect(config, min_frequency=args.min_frequency)

    if args.json:
        print(json.dumps(patterns, indent=2))
    else:
        print(_format_report(patterns, n))
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
78
session_memory/detect/cluster.py
Normal file
@@ -0,0 +1,78 @@
"""Pattern clusterer + evidence (PRD §5, §6.2; T05/T06).

Groups recurring :class:`Signal`s into candidate ``Pattern`` records. Clustering
is deterministic and keyed on ``(polarity, signal-type, locus)`` — enough to
surface "the same thing keeps happening" without embeddings (a later option).

Each candidate carries evidence (FR-D3): supporting sessions, frequency, affected
repos, affected **flavors**, and an estimated cost-impact score. Candidates whose
evidence spans more than one flavor are flagged ``cross_flavor`` (FR-D4) — the
highest-value reuse targets.
"""

from __future__ import annotations

import collections
from dataclasses import asdict, dataclass, field
from typing import Any

from .signals import PROBLEM, Signal


@dataclass
class Pattern:
    key: str  # stable cluster key
    polarity: str  # problem | success
    signal_type: str
    locus: str
    frequency: int  # number of supporting signals
    sessions: list[str] = field(default_factory=list)
    repos: list[str] = field(default_factory=list)
    flavors: list[str] = field(default_factory=list)
    cross_flavor: bool = False
    cost_impact: float = 0.0  # frequency-weighted magnitude
    score: float = 0.0  # ranking score (impact x frequency)
    title: str = ""

    def to_dict(self) -> dict[str, Any]:
        return asdict(self)


def _key(s: Signal) -> str:
    return f"{s.polarity}:{s.type}:{s.locus}"


def _title(polarity: str, signal_type: str, n_flavors: int) -> str:
    scope = "cross-flavor " if n_flavors > 1 else ""
    verb = "problem" if polarity == PROBLEM else "success"
    return f"{scope}{verb}: {signal_type.replace('_', ' ')}"


def cluster(signals: list[Signal], *, min_frequency: int = 2) -> list[Pattern]:
    """Group signals into candidate patterns; keep clusters >= min_frequency."""
    groups: dict[str, list[Signal]] = collections.defaultdict(list)
    for s in signals:
        groups[_key(s)].append(s)

    patterns: list[Pattern] = []
    for key, members in groups.items():
        if len(members) < min_frequency:
            continue
        sessions = sorted({m.session_uid for m in members})
        repos = sorted({m.repo for m in members if m.repo})
        flavors = sorted({m.flavor for m in members})
        cost_impact = sum(m.magnitude for m in members)
        first = members[0]
        p = Pattern(
            key=key, polarity=first.polarity, signal_type=first.type, locus=first.locus,
            frequency=len(members), sessions=sessions, repos=repos, flavors=flavors,
            cross_flavor=len(flavors) > 1, cost_impact=round(cost_impact, 3),
            title=_title(first.polarity, first.type, len(flavors)),
        )
        # rank: impact x frequency, with a boost for cross-flavor reuse value
        p.score = round(p.cost_impact * p.frequency * (1.5 if p.cross_flavor else 1.0), 3)
        patterns.append(p)

    # cross-flavor first, then by score
    patterns.sort(key=lambda p: (not p.cross_flavor, -p.score))
    return patterns
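A worked example may help show how the ranking behaves. The sketch below re-implements only the grouping, scoring, and sort steps of `cluster` on a handful of invented signals; `Sig` is a hypothetical stand-in for `Signal` with just the fields those steps read, and the session ids and magnitudes are made up.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for Signal, limited to the fields the ranking uses.
Sig = namedtuple("Sig", "session_uid flavor polarity type locus magnitude")

signals = [
    Sig("claude:a", "claude", "problem", "retry_storm", "retries", 4.0),
    Sig("codex:b", "codex", "problem", "retry_storm", "retries", 3.0),
    Sig("claude:c", "claude", "problem", "schema_thrash", "schema_load", 6.0),
    Sig("claude:d", "claude", "problem", "schema_thrash", "schema_load", 7.0),
    Sig("claude:e", "claude", "success", "clean_pass", "outcome", 1.0),
]

# Group by the same polarity:type:locus key the clusterer uses.
groups = defaultdict(list)
for s in signals:
    groups[f"{s.polarity}:{s.type}:{s.locus}"].append(s)

ranked = []
for key, members in groups.items():
    if len(members) < 2:  # min_frequency=2 drops the lone clean_pass signal
        continue
    flavors = sorted({m.flavor for m in members})
    impact = round(sum(m.magnitude for m in members), 3)
    cross = len(flavors) > 1
    score = round(impact * len(members) * (1.5 if cross else 1.0), 3)
    ranked.append((key, cross, score))

# Cross-flavor first, then by descending score.
ranked.sort(key=lambda r: (not r[1], -r[2]))
for row in ranked:
    print(row)
# ('problem:retry_storm:retries', True, 21.0)
# ('problem:schema_thrash:schema_load', False, 26.0)
```

The retry-storm cluster scores 7.0 x 2 x 1.5 = 21.0 and the schema-thrash cluster scores 13.0 x 2 x 1.0 = 26.0, yet the retry storm is listed first: the sort key puts every cross-flavor candidate ahead of single-flavor ones regardless of score, and the 1.5 boost only orders candidates within each group.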
75
session_memory/detect/quality.py
Normal file
@@ -0,0 +1,75 @@
"""Session-quality filter (T01).

The capture layer ingests *every* session it finds — including API health-checks,
smoke-tests, and interrupted runs (e.g. ``llm-connect`` firing "Say hello in one
word", or a transcript that is just ``[Request interrupted by user]``). These are
not real coding work, but the outcome heuristic labels the short ones ``abandoned``
and the clusterer then mints false-positive "problem" patterns from them.

:func:`is_real_coding_session` gates those out so Detect signals/clusters form only
over genuine coding sessions. It is intentionally conservative — a session counts
as real if it shows substantive activity, and is dropped only on clear trivial
markers. Thresholds come from ``[detect.quality]`` in ``config.toml``.
"""

from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Prompt prefixes/markers that indicate a non-coding or interrupted session.
_TRIVIAL_PROMPTS = (
    "say hello", "hello", "[request interrupted", "return only this json",
    "ping", "ok", "<system-reminder>",
)

# Tool buckets that count as "substantive" coding activity.
_SUBSTANTIVE_TOOLS = (
    "Edit", "Write", "Read", "Bash", "search_replace", "write", "read_file",
    "run_terminal_command", "grep", "Grep", "glob", "Glob", "NotebookEdit",
)


@dataclass
class QualityConfig:
    min_events: int = 20  # below this, not a real coding session
    min_substantive: int = 3  # >= this many substantive tool calls required
    min_prompt_len: int = 25  # first prompt shorter than this is suspect


def quality_config(config: Optional[dict] = None) -> QualityConfig:
    d = (config or {}).get("detect", {}).get("quality", {}) if config else {}
    return QualityConfig(
        min_events=d.get("min_events", 20),
        min_substantive=d.get("min_substantive", 3),
        min_prompt_len=d.get("min_prompt_len", 25),
    )


def _substantive_calls(digest: dict) -> int:
    hist = digest.get("tool_histogram") or {}
    return sum(n for t, n in hist.items() if t in _SUBSTANTIVE_TOOLS)


def is_real_coding_session(digest: dict, config: Optional[QualityConfig] = None) -> bool:
    cfg = config or QualityConfig()

    if not digest.get("repo"):
        return False
    if digest.get("event_count", 0) < cfg.min_events:
        return False
    if _substantive_calls(digest) < cfg.min_substantive:
        return False

    prompt = (digest.get("first_prompt") or "").strip().lower()
    if len(prompt) < cfg.min_prompt_len:
        return False
    if any(prompt.startswith(p) for p in _TRIVIAL_PROMPTS):
        return False

    return True


def filter_real(digests: list[dict], config: Optional[QualityConfig] = None) -> list[dict]:
    cfg = config or QualityConfig()
    return [d for d in digests if is_real_coding_session(d, cfg)]
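The filter's behaviour on a few digests can be illustrated as follows. This is a condensed restatement of the checks in `is_real_coding_session` with the default thresholds, written as a standalone function so it runs without the package; the function name `is_real`, the repo name, and the three sample digests are invented for the example.

```python
TRIVIAL = ("say hello", "hello", "[request interrupted", "return only this json",
           "ping", "ok", "<system-reminder>")
SUBSTANTIVE = {"Edit", "Write", "Read", "Bash", "search_replace", "write", "read_file",
               "run_terminal_command", "grep", "Grep", "glob", "Glob", "NotebookEdit"}


def is_real(digest, min_events=20, min_substantive=3, min_prompt_len=25):
    """Condensed copy of the quality checks, using the default thresholds."""
    if not digest.get("repo"):
        return False
    if digest.get("event_count", 0) < min_events:
        return False
    hist = digest.get("tool_histogram") or {}
    if sum(n for t, n in hist.items() if t in SUBSTANTIVE) < min_substantive:
        return False
    prompt = (digest.get("first_prompt") or "").strip().lower()
    if len(prompt) < min_prompt_len:
        return False
    return not any(prompt.startswith(p) for p in TRIVIAL)


# Hypothetical digests: a genuine session, a health check, and a genuine
# session whose first prompt happens to open with "OK".
coding = {"repo": "example/app", "event_count": 42,
          "tool_histogram": {"Edit": 5, "Bash": 3},
          "first_prompt": "Refactor the ingest parser to stream events"}
health_check = {"repo": None, "event_count": 2, "tool_histogram": {},
                "first_prompt": "Say hello in one word"}
polite = dict(coding, first_prompt="OK, now refactor the ingest parser module")

print(is_real(coding))        # True
print(is_real(health_check))  # False (no repo, too few events)
print(is_real(polite))        # False (prompt starts with "ok")
```

The third case shows a consequence of prefix matching on short markers such as `"ok"` and `"hello"`: an otherwise substantive session is dropped when its first prompt opens with one of them. Whether that trade-off is acceptable depends on how often real prompts start that way in the captured corpus.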
205
session_memory/detect/signals.py
Normal file
@@ -0,0 +1,205 @@
"""Signal extractors (PRD §6.2; T04).

Pure functions over a session digest (Tier 2) — the compact, durable view. Each
extractor emits zero or more :class:`Signal`s. A signal records its source
session, a *locus* (what it's about), a *polarity* (problem vs. success), and a
*magnitude*. Signals are the atoms the clusterer groups into candidate patterns.

No new capture happens here; everything is derived from digests already written
by the Capture layer, so detection is cheap and re-runnable.
"""

from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable, Optional

# polarity
PROBLEM = "problem"
SUCCESS = "success"


@dataclass
class Signal:
    session_uid: str
    flavor: str
    repo: Optional[str]
    type: str  # e.g. "budget_overrun", "clean_pass"
    polarity: str  # PROBLEM | SUCCESS
    locus: str  # normalized subject key (tool, marker, ...)
    magnitude: float = 1.0  # strength / cost weight
    detail: dict[str, Any] = field(default_factory=dict)


# --- individual extractors --------------------------------------------------
# Each takes (digest, ctx) and returns a list[Signal]. ctx carries corpus-level
# stats (e.g. cost percentiles) so extractors can compare a session to its peers.

def _base(digest, type_, polarity, locus, magnitude=1.0, **detail) -> Signal:
    return Signal(
        session_uid=digest["session_uid"], flavor=digest["flavor"],
        repo=digest.get("repo"), type=type_, polarity=polarity, locus=locus,
        magnitude=magnitude, detail=detail,
    )


def sig_retry_storm(digest, ctx) -> list[Signal]:
    retries = digest.get("markers", {}).get("retries", 0)
    if retries >= ctx.get("retry_storm_threshold", 3):
        return [_base(digest, "retry_storm", PROBLEM, "retries", float(retries), retries=retries)]
    return []


def sig_repeated_errors(digest, ctx) -> list[Signal]:
    errors = digest.get("markers", {}).get("errors", 0)
    if errors >= ctx.get("error_threshold", 3):
        return [_base(digest, "repeated_errors", PROBLEM, "errors", float(errors), errors=errors)]
    return []


def sig_budget_overrun(digest, ctx) -> list[Signal]:
    total = digest.get("cost", {}).get("input_tokens", 0) + digest.get("cost", {}).get("output_tokens", 0)
    p90 = ctx.get("tokens_p90", 0)
    if p90 and total > p90:
        return [_base(digest, "budget_overrun", PROBLEM, "tokens",
                      float(total) / max(p90, 1), tokens=total, p90=p90)]
    return []


def sig_abandoned(digest, ctx) -> list[Signal]:
    if digest.get("outcome") == "abandoned":
        return [_base(digest, "abandoned", PROBLEM, "outcome", 1.0)]
    return []


def sig_clean_pass(digest, ctx) -> list[Signal]:
    """Success: ended success, ran tests, no errors, modest cost."""
    m = digest.get("markers", {})
    if (digest.get("outcome") == "success" and m.get("test_runs", 0) >= 1
            and m.get("errors", 0) == 0 and m.get("retries", 0) == 0):
        return [_base(digest, "clean_pass", SUCCESS, "outcome", 1.0,
                      test_runs=m.get("test_runs"))]
    return []


def sig_error_then_recovery(digest, ctx) -> list[Signal]:
    """Success despite hitting errors — a recovery worth learning from."""
    m = digest.get("markers", {})
    if digest.get("outcome") == "success" and m.get("errors", 0) >= 1:
        return [_base(digest, "error_then_recovery", SUCCESS, "errors",
                      float(m.get("errors", 1)), errors=m.get("errors"))]
    return []


# --- tool-mix / infrastructure-overhead signals (WP-0005 T02) ----------------
# These read the captured ``tool_histogram`` — friction that the outcome+marker
# signals above are blind to (sessions still "succeed", just expensively).

def tool_bucket(tool: str) -> str:
    """Group a tool name into a coarse activity bucket (flavor-agnostic)."""
    if tool.startswith("mcp__state-hub"):
        return "statehub_mcp"
    if tool in ("TaskUpdate", "TaskCreate", "TaskGet", "TaskList", "TaskOutput",
                "TaskStop", "todo_write", "update_task_status"):
        return "task_mgmt"
    if tool == "ToolSearch":
        return "schema_load"
    if tool in ("Bash", "run_terminal_command"):
        return "shell"
    if tool in ("Edit", "Write", "search_replace", "write", "NotebookEdit"):
        return "edit"
    if tool in ("Read", "read_file", "grep", "Grep", "glob", "Glob"):
        return "read"
    return "other"


def _bucketed(digest) -> tuple[dict, int]:
    buckets: dict[str, int] = {}
    for tool, n in (digest.get("tool_histogram") or {}).items():
        buckets[tool_bucket(tool)] = buckets.get(tool_bucket(tool), 0) + n
    return buckets, sum(buckets.values())


def sig_infra_overhead(digest, ctx) -> list[Signal]:
    """Problem: a large share of tool calls is hub/task/schema plumbing, not work."""
    buckets, total = _bucketed(digest)
    if total < ctx.get("infra_min_calls", 20):
        return []
    overhead = buckets.get("statehub_mcp", 0) + buckets.get("task_mgmt", 0) + buckets.get("schema_load", 0)
    share = overhead / total
    if share >= ctx.get("infra_overhead_threshold", 0.30):
        return [_base(digest, "infra_overhead", PROBLEM, "infra_overhead", round(share, 3),
                      overhead_calls=overhead, total_calls=total,
                      statehub=buckets.get("statehub_mcp", 0),
                      task_mgmt=buckets.get("task_mgmt", 0),
                      schema_load=buckets.get("schema_load", 0))]
    return []


def sig_schema_thrash(digest, ctx) -> list[Signal]:
    """Problem: repeated ToolSearch — deferred-tool schemas reloaded over and over."""
    buckets, _ = _bucketed(digest)
    n = buckets.get("schema_load", 0)
    if n >= ctx.get("schema_thrash_threshold", 5):
        return [_base(digest, "schema_thrash", PROBLEM, "schema_load", float(n), tool_searches=n)]
    return []


def sig_tool_thrash(digest, ctx) -> list[Signal]:
    """Problem: a single tool is hammered far more than any other — likely churn."""
    hist = digest.get("tool_histogram") or {}
    if not hist:
        return []
    tool, n = max(hist.items(), key=lambda kv: kv[1])
    if n >= ctx.get("tool_thrash_threshold", 80):
        return [_base(digest, "tool_thrash", PROBLEM, f"tool:{tool}", float(n), tool=tool, calls=n)]
    return []


def sig_recurring_error(digest, ctx) -> list[Signal]:
    """Problem: a normalized error fingerprint (WP-0006) — one signal per distinct
    error in the session, so the same error across sessions/repos/flavors clusters
    into a candidate root-cause pattern (locus = fingerprint, magnitude = in-session
    occurrences). This is the content-level 'why', not just a coarse error count.
    """
    out: list[Signal] = []
    for snip in digest.get("error_snippets", []) or []:
        fp = snip.get("fingerprint")
        if not fp:
            continue
        out.append(_base(digest, "recurring_error", PROBLEM, fp, float(snip.get("count", 1)),
                         sample=snip.get("sample", ""), tool=snip.get("tool"),
                         occurrences=snip.get("count", 1)))
    return out


EXTRACTORS: list[Callable] = [
    sig_retry_storm, sig_repeated_errors, sig_budget_overrun, sig_abandoned,
    sig_clean_pass, sig_error_then_recovery,
    sig_infra_overhead, sig_schema_thrash, sig_tool_thrash,
    sig_recurring_error,
]


def build_context(digests: list[dict]) -> dict[str, Any]:
    """Corpus-level stats so extractors can compare a session to its peers."""
    totals = sorted(
        d.get("cost", {}).get("input_tokens", 0) + d.get("cost", {}).get("output_tokens", 0)
        for d in digests
    )
    p90 = totals[int(0.9 * (len(totals) - 1))] if totals else 0
    return {
        "tokens_p90": p90, "retry_storm_threshold": 3, "error_threshold": 3,
        # tool-mix / infra-overhead thresholds (WP-0005 T02)
        "infra_min_calls": 20, "infra_overhead_threshold": 0.30,
        "schema_thrash_threshold": 5, "tool_thrash_threshold": 80,
    }


def extract_signals(digests: list[dict], ctx: Optional[dict] = None) -> list[Signal]:
    ctx = ctx or build_context(digests)
    out: list[Signal] = []
    for d in digests:
        for ex in EXTRACTORS:
            out.extend(ex(d, ctx))
    return out
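The corpus-relative budget check is easiest to follow with numbers. The sketch below isolates the percentile expression from `build_context` and the comparison from `sig_budget_overrun`; the helper names `tokens_p90` and `overrun_magnitude` and the token totals are invented for illustration.

```python
def tokens_p90(totals):
    """Same index rule as build_context: floor(0.9 * (n - 1)) into the sorted totals."""
    ordered = sorted(totals)
    return ordered[int(0.9 * (len(ordered) - 1))] if ordered else 0


def overrun_magnitude(total, p90):
    """Magnitude of a budget_overrun signal, or None when no signal would fire."""
    if p90 and total > p90:
        return float(total) / max(p90, 1)
    return None


# Hypothetical per-session token totals (input + output) for ten sessions.
totals = [1200, 800, 15000, 2500, 3100, 900, 4000, 2200, 1800, 2700]

p90 = tokens_p90(totals)
print(p90)                             # 4000 (index int(0.9 * 9) == 8 of the sorted list)
print(overrun_magnitude(15000, p90))   # 3.75
print(overrun_magnitude(4000, p90))    # None (the comparison is strict)
print(overrun_magnitude(5000, tokens_p90([5000])))  # None
```

Because the comparison is strict and the threshold is drawn from the corpus itself, a corpus with a single session can never produce a budget overrun, and in small corpora only the one or two most expensive sessions qualify. The magnitude is a ratio to the threshold rather than a raw token count, so it stays comparable across corpora of different scale.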
76
session_memory/digest_lookup.py
Normal file
@@ -0,0 +1,76 @@
|
|||||||
|
"""Read a single session digest from the local store (AGENTIC-WP-0011 T03).
|
||||||
|
|
||||||
|
Thin read path for ``kaizen-agentic metrics correlate`` and other consumers.
|
||||||
|
Does not run ingest.
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
python -m session_memory.digest_lookup <session_uid> [--json]
|
||||||
|
HELIX_STORE_DB=/abs/path/to/mem.db python -m session_memory.digest_lookup <uid>
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import json
|
||||||
|
import os
|
||||||
|
import sys
|
||||||
|
|
||||||
|
from .core.store import Store
|
||||||
|
from .ingest import _expand, load_config
|
||||||
|
|
||||||
|
|
||||||
|
def resolve_store_paths(*, config_path: str | None = None) -> tuple[str, str]:
|
||||||
|
"""Resolve db + blob paths from HELIX_STORE_DB or config.toml [store]."""
|
||||||
|
env_db = os.environ.get("HELIX_STORE_DB")
|
||||||
|
if env_db:
|
||||||
|
db_path = _expand(env_db)
|
||||||
|
blob_dir = os.path.join(os.path.dirname(db_path), "blobs")
|
||||||
|
return db_path, blob_dir
|
||||||
|
|
||||||
|
here = os.path.dirname(os.path.abspath(__file__))
|
||||||
|
cfg_path = config_path or os.path.join(here, "config.toml")
|
||||||
|
store_cfg = load_config(cfg_path).get("store", {})
|
||||||
|
return _expand(store_cfg.get("db_path", "session_memory/.store/mem.db")), _expand(
|
||||||
|
store_cfg.get("blob_dir", "session_memory/.store/blobs")
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def lookup_digest(session_uid: str, *, config_path: str | None = None) -> dict | None:
|
||||||
|
db_path, blob_dir = resolve_store_paths(config_path=config_path)
|
||||||
|
store = Store(db_path, blob_dir)
|
||||||
|
try:
|
||||||
|
return store.get_digest(session_uid)
|
||||||
|
finally:
|
||||||
|
store.close()
|
||||||
|
|
||||||
|
|
||||||
|
def main(argv: list[str] | None = None) -> int:
|
||||||
|
here = os.path.dirname(os.path.abspath(__file__))
|
||||||
|
ap = argparse.ArgumentParser(
|
||||||
|
description="Read one session digest from the Helix Forge store (no ingest)."
|
||||||
|
)
|
||||||
|
ap.add_argument("session_uid", help="Normalized session uid, e.g. claude:abc-123")
|
||||||
|
ap.add_argument("--config", default=os.path.join(here, "config.toml"),
|
||||||
|
help="config.toml when HELIX_STORE_DB is unset")
|
||||||
|
ap.add_argument("--json", action="store_true", help="print digest JSON to stdout")
|
||||||
|
args = ap.parse_args(argv)
|
||||||
|
|
||||||
|
digest = lookup_digest(args.session_uid, config_path=args.config)
|
||||||
|
if digest is None:
|
||||||
|
print(f"digest not found: {args.session_uid}", file=sys.stderr)
|
||||||
|
return 1
|
||||||
|
|
||||||
|
if args.json:
|
||||||
|
print(json.dumps(digest, indent=2, sort_keys=True))
|
||||||
|
else:
|
||||||
|
cost = digest.get("cost") or {}
|
||||||
|
tokens = cost.get("input_tokens", 0) + cost.get("output_tokens", 0)
|
||||||
|
print(f"session_uid: {digest.get('session_uid')}")
|
||||||
|
print(f"repo: {digest.get('repo')} flavor: {digest.get('flavor')}")
|
||||||
|
print(f"outcome: {digest.get('outcome')} tokens: {tokens}")
|
||||||
|
print(f"started_at: {digest.get('started_at')} ended_at: {digest.get('ended_at')}")
|
||||||
|
return 0
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
raise SystemExit(main())
|
||||||
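The path resolution in `resolve_store_paths` above can be sketched in isolation as follows. This is a minimal illustration, not the module's code: `resolve_paths` is a hypothetical name, the environment and `[store]` table are passed in as plain dicts, and `os.path.expanduser` stands in for the project's `_expand` helper, whose exact behavior is not shown in this diff.

```python
import os


def resolve_paths(env: dict, store_cfg: dict) -> tuple[str, str]:
    """Prefer HELIX_STORE_DB; otherwise fall back to the [store] config table."""
    env_db = env.get("HELIX_STORE_DB")
    if env_db:
        # Assumption: expanduser approximates the project's _expand helper.
        db_path = os.path.expanduser(env_db)
        # With the env override, blobs live in a sibling "blobs" directory.
        return db_path, os.path.join(os.path.dirname(db_path), "blobs")
    return (
        os.path.expanduser(store_cfg.get("db_path", "session_memory/.store/mem.db")),
        os.path.expanduser(store_cfg.get("blob_dir", "session_memory/.store/blobs")),
    )


from_env = resolve_paths({"HELIX_STORE_DB": "/data/helix/mem.db"}, {})
from_defaults = resolve_paths({}, {})
```

Note that when the environment variable is set, any `blob_dir` in the config is ignored; the blob directory is derived from the database path instead.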
9
session_memory/distribute/__init__.py
Normal file
@@ -0,0 +1,9 @@
"""Distribute phase (PRD §6.4) — render approved Solution Patterns into per-flavor
artifacts. Mirror of the collector design: agnostic core, thin distributor edges.

  base.py       Artifact + Distributor protocol + idempotent snippet markers (T01)
  claude.py     CLAUDE.md snippet distributor (T02)
  codex.py      AGENTS.md snippet distributor (T03)
  grok.py       native instruction distributor (T03)
  __main__.py   `python -m session_memory.distribute` (T05)
"""
89
session_memory/distribute/__main__.py
Normal file
@@ -0,0 +1,89 @@
|
|||||||
|
"""Distribute entrypoint (T05): catalog -> per-flavor proposals (HITL).
|
||||||
|
|
||||||
|
python -m session_memory.distribute [--config PATH] [--repo R] [--flavor F] [--json]
|
||||||
|
|
||||||
|
Reads approved / distribution-ready Solution Patterns from the Pattern Catalog and
|
||||||
|
renders them into per-flavor **proposals** (never auto-applied) scoped by
|
||||||
|
repo/domain, recording what is proposed where in the active-pattern registry.
|
||||||
|
Targets are the repo->domain map in ``config.toml`` crossed with the known
|
||||||
|
distributor flavors; each pattern's own ``Scope`` filters where it actually lands.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import json
|
||||||
|
import os
|
||||||
|
|
||||||
|
from ..curate.catalog import Catalog
|
||||||
|
from ..ingest import _expand, load_config
|
||||||
|
from .proposals import ActiveRegistry, Target, propose
|
||||||
|
from .registry import all_flavors
|
||||||
|
|
||||||
|
|
||||||
|
def build_targets(config: dict, repo_filter=None, flavor_filter=None) -> list[Target]:
|
||||||
|
repo_map = config.get("repo_domain_map", {})
|
||||||
|
flavors = [flavor_filter] if flavor_filter else all_flavors()
|
||||||
|
targets = []
|
||||||
|
for repo, domain in repo_map.items():
|
||||||
|
if repo_filter and repo != repo_filter:
|
||||||
|
continue
|
||||||
|
for flavor in flavors:
|
||||||
|
targets.append(Target(repo=repo, domain=domain, flavor=flavor))
|
||||||
|
return targets
|
||||||
|
|
||||||
|
|
||||||
|
def run_distribute(config: dict, *, repo_filter=None, flavor_filter=None):
|
||||||
|
cur = config.get("curate", {})
|
||||||
|
dist = config.get("distribute", {})
|
||||||
|
catalog = Catalog(_expand(cur.get("catalog_dir", "session_memory/catalog")))
|
||||||
|
patterns = catalog.list()
|
||||||
|
targets = build_targets(config, repo_filter, flavor_filter)
|
||||||
|
registry = ActiveRegistry(_expand(dist.get("active_registry",
|
||||||
|
"session_memory/distribute/active_patterns.json")))
|
||||||
|
out_dir = _expand(dist.get("proposals_dir", "session_memory/proposals"))
|
||||||
|
return propose(patterns, targets, out_dir, registry)
|
||||||
|
|
||||||
|
|
||||||
|
def _summary(res) -> str:
|
||||||
|
by_repo = {}
|
||||||
|
for repo, flavor, pid, _ in res.proposals:
|
||||||
|
by_repo.setdefault(repo, []).append(f"{pid}[{flavor}]")
|
||||||
|
lines = [f"# Distribute proposals ({len(res.proposals)} renders, "
|
||||||
|
f"{len(res.files_written)} files)"]
|
||||||
|
for repo in sorted(by_repo):
|
||||||
|
lines.append(f" {repo}: {', '.join(sorted(by_repo[repo]))}")
|
||||||
|
if res.skipped_not_distributable:
|
||||||
|
lines.append(f" skipped (not distribution-ready): "
|
||||||
|
f"{len(set(res.skipped_not_distributable))} pattern(s)")
|
||||||
|
if not res.proposals:
|
||||||
|
lines.append(" (no approved/distribution-ready patterns matched any target)")
|
||||||
|
return "\n".join(lines)
|
||||||
|
|
||||||
|
|
||||||
|
def main(argv=None) -> int:
|
||||||
|
here = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
|
||||||
|
ap = argparse.ArgumentParser(description="Distribute approved patterns as per-flavor proposals.")
|
||||||
|
ap.add_argument("--config", default=os.path.join(here, "config.toml"))
|
||||||
|
ap.add_argument("--repo", default=None, help="limit to one target repo")
|
||||||
|
ap.add_argument("--flavor", default=None, help="limit to one flavor")
|
||||||
|
ap.add_argument("--json", action="store_true")
|
||||||
|
args = ap.parse_args(argv)
|
||||||
|
|
||||||
|
config = load_config(args.config)
|
||||||
|
res = run_distribute(config, repo_filter=args.repo, flavor_filter=args.flavor)
|
||||||
|
|
||||||
|
if args.json:
|
||||||
|
print(json.dumps({
|
||||||
|
"proposals": [{"repo": r, "flavor": f, "pattern_id": p, "path": path}
|
||||||
|
for r, f, p, path in res.proposals],
|
||||||
|
"files_written": res.files_written,
|
||||||
|
"skipped": sorted(set(res.skipped_not_distributable)),
|
||||||
|
}, indent=2))
|
||||||
|
else:
|
||||||
|
print(_summary(res))
|
||||||
|
return 0
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
raise SystemExit(main())
|
||||||
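The target construction in `build_targets` above amounts to a filtered cross product of the repo-to-domain map and the flavor list. A minimal sketch, assuming a stand-in `Target` dataclass and hypothetical config values (the repo and domain pairs below are illustrative, not taken from any real `config.toml`):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """Stand-in for session_memory.distribute.proposals.Target."""
    repo: str
    domain: str = ""
    flavor: str = "claude"


def build_targets(repo_map, flavors, repo_filter=None):
    """Cross every (repo, domain) pair with every flavor, optionally for one repo."""
    return [
        Target(repo=repo, domain=domain, flavor=flavor)
        for repo, domain in repo_map.items()
        if not repo_filter or repo == repo_filter
        for flavor in flavors
    ]


# Hypothetical config values, for illustration only.
repo_map = {"state-hub": "infra", "ops-bridge": "ops"}
targets = build_targets(repo_map, ["claude", "codex", "grok"])
one_repo = build_targets(repo_map, ["claude"], repo_filter="ops-bridge")
```

Two repos crossed with three flavors give six targets; each pattern's own scope then decides which of those it actually lands in.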
242
session_memory/distribute/active_patterns.json
Normal file
@@ -0,0 +1,242 @@
|
|||||||
|
[
|
||||||
|
{
|
||||||
|
"flavor": "claude",
|
||||||
|
"pattern_id": "sp-problem-file_not_read-edit",
|
||||||
|
"repo": "agentic-resources",
|
||||||
|
"status": "proposed",
|
||||||
|
"updated_at": "2026-06-07T14:25:34Z",
|
||||||
|
"version": "1.0.0"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"flavor": "codex",
|
||||||
|
"pattern_id": "sp-problem-file_not_read-edit",
|
||||||
|
"repo": "agentic-resources",
|
||||||
|
"status": "proposed",
|
||||||
|
"updated_at": "2026-06-07T14:25:34Z",
|
||||||
|
"version": "1.0.0"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"flavor": "grok",
|
||||||
|
"pattern_id": "sp-problem-file_not_read-edit",
|
||||||
|
"repo": "agentic-resources",
|
||||||
|
"status": "proposed",
|
||||||
|
"updated_at": "2026-06-07T14:25:34Z",
|
||||||
|
"version": "1.0.0"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"flavor": "claude",
|
||||||
|
"pattern_id": "sp-problem-file_not_read-edit",
|
||||||
|
"repo": "can-you-assist",
|
||||||
|
"status": "proposed",
|
||||||
|
"updated_at": "2026-06-07T14:25:34Z",
|
||||||
|
"version": "1.0.0"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"flavor": "codex",
|
||||||
|
"pattern_id": "sp-problem-file_not_read-edit",
|
||||||
|
"repo": "can-you-assist",
|
||||||
|
"status": "proposed",
|
||||||
|
"updated_at": "2026-06-07T14:25:34Z",
|
||||||
|
"version": "1.0.0"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"flavor": "grok",
|
||||||
|
"pattern_id": "sp-problem-file_not_read-edit",
|
||||||
|
"repo": "can-you-assist",
|
||||||
|
"status": "proposed",
|
||||||
|
"updated_at": "2026-06-07T14:25:34Z",
|
||||||
|
"version": "1.0.0"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"flavor": "claude",
|
||||||
|
"pattern_id": "sp-problem-file_not_read-edit",
|
||||||
|
"repo": "net-kingdom",
|
||||||
|
"status": "proposed",
|
||||||
|
"updated_at": "2026-06-07T14:25:34Z",
|
||||||
|
"version": "1.0.0"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"flavor": "codex",
|
||||||
|
"pattern_id": "sp-problem-file_not_read-edit",
|
||||||
|
"repo": "net-kingdom",
|
||||||
|
"status": "proposed",
|
||||||
|
"updated_at": "2026-06-07T14:25:34Z",
|
||||||
|
"version": "1.0.0"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"flavor": "grok",
|
||||||
|
"pattern_id": "sp-problem-file_not_read-edit",
|
||||||
|
"repo": "net-kingdom",
|
||||||
|
"status": "proposed",
|
||||||
|
"updated_at": "2026-06-07T14:25:34Z",
|
||||||
|
"version": "1.0.0"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"flavor": "claude",
|
||||||
|
"pattern_id": "sp-problem-file_not_read-edit",
|
||||||
|
"repo": "ops-bridge",
|
||||||
|
"status": "proposed",
|
||||||
|
"updated_at": "2026-06-07T14:25:34Z",
|
||||||
|
"version": "1.0.0"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"flavor": "codex",
|
||||||
|
"pattern_id": "sp-problem-file_not_read-edit",
|
||||||
|
"repo": "ops-bridge",
|
||||||
|
"status": "proposed",
|
||||||
|
"updated_at": "2026-06-07T14:25:34Z",
|
||||||
|
"version": "1.0.0"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"flavor": "grok",
|
||||||
|
"pattern_id": "sp-problem-file_not_read-edit",
|
||||||
|
"repo": "ops-bridge",
|
||||||
|
"status": "proposed",
|
||||||
|
"updated_at": "2026-06-07T14:25:34Z",
|
||||||
|
"version": "1.0.0"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"flavor": "claude",
|
||||||
|
"pattern_id": "sp-problem-file_not_read-edit",
|
||||||
|
"repo": "state-hub",
|
||||||
|
"status": "proposed",
|
||||||
|
"updated_at": "2026-06-07T14:25:34Z",
|
||||||
|
"version": "1.0.0"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"flavor": "codex",
|
||||||
|
"pattern_id": "sp-problem-file_not_read-edit",
|
||||||
|
"repo": "state-hub",
|
||||||
|
"status": "proposed",
|
||||||
|
"updated_at": "2026-06-07T14:25:34Z",
|
||||||
|
"version": "1.0.0"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"flavor": "grok",
|
||||||
|
"pattern_id": "sp-problem-file_not_read-edit",
|
||||||
|
"repo": "state-hub",
|
||||||
|
"status": "proposed",
|
||||||
|
"updated_at": "2026-06-07T14:25:34Z",
|
||||||
|
"version": "1.0.0"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"flavor": "claude",
|
||||||
|
"pattern_id": "sp-problem-file_not_read-edit",
|
||||||
|
"repo": "the-custodian",
|
||||||
|
"status": "proposed",
|
||||||
|
"updated_at": "2026-06-07T14:25:34Z",
|
||||||
|
"version": "1.0.0"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"flavor": "codex",
|
||||||
|
"pattern_id": "sp-problem-file_not_read-edit",
|
||||||
|
"repo": "the-custodian",
|
||||||
|
"status": "proposed",
|
||||||
|
"updated_at": "2026-06-07T14:25:34Z",
|
||||||
|
"version": "1.0.0"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"flavor": "grok",
|
||||||
|
"pattern_id": "sp-problem-file_not_read-edit",
|
||||||
|
"repo": "the-custodian",
|
||||||
|
"status": "proposed",
|
||||||
|
"updated_at": "2026-06-07T14:25:34Z",
|
||||||
|
"version": "1.0.0"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"flavor": "claude",
|
||||||
|
"pattern_id": "sp-problem-schema_thrash-schema_load",
|
||||||
|
"repo": "ops-bridge",
|
||||||
|
"status": "proposed",
|
||||||
|
"updated_at": "2026-06-07T14:25:34Z",
|
||||||
|
"version": "1.0.1"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"flavor": "claude",
|
||||||
|
"pattern_id": "sp-problem-tool_thrash-tool-bash",
|
||||||
|
"repo": "state-hub",
|
||||||
|
"status": "proposed",
|
||||||
|
"updated_at": "2026-06-07T14:25:34Z",
|
||||||
|
"version": "1.0.1"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"flavor": "claude",
|
||||||
|
"pattern_id": "sp-success-clean_pass-outcome",
|
||||||
|
"repo": "agentic-resources",
|
||||||
|
"status": "proposed",
|
||||||
|
"updated_at": "2026-06-07T14:25:34Z",
|
||||||
|
"version": "1.0.1"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"flavor": "grok",
|
||||||
|
"pattern_id": "sp-success-clean_pass-outcome",
|
||||||
|
"repo": "agentic-resources",
|
||||||
|
"status": "proposed",
|
||||||
|
"updated_at": "2026-06-07T14:25:34Z",
|
||||||
|
"version": "1.0.1"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"flavor": "claude",
|
||||||
|
"pattern_id": "sp-success-clean_pass-outcome",
|
||||||
|
"repo": "can-you-assist",
|
||||||
|
"status": "proposed",
|
||||||
|
"updated_at": "2026-06-07T14:25:34Z",
|
||||||
|
"version": "1.0.1"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"flavor": "grok",
|
||||||
|
"pattern_id": "sp-success-clean_pass-outcome",
|
||||||
|
"repo": "can-you-assist",
|
||||||
|
"status": "proposed",
|
||||||
|
"updated_at": "2026-06-07T14:25:34Z",
|
||||||
|
"version": "1.0.1"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"flavor": "claude",
|
||||||
|
"pattern_id": "sp-success-clean_pass-outcome",
|
||||||
|
"repo": "ops-bridge",
|
||||||
|
"status": "proposed",
|
||||||
|
"updated_at": "2026-06-07T14:25:34Z",
|
||||||
|
"version": "1.0.1"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"flavor": "grok",
|
||||||
|
"pattern_id": "sp-success-clean_pass-outcome",
|
||||||
|
"repo": "ops-bridge",
|
||||||
|
"status": "proposed",
|
||||||
|
"updated_at": "2026-06-07T14:25:34Z",
|
||||||
|
"version": "1.0.1"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"flavor": "claude",
|
||||||
|
"pattern_id": "sp-success-clean_pass-outcome",
|
||||||
|
"repo": "state-hub",
|
||||||
|
"status": "proposed",
|
||||||
|
"updated_at": "2026-06-07T14:25:34Z",
|
||||||
|
"version": "1.0.1"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"flavor": "grok",
|
||||||
|
"pattern_id": "sp-success-clean_pass-outcome",
|
||||||
|
"repo": "state-hub",
|
||||||
|
"status": "proposed",
|
||||||
|
"updated_at": "2026-06-07T14:25:34Z",
|
||||||
|
"version": "1.0.1"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"flavor": "claude",
|
||||||
|
"pattern_id": "sp-success-clean_pass-outcome",
|
||||||
|
"repo": "the-custodian",
|
||||||
|
"status": "proposed",
|
||||||
|
"updated_at": "2026-06-07T14:25:34Z",
|
||||||
|
"version": "1.0.1"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"flavor": "grok",
|
||||||
|
"pattern_id": "sp-success-clean_pass-outcome",
|
||||||
|
"repo": "the-custodian",
|
||||||
|
"status": "proposed",
|
||||||
|
"updated_at": "2026-06-07T14:25:34Z",
|
||||||
|
"version": "1.0.1"
|
||||||
|
}
|
||||||
|
]
|
||||||
115
session_memory/distribute/base.py
Normal file
@@ -0,0 +1,115 @@
|
|||||||
|
"""Distributor base — Artifact, the Distributor protocol, and idempotent markers
|
||||||
|
(PRD §6.4 FR-X1; T01).
|
||||||
|
|
||||||
|
A **distributor** turns one agnostic :class:`SolutionPattern` into a per-flavor
|
||||||
|
:class:`Artifact` (a target path + a snippet of content). Everything flavor-neutral
|
||||||
|
lives here; each flavor adapter (T02/T03) only supplies its target filename and may
|
||||||
|
override the rendered body using the pattern's ``rendering_hints``.
|
||||||
|
|
||||||
|
Snippets carry stable ``BEGIN/END`` markers keyed on the pattern id, so
|
||||||
|
re-distributing a pattern **updates its block in place** instead of duplicating it
|
||||||
|
— the property that lets Distribute run repeatedly (HITL) without drift.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import re
|
||||||
|
from dataclasses import dataclass
|
||||||
|
from typing import Any, Optional, Protocol, runtime_checkable
|
||||||
|
|
||||||
|
from ..curate.schema import SolutionPattern
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class Artifact:
|
||||||
|
"""A proposed per-flavor rendering of a pattern (FR-X1/FR-X3 — proposed, not applied)."""
|
||||||
|
|
||||||
|
flavor: str
|
||||||
|
target_path: str # repo-relative file the snippet belongs in (e.g. "CLAUDE.md")
|
||||||
|
pattern_id: str
|
||||||
|
content: str # the marker-wrapped snippet block
|
||||||
|
|
||||||
|
|
||||||
|
@runtime_checkable
|
||||||
|
class Distributor(Protocol):
|
||||||
|
flavor: str
|
||||||
|
target_path: str
|
||||||
|
|
||||||
|
def render(self, pattern: SolutionPattern) -> Artifact: ...
|
||||||
|
|
||||||
|
|
||||||
|
# --- idempotent snippet markers ---------------------------------------------
|
||||||
|
|
||||||
|
_MARK = "helix-forge pattern"
|
||||||
|
|
||||||
|
|
||||||
|
def begin_marker(pattern_id: str) -> str:
|
||||||
|
return f"<!-- BEGIN {_MARK}:{pattern_id} -->"
|
||||||
|
|
||||||
|
|
||||||
|
def end_marker(pattern_id: str) -> str:
|
||||||
|
return f"<!-- END {_MARK}:{pattern_id} -->"
|
||||||
|
|
||||||
|
|
||||||
|
def wrap_block(pattern_id: str, body: str, version: str = "") -> str:
|
||||||
|
"""Wrap a rendered body in stable BEGIN/END markers."""
|
||||||
|
ver = f" v{version}" if version else ""
|
||||||
|
return f"{begin_marker(pattern_id)}{ver}\n{body.strip()}\n{end_marker(pattern_id)}"
|
||||||
|
|
||||||
|
|
||||||
|
def upsert_block(doc_text: str, pattern_id: str, block: str) -> str:
    """Insert or replace a pattern's marked block within a document (idempotent)."""
    pat = re.compile(
        re.escape(begin_marker(pattern_id)) + r".*?" + re.escape(end_marker(pattern_id)),
        re.DOTALL,
    )
    if pat.search(doc_text):
        # Use a callable so backslashes in ``block`` are inserted literally
        # rather than being parsed as replacement-template escapes.
        return pat.sub(lambda _m: block, doc_text)
    sep = "" if doc_text.endswith("\n\n") or not doc_text else "\n\n"
    return f"{doc_text}{sep}{block}\n"
|
||||||
|
|
||||||
|
|
||||||
|
# --- agnostic body rendering ------------------------------------------------
|
||||||
|
|
||||||
|
def render_markdown_body(pattern: SolutionPattern) -> str:
|
||||||
|
"""Default flavor-neutral snippet body from the agnostic pattern fields."""
|
||||||
|
label = "Avoid" if pattern.polarity == "problem" else "Prefer"
|
||||||
|
lines = [f"### {pattern.name}", "", pattern.problem.strip(), ""]
|
||||||
|
if pattern.resolutions:
|
||||||
|
lines.append(f"**{label}:**")
|
||||||
|
for r in pattern.resolutions:
|
||||||
|
detail = f" — {r.detail}" if r.detail else ""
|
||||||
|
lines.append(f"- {r.summary}{detail}")
|
||||||
|
for step in r.steps:
|
||||||
|
lines.append(f" - {step}")
|
||||||
|
return "\n".join(lines).strip()
|
||||||
|
|
||||||
|
|
||||||
|
def hint(pattern: SolutionPattern, flavor: str, key: str, default: Any = None) -> Any:
|
||||||
|
"""Read a per-flavor rendering hint, falling back to ``default``."""
|
||||||
|
return (pattern.rendering_hints.get(flavor) or {}).get(key, default)
|
||||||
|
|
||||||
|
|
||||||
|
class BaseDistributor:
|
||||||
|
"""Shared distributor: renders the agnostic body, honouring a ``body`` hint
|
||||||
|
override and a ``target`` hint, then wraps it in idempotent markers."""
|
||||||
|
|
||||||
|
flavor: str = ""
|
||||||
|
target_path: str = ""
|
||||||
|
|
||||||
|
def __init__(self, flavor: Optional[str] = None, target_path: Optional[str] = None) -> None:
|
||||||
|
if flavor is not None:
|
||||||
|
self.flavor = flavor
|
||||||
|
if target_path is not None:
|
||||||
|
self.target_path = target_path
|
||||||
|
|
||||||
|
def body(self, pattern: SolutionPattern) -> str:
|
||||||
|
return hint(pattern, self.flavor, "body") or render_markdown_body(pattern)
|
||||||
|
|
||||||
|
def target(self, pattern: SolutionPattern) -> str:
|
||||||
|
return hint(pattern, self.flavor, "target") or self.target_path
|
||||||
|
|
||||||
|
def render(self, pattern: SolutionPattern) -> Artifact:
|
||||||
|
block = wrap_block(pattern.id, self.body(pattern), pattern.version)
|
||||||
|
return Artifact(flavor=self.flavor, target_path=self.target(pattern),
|
||||||
|
pattern_id=pattern.id, content=block)
|
||||||
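The idempotency property described in the `base.py` docstring can be checked with a short standalone sketch. It mirrors the marker helpers from the file but is not the module itself: the pattern id `sp-demo` and the snippet bodies are hypothetical, and the sketch passes a callable to `sub` so that backslashes in a block are inserted literally.

```python
import re

MARK = "helix-forge pattern"  # same marker label as in base.py


def begin_marker(pattern_id: str) -> str:
    return f"<!-- BEGIN {MARK}:{pattern_id} -->"


def end_marker(pattern_id: str) -> str:
    return f"<!-- END {MARK}:{pattern_id} -->"


def wrap_block(pattern_id: str, body: str, version: str = "") -> str:
    ver = f" v{version}" if version else ""
    return f"{begin_marker(pattern_id)}{ver}\n{body.strip()}\n{end_marker(pattern_id)}"


def upsert_block(doc_text: str, pattern_id: str, block: str) -> str:
    pat = re.compile(
        re.escape(begin_marker(pattern_id)) + r".*?" + re.escape(end_marker(pattern_id)),
        re.DOTALL,
    )
    if pat.search(doc_text):
        # Callable replacement: the block text is not treated as a template.
        return pat.sub(lambda _m: block, doc_text)
    sep = "" if doc_text.endswith("\n\n") or not doc_text else "\n\n"
    return f"{doc_text}{sep}{block}\n"


doc = "# CLAUDE.md\n"
v1 = wrap_block("sp-demo", "Read the file before editing.", "1.0.0")
v2 = wrap_block("sp-demo", r"Match digits with \d+ in the regex.", "1.0.1")

once = upsert_block(doc, "sp-demo", v1)
twice = upsert_block(once, "sp-demo", v1)      # re-running changes nothing
updated = upsert_block(twice, "sp-demo", v2)   # a new version replaces in place
```

Because the version suffix sits between the BEGIN and END markers, a version bump replaces the old block rather than leaving a stale copy behind.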
42
session_memory/distribute/claude.py
Normal file
@@ -0,0 +1,42 @@
|
|||||||
|
"""Claude distributor (PRD §6.4 FR-X1; T02).
|
||||||
|
|
||||||
|
Renders an approved Solution Pattern into a ``CLAUDE.md`` snippet block. Most logic
|
||||||
|
is inherited from :class:`BaseDistributor`; the Claude-specific touch is an
|
||||||
|
optional **skill** rendering mode (``rendering_hints["claude"]["as"] == "skill"``)
|
||||||
|
that emits a skill-style stub instead of a plain instruction snippet — Claude's
|
||||||
|
native distribution targets are CLAUDE.md snippets, skills, or hooks.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
from ..curate.schema import SolutionPattern
|
||||||
|
from .base import BaseDistributor, hint, render_markdown_body
|
||||||
|
|
||||||
|
|
||||||
|
class ClaudeDistributor(BaseDistributor):
|
||||||
|
flavor = "claude"
|
||||||
|
target_path = "CLAUDE.md"
|
||||||
|
|
||||||
|
def body(self, pattern: SolutionPattern) -> str:
|
||||||
|
override = hint(pattern, self.flavor, "body")
|
||||||
|
if override:
|
||||||
|
return override
|
||||||
|
if hint(pattern, self.flavor, "as") == "skill":
|
||||||
|
return self._skill_stub(pattern)
|
||||||
|
return render_markdown_body(pattern)
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def _skill_stub(pattern: SolutionPattern) -> str:
|
||||||
|
trigger = "avoid" if pattern.polarity == "problem" else "apply"
|
||||||
|
lines = [
|
||||||
|
f"## Skill: {pattern.name}",
|
||||||
|
"",
|
||||||
|
f"**When:** situations where you would {trigger} — {pattern.problem.strip()}",
|
||||||
|
"",
|
||||||
|
"**Steps:**",
|
||||||
|
]
|
||||||
|
for r in pattern.resolutions:
|
||||||
|
lines.append(f"- {r.summary}" + (f" — {r.detail}" if r.detail else ""))
|
||||||
|
for step in r.steps:
|
||||||
|
lines.append(f" - {step}")
|
||||||
|
return "\n".join(lines).strip()
|
||||||
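The precedence that `ClaudeDistributor.body` applies to rendering hints (explicit `body` override first, then the `skill` mode, then the default markdown body) can be sketched on its own. `choose_mode` is a hypothetical helper that only names the branch taken, and `SimpleNamespace` objects stand in for `SolutionPattern`:

```python
from types import SimpleNamespace


def hint(pattern, flavor, key, default=None):
    """Read a per-flavor rendering hint, tolerating a missing or None entry."""
    return (pattern.rendering_hints.get(flavor) or {}).get(key, default)


def choose_mode(pattern, flavor="claude"):
    """Return which rendering branch would be taken for this pattern."""
    if hint(pattern, flavor, "body"):
        return "override"
    if hint(pattern, flavor, "as") == "skill":
        return "skill"
    return "markdown"


# Stand-ins for SolutionPattern; only rendering_hints matters here.
plain = SimpleNamespace(rendering_hints={})
skill = SimpleNamespace(rendering_hints={"claude": {"as": "skill"}})
custom = SimpleNamespace(rendering_hints={"claude": {"as": "skill", "body": "Custom text"}})
empty_entry = SimpleNamespace(rendering_hints={"claude": None})
```

A `body` hint wins even when `as` is set to `skill`, so an author who supplies both gets the literal override text.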
15
session_memory/distribute/codex.py
Normal file
@@ -0,0 +1,15 @@
"""Codex distributor (PRD §6.4 FR-X1; T03).

Renders an approved Solution Pattern into an ``AGENTS.md`` snippet — Codex's native
repo-convention surface. Identical agnostic body to the other flavors (FR-A3: one
pattern, expressible everywhere); only the target file differs.
"""

from __future__ import annotations

from .base import BaseDistributor


class CodexDistributor(BaseDistributor):
    flavor = "codex"
    target_path = "AGENTS.md"
15
session_memory/distribute/grok.py
Normal file
@@ -0,0 +1,15 @@
"""Grok distributor (PRD §6.4 FR-X1; T03).

Renders an approved Solution Pattern into Grok's native instruction format. Defaults
to a ``.grok/instructions.md`` snippet; the same agnostic body as the other flavors
(FR-A3), overridable via ``rendering_hints["grok"]``.
"""

from __future__ import annotations

from .base import BaseDistributor


class GrokDistributor(BaseDistributor):
    flavor = "grok"
    target_path = ".grok/instructions.md"
136
session_memory/distribute/proposals.py
Normal file
@@ -0,0 +1,136 @@
|
|||||||
|
"""Scoping, proposed-not-applied output, and the active-pattern registry
|
||||||
|
(PRD §6.4 FR-X2/FR-X3/FR-X4; T04).
|
||||||
|
|
||||||
|
* **Scope (FR-X2):** a pattern lands in a target environment only if the target's
|
||||||
|
repo/domain/flavor are within the pattern's :class:`Scope` (an empty scope list
|
||||||
|
means "unrestricted on that axis").
|
||||||
|
* **Proposed, not applied (FR-X3):** rendered artifacts are written under a
|
||||||
|
``proposals/`` tree mirroring the target path — a reviewable diff a human applies,
|
||||||
|
never auto-written into the live file. Re-running upserts each pattern's block in
|
||||||
|
place (idempotent), so proposals don't accumulate duplicates.
|
||||||
|
* **Active-pattern registry (FR-X4):** a JSON record of which pattern (and version)
|
||||||
|
is proposed/active in which (repo, flavor) environment.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import json
|
||||||
|
import os
|
||||||
|
from dataclasses import dataclass
|
||||||
|
from datetime import datetime, timezone
|
||||||
|
|
||||||
|
from ..curate.schema import SolutionPattern
|
||||||
|
from .base import upsert_block
|
||||||
|
from .registry import get_distributor
|
||||||
|
|
||||||
|
|
||||||
|
def _now() -> str:
|
||||||
|
return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass(frozen=True)
|
||||||
|
class Target:
|
||||||
|
"""An environment a pattern could be distributed to."""
|
||||||
|
|
||||||
|
repo: str
|
||||||
|
domain: str = ""
|
||||||
|
flavor: str = "claude"
|
||||||
|
|
||||||
|
|
||||||
|
def applies(pattern: SolutionPattern, target: Target) -> bool:
|
||||||
|
"""True if ``target`` is within the pattern's scope (empty axis == any)."""
|
||||||
|
sc = pattern.scope
|
||||||
|
if sc.repos and target.repo not in sc.repos:
|
||||||
|
return False
|
||||||
|
if sc.domains and target.domain and target.domain not in sc.domains:
|
||||||
|
return False
|
||||||
|
if sc.flavors and target.flavor not in sc.flavors:
|
||||||
|
return False
|
||||||
|
return True
|
||||||
|
|
||||||
|
|
||||||
|
def is_distributable(pattern: SolutionPattern) -> bool:
|
||||||
|
return pattern.status == "approved" and pattern.distribution_ready
|
||||||
|
|
||||||
|
|
||||||
|
class ActiveRegistry:
|
||||||
|
"""JSON record of patterns proposed/active per (repo, flavor) — FR-X4."""
|
||||||
|
|
||||||
|
def __init__(self, path: str) -> None:
|
||||||
|
self.path = path
|
||||||
|
self._entries: dict[str, dict] = {}
|
||||||
|
if os.path.exists(path):
|
||||||
|
with open(path, encoding="utf-8") as fh:
|
||||||
|
for e in json.load(fh):
|
||||||
|
self._entries[self._key(e["pattern_id"], e["repo"], e["flavor"])] = e
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def _key(pid: str, repo: str, flavor: str) -> str:
|
||||||
|
return f"{pid}|{repo}|{flavor}"
|
||||||
|
|
||||||
|
def record(self, pid: str, repo: str, flavor: str, version: str,
|
||||||
|
status: str = "proposed") -> None:
|
||||||
|
self._entries[self._key(pid, repo, flavor)] = {
|
||||||
|
"pattern_id": pid, "repo": repo, "flavor": flavor,
|
||||||
|
"version": version, "status": status, "updated_at": _now(),
|
||||||
|
}
|
||||||
|
|
||||||
|
def entries(self) -> list[dict]:
|
||||||
|
return [self._entries[k] for k in sorted(self._entries)]
|
||||||
|
|
||||||
|
def save(self) -> None:
|
||||||
|
os.makedirs(os.path.dirname(self.path) or ".", exist_ok=True)
|
||||||
|
with open(self.path, "w", encoding="utf-8") as fh:
|
||||||
|
json.dump(self.entries(), fh, indent=2, sort_keys=True)
|
||||||
|
fh.write("\n")
|
||||||
|
|
||||||
|
|
||||||
|
@dataclass
|
||||||
|
class ProposalResult:
|
||||||
|
proposals: list = None # (repo, flavor, pattern_id, proposal_path)
|
||||||
|
files_written: list = None # absolute proposal paths
|
||||||
|
skipped_not_distributable: list = None # pattern ids
|
||||||
|
|
||||||
|
def __post_init__(self):
|
||||||
|
self.proposals = self.proposals or []
|
||||||
|
self.files_written = self.files_written or []
|
||||||
|
self.skipped_not_distributable = self.skipped_not_distributable or []
|
||||||
|
|
||||||
|
|
||||||
|
def propose(patterns: list[SolutionPattern], targets: list[Target], out_dir: str,
|
||||||
|
registry: ActiveRegistry) -> ProposalResult:
|
||||||
|
"""Render in-scope, distributable patterns into per-target proposal files."""
|
||||||
|
result = ProposalResult()
|
||||||
|
pending: dict[str, str] = {} # proposal path -> accumulated content
|
||||||
|
|
||||||
|
for p in patterns:
|
||||||
|
if not is_distributable(p):
|
||||||
|
result.skipped_not_distributable.append(p.id)
|
||||||
|
continue
|
||||||
|
for t in targets:
|
||||||
|
dist = get_distributor(t.flavor)
|
||||||
|
if dist is None or not applies(p, t):
|
||||||
|
continue
|
||||||
|
art = dist.render(p)
|
||||||
|
path = os.path.join(out_dir, t.repo, art.target_path)
|
||||||
|
if path not in pending:
|
||||||
|
pending[path] = _read(path)
|
||||||
|
pending[path] = upsert_block(pending[path], p.id, art.content)
|
||||||
|
registry.record(p.id, t.repo, t.flavor, p.version)
|
||||||
|
result.proposals.append((t.repo, t.flavor, p.id, path))
|
||||||
|
|
||||||
|
for path, content in pending.items():
|
||||||
|
os.makedirs(os.path.dirname(path), exist_ok=True)
|
||||||
|
with open(path, "w", encoding="utf-8") as fh:
|
||||||
|
fh.write(content if content.endswith("\n") else content + "\n")
|
||||||
|
result.files_written.append(path)
|
||||||
|
|
||||||
|
registry.save()
|
||||||
|
return result
|
||||||
|
|
||||||
|
|
||||||
|
def _read(path: str) -> str:
|
||||||
|
if os.path.exists(path):
|
||||||
|
with open(path, encoding="utf-8") as fh:
|
||||||
|
return fh.read()
|
||||||
|
return ""
|
||||||
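The scope rule (FR-X2) implemented by `applies` above can be exercised with a small standalone sketch. `Scope` here is a hypothetical stand-in with three list fields; the real class lives in `curate/schema.py`, which this diff does not show. The repo and domain names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Scope:
    """Hypothetical stand-in: an empty list means 'unrestricted on that axis'."""
    repos: list = field(default_factory=list)
    domains: list = field(default_factory=list)
    flavors: list = field(default_factory=list)


@dataclass(frozen=True)
class Target:
    repo: str
    domain: str = ""
    flavor: str = "claude"


def applies(scope: Scope, target: Target) -> bool:
    """True if the target falls within every restricted axis of the scope."""
    if scope.repos and target.repo not in scope.repos:
        return False
    # A target without a domain is not rejected by a domain restriction.
    if scope.domains and target.domain and target.domain not in scope.domains:
        return False
    if scope.flavors and target.flavor not in scope.flavors:
        return False
    return True


anywhere = Scope()
claude_only = Scope(flavors=["claude"])
ops_domain = Scope(domains=["ops"])
```

One consequence worth noticing: a target with an empty `domain` passes any domain restriction, so domain scoping only takes effect for repos that are mapped to a domain.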
26
session_memory/distribute/registry.py
Normal file
@@ -0,0 +1,26 @@
"""Distributor registry (T03) — flavor -> distributor, the one place that knows
about all flavor edges. Adding a flavor = one entry here + one adapter module.
"""

from __future__ import annotations

from typing import Optional

from .base import BaseDistributor
from .claude import ClaudeDistributor
from .codex import CodexDistributor
from .grok import GrokDistributor

_REGISTRY: dict[str, BaseDistributor] = {
    "claude": ClaudeDistributor(),
    "codex": CodexDistributor(),
    "grok": GrokDistributor(),
}


def get_distributor(flavor: str) -> Optional[BaseDistributor]:
    return _REGISTRY.get(flavor)


def all_flavors() -> list[str]:
    return list(_REGISTRY)
||||||
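The registry docstring says that adding a flavor takes one registry entry plus one adapter module. A minimal sketch of that shape follows, with a trimmed stand-in `BaseDistributor` and an entirely hypothetical `example` flavor and `EXAMPLE.md` target; neither exists in the repository.

```python
from typing import Optional


class BaseDistributor:
    """Trimmed stand-in for distribute.base.BaseDistributor (rendering omitted)."""
    flavor: str = ""
    target_path: str = ""


class ExampleDistributor(BaseDistributor):
    """Hypothetical adapter: only the flavor name and target file are supplied."""
    flavor = "example"
    target_path = "EXAMPLE.md"


# One entry per flavor; lookups for unknown flavors return None.
_REGISTRY: dict[str, BaseDistributor] = {
    "example": ExampleDistributor(),
}


def get_distributor(flavor: str) -> Optional[BaseDistributor]:
    return _REGISTRY.get(flavor)


def all_flavors() -> list[str]:
    return list(_REGISTRY)
```

Since `get_distributor` returns `None` for an unregistered flavor, callers such as `propose` can skip unknown flavors instead of raising.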
@@ -19,13 +19,19 @@ from dataclasses import dataclass, field
|
|||||||
from typing import Any
|
from typing import Any
|
||||||
|
|
||||||
from .adapters import claude as claude_adapter
|
from .adapters import claude as claude_adapter
|
||||||
|
from .adapters import codex as codex_adapter
|
||||||
|
from .adapters import grok as grok_adapter
|
||||||
from .core import digest as digest_mod
|
from .core import digest as digest_mod
|
||||||
from .core.cursor import Cursors
|
from .core.cursor import Cursors
|
||||||
from .core.retention import RetentionConfig, sweep as retention_sweep
|
from .core.retention import RetentionConfig, sweep as retention_sweep
|
||||||
from .core.store import Store
|
from .core.store import Store
|
||||||
|
|
||||||
# adapter dispatch by source name
|
# adapter dispatch by source name
|
||||||
_ADAPTERS = {"claude": claude_adapter.parse_session}
|
_ADAPTERS = {
|
||||||
|
"claude": claude_adapter.parse_session,
|
||||||
|
"codex": codex_adapter.parse_session,
|
||||||
|
"grok": grok_adapter.parse_session,
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
@dataclass
|
@dataclass
|
||||||
|
|||||||
9
session_memory/measure/__init__.py
Normal file
@@ -0,0 +1,9 @@
"""Measure phase (PRD §6.5) — the loop-closer.

  metrics.py    fleet metrics + persisted baseline snapshots (T01)
  effect.py     before/after per-pattern effectiveness (T02)
  __main__.py   python -m session_memory.measure (T03)

Computation over existing digests (reusing WP-0005 tool buckets + WP-0006 error
mining); no new capture.
"""
101
session_memory/measure/__main__.py
Normal file
@@ -0,0 +1,101 @@
+"""Measure entrypoint (T03): fleet trend + per-pattern effectiveness.
+
+    python -m session_memory.measure [--config PATH] [--label L] [--since DATE]
+                                     [--no-save] [--json]
+
+Computes current fleet metrics over the real (quality-filtered) sessions, appends
+them to the baseline trend, and reports whether the fleet is getting cheaper /
+more reliable over time (FR-M3). With ``--since DATE`` it also reports before/after
+effectiveness around a change (FR-M1/FR-M2).
+"""
+
+from __future__ import annotations
+
+import argparse
+import json
+import os
+
+from ..core.store import Store
+from ..detect.quality import filter_real, quality_config
+from ..ingest import _expand, load_config
+from .effect import effectiveness
+from .metrics import load_baselines, save_baseline, snapshot
+
+_TREND_KEYS = ("infra_overhead_share_median", "error_rate", "schema_thrash_sessions",
+               "tokens_p50", "success_rate")
+
+
+def real_digests(config: dict) -> list[dict]:
+    s = config.get("store", {})
+    store = Store(_expand(s["db_path"]), _expand(s["blob_dir"]))
+    out = filter_real(store.list_digests(), quality_config(config))
+    store.close()
+    return out
+
+
+def _fmt_trend(baselines: list[dict]) -> str:
+    if not baselines:
+        return " (no prior snapshots)"
+    lines = []
+    recent = baselines[-5:]
+    for b in recent:
+        when = (b.get("captured_at") or "")[:10]
+        lbl = f" {b['label']}" if b.get("label") else ""
+        lines.append(f" {when}{lbl}: overhead_med={b.get('infra_overhead_share_median')} "
+                     f"err_rate={b.get('error_rate')} schema_thrash={b.get('schema_thrash_sessions')} "
+                     f"tok_p50={b.get('tokens_p50')} success={b.get('success_rate')} "
+                     f"(n={b.get('n_sessions')})")
+    return "\n".join(lines)
+
+
+def _report(current: dict, baselines: list[dict], eff: dict | None) -> str:
+    lines = [f"# Fleet metrics (n={current.get('n_sessions')} real sessions)"]
+    for k in _TREND_KEYS:
+        lines.append(f" {k} = {current.get(k)}")
+    lines.append("\n## Trend (recent snapshots)")
+    lines.append(_fmt_trend(baselines))
+    if eff is not None:
+        lines.append(f"\n## Effectiveness since {eff['applied_at']} "
+                     f"(before={eff['n_before']}, after={eff['n_after']})")
+        if eff["insufficient_data"]:
+            lines.append(" insufficient data on one side of the date")
+        else:
+            for k in _TREND_KEYS:
+                d = eff["deltas"].get(k, {})
+                mark = {True: "improved", False: "worse", None: "—"}[d.get("improved")]
+                lines.append(f" {k}: {d.get('before')} -> {d.get('after')} "
+                             f"({d.get('change'):+}) {mark}")
+    return "\n".join(lines)
+
+
+def main(argv=None) -> int:
+    here = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
+    ap = argparse.ArgumentParser(description="Measure fleet metrics + per-pattern effectiveness.")
+    ap.add_argument("--config", default=os.path.join(here, "config.toml"))
+    ap.add_argument("--label", default="")
+    ap.add_argument("--since", default=None, help="ISO date for before/after effectiveness")
+    ap.add_argument("--no-save", action="store_true", help="don't append to the baseline trend")
+    ap.add_argument("--json", action="store_true")
+    args = ap.parse_args(argv)
+
+    config = load_config(args.config)
+    digests = real_digests(config)
+    current = snapshot(digests, label=args.label)
+
+    path = _expand(config.get("measure", {}).get("baselines", "session_memory/measure/baselines.jsonl"))
+    prior = load_baselines(path)
+    if not args.no_save:
+        save_baseline(current, path)
+
+    eff = effectiveness(digests, args.since, label=args.label) if args.since else None
+
+    if args.json:
+        print(json.dumps({"current": current, "trend": prior + [current], "effectiveness": eff},
+                         indent=2))
+    else:
+        print(_report(current, prior + [current], eff))
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
1
session_memory/measure/baselines.jsonl
Normal file
@@ -0,0 +1 @@
+{"captured_at": "2026-06-07T13:30:14Z", "error_rate": 0.963, "infra_overhead_share_median": 0.117, "infra_overhead_share_p90": 0.261, "label": "phase4-baseline (pre-fixes)", "n_sessions": 27, "recurring_error_occurrences": 505, "schema_thrash_sessions": 8, "success_rate": 1.0, "tokens_p50": 250725, "tokens_p90": 1423966}
60
session_memory/measure/effect.py
Normal file
@@ -0,0 +1,60 @@
+"""Before/after per-pattern effectiveness (PRD §6.5 FR-M1/FR-M2; T02).
+
+Given a change/pattern with an ``applied_at`` date, split sessions into *before*
+and *after* by their start time, aggregate each side, and diff the headline
+metrics — so we can say whether a distributed pattern (e.g. the Read-before-Edit
+reflex, or the State Hub skill) actually moved the numbers, and retire it if not.
+"""
+
+from __future__ import annotations
+
+from .metrics import aggregate
+
+# Metrics where a *lower* value after the change means improvement.
+_LOWER_IS_BETTER = {
+    "infra_overhead_share_median", "infra_overhead_share_p90", "error_rate",
+    "recurring_error_occurrences", "schema_thrash_sessions", "tokens_p50", "tokens_p90",
+}
+# Metrics where a *higher* value is improvement.
+_HIGHER_IS_BETTER = {"success_rate"}
+
+
+def split_by_date(digests: list[dict], applied_at: str) -> tuple[list[dict], list[dict]]:
+    """Partition digests into (before, after) by ``started_at`` vs ``applied_at``."""
+    before, after = [], []
+    for d in digests:
+        ts = d.get("started_at") or ""
+        (after if ts and ts >= applied_at else before).append(d)
+    return before, after
+
+
+def _delta(metric: str, before: float, after: float) -> dict:
+    change = round(after - before, 3)
+    if metric in _LOWER_IS_BETTER:
+        improved = change < 0
+    elif metric in _HIGHER_IS_BETTER:
+        improved = change > 0
+    else:
+        improved = None
+    return {"before": before, "after": after, "change": change, "improved": improved}
+
+
+def effectiveness(digests: list[dict], applied_at: str, *, label: str = "") -> dict:
+    """Compare fleet metrics after ``applied_at`` against the prior period."""
+    before, after = split_by_date(digests, applied_at)
+    b_agg, a_agg = aggregate(before), aggregate(after)
+    metrics = (_LOWER_IS_BETTER | _HIGHER_IS_BETTER)
+    deltas = {}
+    if before and after:
+        for m in metrics:
+            deltas[m] = _delta(m, b_agg.get(m, 0.0), a_agg.get(m, 0.0))
+    return {
+        "label": label,
+        "applied_at": applied_at,
+        "n_before": len(before),
+        "n_after": len(after),
+        "before": b_agg,
+        "after": a_agg,
+        "deltas": deltas,
+        "insufficient_data": not (before and after),
+    }
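The before/after comparison in `effect.py` can be tried on toy data. The sketch below is a condensed restatement of the split and direction-aware delta, not the module itself. The sample sessions, the `error_rate` helper and the reduced metric sets are made up for illustration. It relies on the fact that ISO-8601 UTC timestamps in one format sort lexicographically, which is also why a bare date such as `2026-06-07` works as the cut-off.

```python
LOWER_IS_BETTER = {"error_rate"}
HIGHER_IS_BETTER = {"success_rate"}


def split(sessions, applied_at):
    """Sessions starting at or after applied_at go to 'after'; the rest,
    including sessions with no timestamp, go to 'before'."""
    before, after = [], []
    for s in sessions:
        ts = s.get("started_at") or ""
        (after if ts and ts >= applied_at else before).append(s)
    return before, after


def delta(metric, before, after):
    """Signed change plus a verdict that depends on the metric's direction."""
    change = round(after - before, 3)
    if metric in LOWER_IS_BETTER:
        improved = change < 0
    elif metric in HIGHER_IS_BETTER:
        improved = change > 0
    else:
        improved = None
    return {"before": before, "after": after, "change": change, "improved": improved}


def error_rate(sessions):
    # Hypothetical helper: share of sessions that hit at least one error.
    return sum(s["has_error"] for s in sessions) / len(sessions)


# Invented sample sessions, two on each side of the cut-off date.
sessions = [
    {"started_at": "2026-06-01T10:00:00Z", "has_error": True},
    {"started_at": "2026-06-05T09:00:00Z", "has_error": True},
    {"started_at": "2026-06-08T12:00:00Z", "has_error": False},
    {"started_at": "2026-06-09T12:00:00Z", "has_error": True},
]

before, after = split(sessions, "2026-06-07")
result = delta("error_rate", error_rate(before), error_rate(after))
print(result)
```

One consequence of the split rule is worth keeping in mind when reading real reports: a session without `started_at` is counted on the *before* side, so missing timestamps bias the baseline, not the post-change period.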
102
session_memory/measure/metrics.py
Normal file
@@ -0,0 +1,102 @@
+"""Fleet metrics + persisted baselines (PRD §6.5 FR-M3; T01).
+
+Computes the headline health metrics of the captured corpus — the same quantities
+the friction assessment reported — so they can be tracked over time and compared
+before/after a change. Reuses :func:`detect.signals.tool_bucket` (WP-0005) and the
+digest ``error_snippets`` (WP-0006); no new capture.
+
+A **baseline** is a timestamped metrics snapshot appended to a JSONL file, so
+successive runs build a trend the entrypoint (T03) can chart.
+"""
+
+from __future__ import annotations
+
+import collections
+import json
+import os
+from datetime import datetime, timezone
+
+from ..detect.signals import tool_bucket
+
+
+def _now() -> str:
+    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
+
+
+def _pct(values: list[float], q: float) -> float:
+    if not values:
+        return 0.0
+    s = sorted(values)
+    return round(s[int(q * (len(s) - 1))], 3)
+
+
+def _median(values: list[float]) -> float:
+    return _pct(values, 0.5)
+
+
+def _buckets(digest: dict) -> collections.Counter:
+    b: collections.Counter = collections.Counter()
+    for tool, n in (digest.get("tool_histogram") or {}).items():
+        b[tool_bucket(tool)] += n
+    return b
+
+
+def session_metrics(digest: dict) -> dict:
+    """Per-session metrics used to build fleet aggregates."""
+    b = _buckets(digest)
+    total = sum(b.values()) or 1
+    overhead = b["statehub_mcp"] + b["task_mgmt"] + b["schema_load"]
+    cost = digest.get("cost", {})
+    tokens = cost.get("input_tokens", 0) + cost.get("output_tokens", 0)
+    return {
+        "infra_overhead_share": overhead / total,
+        "tool_calls": total,
+        "schema_load": b["schema_load"],
+        "error_occurrences": sum(s.get("count", 1) for s in (digest.get("error_snippets") or [])),
+        "has_error": bool(digest.get("error_snippets")),
+        "tokens": tokens,
+        "success": digest.get("outcome") == "success",
+    }
+
+
+def aggregate(digests: list[dict], *, schema_thrash_threshold: int = 5) -> dict:
+    """Fleet-level metrics over a set of (already quality-filtered) digests."""
+    per = [session_metrics(d) for d in digests]
+    n = len(per)
+    if n == 0:
+        return {"n_sessions": 0}
+    shares = [m["infra_overhead_share"] for m in per]
+    tokens = [m["tokens"] for m in per]
+    return {
+        "n_sessions": n,
+        "infra_overhead_share_median": _median(shares),
+        "infra_overhead_share_p90": _pct(shares, 0.9),
+        "error_rate": round(sum(m["has_error"] for m in per) / n, 3),
+        "recurring_error_occurrences": sum(m["error_occurrences"] for m in per),
+        "schema_thrash_sessions": sum(1 for m in per if m["schema_load"] >= schema_thrash_threshold),
+        "tokens_p50": _pct(tokens, 0.5),
+        "tokens_p90": _pct(tokens, 0.9),
+        "success_rate": round(sum(m["success"] for m in per) / n, 3),
+    }
+
+
+def snapshot(digests: list[dict], *, label: str = "") -> dict:
+    m = aggregate(digests)
+    m["captured_at"] = _now()
+    m["label"] = label
+    return m
+
+
+def save_baseline(metrics: dict, path: str) -> None:
+    """Append a metrics snapshot to the baseline JSONL trend file."""
+    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
+    with open(path, "a", encoding="utf-8") as fh:
+        fh.write(json.dumps(metrics, sort_keys=True))
+        fh.write("\n")
+
+
+def load_baselines(path: str) -> list[dict]:
+    if not os.path.exists(path):
+        return []
+    with open(path, encoding="utf-8") as fh:
+        return [json.loads(line) for line in fh if line.strip()]
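Two small mechanisms in `metrics.py` are easy to misread: the percentile picks an existing sample by rank without interpolating, and the baseline file is append-only JSON Lines. The sketch below restates both on made-up numbers. The names `pct`, `append_snapshot` and `read_snapshots` and the sample token counts are illustrative assumptions, not the module's API.

```python
import json
import os
import tempfile


def pct(values, q):
    """Lower-rank percentile: pick index int(q * (n - 1)) of the sorted values."""
    if not values:
        return 0.0
    s = sorted(values)
    return round(s[int(q * (len(s) - 1))], 3)


def append_snapshot(record, path):
    """Append one JSON object per line, so earlier snapshots are never rewritten."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")


def read_snapshots(path):
    """Read every non-blank line back as one snapshot, oldest first."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


tokens = [100, 400, 200, 300]   # made-up per-session token counts
p50 = pct(tokens, 0.5)          # index int(0.5 * 3) = 1 of [100, 200, 300, 400]
p90 = pct(tokens, 0.9)          # index int(0.9 * 3) = 2

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "baselines.jsonl")
    append_snapshot({"label": "first", "tokens_p50": p50}, path)
    append_snapshot({"label": "second", "tokens_p50": p90}, path)
    trend = read_snapshots(path)

print(p50, p90, [t["label"] for t in trend])
```

With an even number of samples this rule returns the lower of the two middle values as the median, so small fleets can report a slightly lower p50 than an interpolating percentile would.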
9
session_memory/retro/__init__.py
Normal file
@@ -0,0 +1,9 @@
+"""Weekly retro (AGENTIC-WP-0010) — the analysis half of the coding retrospection.
+
+  build.py     windowed detect + measure -> ranked top-3 suggestions per repo (T01)
+  publish.py   publish the retro to the hub read model + local report (T02)
+  __main__.py  python -m session_memory.retro (T03)
+
+Consumed by activity-core's weekly-coding-retro schedule (ACTIVITY-WP-0008) via
+the ``event_type=coding_retro`` read model.
+"""
68
session_memory/retro/__main__.py
Normal file
@@ -0,0 +1,68 @@
+"""Weekly retro entrypoint (AGENTIC-WP-0010 T03).
+
+    python -m session_memory.retro [--window-days 7] [--since D] [--until D]
+                                   [--publish] [--json]
+
+Builds the windowed top-3-per-repo retro over the captured sessions, writes a local
+JSON + markdown report, and (with ``--publish``) posts it to the hub as the
+``coding_retro`` read model that activity-core's weekly schedule consumes.
+"""
+
+from __future__ import annotations
+
+import argparse
+import json
+import os
+
+from ..core.store import Store
+from ..curate.catalog import Catalog
+from ..ingest import _expand, load_config
+from .build import weekly_retro
+from .publish import publish_to_hub, render_markdown, write_local
+
+
+def run_retro(config: dict, *, window_days=None, since=None, until=None):
+    s = config.get("store", {})
+    store = Store(_expand(s["db_path"]), _expand(s["blob_dir"]))
+    digests = store.list_digests()
+    store.close()
+    cur = config.get("curate", {})
+    catalog = Catalog(_expand(cur.get("catalog_dir", "session_memory/catalog")))
+    rcfg = config.get("retro", {})
+    return weekly_retro(digests, catalog, since=since, until=until,
+                        window_days=window_days or rcfg.get("window_days", 7))
+
+
+def main(argv=None) -> int:
+    here = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
+    ap = argparse.ArgumentParser(description="Build (and optionally publish) the weekly coding retro.")
+    ap.add_argument("--config", default=os.path.join(here, "config.toml"))
+    ap.add_argument("--window-days", type=int, default=None)
+    ap.add_argument("--since", default=None)
+    ap.add_argument("--until", default=None)
+    ap.add_argument("--publish", action="store_true", help="post to the hub coding_retro read model")
+    ap.add_argument("--json", action="store_true")
+    args = ap.parse_args(argv)
+
+    config = load_config(args.config)
+    report = run_retro(config, window_days=args.window_days, since=args.since, until=args.until)
+
+    rcfg = config.get("retro", {})
+    write_local(report, _expand(rcfg.get("report_json", "session_memory/retro/last_retro.json")),
+                _expand(rcfg.get("report_md", "session_memory/retro/last_retro.md")))
+
+    published = None
+    if args.publish:
+        published = publish_to_hub(report, base_url=rcfg.get("hub_url", "http://127.0.0.1:8000"))
+
+    if args.json:
+        print(json.dumps({"report": report, "published": published}, indent=2))
+    else:
+        print(render_markdown(report))
+        if args.publish:
+            print(f"\npublished to hub: {published}")
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
99
session_memory/retro/build.py
Normal file
@@ -0,0 +1,99 @@
+"""Windowed weekly retro report (AGENTIC-WP-0010 T01).
+
+Runs the existing detect pipeline over a date window, ranks the recurring problem
+patterns into **per-repo improvement suggestions** (top 3, cross-flavor first),
+attaches a recommendation from the Pattern Catalog where one exists, and bundles a
+fleet measure snapshot for context. Pure function over digests — the entrypoint
+(T03) handles store/publish.
+"""
+
+from __future__ import annotations
+
+import collections
+from dataclasses import asdict, dataclass
+from datetime import datetime, timedelta, timezone
+from typing import Optional
+
+from ..detect.cluster import cluster
+from ..detect.quality import QualityConfig, filter_real
+from ..detect.signals import extract_signals
+from ..measure.metrics import aggregate
+
+# score at/above which a suggestion is "high" priority even when single-flavor
+_HIGH_SCORE = 100.0
+
+
+def _parse(ts: str) -> datetime:
+    return datetime.fromisoformat(ts.replace("Z", "+00:00"))
+
+
+def _iso(dt: datetime) -> str:
+    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
+
+
+def _now() -> datetime:
+    return datetime.now(timezone.utc)
+
+
+@dataclass
+class Suggestion:
+    repo: str
+    title: str
+    recommendation: str
+    priority: str  # high | medium
+    score: float
+    signal_type: str
+    cross_flavor: bool
+    pattern_key: str
+
+
+def _recommendation(pattern_key: str, locus: str, catalog) -> Optional[str]:
+    if catalog is None:
+        return None
+    sp = catalog.find_for(pattern_key, locus)
+    if sp and sp.resolutions:
+        return sp.resolutions[0].summary
+    return None
+
+
+def weekly_retro(digests: list[dict], catalog=None, *, since: Optional[str] = None,
+                 until: Optional[str] = None, window_days: int = 7,
+                 max_per_repo: int = 3, min_frequency: int = 2,
+                 quality: Optional[QualityConfig] = None) -> dict:
+    """Build the ranked weekly retro report over a date window."""
+    until_dt = _parse(until) if until else _now()
+    since_dt = _parse(since) if since else until_dt - timedelta(days=window_days)
+
+    windowed = [d for d in digests
+                if d.get("started_at") and since_dt <= _parse(d["started_at"]) < until_dt]
+    real = filter_real(windowed, quality or QualityConfig())
+
+    patterns = cluster(extract_signals(real), min_frequency=min_frequency)
+
+    by_repo: dict[str, list[Suggestion]] = collections.defaultdict(list)
+    for p in patterns:
+        if p.polarity != "problem":
+            continue  # improvements come from problems
+        rec = (_recommendation(p.key, p.locus, catalog)
+               or f"Investigate {p.signal_type.replace('_', ' ')} on {p.locus}")
+        priority = "high" if (p.cross_flavor or p.score >= _HIGH_SCORE) else "medium"
+        for repo in (p.repos or ["(unknown)"]):
+            by_repo[repo].append(Suggestion(
+                repo=repo, title=p.title, recommendation=rec, priority=priority,
+                score=p.score, signal_type=p.signal_type, cross_flavor=p.cross_flavor,
+                pattern_key=p.key))
+
+    suggestions: list[Suggestion] = []
+    for repo in sorted(by_repo):
+        items = sorted(by_repo[repo], key=lambda s: -s.score)
+        suggestions.extend(items[:max_per_repo])
+    # cross-flavor first, then by score (global ordering for the report)
+    suggestions.sort(key=lambda s: (not s.cross_flavor, -s.score))
+
+    return {
+        "window": {"since": _iso(since_dt), "until": _iso(until_dt), "days": window_days},
+        "generated_at": _iso(_now()),
+        "n_sessions": len(real),
+        "suggestions": [asdict(s) for s in suggestions],
+        "measure": aggregate(real),
+    }
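The ranking step in `build.py` works in two passes: cap each repo at its highest-scoring items, then order the survivors globally with cross-flavor items first. The sketch below isolates that step on plain dicts. The function name `rank`, the sample items and their scores are hypothetical; only the two sort keys and the priority rule mirror the code above.

```python
from collections import defaultdict

HIGH_SCORE = 100.0  # same threshold idea as _HIGH_SCORE above


def priority(item):
    """High when the pattern spans flavors or its score reaches the threshold."""
    return "high" if (item["cross_flavor"] or item["score"] >= HIGH_SCORE) else "medium"


def rank(items, max_per_repo=3):
    """Cap each repo at its top-scoring items, then order globally."""
    by_repo = defaultdict(list)
    for it in items:
        by_repo[it["repo"]].append(it)

    picked = []
    for repo in sorted(by_repo):
        best_first = sorted(by_repo[repo], key=lambda s: -s["score"])
        picked.extend(best_first[:max_per_repo])

    # Cross-flavor first (False sorts before True), then descending score.
    picked.sort(key=lambda s: (not s["cross_flavor"], -s["score"]))
    return picked


# Invented sample suggestions.
items = [
    {"repo": "a", "title": "t1", "score": 10.0, "cross_flavor": False},
    {"repo": "a", "title": "t2", "score": 150.0, "cross_flavor": False},
    {"repo": "a", "title": "t3", "score": 5.0, "cross_flavor": True},
    {"repo": "b", "title": "t4", "score": 99.0, "cross_flavor": False},
]

top3 = [it["title"] for it in rank(items, max_per_repo=3)]
top2 = [it["title"] for it in rank(items, max_per_repo=2)]
print(top3, top2, [priority(it) for it in items])
```

In this toy run the per-repo cap looks only at score, so with `max_per_repo=2` the low-scoring cross-flavor item `t3` is dropped before the cross-flavor-first ordering is applied. If the real pipeline behaves the same way, a cross-flavor pattern is guaranteed a slot only when it also ranks within its repo's top scores.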
322
session_memory/retro/last_retro.json
Normal file
@@ -0,0 +1,322 @@
+{
+  "generated_at": "2026-06-07T19:30:56Z",
+  "measure": {
+    "error_rate": 0.957,
+    "infra_overhead_share_median": 0.167,
+    "infra_overhead_share_p90": 0.23,
+    "n_sessions": 23,
+    "recurring_error_occurrences": 463,
+    "schema_thrash_sessions": 7,
+    "success_rate": 1.0,
+    "tokens_p50": 250725,
+    "tokens_p90": 901422
+  },
+  "n_sessions": 23,
+  "suggestions": [
+    {
+      "cross_flavor": true,
+      "pattern_key": "problem:recurring_error:make: *** [makefile:<n>: fix-consistency] error <n>",
+      "priority": "high",
+      "recommendation": "Investigate recurring error on make: *** [makefile:<n>: fix-consistency] error <n>",
+      "repo": "net-kingdom",
+      "score": 54.0,
+      "signal_type": "recurring_error",
+      "title": "cross-flavor problem: recurring error"
+    },
+    {
+      "cross_flavor": false,
+      "pattern_key": "problem:tool_thrash:tool:Bash",
+      "priority": "high",
+      "recommendation": "Batch related shell work into one script, not many small Bash calls",
+      "repo": "activity-core",
+      "score": 13128.0,
+      "signal_type": "tool_thrash",
+      "title": "problem: tool thrash"
+    },
+    {
+      "cross_flavor": false,
+      "pattern_key": "problem:tool_thrash:tool:Bash",
+      "priority": "high",
+      "recommendation": "Batch related shell work into one script, not many small Bash calls",
+      "repo": "artifact-store",
+      "score": 13128.0,
+      "signal_type": "tool_thrash",
+      "title": "problem: tool thrash"
+    },
+    {
+      "cross_flavor": false,
+      "pattern_key": "problem:tool_thrash:tool:Bash",
+      "priority": "high",
+      "recommendation": "Batch related shell work into one script, not many small Bash calls",
+      "repo": "citation-evidence",
+      "score": 13128.0,
+      "signal_type": "tool_thrash",
+      "title": "problem: tool thrash"
+    },
+    {
+      "cross_flavor": false,
+      "pattern_key": "problem:tool_thrash:tool:Bash",
+      "priority": "high",
+      "recommendation": "Batch related shell work into one script, not many small Bash calls",
+      "repo": "infospace-bench",
+      "score": 13128.0,
+      "signal_type": "tool_thrash",
+      "title": "problem: tool thrash"
+    },
+    {
+      "cross_flavor": false,
+      "pattern_key": "problem:tool_thrash:tool:Bash",
+      "priority": "high",
+      "recommendation": "Batch related shell work into one script, not many small Bash calls",
+      "repo": "railiance-apps",
+      "score": 13128.0,
+      "signal_type": "tool_thrash",
+      "title": "problem: tool thrash"
+    },
+    {
+      "cross_flavor": false,
+      "pattern_key": "problem:tool_thrash:tool:Bash",
+      "priority": "high",
+      "recommendation": "Batch related shell work into one script, not many small Bash calls",
+      "repo": "state-hub",
+      "score": 13128.0,
+      "signal_type": "tool_thrash",
+      "title": "problem: tool thrash"
+    },
+    {
+      "cross_flavor": false,
+      "pattern_key": "problem:schema_thrash:schema_load",
+      "priority": "high",
+      "recommendation": "Load the tool schemas you'll need once, up front",
+      "repo": "activity-core",
+      "score": 441.0,
+      "signal_type": "schema_thrash",
+      "title": "problem: schema thrash"
+    },
+    {
+      "cross_flavor": false,
+      "pattern_key": "problem:schema_thrash:schema_load",
+      "priority": "high",
+      "recommendation": "Load the tool schemas you'll need once, up front",
+      "repo": "citation-evidence",
+      "score": 441.0,
+      "signal_type": "schema_thrash",
+      "title": "problem: schema thrash"
+    },
+    {
+      "cross_flavor": false,
+      "pattern_key": "problem:schema_thrash:schema_load",
+      "priority": "high",
+      "recommendation": "Load the tool schemas you'll need once, up front",
+      "repo": "flex-auth",
+      "score": 441.0,
+      "signal_type": "schema_thrash",
+      "title": "problem: schema thrash"
+    },
+    {
+      "cross_flavor": false,
+      "pattern_key": "problem:schema_thrash:schema_load",
+      "priority": "high",
+      "recommendation": "Load the tool schemas you'll need once, up front",
+      "repo": "infospace-bench",
+      "score": 441.0,
+      "signal_type": "schema_thrash",
+      "title": "problem: schema thrash"
+    },
+    {
+      "cross_flavor": false,
+      "pattern_key": "problem:schema_thrash:schema_load",
+      "priority": "high",
+      "recommendation": "Load the tool schemas you'll need once, up front",
+      "repo": "ops-bridge",
+      "score": 441.0,
+      "signal_type": "schema_thrash",
+      "title": "problem: schema thrash"
+    },
+    {
+      "cross_flavor": false,
+      "pattern_key": "problem:recurring_error:<tool_use_error>file has not been read yet. read it first before writing to it.<<path>>",
+      "priority": "high",
+      "recommendation": "Read the file (or the region you'll touch) before Edit/Write",
+      "repo": "activity-core",
+      "score": 290.0,
+      "signal_type": "recurring_error",
+      "title": "problem: recurring error"
+    },
+    {
+      "cross_flavor": false,
+      "pattern_key": "problem:recurring_error:<tool_use_error>file has not been read yet. read it first before writing to it.<<path>>",
+      "priority": "high",
+      "recommendation": "Read the file (or the region you'll touch) before Edit/Write",
+      "repo": "citation-evidence",
+      "score": 290.0,
+      "signal_type": "recurring_error",
+      "title": "problem: recurring error"
+    },
+    {
+      "cross_flavor": false,
+      "pattern_key": "problem:recurring_error:<tool_use_error>file has not been read yet. read it first before writing to it.<<path>>",
+      "priority": "high",
+      "recommendation": "Read the file (or the region you'll touch) before Edit/Write",
+      "repo": "infospace-bench",
+      "score": 290.0,
+      "signal_type": "recurring_error",
+      "title": "problem: recurring error"
+    },
+    {
+      "cross_flavor": false,
+      "pattern_key": "problem:recurring_error:<tool_use_error>file has not been read yet. read it first before writing to it.<<path>>",
+      "priority": "high",
+      "recommendation": "Read the file (or the region you'll touch) before Edit/Write",
+      "repo": "issue-facade",
+      "score": 290.0,
+      "signal_type": "recurring_error",
+      "title": "problem: recurring error"
+    },
+    {
+      "cross_flavor": false,
+      "pattern_key": "problem:recurring_error:<tool_use_error>file has not been read yet. read it first before writing to it.<<path>>",
+      "priority": "high",
+      "recommendation": "Read the file (or the region you'll touch) before Edit/Write",
+      "repo": "railiance-apps",
+      "score": 290.0,
+      "signal_type": "recurring_error",
+      "title": "problem: recurring error"
+    },
+    {
+      "cross_flavor": false,
+      "pattern_key": "problem:recurring_error:<tool_use_error>file has not been read yet. read it first before writing to it.<<path>>",
+      "priority": "high",
+      "recommendation": "Read the file (or the region you'll touch) before Edit/Write",
+      "repo": "state-hub",
+      "score": 290.0,
+      "signal_type": "recurring_error",
+      "title": "problem: recurring error"
+    },
+    {
+      "cross_flavor": false,
+      "pattern_key": "problem:recurring_error:<tool_use_error>file has not been read yet. read it first before writing to it.<<path>>",
+      "priority": "high",
+      "recommendation": "Read the file (or the region you'll touch) before Edit/Write",
+      "repo": "the-custodian",
+      "score": 290.0,
+      "signal_type": "recurring_error",
+      "title": "problem: recurring error"
+    },
+    {
+      "cross_flavor": false,
+      "pattern_key": "problem:recurring_error:<tool_use_error>file has not been read yet. read it first before writing to it.<<path>>",
+      "priority": "high",
+      "recommendation": "Read the file (or the region you'll touch) before Edit/Write",
+      "repo": "vergabe-teilnahme",
+      "score": 290.0,
+      "signal_type": "recurring_error",
+      "title": "problem: recurring error"
+    },
+    {
+      "cross_flavor": false,
+      "pattern_key": "problem:recurring_error:<tool_use_error>file has been modified since read, either by the user or by a linter. read it again before attempting to write it.<<path>>",
+      "priority": "medium",
+      "recommendation": "Read the file (or the region you'll touch) before Edit/Write",
+      "repo": "artifact-store",
+      "score": 78.0,
+      "signal_type": "recurring_error",
+      "title": "problem: recurring error"
+    },
+    {
+      "cross_flavor": false,
+      "pattern_key": "problem:recurring_error:<tool_use_error>file has been modified since read, either by the user or by a linter. read it again before attempting to write it.<<path>>",
+      "priority": "medium",
+      "recommendation": "Read the file (or the region you'll touch) before Edit/Write",
+      "repo": "issue-facade",
+      "score": 78.0,
+      "signal_type": "recurring_error",
+      "title": "problem: recurring error"
+    },
+    {
+      "cross_flavor": false,
+      "pattern_key": "problem:recurring_error:<tool_use_error>file has been modified since read, either by the user or by a linter. read it again before attempting to write it.<<path>>",
+      "priority": "medium",
+      "recommendation": "Read the file (or the region you'll touch) before Edit/Write",
+      "repo": "railiance-apps",
+      "score": 78.0,
+      "signal_type": "recurring_error",
+      "title": "problem: recurring error"
+    },
+    {
+      "cross_flavor": false,
+      "pattern_key": "problem:recurring_error:<tool_use_error>file has been modified since read, either by the user or by a linter. read it again before attempting to write it.<<path>>",
+      "priority": "medium",
+      "recommendation": "Read the file (or the region you'll touch) before Edit/Write",
+      "repo": "state-hub",
+      "score": 78.0,
+      "signal_type": "recurring_error",
+      "title": "problem: recurring error"
+    },
+    {
+      "cross_flavor": false,
+      "pattern_key": "problem:budget_overrun:tokens",
+      "priority": "medium",
+      "recommendation": "Read narrowly \u2014 target the region you need, not whole large files",
+      "repo": "artifact-store",
+      "score": 50.55,
+      "signal_type": "budget_overrun",
+      "title": "problem: budget overrun"
+    },
+    {
+      "cross_flavor": false,
+      "pattern_key": "problem:recurring_error:{",
+      "priority": "medium",
+      "recommendation": "Investigate recurring error on {",
+      "repo": "vergabe-teilnahme",
+      "score": 12.0,
+      "signal_type": "recurring_error",
+      "title": "problem: recurring error"
+    },
+    {
+      "cross_flavor": false,
+      "pattern_key": "problem:recurring_error:found <n> errors (<n> fixed, <n> remaining).",
+      "priority": "medium",
+      "recommendation": "Investigate recurring error on found <n> errors (<n> fixed, <n> remaining).",
+      "repo": "ops-bridge",
+      "score": 10.0,
+      "signal_type": "recurring_error",
+      "title": "problem: recurring error"
+    },
+    {
+      "cross_flavor": false,
+      "pattern_key": "problem:recurring_error:(note: edit also tried swapping \\uxxxx escapes and their characters; neither form matched, so the mismatch is likely elsewhere in old_string. re-read the file a",
+      "priority": "medium",
+      "recommendation": "Investigate recurring error on (note: edit also tried swapping \\uxxxx escapes and their characters; neither form matched, so the mismatch is likely elsewhere in old_string. re-read the file a",
+      "repo": "net-kingdom",
+      "score": 6.0,
+      "signal_type": "recurring_error",
+      "title": "problem: recurring error"
+    },
+    {
+      "cross_flavor": false,
+      "pattern_key": "problem:recurring_error:found <n> error (<n> fixed, <n> remaining).",
+      "priority": "medium",
+      "recommendation": "Investigate recurring error on found <n> error (<n> fixed, <n> remaining).",
+      "repo": "ops-bridge",
+      "score": 6.0,
+      "signal_type": "recurring_error",
+      "title": "problem: recurring error"
+    },
|
{
|
||||||
|
"cross_flavor": false,
|
||||||
|
"pattern_key": "problem:recurring_error:<n> failed, <n> passed in <n>.00s",
|
||||||
|
"priority": "medium",
|
||||||
|
"recommendation": "Investigate recurring error on <n> failed, <n> passed in <n>.00s",
|
||||||
|
"repo": "agentic-resources",
|
||||||
|
"score": 4.0,
|
||||||
|
"signal_type": "recurring_error",
|
||||||
|
"title": "problem: recurring error"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"window": {
|
||||||
|
"days": 30,
|
||||||
|
"since": "2026-05-08T19:30:56Z",
|
||||||
|
"until": "2026-06-07T19:30:56Z"
|
||||||
|
}
|
||||||
|
}
|
||||||
39
session_memory/retro/last_retro.md
Normal file
@@ -0,0 +1,39 @@
# Weekly Coding Retro (2026-05-08 → 2026-06-07)
_23 real sessions · generated 2026-06-07T19:30:56Z_

## Top improvement suggestions (cross-flavor first, ≤3 per repo)
- **net-kingdom** (high, score=54.0) [CROSS-FLAVOR]: cross-flavor problem: recurring error — Investigate recurring error on make: *** [makefile:<n>: fix-consistency] error <n>
- **activity-core** (high, score=13128.0): problem: tool thrash — Batch related shell work into one script, not many small Bash calls
- **artifact-store** (high, score=13128.0): problem: tool thrash — Batch related shell work into one script, not many small Bash calls
- **citation-evidence** (high, score=13128.0): problem: tool thrash — Batch related shell work into one script, not many small Bash calls
- **infospace-bench** (high, score=13128.0): problem: tool thrash — Batch related shell work into one script, not many small Bash calls
- **railiance-apps** (high, score=13128.0): problem: tool thrash — Batch related shell work into one script, not many small Bash calls
- **state-hub** (high, score=13128.0): problem: tool thrash — Batch related shell work into one script, not many small Bash calls
- **activity-core** (high, score=441.0): problem: schema thrash — Load the tool schemas you'll need once, up front
- **citation-evidence** (high, score=441.0): problem: schema thrash — Load the tool schemas you'll need once, up front
- **flex-auth** (high, score=441.0): problem: schema thrash — Load the tool schemas you'll need once, up front
- **infospace-bench** (high, score=441.0): problem: schema thrash — Load the tool schemas you'll need once, up front
- **ops-bridge** (high, score=441.0): problem: schema thrash — Load the tool schemas you'll need once, up front
- **activity-core** (high, score=290.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **citation-evidence** (high, score=290.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **infospace-bench** (high, score=290.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **issue-facade** (high, score=290.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **railiance-apps** (high, score=290.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **state-hub** (high, score=290.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **the-custodian** (high, score=290.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **vergabe-teilnahme** (high, score=290.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **artifact-store** (medium, score=78.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **issue-facade** (medium, score=78.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **railiance-apps** (medium, score=78.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **state-hub** (medium, score=78.0): problem: recurring error — Read the file (or the region you'll touch) before Edit/Write
- **artifact-store** (medium, score=50.55): problem: budget overrun — Read narrowly — target the region you need, not whole large files
- **vergabe-teilnahme** (medium, score=12.0): problem: recurring error — Investigate recurring error on {
- **ops-bridge** (medium, score=10.0): problem: recurring error — Investigate recurring error on found <n> errors (<n> fixed, <n> remaining).
- **net-kingdom** (medium, score=6.0): problem: recurring error — Investigate recurring error on (note: edit also tried swapping \uxxxx escapes and their characters; neither form matched, so the mismatch is likely elsewhere in old_string. re-read the file a
- **ops-bridge** (medium, score=6.0): problem: recurring error — Investigate recurring error on found <n> error (<n> fixed, <n> remaining).
- **agentic-resources** (medium, score=4.0): problem: recurring error — Investigate recurring error on <n> failed, <n> passed in <n>.00s

## Fleet snapshot
- infra-overhead median: 0.167
- error rate: 0.957 · schema-thrash: 7
- success rate: 1.0 · tokens p50: 250725
78
session_memory/retro/publish.py
Normal file
@@ -0,0 +1,78 @@
"""Publish the weekly retro (AGENTIC-WP-0010 T02).

The retro is published to the State Hub as a **read model** — a progress event of
``event_type=coding_retro`` whose ``detail`` carries the structured report. This is
exactly how ``daily-triage-report`` surfaces, and it is what activity-core's
``coding_retro`` resolver (ACTIVITY-WP-0008) reads. A local JSON + markdown report
is always written; the hub publish is best-effort and **degrades gracefully** when
the hub is unreachable.
"""

from __future__ import annotations

import json
import os
import urllib.request
from typing import Callable, Optional

DEFAULT_HUB = "http://127.0.0.1:8000"


def render_markdown(report: dict) -> str:
    w = report.get("window", {})
    lines = [
        f"# Weekly Coding Retro ({w.get('since', '')[:10]} → {w.get('until', '')[:10]})",
        f"_{report.get('n_sessions', 0)} real sessions · generated {report.get('generated_at', '')}_",
        "",
        "## Top improvement suggestions (cross-flavor first, ≤3 per repo)",
    ]
    if not report.get("suggestions"):
        lines.append("- (no recurring problems above threshold this week)")
    for s in report.get("suggestions", []):
        flag = " [CROSS-FLAVOR]" if s.get("cross_flavor") else ""
        lines.append(f"- **{s['repo']}** ({s['priority']}, score={s['score']}){flag}: "
                     f"{s['title']} — {s['recommendation']}")
    m = report.get("measure", {})
    lines += ["", "## Fleet snapshot",
              f"- infra-overhead median: {m.get('infra_overhead_share_median')}",
              f"- error rate: {m.get('error_rate')} · schema-thrash: {m.get('schema_thrash_sessions')}",
              f"- success rate: {m.get('success_rate')} · tokens p50: {m.get('tokens_p50')}"]
    return "\n".join(lines)


def write_local(report: dict, json_path: str, md_path: Optional[str] = None) -> None:
    os.makedirs(os.path.dirname(json_path) or ".", exist_ok=True)
    with open(json_path, "w", encoding="utf-8") as fh:
        json.dump(report, fh, indent=2, sort_keys=True)
        fh.write("\n")
    if md_path:
        with open(md_path, "w", encoding="utf-8") as fh:
            fh.write(render_markdown(report))
            fh.write("\n")


def _http_post(url: str, payload: dict) -> None:
    req = urllib.request.Request(url, data=json.dumps(payload).encode(),
                                 headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(req, timeout=10) as r:
        r.read()


def publish_to_hub(report: dict, *, base_url: str = DEFAULT_HUB,
                   poster: Optional[Callable[[str, dict], None]] = None) -> bool:
    """POST the retro as an event_type=coding_retro progress event. Best-effort."""
    poster = poster or _http_post
    n = report.get("n_sessions", 0)
    k = len(report.get("suggestions", []))
    payload = {
        "event_type": "coding_retro",
        "author": "helix-forge",
        "summary": f"Weekly coding retro: {k} ranked suggestions across "
                   f"{report.get('window', {}).get('days', 7)} days ({n} sessions).",
        "detail": report,
    }
    try:
        poster(f"{base_url.rstrip('/')}/progress/", payload)
        return True
    except Exception:
        return False
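Reviewer note: the best-effort publish in `publish_to_hub` can be exercised without a running hub by injecting a `poster`. The following is only a minimal sketch of that pattern, not the module itself; `best_effort_publish`, `fake_poster`, and `down_poster` are hypothetical names introduced for illustration, and the payload is trimmed to a few fields.

```python
from typing import Callable


def best_effort_publish(report: dict, poster: Callable[[str, dict], None],
                        base_url: str = "http://127.0.0.1:8000") -> bool:
    """Build a coding_retro event, hand it to the poster, and swallow failures."""
    payload = {
        "event_type": "coding_retro",
        "summary": f"{len(report.get('suggestions', []))} ranked suggestions",
        "detail": report,
    }
    try:
        poster(f"{base_url.rstrip('/')}/progress/", payload)
        return True
    except Exception:
        # Hub unreachable: signal failure to the caller instead of raising.
        return False


sent = []


def fake_poster(url: str, payload: dict) -> None:
    # Hypothetical in-memory poster that records what would have been sent.
    sent.append((url, payload))


def down_poster(url: str, payload: dict) -> None:
    # Hypothetical poster that simulates an unreachable hub.
    raise ConnectionError("hub down")


ok = best_effort_publish({"suggestions": [{"repo": "r1"}]}, fake_poster)
failed = best_effort_publish({"suggestions": []}, down_poster)
```

Injecting the poster keeps the network call out of unit tests, which is presumably why the real function accepts one.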
62
tests/test_catalog_covers.py
Normal file
@@ -0,0 +1,62 @@
"""find_for / covers tests (AGENTIC-WP-0010 follow-up)."""

import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.curate.catalog import Catalog  # noqa: E402
from session_memory.curate.schema import (  # noqa: E402
    Provenance,
    Resolution,
    SolutionPattern,
)


def _pattern(pid, src, covers=None, name="P"):
    return SolutionPattern(
        id=pid, name=name, version="1.0.0", polarity="problem", problem="p",
        resolutions=[Resolution(summary="do x")],
        provenance=Provenance(source_key=src), covers=covers or [])


def test_covers_round_trips(tmp_path):
    cat = Catalog(str(tmp_path))
    cat.upsert(_pattern("sp-a", "problem:file_not_read:edit",
                        covers=["file has not been read"]))
    assert cat.load("sp-a").covers == ["file has not been read"]


def test_find_for_exact_key(tmp_path):
    cat = Catalog(str(tmp_path))
    cat.upsert(_pattern(SolutionPattern.make_id("problem:retry_storm:retries"),
                        "problem:retry_storm:retries"))
    got = cat.find_for("problem:retry_storm:retries")
    assert got is not None and got.id == "sp-problem-retry_storm-retries"


def test_find_for_covers_match(tmp_path):
    cat = Catalog(str(tmp_path))
    cat.upsert(_pattern("sp-rbe", "problem:file_not_read:edit",
                        covers=["file has not been read", "modified since read"]))
    # a recurring_error signal with a different key but matching fingerprint locus
    got = cat.find_for(
        "problem:recurring_error:<tool_use_error>file has not been read yet...",
        locus="<tool_use_error>file has not been read yet. read it first...")
    assert got is not None and got.id == "sp-rbe"


def test_find_for_no_match_returns_none(tmp_path):
    cat = Catalog(str(tmp_path))
    cat.upsert(_pattern("sp-rbe", "problem:file_not_read:edit",
                        covers=["file has not been read"]))
    assert cat.find_for("problem:recurring_error:some unrelated error") is None


def test_covers_change_versions(tmp_path):
    cat = Catalog(str(tmp_path))
    cat.upsert(_pattern("sp-a", "problem:x:y"))
    p = cat.load("sp-a")
    p.covers = ["new coverage"]
    assert cat.upsert(p) == "versioned"  # covers is substantive content
    assert cat.load("sp-a").version == "1.0.1"
54
tests/test_cluster.py
Normal file
@@ -0,0 +1,54 @@
"""Clusterer + evidence + cross-flavor tests (T05/T06)."""

import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.detect.cluster import cluster  # noqa: E402
from session_memory.detect.signals import PROBLEM, SUCCESS, Signal  # noqa: E402


def _sig(uid, flavor, repo, type_, polarity, locus, mag=1.0):
    return Signal(session_uid=uid, flavor=flavor, repo=repo, type=type_,
                  polarity=polarity, locus=locus, magnitude=mag)


def test_min_frequency_filters_singletons():
    sigs = [_sig("claude:a", "claude", "r1", "retry_storm", PROBLEM, "retries")]
    assert cluster(sigs, min_frequency=2) == []


def test_clusters_recurring_signal_with_evidence():
    sigs = [
        _sig("claude:a", "claude", "r1", "retry_storm", PROBLEM, "retries", 5),
        _sig("claude:b", "claude", "r2", "retry_storm", PROBLEM, "retries", 3),
    ]
    pats = cluster(sigs, min_frequency=2)
    assert len(pats) == 1
    p = pats[0]
    assert p.frequency == 2
    assert p.sessions == ["claude:a", "claude:b"]
    assert sorted(p.repos) == ["r1", "r2"]
    assert p.flavors == ["claude"]
    assert p.cross_flavor is False
    assert p.cost_impact == 8.0


def test_cross_flavor_flagged_and_ranked_first():
    sigs = [
        # cross-flavor problem (claude + codex)
        _sig("claude:a", "claude", "r1", "repeated_errors", PROBLEM, "errors", 3),
        _sig("codex:b", "codex", "r2", "repeated_errors", PROBLEM, "errors", 3),
        # single-flavor success cluster with higher raw impact
        _sig("grok:c", "grok", "r3", "clean_pass", SUCCESS, "outcome", 5),
        _sig("grok:d", "grok", "r4", "clean_pass", SUCCESS, "outcome", 5),
    ]
    pats = cluster(sigs, min_frequency=2)
    assert len(pats) == 2
    xf = next(p for p in pats if p.signal_type == "repeated_errors")
    assert xf.cross_flavor is True
    assert sorted(xf.flavors) == ["claude", "codex"]
    # cross-flavor pattern is ranked first even if another has higher raw impact
    assert pats[0].cross_flavor is True
    assert "cross-flavor" in pats[0].title
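Reviewer note: the behaviour these cluster tests pin down (drop singletons, flag clusters that span more than one flavor, rank those first regardless of raw impact) can be sketched in a few lines. This is an illustrative guess at the shape of the logic, not the actual `session_memory.detect.cluster` implementation; `Sig` and `rank_clusters` are hypothetical names, and counting frequency per signal rather than per session is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Sig:
    """Hypothetical stand-in for the real Signal type."""
    session_uid: str
    flavor: str
    type: str
    locus: str
    magnitude: float = 1.0


def rank_clusters(sigs, min_frequency=2):
    # Group signals that share a type and locus.
    groups = defaultdict(list)
    for s in sigs:
        groups[(s.type, s.locus)].append(s)

    clusters = []
    for (type_, locus), members in groups.items():
        if len(members) < min_frequency:
            continue  # singletons are treated as noise
        flavors = sorted({m.flavor for m in members})
        clusters.append({
            "signal_type": type_,
            "frequency": len(members),
            "flavors": flavors,
            "cross_flavor": len(flavors) > 1,
            "cost_impact": sum(m.magnitude for m in members),
        })

    # Cross-flavor clusters sort first; ties break on descending impact.
    clusters.sort(key=lambda c: (not c["cross_flavor"], -c["cost_impact"]))
    return clusters


ranked = rank_clusters([
    Sig("claude:a", "claude", "repeated_errors", "errors", 3),
    Sig("codex:b", "codex", "repeated_errors", "errors", 3),
    Sig("grok:c", "grok", "clean_pass", "outcome", 5),
    Sig("grok:d", "grok", "clean_pass", "outcome", 5),
    Sig("grok:e", "grok", "retry_storm", "retries", 9),
])
```

Sorting on a `(not cross_flavor, -impact)` tuple is one simple way to get the "cross-flavor first" ordering the last test asserts.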
86
tests/test_codex_adapter.py
Normal file
@@ -0,0 +1,86 @@
"""Codex adapter tests (T01): synthetic rollout fixture."""

import json
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.adapters.codex import parse_session  # noqa: E402

REPO_MAP = {"agentic-resources": "helix_forge"}


def _rollout(path, lines):
    with open(path, "w", encoding="utf-8") as f:
        for ln in lines:
            f.write(json.dumps(ln) + "\n")


def test_codex_rollout_parse(tmp_path):
    p = tmp_path / "rollout-2026-06-06-abc.jsonl"
    _rollout(p, [
        {"timestamp": "2026-06-06T10:00:00Z", "type": "session_meta",
         "payload": {"id": "cdx-1", "cwd": "/home/worsch/agentic-resources",
                     "model_provider": "openai", "cli_version": "0.44.0", "model": "gpt-5-codex"}},
        {"timestamp": "2026-06-06T10:00:01Z", "type": "turn_context",
         "payload": {"model": "gpt-5-codex", "approval_policy": "on-request"}},
        {"timestamp": "2026-06-06T10:00:02Z", "type": "event_msg",
         "payload": {"type": "task_started"}},
        {"timestamp": "2026-06-06T10:00:03Z", "type": "response_item",
         "payload": {"type": "message", "role": "user",
                     "content": [{"type": "input_text", "text": "fix the bug"}]}},
        {"timestamp": "2026-06-06T10:00:04Z", "type": "response_item",
         "payload": {"type": "reasoning", "summary": "think about it"}},
        {"timestamp": "2026-06-06T10:00:05Z", "type": "response_item",
         "payload": {"type": "function_call", "name": "apply_patch",
                     "arguments": "{\"path\":\"x.py\"}", "call_id": "call_1"}},
        {"timestamp": "2026-06-06T10:00:06Z", "type": "response_item",
         "payload": {"type": "function_call", "name": "shell",
                     "arguments": "{\"command\":\"pytest -q\"}", "call_id": "call_2"}},
        {"timestamp": "2026-06-06T10:00:07Z", "type": "response_item",
         "payload": {"type": "function_call_output", "call_id": "call_2", "output": "2 passed"}},
        {"timestamp": "2026-06-06T10:00:08Z", "type": "response_item",
         "payload": {"type": "message", "role": "assistant",
                     "content": [{"type": "output_text", "text": "done"}]}},
        {"timestamp": "2026-06-06T10:00:09Z", "type": "event_msg",
         "payload": {"type": "token_count",
                     "info": {"total_token_usage": {"input_tokens": 200, "output_tokens": 30,
                                                    "cached_input_tokens": 15}}}},
        {"timestamp": "2026-06-06T10:00:10Z", "type": "event_msg",
         "payload": {"type": "task_complete"}},
    ])

    norm = parse_session(str(p), REPO_MAP)
    assert norm is not None
    s = norm.session
    assert s.session_uid == "codex:cdx-1"
    assert s.flavor == "codex"
    assert s.repo == "agentic-resources" and s.domain == "helix_forge"
    assert s.model == "gpt-5-codex"
    assert s.cost.input_tokens == 200 and s.cost.output_tokens == 30 and s.cost.cache_tokens == 15
    assert s.cost.turns == 1
    assert s.cost.wall_clock_s == 10.0

    kinds = [e.kind for e in norm.events]
    assert kinds == ["lifecycle", "user_msg", "thinking", "edit", "test_run",
                     "tool_result", "assistant_msg", "completion"]

    # flat linkage: function_call_output links to its function_call by call_id
    out = next(e for e in norm.events if e.kind == "tool_result")
    test_call = next(e for e in norm.events if e.kind == "test_run")
    assert out.parent_seq == test_call.seq

    # apply_patch classified as edit; pytest as test_run
    edit = next(e for e in norm.events if e.kind == "edit")
    assert edit.tool == "apply_patch"


def test_codex_empty_or_no_meta_returns_none(tmp_path):
    p = tmp_path / "rollout-empty.jsonl"
    p.write_text("")
    assert parse_session(str(p), REPO_MAP) is None

    p2 = tmp_path / "rollout-nometa.jsonl"
    _rollout(p2, [{"timestamp": "t", "type": "event_msg", "payload": {"type": "task_started"}}])
    assert parse_session(str(p2), REPO_MAP) is None  # no session_meta -> no id
86
tests/test_curate_catalog.py
Normal file
@@ -0,0 +1,86 @@
"""Versioned Pattern Catalog tests (T02): round-trip, dedup, idempotent upsert."""

import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.curate.catalog import (  # noqa: E402
    ADDED,
    UNCHANGED,
    UPDATED,
    VERSIONED,
    Catalog,
)
from session_memory.curate.schema import (  # noqa: E402
    Provenance,
    Resolution,
    Scope,
    SolutionPattern,
)


def _pattern(src="success:clean_pass:outcome", problem="ran tests, clean finish"):
    return SolutionPattern(
        id=SolutionPattern.make_id(src),
        name="Run tests before declaring success",
        version="1.0.0",
        polarity="success",
        problem=problem,
        resolutions=[Resolution(summary="run the suite")],
        scope=Scope(flavors=["claude", "grok"]),
        provenance=Provenance(source_key=src, evidence={"frequency": 18}),
    )


def test_add_then_load_round_trips(tmp_path):
    cat = Catalog(str(tmp_path))
    assert cat.upsert(_pattern()) == ADDED
    loaded = cat.load(SolutionPattern.make_id("success:clean_pass:outcome"))
    assert loaded is not None
    assert loaded.problem == "ran tests, clean finish"
    assert loaded.created_at and loaded.updated_at
    assert [p.id for p in cat.list()] == [loaded.id]


def test_resave_identical_is_noop(tmp_path):
    cat = Catalog(str(tmp_path))
    cat.upsert(_pattern())
    assert cat.upsert(_pattern()) == UNCHANGED
    # version not bumped, no history written
    assert cat.load(_pattern().id).version == "1.0.0"
    assert cat.history(_pattern().id) == []


def test_dedup_on_source_key(tmp_path):
    cat = Catalog(str(tmp_path))
    cat.upsert(_pattern())
    cat.upsert(_pattern())  # same source key -> same id -> one file
    assert len(cat.list()) == 1


def test_content_change_bumps_version_and_archives(tmp_path):
    cat = Catalog(str(tmp_path))
    cat.upsert(_pattern())
    assert cat.upsert(_pattern(problem="now with more nuance")) == VERSIONED
    current = cat.load(_pattern().id)
    assert current.version == "1.0.1"
    assert current.problem == "now with more nuance"
    hist = cat.history(_pattern().id)
    assert len(hist) == 1
    assert hist[0]["version"] == "1.0.0"
    assert hist[0]["status"] == "superseded"


def test_status_only_change_updates_without_bump(tmp_path):
    cat = Catalog(str(tmp_path))
    cat.upsert(_pattern())
    p = _pattern()
    p.status = "approved"
    p.distribution_ready = True
    assert cat.upsert(p) == UPDATED
    current = cat.load(p.id)
    assert current.status == "approved"
    assert current.distribution_ready is True
    assert current.version == "1.0.0"  # metadata change, no bump
    assert cat.history(p.id) == []
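Reviewer note: the four upsert outcomes these catalog tests exercise (added, unchanged, metadata-only update, content change with a patch bump and an archived predecessor) can be sketched with an in-memory stand-in. Everything below is hypothetical and for illustration only: `MemoryCatalog`, `bump_patch`, and the choice of `status` and `distribution_ready` as the metadata-only fields are assumptions inferred from the tests, and only the `"versioned"` string value is confirmed by them.

```python
import copy

# Only "versioned" is confirmed by the tests; the other values are assumed.
ADDED, UNCHANGED, UPDATED, VERSIONED = "added", "unchanged", "updated", "versioned"
METADATA_FIELDS = {"status", "distribution_ready"}  # assumed non-substantive fields


def bump_patch(version: str) -> str:
    major, minor, patch = (int(x) for x in version.split("."))
    return f"{major}.{minor}.{patch + 1}"


def _content(pattern: dict) -> dict:
    # Substantive content: everything except metadata and the version itself.
    return {k: v for k, v in pattern.items() if k not in METADATA_FIELDS | {"version"}}


class MemoryCatalog:
    def __init__(self):
        self.current = {}
        self.archive = {}

    def upsert(self, pattern: dict) -> str:
        pid = pattern["id"]
        old = self.current.get(pid)
        if old is None:
            self.current[pid] = copy.deepcopy(pattern)
            return ADDED
        if _content(old) != _content(pattern):
            # Archive the predecessor, then store the new content one patch higher.
            self.archive.setdefault(pid, []).append({**old, "status": "superseded"})
            new = copy.deepcopy(pattern)
            new["version"] = bump_patch(old["version"])
            self.current[pid] = new
            return VERSIONED
        if any(old.get(f) != pattern.get(f) for f in METADATA_FIELDS):
            # Metadata-only change: keep the existing version.
            self.current[pid] = {**copy.deepcopy(pattern), "version": old["version"]}
            return UPDATED
        return UNCHANGED


cat = MemoryCatalog()
base = {"id": "sp-a", "version": "1.0.0", "problem": "p", "status": "candidate"}
results = [
    cat.upsert(base),
    cat.upsert(dict(base)),
    cat.upsert({**base, "status": "approved"}),
    cat.upsert({**base, "problem": "now with more nuance"}),
]
```

Comparing content with the metadata fields masked out is what keeps a status flip from producing a spurious version bump.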
70
tests/test_curate_decisions.py
Normal file
@@ -0,0 +1,70 @@
"""Hub decision integration tests (T05): payload shape + graceful queue/flush."""

import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.curate.catalog import Catalog  # noqa: E402
from session_memory.curate.decisions import DecisionRecorder, build_decision  # noqa: E402
from session_memory.curate.review import APPROVE, REJECT, ReviewLog, review  # noqa: E402


def _candidate(key="success:clean_pass:outcome"):
    return {"key": key, "frequency": 18, "sessions": ["a", "b"],
            "cost_impact": 9.0, "cross_flavor": True, "flavors": ["claude", "grok"]}


def test_build_decision_payload_shape():
    d = build_decision(_candidate(), "approve", "looks solid", workstream_id="ws-1")
    assert d["decision_type"] == "made"
    assert d["workstream_id"] == "ws-1"
    assert "Promote" in d["title"]
    assert d["rationale"] == "looks solid"
    assert "success:clean_pass:outcome" in d["description"]


def test_sink_accepts_decision(tmp_path):
    captured = []
    rec = DecisionRecorder(str(tmp_path / "q.jsonl"), sink=captured.append)
    assert rec.record(_candidate(), "approve", "ok") is True
    assert rec.pending() == []
    assert len(captured) == 1


def test_queues_when_sink_down(tmp_path):
    def boom(_):
        raise RuntimeError("hub down")

    rec = DecisionRecorder(str(tmp_path / "q.jsonl"), sink=boom)
    assert rec.record(_candidate(), "reject", "noise") is False
    assert len(rec.pending()) == 1


def test_no_sink_defaults_to_queue(tmp_path):
    rec = DecisionRecorder(str(tmp_path / "q.jsonl"))
    rec.record(_candidate(), "approve", "ok")
    assert len(rec.pending()) == 1


def test_flush_replays_queue(tmp_path):
    rec = DecisionRecorder(str(tmp_path / "q.jsonl"))  # offline -> queue
    rec.record(_candidate("problem:abandoned:outcome"), "reject", "x")
    rec.record(_candidate("success:clean_pass:outcome"), "approve", "y")
    captured = []
    assert rec.flush(sink=captured.append) == 2
    assert rec.pending() == []
    assert len(captured) == 2


def test_review_records_each_final_decision(tmp_path):
    cat = Catalog(str(tmp_path / "catalog"))
    log = ReviewLog(str(tmp_path / "reviews.jsonl"))
    captured = []
    rec = DecisionRecorder(str(tmp_path / "q.jsonl"), sink=captured.append, workstream_id="ws")
    cands = [_candidate("success:clean_pass:outcome"), _candidate("problem:abandoned:outcome")]
    review(cands, lambda c: (APPROVE if "success" in c["key"] else REJECT, "r"), cat, log,
           recorder=rec)
    assert len(captured) == 2
    actions = sorted("Promote" in d["title"] for d in captured)
    assert actions == [False, True]
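Reviewer note: the queue-then-flush behaviour these decision tests describe (deliver to a sink when one is available, otherwise append to a JSONL queue and replay it later) might look roughly like the sketch below. `QueueingRecorder` is a hypothetical, simplified stand-in for `DecisionRecorder`; it takes a ready-made decision dict rather than a candidate plus action, which is an assumption made to keep the example short.

```python
import json
import os
import tempfile


class QueueingRecorder:
    """Hypothetical stand-in: deliver to a sink, or queue to a JSONL file."""

    def __init__(self, queue_path, sink=None):
        self.queue_path = queue_path
        self.sink = sink

    def pending(self):
        if not os.path.exists(self.queue_path):
            return []
        with open(self.queue_path, encoding="utf-8") as fh:
            return [json.loads(line) for line in fh if line.strip()]

    def record(self, decision):
        if self.sink is not None:
            try:
                self.sink(decision)
                return True
            except Exception:
                pass  # sink is down: fall through to the queue
        with open(self.queue_path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(decision) + "\n")
        return False

    def flush(self, sink):
        remaining, delivered = [], 0
        for decision in self.pending():
            try:
                sink(decision)
                delivered += 1
            except Exception:
                remaining.append(decision)
        # Rewrite the queue with whatever could not be delivered.
        with open(self.queue_path, "w", encoding="utf-8") as fh:
            for decision in remaining:
                fh.write(json.dumps(decision) + "\n")
        return delivered


tmpdir = tempfile.mkdtemp()
rec = QueueingRecorder(os.path.join(tmpdir, "q.jsonl"))  # no sink -> queue
rec.record({"title": "Promote success:clean_pass:outcome"})
rec.record({"title": "Reject problem:abandoned:outcome"})
queued_before = len(rec.pending())
captured = []
flushed = rec.flush(captured.append)
```

An append-only JSONL queue keeps offline decisions durable across runs, at the cost of rewriting the file on each flush.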
84
tests/test_curate_entrypoint.py
Normal file
@@ -0,0 +1,84 @@
"""Curate entrypoint tests (T06): batch auto-approve end-to-end via the store."""

import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.core.store import Store  # noqa: E402
from session_memory.curate.__main__ import main  # noqa: E402
from session_memory.curate.catalog import Catalog  # noqa: E402


def _digest(uid, flavor, repo, **markers):
    return {
        "session_uid": uid, "flavor": flavor, "repo": repo, "outcome": "fail",
        "cost": {"input_tokens": 10, "output_tokens": 1},
        "markers": {"errors": markers.get("errors", 0), "retries": markers.get("retries", 0),
                    "test_runs": 0, "edits": 0, "human_interventions": 0},
        # real coding session per the quality filter (WP-0005 T01)
        "event_count": 40, "first_prompt": "Fix the failing build and retry the suite",
        "tool_histogram": {"Bash": 20, "Edit": 12, "Read": 8},
    }


def _write_config(tmp_path) -> tuple[str, str, str]:
    store = tmp_path / ".store"
    catalog = tmp_path / "catalog"
    cfg = f"""
[store]
db_path = "{store / 'm.db'}"
blob_dir = "{store / 'blobs'}"
cursor = "{store / 'c.json'}"

[curate]
catalog_dir = "{catalog}"
review_log = "{store / 'reviews.jsonl'}"
decision_queue = "{store / 'decisions.queue.jsonl'}"

[curate.gate]
min_frequency = 2
min_sessions = 2
"""
    path = tmp_path / "config.toml"
    path.write_text(cfg)
    return str(path), str(store), str(catalog)


def test_auto_approve_promotes_cross_flavor(tmp_path, capsys):
    cfg_path, store_dir, catalog_dir = _write_config(tmp_path)
    st = Store(os.path.join(store_dir, "m.db"), os.path.join(store_dir, "blobs"))
    st.write_digest("claude:a", _digest("claude:a", "claude", "r1", retries=5))
    st.write_digest("codex:b", _digest("codex:b", "codex", "r2", retries=4))
    st.close()

    rc = main(["--config", cfg_path, "--auto-approve"])
    assert rc == 0

    cat = Catalog(catalog_dir)
    patterns = cat.list()
    assert len(patterns) == 1
    assert patterns[0].polarity == "problem"
    # clears the promote floor (freq>=2) but below the default distribution
    # floor (freq>=3) -> promoted as provisional, not distribution-ready
    assert patterns[0].status == "provisional"
    assert patterns[0].distribution_ready is False

    out = capsys.readouterr().out
    assert "Curate summary" in out
    # hub offline in tests -> decision queued
    assert "decisions queued" in out


def test_rerun_is_idempotent(tmp_path):
    cfg_path, store_dir, catalog_dir = _write_config(tmp_path)
    st = Store(os.path.join(store_dir, "m.db"), os.path.join(store_dir, "blobs"))
    st.write_digest("claude:a", _digest("claude:a", "claude", "r1", retries=5))
    st.write_digest("codex:b", _digest("codex:b", "codex", "r2", retries=4))
    st.close()

    main(["--config", cfg_path, "--auto-approve"])
    main(["--config", cfg_path, "--auto-approve"])  # second pass: already decided
    cat = Catalog(catalog_dir)
    assert len(cat.list()) == 1
    assert cat.load(cat.list()[0].id).version == "1.0.0"  # no spurious bump
76
tests/test_curate_gating.py
Normal file
@@ -0,0 +1,76 @@
"""Evidence-bar + bloat-guard tests (T04)."""

import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.curate.catalog import Catalog  # noqa: E402
from session_memory.curate.gating import (  # noqa: E402
    GateConfig,
    bloat_warnings,
    evaluate,
    gate_config,
)
from session_memory.curate.review import candidate_to_pattern  # noqa: E402


def _candidate(key="success:clean_pass:outcome", freq=5, sessions=5, impact=10.0,
               cross=True, flavors=("claude", "grok")):
    return {
        "key": key,
        "frequency": freq,
        "sessions": [f"s{i}" for i in range(sessions)],
        "cost_impact": impact,
        "cross_flavor": cross,
        "flavors": list(flavors),
    }


def test_clears_bar_and_distribution_ready():
    r = evaluate(_candidate(), GateConfig(dist_min_frequency=3))
    assert r.promotable and r.distribution_ready
    assert r.status == "approved"


def test_thin_candidate_promotable_but_provisional():
    # meets promote floor (freq>=2) but below distribution floor (freq<3)
    r = evaluate(_candidate(freq=2, sessions=2), GateConfig(dist_min_frequency=3))
    assert r.promotable
    assert not r.distribution_ready
    assert r.status == "provisional"


def test_below_promote_floor_not_promotable():
    r = evaluate(_candidate(freq=1, sessions=1))
    assert not r.promotable
    assert any("frequency" in reason for reason in r.reasons)


def test_cross_flavor_required_for_distribution():
    r = evaluate(_candidate(cross=False), GateConfig(dist_require_cross_flavor=True))
    assert r.promotable
    assert not r.distribution_ready
    assert any("cross-flavor" in reason for reason in r.reasons)


def test_gate_config_reads_toml_dict():
    cfg = gate_config({"curate": {"gate": {"min_frequency": 9, "dist_require_cross_flavor": True}}})
    assert cfg.min_frequency == 9
    assert cfg.dist_require_cross_flavor is True
    # defaults preserved for unspecified keys
    assert cfg.dist_min_frequency == 3


def test_bloat_flags_duplicate_and_near_duplicate(tmp_path):
    cat = Catalog(str(tmp_path))
    cat.upsert(candidate_to_pattern(_candidate(key="success:clean_pass:outcome")))
    existing = cat.list()
    # exact same key -> duplicate
    dup = bloat_warnings(_candidate(key="success:clean_pass:outcome"), existing)
    assert any("duplicate" in w for w in dup)
    # different polarity, same signal_type+locus -> near-duplicate
    near = bloat_warnings(_candidate(key="problem:clean_pass:outcome"), existing)
    assert any("near-duplicate" in w for w in near)
    # unrelated -> no warnings
    assert bloat_warnings(_candidate(key="problem:retry_storm:retries"), existing) == []
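The gating tests above imply a two-tier evidence bar: a promote floor and a stricter distribution floor, with an optional cross-flavor requirement for distribution. The following is a minimal sketch of that logic, not the actual `session_memory.curate.gating` code. The names `SketchGate`, `SketchResult`, and `sketch_evaluate` are hypothetical, the thresholds are taken from the test comments (promote at freq>=2, distribute at freq>=3), and the `"below-bar"` status is an assumption because the tests do not name a status for unpromotable candidates.

```python
from dataclasses import dataclass, field


@dataclass
class SketchGate:
    # Hypothetical thresholds, inferred from the test comments only.
    min_frequency: int = 2
    dist_min_frequency: int = 3
    dist_require_cross_flavor: bool = False


@dataclass
class SketchResult:
    promotable: bool
    distribution_ready: bool
    status: str
    reasons: list = field(default_factory=list)


def sketch_evaluate(candidate, gate=None):
    """Apply the promote floor first, then the distribution floor."""
    gate = gate or SketchGate()
    reasons = []
    freq = candidate.get("frequency", 0)

    promotable = freq >= gate.min_frequency
    if not promotable:
        reasons.append(f"frequency {freq} below promote floor {gate.min_frequency}")

    ready = promotable
    if promotable and freq < gate.dist_min_frequency:
        ready = False
        reasons.append(f"frequency {freq} below distribution floor {gate.dist_min_frequency}")
    if gate.dist_require_cross_flavor and not candidate.get("cross_flavor"):
        ready = False
        reasons.append("cross-flavor evidence required for distribution")

    if ready:
        status = "approved"
    elif promotable:
        status = "provisional"
    else:
        status = "below-bar"  # assumed label; not taken from the source
    return SketchResult(promotable, ready, status, reasons)
```

Keeping the two floors separate lets a thin candidate enter the catalog as provisional without being distributed until more evidence arrives.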
93
tests/test_curate_review.py
Normal file
@@ -0,0 +1,93 @@
"""Review workflow tests (T03): promote/reject/discuss + idempotent re-review."""

import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.curate.catalog import Catalog  # noqa: E402
from session_memory.curate.review import (  # noqa: E402
    APPROVE,
    DISCUSS,
    REJECT,
    ReviewLog,
    candidate_to_pattern,
    review,
)
from session_memory.curate.schema import SolutionPattern  # noqa: E402


def _candidate(key="success:clean_pass:outcome", freq=18, flavors=("claude", "grok")):
    return {
        "key": key,
        "polarity": key.split(":")[0],
        "signal_type": key.split(":")[1],
        "locus": key.split(":")[2],
        "title": "cross-flavor success: clean pass",
        "frequency": freq,
        "flavors": list(flavors),
        "repos": ["agentic-resources"],
        "sessions": [f"s{i}" for i in range(freq)],
        "cross_flavor": len(flavors) > 1,
        "cost_impact": 12.5,
    }


def _decider(action, rationale="because"):
    return lambda cand: (action, rationale)


def test_approve_promotes_to_catalog(tmp_path):
    cat = Catalog(str(tmp_path / "catalog"))
    log = ReviewLog(str(tmp_path / "reviews.jsonl"))
    res = review([_candidate()], _decider(APPROVE), cat, log)
    assert len(res.approved) == 1
    p = cat.load(SolutionPattern.make_id("success:clean_pass:outcome"))
    assert p is not None
    assert p.scope.flavors == ["claude", "grok"]
    assert set(p.rendering_hints) == {"claude", "grok"}
    assert p.provenance.evidence["frequency"] == 18


def test_reject_records_no_catalog_write(tmp_path):
    cat = Catalog(str(tmp_path / "catalog"))
    log = ReviewLog(str(tmp_path / "reviews.jsonl"))
    res = review([_candidate()], _decider(REJECT), cat, log)
    assert res.rejected == ["success:clean_pass:outcome"]
    assert cat.list() == []


def test_discuss_defers_and_is_not_final(tmp_path):
    cat = Catalog(str(tmp_path / "catalog"))
    log = ReviewLog(str(tmp_path / "reviews.jsonl"))
    res = review([_candidate()], _decider(DISCUSS), cat, log)
    assert res.deferred == ["success:clean_pass:outcome"]
    # not recorded as final -> a later pass re-surfaces it
    res2 = review([_candidate()], _decider(APPROVE), cat, log)
    assert len(res2.approved) == 1


def test_prior_reject_remembered_same_evidence(tmp_path):
    cat = Catalog(str(tmp_path / "catalog"))
    log_path = str(tmp_path / "reviews.jsonl")
    review([_candidate()], _decider(REJECT), cat, ReviewLog(log_path))
    # fresh log instance (reloads from disk) + same evidence -> skipped
    res = review([_candidate()], _decider(APPROVE), cat, ReviewLog(log_path))
    assert res.skipped == ["success:clean_pass:outcome"]
    assert cat.list() == []


def test_changed_evidence_resurfaces(tmp_path):
    cat = Catalog(str(tmp_path / "catalog"))
    log_path = str(tmp_path / "reviews.jsonl")
    review([_candidate(freq=18)], _decider(REJECT), cat, ReviewLog(log_path))
    # more evidence now -> not skipped, gets re-reviewed
    res = review([_candidate(freq=40)], _decider(APPROVE), cat, ReviewLog(log_path))
    assert len(res.approved) == 1


def test_candidate_to_pattern_defaults():
    p = candidate_to_pattern(_candidate(flavors=("claude",)))
    assert p.status == "provisional"
    assert p.rendering_hints["claude"]["target"] == "CLAUDE.md"
    assert p.polarity == "success"
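The review tests above describe three behaviours: final decisions are remembered, a "discuss" decision is not final, and a candidate is re-surfaced once its evidence changes. One way to get all three is to key each final decision on a hash of the evidence. The sketch below is an illustration under assumptions, not the real `review`/`ReviewLog` implementation: the function names, the in-memory `decided` dict (instead of a JSONL file), and the choice of which fields count as evidence are all hypothetical.

```python
import hashlib
import json

FINAL = {"approve", "reject"}  # "discuss" is deliberately not final


def evidence_fingerprint(candidate):
    """Hash the fields treated as evidence (an assumed selection of fields)."""
    evidence = {
        "frequency": candidate.get("frequency"),
        "flavors": sorted(candidate.get("flavors", [])),
    }
    blob = json.dumps(evidence, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def sketch_review(candidates, decide, decided):
    """Run `decide` on each candidate unless the same evidence was already decided.

    `decided` maps candidate key -> evidence fingerprint of the last final decision.
    """
    out = {"approve": [], "reject": [], "discuss": [], "skipped": []}
    for cand in candidates:
        key = cand["key"]
        fp = evidence_fingerprint(cand)
        if decided.get(key) == fp:
            out["skipped"].append(key)  # same evidence, decision stands
            continue
        action = decide(cand)
        out[action].append(key)
        if action in FINAL:
            decided[key] = fp  # only final decisions are remembered
    return out
```

Because the fingerprint covers only evidence, rewording a title would not re-open a decision, while a higher frequency would.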
80
tests/test_curate_schema.py
Normal file
@@ -0,0 +1,80 @@
"""Round-trip + validation tests for the Solution Pattern schema (T01)."""

import os
import sys

import pytest

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.curate.schema import (  # noqa: E402
    Provenance,
    Resolution,
    Scope,
    SolutionPattern,
)


def _sample() -> SolutionPattern:
    src = "success:clean_pass:outcome"
    return SolutionPattern(
        id=SolutionPattern.make_id(src),
        name="Run tests before declaring success",
        version="1.0.0",
        polarity="success",
        problem="Sessions that run tests and finish with no retries resolve cheaply.",
        resolutions=[Resolution(summary="Always run the suite", steps=["edit", "test", "commit"])],
        scope=Scope(flavors=["claude", "grok"]),
        provenance=Provenance(source_key=src, evidence={"frequency": 18, "cross_flavor": True}),
        rendering_hints={"claude": {"target": "CLAUDE.md"}, "codex": {"target": "AGENTS.md"}},
        status="approved",
        distribution_ready=True,
    )


def test_round_trip_is_lossless():
    p = _sample()
    again = SolutionPattern.from_json(p.to_json())
    assert again.to_dict() == p.to_dict()
    assert again.resolutions[0].steps == ["edit", "test", "commit"]
    assert again.scope.flavors == ["claude", "grok"]
    assert again.provenance.evidence["cross_flavor"] is True


def test_serialization_is_deterministic():
    p = _sample()
    assert p.to_json() == p.to_json()
    assert SolutionPattern.from_json(p.to_json()).to_json() == p.to_json()


def test_make_id_is_stable_and_slugged():
    assert SolutionPattern.make_id("success:clean_pass:outcome") == "sp-success-clean_pass-outcome"
    # same source key -> same id regardless of later wording
    assert SolutionPattern.make_id("problem:abandoned:outcome") == SolutionPattern.make_id(
        "problem:abandoned:outcome"
    )


def test_bump_version():
    assert SolutionPattern.bump_version("1.0.0") == "1.0.1"
    assert SolutionPattern.bump_version("1.2.3", "minor") == "1.3.0"
    assert SolutionPattern.bump_version("1.2.3", "major") == "2.0.0"


def test_rejects_unknown_polarity():
    with pytest.raises(ValueError):
        SolutionPattern(id="x", name="n", version="1.0.0", polarity="meh", problem="p")


def test_rejects_unknown_status():
    with pytest.raises(ValueError):
        SolutionPattern(id="x", name="n", version="1.0.0", polarity="problem",
                        problem="p", status="bogus")


def test_rejects_unknown_flavor_in_hints_and_scope():
    with pytest.raises(ValueError):
        SolutionPattern(id="x", name="n", version="1.0.0", polarity="problem",
                        problem="p", rendering_hints={"gpt": {}})
    with pytest.raises(ValueError):
        Scope(flavors=["gpt"])
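The schema tests above pin down the observable behaviour of `make_id` and `bump_version` without showing their bodies. The functions below are a hypothetical reconstruction that satisfies those assertions; the names `sketch_make_id` and `sketch_bump_version` are illustrative, and the real `SolutionPattern` static methods may handle more cases (for example, other separators or pre-release tags).

```python
def sketch_make_id(source_key):
    """Derive a stable id from a 'polarity:signal_type:locus' source key."""
    # Assumed rule: prefix with "sp-" and turn ":" separators into "-".
    return "sp-" + source_key.replace(":", "-")


def sketch_bump_version(version, part="patch"):
    """Bump one component of a 'MAJOR.MINOR.PATCH' version string."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")
```

Deriving the id from the source key alone is what makes a re-run land on the same catalog entry instead of creating a second one.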
47
tests/test_detect_entrypoint.py
Normal file
@@ -0,0 +1,47 @@
"""Detect entrypoint tests (T07): end-to-end digests -> patterns, persisted."""

import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.core.store import Store  # noqa: E402
from session_memory.detect.__main__ import run_detect  # noqa: E402


def _digest(uid, flavor, repo, **markers):
    return {
        "session_uid": uid, "flavor": flavor, "repo": repo, "outcome": "fail",
        "cost": {"input_tokens": 10, "output_tokens": 1},
        "markers": {"errors": markers.get("errors", 0), "retries": markers.get("retries", 0),
                    "test_runs": 0, "edits": 0, "human_interventions": 0},
        # fields the quality filter (WP-0005 T01) checks — real coding session
        "event_count": 40, "first_prompt": "Fix the failing build and retry the suite",
        "tool_histogram": {"Bash": 20, "Edit": 12, "Read": 8},
    }


def _config(tmp_path):
    return {"store": {"db_path": str(tmp_path / ".store/m.db"),
                      "blob_dir": str(tmp_path / ".store/blobs"),
                      "cursor": str(tmp_path / ".store/c.json")}}


def test_run_detect_persists_cross_flavor_pattern(tmp_path):
    cfg = _config(tmp_path)
    st = Store(cfg["store"]["db_path"], cfg["store"]["blob_dir"])
    # same problem (retry_storm) across two flavors -> cross-flavor candidate
    st.write_digest("claude:a", _digest("claude:a", "claude", "r1", retries=5))
    st.write_digest("codex:b", _digest("codex:b", "codex", "r2", retries=4))
    st.close()

    patterns = run_detect(cfg, min_frequency=2)
    assert len(patterns) == 1
    assert patterns[0]["cross_flavor"] is True
    assert patterns[0]["signal_type"] == "retry_storm"

    # persisted to the Tier 2 patterns table
    st2 = Store(cfg["store"]["db_path"], cfg["store"]["blob_dir"])
    rows = st2.db.execute("SELECT key FROM patterns").fetchall()
    assert len(rows) == 1
    st2.close()
80
tests/test_detect_infra_signals.py
Normal file
@@ -0,0 +1,80 @@
"""Infra-overhead + thrash signal tests (WP-0005 T02)."""

import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.detect.signals import (  # noqa: E402
    build_context,
    extract_signals,
    sig_infra_overhead,
    sig_schema_thrash,
    sig_tool_thrash,
    tool_bucket,
)


def _digest(uid="claude:a", repo="r1", tools=None):
    return {"session_uid": uid, "flavor": "claude", "repo": repo, "outcome": "success",
            "cost": {"input_tokens": 1, "output_tokens": 1},
            "markers": {"errors": 0, "retries": 0, "test_runs": 0},
            "tool_histogram": tools or {}}


CTX = {"infra_min_calls": 20, "infra_overhead_threshold": 0.30,
       "schema_thrash_threshold": 5, "tool_thrash_threshold": 80}


def test_tool_bucket_mapping():
    assert tool_bucket("mcp__state-hub__update_task_status") == "statehub_mcp"
    assert tool_bucket("ToolSearch") == "schema_load"
    assert tool_bucket("TaskUpdate") == "task_mgmt"
    assert tool_bucket("Bash") == "shell"
    assert tool_bucket("Edit") == "edit"


def test_infra_overhead_fires_above_share():
    # 18 statehub of 30 total = 60% overhead
    d = _digest(tools={"mcp__state-hub__create_task": 18, "Bash": 8, "Edit": 4})
    sig = sig_infra_overhead(d, CTX)
    assert sig and sig[0].type == "infra_overhead"
    assert sig[0].magnitude >= 0.30
    assert sig[0].detail["statehub"] == 18


def test_infra_overhead_quiet_when_mostly_work():
    d = _digest(tools={"mcp__state-hub__create_task": 3, "Bash": 40, "Edit": 30})
    assert sig_infra_overhead(d, CTX) == []


def test_infra_overhead_ignores_tiny_sessions():
    d = _digest(tools={"mcp__state-hub__create_task": 5})  # below infra_min_calls
    assert sig_infra_overhead(d, CTX) == []


def test_schema_thrash_fires():
    d = _digest(tools={"ToolSearch": 9, "Bash": 5})
    sig = sig_schema_thrash(d, CTX)
    assert sig and sig[0].type == "schema_thrash"
    assert sig[0].detail["tool_searches"] == 9


def test_tool_thrash_fires_on_dominant_tool():
    d = _digest(tools={"Bash": 120, "Edit": 5})
    sig = sig_tool_thrash(d, CTX)
    assert sig and sig[0].locus == "tool:Bash"


def test_extract_signals_includes_infra():
    d = _digest(tools={"mcp__state-hub__create_task": 18, "Bash": 8, "Edit": 4,
                       "ToolSearch": 6})
    types = {s.type for s in extract_signals([d])}
    assert "infra_overhead" in types
    assert "schema_thrash" in types


def test_build_context_has_infra_defaults():
    ctx = build_context([])
    assert ctx["infra_overhead_threshold"] == 0.30
    assert ctx["schema_thrash_threshold"] == 5
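The infra-overhead tests above rest on simple arithmetic: bucket each tool name, then compare the share of infrastructure calls against a threshold (18 of 30 calls is 60%, above the 0.30 threshold). The sketch below works through that calculation. It is not the real `session_memory.detect.signals` code: the function names are hypothetical, the bucket rules are inferred from the five mappings in `test_tool_bucket_mapping`, and treating `statehub_mcp`, `schema_load`, and `task_mgmt` together as "infra" is an assumption.

```python
# Assumed set of buckets that count as infrastructure overhead.
INFRA_BUCKETS = {"statehub_mcp", "schema_load", "task_mgmt"}


def sketch_tool_bucket(tool):
    """Map a tool name to a coarse bucket (rules inferred from the tests)."""
    if tool.startswith("mcp__state-hub__"):
        return "statehub_mcp"
    if tool == "ToolSearch":
        return "schema_load"
    if tool.startswith("Task"):
        return "task_mgmt"
    if tool == "Bash":
        return "shell"
    return tool.lower()


def sketch_infra_share(histogram, min_calls=20):
    """Return the infra share of all tool calls, or None for tiny sessions."""
    total = sum(histogram.values())
    if total < min_calls:
        return None  # too few calls for the ratio to mean anything
    infra = sum(n for tool, n in histogram.items()
                if sketch_tool_bucket(tool) in INFRA_BUCKETS)
    return infra / total
```

A caller would fire the signal when the returned share is not `None` and is at least `infra_overhead_threshold`.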
61
tests/test_detect_quality.py
Normal file
@@ -0,0 +1,61 @@
"""Session-quality filter tests (T01)."""

import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.detect.quality import (  # noqa: E402
    QualityConfig,
    filter_real,
    is_real_coding_session,
    quality_config,
)


def _digest(repo="agentic-resources", events=60, prompt="Implement the curate entrypoint",
            tools=None):
    return {
        "session_uid": "claude:x", "flavor": "claude", "repo": repo,
        "event_count": events, "first_prompt": prompt,
        "tool_histogram": tools if tools is not None else {"Bash": 20, "Edit": 15, "Read": 8},
    }


def test_real_session_passes():
    assert is_real_coding_session(_digest()) is True


def test_healthcheck_prompt_dropped():
    assert is_real_coding_session(_digest(events=3, prompt="Say hello in one word.",
                                          tools={})) is False


def test_interrupted_dropped():
    assert is_real_coding_session(_digest(events=1, prompt="[Request interrupted by user]",
                                          tools={})) is False


def test_too_short_dropped():
    assert is_real_coding_session(_digest(events=5)) is False


def test_no_repo_dropped():
    assert is_real_coding_session(_digest(repo=None)) is False


def test_no_substantive_tools_dropped():
    # plenty of events but only plumbing calls -> not real coding
    assert is_real_coding_session(
        _digest(tools={"mcp__state-hub__update_task_status": 40})) is False


def test_filter_real_keeps_only_real():
    digs = [_digest(), _digest(events=3, prompt="hello", tools={}), _digest(repo=None)]
    assert len(filter_real(digs)) == 1


def test_quality_config_from_toml():
    cfg = quality_config({"detect": {"quality": {"min_events": 50}}})
    assert cfg.min_events == 50
    assert cfg.min_substantive == 3  # default preserved
59
tests/test_detect_recurring_error.py
Normal file
@@ -0,0 +1,59 @@
"""Recurring-error signal + clustering (WP-0006 T02)."""

import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.detect.cluster import cluster  # noqa: E402
from session_memory.detect.signals import (  # noqa: E402
    extract_signals,
    sig_recurring_error,
)


def _digest(uid, repo, flavor="claude", snippets=None):
    return {
        "session_uid": uid, "flavor": flavor, "repo": repo, "outcome": "success",
        "cost": {"input_tokens": 1, "output_tokens": 1},
        "markers": {"errors": 0, "retries": 0, "test_runs": 0},
        "tool_histogram": {}, "error_snippets": snippets or [],
    }


_FP = "modulenotfounderror: no module named 'foo' at <path>:<n>"


def test_signal_per_distinct_fingerprint():
    d = _digest("claude:a", "r1", snippets=[
        {"fingerprint": _FP, "sample": "ModuleNotFoundError ...", "count": 3, "tool": "Bash"},
        {"fingerprint": "keyerror: <str>", "sample": "KeyError", "count": 1, "tool": None},
    ])
    sigs = sig_recurring_error(d, {})
    assert len(sigs) == 2
    top = [s for s in sigs if s.locus == _FP][0]
    assert top.type == "recurring_error"
    assert top.magnitude == 3.0
    assert top.detail["sample"].startswith("ModuleNotFound")


def test_clusters_across_sessions_and_flavors():
    # same fingerprint in a claude and a grok session -> cross-flavor candidate
    digs = [
        _digest("claude:a", "r1", "claude",
                [{"fingerprint": _FP, "sample": "ModuleNotFoundError", "count": 2, "tool": "Bash"}]),
        _digest("grok:b", "r2", "grok",
                [{"fingerprint": _FP, "sample": "ModuleNotFoundError", "count": 1, "tool": None}]),
    ]
    signals = extract_signals(digs)
    pats = cluster([s for s in signals if s.type == "recurring_error"], min_frequency=2)
    assert len(pats) == 1
    p = pats[0]
    assert p.signal_type == "recurring_error"
    assert p.cross_flavor is True
    assert sorted(p.flavors) == ["claude", "grok"]
    assert p.frequency == 2


def test_no_snippets_no_signal():
    assert sig_recurring_error(_digest("claude:a", "r1"), {}) == []
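The clustering test above expects one pattern with `frequency == 2` when the same fingerprint shows up in a claude session and a grok session, even though the per-session counts are 2 and 1. That is consistent with grouping signals by type and locus and counting distinct sessions. The sketch below illustrates that reading; it is an assumption about the behaviour, not the real `session_memory.detect.cluster.cluster`, and it uses plain dicts where the real code returns objects with attributes.

```python
from collections import defaultdict


def sketch_cluster(signals, min_frequency=2):
    """Group signals by (type, locus) and keep groups seen in enough sessions."""
    groups = defaultdict(list)
    for sig in signals:
        groups[(sig["type"], sig["locus"])].append(sig)

    patterns = []
    for (sig_type, locus), members in groups.items():
        sessions = sorted({m["session_uid"] for m in members})
        if len(sessions) < min_frequency:
            continue  # not recurring across sessions
        flavors = sorted({m["flavor"] for m in members})
        patterns.append({
            "signal_type": sig_type,
            "locus": locus,
            "frequency": len(sessions),  # assumed: distinct sessions, not summed counts
            "flavors": flavors,
            "cross_flavor": len(flavors) > 1,
        })
    return patterns
```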
101
tests/test_digest_errors.py
Normal file
@@ -0,0 +1,101 @@
"""Error-body mining into the digest (WP-0006 T01)."""

import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.core.digest import (  # noqa: E402
    _error_fingerprint,
    _error_snippets,
    build_digest,
)
from session_memory.core.schema import SCHEMA_VERSION, Session, SessionEvent  # noqa: E402


def _ev(seq, kind, **kw):
    return SessionEvent(session_uid="claude:s", seq=seq, kind=kind, **kw)


def test_fingerprint_normalizes_paths_numbers_ids():
    a = _error_fingerprint("ModuleNotFoundError: No module named 'foo' at /home/x/a.py:42")
    b = _error_fingerprint("ModuleNotFoundError: No module named 'foo' at /srv/y/b.py:9991")
    assert a == b  # paths + line numbers stripped -> same fingerprint
    assert "<path>" in a and "<n>" in a


def test_fingerprint_uuid_and_addr():
    fp = _error_fingerprint("connection 0xDEADBEEF to 1972d1d9-fc35-4912-8126-1fe64cc51425 failed")
    assert "<addr>" in fp and "<uuid>" in fp


def test_snippets_dedup_and_count():
    blobs = {"b1": "Traceback...\nValueError: bad thing at /p/x.py:10",
             "b2": "Traceback...\nValueError: bad thing at /q/y.py:99",
             "b3": "KeyError: 'id'"}
    events = [
        _ev(0, "error", payload_ref="b1"),
        _ev(1, "error", payload_ref="b2"),  # same fingerprint as b1
        _ev(2, "error", payload_ref="b3"),
    ]
    snips = _error_snippets(events, blobs)
    assert len(snips) == 2
    top = snips[0]
    assert top["count"] == 2  # the ValueError collapsed
    assert "ValueError" in top["sample"]


def test_failed_tool_result_mined():
    blobs = {"b1": "npm ERR! something failed with non-zero exit"}
    events = [_ev(0, "tool_result", tool="Bash", payload_ref="b1")]
    snips = _error_snippets(events, blobs)
    assert len(snips) == 1
    assert snips[0]["tool"] == "Bash"


def test_clean_tool_result_not_mined():
    blobs = {"b1": "6 passed in 0.4s"}
    events = [_ev(0, "tool_result", tool="Bash", payload_ref="b1")]
    assert _error_snippets(events, blobs) == []


def test_success_json_not_mined():
    # a hub MCP success payload mentioning 'error' deep inside is NOT a failure
    blobs = {"b1": '{"result": "{\\"domain\\": \\"custodian\\", \\"note\\": \\"no errors\\"}"}'}
    events = [_ev(0, "tool_result", tool="mcp__state-hub__get_domain_summary", payload_ref="b1")]
    assert _error_snippets(events, blobs) == []


def test_error_json_still_mined():
    blobs = {"b1": '{"detail": "Invalid request parameters"}'}
    events = [_ev(0, "tool_result", tool="Bash", payload_ref="b1")]
    snips = _error_snippets(events, blobs)
    assert len(snips) == 1


def test_plain_mcp_error_still_mined():
    blobs = {"b1": "MCP error -32602: Invalid request parameters"}
    events = [_ev(0, "tool_result", tool="Bash", payload_ref="b1")]
    assert len(_error_snippets(events, blobs)) == 1


def test_file_read_snapshot_not_mined():
    # a Read result of source code containing 'raise ...Error' is not a runtime error
    blobs = {"b1": "227\t def f():\n228\t x = 1\n229\t raise InfospaceError()\n"}
    events = [_ev(0, "tool_result", tool="Read", payload_ref="b1")]
    assert _error_snippets(events, blobs) == []


def test_build_digest_includes_error_snippets_and_v2():
    s = Session(session_uid="claude:s", flavor="claude", native_session_id="s", repo="r")
    events = [_ev(0, "user_msg"), _ev(1, "error", payload_ref="b1"), _ev(2, "assistant_msg")]
    d = build_digest(s, events, {"b1": "RuntimeError: kaboom at /a/b.py:3"})
    assert d["schema_version"] == SCHEMA_VERSION == 2
    assert d["error_snippets"][0]["count"] == 1
    assert "RuntimeError" in d["error_snippets"][0]["sample"]


def test_no_errors_empty_list():
    s = Session(session_uid="claude:s", flavor="claude", native_session_id="s", repo="r")
    d = build_digest(s, [_ev(0, "user_msg"), _ev(1, "assistant_msg")])
    assert d["error_snippets"] == []
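The fingerprint tests above require that two error messages differing only in file path, line number, UUID, or memory address collapse to the same string, with `<path>`, `<n>`, `<uuid>`, and `<addr>` placeholders. A regex-based normalizer is one plausible way to do this. The sketch below is hypothetical: `sketch_fingerprint` and its patterns are illustrative, and the real `_error_fingerprint` may use different rules or ordering. Lowercasing is assumed from the `_FP` constant used in the recurring-error tests.

```python
import re

# Order matters: UUIDs and addresses are replaced before bare numbers,
# otherwise their digit runs would be rewritten piecemeal.
_UUID = re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b")
_ADDR = re.compile(r"\b0x[0-9a-f]+\b")
_PATH = re.compile(r"(?:/[\w.\-]+)+")
_NUM = re.compile(r"\b\d+\b")


def sketch_fingerprint(text):
    """Normalize volatile tokens so equivalent errors compare equal."""
    fp = text.strip().lower()
    fp = _UUID.sub("<uuid>", fp)
    fp = _ADDR.sub("<addr>", fp)
    fp = _PATH.sub("<path>", fp)
    fp = _NUM.sub("<n>", fp)
    return fp
```

For example, `"... at /home/x/a.py:42"` and `"... at /srv/y/b.py:9991"` both end in `<path>:<n>` under these rules, so they count as one recurring error.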
78
tests/test_digest_lookup.py
Normal file
@@ -0,0 +1,78 @@
"""digest_lookup entrypoint tests (AGENTIC-WP-0011 T03)."""

import json
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.core.store import Store  # noqa: E402
from session_memory.digest_lookup import lookup_digest, main, resolve_store_paths  # noqa: E402


def _write_config(tmp_path) -> tuple[str, str]:
    # returns (config path, store directory)
    store = tmp_path / ".store"
    toml = tmp_path / "config.toml"
    toml.write_text(
        f'[store]\ndb_path = "{store / "m.db"}"\nblob_dir = "{store / "blobs"}"\n'
        f'cursor = "{store / "c.json"}"\n')
    return str(toml), str(store)


def _seed(store_dir, uid="claude:test-uid"):
    st = Store(os.path.join(store_dir, "m.db"), os.path.join(store_dir, "blobs"))
    st.write_digest(uid, {
        "session_uid": uid,
        "flavor": "claude",
        "repo": "agentic-resources",
        "outcome": "success",
        "started_at": "2026-06-19T10:00:00Z",
        "ended_at": "2026-06-19T11:00:00Z",
        "cost": {"input_tokens": 100, "output_tokens": 25},
        "tool_histogram": {"Bash": 10, "Edit": 5},
    })
    st.close()
    return uid


def test_resolve_store_paths_from_config(tmp_path):
    cfg_path, store_dir = _write_config(tmp_path)
    db, blob = resolve_store_paths(config_path=cfg_path)
    assert db.endswith("m.db")
    assert blob.endswith("blobs")
    assert store_dir in db


def test_resolve_store_paths_from_env(tmp_path, monkeypatch):
    db = tmp_path / "custom" / "mem.db"
    db.parent.mkdir(parents=True)
    monkeypatch.setenv("HELIX_STORE_DB", str(db))
    resolved_db, blob = resolve_store_paths()
    assert resolved_db == str(db)
    assert blob == str(tmp_path / "custom" / "blobs")


def test_lookup_digest_found_and_missing(tmp_path):
    cfg_path, store_dir = _write_config(tmp_path)
    uid = _seed(store_dir)
    found = lookup_digest(uid, config_path=cfg_path)
    assert found is not None and found["outcome"] == "success"
    assert lookup_digest("claude:missing", config_path=cfg_path) is None


def test_main_json_success(tmp_path, capsys):
    cfg_path, store_dir = _write_config(tmp_path)
    uid = _seed(store_dir)
    rc = main(["--config", cfg_path, uid, "--json"])
    assert rc == 0
    data = json.loads(capsys.readouterr().out)
    assert data["session_uid"] == uid
    assert data["repo"] == "agentic-resources"


def test_main_not_found(tmp_path, capsys):
    cfg_path, store_dir = _write_config(tmp_path)
    _seed(store_dir)
    rc = main(["--config", cfg_path, "claude:missing"])
    assert rc == 1
    assert "not found" in capsys.readouterr().err.lower()
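The two `resolve_store_paths` tests above show two sources for the store location: a `[store]` table in the config file, and a `HELIX_STORE_DB` environment variable whose blob directory is a `blobs` folder next to the database. The sketch below shows one way such a resolver could look. It is a guess at the shape, not the real `session_memory.digest_lookup` code: the function name is hypothetical, it takes an already-parsed config dict rather than a path, and giving the environment variable precedence over the config is an assumption the tests neither confirm nor rule out.

```python
import os


def sketch_resolve_store_paths(config=None, env=None):
    """Return (db_path, blob_dir) from the environment or a parsed config dict."""
    env = os.environ if env is None else env
    db = env.get("HELIX_STORE_DB")
    if db:
        # Assumed convention: blobs live in a sibling "blobs" directory.
        return db, os.path.join(os.path.dirname(db), "blobs")
    store = (config or {}).get("store", {})
    return store.get("db_path"), store.get("blob_dir")
```

Passing `env` explicitly keeps the function testable without touching the real process environment.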
88
tests/test_distribute_base.py
Normal file
@@ -0,0 +1,88 @@
"""Distributor base tests (WP-0007 T01): markers, idempotent upsert, rendering."""

import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from session_memory.curate.schema import Resolution, SolutionPattern  # noqa: E402
from session_memory.distribute.base import (  # noqa: E402
    Artifact,
    BaseDistributor,
    Distributor,
    render_markdown_body,
    upsert_block,
    wrap_block,
)


def _pattern(pid="sp-x", polarity="problem"):
    return SolutionPattern(
        id=pid, name="Read before edit", version="1.2.0", polarity=polarity,
        problem="Agents edit files they have not read.",
        resolutions=[Resolution(summary="Read the file first", detail="then Edit",
                                steps=["Read", "Edit"])],
        rendering_hints={"claude": {"target": "CLAUDE.md"}},
    )


def test_render_markdown_body_has_problem_and_resolution():
    body = render_markdown_body(_pattern())
    assert "### Read before edit" in body
    assert "Agents edit files" in body
    assert "**Avoid:**" in body  # problem polarity
    assert "- Read the file first — then Edit" in body
    assert " - Read" in body


def test_success_polarity_label():
    assert "**Prefer:**" in render_markdown_body(_pattern(polarity="success"))


def test_wrap_block_has_markers_and_version():
    block = wrap_block("sp-x", "hello", "1.2.0")
    assert block.startswith("<!-- BEGIN helix-forge pattern:sp-x --> v1.2.0")
    assert block.rstrip().endswith("<!-- END helix-forge pattern:sp-x -->")


def test_upsert_inserts_then_replaces_in_place():
    doc = "# Title\n\nsome text\n"
    b1 = wrap_block("sp-x", "first", "1")
    once = upsert_block(doc, "sp-x", b1)
    assert "first" in once and once.count("BEGIN helix-forge pattern:sp-x") == 1
    # re-distributing the same id replaces, does not duplicate
    b2 = wrap_block("sp-x", "second", "2")
    twice = upsert_block(once, "sp-x", b2)
    assert "second" in twice and "first" not in twice
    assert twice.count("BEGIN helix-forge pattern:sp-x") == 1


def test_upsert_keeps_other_patterns():
    doc = upsert_block("", "sp-a", wrap_block("sp-a", "A"))
    doc = upsert_block(doc, "sp-b", wrap_block("sp-b", "B"))
    assert "sp-a" in doc and "sp-b" in doc


def test_base_distributor_renders_artifact():
    d = BaseDistributor(flavor="claude", target_path="CLAUDE.md")
    art = d.render(_pattern())
    assert isinstance(art, Artifact)
    assert isinstance(d, Distributor)  # satisfies the protocol
    assert art.flavor == "claude"
    assert art.target_path == "CLAUDE.md"
    assert "BEGIN helix-forge pattern:sp-x" in art.content
    assert "Read before edit" in art.content


def test_body_hint_overrides_default():
    p = _pattern()
    p.rendering_hints["claude"]["body"] = "custom claude body"
|
||||||
|
d = BaseDistributor(flavor="claude", target_path="CLAUDE.md")
|
||||||
|
assert "custom claude body" in d.render(p).content
|
||||||
|
|
||||||
|
|
||||||
|
def test_target_hint_overrides_default():
|
||||||
|
p = _pattern()
|
||||||
|
p.rendering_hints["claude"]["target"] = "docs/CLAUDE.md"
|
||||||
|
d = BaseDistributor(flavor="claude", target_path="CLAUDE.md")
|
||||||
|
assert d.render(p).target_path == "docs/CLAUDE.md"
|
||||||
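Reviewer note: the tests above pin down the marker contract (`BEGIN`/`END` comments keyed by pattern id, replace in place on re-run) without showing how it might be met. The sketch below is one minimal way to satisfy those assertions. It is not the code in `session_memory.distribute.base`, which this diff view does not show; the regex approach, the default `version="0"`, and the trailing-newline handling are all assumptions.

```python
import re


def wrap_block(pattern_id: str, body: str, version: str = "0") -> str:
    """Wrap a rendered body in BEGIN/END markers keyed by the pattern id."""
    begin = f"<!-- BEGIN helix-forge pattern:{pattern_id} --> v{version}"
    end = f"<!-- END helix-forge pattern:{pattern_id} -->"
    return f"{begin}\n{body}\n{end}\n"


def upsert_block(doc: str, pattern_id: str, block: str) -> str:
    """Replace the existing block for pattern_id, or append it if absent."""
    pid = re.escape(pattern_id)
    existing = re.compile(
        rf"<!-- BEGIN helix-forge pattern:{pid} -->.*?"
        rf"<!-- END helix-forge pattern:{pid} -->\n?",
        re.DOTALL,
    )
    if existing.search(doc):
        # A callable replacement keeps backslashes in `block` literal.
        return existing.sub(lambda _match: block, doc, count=1)
    if doc and not doc.endswith("\n"):
        doc += "\n"
    separator = "\n" if doc else ""
    return doc + separator + block
```

Because the block is located by its id-specific markers, text outside the markers and blocks for other pattern ids are left untouched.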
40
tests/test_distribute_claude.py
Normal file
@@ -0,0 +1,40 @@
|
|||||||
|
"""Claude distributor tests (WP-0007 T02)."""
|
||||||
|
|
||||||
|
import os
|
||||||
|
import sys
|
||||||
|
|
||||||
|
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
|
||||||
|
|
||||||
|
from session_memory.curate.schema import Resolution, SolutionPattern # noqa: E402
|
||||||
|
from session_memory.distribute.claude import ClaudeDistributor # noqa: E402
|
||||||
|
|
||||||
|
|
||||||
|
def _pattern(hints=None):
|
||||||
|
return SolutionPattern(
|
||||||
|
id="sp-read-before-edit", name="Read before edit", version="1.0.0",
|
||||||
|
polarity="problem", problem="Agents edit files they have not read.",
|
||||||
|
resolutions=[Resolution(summary="Read the file first", steps=["Read", "Edit"])],
|
||||||
|
rendering_hints=hints or {"claude": {}},
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def test_default_targets_claude_md():
|
||||||
|
art = ClaudeDistributor().render(_pattern())
|
||||||
|
assert art.flavor == "claude"
|
||||||
|
assert art.target_path == "CLAUDE.md"
|
||||||
|
assert "BEGIN helix-forge pattern:sp-read-before-edit" in art.content
|
||||||
|
assert "### Read before edit" in art.content
|
||||||
|
|
||||||
|
|
||||||
|
def test_skill_mode_emits_skill_stub():
|
||||||
|
art = ClaudeDistributor().render(_pattern({"claude": {"as": "skill"}}))
|
||||||
|
assert "## Skill: Read before edit" in art.content
|
||||||
|
assert "**When:**" in art.content
|
||||||
|
assert " - Read" in art.content
|
||||||
|
|
||||||
|
|
||||||
|
def test_idempotent_marker_present_for_reupsert():
|
||||||
|
art = ClaudeDistributor().render(_pattern())
|
||||||
|
# same id in both renders -> caller can upsert in place
|
||||||
|
art2 = ClaudeDistributor().render(_pattern())
|
||||||
|
assert art.pattern_id == art2.pattern_id == "sp-read-before-edit"
|
||||||
49
tests/test_distribute_codex_grok.py
Normal file
@@ -0,0 +1,49 @@
|
|||||||
|
"""Codex + Grok distributor + registry tests (WP-0007 T03)."""
|
||||||
|
|
||||||
|
import os
|
||||||
|
import sys
|
||||||
|
|
||||||
|
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
|
||||||
|
|
||||||
|
from session_memory.curate.schema import Resolution, SolutionPattern # noqa: E402
|
||||||
|
from session_memory.distribute.codex import CodexDistributor # noqa: E402
|
||||||
|
from session_memory.distribute.grok import GrokDistributor # noqa: E402
|
||||||
|
from session_memory.distribute.registry import all_flavors, get_distributor # noqa: E402
|
||||||
|
|
||||||
|
|
||||||
|
def _pattern():
|
||||||
|
return SolutionPattern(
|
||||||
|
id="sp-x", name="Read before edit", version="1.0.0", polarity="problem",
|
||||||
|
problem="Agents edit files they have not read.",
|
||||||
|
resolutions=[Resolution(summary="Read the file first")],
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def test_codex_targets_agents_md():
|
||||||
|
art = CodexDistributor().render(_pattern())
|
||||||
|
assert art.flavor == "codex" and art.target_path == "AGENTS.md"
|
||||||
|
assert "Read before edit" in art.content
|
||||||
|
|
||||||
|
|
||||||
|
def test_grok_targets_native_instructions():
|
||||||
|
art = GrokDistributor().render(_pattern())
|
||||||
|
assert art.flavor == "grok" and art.target_path == ".grok/instructions.md"
|
||||||
|
|
||||||
|
|
||||||
|
def test_same_pattern_expressible_for_all_flavors():
|
||||||
|
# FR-A3: one pattern, rendered for every flavor (same body, different targets)
|
||||||
|
p = _pattern()
|
||||||
|
bodies = {}
|
||||||
|
for f in all_flavors():
|
||||||
|
art = get_distributor(f).render(p)
|
||||||
|
# strip markers -> compare agnostic body
|
||||||
|
inner = art.content.split("\n", 1)[1].rsplit("\n", 1)[0]
|
||||||
|
bodies[f] = inner
|
||||||
|
targets = {get_distributor(f).render(p).target_path for f in all_flavors()}
|
||||||
|
assert len(targets) == 3 # distinct per-flavor targets
|
||||||
|
assert len(set(bodies.values())) == 1 # identical agnostic body
|
||||||
|
|
||||||
|
|
||||||
|
def test_registry_unknown_flavor():
|
||||||
|
assert get_distributor("gpt") is None
|
||||||
|
assert set(all_flavors()) == {"claude", "codex", "grok"}
|
||||||
76
tests/test_distribute_entrypoint.py
Normal file
@@ -0,0 +1,76 @@
|
|||||||
|
"""Distribute entrypoint tests (WP-0007 T05)."""
|
||||||
|
|
||||||
|
import os
|
||||||
|
import sys
|
||||||
|
|
||||||
|
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
|
||||||
|
|
||||||
|
from session_memory.curate.catalog import Catalog # noqa: E402
|
||||||
|
from session_memory.curate.schema import Resolution, Scope, SolutionPattern # noqa: E402
|
||||||
|
from session_memory.distribute.__main__ import build_targets, main, run_distribute # noqa: E402
|
||||||
|
|
||||||
|
|
||||||
|
def _pattern(pid, repos, flavors, status="approved", ready=True):
|
||||||
|
return SolutionPattern(
|
||||||
|
id=pid, name=pid, version="1.0.0", polarity="problem", problem="p",
|
||||||
|
resolutions=[Resolution(summary="do x")],
|
||||||
|
scope=Scope(repos=repos, flavors=flavors), status=status, distribution_ready=ready,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def _config(tmp_path):
|
||||||
|
return {
|
||||||
|
"repo_domain_map": {"agentic-resources": "helix_forge", "state-hub": "custodian"},
|
||||||
|
"curate": {"catalog_dir": str(tmp_path / "catalog")},
|
||||||
|
"distribute": {"proposals_dir": str(tmp_path / "proposals"),
|
||||||
|
"active_registry": str(tmp_path / "active.json")},
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def test_build_targets_crosses_repos_and_flavors():
|
||||||
|
cfg = {"repo_domain_map": {"r1": "d1", "r2": "d2"}}
|
||||||
|
targets = build_targets(cfg)
|
||||||
|
assert len(targets) == 2 * 3 # 2 repos x 3 flavors
|
||||||
|
assert build_targets(cfg, repo_filter="r1") and all(t.repo == "r1"
|
||||||
|
for t in build_targets(cfg, repo_filter="r1"))
|
||||||
|
assert all(t.flavor == "claude" for t in build_targets(cfg, flavor_filter="claude"))
|
||||||
|
|
||||||
|
|
||||||
|
def test_run_distribute_scopes_to_catalog(tmp_path):
|
||||||
|
cfg = _config(tmp_path)
|
||||||
|
cat = Catalog(cfg["curate"]["catalog_dir"])
|
||||||
|
# in-scope for agentic-resources/claude only
|
||||||
|
cat.upsert(_pattern("sp-a", ["agentic-resources"], ["claude"]))
|
||||||
|
# provisional -> must be skipped
|
||||||
|
cat.upsert(_pattern("sp-prov", [], [], status="provisional", ready=False))
|
||||||
|
res = run_distribute(cfg)
|
||||||
|
rendered = {pid for _, _, pid, _ in res.proposals}
|
||||||
|
assert "sp-a" in rendered
|
||||||
|
assert "sp-prov" not in rendered
|
||||||
|
assert "sp-prov" in res.skipped_not_distributable
|
||||||
|
# landed only in the agentic-resources/CLAUDE.md proposal
|
||||||
|
p = os.path.join(cfg["distribute"]["proposals_dir"], "agentic-resources", "CLAUDE.md")
|
||||||
|
assert os.path.exists(p)
|
||||||
|
assert not os.path.exists(
|
||||||
|
os.path.join(cfg["distribute"]["proposals_dir"], "state-hub", "CLAUDE.md"))
|
||||||
|
|
||||||
|
|
||||||
|
def test_main_runs_json(tmp_path, capsys):
|
||||||
|
cfg = _config(tmp_path)
|
||||||
|
cat = Catalog(cfg["curate"]["catalog_dir"])
|
||||||
|
cat.upsert(_pattern("sp-a", [], ["claude"])) # unrestricted repos
|
||||||
|
# main() loads its config from TOML, so write a minimal config.toml for it
import json as _json
|
||||||
|
toml = tmp_path / "config.toml"
|
||||||
|
toml.write_text(
|
||||||
|
'[repo_domain_map]\nagentic-resources = "helix_forge"\n'
|
||||||
|
f'[curate]\ncatalog_dir = "{cfg["curate"]["catalog_dir"]}"\n'
|
||||||
|
f'[distribute]\nproposals_dir = "{cfg["distribute"]["proposals_dir"]}"\n'
|
||||||
|
f'active_registry = "{cfg["distribute"]["active_registry"]}"\n')
|
||||||
|
rc = main(["--config", str(toml), "--json"])
|
||||||
|
assert rc == 0
|
||||||
|
out = capsys.readouterr().out
|
||||||
|
assert "sp-a" in out
|
||||||
|
_json.loads(out) # valid JSON
|
||||||
79
tests/test_distribute_proposals.py
Normal file
@@ -0,0 +1,79 @@
|
|||||||
|
"""Scoping + proposals + active registry tests (WP-0007 T04)."""
|
||||||
|
|
||||||
|
import os
|
||||||
|
import sys
|
||||||
|
|
||||||
|
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
|
||||||
|
|
||||||
|
from session_memory.curate.schema import Resolution, Scope, SolutionPattern # noqa: E402
|
||||||
|
from session_memory.distribute.proposals import ( # noqa: E402
|
||||||
|
ActiveRegistry,
|
||||||
|
Target,
|
||||||
|
applies,
|
||||||
|
propose,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def _pattern(pid="sp-x", repos=None, flavors=None, status="approved", ready=True):
|
||||||
|
return SolutionPattern(
|
||||||
|
id=pid, name="Read before edit", version="1.0.0", polarity="problem",
|
||||||
|
problem="edit before read", resolutions=[Resolution(summary="read first")],
|
||||||
|
scope=Scope(repos=repos or [], flavors=flavors or []),
|
||||||
|
status=status, distribution_ready=ready,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def test_applies_respects_scope():
|
||||||
|
p = _pattern(repos=["agentic-resources"], flavors=["claude"])
|
||||||
|
assert applies(p, Target("agentic-resources", flavor="claude"))
|
||||||
|
assert not applies(p, Target("other-repo", flavor="claude"))
|
||||||
|
assert not applies(p, Target("agentic-resources", flavor="codex"))
|
||||||
|
|
||||||
|
|
||||||
|
def test_empty_scope_is_unrestricted():
|
||||||
|
assert applies(_pattern(), Target("any", flavor="grok"))
|
||||||
|
|
||||||
|
|
||||||
|
def test_propose_writes_scoped_proposal_files(tmp_path):
|
||||||
|
out = str(tmp_path / "proposals")
|
||||||
|
reg = ActiveRegistry(str(tmp_path / "active.json"))
|
||||||
|
p = _pattern(flavors=["claude"])
|
||||||
|
res = propose([p], [Target("agentic-resources", flavor="claude"),
|
||||||
|
Target("agentic-resources", flavor="codex")], out, reg)
|
||||||
|
# only claude target is in scope
|
||||||
|
assert len(res.proposals) == 1
|
||||||
|
path = os.path.join(out, "agentic-resources", "CLAUDE.md")
|
||||||
|
assert os.path.exists(path)
|
||||||
|
assert "BEGIN helix-forge pattern:sp-x" in open(path).read()
|
||||||
|
|
||||||
|
|
||||||
|
def test_not_distributable_skipped(tmp_path):
|
||||||
|
reg = ActiveRegistry(str(tmp_path / "active.json"))
|
||||||
|
prov = _pattern(status="provisional", ready=False)
|
||||||
|
res = propose([prov], [Target("r", flavor="claude")], str(tmp_path / "p"), reg)
|
||||||
|
assert res.proposals == []
|
||||||
|
assert "sp-x" in res.skipped_not_distributable
|
||||||
|
|
||||||
|
|
||||||
|
def test_proposals_idempotent_on_rerun(tmp_path):
|
||||||
|
out = str(tmp_path / "proposals")
|
||||||
|
reg_path = str(tmp_path / "active.json")
|
||||||
|
p = _pattern()
|
||||||
|
propose([p], [Target("r", flavor="claude")], out, ActiveRegistry(reg_path))
|
||||||
|
propose([p], [Target("r", flavor="claude")], out, ActiveRegistry(reg_path))
|
||||||
|
content = open(os.path.join(out, "r", "CLAUDE.md")).read()
|
||||||
|
assert content.count("BEGIN helix-forge pattern:sp-x") == 1 # no duplication
|
||||||
|
|
||||||
|
|
||||||
|
def test_active_registry_records_environment(tmp_path):
|
||||||
|
reg_path = str(tmp_path / "active.json")
|
||||||
|
reg = ActiveRegistry(reg_path)
|
||||||
|
propose([_pattern()], [Target("r", domain="helix_forge", flavor="claude")],
|
||||||
|
str(tmp_path / "p"), reg)
|
||||||
|
reg2 = ActiveRegistry(reg_path) # reload from disk
|
||||||
|
entries = reg2.entries()
|
||||||
|
assert len(entries) == 1
|
||||||
|
assert entries[0]["pattern_id"] == "sp-x"
|
||||||
|
assert entries[0]["repo"] == "r"
|
||||||
|
assert entries[0]["flavor"] == "claude"
|
||||||
|
assert entries[0]["status"] == "proposed"
|
||||||
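Reviewer note: `test_applies_respects_scope` and `test_empty_scope_is_unrestricted` together imply a simple rule: an empty list on an axis means "no restriction". A minimal sketch of that rule follows. The `Scope` and `Target` classes here are stand-ins defined for illustration, and this `applies` takes a scope directly, whereas the function under test takes a pattern; the actual module is not shown in this diff.

```python
from dataclasses import dataclass, field


@dataclass
class Scope:
    """Stand-in for the catalog scope: empty lists mean 'unrestricted'."""
    repos: list[str] = field(default_factory=list)
    flavors: list[str] = field(default_factory=list)


@dataclass
class Target:
    """Stand-in for a distribution target (one repo, one agent flavor)."""
    repo: str
    domain: str = ""
    flavor: str = ""


def applies(scope: Scope, target: Target) -> bool:
    """True when the target falls inside the scope on both axes."""
    repo_ok = not scope.repos or target.repo in scope.repos
    flavor_ok = not scope.flavors or target.flavor in scope.flavors
    return repo_ok and flavor_ok
```

Checking each axis independently is what lets a pattern be restricted by flavor only, as in `test_propose_writes_scoped_proposal_files`.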
92
tests/test_grok_adapter.py
Normal file
@@ -0,0 +1,92 @@
|
|||||||
|
"""Grok adapter tests (T02): synthetic session dir + real local sessions."""
|
||||||
|
|
||||||
|
import glob
|
||||||
|
import json
|
||||||
|
import os
|
||||||
|
import sys
|
||||||
|
|
||||||
|
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
|
||||||
|
|
||||||
|
from session_memory.adapters.grok import parse_session # noqa: E402
|
||||||
|
|
||||||
|
REPO_MAP = {"agentic-resources": "helix_forge", "net-kingdom": "netkingdom",
|
||||||
|
"can-you-assist": "coulomb_social"}
|
||||||
|
|
||||||
|
|
||||||
|
def _mk_session(dir_path, sid):
|
||||||
|
os.makedirs(dir_path, exist_ok=True)
|
||||||
|
with open(os.path.join(dir_path, "summary.json"), "w") as f:
|
||||||
|
json.dump({"info": {"id": sid, "cwd": "/home/worsch/agentic-resources"},
|
||||||
|
"created_at": "2026-06-06T10:00:00Z",
|
||||||
|
"last_active_at": "2026-06-06T10:05:00Z",
|
||||||
|
"current_model_id": "grok-build", "head_branch": "main"}, f)
|
||||||
|
with open(os.path.join(dir_path, "events.jsonl"), "w") as f:
|
||||||
|
f.write(json.dumps({"ts": "2026-06-06T10:00:00Z", "type": "turn_started",
|
||||||
|
"turn_number": 0, "model_id": "grok-build"}) + "\n")
|
||||||
|
f.write(json.dumps({"ts": "2026-06-06T10:05:00Z", "type": "turn_ended",
|
||||||
|
"turn_number": 0}) + "\n")
|
||||||
|
with open(os.path.join(dir_path, "chat_history.jsonl"), "w") as f:
|
||||||
|
for rec in [
|
||||||
|
{"type": "system", "content": "sys prompt"},
|
||||||
|
{"type": "user", "content": [{"type": "text", "text": "fix the bug"}]},
|
||||||
|
{"type": "reasoning", "content": [{"type": "text", "text": "thinking..."}]},
|
||||||
|
{"type": "assistant", "content": ""}, # empty -> skipped
|
||||||
|
{"type": "tool_result", "content": "The file x.py has been updated"},
|
||||||
|
{"type": "assistant", "content": "done"},
|
||||||
|
{"type": "tool_result", "content": "6 passed"},
|
||||||
|
]:
|
||||||
|
f.write(json.dumps(rec) + "\n")
|
||||||
|
with open(os.path.join(dir_path, "updates.jsonl"), "w") as f:
|
||||||
|
for u in [
|
||||||
|
{"sessionUpdate": "tool_call", "toolCallId": "c1", "title": "edit_file",
|
||||||
|
"rawInput": {"target_file": "x.py"}},
|
||||||
|
{"sessionUpdate": "tool_call", "toolCallId": "c2", "title": "shell",
|
||||||
|
"rawInput": {"command": "pytest -q"}},
|
||||||
|
]:
|
||||||
|
f.write(json.dumps({"timestamp": "t", "method": "session/update",
|
||||||
|
"params": {"sessionId": sid, "update": u}}) + "\n")
|
||||||
|
|
||||||
|
|
||||||
|
def test_grok_synthetic_dir(tmp_path):
|
||||||
|
d = tmp_path / "%2Fhome%2Fworsch%2Fagentic-resources" / "sid-1"
|
||||||
|
_mk_session(str(d), "sid-1")
|
||||||
|
|
||||||
|
norm = parse_session(str(d / "chat_history.jsonl"), REPO_MAP)
|
||||||
|
assert norm is not None
|
||||||
|
s = norm.session
|
||||||
|
assert s.session_uid == "grok:sid-1"
|
||||||
|
assert s.flavor == "grok"
|
||||||
|
assert s.repo == "agentic-resources" and s.domain == "helix_forge"
|
||||||
|
assert s.model == "grok-build"
|
||||||
|
assert s.git_branch == "main"
|
||||||
|
assert s.cost.turns == 1
|
||||||
|
assert s.cost.wall_clock_s == 300.0
|
||||||
|
|
||||||
|
kinds = [e.kind for e in norm.events]
|
||||||
|
# events.jsonl contributes exactly 2 lifecycle events: turn_started + turn_ended
|
||||||
|
assert kinds.count("lifecycle") == 2
|
||||||
|
assert "user_msg" in kinds and "thinking" in kinds and "assistant_msg" in kinds
|
||||||
|
# tool names recovered by pairing with updates.jsonl -> edit + test_run, each followed by a tool_result
|
||||||
|
assert "edit" in kinds and "test_run" in kinds
|
||||||
|
edit = next(e for e in norm.events if e.kind == "edit")
|
||||||
|
assert edit.tool == "edit_file"
|
||||||
|
# both tool calls (edit, test_run) produce a tool_result event
|
||||||
|
tr = [e for e in norm.events if e.kind == "tool_result"]
|
||||||
|
assert len(tr) == 2
|
||||||
|
|
||||||
|
|
||||||
|
def test_real_local_grok_sessions_if_available():
|
||||||
|
base = os.path.expanduser("~/.grok/sessions")
|
||||||
|
chats = glob.glob(os.path.join(base, "*", "*", "chat_history.jsonl"))
|
||||||
|
if not chats:
|
||||||
|
return
|
||||||
|
parsed = 0
|
||||||
|
for c in chats:
|
||||||
|
norm = parse_session(c, REPO_MAP)
|
||||||
|
if norm is None:
|
||||||
|
continue
|
||||||
|
parsed += 1
|
||||||
|
assert norm.session.session_uid.startswith("grok:")
|
||||||
|
seqs = [e.seq for e in norm.events]
|
||||||
|
assert seqs == sorted(seqs) and len(seqs) == len(set(seqs))
|
||||||
|
assert parsed >= 1
|
||||||
49
tests/test_measure_effect.py
Normal file
@@ -0,0 +1,49 @@
|
|||||||
|
"""Before/after effectiveness tests (WP-0009 T02)."""
|
||||||
|
|
||||||
|
import os
|
||||||
|
import sys
|
||||||
|
|
||||||
|
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
|
||||||
|
|
||||||
|
from session_memory.measure.effect import effectiveness, split_by_date # noqa: E402
|
||||||
|
|
||||||
|
|
||||||
|
def _digest(ts, tools=None, errors=0, outcome="success"):
|
||||||
|
return {
|
||||||
|
"started_at": ts, "outcome": outcome,
|
||||||
|
"cost": {"input_tokens": 100, "output_tokens": 0},
|
||||||
|
"tool_histogram": tools or {"Bash": 10},
|
||||||
|
"error_snippets": [{"fingerprint": f"e{i}", "count": 1} for i in range(errors)],
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def test_split_by_date():
|
||||||
|
digs = [_digest("2026-06-01"), _digest("2026-06-05"), _digest("2026-06-10")]
|
||||||
|
before, after = split_by_date(digs, "2026-06-05")
|
||||||
|
assert len(before) == 1 and len(after) == 2 # >= applied_at goes to after
|
||||||
|
|
||||||
|
|
||||||
|
def test_effectiveness_detects_improvement():
|
||||||
|
# before: lots of errors + hub overhead; after: clean
|
||||||
|
before = [_digest("2026-06-01", tools={"mcp__state-hub__x": 8, "Bash": 2}, errors=3)
|
||||||
|
for _ in range(3)]
|
||||||
|
after = [_digest("2026-06-10", tools={"Bash": 10}, errors=0) for _ in range(3)]
|
||||||
|
e = effectiveness(before + after, "2026-06-05", label="read-before-edit")
|
||||||
|
assert not e["insufficient_data"]
|
||||||
|
assert e["n_before"] == 3 and e["n_after"] == 3
|
||||||
|
assert e["deltas"]["error_rate"]["improved"] is True
|
||||||
|
assert e["deltas"]["infra_overhead_share_median"]["improved"] is True
|
||||||
|
assert e["deltas"]["error_rate"]["change"] < 0
|
||||||
|
|
||||||
|
|
||||||
|
def test_effectiveness_insufficient_data():
|
||||||
|
e = effectiveness([_digest("2026-06-01")], "2026-06-05")
|
||||||
|
assert e["insufficient_data"] is True
|
||||||
|
assert e["deltas"] == {}
|
||||||
|
|
||||||
|
|
||||||
|
def test_success_rate_higher_is_better():
|
||||||
|
before = [_digest("2026-06-01", outcome="fail") for _ in range(2)]
|
||||||
|
after = [_digest("2026-06-10", outcome="success") for _ in range(2)]
|
||||||
|
e = effectiveness(before + after, "2026-06-05")
|
||||||
|
assert e["deltas"]["success_rate"]["improved"] is True
|
||||||
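Reviewer note: `test_split_by_date` fixes the boundary convention: a session that starts exactly on `applied_at` counts as "after". A minimal sketch of that split, assuming `started_at` values are ISO-8601 strings in a consistent format so that plain string comparison orders them correctly (the real `session_memory.measure.effect.split_by_date` is not shown in this diff):

```python
def split_by_date(digests, applied_at):
    """Partition digests into (before, after) around the applied_at date.

    Assumes ISO-8601 timestamps, which sort correctly as plain strings.
    A digest starting exactly at applied_at lands in `after`.
    """
    before = [d for d in digests if d["started_at"] < applied_at]
    after = [d for d in digests if d["started_at"] >= applied_at]
    return before, after
```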
79
tests/test_measure_entrypoint.py
Normal file
@@ -0,0 +1,79 @@
|
|||||||
|
"""Measure entrypoint tests (WP-0009 T03)."""
|
||||||
|
|
||||||
|
import json
|
||||||
|
import os
|
||||||
|
import sys
|
||||||
|
|
||||||
|
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
|
||||||
|
|
||||||
|
from session_memory.core.store import Store # noqa: E402
|
||||||
|
from session_memory.measure.__main__ import main, real_digests # noqa: E402
|
||||||
|
from session_memory.measure.metrics import load_baselines # noqa: E402
|
||||||
|
|
||||||
|
|
||||||
|
def _digest(uid, ts, tools=None):
|
||||||
|
return {
|
||||||
|
"session_uid": uid, "flavor": "claude", "repo": "agentic-resources",
|
||||||
|
"outcome": "success", "started_at": ts,
|
||||||
|
"cost": {"input_tokens": 100, "output_tokens": 10},
|
||||||
|
"event_count": 40, "first_prompt": "Implement the measure entrypoint cleanly",
|
||||||
|
"tool_histogram": tools or {"Bash": 20, "Edit": 12, "Read": 8},
|
||||||
|
"error_snippets": [],
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def _write_config(tmp_path) -> tuple[str, str]:
|
||||||
|
store = tmp_path / ".store"
|
||||||
|
toml = tmp_path / "config.toml"
|
||||||
|
toml.write_text(
|
||||||
|
f'[store]\ndb_path = "{store / "m.db"}"\nblob_dir = "{store / "blobs"}"\n'
|
||||||
|
f'cursor = "{store / "c.json"}"\n'
|
||||||
|
f'[measure]\nbaselines = "{tmp_path / "baselines.jsonl"}"\n')
|
||||||
|
return str(toml), str(store)
|
||||||
|
|
||||||
|
|
||||||
|
def _seed(store_dir):
|
||||||
|
st = Store(os.path.join(store_dir, "m.db"), os.path.join(store_dir, "blobs"))
|
||||||
|
st.write_digest("claude:a", _digest("claude:a", "2026-06-01"))
|
||||||
|
st.write_digest("claude:b", _digest("claude:b", "2026-06-10",
|
||||||
|
tools={"mcp__state-hub__x": 18, "Bash": 8, "Edit": 4}))
|
||||||
|
st.close()
|
||||||
|
|
||||||
|
|
||||||
|
def test_real_digests_filters_and_loads(tmp_path):
|
||||||
|
cfg_path, store_dir = _write_config(tmp_path)
|
||||||
|
_seed(store_dir)
|
||||||
|
from session_memory.ingest import load_config
|
||||||
|
digs = real_digests(load_config(cfg_path))
|
||||||
|
assert len(digs) == 2
|
||||||
|
|
||||||
|
|
||||||
|
def test_main_writes_baseline_and_reports(tmp_path, capsys):
|
||||||
|
cfg_path, store_dir = _write_config(tmp_path)
|
||||||
|
_seed(store_dir)
|
||||||
|
rc = main(["--config", cfg_path, "--label", "first"])
|
||||||
|
assert rc == 0
|
||||||
|
out = capsys.readouterr().out
|
||||||
|
assert "Fleet metrics" in out
|
||||||
|
rows = load_baselines(str(tmp_path / "baselines.jsonl"))
|
||||||
|
assert len(rows) == 1 and rows[0]["label"] == "first"
|
||||||
|
|
||||||
|
|
||||||
|
def test_main_no_save_and_json(tmp_path, capsys):
|
||||||
|
cfg_path, store_dir = _write_config(tmp_path)
|
||||||
|
_seed(store_dir)
|
||||||
|
rc = main(["--config", cfg_path, "--no-save", "--json"])
|
||||||
|
assert rc == 0
|
||||||
|
data = json.loads(capsys.readouterr().out)
|
||||||
|
assert data["current"]["n_sessions"] == 2
|
||||||
|
assert not os.path.exists(str(tmp_path / "baselines.jsonl"))
|
||||||
|
|
||||||
|
|
||||||
|
def test_main_effectiveness_since(tmp_path, capsys):
|
||||||
|
cfg_path, store_dir = _write_config(tmp_path)
|
||||||
|
_seed(store_dir)
|
||||||
|
rc = main(["--config", cfg_path, "--no-save", "--since", "2026-06-05", "--json"])
|
||||||
|
assert rc == 0
|
||||||
|
data = json.loads(capsys.readouterr().out)
|
||||||
|
assert data["effectiveness"]["n_before"] == 1
|
||||||
|
assert data["effectiveness"]["n_after"] == 1
|
||||||
63
tests/test_measure_metrics.py
Normal file
@@ -0,0 +1,63 @@
|
|||||||
|
"""Fleet metrics + baseline tests (WP-0009 T01)."""
|
||||||
|
|
||||||
|
import os
|
||||||
|
import sys
|
||||||
|
|
||||||
|
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
|
||||||
|
|
||||||
|
from session_memory.measure.metrics import ( # noqa: E402
|
||||||
|
aggregate,
|
||||||
|
load_baselines,
|
||||||
|
save_baseline,
|
||||||
|
session_metrics,
|
||||||
|
snapshot,
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def _digest(tools=None, errors=0, tokens=100, outcome="success"):
|
||||||
|
return {
|
||||||
|
"outcome": outcome,
|
||||||
|
"cost": {"input_tokens": tokens, "output_tokens": 0},
|
||||||
|
"tool_histogram": tools or {"Bash": 10, "Edit": 5},
|
||||||
|
"error_snippets": [{"fingerprint": f"e{i}", "count": 1} for i in range(errors)],
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def test_session_metrics_overhead_and_errors():
|
||||||
|
m = session_metrics(_digest(tools={"mcp__state-hub__create_task": 6, "Bash": 4}, errors=2))
|
||||||
|
assert abs(m["infra_overhead_share"] - 0.6) < 1e-9
|
||||||
|
assert m["error_occurrences"] == 2
|
||||||
|
assert m["has_error"] is True
|
||||||
|
|
||||||
|
|
||||||
|
def test_aggregate_rates_and_percentiles():
|
||||||
|
digs = [
|
||||||
|
_digest(tools={"mcp__state-hub__x": 8, "Bash": 2}, errors=1, tokens=50), # 80% overhead
|
||||||
|
_digest(tools={"Bash": 9, "Edit": 1}, errors=0, tokens=200), # 0% overhead
|
||||||
|
_digest(tools={"ToolSearch": 6, "Bash": 4}, errors=0, tokens=100, outcome="fail"),
|
||||||
|
]
|
||||||
|
a = aggregate(digs)
|
||||||
|
assert a["n_sessions"] == 3
|
||||||
|
assert a["error_rate"] == round(1 / 3, 3)
|
||||||
|
assert a["success_rate"] == round(2 / 3, 3)
|
||||||
|
assert a["schema_thrash_sessions"] == 1 # the ToolSearch=6 session
|
||||||
|
assert 0 <= a["infra_overhead_share_median"] <= 1
|
||||||
|
|
||||||
|
|
||||||
|
def test_aggregate_empty():
|
||||||
|
assert aggregate([]) == {"n_sessions": 0}
|
||||||
|
|
||||||
|
|
||||||
|
def test_snapshot_has_timestamp_and_label():
|
||||||
|
s = snapshot([_digest()], label="baseline")
|
||||||
|
assert s["label"] == "baseline"
|
||||||
|
assert "captured_at" in s and s["n_sessions"] == 1
|
||||||
|
|
||||||
|
|
||||||
|
def test_baseline_roundtrip_appends(tmp_path):
|
||||||
|
path = str(tmp_path / "baselines.jsonl")
|
||||||
|
save_baseline(snapshot([_digest()], label="a"), path)
|
||||||
|
save_baseline(snapshot([_digest(), _digest()], label="b"), path)
|
||||||
|
rows = load_baselines(path)
|
||||||
|
assert [r["label"] for r in rows] == ["a", "b"]
|
||||||
|
assert rows[1]["n_sessions"] == 2
|
||||||
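Reviewer note: `test_session_metrics_overhead_and_errors` expects 6 hub calls out of 10 tool calls to give an `infra_overhead_share` of 0.6. The sketch below shows one way such a share could be computed. The prefix tuple is an assumption inferred from the test fixtures; the real classification in `session_memory.measure.metrics` may cover more tools.

```python
# Assumption: tools with these name prefixes count as infrastructure overhead.
INFRA_PREFIXES = ("mcp__state-hub__",)


def infra_overhead_share(tool_histogram: dict) -> float:
    """Fraction of tool calls that went to infrastructure tools."""
    total = sum(tool_histogram.values())
    if total == 0:
        return 0.0  # no tool calls at all: report no overhead
    infra = sum(
        count
        for name, count in tool_histogram.items()
        if name.startswith(INFRA_PREFIXES)
    )
    return infra / total
```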
66
tests/test_merge.py
Normal file
@@ -0,0 +1,66 @@
|
|||||||
|
"""Multi-file session merge tests (T03)."""
|
||||||
|
|
||||||
|
import os
|
||||||
|
import sys
|
||||||
|
|
||||||
|
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
|
||||||
|
|
||||||
|
from session_memory.adapters.common import Normalized # noqa: E402
|
||||||
|
from session_memory.core.schema import Session, SessionEvent # noqa: E402
|
||||||
|
from session_memory.core.store import Store # noqa: E402
|
||||||
|
|
||||||
|
|
||||||
|
def _part(native, kinds, base_blob="b"):
|
||||||
|
uid = Session.make_uid("claude", native)
|
||||||
|
s = Session(session_uid=uid, flavor="claude", native_session_id=native)
|
||||||
|
events, blobs = [], {}
|
||||||
|
for i, k in enumerate(kinds):
|
||||||
|
ref = f"blob://{native}/{i}"
|
||||||
|
events.append(SessionEvent(session_uid=uid, seq=i, parent_seq=(i - 1 if i else None),
|
||||||
|
kind=k, ts=f"2026-06-06T10:0{i}:00Z", payload_ref=ref))
|
||||||
|
blobs[ref] = f"{base_blob}-{k}-{i}"
|
||||||
|
return Normalized(session=s, events=events, blobs=blobs)
|
||||||
|
|
||||||
|
|
||||||
|
def test_second_file_appends_not_overwrites(tmp_path):
|
||||||
|
st = Store(str(tmp_path / "m.db"), str(tmp_path / "blobs"))
|
||||||
|
uid = Session.make_uid("claude", "s1")
|
||||||
|
|
||||||
|
# file 1: 3 events (seq 0..2)
|
||||||
|
n1 = _part("s1", ["user_msg", "assistant_msg", "tool_call"])
|
||||||
|
added1 = st.ingest(n1)
|
||||||
|
assert added1 == 3
|
||||||
|
assert st.count_events(uid) == 3
|
||||||
|
|
||||||
|
# file 2 for the SAME session: repeats event 0 + adds 2 new (continuation)
|
||||||
|
n2 = _part("s1", ["user_msg", "edit", "completion"])
|
||||||
|
# make the first event identical to file1's first event so it dedups
|
||||||
|
n2.events[0].kind = "user_msg"
|
||||||
|
n2.events[0].ts = "2026-06-06T10:00:00Z"
|
||||||
|
n2.blobs[n2.events[0].payload_ref] = "b-user_msg-0"
|
||||||
|
added2 = st.ingest(n2)
|
||||||
|
|
||||||
|
# only the 2 genuinely-new events appended; total grows additively
|
||||||
|
assert added2 == 2
|
||||||
|
assert st.count_events(uid) == 5
|
||||||
|
seqs = [e.seq for e in st.get_events(uid)]
|
||||||
|
assert seqs == [0, 1, 2, 3, 4] # contiguous, offset
|
||||||
|
|
||||||
|
|
||||||
|
def test_reingest_same_bundle_is_idempotent(tmp_path):
|
||||||
|
st = Store(str(tmp_path / "m.db"), str(tmp_path / "blobs"))
|
||||||
|
uid = Session.make_uid("claude", "s2")
|
||||||
|
n = _part("s2", ["user_msg", "assistant_msg"])
|
||||||
|
assert st.ingest(n) == 2
|
||||||
|
assert st.ingest(n) == 0 # nothing new on re-run
|
||||||
|
assert st.count_events(uid) == 2
|
||||||
|
|
||||||
|
|
||||||
|
def test_appended_event_parent_remapped_within_part(tmp_path):
|
||||||
|
st = Store(str(tmp_path / "m.db"), str(tmp_path / "blobs"))
|
||||||
|
uid = Session.make_uid("claude", "s3")
|
||||||
|
st.ingest(_part("s3", ["user_msg", "assistant_msg"])) # seq 0,1
|
||||||
|
st.ingest(_part("s3", ["thinking", "edit"]))  # new seq 2,3
|
||||||
|
events = {e.seq: e for e in st.get_events(uid)}
|
||||||
|
# the 'edit' (seq 3) had parent_seq=0 within its part -> remapped to its part's first new seq (2)
|
||||||
|
assert events[3].parent_seq == 2
|
||||||
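Reviewer note: the merge tests describe three behaviours: events already stored are skipped, new events get contiguous sequence numbers after the existing ones, and part-local `parent_seq` values are remapped into the merged numbering. An in-memory sketch of that logic follows. The `Event` class, the `(kind, ts, payload)` dedup key, and `merge_part` are illustrative assumptions; the real `Store.ingest` is not shown in this diff and may key events differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    seq: int
    parent_seq: Optional[int]
    kind: str
    ts: str
    payload: str


def merge_part(stored: list, part: list) -> int:
    """Append events from `part` that `stored` lacks; return the number added."""
    # Assumed dedup key: identical kind, timestamp and payload.
    known = {(e.kind, e.ts, e.payload): e.seq for e in stored}
    remap = {}  # part-local seq -> seq in the merged store
    next_seq = len(stored)
    added = 0
    for ev in part:
        key = (ev.kind, ev.ts, ev.payload)
        if key in known:
            remap[ev.seq] = known[key]  # duplicate: point at the stored copy
            continue
        parent = remap.get(ev.parent_seq) if ev.parent_seq is not None else None
        stored.append(Event(next_seq, parent, ev.kind, ev.ts, ev.payload))
        known[key] = next_seq
        remap[ev.seq] = next_seq
        next_seq += 1
        added += 1
    return added


def make_part(kinds):
    """Build a part shaped like the test helper: seq i, parent i-1."""
    return [
        Event(i, i - 1 if i else None, k, f"2026-06-06T10:0{i}:00Z", f"b-{k}-{i}")
        for i, k in enumerate(kinds)
    ]
```

Remapping through a per-part table means a parent that was deduplicated resolves to its stored copy, while a parent that is new resolves to its freshly assigned seq.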
Some files were not shown because too many files have changed in this diff.