agentic-resources/workplans/AGENTIC-WP-0004-session-memory-phase2.md
tegwick d06791f070 session-memory Phase 2: verify + catalog artifacts (T07)
End-to-end verification over real local sessions: ingest 94->93 -> 72 digests;
detect 3 candidates (2 cross-flavor); curate --auto-approve cataloged 3
SolutionPatterns (2 cross-flavor approved/distribution_ready, 1 Claude-only),
re-run fully idempotent, 3 hub decisions queued (API offline). Commits the 3
catalog artifacts as the source of truth. PRD §12 OQ4/OQ5/OQ6 marked resolved;
README + design refreshed. Workplan finished; suite 72/72.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 10:08:52 +02:00

---
id: AGENTIC-WP-0004
type: workplan
title: "Coding Session Memory — Phase 2 (Curate: review workflow + Pattern Catalog)"
domain: helix_forge
repo: agentic-resources
status: finished
owner: codex
topic_slug: helix-forge
created: "2026-06-06"
updated: "2026-06-07"
state_hub_workstream_id: "b3703684-f60e-42f3-b03e-dabe3e8ce3f4"
---
# Coding Session Memory — Phase 2 (Curate)
Implements the **Curate** phase (PRD §6.3, FR-U1 through FR-U4) of
[PRD-helix-forge](../docs/PRD-helix-forge.md), continuing
[AGENTIC-WP-0003](AGENTIC-WP-0003-session-memory-phase1.md) (Detect).

Phase 1 surfaces ranked **candidate** problem/success patterns with evidence
(`python -m session_memory.detect --json`, persisted to the Tier 2 `patterns`
table by `detect/cluster.py::Pattern`). Phase 2 turns those candidates into
**reviewed, versioned Solution Patterns** held in an in-repo **Pattern Catalog**,
the source of truth that Phase 3 (Distribute) renders into per-flavor artifacts.

Design boundary (ADR-001 / PRD §9): the catalog is **files-first**: solution
patterns originate as versioned files in this repo; the State Hub indexes them and
records each promote/reject as an auditable decision. The agnostic core stays
flavor-neutral; per-flavor knowledge lives only in **rendering hints** consumed
later by distributor adapters (PRD §6.4 / FR-A2). New code lands under a new
`session_memory/curate/` package, mirroring the `detect/` layout from Phase 1.

Relevant design open questions this phase resolves: **OQ4** (one agnostic
representation that still gives distributors enough to render natively), **OQ5**
(minimum trustworthy evidence bar before a pattern is distribution-eligible),
**OQ6** (preventing pattern bloat / context-budget degradation).
## Solution Pattern Schema + Per-Flavor Rendering Hints
```task
id: AGENTIC-WP-0004-T01
status: done
priority: high
state_hub_task_id: "c6d20bb6-7b6c-48fd-bd25-30a349514f41"
```
Define the agnostic **Solution Pattern** artifact (FR-U2, OQ4) in
`session_memory/curate/schema.py`: stable id, name, semantic `version`, problem
description, one or more recommended resolutions, applicability scope
(repos/domains/flavors), provenance (source candidate `key` + an evidence
snapshot copied from the detect `Pattern`), and **per-flavor rendering hints**
kept in a separate sub-structure so the core stays flavor-agnostic while
distributors get enough to render high-quality native artifacts. Dataclass +
deterministic serialization (sorted keys), reusing the `Pattern.to_dict()`
contract for the embedded evidence. Unit-tested for round-trip stability.
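
A minimal sketch of what such an artifact could look like, assuming JSON as the serialization format. The class name, every field name, and all sample values below are illustrative assumptions; the real definitions live in `session_memory/curate/schema.py` and may differ.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SolutionPattern:
    """Agnostic core plus a separate per-flavor hints sub-structure (field names assumed)."""

    id: str
    name: str
    version: str                   # semantic version string
    problem: str
    resolutions: list[str]
    scope: dict[str, list[str]]    # repos / domains / flavors
    source_key: str                # key of the originating detect candidate
    evidence: dict                 # snapshot, e.g. the output of Pattern.to_dict()
    rendering_hints: dict[str, dict] = field(default_factory=dict)

    def to_json(self) -> str:
        # sort_keys keeps the output stable regardless of dict insertion order.
        return json.dumps(asdict(self), sort_keys=True, indent=2)

    @classmethod
    def from_json(cls, text: str) -> "SolutionPattern":
        return cls(**json.loads(text))


# Hypothetical sample values, used only to exercise the round trip.
pattern = SolutionPattern(
    id="sp-example",
    name="Example pattern",
    version="1.0.0",
    problem="Hypothetical problem description.",
    resolutions=["Hypothetical resolution."],
    scope={"flavors": ["claude", "grok"], "repos": [], "domains": []},
    source_key="example-candidate-key",
    evidence={"frequency": 4, "cross_flavor": True},
    rendering_hints={"claude": {"format": "skill"}},
)

restored = SolutionPattern.from_json(pattern.to_json())
print(restored == pattern)
```

Keeping `rendering_hints` as its own top-level mapping means a distributor can read its flavor's entry without the core fields ever naming a flavor.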
## Versioned Pattern Catalog Store (files-first)
```task
id: AGENTIC-WP-0004-T02
status: done
priority: high
state_hub_task_id: "d40c7810-fd1e-4b14-8577-b8a64ddd337b"
```
Implement the in-repo **Pattern Catalog** as the source of truth (FR-U3, ADR-001)
in `session_memory/curate/catalog.py`: versioned solution-pattern files under a
catalog dir (e.g. `session_memory/catalog/<pattern-id>.json`), stable IDs, a
version bump on edit (supersede-in-place with history preserved), and
load/save/list with **dedup on pattern identity** (the source candidate key).
Files originate work; the hub indexes them. Verify save→load is lossless and
re-saving an unchanged pattern is a no-op (no spurious version bump).
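
The no-op-on-unchanged behaviour can be sketched as follows. This is an illustration under stated assumptions, not the code in `catalog.py`: the function names, the integer `version` counter (the schema itself uses semantic versions), and the inline `history` list are all hypothetical simplifications.

```python
import json
import tempfile
from pathlib import Path

META_KEYS = ("version", "history")  # bookkeeping fields kept apart from the body


def load_pattern(catalog_dir: Path, pattern_id: str) -> dict:
    return json.loads((catalog_dir / f"{pattern_id}.json").read_text(encoding="utf-8"))


def save_pattern(catalog_dir: Path, body: dict) -> str:
    """Persist body; return 'added', 'bumped', or 'unchanged'."""
    path = catalog_dir / f"{body['id']}.json"
    if path.exists():
        current = load_pattern(catalog_dir, body["id"])
        old_body = {k: v for k, v in current.items() if k not in META_KEYS}
        if old_body == body:
            return "unchanged"  # no write, so no spurious version bump
        history = current["history"] + [{"version": current["version"], "body": old_body}]
        record = {**body, "version": current["version"] + 1, "history": history}
        status = "bumped"
    else:
        record = {**body, "version": 1, "history": []}
        status = "added"
    path.write_text(json.dumps(record, sort_keys=True, indent=2), encoding="utf-8")
    return status


with tempfile.TemporaryDirectory() as tmp:
    catalog = Path(tmp)
    body = {"id": "sp-example", "source_key": "example-key", "problem": "v1 text"}
    results = [
        save_pattern(catalog, body),                              # first write
        save_pattern(catalog, body),                              # identical re-save
        save_pattern(catalog, {**body, "problem": "v2 text"}),    # edited body
    ]
    final = load_pattern(catalog, "sp-example")

print(results, final["version"])
```

Comparing the stored body against the incoming one before writing is what makes a re-run cheap: an unchanged pattern never touches the file.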
## Review Workflow (discuss / approve / reject → promote)
```task
id: AGENTIC-WP-0004-T03
status: done
priority: high
state_hub_task_id: "e303d01f-564e-4499-9ce5-22cf959ed84c"
```
Implement the curation workflow (FR-U1/FR-U2) in
`session_memory/curate/review.py`: load Phase 1 detect candidates with their
evidence (cross-flavor first), present each candidate, accept a
**discuss/approve/reject** action, and on **approve** promote the candidate into
a Solution Pattern written to the catalog (T02) with default rendering-hint
stubs the reviewer can refine. Re-review is **idempotent**: candidates already
promoted are matched on source key and updated in place, never duplicated; a
prior reject is remembered so it is not re-surfaced unless evidence changed.
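
The idempotency rules above can be sketched as one review pass over in-memory state. The function name, the dict shapes, and the `decide` callback are assumptions made for illustration; the real workflow in `review.py` reads and writes the catalog files.

```python
def review(candidates, catalog, rejected, decide):
    """Apply decide() to each new candidate; mutate catalog/rejected in place.

    catalog  maps source key -> promoted pattern (dict)
    rejected maps source key -> evidence seen when it was rejected
    decide   returns "approve", "reject", or "discuss" for a candidate
    """
    summary = {"promoted": [], "updated": [], "rejected": [], "skipped": []}
    # Stable sort: cross-flavor candidates are presented first.
    for cand in sorted(candidates, key=lambda c: not c["cross_flavor"]):
        key, evidence = cand["key"], cand["evidence"]
        if key in catalog:
            if catalog[key]["evidence"] == evidence:
                summary["skipped"].append(key)
            else:
                catalog[key]["evidence"] = evidence  # update in place, never duplicate
                summary["updated"].append(key)
            continue
        if rejected.get(key) == evidence:
            summary["skipped"].append(key)  # prior reject stands until evidence changes
            continue
        action = decide(cand)
        if action == "approve":
            catalog[key] = {"source_key": key, "evidence": evidence, "rendering_hints": {}}
            summary["promoted"].append(key)
        elif action == "reject":
            rejected[key] = evidence
            summary["rejected"].append(key)
        # "discuss" leaves the candidate pending for a later pass
    return summary


# Hypothetical candidates, reviewed twice to show the second pass is a no-op.
candidates = [
    {"key": "b", "cross_flavor": False, "evidence": {"frequency": 2}},
    {"key": "a", "cross_flavor": True, "evidence": {"frequency": 5}},
]
catalog, rejected = {}, {}
first = review(candidates, catalog, rejected,
               lambda c: "approve" if c["cross_flavor"] else "reject")
second = review(candidates, catalog, rejected, lambda c: "approve")
print(first, second)
```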
## Promotion Evidence-Bar + Bloat Guard
```task
id: AGENTIC-WP-0004-T04
status: done
priority: medium
state_hub_task_id: "d474425d-18af-48e4-8f5b-7716b2da0057"
```
Gate promotion on a **minimum trustworthy evidence threshold** (OQ5):
configurable floors on `frequency`, distinct supporting sessions, and — for
*distribution-eligible* patterns — `cross_flavor` and/or a `cost_impact` floor.
Candidates below the bar can be cataloged as `provisional` but not marked
distribution-ready. Add a **bloat guard** (OQ6): flag low-value or
near-duplicate patterns (same locus/signal-type already cataloged) so the
catalog stays lean and agent context budgets are protected. Knobs live in
`config.toml` alongside the existing retention/detect settings.
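
As a sketch of how the gate and the bloat guard might fit together: the threshold defaults, the `sessions`, `locus`, and `signal_type` field names, and the choice to read "and/or" as a plain "or" are all assumptions for illustration. Only `frequency`, `cross_flavor`, `cost_impact`, and the `provisional` status come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceBar:
    # Illustrative defaults; the real floors would be read from config.toml.
    min_frequency: int = 3
    min_sessions: int = 2
    min_cost_impact: float = 1000.0


def classify(candidate: dict, bar: EvidenceBar) -> tuple[str, bool]:
    """Return (status, distribution_ready) for a detect candidate."""
    meets_base = (
        candidate["frequency"] >= bar.min_frequency
        and candidate["sessions"] >= bar.min_sessions
    )
    if not meets_base:
        return "provisional", False  # cataloged, but never distribution-ready
    eligible = candidate["cross_flavor"] or candidate["cost_impact"] >= bar.min_cost_impact
    return "approved", eligible


def is_near_duplicate(candidate: dict, cataloged: list[dict]) -> bool:
    """Bloat guard: the same locus and signal type is already cataloged."""
    seen = {(p["locus"], p["signal_type"]) for p in cataloged}
    return (candidate["locus"], candidate["signal_type"]) in seen


# Hypothetical candidates exercising each branch.
bar = EvidenceBar()
strong = {"frequency": 5, "sessions": 4, "cross_flavor": True, "cost_impact": 0.0,
          "locus": "outcome", "signal_type": "success"}
single_flavor = {**strong, "cross_flavor": False}
weak = {**strong, "frequency": 1}

print(classify(strong, bar), classify(single_flavor, bar), classify(weak, bar))
print(is_near_duplicate(strong, [{"locus": "outcome", "signal_type": "success"}]))
```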
## State Hub Decision Integration
```task
id: AGENTIC-WP-0004-T05
status: done
priority: medium
state_hub_task_id: "449f12d4-fae0-450d-873f-143b3a570b5a"
```
Record every promote/reject as an **auditable hub decision** (FR-U4) via the
decision API (`record_decision` / `resolve_decision`), capturing rationale, the
source candidate key, and the evidence snapshot. **Degrade gracefully** when the
hub API is down — queue decisions locally and sync later (mirrors Phase 1's
after-the-fact status sync, recorded in the milestone for `055713a`). Keep the
hub a read model: the catalog file is the durable artifact; the decision is the
audit trail.
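
One way to sketch the queue-and-sync fallback, assuming a JSON Lines file as the local queue. The `send` callable stands in for whatever wraps the hub's `record_decision` call; it, the function names, and the queue file name are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def record_or_queue(decision: dict, send, queue_path: Path) -> str:
    """Try the hub first; on a connection failure append to the local queue."""
    try:
        send(decision)
        return "recorded"
    except OSError:  # ConnectionError and socket timeouts are OSError subclasses
        with queue_path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(decision, sort_keys=True) + "\n")
        return "queued"


def flush_queue(send, queue_path: Path) -> int:
    """Retry queued decisions in order; return how many were delivered."""
    if not queue_path.exists():
        return 0
    lines = queue_path.read_text(encoding="utf-8").splitlines()
    pending = [json.loads(line) for line in lines if line]
    sent = 0
    for decision in pending:
        try:
            send(decision)
        except OSError:
            break  # hub still down: keep this and later entries queued
        sent += 1
    remaining = pending[sent:]
    queue_path.write_text(
        "".join(json.dumps(d, sort_keys=True) + "\n" for d in remaining),
        encoding="utf-8",
    )
    return sent


def offline(_decision):
    raise ConnectionError("hub API offline")


delivered = []
with tempfile.TemporaryDirectory() as tmp:
    queue = Path(tmp) / "pending_decisions.jsonl"
    decision = {"action": "promote", "source_key": "example-key", "rationale": "hypothetical"}
    status = record_or_queue(decision, offline, queue)   # hub down: queued locally
    flushed = flush_queue(delivered.append, queue)       # hub back: synced
    leftover = queue.read_text(encoding="utf-8")

print(status, flushed)
```

Because the catalog file is already written before the decision is sent, a queued decision delays only the audit trail, never the durable artifact.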
## Curate Entrypoint (`python -m session_memory.curate`)
```task
id: AGENTIC-WP-0004-T06
status: done
priority: medium
state_hub_task_id: "95d7747e-8407-41af-9a60-b919a4ee5e06"
```
Add a `session_memory/curate/__main__.py` entrypoint consuming detect candidates
(ranked cross-flavor first): an **interactive** review mode plus a
**batch/non-interactive** mode (e.g. `--auto-approve` above the evidence bar, for
kaizen-agent review). Emits a **catalog diff summary** (added / version-bumped /
rejected) and machine-readable JSON. Document usage in `session_memory/README.md`
next to the existing `detect` instructions, including the
detect → curate → (Phase 3) distribute flow.
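
A rough sketch of the entrypoint's argument parsing and diff summary, for orientation only. `--auto-approve` is named above; the `--json` flag on `curate`, the function names, and the summary keys are assumptions, so check `session_memory/README.md` for the actual interface.

```python
import argparse
from collections import Counter


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="python -m session_memory.curate")
    parser.add_argument("--auto-approve", action="store_true",
                        help="batch mode: approve candidates above the evidence bar")
    # Assumed flag name for the machine-readable output.
    parser.add_argument("--json", action="store_true",
                        help="emit the catalog diff summary as JSON")
    return parser


def diff_summary(outcomes: dict[str, str]) -> dict[str, int]:
    """Count per-pattern outcomes into the added / version-bumped / rejected buckets."""
    counts = Counter(outcomes.values())
    return {k: counts.get(k, 0) for k in ("added", "version_bumped", "rejected")}


args = build_parser().parse_args(["--auto-approve"])
# Hypothetical outcomes keyed by pattern id.
summary = diff_summary({"sp-a": "added", "sp-b": "added", "sp-c": "rejected"})
print(args.auto_approve, summary)
```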
## Tests + Verify Against Live Phase 1 Candidates
```task
id: AGENTIC-WP-0004-T07
status: done
priority: medium
state_hub_task_id: "20407007-0a8b-4999-a470-fa3c84e17eba"
```
Unit tests for schema/catalog/review/gating on synthetic candidates, plus an
**end-to-end** run that promotes at least one **real cross-flavor** candidate from
the live detect output (the Claude+Grok "clean pass" / "abandoned" patterns from
the WP-0003 verification) into the catalog and confirms a hub decision is logged
(or queued if the API is down). Confirm catalog round-trips and versioning is
idempotent on re-run. Refresh design open questions **OQ4/OQ5/OQ6** (PRD §12).
After workplan file updates, notify the custodian operator to run from
`~/state-hub`:
```bash
make fix-consistency REPO=agentic-resources
```
**Verification results (2026-06-07):** full suite 72/72 green (26 new curate
tests across schema/catalog/review/gating/decisions/entrypoint). Live pipeline
over real local sessions: fresh ingest 94→93 → 72 digests; detect surfaced 3
candidates, **2 cross-flavor** (Claude+Grok). `curate --auto-approve` promoted
all 3 into the files-first catalog — `sp-success-clean_pass-outcome` and
`sp-problem-abandoned-outcome` (both cross-flavor, `approved`/`distribution_ready`)
plus `sp-problem-budget_overrun-tokens` (Claude-only). 3 hub decisions queued
(API offline). Re-run was fully idempotent (3 skipped, 0 catalog writes, no
version bump). PRD §12 OQ4/OQ5/OQ6 resolved. The 3 catalog artifacts are
committed as the source of truth; operator runs `make fix-consistency` to index
them in the hub.