tegwick d06791f070 session-memory Phase 2: verify + catalog artifacts (T07)

End-to-end verification over real local sessions: ingest 94->93 -> 72 digests;
detect 3 candidates (2 cross-flavor); curate --auto-approve cataloged 3
SolutionPatterns (2 cross-flavor approved/distribution_ready, 1 Claude-only),
re-run fully idempotent, 3 hub decisions queued (API offline). Commits the 3
catalog artifacts as the source of truth. PRD §12 OQ4/OQ5/OQ6 marked resolved;
README + design refreshed. Workplan finished; suite 72/72.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-07 10:08:52 +02:00

7.6 KiB

id	type	title	domain	repo	status	owner	topic_slug	created	updated	state_hub_workstream_id
AGENTIC-WP-0004	workplan	Coding Session Memory — Phase 2 (Curate: review workflow + Pattern Catalog)	helix_forge	agentic-resources	finished	codex	helix-forge	2026-06-06	2026-06-07	b3703684-f60e-42f3-b03e-dabe3e8ce3f4

Coding Session Memory — Phase 2 (Curate)

Implements the Curate phase (PRD §6.3, FR-U1–FR-U4) of PRD-helix-forge, continuing AGENTIC-WP-0003 (Detect).

Phase 1 surfaces ranked candidate problem/success patterns with evidence (python -m session_memory.detect --json, persisted to the Tier 2 patterns table by detect/cluster.py::Pattern). Phase 2 turns those candidates into reviewed, versioned Solution Patterns held in an in-repo Pattern Catalog — the source of truth that Phase 3 (Distribute) renders into per-flavor artifacts.

Design boundary (ADR-001 / PRD §9): the catalog is files-first — solution patterns originate as versioned files in this repo; the State Hub indexes them and records each promote/reject as an auditable decision. The agnostic core stays flavor-neutral; per-flavor knowledge lives only in rendering hints consumed later by distributor adapters (PRD §6.4 / FR-A2). New code lands under a new session_memory/curate/ package, mirroring the detect/ layout from Phase 1.

Relevant design open questions this phase resolves: OQ4 (one agnostic representation that still gives distributors enough to render natively), OQ5 (minimum trustworthy evidence bar before a pattern is distribution-eligible), OQ6 (preventing pattern bloat / context-budget degradation).

Solution Pattern Schema + Per-Flavor Rendering Hints

id: AGENTIC-WP-0004-T01
status: done
priority: high
state_hub_task_id: "c6d20bb6-7b6c-48fd-bd25-30a349514f41"

Define the agnostic Solution Pattern artifact (FR-U2, OQ4) in session_memory/curate/schema.py: stable id, name, semantic version, problem description, one or more recommended resolutions, applicability scope (repos/domains/flavors), provenance (source candidate key + an evidence snapshot copied from the detect Pattern), and per-flavor rendering hints kept in a separate sub-structure so the core stays flavor-agnostic while distributors get enough to render high-quality native artifacts. Dataclass + deterministic serialization (sorted keys), reusing the Pattern.to_dict() contract for the embedded evidence. Unit-tested for round-trip stability.
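A minimal sketch of what such an artifact could look like, assuming JSON as the serialization format. The class and field names below are hypothetical illustrations and are not taken from the real session_memory/curate/schema.py; the example values are placeholders.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SolutionPattern:
    # Hypothetical field names; the real schema.py may differ.
    id: str
    name: str
    version: str
    problem: str
    resolutions: list[str]
    scope: dict[str, list[str]]      # repos / domains / flavors
    source_candidate_key: str
    evidence: dict                   # snapshot shaped like Pattern.to_dict() output
    # Per-flavor hints sit in their own sub-structure so the core stays neutral.
    rendering_hints: dict[str, dict] = field(default_factory=dict)

    def to_json(self) -> str:
        # sort_keys makes the serialization deterministic.
        return json.dumps(asdict(self), sort_keys=True, indent=2)

    @classmethod
    def from_json(cls, text: str) -> "SolutionPattern":
        return cls(**json.loads(text))


# Placeholder values, for illustration only.
pattern = SolutionPattern(
    id="sp-example",
    name="Example pattern",
    version="1.0.0",
    problem="Illustrative problem description.",
    resolutions=["Illustrative resolution."],
    scope={"repos": ["agentic-resources"], "domains": [], "flavors": ["claude", "grok"]},
    source_candidate_key="example-key",
    evidence={"frequency": 3, "cross_flavor": True},
    rendering_hints={"claude": {"format": "skill"}},
)
restored = SolutionPattern.from_json(pattern.to_json())
```

Round-trip stability here means both that `restored == pattern` and that serializing `restored` again yields the same text.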

Versioned Pattern Catalog Store (files-first)

id: AGENTIC-WP-0004-T02
status: done
priority: high
state_hub_task_id: "d40c7810-fd1e-4b14-8577-b8a64ddd337b"

Implement the in-repo Pattern Catalog as the source of truth (FR-U3, ADR-001) in session_memory/curate/catalog.py: versioned solution-pattern files under a catalog dir (e.g. session_memory/catalog/<pattern-id>.json), stable IDs, a version bump on edit (supersede-in-place with history preserved), and load/save/list with dedup on pattern identity (the source candidate key). Work originates in files; the hub indexes them. Verify save→load is lossless and re-saving an unchanged pattern is a no-op (no spurious version bump).
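The save behaviour described above (version bump on edit, no-op on an unchanged re-save) can be sketched as follows. This is an illustration under stated assumptions, not the actual catalog.py: `save_pattern` is a hypothetical helper, patterns are plain dicts, and an integer counter stands in for the semantic version to keep the example short.

```python
import json
import tempfile
from pathlib import Path

META_KEYS = ("version", "history")


def _body(record: dict) -> dict:
    # Everything except bookkeeping fields counts as pattern content.
    return {k: v for k, v in record.items() if k not in META_KEYS}


def save_pattern(catalog_dir: Path, pattern: dict) -> str:
    """Write <catalog_dir>/<id>.json; return 'added', 'bumped' or 'unchanged'."""
    path = catalog_dir / f"{pattern['id']}.json"
    body = _body(pattern)
    if not path.exists():
        record = {**body, "version": 1, "history": []}
        outcome = "added"
    else:
        current = json.loads(path.read_text(encoding="utf-8"))
        if _body(current) == body:
            return "unchanged"  # no write, no spurious version bump
        superseded = {"version": current["version"], "body": _body(current)}
        record = {
            **body,
            "version": current["version"] + 1,
            "history": current["history"] + [superseded],
        }
        outcome = "bumped"
    path.write_text(json.dumps(record, sort_keys=True, indent=2) + "\n", encoding="utf-8")
    return outcome


with tempfile.TemporaryDirectory() as tmp:
    catalog_dir = Path(tmp)
    draft = {"id": "sp-example", "source_candidate_key": "example-key", "problem": "Illustrative."}
    first = save_pattern(catalog_dir, draft)
    second = save_pattern(catalog_dir, draft)
    third = save_pattern(catalog_dir, {**draft, "problem": "Edited."})
    stored = json.loads((catalog_dir / "sp-example.json").read_text(encoding="utf-8"))
```

Comparing only the content fields, not the bookkeeping ones, is what keeps a repeated save from touching the file.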

Review Workflow (discuss / approve / reject → promote)

id: AGENTIC-WP-0004-T03
status: done
priority: high
state_hub_task_id: "e303d01f-564e-4499-9ce5-22cf959ed84c"

Implement the curation workflow (FR-U1/FR-U2) in session_memory/curate/review.py: load Phase 1 detect candidates with their evidence (cross-flavor first), present each candidate, accept a discuss/approve/reject action, and on approve promote the candidate into a Solution Pattern written to the catalog (T02) with default rendering-hint stubs the reviewer can refine. Re-review is idempotent: candidates already promoted are matched on source key and updated in place, never duplicated; a prior reject is remembered so it is not re-surfaced unless evidence changed.
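One way to picture the idempotent re-review loop is the sketch below. It assumes in-memory dicts keyed by source candidate key and a caller-supplied `decide` callback; all names and the candidate shape are hypothetical, and the real review.py reads from the detect output and writes through the catalog store.

```python
def review(candidates, catalog, rejected, decide):
    """Apply decide() to each candidate, cross-flavor first.

    catalog:  source key -> promoted pattern (hypothetical in-memory stand-in)
    rejected: source key -> evidence seen when the candidate was rejected
    decide:   callable returning 'approve', 'reject' or 'discuss'
    """
    summary = {"promoted": [], "updated": [], "rejected": [], "deferred": [], "skipped": []}
    # sorted() is stable, so ranking within each group is preserved.
    for cand in sorted(candidates, key=lambda c: not c["cross_flavor"]):
        key, evidence = cand["key"], dict(cand["evidence"])
        already_promoted = key in catalog and catalog[key]["evidence"] == evidence
        already_rejected = key in rejected and rejected[key] == evidence
        if already_promoted or already_rejected:
            summary["skipped"].append(key)  # nothing changed since last review
            continue
        action = decide(cand)
        if action == "approve":
            bucket = "updated" if key in catalog else "promoted"
            # Keep any hints the reviewer already refined; start with an empty stub.
            hints = catalog.get(key, {}).get("rendering_hints", {})
            catalog[key] = {"id": f"sp-{key}", "evidence": evidence, "rendering_hints": hints}
            rejected.pop(key, None)
            summary[bucket].append(key)
        elif action == "reject":
            rejected[key] = evidence
            summary["rejected"].append(key)
        else:
            summary["deferred"].append(key)  # 'discuss': leave for a later pass
    return summary


# Placeholder candidates, for illustration only.
candidates = [
    {"key": "single-flavor-example", "cross_flavor": False, "evidence": {"frequency": 2}},
    {"key": "cross-flavor-example", "cross_flavor": True, "evidence": {"frequency": 5}},
]
catalog, rejected = {}, {}
decide = lambda c: "approve" if c["cross_flavor"] else "reject"

first_run = review(candidates, catalog, rejected, decide)
second_run = review(candidates, catalog, rejected, decide)   # nothing new to decide
candidates[0]["evidence"] = {"frequency": 4}                  # evidence changed
third_run = review(candidates, catalog, rejected, decide)     # rejected one resurfaces
```

In this sketch an unchanged, already-promoted candidate is skipped entirely, which is one reading of "updated in place, never duplicated".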

Promotion Evidence-Bar + Bloat Guard

id: AGENTIC-WP-0004-T04
status: done
priority: medium
state_hub_task_id: "d474425d-18af-48e4-8f5b-7716b2da0057"

Gate promotion on a minimum trustworthy evidence threshold (OQ5): configurable floors on frequency, distinct supporting sessions, and — for distribution-eligible patterns — cross_flavor and/or a cost_impact floor. Candidates below the bar can be cataloged as provisional but not marked distribution-ready. Add a bloat guard (OQ6): flag low-value or near-duplicate patterns (same locus/signal-type already cataloged) so the catalog stays lean and agent context budgets are protected. Knobs live in config.toml alongside the existing retention/detect settings.
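The gating logic might look roughly like the sketch below. The knob names, default values, and candidate fields are assumptions made for illustration; the real knobs live in config.toml and are not shown here. The sketch also picks the "or" reading of "cross_flavor and/or a cost_impact floor".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceBar:
    # Hypothetical knob names and defaults, not the real config.toml values.
    min_frequency: int = 3
    min_sessions: int = 2
    min_cost_impact: float = 0.2


def promotion_status(candidate: dict, bar: EvidenceBar) -> str:
    """Return 'distribution_ready' when the bar is met, else 'provisional'."""
    meets_floor = (
        candidate["frequency"] >= bar.min_frequency
        and candidate["sessions"] >= bar.min_sessions
    )
    # Assumption: either signal is enough for distribution eligibility.
    eligible = candidate["cross_flavor"] or candidate["cost_impact"] >= bar.min_cost_impact
    return "distribution_ready" if meets_floor and eligible else "provisional"


def is_near_duplicate(candidate: dict, cataloged: list) -> bool:
    """Bloat guard: same locus and signal type already in the catalog."""
    seen = {(p["locus"], p["signal_type"]) for p in cataloged}
    return (candidate["locus"], candidate["signal_type"]) in seen


# Placeholder candidates, for illustration only.
bar = EvidenceBar()
strong = {"frequency": 5, "sessions": 4, "cross_flavor": True, "cost_impact": 0.0,
          "locus": "outcome", "signal_type": "success"}
weak = {"frequency": 1, "sessions": 1, "cross_flavor": True, "cost_impact": 0.9,
        "locus": "tokens", "signal_type": "problem"}

strong_status = promotion_status(strong, bar)
weak_status = promotion_status(weak, bar)
duplicate = is_near_duplicate(strong, [{"locus": "outcome", "signal_type": "success"}])
```

Keeping the two checks separate lets a reviewer still catalog a provisional pattern while the bloat guard only raises a flag.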

State Hub Decision Integration

id: AGENTIC-WP-0004-T05
status: done
priority: medium
state_hub_task_id: "449f12d4-fae0-450d-873f-143b3a570b5a"

Record every promote/reject as an auditable hub decision (FR-U4) via the decision API (record_decision / resolve_decision), capturing rationale, the source candidate key, and the evidence snapshot. Degrade gracefully when the hub API is down — queue decisions locally and sync later (mirrors Phase 1's after-the-fact status sync, recorded in the milestone for 055713a). Keep the hub a read model: the catalog file is the durable artifact; the decision is the audit trail.
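The graceful-degradation path can be sketched as a local append-only queue that is flushed once the hub is reachable again. The `send` callable below is a stand-in for the hub's record_decision call, whose real signature is not shown in this workplan; the helper names, the JSON Lines queue file, and the decision fields are likewise assumptions.

```python
import json
import tempfile
from pathlib import Path


def record_or_queue(decision: dict, send, queue_path: Path) -> str:
    """Try the hub first; on a connection failure append to a local queue."""
    try:
        send(decision)
        return "sent"
    except OSError:  # ConnectionError is a subclass of OSError
        with queue_path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(decision, sort_keys=True) + "\n")
        return "queued"


def flush_queue(send, queue_path: Path) -> int:
    """Replay queued decisions in order; keep whatever still fails."""
    if not queue_path.exists():
        return 0
    lines = queue_path.read_text(encoding="utf-8").splitlines()
    pending = [json.loads(line) for line in lines if line.strip()]
    sent = 0
    for decision in pending:
        try:
            send(decision)
        except OSError:
            break
        sent += 1
    remaining = pending[sent:]
    queue_path.write_text(
        "".join(json.dumps(d, sort_keys=True) + "\n" for d in remaining),
        encoding="utf-8",
    )
    return sent


def offline_send(decision):
    raise ConnectionError("hub API offline")  # simulated outage


delivered = []
with tempfile.TemporaryDirectory() as tmp:
    queue_path = Path(tmp) / "pending_decisions.jsonl"
    decision = {"action": "promote", "source_key": "example-key", "rationale": "Illustrative."}
    outcome = record_or_queue(decision, offline_send, queue_path)
    queued_lines = len(queue_path.read_text(encoding="utf-8").splitlines())
    synced = flush_queue(delivered.append, queue_path)
    left_over = queue_path.read_text(encoding="utf-8")
```

Because the catalog file is written regardless of the hub's availability, a lost queue entry would cost only audit-trail detail and not the pattern itself.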

Curate Entrypoint (`python -m session_memory.curate`)

id: AGENTIC-WP-0004-T06
status: done
priority: medium
state_hub_task_id: "95d7747e-8407-41af-9a60-b919a4ee5e06"

Add a session_memory/curate/__main__.py entrypoint consuming detect candidates (ranked cross-flavor first): an interactive review mode plus a batch/non-interactive mode (e.g. --auto-approve above the evidence bar, for kaizen-agent review). Emits a catalog diff summary (added / version-bumped / rejected) and machine-readable JSON. Document usage in session_memory/README.md next to the existing detect instructions, including the detect → curate → (Phase 3) distribute flow.
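A minimal sketch of how such an entrypoint could parse its modes and render the diff summary, using only the standard library. The `--auto-approve` flag is named above; the `--json` flag for curate and the helper names are assumptions borrowed from the detect command's style, and the sample diff is a placeholder.

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="python -m session_memory.curate")
    parser.add_argument(
        "--auto-approve",
        action="store_true",
        help="batch mode: approve candidates above the evidence bar without prompting",
    )
    parser.add_argument(
        "--json",
        action="store_true",
        help="emit the catalog diff as machine-readable JSON (assumed flag name)",
    )
    return parser


def render_summary(diff: dict, as_json: bool) -> str:
    """Format the catalog diff either as JSON or as a one-line count summary."""
    if as_json:
        return json.dumps(diff, sort_keys=True)
    return ", ".join(f"{len(keys)} {label}" for label, keys in diff.items())


# Explicit argv list so the sketch does not read the real command line.
args = build_parser().parse_args(["--auto-approve", "--json"])

# Placeholder diff, for illustration only.
diff = {"added": ["sp-example"], "version_bumped": [], "rejected": []}
text_summary = render_summary(diff, as_json=False)
json_summary = render_summary(diff, as_json=args.json)
```

Interactive mode would be the default when `--auto-approve` is absent, with the same summary emitted at the end of either mode.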

Tests + Verify Against Live Phase 1 Candidates

id: AGENTIC-WP-0004-T07
status: done
priority: medium
state_hub_task_id: "20407007-0a8b-4999-a470-fa3c84e17eba"

Unit tests for schema/catalog/review/gating on synthetic candidates, plus an end-to-end run that promotes at least one real cross-flavor candidate from the live detect output (the Claude+Grok "clean pass" / "abandoned" patterns from the WP-0003 verification) into the catalog and confirms a hub decision is logged (or queued if the API is down). Confirm catalog round-trips and versioning is idempotent on re-run. Refresh design open questions OQ4/OQ5/OQ6 (PRD §12). After workplan file updates, notify the custodian operator to run from ~/state-hub:

make fix-consistency REPO=agentic-resources

Verification results (2026-06-07): full suite 72/72 green (26 new curate tests across schema/catalog/review/gating/decisions/entrypoint). Live pipeline over real local sessions: fresh ingest 94→93 → 72 digests; detect surfaced 3 candidates, 2 cross-flavor (Claude+Grok). curate --auto-approve promoted all 3 into the files-first catalog — sp-success-clean_pass-outcome and sp-problem-abandoned-outcome (both cross-flavor, approved/distribution_ready) plus sp-problem-budget_overrun-tokens (Claude-only). 3 hub decisions queued (API offline). Re-run was fully idempotent (3 skipped, 0 catalog writes, no version bump). PRD §12 OQ4/OQ5/OQ6 resolved. The 3 catalog artifacts are committed as the source of truth; operator runs make fix-consistency to index them in the hub.
