Thin per-flavor distributors over the shared base: Claude (CLAUDE.md, optional
skill-stub mode), Codex (AGENTS.md), Grok (.grok/instructions.md). The registry
maps flavor->distributor, so adding a flavor is one entry + one module. The same
agnostic body renders to distinct per-flavor targets (FR-A3). 7 new tests; suite
117/117.
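A minimal sketch of the registry idea, assuming each distributor is a class whose only per-flavor difference is its target file. REGISTRY, distribute() and the three subclass names are hypothetical. Artifact is a simplified stand-in for the dataclass in distribute/base.py, and skill-stub mode is omitted:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    flavor: str
    target: str  # repo-relative file the rendered body is proposed for
    body: str


class BaseDistributor:
    flavor = "base"
    target = ""

    def render(self, body: str) -> Artifact:
        # The body is flavor-agnostic; only the target differs per subclass.
        return Artifact(self.flavor, self.target, body)


class ClaudeDistributor(BaseDistributor):
    flavor, target = "claude", "CLAUDE.md"


class CodexDistributor(BaseDistributor):
    flavor, target = "codex", "AGENTS.md"


class GrokDistributor(BaseDistributor):
    flavor, target = "grok", ".grok/instructions.md"


# Hypothetical registry: adding a flavor means one subclass plus one entry here.
REGISTRY: dict[str, type[BaseDistributor]] = {
    cls.flavor: cls for cls in (ClaudeDistributor, CodexDistributor, GrokDistributor)
}


def distribute(body: str, flavors: list[str]) -> list[Artifact]:
    return [REGISTRY[flavor]().render(body) for flavor in flavors]
```

Under this sketch, the same body yields three artifacts that differ only in flavor and target.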
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
distribute/base.py: Artifact dataclass + Distributor protocol + idempotent
BEGIN/END snippet markers (upsert_block replaces a pattern's block in place so
re-distribution doesn't duplicate) + agnostic markdown body rendering from
SolutionPattern fields. BaseDistributor honours per-flavor body/target hints.
8 new tests; suite 110/110.
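A sketch of how upsert_block can stay idempotent. It assumes HTML-comment markers keyed by pattern id; this marker text is hypothetical and is not the format used in distribute/base.py. A non-greedy DOTALL match replaces only that pattern's block. If no markers are present, the block is appended:

```python
import re


def _markers(pattern_id: str) -> tuple[str, str]:
    # Hypothetical marker format; the real one lives in distribute/base.py.
    return (f"<!-- BEGIN helix-forge:{pattern_id} -->",
            f"<!-- END helix-forge:{pattern_id} -->")


def upsert_block(document: str, pattern_id: str, body: str) -> str:
    """Insert or replace the marked block for pattern_id (idempotent)."""
    begin, end = _markers(pattern_id)
    block = f"{begin}\n{body.strip()}\n{end}"
    span = re.compile(re.escape(begin) + r".*?" + re.escape(end), re.DOTALL)
    if span.search(document):
        # Replace in place; a function replacement avoids backslash escapes in body.
        return span.sub(lambda _m: block, document, count=1)
    # Otherwise append after the existing content, separated by a blank line.
    prefix = document.rstrip("\n") + "\n\n" if document.strip() else ""
    return f"{prefix}{block}\n"
```

Running it twice with the same body returns the same document. That property is what keeps re-distribution from duplicating blocks.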
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Three workplans queued and registered with the State Hub (via REST — MCP write
layer is erroring this session):
- AGENTIC-WP-0007 Phase 3 Distribute: per-flavor distributor adapters render
approved catalog patterns into proposed (HITL) artifacts, scoped by repo/domain.
- AGENTIC-WP-0008 Read-before-Edit reflex: act on the #1 friction finding
(Edit/Write-before-Read errors).
- AGENTIC-WP-0009 Phase 4 Measure: baseline + before/after effectiveness + trend.
Proceeding in that order.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Tighten _is_failed: exclude successful hub JSON responses (top-level no-error
payloads) and file-read snapshots (numbered cat -n source lines) that were
polluting error_snippets. JSON verdict classifies error vs success payloads
directly. Cuts distinct fingerprints 444 -> 269 (~40%) over the real corpus with
the top errors unchanged. Assessment caveat updated. 5 new tests; suite 102/102.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Re-ingested under schema v2 (populates error_snippets) and re-ran detect over
27 real sessions. Added a 'content-level root causes' section to
docs/ASSESSMENT-infra-friction.md: top recurring error is Edit/Write-before-Read
(12/27 sessions, 8 repos), then stale-read conflicts, a cross-flavor (claude+grok)
'make fix-consistency' failure, and State Hub MCP instability. Documented a
fingerprint-noise caveat. WP-0006 finished; suite 98/98.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
detect/signals.py sig_recurring_error emits one signal per distinct error
fingerprint per session (magnitude = in-session occurrences), so the same error
recurring across sessions/repos/flavors clusters into a candidate root-cause
problem pattern via the existing clusterer — cross-flavor flagged automatically.
3 new tests; suite 98/98 green.
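A sketch of the per-session emission plus a toy grouping step. It assumes each digest carries session_id, repo and flavor, plus error_snippets entries of the form {fingerprint, count}. Signal's fields and cluster() are hypothetical. cluster() only stands in for the existing clusterer, to show how a shared fingerprint becomes one cross-flavor group:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    type: str
    locus: str  # clustering key; here the normalized error fingerprint
    session_id: str
    repo: str
    flavor: str
    magnitude: int


def sig_recurring_error(digest: dict) -> list[Signal]:
    """One signal per distinct fingerprint in this session; magnitude = occurrences."""
    counts: dict[str, int] = defaultdict(int)
    for snippet in digest.get("error_snippets", []):
        counts[snippet["fingerprint"]] += snippet.get("count", 1)
    return [
        Signal("recurring_error", fp, digest["session_id"], digest["repo"], digest["flavor"], n)
        for fp, n in sorted(counts.items())
    ]


def cluster(signals: list[Signal]) -> dict[tuple[str, str], dict]:
    """Hypothetical stand-in for the existing clusterer: group by (type, locus)."""
    groups: dict[tuple[str, str], list[Signal]] = defaultdict(list)
    for s in signals:
        groups[(s.type, s.locus)].append(s)
    return {
        key: {
            "sessions": len({s.session_id for s in members}),
            "repos": sorted({s.repo for s in members}),
            "cross_flavor": len({s.flavor for s in members}) > 1,
        }
        for key, members in groups.items()
    }
```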
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
build_digest now extracts normalized error fingerprints + samples from failed
events (error kind + failing tool_result bodies) into a durable error_snippets
list — paths/numbers/uuids/addrs stripped so the same error collapses to one
fingerprint with a count; Python traceback header skipped in favour of the real
exception line. Durable in Tier 2 (survives Tier 1 eviction). SCHEMA_VERSION ->
2 (re-ingest needed to populate). 7 new tests; suite 95/95 green.
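A sketch of the normalization, using regex placeholders (<path>, <uuid>, <addr>, <n>). It assumes that, for a traceback, the last non-blank line is the exception line. The placeholder tokens and the rule order are illustrative choices, not the exact rules in build_digest:

```python
import re

# Applied in order: paths first (they may contain UUIDs or digits), bare numbers last.
_RULES = [
    (re.compile(r"(?:[\w.~-]*/)+[\w.-]+"), "<path>"),
    (re.compile(r"\b[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}\b", re.IGNORECASE), "<uuid>"),
    (re.compile(r"\b0x[0-9a-f]+\b", re.IGNORECASE), "<addr>"),
    (re.compile(r"\b\d+\b"), "<n>"),
]


def exception_line(text: str) -> str:
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines:
        return ""
    if lines[0].startswith("Traceback (most recent call last)"):
        # Skip the header and frames; the last line names the real exception.
        return lines[-1]
    return lines[0]


def fingerprint(text: str) -> str:
    line = exception_line(text)
    for pattern, placeholder in _RULES:
        line = pattern.sub(placeholder, line)
    return re.sub(r"\s+", " ", line).strip()
```

Two FileNotFoundErrors on different paths collapse to the same fingerprint. Their counts therefore accumulate under one entry instead of producing two.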
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Captures normalized error fingerprints into the durable digest and clusters
recurring root-cause errors across sessions — closes the content-level 'why' gap
called out in the friction assessment. 3 tasks; we implement this in helix_forge.
(State Hub skill handed off to the state-hub worker as STATE-WP-0058.)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Re-ran ingest->detect with the quality filter + infra signals over real local
sessions (72 captured -> 27 real). Purged the false-positive 'abandoned' catalog
entry and re-curated; catalog now carries tool_thrash/schema_thrash/infra_overhead
patterns. docs/ASSESSMENT-infra-friction.md ranks the friction: ~17.6% of real
tool activity is hub/task/schema plumbing (State Hub 10.3%, one session 231 calls;
ToolSearch in 81% of sessions). Validates the CLI/MCP-skill hypothesis as a top-2
friction source; recommends a State Hub skill (front-load schemas + batched
writes) + bulk hub ops.
Workplan finished; suite 88/88.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
signals.py: tool_bucket helper + three tool_histogram-based extractors for friction
that the outcome/marker signals were blind to: sig_infra_overhead (hub+task+schema share
of tool calls over threshold), sig_schema_thrash (repeated ToolSearch), and
sig_tool_thrash (one tool dominating). Thresholds in build_context. 8 new tests;
suite 88/88 green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
detect/quality.py: is_real_coding_session drops health-checks / smoke-tests /
interrupted / trivially-short sessions (event floor, repo present, substantive
tool activity, non-trivial prompt). Wired into run_detect so signals only form
over real sessions — fixes the abandoned false-positive. [detect.quality] knobs;
existing detect/curate fixtures made realistic. 8 new tests; suite 80/80.
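A sketch of the filter as a chain of early returns. It assumes a digest dict with event_count, repo, tool_histogram, first_prompt and an interrupted flag. Those field names and the QualityConfig defaults are hypothetical placeholders for the [detect.quality] knobs:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityConfig:
    # Hypothetical defaults standing in for the [detect.quality] knobs.
    min_events: int = 20
    min_substantive_calls: int = 3
    min_prompt_chars: int = 15
    substantive_tools: frozenset[str] = frozenset({"Read", "Edit", "Write", "Bash", "Grep"})


def is_real_coding_session(digest: dict, cfg: QualityConfig = QualityConfig()) -> bool:
    if digest.get("interrupted"):
        return False  # aborted before doing real work
    if digest.get("event_count", 0) < cfg.min_events:
        return False  # health-check / smoke-test sized
    if not digest.get("repo"):
        return False  # not attached to a repository
    hist = digest.get("tool_histogram", {})
    substantive = sum(n for tool, n in hist.items() if tool in cfg.substantive_tools)
    if substantive < cfg.min_substantive_calls:
        return False  # no substantive tool activity
    prompt = (digest.get("first_prompt") or "").strip()
    return len(prompt) >= cfg.min_prompt_chars  # non-trivial prompt
```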
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
T02 was the one genuinely incomplete bootstrap task: AGENTS.md had no
dev-workflow section. Added one documenting the pure-stdlib Python 3.11+
toolchain, pytest, and the session_memory ingest/detect/curate entrypoints so
future sessions can verify changes. T01 (integration files) and T03 (first real
workplan) were already satisfied; reconciled stale ready/todo bookkeeping to
finished/done.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
python -m session_memory.curate: refreshes detect candidates, then drives them
through review interactively or with --auto-approve (batch, gate-driven) /
--json. Emits a catalog diff summary; queues hub decisions when offline.
[curate] config gains decision_queue + workstream id. README documents the
detect -> curate -> distribute flow and the gate knobs. 2 new tests; suite 72/72.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
decisions.py: every final promote/reject becomes a record_decision-shaped
payload (rationale + source key + evidence snapshot). DecisionRecorder degrades
gracefully under a hub outage — pluggable sink with a durable local-queue
fallback and ordered flush/replay (mirrors Phase 1's after-the-fact sync).
Wired into review() via an optional recorder. 6 new tests; suite 70/70 green.
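A sketch of the degrade-gracefully recorder. It assumes the sink is any callable that raises OSError (ConnectionError, urllib's URLError) when the hub is unreachable, and that the local queue is a JSONL file. The class and method names mirror the description, but the signatures are hypothetical. New decisions queue behind any pending ones, so replay order matches decision order:

```python
import json
from pathlib import Path
from typing import Callable


class DecisionRecorder:
    """Send decisions to a sink (e.g. the hub); queue locally when it is down."""

    def __init__(self, sink: Callable[[dict], None], queue_path: Path):
        self.sink = sink
        self.queue_path = queue_path

    def pending(self) -> list[dict]:
        if not self.queue_path.exists():
            return []
        text = self.queue_path.read_text(encoding="utf-8")
        return [json.loads(line) for line in text.splitlines() if line.strip()]

    def _enqueue(self, decision: dict) -> None:
        with self.queue_path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(decision, sort_keys=True) + "\n")

    def record(self, decision: dict) -> bool:
        """Return True if delivered now, False if queued for later replay."""
        if not self.pending():  # never overtake older queued decisions
            try:
                self.sink(decision)
                return True
            except OSError:  # ConnectionError and URLError are OSError subclasses
                pass
        self._enqueue(decision)
        return False

    def flush(self) -> int:
        """Replay queued decisions oldest-first; stop at the first failure."""
        queued = self.pending()
        sent = 0
        for decision in queued:
            try:
                self.sink(decision)
            except OSError:
                break
            sent += 1
        remaining = "".join(json.dumps(d, sort_keys=True) + "\n" for d in queued[sent:])
        self.queue_path.write_text(remaining, encoding="utf-8")
        return sent
```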
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
gating.py: two-tier evidence bar (OQ5) — promote floor (frequency/sessions/
cost_impact) plus a stricter distribution-eligibility floor that decides whether a
promoted pattern lands as approved+distribution_ready or provisional. Wired into
review() so thin approvals land provisional. bloat_warnings flags duplicate
and near-duplicate (same signal-type+locus) candidates (OQ6). [curate]/
[curate.gate] knobs in config.toml. 6 new tests; suite 64/64 green.
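A sketch of the two floors and the near-duplicate check. The numeric floors are placeholders for the [curate.gate] knobs, and the 'held' status for below-floor evidence is a hypothetical name:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Floor:
    frequency: int
    sessions: int
    cost_impact: float

    def met_by(self, evidence: dict) -> bool:
        return (evidence["frequency"] >= self.frequency
                and evidence["sessions"] >= self.sessions
                and evidence["cost_impact"] >= self.cost_impact)


# Hypothetical values standing in for the [curate.gate] knobs.
PROMOTE_FLOOR = Floor(frequency=3, sessions=2, cost_impact=0.0)
DISTRIBUTION_FLOOR = Floor(frequency=10, sessions=5, cost_impact=0.05)


def gate(evidence: dict) -> tuple[str, bool]:
    """Return (status, distribution_ready) for an approval decision."""
    if not PROMOTE_FLOOR.met_by(evidence):
        return ("held", False)  # below the promote floor: not promoted at all
    if DISTRIBUTION_FLOOR.met_by(evidence):
        return ("approved", True)
    return ("provisional", False)  # thin approval lands provisional


def bloat_warnings(candidates: list[dict]) -> list[str]:
    """Flag candidates sharing a (signal_type, locus) with an earlier one."""
    first_seen: dict[tuple[str, str], str] = {}
    warnings = []
    for cand in candidates:
        key = (cand["signal_type"], cand["locus"])
        if key in first_seen:
            warnings.append(f"{cand['id']} duplicates {first_seen[key]} ({key[0]} @ {key[1]})")
        else:
            first_seen[key] = cand["id"]
    return warnings
```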
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
UI-free discuss/approve/reject engine driving detect candidates into the
catalog via a decide callback. candidate_to_pattern builds a provisional
SolutionPattern with per-flavor rendering-hint stubs. ReviewLog makes
re-review idempotent: prior rejects remembered, re-surfaced only when the
evidence fingerprint changes. 6 new tests; suite 58/58 green.
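A sketch of the re-review rule, assuming the evidence fingerprint is a SHA-256 over canonical (sorted-key) JSON. This ReviewLog is in-memory only; persisting the rejected map is left out:

```python
import hashlib
import json


def evidence_fingerprint(evidence: dict) -> str:
    """Stable hash of the evidence; key order does not matter."""
    canonical = json.dumps(evidence, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ReviewLog:
    def __init__(self) -> None:
        # candidate key -> evidence fingerprint at the time it was rejected
        self.rejected: dict[str, str] = {}

    def reject(self, key: str, evidence: dict) -> None:
        self.rejected[key] = evidence_fingerprint(evidence)

    def should_surface(self, key: str, evidence: dict) -> bool:
        """Re-surface a rejected candidate only if its evidence has changed."""
        prior = self.rejected.get(key)
        return prior is None or prior != evidence_fingerprint(evidence)
```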
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Files-first catalog (one JSON per pattern, id = source-key). Single
idempotent upsert path: added / unchanged / updated (status-only, no bump) /
versioned (content change bumps semver + archives prior to <id>.history.jsonl).
Dedup is structural on pattern id. 5 new tests; suite 52/52 green.
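A sketch of the single upsert path. It assumes status and version are the only non-content fields, and that a content change bumps the minor version. Which semver component the real catalog bumps is not stated here:

```python
import json
from pathlib import Path

MUTABLE_FIELDS = {"status", "version"}  # assumed: not part of a pattern's content


def _content(pattern: dict) -> dict:
    return {k: v for k, v in pattern.items() if k not in MUTABLE_FIELDS}


def _bump_minor(version: str) -> str:
    major, minor, _patch = (int(part) for part in version.split("."))
    return f"{major}.{minor + 1}.0"


def _write(path: Path, pattern: dict) -> None:
    path.write_text(json.dumps(pattern, indent=2, sort_keys=True), encoding="utf-8")


def upsert(catalog: Path, pattern: dict) -> str:
    """Return 'added' | 'unchanged' | 'updated' | 'versioned'."""
    path = catalog / f"{pattern['id']}.json"  # dedup is structural: one file per id
    if not path.exists():
        _write(path, {**pattern, "version": "1.0.0"})
        return "added"
    current = json.loads(path.read_text(encoding="utf-8"))
    if _content(current) == _content(pattern):
        if current.get("status") == pattern.get("status"):
            return "unchanged"
        _write(path, {**current, "status": pattern.get("status")})  # no version bump
        return "updated"
    # Content changed: archive the prior version, then bump.
    with (catalog / f"{pattern['id']}.history.jsonl").open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(current, sort_keys=True) + "\n")
    _write(path, {**pattern, "version": _bump_minor(current["version"])})
    return "versioned"
```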
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Marks AGENTIC-WP-0003 finished. Full suite 40/40 green; live pipeline
over real local sessions (Codex via fixtures) surfaced 3 candidate
patterns, 2 cross-flavor (Claude+Grok) — PRD success metric met.
README documents the detect entrypoint and Phase 0/1/next status.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- adapters/grok.py: reads the per-session dir (summary.json + chat_history.jsonl
+ events.jsonl + updates.jsonl); conversation from chat_history, lifecycle/
turn from events, tool-call names paired in order from updates ACP stream
- registered in ingest dispatch; codex+grok sources enabled in config.toml
- tests/test_grok_adapter.py (synthetic + real local sessions)
- live multi-flavor dry-run discovers 89 sessions across flavors
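A sketch of positional pairing. It assumes updates.jsonl lines shaped like {"sessionUpdate": "tool_call", "title": ...} and events carrying type == "tool_call". Those field names are assumptions and have not been checked against real Grok session files:

```python
import json


def tool_names_from_updates(lines: list[str]) -> list[str]:
    """Tool-call names in stream order from updates.jsonl (field names assumed)."""
    names = []
    for line in lines:
        if not line.strip():
            continue
        update = json.loads(line)
        if update.get("sessionUpdate") == "tool_call":
            names.append(update.get("title") or "unknown")
    return names


def pair_tool_calls(events: list[dict], names: list[str]) -> list[dict]:
    """Attach names to tool-call events positionally: n-th call gets n-th name."""
    calls = [e for e in events if e.get("type") == "tool_call"]
    paired = [{**call, "tool": name} for call, name in zip(calls, names)]
    # Calls without a matching update keep an explicit placeholder.
    paired += [{**call, "tool": "unknown"} for call in calls[len(names):]]
    return paired
```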
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- verified full sweep over 85 real local Claude transcripts: 63 sessions
ingested+analyzed, eviction under tiny cap freed 26MB with zero data loss,
digest-preservation invariant holds, idempotent re-run
- session_memory/README.md: usage, scheduling, retention knobs
- design doc: OQ4 resolved (median ~49KB/session), OQ6 (multi-file sessions)
- workplan AGENTIC-WP-0002 finished
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- session_memory/core/retention.py: RetentionConfig + sweep() with backstop,
budget (oldest-analyzed-first, never touches un-analyzed), and hard-cap
overflow (analyze-now then reported last-resort data_loss); EvictionReport
- tests/test_retention.py covers all four branches
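A sketch of the three eviction stages. It assumes the backstop is age-based and that analyze() returns False when a session cannot be digested in time. Session, the config fields and the EvictionReport lists are hypothetical shapes, not the real RetentionConfig:

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Session:
    id: str
    size: int        # bytes held in Tier 1 (raw transcript)
    mtime: float     # seconds; smaller = older
    analyzed: bool   # digest already durable in Tier 2


@dataclass(frozen=True)
class RetentionConfig:
    max_age_s: float     # backstop (assumed to be age-based here)
    budget_bytes: int    # soft budget
    hard_cap_bytes: int  # expected to be >= budget_bytes


@dataclass
class EvictionReport:
    evicted: list[str] = field(default_factory=list)
    analyzed_now: list[str] = field(default_factory=list)
    data_loss: list[str] = field(default_factory=list)


def sweep(sessions: list[Session], cfg: RetentionConfig, now: float,
          analyze: Callable[[Session], bool]) -> EvictionReport:
    report = EvictionReport()
    live = sorted(sessions, key=lambda s: s.mtime)  # oldest first

    def used() -> int:
        return sum(s.size for s in live)

    def evict(s: Session) -> None:
        live.remove(s)
        report.evicted.append(s.id)

    # 1. Backstop: analyzed sessions past the max age go regardless of budget.
    for s in [s for s in live if s.analyzed and now - s.mtime > cfg.max_age_s]:
        evict(s)
    # 2. Budget: oldest analyzed first; un-analyzed sessions are never touched.
    for s in [s for s in live if s.analyzed]:
        if used() <= cfg.budget_bytes:
            break
        evict(s)
    # 3. Hard cap: analyze now; if that fails, evict anyway and report data loss.
    for s in [s for s in live if not s.analyzed]:
        if used() <= cfg.hard_cap_bytes:
            break
        if analyze(s):
            report.analyzed_now.append(s.id)
        else:
            report.data_loss.append(s.id)
        evict(s)
    return report
```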
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- docs/PRD-helix-forge.md: Capture→Detect→Curate→Distribute→Measure loop
- docs/DESIGN-session-memory.md: tiered store + budget-based eviction;
verified session-log schemas for Claude/Codex/Grok
- workplans/AGENTIC-WP-0002: Phase 0 (registered with State Hub)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>