agentic-resources

Author	SHA1	Message	Date
tegwick	4f28cd67cf	session-memory: Phase 4 Measure — baseline, effectiveness, trend (WP-0009) Closes the loop. metrics.py: fleet metrics (infra-overhead share, error rate, schema-thrash, token percentiles, success) + persisted baseline trend. effect.py: before/after per-pattern effectiveness with an improved verdict per metric. measure entrypoint with trend + --since effectiveness + JSON. Recorded pre-fix baseline: 27 sessions, overhead median 11.7%, error rate 0.96, schema-thrash 8. 13 new tests; suite 139/139. Capture->Detect->Curate->Distribute->Measure complete. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-07 15:49:22 +02:00
tegwick	035c7a20d3	session-memory: Read-before-Edit reflex + curated pattern (WP-0008) Acts on the #1 friction finding. T01: added a data-cited Read-before-Edit / re-read-on-stale reflex to AGENTS.md (top error: 'File has not been read yet', 12/27 sessions). T02: captured it as a curated SolutionPattern (sp-problem-file_not_read-edit, approved/distribution_ready) with real resolutions + per-flavor hints, so Distribute proposes it across repos/flavors — closing assess->curate->distribute on a real pattern. Suite 126/126. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-07 15:27:22 +02:00
tegwick	59632e94db	session-memory: distribute entrypoint + live verify (WP-0007 T05) python -m session_memory.distribute: reads approved catalog patterns, builds targets from repo->domain map x flavors, renders scoped per-flavor proposals (HITL) + active registry. Live verify against the real catalog: 12 renders across 5 repos, idempotent, provisional skipped. proposals/ gitignored (regenerated); active_patterns.json committed. README documents detect->curate->distribute. Phase 3 finished; suite 126/126. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-07 15:25:20 +02:00
tegwick	00e8958540	session-memory: scoping + proposals + active registry (WP-0007 T04) distribute/proposals.py: Scope-aware targeting (FR-X2, empty axis = any), render distributable (approved+distribution_ready) patterns into a proposals/ tree mirroring target paths — proposed not applied (FR-X3, HITL), idempotent on re-run. ActiveRegistry (FR-X4) records which pattern+version is proposed in which (repo,flavor). 6 new tests; suite 123/123. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-07 15:09:40 +02:00
tegwick	9e28b1b806	session-memory: Claude + Codex + Grok distributors + registry (WP-0007 T02/T03) Thin per-flavor distributors over the shared base: Claude (CLAUDE.md, optional skill-stub mode), Codex (AGENTS.md), Grok (.grok/instructions.md). registry maps flavor->distributor — adding a flavor is one entry + one module. Same agnostic body renders to distinct per-flavor targets (FR-A3). 7 new tests; suite 117/117. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-07 15:06:15 +02:00
tegwick	7646cbc358	session-memory: distributor base + Artifact (WP-0007 T01) distribute/base.py: Artifact dataclass + Distributor protocol + idempotent BEGIN/END snippet markers (upsert_block replaces a pattern's block in place so re-distribution doesn't duplicate) + agnostic markdown body rendering from SolutionPattern fields. BaseDistributor honours per-flavor body/target hints. 8 new tests; suite 110/110. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-07 15:02:47 +02:00
tegwick	9e6f8a6e08	Register WP-0007 (Distribute), WP-0008 (Read-before-Edit), WP-0009 (Measure) Three workplans queued and registered with the State Hub (via REST — MCP write layer is erroring this session): - AGENTIC-WP-0007 Phase 3 Distribute: per-flavor distributor adapters render approved catalog patterns into proposed (HITL) artifacts, scoped by repo/domain. - AGENTIC-WP-0008 Read-before-Edit reflex: act on the #1 friction finding. - AGENTIC-WP-0009 Phase 4 Measure: baseline + before/after effectiveness + trend. Proceeding in that order. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-07 14:58:03 +02:00
tegwick	ea03cbdd47	chore(consistency): sync task status from DB [auto] Updated by fix-consistency on 2026-06-07: - update .custodian-brief.md for agentic-resources	2026-06-07 13:46:45 +02:00
tegwick	1b6081cd88	session-memory: denoise error fingerprints (WP-0006 follow-up) Tighten _is_failed: exclude successful hub JSON responses (top-level no-error payloads) and file-read snapshots (numbered cat -n source lines) that were polluting error_snippets. JSON verdict classifies error vs success payloads directly. Cuts distinct fingerprints 444 -> 269 (~40%) over the real corpus with the top errors unchanged. Assessment caveat updated. 5 new tests; suite 102/102. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-07 13:39:08 +02:00
tegwick	7cce276d32	session-memory: error root-cause assessment + v2 re-ingest (WP-0006 T03) Re-ingested under schema v2 (populates error_snippets) and re-ran detect over 27 real sessions. Added a 'content-level root causes' section to docs/ASSESSMENT-infra-friction.md: top recurring error is Edit/Write-before-Read (12/27 sessions, 8 repos), then stale-read conflicts, a cross-flavor (claude+grok) make fix-consistency failure, and State Hub MCP instability. Documented a fingerprint-noise caveat. WP-0006 finished; suite 98/98. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-07 13:09:29 +02:00
tegwick	e022c0f9d6	session-memory: recurring-error signal + clustering (WP-0006 T02) detect/signals.py sig_recurring_error emits one signal per distinct error fingerprint per session (magnitude = in-session occurrences), so the same error recurring across sessions/repos/flavors clusters into a candidate root-cause problem pattern via the existing clusterer — cross-flavor flagged automatically. 3 new tests; suite 98/98 green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-07 13:01:29 +02:00
tegwick	2bd6aa3b41	chore(consistency): sync task status from DB [auto] Updated by fix-consistency on 2026-06-07: - update .custodian-brief.md for agentic-resources	2026-06-07 12:48:18 +02:00
tegwick	97379e9658	session-memory: error-body mining into digest (WP-0006 T01) build_digest now extracts normalized error fingerprints + samples from failed events (error kind + failing tool_result bodies) into a durable error_snippets list — paths/numbers/uuids/addrs stripped so the same error collapses to one fingerprint with a count; Python traceback header skipped in favour of the real exception line. Durable in Tier 2 (survives Tier 1 eviction). SCHEMA_VERSION -> 2 (re-ingest needed to populate). 7 new tests; suite 95/95 green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-07 12:45:01 +02:00
tegwick	dbd212d2b1	chore(consistency): sync task status from DB [auto] Updated by fix-consistency on 2026-06-07: - update .custodian-brief.md for agentic-resources	2026-06-07 11:59:38 +02:00
tegwick	896fde59f0	Register AGENTIC-WP-0006 (error-body mining) workplan Captures normalized error fingerprints into the durable digest and clusters recurring root-cause errors across sessions — closes the content-level 'why' gap called out in the friction assessment. 3 tasks; we implement this in helix_forge. (State Hub skill handed off to the state-hub worker as STATE-WP-0058.) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-07 11:56:17 +02:00
tegwick	48618293b0	session-memory: friction assessment + hardened catalog (WP-0005 T03) Re-ran ingest->detect with the quality filter + infra signals over real local sessions (72 captured -> 27 real). Purged the false-positive 'abandoned' catalog entry and re-curated; catalog now carries tool_thrash/schema_thrash/infra_overhead patterns. docs/ASSESSMENT-infra-friction.md ranks the friction: ~17.6% of real tool activity is hub/task/schema plumbing (State Hub 10.3%, one session 231 calls; ToolSearch in 81% of sessions). Validates the CLI/MCP-skill hypothesis as top-2; recommends a State Hub skill (front-load schemas + batched writes) + bulk hub ops. Workplan finished; suite 88/88. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-07 11:18:27 +02:00
tegwick	21c714e286	session-memory: infra-overhead + thrash signals (WP-0005 T02) signals.py: tool_bucket helper + three tool_histogram-based extractors that the outcome/marker signals were blind to — sig_infra_overhead (hub+task+schema share of tool calls over threshold), sig_schema_thrash (repeated ToolSearch), and sig_tool_thrash (one tool dominating). Thresholds in build_context. 8 new tests; suite 88/88 green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-07 11:12:09 +02:00
tegwick	70433cda61	session-memory: session-quality filter (WP-0005 T01) detect/quality.py: is_real_coding_session drops health-checks / smoke-tests / interrupted / trivially-short sessions (event floor, repo present, substantive tool activity, non-trivial prompt). Wired into run_detect so signals only form over real sessions — fixes the abandoned false-positive. [detect.quality] knobs; existing detect/curate fixtures made realistic. 8 new tests; suite 80/80. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-07 11:07:22 +02:00
tegwick	56b2f576de	AGENTIC-WP-0001: complete T02 + close bootstrap workplan T02 was the one genuinely-incomplete bootstrap task: AGENTS.md had no dev-workflow section. Added one documenting the pure-stdlib Python 3.11+ toolchain, pytest, and the session_memory ingest/detect/curate entrypoints so future sessions can verify changes. T01 (integration files) and T03 (first real workplan) were already satisfied; reconciled stale ready/todo bookkeeping to finished/done. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-07 10:15:23 +02:00
tegwick	d06791f070	session-memory Phase 2: verify + catalog artifacts (T07) End-to-end verification over real local sessions: ingest 94->93 -> 72 digests; detect 3 candidates (2 cross-flavor); curate --auto-approve cataloged 3 SolutionPatterns (2 cross-flavor approved/distribution_ready, 1 Claude-only), re-run fully idempotent, 3 hub decisions queued (API offline). Commits the 3 catalog artifacts as the source of truth. PRD §12 OQ4/OQ5/OQ6 marked resolved; README + design refreshed. Workplan finished; suite 72/72. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-07 10:08:52 +02:00
tegwick	519e76442a	session-memory Phase 2: curate entrypoint + README (T06) python -m session_memory.curate: refreshes detect candidates, then drives them through review interactively or with --auto-approve (batch, gate-driven) / --json. Emits a catalog diff summary; queues hub decisions when offline. [curate] config gains decision_queue + workstream id. README documents the detect -> curate -> distribute flow and the gate knobs. 2 new tests; suite 72/72. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-07 10:00:56 +02:00
tegwick	4b7a628b6f	session-memory Phase 2: hub decision integration (T05) decisions.py: every final promote/reject becomes a record_decision-shaped payload (rationale + source key + evidence snapshot). DecisionRecorder degrades gracefully under a hub outage — pluggable sink with a durable local-queue fallback and ordered flush/replay (mirrors Phase 1's after-the-fact sync). Wired into review() via an optional recorder. 6 new tests; suite 70/70 green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-07 00:31:22 +02:00
tegwick	ab22d22bfb	session-memory Phase 2: evidence-bar + bloat guard (T04) gating.py: two-tier evidence bar (OQ5) — promote floor (frequency/sessions/cost_impact) plus a stricter distribution-eligibility floor that sets a promoted pattern to approved+distribution_ready vs provisional. Wired into review() so thin approvals land provisional. bloat_warnings flags duplicate and near-duplicate (same signal-type+locus) candidates (OQ6). [curate]/[curate.gate] knobs in config.toml. 6 new tests; suite 64/64 green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-07 00:28:34 +02:00
tegwick	e51fd8154d	session-memory Phase 2: review workflow (T03) UI-free discuss/approve/reject engine driving detect candidates into the catalog via a decide callback. candidate_to_pattern builds a provisional SolutionPattern with per-flavor rendering-hint stubs. ReviewLog makes re-review idempotent: prior rejects remembered, re-surfaced only when the evidence fingerprint changes. 6 new tests; suite 58/58 green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-07 00:25:10 +02:00
tegwick	c6164a82ba	session-memory Phase 2: versioned Pattern Catalog store (T02) Files-first catalog (one JSON per pattern, id = source-key). Single idempotent upsert path: added / unchanged / updated (status-only, no bump) / versioned (content change bumps semver + archives prior to <id>.history.jsonl). Dedup is structural on pattern id. 5 new tests; suite 52/52 green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-07 00:18:01 +02:00
tegwick	5f810a6992	session-memory Phase 2: Solution Pattern schema (T01) Curate package scaffold + flavor-agnostic SolutionPattern artifact with separate per-flavor rendering hints (OQ4): Resolution/Scope/Provenance sub-records, stable source-key id, semver bump helper, deterministic round-trip serialization. 7 new tests; suite 47/47 green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-07 00:16:46 +02:00
tegwick	43d76b5cf8	chore(consistency): sync task status from DB [auto] Updated by fix-consistency on 2026-06-07: - update .custodian-brief.md for agentic-resources	2026-06-07 00:11:12 +02:00
tegwick	055713aa4f	session-memory Phase 1: T08 verify across all three flavors + docs Marks AGENTIC-WP-0003 finished. Full suite 40/40 green; live pipeline over real local sessions (Codex via fixtures) surfaced 3 candidate patterns, 2 cross-flavor (Claude+Grok) — PRD success metric met. README documents the detect entrypoint and Phase 0/1/next status. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-06 23:39:37 +02:00
tegwick	436a96dcd8	session-memory Phase 1: Detect pipeline (T04-T07) - detect/signals.py: pure extractors over digests (retry storm, repeated errors, budget overrun vs corpus p90, abandoned, clean pass, recovery) - detect/cluster.py: deterministic clustering into candidate Patterns with evidence (sessions/repos/flavors/cost impact) + cross-flavor flagging - detect/__main__.py: python -m session_memory.detect, ranked report (cross-flavor first) + --json; persists candidates to Tier 2 patterns table - core/store.py: list_digests + save_patterns - tests for signals, cluster, detect entrypoint Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-06 22:31:13 +02:00
tegwick	06767ef924	session-memory Phase 1: Grok adapter (T02) - adapters/grok.py: reads the per-session dir (summary.json + chat_history.jsonl + events.jsonl + updates.jsonl); conversation from chat_history, lifecycle/turn from events, tool-call names paired in order from updates ACP stream - registered in ingest dispatch; codex+grok sources enabled in config.toml - tests/test_grok_adapter.py (synthetic + real local sessions) - live multi-flavor dry-run discovers 89 sessions across flavors Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-06 22:12:30 +02:00
tegwick	bc11cb9aec	session-memory Phase 1: Codex adapter (T01) + multi-file merge (T03) - adapters/common.py: shared Normalized + helpers (resolve_repo, classify_tool, jsonl iter, etc.); claude.py refactored to use it (Normalized re-exported) - adapters/codex.py: rollout {timestamp,type,payload} parser; session_meta/response_item/event_msg mapping; flat call_id join; token_count cost; registered in ingest dispatch - core/store.py: ingest() now merges multi-file sessions by content fingerprint, appends new events with offset seq (design OQ6); idempotent - tests/test_codex_adapter.py, tests/test_merge.py Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-06 21:55:32 +02:00
tegwick	5aea22f24f	Register AGENTIC-WP-0003 (session-memory Phase 1) with State Hub Codex + Grok adapters, multi-file session merge, and the Detect pipeline (signals -> clustering -> evidence -> candidate report). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-06 21:50:23 +02:00
tegwick	7c6f4358ee	session-memory Phase 0: end-to-end verification + docs (T07) - verified full sweep over 85 real local Claude transcripts: 63 sessions ingested+analyzed, eviction under tiny cap freed 26MB with zero data loss, digest-preservation invariant holds, idempotent re-run - session_memory/README.md: usage, scheduling, retention knobs - design doc: OQ4 resolved (median ~49KB/session), OQ6 (multi-file sessions) - workplan AGENTIC-WP-0002 finished Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-06 21:44:46 +02:00
tegwick	586ed90948	session-memory Phase 0: ingest cursor + sweep entrypoint + config (T06) - session_memory/core/cursor.py: size/mtime change detection sidecar - session_memory/config.toml: store paths, retention caps, per-source globs (claude on, codex/grok off for Phase 1), repo->domain map - session_memory/ingest.py: discover->normalize->store->digest->evict; --dry-run creates/writes nothing; python -m session_memory.ingest - tests/test_ingest.py; live dry-run parsed 84/85 real local sessions Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-06 21:41:59 +02:00
tegwick	451fb8f1f3	session-memory Phase 0: budget-based retention sweep (T05) - session_memory/core/retention.py: RetentionConfig + sweep() with backstop, budget (oldest-analyzed-first, never touches un-analyzed), and hard-cap overflow (analyze-now then reported last-resort data_loss); EvictionReport - tests/test_retention.py covers all four branches Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-06 21:37:40 +02:00
tegwick	abb888f3ef	session-memory Phase 0: session digest + outcome heuristic (T04) - session_memory/core/digest.py: build_digest (cost totals, kind/tool histograms, markers, snippets) + cross-flavor infer_outcome heuristic; analyze() promotes Tier1->Tier2 and sets analyzed_at (-> evictable) - tests/test_digest.py Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-06 20:03:04 +02:00
tegwick	29fc211a14	session-memory Phase 0: Tier1/Tier2 store (T03) - session_memory/core/store.py: SQLite rows + blob-dir bodies, idempotent ingest on (session_uid,seq), Tier1/Tier2 usage accounting, evict_raw that drops raw but preserves the digest; watermark columns authoritative - tests/test_store.py: ingest idempotency, accounting, eviction invariant Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-06 19:10:02 +02:00
tegwick	1c29a94fa9	session-memory Phase 0: normalized schema (T01) + Claude adapter (T02) - session_memory/core/schema.py: Session/SessionEvent/Cost dataclasses, flavor-prefixed uids, watermarks, kind/outcome validation (T01) - session_memory/adapters/claude.py: JSONL -> Normalized bundle, turn DAG via uuid/parentUuid, kind mapping, cost from message.usage (T02) - tests: schema round-trip + adapter (synthetic + real local session) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-06 19:06:10 +02:00
tegwick	ffe191d44e	Add Helix Forge PRD, session-memory design, and Phase 0 workplan - docs/PRD-helix-forge.md: Capture→Detect→Curate→Distribute→Measure loop - docs/DESIGN-session-memory.md: tiered store + budget-based eviction; verified session-log schemas for Claude/Codex/Grok - workplans/AGENTIC-WP-0002: Phase 0 (registered with State Hub) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-06 19:00:30 +02:00
tegwick	0f8382b505	Statehub registration	2026-06-06 00:10:52 +02:00
tegwick	e2e2ed00c7	chore(consistency): sync task status from DB [auto] Updated by fix-consistency on 2026-06-06: - update .custodian-brief.md for agentic-resources	2026-06-06 00:10:03 +02:00
tegwick	bd6bb9a7e2	Seeded with intent	2026-06-05 23:58:21 +02:00
Bernd Worsch	f931cff63e	Update README.md	2026-06-05 21:43:41 +00:00
Coulomb Social	0147026f56	Initial commit	2026-06-05 21:43:17 +00:00

44 Commits
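
Commit 7646cbc358 describes upsert_block as replacing a pattern's BEGIN/END block in place so that re-distribution does not duplicate it. The following is a minimal sketch of that idea, not the repository's actual code: the function signature, the HTML-comment marker format, and the pattern id "sp-demo" are assumptions made for illustration.

```python
import re


def upsert_block(text: str, pattern_id: str, body: str) -> str:
    """Insert or replace the marked block for pattern_id inside text.

    Hypothetical signature and marker format; only the name upsert_block
    and the BEGIN/END idea come from the commit message.
    """
    begin = f"<!-- BEGIN pattern:{pattern_id} -->"
    end = f"<!-- END pattern:{pattern_id} -->"
    block = f"{begin}\n{body.strip()}\n{end}"

    # Non-greedy match from this pattern's BEGIN marker to its END marker.
    span = re.compile(re.escape(begin) + r".*?" + re.escape(end), re.DOTALL)
    if span.search(text):
        # Replace in place; a callable avoids backslash escapes in block
        # being interpreted as regex replacement syntax.
        return span.sub(lambda _match: block, text, count=1)

    # No existing block: append it, separated by one blank line.
    if not text or text.endswith("\n\n"):
        sep = ""
    elif text.endswith("\n"):
        sep = "\n"
    else:
        sep = "\n\n"
    return f"{text}{sep}{block}\n"


if __name__ == "__main__":
    doc = "# AGENTS\n"
    once = upsert_block(doc, "sp-demo", "Read a file before editing it.")
    twice = upsert_block(once, "sp-demo", "Read a file before editing it.")
    print(once == twice)  # a second run leaves the document unchanged
```

Keying the markers on the pattern id is what makes a re-run idempotent: a changed body overwrites the old block rather than appending a second copy, and blocks belonging to other patterns are left untouched.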