feat(P5): IHF Phase 5 complete — agent-assisted distillation
Adds bounded AI support to the IHF governance loop. All AI outputs are attributed (model_ref), reviewable (AgentReviewRecord), and reversible. No autonomous decisions; no silent requirement promotion.

- T01: Schema for agent_proposals, agent_review_records, confidence_annotations (migration 1743379200)
- T02: AgentProposalsController (index/show/accept/reject, idempotent review guard), global nav "Agent" link
- T03: SummarizeClusterAction, Claude API cluster summary on widget show
- T04: DraftRequirementAction, AI requirement draft; acceptance creates RequirementCandidate (human-gated)
- T05: DetectDuplicatesAction, duplicate_flag proposal on candidate show
- T06: DetectPolicySensitivityAction, policy_flag with ConfidenceAnnotations per concern scope
- T07: ProposeImplementationAction, impl_proposal from decision show
- T08: AgentAuditDashboardAction with autoRefresh; KPI row, unreviewed queue, recent proposals, attribution log matrix
- T09: integration tests, SCOPE.md updated, phase5-summary.md, flake.nix adds http-conduit/aeson/string-conversions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
docs/phase5-summary.md
# Phase 5 Summary — Agent-Assisted Distillation and Suggestion
**Workplan:** IHUB-WP-0005

**Completed:** 2026-03-29

**Phase:** 5 of 8 in the IHF specification

---
## What Was Built
Phase 5 introduces bounded AI support into the IHF governance loop. Every AI output is attributed (its `model_ref` is recorded), reviewable (through an `AgentReviewRecord`), and reversible (the proposal can be rejected). The system makes no autonomous final decisions and never promotes a requirement silently.
### T01 — Schema
Three new tables:

- **`agent_proposals`**: stores every AI-generated output (summary, requirement_draft, duplicate_flag, policy_flag, impl_proposal). Status moves `pending → accepted | rejected | superseded`. Every row records `model_ref` and an optional `confidence`.
- **`agent_review_records`**: one per proposal (UNIQUE on `proposal_id`), recording the human decision: `accepted | rejected | modified`. Reviews are idempotent: a second accept/reject returns "Already reviewed".
- **`confidence_annotations`**: per-dimension breakdown of AI confidence (accuracy, relevance, completeness, policy_alignment), linked to its proposal via FK.
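The status lifecycle and confidence annotations can be sketched as plain Haskell types. This is a minimal model under stated assumptions: the names `ProposalStatus`, `canTransition`, `ConfidenceAnnotation`, and `mkAnnotation` are hypothetical, accepted/rejected/superseded are treated as terminal states, and scores are assumed to lie in [0, 1]. The real schema and its generated types may differ.

```haskell
-- Hypothetical model of the agent_proposals status column.
data ProposalStatus = Pending | Accepted | Rejected | Superseded
  deriving (Show, Eq)

-- Assumption: only a pending proposal can change status; accepted,
-- rejected and superseded are treated as terminal in this sketch.
canTransition :: ProposalStatus -> ProposalStatus -> Bool
canTransition Pending next = next /= Pending
canTransition _ _ = False

-- The four confidence dimensions listed for confidence_annotations.
data Dimension = Accuracy | Relevance | Completeness | PolicyAlignment
  deriving (Show, Eq)

data ConfidenceAnnotation = ConfidenceAnnotation
  { annProposalId :: Int       -- FK to agent_proposals.id
  , annDimension  :: Dimension
  , annScore      :: Double
  } deriving (Show, Eq)

-- Reject out-of-range scores; the [0, 1] range is an assumption.
mkAnnotation :: Int -> Dimension -> Double -> Maybe ConfidenceAnnotation
mkAnnotation pid dim score
  | score >= 0 && score <= 1 = Just (ConfidenceAnnotation pid dim score)
  | otherwise = Nothing
```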
### T02 — AgentProposalsController
The controller provides `AgentProposalsAction` (index, filterable by type and status), `ShowAgentProposalAction`, `AcceptProposalAction`, and `RejectProposalAction`. Proposals are immutable audit artifacts, so there are no update or delete actions. A global nav "Agent" link was also added.
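A minimal sketch of the idempotent review guard, assuming an in-memory `Map` keyed by `proposal_id` stands in for `agent_review_records`. The names `Decision`, `Reviews`, and `review` are hypothetical; only the "Already reviewed" message comes from this summary.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical mirror of the review decision column.
data Decision = DecisionAccepted | DecisionRejected | DecisionModified
  deriving (Show, Eq)

-- Stand-in for agent_review_records: the Map key models
-- UNIQUE (proposal_id), so each proposal has at most one review.
type Reviews = Map.Map Int Decision

-- Record a review unless one already exists for this proposal.
-- A second accept/reject leaves the store unchanged and reports
-- "Already reviewed".
review :: Int -> Decision -> Reviews -> Either String Reviews
review proposalId decision reviews
  | Map.member proposalId reviews = Left "Already reviewed"
  | otherwise = Right (Map.insert proposalId decision reviews)
```

In the database, the `UNIQUE (proposal_id)` constraint is what guarantees a single review when two requests race; an application-level check like this one only turns the common case into a friendly message.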
### T03 — Cluster Summarization
`SummarizeClusterAction` handles a POST from the widget show page, triggered by a "Summarize Feedback" button. It fetches the 20 most recent annotations and their threads, calls `claude-sonnet-4-6` with a factual distillation prompt (`max_tokens=300`), and stores the result as a `summary` AgentProposal. API errors produce a user-visible flash message.
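The selection and prompt-assembly step can be sketched as follows, assuming each annotation carries a sortable creation timestamp. `Annotation`, `recentAnnotations`, and `buildSummaryPrompt` are hypothetical names and the instruction wording is illustrative, not the shipped prompt; only the model name, the limit of 20, and `max_tokens=300` come from this summary. The HTTP call itself is omitted.

```haskell
import Data.List (sortOn)
import Data.Ord (Down (..))

-- Hypothetical annotation shape; the real record has more fields.
data Annotation = Annotation
  { annId        :: Int
  , annCreatedAt :: Int       -- e.g. a Unix timestamp
  , annBody      :: String
  , annThread    :: [String]  -- replies in the annotation's thread
  } deriving (Show, Eq)

-- Request parameters stated in this summary.
summaryModel :: String
summaryModel = "claude-sonnet-4-6"

summaryMaxTokens :: Int
summaryMaxTokens = 300

-- Keep the 20 most recent annotations, newest first.
recentAnnotations :: [Annotation] -> [Annotation]
recentAnnotations = take 20 . sortOn (Down . annCreatedAt)

-- Flatten the selected annotations and their threads into one message.
buildSummaryPrompt :: [Annotation] -> String
buildSummaryPrompt anns =
  unlines $
    "Summarize the following user feedback factually. Do not speculate."
      : concatMap render (recentAnnotations anns)
  where
    render a = ("- " ++ annBody a) : map ("  > " ++) (annThread a)
```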
### T04 — AI-Drafted Requirement Candidate
`DraftRequirementAction` (POST from widget show page) is gated on at least 3 annotations. It prompts Claude for a JSON object `{title, description}` and stores the result as a `requirement_draft` AgentProposal. When `AcceptProposalAction` accepts a requirement_draft, it parses the JSON and creates a `RequirementCandidate` with `category="friction"` and `status="open"`. No candidate is created without human acceptance.
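The human gate can be sketched by making the reviewer's decision an explicit input to candidate creation. The types and function names are hypothetical, and the JSON decoding step is omitted: the draft is assumed to be already decoded into a record. The threshold of 3 and the `friction`/`open` values come from this summary.

```haskell
-- Hypothetical decoded form of the {title, description} draft.
data RequirementDraft = RequirementDraft
  { draftTitle       :: String
  , draftDescription :: String
  } deriving (Show, Eq)

-- Hypothetical subset of RequirementCandidate's columns.
data RequirementCandidate = RequirementCandidate
  { rcTitle       :: String
  , rcDescription :: String
  , rcCategory    :: String
  , rcStatus      :: String
  } deriving (Show, Eq)

minAnnotationsForDraft :: Int
minAnnotationsForDraft = 3

-- Drafting is only offered once enough annotations exist.
canDraft :: Int -> Bool
canDraft annotationCount = annotationCount >= minAnnotationsForDraft

data HumanDecision = Accept | Reject
  deriving (Show, Eq)

-- The human decision is a required argument, so no code path can
-- produce a candidate without an explicit Accept.
candidateFromDraft :: HumanDecision -> RequirementDraft -> Maybe RequirementCandidate
candidateFromDraft Reject _ = Nothing
candidateFromDraft Accept d =
  Just RequirementCandidate
    { rcTitle       = draftTitle d
    , rcDescription = draftDescription d
    , rcCategory    = "friction"
    , rcStatus      = "open"
    }
```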
### T05 — Duplicate Candidate Detection
`DetectDuplicatesAction` (POST from candidate show page) sends the target candidate together with all other candidates to Claude and requests JSON of the form `{duplicates: [{id, reason}]}`. The result is stored as a `duplicate_flag` AgentProposal. The flag is informational only; nothing is merged automatically.
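A sketch of the input split and an optional output check, assuming candidates have integer ids and the reply has already been decoded. `splitTarget` mirrors the description above; `validHits` is a hypothetical addition not described in this summary, which discards hits that name the target itself or an id that was never sent.

```haskell
import Data.List (partition)
import qualified Data.Set as Set

-- Hypothetical candidate shape.
data Candidate = Candidate { candId :: Int, candTitle :: String }
  deriving (Show, Eq)

-- One decoded entry of the {duplicates: [{id, reason}]} reply.
data DuplicateHit = DuplicateHit { dupId :: Int, dupReason :: String }
  deriving (Show, Eq)

-- Split the list into the target and the others sent alongside it.
splitTarget :: Int -> [Candidate] -> ([Candidate], [Candidate])
splitTarget targetId = partition ((== targetId) . candId)

-- Assumed sanity filter: keep only hits that refer to a candidate
-- that was actually sent and is not the target itself.
validHits :: Int -> [Candidate] -> [DuplicateHit] -> [DuplicateHit]
validHits targetId others = filter ok
  where
    known = Set.fromList (map candId others)
    ok h = dupId h /= targetId && Set.member (dupId h) known
```

Because the flag is informational, a filter like this only affects what the reviewer sees; nothing is merged either way.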
### T06 — Policy-Sensitive Issue Detection
`DetectPolicySensitivityAction` (POST from candidate show page) sends the candidate plus the widget's `policy_scope` context to Claude and requests JSON of the form `{concerns: [{scope, note}], severity}`. It creates a `policy_flag` AgentProposal whose numeric `confidence` is derived from severity (low = 0.3, medium = 0.6, high = 0.9), plus one `ConfidenceAnnotation` per concern scope. The UI shows an amber badge when concerns were found and a green badge when the candidate is clean.
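The severity mapping and per-scope annotations can be sketched as pure functions. The 0.3/0.6/0.9 mapping and the amber/green rule come from this summary; the type and function names, and the choice to collapse repeated scopes with `nub` so there is exactly one annotation per scope, are assumptions.

```haskell
import Data.List (nub)

-- Hypothetical decoded form of {concerns: [{scope, note}], severity}.
data Severity = Low | Medium | High
  deriving (Show, Eq)

data Concern = Concern { concernScope :: String, concernNote :: String }
  deriving (Show, Eq)

-- Severity-to-confidence mapping stated for T06.
severityConfidence :: Severity -> Double
severityConfidence Low    = 0.3
severityConfidence Medium = 0.6
severityConfidence High   = 0.9

-- One (scope, score) pair per distinct concern scope. Every pair
-- receives the same overall score (see Known Limitations).
annotationsFor :: Severity -> [Concern] -> [(String, Double)]
annotationsFor sev concerns =
  [ (scope, severityConfidence sev) | scope <- nub (map concernScope concerns) ]

-- Amber badge when any concern exists, green when clean.
badgeColour :: [Concern] -> String
badgeColour [] = "green"
badgeColour _  = "amber"
```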
### T07 — Implementation Path Proposal
`ProposeImplementationAction` (POST from the decision show page, via a "Propose Implementation" button) fetches the decision, its requirement, and the existing implementation refs, then prompts Claude for JSON of the form `{proposals: [{work_item_ref, system, rationale}]}`. The result is stored as an `impl_proposal` AgentProposal.
### T08 — Agent Audit Dashboard
`AgentAuditDashboardAction` is wrapped with `autoRefresh do` and renders five panels:

1. **KPI row**: total proposals, pending count, acceptance rate, rejection rate
2. **Proposals by type**: count per type with color badges
3. **Unreviewed queue**: pending proposals, oldest first, with "Review" links
4. **Recent proposals** (last 20): type, source widget, status, confidence, age
5. **Attribution log**: matrix of proposal counts by `model_ref` × proposal type
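The KPI row, unreviewed queue, and attribution log reduce to simple aggregations over proposal rows. This sketch uses a hypothetical `Proposal` record and hypothetical function names; in particular, it assumes the acceptance and rejection rates are computed over all proposals, since this summary does not say whether the denominator is the total or only reviewed proposals.

```haskell
import Data.List (sortOn)
import qualified Data.Map.Strict as Map

data Status = Pending | Accepted | Rejected | Superseded
  deriving (Show, Eq)

-- Hypothetical minimal proposal row for dashboard aggregation.
data Proposal = Proposal
  { pType      :: String  -- e.g. "summary", "policy_flag"
  , pModelRef  :: String  -- e.g. "claude-sonnet-4-6"
  , pStatus    :: Status
  , pCreatedAt :: Int
  } deriving (Show, Eq)

data Kpis = Kpis
  { total          :: Int
  , pending        :: Int
  , acceptanceRate :: Double
  , rejectionRate  :: Double
  } deriving (Show, Eq)

-- Assumption: rates use the total proposal count as denominator.
kpis :: [Proposal] -> Kpis
kpis ps = Kpis n (count Pending) (rate Accepted) (rate Rejected)
  where
    n = length ps
    count s = length (filter ((== s) . pStatus) ps)
    rate s
      | n == 0 = 0
      | otherwise = fromIntegral (count s) / fromIntegral n

-- Unreviewed queue: pending proposals, oldest first.
unreviewedQueue :: [Proposal] -> [Proposal]
unreviewedQueue = sortOn pCreatedAt . filter ((== Pending) . pStatus)

-- Attribution log: proposal count per (model_ref, proposal type).
attributionMatrix :: [Proposal] -> Map.Map (String, String) Int
attributionMatrix ps =
  Map.fromListWith (+) [((pModelRef p, pType p), 1) | p <- ps]
```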
The dashboard is linked from the hub show page via a purple "Agent Audit" button.

---
## Governance Constraints Upheld
- **Attributability**: every `AgentProposal` records `model_ref`, so reviewers see exactly which model produced the output.
- **Human control**: no proposal auto-promotes. A `requirement_draft` becomes a `RequirementCandidate` only after explicit acceptance via `AcceptProposalAction`.
- **Idempotency**: `UNIQUE (proposal_id)` on `agent_review_records` prevents double review; a second accept/reject returns "Already reviewed".
- **Error isolation**: all Claude API calls are wrapped, so a failure produces a flash message and a redirect, never a 500.
- **Confidence is optional**: `AgentProposal.confidence` is nullable. Summaries carry no numeric score; policy flags derive confidence from severity.
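The error-isolation rule can be sketched as a wrapper that converts any exception from the API call into a flash message. `Outcome` and `guardAgentCall` are hypothetical names. Catching `SomeException` is deliberately broad here and also catches asynchronous exceptions; a production version might narrow it to HTTP and decoding errors. It only catches exceptions raised while the action runs, not ones hidden in lazily evaluated results.

```haskell
import Control.Exception (SomeException, displayException, try)

-- Either a value to store as an AgentProposal, or a flash message
-- to show after redirecting.
data Outcome a = Stored a | Flash String
  deriving (Show, Eq)

-- Run the API call and turn any exception into a user-facing
-- message instead of letting it escape as a 500.
guardAgentCall :: String -> IO a -> IO (Outcome a)
guardAgentCall label call = do
  result <- try call
  pure $ case result of
    Right value -> Stored value
    Left err -> Flash (label ++ " failed: " ++ displayException (err :: SomeException))
```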
---
## Known Limitations
- **No streaming**: Claude API calls run synchronously within the controller action, so large annotation sets may cause slow responses. Future work: a background job with status polling.
- **T06 confidence annotations use the same score for every dimension**: each concern dimension receives the overall severity score rather than a per-dimension score from Claude. A refined prompt could request per-dimension scores.
- **Duplicate detection scales linearly**: the full candidate list is sent to Claude, so for large datasets (>100 candidates) the prompt becomes large. Future work: embedding-based pre-filtering.
- **No embeddings storage**: all intelligence is delegated to the Anthropic API; Phase 5 adds no local model serving or vector store.
---
## Phase 6 Readiness
Phase 6 (Cross-Hub Integration) can build on:

- `AgentProposal` as a structured output artifact that can be routed across hubs
- `ConfidenceAnnotation` as a per-dimension quality signal for cross-hub filtering
- The attribution log as an audit trail for multi-hub AI operations
- The intact Phase 1–5 traceability chain: Widget → Annotation → Candidate → Decision → Deployment → OutcomeSignal → AgentProposal