feat(P4): IHF Phase 4 complete — Outcome Observation and Antifragility Loop

Closes the IHF improvement loop. Full antifragility chain now traversable:
Widget → Annotation → Candidate → Requirement → Decision → Deployment → OutcomeSignal

New artifacts:
- DeploymentRecord (immutable, links DecisionRecord to a deployed version)
- OutcomeSignal (append-only; DB trigger prevents UPDATE/DELETE)
- ChangeEvaluation (one-per-deployment; UNIQUE constraint; 1–5 score)

New capabilities:
- DeploymentRecordsController (index, show, new, create)
- RecordOutcomeSignalAction — capture improved/regressed/neutral/inconclusive signals
- Pre/post comparison panel on deployment show (±30-day event/annotation counts)
- Regression detection — improved signal followed by high/critical annotation
- ChangeEvaluation — idempotent score+rationale per deployment
- Recurrence tracking — cycle count per widget, leaderboard
- AntifragilityDashboardAction (autoRefresh, 5 panels) per hub
- Phase 4 integration tests (T01–T08 logic coverage)
- docs/phase4-summary.md; SCOPE.md updated to Phase 4 complete

State Hub: workstream 07e9c860 → completed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-03-29 12:27:30 +00:00


id: IHUB-WP-0004
type: workplan
title: IHF Phase 4 — Outcome Observation and Antifragility Loop
domain: inter_hub
repo: inter-hub
status: done
owner: custodian
topic_slug: inter_hub
created: 2026-03-29
updated: 2026-03-29
state_hub_workstream_id: 07e9c860-e39b-407f-9c0b-d44989498b48

IHF Phase 4 — Outcome Observation and Antifragility Loop

Goal

Close the improvement loop by observing whether implemented changes actually helped. Phase 3 established the governance layer — requirements, decisions, policy constraints, and implementation references. Phase 4 connects those implementation references to deployed versions, captures outcome signals per widget, compares behaviour before and after a change, and detects regressions and recurring friction.

Background

Phase 1 (IHUB-WP-0001) delivered the Minimal Interaction Core. Phase 2 (IHUB-WP-0002) delivered Structured Feedback and Triage. Phase 3 (IHUB-WP-0003) delivered Governance and Decision Linkage. All Phase 3 exit criteria are met.

Phase 4 is the fourth of eight phases in the IHF specification (specs/InteractionHubFrameworkSpecification_v0.1.md, §14 Phase 4). It completes the antifragility loop:

Widget → InteractionEvent / Annotation
       → RequirementCandidate → Requirement
       → DecisionRecord → ImplementationChangeReference
       → DeploymentRecord
       → OutcomeSignal  ←  ChangeEvaluation
       → RegressionDetection / RecurrenceTracking

Technology stack: IHP v1.5 (Haskell, Nix), PostgreSQL, AutoRefresh (antifragility dashboard). Outcome signals and deployment records are append-only / immutable by the same conventions as InteractionEvent and TriageState.

Reference: docs/ihp-overview.md, docs/ihp-data-and-queries.md, docs/ihp-controllers-views-forms.md, docs/ihp-realtime.md.

Phase 4 Exit Criteria (from IHF spec §14 Phase 4)

The platform can determine whether a change improved outcomes
Recurrent friction becomes visible
The system supports evidence-based UI evolution

Data Artifacts Introduced (Phase 4)

DeploymentRecord, OutcomeSignal, ChangeEvaluation

Tasks

T01 — Schema: DeploymentRecord, OutcomeSignal, ChangeEvaluation

id: IHUB-WP-0004-T01
status: done
priority: high
state_hub_task_id: "4d0aa6d5-f291-4053-a487-8c64627f8271"

Add Phase 4 tables to Application/Schema.sql and write migration:

CREATE TABLE deployment_records (
    id UUID DEFAULT uuid_generate_v4() PRIMARY KEY NOT NULL,
    impl_ref_id UUID REFERENCES implementation_change_references(id) ON DELETE SET NULL,
    decision_id UUID NOT NULL REFERENCES decision_records(id) ON DELETE RESTRICT,
    version_ref TEXT NOT NULL,
    deployed_at TIMESTAMP WITH TIME ZONE DEFAULT now() NOT NULL,
    deployed_by UUID REFERENCES users(id),
    notes TEXT,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT now() NOT NULL
);

CREATE INDEX deployment_records_decision_id_idx ON deployment_records (decision_id);
CREATE INDEX deployment_records_deployed_at_idx ON deployment_records (deployed_at DESC);

-- Outcome signals — append-only, no update/delete
CREATE TABLE outcome_signals (
    id UUID DEFAULT uuid_generate_v4() PRIMARY KEY NOT NULL,
    widget_id UUID NOT NULL REFERENCES widgets(id) ON DELETE CASCADE,
    deployment_id UUID NOT NULL REFERENCES deployment_records(id) ON DELETE CASCADE,
    signal_type TEXT NOT NULL,
    value NUMERIC,
    observed_at TIMESTAMP WITH TIME ZONE DEFAULT now() NOT NULL
);

CREATE INDEX outcome_signals_widget_id_idx ON outcome_signals (widget_id);
CREATE INDEX outcome_signals_deployment_id_idx ON outcome_signals (deployment_id);
CREATE INDEX outcome_signals_observed_at_idx ON outcome_signals (observed_at DESC);

-- Enforce append-only on outcome_signals
CREATE OR REPLACE FUNCTION prevent_outcome_signal_mutation()
RETURNS TRIGGER AS $$
BEGIN
    RAISE EXCEPTION 'outcome_signals is append-only: UPDATE and DELETE are not permitted';
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER outcome_signals_no_update
    BEFORE UPDATE ON outcome_signals
    FOR EACH ROW EXECUTE FUNCTION prevent_outcome_signal_mutation();

CREATE TRIGGER outcome_signals_no_delete
    BEFORE DELETE ON outcome_signals
    FOR EACH ROW EXECUTE FUNCTION prevent_outcome_signal_mutation();

-- Change evaluations — one per deployment
CREATE TABLE change_evaluations (
    id UUID DEFAULT uuid_generate_v4() PRIMARY KEY NOT NULL,
    deployment_id UUID NOT NULL REFERENCES deployment_records(id) ON DELETE CASCADE,
    decision_id UUID REFERENCES decision_records(id) ON DELETE SET NULL,
    score SMALLINT NOT NULL CHECK (score BETWEEN 1 AND 5),
    rationale TEXT NOT NULL,
    evaluated_by UUID REFERENCES users(id),
    evaluated_at TIMESTAMP WITH TIME ZONE DEFAULT now() NOT NULL,
    UNIQUE (deployment_id)
);

CREATE INDEX change_evaluations_deployment_id_idx ON change_evaluations (deployment_id);

Valid outcome_signals.signal_type values: improved, regressed, neutral, inconclusive
deployment_records is immutable by convention: there is no update or delete path, and the controller (not the database) enforces this
change_evaluations has a UNIQUE constraint on deployment_id — one evaluation per deployment
Verify Haskell types are generated correctly

Exit criteria: migrate runs cleanly; all Phase 4 types available in GHCi.
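
The row-level triggers above fire only on UPDATE and DELETE, so a TRUNCATE would still empty outcome_signals, and nothing in the schema restricts signal_type to the four listed values. A minimal hardening sketch, assuming PostgreSQL 11 or later and that the T01 schema is already applied. The names outcome_signals_no_truncate and outcome_signals_signal_type_check are hypothetical and not part of the workplan:

```sql
-- Assumption: optional hardening, not part of the Phase 4 migration as written.

-- TRUNCATE does not fire row-level UPDATE/DELETE triggers, so block it with a
-- statement-level trigger that reuses the existing rejection function.
CREATE TRIGGER outcome_signals_no_truncate
    BEFORE TRUNCATE ON outcome_signals
    FOR EACH STATEMENT EXECUTE FUNCTION prevent_outcome_signal_mutation();

-- Restrict signal_type to the four documented values.
ALTER TABLE outcome_signals
    ADD CONSTRAINT outcome_signals_signal_type_check
    CHECK (signal_type IN ('improved', 'regressed', 'neutral', 'inconclusive'));
```

Note that the ON DELETE CASCADE foreign keys on outcome_signals run as ordinary row deletes, so the delete trigger will also reject removing a widget or deployment record that already has signals.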

T02 — DeploymentRecord controller and views

id: IHUB-WP-0004-T02
status: done
priority: high
state_hub_task_id: "4932b036-fe91-4146-9b35-7d3031894c2d"

Scaffold DeploymentRecordsController
Actions: index, show, new, create (no update/delete — immutable)
Fields: decisionId (select — required), implRefId (optional select from linked impl refs), versionRef (free text, e.g. v1.2.3, git:abc1234), deployedAt, notes
Index: table with decision title, version_ref, deployed_at, outcome signal count, evaluation score (if present)
Show: full detail + linked decision chain (→ requirement → candidate → widget) + list of outcome signals + change evaluation panel
Add "New Deployment" button on decision show page (only for decisions with at least one impl ref)

Exit criteria: Deployment records can be created and viewed; linked back to the full decision → requirement → widget chain.

T03 — OutcomeSignal capture

id: IHUB-WP-0004-T03
status: done
priority: high
state_hub_task_id: "8b39bbb3-4129-4acc-97ac-38ecfcfd7c88"

RecordOutcomeSignalAction { deploymentId } (POST from deployment show page or widget show page)
Fields: signalType (select: improved/regressed/neutral/inconclusive), value (optional 0–100), observedAt (default now)
Append-only — no edit/delete in UI (DB trigger enforces)
List signals on deployment show page ordered by observedAt DESC
List signals on widget show page (last 10, across all deployments)
Signal type color roles:
- improved → green
- regressed → red
- neutral → gray
- inconclusive → yellow/amber

Exit criteria: Outcome signals can be recorded from deployment and widget pages; append-only constraint verified; color roles applied consistently.

T04 — Pre/post comparison: interaction behaviour before and after deployment

id: IHUB-WP-0004-T04
status: done
priority: high
state_hub_task_id: "27c4de52-755a-40e7-bef3-986fe4470f7c"

ComparisonAction { deploymentId } — rendered on deployment show page as a panel
Time windows: 30 days before deployed_at vs 30 days after (or until present)
Metrics computed via SQL aggregates (no in-memory processing for large datasets):
- Interaction event count by type
- Total annotation count
- Annotation severity distribution (low/medium/high/critical counts)
- High/critical annotation rate (high+critical / total)
Render side-by-side comparison table: Before | After | Delta
Delta column: green if annotation rate decreased, red if increased, gray if flat

Exit criteria: Comparison panel renders correct before/after counts; delta direction color-coded correctly; works with no post-deployment data (shows "—").
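
The annotation metrics above can be sketched as a single aggregate query. This is an illustration under stated assumptions, not the shipped ComparisonAction query: the annotations table is assumed to expose severity and created_at, widget scoping is left out, and sample_annotations is an inline fixture standing in for the real table so the sketch runs on its own. The view name annotation_comparison_demo is hypothetical.

```sql
-- Assumption: sample_annotations stands in for the real annotations table;
-- the column names severity and created_at are hypothetical.
CREATE TEMP VIEW annotation_comparison_demo AS
WITH params AS (
    SELECT TIMESTAMPTZ '2026-03-01 00:00:00+00' AS deployed_at
),
sample_annotations (severity, created_at) AS (
    VALUES
        ('high',     TIMESTAMPTZ '2026-02-10 09:00:00+00'),
        ('critical', TIMESTAMPTZ '2026-02-20 09:00:00+00'),
        ('low',      TIMESTAMPTZ '2026-02-25 09:00:00+00'),
        ('medium',   TIMESTAMPTZ '2026-02-26 09:00:00+00'),
        ('low',      TIMESTAMPTZ '2026-03-05 09:00:00+00'),
        ('high',     TIMESTAMPTZ '2026-03-10 09:00:00+00'),
        ('low',      TIMESTAMPTZ '2026-03-12 09:00:00+00'),
        ('medium',   TIMESTAMPTZ '2026-03-20 09:00:00+00')
),
windowed AS (
    -- Keep only rows inside the ±30-day window and label each side.
    SELECT
        CASE WHEN a.created_at < p.deployed_at THEN 'before' ELSE 'after' END AS period,
        a.severity
    FROM sample_annotations a
    CROSS JOIN params p
    WHERE a.created_at >= p.deployed_at - INTERVAL '30 days'
      AND a.created_at <  p.deployed_at + INTERVAL '30 days'
)
SELECT
    period,
    count(*) AS total,
    count(*) FILTER (WHERE severity IN ('high', 'critical')) AS high_critical,
    round((count(*) FILTER (WHERE severity IN ('high', 'critical')))::numeric / count(*), 2)
        AS high_critical_rate
FROM windowed
GROUP BY period
ORDER BY period DESC;  -- 'before' sorts ahead of 'after'

SELECT * FROM annotation_comparison_demo;
```

With this fixture the before row is 4 total, 2 high/critical, rate 0.50, and the after row is 4 total, 1 high/critical, rate 0.25, so the delta would render green. A period with no annotations produces no row at all, which is the case the panel shows as "—".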

T05 — Regression detection

id: IHUB-WP-0004-T05
status: done
priority: high
state_hub_task_id: "844a828b-b7d7-4822-becc-b377c08c673a"

A regression is defined as: a widget that has an OutcomeSignal(improved) for a deployment, followed by a new Annotation with severity high or critical that is created more than 1 day after the signal's observed_at (the 1-day window is a grace period)
RegressionQuery (pure SQL query, no controller action — used by dashboard and widget show page):
```
-- widgets with improved signal then subsequent high/critical annotation
```
Surface on widget show page: regression warning badge if widget is in regression
Surface on governance dashboard: regression count in KPI row, list of regressed widgets
Surface on antifragility dashboard (T08): prominent regression alert panel

Exit criteria: Regression query returns correct results; badge visible on affected widgets; count accurate on dashboard.
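
One way the regression rule defined in this task could be expressed, as a sketch rather than the actual RegressionQuery. Assumptions: annotations carry widget_id, severity, and created_at (hypothetical column names), the inline fixtures stand in for the real tables, and text labels replace the UUID widget ids for readability. The view name regression_demo is hypothetical.

```sql
-- Assumption: both fixtures are stand-ins; real widget ids are UUIDs.
CREATE TEMP VIEW regression_demo AS
WITH sample_signals (widget_id, signal_type, observed_at) AS (
    VALUES
        ('widget-a', 'improved', TIMESTAMPTZ '2026-03-01 12:00:00+00'),
        ('widget-b', 'improved', TIMESTAMPTZ '2026-03-01 12:00:00+00'),
        ('widget-c', 'neutral',  TIMESTAMPTZ '2026-03-01 12:00:00+00')
),
sample_annotations (widget_id, severity, created_at) AS (
    VALUES
        ('widget-a', 'critical', TIMESTAMPTZ '2026-03-05 08:00:00+00'),  -- past the grace period
        ('widget-b', 'high',     TIMESTAMPTZ '2026-03-02 06:00:00+00'),  -- inside the 1-day grace period
        ('widget-c', 'high',     TIMESTAMPTZ '2026-03-05 08:00:00+00')   -- no improved signal
)
SELECT DISTINCT s.widget_id
FROM sample_signals s
JOIN sample_annotations a
  ON a.widget_id = s.widget_id
 AND a.severity IN ('high', 'critical')
 AND a.created_at > s.observed_at + INTERVAL '1 day'
WHERE s.signal_type = 'improved';

SELECT * FROM regression_demo;
```

Only widget-a is returned: widget-b's annotation falls within the grace period and widget-c never had an improved signal.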

T06 — ChangeEvaluation: score changes by observed effect

id: IHUB-WP-0004-T06
status: done
priority: medium
state_hub_task_id: "391c6136-baea-417a-9291-5ba9f633e03f"

EvaluateChangeAction { deploymentId } (POST from deployment show page)
Idempotent: if change_evaluations.deployment_id already exists, redirect to deployment show page with "Already evaluated" message
Fields: score (1–5, required), rationale (textarea, required)
Show evaluation on deployment show page: score as ★ stars + rationale
Show evaluation summary on decision show page: "Deployment evaluated: ★★★★☆"
Score 1–2 → red, 3 → yellow, 4–5 → green (in all views)

Exit criteria: One evaluation per deployment; idempotent; score displayed with correct color in all views.
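
The idempotency requirement can lean on the UNIQUE (deployment_id) constraint from T01. A minimal sketch, assuming the T01 schema is applied. The prepared-statement name evaluate_change is hypothetical, and the real EvaluateChangeAction may check for an existing row in Haskell instead:

```sql
-- Assumption: illustrative only; not the controller's actual query.
-- Insert the evaluation unless one already exists for this deployment.
PREPARE evaluate_change (uuid, smallint, text) AS
INSERT INTO change_evaluations (deployment_id, score, rationale)
VALUES ($1, $2, $3)
ON CONFLICT (deployment_id) DO NOTHING
RETURNING id;

-- Usage, with a real deployment id substituted:
-- EXECUTE evaluate_change('<deployment uuid>', 4::smallint, 'Rationale text');
```

A returned id means the evaluation was created. An empty result means the deployment was already evaluated, which is the case that redirects with the "Already evaluated" message.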

T07 — Recurrence tracking: detect repeated unresolved friction

id: IHUB-WP-0004-T07
status: done
priority: medium
state_hub_task_id: "6a5eea23-4e73-441a-ab93-49f5b76bf3ea"

A recurrence is: a widget that has had 2 or more RequirementCandidates created in separate decision cycles (a new cycle begins after a prior candidate for the same widget was accepted and a DeploymentRecord exists for that decision)
RecurrenceQuery (SQL): per widget, count completed cycles and flag widgets with cycle_count ≥ 2
Widget show page: recurrence count badge ("⟳ 3 cycles") if cycle_count ≥ 2
Antifragility dashboard: recurrence leaderboard — top 10 widgets by cycle count, sortable
Recurrence is informational only — no automated blocking

Exit criteria: Recurrence count accurate per widget; leaderboard renders; badge visible on widget show page.
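
A simplified sketch of the cycle count and leaderboard, not the actual RecurrenceQuery. Assumptions: each fixture row is one RequirementCandidate already joined to its decision, candidate_status and decision_deployed are hypothetical column names, and a completed cycle is approximated as an accepted candidate whose decision has a DeploymentRecord. The follow-up "new annotation" condition from the Notes section is left out. The view name recurrence_demo is hypothetical.

```sql
-- Assumption: sample_candidates flattens the candidate -> decision -> deployment
-- chain into one row per candidate; the real joins are not shown.
CREATE TEMP VIEW recurrence_demo AS
WITH sample_candidates (widget_id, candidate_status, decision_deployed) AS (
    VALUES
        ('widget-a', 'accepted', true),
        ('widget-a', 'accepted', true),
        ('widget-a', 'accepted', true),
        ('widget-b', 'accepted', true),
        ('widget-b', 'accepted', false),  -- partial cycle: no deployment yet
        ('widget-c', 'rejected', false)
)
SELECT widget_id, count(*) AS cycle_count
FROM sample_candidates
WHERE candidate_status = 'accepted'
  AND decision_deployed
GROUP BY widget_id
HAVING count(*) >= 2          -- recurrence threshold
ORDER BY cycle_count DESC, widget_id
LIMIT 10;                     -- leaderboard size

SELECT * FROM recurrence_demo;
```

With this fixture only widget-a appears, with cycle_count 3. widget-b has a single completed cycle because its second candidate has no deployment yet.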

T08 — Antifragility dashboard (AutoRefresh)

id: IHUB-WP-0004-T08
status: done
priority: high
state_hub_task_id: "e5c65c77-c757-49a6-a8e9-99c8d3503f59"

Add AntifragilityDashboardAction { hubId } to HubsController wrapped with autoRefresh do
Dashboard panels:
- KPI row: total deployments / avg evaluation score / % improved signals / regression count
- Open gaps: decisions with impl refs but no deployment record yet
- Recent deployments (last 20): version_ref, decision title, signal summary, evaluation score
- Regression alerts: widgets currently in regression state
- Recurrence leaderboard: top 10 widgets by cycle count
Link from hub Show page alongside Triage Dashboard and Governance Dashboard

Exit criteria: Dashboard live-updates on deployment/signal/evaluation changes. All five panels render with correct data.
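
Three of the four KPI figures can be read straight from the T01 tables. A sketch under stated assumptions: the T01 schema is applied, the query is not scoped to a hub (the linking column is not described here), and the regression count is omitted because it depends on the annotations table. The view name antifragility_kpis_demo is hypothetical.

```sql
-- Assumption: global figures, not per-hub; illustrative only.
CREATE TEMP VIEW antifragility_kpis_demo AS
SELECT
    (SELECT count(*) FROM deployment_records) AS total_deployments,
    (SELECT round(avg(score), 2) FROM change_evaluations) AS avg_evaluation_score,
    -- NULLIF avoids division by zero when no signals exist yet.
    (SELECT round(100.0 * count(*) FILTER (WHERE signal_type = 'improved')
                  / NULLIF(count(*), 0), 1)
       FROM outcome_signals) AS pct_improved_signals;

SELECT * FROM antifragility_kpis_demo;
```

On an empty database this yields 0 deployments and NULL for both the average score and the improved percentage, so the panel needs a placeholder for the no-data case.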

T09 — Phase 4 gate: tests, consistency, docs

id: IHUB-WP-0004-T09
status: done
priority: high
state_hub_task_id: "1dda0a32-4913-4007-a9f4-1d86761a8cf1"

Integration tests (Test/):
- DeploymentRecord create + link to decision
- OutcomeSignal append-only (DB trigger fires on update/delete)
- Pre/post comparison: correct counts with known fixture data
- Regression detection: widget with improved signal + subsequent high annotation
- ChangeEvaluation create + idempotent (second create → duplicate rejection)
- Recurrence count: widget with 2 completed cycles
- Antifragility dashboard data fetch: compiles and returns correct counts
Consistency sync via State Hub MCP: check_repo_consistency(repo_slug="inter-hub", fix=True)
Documentation updates:
- Update SCOPE.md current state section: Phase 4 complete
- Write docs/phase4-summary.md: what was built, known limitations, Phase 5 readiness
Smoke test checklist:
- Create deployment record linked to a decision
- Record outcome signals (improved, then regressed)
- Observe pre/post comparison panel
- Evaluate the change (score 4)
- Confirm regression badge appears on widget show page
- Confirm antifragility dashboard shows all panels

Exit criteria: All tests pass; consistency sync reports no errors; smoke test completed; SCOPE.md updated.

Phase 4 Dependencies

Phase 3 schema stable (T01 depends on decision_records, implementation_change_references, widgets from Phase 3)
deployment_records before outcome_signals and change_evaluations (FK)
Schema (T01) before all controller work (T02–T08)
DeploymentRecord (T02) before OutcomeSignal (T03), comparison (T04), regression (T05), ChangeEvaluation (T06)
All feature tasks (T01–T08) before gate (T09)

Notes

DeploymentRecord is immutable. No update/delete — same convention as DecisionRecord. A mis-recorded deployment should be noted in a new DeploymentRecord with a correcting note.
OutcomeSignal is append-only. DB trigger enforces — same pattern as InteractionEvent. Observations are evidence; they cannot be revised.
ChangeEvaluation is one-per-deployment (UNIQUE constraint). A wrong evaluation cannot be changed. To evaluate again, record the re-deployment as a new DeploymentRecord and evaluate that record.
Regression detection is a heuristic, not a hard constraint. It is a signal for operator attention, not an automated gate.
Recurrence tracking uses completed cycles only. A cycle is only counted when: prior candidate accepted → deployment record exists for that decision → new annotation created. Partial cycles (no deployment yet) do not count.
No ML in Phase 4. All scoring, comparison, and detection is rule-based SQL. Agent-assisted distillation begins in Phase 5.
