Implement post-triage operational hardening
@@ -4,11 +4,11 @@ type: workplan
 title: "Post-triage operational hardening"
 domain: custodian
 repo: activity-core
-status: ready
+status: active
 owner: codex
 topic_slug: custodian
 created: "2026-06-03"
-updated: "2026-06-03"
+updated: "2026-06-04"
 state_hub_workstream_id: "5646e13a-13af-4724-bca6-3c0d86f96733"
 ---
 
@@ -31,7 +31,7 @@ task lifecycle database, a project planner, or an execution worker.
 
 ```task
 id: ACTIVITY-WP-0006-T01
-status: todo
+status: done
 priority: high
 state_hub_task_id: "5d79e3da-d26d-4cad-9cdf-5e5264bb7019"
 ```
@@ -50,11 +50,17 @@ Scope:
 Done when the full test suite passes and activity-core no longer depends on
 legacy task-status aliases for State Hub API clients or tests.
 
+2026-06-04: Completed. `AGENTS.md` now uses State Hub task statuses
+`wait`, `todo`, `progress`, `done`, and `cancel`; workplan/workstream lifecycle
+`blocked` remains separate. The State Hub daily triage digest now counts
+`wait/todo/progress` open tasks and no longer fixtures task-level
+`in_progress` or `blocked`. Full suite passed: 128 passed, 1 skipped.
+
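The status canon in the T01 note above can be illustrated with a minimal sketch. The function name `count_open_tasks` and the task dictionary shape are assumptions made for illustration; they are not taken from the activity-core code. Only the status names and the open set `wait/todo/progress` come from the note.

```python
# Task statuses named in the T01 completion note.
TASK_STATUSES = ("wait", "todo", "progress", "done", "cancel")
# Statuses the daily triage digest counts as open, per the same note.
OPEN_STATUSES = ("wait", "todo", "progress")


def count_open_tasks(tasks):
    """Return per-status counts of open tasks (hypothetical helper).

    Raises ValueError for any status outside the canon, which is where
    legacy aliases such as `in_progress` or task-level `blocked` would land.
    """
    counts = {status: 0 for status in OPEN_STATUSES}
    for task in tasks:
        status = task["status"]
        if status not in TASK_STATUSES:
            raise ValueError(f"non-canonical task status: {status!r}")
        if status in OPEN_STATUSES:
            counts[status] += 1
    return counts


# Illustrative data only; the ids are made up.
sample = [
    {"id": "demo-1", "status": "done"},
    {"id": "demo-2", "status": "wait"},
    {"id": "demo-3", "status": "todo"},
    {"id": "demo-4", "status": "todo"},
]
print(count_open_tasks(sample))
```

Rejecting unknown statuses instead of silently skipping them is one way to keep legacy aliases from creeping back into fixtures; the real digest may handle them differently.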
 ## Daily Triage Observability Runbook
 
 ```task
 id: ACTIVITY-WP-0006-T02
-status: todo
+status: done
 priority: high
 state_hub_task_id: "02c34443-0e8d-4f1a-93d9-6c39f07faad7"
 ```
@@ -73,11 +79,16 @@ The operator should be able to check:
 Done when `docs/runbook.md` has a concise daily-triage verification section
 and any helper command/script is covered by tests or a dry-run path.
 
+2026-06-04: Completed. Added `scripts/verify_daily_triage.py` with dry-run and
+live modes, plus `tests/test_daily_triage_verifier.py`. `docs/runbook.md` now
+covers Temporal schedule/workflow checks, `activity_runs`, State Hub progress,
+working-memory notes, missed-run `skip` behavior, and LLM timeout budget.
+
 ## Three-Run Calibration Feedback
 
 ```task
 id: ACTIVITY-WP-0006-T03
-status: todo
+status: wait
 priority: medium
 state_hub_task_id: "7cbf0a35-71a1-47ac-afc2-f51ad2180fd0"
 ```
@@ -96,11 +107,16 @@ Done when the calibration result is recorded in State Hub and the related
 `CUST-WP-0044` / `CUST-WP-0045` tasks can close based on activity-core runs,
 not Codex app fallback runs.
 
+2026-06-04: Waiting on real evidence. The repo now has a verification path for
+scheduled daily triage runs, but this task still requires three consecutive
+actual activity-core scheduled runs and State Hub calibration feedback. Local
+tests cannot substitute for that operational evidence.
+
 ## Rule Action Contract Documentation
 
 ```task
 id: ACTIVITY-WP-0006-T04
-status: todo
+status: done
 priority: medium
 state_hub_task_id: "c9066d2e-0429-4e14-a68a-8418061ffd8d"
 ```
@@ -116,11 +132,16 @@ Also decide and document the naming/semantics mismatch around
 Done when ADR-003 or a focused follow-up doc contains examples, unsafe cases,
 and the weekly SBOM staleness definition is cited as the canonical pattern.
 
+2026-06-04: Completed. Updated ADR-003 with whole-field path rendering,
+scalar placeholder rendering, unsafe action cases, explicit `for_each` /
+`bind_as` expansion, the `task_template` naming mismatch, and weekly SBOM
+staleness as the canonical per-item pattern.
+
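The per-item pattern named in the T04 note (`for_each` / `bind_as` expansion) might look roughly like the sketch below. Everything beyond the three key names is an assumption: the action layout, the `{repo[name]}` placeholder syntax, the `stale_repos` context key, and the use of Python's `str.format` as a stand-in renderer are hypothetical. ADR-003 is the authority on the real contract and on which cases are unsafe.

```python
def expand_action(action, context):
    """Expand one rule action into one rendered task per matched item.

    Hypothetical shape: `for_each` names a list in the rule context,
    `bind_as` names the variable each item is bound to, and string
    fields of `task_template` are rendered with that binding.
    """
    items = context[action["for_each"]]
    name = action["bind_as"]
    tasks = []
    for item in items:
        rendered = {
            # Only strings are rendered; other values pass through unchanged.
            field: value.format(**{name: item}) if isinstance(value, str) else value
            for field, value in action["task_template"].items()
        }
        tasks.append(rendered)
    return tasks


# Illustrative action and context, loosely modeled on the weekly SBOM case.
action = {
    "for_each": "stale_repos",
    "bind_as": "repo",
    "task_template": {
        "title": "Refresh SBOM for {repo[name]}",
        "priority": "medium",
    },
}
context = {"stale_repos": [{"name": "activity-core"}, {"name": "issue-core"}]}

for task in expand_action(action, context):
    print(task["title"])
```

`str.format` is used here only to keep the sketch short. It allows attribute and index access inside placeholders, so it would not be a safe choice for templates from untrusted sources.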
 ## Production Alerting And Failure Modes
 
 ```task
 id: ACTIVITY-WP-0006-T05
-status: todo
+status: done
 priority: medium
 state_hub_task_id: "420ea629-0c20-4d09-9cc1-6b2f32665161"
 ```
@@ -139,11 +160,17 @@ Cover:
 Done when the runbook and metrics/health surface make ordinary failures visible
 without inspecting a Codex Desktop session.
 
+2026-06-04: Completed. `docs/runbook.md` now documents Kubernetes worker/API/
+router health checks, Temporal schedule paused/missing checks, report sink
+failure behavior, LLM timeout/retry behavior, and page/note/next-session
+classification. Task emission sink failures now raise from `emit_tasks`, making
+them visible to Temporal retries instead of warning-only logs.
+
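The change to `emit_tasks` described in the T05 note can be contrasted with the earlier warning-only behavior in a small sketch. Only the name `emit_tasks` comes from the note; `SinkError`, the function signatures, and `failing_sink` are hypothetical, and the real activity-core code may be structured differently.

```python
import logging

logger = logging.getLogger("activity_core.sketch")


class SinkError(RuntimeError):
    """Hypothetical error raised when a task sink rejects an emission."""


def emit_tasks_warning_only(tasks, sink):
    """Earlier behavior as described: log the failure and carry on."""
    emitted = []
    for task in tasks:
        try:
            emitted.append(sink(task))
        except SinkError as exc:
            # The caller never learns that this task was dropped.
            logger.warning("sink failed for %s: %s", task["id"], exc)
    return emitted


def emit_tasks(tasks, sink):
    """Described new behavior: sink failures propagate to the caller."""
    return [sink(task) for task in tasks]


def failing_sink(task):
    """Stand-in sink that always fails, for demonstration only."""
    raise SinkError(f"cannot deliver {task['id']}")


tasks = [{"id": "demo-1"}]
print(emit_tasks_warning_only(tasks, failing_sink))
try:
    emit_tasks(tasks, failing_sink)
except SinkError as exc:
    print(f"raised: {exc}")
```

When a function like this runs as a Temporal activity, an uncaught exception fails the activity attempt, and Temporal retries it according to the configured retry policy. That is what makes the failure visible outside the logs.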
 ## Issue-Core Emission Boundary Verification
 
 ```task
 id: ACTIVITY-WP-0006-T06
-status: todo
+status: done
 priority: medium
 state_hub_task_id: "78089aef-aba1-42d7-a203-ef80ba6791d9"
 ```
@@ -163,6 +190,13 @@ Done when there is a tested or dry-run-verified path from a rule match to a
 downstream task reference, and activity-core still owns only the spawn audit
 trail, not task lifecycle state.
 
+2026-06-04: Completed. Added `docs/issue-core-emission-boundary.md` documenting
+REST `/issues/` as the current authoritative endpoint, NATS as future work,
+Railiance `ISSUE_SINK_TYPE=null` dry-run mode, and the fields sent to
+issue-core versus retained in `task_spawn_log`. Added REST payload and sink
+failure tests in `tests/test_issue_sink.py`; the existing weekly SBOM integration
+test remains the dry-run rule-match-to-task-reference proof.
+
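The `ISSUE_SINK_TYPE=null` dry-run mode mentioned in the T06 note could be wired up along the lines of the sketch below. The environment variable name and the `/issues/` path come from the note. The class names, the `rest` value and its use as the default, the `ISSUE_CORE_URL` variable, and the placeholder reference format are all assumptions made for illustration.

```python
import os


class NullIssueSink:
    """Dry-run sink: records payloads locally and returns a placeholder reference."""

    def __init__(self):
        self.recorded = []

    def emit(self, payload):
        self.recorded.append(payload)
        return f"dry-run-{len(self.recorded)}"


class RestIssueSink:
    """Placeholder for a sink that would POST to the REST `/issues/` endpoint."""

    def __init__(self, base_url):
        self.endpoint = base_url.rstrip("/") + "/issues/"

    def emit(self, payload):
        # The HTTP call is deliberately left out of this sketch.
        raise NotImplementedError(f"POST to {self.endpoint} is not implemented here")


def build_issue_sink(env=None):
    """Pick a sink from ISSUE_SINK_TYPE (hypothetical factory)."""
    env = os.environ if env is None else env
    sink_type = env.get("ISSUE_SINK_TYPE", "rest")  # default value is an assumption
    if sink_type == "null":
        return NullIssueSink()
    if sink_type == "rest":
        # ISSUE_CORE_URL is a made-up variable name for this sketch.
        return RestIssueSink(env.get("ISSUE_CORE_URL", "http://localhost:8000"))
    raise ValueError(f"unknown ISSUE_SINK_TYPE: {sink_type!r}")


sink = build_issue_sink({"ISSUE_SINK_TYPE": "null"})
reference = sink.emit({"title": "Refresh SBOM for activity-core"})
print(reference, len(sink.recorded))
```

A null sink that still returns a reference lets a dry run exercise the whole rule-match-to-task-reference path without creating anything in issue-core.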
 ## Completion Criteria
 
 - State Hub task-status canon adaptation is complete.