Add safe action interpolation and for_each binding for rule fan-out, update the weekly SBOM definition, cover the new evaluation path, and reconcile activity-core scope/workplans for the State Hub sync.
5.5 KiB
id, type, title, domain, repo, status, owner, topic_slug, created, updated
| id | type | title | domain | repo | status | owner | topic_slug | created | updated |
|---|---|---|---|---|---|---|---|---|---|
| ACTIVITY-WP-0006 | workplan | Post-triage operational hardening | custodian | activity-core | ready | codex | custodian | 2026-06-03 | 2026-06-03 |
ACTIVITY-WP-0006 — Post-triage operational hardening
Context
activity-core has crossed the main construction threshold: Temporal-backed schedules, context resolution, deterministic rules, LLM instructions, report sinks, and the Railiance production service are implemented. The daily State Hub WSJF triage cutover is now trusted enough that activity-core can be treated as the standing scheduled substrate rather than an experiment.
The next work should keep that substrate dependable and aligned with
INTENT.md: activity-core owns when coordination work runs, what task/report
outputs are produced, and where they are emitted. It must not grow into the
task lifecycle database, a project planner, or an execution worker.
Task Status Canon Adaptation
id: ACTIVITY-WP-0006-T01
status: todo
priority: high
Adapt activity-core to State Hub's task status canon:
wait, todo, progress, done, cancel.
Scope:
- update
AGENTS.mdtask-status examples and progression text - update State Hub context resolver task-status filters and digest counters
- keep workstream/workplan lifecycle status separate;
blockedremains valid for workstreams/workplans where State Hub still uses it - update tests that fixture or assert
in_progress/ task-levelblocked - resolve the State Hub interface-change notice only after the repo is adapted
Done when the full test suite passes and activity-core no longer depends on legacy task-status aliases for State Hub API clients or tests.
Daily Triage Observability Runbook
id: ACTIVITY-WP-0006-T02
status: todo
priority: high
Document and, where cheap, automate how to answer "did today's daily triage run happen?"
The operator should be able to check:
- Temporal schedule state and latest workflow history
activity_runsrow for the daily triage ActivityDefinition- State Hub
daily_triageprogress event - working-memory report note
- expected missed-run behavior (
skip, not catch-up) - the configured LLM and Temporal timeout relationship
Done when docs/runbook.md has a concise daily-triage verification section
and any helper command/script is covered by tests or a dry-run path.
Three-Run Calibration Feedback
id: ACTIVITY-WP-0006-T03
status: todo
priority: medium
Collect three consecutive scheduled activity-core daily triage runs and feed the result back into the Custodian WSJF calibration loop.
Assess:
- whether the top recommendations matched actual useful follow-up work
- report length and density
- loose-end detection sensitivity
- stale-but-intentionally-parked work handling
- whether model settings or prompt/schema constraints need adjustment
Done when the calibration result is recorded in State Hub and the related
CUST-WP-0044 / CUST-WP-0045 tasks can close based on activity-core runs,
not Codex app fallback runs.
Rule Action Contract Documentation
id: ACTIVITY-WP-0006-T04
status: todo
priority: medium
Document the rule action contract introduced by the ADHOC-2026-06-01 work:
whole-field context.* / event.* paths, scalar {context.foo} placeholders,
and explicit for_each / bind_as per-item expansion.
Also decide and document the naming/semantics mismatch around
action.task_template: today it is the emitted task title field, while
tasks/*.md contains template files with their own title templates.
Done when ADR-003 or a focused follow-up doc contains examples, unsafe cases, and the weekly SBOM staleness definition is cited as the canonical pattern.
Production Alerting And Failure Modes
id: ACTIVITY-WP-0006-T05
status: todo
priority: medium
Turn the current confidence in the daily triage schedule into routine operational visibility.
Cover:
- Kubernetes/Temporal worker health expectations
- schedule paused/missing detection
- report sink failure behavior
- LLM timeout and retry behavior
- what should page, what should only leave a progress note, and what should be handled in the next operator session
Done when the runbook and metrics/health surface make ordinary failures visible without inspecting a Codex Desktop session.
Issue-Core Emission Boundary Verification
id: ACTIVITY-WP-0006-T06
status: todo
priority: medium
Verify the downstream task emission boundary now that rule fan-out is real.
Questions to close:
- which issue-core endpoint is authoritative for task creation in the current environment
- whether
IssueCoreRestSinkshould keep using REST or move to the intended NATS subscription path - whether emitted rule tasks carry enough title, description, labels, source id, condition, and target repo data for issue-core and operators
- whether weekly SBOM staleness can be safely enabled against the real sink
Done when there is a tested or dry-run-verified path from a rule match to a downstream task reference, and activity-core still owns only the spawn audit trail, not task lifecycle state.
Completion Criteria
- State Hub task-status canon adaptation is complete.
- Daily triage has an operator-grade verification path and three-run calibration evidence.
- Rule action semantics are documented and no longer surprising.
- Production failure modes are observable enough for routine operation.
- Downstream task emission has been verified without expanding activity-core's ownership boundary.