generated from coulomb/repo-seed
Expand rule actions for per-repo tasks
Add safe action interpolation and for_each binding for rule fan-out, update the weekly SBOM definition, cover the new evaluation path, and reconcile activity-core scope/workplans for the State Hub sync.
167
workplans/ACTIVITY-WP-0006-post-triage-operational-hardening.md
Normal file
@@ -0,0 +1,167 @@
---
id: ACTIVITY-WP-0006
type: workplan
title: "Post-triage operational hardening"
domain: custodian
repo: activity-core
status: ready
owner: codex
topic_slug: custodian
created: "2026-06-03"
updated: "2026-06-03"
---

# ACTIVITY-WP-0006 — Post-triage operational hardening

## Context

activity-core has crossed the main construction threshold: Temporal-backed
schedules, context resolution, deterministic rules, LLM instructions, report
sinks, and the Railiance production service are implemented. The daily State
Hub WSJF triage cutover is now trusted enough that activity-core can be treated
as the standing scheduled substrate rather than an experiment.

The next work should keep that substrate dependable and aligned with
`INTENT.md`: activity-core owns when coordination work runs, what task/report
outputs are produced, and where they are emitted. It must not grow into the
task lifecycle database, a project planner, or an execution worker.

## Task Status Canon Adaptation

```task
id: ACTIVITY-WP-0006-T01
status: todo
priority: high
```

Adapt activity-core to State Hub's task status canon:
`wait`, `todo`, `progress`, `done`, `cancel`.

Scope:
- update `AGENTS.md` task-status examples and progression text
- update State Hub context resolver task-status filters and digest counters
- keep workstream/workplan lifecycle status separate; `blocked` remains valid
  for workstreams/workplans where State Hub still uses it
- update tests that fixture or assert `in_progress` / task-level `blocked`
- resolve the State Hub interface-change notice only after the repo is adapted

Done when the full test suite passes and activity-core no longer depends on
legacy task-status aliases for State Hub API clients or tests.
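
A minimal sketch of what a strict status check and digest counter could look like once the legacy aliases are gone. The function names and the choice to raise `ValueError` are illustrative assumptions, not the existing activity-core API; only the five canonical values and the two legacy values come from this task.

```python
from collections import Counter

# Canonical task statuses named in this task.
CANONICAL_TASK_STATUSES = ("wait", "todo", "progress", "done", "cancel")

# Legacy task-level values the scope list says must disappear.
LEGACY_TASK_STATUSES = frozenset({"in_progress", "blocked"})


def check_task_status(status: str) -> str:
    """Return the status unchanged if canonical, otherwise raise ValueError."""
    if status in CANONICAL_TASK_STATUSES:
        return status
    if status in LEGACY_TASK_STATUSES:
        raise ValueError(f"legacy task status {status!r} is no longer accepted")
    raise ValueError(f"unknown task status {status!r}")


def digest_counts(statuses):
    """Count tasks per canonical status, keeping zero entries for a stable digest."""
    counts = Counter(check_task_status(s) for s in statuses)
    return {name: counts[name] for name in CANONICAL_TASK_STATUSES}
```

Rejecting legacy values outright, rather than silently mapping them, makes any leftover fixture that still uses `in_progress` fail loudly.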

## Daily Triage Observability Runbook

```task
id: ACTIVITY-WP-0006-T02
status: todo
priority: high
```

Document and, where cheap, automate how to answer "did today's daily triage
run happen?"

The operator should be able to check:
- Temporal schedule state and latest workflow history
- `activity_runs` row for the daily triage ActivityDefinition
- State Hub `daily_triage` progress event
- working-memory report note
- expected missed-run behavior (`skip`, not catch-up)
- the configured LLM and Temporal timeout relationship

Done when `docs/runbook.md` has a concise daily-triage verification section
and any helper command/script is covered by tests or a dry-run path.
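
One cheap form the helper could take is a pure function that compares the date of the latest evidence from each source against today. This is a sketch under assumptions: the source labels and the `latest_seen` mapping are hypothetical, and the lookups against Temporal, the database, and State Hub are left out.

```python
from datetime import date

# Hypothetical labels for the four evidence sources in the checklist above.
EVIDENCE_SOURCES = (
    "temporal_workflow",
    "activity_runs_row",
    "state_hub_progress_event",
    "working_memory_note",
)


def missing_evidence(latest_seen, today):
    """Return the evidence sources that have no record dated `today`."""
    return [name for name in EVIDENCE_SOURCES if latest_seen.get(name) != today]


# Example: the workflow ran and wrote its row, but the State Hub event is a
# day old and no working-memory note was found at all.
example_today = date(2026, 6, 3)
example_seen = {
    "temporal_workflow": date(2026, 6, 3),
    "activity_runs_row": date(2026, 6, 3),
    "state_hub_progress_event": date(2026, 6, 2),
}
example_gaps = missing_evidence(example_seen, example_today)
```

Because the function takes plain dates, it can be covered by unit tests without a live Temporal or State Hub connection.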

## Three-Run Calibration Feedback

```task
id: ACTIVITY-WP-0006-T03
status: todo
priority: medium
```

Collect three consecutive scheduled activity-core daily triage runs and feed
the result back into the Custodian WSJF calibration loop.

Assess:
- whether the top recommendations matched actual useful follow-up work
- report length and density
- loose-end detection sensitivity
- stale-but-intentionally-parked work handling
- whether model settings or prompt/schema constraints need adjustment

Done when the calibration result is recorded in State Hub and the related
`CUST-WP-0044` / `CUST-WP-0045` tasks can close based on activity-core runs,
not Codex app fallback runs.

## Rule Action Contract Documentation

```task
id: ACTIVITY-WP-0006-T04
status: todo
priority: medium
```

Document the rule action contract introduced by the ADHOC-2026-06-01 work:
whole-field `context.*` / `event.*` paths, scalar `{context.foo}` placeholders,
and explicit `for_each` / `bind_as` per-item expansion.
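
The two interpolation forms can be illustrated with a small resolver. This is a sketch under assumptions, not the code in `activity_core.rules.actions`: the function names, the regular expressions, and the decision to reject non-scalar placeholder values are all illustrative.

```python
import re

# A whole-field path: the entire field is one dotted context.* or event.* path.
PATH_RE = re.compile(r"(?:context|event)(?:\.[A-Za-z_][A-Za-z0-9_]*)+")
# A scalar placeholder embedded in a longer string, e.g. "{context.foo}".
PLACEHOLDER_RE = re.compile(r"\{((?:context|event)(?:\.[A-Za-z_][A-Za-z0-9_]*)+)\}")


def lookup(path, scope):
    """Walk a dotted path through nested dicts only; never touch attributes."""
    value = scope
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            raise KeyError(path)
        value = value[key]
    return value


def interpolate(field, scope):
    """Resolve one action field against {"context": ..., "event": ...}."""
    if not isinstance(field, str):
        return field
    if PATH_RE.fullmatch(field):
        return lookup(field, scope)  # whole-field form keeps the original type

    def replace(match):
        value = lookup(match.group(1), scope)
        if isinstance(value, (dict, list)):
            raise TypeError(f"{match.group(1)} is not a scalar")
        return str(value)

    return PLACEHOLDER_RE.sub(replace, field)


scope = {"context": {"repo": {"repo_slug": "activity-core", "sbom_age_days": 45}}, "event": {}}
target_repo = interpolate("context.repo.repo_slug", scope)
title = interpolate("Run SBOM rescan for {context.repo.repo_slug}", scope)
```

Restricting lookups to dictionary keys, with no `eval` and no attribute access, is one way to keep interpolation safe against hostile field values.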

Also decide and document the naming/semantics mismatch around
`action.task_template`: today it is the emitted task title field, while
`tasks/*.md` contains template files with their own title templates.

Done when ADR-003 or a focused follow-up doc contains examples and unsafe
cases, and cites the weekly SBOM staleness definition as the canonical pattern.

## Production Alerting And Failure Modes

```task
id: ACTIVITY-WP-0006-T05
status: todo
priority: medium
```

Turn the current confidence in the daily triage schedule into routine
operational visibility.

Cover:
- Kubernetes/Temporal worker health expectations
- schedule paused/missing detection
- report sink failure behavior
- LLM timeout and retry behavior
- what should page, what should only leave a progress note, and what should be
  handled in the next operator session

Done when the runbook and metrics/health surface make ordinary failures visible
without inspecting a Codex Desktop session.

## Issue-Core Emission Boundary Verification

```task
id: ACTIVITY-WP-0006-T06
status: todo
priority: medium
```

Verify the downstream task emission boundary now that rule fan-out is real.

Questions to close:
- which issue-core endpoint is authoritative for task creation in the current
  environment
- whether `IssueCoreRestSink` should keep using REST or move to the intended
  NATS subscription path
- whether emitted rule tasks carry enough title, description, labels,
  source id, condition, and target repo data for issue-core and operators
- whether weekly SBOM staleness can be safely enabled against the real sink

Done when there is a tested or dry-run-verified path from a rule match to a
downstream task reference, and activity-core still owns only the spawn audit
trail, not task lifecycle state.
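
The "carry enough data" question lends itself to a dry-run check that runs before anything reaches the real sink. The sketch below assumes an emitted task is a plain dictionary; the key names are hypothetical stand-ins for the six pieces of data listed above, not the actual TaskSpec schema.

```python
# Hypothetical key names for the data the question above asks about.
REQUIRED_FIELDS = (
    "title",
    "description",
    "labels",
    "source_id",
    "condition",
    "target_repo",
)


def missing_fields(task_spec):
    """Return required keys that are absent or empty in an emitted task dict."""
    return [
        name
        for name in REQUIRED_FIELDS
        if task_spec.get(name) in (None, "", [])
    ]


example_task = {
    "title": "Run SBOM rescan for activity-core",
    "description": "",
    "labels": ["sbom"],
    "source_id": "weekly-sbom-staleness",
    "target_repo": "activity-core",
}
example_missing = missing_fields(example_task)
```

A dry-run path could log the result for every expanded task and refuse to emit when the list is non-empty.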

## Completion Criteria

- State Hub task-status canon adaptation is complete.
- Daily triage has an operator-grade verification path and three-run
  calibration evidence.
- Rule action semantics are documented and no longer surprising.
- Production failure modes are observable enough for routine operation.
- Downstream task emission has been verified without expanding activity-core's
  ownership boundary.
@@ -4,11 +4,11 @@ type: workplan
title: "Ad hoc — activity-core opportunistic fixes 2026-06-01"
domain: custodian
repo: activity-core
status: ready
status: finished
owner: custodian
topic_slug: custodian
created: "2026-06-01"
updated: "2026-06-01"
updated: "2026-06-03"
state_hub_workstream_id: "36162ff0-9b47-47c4-8602-56767f9b7a1c"
---

@@ -112,7 +112,7 @@ worst-repo fields to the top of `context.repos`).

```task
id: ADHOC-2026-06-01-T02
status: todo
status: done
priority: low
state_hub_task_id: "6b3a185e-cbea-454c-82fb-8b4c16cefef0"
```
@@ -168,6 +168,32 @@ expressions and a stale-repo workflow run emits a TaskSpec with the actual
repo slug, or (b) a recorded decision explicitly defers/declines the change
with reasoning.

**Completion — 2026-06-03:**

Implemented explicit rule action expansion in `activity_core.rules.actions`.
`evaluate_rules` now returns concrete TaskSpec dictionaries directly, and
`RunActivityWorkflow` no longer lifts raw YAML action fields itself.

Action fields support two safe interpolation forms:
- whole-field paths such as `target_repo: context.repo.repo_slug`
- scalar placeholders such as `task_template: Run SBOM rescan for {context.repo.repo_slug}`

Rules may opt into per-item binding with:

```yaml
for_each: context.repos.repos
bind_as: repo
condition: 'context.repo.sbom_age_days > 30'
```

`activity-definitions/weekly-sbom-staleness.md` now uses that explicit
contract, so bulk SBOM staleness evaluation emits one task per stale repo
instead of one task for the hoisted worst repo. Tests cover direct action
interpolation, `for_each` binding, activity-level rule evaluation, and the
weekly SBOM integration path.
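
The fan-out described here can be sketched in a few lines. This is an illustration under assumptions, not the `activity_core.rules.actions` implementation: `expand_for_each` is a hypothetical name, the sample repos are made up, and the rule condition is passed in as a Python callable instead of being parsed from the YAML string.

```python
def lookup(path, scope):
    """Walk a dotted path such as "context.repos.repos" through nested dicts."""
    value = scope
    for key in path.split("."):
        value = value[key]
    return value


def expand_for_each(rule, context, condition):
    """Bind each item under `bind_as` and keep the contexts where the condition holds."""
    items = lookup(rule["for_each"], {"context": context})
    bound_contexts = []
    for item in items:
        # Copy the top level so the shared context is never mutated.
        bound = {**context, rule["bind_as"]: item}
        if condition(bound):
            bound_contexts.append(bound)
    return bound_contexts


rule = {"for_each": "context.repos.repos", "bind_as": "repo"}
context = {
    "repos": {
        "repos": [
            {"repo_slug": "alpha", "sbom_age_days": 45},
            {"repo_slug": "beta", "sbom_age_days": 3},
            {"repo_slug": "gamma", "sbom_age_days": 31},
        ]
    }
}


def is_stale(ctx):
    # Stand-in for the rule condition 'context.repo.sbom_age_days > 30'.
    return ctx["repo"]["sbom_age_days"] > 30


tasks = [
    {
        "target_repo": ctx["repo"]["repo_slug"],
        "task_template": f"Run SBOM rescan for {ctx['repo']['repo_slug']}",
    }
    for ctx in expand_for_each(rule, context, is_stale)
]
```

With this sample data the expansion yields one task each for `alpha` and `gamma` and none for `beta`, which is the "one task per stale repo" behavior described above.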

Tests: `PYTHONPATH=src .venv/bin/python -m pytest -q` -> 125 passed, 1 skipped.

### T03 - Make activity-core's Temporal activity timeout env-configurable

```task
@@ -1,11 +1,19 @@
---
type: session-note
created: "2026-03-28"
status: handoff
updated: "2026-06-03"
status: archived
---

# WP-0002 Handoff Note — Continue on CoulombCore
## Archive note — 2026-06-03

This handoff note has been reconciled and archived. Its remaining build order
is superseded by `custodian-WP-0002-triggers-ops.md`, which is marked done, and
by later completed workplans for the event bridge, Railiance operations, and
production service. It is no longer an active source of next steps.

## Context

Implementing custodian-WP-0002 (Triggers & Ops). Work interrupted on workstation