generated from coulomb/repo-seed
The state-hub resolver was calling GET /sbom/status?repo={slug}, which State
Hub does not expose. Real SBOM routes are /sbom/, /sbom/{slug},
/sbom/snapshots/, /sbom/snapshots/{id}, /sbom/ingest/, /sbom/report/licences/.
The weekly-sbom-staleness ActivityDefinition was passing params {repos: all}
and the resolver was reading params.get("repo_slug", ""), so the URL
collapsed to /sbom/status?repo= and 404'd. _fetch_json swallowed the error,
the rule context.repos.sbom_age_days > 30 evaluated against {} and never
matched, and the weekly SBOM check has been a silent no-op for as long as
the route mismatch has existed.
Resolver now supports two modes selected by params:
- single-repo: {repo_slug: foo} → GET /sbom/{foo}, returns
{repo_slug, last_sbom_at, sbom_age_days, has_sbom}
- bulk: {repos: all} → GET /repos/, computes per-repo age, returns the
worst repo's fields hoisted to the top of the result alongside
stale_count, total_count, worst_* fields, and the full per-repo list
Never-scanned repos get a 99999 sentinel age so threshold rules treat
them as very stale without forcing the rule to special-case None.
Hoisting the worst entry to the top preserves the existing rule
expression context.repos.sbom_age_days > 30 (and target_repo:
context.repos.repo_slug, though that field is a separate interpolation
gap tracked as ADHOC-2026-06-01-T02). The integration tests'
aspirational per-repo iteration model is left intact.
Live validation against State Hub on 2026-06-01:
- single: activity-core → 36 days since 2026-04-26 ingest
- bulk: 48 repos total, 46 stale (>30d), worst is info-tech-canon (never
scanned), rule expression evaluates True
Tests: 120 passed, 1 skipped.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
170 lines
7.4 KiB
Markdown
170 lines
7.4 KiB
Markdown
---
|
|
id: ADHOC-2026-06-01
|
|
type: workplan
|
|
title: "Ad hoc — activity-core opportunistic fixes 2026-06-01"
|
|
domain: custodian
|
|
repo: activity-core
|
|
status: ready
|
|
owner: custodian
|
|
topic_slug: custodian
|
|
created: "2026-06-01"
|
|
updated: "2026-06-01"
|
|
state_hub_workstream_id: "36162ff0-9b47-47c4-8602-56767f9b7a1c"
|
|
---
|
|
|
|
# ADHOC-2026-06-01 — activity-core opportunistic fixes
|
|
|
|
Captured during the CUST-WP-0045 T06 cutover prep session. The dev worker was
|
|
brought up and surfaced an unrelated, pre-existing bug in the state-hub
|
|
context resolver that is independent of the daily triage canary.
|
|
|
|
## Tasks
|
|
|
|
### T01 - Fix repo_sbom_status resolver route and params
|
|
|
|
```task
|
|
id: ADHOC-2026-06-01-T01
|
|
status: done
|
|
priority: low
|
|
state_hub_task_id: "87b56da9-e692-4350-9aff-47080414ec06"
|
|
```
|
|
|
|
`src/activity_core/context_resolvers/state_hub.py` resolves
|
|
`query: repo_sbom_status` by calling `GET /sbom/status?repo={repo_slug}`, but
|
|
State Hub does not expose `/sbom/status` at all. Actual SBOM routes are
|
|
`/sbom/`, `/sbom/{repo_slug}`, `/sbom/snapshots/`, `/sbom/snapshots/{id}`,
|
|
`/sbom/ingest/`, `/sbom/report/licences/`.
|
|
|
|
Compounding bug: the only ActivityDefinition using this query is
|
|
`activity-definitions/weekly-sbom-staleness.md`, which passes
|
|
`params: { repos: all }`. The resolver reads `params.get("repo_slug", "")`,
|
|
so the lookup URL collapses to `/sbom/status?repo=` regardless of the
|
|
ActivityDefinition value.
|
|
|
|
Symptom: every Monday at 09:00 Europe/Berlin (and on worker startup after a
|
|
missed Monday tick), the `weekly-sbom-staleness` workflow runs and the
|
|
resolver logs `HTTP/1.1 404 Not Found` for `GET /sbom/status?repo=`. The
|
|
`_fetch_json` helper swallows the error and returns `{}`, so the workflow
|
|
continues but the downstream rule evaluates
|
|
`context.repos.sbom_age_days > 30` against an empty dict and never spawns
|
|
the intended SBOM rescan tasks. The weekly SBOM staleness check has been
|
|
silently no-op for as long as this route mismatch has existed.
|
|
|
|
Fix scope:
|
|
|
|
1. Decide the contract — single-repo lookup (current parameter shape suggests
|
|
this) versus multi-repo bulk lookup (`repos: all` suggests this).
|
|
2. Update the resolver to call the actual State Hub route(s):
|
|
- single repo: `GET /sbom/{repo_slug}` (or `/sbom/{repo_slug}/status` if a
|
|
status-shaped projection is preferred and exists).
|
|
- bulk: iterate the State Hub `/repos/` list and call `/sbom/{repo_slug}`
|
|
per repo, returning a list bound to `context.repos`.
|
|
3. Update `activity-definitions/weekly-sbom-staleness.md` to match: either pass
|
|
a real `repo_slug` per definition (multiple definitions, one per repo) or
|
|
keep `repos: all` and let the resolver fan out.
|
|
4. Update the rule expression to traverse the resulting shape — currently
|
|
`context.repos.sbom_age_days` assumes a single object; if the resolver
|
|
returns a list, the rule needs `any(repo.sbom_age_days > 30 for repo in
|
|
context.repos)` or an equivalent per-repo evaluation.
|
|
5. Add a resolver unit test that asserts the resolver hits a route State Hub
|
|
actually serves, and an integration test against a fixture State Hub
|
|
response so this regression cannot repeat.
|
|
|
|
Out of scope for this adhoc:
|
|
|
|
- Decoupling SBOM staleness rules from the state hub resolver.
|
|
- Rewriting the SBOM ingestion pipeline or `sbom_source` policy.
|
|
- Promoting this to a full workplan unless the multi-repo decision turns out
|
|
to need design discussion.
|
|
|
|
Done when `weekly-sbom-staleness` runs cleanly against a live State Hub on
|
|
Monday and either spawns SBOM rescan tasks for stale repos or leaves a clear
|
|
"all SBOMs fresh" audit row — not a 404 log line and a silent no-op.
|
|
|
|
**Completion — 2026-06-01:**
|
|
|
|
Resolver now supports two modes selected by params:
|
|
- single-repo: `params: {repo_slug: foo}` → `GET /sbom/{foo}`
|
|
- bulk: `params: {repos: all}` → `GET /repos/`, computes per-repo age,
|
|
returns the worst-repo fields hoisted to the top of the result alongside
|
|
`stale_count`, `total_count`, `worst_*` fields, and the full per-repo list
|
|
|
|
Never-scanned repos use a `99999` sentinel age so threshold rules treat them
|
|
as very stale without forcing the rule expression to special-case `None`.
|
|
|
|
`activity-definitions/weekly-sbom-staleness.md` kept its existing rule
|
|
expression `context.repos.sbom_age_days > 30` (the resolver hoists the worst
|
|
repo's age to that path). The definition now documents that the rule fires
|
|
at most once per workflow run, not once per stale repo, and that the
|
|
aspirational per-stale-repo fan-out exercised by the integration tests is
|
|
not delivered by the current workflow.
|
|
|
|
Live validation against the running State Hub on 2026-06-01:
|
|
- single: `activity-core` → 36 days since SBOM ingest at 2026-04-26
|
|
- bulk: 48 repos total, 46 stale (>30d); worst is `info-tech-canon`
|
|
(`last_sbom_at: null` → 99999d sentinel); rule expression evaluates True
|
|
|
|
Tests: `uv run pytest -q` → 120 passed, 1 skipped (previously 116 passed +
|
|
4 broken integration tests; broken-on-my-change reverted by hoisting the
|
|
worst-repo fields to the top of `context.repos`).
|
|
|
|
### T02 - Rule action context interpolation and per-iteration binding
|
|
|
|
```task
|
|
id: ADHOC-2026-06-01-T02
|
|
status: todo
|
|
priority: low
|
|
state_hub_task_id: "6b3a185e-cbea-454c-82fb-8b4c16cefef0"
|
|
```
|
|
|
|
Discovered while completing T01: `RunActivityWorkflow` builds each
|
|
`TaskSpec` by lifting raw YAML fields out of the rule action without ever
|
|
interpolating `context.*` references:
|
|
|
|
```python
|
|
# src/activity_core/workflows.py
|
|
task_spec_dicts.append({
|
|
"title": action.get("task_template", rule.get("id", "")),
|
|
"target_repo": action.get("target_repo"),
|
|
...
|
|
})
|
|
```
|
|
|
|
So `target_repo: context.repos.repo_slug` in an ActivityDefinition rule is
|
|
emitted to the spawn log as the literal string `"context.repos.repo_slug"`,
|
|
not the actual stale repo slug. The aspirational per-stale-repo fan-out
|
|
exercised by `test_pipeline_emits_one_task_for_stale_repo_only` and friends
|
|
in `tests/test_integration_event_bridge.py` is *not* delivered by the
|
|
workflow — those tests simulate a per-repo iteration the real workflow
|
|
does not perform.
|
|
|
|
Two pieces of work, likely related:
|
|
|
|
1. **Action field interpolation.** Define and implement a safe template
|
|
grammar for `action.target_repo`, `action.task_template`,
|
|
`action.priority`, `action.labels`, etc. Reuse the rule-condition AST
|
|
walker (no `exec`, no comprehensions) or a constrained string
|
|
`{context.foo.bar}` substitution. Decide on grammar — instruction
|
|
prompt rendering uses `{...}` placeholders today
|
|
(`rules/executor.py::_render_prompt`); consistent with that is probably
|
|
right.
|
|
|
|
2. **Per-iteration context binding.** Decide whether the workflow should
|
|
evaluate a rule once per element of a list-valued context field (the
|
|
integration-test contract), or whether the spawn-once semantics is
|
|
actually desired and the tests should be relaxed. If iteration is the
|
|
answer, the resolver shape from T01 already gives a clean `repos` list
|
|
to iterate over; the workflow would need an explicit `for_each:`
|
|
directive on the rule, or implicit iteration when `condition` references
|
|
a list element.
|
|
|
|
This is borderline workplan-grade work (design decision + security review of
|
|
the interpolation grammar + workflow change + test updates). Promote to a
|
|
full workplan if anyone decides to actually do it; the adhoc T02 is just to
|
|
make sure the gap doesn't get forgotten.
|
|
|
|
Done when either: (a) rule action fields interpolate `context.*`
|
|
expressions and a stale-repo workflow run emits a TaskSpec with the actual
|
|
repo slug, or (b) a recorded decision explicitly defers/declines the change
|
|
with reasoning.
|