Fix repo_sbom_status resolver — close ADHOC-2026-06-01-T01

The state-hub resolver was calling GET /sbom/status?repo={slug}, which State Hub does not expose. Real SBOM routes are /sbom/, /sbom/{slug}, /sbom/snapshots/, /sbom/snapshots/{id}, /sbom/ingest/, /sbom/report/licences/. The weekly-sbom-staleness ActivityDefinition was passing params {repos: all} and the resolver was reading params.get("repo_slug", ""), so the URL collapsed to /sbom/status?repo= and 404'd. _fetch_json swallowed the error, the rule context.repos.sbom_age_days > 30 evaluated against {} and never matched, and the weekly SBOM check has been a silent no-op for as long as the route mismatch has existed. Resolver now supports two modes selected by params: - single-repo: {repo_slug: foo} → GET /sbom/{foo}, returns {repo_slug, last_sbom_at, sbom_age_days, has_sbom} - bulk: {repos: all} → GET /repos/, computes per-repo age, returns the worst repo's fields hoisted to the top of the result alongside stale_count, total_count, worst_* fields, and the full per-repo list Never-scanned repos get a 99999 sentinel age so threshold rules treat them as very stale without forcing the rule to special-case None. Hoisting the worst entry to the top preserves the existing rule expression context.repos.sbom_age_days > 30 (and target_repo: context.repos.repo_slug, though that field is a separate interpolation gap tracked as ADHOC-2026-06-01-T02). The integration tests' aspirational per-repo iteration model is left intact. Live validation against State Hub on 2026-06-01: - single: activity-core → 36 days since 2026-04-26 ingest - bulk: 48 repos total, 46 stale (>30d), worst is info-tech-canon (never scanned), rule expression evaluates True Tests: 120 passed, 1 skipped. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-02 03:31:56 +02:00
parent 5d3fb33c6b
commit a8d3cc2782
4 changed files with 270 additions and 16 deletions
--- a/workplans/ADHOC-2026-06-01.md
+++ b/workplans/ADHOC-2026-06-01.md
@@ -24,7 +24,7 @@ context resolver that is independent of the daily triage canary.

 ```task
 id: ADHOC-2026-06-01-T01
-status: todo
+status: done
 priority: low
 state_hub_task_id: "87b56da9-e692-4350-9aff-47080414ec06"
 ```
@@ -80,3 +80,90 @@ Out of scope for this adhoc:
 Done when `weekly-sbom-staleness` runs cleanly against a live State Hub on
 Monday and either spawns SBOM rescan tasks for stale repos or leaves a clear
 "all SBOMs fresh" audit row — not a 404 log line and a silent no-op.
+
+**Completion — 2026-06-01:**
+
+Resolver now supports two modes selected by params:
+- single-repo: `params: {repo_slug: foo}` → `GET /sbom/{foo}`
+- bulk: `params: {repos: all}` → `GET /repos/`, computes per-repo age,
+  returns the worst-repo fields hoisted to the top of the result alongside
+  `stale_count`, `total_count`, `worst_*` fields, and the full per-repo list
+
+Never-scanned repos use a `99999` sentinel age so threshold rules treat them
+as very stale without forcing the rule expression to special-case `None`.
+
+`activity-definitions/weekly-sbom-staleness.md` kept its existing rule
+expression `context.repos.sbom_age_days > 30` (the resolver hoists the worst
+repo's age to that path). The definition now documents that the rule fires
+at most once per workflow run, not once per stale repo, and that the
+aspirational per-stale-repo fan-out exercised by the integration tests is
+not delivered by the current workflow.
+
+Live validation against the running State Hub on 2026-06-01:
+- single: `activity-core` → 36 days since SBOM ingest at 2026-04-26
+- bulk: 48 repos total, 46 stale (>30d); worst is `info-tech-canon`
+  (`last_sbom_at: null` → 99999d sentinel); rule expression evaluates True
+
+Tests: `uv run pytest -q` → 120 passed, 1 skipped (previously 116 passed +
+4 broken integration tests; broken-on-my-change reverted by hoisting the
+worst-repo fields to the top of `context.repos`).
+
+### T02 - Rule action context interpolation and per-iteration binding
+
+```task
+id: ADHOC-2026-06-01-T02
+status: todo
+priority: low
+state_hub_task_id: "6b3a185e-cbea-454c-82fb-8b4c16cefef0"
+```
+
+Discovered while completing T01: `RunActivityWorkflow` builds each
+`TaskSpec` by lifting raw YAML fields out of the rule action without ever
+interpolating `context.*` references:
+
+```python
+# src/activity_core/workflows.py
+task_spec_dicts.append({
+    "title": action.get("task_template", rule.get("id", "")),
+    "target_repo": action.get("target_repo"),
+    ...
+})
+```
+
+So `target_repo: context.repos.repo_slug` in an ActivityDefinition rule is
+emitted to the spawn log as the literal string `"context.repos.repo_slug"`,
+not the actual stale repo slug. The aspirational per-stale-repo fan-out
+exercised by `test_pipeline_emits_one_task_for_stale_repo_only` and friends
+in `tests/test_integration_event_bridge.py` is *not* delivered by the
+workflow — those tests simulate a per-repo iteration the real workflow
+does not perform.
+
+Two pieces of work, likely related:
+
+1. **Action field interpolation.** Define and implement a safe template
+   grammar for `action.target_repo`, `action.task_template`,
+   `action.priority`, `action.labels`, etc. Reuse the rule-condition AST
+   walker (no `exec`, no comprehensions) or a constrained string
+   `{context.foo.bar}` substitution. Decide on grammar — instruction
+   prompt rendering uses `{...}` placeholders today
+   (`rules/executor.py::_render_prompt`); consistent with that is probably
+   right.
+
+2. **Per-iteration context binding.** Decide whether the workflow should
+   evaluate a rule once per element of a list-valued context field (the
+   integration-test contract), or whether the spawn-once semantics is
+   actually desired and the tests should be relaxed. If iteration is the
+   answer, the resolver shape from T01 already gives a clean `repos` list
+   to iterate over; the workflow would need an explicit `for_each:`
+   directive on the rule, or implicit iteration when `condition` references
+   a list element.
+
+This is borderline workplan-grade work (design decision + security review of
+the interpolation grammar + workflow change + test updates). Promote to a
+full workplan if anyone decides to actually do it; the adhoc T02 is just to
+make sure the gap doesn't get forgotten.
+
+Done when either: (a) rule action fields interpolate `context.*`
+expressions and a stale-repo workflow run emits a TaskSpec with the actual
+repo slug, or (b) a recorded decision explicitly defers/declines the change
+with reasoning.