Align activity-core scope boundaries

2026-06-18 15:11:48 +02:00
parent 78eed5f942
commit 977a3bd97f
7 changed files with 530 additions and 56 deletions
--- a/SCOPE.md
+++ b/SCOPE.md
@@ -1,7 +1,7 @@
 ---
 domain: capabilities
 repo: activity-core
-updated: "2026-06-03"
+updated: "2026-06-16"
 ---

 # SCOPE
@@ -16,7 +16,8 @@ updated: "2026-06-03"
 activity-core is the org-wide Event Bridge for the Coulomb organization — a
 rule-governed event loop that receives time-based and domain events, evaluates
 declarative rules and LLM instructions against current org context, and emits
-structured task sets to issue-core.
+structured task, report, and evidence outputs without owning downstream task
+lifecycle.

 ---

@@ -27,8 +28,11 @@ An `ActivityDefinition` (a markdown file checked into a repo) declares a trigger
 resolve before evaluation, and a set of rules and instructions that determine
 what tasks to create. When triggered, a durable Temporal workflow loads the
 definition, resolves context, evaluates the rule/instruction set, and emits task
-creation requests to issue-core. Everything is auditable: the spawn log records
-the triggering event, matched rule, and resulting task references.
+creation requests to issue-core or configured dry-run/audit sinks. Instructions
+may also emit validated reports, and selected context resolvers may emit compact
+non-secret evidence. Everything is auditable: the spawn log records the
+triggering event, matched rule/instruction metadata, model/prompt hash where
+applicable, and resulting task references.

 The two evaluation modes:
 - **Rule** — deterministic condition (sandboxed Python-like DSL) → fixed task
@@ -48,21 +52,33 @@ The two evaluation modes:
  attribute schemas, example payloads, and intent documentation.
  Curator-gating configurable per runtime environment.
 - **Trigger types**: 5-field cron with timezone and misfire policy; one-off
-  scheduled datetime; event-type subscription via NATS.
+  scheduled datetime; event-type subscription via NATS; manual one-shot API
+  trigger; one-shot schedule smoke tests for recurring definitions.
 - **Context resolution adapters**: repo-scoping (repository capability queries),
-  state hub (domain and workstream state), extensible for other sources.
+  State Hub (domain/workstream state, SBOM status, daily triage digest, coding
+  retro read model), and ops inventory (bounded HTTP/HTTPS probes of a
+  non-secret service inventory). The adapter registry is extensible for other
+  sources.
 - **Rule evaluator**: sandboxed AST walker for Python-like boolean expressions
  over event attributes and resolved context. Rule actions support safe
  `context.*` / `event.*` interpolation and explicit `for_each` per-item
  binding. No `exec()`.
 - **Instruction executor**: trusted-field prompt rendering, LLM call via
-  llm-connect, structured output validation, optional curator review queue,
-  and deterministic report sinks.
+  llm-connect, structured output validation, bounded validation-failure
+  artifacts for report instructions, review-required audit metadata, and
+  deterministic report sinks. A real downstream review queue is not implemented
+  in this repo.
 - **Task emission adapter**: abstraction over issue-core; current transport is
-  REST; designed to migrate to NATS subscription without code changes.
+  REST, with `ISSUE_SINK_TYPE=null` for dry-run/audit mode. It is designed to
+  migrate to a durable issue-core-owned NATS command boundary when issue-core
+  provides that contract.
 - **Report sinks**: instruction report outputs can be persisted to bounded
  local working memory and posted as State Hub progress events. These are
  reporting outputs, not task lifecycle ownership.
+- **Ops evidence sinks**: `ops-inventory` context sources can post compact
+  non-secret `ops_inventory_probe` summaries to State Hub. Inter-Hub submission
+  is present only as a gated/deferred sink result until operator-owned
+  `OPS_HUB_KEY` custody and widget mapping are ready.
 - **Spawn audit log**: every task emission recorded with rule/instruction id,
  triggering event id, model and prompt hash (instructions), issue-core task ref.
 - **Webhook receiver**: HTTP endpoint normalising inbound Gitea/GitHub webhook
@@ -84,6 +100,14 @@ The two evaluation modes:
  coordinated changes belong to project-core (future).
 - **Execution of automatable tasks** — Temporal Activities that do real work
  (run a scan, apply a patch, call an API) live in per-repo workers, not here.
+- **General ops execution** — Kubernetes, SSH, tunnel, authenticated service
+  checks, secret custody, OpenBao writes, and Inter-Hub widget/API-key
+  provisioning belong to the owning operational repos and operator workflows.
+  activity-core may record non-secret probe evidence; it must not become the ops
+  control plane.
+- **Service inventory authority** — the Custodian inventory remains owned by
+  the custodian/state-hub surface. activity-core may read a projected
+  non-secret snapshot.
 - **Event broker hosting** — NATS JetStream is org infrastructure; activity-core
  consumes it but does not own its lifecycle.
 - **Temporal server hosting** — activity-core uses the Temporal SDK; the server
@@ -101,6 +125,9 @@ The two evaluation modes:
  structured tasks in the right repos."
 - You need one-off future task scheduling without a separate reminder system.
 - You want an auditable record of what triggered what and why.
+- You need a scheduled, non-secret evidence note proving that declared service
+  endpoints or access paths were observed, without executing privileged ops
+  commands.
 - You are replacing scattered bespoke cron jobs and manual coordination with
  a governed, observable automation layer.

@@ -117,29 +144,45 @@ The two evaluation modes:

 ## Current State

-- **Status**: active production-backed service. Foundation, triggers/ops,
-  event bridge, Railiance deployment, and the production service workplans are
-  complete. The stale March WP-0002 handoff note has been reconciled and
-  archived.
+- **Status**: active production-backed service with two visible open gates:
+  `ACTIVITY-WP-0006` still waits on three clean consecutive scheduled daily
+  triage runs and calibration feedback, and `ACTIVITY-WP-0008` is blocked until
+  Helix Forge publishes the upstream `coding_retro` read model needed to enable
+  the Saturday schedule. `ACTIVITY-WP-0007` is finished: the bounded
+  ops-inventory probe/evidence slice has live Railiance evidence.
 - **Implementation**: core is functional. `RunActivityWorkflow`,
-  `TaskExecutorWorkflow` (stub), PostgreSQL schema, Temporal Schedules, NATS
-  Event Router, FastAPI admin API, Prometheus metrics, event type registry,
-  markdown ActivityDefinition parser/sync, rule evaluator, instruction
-  executor, context resolvers, issue sink, report sinks, Kubernetes deployment,
-  and operational runbook are all implemented.
-- **Operational proof**: the daily State Hub WSJF triage cutover has completed
-  far enough that activity-core is now the trusted scheduled substrate for the
-  routine report. Recent hardening fixed the State Hub SBOM resolver contract,
-  made slow LLM activity timeouts configurable, and added safe rule action
-  interpolation plus explicit `for_each` binding for per-repo SBOM staleness
-  tasks.
-- **Stability**: construction risk has shifted to operational hardening risk.
-  The full test suite passed on 2026-06-03 (`125 passed, 1 skipped`). The
-  remaining work is mostly observability, status-canon adaptation, contract
-  documentation, and broader production adoption rather than first
-  implementation.
-- **Next**: `ACTIVITY-WP-0006` — post-triage operational hardening and scope
-  alignment.
+  `TaskExecutorWorkflow` (stub), PostgreSQL schema, Temporal Schedules and smoke
+  schedules, NATS Event Router, FastAPI admin API, Prometheus metrics, event
+  type registry, markdown ActivityDefinition parser/sync, rule evaluator,
+  instruction executor, context resolvers, issue sink, report sinks, ops
+  evidence sink, Kubernetes deployment, and operational runbook are all
+  implemented.
+- **Current definitions**: `weekly-sbom-staleness` is enabled and demonstrates
+  the deterministic rule/fan-out path. `weekly-coding-retro` is present and
+  tested but intentionally disabled until live `coding_retro` evidence exists.
+  Railiance projects the daily State Hub WSJF triage definition and the disabled
+  ops-service-inventory probe definition from the runtime bundle.
+- **Operational proof**: the State Hub daily WSJF triage path has produced
+  validated reports and working-memory notes, but the calibration gate is not
+  closed. A 2026-06-16 recheck found State Hub `daily_triage` progress and
+  working-memory `daily-triage-*` notes only through 2026-06-06, so there is not
+  yet evidence for three clean consecutive scheduled runs after the June 7
+  runtime projection failure. The ops inventory probe path has live fallback
+  evidence in State Hub; Inter-Hub per-entity submission remains deferred.
+- **Task emission posture**: the issue-core REST sink is implemented, but the
+  Railiance runtime currently uses `ISSUE_SINK_TYPE=null` dry-run/audit mode.
+  Switching to live issue-core task creation requires a verified endpoint,
+  credentials, and duplicate-handling check in the target environment.
+- **Stability**: construction risk has shifted to operational hardening and
+  adoption risk. The last recorded full-suite pass in the workplans was
+  2026-06-04 (`128 passed, 1 skipped`), with later targeted coverage added for
+  ops inventory, ops evidence sinks, Railiance projection wiring, and weekly
+  coding retro parsing/rule behavior.
+- **Next**: close `ACTIVITY-WP-0006-T03` with real scheduled-run calibration
+  evidence; close `ACTIVITY-WP-0008-T03` once upstream `coding_retro` publication
+  exists and the dry-run/duplicate check passes; decide when to move selected
+  task/report/evidence sinks from dry-run or fallback mode to their intended
+  live backends.

 ---

@@ -159,9 +202,9 @@ database, the project planner, or a general execution worker. The local
 workplan explicitly rehomes execution responsibility.

 One boundary nuance is now explicit: activity-core may post State Hub progress
-events as a configured report sink. That is acceptable because it records the
-result of an activity-core activation; it is not ownership of State Hub state,
-task lifecycle, or workstream planning.
+events as a configured report or evidence sink. That is acceptable because it
+records the result of an activity-core activation; it is not ownership of State
+Hub state, task lifecycle, or workstream planning.

 The main drift risk is convenience creep: adding direct task tracking,
 project-phase state, or bespoke operational scripts because the Temporal
@@ -169,27 +212,58 @@ substrate is already nearby. Future work should prefer declarative
 ActivityDefinitions, bounded context resolvers, and outbound adapters over
 new one-off control paths.

+## Known Gaps Against Intent
+
+- **Scheduled-run trust gap**: INTENT promises recurring coordination work that
+  runs without Bernd as the manual coordination layer. The daily triage path is
+  implemented, but its current calibration task still lacks three clean
+  consecutive scheduled runs after the June 7 runtime failure. Until that closes,
+  daily triage remains a production-backed capability with an evidence gap, not
+  a fully proven standing substrate.
+- **Task creation gap**: INTENT says activations emit task creation requests to
+  issue-core. The REST sink exists, but Railiance is still in `ISSUE_SINK_TYPE=null`
+  mode. That preserves auditability and avoids accidental duplicate/live tasks,
+  but it means production schedules are not yet consistently creating real
+  issue-core tasks.
+- **Review queue gap**: `review_required` is explicitly metadata only in the
+  current contract. No issue-core review queue integration exists here, so any
+  future queue routing needs a downstream issue-core contract before high-impact
+  instruction outputs rely on it.
+- **Evidence backend posture**: the State Hub fallback evidence path is the
+  accepted current backend for `ops_inventory_probe`. Inter-Hub/ops-hub
+  submission is deliberately deferred behind `OPS_HUB_KEY`, widget mapping, and
+  operator approval, so per-entity ops evidence publication is future work.
+- **Execution-boundary residue**: `TaskExecutorWorkflow` is still registered as
+  a stub that writes a done `task_instances` row. It should remain inert or be
+  removed/re-homed before it attracts real execution work, because execution is
+  explicitly outside activity-core's intent.
+- **API exposure posture**: the FastAPI surface stays ClusterIP-only for now.
+  External ingress remains future work until an authenticated access policy is
+  designed.
+
 ---

 ## How It Fits

 ```
-[NATS JetStream]  ←  publishers: state hub, Gitea webhooks, Temporal signals, cron
+[NATS JetStream]  ←  publishers: State Hub, Gitea webhooks, Temporal signals, cron
       ↓
 [activity-core]   ←  event type registry, rule evaluator, instruction executor
 [activity-core]   →  [issue-core]  →  [repos/services]
-[activity-core]   →  [report sinks]
+[activity-core]   →  [report/evidence sinks]  →  [State Hub / working memory / future Inter-Hub]
 ```

 - **Upstream**: NATS (event bus), Temporal (durable workflow engine), PostgreSQL
-  (definitions and audit log), repo-scoping (context adapter), state hub (context
+  (definitions and audit log), repo-scoping (context adapter), State Hub (context
  adapter and event publisher).
-- **Downstream**: issue-core (task management) and configured report sinks.
+- **Downstream**: issue-core (task management) and configured report/evidence sinks.
  Agents and humans pick up tasks from issue-core and do the actual work.
+  Railiance may use the null sink for dry-run/audit mode until live issue-core
+  emission is approved.
 - **Coordinates with**: the state hub delegates maintenance automations to
  activity-core by publishing lifecycle events or by being resolved as context.
-  activity-core may post progress events as report outputs, but it does not own
-  State Hub task/workstream state.
+  activity-core may post progress events as report/evidence outputs, but it
+  does not own State Hub task/workstream state.

 ---

@@ -203,6 +277,11 @@ new one-off control paths.
  by a sandboxed AST walker.
 - **Instruction** — LLM-evaluated task generation with trusted-field prompt
  interpolation and structured output schema enforcement.
+- **Report sink** — configured persistence for instruction reports, currently
+  working-memory markdown notes and State Hub progress events.
+- **Evidence sink** — configured persistence for compact non-secret resolver
+  evidence, currently State Hub progress for ops inventory probes; Inter-Hub is
+  a deferred gated target.
 - **Event type** — a registered, schema-documented category of event (e.g.
  `org.repo.registered`). Publisher-declared; curator-gated per environment.
 - **Spawn audit trail** — activity-core's local record of what tasks were emitted,
@@ -219,8 +298,12 @@ new one-off control paths.
 - `issue-core` (formerly issue-facade) — downstream task management; receives
  all task emission from activity-core.
 - `repo-scoping` — context adapter for repository capability queries.
-- `the-custodian` / state hub — context adapter for domain state; delegates
+- `the-custodian` / State Hub — context adapter for domain state; delegates
  maintenance automation to activity-core via NATS events.
+- `llm-connect` — instruction execution backend for judgement-oriented reports
+  such as daily State Hub WSJF triage.
+- `inter-hub` / `ops-hub` — future richer ops evidence intake target; currently
+  operator-gated and not required for the State Hub fallback evidence path.
 - `rules-core` (future extraction) — the rule evaluator and instruction executor
  module, currently in `src/activity_core/rules/`.
 - `project-core` (future) — project and initiative management; will use
@@ -248,7 +331,10 @@ new one-off control paths.
  `src/activity_core/activities.py` (Temporal activities),
  `src/activity_core/event_router.py` (NATS → Temporal),
  `src/activity_core/schedule_manager.py` (Temporal Schedules),
-  `src/activity_core/api.py` (FastAPI admin).
+  `src/activity_core/api.py` (FastAPI admin),
+  `src/activity_core/report_sinks.py` (instruction reports),
+  `src/activity_core/ops_evidence_sinks.py` (ops evidence),
+  and `src/activity_core/context_resolvers/` (external context adapters).
 - Definition files: `event-types/`, `activity-definitions/`, and `tasks/`.
 - Dev environment: `docker-compose.dev.yml` (Temporal + PostgreSQL + NATS).
 - Entry points: `uv run python -m activity_core.worker` (Temporal worker),
@@ -264,6 +350,7 @@ title: Durable event-triggered task factory
 description: >
  Org-wide Event Bridge that receives time-based and domain events, evaluates
  declarative rules and LLM instructions against current org context, and emits
-  structured task sets to issue-core with a full spawn audit trail.
-keywords: [temporal, workflow, event-bridge, task, cron, event, rule, instruction, org-automation]
+  structured task, report, and evidence outputs with a full spawn/report audit
+  trail while leaving task lifecycle ownership downstream.
+keywords: [temporal, workflow, event-bridge, task, report, evidence, cron, event, rule, instruction, org-automation]
 ```
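
Note on the rule evaluator described in the diff above: SCOPE.md calls it a "sandboxed AST walker for Python-like boolean expressions over event attributes and resolved context" with "No `exec()`". The sketch below illustrates that general technique only. It is not the code in `src/activity_core/rules/`. The names `evaluate_rule`, `RuleError`, and `_walk` are hypothetical, and the allow-list of constructs is an assumption, since the real DSL grammar is not shown in this commit.

```python
import ast
import operator

# Hypothetical allow-list of comparison operators (an assumption, not the real DSL).
_COMPARE = {
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
    ast.Lt: operator.lt, ast.LtE: operator.le,
    ast.Gt: operator.gt, ast.GtE: operator.ge,
    ast.In: lambda a, b: a in b,
    ast.NotIn: lambda a, b: a not in b,
}


class RuleError(ValueError):
    """Raised when an expression uses a construct outside the allow-list."""


def evaluate_rule(expression, event, context):
    """Evaluate a boolean expression over `event` and `context` dicts.

    The expression is parsed to an AST and walked node by node; nothing is
    compiled or executed, so calls, imports, and dunder access are rejected.
    """
    tree = ast.parse(expression, mode="eval")
    roots = {"event": event, "context": context}
    return bool(_walk(tree.body, roots))


def _walk(node, roots):
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        if node.id not in roots:
            raise RuleError(f"unknown name: {node.id}")
        return roots[node.id]
    if isinstance(node, ast.Attribute):
        # Dotted access is a plain dict lookup, never getattr().
        base = _walk(node.value, roots)
        if not isinstance(base, dict) or node.attr not in base:
            raise RuleError(f"unknown attribute: {node.attr}")
        return base[node.attr]
    if isinstance(node, ast.BoolOp):
        values = (_walk(v, roots) for v in node.values)
        return all(values) if isinstance(node.op, ast.And) else any(values)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
        return not _walk(node.operand, roots)
    if isinstance(node, ast.Compare):
        left = _walk(node.left, roots)
        for op, comparator in zip(node.ops, node.comparators):
            fn = _COMPARE.get(type(op))
            if fn is None:
                raise RuleError(f"operator not allowed: {type(op).__name__}")
            right = _walk(comparator, roots)
            if not fn(left, right):
                return False
            left = right
        return True
    if isinstance(node, (ast.List, ast.Tuple)):
        return [_walk(e, roots) for e in node.elts]
    raise RuleError(f"construct not allowed: {type(node).__name__}")


# Illustrative data only; the attribute names are made up for this example.
event = {"type": "org.repo.registered", "repo": "activity-core"}
context = {"sbom": {"age_days": 45}}
matched = evaluate_rule(
    "event.type == 'org.repo.registered' and context.sbom.age_days > 30",
    event,
    context,
)
```

In this sketch, anything not explicitly handled (function calls, subscripts, arithmetic, lambdas) falls through to `RuleError`, so the evaluator is closed by default. Malformed input raises `SyntaxError` from `ast.parse` before any walking happens.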