---
domain: capabilities
repo: activity-core
updated: "2026-06-16"
---
# SCOPE
> This file helps you quickly understand what this repository is about,
> when it is relevant, and when it is not.
---
## One-liner
activity-core is the org-wide Event Bridge for the Coulomb organization — a
rule-governed event loop that receives time-based and domain events, evaluates
declarative rules and LLM instructions against current org context, and emits
structured task, report, and evidence outputs without owning downstream task
lifecycle.
---
## Core Idea
An `ActivityDefinition` (a markdown file checked into a repo) declares a trigger
(cron, one-off scheduled datetime, or named event type), context sources to
resolve before evaluation, and a set of rules and instructions that determine
what tasks to create. When triggered, a durable Temporal workflow loads the
definition, resolves context, evaluates the rule/instruction set, and emits task
creation requests to issue-core or configured dry-run/audit sinks. Instructions
may also emit validated reports, and selected context resolvers may emit compact
non-secret evidence. Everything is auditable: the spawn log records the
triggering event, matched rule/instruction metadata, model/prompt hash where
applicable, and resulting task references.
The two evaluation modes:
- **Rule** — deterministic condition (sandboxed Python-like DSL) → fixed task
templates. Fast, testable, no LLM cost.
- **Instruction** — optional pre-filter condition → LLM prompt with trusted
fields only → structured task list. For cases where the right tasks depend
on context that is easier to describe than to enumerate.
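
The load, resolve, evaluate, emit flow described above can be sketched in plain Python. This is a minimal illustration only: every class, field, and function name here (`Rule`, `ActivityDefinition`, `SpawnRecord`, `run_activation`, `dry_run_sink`) is hypothetical, rule conditions are plain callables rather than the sandboxed DSL, and the Temporal workflow, instruction path, and real sinks are left out.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Hypothetical stand-ins for illustration; not the repository's real models.
Condition = Callable[[dict[str, Any], dict[str, Any]], bool]  # (event, context) -> bool


@dataclass
class Rule:
    rule_id: str
    condition: Condition
    task_title: str  # stands in for a fixed task template


@dataclass
class ActivityDefinition:
    name: str
    context_sources: dict[str, Callable[[], Any]]
    rules: list[Rule] = field(default_factory=list)


@dataclass
class SpawnRecord:
    event_id: str
    rule_id: str
    task_ref: str


def run_activation(
    definition: ActivityDefinition,
    event: dict[str, Any],
    emit_task: Callable[[str], str],
) -> list[SpawnRecord]:
    # Resolve: materialise every declared context source before evaluation.
    context = {name: resolve() for name, resolve in definition.context_sources.items()}
    spawn_log: list[SpawnRecord] = []
    for rule in definition.rules:
        # Evaluate: deterministic condition over the event and resolved context.
        if rule.condition(event, context):
            # Emit: hand the task to the sink and record what was spawned and why.
            task_ref = emit_task(rule.task_title)
            spawn_log.append(SpawnRecord(event["id"], rule.rule_id, task_ref))
    return spawn_log


# Usage with a dry-run sink that only records what would have been created.
emitted: list[str] = []


def dry_run_sink(title: str) -> str:
    emitted.append(title)
    return f"dry-run-{len(emitted)}"


definition = ActivityDefinition(
    name="example-sbom-check",
    context_sources={"stale_repos": lambda: ["repo-a", "repo-b"]},
    rules=[Rule("stale-sbom", lambda e, c: len(c["stale_repos"]) > 0, "Refresh stale SBOMs")],
)
log = run_activation(definition, {"id": "evt-1", "type": "cron"}, dry_run_sink)
```

The spawn log ties each emitted task back to the triggering event and the rule that matched, which is the audit property the paragraph above describes.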
---
## In Scope
- **ActivityDefinition model**: trigger config (cron / scheduled / event),
context sources, rules (condition + action), instructions (trusted-field
prompt + model + output schema).
- **Event type registry**: publisher-declared markdown definitions with
attribute schemas, example payloads, and intent documentation.
Curator-gating configurable per runtime environment.
- **Trigger types**: 5-field cron with timezone and misfire policy; one-off
scheduled datetime; event-type subscription via NATS; manual one-shot API
trigger; one-shot schedule smoke tests for recurring definitions.
- **Context resolution adapters**: repo-scoping (repository capability queries),
State Hub (domain/workstream state, SBOM status, daily triage digest, coding
retro read model), and ops inventory (bounded HTTP/HTTPS probes of a
non-secret service inventory). The adapter registry is extensible for other
sources.
- **Rule evaluator**: sandboxed AST walker for Python-like boolean expressions
over event attributes and resolved context. Rule actions support safe
`context.*` / `event.*` interpolation and explicit `for_each` per-item
binding. No `exec()`.
- **Instruction executor**: trusted-field prompt rendering, LLM call via
llm-connect, structured output validation, bounded validation-failure
artifacts for report instructions, review-required audit metadata, and
deterministic report sinks. A real downstream review queue is not implemented
in this repo.
- **Task emission adapter**: abstraction over issue-core; current transport is
REST, with `ISSUE_SINK_TYPE=null` for dry-run/audit mode. It is designed to
migrate to a durable issue-core-owned NATS command boundary when issue-core
provides that contract.
- **Report sinks**: instruction report outputs can be persisted to bounded
local working memory and posted as State Hub progress events. These are
reporting outputs, not task lifecycle ownership.
- **Ops evidence sinks**: `ops-inventory` context sources can post compact
non-secret `ops_inventory_probe` summaries to State Hub. Inter-Hub submission
is present only as a gated/deferred sink result until operator-owned
`OPS_HUB_KEY` custody and widget mapping are ready.
- **Spawn audit log**: every task emission recorded with rule/instruction id,
triggering event id, model and prompt hash (instructions), issue-core task ref.
- **Webhook receiver**: HTTP endpoint normalising inbound Gitea/GitHub webhook
payloads to EventEnvelope format and publishing to NATS.
- **Worker and workflow infrastructure**: Temporal-based durable execution —
`RunActivityWorkflow` orchestrates load → resolve → evaluate → emit.
- **REST admin API** (FastAPI): CRUD for ActivityDefinitions, manual trigger,
event type registry queries.
- **Prometheus metrics**: Temporal SDK metrics exposed for scraping.
- **Operational runbook**: `docs/runbook.md`.
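
To make the rule evaluator item above concrete, here is a minimal sketch of a sandboxed AST walker using Python's standard `ast` module. It is an illustration under stated assumptions, not the code in `src/activity_core/rules/`: the supported grammar (constants, names, dotted lookups into dictionaries, `and`/`or`/`not`, comparisons) and the function names `evaluate` and `_walk` are hypothetical. The expression is parsed but never executed, and any node type outside the allow-list is rejected.

```python
import ast
import operator
from typing import Any

# Allow-list of comparison operators; anything else is rejected.
_COMPARE = {
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
    ast.Lt: operator.lt, ast.LtE: operator.le,
    ast.Gt: operator.gt, ast.GtE: operator.ge,
}


def evaluate(expression: str, names: dict[str, Any]) -> bool:
    """Evaluate a boolean expression by walking its AST; no exec() or eval()."""
    tree = ast.parse(expression, mode="eval")
    return bool(_walk(tree.body, names))


def _walk(node: ast.AST, names: dict[str, Any]) -> Any:
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        if node.id not in names:
            raise ValueError(f"unknown name: {node.id}")
        return names[node.id]
    if isinstance(node, ast.Attribute):
        # Dotted access is a dictionary lookup only, never getattr().
        base = _walk(node.value, names)
        if not isinstance(base, dict) or node.attr not in base:
            raise ValueError(f"unknown attribute: {node.attr}")
        return base[node.attr]
    if isinstance(node, ast.BoolOp):
        values = (_walk(value, names) for value in node.values)
        return all(values) if isinstance(node.op, ast.And) else any(values)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
        return not _walk(node.operand, names)
    if isinstance(node, ast.Compare):
        left = _walk(node.left, names)
        for op, comparator in zip(node.ops, node.comparators):
            func = _COMPARE.get(type(op))
            if func is None:
                raise ValueError(f"unsupported operator: {type(op).__name__}")
            right = _walk(comparator, names)
            if not func(left, right):
                return False
            left = right  # supports chained comparisons such as 1 < x < 10
        return True
    # Calls, subscripts, lambdas, comprehensions, and so on all land here.
    raise ValueError(f"unsupported syntax: {type(node).__name__}")


names = {
    "event": {"type": "org.repo.registered"},
    "context": {"sbom": {"age_days": 45}},
}
matched = evaluate(
    "event.type == 'org.repo.registered' and context.sbom.age_days > 30", names
)
```

Because the walker only returns values for node types it recognises, an input such as `__import__('os')` fails with `ValueError` at the `ast.Call` node instead of running.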
---
## Out of Scope
- **Task lifecycle** — creating, assigning, tracking, and closing tasks is
issue-core's responsibility. activity-core holds a spawn audit trail only.
- **Project and initiative management** — phased, completion-gated, multi-step
coordinated changes belong to project-core (future).
- **Execution of automatable tasks** — Temporal Activities that do real work
(run a scan, apply a patch, call an API) live in per-repo workers, not here.
- **General ops execution** — Kubernetes, SSH, tunnel, authenticated service
checks, secret custody, OpenBao writes, and Inter-Hub widget/API-key
provisioning belong to the owning operational repos and operator workflows.
activity-core may record non-secret probe evidence; it must not become the ops
control plane.
- **Service inventory authority** — the Custodian inventory remains owned by
the custodian/state-hub surface. activity-core may read a projected
non-secret snapshot.
- **Event broker hosting** — NATS JetStream is org infrastructure; activity-core
consumes it but does not own its lifecycle.
- **Temporal server hosting** — activity-core uses the Temporal SDK; the server
runs on Railiance infrastructure (or Docker Compose for dev).
- **End-user task UI** — tasks land in issue-core; presentation is separate.
- **Synchronous request-response patterns** — Temporal is async-first.
---
## Relevant When
- You need org-wide recurring maintenance automation (dependency scans, SBOM
checks, staleness audits) without bespoke per-service cron jobs.
- You need reactive task generation: "when X happens across the org, create
structured tasks in the right repos."
- You need one-off future task scheduling without a separate reminder system.
- You want an auditable record of what triggered what and why.
- You need a scheduled, non-secret evidence note proving that declared service
endpoints or access paths were observed, without executing privileged ops
commands.
- You are replacing scattered bespoke cron jobs and manual coordination with
a governed, observable automation layer.
---
## Not Relevant When
- You need to track whether a task is done, blocked, or reassigned → issue-core.
- You need to coordinate a multi-phase project with dependencies → project-core.
- You need a simple system cron with no durability requirement.
- You need synchronous request-response patterns.
---
## Current State
- **Status**: active production-backed service with two visible open gates:
`ACTIVITY-WP-0006` still waits on three clean consecutive scheduled daily
triage runs and calibration feedback, and `ACTIVITY-WP-0008` is blocked until
Helix Forge publishes the upstream `coding_retro` read model needed to enable
the Saturday schedule. `ACTIVITY-WP-0007` is finished: the bounded
ops-inventory probe/evidence slice has live Railiance evidence.
- **Implementation**: core is functional. `RunActivityWorkflow`,
`TaskExecutorWorkflow` (stub), PostgreSQL schema, Temporal Schedules and smoke
schedules, NATS Event Router, FastAPI admin API, Prometheus metrics, event
type registry, markdown ActivityDefinition parser/sync, rule evaluator,
instruction executor, context resolvers, issue sink, report sinks, ops
evidence sink, Kubernetes deployment, and operational runbook are all
implemented.
- **Current definitions**: `weekly-sbom-staleness` is enabled and demonstrates
the deterministic rule/fan-out path. `weekly-coding-retro` is present and
tested but intentionally disabled until live `coding_retro` evidence exists.
Railiance projects the daily State Hub WSJF triage definition and the disabled
ops-service-inventory probe definition from the runtime bundle.
- **Operational proof**: the State Hub daily WSJF triage path has produced
validated reports and working-memory notes, but the calibration gate is not
closed. A 2026-06-16 recheck found State Hub `daily_triage` progress and
working-memory `daily-triage-*` notes only through 2026-06-06, so there is not
yet evidence for three clean consecutive scheduled runs after the 2026-06-07
runtime projection failure. The ops inventory probe path has live fallback
evidence in State Hub; Inter-Hub per-entity submission remains deferred.
- **Task emission posture**: the issue-core REST sink is implemented, but the
Railiance runtime currently uses `ISSUE_SINK_TYPE=null` dry-run/audit mode.
Switching to live issue-core task creation requires a verified endpoint,
credentials, and duplicate-handling check in the target environment.
- **Stability**: the main risk has shifted from construction to operational
hardening and adoption. The last full-suite pass recorded in the workplans was
on 2026-06-04 (`128 passed, 1 skipped`), with later targeted coverage added for
ops inventory, ops evidence sinks, Railiance projection wiring, and weekly
coding retro parsing/rule behavior.
- **Next**: close `ACTIVITY-WP-0006-T03` with real scheduled-run calibration
evidence; close `ACTIVITY-WP-0008-T03` once upstream `coding_retro` publication
exists and the dry-run/duplicate check passes; decide when to move selected
task/report/evidence sinks from dry-run or fallback mode to their intended
live backends.
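
The task emission posture above depends on selecting a sink from configuration. The sketch below shows one way that selection could look. Only `ISSUE_SINK_TYPE=null` comes from this document; the `rest` value, the `ISSUE_CORE_URL` variable, the class names, and the choice to default to the null sink are assumptions made for illustration.

```python
import os
from dataclasses import dataclass, field
from typing import Any, Mapping, Optional, Protocol


class IssueSink(Protocol):
    def emit(self, task: dict[str, Any]) -> str: ...


@dataclass
class NullIssueSink:
    """Dry-run/audit sink: records the request and creates nothing downstream."""

    recorded: list[dict[str, Any]] = field(default_factory=list)

    def emit(self, task: dict[str, Any]) -> str:
        self.recorded.append(task)
        return f"dry-run:{len(self.recorded)}"


@dataclass
class RestIssueSink:
    """Placeholder for the live issue-core transport (hypothetical shape)."""

    base_url: str

    def emit(self, task: dict[str, Any]) -> str:
        raise NotImplementedError("live issue-core call is omitted from this sketch")


def build_issue_sink(env: Optional[Mapping[str, str]] = None) -> IssueSink:
    env = os.environ if env is None else env
    # Assumption: fall back to dry-run so a missing setting never creates live tasks.
    sink_type = env.get("ISSUE_SINK_TYPE", "null")
    if sink_type == "null":
        return NullIssueSink()
    if sink_type == "rest":  # hypothetical value name
        return RestIssueSink(base_url=env["ISSUE_CORE_URL"])  # hypothetical variable
    raise ValueError(f"unknown ISSUE_SINK_TYPE: {sink_type!r}")


sink = build_issue_sink({"ISSUE_SINK_TYPE": "null"})
task_ref = sink.emit({"title": "Refresh stale SBOM", "repo": "example-repo"})
```

Keeping both sinks behind one interface means the workflow code does not change when an environment moves from dry-run to live emission; only the configuration does.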
---
## Assessment Against Intent
activity-core now matches the core intent: it answers **when** coordination
work should happen, **what** work should be created from current org context,
and **where** each work item should land. The daily WSJF triage is the clearest
judgement-oriented proof point; weekly SBOM staleness is the clearest
deterministic-rule proof point.
The governing boundary still matters. activity-core should keep owning trigger
durability, context resolution, rule/instruction evaluation, report/task
emission, and spawn/report audit. It should not become the task lifecycle
database, the project planner, or a general execution worker. The local
`TaskExecutorWorkflow` remains a stub and should stay that way unless a future
workplan explicitly rehomes execution responsibility.
One boundary nuance is now explicit: activity-core may post State Hub progress
events as a configured report or evidence sink. That is acceptable because it
records the result of an activity-core activation; it is not ownership of State
Hub state, task lifecycle, or workstream planning.
The main drift risk is convenience creep: adding direct task tracking,
project-phase state, or bespoke operational scripts because the Temporal
substrate is already nearby. Future work should prefer declarative
ActivityDefinitions, bounded context resolvers, and outbound adapters over
new one-off control paths.
---
## Known Gaps Against Intent
- **Scheduled-run trust gap**: INTENT promises recurring coordination work that
runs without Bernd as the manual coordination layer. The daily triage path is
implemented, but its current calibration task still lacks three clean
consecutive scheduled runs after the 2026-06-07 runtime projection failure. Until that closes,
daily triage remains a production-backed capability with an evidence gap, not
a fully proven standing substrate.
- **Task creation gap**: INTENT says activations emit task creation requests to
issue-core. The REST sink exists, but Railiance is still in `ISSUE_SINK_TYPE=null`
mode. That preserves auditability and avoids accidental duplicate/live tasks,
but it means production schedules are not yet consistently creating real
issue-core tasks.
- **Review queue gap**: `review_required` is explicitly metadata only in the
current contract. No issue-core review queue integration exists here, so any
future queue routing needs a downstream issue-core contract before high-impact
instruction outputs rely on it.
- **Evidence backend posture**: the State Hub fallback evidence path is the
accepted current backend for `ops_inventory_probe`. Inter-Hub/ops-hub
submission is deliberately deferred behind `OPS_HUB_KEY`, widget mapping, and
operator approval, so per-entity ops evidence publication is future work.
- **Execution-boundary residue**: `TaskExecutorWorkflow` is still registered as
a stub that writes a done `task_instances` row. It should remain inert or be
removed/re-homed before it attracts real execution work, because execution is
explicitly outside activity-core's intent.
- **API exposure posture**: the FastAPI surface stays ClusterIP-only for now.
External ingress remains future work until an authenticated access policy is
designed.
---
## How It Fits
```
[NATS JetStream] ← publishers: State Hub, Gitea webhooks, Temporal signals, cron
[activity-core] ← event type registry, rule evaluator, instruction executor
[activity-core] → [issue-core] → [repos/services]
[activity-core] → [report/evidence sinks] → [State Hub / working memory / future Inter-Hub]
```
- **Upstream**: NATS (event bus), Temporal (durable workflow engine), PostgreSQL
(definitions and audit log), repo-scoping (context adapter), State Hub (context
adapter and event publisher).
- **Downstream**: issue-core (task management) and configured report/evidence sinks.
Agents and humans pick up tasks from issue-core and do the actual work.
Railiance may use the null sink for dry-run/audit mode until live issue-core
emission is approved.
- **Coordinates with**: State Hub delegates maintenance automations to
activity-core by publishing lifecycle events or by being resolved as context.
activity-core may post progress events as report/evidence outputs, but it
does not own State Hub task/workstream state.
---
## Terminology
- **ActivityDefinition** — a markdown file declaring trigger, context sources,
rules, and instructions. The unit of automation policy.
- **EventEnvelope** — the canonical internal event format; normalises all inbound
events (NATS, webhooks, cron signals) to a common structure.
- **Rule** — deterministic condition expression + task template action. Evaluated
by a sandboxed AST walker.
- **Instruction** — LLM-evaluated task generation with trusted-field prompt
interpolation and structured output schema enforcement.
- **Report sink** — configured persistence for instruction reports, currently
working-memory markdown notes and State Hub progress events.
- **Evidence sink** — configured persistence for compact non-secret resolver
evidence, currently State Hub progress for ops inventory probes; Inter-Hub is
a deferred gated target.
- **Event type** — a registered, schema-documented category of event (e.g.
`org.repo.registered`). Publisher-declared; curator-gated per environment.
- **Spawn audit trail** — activity-core's local record of what tasks were emitted,
to which issue-core backend, and under which rule/instruction. Not the
authoritative task record.
- Potentially confusing: **Activity** (Temporal concept — a single executable
step in a workflow) vs. **ActivityDefinition** (app concept — the policy record
that governs what gets spawned). These are different things.
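
As an illustration of the EventEnvelope idea above, the sketch below normalises a Gitea or GitHub webhook into one common structure. The envelope fields, the `normalise_webhook` function, and the `org.webhook.*` event type naming are hypothetical; the real EventEnvelope schema is defined in the repository and its ADRs. The header names (`X-Gitea-Event`, `X-Gitea-Delivery`, `X-GitHub-Event`, `X-GitHub-Delivery`) and the `repository.full_name` payload field are the ones those platforms send.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Mapping


@dataclass(frozen=True)
class EventEnvelope:
    # Hypothetical field set for illustration only.
    event_id: str
    event_type: str
    source: str
    received_at: datetime
    attributes: dict[str, Any]


def normalise_webhook(
    headers: Mapping[str, str], payload: Mapping[str, Any]
) -> EventEnvelope:
    lowered = {key.lower(): value for key, value in headers.items()}
    # Gitea also sends GitHub-compatible headers, so check the Gitea header first.
    if "x-gitea-event" in lowered:
        source, kind = "gitea", lowered["x-gitea-event"]
        delivery = lowered.get("x-gitea-delivery")
    elif "x-github-event" in lowered:
        source, kind = "github", lowered["x-github-event"]
        delivery = lowered.get("x-github-delivery")
    else:
        raise ValueError("unrecognised webhook source")
    repository = payload.get("repository") or {}
    return EventEnvelope(
        # Reuse the platform delivery id when present so retries keep the same id.
        event_id=delivery or str(uuid.uuid4()),
        event_type=f"org.webhook.{source}.{kind}",  # hypothetical naming scheme
        source=source,
        received_at=datetime.now(timezone.utc),
        attributes={"repository": repository.get("full_name")},
    )


envelope = normalise_webhook(
    {"X-Gitea-Event": "push", "X-Gitea-Delivery": "abc-123", "X-GitHub-Event": "push"},
    {"repository": {"full_name": "example-org/example-repo"}},
)
```

Downstream code can then match on `event_type` and `attributes` without caring which platform produced the event.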
---
## Related / Overlapping
- `issue-core` (formerly issue-facade) — downstream task management; receives
all task emission from activity-core.
- `repo-scoping` — context adapter for repository capability queries.
- `the-custodian` / State Hub — context adapter for domain state; delegates
maintenance automation to activity-core via NATS events.
- `llm-connect` — instruction execution backend for judgement-oriented reports
such as daily State Hub WSJF triage.
- `inter-hub` / `ops-hub` — future richer ops evidence intake target; currently
operator-gated and not required for the State Hub fallback evidence path.
- `rules-core` (future extraction) — the rule evaluator and instruction executor
module, currently in `src/activity_core/rules/`.
- `project-core` (future) — project and initiative management; will use
activity-core to generate per-phase task sets.
- `ops-bridge` — SSH tunnel to remote Temporal server on Railiance.
---
## Architecture Decisions
- `docs/adr/adr-001-event-bridge-architecture.md` — overall Event Bridge pattern,
boundaries, State Hub relationship, domain assignment.
- `docs/adr/adr-002-definition-format.md` — markdown-as-definition format,
governance model, event type schema, ActivityDefinition structure.
- `docs/adr/adr-003-rule-instruction-model.md` — Rule DSL, Instruction safety
model, evaluation semantics, audit trail, testing strategy.
---
## Getting Oriented
- Start with: `INTENT.md` (why), this file (what), `docs/adr/` (decisions).
- Key source files: `src/activity_core/models.py` (domain model),
`src/activity_core/workflows.py` (RunActivityWorkflow),
`src/activity_core/activities.py` (Temporal activities),
`src/activity_core/event_router.py` (NATS → Temporal),
`src/activity_core/schedule_manager.py` (Temporal Schedules),
`src/activity_core/api.py` (FastAPI admin),
`src/activity_core/report_sinks.py` (instruction reports),
`src/activity_core/ops_evidence_sinks.py` (ops evidence),
and `src/activity_core/context_resolvers/` (external context adapters).
- Definition files: `event-types/`, `activity-definitions/`, and `tasks/`.
- Dev environment: `docker-compose.dev.yml` (Temporal + PostgreSQL + NATS).
- Entry points: `uv run python -m activity_core.worker` (Temporal worker),
`uv run uvicorn activity_core.api:app --port 8010` (admin API).
---
## Provided Capabilities
```capability
type: data
title: Durable event-triggered task factory
description: >
  Org-wide Event Bridge that receives time-based and domain events, evaluates
  declarative rules and LLM instructions against current org context, and emits
  structured task, report, and evidence outputs with a full spawn/report audit
  trail while leaving task lifecycle ownership downstream.
keywords: [temporal, workflow, event-bridge, task, report, evidence, cron, event, rule, instruction, org-automation]
```