tegwick 977a3bd97f Align activity-core scope boundaries

2026-06-18 15:11:48 +02:00


domain	repo	updated
capabilities	activity-core	2026-06-16

SCOPE

This file helps you quickly understand what this repository is about, when it is relevant, and when it is not.

One-liner

activity-core is the org-wide Event Bridge for the Coulomb organization — a rule-governed event loop that receives time-based and domain events, evaluates declarative rules and LLM instructions against current org context, and emits structured task, report, and evidence outputs without owning downstream task lifecycle.

Core Idea

An ActivityDefinition (a markdown file checked into a repo) declares a trigger (cron, one-off scheduled datetime, or named event type), context sources to resolve before evaluation, and a set of rules and instructions that determine what tasks to create. When triggered, a durable Temporal workflow loads the definition, resolves context, evaluates the rule/instruction set, and emits task creation requests to issue-core or configured dry-run/audit sinks. Instructions may also emit validated reports, and selected context resolvers may emit compact non-secret evidence. Everything is auditable: the spawn log records the triggering event, matched rule/instruction metadata, model/prompt hash where applicable, and resulting task references.
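The load, resolve, evaluate, emit sequence described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the repository's Temporal workflow: `Rule`, `Definition`, `run_activation`, the resolver callables, and the in-memory `spawn_log` list are hypothetical names chosen for the example, and the real definition is parsed from markdown rather than built in code.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Rule:
    rule_id: str
    condition: Callable[[dict, dict], bool]  # (event, context) -> bool
    task_template: dict


@dataclass
class Definition:
    name: str
    # Hypothetical shape: context key -> resolver called with the event.
    context_sources: dict[str, Callable[[dict], Any]]
    rules: list[Rule] = field(default_factory=list)


def run_activation(definition, event, emit, spawn_log):
    """Resolve context, evaluate rules, emit tasks, and record an audit entry."""
    context = {key: resolve(event) for key, resolve in definition.context_sources.items()}
    emitted = []
    for rule in definition.rules:
        if not rule.condition(event, context):
            continue
        task_ref = emit(rule.task_template)
        # One audit entry per emission: triggering event, matched rule, task ref.
        spawn_log.append(
            {"event_id": event["id"], "rule_id": rule.rule_id, "task_ref": task_ref}
        )
        emitted.append(task_ref)
    return emitted


# Usage with a stubbed resolver and a dry-run emitter (all values are made up).
definition = Definition(
    name="example-definition",
    context_sources={"stale_repos": lambda event: ["repo-a", "repo-b"]},
    rules=[
        Rule(
            rule_id="stale-sbom",
            condition=lambda event, context: len(context["stale_repos"]) > 0,
            task_template={"title": "Refresh SBOM"},
        )
    ],
)
spawn_log = []
refs = run_activation(
    definition,
    {"id": "evt-1", "type": "cron"},
    emit=lambda task: f"dry-run:{task['title']}",
    spawn_log=spawn_log,
)
```

Passing `emit` as a parameter mirrors the idea that the same activation can target issue-core or a dry-run/audit sink without changing the evaluation path.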

The two evaluation modes:

Rule — deterministic condition (sandboxed Python-like DSL) → fixed task templates. Fast, testable, no LLM cost.
Instruction — optional pre-filter condition → LLM prompt with trusted fields only → structured task list. For cases where the right tasks depend on context that is easier to describe than to enumerate.
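To make the Instruction mode more concrete, the sketch below shows one possible way to render a prompt from an allow-list of trusted fields, compute a prompt hash for the audit record, and validate a structured task list. It is an assumption-laden illustration: `TRUSTED_FIELDS`, `render_prompt`, `validate_tasks`, the `$field` template syntax, and the use of SHA-256 are all hypothetical choices, and the LLM call itself (made via llm-connect in the real service) is left out.

```python
import hashlib
import json
from string import Template

# Hypothetical allow-list; the real trusted-field set is defined by the repo.
TRUSTED_FIELDS = {"repo", "event_type"}


def render_prompt(template: str, fields: dict) -> tuple[str, str]:
    """Render a prompt from trusted fields only and return (prompt, sha256 hex)."""
    trusted = {k: str(v) for k, v in fields.items() if k in TRUSTED_FIELDS}
    # substitute() raises KeyError if the template names a field that was
    # filtered out or is missing, so untrusted input cannot be interpolated.
    prompt = Template(template).substitute(trusted)
    prompt_hash = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    return prompt, prompt_hash


def validate_tasks(raw: str) -> list[dict]:
    """Accept only a JSON list of objects that each carry a string title."""
    data = json.loads(raw)
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of tasks")
    for item in data:
        if not isinstance(item, dict) or not isinstance(item.get("title"), str):
            raise ValueError("each task needs a string 'title'")
    return data


prompt, prompt_hash = render_prompt(
    "Review $repo after $event_type",
    {
        "repo": "activity-core",
        "event_type": "org.repo.registered",
        "body": "untrusted free text that never reaches the prompt",
    },
)
tasks = validate_tasks('[{"title": "Check repository capabilities"}]')
```

Hashing the rendered prompt rather than the template means two activations with different trusted-field values produce different hashes, which is one reasonable reading of "prompt hash" in the audit description.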

In Scope

ActivityDefinition model: trigger config (cron / scheduled / event), context sources, rules (condition + action), instructions (trusted-field prompt + model + output schema).
Event type registry: publisher-declared markdown definitions with attribute schemas, example payloads, and intent documentation. Curator-gating configurable per runtime environment.
Trigger types: 5-field cron with timezone and misfire policy; one-off scheduled datetime; event-type subscription via NATS; manual one-shot API trigger; one-shot schedule smoke tests for recurring definitions.
Context resolution adapters: repo-scoping (repository capability queries), State Hub (domain/workstream state, SBOM status, daily triage digest, coding retro read model), and ops inventory (bounded HTTP/HTTPS probes of a non-secret service inventory). The adapter registry is extensible for other sources.
Rule evaluator: sandboxed AST walker for Python-like boolean expressions over event attributes and resolved context. Rule actions support safe context.* / event.* interpolation and explicit for_each per-item binding. No exec().
Instruction executor: trusted-field prompt rendering, LLM call via llm-connect, structured output validation, bounded validation-failure artifacts for report instructions, review-required audit metadata, and deterministic report sinks. A real downstream review queue is not implemented in this repo.
Task emission adapter: abstraction over issue-core; current transport is REST, with ISSUE_SINK_TYPE=null for dry-run/audit mode. It is designed to migrate to a durable issue-core-owned NATS command boundary when issue-core provides that contract.
Report sinks: instruction report outputs can be persisted to bounded local working memory and posted as State Hub progress events. These are reporting outputs, not task lifecycle ownership.
Ops evidence sinks: ops-inventory context sources can post compact non-secret ops_inventory_probe summaries to State Hub. Inter-Hub submission is present only as a gated/deferred sink result until operator-owned OPS_HUB_KEY custody and widget mapping are ready.
Spawn audit log: every task emission recorded with rule/instruction id, triggering event id, model and prompt hash (instructions), issue-core task ref.
Webhook receiver: HTTP endpoint normalising inbound Gitea/GitHub webhook payloads to EventEnvelope format and publishing to NATS.
Worker and workflow infrastructure: Temporal-based durable execution — RunActivityWorkflow orchestrates load → resolve → evaluate → emit.
REST admin API (FastAPI): CRUD for ActivityDefinitions, manual trigger, event type registry queries.
Prometheus metrics: Temporal SDK metrics exposed for scraping.
Operational runbook: docs/runbook.md.
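The rule evaluator item above describes a sandboxed AST walker with no `exec()`. A minimal sketch of that technique, using only Python's standard `ast` module, might look like the following. The function names, the supported node set, and the convention of exposing `event` and `context` as plain dictionaries are assumptions for illustration; the actual DSL in `src/activity_core/rules/` may accept a different grammar.

```python
import ast
import operator

_COMPARE = {
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
    ast.Lt: operator.lt, ast.LtE: operator.le,
    ast.Gt: operator.gt, ast.GtE: operator.ge,
    ast.In: lambda a, b: a in b,
    ast.NotIn: lambda a, b: a not in b,
}


def evaluate(expression: str, names: dict) -> bool:
    """Evaluate a boolean expression by walking its AST; nothing is executed."""
    tree = ast.parse(expression, mode="eval")
    return bool(_walk(tree.body, names))


def _walk(node, names):
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        return names[node.id]  # only explicitly provided roots are visible
    if isinstance(node, ast.Attribute):
        base = _walk(node.value, names)
        if not isinstance(base, dict):
            raise ValueError("attribute access is only allowed on mappings")
        return base[node.attr]  # dotted access maps to a dictionary lookup
    if isinstance(node, (ast.List, ast.Tuple)):
        return [_walk(element, names) for element in node.elts]
    if isinstance(node, ast.BoolOp):
        # Simplification: all operands are evaluated (no short-circuiting).
        values = [_walk(value, names) for value in node.values]
        return all(values) if isinstance(node.op, ast.And) else any(values)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
        return not _walk(node.operand, names)
    if isinstance(node, ast.Compare):
        left = _walk(node.left, names)
        for op, comparator in zip(node.ops, node.comparators):
            compare = _COMPARE.get(type(op))
            if compare is None:
                raise ValueError(f"unsupported operator: {type(op).__name__}")
            right = _walk(comparator, names)
            if not compare(left, right):
                return False
            left = right
        return True
    # Calls, subscripts, lambdas, comprehensions, etc. are rejected outright.
    raise ValueError(f"unsupported syntax: {type(node).__name__}")


matched = evaluate(
    "event.type == 'org.repo.registered' and context.sbom_age_days > 30",
    {"event": {"type": "org.repo.registered"}, "context": {"sbom_age_days": 45}},
)
```

Because the walker rejects every node type it does not recognise, an expression such as `__import__('os')` fails with `ValueError` instead of running.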

Out of Scope

Task lifecycle — creating, assigning, tracking, and closing tasks is issue-core's responsibility. activity-core holds a spawn audit trail only.
Project and initiative management — phased, completion-gated, multi-step coordinated changes belong to project-core (future).
Execution of automatable tasks — Temporal Activities that do real work (run a scan, apply a patch, call an API) live in per-repo workers, not here.
General ops execution — Kubernetes, SSH, tunnel, authenticated service checks, secret custody, OpenBao writes, and Inter-Hub widget/API-key provisioning belong to the owning operational repos and operator workflows. activity-core may record non-secret probe evidence; it must not become the ops control plane.
Service inventory authority — the Custodian inventory remains owned by the custodian/state-hub surface. activity-core may read a projected non-secret snapshot.
Event broker hosting — NATS JetStream is org infrastructure; activity-core consumes it but does not own its lifecycle.
Temporal server hosting — activity-core uses the Temporal SDK; the server runs on Railiance infrastructure (or Docker Compose for dev).
End-user task UI — tasks land in issue-core; presentation is separate.
Synchronous request-response patterns — Temporal is async-first.

Relevant When

You need org-wide recurring maintenance automation (dependency scans, SBOM checks, staleness audits) without bespoke per-service cron jobs.
You need reactive task generation: "when X happens across the org, create structured tasks in the right repos."
You need one-off future task scheduling without a separate reminder system.
You want an auditable record of what triggered what and why.
You need a scheduled, non-secret evidence note proving that declared service endpoints or access paths were observed, without executing privileged ops commands.
You are replacing scattered bespoke cron jobs and manual coordination with a governed, observable automation layer.

Not Relevant When

You need to track whether a task is done, blocked, or reassigned → issue-core.
You need to coordinate a multi-phase project with dependencies → project-core.
You need a simple system cron with no durability requirement.
You need synchronous request-response patterns.

Current State

Status: active production-backed service with two visible open gates: ACTIVITY-WP-0006 still waits on three clean consecutive scheduled daily triage runs and calibration feedback, and ACTIVITY-WP-0008 is blocked until Helix Forge publishes the upstream coding_retro read model needed to enable the Saturday schedule. ACTIVITY-WP-0007 is finished: the bounded ops-inventory probe/evidence slice has live Railiance evidence.
Implementation: core is functional. RunActivityWorkflow, TaskExecutorWorkflow (stub), PostgreSQL schema, Temporal Schedules and smoke schedules, NATS Event Router, FastAPI admin API, Prometheus metrics, event type registry, markdown ActivityDefinition parser/sync, rule evaluator, instruction executor, context resolvers, issue sink, report sinks, ops evidence sink, Kubernetes deployment, and operational runbook are all implemented.
Current definitions: weekly-sbom-staleness is enabled and demonstrates the deterministic rule/fan-out path. weekly-coding-retro is present and tested but intentionally disabled until live coding_retro evidence exists. Railiance projects the daily State Hub WSJF triage definition and the disabled ops-service-inventory probe definition from the runtime bundle.
Operational proof: the State Hub daily WSJF triage path has produced validated reports and working-memory notes, but the calibration gate is not closed. A 2026-06-16 recheck found State Hub daily_triage progress and working-memory daily-triage-* notes only through 2026-06-06, so there is not yet evidence for three clean consecutive scheduled runs after the June 7 runtime projection failure. The ops inventory probe path has live fallback evidence in State Hub; Inter-Hub per-entity submission remains deferred.
Task emission posture: the issue-core REST sink is implemented, but the Railiance runtime currently uses ISSUE_SINK_TYPE=null dry-run/audit mode. Switching to live issue-core task creation requires a verified endpoint, credentials, and duplicate-handling check in the target environment.
Stability: the main risk has shifted from construction to operational hardening and adoption. The last recorded full-suite pass in the workplans was 2026-06-04 (128 passed, 1 skipped), with later targeted coverage added for ops inventory, ops evidence sinks, Railiance projection wiring, and weekly coding retro parsing/rule behavior.
Next: close ACTIVITY-WP-0006-T03 with real scheduled-run calibration evidence; close ACTIVITY-WP-0008-T03 once upstream coding_retro publication exists and the dry-run/duplicate check passes; decide when to move selected task/report/evidence sinks from dry-run or fallback mode to their intended live backends.

Assessment Against Intent

activity-core now matches the core intent: it answers when coordination work should happen, what work should be created from current org context, and where each work item should land. The daily WSJF triage is the clearest judgement-oriented proof point; weekly SBOM staleness is the clearest deterministic-rule proof point.

The governing boundary still matters. activity-core should keep owning trigger durability, context resolution, rule/instruction evaluation, report/task emission, and spawn/report audit. It should not become the task lifecycle database, the project planner, or a general execution worker. The local TaskExecutorWorkflow remains a stub and should stay that way unless a future workplan explicitly rehomes execution responsibility.

One boundary nuance is now explicit: activity-core may post State Hub progress events as a configured report or evidence sink. That is acceptable because it records the result of an activity-core activation; it is not ownership of State Hub state, task lifecycle, or workstream planning.

The main drift risk is convenience creep: adding direct task tracking, project-phase state, or bespoke operational scripts because the Temporal substrate is already nearby. Future work should prefer declarative ActivityDefinitions, bounded context resolvers, and outbound adapters over new one-off control paths.

Known Gaps Against Intent

Scheduled-run trust gap: INTENT promises recurring coordination work that runs without Bernd as the manual coordination layer. The daily triage path is implemented, but its current calibration task still lacks three clean consecutive scheduled runs after the June 7 runtime failure. Until that closes, daily triage remains a production-backed capability with an evidence gap, not a fully proven standing substrate.
Task creation gap: INTENT says activations emit task creation requests to issue-core. The REST sink exists, but Railiance is still in ISSUE_SINK_TYPE=null mode. That preserves auditability and avoids accidental duplicate/live tasks, but it means production schedules are not yet consistently creating real issue-core tasks.
Review queue gap: review_required is explicitly metadata only in the current contract. No issue-core review queue integration exists here, so any future queue routing needs a downstream issue-core contract before high-impact instruction outputs rely on it.
Evidence backend posture: the State Hub fallback evidence path is the accepted current backend for ops_inventory_probe. Inter-Hub/ops-hub submission is deliberately deferred behind OPS_HUB_KEY, widget mapping, and operator approval, so per-entity ops evidence publication is future work.
Execution-boundary residue: TaskExecutorWorkflow is still registered as a stub that writes a done task_instances row. It should remain inert or be removed/re-homed before it attracts real execution work, because execution is explicitly outside activity-core's intent.
API exposure posture: the FastAPI surface stays ClusterIP-only for now. External ingress remains future work until an authenticated access policy is designed.

How It Fits

[NATS JetStream]  ←  publishers: State Hub, Gitea webhooks, Temporal signals, cron
       ↓
[activity-core]   ←  event type registry, rule evaluator, instruction executor
[activity-core]   →  [issue-core]  →  [repos/services]
[activity-core]   →  [report/evidence sinks]  →  [State Hub / working memory / future Inter-Hub]

Upstream: NATS (event bus), Temporal (durable workflow engine), PostgreSQL (definitions and audit log), repo-scoping (context adapter), State Hub (context adapter and event publisher).
Downstream: issue-core (task management) and configured report/evidence sinks. Agents and humans pick up tasks from issue-core and do the actual work. Railiance may use the null sink for dry-run/audit mode until live issue-core emission is approved.
Coordinates with: State Hub delegates maintenance automations to activity-core by publishing lifecycle events, and it also serves as a resolved context source. activity-core may post progress events as report/evidence outputs, but it does not own State Hub task/workstream state.
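The downstream boundary, with a live issue-core sink and a null sink for dry-run/audit mode, can be illustrated with a small adapter sketch. Only `ISSUE_SINK_TYPE=null` comes from this document; the `rest` value, the `ISSUE_CORE_URL` variable, the class names, and the default of falling back to the null sink are assumptions made for the example.

```python
import os
from typing import Protocol


class IssueSink(Protocol):
    def emit(self, task: dict) -> str: ...


class NullSink:
    """Dry-run/audit sink: records the task locally and creates nothing downstream."""

    def __init__(self) -> None:
        self.audit: list[dict] = []

    def emit(self, task: dict) -> str:
        self.audit.append(task)
        return f"dry-run-{len(self.audit)}"


class RestSink:
    """Placeholder for the REST transport; the HTTP call is omitted in this sketch."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url

    def emit(self, task: dict) -> str:
        raise NotImplementedError("REST call to issue-core is not part of this sketch")


def build_sink(env=None) -> IssueSink:
    env = os.environ if env is None else env
    sink_type = env.get("ISSUE_SINK_TYPE", "null")  # assumed default
    if sink_type == "null":
        return NullSink()
    if sink_type == "rest":  # assumed value name
        return RestSink(env["ISSUE_CORE_URL"])  # assumed variable name
    raise ValueError(f"unknown ISSUE_SINK_TYPE: {sink_type!r}")


sink = build_sink({"ISSUE_SINK_TYPE": "null"})
task_ref = sink.emit({"title": "Refresh SBOM", "repo": "example-repo"})
```

Keeping both sinks behind one `emit` interface is what would let a later NATS command boundary replace the REST transport without touching rule or instruction evaluation.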

Terminology

ActivityDefinition — a markdown file declaring trigger, context sources, rules, and instructions. The unit of automation policy.
EventEnvelope — the canonical internal event format; normalises all inbound events (NATS, webhooks, cron signals) to a common structure.
Rule — deterministic condition expression + task template action. Evaluated by a sandboxed AST walker.
Instruction — LLM-evaluated task generation with trusted-field prompt interpolation and structured output schema enforcement.
Report sink — configured persistence for instruction reports, currently working-memory markdown notes and State Hub progress events.
Evidence sink — configured persistence for compact non-secret resolver evidence, currently State Hub progress for ops inventory probes; Inter-Hub is a deferred gated target.
Event type — a registered, schema-documented category of event (e.g. org.repo.registered). Publisher-declared; curator-gated per environment.
Spawn audit trail — activity-core's local record of what tasks were emitted, to which issue-core backend, and under which rule/instruction. Not the authoritative task record.
Potentially confusing: Activity (Temporal concept — a single executable step in a workflow) vs. ActivityDefinition (app concept — the policy record that governs what gets spawned). These are different things.
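As an illustration of the EventEnvelope term, the sketch below normalises a push-style webhook payload into a common structure. The field names on `EventEnvelope`, the `from_push_webhook` helper, and the `org.repo.pushed` event type are hypothetical; the real model lives in `src/activity_core/models.py` and may differ. The only payload keys relied on are `repository.full_name` and `ref`, which both Gitea and GitHub push payloads carry.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class EventEnvelope:
    event_type: str
    source: str
    attributes: dict
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    occurred_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def from_push_webhook(payload: dict, source: str) -> EventEnvelope:
    """Map a push webhook payload onto the common envelope shape."""
    return EventEnvelope(
        event_type="org.repo.pushed",  # hypothetical event type name
        source=source,
        attributes={
            "repo": payload["repository"]["full_name"],
            "ref": payload.get("ref", ""),
        },
    )


envelope = from_push_webhook(
    {"repository": {"full_name": "example-org/example-repo"}, "ref": "refs/heads/main"},
    source="gitea",
)
```

Rules and instructions then only need to understand `event_type` and `attributes`, regardless of whether the event arrived via NATS, a webhook, or a cron signal.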

issue-core (formerly issue-facade) — downstream task management; receives all task emission from activity-core.
repo-scoping — context adapter for repository capability queries.
the-custodian / State Hub — context adapter for domain state; delegates maintenance automation to activity-core via NATS events.
llm-connect — instruction execution backend for judgement-oriented reports such as daily State Hub WSJF triage.
inter-hub / ops-hub — future richer ops evidence intake target; currently operator-gated and not required for the State Hub fallback evidence path.
rules-core (future extraction) — the rule evaluator and instruction executor module, currently in src/activity_core/rules/.
project-core (future) — project and initiative management; will use activity-core to generate per-phase task sets.
ops-bridge — SSH tunnel to remote Temporal server on Railiance.

Architecture Decisions

docs/adr/adr-001-event-bridge-architecture.md — overall Event Bridge pattern, boundaries, state hub relationship, domain assignment.
docs/adr/adr-002-definition-format.md — markdown-as-definition format, governance model, event type schema, ActivityDefinition structure.
docs/adr/adr-003-rule-instruction-model.md — Rule DSL, Instruction safety model, evaluation semantics, audit trail, testing strategy.

Getting Oriented

Start with: INTENT.md (why), this file (what), docs/adr/ (decisions).
Key source files: src/activity_core/models.py (domain model), src/activity_core/workflows.py (RunActivityWorkflow), src/activity_core/activities.py (Temporal activities), src/activity_core/event_router.py (NATS → Temporal), src/activity_core/schedule_manager.py (Temporal Schedules), src/activity_core/api.py (FastAPI admin), src/activity_core/report_sinks.py (instruction reports), src/activity_core/ops_evidence_sinks.py (ops evidence), and src/activity_core/context_resolvers/ (external context adapters).
Definition files: event-types/, activity-definitions/, and tasks/.
Dev environment: docker-compose.dev.yml (Temporal + PostgreSQL + NATS).
Entry points: uv run python -m activity_core.worker (Temporal worker), uv run uvicorn activity_core.api:app --port 8010 (admin API).

Provided Capabilities

type: data
title: Durable event-triggered task factory
description: >
  Org-wide Event Bridge that receives time-based and domain events, evaluates
  declarative rules and LLM instructions against current org context, and emits
  structured task, report, and evidence outputs with a full spawn/report audit
  trail while leaving task lifecycle ownership downstream.
keywords: [temporal, workflow, event-bridge, task, report, evidence, cron, event, rule, instruction, org-automation]
