diff --git a/INTENT.md b/INTENT.md
new file mode 100644
index 0000000..2c2910d
--- /dev/null
+++ b/INTENT.md
@@ -0,0 +1,130 @@
+---
+domain: capabilities
+repo: activity-core
+updated: "2026-05-14"
+---
+
+# INTENT
+
+> This file explains why activity-core exists — the problem it solves, the
+> principle that governs its boundaries, and what it must never become.
+
+---
+
+## Why it exists
+
+As the Coulomb organization grows — more repositories, services, deployments,
+and domains — coordination work that used to happen informally in one person's
+head needs a structural home. Recurring maintenance tasks (dependency scans,
+SBOM audits, consistency checks) get forgotten or implemented as bespoke cron
+jobs scattered across services. Cross-domain events (a new repo registered, a
+CVE published, a deployment completed) need coordinated responses that no single
+repo is positioned to own.
+
+activity-core exists so that **the Coulomb org can respond to what is happening
+in a structured, auditable, and automation-ready way** — without Bernd being
+the manual coordination layer.
+
+---
+
+## The governing principle
+
+activity-core answers three questions and only three:
+
+1. **When** — what triggers coordination work? (time, event, or one-off schedule)
+2. **What** — given current org context, what work must be created?
+3. **Where** — which repo, service, or agent should each work item land?
+
+It does not execute the work. It does not track task lifecycle. It does not
+manage projects or campaigns. Those belong to other systems.
+
+This constraint is intentional and load-bearing. An orchestrator that also
+stores task state, manages project phases, or executes work becomes a God object
+— the thing everything depends on and nobody can safely change. activity-core
+stays small and focused by refusing those responsibilities.
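As a reading aid for the three questions above, here is a minimal Python sketch of that boundary. It is not code from the repository: `Trigger`, `TaskRequest`, `plan`, and the `weekly-dependency-scan` name are all hypothetical. The only point it illustrates is that the output is a list of requests, with nothing executed and no task state kept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trigger:
    """When: what fired (hypothetical shape, not the repo's model)."""
    kind: str  # "cron", "scheduled", or "event"
    name: str  # e.g. a schedule id or an event type


@dataclass(frozen=True)
class TaskRequest:
    """What and where: a request handed to the task system, never run here."""
    title: str
    target: str  # repo, service, or agent that should receive the work


def plan(trigger: Trigger, org_context: dict) -> list[TaskRequest]:
    """Decide what work to request and where it should land.

    Pure function: no I/O, no stored task state, no execution.
    """
    if trigger.kind == "cron" and trigger.name == "weekly-dependency-scan":
        return [
            TaskRequest(title=f"Scan {repo} for dependency drift", target=repo)
            for repo, profile in org_context["repos"].items()
            if profile.get("language") == "python"
        ]
    return []


# Assumed, illustrative org context: one Python repo and one non-Python repo.
context = {
    "repos": {
        "activity-core": {"language": "python"},
        "docs-site": {"language": "markdown"},
    }
}
requests = plan(Trigger(kind="cron", name="weekly-dependency-scan"), context)
```

In this sketch, anything that would execute a scan or track whether the task was completed would live outside `plan`, which mirrors the refusal of responsibilities described above.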
+
+---
+
+## What it is
+
+**activity-core is the org-wide Event Bridge for the Coulomb organization.**
+
+It is an event loop governed by declarative rules and LLM instructions:
+
+- **Event sources**: cron schedules, one-off future datetimes, NATS domain
+  events (from the state hub, Gitea, Temporal, and other publishers), and
+  inbound webhooks from external systems.
+- **Context resolution**: before evaluating what to create, activity-core
+  resolves current org state — repository profiles from repo-scoping, domain
+  state from the state hub, and other context adapters.
+- **Rules and instructions**: deterministic rules (Python-like expression DSL,
+  evaluated by a sandboxed AST walker) handle cases where the right action is
+  fully specifiable. LLM instructions handle cases where human-like judgement
+  is needed to decide what tasks are appropriate. Both are defined as markdown
+  files, co-located with their intent and debugging guidance (see ACT-ADR-002,
+  ACT-ADR-003).
+- **Task emission**: the output of every activation is a set of task creation
+  requests sent to issue-core via a task emission adapter. activity-core records
+  the spawn event (what was created, when, referencing the issue-core task ID)
+  as an audit trail — not as the authoritative task record.
+
+---
+
+## What it is not
+
+| Concern | Owner |
+|---|---|
+| Task lifecycle (create, assign, track, close) | issue-core |
+| Project and initiative management | project-core (future) |
+| Repository capability profiling | repo-scoping |
+| Cross-domain coordination state | state hub |
+| Execution of automatable tasks | Temporal workers (per-repo) |
+| Event broker infrastructure | NATS (org infrastructure) |
+
+activity-core does not compete with the state hub — it extends it. The state
+hub is a read model of what is and has been; activity-core is the automation
+layer that reacts to that state and creates new work. The state hub delegates
+maintenance automation to activity-core by publishing lifecycle events on NATS.
+
+---
+
+## What it enables
+
+When activity-core is in place, Bernd can:
+
+- Define a rule once — "every Monday, scan all Python repos for dependency
+  drift and create a task for each one" — and trust it will run without manual
+  intervention.
+- Register an instruction — "when a new repo is registered in the `railiance`
+  domain, determine the appropriate onboarding tasks based on its profile and
+  domain standards" — and have an LLM agent make that judgement reproducibly.
+- Set up a one-off reminder — "on 2026-09-01, create a review task for the
+  Q3 architecture retrospective" — without managing a separate reminder system.
+- Observe a complete audit trail of every activation: what triggered it, what
+  rules matched, what tasks were created, and (for instructions) what prompt
+  and model produced the output.
+
+The Coulomb org gains **structured, auditable automation** that scales with the
+number of repos and domains without scaling the coordination burden on Bernd.
+
+---
+
+## Design values
+
+**Markdown-as-definition.** Event types, ActivityDefinitions, and task templates
+are markdown files checked into repositories. Intent, schema, logic, and
+debugging notes live together. Agents and humans can read, write, and review
+them without specialist tooling.
+
+**Rules before instructions.** Deterministic rules are always preferred over LLM
+instructions when the condition is fully expressible. Instructions are reserved
+for genuine judgement cases. This keeps most automation fast, cheap, testable,
+and auditable.
+
+**No task state ownership.** activity-core holds a spawn audit trail, not task
+state. The moment it starts tracking whether tasks are complete, blocked, or
+re-assigned, it has become a task database — and that is issue-core's job.
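The "Rules before instructions" value above relies on rules being evaluated by a sandboxed AST walker over a Python-like expression DSL. The sketch below shows one way such a walker could look, using only the standard-library `ast` module. It is an illustration under assumptions, not the repository's evaluator: the function names, the supported operators, and the choice to treat `event.domain` as a dictionary lookup are all hypothetical.

```python
import ast
import operator

# Comparison operators this sketch allows; any other operator is rejected.
_COMPARE = {
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
    ast.Lt: operator.lt, ast.LtE: operator.le,
    ast.Gt: operator.gt, ast.GtE: operator.ge,
    ast.In: lambda a, b: a in b,
    ast.NotIn: lambda a, b: a not in b,
}


def evaluate_rule(expression: str, names: dict) -> bool:
    """Evaluate a boolean expression without eval() or exec()."""
    tree = ast.parse(expression, mode="eval")
    return bool(_walk(tree.body, names))


def _walk(node, names):
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        if node.id not in names:
            raise NameError(f"unknown name: {node.id}")
        return names[node.id]
    if isinstance(node, ast.Attribute):
        # Assumption: attribute access is a key lookup on plain dicts only.
        base = _walk(node.value, names)
        if not isinstance(base, dict) or node.attr not in base:
            raise NameError(f"unknown attribute: {node.attr}")
        return base[node.attr]
    if isinstance(node, (ast.List, ast.Tuple)):
        return [_walk(element, names) for element in node.elts]
    if isinstance(node, ast.BoolOp):
        values = (_walk(value, names) for value in node.values)
        return all(values) if isinstance(node.op, ast.And) else any(values)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
        return not _walk(node.operand, names)
    if isinstance(node, ast.Compare):
        left = _walk(node.left, names)
        for op, comparator in zip(node.ops, node.comparators):
            compare = _COMPARE.get(type(op))
            if compare is None:
                raise ValueError(f"disallowed operator: {type(op).__name__}")
            right = _walk(comparator, names)
            if not compare(left, right):
                return False
            left = right
        return True
    # Calls, subscripts, lambdas, comprehensions, and the rest land here.
    raise ValueError(f"disallowed syntax: {type(node).__name__}")


# Illustrative event attributes and resolved context (assumed shapes).
names = {
    "event": {"type": "org.repo.registered", "domain": "railiance"},
    "repo": {"language": "python"},
}
matched = evaluate_rule(
    'event.domain == "railiance" and repo.language in ["python", "go"]', names
)
```

Because the walker only handles an explicit allow-list of node types, an expression such as `__import__("os")` parses as a call node and is rejected rather than executed.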
+
+**Publisher-declared event governance.** Producers of org events register their
+event types by committing definition files. Curator review is a configurable
+gate per environment — never a permanent bottleneck.
diff --git a/SCOPE.md b/SCOPE.md
index a934ab6..c6743eb 100644
--- a/SCOPE.md
+++ b/SCOPE.md
@@ -1,89 +1,213 @@
+---
+domain: capabilities
+repo: activity-core
+updated: "2026-05-14"
+---
+
 # SCOPE
 
 > This file helps you quickly understand what this repository is about,
 > when it is relevant, and when it is not.
-> It is intentionally lightweight and may be incomplete.
 
 ---
 
 ## One-liner
 
-Event-driven task factory backbone using Temporal — defines ActivityDefinitions that spawn TaskInstances when triggered by cron schedules or domain events, with durable execution and PostgreSQL persistence.
+activity-core is the org-wide Event Bridge for the Coulomb organization — a
+rule-governed event loop that receives time-based and domain events, evaluates
+declarative rules and LLM instructions against current org context, and emits
+structured task sets to issue-core.
 
 ---
 
 ## Core Idea
 
-Activity-core replaces ad-hoc cron jobs and Celery tasks with a durable Temporal-based workflow engine. An `ActivityDefinition` declares what triggers it (cron or event), how to resolve context, and what task templates to spawn. When triggered, a `RunActivityWorkflow` executes durably, creating `TaskInstance` records and run logs. Server crashes replay automatically from Temporal's event log.
+An `ActivityDefinition` (a markdown file checked into a repo) declares a trigger
+(cron, one-off scheduled datetime, or named event type), context sources to
+resolve before evaluation, and a set of rules and instructions that determine
+what tasks to create. When triggered, a durable Temporal workflow loads the
+definition, resolves context, evaluates the rule/instruction set, and emits task
+creation requests to issue-core. Everything is auditable: the spawn log records
+the triggering event, matched rule, and resulting task references.
+
+The two evaluation modes:
+- **Rule** — deterministic condition (sandboxed Python-like DSL) → fixed task
+  templates. Fast, testable, no LLM cost.
+- **Instruction** — optional pre-filter condition → LLM prompt with trusted
+  fields only → structured task list. For cases where the right tasks depend
+  on context that is easier to describe than to enumerate.
 
 ---
 
 ## In Scope
 
-- Domain model: ActivityDefinition, EventEnvelope, CronTriggerConfig, EventTriggerConfig, RunActivityWorkflow, TaskInstance
-- Trigger types: 5-field cron (with timezone, jitter, misfire policy) or event-based (EventEnvelope type + payload filters)
-- EventEnvelope normalization: standard internal event format for all inbound events
-- Context resolution: load definition from DB, resolve context sources, evaluate rules
-- Task spawning: generate TaskInstance records as child workflows/activities
-- Run logging: audit trail of activity executions, task creation, completion
+- **ActivityDefinition model**: trigger config (cron / scheduled / event),
+  context sources, rules (condition + action), instructions (trusted-field
+  prompt + model + output schema).
+- **Event type registry**: publisher-declared markdown definitions with
+  attribute schemas, example payloads, and intent documentation.
+  Curator-gating configurable per runtime environment.
+- **Trigger types**: 5-field cron with timezone and misfire policy; one-off
+  scheduled datetime; event-type subscription via NATS.
+- **Context resolution adapters**: repo-scoping (repository capability queries),
+  state hub (domain and workstream state), extensible for other sources.
+- **Rule evaluator**: sandboxed AST walker for Python-like boolean expressions
+  over event attributes and resolved context. No `exec()`.
+- **Instruction executor**: trusted-field prompt rendering, LLM call via
+  llm-connect, structured output validation, optional curator review queue.
+- **Task emission adapter**: abstraction over issue-core; current transport is
+  REST; designed to migrate to NATS subscription without code changes.
+- **Spawn audit log**: every task emission recorded with rule/instruction id,
+  triggering event id, model and prompt hash (instructions), issue-core task ref.
+- **Webhook receiver**: HTTP endpoint normalising inbound Gitea/GitHub webhook
+  payloads to EventEnvelope format and publishing to NATS.
+- **Worker and workflow infrastructure**: Temporal-based durable execution —
+  `RunActivityWorkflow` orchestrates load → resolve → evaluate → emit.
+- **REST admin API** (FastAPI): CRUD for ActivityDefinitions, manual trigger,
+  event type registry queries.
+- **Prometheus metrics**: Temporal SDK metrics exposed for scraping.
+- **Operational runbook**: `docs/runbook.md`.
 
 ---
 
 ## Out of Scope
 
-- Temporal server hosting/operations (activity-core consumes Temporal SDK; infra is separate)
-- End-user task UI (TaskInstance records are created; presentation is a separate concern)
-- Event broker integration (upstream router delivers EventEnvelopes; activity-core receives them)
-- Synchronous request-response workflows (Temporal is async-first)
+- **Task lifecycle** — creating, assigning, tracking, and closing tasks is
+  issue-core's responsibility. activity-core holds a spawn audit trail only.
+- **Project and initiative management** — phased, completion-gated, multi-step
+  coordinated changes belong to project-core (future).
+- **Execution of automatable tasks** — Temporal Activities that do real work
+  (run a scan, apply a patch, call an API) live in per-repo workers, not here.
+- **Event broker hosting** — NATS JetStream is org infrastructure; activity-core
+  consumes it but does not own its lifecycle.
+- **Temporal server hosting** — activity-core uses the Temporal SDK; the server
+  runs on Railiance infrastructure (or Docker Compose for dev).
+- **End-user task UI** — tasks land in issue-core; presentation is separate.
+- **Synchronous request-response patterns** — Temporal is async-first.
 
 ---
 
 ## Relevant When
 
-- Need durable, time-triggered or event-triggered task generation with crash resilience
-- Want an audit trail of activity runs and spawned task instances
-- Replacing unreliable cron + Celery patterns with replay-safe Temporal workflows
+- You need org-wide recurring maintenance automation (dependency scans, SBOM
+  checks, staleness audits) without bespoke per-service cron jobs.
+- You need reactive task generation: "when X happens across the org, create
+  structured tasks in the right repos."
+- You need one-off future task scheduling without a separate reminder system.
+- You want an auditable record of what triggered what and why.
+- You are replacing scattered bespoke cron jobs and manual coordination with
+  a governed, observable automation layer.
 
 ---
 
 ## Not Relevant When
 
-- Simple system cron jobs with no durability requirement
-- Synchronous request-response patterns
-- Existing Celery infrastructure is working (migration is non-trivial)
+- You need to track whether a task is done, blocked, or reassigned → issue-core.
+- You need to coordinate a multi-phase project with dependencies → project-core.
+- You need a simple system cron with no durability requirement.
+- You need synchronous request-response patterns.
 
 ---
 
 ## Current State
 
-- Status: concept → planning (pre-alpha)
-- Implementation: ~30% — domain model defined (EventEnvelope, ActivityDefinition, trigger configs); PostgreSQL schema planned; Temporal workflows.py + activities.py not yet scaffolded
-- Stability: unstable — models are valid but untested in real workflows
-- Usage: none yet; proto-planning phase
+- **Status**: active — WP-0001 (Foundation) and WP-0002 (Triggers & Ops) complete.
+- **Implementation**: core is functional. `RunActivityWorkflow`, `TaskExecutorWorkflow`
+  (stub), PostgreSQL schema (activity_definitions, activity_runs, task_instances),
+  Temporal Schedules (cron), NATS Event Router, FastAPI admin API, Prometheus
+  metrics, and operational runbook are all implemented.
+- **Next**: WP-0003 — event type registry, rule/instruction model, task emission
+  adapter, webhook receiver, one-off `scheduled` trigger type, INTENT.md and
+  SCOPE.md rewrite (this file). Architecture established in ACT-ADR-001/002/003.
+- **Stability**: core workflow is stable; the rule/instruction layer and registry
+  are not yet implemented.
 
 ---
 
 ## How It Fits
 
-- Upstream dependencies: Temporal (orchestration engine), PostgreSQL (persistence), ops-bridge (network tunnel to remote Temporal server)
-- Downstream consumers: Custodian State Hub tracks activity-core workstreams; tasks spawned feed into other systems
-- Often used with: kaizen-agentic (project scaffolding), the-custodian (workstream tracking), railiance (Temporal server may run on Railiance infrastructure)
+```
+[NATS JetStream]  ← publishers: state hub, Gitea webhooks, Temporal signals, cron
+        ↓
+[activity-core]   ← event type registry, rule evaluator, instruction executor
+        ↓
+[issue-core]      ← task lifecycle, assignment, tracking (Gitea / SQLite / GitHub)
+        ↓
+[repos/services]  ← execution: actual code changes, scans, operations
+```
+
+- **Upstream**: NATS (event bus), Temporal (durable workflow engine), PostgreSQL
+  (definitions and audit log), repo-scoping (context adapter), state hub (context
+  adapter and event publisher).
+- **Downstream**: issue-core (task management). Agents and humans pick up tasks
+  from issue-core and do the actual work.
+- **Coordinates with**: the state hub delegates maintenance automations to
+  activity-core by publishing lifecycle events; activity-core never writes to
+  the state hub directly.
 
 ---
 
 ## Terminology
 
-- Preferred terms: ActivityDefinition, EventEnvelope, RunActivityWorkflow, TaskInstance, trigger, misfire policy
-- Also known as: "the task factory", "activity backbone"
-- Potentially confusing terms: "Activity" is a Temporal concept (a single executable step); "ActivityDefinition" is the app-level record that configures what gets spawned
+- **ActivityDefinition** — a markdown file declaring trigger, context sources,
+  rules, and instructions. The unit of automation policy.
+- **EventEnvelope** — the canonical internal event format; normalises all inbound
+  events (NATS, webhooks, cron signals) to a common structure.
+- **Rule** — deterministic condition expression + task template action. Evaluated
+  by a sandboxed AST walker.
+- **Instruction** — LLM-evaluated task generation with trusted-field prompt
+  interpolation and structured output schema enforcement.
+- **Event type** — a registered, schema-documented category of event (e.g.
+  `org.repo.registered`). Publisher-declared; curator-gated per environment.
+- **Spawn audit trail** — activity-core's local record of what tasks were emitted,
+  to which issue-core backend, and under which rule/instruction. Not the
+  authoritative task record.
+- Potentially confusing: **Activity** (Temporal concept — a single executable
+  step in a workflow) vs. **ActivityDefinition** (app concept — the policy record
+  that governs what gets spawned). These are different things.
 
 ---
 
 ## Related / Overlapping
 
-- `the-custodian` — tracks activity-core workstreams in State Hub (domain: custodian)
-- `ops-bridge` — provides network tunnel to remote Temporal server
-- `railiance-cluster` / `railiance-platform` — may host Temporal server
+- `issue-core` (formerly issue-facade) — downstream task management; receives
+  all task emission from activity-core.
+- `repo-scoping` — context adapter for repository capability queries.
+- `the-custodian` / state hub — context adapter for domain state; delegates
+  maintenance automation to activity-core via NATS events.
+- `rules-core` (future extraction) — the rule evaluator and instruction executor
+  module, currently in `src/activity_core/rules/`.
+- `project-core` (future) — project and initiative management; will use
+  activity-core to generate per-phase task sets.
+- `ops-bridge` — SSH tunnel to remote Temporal server on Railiance.
+
+---
+
+## Architecture Decisions
+
+- `docs/adr/adr-001-event-bridge-architecture.md` — overall Event Bridge pattern,
+  boundaries, state hub relationship, domain assignment.
+- `docs/adr/adr-002-definition-format.md` — markdown-as-definition format,
+  governance model, event type schema, ActivityDefinition structure.
+- `docs/adr/adr-003-rule-instruction-model.md` — Rule DSL, Instruction safety
+  model, evaluation semantics, audit trail, testing strategy.
+
+---
+
+## Getting Oriented
+
+- Start with: `INTENT.md` (why), this file (what), `docs/adr/` (decisions).
+- Key source files: `src/activity_core/models.py` (domain model),
+  `src/activity_core/workflows.py` (RunActivityWorkflow),
+  `src/activity_core/activities.py` (Temporal activities),
+  `src/activity_core/event_router.py` (NATS → Temporal),
+  `src/activity_core/schedule_manager.py` (Temporal Schedules),
+  `src/activity_core/api.py` (FastAPI admin).
+- Definition files (WP-0003): `event-types/` and `activity-definitions/`
+  (not yet created — coming in WP-0003).
+- Dev environment: `docker-compose.dev.yml` (Temporal + PostgreSQL + NATS).
+- Entry points: `uv run python -m activity_core.worker` (Temporal worker),
+  `uv run uvicorn activity_core.api:app --port 8010` (admin API).
 
 ---
 
@@ -92,14 +216,9 @@ Activity-core replaces ad-hoc cron jobs and Celery tasks with a durable Temporal
 ```capability
 type: data
 title: Durable event-triggered task factory
-description: Temporal-based workflow engine that spawns TaskInstance records on cron schedules or domain events with crash resilience and full replay — replaces ad-hoc cron/Celery patterns.
-keywords: [temporal, workflow, task, cron, event, durable, taskfactory]
+description: >
+  Org-wide Event Bridge that receives time-based and domain events, evaluates
+  declarative rules and LLM instructions against current org context, and emits
+  structured task sets to issue-core with a full spawn audit trail.
+keywords: [temporal, workflow, event-bridge, task, cron, event, rule, instruction, org-automation]
 ```
-
----
-
-## Getting Oriented
-
-- Start with: `CLAUDE.md` (session protocol, repo boundary), `wiki/ActivityCorePlan_chtgpt.md` (proto-architecture)
-- Key files / directories: `src/activity_core/models.py` (EventEnvelope, ActivityDefinition), `workplans/` (roadmap), `docker-compose.dev.yml` (local Temporal + PostgreSQL)
-- Entry points: models.py is the current starting point; workflows.py does not yet exist