diff --git a/docs/adr/adr-001-event-bridge-architecture.md b/docs/adr/adr-001-event-bridge-architecture.md new file mode 100644 index 0000000..e3ea1f0 --- /dev/null +++ b/docs/adr/adr-001-event-bridge-architecture.md @@ -0,0 +1,190 @@ +--- +id: ACT-ADR-001 +type: architecture-decision-record +title: "Activity-Core as Coulomb Org Event Bridge" +status: accepted +decided_by: Bernd Worsch +date: "2026-05-14" +scope: cross-repo +affects: + - activity-core + - the-custodian/state-hub + - issue-facade (→ issue-core) + - repo-scoping +tags: ["architecture", "event-bridge", "activity-core", "orchestration", "event-loop"] +--- + +# ACT-ADR-001: Activity-Core as Coulomb Org Event Bridge + +## Status + +Accepted. + +## Context + +The Coulomb organization's set of repositories, services, and deployments is growing +beyond what a single person can coordinate manually. The state hub tracks cross-domain +state but has no mechanism to automatically respond to it. Recurring maintenance +(dependency scans, SBOM staleness checks, consistency audits) is implemented as +bespoke cron jobs baked into individual services — scattered, hard to audit, and +impossible to govern from a single vantage point. + +Three forces drive the need for a dedicated orchestration layer: + +1. **Scale**: as the repo count grows, manual coordination becomes the bottleneck. +2. **Reactivity**: org-level events (new repo registered, CVE published, deployment + completed) should trigger coordinated responses without human intervention. +3. **Separation of concerns**: the state hub is a read model and should remain one. + It must not accumulate automation logic to avoid becoming a God object. + +## Decision + +**activity-core is the org-wide Event Bridge for the Coulomb organization.** + +Its responsibility is exactly three things: +1. **Receive events** — time-based (cron, one-off scheduled) and domain events (NATS, + Gitea webhooks, state hub lifecycle signals). +2. **Evaluate rules and instructions** — given event payload and resolved context, + determine what work must be created. +3. **Emit task sets** — publish structured task creation requests to issue-core. + +It does not execute work. It does not track task lifecycle. It does not manage projects. + +### Boundary rules + +| Concern | Owner | +|---|---| +| Cross-org task scheduling and reactive automation | **activity-core** | +| Task lifecycle (create, assign, track, close) | **issue-core** | +| Project and initiative management (phased, completion-gated) | **project-core** (future) | +| Repository capability profiling | **repo-scoping** | +| Cross-domain coordination state | **state hub** | +| Execution of automatable tasks | Temporal workers (per-repo) | + +### Event type registry + +Event types are declared by publishers as markdown definition files (see ACT-ADR-002). +Governance is **publisher-declared by default**: a publisher registers its event types +by committing definition files to the event-types registry. In production environments, +a curator gate can be enabled — registry entries must be reviewed before the runtime +accepts events of that type. This is a configuration flag per runtime scope (dev, +staging, prod), not a hard-coded rule. + +### State hub relationship + +The state hub **delegates automation to activity-core** rather than implementing it +internally. Concretely: + +- Maintenance jobs currently baked into the state hub (consistency sync, SBOM + staleness checks) are migrated to ActivityDefinitions in activity-core. +- The state hub becomes a **publisher** of lifecycle events on NATS + (`org.workstream.created`, `org.decision.resolved`, `org.repo.registered`, etc.). +- The state hub does not subscribe to activity-core's output directly; it reads + task state from issue-core when needed. + +This preserves the state hub as a read model and makes activity-core the single +home for automation policy. + +### rules-core: module-first + +The rule and instruction evaluation engine starts as `src/activity_core/rules/` — a +module with a clean internal boundary (no imports from Temporal, Postgres, or FastAPI +within the module). Extraction to a standalone `rules-core` repository happens when a +**second consumer** (e.g. state hub governance, project-core) needs the engine. This +follows the same discipline as the task-flow-engine extraction plan (CUST-TFE-SCOPE). + +### NATS as org infrastructure + +NATS JetStream is promoted from an activity-core internal component to **org-wide +event bus infrastructure**. It runs as a standalone service (not bundled in +activity-core's docker-compose) with its own lifecycle. All services that publish +or subscribe to org events do so via NATS streams. + +### issue-core integration + +activity-core communicates with issue-core via a **task emission adapter** — an +abstraction layer that, in the initial implementation, calls issue-core's REST API. +The adapter interface is defined now; the transport can migrate to NATS subscription +(issue-core subscribes to `task.spawned` events) once issue-core adds that capability. +This avoids hardcoding REST coupling throughout the codebase. + +### Webhook receiver + +A new HTTP endpoint within activity-core accepts inbound webhooks from Gitea (and +later GitHub, other services). It normalises payloads to the canonical EventEnvelope +format, validates against the event type registry, and publishes to NATS. This runs +alongside the existing FastAPI `api.py`. + +### Domain assignment + +activity-core and issue-core are assigned to the **`capabilities`** domain — the +same domain as repo-scoping. These are org-wide infrastructure tools that serve all +domains equally, not artefacts of any single project or custodian's personal workflow. +issue-core is explicitly disassociated from the markitect domain. + +## Trigger types + +Three trigger types are supported: + +| Type | Description | Temporal mechanism | +|---|---|---| +| `cron` | Recurring schedule (5-field cron + timezone + misfire policy) | Temporal Schedule (implemented WP-0002) | +| `event` | React to a named event type on NATS | Temporal workflow started by Event Router | +| `scheduled` | One-off at a future datetime | Temporal Schedule with `remaining_actions: 1` | + +`scheduled` is a new trigger type added in WP-0003. + +## Consequences + +### Immediate + +- activity-core's `INTENT.md` and `SCOPE.md` are rewritten to reflect this architecture. +- The `task_instances` Postgres table is reclassified as a **spawn audit trail** — + it records the act of spawning (what was created, when, which issue-core reference) + but is not the authoritative task record. Authoritative lifecycle state lives in + issue-core. +- A task emission adapter interface (`src/activity_core/issue_sink.py`) replaces any + direct Postgres writes to `task_instances` with calls through the adapter. +- The `TaskExecutorWorkflow` stub from WP-0001 is replaced with the actual adapter + call in WP-0003. + +### Medium term + +- State hub adds NATS publishing to its lifecycle operations. +- Gitea webhook receiver added to activity-core as a new HTTP router. +- Existing state hub maintenance crons are migrated to ActivityDefinitions. +- issue-facade is renamed issue-core and re-registered under the `capabilities` domain. + +### Long term + +- rules-core extracted as a standalone package when a second consumer appears. +- project-core created (depends on task-flow-engine extraction) for multi-phase + initiative management — explicitly out of scope for activity-core. +- NATS gets its own operational runbook and monitoring as org infrastructure. + +## Alternatives Considered + +**State hub absorbs activity-core functionality**: rejected — turns the state hub into +a God object, violates the read-model boundary, and makes automation logic impossible +to test independently. + +**Per-repo automation (GitHub Actions style)**: rejected — cross-repo coordination +requires a single vantage point that can see all repos; per-repo actions can't express +org-level triggers or context. + +**Activity-core as a thin Temporal wrapper only**: rejected — without the event type +registry and rule model, it's just a scheduler. The governance and introspection +properties are the point. + +**Separate rules-core from day one**: rejected — premature extraction adds dependency +management overhead before a second consumer exists. Module-first with a clean boundary +costs nothing and preserves the extraction option. + +## Related + +- ACT-ADR-002 — Event type and ActivityDefinition definition format +- ACT-ADR-003 — Rule vs. Instruction model and DSL +- CUST-ADR-001 — Workplans as repository artefacts (canon/architecture/) +- CUST-TFE-SCOPE-2026-000001 — task-flow-engine extraction plan (canon/projects/) +- activity-core INTENT.md (to be written) +- activity-core WP-0003 (to be written) diff --git a/docs/adr/adr-002-definition-format.md b/docs/adr/adr-002-definition-format.md new file mode 100644 index 0000000..df3776d --- /dev/null +++ b/docs/adr/adr-002-definition-format.md @@ -0,0 +1,356 @@ +--- +id: ACT-ADR-002 +type: architecture-decision-record +title: "Markdown-as-Definition Format for Event Types and ActivityDefinitions" +status: accepted +decided_by: Bernd Worsch +date: "2026-05-14" +scope: cross-repo +affects: + - activity-core + - any event publisher registering event types +tags: ["architecture", "format", "event-type", "activity-definition", "markdown", "documentation"] +--- + +# ACT-ADR-002: Markdown-as-Definition Format + +## Status + +Accepted. + +## Context + +Event type schemas and ActivityDefinition rules need to be understood and authored +by three distinct audiences simultaneously: humans reviewing and debugging automation, +agents creating and modifying definitions at runtime, and machines parsing and +evaluating them. Traditional approaches split these concerns — schemas go in JSON +Schema or YAML, documentation goes in a wiki, logic goes in code — and they drift +apart. A bug in a rule requires cross-referencing three places to understand intent, +check the schema, and read the condition. + +The Custodian ecosystem already uses markdown files with YAML frontmatter as the +authoritative format for workplans, ADRs, SCOPE.md, and INTENT.md — all understood +by humans and agents without additional tooling. The same pattern should apply here. + +## Decision + +**Event type definitions and ActivityDefinitions are markdown files** where machine- +parseable structure (frontmatter YAML and fenced definition blocks) is embedded within +human-readable narrative. Intent, schema, logic, and debugging notes live in one file. + +### Event Type Definition Files + +**Location**: `event-types/{namespace}.{event-name}.md` within the activity-core repo +(or a registered event-types registry repo if volumes justify separation). + +**Naming convention**: `{publisher-domain}.{noun}.{verb}.md`, e.g.: +- `org.repo.registered.md` +- `org.security.cve.published.md` +- `org.workstream.completed.md` + +**Structure**: + +```markdown +--- +id: org.repo.registered +type: event-type +version: "1.0" +publisher: the-custodian/state-hub +governance: publisher-declared # publisher-declared | curated +status: active # active | deprecated | draft +introduced: "2026-05-14" +--- + +# Event: org.repo.registered + +## Intent + +One-paragraph statement of why this event exists and what it signals. +Written for an agent or human who has never seen it before. + +## When Published + +Bulleted list of the exact conditions under which the publisher fires this event. +Be precise — ambiguity here causes missed or duplicate activations. + +## Attributes + +| Attribute | Type | Required | Description | +|---|---|---|---| +| `repo_slug` | string | yes | URL-safe repository identifier | +| `domain` | string | yes | Domain slug the repo is assigned to | +| `tags` | string[] | no | Capability tags set at registration time | +| `registered_at` | datetime | yes | ISO 8601 UTC timestamp | + +## Example Payload + +​```json +{ + "id": "evt-7f3a1b2c", + "type": "org.repo.registered", + "version": "1.0", + "timestamp": "2026-05-14T10:00:00Z", + "publisher": "the-custodian/state-hub", + "attributes": { + "repo_slug": "new-python-service", + "domain": "railiance", + "tags": ["python-service", "fastapi"], + "registered_at": "2026-05-14T10:00:00Z" + } +} +​``` + +## Consumer Notes + +Guidance for agents and humans writing rules against this event type: +- Which attributes are safe for instruction prompts (trusted fields) +- Common misuses or gotchas +- Related events that are often used together + +## Debugging + +What to check when an activity that subscribes to this event does not fire: +- How to verify the event was published (NATS subject, log entry) +- How to inspect the event payload in the registry +- Common schema validation failures +``` + +### Attribute Types + +The type system for event attributes is intentionally small: + +| Type | Notes | +|---|---| +| `string` | UTF-8 string | +| `integer` | 64-bit signed integer | +| `float` | 64-bit float | +| `boolean` | true / false | +| `datetime` | ISO 8601 UTC string in payload, parsed to datetime in evaluator | +| `uuid` | String in payload, validated as UUID v4 | +| `string[]` | JSON array of strings | +| `integer[]` | JSON array of integers | +| `object` | Freeform JSON object — cannot be used in rule conditions; instruction-only | + +`object` type attributes are available to instructions but excluded from rule +conditions deliberately — rules must be deterministic and schema-validatable. + +### ActivityDefinition Files + +**Location**: `activity-definitions/{slug}.md` within the repo that owns the automation. +For org-wide automations: `activity-core/activity-definitions/`. +For domain-specific automations: `{domain-repo}/activity-definitions/`. + +**Structure**: + +```markdown +--- +id: ACT-DEF-onboard-python-repo +type: activity-definition +version: "1.0" +status: active +trigger: + type: event # event | cron | scheduled + event_type: org.repo.registered # for type: event + # cron: "0 9 * * 1" # for type: cron (5-field, UTC) + # timezone: "Europe/Berlin" # optional, cron only + # misfire_policy: skip # skip | catchup | compress (cron only) + # at: "2026-06-01T09:00:00Z" # for type: scheduled (one-off) +context_sources: + - type: repo-scoping + query: repo_profile + bind_to: context.repo_profile + - type: state-hub + query: domain_summary + bind_to: context.domain_summary +governance: publisher-declared +owner: custodian-agent +created: "2026-05-14" +--- + +# ActivityDefinition: Onboard New Python Service + +## Purpose + +One paragraph. What does this automation do and why does it exist? What problem +would accumulate if this automation were turned off? + +## Trigger + +Which event type fires this activity, and under what conditions does it apply? +Cross-reference the event type definition file. + +## Context Sources + +What context is resolved before rules are evaluated? Explain what each source +provides and why it is needed. + +## Rules + +Each rule is a fenced block tagged `rule`. Rules are evaluated in order; all +matching rules fire (not first-match-only). See ACT-ADR-003 for the expression +language specification. + +​```rule +id: create-sbom-scan +condition: '"python-service" in event.attributes.tags' +action: + task_template: tasks/sbom-initial-scan.md + target_repo: event.attributes.repo_slug + priority: high + labels: ["onboarding", "security"] +​``` + +​```rule +id: create-scope-generation +condition: '"python-service" in event.attributes.tags and context.repo_profile.scope_md_exists == false' +action: + task_template: tasks/generate-scope-md.md + target_repo: event.attributes.repo_slug + priority: medium + labels: ["onboarding", "documentation"] +​``` + +## Instructions + +Instructions are evaluated after all rules. An instruction asks an LLM to decide +what additional tasks (if any) to create. See ACT-ADR-003 for safety requirements. + +​```instruction +id: domain-specific-onboarding +condition: 'event.attributes.domain != "test_domain_v2"' +trusted_fields: + - event.attributes.repo_slug + - event.attributes.domain + - event.attributes.tags +model: claude-sonnet-4-6 +review_required: false +prompt: | + A new repository has been registered in the Coulomb organization. + + Repository: {event.attributes.repo_slug} + Domain: {event.attributes.domain} + Tags: {event.attributes.tags} + + Based on the domain's current standards and the repository profile above, + determine what additional domain-specific onboarding tasks should be created + beyond the standard SBOM scan and SCOPE.md generation. Return an empty list + if no additional tasks are warranted. +output_schema: tasks/task-template-list-schema.json +​``` + +## Task Templates + +References to task template files used in rule actions. Each template is a +separate markdown file under `tasks/` that defines the task title, description +template, default labels, and default assignee logic. + +- `tasks/sbom-initial-scan.md` +- `tasks/generate-scope-md.md` + +## Notes + +Operational notes, edge cases, and context that does not fit elsewhere. + +## Debugging + +Checklist for when this ActivityDefinition fires but produces unexpected output: + +1. Was the triggering event published with the correct type and attributes? +2. Do the rule conditions evaluate as expected? (Use `make eval-rule` with a fixture) +3. Is issue-core reachable and configured for the target domain? +4. For instructions: check the audit log for the model response and output validation result. + +## Change History + +- v1.0 (2026-05-14): Initial definition +``` + +### Governance model + +The `governance` field on an event type definition determines how the registry +runtime handles it: + +| Value | Behaviour | +|---|---| +| `publisher-declared` | Accepted immediately on publish; no review required | +| `curated` | Held in `pending` state until a curator approves via registry API | + +The runtime checks the **environment's curator gate configuration** — not just the +file's governance field. An environment configured with `curator_gate: disabled` +treats all event types as `publisher-declared` regardless of the field value. +An environment with `curator_gate: required` treats all event types as `curated` +regardless of the field value. The field is the publisher's declared preference; +the environment config is the enforcement point. + +This means: +- **Dev / integration**: `curator_gate: disabled` — developers and agents iterate + freely; new event types take effect immediately. +- **Staging / production**: `curator_gate: required` — all new event types queue + for curator review before the runtime accepts events of that type. + +### File as source of truth + +Following CUST-ADR-001 (Workplans as Repository Artefacts), definition files are +the canonical source of truth. The activity-core runtime indexes them into its +database on startup and via a sync command. The database is a queryable cache, +not the origin. A definition deleted from the filesystem is disabled at next sync. + +### Task Templates + +Task templates are separate markdown files (`tasks/{slug}.md`) referenced from +ActivityDefinition action blocks. They define: + +```markdown +--- +id: tasks/sbom-initial-scan +type: task-template +--- +# Task: Run Initial SBOM Scan + +## Title template +`Run SBOM scan — {target_repo}` + +## Description template +Initial SBOM scan required for newly registered repository `{target_repo}`. +Run: `make ingest-sbom REPO={target_repo} SCAN=1` + +## Default labels +["sbom", "security", "automated"] + +## Default assignee +None (unassigned) +``` + +This keeps task content editable separately from the routing logic in +ActivityDefinitions. + +## Consequences + +- A new `event-types/` directory in activity-core (and eventually a shared registry) + holds all org event type definitions. +- A new `activity-definitions/` directory in activity-core holds org-wide automations. +- Domain repos may hold their own `activity-definitions/` for domain-specific + automations, scanned by activity-core at sync time. +- The runtime requires a parser for the `rule` and `instruction` fenced blocks. +- SCOPE.md for activity-core must be updated to list these directories. + +## Alternatives Considered + +**Pure JSON Schema for event types, separate wiki for docs**: rejected — documentation +and schema diverge immediately; agents must cross-reference two systems to author +a rule correctly. + +**OpenAPI / AsyncAPI specification**: rejected — those formats are excellent for +API and broker documentation but not designed for co-locating operational intent +and debugging guidance. They are also less readable for non-specialists. + +**Code-only (Python dataclasses for event schemas, Python functions for rules)**: +rejected — requires code deployment for any definition change; agents cannot modify +definitions without write access to the codebase; non-technical stakeholders cannot +review or understand automation policies. + +## Related + +- ACT-ADR-001 — Event Bridge Architecture +- ACT-ADR-003 — Rule vs. Instruction model and DSL +- CUST-ADR-001 — Workplans as repository artefacts diff --git a/docs/adr/adr-003-rule-instruction-model.md b/docs/adr/adr-003-rule-instruction-model.md new file mode 100644 index 0000000..bd6370e --- /dev/null +++ b/docs/adr/adr-003-rule-instruction-model.md @@ -0,0 +1,281 @@ +--- +id: ACT-ADR-003 +type: architecture-decision-record +title: "Rule vs. Instruction Model and Expression DSL" +status: accepted +decided_by: Bernd Worsch +date: "2026-05-14" +scope: cross-repo +affects: + - activity-core + - rules-core (future extraction) +tags: ["architecture", "rules", "instructions", "dsl", "llm", "safety", "evaluation"] +--- + +# ACT-ADR-003: Rule vs. Instruction Model and Expression DSL + +## Status + +Accepted. + +## Context + +ActivityDefinitions need two distinct evaluation modes to cover the full range +of automation scenarios in the Coulomb org: + +**Deterministic cases**: "if this repo has tag `python-service` AND has no SBOM +in the last 30 days, create a scan task." The condition is fully expressible as a +boolean predicate over known attributes. The output is fixed by the template. No +ambiguity, no LLM required, fully testable. + +**Judgement cases**: "a new repository has been registered — based on its domain +and profile, determine what domain-specific onboarding tasks are appropriate." The +right answer depends on context that is expensive to encode as explicit rules. An +LLM is a better evaluator than a rule tree, but introduces non-determinism, cost, +and a new attack surface (prompt injection via event payload). + +Conflating these two modes into one mechanism produces a system that is either +too rigid (rules only) or too unpredictable (LLM everywhere). The two modes +need different evaluation pipelines, testing strategies, and audit trails. + +## Decision + +**Two named, distinct evaluation modes: Rule and Instruction.** + +Terminology is deliberate. A **Rule** is deterministic and mechanical — it applies +or it does not. An **Instruction** is contextual and interpretive — it guides an +LLM agent to make a judgement call. Both are expressed as fenced blocks in +ActivityDefinition markdown files (see ACT-ADR-002). + +### Rules + +A Rule has two parts: a **condition** (boolean predicate) and one or more +**actions** (task template references). + +#### Condition expression language + +The condition is a single-line string expression evaluated by a sandboxed +AST walker — never `exec()` or `eval()`. The evaluator walks the parsed AST +and whitelist-checks every node type before executing. Unknown node types +raise an `UnsafeExpression` error at parse time, not at evaluation time. + +**Available operations**: + +| Category | Syntax | Example | +|---|---|---| +| Equality | `==`, `!=` | `event.type == "org.repo.registered"` | +| Comparison | `>`, `<`, `>=`, `<=` | `event.attributes.sbom_age_days > 30` | +| Membership | `in`, `not in` | `"python-service" in event.attributes.tags` | +| Boolean | `and`, `or`, `not` | `a and (b or not c)` | +| Grouping | `( )` | `(a or b) and c` | +| Length | `len(x)` | `len(event.attributes.affected_repos) > 0` | +| Existence | `x is None`, `x is not None` | `event.attributes.domain is not None` | + +**Attribute access** follows dot notation on the `event` object and the `context` +object (populated by context sources declared in the ActivityDefinition): + +- `event.id` — UUID string +- `event.type` — event type identifier +- `event.version` — event type version +- `event.timestamp` — ISO 8601 datetime string +- `event.publisher` — publisher identifier +- `event.attributes.{name}` — typed attribute per event type schema +- `context.{source}.{field}` — resolved context data + +**Explicitly forbidden** (evaluator rejects at parse time): +- Function calls other than `len()` and `None` tests +- Attribute access on arbitrary Python objects +- String interpolation or formatting +- Any control flow (`if`, `for`, `while`, `lambda`) +- Import statements +- Assignments + +**Design rationale**: the expression language is intentionally small. Anything +complex enough to need more than this belongs in an Instruction, not a Rule. +When a rule condition becomes difficult to express, that is a signal that the +case requires LLM judgement, not a signal that the DSL needs more features. + +#### Actions + +A Rule's action block specifies: + +```yaml +action: + task_template: tasks/{template-slug}.md # required + target_repo: event.attributes.repo_slug # expression — attribute access only + priority: high # high | medium | low | literal + labels: ["onboarding", "security"] # literal list + due_in_days: 7 # optional, integer literal +``` + +`target_repo` and similar fields accept simple attribute access expressions +(no boolean logic — just path traversal). This allows dynamic routing to the +correct issue-core instance without arbitrary expression evaluation in action +fields. + +#### Evaluation semantics + +- All rules in an ActivityDefinition are evaluated; **all matching rules fire** + (not first-match-only). There is no implicit ordering beyond the file order, + which is documented in the ActivityDefinition for human clarity. +- A rule whose condition raises an error during evaluation is skipped and logged + as `rule_error`; other rules still fire. This prevents a single malformed rule + from silencing an entire ActivityDefinition. +- An empty condition (omitted `condition` field) evaluates to `true` — the rule + always fires when the trigger fires. + +### Instructions + +An Instruction defers the task-creation decision to an LLM. It specifies what +context to provide, how to frame the prompt, and what output schema to enforce. + +#### Structure + +```yaml +# in an instruction fenced block: +id: {slug} +condition: '{expression}' # optional pre-filter (Rule DSL); runs before LLM +trusted_fields: # REQUIRED — explicit allowlist of payload fields + - event.attributes.repo_slug # safe to interpolate into prompt + - event.attributes.domain + - event.attributes.tags +model: claude-sonnet-4-6 +review_required: false # true | false — curator gate for output +prompt: | + {prompt template — only trusted_fields may be interpolated} +output_schema: {path to JSON schema file} +``` + +#### Trusted fields and prompt injection protection + +The `trusted_fields` list is **required** and enforced at parse time. Any field +not listed is unavailable to the prompt template. The template engine raises +`UntrustedFieldError` if the prompt references a field not in `trusted_fields`. + +The rationale: event payloads may contain free-text from untrusted sources — +commit messages, issue titles, CVE descriptions, repo descriptions. Interpolating +these directly into a prompt creates a prompt injection surface. Trusted fields +are those whose values are validated by the event type schema (typed attributes +like slugs, domain names, tag lists) and cannot carry arbitrary instruction text +by construction. + +Fields of type `object` (freeform JSON) are **never eligible** for `trusted_fields` +even if listed — the evaluator rejects this at parse time. + +#### Output schema enforcement + +The LLM response is validated against `output_schema` using JSON Schema validation. +If validation fails, the instruction retries once with the schema error appended +to the prompt. If the second attempt also fails, the instruction records an +`instruction_output_error` audit event and emits no tasks. Tasks are **never +created from unvalidated output**. + +Structured output mode (tool_use / JSON mode) is used where the model supports +it. The output schema must define `List[TaskSpec]` or a compatible envelope. + +#### `review_required: true` + +When set, the instruction's proposed task list is written to a **pending review +queue** in issue-core rather than directly created. A human or curator agent +reviews and approves/rejects before tasks are materialised. This is the default +for instructions that create high-impact tasks (cross-repo changes, security +responses, production operations). + +#### Evaluation semantics + +- Instructions are evaluated **after** all rules in the ActivityDefinition. +- The optional `condition` field on an instruction uses the same Rule DSL as + a first-pass filter — if the condition is false, the LLM is not called. + This avoids LLM cost for events that clearly do not need instruction judgement. +- Instructions are **not** first-match-only; all instructions whose conditions + pass fire. An ActivityDefinition may have zero instructions. + +### Audit trail + +Every task emission records: + +| Field | Rule | Instruction | +|---|---|---| +| `source_type` | `"rule"` | `"instruction"` | +| `source_id` | rule `id` from definition | instruction `id` from definition | +| `source_version` | ActivityDefinition version | ActivityDefinition version | +| `triggering_event_id` | event UUID | event UUID | +| `condition_matched` | expression string | expression string (pre-filter) | +| `prompt_hash` | — | SHA-256 of rendered prompt | +| `model` | — | model ID used | +| `output_validated` | — | `true` / `false` | +| `review_required` | — | `true` / `false` | + +The audit trail is written to the `task_spawn_log` table in activity-core's database +and referenced from the task record in issue-core. + +### Testing strategy + +**Rules**: every rule can and should be unit-tested with fixture event payloads. +A test helper `evaluate_rule(condition_str, event_fixture)` returns `bool` and +raises on syntax errors. Tests live alongside ActivityDefinition files: +`activity-definitions/{slug}.test.json` — a list of `{event, expected_rules_fired}` +fixtures. + +**Instructions**: instructions cannot be deterministically unit-tested. Instead: +- Sample evaluations are collected: given a fixture event, record the LLM response. +- Samples are committed to `activity-definitions/{slug}.samples/` for human review. +- Output schema validation is unit-tested independently of the LLM call. +- Prompt injection resistance is tested by including injection strings in fixture + event payloads and asserting they do not appear in the rendered prompt. + +### rules-core module boundary + +The rule evaluator and instruction executor live in `src/activity_core/rules/`. +Within this module: + +- **No imports from** `temporalio`, `sqlalchemy`, `fastapi`, or any activity-core + application code. +- Public surface: `evaluate_condition(expr: str, event: EventEnvelope, context: dict) -> bool` + and `execute_instruction(instr: InstructionDef, event: EventEnvelope, context: dict) -> List[TaskSpec]`. +- The module is independently importable and testable without starting the Temporal + worker or Postgres. + +This boundary makes future extraction to `rules-core` a packaging exercise, not a refactor. + +## Consequences + +- The `ActivityDefinition` Pydantic model gains `rules: List[RuleDef]` and + `instructions: List[InstructionDef]` fields. The current implicit "always create + tasks" behaviour is replaced by explicit rule blocks. +- A new `RuleEvaluator` class (AST walker) is added to `src/activity_core/rules/`. +- A new `InstructionExecutor` class handles prompt rendering, LLM call, output + validation, and review queue routing. +- Integration tests for rule evaluation use fixture JSON; no running Temporal required. +- The `task_spawn_log` table is added to the Postgres schema (new Alembic migration). +- ActivityDefinition files that omit both `rules` and `instructions` are valid + (they fire with no output) — this supports future placeholder definitions. + +## Alternatives Considered + +**OPA / Rego for rule conditions**: powerful, well-established policy language, +supports complex logic. Rejected — Rego's learning curve is high for non-specialists; +agents rarely produce correct Rego without fine-tuning; it adds a runtime dependency. +The simple AST-walker DSL covers the realistic condition complexity for this org. + +**Rules as Python lambdas**: maximum expressiveness. Rejected — arbitrary code +execution in a rule condition is a serious security surface, especially in an +org-wide event loop. Code deployment required for any rule change; agents cannot +write rules without code write access. + +**LLM for all conditions (no Rule/Instruction split)**: simpler model, more +flexible. Rejected — non-deterministic for cases that are deterministic; expensive +for high-frequency events like cron ticks; impossible to unit-test; audit trail +for deterministic rules becomes murky. + +**Instructions only, no Rules**: allows arbitrary LLM judgement for everything. +Rejected — LLM cost for every event, latency, and non-determinism are unacceptable +for high-frequency maintenance automations. Many cases (SBOM staleness check, +tag-based routing) are fully deterministic and should stay that way. + +## Related + +- ACT-ADR-001 — Event Bridge Architecture +- ACT-ADR-002 — Definition format (where rule/instruction blocks live) +- CUST-TFE-SCOPE-2026-000001 — task-flow-engine extraction (analogue pattern) +- `src/activity_core/rules/` — implementation home