docs(adr): establish three foundational ADRs for Event Bridge architecture

ADR-001: activity-core as org-wide Event Bridge — boundaries, NATS as
org infrastructure, state hub delegation, rules-core module-first,
issue-core adapter interface, capabilities domain assignment.

ADR-002: markdown-as-definition format for event types and
ActivityDefinitions — co-located intent/schema/logic/debugging,
publisher-declared governance with environment-configurable curator gate,
attribute type system, task template files.

ADR-003: Rule vs. Instruction model and expression DSL — sandboxed
Python-like AST evaluator for Rules, trusted-fields prompt injection
protection for Instructions, output schema enforcement, audit trail,
testing strategy, rules-core module boundary.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-14 16:48:42 +02:00
parent 0818ce3eb1
commit 617b2420d3
3 changed files with 827 additions and 0 deletions


@@ -0,0 +1,190 @@
---
id: ACT-ADR-001
type: architecture-decision-record
title: "Activity-Core as Coulomb Org Event Bridge"
status: accepted
decided_by: Bernd Worsch
date: "2026-05-14"
scope: cross-repo
affects:
- activity-core
- the-custodian/state-hub
- issue-facade (→ issue-core)
- repo-scoping
tags: ["architecture", "event-bridge", "activity-core", "orchestration", "event-loop"]
---
# ACT-ADR-001: Activity-Core as Coulomb Org Event Bridge
## Status
Accepted.
## Context
The Coulomb organization's set of repositories, services, and deployments is growing
beyond what a single person can coordinate manually. The state hub tracks cross-domain
state but has no mechanism to automatically respond to it. Recurring maintenance
(dependency scans, SBOM staleness checks, consistency audits) is implemented as
bespoke cron jobs baked into individual services — scattered, hard to audit, and
impossible to govern from a single vantage point.
Three forces drive the need for a dedicated orchestration layer:
1. **Scale**: as the repo count grows, manual coordination becomes the bottleneck.
2. **Reactivity**: org-level events (new repo registered, CVE published, deployment
completed) should trigger coordinated responses without human intervention.
3. **Separation of concerns**: the state hub is a read model and should remain one.
It must not accumulate automation logic, or it risks becoming a God object.
## Decision
**activity-core is the org-wide Event Bridge for the Coulomb organization.**
Its responsibility is exactly three things:
1. **Receive events** — time-based (cron, one-off scheduled) and domain events (NATS,
Gitea webhooks, state hub lifecycle signals).
2. **Evaluate rules and instructions** — given event payload and resolved context,
determine what work must be created.
3. **Emit task sets** — publish structured task creation requests to issue-core.
It does not execute work. It does not track task lifecycle. It does not manage projects.
### Boundary rules
| Concern | Owner |
|---|---|
| Cross-org task scheduling and reactive automation | **activity-core** |
| Task lifecycle (create, assign, track, close) | **issue-core** |
| Project and initiative management (phased, completion-gated) | **project-core** (future) |
| Repository capability profiling | **repo-scoping** |
| Cross-domain coordination state | **state hub** |
| Execution of automatable tasks | Temporal workers (per-repo) |
### Event type registry
Event types are declared by publishers as markdown definition files (see ACT-ADR-002).
Governance is **publisher-declared by default**: a publisher registers its event types
by committing definition files to the event-types registry. In production environments,
a curator gate can be enabled — registry entries must be reviewed before the runtime
accepts events of that type. This is a configuration flag per runtime scope (dev,
staging, prod), not a hard-coded rule.
### State hub relationship
The state hub **delegates automation to activity-core** rather than implementing it
internally. Concretely:
- Maintenance jobs currently baked into the state hub (consistency sync, SBOM
staleness checks) are migrated to ActivityDefinitions in activity-core.
- The state hub becomes a **publisher** of lifecycle events on NATS
(`org.workstream.created`, `org.decision.resolved`, `org.repo.registered`, etc.).
- The state hub does not subscribe to activity-core's output directly; it reads
task state from issue-core when needed.
This preserves the state hub as a read model and makes activity-core the single
home for automation policy.
### rules-core: module-first
The rule and instruction evaluation engine starts as `src/activity_core/rules/` — a
module with a clean internal boundary (no imports from Temporal, Postgres, or FastAPI
within the module). Extraction to a standalone `rules-core` repository happens when a
**second consumer** (e.g. state hub governance, project-core) needs the engine. This
follows the same discipline as the task-flow-engine extraction plan (CUST-TFE-SCOPE).
### NATS as org infrastructure
NATS JetStream is promoted from an activity-core internal component to **org-wide
event bus infrastructure**. It runs as a standalone service (not bundled in
activity-core's docker-compose) with its own lifecycle. All services that publish
or subscribe to org events do so via NATS streams.
### issue-core integration
activity-core communicates with issue-core via a **task emission adapter** — an
abstraction layer that, in the initial implementation, calls issue-core's REST API.
The adapter interface is defined now; the transport can migrate to NATS subscription
(issue-core subscribes to `task.spawned` events) once issue-core adds that capability.
This avoids hardcoding REST coupling throughout the codebase.
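The adapter boundary described above could be sketched roughly as follows. This is a minimal illustration, not the contents of `issue_sink.py`: the names `TaskRequest`, `IssueSink`, `InMemoryIssueSink`, and `emit_all` are hypothetical, and the request fields are assumptions borrowed from the rule action fields in ACT-ADR-002.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol


@dataclass(frozen=True)
class TaskRequest:
    """Hypothetical shape of one task creation request."""
    title: str
    target_repo: str
    priority: str = "medium"
    labels: tuple[str, ...] = ()


class IssueSink(Protocol):
    """Transport-agnostic adapter interface (assumed name)."""

    def emit(self, task: TaskRequest) -> str:
        """Send one request to issue-core and return its reference."""
        ...


@dataclass
class InMemoryIssueSink:
    """Test double; a REST or NATS sink would implement the same method."""
    emitted: list[TaskRequest] = field(default_factory=list)

    def emit(self, task: TaskRequest) -> str:
        self.emitted.append(task)
        return f"mem-{len(self.emitted)}"


def emit_all(sink: IssueSink, tasks: list[TaskRequest]) -> list[str]:
    """Callers depend only on the IssueSink protocol, never on a transport."""
    return [sink.emit(task) for task in tasks]
```

Because callers only see `IssueSink`, swapping the REST transport for a NATS publisher would, under these assumptions, mean adding one more class rather than touching call sites.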
### Webhook receiver
A new HTTP endpoint within activity-core accepts inbound webhooks from Gitea (and
later GitHub, other services). It normalises payloads to the canonical EventEnvelope
format, validates against the event type registry, and publishes to NATS. This runs
alongside the existing FastAPI `api.py`.
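As a rough sketch of the normalisation step, the function below maps a webhook payload to the envelope fields shown in the ACT-ADR-002 example payload. Everything specific here is an assumption for illustration: the event type `org.repo.pushed`, the publisher value `gitea`, the choice of attributes, and the assumed `repository.name` and `ref` payload keys. Registry validation is reduced to a set membership check, and publishing to NATS is left out.

```python
from __future__ import annotations

import uuid
from datetime import datetime, timezone


class UnknownEventType(ValueError):
    """Raised when a webhook maps to a type missing from the registry."""


def normalise_gitea_push(payload: dict, known_types: set[str]) -> dict:
    """Build an envelope dict from a push webhook payload (illustrative only)."""
    event_type = "org.repo.pushed"  # hypothetical event type, not in the ADR
    if event_type not in known_types:
        raise UnknownEventType(event_type)
    return {
        "id": str(uuid.uuid4()),
        "type": event_type,
        "version": "1.0",
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "publisher": "gitea",  # assumed publisher identifier
        "attributes": {
            "repo_slug": payload["repository"]["name"],
            "ref": payload["ref"],
        },
    }
```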
### Domain assignment
activity-core and issue-core are assigned to the **`capabilities`** domain — the
same domain as repo-scoping. These are org-wide infrastructure tools that serve all
domains equally, not artefacts of any single project or custodian's personal workflow.
issue-core is explicitly disassociated from the markitect domain.
## Trigger types
Three trigger types are supported:
| Type | Description | Temporal mechanism |
|---|---|---|
| `cron` | Recurring schedule (5-field cron + timezone + misfire policy) | Temporal Schedule (implemented WP-0002) |
| `event` | React to a named event type on NATS | Temporal workflow started by Event Router |
| `scheduled` | One-off at a future datetime | Temporal Schedule with `remaining_actions: 1` |
`scheduled` is a new trigger type added in WP-0003.
## Consequences
### Immediate
- activity-core's `INTENT.md` and `SCOPE.md` are rewritten to reflect this architecture.
- The `task_instances` Postgres table is reclassified as a **spawn audit trail**:
it records the act of spawning (what was created, when, which issue-core reference)
but is not the authoritative task record. Authoritative lifecycle state lives in
issue-core.
- A task emission adapter interface (`src/activity_core/issue_sink.py`) is introduced;
task creation goes through the adapter instead of direct Postgres writes to `task_instances`.
- The `TaskExecutorWorkflow` stub from WP-0001 is replaced with the actual adapter
call in WP-0003.
### Medium term
- State hub adds NATS publishing to its lifecycle operations.
- Gitea webhook receiver added to activity-core as a new HTTP router.
- Existing state hub maintenance crons are migrated to ActivityDefinitions.
- issue-facade is renamed issue-core and re-registered under the `capabilities` domain.
### Long term
- rules-core extracted as a standalone package when a second consumer appears.
- project-core created (depends on task-flow-engine extraction) for multi-phase
initiative management — explicitly out of scope for activity-core.
- NATS gets its own operational runbook and monitoring as org infrastructure.
## Alternatives Considered
**State hub absorbs activity-core functionality**: rejected — turns the state hub into
a God object, violates the read-model boundary, and makes automation logic impossible
to test independently.
**Per-repo automation (GitHub Actions style)**: rejected — cross-repo coordination
requires a single vantage point that can see all repos; per-repo actions can't express
org-level triggers or context.
**Activity-core as a thin Temporal wrapper only**: rejected — without the event type
registry and rule model, it's just a scheduler. The governance and introspection
properties are the point.
**Separate rules-core from day one**: rejected — premature extraction adds dependency
management overhead before a second consumer exists. Module-first with a clean boundary
costs nothing and preserves the extraction option.
## Related
- ACT-ADR-002 — Event type and ActivityDefinition definition format
- ACT-ADR-003 — Rule vs. Instruction model and DSL
- CUST-ADR-001 — Workplans as repository artefacts (canon/architecture/)
- CUST-TFE-SCOPE-2026-000001 — task-flow-engine extraction plan (canon/projects/)
- activity-core INTENT.md (to be written)
- activity-core WP-0003 (to be written)


@@ -0,0 +1,356 @@
---
id: ACT-ADR-002
type: architecture-decision-record
title: "Markdown-as-Definition Format for Event Types and ActivityDefinitions"
status: accepted
decided_by: Bernd Worsch
date: "2026-05-14"
scope: cross-repo
affects:
- activity-core
- any event publisher registering event types
tags: ["architecture", "format", "event-type", "activity-definition", "markdown", "documentation"]
---
# ACT-ADR-002: Markdown-as-Definition Format
## Status
Accepted.
## Context
Event type schemas and ActivityDefinition rules need to be understood and authored
by three distinct audiences simultaneously: humans reviewing and debugging automation,
agents creating and modifying definitions at runtime, and machines parsing and
evaluating them. Traditional approaches split these concerns — schemas go in JSON
Schema or YAML, documentation goes in a wiki, logic goes in code — and they drift
apart. Diagnosing a bug in a rule then requires cross-referencing three places: one to
understand intent, one to check the schema, and one to read the condition.
The Custodian ecosystem already uses markdown files with YAML frontmatter as the
authoritative format for workplans, ADRs, SCOPE.md, and INTENT.md — all understood
by humans and agents without additional tooling. The same pattern should apply here.
## Decision
**Event type definitions and ActivityDefinitions are markdown files** where machine-
parseable structure (frontmatter YAML and fenced definition blocks) is embedded within
human-readable narrative. Intent, schema, logic, and debugging notes live in one file.
### Event Type Definition Files
**Location**: `event-types/{namespace}.{event-name}.md` within the activity-core repo
(or a registered event-types registry repo if volume justifies separation).
**Naming convention**: `{publisher-domain}.{noun}.{verb}.md`, e.g.:
- `org.repo.registered.md`
- `org.security.cve.published.md`
- `org.workstream.completed.md`
**Structure**:
```markdown
---
id: org.repo.registered
type: event-type
version: "1.0"
publisher: the-custodian/state-hub
governance: publisher-declared # publisher-declared | curated
status: active # active | deprecated | draft
introduced: "2026-05-14"
---
# Event: org.repo.registered
## Intent
One-paragraph statement of why this event exists and what it signals.
Written for an agent or human who has never seen it before.
## When Published
Bulleted list of the exact conditions under which the publisher fires this event.
Be precise — ambiguity here causes missed or duplicate activations.
## Attributes
| Attribute | Type | Required | Description |
|---|---|---|---|
| `repo_slug` | string | yes | URL-safe repository identifier |
| `domain` | string | yes | Domain slug the repo is assigned to |
| `tags` | string[] | no | Capability tags set at registration time |
| `registered_at` | datetime | yes | ISO 8601 UTC timestamp |
## Example Payload
```json
{
"id": "evt-7f3a1b2c",
"type": "org.repo.registered",
"version": "1.0",
"timestamp": "2026-05-14T10:00:00Z",
"publisher": "the-custodian/state-hub",
"attributes": {
"repo_slug": "new-python-service",
"domain": "railiance",
"tags": ["python-service", "fastapi"],
"registered_at": "2026-05-14T10:00:00Z"
}
}
```
## Consumer Notes
Guidance for agents and humans writing rules against this event type:
- Which attributes are safe for instruction prompts (trusted fields)
- Common misuses or gotchas
- Related events that are often used together
## Debugging
What to check when an activity that subscribes to this event does not fire:
- How to verify the event was published (NATS subject, log entry)
- How to inspect the event payload in the registry
- Common schema validation failures
```
### Attribute Types
The type system for event attributes is intentionally small:
| Type | Notes |
|---|---|
| `string` | UTF-8 string |
| `integer` | 64-bit signed integer |
| `float` | 64-bit float |
| `boolean` | true / false |
| `datetime` | ISO 8601 UTC string in payload, parsed to datetime in evaluator |
| `uuid` | String in payload, validated as UUID v4 |
| `string[]` | JSON array of strings |
| `integer[]` | JSON array of integers |
| `object` | Freeform JSON object — cannot be used in rule conditions; instruction-only |
`object` type attributes are available to instructions but excluded from rule
conditions deliberately — rules must be deterministic and schema-validatable.
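One way to picture the type system is as a table of per-type checks over JSON-decoded values. The sketch below is an assumption about how such validation might look, not the runtime's actual validator; `CHECKS`, `RULE_SAFE_TYPES`, and `validate_attribute` are hypothetical names. It accepts JSON integers for `float` and does not enforce the 64-bit range, both of which are simplifications.

```python
from __future__ import annotations

import uuid
from datetime import datetime


def _is_datetime(value: object) -> bool:
    if not isinstance(value, str):
        return False
    try:
        # Older Pythons reject a trailing "Z" in fromisoformat(), so map it to an offset.
        datetime.fromisoformat(value.replace("Z", "+00:00"))
        return True
    except ValueError:
        return False


def _is_uuid4(value: object) -> bool:
    if not isinstance(value, str):
        return False
    try:
        return uuid.UUID(value).version == 4
    except ValueError:
        return False


def _is_int(value: object) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


CHECKS = {
    "string": lambda v: isinstance(v, str),
    "integer": _is_int,
    "float": lambda v: isinstance(v, float) or _is_int(v),
    "boolean": lambda v: isinstance(v, bool),
    "datetime": _is_datetime,
    "uuid": _is_uuid4,
    "string[]": lambda v: isinstance(v, list) and all(isinstance(i, str) for i in v),
    "integer[]": lambda v: isinstance(v, list) and all(_is_int(i) for i in v),
    "object": lambda v: isinstance(v, dict),
}

# Types usable in rule conditions: everything except freeform objects.
RULE_SAFE_TYPES = frozenset(CHECKS) - {"object"}


def validate_attribute(type_name: str, value: object) -> bool:
    """Return True when a decoded JSON value matches the declared type."""
    return CHECKS[type_name](value)
```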
### ActivityDefinition Files
**Location**: `activity-definitions/{slug}.md` within the repo that owns the automation.
For org-wide automations: `activity-core/activity-definitions/`.
For domain-specific automations: `{domain-repo}/activity-definitions/`.
**Structure**:
```markdown
---
id: ACT-DEF-onboard-python-repo
type: activity-definition
version: "1.0"
status: active
trigger:
type: event # event | cron | scheduled
event_type: org.repo.registered # for type: event
# cron: "0 9 * * 1" # for type: cron (5-field, UTC)
# timezone: "Europe/Berlin" # optional, cron only
# misfire_policy: skip # skip | catchup | compress (cron only)
# at: "2026-06-01T09:00:00Z" # for type: scheduled (one-off)
context_sources:
- type: repo-scoping
query: repo_profile
bind_to: context.repo_profile
- type: state-hub
query: domain_summary
bind_to: context.domain_summary
governance: publisher-declared
owner: custodian-agent
created: "2026-05-14"
---
# ActivityDefinition: Onboard New Python Service
## Purpose
One paragraph. What does this automation do and why does it exist? What problem
would accumulate if this automation were turned off?
## Trigger
Which event type fires this activity, and under what conditions does it apply?
Cross-reference the event type definition file.
## Context Sources
What context is resolved before rules are evaluated? Explain what each source
provides and why it is needed.
## Rules
Each rule is a fenced block tagged `rule`. Rules are evaluated in order; all
matching rules fire (not first-match-only). See ACT-ADR-003 for the expression
language specification.
```rule
id: create-sbom-scan
condition: '"python-service" in event.attributes.tags'
action:
task_template: tasks/sbom-initial-scan.md
target_repo: event.attributes.repo_slug
priority: high
labels: ["onboarding", "security"]
```
```rule
id: create-scope-generation
condition: '"python-service" in event.attributes.tags and context.repo_profile.scope_md_exists == false'
action:
task_template: tasks/generate-scope-md.md
target_repo: event.attributes.repo_slug
priority: medium
labels: ["onboarding", "documentation"]
```
## Instructions
Instructions are evaluated after all rules. An instruction asks an LLM to decide
what additional tasks (if any) to create. See ACT-ADR-003 for safety requirements.
```instruction
id: domain-specific-onboarding
condition: 'event.attributes.domain != "test_domain_v2"'
trusted_fields:
- event.attributes.repo_slug
- event.attributes.domain
- event.attributes.tags
model: claude-sonnet-4-6
review_required: false
prompt: |
A new repository has been registered in the Coulomb organization.
Repository: {event.attributes.repo_slug}
Domain: {event.attributes.domain}
Tags: {event.attributes.tags}
Based on the domain's current standards and the repository profile above,
determine what additional domain-specific onboarding tasks should be created
beyond the standard SBOM scan and SCOPE.md generation. Return an empty list
if no additional tasks are warranted.
output_schema: tasks/task-template-list-schema.json
```
## Task Templates
References to task template files used in rule actions. Each template is a
separate markdown file under `tasks/` that defines the task title, description
template, default labels, and default assignee logic.
- `tasks/sbom-initial-scan.md`
- `tasks/generate-scope-md.md`
## Notes
Operational notes, edge cases, and context that does not fit elsewhere.
## Debugging
Checklist for when this ActivityDefinition fires but produces unexpected output:
1. Was the triggering event published with the correct type and attributes?
2. Do the rule conditions evaluate as expected? (Use `make eval-rule` with a fixture)
3. Is issue-core reachable and configured for the target domain?
4. For instructions: check the audit log for the model response and output validation result.
## Change History
- v1.0 (2026-05-14): Initial definition
```
### Governance model
The `governance` field on an event type definition determines how the registry
runtime handles it:
| Value | Behaviour |
|---|---|
| `publisher-declared` | Accepted immediately on publish; no review required |
| `curated` | Held in `pending` state until a curator approves via registry API |
The runtime checks the **environment's curator gate configuration** — not just the
file's governance field. An environment configured with `curator_gate: disabled`
treats all event types as `publisher-declared` regardless of the field value.
An environment with `curator_gate: required` treats all event types as `curated`
regardless of the field value. The field is the publisher's declared preference;
the environment config is the enforcement point.
This means:
- **Dev / integration**: `curator_gate: disabled` — developers and agents iterate
freely; new event types take effect immediately.
- **Staging / production**: `curator_gate: required` — all new event types queue
for curator review before the runtime accepts events of that type.
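The precedence between the file's `governance` field and the environment's `curator_gate` setting can be expressed as a small function. This is a sketch under the assumption that `disabled` and `required` are the only gate values, since those are the only ones described here; `effective_governance` is a hypothetical name.

```python
VALID_DECLARED = {"publisher-declared", "curated"}


def effective_governance(declared: str, curator_gate: str) -> str:
    """Return the governance mode the runtime enforces for an event type.

    The declared value is validated as the publisher's preference, but both
    gate settings described in this ADR override it.
    """
    if declared not in VALID_DECLARED:
        raise ValueError(f"unknown governance value: {declared!r}")
    if curator_gate == "disabled":
        return "publisher-declared"
    if curator_gate == "required":
        return "curated"
    raise ValueError(f"unknown curator_gate value: {curator_gate!r}")
```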
### File as source of truth
Following CUST-ADR-001 (Workplans as Repository Artefacts), definition files are
the canonical source of truth. The activity-core runtime indexes them into its
database on startup and via a sync command. The database is a queryable cache,
not the origin. A definition deleted from the filesystem is disabled at next sync.
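The sync behaviour can be reduced to a set comparison between definition ids found on disk and ids held in the cache. The following is a minimal sketch with hypothetical names (`plan_sync`, the `"disabled"` status string); it only plans the work and performs no file or database access.

```python
from __future__ import annotations


def plan_sync(on_disk: set[str], cached: dict[str, str]) -> tuple[set[str], set[str]]:
    """Return (ids to re-index from files, ids to disable in the cache).

    `cached` maps definition id to its current status in the database.
    """
    # Files are the source of truth, so every file is (re-)indexed.
    to_index = set(on_disk)
    # Anything still enabled in the cache but gone from disk gets disabled.
    to_disable = {
        def_id
        for def_id, status in cached.items()
        if status != "disabled" and def_id not in on_disk
    }
    return to_index, to_disable
```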
### Task Templates
Task templates are separate markdown files (`tasks/{slug}.md`) referenced from
ActivityDefinition action blocks. They define:
```markdown
---
id: tasks/sbom-initial-scan
type: task-template
---
# Task: Run Initial SBOM Scan
## Title template
`Run SBOM scan — {target_repo}`
## Description template
Initial SBOM scan required for newly registered repository `{target_repo}`.
Run: `make ingest-sbom REPO={target_repo} SCAN=1`
## Default labels
["sbom", "security", "automated"]
## Default assignee
None (unassigned)
```
This keeps task content editable separately from the routing logic in
ActivityDefinitions.
## Consequences
- A new `event-types/` directory in activity-core (and eventually a shared registry)
holds all org event type definitions.
- A new `activity-definitions/` directory in activity-core holds org-wide automations.
- Domain repos may hold their own `activity-definitions/` for domain-specific
automations, scanned by activity-core at sync time.
- The runtime requires a parser for the `rule` and `instruction` fenced blocks.
- SCOPE.md for activity-core must be updated to list these directories.
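The parser mentioned in the consequences above needs, at minimum, to locate the `rule` and `instruction` fenced blocks in a definition file. A regex-based sketch of that first step follows; `extract_blocks` is a hypothetical name, and parsing each block body as YAML (which would need a YAML library) is deliberately left out.

```python
from __future__ import annotations

import re

# Built programmatically so this sketch can itself sit inside a fenced block.
FENCE = "`" * 3

BLOCK_RE = re.compile(
    rf"^{FENCE}(rule|instruction)[ \t]*\n(.*?)^{FENCE}[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def extract_blocks(markdown: str) -> list[tuple[str, str]]:
    """Return (kind, raw body) pairs in file order."""
    return [(m.group(1), m.group(2)) for m in BLOCK_RE.finditer(markdown)]
```

Returning the blocks in file order keeps the evaluation order described in ACT-ADR-003 recoverable from the parse result.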
## Alternatives Considered
**Pure JSON Schema for event types, separate wiki for docs**: rejected — documentation
and schema diverge immediately; agents must cross-reference two systems to author
a rule correctly.
**OpenAPI / AsyncAPI specification**: rejected — those formats are excellent for
API and broker documentation but not designed for co-locating operational intent
and debugging guidance. They are also less readable for non-specialists.
**Code-only (Python dataclasses for event schemas, Python functions for rules)**:
rejected — requires code deployment for any definition change; agents cannot modify
definitions without write access to the codebase; non-technical stakeholders cannot
review or understand automation policies.
## Related
- ACT-ADR-001 — Event Bridge Architecture
- ACT-ADR-003 — Rule vs. Instruction model and DSL
- CUST-ADR-001 — Workplans as repository artefacts


@@ -0,0 +1,281 @@
---
id: ACT-ADR-003
type: architecture-decision-record
title: "Rule vs. Instruction Model and Expression DSL"
status: accepted
decided_by: Bernd Worsch
date: "2026-05-14"
scope: cross-repo
affects:
- activity-core
- rules-core (future extraction)
tags: ["architecture", "rules", "instructions", "dsl", "llm", "safety", "evaluation"]
---
# ACT-ADR-003: Rule vs. Instruction Model and Expression DSL
## Status
Accepted.
## Context
ActivityDefinitions need two distinct evaluation modes to cover the full range
of automation scenarios in the Coulomb org:
**Deterministic cases**: "if this repo has tag `python-service` AND has no SBOM
in the last 30 days, create a scan task." The condition is fully expressible as a
boolean predicate over known attributes. The output is fixed by the template. No
ambiguity, no LLM required, fully testable.
**Judgement cases**: "a new repository has been registered — based on its domain
and profile, determine what domain-specific onboarding tasks are appropriate." The
right answer depends on context that is expensive to encode as explicit rules. An
LLM is a better evaluator than a rule tree, but introduces non-determinism, cost,
and a new attack surface (prompt injection via event payload).
Conflating these two modes into one mechanism produces a system that is either
too rigid (rules only) or too unpredictable (LLM everywhere). The two modes
need different evaluation pipelines, testing strategies, and audit trails.
## Decision
**Two named, distinct evaluation modes: Rule and Instruction.**
Terminology is deliberate. A **Rule** is deterministic and mechanical — it applies
or it does not. An **Instruction** is contextual and interpretive — it guides an
LLM agent to make a judgement call. Both are expressed as fenced blocks in
ActivityDefinition markdown files (see ACT-ADR-002).
### Rules
A Rule has two parts: a **condition** (boolean predicate) and one or more
**actions** (task template references).
#### Condition expression language
The condition is a single-line string expression evaluated by a sandboxed
AST walker — never `exec()` or `eval()`. The evaluator walks the parsed AST
and whitelist-checks every node type before executing. Unknown node types
raise an `UnsafeExpression` error at parse time, not at evaluation time.
**Available operations**:
| Category | Syntax | Example |
|---|---|---|
| Equality | `==`, `!=` | `event.type == "org.repo.registered"` |
| Comparison | `>`, `<`, `>=`, `<=` | `event.attributes.sbom_age_days > 30` |
| Membership | `in`, `not in` | `"python-service" in event.attributes.tags` |
| Boolean | `and`, `or`, `not` | `a and (b or not c)` |
| Grouping | `( )` | `(a or b) and c` |
| Length | `len(x)` | `len(event.attributes.affected_repos) > 0` |
| Existence | `x is None`, `x is not None` | `event.attributes.domain is not None` |
**Attribute access** follows dot notation on the `event` object and the `context`
object (populated by context sources declared in the ActivityDefinition):
- `event.id` — UUID string
- `event.type` — event type identifier
- `event.version` — event type version
- `event.timestamp` — ISO 8601 datetime string
- `event.publisher` — publisher identifier
- `event.attributes.{name}` — typed attribute per event type schema
- `context.{source}.{field}` — resolved context data
**Explicitly forbidden** (evaluator rejects at parse time):
- Function calls other than `len()`
- Attribute access on arbitrary Python objects
- String interpolation or formatting
- Any control flow (`if`, `for`, `while`, `lambda`)
- Import statements
- Assignments
**Design rationale**: the expression language is intentionally small. Anything
complex enough to need more than this belongs in an Instruction, not a Rule.
When a rule condition becomes difficult to express, that is a signal that the
case requires LLM judgement, not a signal that the DSL needs more features.
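A sandboxed AST walker along these lines can be sketched with the standard `ast` module. This is an illustrative sketch, not the `RuleEvaluator` itself: it assumes `event` and `context` are plain dicts rather than an `EventEnvelope`, treats lowercase `true` and `false` as boolean names (as the ACT-ADR-002 examples use them), and resolves dot access as dictionary lookup. `parse_condition` is a hypothetical helper name.

```python
from __future__ import annotations

import ast
import operator
from typing import Any


class UnsafeExpression(ValueError):
    """Raised at parse time when an expression leaves the whitelist."""


ALLOWED_NODES = (
    ast.Expression, ast.BoolOp, ast.And, ast.Or, ast.UnaryOp, ast.Not,
    ast.Compare, ast.Eq, ast.NotEq, ast.Gt, ast.Lt, ast.GtE, ast.LtE,
    ast.In, ast.NotIn, ast.Is, ast.IsNot,
    ast.Name, ast.Load, ast.Attribute, ast.Constant, ast.Call,
)
ROOT_NAMES = {"event", "context", "true", "false"}
COMPARATORS = {
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
    ast.Gt: operator.gt, ast.Lt: operator.lt,
    ast.GtE: operator.ge, ast.LtE: operator.le,
    ast.In: lambda a, b: a in b, ast.NotIn: lambda a, b: a not in b,
    ast.Is: operator.is_, ast.IsNot: operator.is_not,
}


def parse_condition(expr: str) -> ast.Expression:
    """Parse and whitelist-check an expression without evaluating it."""
    try:
        tree = ast.parse(expr, mode="eval")
    except SyntaxError as exc:
        raise UnsafeExpression(str(exc)) from exc
    call_funcs = {id(n.func) for n in ast.walk(tree) if isinstance(n, ast.Call)}
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise UnsafeExpression(f"forbidden node: {type(node).__name__}")
        if isinstance(node, ast.Call):
            is_len = isinstance(node.func, ast.Name) and node.func.id == "len"
            if not is_len or len(node.args) != 1 or node.keywords:
                raise UnsafeExpression("only len(x) may be called")
        elif isinstance(node, ast.Name) and id(node) not in call_funcs:
            if node.id not in ROOT_NAMES:
                raise UnsafeExpression(f"unknown name: {node.id}")
    return tree


def _eval(node: ast.AST, scope: dict[str, Any]) -> Any:
    if isinstance(node, ast.Expression):
        return _eval(node.body, scope)
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        return scope[node.id]
    if isinstance(node, ast.Attribute):
        # Dot access reads dict keys only; it never touches Python attributes.
        return _eval(node.value, scope)[node.attr]
    if isinstance(node, ast.Call):
        return len(_eval(node.args[0], scope))
    if isinstance(node, ast.UnaryOp):
        return not _eval(node.operand, scope)
    if isinstance(node, ast.BoolOp):
        values = (_eval(v, scope) for v in node.values)
        return all(values) if isinstance(node.op, ast.And) else any(values)
    if isinstance(node, ast.Compare):
        left = _eval(node.left, scope)
        for op, right_node in zip(node.ops, node.comparators):
            right = _eval(right_node, scope)
            if not COMPARATORS[type(op)](left, right):
                return False
            left = right
        return True
    raise UnsafeExpression(type(node).__name__)


def evaluate_condition(expr: str, event: dict, context: dict) -> bool:
    tree = parse_condition(expr)
    scope = {"event": event, "context": context, "true": True, "false": False}
    return bool(_eval(tree, scope))
```

In this sketch a missing attribute surfaces as a `KeyError` at evaluation time, while anything outside the whitelist (subscripts, lambdas, f-strings, arbitrary calls) is rejected by `parse_condition` before any value is read.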
#### Actions
A Rule's action block specifies:
```yaml
action:
task_template: tasks/{template-slug}.md # required
target_repo: event.attributes.repo_slug # expression — attribute access only
priority: high # literal: high | medium | low
labels: ["onboarding", "security"] # literal list
due_in_days: 7 # optional, integer literal
```
`target_repo` and similar fields accept simple attribute access expressions
(no boolean logic — just path traversal). This allows dynamic routing to the
correct issue-core instance without arbitrary expression evaluation in action
fields.
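Path-only resolution for action fields can be kept separate from condition evaluation. The sketch below assumes `event` and `context` are plain dicts; `resolve_path` and the exact path grammar are hypothetical.

```python
from __future__ import annotations

import re
from typing import Any

# A path is a root name followed by one or more dotted identifiers, nothing else.
PATH_RE = re.compile(r"(event|context)(\.[A-Za-z_][A-Za-z0-9_]*)+")


def resolve_path(path: str, event: dict, context: dict) -> Any:
    """Resolve a dotted path such as event.attributes.repo_slug."""
    if not PATH_RE.fullmatch(path):
        raise ValueError(f"not a plain attribute path: {path!r}")
    value: Any = {"event": event, "context": context}
    for part in path.split("."):
        value = value[part]
    return value
```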
#### Evaluation semantics
- All rules in an ActivityDefinition are evaluated; **all matching rules fire**
(not first-match-only). There is no implicit ordering beyond the file order,
which is documented in the ActivityDefinition for human clarity.
- A rule whose condition raises an error during evaluation is skipped and logged
as `rule_error`; other rules still fire. This prevents a single malformed rule
from silencing an entire ActivityDefinition.
- An empty condition (omitted `condition` field) evaluates to `true` — the rule
always fires when the trigger fires.
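These three semantics fit in a short loop. The sketch below is illustrative: rules are plain dicts, `evaluate` stands in for whatever condition evaluator is used, and `fire_rules` and the log tuple shape are hypothetical.

```python
from __future__ import annotations

from typing import Callable


def fire_rules(
    rules: list[dict],
    evaluate: Callable[[str], bool],
    log: list[tuple[str, str, str]],
) -> list[str]:
    """Return ids of all rules that fire, in file order."""
    fired = []
    for rule in rules:
        condition = rule.get("condition")
        try:
            # An omitted condition always matches.
            matched = True if condition is None else evaluate(condition)
        except Exception as exc:
            # A failing rule is logged and skipped; later rules still run.
            log.append(("rule_error", rule["id"], repr(exc)))
            continue
        if matched:
            fired.append(rule["id"])
    return fired
```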
### Instructions
An Instruction defers the task-creation decision to an LLM. It specifies what
context to provide, how to frame the prompt, and what output schema to enforce.
#### Structure
```yaml
# in an instruction fenced block:
id: {slug}
condition: '{expression}' # optional pre-filter (Rule DSL); runs before LLM
trusted_fields: # REQUIRED — explicit allowlist of payload fields
- event.attributes.repo_slug # safe to interpolate into prompt
- event.attributes.domain
- event.attributes.tags
model: claude-sonnet-4-6
review_required: false # true | false — curator gate for output
prompt: |
{prompt template — only trusted_fields may be interpolated}
output_schema: {path to JSON schema file}
```
#### Trusted fields and prompt injection protection
The `trusted_fields` list is **required** and enforced at parse time. Any field
not listed is unavailable to the prompt template. The template engine raises
`UntrustedFieldError` if the prompt references a field not in `trusted_fields`.
The rationale: event payloads may contain free-text from untrusted sources —
commit messages, issue titles, CVE descriptions, repo descriptions. Interpolating
these directly into a prompt creates a prompt injection surface. Trusted fields
are those whose values are validated by the event type schema (typed attributes
like slugs, domain names, tag lists) and cannot carry arbitrary instruction text
by construction.
Fields of type `object` (freeform JSON) are **never eligible** for `trusted_fields`
even if listed — the evaluator rejects this at parse time.
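A prompt renderer enforcing the allowlist could look roughly like this. It is a sketch under assumptions: placeholders use the `{event.attributes.name}` form from the examples, `event` and `context` are plain dicts, and `render_prompt` is a hypothetical name. The `object`-type rejection is not shown because it needs the event type schema.

```python
from __future__ import annotations

import re


class UntrustedFieldError(ValueError):
    """Raised when a prompt references a field outside trusted_fields."""


PLACEHOLDER_RE = re.compile(r"\{((?:event|context)(?:\.\w+)+)\}")


def render_prompt(
    template: str,
    trusted_fields: list[str],
    event: dict,
    context: dict | None = None,
) -> str:
    allowed = set(trusted_fields)
    scope = {"event": event, "context": context or {}}

    def substitute(match: re.Match) -> str:
        path = match.group(1)
        if path not in allowed:
            raise UntrustedFieldError(path)
        value = scope
        for part in path.split("."):
            value = value[part]
        return str(value)

    # Single pass: substituted values are never re-scanned for placeholders.
    return PLACEHOLDER_RE.sub(substitute, template)
```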
#### Output schema enforcement
The LLM response is validated against `output_schema` using JSON Schema validation.
If validation fails, the instruction retries once with the schema error appended
to the prompt. If the second attempt also fails, the instruction records an
`instruction_output_error` audit event and emits no tasks. Tasks are **never
created from unvalidated output**.
Structured output mode (tool_use / JSON mode) is used where the model supports
it. The output schema must define `List[TaskSpec]` or a compatible envelope.
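The validate, retry-once, then fail-closed flow can be sketched independently of any model client. In the sketch below, `call_llm` is any callable from prompt to raw text, and `validate_task_list` is a hypothetical stand-in for JSON Schema validation against `output_schema`; a real implementation would presumably delegate to a JSON Schema validator instead.

```python
from __future__ import annotations

import json
from typing import Callable, Optional


def validate_task_list(raw: str) -> Optional[str]:
    """Stand-in validator: returns None when valid, else an error message."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return f"invalid JSON: {exc.msg}"
    if not isinstance(data, list) or not all(
        isinstance(item, dict) and "title" in item for item in data
    ):
        return "expected a list of objects with a 'title' field"
    return None


def run_instruction(
    prompt: str,
    call_llm: Callable[[str], str],
    audit: list[str],
    validate: Callable[[str], Optional[str]] = validate_task_list,
) -> list[dict]:
    """Return validated tasks, or an empty list after two failed attempts."""
    attempt_prompt = prompt
    for _ in range(2):  # first attempt plus exactly one retry
        raw = call_llm(attempt_prompt)
        error = validate(raw)
        if error is None:
            return json.loads(raw)
        # The retry sees the original prompt with the schema error appended.
        attempt_prompt = f"{prompt}\n\nSchema validation error: {error}"
    audit.append("instruction_output_error")
    return []
```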
#### `review_required: true`
When set, the instruction's proposed task list is written to a **pending review
queue** in issue-core rather than directly created. A human or curator agent
reviews and approves/rejects before tasks are materialised. This is the expected
setting for instructions that create high-impact tasks (cross-repo changes, security
responses, production operations).
#### Evaluation semantics
- Instructions are evaluated **after** all rules in the ActivityDefinition.
- The optional `condition` field on an instruction uses the same Rule DSL as
a first-pass filter — if the condition is false, the LLM is not called.
This avoids LLM cost for events that clearly do not need instruction judgement.
- Instructions are **not** first-match-only; all instructions whose conditions
pass fire. An ActivityDefinition may have zero instructions.
### Audit trail
Every task emission records:
| Field | Rule | Instruction |
|---|---|---|
| `source_type` | `"rule"` | `"instruction"` |
| `source_id` | rule `id` from definition | instruction `id` from definition |
| `source_version` | ActivityDefinition version | ActivityDefinition version |
| `triggering_event_id` | event UUID | event UUID |
| `condition_matched` | expression string | expression string (pre-filter) |
| `prompt_hash` | — | SHA-256 of rendered prompt |
| `model` | — | model ID used |
| `output_validated` | — | `true` / `false` |
| `review_required` | — | `true` / `false` |
The audit trail is written to the `task_spawn_log` table in activity-core's database
and referenced from the task record in issue-core.
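The audit fields map naturally onto a record type in which the instruction-only columns are optional. The dataclass below is a sketch of that shape, not the `task_spawn_log` schema; `SpawnAuditRecord` and `prompt_hash` are hypothetical names, and hashing the UTF-8 encoding of the rendered prompt is an assumption.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpawnAuditRecord:
    source_type: str                   # "rule" or "instruction"
    source_id: str
    source_version: str
    triggering_event_id: str
    condition_matched: Optional[str]
    # The remaining fields are only populated for instructions.
    prompt_hash: Optional[str] = None
    model: Optional[str] = None
    output_validated: Optional[bool] = None
    review_required: Optional[bool] = None


def prompt_hash(rendered_prompt: str) -> str:
    """SHA-256 hex digest of the rendered prompt (UTF-8 encoded)."""
    return hashlib.sha256(rendered_prompt.encode("utf-8")).hexdigest()
```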
### Testing strategy
**Rules**: every rule can and should be unit-tested with fixture event payloads.
A test helper `evaluate_rule(condition_str, event_fixture)` returns `bool` and
raises on syntax errors. Tests live alongside ActivityDefinition files:
`activity-definitions/{slug}.test.json` — a list of `{event, expected_rules_fired}`
fixtures.
**Instructions**: instructions cannot be deterministically unit-tested. Instead:
- Sample evaluations are collected: given a fixture event, record the LLM response.
- Samples are committed to `activity-definitions/{slug}.samples/` for human review.
- Output schema validation is unit-tested independently of the LLM call.
- Prompt injection resistance is tested by including injection strings in fixture
event payloads and asserting they do not appear in the rendered prompt.
### rules-core module boundary
The rule evaluator and instruction executor live in `src/activity_core/rules/`.
Within this module:
- **No imports from** `temporalio`, `sqlalchemy`, `fastapi`, or any activity-core
application code.
- Public surface: `evaluate_condition(expr: str, event: EventEnvelope, context: dict) -> bool`
and `execute_instruction(instr: InstructionDef, event: EventEnvelope, context: dict) -> List[TaskSpec]`.
- The module is independently importable and testable without starting the Temporal
worker or Postgres.
This boundary makes future extraction to `rules-core` a packaging exercise, not a refactor.
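One way such a boundary might be kept honest is a test that scans the module's source for forbidden imports. The sketch below checks only the three third-party packages named above; the ban on activity-core application code would need an additional prefix check. `forbidden_imports` is a hypothetical helper.

```python
from __future__ import annotations

import ast

FORBIDDEN = {"temporalio", "sqlalchemy", "fastapi"}


def forbidden_imports(source: str) -> set[str]:
    """Return the forbidden top-level packages imported by a source file."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # Relative imports have module=None; they are not third-party.
            names = [node.module or ""]
        else:
            continue
        found.update(name.split(".")[0] for name in names)
    return found & FORBIDDEN
```

A test could read every `.py` file under `src/activity_core/rules/` and assert that this function returns an empty set for each.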
## Consequences
- The `ActivityDefinition` Pydantic model gains `rules: List[RuleDef]` and
`instructions: List[InstructionDef]` fields. The current implicit "always create
tasks" behaviour is replaced by explicit rule blocks.
- A new `RuleEvaluator` class (AST walker) is added to `src/activity_core/rules/`.
- A new `InstructionExecutor` class handles prompt rendering, LLM call, output
validation, and review queue routing.
- Integration tests for rule evaluation use fixture JSON; no running Temporal required.
- The `task_spawn_log` table is added to the Postgres schema (new Alembic migration).
- ActivityDefinition files that omit both `rules` and `instructions` are valid
(they fire with no output) — this supports future placeholder definitions.
## Alternatives Considered
**OPA / Rego for rule conditions**: powerful, well-established policy language,
supports complex logic. Rejected — Rego's learning curve is high for non-specialists;
agents rarely produce correct Rego without fine-tuning; it adds a runtime dependency.
The simple AST-walker DSL covers the realistic condition complexity for this org.
**Rules as Python lambdas**: maximum expressiveness. Rejected — arbitrary code
execution in a rule condition is a serious security surface, especially in an
org-wide event loop. Code deployment required for any rule change; agents cannot
write rules without code write access.
**LLM for all conditions (no Rule/Instruction split)**: simpler model, more
flexible. Rejected — non-deterministic for cases that are deterministic; expensive
for high-frequency events like cron ticks; impossible to unit-test; audit trail
for deterministic rules becomes murky.
**Instructions only, no Rules**: allows arbitrary LLM judgement for everything.
Rejected — LLM cost for every event, latency, and non-determinism are unacceptable
for high-frequency maintenance automations. Many cases (SBOM staleness check,
tag-based routing) are fully deterministic and should stay that way.
## Related
- ACT-ADR-001 — Event Bridge Architecture
- ACT-ADR-002 — Definition format (where rule/instruction blocks live)
- CUST-TFE-SCOPE-2026-000001 — task-flow-engine extraction (analogue pattern)
- `src/activity_core/rules/` — implementation home