| id | type | title | status | decided_by | date | scope | affects | tags |
|---|---|---|---|---|---|---|---|---|
| ACT-ADR-003 | architecture-decision-record | Rule vs. Instruction Model and Expression DSL | accepted | Bernd Worsch | 2026-05-14 | cross-repo | | |
ACT-ADR-003: Rule vs. Instruction Model and Expression DSL
Status
Accepted.
Context
ActivityDefinitions need two distinct evaluation modes to cover the full range of automation scenarios in the Coulomb org:
Deterministic cases: "if this repo has tag python-service AND has no SBOM
in the last 30 days, create a scan task." The condition is fully expressible as a
boolean predicate over known attributes. The output is fixed by the template. No
ambiguity, no LLM required, fully testable.
Judgement cases: "a new repository has been registered — based on its domain and profile, determine what domain-specific onboarding tasks are appropriate." The right answer depends on context that is expensive to encode as explicit rules. An LLM is a better evaluator than a rule tree, but introduces non-determinism, cost, and a new attack surface (prompt injection via event payload).
Conflating these two modes into one mechanism produces a system that is either too rigid (rules only) or too unpredictable (LLM everywhere). The two modes need different evaluation pipelines, testing strategies, and audit trails.
Decision
Two named, distinct evaluation modes: Rule and Instruction.
Terminology is deliberate. A Rule is deterministic and mechanical — it applies or it does not. An Instruction is contextual and interpretive — it guides an LLM agent to make a judgement call. Both are expressed as fenced blocks in ActivityDefinition markdown files (see ACT-ADR-002).
Rules
A Rule has two parts: a condition (boolean predicate) and one or more actions (task template references).
Condition expression language
The condition is a single-line string expression evaluated by a sandboxed
AST walker — never exec() or eval(). The evaluator walks the parsed AST
and whitelist-checks every node type before executing. Unknown node types
raise an UnsafeExpression error at parse time, not at evaluation time.
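The parse-time check described above could look roughly like the following. This is a minimal sketch built on Python's standard `ast` module, not the actual evaluator: the function name `parse_condition`, the exact whitelist, and the extra rejection of underscore-prefixed attributes are assumptions made for illustration. Only the `UnsafeExpression` name comes from this ADR.

```python
import ast


class UnsafeExpression(ValueError):
    """Raised at parse time when a condition uses a non-whitelisted construct."""


# Assumed whitelist for this sketch; the real evaluator's list may differ.
ALLOWED_NODES = (
    ast.Expression, ast.BoolOp, ast.And, ast.Or, ast.UnaryOp, ast.Not,
    ast.Compare, ast.Eq, ast.NotEq, ast.Gt, ast.Lt, ast.GtE, ast.LtE,
    ast.In, ast.NotIn, ast.Is, ast.IsNot,
    ast.Name, ast.Attribute, ast.Constant, ast.Load, ast.Call,
)
ALLOWED_ROOTS = {"event", "context"}


def parse_condition(expr: str) -> ast.Expression:
    """Parse and whitelist-check a condition without evaluating anything."""
    try:
        tree = ast.parse(expr, mode="eval")
    except SyntaxError as exc:
        raise UnsafeExpression(f"not a valid expression: {exc.msg}") from exc

    nodes = list(ast.walk(tree))
    # Name nodes that are the callee of a call are checked with the call itself.
    call_targets = {id(n.func) for n in nodes if isinstance(n, ast.Call)}

    for node in nodes:
        if not isinstance(node, ALLOWED_NODES):
            raise UnsafeExpression(f"forbidden node: {type(node).__name__}")
        if isinstance(node, ast.Call):
            func = node.func
            # len(x) with exactly one positional argument is the only call allowed.
            if not (isinstance(func, ast.Name) and func.id == "len"
                    and len(node.args) == 1 and not node.keywords):
                raise UnsafeExpression("only len(x) may be called")
        elif isinstance(node, ast.Name) and id(node) not in call_targets:
            if node.id not in ALLOWED_ROOTS:
                raise UnsafeExpression(f"unknown name: {node.id}")
        elif isinstance(node, ast.Attribute) and node.attr.startswith("_"):
            # Extra hardening assumed by this sketch, not stated in the ADR.
            raise UnsafeExpression(f"private attribute: {node.attr}")
    return tree
```

Because nothing is evaluated here, an expression such as `__import__("os")` or `event.tags[0]` fails before any event data is touched.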
Available operations:
| Category | Syntax | Example |
|---|---|---|
| Equality | `==`, `!=` | `event.type == "org.repo.registered"` |
| Comparison | `>`, `<`, `>=`, `<=` | `event.attributes.sbom_age_days > 30` |
| Membership | `in`, `not in` | `"python-service" in event.attributes.tags` |
| Boolean | `and`, `or`, `not` | `a and (b or not c)` |
| Grouping | `( )` | `(a or b) and c` |
| Length | `len(x)` | `len(event.attributes.affected_repos) > 0` |
| Existence | `x is None`, `x is not None` | `event.attributes.domain is not None` |
Attribute access follows dot notation on the event object and the context
object (populated by context sources declared in the ActivityDefinition):
- `event.id`: UUID string
- `event.type`: event type identifier
- `event.version`: event type version
- `event.timestamp`: ISO 8601 datetime string
- `event.publisher`: publisher identifier
- `event.attributes.{name}`: typed attribute per event type schema
- `context.{source}.{field}`: resolved context data
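To make the dot-notation lookup concrete, here is a small evaluation sketch. It assumes the event and context are plain nested dicts and that a missing key raises an error; neither detail is specified in this ADR, and the function name `evaluate` is hypothetical. Dot access is implemented as a dict-key lookup, so no Python attribute is ever read from a live object.

```python
import ast
import operator

# Comparison operators from the table above, mapped to plain functions.
_COMPARE = {
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
    ast.Gt: operator.gt, ast.Lt: operator.lt,
    ast.GtE: operator.ge, ast.LtE: operator.le,
    ast.In: lambda a, b: a in b, ast.NotIn: lambda a, b: a not in b,
    ast.Is: operator.is_, ast.IsNot: operator.is_not,
}


def evaluate(expr: str, event: dict, context: dict) -> bool:
    """Evaluate a condition by walking its AST; never calls eval() or exec()."""
    roots = {"event": event, "context": context}

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.Name):
            if node.id not in roots:
                raise ValueError(f"unknown name: {node.id}")
            return roots[node.id]
        if isinstance(node, ast.Attribute):
            base = walk(node.value)
            # Dot notation is a dict-key lookup only, never getattr().
            if not isinstance(base, dict) or node.attr not in base:
                raise KeyError(node.attr)
            return base[node.attr]
        if isinstance(node, ast.BoolOp):
            values = (walk(v) for v in node.values)  # lazy, so it short-circuits
            return all(values) if isinstance(node.op, ast.And) else any(values)
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
            return not walk(node.operand)
        if isinstance(node, ast.Compare):
            left = walk(node.left)
            for op, right_node in zip(node.ops, node.comparators):
                right = walk(right_node)
                if not _COMPARE[type(op)](left, right):
                    return False
                left = right
            return True
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id == "len" and len(node.args) == 1
                and not node.keywords):
            return len(walk(node.args[0]))
        raise ValueError(f"unsupported node: {type(node).__name__}")

    return bool(walk(ast.parse(expr, mode="eval")))
```

With an event whose `attributes.tags` is `["python-service"]` and whose `attributes.sbom_age_days` is `31`, the condition `"python-service" in event.attributes.tags and event.attributes.sbom_age_days > 30` evaluates to true.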
Explicitly forbidden (evaluator rejects at parse time):
- Function calls other than `len()` and `None` tests
- Attribute access on arbitrary Python objects
- String interpolation or formatting
- Any control flow (`if`, `for`, `while`, `lambda`)
- Import statements
- Assignments
Design rationale: the expression language is intentionally small. Anything complex enough to need more than this belongs in an Instruction, not a Rule. When a rule condition becomes difficult to express, that is a signal that the case requires LLM judgement, not a signal that the DSL needs more features.
Actions
A Rule's action block specifies:
action:
  task_template: "Run SBOM rescan for {context.repo.repo_slug}"
  target_repo: context.repo.repo_slug
  priority: medium
  labels: ["sbom", "security", "{context.repo.repo_slug}"]
  due_in_days: 7
`action.task_template` is the emitted task title template. It is not a path to a repo-local file. Older design notes and the legacy `tasks/*.md` directory use "task template" for materialized task-body templates; that is a separate legacy surface. To avoid surprise, new rule actions should treat `task_template` as having `title_template` semantics until the field can be renamed in a schema-breaking revision.
Action fields accept two deterministic rendering forms:
- Whole-field paths: if the whole string is a path like `context.repo.repo_slug` or `event.attributes.repo_slug`, the rendered value keeps the original scalar/list/object shape from that path. This is the correct form for `target_repo` and other fields that should not become prose.
- Scalar placeholders: strings may include `{context.foo}` or `{event.foo}` placeholders. Each placeholder must resolve to a scalar. Lists and objects are rejected rather than stringified, which prevents accidental JSON blobs or untrusted text from being embedded into task titles.
Unsafe action cases are rejected:
- Any action path outside `context.*` or `event.*`.
- Any path containing calls, indexing, arithmetic, filters, or boolean logic.
- Placeholder values that resolve to lists or objects.
- `for_each` values that are not a whole-field `context.*` or `event.*` path to a list.
- `bind_as` names that are not simple identifiers.
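The two rendering forms and the rejection cases can be sketched as follows. This is an illustration under stated assumptions, not the shipped renderer: `render_field`, `resolve`, and `UnsafeAction` are hypothetical names, the scope is assumed to be a dict with `context` and `event` keys, and any whole string starting with `context.` or `event.` is treated as a whole-field path.

```python
import re

# A plain path: context.* or event.* followed by dotted identifiers only.
_PATH = re.compile(r"(?:context|event)(?:\.[A-Za-z_][A-Za-z0-9_]*)+")
_PLACEHOLDER = re.compile(r"\{([^{}]*)\}")


class UnsafeAction(ValueError):
    """Hypothetical error type for rejected action fields."""


def resolve(path: str, scope: dict):
    """Follow a dotted path through nested dicts; reject anything else."""
    if not _PATH.fullmatch(path):
        raise UnsafeAction(f"not a plain context.*/event.* path: {path!r}")
    value = scope
    for part in path.split("."):
        value = value[part]
    return value


def render_field(value, scope: dict):
    """Render one action field using the two deterministic forms."""
    if isinstance(value, list):
        return [render_field(item, scope) for item in value]
    if not isinstance(value, str):
        return value  # numbers and booleans pass through unchanged
    if value.startswith(("context.", "event.")):
        # Whole-field path: keep the original scalar/list/object shape.
        return resolve(value, scope)

    def substitute(match):
        resolved = resolve(match.group(1), scope)
        if isinstance(resolved, (list, dict)):
            raise UnsafeAction(f"placeholder is not a scalar: {match.group(1)!r}")
        return str(resolved)

    # Scalar placeholders embedded in prose.
    return _PLACEHOLDER.sub(substitute, value)
```

Given `scope = {"context": {"repo": {"repo_slug": "billing-api"}}, "event": {}}`, the title template `"Run SBOM rescan for {context.repo.repo_slug}"` renders to `"Run SBOM rescan for billing-api"`, while `"{context.repo.repo_slug.upper()}"` is rejected because the placeholder contains a call.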
Per-item rule expansion is explicit:
for_each: context.repos.repos
bind_as: repo
condition: 'context.repo.sbom_age_days > 30'
action:
  task_template: Run SBOM rescan for {context.repo.repo_slug}
  target_repo: context.repo.repo_slug
  priority: medium
  labels: ["sbom", "security", "automated"]
The weekly SBOM staleness definition is the canonical pattern. The State Hub
bulk resolver exposes all repository entries at context.repos.repos, the rule
binds each item as context.repo, and the strict staleness definition is
context.repo.sbom_age_days > 30. Thirty days exactly is not stale; thirty-one
days is stale.
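A rough sketch of the per-item expansion and the strict boundary follows. The helper name `expand_for_each` and the dict-based context are assumptions for illustration; only `for_each`, `bind_as`, and the `> 30` threshold come from the definition above.

```python
def expand_for_each(items, bind_as: str, context: dict) -> list:
    """Return one context per item, with the item bound under `bind_as`."""
    if not bind_as.isidentifier():
        raise ValueError(f"bind_as must be a simple identifier: {bind_as!r}")
    if not isinstance(items, list):
        raise ValueError("for_each must resolve to a list")
    # Each item gets its own shallow copy so bindings do not leak between items.
    return [{**context, bind_as: item} for item in items]


context = {
    "repos": {
        "repos": [
            {"repo_slug": "exactly-thirty", "sbom_age_days": 30},
            {"repo_slug": "thirty-one", "sbom_age_days": 31},
        ]
    }
}

stale = [
    item_context["repo"]["repo_slug"]
    for item_context in expand_for_each(context["repos"]["repos"], "repo", context)
    if item_context["repo"]["sbom_age_days"] > 30  # strict: 30 is not stale
]
```

Here `stale` contains only `"thirty-one"`, matching the rule that thirty days exactly does not trigger a rescan.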
Evaluation semantics
- All rules in an ActivityDefinition are evaluated; all matching rules fire (not first-match-only). There is no implicit ordering beyond the file order, which is documented in the ActivityDefinition for human clarity.
- A rule whose condition raises an error during evaluation is skipped and logged as `rule_error`; other rules still fire. This prevents a single malformed rule from silencing an entire ActivityDefinition.
- An empty condition (omitted `condition` field) evaluates to `true`, so the rule always fires when the trigger fires.
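These three semantics fit in one short loop. The sketch below assumes rules are dicts with an `id` and an optional `condition`, and takes the condition evaluator as a callable; the function name `fire_rules` and its return shape are hypothetical.

```python
from typing import Callable


def fire_rules(rules: list[dict], evaluate: Callable[[str], bool]) -> tuple[list[str], list[str]]:
    """Return (ids of rules that fired, ids of rules skipped with rule_error)."""
    fired: list[str] = []
    errors: list[str] = []
    for rule in rules:  # file order, but every rule is evaluated
        condition = rule.get("condition")
        try:
            # An omitted or empty condition always matches.
            matched = True if not condition else evaluate(condition)
        except Exception:
            # In the real system this would be logged as rule_error.
            errors.append(rule["id"])
            continue
        if matched:
            fired.append(rule["id"])
    return fired, errors
```

A failing rule in the middle of the list ends up in the error list without preventing later rules from firing.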
Instructions
An Instruction defers the task-creation decision to an LLM. It specifies what context to provide, how to frame the prompt, and what output schema to enforce.
Structure
# in an instruction fenced block:
id: {slug}
condition: '{expression}' # optional pre-filter (Rule DSL); runs before LLM
trusted_fields: # REQUIRED — explicit allowlist of payload fields
  - event.attributes.repo_slug     # safe to interpolate into prompt
  - event.attributes.domain
  - event.attributes.tags
model: claude-sonnet-4-6
review_required: false             # true | false — curator gate for output
prompt: |
  {prompt template — only trusted_fields may be interpolated}
output_schema: {path to JSON schema file}
Trusted fields and prompt injection protection
The trusted_fields list is required and enforced at parse time. Any field
not listed is unavailable to the prompt template. The template engine raises
UntrustedFieldError if the prompt references a field not in trusted_fields.
The rationale: event payloads may contain free-text from untrusted sources — commit messages, issue titles, CVE descriptions, repo descriptions. Interpolating these directly into a prompt creates a prompt injection surface. Trusted fields are those whose values are validated by the event type schema (typed attributes like slugs, domain names, tag lists) and cannot carry arbitrary instruction text by construction.
Fields of type object (freeform JSON) are never eligible for trusted_fields
even if listed — the evaluator rejects this at parse time.
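A parse-time check along these lines might look as follows. It is a sketch under assumptions: the prompt is taken to use the same `{path}` placeholder syntax as action fields, field types are supplied as a simple mapping, and the object-type rejection reuses `UntrustedFieldError` for brevity. The function name `check_prompt` is hypothetical.

```python
import re


class UntrustedFieldError(ValueError):
    """Raised when a prompt references a field outside trusted_fields."""


_PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def check_prompt(prompt: str, trusted_fields: list[str], field_types: dict[str, str]) -> None:
    """Validate an instruction's prompt template before any LLM call."""
    for field in trusted_fields:
        # Freeform JSON objects can carry arbitrary text, so they are never trusted.
        if field_types.get(field) == "object":
            raise UntrustedFieldError(f"object field cannot be trusted: {field}")
    trusted = set(trusted_fields)
    for reference in _PLACEHOLDER.findall(prompt):
        if reference not in trusted:
            raise UntrustedFieldError(f"prompt references untrusted field: {reference}")
```

A prompt that interpolates `{event.attributes.description}` fails this check unless that field is explicitly listed, which keeps free-text payload fields out of the rendered prompt by default.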
Output schema enforcement
The LLM response is validated against output_schema using JSON Schema validation.
If validation fails, the instruction retries once with the schema error appended
to the prompt. If the second attempt also fails, the instruction records an
instruction_output_error audit event and emits no tasks. Tasks are never
created from unvalidated output.
Structured output mode (tool_use / JSON mode) is used where the model supports
it. The output schema must define List[TaskSpec] or a compatible envelope.
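The retry-once behaviour can be sketched with the LLM call and the schema validator passed in as callables, so the example stays independent of any particular client or JSON Schema library. The names `run_instruction`, `call_llm`, and `validate` are hypothetical, and the wording of the appended error text is an assumption.

```python
import json
from typing import Callable, Optional


def run_instruction(
    call_llm: Callable[[str], str],
    validate: Callable[[object], Optional[str]],
    prompt: str,
    audit: list[str],
) -> list:
    """Return validated task specs, or an empty list after two failed attempts.

    `validate` stands in for JSON Schema validation: it returns None when the
    parsed output is valid, otherwise a human-readable error message.
    """
    attempt_prompt = prompt
    for _ in range(2):  # the first attempt plus exactly one retry
        raw = call_llm(attempt_prompt)
        try:
            parsed = json.loads(raw)
            error = validate(parsed)
        except json.JSONDecodeError as exc:
            parsed, error = None, f"invalid JSON: {exc.msg}"
        if error is None:
            return parsed
        # Retry with the schema error appended to the original prompt.
        attempt_prompt = f"{prompt}\n\nSchema error from previous attempt: {error}"
    audit.append("instruction_output_error")
    return []  # tasks are never created from unvalidated output
```

Keeping the validator separate from the LLM call also lets it be unit-tested on its own with recorded responses.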
review_required: true
When set, the instruction's proposed task list is written to a pending review queue in issue-core rather than directly created. A human or curator agent reviews and approves/rejects before tasks are materialised. This is the default for instructions that create high-impact tasks (cross-repo changes, security responses, production operations).
Evaluation semantics
- Instructions are evaluated after all rules in the ActivityDefinition.
- The optional `condition` field on an instruction uses the same Rule DSL as a first-pass filter: if the condition is false, the LLM is not called. This avoids LLM cost for events that clearly do not need instruction judgement.
- Instructions are not first-match-only; all instructions whose conditions pass fire. An ActivityDefinition may have zero instructions.
Audit trail
Every task emission records:
| Field | Rule | Instruction |
|---|---|---|
| `source_type` | `"rule"` | `"instruction"` |
| `source_id` | rule id from definition | instruction id from definition |
| `source_version` | ActivityDefinition version | ActivityDefinition version |
| `triggering_event_id` | event UUID | event UUID |
| `condition_matched` | expression string | expression string (pre-filter) |
| `prompt_hash` | n/a | SHA-256 of rendered prompt |
| `model` | n/a | model ID used |
| `output_validated` | n/a | true / false |
| `review_required` | n/a | true / false |
The audit trail is written to the task_spawn_log table in activity-core's database
and referenced from the task record in issue-core.
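As an illustration of the record shape, the fields above map naturally onto a small dataclass. The class name `TaskSpawnRecord`, the Python types, and the choice of UTF-8 encoding with a hex digest for `prompt_hash` are assumptions; the ADR only fixes the field names and that the hash is SHA-256 of the rendered prompt.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaskSpawnRecord:
    """One audit row per emitted task (hypothetical in-memory shape)."""
    source_type: str                 # "rule" or "instruction"
    source_id: str
    source_version: str
    triggering_event_id: str
    condition_matched: Optional[str]
    # The remaining fields only apply to instructions and stay None for rules.
    prompt_hash: Optional[str] = None
    model: Optional[str] = None
    output_validated: Optional[bool] = None
    review_required: Optional[bool] = None


def prompt_hash(rendered_prompt: str) -> str:
    """SHA-256 of the rendered prompt, as a hex string (encoding assumed UTF-8)."""
    return hashlib.sha256(rendered_prompt.encode("utf-8")).hexdigest()
```

Storing a hash rather than the prompt itself lets two emissions be compared for identical prompts without copying prompt text into the log.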
Testing strategy
Rules: every rule can and should be unit-tested with fixture event payloads.
A test helper evaluate_rule(condition_str, event_fixture) returns bool and
raises on syntax errors. Tests live alongside ActivityDefinition files:
activity-definitions/{slug}.test.json — a list of {event, expected_rules_fired}
fixtures.
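A fixture runner for such a file could be as small as the sketch below. It assumes `expected_rules_fired` is a list of rule ids in file order and that rules are dicts with `id` and `condition`; the `evaluate_rule` helper is passed in rather than imported, and the name `check_fixtures` is hypothetical.

```python
import json
from typing import Callable


def check_fixtures(
    fixture_json: str,
    rules: list[dict],
    evaluate_rule: Callable[[str, dict], bool],
) -> list[int]:
    """Return the indexes of fixtures whose fired rules differ from expectations."""
    mismatches = []
    for index, fixture in enumerate(json.loads(fixture_json)):
        fired = [
            rule["id"]
            for rule in rules
            if evaluate_rule(rule["condition"], fixture["event"])
        ]
        if fired != fixture["expected_rules_fired"]:
            mismatches.append(index)
    return mismatches
```

An empty return value means every fixture in the file produced exactly the expected set of rules.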
Instructions: instructions cannot be deterministically unit-tested. Instead:
- Sample evaluations are collected: given a fixture event, record the LLM response.
- Samples are committed to `activity-definitions/{slug}.samples/` for human review.
- Output schema validation is unit-tested independently of the LLM call.
- Prompt injection resistance is tested by including injection strings in fixture event payloads and asserting they do not appear in the rendered prompt.
rules-core module boundary
The rule evaluator and instruction executor live in src/activity_core/rules/.
Within this module:
- No imports from `temporalio`, `sqlalchemy`, `fastapi`, or any activity-core application code.
- Public surface: `evaluate_condition(expr: str, event: EventEnvelope, context: dict) -> bool` and `execute_instruction(instr: InstructionDef, event: EventEnvelope, context: dict) -> List[TaskSpec]`.
- The module is independently importable and testable without starting the Temporal worker or Postgres.
This boundary makes future extraction to rules-core a packaging exercise, not a refactor.
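One way to keep the import boundary from eroding is a test that scans the module's source files for forbidden imports. The sketch below covers only the three named third-party packages, not "activity-core application code", and the function name `forbidden_imports` is hypothetical.

```python
import ast

FORBIDDEN = {"temporalio", "sqlalchemy", "fastapi"}


def forbidden_imports(source: str) -> set[str]:
    """Return the forbidden top-level packages imported by a source file."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]  # relative imports have no module name
        else:
            continue
        # Compare on the top-level package, e.g. "sqlalchemy.orm" -> "sqlalchemy".
        found.update(name.split(".")[0] for name in names)
    return found & FORBIDDEN
```

Running this over every file under `src/activity_core/rules/` and asserting an empty result would fail the build as soon as a boundary-crossing import is introduced.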
Consequences
- The `ActivityDefinition` Pydantic model gains `rules: List[RuleDef]` and `instructions: List[InstructionDef]` fields. The current implicit "always create tasks" behaviour is replaced by explicit rule blocks.
- A new `RuleEvaluator` class (AST walker) is added to `src/activity_core/rules/`.
- A new `InstructionExecutor` class handles prompt rendering, LLM call, output validation, and review queue routing.
- Integration tests for rule evaluation use fixture JSON; no running Temporal required.
- The `task_spawn_log` table is added to the Postgres schema (new Alembic migration).
- ActivityDefinition files that omit both `rules` and `instructions` are valid (they fire with no output); this supports future placeholder definitions.
Alternatives Considered
OPA / Rego for rule conditions: powerful, well-established policy language, supports complex logic. Rejected — Rego's learning curve is high for non-specialists; agents rarely produce correct Rego without fine-tuning; it adds a runtime dependency. The simple AST-walker DSL covers the realistic condition complexity for this org.
Rules as Python lambdas: maximum expressiveness. Rejected — arbitrary code execution in a rule condition is a serious security surface, especially in an org-wide event loop. Code deployment required for any rule change; agents cannot write rules without code write access.
LLM for all conditions (no Rule/Instruction split): simpler model, more flexible. Rejected — non-deterministic for cases that are deterministic; expensive for high-frequency events like cron ticks; impossible to unit-test; audit trail for deterministic rules becomes murky.
Instructions only, no Rules: allows arbitrary LLM judgement for everything. Rejected — LLM cost for every event, latency, and non-determinism are unacceptable for high-frequency maintenance automations. Many cases (SBOM staleness check, tag-based routing) are fully deterministic and should stay that way.
Related
- ACT-ADR-001 — Event Bridge Architecture
- ACT-ADR-002 — Definition format (where rule/instruction blocks live)
- CUST-TFE-SCOPE-2026-000001 — task-flow-engine extraction (analogue pattern)
- `src/activity_core/rules/`: implementation home