--- id: ACT-ADR-003 type: architecture-decision-record title: "Rule vs. Instruction Model and Expression DSL" status: accepted decided_by: Bernd Worsch date: "2026-05-14" scope: cross-repo affects: - activity-core - rules-core (future extraction) tags: ["architecture", "rules", "instructions", "dsl", "llm", "safety", "evaluation"] --- # ACT-ADR-003: Rule vs. Instruction Model and Expression DSL ## Status Accepted. ## Context ActivityDefinitions need two distinct evaluation modes to cover the full range of automation scenarios in the Coulomb org: **Deterministic cases**: "if this repo has tag `python-service` AND has no SBOM in the last 30 days, create a scan task." The condition is fully expressible as a boolean predicate over known attributes. The output is fixed by the template. No ambiguity, no LLM required, fully testable. **Judgement cases**: "a new repository has been registered — based on its domain and profile, determine what domain-specific onboarding tasks are appropriate." The right answer depends on context that is expensive to encode as explicit rules. An LLM is a better evaluator than a rule tree, but introduces non-determinism, cost, and a new attack surface (prompt injection via event payload). Conflating these two modes into one mechanism produces a system that is either too rigid (rules only) or too unpredictable (LLM everywhere). The two modes need different evaluation pipelines, testing strategies, and audit trails. ## Decision **Two named, distinct evaluation modes: Rule and Instruction.** Terminology is deliberate. A **Rule** is deterministic and mechanical — it applies or it does not. An **Instruction** is contextual and interpretive — it guides an LLM agent to make a judgement call. Both are expressed as fenced blocks in ActivityDefinition markdown files (see ACT-ADR-002). ### Rules A Rule has two parts: a **condition** (boolean predicate) and one or more **actions** (task template references). #### Condition expression language The condition is a single-line string expression evaluated by a sandboxed AST walker — never `exec()` or `eval()`. The evaluator walks the parsed AST and whitelist-checks every node type before executing. Unknown node types raise an `UnsafeExpression` error at parse time, not at evaluation time. **Available operations**: | Category | Syntax | Example | |---|---|---| | Equality | `==`, `!=` | `event.type == "org.repo.registered"` | | Comparison | `>`, `<`, `>=`, `<=` | `event.attributes.sbom_age_days > 30` | | Membership | `in`, `not in` | `"python-service" in event.attributes.tags` | | Boolean | `and`, `or`, `not` | `a and (b or not c)` | | Grouping | `( )` | `(a or b) and c` | | Length | `len(x)` | `len(event.attributes.affected_repos) > 0` | | Existence | `x is None`, `x is not None` | `event.attributes.domain is not None` | **Attribute access** follows dot notation on the `event` object and the `context` object (populated by context sources declared in the ActivityDefinition): - `event.id` — UUID string - `event.type` — event type identifier - `event.version` — event type version - `event.timestamp` — ISO 8601 datetime string - `event.publisher` — publisher identifier - `event.attributes.{name}` — typed attribute per event type schema - `context.{source}.{field}` — resolved context data **Explicitly forbidden** (evaluator rejects at parse time): - Function calls other than `len()` and `None` tests - Attribute access on arbitrary Python objects - String interpolation or formatting - Any control flow (`if`, `for`, `while`, `lambda`) - Import statements - Assignments **Design rationale**: the expression language is intentionally small. Anything complex enough to need more than this belongs in an Instruction, not a Rule. When a rule condition becomes difficult to express, that is a signal that the case requires LLM judgement, not a signal that the DSL needs more features. #### Actions A Rule's action block specifies: ```yaml action: task_template: "Run SBOM rescan for {context.repo.repo_slug}" target_repo: context.repo.repo_slug priority: medium labels: ["sbom", "security", "{context.repo.repo_slug}"] due_in_days: 7 ``` `action.task_template` is the emitted task title template. It is not a path to a repo-local file. Older design notes and the legacy `tasks/*.md` directory use "task template" for materialized task-body templates; that is a separate legacy surface. To avoid surprise, new rule actions should treat `task_template` as `title_template` semantics until the field can be renamed in a schema-breaking revision. Action fields accept two deterministic rendering forms: - Whole-field paths: if the whole string is a path like `context.repo.repo_slug` or `event.attributes.repo_slug`, the rendered value keeps the original scalar/list/object shape from that path. This is the correct form for `target_repo` and other fields that should not become prose. - Scalar placeholders: strings may include `{context.foo}` or `{event.foo}` placeholders. Each placeholder must resolve to a scalar. Lists and objects are rejected rather than stringified, which prevents accidental JSON blobs or untrusted text from being embedded into task titles. Unsafe action cases are rejected: - Any action path outside `context.*` or `event.*`. - Any path containing calls, indexing, arithmetic, filters, or boolean logic. - Placeholder values that resolve to lists or objects. - `for_each` values that are not a whole-field `context.*` or `event.*` path to a list. - `bind_as` names that are not simple identifiers. Per-item rule expansion is explicit: ```yaml for_each: context.repos.repos bind_as: repo condition: 'context.repo.sbom_age_days > 30' action: task_template: Run SBOM rescan for {context.repo.repo_slug} target_repo: context.repo.repo_slug priority: medium labels: ["sbom", "security", "automated"] ``` The weekly SBOM staleness definition is the canonical pattern. The State Hub bulk resolver exposes all repository entries at `context.repos.repos`, the rule binds each item as `context.repo`, and the strict staleness definition is `context.repo.sbom_age_days > 30`. Thirty days exactly is not stale; thirty-one days is stale. #### Evaluation semantics - All rules in an ActivityDefinition are evaluated; **all matching rules fire** (not first-match-only). There is no implicit ordering beyond the file order, which is documented in the ActivityDefinition for human clarity. - A rule whose condition raises an error during evaluation is skipped and logged as `rule_error`; other rules still fire. This prevents a single malformed rule from silencing an entire ActivityDefinition. - An empty condition (omitted `condition` field) evaluates to `true` — the rule always fires when the trigger fires. ### Instructions An Instruction defers the task-creation decision to an LLM. It specifies what context to provide, how to frame the prompt, and what output schema to enforce. #### Structure ```yaml # in an instruction fenced block: id: {slug} condition: '{expression}' # optional pre-filter (Rule DSL); runs before LLM trusted_fields: # REQUIRED — explicit allowlist of payload fields - event.attributes.repo_slug # safe to interpolate into prompt - event.attributes.domain - event.attributes.tags model: claude-sonnet-4-6 review_required: false # true | false — curator gate for output prompt: | {prompt template — only trusted_fields may be interpolated} output_schema: {path to JSON schema file} ``` #### Trusted fields and prompt injection protection The `trusted_fields` list is **required** and enforced at parse time. Any field not listed is unavailable to the prompt template. The template engine raises `UntrustedFieldError` if the prompt references a field not in `trusted_fields`. The rationale: event payloads may contain free-text from untrusted sources — commit messages, issue titles, CVE descriptions, repo descriptions. Interpolating these directly into a prompt creates a prompt injection surface. Trusted fields are those whose values are validated by the event type schema (typed attributes like slugs, domain names, tag lists) and cannot carry arbitrary instruction text by construction. Fields of type `object` (freeform JSON) are **never eligible** for `trusted_fields` even if listed — the evaluator rejects this at parse time. #### Output schema enforcement The LLM response is validated against `output_schema` using JSON Schema validation. If validation fails, the instruction retries once with the schema error appended to the prompt. If the second attempt also fails, the instruction records an `instruction_output_error` audit event and emits no tasks. Tasks are **never created from unvalidated output**. Structured output mode (tool_use / JSON mode) is used where the model supports it. The output schema must define `List[TaskSpec]` or a compatible envelope. #### `review_required: true` When set, the instruction's proposed task list is written to a **pending review queue** in issue-core rather than directly created. A human or curator agent reviews and approves/rejects before tasks are materialised. This is the default for instructions that create high-impact tasks (cross-repo changes, security responses, production operations). #### Evaluation semantics - Instructions are evaluated **after** all rules in the ActivityDefinition. - The optional `condition` field on an instruction uses the same Rule DSL as a first-pass filter — if the condition is false, the LLM is not called. This avoids LLM cost for events that clearly do not need instruction judgement. - Instructions are **not** first-match-only; all instructions whose conditions pass fire. An ActivityDefinition may have zero instructions. ### Audit trail Every task emission records: | Field | Rule | Instruction | |---|---|---| | `source_type` | `"rule"` | `"instruction"` | | `source_id` | rule `id` from definition | instruction `id` from definition | | `source_version` | ActivityDefinition version | ActivityDefinition version | | `triggering_event_id` | event UUID | event UUID | | `condition_matched` | expression string | expression string (pre-filter) | | `prompt_hash` | — | SHA-256 of rendered prompt | | `model` | — | model ID used | | `output_validated` | — | `true` / `false` | | `review_required` | — | `true` / `false` | The audit trail is written to the `task_spawn_log` table in activity-core's database and referenced from the task record in issue-core. ### Testing strategy **Rules**: every rule can and should be unit-tested with fixture event payloads. A test helper `evaluate_rule(condition_str, event_fixture)` returns `bool` and raises on syntax errors. Tests live alongside ActivityDefinition files: `activity-definitions/{slug}.test.json` — a list of `{event, expected_rules_fired}` fixtures. **Instructions**: instructions cannot be deterministically unit-tested. Instead: - Sample evaluations are collected: given a fixture event, record the LLM response. - Samples are committed to `activity-definitions/{slug}.samples/` for human review. - Output schema validation is unit-tested independently of the LLM call. - Prompt injection resistance is tested by including injection strings in fixture event payloads and asserting they do not appear in the rendered prompt. ### rules-core module boundary The rule evaluator and instruction executor live in `src/activity_core/rules/`. Within this module: - **No imports from** `temporalio`, `sqlalchemy`, `fastapi`, or any activity-core application code. - Public surface: `evaluate_condition(expr: str, event: EventEnvelope, context: dict) -> bool` and `execute_instruction(instr: InstructionDef, event: EventEnvelope, context: dict) -> List[TaskSpec]`. - The module is independently importable and testable without starting the Temporal worker or Postgres. This boundary makes future extraction to `rules-core` a packaging exercise, not a refactor. ## Consequences - The `ActivityDefinition` Pydantic model gains `rules: List[RuleDef]` and `instructions: List[InstructionDef]` fields. The current implicit "always create tasks" behaviour is replaced by explicit rule blocks. - A new `RuleEvaluator` class (AST walker) is added to `src/activity_core/rules/`. - A new `InstructionExecutor` class handles prompt rendering, LLM call, output validation, and review queue routing. - Integration tests for rule evaluation use fixture JSON; no running Temporal required. - The `task_spawn_log` table is added to the Postgres schema (new Alembic migration). - ActivityDefinition files that omit both `rules` and `instructions` are valid (they fire with no output) — this supports future placeholder definitions. ## Alternatives Considered **OPA / Rego for rule conditions**: powerful, well-established policy language, supports complex logic. Rejected — Rego's learning curve is high for non-specialists; agents rarely produce correct Rego without fine-tuning; it adds a runtime dependency. The simple AST-walker DSL covers the realistic condition complexity for this org. **Rules as Python lambdas**: maximum expressiveness. Rejected — arbitrary code execution in a rule condition is a serious security surface, especially in an org-wide event loop. Code deployment required for any rule change; agents cannot write rules without code write access. **LLM for all conditions (no Rule/Instruction split)**: simpler model, more flexible. Rejected — non-deterministic for cases that are deterministic; expensive for high-frequency events like cron ticks; impossible to unit-test; audit trail for deterministic rules becomes murky. **Instructions only, no Rules**: allows arbitrary LLM judgement for everything. Rejected — LLM cost for every event, latency, and non-determinism are unacceptable for high-frequency maintenance automations. Many cases (SBOM staleness check, tag-based routing) are fully deterministic and should stay that way. ## Related - ACT-ADR-001 — Event Bridge Architecture - ACT-ADR-002 — Definition format (where rule/instruction blocks live) - CUST-TFE-SCOPE-2026-000001 — task-flow-engine extraction (analogue pattern) - `src/activity_core/rules/` — implementation home