generated from coulomb/repo-seed
323 lines
14 KiB
Markdown
323 lines
14 KiB
Markdown
---
|
|
id: ACT-ADR-003
|
|
type: architecture-decision-record
|
|
title: "Rule vs. Instruction Model and Expression DSL"
|
|
status: accepted
|
|
decided_by: Bernd Worsch
|
|
date: "2026-05-14"
|
|
scope: cross-repo
|
|
affects:
|
|
- activity-core
|
|
- rules-core (future extraction)
|
|
tags: ["architecture", "rules", "instructions", "dsl", "llm", "safety", "evaluation"]
|
|
---
|
|
|
|
# ACT-ADR-003: Rule vs. Instruction Model and Expression DSL
|
|
|
|
## Status
|
|
|
|
Accepted.
|
|
|
|
## Context
|
|
|
|
ActivityDefinitions need two distinct evaluation modes to cover the full range
|
|
of automation scenarios in the Coulomb org:
|
|
|
|
**Deterministic cases**: "if this repo has tag `python-service` AND has no SBOM
|
|
in the last 30 days, create a scan task." The condition is fully expressible as a
|
|
boolean predicate over known attributes. The output is fixed by the template. No
|
|
ambiguity, no LLM required, fully testable.
|
|
|
|
**Judgement cases**: "a new repository has been registered — based on its domain
|
|
and profile, determine what domain-specific onboarding tasks are appropriate." The
|
|
right answer depends on context that is expensive to encode as explicit rules. An
|
|
LLM is a better evaluator than a rule tree, but introduces non-determinism, cost,
|
|
and a new attack surface (prompt injection via event payload).
|
|
|
|
Conflating these two modes into one mechanism produces a system that is either
|
|
too rigid (rules only) or too unpredictable (LLM everywhere). The two modes
|
|
need different evaluation pipelines, testing strategies, and audit trails.
|
|
|
|
## Decision
|
|
|
|
**Two named, distinct evaluation modes: Rule and Instruction.**
|
|
|
|
Terminology is deliberate. A **Rule** is deterministic and mechanical — it applies
|
|
or it does not. An **Instruction** is contextual and interpretive — it guides an
|
|
LLM agent to make a judgement call. Both are expressed as fenced blocks in
|
|
ActivityDefinition markdown files (see ACT-ADR-002).
|
|
|
|
### Rules
|
|
|
|
A Rule has two parts: a **condition** (boolean predicate) and one or more
|
|
**actions** (task template references).
|
|
|
|
#### Condition expression language
|
|
|
|
The condition is a single-line string expression evaluated by a sandboxed
|
|
AST walker — never `exec()` or `eval()`. The evaluator walks the parsed AST
|
|
and whitelist-checks every node type before executing. Unknown node types
|
|
raise an `UnsafeExpression` error at parse time, not at evaluation time.
|
|
|
|
**Available operations**:
|
|
|
|
| Category | Syntax | Example |
|
|
|---|---|---|
|
|
| Equality | `==`, `!=` | `event.type == "org.repo.registered"` |
|
|
| Comparison | `>`, `<`, `>=`, `<=` | `event.attributes.sbom_age_days > 30` |
|
|
| Membership | `in`, `not in` | `"python-service" in event.attributes.tags` |
|
|
| Boolean | `and`, `or`, `not` | `a and (b or not c)` |
|
|
| Grouping | `( )` | `(a or b) and c` |
|
|
| Length | `len(x)` | `len(event.attributes.affected_repos) > 0` |
|
|
| Existence | `x is None`, `x is not None` | `event.attributes.domain is not None` |
|
|
|
|
**Attribute access** follows dot notation on the `event` object and the `context`
|
|
object (populated by context sources declared in the ActivityDefinition):
|
|
|
|
- `event.id` — UUID string
|
|
- `event.type` — event type identifier
|
|
- `event.version` — event type version
|
|
- `event.timestamp` — ISO 8601 datetime string
|
|
- `event.publisher` — publisher identifier
|
|
- `event.attributes.{name}` — typed attribute per event type schema
|
|
- `context.{source}.{field}` — resolved context data
|
|
|
|
**Explicitly forbidden** (evaluator rejects at parse time):
|
|
- Function calls other than `len()` and `None` tests
|
|
- Attribute access on arbitrary Python objects
|
|
- String interpolation or formatting
|
|
- Any control flow (`if`, `for`, `while`, `lambda`)
|
|
- Import statements
|
|
- Assignments
|
|
|
|
**Design rationale**: the expression language is intentionally small. Anything
|
|
complex enough to need more than this belongs in an Instruction, not a Rule.
|
|
When a rule condition becomes difficult to express, that is a signal that the
|
|
case requires LLM judgement, not a signal that the DSL needs more features.
|
|
|
|
#### Actions
|
|
|
|
A Rule's action block specifies:
|
|
|
|
```yaml
|
|
action:
|
|
task_template: "Run SBOM rescan for {context.repo.repo_slug}"
|
|
target_repo: context.repo.repo_slug
|
|
priority: medium
|
|
labels: ["sbom", "security", "{context.repo.repo_slug}"]
|
|
due_in_days: 7
|
|
```
|
|
|
|
`action.task_template` is the emitted task title template. It is not a path to a
|
|
repo-local file. Older design notes and the legacy `tasks/*.md` directory use
|
|
"task template" for materialized task-body templates; that is a separate legacy
|
|
surface. To avoid surprise, new rule actions should treat `task_template` as
|
|
`title_template` semantics until the field can be renamed in a schema-breaking
|
|
revision.
|
|
|
|
Action fields accept two deterministic rendering forms:
|
|
|
|
- Whole-field paths: if the whole string is a path like
|
|
`context.repo.repo_slug` or `event.attributes.repo_slug`, the rendered value
|
|
keeps the original scalar/list/object shape from that path. This is the
|
|
correct form for `target_repo` and other fields that should not become prose.
|
|
- Scalar placeholders: strings may include `{context.foo}` or `{event.foo}`
|
|
placeholders. Each placeholder must resolve to a scalar. Lists and objects are
|
|
rejected rather than stringified, which prevents accidental JSON blobs or
|
|
untrusted text from being embedded into task titles.
|
|
|
|
Unsafe action cases are rejected:
|
|
|
|
- Any action path outside `context.*` or `event.*`.
|
|
- Any path containing calls, indexing, arithmetic, filters, or boolean logic.
|
|
- Placeholder values that resolve to lists or objects.
|
|
- `for_each` values that are not a whole-field `context.*` or `event.*` path to
|
|
a list.
|
|
- `bind_as` names that are not simple identifiers.
|
|
|
|
Per-item rule expansion is explicit:
|
|
|
|
```yaml
|
|
for_each: context.repos.repos
|
|
bind_as: repo
|
|
condition: 'context.repo.sbom_age_days > 30'
|
|
action:
|
|
task_template: Run SBOM rescan for {context.repo.repo_slug}
|
|
target_repo: context.repo.repo_slug
|
|
priority: medium
|
|
labels: ["sbom", "security", "automated"]
|
|
```
|
|
|
|
The weekly SBOM staleness definition is the canonical pattern. The State Hub
|
|
bulk resolver exposes all repository entries at `context.repos.repos`, the rule
|
|
binds each item as `context.repo`, and the strict staleness definition is
|
|
`context.repo.sbom_age_days > 30`. Thirty days exactly is not stale; thirty-one
|
|
days is stale.
|
|
|
|
#### Evaluation semantics
|
|
|
|
- All rules in an ActivityDefinition are evaluated; **all matching rules fire**
|
|
(not first-match-only). There is no implicit ordering beyond the file order,
|
|
which is documented in the ActivityDefinition for human clarity.
|
|
- A rule whose condition raises an error during evaluation is skipped and logged
|
|
as `rule_error`; other rules still fire. This prevents a single malformed rule
|
|
from silencing an entire ActivityDefinition.
|
|
- An empty condition (omitted `condition` field) evaluates to `true` — the rule
|
|
always fires when the trigger fires.
|
|
|
|
### Instructions
|
|
|
|
An Instruction defers the task-creation decision to an LLM. It specifies what
|
|
context to provide, how to frame the prompt, and what output schema to enforce.
|
|
|
|
#### Structure
|
|
|
|
```yaml
|
|
# in an instruction fenced block:
|
|
id: {slug}
|
|
condition: '{expression}' # optional pre-filter (Rule DSL); runs before LLM
|
|
trusted_fields: # REQUIRED — explicit allowlist of payload fields
|
|
- event.attributes.repo_slug # safe to interpolate into prompt
|
|
- event.attributes.domain
|
|
- event.attributes.tags
|
|
model: claude-sonnet-4-6
|
|
review_required: false # true | false — curator gate for output
|
|
prompt: |
|
|
{prompt template — only trusted_fields may be interpolated}
|
|
output_schema: {path to JSON schema file}
|
|
```
|
|
|
|
#### Trusted fields and prompt injection protection
|
|
|
|
The `trusted_fields` list is **required** and enforced at parse time. Any field
|
|
not listed is unavailable to the prompt template. The template engine raises
|
|
`UntrustedFieldError` if the prompt references a field not in `trusted_fields`.
|
|
|
|
The rationale: event payloads may contain free-text from untrusted sources —
|
|
commit messages, issue titles, CVE descriptions, repo descriptions. Interpolating
|
|
these directly into a prompt creates a prompt injection surface. Trusted fields
|
|
are those whose values are validated by the event type schema (typed attributes
|
|
like slugs, domain names, tag lists) and cannot carry arbitrary instruction text
|
|
by construction.
|
|
|
|
Fields of type `object` (freeform JSON) are **never eligible** for `trusted_fields`
|
|
even if listed — the evaluator rejects this at parse time.
|
|
|
|
#### Output schema enforcement
|
|
|
|
The LLM response is validated against `output_schema` using JSON Schema validation.
|
|
If validation fails, the instruction retries once with the schema error appended
|
|
to the prompt. If the second attempt also fails, the instruction records an
|
|
`instruction_output_error` audit event and emits no tasks. Tasks are **never
|
|
created from unvalidated output**.
|
|
|
|
Structured output mode (tool_use / JSON mode) is used where the model supports
|
|
it. The output schema must define `List[TaskSpec]` or a compatible envelope.
|
|
|
|
#### `review_required: true`
|
|
|
|
When set, the instruction's proposed task list is written to a **pending review
|
|
queue** in issue-core rather than directly created. A human or curator agent
|
|
reviews and approves/rejects before tasks are materialised. This is the default
|
|
for instructions that create high-impact tasks (cross-repo changes, security
|
|
responses, production operations).
|
|
|
|
#### Evaluation semantics
|
|
|
|
- Instructions are evaluated **after** all rules in the ActivityDefinition.
|
|
- The optional `condition` field on an instruction uses the same Rule DSL as
|
|
a first-pass filter — if the condition is false, the LLM is not called.
|
|
This avoids LLM cost for events that clearly do not need instruction judgement.
|
|
- Instructions are **not** first-match-only; all instructions whose conditions
|
|
pass fire. An ActivityDefinition may have zero instructions.
|
|
|
|
### Audit trail
|
|
|
|
Every task emission records:
|
|
|
|
| Field | Rule | Instruction |
|
|
|---|---|---|
|
|
| `source_type` | `"rule"` | `"instruction"` |
|
|
| `source_id` | rule `id` from definition | instruction `id` from definition |
|
|
| `source_version` | ActivityDefinition version | ActivityDefinition version |
|
|
| `triggering_event_id` | event UUID | event UUID |
|
|
| `condition_matched` | expression string | expression string (pre-filter) |
|
|
| `prompt_hash` | — | SHA-256 of rendered prompt |
|
|
| `model` | — | model ID used |
|
|
| `output_validated` | — | `true` / `false` |
|
|
| `review_required` | — | `true` / `false` |
|
|
|
|
The audit trail is written to the `task_spawn_log` table in activity-core's database
|
|
and referenced from the task record in issue-core.
|
|
|
|
### Testing strategy
|
|
|
|
**Rules**: every rule can and should be unit-tested with fixture event payloads.
|
|
A test helper `evaluate_rule(condition_str, event_fixture)` returns `bool` and
|
|
raises on syntax errors. Tests live alongside ActivityDefinition files:
|
|
`activity-definitions/{slug}.test.json` — a list of `{event, expected_rules_fired}`
|
|
fixtures.
|
|
|
|
**Instructions**: instructions cannot be deterministically unit-tested. Instead:
|
|
- Sample evaluations are collected: given a fixture event, record the LLM response.
|
|
- Samples are committed to `activity-definitions/{slug}.samples/` for human review.
|
|
- Output schema validation is unit-tested independently of the LLM call.
|
|
- Prompt injection resistance is tested by including injection strings in fixture
|
|
event payloads and asserting they do not appear in the rendered prompt.
|
|
|
|
### rules-core module boundary
|
|
|
|
The rule evaluator and instruction executor live in `src/activity_core/rules/`.
|
|
Within this module:
|
|
|
|
- **No imports from** `temporalio`, `sqlalchemy`, `fastapi`, or any activity-core
|
|
application code.
|
|
- Public surface: `evaluate_condition(expr: str, event: EventEnvelope, context: dict) -> bool`
|
|
and `execute_instruction(instr: InstructionDef, event: EventEnvelope, context: dict) -> List[TaskSpec]`.
|
|
- The module is independently importable and testable without starting the Temporal
|
|
worker or Postgres.
|
|
|
|
This boundary makes future extraction to `rules-core` a packaging exercise, not a refactor.
|
|
|
|
## Consequences
|
|
|
|
- The `ActivityDefinition` Pydantic model gains `rules: List[RuleDef]` and
|
|
`instructions: List[InstructionDef]` fields. The current implicit "always create
|
|
tasks" behaviour is replaced by explicit rule blocks.
|
|
- A new `RuleEvaluator` class (AST walker) is added to `src/activity_core/rules/`.
|
|
- A new `InstructionExecutor` class handles prompt rendering, LLM call, output
|
|
validation, and review queue routing.
|
|
- Integration tests for rule evaluation use fixture JSON; no running Temporal required.
|
|
- The `task_spawn_log` table is added to the Postgres schema (new Alembic migration).
|
|
- ActivityDefinition files that omit both `rules` and `instructions` are valid
|
|
(they fire with no output) — this supports future placeholder definitions.
|
|
|
|
## Alternatives Considered
|
|
|
|
**OPA / Rego for rule conditions**: powerful, well-established policy language,
|
|
supports complex logic. Rejected — Rego's learning curve is high for non-specialists;
|
|
agents rarely produce correct Rego without fine-tuning; it adds a runtime dependency.
|
|
The simple AST-walker DSL covers the realistic condition complexity for this org.
|
|
|
|
**Rules as Python lambdas**: maximum expressiveness. Rejected — arbitrary code
|
|
execution in a rule condition is a serious security surface, especially in an
|
|
org-wide event loop. Code deployment required for any rule change; agents cannot
|
|
write rules without code write access.
|
|
|
|
**LLM for all conditions (no Rule/Instruction split)**: simpler model, more
|
|
flexible. Rejected — non-deterministic for cases that are deterministic; expensive
|
|
for high-frequency events like cron ticks; impossible to unit-test; audit trail
|
|
for deterministic rules becomes murky.
|
|
|
|
**Instructions only, no Rules**: allows arbitrary LLM judgement for everything.
|
|
Rejected — LLM cost for every event, latency, and non-determinism are unacceptable
|
|
for high-frequency maintenance automations. Many cases (SBOM staleness check,
|
|
tag-based routing) are fully deterministic and should stay that way.
|
|
|
|
## Related
|
|
|
|
- ACT-ADR-001 — Event Bridge Architecture
|
|
- ACT-ADR-002 — Definition format (where rule/instruction blocks live)
|
|
- CUST-TFE-SCOPE-2026-000001 — task-flow-engine extraction (analogue pattern)
|
|
- `src/activity_core/rules/` — implementation home
|