T2 complete: OODA loop skeleton with LLM integration, bounded actions, and 32 offline unit tests.

Deliverables:
- runtime/agent.py — CLI entry point (--domain/--all/--dry-run/--llm)
- runtime/context.py — Observe: fetch_state + build_context
- runtime/actions.py — Act: parse_plan + execute (3 sanctioned writes)
- runtime/README.md — usage guide and architecture overview
- runtime/tests/ — 32 tests, fully offline
- runtime/pyproject.toml — standalone package with llm-connect dep
- canon/architecture/adr-002-custodian-agent-runtime-design.md

Key design decisions (ADR-002):
- Lives in runtime/ (not a new repo) — tight canon/state-hub coupling
- ClaudeCodeAdapter by default (local-first, no API key)
- Single-pass synchronous OODA for v0.1 simplicity
- Exactly 3 sanctioned write ops: add_progress_event, update_task_status, flag_for_human
- LLM returns JSON block in markdown for structured+auditable output

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
123 lines
4.7 KiB
Markdown
---
id: ADR-002
type: architecture-decision-record
title: "Custodian Agent Runtime — v0.1 Bootstrap Design"
status: accepted
decided_by: Bernd Worsch
date: "2026-03-12"
tags: ["architecture", "agent-runtime", "llm", "ooda", "bounded-agency"]
---

# ADR-002: Custodian Agent Runtime — v0.1 Bootstrap Design

## Status

Accepted.

## Context

CUST-WP-0001 requires a first working skeleton of the Custodian as an acting
agent: a loop that observes project state, reasons about it, and executes
bounded write operations — without human interaction for each step.

The dependencies (llm-connect, operational Railiance infra) are now resolved.
This ADR captures the five key architectural decisions for the v0.1 bootstrap.

## Decisions

### D1 — Location: `runtime/` inside the-custodian (not a new repo)

**Decision:** The runtime lives under `the-custodian/runtime/` as a standalone
Python package (`pyproject.toml`, own venv) rather than a new repository.

**Rationale:** The runtime is tightly coupled to canon (reads constitution,
memory) and the state-hub (its primary coordination layer). A separate repo
adds friction with no v0.1 benefit. The existing `runtime/` scaffold confirms
the original intent. Extraction to its own repo is deferred to when the
runtime has stable boundaries and multiple consumers.

### D2 — OODA loop: single-pass synchronous CLI

**Decision:** One `run()` call = one complete Observe → Orient → Decide → Act
cycle. The entry point is a CLI (`agent.py`) invoked manually or by a cron job.

```
Observe  — HTTP GET to state-hub: state summary or domain summary
Orient   — Load constitution + build structured LLM context prompt
Decide   — Single LLM call (via llm-connect) returns a JSON action plan
Act      — Execute only sanctioned write operations from the plan
```

**Rationale:** Async event loops and daemons add operational complexity that
v0.1 doesn't need. A single-pass CLI is testable, debuggable, and can be
scheduled externally. The transition to an event-driven loop is Phase 2.

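To illustrate the single-pass shape described in D2, here is a minimal sketch rather than the actual `runtime/agent.py`. It assumes the four phases can be passed in as plain callables; `CycleResult` and the `fake_*` stand-ins are hypothetical names introduced only so the cycle runs offline without the state-hub or an LLM.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CycleResult:
    """What one pass produced (hypothetical container, useful in tests)."""
    state: dict
    prompt: str
    plan: dict
    executed: list


def run(
    observe: Callable[[], dict],
    orient: Callable[[dict], str],
    decide: Callable[[str], dict],
    act: Callable[[dict], list],
) -> CycleResult:
    """Run exactly one Observe -> Orient -> Decide -> Act cycle, then return."""
    state = observe()        # Observe: e.g. an HTTP GET to the state-hub
    prompt = orient(state)   # Orient: constitution + state -> context prompt
    plan = decide(prompt)    # Decide: a single LLM call -> action plan
    executed = act(plan)     # Act: sanctioned write operations only
    return CycleResult(state, prompt, plan, executed)


# Offline stand-ins (assumptions for illustration, not the real phases).
def fake_observe() -> dict:
    return {"domain": "demo", "open_tasks": 2}


def fake_orient(state: dict) -> str:
    return f"Constitution: (omitted)\nState: {state}"


def fake_decide(prompt: str) -> dict:
    return {"progress_events": [{"summary": "reviewed demo", "event_type": "note"}]}


def fake_act(plan: dict) -> list:
    return [("add_progress_event", e["summary"]) for e in plan.get("progress_events", [])]


result = run(fake_observe, fake_orient, fake_decide, fake_act)
print(result.executed)
```

Because `run()` holds no state between calls and never loops, an external scheduler can invoke it repeatedly, and each phase can be swapped for a stub in offline tests.
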
### D3 — LLM backend: ClaudeCodeAdapter by default

**Decision:** The runtime uses `llm_connect.ClaudeCodeAdapter` as its default
LLM backend (shells out to `claude --print`). The provider is configurable via
the `--llm` flag to support `gemini`, `openrouter`, or `openai`.

**Rationale:** `ClaudeCodeAdapter` requires no API key and honours the
Local-First value (V2). All current deployments have Claude Code available.
The llm-connect abstraction means switching providers is a one-line change.

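The provider switch in D3 could be wired into the CLI roughly as follows. This is a sketch using only `argparse`, not the real `agent.py` and not the llm-connect API. The flag names come from the deliverables list; the default value `"claude-code"`, the mutual exclusion of `--domain` and `--all`, and the helper `claude_print_command` are assumptions made for illustration.

```python
import argparse

# Alternative providers are named in D3; "claude-code" as the spelling of the
# default is an assumption, not a documented value.
PROVIDERS = ("claude-code", "gemini", "openrouter", "openai")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="agent.py")
    # Assumption: a run targets either one domain or all of them.
    scope = parser.add_mutually_exclusive_group(required=True)
    scope.add_argument("--domain", help="run the cycle for a single domain")
    scope.add_argument("--all", action="store_true", help="run for every domain")
    parser.add_argument("--dry-run", action="store_true",
                        help="plan only; skip the Act phase")
    parser.add_argument("--llm", choices=PROVIDERS, default="claude-code",
                        help="LLM provider to use for the Decide phase")
    return parser


def claude_print_command(prompt: str) -> list:
    """argv for a local `claude --print` shell-out (hypothetical helper).

    Passing the prompt as a positional argument is an assumption here.
    """
    return ["claude", "--print", prompt]


args = build_parser().parse_args(["--domain", "demo"])
print(args.llm, args.dry_run)
```

Keeping the provider behind a single flag means the rest of the loop never needs to know which backend answered.
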
### D4 — Action constraint: three sanctioned write operations only

**Decision:** The agent may execute exactly three state-hub write operations
without human approval:

1. `add_progress_event` — append an observation to the event log
2. `update_task_status` — mark a task done/in_progress (reversible)
3. `flag_for_human` — raise an intervention flag (escalation, not action)

All other operations (create workstream, record decision, resolve decision,
write to canon) require human approval before execution.

**Rationale:** Constitution §3/§4 require bounded agency. The three operations
are either append-only (progress events), reversible (task status), or
explicitly escalating (flag). They cannot produce irreversible harm.

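One way to enforce the D4 constraint is an explicit allowlist checked before anything is executed. The sketch below assumes each requested action is a dict with an `"op"` key; that key, the helper `partition_actions`, and the identifier `record_decision` are hypothetical. Only the three operation names come from the decision itself.

```python
# The three operation names are taken from D4; everything else is held back.
SANCTIONED_OPS = frozenset(
    {"add_progress_event", "update_task_status", "flag_for_human"}
)


def partition_actions(actions):
    """Split requested actions into (allowed, needs_approval).

    Anything not on the allowlist is held for a human instead of being
    dropped silently, so the escalation path stays visible.
    """
    allowed, needs_approval = [], []
    for action in actions:
        if action.get("op") in SANCTIONED_OPS:
            allowed.append(action)
        else:
            needs_approval.append(action)
    return allowed, needs_approval


requested = [
    {"op": "add_progress_event", "summary": "nightly review done"},
    {"op": "record_decision", "title": "adopt new workstream"},
    {"op": "flag_for_human", "task_id": "T-1", "note": "needs review"},
]
allowed, held = partition_actions(requested)
print([a["op"] for a in allowed])  # ['add_progress_event', 'flag_for_human']
print([a["op"] for a in held])     # ['record_decision']
```

An allowlist fails closed: an operation the agent invents, or one added to the state-hub later, is held until the list is extended deliberately.
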
### D5 — LLM response format: JSON block in markdown

**Decision:** The LLM is prompted to return a Markdown response with a
fenced `json` code block containing the structured action plan:

```json
{
  "observations": ["..."],
  "progress_events": [
    {"summary": "...", "workstream_id": "...", "event_type": "note"}
  ],
  "tasks_to_update": [
    {"task_id": "...", "status": "done"}
  ],
  "tasks_to_flag": [
    {"task_id": "...", "note": "..."}
  ]
}
```

The surrounding Markdown is preserved as a human-readable reasoning trace
and written to `memory/working/` as a session note.

**Rationale:** JSON blocks are robust to extraction (delimited), LLMs produce
them reliably with clear instructions, and the surrounding prose gives Bernd
an auditable reasoning trace without requiring a separate reasoning step.

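Extracting the plan from such a response can be sketched with a regular expression and `json.loads`. This is an illustrative version under assumptions, not the `parse_plan` shipped in `runtime/actions.py`: defaulting missing keys to empty lists is a design choice made here, and the task id `T-1` is invented for the sample.

```python
import json
import re

FENCE = "`" * 3  # built programmatically so this example nests cleanly in Markdown
_JSON_BLOCK = re.compile(
    re.escape(FENCE) + r"json\s*\n(.*?)\n\s*" + re.escape(FENCE), re.DOTALL
)

# Top-level keys of the action plan shown in D5.
PLAN_KEYS = ("observations", "progress_events", "tasks_to_update", "tasks_to_flag")


def parse_plan(response: str) -> dict:
    """Return the first fenced json block of an LLM response as a plan dict."""
    match = _JSON_BLOCK.search(response)
    if match is None:
        raise ValueError("no fenced json block in LLM response")
    plan = json.loads(match.group(1))
    if not isinstance(plan, dict):
        raise ValueError("action plan must be a JSON object")
    # Assumption: absent keys mean "nothing to do" for that category.
    return {key: plan.get(key, []) for key in PLAN_KEYS}


sample = "\n".join([
    "The demo task looks finished, so I mark it done.",
    "",
    FENCE + "json",
    '{"observations": ["demo task finished"],',
    ' "tasks_to_update": [{"task_id": "T-1", "status": "done"}]}',
    FENCE,
])
plan = parse_plan(sample)
print(plan["tasks_to_update"])
```

The prose around the block is ignored by the parser, so it can be saved unchanged as the session note.
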
## Consequences

- `runtime/` becomes a standalone Python package; `make agent-run DOMAIN=x`
  invokes it.
- The runtime has no DB schema changes and no new API endpoints — it is a
  pure client of the existing state-hub HTTP API.
- Autonomous actions are limited to append-only writes, reversible task-status
  updates, and escalations. Any expansion of the action surface requires a new
  ADR and human approval.
- The v0.1 loop is single-user (Bernd). Multi-agent expansion is Phase 2+.

## Deferred

- Async event loop / daemon mode (Phase 2)
- RAG over canon (Phase 1 roadmap item)
- Tool adapters beyond state-hub HTTP (planned in `runtime/tool_adapters/`)
- Deployment on Railiance k3s as a scheduled CronJob