diff --git a/.claude/rules/agents.md b/.claude/rules/agents.md new file mode 100644 index 0000000..0e8a5d9 --- /dev/null +++ b/.claude/rules/agents.md @@ -0,0 +1,20 @@ +## Kaizen Agents + +Specialized agent personas available on demand via the state-hub MCP. + +**Discover:** `list_kaizen_agents()` — returns all agents with name, description, category +**Load:** `get_kaizen_agent("tdd-workflow")` — returns full instructions; read and follow them + +Common agents: + +| Agent | Category | When to use | +|-------|----------|-------------| +| `tdd-workflow` | testing | Step-by-step TDD8 workflow for any feature | +| `code-refactoring` | quality | Code quality analysis and safe refactoring | +| `test-maintenance` | testing | Diagnose and fix failing tests | +| `requirements-engineering` | process | Prevent interface/mock mismatches upfront | +| `keepaTodofile` | process | Maintain TODO.md during work | +| `project-management` | process | Track status, determine next steps | +| `datamodel-optimization` | quality | Optimize dataclasses and data structures | + +All 17 agents: call `list_kaizen_agents()` for the full list. diff --git a/.claude/rules/architecture.md b/.claude/rules/architecture.md new file mode 100644 index 0000000..7c2a645 --- /dev/null +++ b/.claude/rules/architecture.md @@ -0,0 +1,8 @@ +## Architecture + + + +## Quick Reference + +`~/state-hub/mcp_server/TOOLS.md` — MCP tool reference diff --git a/.claude/rules/credential-routing.md b/.claude/rules/credential-routing.md new file mode 100644 index 0000000..26a4beb --- /dev/null +++ b/.claude/rules/credential-routing.md @@ -0,0 +1,50 @@ +# Credential and access routing + +**Audience:** Codex, Claude Code, Grok, and custodian agents that call **llm-connect** +for inference. Run this check **before** requesting secrets, API keys, SSH access, +login tokens, or database passwords — in any repo, not only `ops-warden`. + +ops-warden **issues SSH certificates only** (`warden sign`, `cert_command`). 
Every +other credential need belongs to another subsystem. **Do not** message +`ops-warden` on State Hub expecting a secret value; the reply is a pointer, not a key. + +### Lookup (do this first) + +```bash +warden route find "" --json +warden route show --json +``` + +Requires the `warden` CLI from `~/ops-warden` (`uv tool install .` or `uv run warden`). + +| Agent runtime | How to orient | +| --- | --- | +| **Codex / Grok** (shell, HTTP State Hub) | `warden route` commands above; inbox `to_agent=sand-boxer` is for coordination, not secret vending | +| **Claude Code** (MCP when available) | `get_domain_summary("custodian")` for workstreams; **still** use `warden route` for credential ownership | +| **llm-connect** (inference service) | Never put secret retrieval in prompts; route custody to OpenBao/operator paths surfaced by `warden route` | + +### Quick routing table + +| I need… | Owner | ops-warden executes? | +| --- | --- | --- | +| SSH cert (`adm`/`agt`/`atm`) | ops-warden | **Yes** — `warden sign` | +| API key, DB password, provider token | OpenBao (`railiance-platform`) | No — route only | +| Login / OIDC / MFA | key-cape / Keycloak | No — route only | +| Authorization decision | flex-auth | No — route only | +| activity-core → issue-core emission | activity-core + issue-core | No — `warden route show activity-core-issue-sink` | +| SSH tunnel | ops-bridge (+ `cert_command` from warden) | No — route only | + +### Anti-patterns (do not do these) + +- `POST /messages/` to `ops-warden` asking for `ISSUE_CORE_API_KEY`, `OPENROUTER_API_KEY`, etc. +- Inventing `warden secret`, `warden login`, `warden bao`, `warden tunnel` — they do not exist +- Pasting secrets into Git, State Hub, workplans, logs, or chat + +### Other capabilities (reuse-surface) + +Non-credential capabilities are usually discovered through **reuse-surface** federation +(`reuse-surface` registry / `capability.*` indexes). 
Credential routing is inlined in +every repo's agent instructions because it is high-frequency, high-risk, and easy to +get wrong. + +**Canon:** `~/ops-warden/wiki/CredentialRouting.md` · catalog `~/ops-warden/registry/routing/catalog.yaml` \ No newline at end of file diff --git a/.claude/rules/first-session.md b/.claude/rules/first-session.md new file mode 100644 index 0000000..99b7955 --- /dev/null +++ b/.claude/rules/first-session.md @@ -0,0 +1,38 @@ +## First Session Protocol + +Triggered when `get_domain_summary("infotech")` shows **no workstreams**. +The project is registered but work has not yet been structured. + +**Step 1 — Read, don't write** +- `~/the-custodian/canon/projects/infotech/project_charter_v0.1.md` — purpose, scope +- `~/the-custodian/canon/projects/infotech/roadmap_v0.1.md` — planned phases +- Scan repo root: README, directory structure, existing code or docs + +**Step 2 — Survey in-progress work** +Look for TODOs, open branches, half-finished files. Note done vs. started but incomplete. + +**Step 3 — Propose workstreams to Bernd** +Propose 1–3 workstreams — each a coherent strand, weeks to months, anchored to a +roadmap phase. 
**Wait for approval before creating.** + +**Step 4 — Create workplan file first, then DB record (ADR-001)** +``` +workplans/SAND-WP-NNNN-.md ← write this first +``` +Then register in the hub: +``` +create_workstream(topic_id="cee7bedf-2b48-46ef-8601-006474f2ad7a", title="...", owner="...", description="...") +create_task(workstream_id="", title="...", priority="high|medium|low") +``` + +**Step 5 — Record the setup** +``` +add_progress_event( + summary="First session: structured infotech into N workstreams, M tasks", + event_type="milestone", + topic_id="cee7bedf-2b48-46ef-8601-006474f2ad7a", + detail={"workstreams": [...], "tasks_created": M} +) +``` + + diff --git a/.claude/rules/repo-boundary.md b/.claude/rules/repo-boundary.md new file mode 100644 index 0000000..f12186a --- /dev/null +++ b/.claude/rules/repo-boundary.md @@ -0,0 +1,8 @@ +## Repo boundary + +This repo owns **sand-boxer** only. It does not own: + + diff --git a/.claude/rules/repo-identity.md b/.claude/rules/repo-identity.md new file mode 100644 index 0000000..7ebb92b --- /dev/null +++ b/.claude/rules/repo-identity.md @@ -0,0 +1,5 @@ +**Purpose:** Sandboxing for agentic coding facility. 
+ +**Domain:** infotech +**Repo slug:** sand-boxer +**Topic ID:** cee7bedf-2b48-46ef-8601-006474f2ad7a diff --git a/.claude/rules/session-protocol.md b/.claude/rules/session-protocol.md new file mode 100644 index 0000000..7ae373e --- /dev/null +++ b/.claude/rules/session-protocol.md @@ -0,0 +1,85 @@ +## Session Protocol + +Dev Hub (State Hub API): http://127.0.0.1:8000 +MCP server name in `~/.claude.json`: `dev-hub` + +**Step 1 — Orient** + +Read the offline-safe brief first — it works without a live hub connection: +```bash +cat .custodian-brief.md +``` +Then call the MCP tool for richer cross-domain context when MCP tools are exposed: +``` +get_domain_summary("infotech") +``` +If MCP tools are unavailable in the current agent session, use the REST API: +```bash +curl -s "http://127.0.0.1:8000/state/summary" | python3 -m json.tool +``` +If the hub is offline: `cd ~/state-hub && make api` + +**Step 2 — Check inbox** +With MCP tools: +``` +get_messages(to_agent="sand-boxer", unread_only=True) +``` +Mark read with `mark_message_read(message_id)`. Reply or act on coordination +requests before proceeding. + +Without MCP tools: +```bash +curl -s "http://127.0.0.1:8000/messages/?to_agent=sand-boxer&unread_only=true" \ + | python3 -m json.tool +curl -s -X PATCH "http://127.0.0.1:8000/messages//read" \ + -H "Content-Type: application/json" -d '{}' +``` + +**Step 3 — Scan workplans** +```bash +ls workplans/ +``` +For each file with `status: ready`, `active`, or `blocked`, note pending +`wait`/`todo`/`progress` tasks. + +**Step 4 — Present brief** + +1. **Active workstreams** for `infotech` — title, task counts, blocking decisions +2. **Pending tasks** from `workplans/` + any `[repo:sand-boxer]` hub tasks +3. **Goal guidance** — if `goal_guidance` in summary: + - `needs_workplan`: surface as top action — *"Repo goal '{title}' has no workplan yet"* + - `alignment_warnings`: flag if active work is not aligned with current goal +4. 
**Suggested next action** — highest-priority open item +5. **SBOM status** — flag if `last_sbom_at` is unset for this repo + +If no workstreams: follow First Session Protocol (`first-session.md`). + +**During work:** `record_decision()` · `add_progress_event()` · `resolve_decision()` + +> State Hub is a *read model*. Bootstrap tools (`create_workstream`, `create_task`) +> are First Session Protocol only. Work structure belongs in repo files (ADR-001). + +**Session close:** +With MCP tools: +``` +add_progress_event(summary="...", topic_id="cee7bedf-2b48-46ef-8601-006474f2ad7a", workstream_id="") +``` +Without MCP tools: +```bash +curl -s -X POST http://127.0.0.1:8000/progress/ \ + -H "Content-Type: application/json" \ + -d '{"topic_id":"cee7bedf-2b48-46ef-8601-006474f2ad7a","workstream_id":"","event_type":"note","summary":"what changed","author":"codex"}' +``` +If workplan files were modified, ensure the local copy is up to date first: +```bash +git -C pull --ff-only +cd ~/state-hub && make fix-consistency REPO=sand-boxer +``` +For repos where implementation runs on a remote machine (e.g. CoulombCore), +use the combined target which pulls before fixing: +```bash +cd ~/state-hub && make fix-consistency-remote REPO=sand-boxer +``` +**C-15** (DB task ahead of file) is normal in multi-machine workflows — writeback +will sync the file to match DB. **C-16** (repo behind remote) blocks all writes +until you pull — intentional to prevent clobbering remote progress. 
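A minimal Python sketch of the "Without MCP tools" session-close call above, for agents that prefer a script to a raw `curl`. The endpoint, field names, and topic ID are taken from the protocol text. The helper names (`progress_payload`, `progress_request`, `log_progress`) are hypothetical, and dropping `workstream_id` when it is empty is an assumption based on the AGENTS.md note about omitting fields that do not apply.

```python
import json
import urllib.request

HUB = "http://127.0.0.1:8000"  # State Hub address from the session protocol
TOPIC_ID = "cee7bedf-2b48-46ef-8601-006474f2ad7a"  # topic ID for sand-boxer


def progress_payload(summary, workstream_id=None, event_type="note", author="codex"):
    """Build the JSON body for POST /progress/ (hypothetical helper)."""
    payload = {
        "topic_id": TOPIC_ID,
        "event_type": event_type,
        "summary": summary,
        "author": author,
    }
    if workstream_id:  # assumption: omit the field rather than send ""
        payload["workstream_id"] = workstream_id
    return payload


def progress_request(payload, base=HUB):
    """Build, but do not send, the POST request."""
    return urllib.request.Request(
        f"{base}/progress/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def log_progress(summary, workstream_id=None, timeout=5.0):
    """Send the progress event; needs a live hub. Returns the HTTP status."""
    req = progress_request(progress_payload(summary, workstream_id))
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Keeping payload construction separate from the network call lets the body be inspected or logged before anything is sent to the hub.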
diff --git a/.claude/rules/stack-and-commands.md b/.claude/rules/stack-and-commands.md new file mode 100644 index 0000000..dc53ac6 --- /dev/null +++ b/.claude/rules/stack-and-commands.md @@ -0,0 +1,19 @@ +## Stack + + +- **Language:** +- **Key deps:** + +## Dev Commands + +```bash +# TODO: Fill in the standard commands for this repo + +# Install dependencies + +# Run tests + +# Lint / type check + +# Build / package (if applicable) +``` diff --git a/.claude/rules/workplan-convention.md b/.claude/rules/workplan-convention.md new file mode 100644 index 0000000..b765b69 --- /dev/null +++ b/.claude/rules/workplan-convention.md @@ -0,0 +1,40 @@ +## Workplan Convention (ADR-001) + +File location: `workplans/SAND-WP-NNNN-.md` +ID prefix: `SAND-WP-` + +Work items originate as files in this repo **before** being registered in the hub. + +Canonical workplan/workstream frontmatter statuses are: +`proposed`, `ready`, `active`, `blocked`, `backlog`, `finished`, `archived`. +Use `proposed` for a newly drafted plan, `ready` after review against current +repo state, and `finished` when implementation is complete. `stalled` and +`needs_review` are derived health labels, not stored statuses. + +Closed workplans may be moved to `workplans/archived/` with a completion-date +prefix: `YYMMDD-SAND-WP-NNNN-.md`. The frontmatter id remains +unchanged; the prefix is only for quick visual reference. + +Small opportunistic tasks discovered during another session use **Ad Hoc Tasks**: +`workplans/ADHOC-YYYY-MM-DD.md`, workstream slug `adhoc-YYYY-MM-DD`, and task ids +`ADHOC-YYYY-MM-DD-T01`, `T02`, etc. Use adhocs only for low-risk work completed +directly. Promote anything requiring analysis, design, approval, dependencies, or +multiple planned phases into a normal workplan. + +Ecosystem todos from other agents arrive as `[repo:sand-boxer]` hub tasks — +visible at session start. Pick one up by creating the workplan file, then registering +the workstream. 
+ +Task blocks use this shape: + +```task +id: SAND-WP-NNNN-T01 +status: wait | todo | progress | done | cancel +priority: high | medium | low +state_hub_task_id: "" # written by fix-consistency — do not edit +``` + +Status progression is `todo` → `progress` → `done`; use `wait` for waiting or +blocked work and `cancel` for stopped work. + + diff --git a/.custodian-brief.md b/.custodian-brief.md new file mode 100644 index 0000000..29494b8 --- /dev/null +++ b/.custodian-brief.md @@ -0,0 +1,31 @@ + +# Custodian Brief - sand-boxer + +**Project:** sand-boxer +**Domain:** infotech +**State Hub:** http://127.0.0.1:8000 +**Topic ID:** `cee7bedf-2b48-46ef-8601-006474f2ad7a` + +## Open Workplans + +### Bootstrap State Hub integration + +Workplan file: `workplans/SAND-WP-0001-statehub-bootstrap.md` (status: active) + +Open tasks: +- T02 - Verify local developer workflow + +### Meta-framework foundation and first extension + +Workplan file: `workplans/SAND-WP-0002-meta-framework-foundation.md` (status: ready) + +Next: T01 - Design meta-framework contracts + +## Session Start + +1. Read `INTENT.md`, `SCOPE.md`, and `AGENTS.md`. +2. Check inbox: `GET /messages/?to_agent=sand-boxer&unread_only=true`. +3. Scan `workplans/`. +4. Update task statuses in workplan files as work progresses. + +Last generated: 2026-06-22 diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 0000000..50a091c --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,219 @@ +# sand-boxer — Agent Instructions + +## Repo Identity + +**Purpose:** Sandboxing for agentic coding facility. + +**Domain:** infotech +**Repo slug:** sand-boxer +**Topic ID:** `cee7bedf-2b48-46ef-8601-006474f2ad7a` +**Workplan prefix:** `SAND-WP-` + +--- + +## State Hub Integration + +The Custodian State Hub tracks work across all domains. Interact via HTTP REST — +there is no MCP server for Codex agents. 
+ +| Context | URL | +|---------|-----| +| Local workstation | `http://127.0.0.1:8000` | +| Remote via tunnel | `http://127.0.0.1:18000` | + +### Orient at session start + +```bash +# Offline brief — works without hub connection +cat .custodian-brief.md + +# Active workstreams for this domain +curl -s "http://127.0.0.1:8000/workstreams/?topic_id=cee7bedf-2b48-46ef-8601-006474f2ad7a&status=active" \ + | python3 -m json.tool + +# Check inbox +curl -s "http://127.0.0.1:8000/messages/?to_agent=sand-boxer&unread_only=true" \ + | python3 -m json.tool +``` + +Mark a message read: +```bash +curl -s -X PATCH "http://127.0.0.1:8000/messages//read" \ + -H "Content-Type: application/json" -d '{}' +``` + +### Log progress (required at session close) + +```bash +curl -s -X POST http://127.0.0.1:8000/progress/ \ + -H "Content-Type: application/json" \ + -d '{ + "summary": "what was done", + "event_type": "note", + "author": "codex", + "workstream_id": "", + "task_id": "" + }' +``` + +Omit `workstream_id` / `task_id` when not applicable. + +### Update task status + +```bash +curl -s -X PATCH "http://127.0.0.1:8000/tasks/" \ + -H "Content-Type: application/json" \ + -d '{"status": "progress"}' +# values: wait | todo | progress | done | cancel +``` + +### Flag a task for human review + +```bash +curl -s -X PATCH "http://127.0.0.1:8000/tasks/" \ + -H "Content-Type: application/json" \ + -d '{"needs_human": true, "intervention_note": "reason"}' +``` + +--- + +## Session Protocol + +**Start:** +1. `cat .custodian-brief.md` — domain goal and open workstreams (offline-safe) +2. Check inbox: `GET /messages/?to_agent=sand-boxer&unread_only=true`; mark read +3. Scan workplans: `ls workplans/` — note `status: ready`, `active`, or `blocked` files and open tasks +4. Check human-needed tasks: `GET /tasks/?needs_human=true` + +**During work:** +- Update task statuses in workplan files as tasks progress +- Record significant decisions via `POST /decisions/` + +**Close:** +1. 
Update workplan file task statuses to reflect progress +2. Log: `POST /progress/` with a summary of what changed +3. Note for the custodian operator: after workplan file changes, run from + `~/state-hub`: + ```bash + make fix-consistency REPO=sand-boxer + ``` + This syncs task status from files into the hub DB. + +--- + +## Credential and access routing + +**Audience:** Codex, Claude Code, Grok, and custodian agents that call **llm-connect** +for inference. Run this check **before** requesting secrets, API keys, SSH access, +login tokens, or database passwords — in any repo, not only `ops-warden`. + +ops-warden **issues SSH certificates only** (`warden sign`, `cert_command`). Every +other credential need belongs to another subsystem. **Do not** message +`ops-warden` on State Hub expecting a secret value; the reply is a pointer, not a key. + +### Lookup (do this first) + +```bash +warden route find "" --json +warden route show --json +``` + +Requires the `warden` CLI from `~/ops-warden` (`uv tool install .` or `uv run warden`). + +| Agent runtime | How to orient | +| --- | --- | +| **Codex / Grok** (shell, HTTP State Hub) | `warden route` commands above; inbox `to_agent=sand-boxer` is for coordination, not secret vending | +| **Claude Code** (MCP when available) | `get_domain_summary("custodian")` for workstreams; **still** use `warden route` for credential ownership | +| **llm-connect** (inference service) | Never put secret retrieval in prompts; route custody to OpenBao/operator paths surfaced by `warden route` | + +### Quick routing table + +| I need… | Owner | ops-warden executes? 
| +| --- | --- | --- | +| SSH cert (`adm`/`agt`/`atm`) | ops-warden | **Yes** — `warden sign` | +| API key, DB password, provider token | OpenBao (`railiance-platform`) | No — route only | +| Login / OIDC / MFA | key-cape / Keycloak | No — route only | +| Authorization decision | flex-auth | No — route only | +| activity-core → issue-core emission | activity-core + issue-core | No — `warden route show activity-core-issue-sink` | +| SSH tunnel | ops-bridge (+ `cert_command` from warden) | No — route only | + +### Anti-patterns (do not do these) + +- `POST /messages/` to `ops-warden` asking for `ISSUE_CORE_API_KEY`, `OPENROUTER_API_KEY`, etc. +- Inventing `warden secret`, `warden login`, `warden bao`, `warden tunnel` — they do not exist +- Pasting secrets into Git, State Hub, workplans, logs, or chat + +### Other capabilities (reuse-surface) + +Non-credential capabilities are usually discovered through **reuse-surface** federation +(`reuse-surface` registry / `capability.*` indexes). Credential routing is inlined in +every repo's agent instructions because it is high-frequency, high-risk, and easy to +get wrong. + +**Canon:** `~/ops-warden/wiki/CredentialRouting.md` · catalog `~/ops-warden/registry/routing/catalog.yaml` + + + + +--- + +## Workplan Convention (ADR-001) + +Work items originate as files in this repo — not in the hub. The hub is a +read/cache/index layer that rebuilds from files. + +**File location:** `workplans/SAND-WP-NNNN-.md` + +**Archived location:** finished workplans may move to +`workplans/archived/YYMMDD-SAND-WP-NNNN-.md`. The `YYMMDD` prefix is +the completion/archive date; the frontmatter `id` does not change. + +**Ad Hoc Tasks:** small opportunistic fixes discovered during a session use +`workplans/ADHOC-YYYY-MM-DD.md` with task ids `ADHOC-YYYY-MM-DD-T01`, etc. Use +this only for low-risk work completed directly; create a normal workplan for +anything needing analysis, design, approval, dependencies, or multiple phases. 
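The archived and ad hoc naming rules above can be sketched as two small Python helpers. This is an illustration of the stated patterns only: the function names are hypothetical, and nothing here is confirmed to match how `fix-consistency` or any other tool generates names.

```python
from datetime import date


def adhoc_names(day: date, task_count: int) -> dict:
    """Return the ad hoc workplan path and task ids for one day."""
    stamp = day.isoformat()  # YYYY-MM-DD
    return {
        "file": f"workplans/ADHOC-{stamp}.md",
        "task_ids": [f"ADHOC-{stamp}-T{n:02d}" for n in range(1, task_count + 1)],
    }


def archived_path(workplan_filename: str, finished: date) -> str:
    """Prefix a finished workplan's filename with its YYMMDD archive date."""
    return f"workplans/archived/{finished.strftime('%y%m%d')}-{workplan_filename}"
```

For example, `archived_path("SAND-WP-0001-statehub-bootstrap.md", date(2026, 6, 22))` yields `workplans/archived/260622-SAND-WP-0001-statehub-bootstrap.md`; the frontmatter `id` inside the file stays `SAND-WP-0001`.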
+ +**Frontmatter:** + +```yaml +--- +id: SAND-WP-NNNN +type: workplan +title: "..." +domain: infotech +repo: sand-boxer +status: proposed | ready | active | blocked | backlog | finished | archived +owner: codex +topic_slug: ... +created: "YYYY-MM-DD" +updated: "YYYY-MM-DD" +state_hub_workstream_id: "" # written by fix-consistency — do not edit +--- +``` + +Use `proposed` for a new draft, `ready` after review against current repo +state, and `finished` after implementation. `stalled` and `needs_review` are +derived health labels, not frontmatter statuses. + +**Task block format** (one per `##` section): + +``` +## Task Title + +` ` `task +id: SAND-WP-NNNN-T01 +status: wait | todo | progress | done | cancel +priority: high | medium | low +state_hub_task_id: "" # written by fix-consistency — do not edit +` ` ` + +Task description text. +``` + +Status progression: `todo` → `progress` → `done`; use `wait` for waiting/blocked work and `cancel` for stopped work. + +To create a new workplan: +1. Write the file following the format above +2. 
Notify the custodian operator to run `make fix-consistency REPO=sand-boxer` + (or send a message to the hub agent via `POST /messages/`) diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 0000000..618d210 --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,12 @@ +# sand-boxer — Claude Code Instructions + +@SCOPE.md +@.claude/rules/repo-identity.md +@.claude/rules/session-protocol.md +@.claude/rules/first-session.md +@.claude/rules/workplan-convention.md +@.claude/rules/stack-and-commands.md +@.claude/rules/architecture.md +@.claude/rules/repo-boundary.md +@.claude/rules/credential-routing.md +@.claude/rules/agents.md diff --git a/INTENT.md b/INTENT.md index e25fa6e..9cd45f1 100644 --- a/INTENT.md +++ b/INTENT.md @@ -1,180 +1,338 @@ --- -domain: custodian +domain: infotech repo: sand-boxer -updated: "2026-06-21" +updated: "2026-06-22" --- # INTENT -> This file explains why sand-boxer exists, what problem it solves in the -> Custodian ecosystem, and where its authority begins and ends. +> sand-boxer is the Coulomb **meta-framework for establishing sandboxes** — a +> unified API and extension platform for provisioning every variation of isolated +> execution environment, from self-hosted compose stacks to metered SaaS +> runtimes. This file is the charter: why it exists, what it owns, and where +> sibling projects begin. + +Research backing this charter lives in `research/`. --- ## Why it exists Custodian automation is moving from **workstation-anchored** execution to -**Railiance01-scheduled** orchestration. That shift is right for reliability: -activity-core on Railiance01 can fire maintenance and coordination jobs on a -stable clock. It does not, by itself, give agents a safe place to **develop, -build, and test** without the laptop filesystem, sleep cycles, and single-user -blast radius. +**Railiance01-scheduled** orchestration. 
That shift improves reliability but does +not, by itself, answer the harder question: **where can agentic and deterministic +work run safely** without the laptop filesystem, sleep cycles, and single-user +blast radius? -sand-boxer exists to provide **isolated execution environments** — sandboxes — -where agentic and deterministic work can run on dedicated infrastructure while -remaining observable and governable from State Hub. +The industry has exploded with sandbox answers — E2B, Modal, Daytona, OpenShell, +OpenClaw-style Docker/SSH backends, hyperscaler interpreters — each with +different APIs, billing models, and isolation postures. Coulomb needs **one place +to establish sandboxes** regardless of backend, not a new integration per agent +harness, validator, or codegen pipeline. -The goal is progress without requiring the workstation as a runtime: repos are -checked out, tools run, tests execute, and artifacts return through controlled -channels. The laptop becomes optional for operations, not the hub of all -execution. +sand-boxer exists to be that place: **OpenRouter for sandboxes, not for models.** + +Consumers call one API. Extensions delegate to the sandbox system that fits — +self-hosted on sandboxer01, inherited compose-ssh from `the-custodian`, or a +metered cloud provider. An integrated **payments layer** handles SaaS consumption +when Coulomb uses external capacity. Over time, operational learning may justify +a Coulomb-native **best-of-brands runtime** — but that is a later phase built on +evidence, not day-one ambition. + +The workstation becomes optional for **runtime**. Railiance01 decides *when* +work runs (via activity-core). sand-boxer decides *where* isolated execution +happens. State Hub records *what* changed. --- ## The governing principle -sand-boxer is the **execution isolation and provisioning service** for agentic -development and related workloads. 
+sand-boxer is the **sandbox establishment service** — profiles, provisioning, +extension routing, placement, lifecycle, and metering. Nothing more. -It should answer: +It answers: -1. **Where can this work run safely?** Profile selection (compose stack, VM, - future cluster worker) and host placement. -2. **How is isolation enforced?** Networks, TTL, resource limits, teardown, and - cleanup guarantees. -3. **How does the sandbox phone home?** Reachability via ops-bridge tunnels and - SSH identity via ops-warden — without owning either. -4. **What happened?** Registration, health, and lifecycle events visible to - State Hub and reuse-surface consumers. +1. **Which sandbox recipe applies?** Profile selection and version resolution. +2. **Which backend fulfills it?** Extension routing (self-hosted vs SaaS). +3. **Where does it run?** Host placement and blast-radius policy. +4. **How is isolation enforced?** Network default-deny, TTL, resource limits, + teardown guarantees — as declared by profile + extension. +5. **How does it become reachable?** Consumer integration with ops-bridge and + ops-warden — without owning tunnels or certificates. +6. **What happened?** Lifecycle events, usage meters, State Hub registration. +7. **What did it cost?** Payments and credits for metered extensions. -It should not become the scheduler, the work-state database, the connectivity -authority, or production application hosting on Railiance01. +It must **not** become the agent harness, the e2e validator, the code generator, +the scheduler, the work-state database, the connectivity authority, or production +hosting on Railiance01. 
--- -## Strategic context +## The OpenRouter analogy -### Workstation automation is interim, not the target +| OpenRouter | sand-boxer | +|------------|------------| +| Unified LLM access API | Unified sandbox establishment API | +| Routes across model providers | Routes across sandbox extensions | +| Provider metadata (price, context) | Profile metadata (isolation, cost, latency) | +| API keys, credits, usage billing | Payments layer for SaaS sandbox consumption | +| BYOK supported | BYOK for extension provider keys | +| Does not train models | Does not replace extension runtimes (until phase 5) | -Local timers and laptop-resident scripts were useful for bootstrapping ADR-001 -consistency sync and similar jobs. They are not the long-term substrate. -Railiance01-based activity-core schedules are the primary direction; workstation -paths remain only where no sandbox or cluster alternative exists yet. +sand-boxer is **infrastructure routing**, not product UX. Harnesses, validators, +and inventors are customers. -### Railiance01 vs sandbox hosts +--- -| Layer | Role | -|-------|------| -| **Railiance01** | Production k3s, activity-core, Temporal, stable custodian schedules | -| **sandboxer01** (or equivalent) | Dedicated VM for dev/agent sandboxes — **isolated blast radius** | -| **CoulombCore** | Acceptable interim sandbox host during migration; not a substitute for deliberate isolation from production | -| **Workstation (WSL)** | Control plane anchor today; **not** the desired execution surface | +## Coulomb sibling boundaries -sand-boxer owns the **abstraction and lifecycle** of sandboxes. It does not own -Railiance01 cluster operations (see `railiance-cluster` / `railiance-apps`). +sand-boxer stays inside the **sandboxing boundary**. Three sibling Coulomb +projects own adjacent concerns. Integration is contractual — they **request** +sandboxes; sand-boxer **establishes** them. 
-### Lineage +### glas-harness — agent harness -This repository consolidates and generalizes patterns that today live split and -unregistered in `the-custodian`: +**Owns:** Gateway, tool orchestration, skills, memory, channels, subagent +delegation, session semantics, sandbox *consumption* from the agent's perspective. -- **E2E sandbox framework** (`e2e-framework/`) — SSH to remote host, isolated - directory, docker compose, teardown (`CUST-WP-0028`). -- **Build machines** (`infra/build-machines/`) — reproducible VM images, - reverse tunnels, State Hub capability registration (`CUST-WP-0032`). +**Does not own:** Sandbox runtimes, profile catalog authority, host placement, +extension adapters, isolation enforcement. -sand-boxer extracts a **reusable platform** from those precedents so -`the-custodian` can stay governance-focused with a small operational surface. +glas-harness configures *when* tools run in a sandbox (OpenClaw-style +`mode` / `scope` / `workspaceAccess`). sand-boxer provides the sandbox handle +and reachability descriptor. + +### wise-validator — e2e test and health + +**Owns:** Validation workflows, health check semantics, test orchestration, +pass/fail interpretation, structured result reporting to State Hub and CI. + +**Does not own:** Remote host provisioning, compose lifecycle, port isolation, +sandbox teardown. + +wise-validator replaces the validation half of `the-custodian/e2e-framework/`. +It requests `profile.compose-e2e` (or successors), runs tests inside the +established environment, and owns the `e2e.yml` contract. + +### snuggle-inventor — code generation + +**Owns:** Code generation, modernization pipelines, tech-spec and planning +artifacts, PR-oriented output, human-in-the-loop review gates. + +**Does not own:** Sandbox infrastructure, environment bootstrapping authority, +secret stores, runtime metering. + +snuggle-inventor may attach Blitzy-style **setup instructions** and secret +references as profile inputs. 
sand-boxer resolves secrets at the provision +boundary; generated code never transits sand-boxer APIs. + +### Boundary diagram + +``` + glas-harness wise-validator snuggle-inventor + (agent harness) (e2e + health) (code generation) + │ │ │ + └─────────────────────┼──────────────────────┘ + │ POST /v1/sandboxes + ▼ + sand-boxer + (establish sandboxes) + │ + ┌───────────────┼───────────────┐ + ▼ ▼ ▼ + ext.compose-ssh ext.modal ext.e2b … + (self-hosted) (SaaS+meter) (SaaS+meter) +``` + +### Existing Custodian repos (unchanged) + +| Concern | Owner | +|---------|--------| +| Workstream, task, progress state | `state-hub` | +| Cron and orchestration | `activity-core` | +| SSH reverse tunnels | `ops-bridge` | +| SSH certificate issuance | `ops-warden` | +| Canon and agent instruction canon | `the-custodian` | +| Capability federation hub | `reuse-surface` | +| Production on Railiance01 | `railiance-apps` / domain repos | +| ADR-001 reconciliation | `state-hub` | + +sand-boxer **consumes** ops-bridge and ops-warden; it does not subsume them. --- ## What it is -sand-boxer is the **sandbox provisioning and profile catalog** for Custodian. +sand-boxer is a **meta-framework** with four pillars: -It is intended to contain: +### 1. Unified establishment API -- **Sandbox profiles** — e.g. compose-based e2e stacks, VM images, future - container-on-worker patterns -- **Provision / wait / teardown** lifecycle — TTL, idempotent cleanup, port and - network conventions -- **Host placement policy** — which profiles run on sandboxer01, coulombcore - interim, or other registered hosts -- **CLI and/or API** for operators and agents to request isolated environments -- **State Hub registration contract** — extend the `build-agent` self-register - pattern to generic sandbox identities -- **Capability registry entries** in this repo's `registry/` for federation via - reuse-surface (e.g. 
`capability.execution.sandbox-provision`) -- Runbooks, templates (Packer, compose bundles), and tests for the above +One consistent surface for all sandbox variations: + +- Create, inspect, extend, snapshot, recreate, destroy +- Profile-driven inputs (repo ref, compose bundle, setup metadata, secret refs) +- Consumer attribution (`adm` / `agt` / `atm` + calling project id) +- Lifecycle states: `requested → provisioning → ready → active → expired → destroyed` + +Early versions may expose a subset; the API shape is designed for completeness. + +### 2. Profile catalog + +Named, versioned recipes — not one-off containers: + +- Extension binding (`ext.compose-ssh`, `ext.vm-packer`, `ext.e2b`, …) +- Isolation level, network policy, workspace mode (`mirror` | `remote-canonical`) +- Scope default (`agent` | `session` | `shared`) +- TTL, resource limits, placement preference +- Setup metadata (natural-language bootstrap instructions for extensions) +- Registered in `registry/` and federated via reuse-surface + +Profiles collect good ideas from OpenClaw (backend/scope/workspace), Hermes +(labeled reuse, resource limits), Blitzy (setup instructions, secret boundary), +and hosted platforms (checkpoint, persistence classes) into **one schema**. + +### 3. Extension platform + +Extensions **delegate** to sandbox systems and services: + +| Class | Examples | Billing | +|-------|----------|---------| +| **Self-hosted** | compose-ssh, vm-packer, Daytona OSS, OpenShell | Infra allocation | +| **SaaS consumption** | E2B, Modal, Daytona cloud, future providers | Payments layer | + +Each extension implements a provision / ready / teardown contract (optional +snapshot / cost estimate). Extensions ship as plugins; third-party and Coulomb- +native backends use the same interface. + +### 4. 
Payments and metering + +For metered SaaS extensions: + +- Org/workspace credits and usage accounting +- Pre-create cost estimates; post-destroy actuals +- BYOK for provider API keys where supported +- Export to domain billing systems — sand-boxer meters sandbox consumption, + not general payments + +Self-hosted extensions record **allocation** (host, duration), not external spend. --- ## What it is not -| Concern | Owner | -|---------|--------| -| Workstream, task, and progress state | `state-hub` | -| Cron and event-triggered orchestration | `activity-core` | -| SSH reverse tunnels and tunnel health | `ops-bridge` | -| SSH certificate issuance | `ops-warden` | -| Canon, charters, agent instruction canon | `the-custodian` | -| Capability index federation hub | `reuse-surface` | -| Production service deployment on Railiance01 | `railiance-apps` / domain repos | -| ADR-001 workplan ↔ DB reconciliation | `state-hub` (`consistency_check.py`) | +| Concern | Owner | sand-boxer role | +|---------|--------|-----------------| +| Agent gateway, tools, memory, channels | **glas-harness** | Customer API | +| E2e tests, health checks, validation | **wise-validator** | Customer API | +| Code generation, tech specs, AAP | **snuggle-inventor** | Customer API | +| When work runs | `activity-core` | None | +| What tasks exist | `state-hub` | Registers lifecycle only | +| Tunnels | `ops-bridge` | Consumer | +| Certs | `ops-warden` | Consumer | +| Intent-aware egress / prompt security | Research frontier | Document limits only | -sand-boxer may **consume** connectivity and certificates; it must not duplicate -or subsume those authorities. +sand-boxer provides **blast-radius isolation and governed reachability**. It does +not protect against a compromised agent abusing **allowed** egress paths (git, +npm, curl to allowlisted hosts). Security runbooks must state this explicitly. 
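The limit stated above can be made concrete with a minimal sketch of a destination-only egress check. This is an illustration, not sand-boxer's policy engine: `egress_allowed`, `ALLOWLIST`, and the hostnames are hypothetical names chosen for the example.

```python
from urllib.parse import urlsplit

# Hypothetical per-profile allowlist; the hostnames are illustrative only.
ALLOWLIST = frozenset({"github.com", "registry.npmjs.org"})


def egress_allowed(url: str, allowlist: frozenset = ALLOWLIST) -> bool:
    """Default-deny check: permit a request only if its host is allowlisted."""
    host = urlsplit(url).hostname
    return host is not None and host.lower() in allowlist


# A routine fetch and a push carrying stolen data target the same host,
# so a destination-only policy returns the same answer for both.
benign = "https://github.com/example-org/app.git"
exfil = "https://github.com/attacker/dropbox.git"
blocked = "https://paste.example.net/upload"

print(egress_allowed(benign))   # True
print(egress_allowed(exfil))    # True
print(egress_allowed(blocked))  # False
```

The check sees only the destination host, which is why the second call passes: the policy is enforced correctly and still cannot tell a clone from an exfiltration.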
+ +--- + +## Strategic context + +### Workstation automation is interim + +Local timers and laptop scripts bootstrapped ADR-001 sync. Railiance01 +activity-core schedules are the direction. Workstation paths remain only where no +sandbox alternative exists yet. + +### Host topology + +| Layer | Role | +|-------|------| +| **Railiance01** | Production k3s, activity-core, Temporal — **not** agent dev runtime | +| **sandboxer01** | Dedicated sandbox host — preferred blast-radius isolation | +| **CoulombCore** | Interim sandbox host during migration | +| **Workstation (WSL)** | Control-plane anchor today — **not** target execution surface | +| **SaaS extensions** | Burst / capability gap (GPU, desktop) via payments layer | + +### Lineage + +sand-boxer generalizes patterns split across `the-custodian`: + +| Legacy | sand-boxer | Sibling | +|--------|------------|---------| +| `e2e-framework/` provision/teardown | `ext.compose-ssh` | wise-validator owns test run | +| `e2e-framework/` health + test + report | — | wise-validator | +| `infra/build-machines/` | `ext.vm-packer` | — | +| Agent sandbox config (future) | API consumer | glas-harness | + +`the-custodian` stays governance-focused; sand-boxer becomes the execution +venue catalog. + +### Phase 5: Coulomb-native runtime (later) + +After operating extensions in production — observing latency, cost, failure +modes, isolation gaps — sand-boxer may ship an owned **best-of-brands** +sandboxing solution combining: + +- Persistent labeled workspaces (Hermes pattern) +- Default-deny policy layer (OpenShell lessons) +- Fast resume / checkpoint (industry baseline) +- Self-hosted economics (Daytona/OpenSandbox lessons) + +This is **not** v1 scope. Extensions and payments come first; native runtime +follows evidence. 
--- ## Intended users -- **Human operators (`adm`)** — provision sandboxes, manage profiles and hosts, - inspect lifecycle and cleanup -- **LLM agents (`agt`)** — request isolated environments for coding, testing, - and verification without laptop filesystem dependence -- **Deterministic automations (`atm`)** — activity-core instructions and CI - hooks that need a bounded execution venue +- **Human operators (`adm`)** — profiles, hosts, extensions, credits, lifecycle +- **LLM agents (`agt`)** — via glas-harness, snuggle-inventor, or direct API +- **Deterministic automations (`atm`)** — via wise-validator, activity-core, CI +- **Extension authors** — implement backend adapters against the extension contract +- **Platform integrators** — register capabilities, federate via reuse-surface --- ## Design principles -- **Blast radius isolation** — sandbox workloads must not jeopardize Railiance01 - production stability; prefer dedicated hosts (sandboxer01) for agentic dev -- **Profiles over one-offs** — every sandbox type is a named, versioned profile - with documented inputs, outputs, and teardown -- **Reachability, not ownership** — use ops-bridge for tunnels and ops-warden - for SSH identity; sand-boxer orchestrates, it does not issue certs or run - tunnel daemons -- **Observable lifecycle** — create, ready, active, expired, and destroyed states - are attributable and queryable -- **Disposable by default** — sandboxes are TTL-bound; persistence is explicit - and exceptional -- **Registry-first reuse** — register capabilities in this repo and federate - through reuse-surface before ad hoc duplication elsewhere +- **Meta-framework, not monolith** — one API; many extensions; optional native runtime later +- **Profiles over one-offs** — every sandbox type is named, versioned, registered +- **Prefer self-hosted** — SaaS via explicit routing policy, not silent default +- **Blast-radius isolation** — dedicated hosts; never jeopardize Railiance01 production +- **Reachability, 
not ownership** — sand-boxer consumes ops-bridge and ops-warden; it never replaces them +- **Secrets at the boundary** — resolve at provision; never in agent-visible workspace +- **Observable lifecycle** — every state transition attributable and queryable +- **Disposable by default** — TTL-bound; persistence and checkpoint are explicit +- **Honest security** — sandboxing limits blast radius; it is not intent enforcement +- **Registry-first reuse** — capabilities in `registry/` before ad hoc duplication +- **Payments transparency** — estimate before create; meter on destroy for SaaS --- ## Near-term outcomes -A first useful version of sand-boxer should: - -1. Define at least one **production-oriented profile** (e.g. compose sandbox on - sandboxer01 or coulombcore interim) with documented provision/teardown -2. Register **`capability.execution.sandbox-provision`** (or equivalent) in - `registry/` and pass reuse-surface validation -3. Integrate with **ops-bridge** reachability and **State Hub** registration -4. Provide a clear migration path for e2e-framework and build-machines callers -5. Enable activity-core and agents to request sandboxes without workstation repo - paths as a hard dependency +1. **Charter and research** — `INTENT.md`, `research/`, profile schema draft +2. **First self-hosted extension** — `ext.compose-ssh` from e2e-framework lineage +3. **Unified API v0** — create / get / destroy / recreate + State Hub registration +4. **First profile** — `profile.compose-e2e` for wise-validator migration +5. **Registry entry** — `capability.execution.sandbox-provision` via reuse-surface +6. **Extension SDK sketch** — contract for P1 backends (vm-packer, Daytona OSS) +7.
**Sibling integration notes** — glas-harness, wise-validator, snuggle-inventor API expectations documented --- ## Maturity target -A mature sand-boxer should be the **standard execution venue** for agentic -development in Custodian: Railiance01 decides *when* work runs; sand-boxer -decides *where* isolated execution happens; State Hub records *what* changed. -The workstation is optional — used for human preference, not as a single point -of runtime failure. \ No newline at end of file +A mature sand-boxer is Coulomb's **default way to establish any sandbox**: + +- glas-harness requests agent dev sandboxes without choosing Docker vs Modal vs SSH +- wise-validator requests validation environments without owning provisioners +- snuggle-inventor requests build sandboxes with setup metadata and secret refs +- activity-core and CI request bounded venues with consistent lifecycle visibility +- Operators route spend across self-hosted and SaaS with one credits model +- A Coulomb-native runtime — if warranted — wins on ops data, not speculation + +The workstation is optional. The harness is not sand-boxer. The validator is not +sand-boxer. The inventor is not sand-boxer. **Establishing the box is.** \ No newline at end of file diff --git a/SCOPE.md b/SCOPE.md new file mode 100644 index 0000000..acea92c --- /dev/null +++ b/SCOPE.md @@ -0,0 +1,235 @@ +--- +domain: infotech +repo: sand-boxer +updated: "2026-06-22" +--- + +# SCOPE + +> This file helps you quickly understand what this repository is about, +> when it is relevant, and when it is not. +> It is intentionally lightweight and may be incomplete until implementation lands. + +--- + +## One-liner + +Sandbox provisioning and profile catalog for Custodian — isolated execution +environments where agents and automations can develop, build, and test without +depending on the workstation filesystem or blast radius. 
+ +--- + +## Core Idea + +sand-boxer is the **execution isolation and provisioning service** for agentic +development and related workloads in the Custodian ecosystem. It answers where +work can run safely, how isolation is enforced, how sandboxes phone home, and +what happened during their lifecycle. + +A **sandbox profile** is a named, versioned recipe (compose stack, VM image, +future cluster worker) with documented inputs, outputs, host placement, TTL, +and teardown guarantees. Operators and agents request a profile; sand-boxer +provisions an isolated environment on a registered host, exposes reachability +through ops-bridge (without owning tunnels), registers lifecycle state with +State Hub, and tears down on expiry or explicit release. + +The repo consolidates patterns today split across `the-custodian`: +`e2e-framework/` (SSH + compose sandboxes for cross-repo e2e) and +`infra/build-machines/` (Packer VMs with build-agent self-registration). + +--- + +## In Scope + +- **Sandbox profile catalog** — versioned definitions for compose-based e2e + stacks, VM images, and future worker patterns; inputs, outputs, and teardown + contracts documented per profile +- **Provision / wait / teardown lifecycle** — TTL, idempotent cleanup, port and + network conventions, observable states (create → ready → active → expired → + destroyed) +- **Host placement policy** — which profiles run on sandboxer01, CoulombCore + interim, or other registered hosts; blast-radius isolation from Railiance01 + production +- **CLI and/or API** — request, inspect, and release sandboxes for operators + (`adm`), agents (`agt`), and automations (`atm`) +- **State Hub registration contract** — extend the `build-agent` self-register + pattern to generic sandbox identities and lifecycle events +- **Capability registry entries** in `registry/` for federation via + reuse-surface (e.g. 
`capability.execution.sandbox-provision`) +- **Runbooks, templates, and tests** — Packer/compose bundles, operator + runbooks, and automated tests for profile lifecycle +- **Migration path** — documented cutover from `the-custodian/e2e-framework` + and `infra/build-machines` callers to sand-boxer profiles +- **Agent and workplan metadata** — `INTENT.md`, `AGENTS.md`, `workplans/`, + and State Hub progress/decision logging per ADR-001 + +--- + +## Out of Scope + +| Concern | Owner | +|---------|--------| +| Workstream, task, and progress state | `state-hub` | +| Cron and event-triggered orchestration | `activity-core` | +| SSH reverse tunnels and tunnel health | `ops-bridge` | +| SSH certificate issuance | `ops-warden` | +| Canon, charters, agent instruction canon | `the-custodian` | +| Capability index federation hub | `reuse-surface` | +| Production service deployment on Railiance01 | `railiance-apps` / domain repos | +| Railiance01 cluster operations | `railiance-cluster` / `railiance-infra` | +| ADR-001 workplan ↔ DB reconciliation | `state-hub` (`consistency_check.py`) | + +sand-boxer may **consume** connectivity (ops-bridge) and certificates +(ops-warden); it must not duplicate or subsume those authorities. 
+ +Additional boundaries: + +- **Scheduling** — activity-core decides *when* work runs; sand-boxer decides + *where* isolated execution happens +- **Workstation as runtime** — the laptop/WSL anchor is interim control plane, + not the target execution surface +- **Irreversible operational decisions** — host provisioning, production + cutovers, and CA policy changes require human approval + +--- + +## Relevant When + +- An agent or automation needs an isolated environment for coding, building, or + testing without laptop filesystem dependence +- Cross-repo e2e tests need a remote compose sandbox with guaranteed teardown +- A build or verification workload should run on dedicated hardware + (sandboxer01) rather than Railiance01 production or the workstation +- activity-core or CI needs a bounded execution venue with State Hub visibility +- Planning reuse of sandbox provisioning across repos (registry-first discovery) + +--- + +## Not Relevant When + +- All work runs locally with acceptable blast radius +- Only tunnel connectivity is needed (use `ops-bridge` directly) +- Only task/workstream state is needed (use `state-hub`) +- Only scheduling or rule evaluation is needed (use `activity-core`) +- Deploying or operating production services on Railiance01 + +--- + +## Current State + +- **Status:** bootstrap — repo registered with State Hub; charter written; + implementation not started +- **Implementation:** ~0% — no CLI, API, profiles, provisioner, or tests in tree +- **Docs:** `INTENT.md` (charter, 2026-06-21); `README.md` (one-liner); + `AGENTS.md` and `.custodian-brief.md` (State Hub integration, generated) +- **Registry:** scaffold present (`registry/indexes/capabilities.yaml` empty; + `registry/capabilities/` placeholder); domain in index still `helix_forge` + from scaffold — needs alignment to `infotech` +- **Workplans:** `SAND-WP-0001` (State Hub bootstrap) in `ready` +- **Lineage (external, not yet migrated):** `the-custodian/e2e-framework/` + (CUST-WP-0028, 
completed) and `infra/build-machines/` (CUST-WP-0032) + +--- + +## What Is Possible Now + +- Read the charter (`INTENT.md`) and integration instructions (`AGENTS.md`) +- Track bootstrap tasks via `workplans/SAND-WP-0001-statehub-bootstrap.md` +- Log progress and decisions to State Hub when the hub is reachable +- Use **interim** sandbox execution via `the-custodian` directly: + - `make e2e REPO=` (e2e-framework on railiance01/CoulombCore) + - `infra/build-machines/` Packer VMs with build-agent registration + +Nothing in **this repo** provisions or manages sandboxes yet. + +--- + +## What Is Not Possible Yet + +- Request a sandbox through sand-boxer CLI or API +- Select a named, versioned profile from this repo's catalog +- Register `capability.execution.sandbox-provision` (index entry absent) +- Automatic lifecycle registration of generic sandbox identities in State Hub +- Host placement on sandboxer01 via sand-boxer policy (host may not exist yet) +- activity-core or agents invoking sand-boxer without workstation repo paths +- Local install/test/lint/build commands documented for this repo (no package + layout yet) + +--- + +## How It Fits + +```mermaid +flowchart LR + AC[activity-core] -->|when| SB[sand-boxer] + AGT[agents / atm] -->|request sandbox| SB + SB -->|provision / teardown| HOST[sandboxer01 / interim host] + SB -->|lifecycle events| SH[state-hub] + SB -->|reachability| OB[ops-bridge] + SB -->|SSH identity| OW[ops-warden] + RS[reuse-surface] -->|federate| REG[registry/] + TC[the-custodian e2e + build-machines] -.->|migrate from| SB +``` + +- **Upstream dependencies:** ops-bridge (tunnels), ops-warden (certs, optional), + State Hub (registration API), registered sandbox hosts (SSH + Docker/Packer) +- **Downstream consumers:** LLM agents, activity-core instructions, CI hooks, + cross-repo e2e callers migrating off `the-custodian` +- **Often used with:** `activity-core` (orchestration), `state-hub` (visibility), + `reuse-surface` (capability discovery) + 
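The lifecycle events that flow from sand-boxer to State Hub in the diagram above could be modelled roughly as follows. This is a sketch under assumptions: the state names come from the In Scope list (create → ready → active → expired → destroyed), while the transition table, the `Sandbox` class, and the in-memory `events` list standing in for State Hub are hypothetical.

```python
from enum import Enum


class SandboxState(Enum):
    CREATE = "create"
    READY = "ready"
    ACTIVE = "active"
    EXPIRED = "expired"
    DESTROYED = "destroyed"


# Assumed transition table: the linear chain, plus a direct jump to
# DESTROYED to model explicit release or a failed provision.
TRANSITIONS = {
    SandboxState.CREATE: {SandboxState.READY, SandboxState.DESTROYED},
    SandboxState.READY: {SandboxState.ACTIVE, SandboxState.EXPIRED,
                         SandboxState.DESTROYED},
    SandboxState.ACTIVE: {SandboxState.EXPIRED, SandboxState.DESTROYED},
    SandboxState.EXPIRED: {SandboxState.DESTROYED},
    SandboxState.DESTROYED: set(),
}


class Sandbox:
    def __init__(self, sandbox_id: str, actor: str):
        self.sandbox_id = sandbox_id
        self.actor = actor  # "adm", "agt", or "atm"
        self.state = SandboxState.CREATE
        self.events = []  # stand-in for lifecycle events sent to State Hub

    def advance(self, target: SandboxState) -> None:
        """Move to `target` if allowed and record an attributable event."""
        if target not in TRANSITIONS[self.state]:
            raise ValueError(
                f"illegal transition {self.state.value} -> {target.value}"
            )
        previous, self.state = self.state, target
        self.events.append(
            (self.sandbox_id, self.actor, previous.value, target.value)
        )


box = Sandbox("sbx-demo", "agt")
box.advance(SandboxState.READY)
box.advance(SandboxState.ACTIVE)
box.advance(SandboxState.DESTROYED)  # explicit release before expiry
print(box.events[-1])  # ('sbx-demo', 'agt', 'active', 'destroyed')
```

Recording the actor with every transition is one way to keep each state change attributable and queryable; a real implementation would post these events to State Hub rather than append to a list.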
+--- + +## Terminology + +- **Profile** — named, versioned sandbox type with provision/teardown contract +- **Sandbox** — a running isolated environment instance of a profile +- **Host placement** — policy mapping profiles to sandboxer01, CoulombCore, etc. +- **TTL** — time-to-live; sandboxes are disposable by default +- **Phone home** — reachability and registration via ops-bridge + State Hub +- Actor types (consumers): `adm` (operator), `agt` (LLM agent), `atm` (automation) + +--- + +## Related / Overlapping + +- `the-custodian` — current home of e2e-framework and build-machines; governance + canon; sand-boxer extracts reusable execution platform from here +- `ops-bridge` — SSH reverse tunnels; sand-boxer orchestrates reachability, does + not run tunnel daemons +- `ops-warden` — SSH CA and certificate issuance +- `state-hub` — workstream/task state and sandbox lifecycle visibility +- `activity-core` — schedules work; may request sandboxes as execution venue +- `reuse-surface` — federates `registry/` capability entries +- `railiance-cluster` / `railiance-apps` — production layer; explicitly not + sandbox execution surface + +--- + +## Provided Capabilities + +*Planned — not yet registered in `registry/indexes/capabilities.yaml`.* + +```capability +type: execution +title: Sandbox provisioning +description: Isolated execution environments for agentic development, e2e testing, and bounded automations — profile-based provision, TTL teardown, and State Hub lifecycle registration. +keywords: [sandbox, isolation, provision, e2e, agentic, execution, profile] +``` + +Target registry id: `capability.execution.sandbox-provision` (or equivalent per +reuse-surface naming). 
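As a rough illustration of how the planned entry maps to its target id, the sketch below checks that the fields shown in the capability block are present and then composes the id. The `capability.<type>.<slug>` pattern is inferred from the single example id above and is an assumption, as are the helper name `capability_id` and the `REQUIRED_FIELDS` tuple; reuse-surface may define different rules.

```python
# Fields taken from the capability block above; treating all four as
# required is an assumption made for this sketch.
REQUIRED_FIELDS = ("type", "title", "description", "keywords")

ENTRY = {
    "type": "execution",
    "title": "Sandbox provisioning",
    "description": "Isolated execution environments with profile-based "
                   "provision, TTL teardown, and lifecycle registration.",
    "keywords": ["sandbox", "isolation", "provision", "e2e"],
}


def capability_id(entry: dict, slug: str) -> str:
    """Validate a capability entry and build its (assumed) registry id."""
    missing = [name for name in REQUIRED_FIELDS if not entry.get(name)]
    if missing:
        raise ValueError(f"capability entry is missing: {', '.join(missing)}")
    return f"capability.{entry['type']}.{slug}"


print(capability_id(ENTRY, "sandbox-provision"))
# capability.execution.sandbox-provision
```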
+ +--- + +## Getting Oriented + +- Start with: `INTENT.md` (meta-framework charter) +- Research: `research/` (landscape, reference systems, design synthesis) +- Agent instructions: `AGENTS.md` (State Hub session protocol) +- Offline brief: `.custodian-brief.md` +- Workplans: `workplans/` (bootstrap: `SAND-WP-0001`) +- Registry authoring: `registry/README.md` +- Lineage reference (external): `the-custodian/e2e-framework/RUNBOOK.md`, + `the-custodian/infra/build-machines/README.md` \ No newline at end of file diff --git a/research/01-agent-sandbox-landscape.md b/research/01-agent-sandbox-landscape.md new file mode 100644 index 0000000..f70bc00 --- /dev/null +++ b/research/01-agent-sandbox-landscape.md @@ -0,0 +1,153 @@ +# Agent sandbox landscape (2026) + +Survey of modern sandbox infrastructure for agentic coding — isolation +technologies, provider models, and industry convergence patterns relevant to +sand-boxer. + +## Market definition + +**AI agent sandboxes** are isolated execution environments for running +AI-generated or agent-requested code safely. They optimize for: + +- Fast create / resume / teardown +- Programmatic lifecycle APIs +- Isolation from host and peer workloads +- Developer- and agent-friendly SDKs + +This is distinct from general application hosting and from agent harnesses +(memory, channels, tool orchestration). 
+ +## Provider landscape (summary) + +| Platform | Model | Creation | Persist / checkpoint | Isolation | Notes | +|----------|-------|----------|----------------------|-----------|-------| +| **E2B** | Managed SaaS | ~150ms | Pause/resume, snapshots | Firecracker | Scale leader; template + sandbox API | +| **Daytona** | Managed + OSS | ~90ms | Snapshots, fork | Docker/Kata | Open-source self-host path | +| **Modal** | Serverless SaaS | Sub-second | Memory snapshots, volumes | gVisor | Strong GPU; code-defined runtime | +| **Blaxel** | Managed | Sub-25ms resume | Hibernate | microVM | Zero idle compute billing | +| **Vercel Sandbox** | Managed | ms | Snapshots, persistent default | Firecracker | Vercel ecosystem | +| **Cloudflare Sandbox SDK** | Edge | seconds / ms (isolates) | DO state | Containers / V8 | Workers-native | +| **AWS AgentCore** | Managed sessions | — | Session ≤8h | microVM | Hyperscaler bundling | +| **Google Agent Sandbox** | Managed preview | Sub-second | TTL ≤14d | Hardened containers | Gemini Enterprise layer | +| **OpenSandbox** | Self-hosted OSS | Pool pre-warm | Pause/resume, PVC | gVisor/Kata/Firecracker | K8s-scale; CNCF Landscape | +| **OpenShell** | Policy runtime | — | Long-lived sandboxes | Landlock/seccomp/OPA | Governance layer, not hosted platform | +| **Northflank** | BYOC + managed | ~200ms | Persistent | microVM/gVisor | VPC deployment | +| **Runloop** | Managed | ~100ms exec | Snapshot, branch | Custom hypervisor | SWE-bench / eval focus | +| **Sprites** | Managed | 1–2s | ~300ms checkpoints | Firecracker | Persistent-first | +| **ComputeSDK** | Abstraction | Varies | Varies | Varies | Multi-provider router (9 backends) | + +Sources: [Ry Walker research (Jun 2026)](https://rywalker.com/research/ai-agent-sandboxes), +provider docs, Modal/E2B marketing materials. Treat vendor claims as directional. 
+ +## Isolation technology spectrum + +| Technology | Used by | Security level | Performance | +|------------|---------|----------------|-------------| +| **Firecracker** | E2B, Sprites, Vercel | Hardware-level microVM | Fast | +| **gVisor / Kata** | Modal, Northflank, OpenSandbox | Kernel-level | Very fast | +| **Hardened Docker** | Daytona, AIO Sandbox | Container-level | Fastest setup | +| **Landlock / seccomp / OPA** | OpenShell | Kernel policy | Native speed | +| **V8 isolates** | Cloudflare Worker Loader | Process-level | Milliseconds | + +**Implication for sand-boxer:** profile metadata must declare `isolation_level` +so consumers can reason about blast radius. Extensions map profiles to concrete +runtimes; the meta-framework does not mandate one technology. + +## Convergence trends (2025 → 2026) + +### 1. Ephemeral vs persistent collapsed + +Early market split (E2B = ephemeral, Sprites = persistent) has merged. Most +platforms now offer: + +- Persistent workspace by default or as first-class option +- Checkpoint / snapshot / hibernate for fast resume +- TTL and explicit teardown still expected for cost and security + +**sand-boxer takeaway:** profiles should support `persistence: ephemeral | +persistent | checkpoint` as a first-class dimension, not a backend detail. + +### 2. Checkpointing is table stakes + +Sub-second to low-second restore times are becoming baseline for agent coding +(workspace state, installed deps, shell history — not always live PIDs). + +**sand-boxer takeaway:** lifecycle API needs `snapshot`, `restore`, `fork` +operations even if early extensions only implement `recreate`. + +### 3. Security stress-tests exposed limits + +Research on AWS AgentCore and OpenShell/NemoClaw showed that **allowed egress +paths** (git, npm, curl, node to allowlisted hosts) can be weaponized for +exfiltration when agents are prompt-injected or tricked into malicious +dependencies. Policy controls *destination*, not *intent*. 
+ +**sand-boxer takeaway:** document honestly that sandboxing is blast-radius +control, not agent-behavior guarantee. Default-deny network; per-profile egress +allowlists; secrets injected at boundary, never in agent-visible workspace. + +### 4. Hyperscaler bundling pressures independents + +AWS, Google, Cloudflare, Vercel entered the category in one quarter. +Independents compete on multi-cloud neutrality, price, isolation depth, or +open-source self-host. + +**sand-boxer takeaway:** OpenRouter-style routing across self-hosted and SaaS +backends is a defensible Coulomb position — no single-vendor lock-in. + +### 5. Abstraction layers emerging + +ComputeSDK routes one TypeScript API across E2B, Modal, Daytona, Runloop, +Cloudflare, Vercel, etc. — "Terraform for running other people's code." + +**sand-boxer takeaway:** validate the meta-framework API against this pattern; +extensions are providers; sand-boxer core is router + policy + billing + registry. + +## Architecture patterns (industry) + +### Gateway / harness vs runtime (universal split) + +``` +[Agent gateway / harness] ──orchestrates──▶ [Sandbox runtime] + (host or control plane) (isolated) +``` + +OpenClaw and Hermes both keep the gateway on the host and run **tool execution** +in the sandbox. sand-boxer owns the runtime side only; **glas-harness** owns the +gateway/harness side (see `03-meta-framework-synthesis.md`). + +### Profile + backend + scope (OpenClaw / Hermes consensus) + +| Dimension | Examples | +|-----------|----------| +| **Backend** | docker, ssh, openshell, modal, daytona, compose-ssh | +| **Scope** | per-agent, per-session, shared | +| **Workspace** | isolated, ro-mount, rw-mount; mirror vs remote-canonical | +| **Network** | default deny; optional allowlist | +| **TTL** | mandatory; idle reaper optional | + +### Credential and reachability boundary + +Best practice: credentials brokered at sandbox edge (OpenShell proxy, Blitzy +secrets-never-to-AI, ops-warden certs). 
Agent process never holds production +tokens for unrelated systems. + +sand-boxer integrates **ops-bridge** (reachability) and **ops-warden** +(identity) as consumers — does not replace them. + +## What sand-boxer should adopt vs defer + +| Adopt now (meta-framework) | Defer (extension or phase 2) | +|----------------------------|------------------------------| +| Unified provision/teardown API | GPU profiles | +| Named versioned profiles | Browser sandbox profiles | +| Extension plugin interface | Intent-aware egress filtering | +| Self-hosted compose-ssh (e2e lineage) | Native Firecracker control plane | +| State Hub lifecycle registration | Multi-region routing | +| Default-deny network policy | Computer Use / desktop sandboxes | +| Payments routing for SaaS backends | Owned hyperscale sandbox fleet | + +## Related reading + +- [02-reference-frameworks.md](02-reference-frameworks.md) — OpenClaw, Hermes, Blitzy +- [03-meta-framework-synthesis.md](03-meta-framework-synthesis.md) — API and extensions \ No newline at end of file diff --git a/research/02-reference-frameworks.md b/research/02-reference-frameworks.md new file mode 100644 index 0000000..c07ca05 --- /dev/null +++ b/research/02-reference-frameworks.md @@ -0,0 +1,204 @@ +# Reference frameworks and platforms + +Deep dives on systems sand-boxer should learn from — especially OpenClaw, +Hermes Agent, Blitzy, and OpenShell — plus hosted platforms as extension +targets. + +--- + +## OpenClaw + +**What it is:** Personal AI assistant with optional tool sandboxing. +**Docs:** https://docs.openclaw.ai/gateway/sandboxing + +### Role in the stack + +OpenClaw is an **agent harness** (gateway, channels, skills, memory). Sandboxing +is optional configuration on tool execution — not the product core. This is the +same boundary sand-boxer draws vs **glas-harness**. + +### Sandbox architecture + +**What gets sandboxed:** `exec`, `read`, `write`, `edit`, `apply_patch`, +`process`, optional sandboxed browser. 
Gateway stays on host. + +**Backends:** + +| Backend | Where | Workspace model | +|---------|-------|-----------------| +| `docker` | Local container | Bind-mount or copy; default `network: "none"` | +| `ssh` | Remote SSH host | Remote-canonical: seed once, exec remotely | +| `openshell` | OpenShell-managed | `mirror` (local canonical) or `remote` (remote canonical) | + +**Scope:** `agent` (default) | `session` | `shared` — controls container count. + +**Mode:** `off` | `non-main` | `all` — when sandboxing applies. + +**Workspace access:** `none` | `ro` | `rw` — what tools can see. + +### Security patterns worth copying + +- Default Docker network **none** +- Bind-mount blocklist: `docker.sock`, `/etc`, `~/.ssh`, `~/.aws`, credential roots +- Symlink-aware path validation before bind approval +- `tools.elevated` as explicit sandbox bypass (audited escape hatch) +- Honest disclaimer: reduces blast radius, not perfect boundary + +### sand-boxer lessons + +1. **Backend / scope / workspaceAccess** vocabulary is proven — adopt in profile schema +2. **SSH remote-canonical** matches Custodian e2e-framework evolution path +3. **mirror vs remote** workspace modes belong in meta-framework API +4. OpenClaw integrates OpenShell as extension — validates extension-delegation model + +--- + +## Hermes Agent + +**What it is:** Agent harness from Nous Research with multi-backend terminal execution. 
+**Repo:** https://github.com/NousResearch/hermes-agent + +### Terminal backends (six) + +| Backend | Isolation | Persistence | +|---------|-----------|-------------| +| `local` | None | — | +| `docker` | Cap-drop ALL, pids-limit, tmpfs | Single long-lived labeled container | +| `ssh` | Network boundary | Persistent remote shell | +| `modal` | Cloud VM | Filesystem snapshots | +| `daytona` | Cloud container | Stop/resume | +| `singularity` | HPC namespaces | Writable overlay | + +### Docker backend highlights + +- **One container per task**, reused across sessions and Hermes process restarts +- Labels: `hermes-agent=1`, `hermes-task-id`, `hermes-profile` +- `docker_persist_across_processes: true` (default) — container survives process exit +- Resource limits: CPU, memory, disk, `lifetime_seconds` idle reaper +- `docker_forward_env` — secrets from host `.env`, not config YAML +- Parallel subagents **share** container unless per-task image override + +### sand-boxer lessons + +1. **Labeled reuse** beats cold provision per tool call for agent coding efficiency +2. Resource limits and idle reaper are profile-level concerns +3. Modal/Daytona as **extension backends** — Hermes consumes, does not own +4. Credential forwarding policy belongs in extension contract, not agent config + +--- + +## NVIDIA OpenShell + NemoClaw (Hermes deployment) + +**OpenShell:** Policy runtime for agent sandboxes — Landlock, seccomp, OPA egress. +**NemoClaw:** Reference stack deploying Hermes inside OpenShell. + +### Three-layer model (industry pattern) + +| Layer | Component | Responsibility | +|-------|-----------|----------------| +| Model | LLM provider | Reasoning | +| Harness | Hermes | Skills, memory, bridges, scheduling | +| Runtime | OpenShell | Filesystem/network policy, credential brokering | + +sand-boxer maps to **runtime** only. glas-harness maps to **harness**. + +### Policy model + +Declarative YAML: allowed hosts, ports, HTTP methods, **binary-scoped** rules +(e.g. 
only `curl` may reach `api.github.com`). Credentials injected at egress +proxy — agent never sees Slack/Outlook tokens. + +### Snapshot / restore + +NemoClaw ships `snapshot.sh` / `restore.sh` for agent state (skills, memories, +sessions) across redeploys. Credential filter excludes secrets from tarballs. + +### Security research (Lasso, Apr 2026) + +Demonstrated exfiltration via **policy-permitted** paths (git PR, npm postinstall +→ Discord). Policies enforced correctly; intent not evaluated. + +**sand-boxer lesson:** OpenShell-class extensions should be offered; security +runbooks must state limits of egress allowlisting. + +--- + +## Blitzy + +**What it is:** AI-native code generation platform — **not** a sandbox runtime. + +### "Blitzy Sandbox" GitHub org + +Public demo repos for Explore members. Not execution infrastructure. + +### Real isolation model: Environments + +https://docs.blitzy.com/administration/environments + +- Natural-language **setup instructions** (toolchain, build, run, test) +- **Variables** (plaintext) vs **Secrets** (encrypted, masked, **never sent to AI**) +- Multi-environment priority merge (base + project override) +- Validation in configured environment after code generation + +### sand-boxer lessons (environment metadata, not runtime) + +| Blitzy pattern | sand-boxer mapping | +|----------------|-------------------| +| Environment config | Profile `setup` metadata block | +| Secrets never to AI | `secret_refs` resolved at provision boundary | +| Setup instructions | Profile runbook for extension bootstrap | +| Human review gates | Out of scope — **snuggle-inventor** / PR workflow | + +Blitzy validates that **describing how to boot an environment** is as important +as **where it runs**. sand-boxer profiles carry both. + +--- + +## Hosted platforms as extension targets + +sand-boxer extensions may delegate to SaaS providers. 
Initial extension candidates: + +| Extension id | Provider | Self-host alt | Payments | +|--------------|----------|---------------|----------| +| `ext.e2b` | E2B | — | Per-second SaaS | +| `ext.modal` | Modal | — | Per-second + GPU | +| `ext.daytona` | Daytona cloud | `ext.daytona-self` (OSS) | SaaS or infra cost | +| `ext.openshell` | — | OpenShell local/k3s | Infra cost | +| `ext.compose-ssh` | — | sandboxer01 / CoulombCore | Infra cost | +| `ext.vm-packer` | — | build-machines lineage | Infra cost | + +ComputeSDK (https://github.com/computesdk/computesdk) is a useful reference for +normalizing provider differences behind one client API. + +--- + +## OpenRouter analogy + +| OpenRouter | sand-boxer | +|------------|------------| +| Unified LLM API | Unified sandbox API | +| Routes to OpenAI, Anthropic, … | Routes to E2B, Modal, self-hosted compose, … | +| API keys / credits / billing | Payments layer for SaaS consumption | +| Model metadata (context, price) | Profile metadata (isolation, cost, latency) | +| Fallback / routing policy | Host placement + extension fallback | + +sand-boxer does not run inference; it runs **isolation**. The routing and +payments patterns transfer directly. 
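
The "host placement + extension fallback" row can be illustrated with a small selection routine. This is only a sketch: the extension ids come from the candidates table, the self-hosted/SaaS grouping follows its "Self-host alt" column, and `pick_extension` and the `is_available` callback are hypothetical names, not an existing sand-boxer API.

```python
from typing import Callable, Iterable, Optional

# Ids are copied from the candidates table; the grouping is an assumption
# derived from its "Self-host alt" column.
SELF_HOSTED = ("ext.compose-ssh", "ext.vm-packer", "ext.openshell", "ext.daytona-self")
SAAS = ("ext.e2b", "ext.modal", "ext.daytona")


def pick_extension(
    is_available: Callable[[str], bool],
    preferred: Iterable[str] = SELF_HOSTED,
    fallback: Iterable[str] = SAAS,
) -> Optional[str]:
    """Return the first available extension, trying preferred ids before fallbacks."""
    for ext_id in (*preferred, *fallback):
        if is_available(ext_id):
            return ext_id
    # Nothing is reachable; the caller decides whether that is an error.
    return None
```

A caller would pass a health or capability probe as `is_available`; cost and latency constraints are deliberately left out of this sketch.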
+ +--- + +## Anti-patterns to avoid + +| Anti-pattern | Why | +|--------------|-----| +| Rebuild OpenClaw/Hermes gateway in sand-boxer | glas-harness scope | +| Embed e2e test orchestration in provisioner | wise-validator scope | +| Generate code inside sandbox API | snuggle-inventor scope | +| Own SSH tunnels or CA | ops-bridge / ops-warden scope | +| Claim sandbox = safe from prompt injection | Research disproves | + +## Related reading + +- [01-agent-sandbox-landscape.md](01-agent-sandbox-landscape.md) +- [03-meta-framework-synthesis.md](03-meta-framework-synthesis.md) +- `INTENT.md` — normative charter \ No newline at end of file diff --git a/research/03-meta-framework-synthesis.md b/research/03-meta-framework-synthesis.md new file mode 100644 index 0000000..eee6cea --- /dev/null +++ b/research/03-meta-framework-synthesis.md @@ -0,0 +1,294 @@ +# Meta-framework synthesis + +Design notes distilled from landscape research for sand-boxer's unified sandbox +API, extension model, payments layer, and Coulomb project boundaries. + +--- + +## Core thesis + +sand-boxer is a **meta-framework for establishing sandboxes** — like OpenRouter +is a meta-framework for accessing LLM models: + +- One **consistent API** for consumers (`adm`, `agt`, `atm`, domain services) +- Many **extensions** that delegate to self-hosted or SaaS sandbox systems +- **Integrated payments** when consuming metered external services +- **Registry-first** profiles and capabilities via reuse-surface +- **Later:** a Coulomb-native "best of brands" runtime built from operational + experience — not day one + +sand-boxer provisions **where and how code runs**. It does not provision **how +agents think**, **what tests mean**, or **what code gets written**. + +--- + +## Coulomb project boundaries + +These sibling projects are **planned Coulomb repos** with explicit authority +split. sand-boxer must not absorb their concerns. 
+ +```mermaid +flowchart LR + subgraph establish [sand-boxer] + SB[Establish sandbox] + end + + subgraph harness [glas-harness] + GH[Agent harness: gateway tools memory channels] + end + + subgraph validate [wise-validator] + WV[E2E tests health checks validation orchestration] + end + + subgraph generate [snuggle-inventor] + SI[Code generation modernization] + end + + GH -->|request sandbox| SB + WV -->|request sandbox| SB + SI -->|request sandbox| SB + WV -.->|runs tests in| SB + GH -.->|executes tools in| SB + SI -.->|validates output in| SB +``` + +| Project | Owns | Does not own | +|---------|------|--------------| +| **sand-boxer** | Sandbox profiles, provision/teardown, extension routing, placement, lifecycle registration, payments for sandbox consumption | Agent memory, channels, tool policies, test definitions, code generation | +| **glas-harness** | Agent gateway, harness, skills, subagents, tool orchestration, channel bridges | Sandbox runtime, isolation enforcement, host placement | +| **wise-validator** | E2E test orchestration, health check semantics, validation workflows, result reporting | Sandbox provisioning, agent conversation state | +| **snuggle-inventor** | Code generation, tech specs, AAP-style planning, PR-oriented output | Sandbox infrastructure, test harness canon | + +### Integration contracts (intended) + +**glas-harness → sand-boxer** + +``` +POST /v1/sandboxes + profile: "profile.agent-dev" + scope: session | agent | shared + workspace: { mode: mirror | remote, access: none | ro | rw } + consumer: { actor: agt, harness: glas-harness, session_id } +``` + +Harness receives: `sandbox_id`, reachability descriptor (SSH endpoint, tunnel ref), +lifecycle webhook or poll URL. Harness executes tools **inside** sandbox via +agreed exec channel — sand-boxer does not parse tool calls. 
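
As an illustration of the request shape above, a harness could assemble and validate the body before sending it. The sketch below assumes a JSON body whose keys mirror the contract; the class names and `build_create_request` are hypothetical, and no HTTP call is made.

```python
from dataclasses import asdict, dataclass

# Allowed values are taken from the contract above; all names in this
# sketch are illustrative assumptions.
SCOPES = {"session", "agent", "shared"}
WORKSPACE_MODES = {"mirror", "remote"}
WORKSPACE_ACCESS = {"none", "ro", "rw"}


@dataclass(frozen=True)
class Workspace:
    mode: str
    access: str


@dataclass(frozen=True)
class Consumer:
    actor: str
    harness: str
    session_id: str


@dataclass(frozen=True)
class CreateRequest:
    profile: str
    scope: str
    workspace: Workspace
    consumer: Consumer


def build_create_request(profile: str, scope: str, mode: str,
                         access: str, session_id: str) -> dict:
    """Validate the enum-like fields and return a JSON-serialisable body."""
    if scope not in SCOPES:
        raise ValueError(f"unknown scope: {scope!r}")
    if mode not in WORKSPACE_MODES:
        raise ValueError(f"unknown workspace mode: {mode!r}")
    if access not in WORKSPACE_ACCESS:
        raise ValueError(f"unknown workspace access: {access!r}")
    request = CreateRequest(
        profile=profile,
        scope=scope,
        workspace=Workspace(mode=mode, access=access),
        consumer=Consumer(actor="agt", harness="glas-harness", session_id=session_id),
    )
    # asdict() recurses into nested dataclasses, so the result is plain dicts.
    return asdict(request)
```

Rejecting unknown values on the client side keeps malformed requests from reaching the provisioner; the server would still need its own validation.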
+ +**wise-validator → sand-boxer** + +``` +POST /v1/sandboxes + profile: "profile.compose-e2e" + inputs: { repo_ref, compose_bundle_ref } + ttl: 2h + consumer: { actor: atm, harness: wise-validator, run_id } +``` + +wise-validator owns `e2e.yml` semantics, health check definitions, test commands, +and pass/fail interpretation. sand-boxer delivers an environment; wise-validator +runs the validation story **on top**. + +**snuggle-inventor → sand-boxer** + +``` +POST /v1/sandboxes + profile: "profile.build" + setup_metadata: { instructions_ref, secret_refs } + consumer: { actor: agt, harness: snuggle-inventor, job_id } +``` + +snuggle-inventor may attach Blitzy-style setup instructions as profile inputs. +sand-boxer resolves secrets at boundary; generated code never flows through +sand-boxer APIs. + +### Migration from the-custodian + +| Legacy | New owner | +|--------|-----------| +| `e2e-framework/` provision/teardown | sand-boxer `ext.compose-ssh` | +| `e2e-framework/` test run + report | wise-validator (calls sand-boxer) | +| Agent tool sandbox config | glas-harness (calls sand-boxer) | +| `infra/build-machines/` | sand-boxer `ext.vm-packer` | + +--- + +## Meta-framework API (conceptual) + +### Resources + +| Resource | Description | +|----------|-------------| +| `Profile` | Named, versioned sandbox recipe (image, isolation, network, TTL, extension) | +| `Extension` | Backend adapter (self-hosted or SaaS) | +| `Host` | Registered placement target for self-hosted extensions | +| `Sandbox` | Running instance of a profile | +| `Snapshot` | Point-in-time workspace checkpoint (optional) | +| `Route` | Extension selection policy (cost, latency, capability) | +| `Meter` | Usage record for payments layer | + +### Sandbox lifecycle states + +``` +requested → provisioning → ready → active → { expired | failed } → destroying → destroyed +``` + +All transitions emit State Hub events. `ready` means reachability probe succeeded. 
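
The lifecycle above can be read as a small state machine. The following sketch encodes only the transitions drawn in the diagram; whether earlier states (for example `provisioning`) may move directly to `failed` is not specified here and is left out. `State`, `ALLOWED`, and `advance` are hypothetical names.

```python
from enum import Enum


class State(str, Enum):
    REQUESTED = "requested"
    PROVISIONING = "provisioning"
    READY = "ready"
    ACTIVE = "active"
    EXPIRED = "expired"
    FAILED = "failed"
    DESTROYING = "destroying"
    DESTROYED = "destroyed"


# Transition table copied from the diagram; nothing beyond it is assumed.
ALLOWED = {
    State.REQUESTED: {State.PROVISIONING},
    State.PROVISIONING: {State.READY},
    State.READY: {State.ACTIVE},
    State.ACTIVE: {State.EXPIRED, State.FAILED},
    State.EXPIRED: {State.DESTROYING},
    State.FAILED: {State.DESTROYING},
    State.DESTROYING: {State.DESTROYED},
    State.DESTROYED: set(),
}


def advance(current: State, target: State) -> State:
    """Return the target state if the transition is legal, else raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Keeping the table as data makes it straightforward to emit one State Hub event per successful `advance` call.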
+ +### Core operations + +| Operation | Description | +|-----------|-------------| +| `create` | Provision from profile + inputs | +| `get` / `list` | Inspect status | +| `exec` | Run command in sandbox (optional — may be harness-owned) | +| `extend_ttl` | Explicit persistence extension | +| `snapshot` / `restore` | Checkpoint workspace | +| `recreate` | Destroy and reprovision from seed | +| `destroy` | Idempotent teardown | + +Early versions may expose only `create`, `get`, `destroy`, `recreate`; harnesses +can own `exec` via SSH/tunnel without sand-boxer proxying every command. + +### Profile schema (minimum) + +```yaml +id: profile.compose-e2e +version: "1.0.0" +extension: ext.compose-ssh +isolation: + level: container # container | microvm | policy +network: + default: deny + egress: [] # extension interprets +workspace: + mode: remote-canonical # mirror | remote-canonical + access: rw +scope_default: session +ttl: + default: 4h + max: 24h + idle_reap: null +resources: + cpu: null + memory_mb: null +setup: + instructions: "" # Blitzy-style natural language for extension bootstrap + secret_refs: [] # resolved at provision; never in agent context +placement: + prefer: [sandboxer01] + fallback: [coulombcore] +reachability: + tunnel: ops-bridge + identity: ops-warden +metadata: + cost_class: self-hosted # self-hosted | saas-metered + latency_class: standard +``` + +### Extension interface (contract) + +Each extension implements: + +```text +provision(profile, inputs, placement) → sandbox_handle +wait_ready(sandbox_handle) → reachability +teardown(sandbox_handle) → cleanup_report +snapshot?(sandbox_handle) → snapshot_id +restore?(snapshot_id) → sandbox_handle +estimate_cost?(profile, duration) → meter_quote +``` + +Extensions register in `registry/` with capability vectors (isolation level, +regions, GPU, persistence, pricing model). 
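
One way to express the required part of this contract in Python is a structural `Protocol`, shown below with an in-memory fake that provisions nothing real. This is a sketch under assumptions: the type names, field names, and the placeholder endpoint are hypothetical, and the optional operations (`snapshot`, `restore`, `estimate_cost`) are omitted.

```python
from dataclasses import dataclass
from itertools import count
from typing import Any, Dict, Mapping, Protocol, runtime_checkable


@dataclass(frozen=True)
class SandboxHandle:
    sandbox_id: str
    extension_id: str


@dataclass(frozen=True)
class Reachability:
    ssh_endpoint: str


@runtime_checkable
class Extension(Protocol):
    """Required operations from the contract; optional ones are left out."""

    def provision(self, profile: Mapping[str, Any], inputs: Mapping[str, Any],
                  placement: str) -> SandboxHandle: ...

    def wait_ready(self, handle: SandboxHandle) -> Reachability: ...

    def teardown(self, handle: SandboxHandle) -> Mapping[str, Any]: ...


class InMemoryExtension:
    """Fake backend for unit tests; it only tracks ids in a dict."""

    extension_id = "ext.in-memory"  # hypothetical id, not a planned extension

    def __init__(self) -> None:
        self._live: Dict[str, str] = {}
        self._ids = count(1)

    def provision(self, profile, inputs, placement) -> SandboxHandle:
        sandbox_id = f"sbx-{next(self._ids)}"
        self._live[sandbox_id] = placement
        return SandboxHandle(sandbox_id, self.extension_id)

    def wait_ready(self, handle: SandboxHandle) -> Reachability:
        if handle.sandbox_id not in self._live:
            raise LookupError(f"unknown sandbox: {handle.sandbox_id}")
        return Reachability(ssh_endpoint=f"{self._live[handle.sandbox_id]}:22")

    def teardown(self, handle: SandboxHandle) -> Mapping[str, Any]:
        # Repeat calls are harmless: a missing id is reported, not raised.
        removed = self._live.pop(handle.sandbox_id, None) is not None
        return {"sandbox_id": handle.sandbox_id, "removed": removed}
```

A fake like this lets the core library and registry loader be tested without SSH access or a container runtime.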
+ +**Bundled extensions (roadmap):** + +| Priority | Extension | Type | +|----------|-----------|------| +| P0 | `ext.compose-ssh` | Self-hosted (e2e-framework lineage) | +| P1 | `ext.vm-packer` | Self-hosted (build-machines lineage) | +| P2 | `ext.daytona-self` | Self-hosted OSS | +| P3 | `ext.e2b`, `ext.modal`, `ext.daytona` | SaaS + payments | +| P4 | `ext.openshell` | Policy runtime wrapper | + +--- + +## Payments layer + +For SaaS extensions, sand-boxer provides an **integrated payments and metering +layer** analogous to OpenRouter credits: + +| Concern | sand-boxer approach | +|---------|---------------------| +| Account credits | Org/workspace balance for sandbox consumption | +| Metering | Per-second, per-creation, GPU surcharge — per extension quote | +| Provider keys | BYOK optional; platform keys for convenience | +| Cost visibility | `estimate_cost` before create; actuals on destroy | +| Billing events | Export to fin-hub / external billing (consumer, not owner) | + +Self-hosted extensions bill **infra cost only** (host allocation) — no SaaS meter. + +Payments is a **facility inside sand-boxer**, not a general payment processor. +Domain billing authority remains elsewhere. + +--- + +## Routing policy (OpenRouter-style) + +When multiple extensions satisfy a profile capability: + +```yaml +route: + strategy: prefer-self-hosted | lowest-cost | lowest-latency | explicit + fallback: [ext.compose-ssh, ext.daytona] + constraints: + max_cost_per_hour: null + require_isolation: microvm + region: eu +``` + +Default Coulomb posture: **prefer-self-hosted** on sandboxer01; SaaS for burst +or capability gaps (GPU, desktop) once extensions exist. + +--- + +## Security posture (documented limits) + +sand-boxer commits to: + +1. Default-deny network unless profile explicitly allows egress +2. Secrets resolved at provision boundary via ops-warden / secret refs +3. Blast-radius isolation on dedicated hosts away from Railiance01 production +4. 
Observable lifecycle and attributable actors (`adm` / `agt` / `atm`) +5. Honest documentation: **allowed tool paths can be abused by compromised agents** + +sand-boxer does **not** commit to intent-aware egress filtering in v1. + +--- + +## Phased maturity + +| Phase | Deliverable | +|-------|-------------| +| **0** | Charter, research, profile schema, `ext.compose-ssh` design | +| **1** | Unified API + self-hosted compose-ssh + State Hub registration | +| **2** | Extension SDK + vm-packer + registry entries + routing | +| **3** | SaaS extensions + payments layer | +| **4** | Snapshot/restore + checkpoint profiles | +| **5** | Coulomb-native runtime ("best of brands") informed by extension ops data | + +Phase 5 is explicitly **later** — learn from routing, billing, failure modes, and +latency before building owned microVM/control-plane. + +--- + +## Open questions (for workplans) + +1. Does `exec` live in sand-boxer API or only in glas-harness via SSH? +2. Payments: integrate with existing fin-hub or standalone credits first? +3. Profile authorship: repo-local YAML vs hub-managed catalog? +4. wise-validator: fork e2e-framework reporter or new contract from day one? + +These belong in SAND-WP-0002+ design workplans, not INTENT.md. \ No newline at end of file diff --git a/research/README.md b/research/README.md new file mode 100644 index 0000000..f18a13f --- /dev/null +++ b/research/README.md @@ -0,0 +1,22 @@ +# sand-boxer research + +Research informing the sand-boxer meta-framework charter and implementation +roadmap. These documents are **inputs to design**, not normative specs — see +`INTENT.md` for authority and boundaries. 
+ +## Index + +| Document | Contents | +|----------|----------| +| [01-agent-sandbox-landscape.md](01-agent-sandbox-landscape.md) | Market survey: isolation technologies, providers, convergence trends | +| [02-reference-frameworks.md](02-reference-frameworks.md) | Deep dives: OpenClaw, Hermes, Blitzy, OpenShell, hosted platforms | +| [03-meta-framework-synthesis.md](03-meta-framework-synthesis.md) | Design synthesis: API shape, extensions, payments, Coulomb boundaries | + +## How to use + +1. Read `INTENT.md` for the governing charter. +2. Use `03-meta-framework-synthesis.md` when designing profiles, extensions, or + the unified API. +3. Use `01` and `02` when evaluating a backend extension or security posture. + +Last updated: 2026-06-22 \ No newline at end of file diff --git a/workplans/SAND-WP-0001-statehub-bootstrap.md b/workplans/SAND-WP-0001-statehub-bootstrap.md new file mode 100644 index 0000000..30dd5ec --- /dev/null +++ b/workplans/SAND-WP-0001-statehub-bootstrap.md @@ -0,0 +1,55 @@ +--- +id: SAND-WP-0001 +type: workplan +title: "Bootstrap State Hub integration" +domain: infotech +repo: sand-boxer +owner: codex +topic_slug: custodian +created: "2026-06-22" +updated: "2026-06-22" +status: active +--- + +# Bootstrap State Hub integration + +Sandboxing for agentic coding facility. + +## Review Generated Integration Files + +```task +id: SAND-WP-0001-T01 +status: done +priority: high +``` + +Review `INTENT.md`, `SCOPE.md`, `AGENTS.md`, and `.custodian-brief.md`. +Replace generated placeholders with repo-specific facts where needed. + +## Verify Local Developer Workflow + +```task +id: SAND-WP-0001-T02 +status: todo +priority: high +``` + +Identify the repo's install, test, lint, build, and run commands. Add or refine +those commands in the agent instructions so future coding sessions can verify +changes confidently.
+ +## Seed First Real Workplan + +```task +id: SAND-WP-0001-T03 +status: done +priority: medium +``` + +Create the first implementation workplan for the repository's most important +next change. Created `workplans/SAND-WP-0002-meta-framework-foundation.md`. +After workplan file updates, run from `~/state-hub`: + +```bash +make fix-consistency REPO=sand-boxer +``` diff --git a/workplans/SAND-WP-0002-meta-framework-foundation.md b/workplans/SAND-WP-0002-meta-framework-foundation.md new file mode 100644 index 0000000..fd69042 --- /dev/null +++ b/workplans/SAND-WP-0002-meta-framework-foundation.md @@ -0,0 +1,246 @@ +--- +id: SAND-WP-0002 +type: workplan +title: "Meta-framework foundation and first extension" +domain: infotech +repo: sand-boxer +status: ready +owner: codex +topic_slug: custodian +created: "2026-06-22" +updated: "2026-06-22" +--- + +# Meta-framework foundation and first extension + +Establish sand-boxer as a meta-framework: unified API contract, profile catalog, +extension platform, and the first self-hosted backend (`ext.compose-ssh`) migrated +from `the-custodian/e2e-framework/`. 
+ +**Charter:** `INTENT.md` +**Research:** `research/03-meta-framework-synthesis.md` +**Predecessor:** SAND-WP-0001 (bootstrap; T02 dev workflow should complete in +parallel or before T03 here) + +## Design meta-framework contracts + +```task +id: SAND-WP-0002-T01 +status: todo +priority: high +``` + +Author `docs/meta-framework.md` (or `specs/meta-framework.md`) defining: + +- Resource model: Profile, Extension, Host, Sandbox, Snapshot, Route, Meter +- Lifecycle states and State Hub event mapping +- Core API operations: `create`, `get`, `list`, `extend_ttl`, `recreate`, + `destroy` (snapshot/restore deferred to SAND-WP-0003) +- Consumer attribution schema (`adm` / `agt` / `atm`, calling project id) +- Extension interface: `provision`, `wait_ready`, `teardown`, optional + `estimate_cost` +- Routing policy vocabulary (`prefer-self-hosted`, `lowest-cost`, `explicit`) +- Security limits statement (blast-radius vs intent — per research) + +Derive from `research/03-meta-framework-synthesis.md`; do not duplicate harness, +validator, or codegen concerns. + +## Define profile and extension schemas + +```task +id: SAND-WP-0002-T02 +status: todo +priority: high +``` + +Add machine-readable schemas (JSON Schema or Python pydantic models) for: + +- `Profile` — extension binding, isolation, network, workspace mode, scope, + TTL, resources, setup metadata, placement, reachability, cost class +- `Extension` — id, capabilities, isolation levels, pricing model, regions +- `SandboxCreateRequest` / `SandboxStatus` response shapes + +Ship `profiles/profile.compose-e2e.yaml` as the reference profile (successor to +`e2e/e2e.yml` inputs; validation semantics stay with wise-validator). + +Register extension stub `extensions/ext.compose-ssh.yaml` with capability +metadata. 
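
A possible starting point for the `Profile` model, written with standard-library dataclasses so it runs without extra dependencies (a pydantic model would look similar). The nested keys (`isolation.level`, `ttl.default`, `ttl.max`) follow the minimum profile schema in `research/03-meta-framework-synthesis.md`; the accepted TTL units and all function names are assumptions, since the research notes only show hour values.

```python
import re
from dataclasses import dataclass
from datetime import timedelta

ISOLATION_LEVELS = {"container", "microvm", "policy"}

# Assumed unit suffixes; the source profile only uses "h".
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}
_TTL = re.compile(r"(\d+)([smhd])")


def parse_ttl(text: str) -> timedelta:
    """Parse strings such as '4h' or '24h' into a timedelta."""
    match = _TTL.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid TTL: {text!r}")
    value, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(value)})


@dataclass(frozen=True)
class Profile:
    id: str
    version: str
    extension: str
    isolation_level: str
    ttl_default: timedelta
    ttl_max: timedelta

    def __post_init__(self) -> None:
        if self.isolation_level not in ISOLATION_LEVELS:
            raise ValueError(f"unknown isolation level: {self.isolation_level!r}")
        if self.ttl_default > self.ttl_max:
            raise ValueError("ttl.default must not exceed ttl.max")


def profile_from_mapping(raw: dict) -> Profile:
    """Build a Profile from an already-parsed YAML mapping."""
    return Profile(
        id=raw["id"],
        version=raw["version"],
        extension=raw["extension"],
        isolation_level=raw["isolation"]["level"],
        ttl_default=parse_ttl(raw["ttl"]["default"]),
        ttl_max=parse_ttl(raw["ttl"]["max"]),
    )
```

Only a subset of the profile fields is modelled; network, workspace, placement, and setup metadata would be added the same way.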
+ +## Scaffold package and developer workflow + +```task +id: SAND-WP-0002-T03 +status: todo +priority: high +``` + +Create Python package layout (aligned with e2e-framework lineage): + +``` +src/sandboxer/ # or sandboxer/ at repo root — pick one, document in AGENTS.md + api/ + profiles/ + extensions/ + lifecycle/ +tests/ +pyproject.toml +``` + +Document in `AGENTS.md`: install (`uv sync` or equivalent), test, lint, format, +and CLI entry point. Satisfies SAND-WP-0001-T02 if not already done. + +## Implement extension registry and loader + +```task +id: SAND-WP-0002-T04 +status: todo +priority: high +``` + +Implement extension discovery and registration: + +- Load extension definitions from `extensions/` +- Plugin entry-point or explicit registry for `ext.compose-ssh` +- Validate extension declares required capability fields before registration +- Unit tests for load failures and duplicate ids + +No SaaS extensions in this workplan — self-hosted only. + +## Implement ext.compose-ssh (e2e-framework lineage) + +```task +id: SAND-WP-0002-T05 +status: todo +priority: high +``` + +Extract and adapt provision/teardown from `the-custodian/e2e-framework/`: + +- SSH to configured host; isolated directory per sandbox id +- Unique compose project name; `compose up` / `compose down` (idempotent) +- Default-deny network posture per profile (document host-side requirements) +- Host placement: read `placement.prefer` / `fallback` from profile +- **Do not** port test execution, health polling, or State Hub result reporting + — those are wise-validator responsibilities + +Provide a compatibility note in extension README for interim `make e2e` callers. 
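
The "unique compose project name" and "`compose up` / `compose down`" steps could be prototyped as pure functions that only build command lines, which keeps them unit-testable without a host. In this sketch the function names, the `sbx` prefix, and the choice of `--volumes --remove-orphans` on teardown are assumptions; the slug rule reflects Compose's restriction of project names to lowercase letters, digits, hyphens, and underscores.

```python
import re
import shlex
from typing import List

_INVALID = re.compile(r"[^a-z0-9_-]+")


def compose_project_name(sandbox_id: str, prefix: str = "sbx") -> str:
    """Derive a Compose-safe project name from a sandbox id."""
    slug = _INVALID.sub("-", sandbox_id.lower()).strip("-_")
    if not slug:
        raise ValueError(f"sandbox id yields an empty project name: {sandbox_id!r}")
    return f"{prefix}-{slug}"


def compose_up_argv(project: str, compose_file: str) -> List[str]:
    # -p pins the project name so parallel sandboxes do not collide.
    return ["docker", "compose", "-p", project, "-f", compose_file, "up", "-d"]


def compose_down_argv(project: str, compose_file: str) -> List[str]:
    # Removing volumes and orphans is an assumed cleanup policy.
    return ["docker", "compose", "-p", project, "-f", compose_file,
            "down", "--volumes", "--remove-orphans"]


def over_ssh(host: str, argv: List[str]) -> List[str]:
    """Wrap a command for remote execution, quoting it as one shell string."""
    return ["ssh", host, shlex.join(argv)]
```

The resulting lists can be passed to `subprocess.run`; how a repeated `down` behaves when nothing is left to remove should be checked against the Compose version installed on the target host.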
+ +## Implement API v0 and CLI + +```task +id: SAND-WP-0002-T06 +status: todo +priority: high +``` + +Ship minimal establishment surface: + +**CLI** (primary for v0): + +```bash +sandbox create --profile profile.compose-e2e --input repo=/path/to/repo +sandbox get +sandbox list +sandbox recreate +sandbox destroy +``` + +**HTTP** (optional in v0; stub acceptable if CLI calls core library directly): + +- `POST /v1/sandboxes`, `GET /v1/sandboxes/{id}`, `DELETE /v1/sandboxes/{id}` + +Core library must be harness-agnostic — glas-harness, wise-validator, and +snuggle-inventor call the same functions. + +## State Hub lifecycle registration + +```task +id: SAND-WP-0002-T07 +status: todo +priority: medium +``` + +On sandbox state transitions, emit State Hub progress events (or dedicated +registration endpoint when available): + +- `requested`, `provisioning`, `ready`, `active`, `destroying`, `destroyed` +- Include: `sandbox_id`, `profile_id`, `extension_id`, `host`, `consumer`, + `actor_type`, timestamps + +Extend the `build-agent` self-register pattern sketch for generic sandbox +identities. Document contract in meta-framework spec. + +## Document sibling integration contracts + +```task +id: SAND-WP-0002-T08 +status: todo +priority: medium +``` + +Add `docs/integrations/` with one page per planned sibling: + +| Doc | Contents | +|-----|----------| +| `glas-harness.md` | Sandbox handle + reachability; harness owns exec | +| `wise-validator.md` | `profile.compose-e2e`; validator owns e2e.yml + health + tests | +| `snuggle-inventor.md` | Setup metadata + secret_refs; no codegen in sand-boxer | + +Each doc: example request, response fields, ownership table, out-of-scope list. +Cross-link from `INTENT.md` Coulomb boundaries section. 
+ +## Register capability and fix registry scaffold + +```task +id: SAND-WP-0002-T09 +status: todo +priority: medium +``` + +- Author `registry/capabilities/execution.sandbox-provision.md` +- Add row to `registry/indexes/capabilities.yaml` (`domain: infotech`) +- Run `reuse-surface validate` when CLI available +- Notify operator: `make fix-consistency REPO=sand-boxer` from `~/state-hub` + +## Verification and migration smoke test + +```task +id: SAND-WP-0002-T10 +status: todo +priority: medium +``` + +End-to-end proof on CoulombCore or sandboxer01 (when reachable): + +1. `sandbox create` with `profile.compose-e2e` for a repo with `e2e/` layout +2. Confirm `ready` state and reachability descriptor +3. Manual or scripted compose health check (not wise-validator — just proves + environment exists) +4. `sandbox destroy` — confirm idempotent cleanup (no leftover compose projects + or `/tmp` dirs) +5. Document runbook in `docs/runbooks/profile-compose-e2e.md` + +Record gaps for wise-validator migration (SAND-WP-0003) and `the-custodian` +shim (SAND-WP-0004). + +--- + +## Out of scope (follow-on workplans) + +| Item | Target workplan | +|------|-----------------| +| wise-validator extraction + e2e test orchestration | SAND-WP-0003 | +| `the-custodian` Makefile shim + deprecation timeline | SAND-WP-0004 | +| `ext.vm-packer` (build-machines) | SAND-WP-0005 | +| SaaS extensions + payments layer | SAND-WP-0006 | +| Snapshot / restore / checkpoint profiles | SAND-WP-0007 | +| Coulomb-native runtime (phase 5) | Backlog | + +## Completion criteria + +- Meta-framework spec and schemas merged +- `ext.compose-ssh` provisions and tears down a compose sandbox via CLI +- State Hub receives lifecycle events for at least one full create→destroy cycle +- Sibling integration docs published +- `capability.execution.sandbox-provision` registered and validated +- All tasks `done`; workplan `status: finished`; operator runs fix-consistency \ No newline at end of file