docs: charter meta-framework vision, research, and SAND-WP-0002

Rewrite INTENT.md as the sand-boxer meta-framework charter (OpenRouter-style
sandbox API, extensions, payments, Coulomb sibling boundaries). Add research
under research/, update SCOPE.md, bootstrap workplans SAND-WP-0001/0002, and
State Hub integration files from the bootstrap pass.
2026-06-22 21:32:32 +02:00
parent e248f669a3
commit f33cff5363
20 changed files with 2016 additions and 113 deletions

20
.claude/rules/agents.md Normal file

@@ -0,0 +1,20 @@
## Kaizen Agents
Specialized agent personas available on demand via the state-hub MCP.
**Discover:** `list_kaizen_agents()` — returns all agents with name, description, category
**Load:** `get_kaizen_agent("tdd-workflow")` — returns full instructions; read and follow them
Common agents:
| Agent | Category | When to use |
|-------|----------|-------------|
| `tdd-workflow` | testing | Step-by-step TDD8 workflow for any feature |
| `code-refactoring` | quality | Code quality analysis and safe refactoring |
| `test-maintenance` | testing | Diagnose and fix failing tests |
| `requirements-engineering` | process | Prevent interface/mock mismatches upfront |
| `keepaTodofile` | process | Maintain TODO.md during work |
| `project-management` | process | Track status, determine next steps |
| `datamodel-optimization` | quality | Optimize dataclasses and data structures |
All 17 agents: call `list_kaizen_agents()` for the full list.


@@ -0,0 +1,8 @@
## Architecture
<!-- TODO: Describe the key design decisions and component structure.
Key modules, data flows, external integrations, state machines, etc. -->
## Quick Reference
`~/state-hub/mcp_server/TOOLS.md` — MCP tool reference


@@ -0,0 +1,50 @@
# Credential and access routing
**Audience:** Codex, Claude Code, Grok, and custodian agents that call **llm-connect**
for inference. Run this check **before** requesting secrets, API keys, SSH access,
login tokens, or database passwords — in any repo, not only `ops-warden`.
ops-warden **issues SSH certificates only** (`warden sign`, `cert_command`). Every
other credential need belongs to another subsystem. **Do not** message
`ops-warden` on State Hub expecting a secret value; the reply is a pointer, not a key.
### Lookup (do this first)
```bash
warden route find "<describe your need>" --json
warden route show <catalog-id> --json
```
Requires the `warden` CLI from `~/ops-warden` (`uv tool install .` or `uv run warden`).
| Agent runtime | How to orient |
| --- | --- |
| **Codex / Grok** (shell, HTTP State Hub) | `warden route` commands above; inbox `to_agent=sand-boxer` is for coordination, not secret vending |
| **Claude Code** (MCP when available) | `get_domain_summary("custodian")` for workstreams; **still** use `warden route` for credential ownership |
| **llm-connect** (inference service) | Never put secret retrieval in prompts; route custody to OpenBao/operator paths surfaced by `warden route` |
### Quick routing table
| I need… | Owner | ops-warden executes? |
| --- | --- | --- |
| SSH cert (`adm`/`agt`/`atm`) | ops-warden | **Yes** — `warden sign` |
| API key, DB password, provider token | OpenBao (`railiance-platform`) | No — route only |
| Login / OIDC / MFA | key-cape / Keycloak | No — route only |
| Authorization decision | flex-auth | No — route only |
| activity-core → issue-core emission | activity-core + issue-core | No — `warden route show activity-core-issue-sink` |
| SSH tunnel | ops-bridge (+ `cert_command` from warden) | No — route only |
### Anti-patterns (do not do these)
- `POST /messages/` to `ops-warden` asking for `ISSUE_CORE_API_KEY`, `OPENROUTER_API_KEY`, etc.
- Inventing `warden secret`, `warden login`, `warden bao`, `warden tunnel` — they do not exist
- Pasting secrets into Git, State Hub, workplans, logs, or chat
### Other capabilities (reuse-surface)
Non-credential capabilities are usually discovered through **reuse-surface** federation
(`reuse-surface` registry / `capability.*` indexes). Credential routing is inlined in
every repo's agent instructions because it is high-frequency, high-risk, and easy to
get wrong.
**Canon:** `~/ops-warden/wiki/CredentialRouting.md` · catalog `~/ops-warden/registry/routing/catalog.yaml`


@@ -0,0 +1,38 @@
## First Session Protocol
Triggered when `get_domain_summary("infotech")` shows **no workstreams**.
The project is registered but work has not yet been structured.
**Step 1 — Read, don't write**
- `~/the-custodian/canon/projects/infotech/project_charter_v0.1.md` — purpose, scope
- `~/the-custodian/canon/projects/infotech/roadmap_v0.1.md` — planned phases
- Scan repo root: README, directory structure, existing code or docs
**Step 2 — Survey in-progress work**
Look for TODOs, open branches, half-finished files. Note what is done versus what is started but incomplete.
**Step 3 — Propose workstreams to Bernd**
Propose 1-3 workstreams — each a coherent strand, weeks to months, anchored to a
roadmap phase. **Wait for approval before creating.**
**Step 4 — Create workplan file first, then DB record (ADR-001)**
```
workplans/SAND-WP-NNNN-<slug>.md ← write this first
```
Then register in the hub:
```
create_workstream(topic_id="cee7bedf-2b48-46ef-8601-006474f2ad7a", title="...", owner="...", description="...")
create_task(workstream_id="<id>", title="...", priority="high|medium|low")
```
**Step 5 — Record the setup**
```
add_progress_event(
    summary="First session: structured infotech into N workstreams, M tasks",
    event_type="milestone",
    topic_id="cee7bedf-2b48-46ef-8601-006474f2ad7a",
    detail={"workstreams": [...], "tasks_created": M}
)
```
<!-- Delete or archive this file once past first session -->


@@ -0,0 +1,8 @@
## Repo boundary
This repo owns **sand-boxer** only. It does not own:
<!-- TODO: List what belongs in adjacent repos, e.g.:
- SSH key management → railiance-infra/
- State hub code → state-hub/
-->


@@ -0,0 +1,5 @@
**Purpose:** Sandboxing for agentic coding facility.
**Domain:** infotech
**Repo slug:** sand-boxer
**Topic ID:** cee7bedf-2b48-46ef-8601-006474f2ad7a


@@ -0,0 +1,85 @@
## Session Protocol
Dev Hub (State Hub API): http://127.0.0.1:8000
MCP server name in `~/.claude.json`: `dev-hub`
**Step 1 — Orient**
Read the offline-safe brief first — it works without a live hub connection:
```bash
cat .custodian-brief.md
```
Then call the MCP tool for richer cross-domain context when MCP tools are exposed:
```
get_domain_summary("infotech")
```
If MCP tools are unavailable in the current agent session, use the REST API:
```bash
curl -s "http://127.0.0.1:8000/state/summary" | python3 -m json.tool
```
If the hub is offline: `cd ~/state-hub && make api`
**Step 2 — Check inbox**
With MCP tools:
```
get_messages(to_agent="sand-boxer", unread_only=True)
```
Mark read with `mark_message_read(message_id)`. Reply or act on coordination
requests before proceeding.
Without MCP tools:
```bash
curl -s "http://127.0.0.1:8000/messages/?to_agent=sand-boxer&unread_only=true" \
| python3 -m json.tool
curl -s -X PATCH "http://127.0.0.1:8000/messages/<id>/read" \
-H "Content-Type: application/json" -d '{}'
```
**Step 3 — Scan workplans**
```bash
ls workplans/
```
For each file with `status: ready`, `active`, or `blocked`, note pending
`wait`/`todo`/`progress` tasks.
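The scan above could be automated along these lines. This is a minimal sketch, assuming each workplan opens with a `---`-delimited frontmatter block that contains a `status:` line (as in the ADR-001 convention). The helper names `frontmatter_status` and `open_workplans` are hypothetical and not part of any State Hub tooling.

```python
import re
from pathlib import Path
from typing import List, Optional

# Statuses the session protocol asks agents to inspect.
OPEN_STATUSES = {"ready", "active", "blocked"}


def frontmatter_status(text: str) -> Optional[str]:
    """Return the `status:` value from a leading `---` frontmatter block, if any."""
    match = re.match(r"---\n(.*?)\n---", text, re.DOTALL)
    if not match:
        return None
    for line in match.group(1).splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "status":
            return value.strip().strip('"')
    return None


def open_workplans(directory: str = "workplans") -> List[str]:
    """List workplan files whose frontmatter status is ready, active, or blocked."""
    return sorted(
        path.name
        for path in Path(directory).glob("*.md")
        if frontmatter_status(path.read_text(encoding="utf-8")) in OPEN_STATUSES
    )
```

The sketch only reads top-level `*.md` files, so anything under `workplans/archived/` is skipped.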
**Step 4 — Present brief**
1. **Active workstreams** for `infotech` — title, task counts, blocking decisions
2. **Pending tasks** from `workplans/` + any `[repo:sand-boxer]` hub tasks
3. **Goal guidance** — if `goal_guidance` in summary:
- `needs_workplan`: surface as top action — *"Repo goal '{title}' has no workplan yet"*
- `alignment_warnings`: flag if active work is not aligned with current goal
4. **Suggested next action** — highest-priority open item
5. **SBOM status** — flag if `last_sbom_at` is unset for this repo
If no workstreams: follow First Session Protocol (`first-session.md`).
**During work:** `record_decision()` · `add_progress_event()` · `resolve_decision()`
> State Hub is a *read model*. Bootstrap tools (`create_workstream`, `create_task`)
> are First Session Protocol only. Work structure belongs in repo files (ADR-001).
**Session close:**
With MCP tools:
```
add_progress_event(summary="...", topic_id="cee7bedf-2b48-46ef-8601-006474f2ad7a", workstream_id="<uuid>")
```
Without MCP tools:
```bash
curl -s -X POST http://127.0.0.1:8000/progress/ \
-H "Content-Type: application/json" \
-d '{"topic_id":"cee7bedf-2b48-46ef-8601-006474f2ad7a","workstream_id":"<uuid>","event_type":"note","summary":"what changed","author":"codex"}'
```
If workplan files were modified, ensure the local copy is up to date first:
```bash
git -C <repo_path> pull --ff-only
cd ~/state-hub && make fix-consistency REPO=sand-boxer
```
For repos where implementation runs on a remote machine (e.g. CoulombCore),
use the combined target which pulls before fixing:
```bash
cd ~/state-hub && make fix-consistency-remote REPO=sand-boxer
```
**C-15** (DB task ahead of file) is normal in multi-machine workflows — writeback
will sync the file to match the DB. **C-16** (repo behind remote) blocks all writes
until you pull; this is intentional, to prevent clobbering remote progress.


@@ -0,0 +1,19 @@
## Stack
<!-- TODO: Fill in language, frameworks, and key dependencies -->
- **Language:**
- **Key deps:**
## Dev Commands
```bash
# TODO: Fill in the standard commands for this repo
# Install dependencies
# Run tests
# Lint / type check
# Build / package (if applicable)
```


@@ -0,0 +1,40 @@
## Workplan Convention (ADR-001)
File location: `workplans/SAND-WP-NNNN-<slug>.md`
ID prefix: `SAND-WP-`
Work items originate as files in this repo **before** being registered in the hub.
Canonical workplan/workstream frontmatter statuses are:
`proposed`, `ready`, `active`, `blocked`, `backlog`, `finished`, `archived`.
Use `proposed` for a newly drafted plan, `ready` after review against current
repo state, and `finished` when implementation is complete. `stalled` and
`needs_review` are derived health labels, not stored statuses.
Closed workplans may be moved to `workplans/archived/` with a completion-date
prefix: `YYMMDD-SAND-WP-NNNN-<slug>.md`. The frontmatter id remains
unchanged; the prefix is only for quick visual reference.
Small opportunistic tasks discovered during another session use **Ad Hoc Tasks**:
`workplans/ADHOC-YYYY-MM-DD.md`, workstream slug `adhoc-YYYY-MM-DD`, and task ids
`ADHOC-YYYY-MM-DD-T01`, `T02`, etc. Use adhocs only for low-risk work completed
directly. Promote anything requiring analysis, design, approval, dependencies, or
multiple planned phases into a normal workplan.
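The archive prefix and the Ad Hoc naming scheme described above can be sketched in Python as follows. The function names `archived_name` and `adhoc_names` are hypothetical helpers for illustration only; the string formats are taken from the convention text.

```python
from datetime import date
from typing import List, Tuple


def archived_name(filename: str, completed: date) -> str:
    """Prefix a workplan filename with its completion date as YYMMDD."""
    return f"{completed:%y%m%d}-{filename}"


def adhoc_names(day: date, task_count: int) -> Tuple[str, str, List[str]]:
    """Return (workplan path, workstream slug, task ids) for one Ad Hoc day."""
    stem = f"ADHOC-{day:%Y-%m-%d}"
    task_ids = [f"{stem}-T{n:02d}" for n in range(1, task_count + 1)]
    return f"workplans/{stem}.md", f"adhoc-{day:%Y-%m-%d}", task_ids
```

Only the filename changes on archive; the frontmatter id inside the file stays as written.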
Ecosystem todos from other agents arrive as `[repo:sand-boxer]` hub tasks —
visible at session start. Pick one up by creating the workplan file, then registering
the workstream.
Task blocks use this shape:
```task
id: SAND-WP-NNNN-T01
status: wait | todo | progress | done | cancel
priority: high | medium | low
state_hub_task_id: "<uuid>" # written by fix-consistency — do not edit
```
Status progression is `todo` → `progress` → `done`; use `wait` for waiting or
blocked work and `cancel` for stopped work.
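A small guard for status changes might look like the sketch below. The convention only names the five statuses and the main progression; the exact transition table here (for example, that `done` and `cancel` are terminal) is an assumption made for illustration, and `can_move` is a hypothetical helper.

```python
# Assumed transition table: the source only states todo -> progress -> done,
# with `wait` for waiting/blocked work and `cancel` for stopped work.
ALLOWED = {
    "todo": {"progress", "wait", "cancel"},
    "wait": {"todo", "progress", "cancel"},
    "progress": {"done", "wait", "cancel"},
    "done": set(),    # assumed terminal
    "cancel": set(),  # assumed terminal
}


def can_move(current: str, new: str) -> bool:
    """Return True if a task may move from `current` to `new` under ALLOWED."""
    if current not in ALLOWED or new not in ALLOWED:
        raise ValueError(f"unknown status: {current!r} -> {new!r}")
    return new in ALLOWED[current]
```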
<!-- Ralph Loop rules and HEUREKA sequence: ~/.claude/CLAUDE.md — do not duplicate here -->

31
.custodian-brief.md Normal file

@@ -0,0 +1,31 @@
<!-- custodian-brief: generated by statehub register; fix-consistency may replace this file -->
# Custodian Brief - sand-boxer
**Project:** sand-boxer
**Domain:** infotech
**State Hub:** http://127.0.0.1:8000
**Topic ID:** `cee7bedf-2b48-46ef-8601-006474f2ad7a`
## Open Workplans
### Bootstrap State Hub integration
Workplan file: `workplans/SAND-WP-0001-statehub-bootstrap.md` (status: active)
Open tasks:
- T02 - Verify local developer workflow
### Meta-framework foundation and first extension
Workplan file: `workplans/SAND-WP-0002-meta-framework-foundation.md` (status: ready)
Next: T01 - Design meta-framework contracts
## Session Start
1. Read `INTENT.md`, `SCOPE.md`, and `AGENTS.md`.
2. Check inbox: `GET /messages/?to_agent=sand-boxer&unread_only=true`.
3. Scan `workplans/`.
4. Update task statuses in workplan files as work progresses.
Last generated: 2026-06-22

219
AGENTS.md Normal file

@@ -0,0 +1,219 @@
# sand-boxer — Agent Instructions
## Repo Identity
**Purpose:** Sandboxing for agentic coding facility.
**Domain:** infotech
**Repo slug:** sand-boxer
**Topic ID:** `cee7bedf-2b48-46ef-8601-006474f2ad7a`
**Workplan prefix:** `SAND-WP-`
---
## State Hub Integration
The Custodian State Hub tracks work across all domains. Interact via HTTP REST —
there is no MCP server for Codex agents.
| Context | URL |
|---------|-----|
| Local workstation | `http://127.0.0.1:8000` |
| Remote via tunnel | `http://127.0.0.1:18000` |
### Orient at session start
```bash
# Offline brief — works without hub connection
cat .custodian-brief.md
# Active workstreams for this domain
curl -s "http://127.0.0.1:8000/workstreams/?topic_id=cee7bedf-2b48-46ef-8601-006474f2ad7a&status=active" \
| python3 -m json.tool
# Check inbox
curl -s "http://127.0.0.1:8000/messages/?to_agent=sand-boxer&unread_only=true" \
| python3 -m json.tool
```
Mark a message read:
```bash
curl -s -X PATCH "http://127.0.0.1:8000/messages/<id>/read" \
-H "Content-Type: application/json" -d '{}'
```
### Log progress (required at session close)
```bash
curl -s -X POST http://127.0.0.1:8000/progress/ \
  -H "Content-Type: application/json" \
  -d '{
    "summary": "what was done",
    "event_type": "note",
    "author": "codex",
    "workstream_id": "<uuid>",
    "task_id": "<uuid>"
  }'
```
Omit `workstream_id` / `task_id` when not applicable.
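For agents that assemble the request body in code rather than by hand, the following sketch builds the same JSON and drops the optional ids when they are not set. The field names come from the curl example above; the helper name `progress_payload` and its defaults are assumptions. It only builds the body and does not contact the hub.

```python
import json
from typing import Optional


def progress_payload(
    summary: str,
    author: str = "codex",
    event_type: str = "note",
    workstream_id: Optional[str] = None,
    task_id: Optional[str] = None,
) -> str:
    """Build the JSON body for POST /progress/, omitting unset optional ids."""
    body = {"summary": summary, "event_type": event_type, "author": author}
    if workstream_id:
        body["workstream_id"] = workstream_id
    if task_id:
        body["task_id"] = task_id
    return json.dumps(body)
```

The returned string can be passed to `curl -d` or to any HTTP client.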
### Update task status
```bash
curl -s -X PATCH "http://127.0.0.1:8000/tasks/<task_id>" \
-H "Content-Type: application/json" \
-d '{"status": "progress"}'
# values: wait | todo | progress | done | cancel
```
### Flag a task for human review
```bash
curl -s -X PATCH "http://127.0.0.1:8000/tasks/<task_id>" \
-H "Content-Type: application/json" \
-d '{"needs_human": true, "intervention_note": "reason"}'
```
---
## Session Protocol
**Start:**
1. `cat .custodian-brief.md` — domain goal and open workstreams (offline-safe)
2. Check inbox: `GET /messages/?to_agent=sand-boxer&unread_only=true`; mark read
3. Scan workplans: `ls workplans/` — note `status: ready`, `active`, or `blocked` files and open tasks
4. Check human-needed tasks: `GET /tasks/?needs_human=true`
**During work:**
- Update task statuses in workplan files as tasks progress
- Record significant decisions via `POST /decisions/`
**Close:**
1. Update workplan file task statuses to reflect progress
2. Log: `POST /progress/` with a summary of what changed
3. Note for the custodian operator: after workplan file changes, run from
`~/state-hub`:
```bash
make fix-consistency REPO=sand-boxer
```
This syncs task status from files into the hub DB.
---
## Credential and access routing
**Audience:** Codex, Claude Code, Grok, and custodian agents that call **llm-connect**
for inference. Run this check **before** requesting secrets, API keys, SSH access,
login tokens, or database passwords — in any repo, not only `ops-warden`.
ops-warden **issues SSH certificates only** (`warden sign`, `cert_command`). Every
other credential need belongs to another subsystem. **Do not** message
`ops-warden` on State Hub expecting a secret value; the reply is a pointer, not a key.
### Lookup (do this first)
```bash
warden route find "<describe your need>" --json
warden route show <catalog-id> --json
```
Requires the `warden` CLI from `~/ops-warden` (`uv tool install .` or `uv run warden`).
| Agent runtime | How to orient |
| --- | --- |
| **Codex / Grok** (shell, HTTP State Hub) | `warden route` commands above; inbox `to_agent=sand-boxer` is for coordination, not secret vending |
| **Claude Code** (MCP when available) | `get_domain_summary("custodian")` for workstreams; **still** use `warden route` for credential ownership |
| **llm-connect** (inference service) | Never put secret retrieval in prompts; route custody to OpenBao/operator paths surfaced by `warden route` |
### Quick routing table
| I need… | Owner | ops-warden executes? |
| --- | --- | --- |
| SSH cert (`adm`/`agt`/`atm`) | ops-warden | **Yes** — `warden sign` |
| API key, DB password, provider token | OpenBao (`railiance-platform`) | No — route only |
| Login / OIDC / MFA | key-cape / Keycloak | No — route only |
| Authorization decision | flex-auth | No — route only |
| activity-core → issue-core emission | activity-core + issue-core | No — `warden route show activity-core-issue-sink` |
| SSH tunnel | ops-bridge (+ `cert_command` from warden) | No — route only |
### Anti-patterns (do not do these)
- `POST /messages/` to `ops-warden` asking for `ISSUE_CORE_API_KEY`, `OPENROUTER_API_KEY`, etc.
- Inventing `warden secret`, `warden login`, `warden bao`, `warden tunnel` — they do not exist
- Pasting secrets into Git, State Hub, workplans, logs, or chat
### Other capabilities (reuse-surface)
Non-credential capabilities are usually discovered through **reuse-surface** federation
(`reuse-surface` registry / `capability.*` indexes). Credential routing is inlined in
every repo's agent instructions because it is high-frequency, high-risk, and easy to
get wrong.
**Canon:** `~/ops-warden/wiki/CredentialRouting.md` · catalog `~/ops-warden/registry/routing/catalog.yaml`
<!-- REPO-AGENTS-EXTENSIONS -->
<!-- Append repo-specific agent instructions below this marker.
The state-hub template sync preserves content after this line. -->
---
## Workplan Convention (ADR-001)
Work items originate as files in this repo — not in the hub. The hub is a
read/cache/index layer that rebuilds from files.
**File location:** `workplans/SAND-WP-NNNN-<slug>.md`
**Archived location:** finished workplans may move to
`workplans/archived/YYMMDD-SAND-WP-NNNN-<slug>.md`. The `YYMMDD` prefix is
the completion/archive date; the frontmatter `id` does not change.
**Ad Hoc Tasks:** small opportunistic fixes discovered during a session use
`workplans/ADHOC-YYYY-MM-DD.md` with task ids `ADHOC-YYYY-MM-DD-T01`, etc. Use
this only for low-risk work completed directly; create a normal workplan for
anything needing analysis, design, approval, dependencies, or multiple phases.
**Frontmatter:**
```yaml
---
id: SAND-WP-NNNN
type: workplan
title: "..."
domain: infotech
repo: sand-boxer
status: proposed | ready | active | blocked | backlog | finished | archived
owner: codex
topic_slug: ...
created: "YYYY-MM-DD"
updated: "YYYY-MM-DD"
state_hub_workstream_id: "<uuid>" # written by fix-consistency — do not edit
---
```
Use `proposed` for a new draft, `ready` after review against current repo
state, and `finished` after implementation. `stalled` and `needs_review` are
derived health labels, not frontmatter statuses.
**Task block format** (one per `##` section):
```
## Task Title
` ` `task
id: SAND-WP-NNNN-T01
status: wait | todo | progress | done | cancel
priority: high | medium | low
state_hub_task_id: "<uuid>" # written by fix-consistency — do not edit
` ` `
Task description text.
```
Status progression: `todo` → `progress` → `done`; use `wait` for waiting/blocked work and `cancel` for stopped work.
To create a new workplan:
1. Write the file following the format above
2. Notify the custodian operator to run `make fix-consistency REPO=sand-boxer`
(or send a message to the hub agent via `POST /messages/`)

12
CLAUDE.md Normal file

@@ -0,0 +1,12 @@
# sand-boxer — Claude Code Instructions
@SCOPE.md
@.claude/rules/repo-identity.md
@.claude/rules/session-protocol.md
@.claude/rules/first-session.md
@.claude/rules/workplan-convention.md
@.claude/rules/stack-and-commands.md
@.claude/rules/architecture.md
@.claude/rules/repo-boundary.md
@.claude/rules/credential-routing.md
@.claude/rules/agents.md

384
INTENT.md

@@ -1,180 +1,338 @@
---
domain: custodian
domain: infotech
repo: sand-boxer
updated: "2026-06-21"
updated: "2026-06-22"
---
# INTENT
> This file explains why sand-boxer exists, what problem it solves in the
> Custodian ecosystem, and where its authority begins and ends.
> sand-boxer is the Coulomb **meta-framework for establishing sandboxes** — a
> unified API and extension platform for provisioning every variation of isolated
> execution environment, from self-hosted compose stacks to metered SaaS
> runtimes. This file is the charter: why it exists, what it owns, and where
> sibling projects begin.
Research backing this charter lives in `research/`.
---
## Why it exists
Custodian automation is moving from **workstation-anchored** execution to
**Railiance01-scheduled** orchestration. That shift is right for reliability:
activity-core on Railiance01 can fire maintenance and coordination jobs on a
stable clock. It does not, by itself, give agents a safe place to **develop,
build, and test** without the laptop filesystem, sleep cycles, and single-user
blast radius.
**Railiance01-scheduled** orchestration. That shift improves reliability but does
not, by itself, answer the harder question: **where can agentic and deterministic
work run safely** without the laptop filesystem, sleep cycles, and single-user
blast radius?
sand-boxer exists to provide **isolated execution environments** — sandboxes —
where agentic and deterministic work can run on dedicated infrastructure while
remaining observable and governable from State Hub.
The industry has exploded with sandbox answers — E2B, Modal, Daytona, OpenShell,
OpenClaw-style Docker/SSH backends, hyperscaler interpreters — each with
different APIs, billing models, and isolation postures. Coulomb needs **one place
to establish sandboxes** regardless of backend, not a new integration per agent
harness, validator, or codegen pipeline.
The goal is progress without requiring the workstation as a runtime: repos are
checked out, tools run, tests execute, and artifacts return through controlled
channels. The laptop becomes optional for operations, not the hub of all
execution.
sand-boxer exists to be that place: **OpenRouter for sandboxes, not for models.**
Consumers call one API. Extensions delegate to the sandbox system that fits —
self-hosted on sandboxer01, inherited compose-ssh from `the-custodian`, or a
metered cloud provider. An integrated **payments layer** handles SaaS consumption
when Coulomb uses external capacity. Over time, operational learning may justify
a Coulomb-native **best-of-brands runtime** — but that is a later phase built on
evidence, not day-one ambition.
The workstation becomes optional for **runtime**. Railiance01 decides *when*
work runs (via activity-core). sand-boxer decides *where* isolated execution
happens. State Hub records *what* changed.
---
## The governing principle
sand-boxer is the **execution isolation and provisioning service** for agentic
development and related workloads.
sand-boxer is the **sandbox establishment service** — profiles, provisioning,
extension routing, placement, lifecycle, and metering. Nothing more.
It should answer:
It answers:
1. **Where can this work run safely?** Profile selection (compose stack, VM,
future cluster worker) and host placement.
2. **How is isolation enforced?** Networks, TTL, resource limits, teardown, and
cleanup guarantees.
3. **How does the sandbox phone home?** Reachability via ops-bridge tunnels and
SSH identity via ops-warden — without owning either.
4. **What happened?** Registration, health, and lifecycle events visible to
State Hub and reuse-surface consumers.
1. **Which sandbox recipe applies?** Profile selection and version resolution.
2. **Which backend fulfills it?** Extension routing (self-hosted vs SaaS).
3. **Where does it run?** Host placement and blast-radius policy.
4. **How is isolation enforced?** Network default-deny, TTL, resource limits,
teardown guarantees — as declared by profile + extension.
5. **How does it become reachable?** Consumer integration with ops-bridge and
ops-warden — without owning tunnels or certificates.
6. **What happened?** Lifecycle events, usage meters, State Hub registration.
7. **What did it cost?** Payments and credits for metered extensions.
It should not become the scheduler, the work-state database, the connectivity
authority, or production application hosting on Railiance01.
It must **not** become the agent harness, the e2e validator, the code generator,
the scheduler, the work-state database, the connectivity authority, or production
hosting on Railiance01.
---
## Strategic context
## The OpenRouter analogy
### Workstation automation is interim, not the target
| OpenRouter | sand-boxer |
|------------|------------|
| Unified LLM access API | Unified sandbox establishment API |
| Routes across model providers | Routes across sandbox extensions |
| Provider metadata (price, context) | Profile metadata (isolation, cost, latency) |
| API keys, credits, usage billing | Payments layer for SaaS sandbox consumption |
| BYOK supported | BYOK for extension provider keys |
| Does not train models | Does not replace extension runtimes (until phase 5) |
Local timers and laptop-resident scripts were useful for bootstrapping ADR-001
consistency sync and similar jobs. They are not the long-term substrate.
Railiance01-based activity-core schedules are the primary direction; workstation
paths remain only where no sandbox or cluster alternative exists yet.
sand-boxer is **infrastructure routing**, not product UX. Harnesses, validators,
and inventors are customers.
### Railiance01 vs sandbox hosts
---
| Layer | Role |
|-------|------|
| **Railiance01** | Production k3s, activity-core, Temporal, stable custodian schedules |
| **sandboxer01** (or equivalent) | Dedicated VM for dev/agent sandboxes — **isolated blast radius** |
| **CoulombCore** | Acceptable interim sandbox host during migration; not a substitute for deliberate isolation from production |
| **Workstation (WSL)** | Control plane anchor today; **not** the desired execution surface |
## Coulomb sibling boundaries
sand-boxer owns the **abstraction and lifecycle** of sandboxes. It does not own
Railiance01 cluster operations (see `railiance-cluster` / `railiance-apps`).
sand-boxer stays inside the **sandboxing boundary**. Three sibling Coulomb
projects own adjacent concerns. Integration is contractual — they **request**
sandboxes; sand-boxer **establishes** them.
### Lineage
### glas-harness — agent harness
This repository consolidates and generalizes patterns that today live split and
unregistered in `the-custodian`:
**Owns:** Gateway, tool orchestration, skills, memory, channels, subagent
delegation, session semantics, sandbox *consumption* from the agent's perspective.
- **E2E sandbox framework** (`e2e-framework/`) — SSH to remote host, isolated
directory, docker compose, teardown (`CUST-WP-0028`).
- **Build machines** (`infra/build-machines/`) — reproducible VM images,
reverse tunnels, State Hub capability registration (`CUST-WP-0032`).
**Does not own:** Sandbox runtimes, profile catalog authority, host placement,
extension adapters, isolation enforcement.
sand-boxer extracts a **reusable platform** from those precedents so
`the-custodian` can stay governance-focused with a small operational surface.
glas-harness configures *when* tools run in a sandbox (OpenClaw-style
`mode` / `scope` / `workspaceAccess`). sand-boxer provides the sandbox handle
and reachability descriptor.
### wise-validator — e2e test and health
**Owns:** Validation workflows, health check semantics, test orchestration,
pass/fail interpretation, structured result reporting to State Hub and CI.
**Does not own:** Remote host provisioning, compose lifecycle, port isolation,
sandbox teardown.
wise-validator replaces the validation half of `the-custodian/e2e-framework/`.
It requests `profile.compose-e2e` (or successors), runs tests inside the
established environment, and owns the `e2e.yml` contract.
### snuggle-inventor — code generation
**Owns:** Code generation, modernization pipelines, tech-spec and planning
artifacts, PR-oriented output, human-in-the-loop review gates.
**Does not own:** Sandbox infrastructure, environment bootstrapping authority,
secret stores, runtime metering.
snuggle-inventor may attach Blitzy-style **setup instructions** and secret
references as profile inputs. sand-boxer resolves secrets at the provision
boundary; generated code never transits sand-boxer APIs.
### Boundary diagram
```
glas-harness         wise-validator        snuggle-inventor
(agent harness)      (e2e + health)        (code generation)
      │                     │                      │
      └─────────────────────┼──────────────────────┘
                            │ POST /v1/sandboxes
                       sand-boxer
                  (establish sandboxes)
            ┌───────────────┼───────────────┐
            ▼               ▼               ▼
     ext.compose-ssh    ext.modal       ext.e2b …
      (self-hosted)   (SaaS+meter)    (SaaS+meter)
```
### Existing Custodian repos (unchanged)
| Concern | Owner |
|---------|--------|
| Workstream, task, progress state | `state-hub` |
| Cron and orchestration | `activity-core` |
| SSH reverse tunnels | `ops-bridge` |
| SSH certificate issuance | `ops-warden` |
| Canon and agent instruction canon | `the-custodian` |
| Capability federation hub | `reuse-surface` |
| Production on Railiance01 | `railiance-apps` / domain repos |
| ADR-001 reconciliation | `state-hub` |
sand-boxer **consumes** ops-bridge and ops-warden; it does not subsume them.
---
## What it is
sand-boxer is the **sandbox provisioning and profile catalog** for Custodian.
sand-boxer is a **meta-framework** with four pillars:
It is intended to contain:
### 1. Unified establishment API
- **Sandbox profiles** — e.g. compose-based e2e stacks, VM images, future
container-on-worker patterns
- **Provision / wait / teardown** lifecycle — TTL, idempotent cleanup, port and
network conventions
- **Host placement policy** — which profiles run on sandboxer01, coulombcore
interim, or other registered hosts
- **CLI and/or API** for operators and agents to request isolated environments
- **State Hub registration contract** — extend the `build-agent` self-register
pattern to generic sandbox identities
- **Capability registry entries** in this repo's `registry/` for federation via
reuse-surface (e.g. `capability.execution.sandbox-provision`)
- Runbooks, templates (Packer, compose bundles), and tests for the above
One consistent surface for all sandbox variations:
- Create, inspect, extend, snapshot, recreate, destroy
- Profile-driven inputs (repo ref, compose bundle, setup metadata, secret refs)
- Consumer attribution (`adm` / `agt` / `atm` + calling project id)
- Lifecycle states: `requested → provisioning → ready → active → expired → destroyed`
Early versions may expose a subset; the API shape is designed for completeness.
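The lifecycle above can be modelled as a simple ordered state machine. This is a sketch under the assumption that states advance strictly in the listed order; a real implementation would likely also need failure paths and early destroy, which the charter does not specify. `SandboxState` and `next_state` are hypothetical names.

```python
from enum import Enum


class SandboxState(Enum):
    REQUESTED = "requested"
    PROVISIONING = "provisioning"
    READY = "ready"
    ACTIVE = "active"
    EXPIRED = "expired"
    DESTROYED = "destroyed"


# Enum members iterate in definition order, which matches the charter's order.
ORDER = list(SandboxState)


def next_state(state: SandboxState) -> SandboxState:
    """Return the state that follows `state`; `destroyed` is terminal."""
    index = ORDER.index(state)
    if index == len(ORDER) - 1:
        raise ValueError("destroyed is terminal")
    return ORDER[index + 1]
```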
### 2. Profile catalog
Named, versioned recipes — not one-off containers:
- Extension binding (`ext.compose-ssh`, `ext.vm-packer`, `ext.e2b`, …)
- Isolation level, network policy, workspace mode (`mirror` | `remote-canonical`)
- Scope default (`agent` | `session` | `shared`)
- TTL, resource limits, placement preference
- Setup metadata (natural-language bootstrap instructions for extensions)
- Registered in `registry/` and federated via reuse-surface
Profiles collect good ideas from OpenClaw (backend/scope/workspace), Hermes
(labeled reuse, resource limits), Blitzy (setup instructions, secret boundary),
and hosted platforms (checkpoint, persistence classes) into **one schema**.
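One way to picture a profile record is the dataclass below. It is an illustrative sketch only: the field names, defaults, and the pairing of `profile.compose-e2e` with `ext.compose-ssh` are assumptions, not the schema that SAND-WP-0002 will define. Only the allowed values for workspace mode and scope are taken from the list above.

```python
from dataclasses import dataclass

WORKSPACE_MODES = {"mirror", "remote-canonical"}
SCOPES = {"agent", "session", "shared"}


@dataclass(frozen=True)
class SandboxProfile:
    name: str                              # e.g. "profile.compose-e2e"
    version: str                           # profiles are named and versioned
    extension: str                         # extension binding, e.g. "ext.compose-ssh"
    workspace_mode: str = "mirror"         # assumed default
    scope: str = "session"                 # assumed default
    ttl_seconds: int = 3600                # assumed default
    network_policy: str = "default-deny"   # assumed default
    setup_instructions: str = ""           # natural-language bootstrap notes

    def __post_init__(self) -> None:
        # Reject values outside the sets named in the charter.
        if self.workspace_mode not in WORKSPACE_MODES:
            raise ValueError(f"unknown workspace mode: {self.workspace_mode!r}")
        if self.scope not in SCOPES:
            raise ValueError(f"unknown scope: {self.scope!r}")
        if self.ttl_seconds <= 0:
            raise ValueError("ttl_seconds must be positive")
```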
### 3. Extension platform
Extensions **delegate** to sandbox systems and services:
| Class | Examples | Billing |
|-------|----------|---------|
| **Self-hosted** | compose-ssh, vm-packer, Daytona OSS, OpenShell | Infra allocation |
| **SaaS consumption** | E2B, Modal, Daytona cloud, future providers | Payments layer |
Each extension implements a provision / ready / teardown contract (optional
snapshot / cost estimate). Extensions ship as plugins; third-party and Coulomb-
native backends use the same interface.
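The provision / ready / teardown contract could be expressed as a structural interface, sketched here with `typing.Protocol`. The method signatures and the `InMemoryExtension` toy backend are assumptions for illustration; the optional snapshot and cost-estimate hooks are left out.

```python
from typing import Dict, Protocol


class SandboxExtension(Protocol):
    """Hypothetical minimal contract every extension would implement."""

    def provision(self, profile: str) -> str: ...
    def ready(self, handle: str) -> bool: ...
    def teardown(self, handle: str) -> None: ...


class InMemoryExtension:
    """Toy backend that tracks sandboxes in a dict instead of real hosts."""

    def __init__(self) -> None:
        self._live: Dict[str, str] = {}
        self._counter = 0

    def provision(self, profile: str) -> str:
        self._counter += 1
        handle = f"sbx-{self._counter}"
        self._live[handle] = profile
        return handle

    def ready(self, handle: str) -> bool:
        return handle in self._live

    def teardown(self, handle: str) -> None:
        # Idempotent: tearing down an unknown or already removed handle is a no-op.
        self._live.pop(handle, None)
```

Because the interface is structural, a third-party backend would not need to inherit from anything; matching the three methods is enough for a type checker.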
### 4. Payments and metering
For metered SaaS extensions:
- Org/workspace credits and usage accounting
- Pre-create cost estimates; post-destroy actuals
- BYOK for provider API keys where supported
- Export to domain billing systems — sand-boxer meters sandbox consumption,
not general payments
Self-hosted extensions record **allocation** (host, duration), not external spend.
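To make the estimate-then-settle flow concrete, here is a minimal sketch of pre-create estimation and post-destroy metering for a metered SaaS extension. The `CreditLedger` class, the per-second rate model, and all figures are assumptions; real provider pricing and the credits model are not specified here.

```python
from dataclasses import dataclass
from decimal import Decimal


def estimate_cost(rate_per_second: Decimal, ttl_seconds: int) -> Decimal:
    """Worst-case pre-create estimate: the sandbox runs for its full TTL."""
    return rate_per_second * ttl_seconds


@dataclass
class CreditLedger:
    """Hypothetical org/workspace credit balance."""
    balance: Decimal

    def authorize(self, estimate: Decimal) -> bool:
        # Refuse creation when the estimate exceeds available credits.
        return estimate <= self.balance

    def settle(self, rate_per_second: Decimal, actual_seconds: int) -> Decimal:
        # Post-destroy actuals: charge only for the time really used.
        actual = rate_per_second * actual_seconds
        self.balance -= actual
        return actual
```

`Decimal` is used instead of `float` so that small per-second rates accumulate without binary rounding drift, which matters once usage is exported to a billing system.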
---
## What it is not
| Concern | Owner |
|---------|--------|
| Workstream, task, and progress state | `state-hub` |
| Cron and event-triggered orchestration | `activity-core` |
| SSH reverse tunnels and tunnel health | `ops-bridge` |
| SSH certificate issuance | `ops-warden` |
| Canon, charters, agent instruction canon | `the-custodian` |
| Capability index federation hub | `reuse-surface` |
| Production service deployment on Railiance01 | `railiance-apps` / domain repos |
| ADR-001 workplan ↔ DB reconciliation | `state-hub` (`consistency_check.py`) |
| Concern | Owner | sand-boxer role |
|---------|--------|-----------------|
| Agent gateway, tools, memory, channels | **glas-harness** | Customer API |
| E2e tests, health checks, validation | **wise-validator** | Customer API |
| Code generation, tech specs, AAP | **snuggle-inventor** | Customer API |
| When work runs | `activity-core` | None |
| What tasks exist | `state-hub` | Registers lifecycle only |
| Tunnels | `ops-bridge` | Consumer |
| Certs | `ops-warden` | Consumer |
| Intent-aware egress / prompt security | Research frontier | Document limits only |
sand-boxer may **consume** connectivity and certificates; it must not duplicate
or subsume those authorities.
sand-boxer provides **blast-radius isolation and governed reachability**. It does
not protect against a compromised agent abusing **allowed** egress paths (git,
npm, curl to allowlisted hosts). Security runbooks must state this explicitly.
---
## Strategic context
### Workstation automation is interim
Local timers and laptop scripts bootstrapped ADR-001 sync. Railiance01
activity-core schedules are the direction. Workstation paths remain only where no
sandbox alternative exists yet.
### Host topology
| Layer | Role |
|-------|------|
| **Railiance01** | Production k3s, activity-core, Temporal — **not** agent dev runtime |
| **sandboxer01** | Dedicated sandbox host — preferred blast-radius isolation |
| **CoulombCore** | Interim sandbox host during migration |
| **Workstation (WSL)** | Control-plane anchor today — **not** target execution surface |
| **SaaS extensions** | Burst / capability gap (GPU, desktop) via payments layer |
### Lineage
sand-boxer generalizes patterns split across `the-custodian`:
| Legacy | sand-boxer | Sibling |
|--------|------------|---------|
| `e2e-framework/` provision/teardown | `ext.compose-ssh` | wise-validator owns test run |
| `e2e-framework/` health + test + report | — | wise-validator |
| `infra/build-machines/` | `ext.vm-packer` | — |
| Agent sandbox config (future) | API consumer | glas-harness |
`the-custodian` stays governance-focused; sand-boxer becomes the execution
venue catalog.
### Phase 5: Coulomb-native runtime (later)
After operating extensions in production — observing latency, cost, failure
modes, isolation gaps — sand-boxer may ship an owned **best-of-brands**
sandboxing solution combining:
- Persistent labeled workspaces (Hermes pattern)
- Default-deny policy layer (OpenShell lessons)
- Fast resume / checkpoint (industry baseline)
- Self-hosted economics (Daytona/OpenSandbox lessons)
This is **not** v1 scope. Extensions and payments come first; native runtime
follows evidence.
---
## Intended users
- **Human operators (`adm`)** — provision sandboxes, manage profiles and hosts,
inspect lifecycle and cleanup
- **LLM agents (`agt`)** — request isolated environments for coding, testing,
and verification without laptop filesystem dependence
- **Deterministic automations (`atm`)** — activity-core instructions and CI
hooks that need a bounded execution venue
- **Human operators (`adm`)** — profiles, hosts, extensions, credits, lifecycle
- **LLM agents (`agt`)** — via glas-harness, snuggle-inventor, or direct API
- **Deterministic automations (`atm`)** — via wise-validator, activity-core, CI
- **Extension authors** — implement backend adapters against the extension contract
- **Platform integrators** — register capabilities, federate via reuse-surface
---
## Design principles
- **Blast radius isolation** — sandbox workloads must not jeopardize Railiance01
production stability; prefer dedicated hosts (sandboxer01) for agentic dev
- **Profiles over one-offs** — every sandbox type is a named, versioned profile
with documented inputs, outputs, and teardown
- **Reachability, not ownership** — use ops-bridge for tunnels and ops-warden
for SSH identity; sand-boxer orchestrates, it does not issue certs or run
tunnel daemons
- **Observable lifecycle** — create, ready, active, expired, and destroyed states
are attributable and queryable
- **Disposable by default** — sandboxes are TTL-bound; persistence is explicit
and exceptional
- **Registry-first reuse** — register capabilities in this repo and federate
through reuse-surface before ad hoc duplication elsewhere
- **Meta-framework, not monolith** — one API; many extensions; optional native runtime later
- **Profiles over one-offs** — every sandbox type is named, versioned, registered
- **Prefer self-hosted** — SaaS via explicit routing policy, not silent default
- **Blast-radius isolation** — dedicated hosts; never jeopardize Railiance01 production
- **Reachability, not ownership** — ops-bridge + ops-warden as consumers
- **Secrets at the boundary** — resolve at provision; never in agent-visible workspace
- **Observable lifecycle** — every state transition attributable and queryable
- **Disposable by default** — TTL-bound; persistence and checkpoint are explicit
- **Honest security** — sandboxing limits blast radius; it is not intent enforcement
- **Registry-first reuse** — capabilities in `registry/` before ad hoc duplication
- **Payments transparency** — estimate before create; meter on destroy for SaaS
---
## Near-term outcomes
A first useful version of sand-boxer should:
1. Define at least one **production-oriented profile** (e.g. compose sandbox on
sandboxer01 or coulombcore interim) with documented provision/teardown
2. Register **`capability.execution.sandbox-provision`** (or equivalent) in
`registry/` and pass reuse-surface validation
3. Integrate with **ops-bridge** reachability and **State Hub** registration
4. Provide a clear migration path for e2e-framework and build-machines callers
5. Enable activity-core and agents to request sandboxes without workstation repo
paths as a hard dependency
1. **Charter and research** — `INTENT.md`, `research/`, profile schema draft
2. **First self-hosted extension** — `ext.compose-ssh` from e2e-framework lineage
3. **Unified API v0** — create / get / destroy / recreate + State Hub registration
4. **First profile** — `profile.compose-e2e` for wise-validator migration
5. **Registry entry** — `capability.execution.sandbox-provision` via reuse-surface
6. **Extension SDK sketch** — contract for P1 backends (vm-packer, Daytona OSS)
7. **Sibling integration notes** — glas-harness, wise-validator, snuggle-inventor API expectations documented
---
## Maturity target
A mature sand-boxer should be the **standard execution venue** for agentic
development in Custodian: Railiance01 decides *when* work runs; sand-boxer
decides *where* isolated execution happens; State Hub records *what* changed.
The workstation is optional — used for human preference, not as a single point
of runtime failure.
A mature sand-boxer is Coulomb's **default way to establish any sandbox**:
- glas-harness requests agent dev sandboxes without choosing Docker vs Modal vs SSH
- wise-validator requests validation environments without owning provisioners
- snuggle-inventor requests build sandboxes with setup metadata and secret refs
- activity-core and CI request bounded venues with consistent lifecycle visibility
- Operators route spend across self-hosted and SaaS with one credits model
- A Coulomb-native runtime — if warranted — wins on ops data, not speculation
The workstation is optional. The harness is not sand-boxer. The validator is not
sand-boxer. The inventor is not sand-boxer. **Establishing the box is.**

235
SCOPE.md Normal file

@@ -0,0 +1,235 @@
---
domain: infotech
repo: sand-boxer
updated: "2026-06-22"
---
# SCOPE
> This file helps you quickly understand what this repository is about,
> when it is relevant, and when it is not.
> It is intentionally lightweight and may be incomplete until implementation lands.
---
## One-liner
Sandbox provisioning and profile catalog for Custodian — isolated execution
environments where agents and automations can develop, build, and test without
depending on the workstation filesystem or blast radius.
---
## Core Idea
sand-boxer is the **execution isolation and provisioning service** for agentic
development and related workloads in the Custodian ecosystem. It answers where
work can run safely, how isolation is enforced, how sandboxes phone home, and
what happened during their lifecycle.
A **sandbox profile** is a named, versioned recipe (compose stack, VM image,
future cluster worker) with documented inputs, outputs, host placement, TTL,
and teardown guarantees. Operators and agents request a profile; sand-boxer
provisions an isolated environment on a registered host, exposes reachability
through ops-bridge (without owning tunnels), registers lifecycle state with
State Hub, and tears down on expiry or explicit release.
The repo consolidates patterns today split across `the-custodian`:
`e2e-framework/` (SSH + compose sandboxes for cross-repo e2e) and
`infra/build-machines/` (Packer VMs with build-agent self-registration).
---
## In Scope
- **Sandbox profile catalog** — versioned definitions for compose-based e2e
stacks, VM images, and future worker patterns; inputs, outputs, and teardown
contracts documented per profile
- **Provision / wait / teardown lifecycle** — TTL, idempotent cleanup, port and
network conventions, observable states (create → ready → active → expired →
destroyed)
- **Host placement policy** — which profiles run on sandboxer01, CoulombCore
interim, or other registered hosts; blast-radius isolation from Railiance01
production
- **CLI and/or API** — request, inspect, and release sandboxes for operators
(`adm`), agents (`agt`), and automations (`atm`)
- **State Hub registration contract** — extend the `build-agent` self-register
pattern to generic sandbox identities and lifecycle events
- **Capability registry entries** in `registry/` for federation via
reuse-surface (e.g. `capability.execution.sandbox-provision`)
- **Runbooks, templates, and tests** — Packer/compose bundles, operator
runbooks, and automated tests for profile lifecycle
- **Migration path** — documented cutover from `the-custodian/e2e-framework`
and `infra/build-machines` callers to sand-boxer profiles
- **Agent and workplan metadata** — `INTENT.md`, `AGENTS.md`, `workplans/`,
and State Hub progress/decision logging per ADR-001
---
## Out of Scope
| Concern | Owner |
|---------|--------|
| Workstream, task, and progress state | `state-hub` |
| Cron and event-triggered orchestration | `activity-core` |
| SSH reverse tunnels and tunnel health | `ops-bridge` |
| SSH certificate issuance | `ops-warden` |
| Canon, charters, agent instruction canon | `the-custodian` |
| Capability index federation hub | `reuse-surface` |
| Production service deployment on Railiance01 | `railiance-apps` / domain repos |
| Railiance01 cluster operations | `railiance-cluster` / `railiance-infra` |
| ADR-001 workplan ↔ DB reconciliation | `state-hub` (`consistency_check.py`) |
sand-boxer may **consume** connectivity (ops-bridge) and certificates
(ops-warden); it must not duplicate or subsume those authorities.
Additional boundaries:
- **Scheduling** — activity-core decides *when* work runs; sand-boxer decides
*where* isolated execution happens
- **Workstation as runtime** — the laptop/WSL anchor is interim control plane,
not the target execution surface
- **Irreversible operational decisions** — host provisioning, production
cutovers, and CA policy changes require human approval
---
## Relevant When
- An agent or automation needs an isolated environment for coding, building, or
testing without laptop filesystem dependence
- Cross-repo e2e tests need a remote compose sandbox with guaranteed teardown
- A build or verification workload should run on dedicated hardware
(sandboxer01) rather than Railiance01 production or the workstation
- activity-core or CI needs a bounded execution venue with State Hub visibility
- Planning reuse of sandbox provisioning across repos (registry-first discovery)
---
## Not Relevant When
- All work runs locally with acceptable blast radius
- Only tunnel connectivity is needed (use `ops-bridge` directly)
- Only task/workstream state is needed (use `state-hub`)
- Only scheduling or rule evaluation is needed (use `activity-core`)
- Deploying or operating production services on Railiance01
---
## Current State
- **Status:** bootstrap — repo registered with State Hub; charter written;
implementation not started
- **Implementation:** ~0% — no CLI, API, profiles, provisioner, or tests in tree
- **Docs:** `INTENT.md` (charter, 2026-06-21); `README.md` (one-liner);
`AGENTS.md` and `.custodian-brief.md` (State Hub integration, generated)
- **Registry:** scaffold present (`registry/indexes/capabilities.yaml` empty;
`registry/capabilities/` placeholder); domain in index still `helix_forge`
from scaffold — needs alignment to `infotech`
- **Workplans:** `SAND-WP-0001` (State Hub bootstrap) in `ready`
- **Lineage (external, not yet migrated):** `the-custodian/e2e-framework/`
(CUST-WP-0028, completed) and `infra/build-machines/` (CUST-WP-0032)
---
## What Is Possible Now
- Read the charter (`INTENT.md`) and integration instructions (`AGENTS.md`)
- Track bootstrap tasks via `workplans/SAND-WP-0001-statehub-bootstrap.md`
- Log progress and decisions to State Hub when the hub is reachable
- Use **interim** sandbox execution via `the-custodian` directly:
- `make e2e REPO=<repo>` (e2e-framework on railiance01/CoulombCore)
- `infra/build-machines/` Packer VMs with build-agent registration
Nothing in **this repo** provisions or manages sandboxes yet.
---
## What Is Not Possible Yet
- Request a sandbox through sand-boxer CLI or API
- Select a named, versioned profile from this repo's catalog
- Register `capability.execution.sandbox-provision` (index entry absent)
- Automatic lifecycle registration of generic sandbox identities in State Hub
- Host placement on sandboxer01 via sand-boxer policy (host may not exist yet)
- activity-core or agents invoking sand-boxer without workstation repo paths
- Local install/test/lint/build commands documented for this repo (no package
layout yet)
---
## How It Fits
```mermaid
flowchart LR
AC[activity-core] -->|when| SB[sand-boxer]
AGT[agents / atm] -->|request sandbox| SB
SB -->|provision / teardown| HOST[sandboxer01 / interim host]
SB -->|lifecycle events| SH[state-hub]
SB -->|reachability| OB[ops-bridge]
SB -->|SSH identity| OW[ops-warden]
RS[reuse-surface] -->|federate| REG[registry/]
TC[the-custodian e2e + build-machines] -.->|migrate from| SB
```
- **Upstream dependencies:** ops-bridge (tunnels), ops-warden (certs, optional),
State Hub (registration API), registered sandbox hosts (SSH + Docker/Packer)
- **Downstream consumers:** LLM agents, activity-core instructions, CI hooks,
cross-repo e2e callers migrating off `the-custodian`
- **Often used with:** `activity-core` (orchestration), `state-hub` (visibility),
`reuse-surface` (capability discovery)
---
## Terminology
- **Profile** — named, versioned sandbox type with provision/teardown contract
- **Sandbox** — a running isolated environment instance of a profile
- **Host placement** — policy mapping profiles to sandboxer01, CoulombCore, etc.
- **TTL** — time-to-live; sandboxes are disposable by default
- **Phone home** — reachability and registration via ops-bridge + State Hub
- Actor types (consumers): `adm` (operator), `agt` (LLM agent), `atm` (automation)
---
## Related / Overlapping
- `the-custodian` — current home of e2e-framework and build-machines; governance
canon; sand-boxer extracts reusable execution platform from here
- `ops-bridge` — SSH reverse tunnels; sand-boxer orchestrates reachability, does
not run tunnel daemons
- `ops-warden` — SSH CA and certificate issuance
- `state-hub` — workstream/task state and sandbox lifecycle visibility
- `activity-core` — schedules work; may request sandboxes as execution venue
- `reuse-surface` — federates `registry/` capability entries
- `railiance-cluster` / `railiance-apps` — production layer; explicitly not
sandbox execution surface
---
## Provided Capabilities
*Planned — not yet registered in `registry/indexes/capabilities.yaml`.*
```capability
type: execution
title: Sandbox provisioning
description: Isolated execution environments for agentic development, e2e testing, and bounded automations — profile-based provision, TTL teardown, and State Hub lifecycle registration.
keywords: [sandbox, isolation, provision, e2e, agentic, execution, profile]
```
Target registry id: `capability.execution.sandbox-provision` (or equivalent per
reuse-surface naming).
---
## Getting Oriented
- Start with: `INTENT.md` (meta-framework charter)
- Research: `research/` (landscape, reference systems, design synthesis)
- Agent instructions: `AGENTS.md` (State Hub session protocol)
- Offline brief: `.custodian-brief.md`
- Workplans: `workplans/` (bootstrap: `SAND-WP-0001`)
- Registry authoring: `registry/README.md`
- Lineage reference (external): `the-custodian/e2e-framework/RUNBOOK.md`,
`the-custodian/infra/build-machines/README.md`


@@ -0,0 +1,153 @@
# Agent sandbox landscape (2026)
Survey of modern sandbox infrastructure for agentic coding — isolation
technologies, provider models, and industry convergence patterns relevant to
sand-boxer.
## Market definition
**AI agent sandboxes** are isolated execution environments for running
AI-generated or agent-requested code safely. They optimize for:
- Fast create / resume / teardown
- Programmatic lifecycle APIs
- Isolation from host and peer workloads
- Developer- and agent-friendly SDKs
This is distinct from general application hosting and from agent harnesses
(memory, channels, tool orchestration).
## Provider landscape (summary)
| Platform | Model | Creation | Persist / checkpoint | Isolation | Notes |
|----------|-------|----------|----------------------|-----------|-------|
| **E2B** | Managed SaaS | ~150ms | Pause/resume, snapshots | Firecracker | Scale leader; template + sandbox API |
| **Daytona** | Managed + OSS | ~90ms | Snapshots, fork | Docker/Kata | Open-source self-host path |
| **Modal** | Serverless SaaS | Sub-second | Memory snapshots, volumes | gVisor | Strong GPU; code-defined runtime |
| **Blaxel** | Managed | Sub-25ms resume | Hibernate | microVM | Zero idle compute billing |
| **Vercel Sandbox** | Managed | ms | Snapshots, persistent default | Firecracker | Vercel ecosystem |
| **Cloudflare Sandbox SDK** | Edge | seconds / ms (isolates) | DO state | Containers / V8 | Workers-native |
| **AWS AgentCore** | Managed sessions | — | Session ≤8h | microVM | Hyperscaler bundling |
| **Google Agent Sandbox** | Managed preview | Sub-second | TTL ≤14d | Hardened containers | Gemini Enterprise layer |
| **OpenSandbox** | Self-hosted OSS | Pool pre-warm | Pause/resume, PVC | gVisor/Kata/Firecracker | K8s-scale; CNCF Landscape |
| **OpenShell** | Policy runtime | — | Long-lived sandboxes | Landlock/seccomp/OPA | Governance layer, not hosted platform |
| **Northflank** | BYOC + managed | ~200ms | Persistent | microVM/gVisor | VPC deployment |
| **Runloop** | Managed | ~100ms exec | Snapshot, branch | Custom hypervisor | SWE-bench / eval focus |
| **Sprites** | Managed | 1-2s | ~300ms checkpoints | Firecracker | Persistent-first |
| **ComputeSDK** | Abstraction | Varies | Varies | Varies | Multi-provider router (9 backends) |
Sources: [Ry Walker research (Jun 2026)](https://rywalker.com/research/ai-agent-sandboxes),
provider docs, Modal/E2B marketing materials. Treat vendor claims as directional.
## Isolation technology spectrum
| Technology | Used by | Security level | Performance |
|------------|---------|----------------|-------------|
| **Firecracker** | E2B, Sprites, Vercel | Hardware-level microVM | Fast |
| **gVisor / Kata** | Modal, Northflank, OpenSandbox | Kernel-level | Very fast |
| **Hardened Docker** | Daytona, AIO Sandbox | Container-level | Fastest setup |
| **Landlock / seccomp / OPA** | OpenShell | Kernel policy | Native speed |
| **V8 isolates** | Cloudflare Worker Loader | Process-level | Milliseconds |
**Implication for sand-boxer:** profile metadata must declare `isolation_level`
so consumers can reason about blast radius. Extensions map profiles to concrete
runtimes; the meta-framework does not mandate one technology.
## Convergence trends (2025 → 2026)
### 1. Ephemeral vs persistent collapsed
Early market split (E2B = ephemeral, Sprites = persistent) has merged. Most
platforms now offer:
- Persistent workspace by default or as first-class option
- Checkpoint / snapshot / hibernate for fast resume
- TTL and explicit teardown still expected for cost and security
**sand-boxer takeaway:** profiles should support `persistence: ephemeral |
persistent | checkpoint` as a first-class dimension, not a backend detail.
### 2. Checkpointing is table stakes
Sub-second to low-second restore times are becoming baseline for agent coding
(workspace state, installed deps, shell history — not always live PIDs).
**sand-boxer takeaway:** lifecycle API needs `snapshot`, `restore`, `fork`
operations even if early extensions only implement `recreate`.
### 3. Security stress-tests exposed limits
Research on AWS AgentCore and OpenShell/NemoClaw showed that **allowed egress
paths** (git, npm, curl, node to allowlisted hosts) can be weaponized for
exfiltration when agents are prompt-injected or tricked into malicious
dependencies. Policy controls *destination*, not *intent*.
**sand-boxer takeaway:** document honestly that sandboxing is blast-radius
control, not agent-behavior guarantee. Default-deny network; per-profile egress
allowlists; secrets injected at boundary, never in agent-visible workspace.
### 4. Hyperscaler bundling pressures independents
AWS, Google, Cloudflare, and Vercel all entered the category within a single quarter.
Independents compete on multi-cloud neutrality, price, isolation depth, or
open-source self-host.
**sand-boxer takeaway:** OpenRouter-style routing across self-hosted and SaaS
backends is a defensible Coulomb position — no single-vendor lock-in.
### 5. Abstraction layers emerging
ComputeSDK routes one TypeScript API across E2B, Modal, Daytona, Runloop,
Cloudflare, Vercel, etc. — "Terraform for running other people's code."
**sand-boxer takeaway:** validate the meta-framework API against this pattern;
extensions are providers; sand-boxer core is router + policy + billing + registry.
## Architecture patterns (industry)
### Gateway / harness vs runtime (universal split)
```
[Agent gateway / harness] ──orchestrates──▶ [Sandbox runtime]
(host or control plane) (isolated)
```
OpenClaw and Hermes both keep the gateway on the host and run **tool execution**
in the sandbox. sand-boxer owns the runtime side only; **glas-harness** owns the
gateway/harness side (see `03-meta-framework-synthesis.md`).
### Profile + backend + scope (OpenClaw / Hermes consensus)
| Dimension | Examples |
|-----------|----------|
| **Backend** | docker, ssh, openshell, modal, daytona, compose-ssh |
| **Scope** | per-agent, per-session, shared |
| **Workspace** | isolated, ro-mount, rw-mount; mirror vs remote-canonical |
| **Network** | default deny; optional allowlist |
| **TTL** | mandatory; idle reaper optional |
### Credential and reachability boundary
Best practice: credentials brokered at sandbox edge (OpenShell proxy, Blitzy
secrets-never-to-AI, ops-warden certs). Agent process never holds production
tokens for unrelated systems.
sand-boxer integrates **ops-bridge** (reachability) and **ops-warden**
(identity) as consumers — does not replace them.
## What sand-boxer should adopt vs defer
| Adopt now (meta-framework) | Defer (extension or phase 2) |
|----------------------------|------------------------------|
| Unified provision/teardown API | GPU profiles |
| Named versioned profiles | Browser sandbox profiles |
| Extension plugin interface | Intent-aware egress filtering |
| Self-hosted compose-ssh (e2e lineage) | Native Firecracker control plane |
| State Hub lifecycle registration | Multi-region routing |
| Default-deny network policy | Computer Use / desktop sandboxes |
| Payments routing for SaaS backends | Owned hyperscale sandbox fleet |
## Related reading
- [02-reference-frameworks.md](02-reference-frameworks.md) — OpenClaw, Hermes, Blitzy
- [03-meta-framework-synthesis.md](03-meta-framework-synthesis.md) — API and extensions


@@ -0,0 +1,204 @@
# Reference frameworks and platforms
Deep dives on systems sand-boxer should learn from — especially OpenClaw,
Hermes Agent, Blitzy, and OpenShell — plus hosted platforms as extension
targets.
---
## OpenClaw
**What it is:** Personal AI assistant with optional tool sandboxing.
**Docs:** https://docs.openclaw.ai/gateway/sandboxing
### Role in the stack
OpenClaw is an **agent harness** (gateway, channels, skills, memory). Sandboxing
is optional configuration on tool execution — not the product core. This is the
same boundary sand-boxer draws vs **glas-harness**.
### Sandbox architecture
**What gets sandboxed:** `exec`, `read`, `write`, `edit`, `apply_patch`,
`process`, optional sandboxed browser. Gateway stays on host.
**Backends:**
| Backend | Where | Workspace model |
|---------|-------|-----------------|
| `docker` | Local container | Bind-mount or copy; default `network: "none"` |
| `ssh` | Remote SSH host | Remote-canonical: seed once, exec remotely |
| `openshell` | OpenShell-managed | `mirror` (local canonical) or `remote` (remote canonical) |
**Scope:** `agent` (default) | `session` | `shared` — controls container count.
**Mode:** `off` | `non-main` | `all` — when sandboxing applies.
**Workspace access:** `none` | `ro` | `rw` — what tools can see.
### Security patterns worth copying
- Default Docker network **none**
- Bind-mount blocklist: `docker.sock`, `/etc`, `~/.ssh`, `~/.aws`, credential roots
- Symlink-aware path validation before bind approval
- `tools.elevated` as explicit sandbox bypass (audited escape hatch)
- Honest disclaimer: reduces blast radius, not perfect boundary
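The blocklist and symlink-aware validation can be illustrated with a short check that resolves a requested bind source before comparing it with blocked roots. This is a sketch under assumptions, not OpenClaw's implementation: the `is_bind_allowed` function, the `BLOCKED_ROOTS` list, and the Docker socket paths are illustrative.

```python
import os
from pathlib import Path

# Assumption: entries mirror the examples in the list above.
BLOCKED_ROOTS = [
    "/etc",
    "/var/run/docker.sock",
    "/run/docker.sock",
    "~/.ssh",
    "~/.aws",
]


def _resolve(path: str) -> Path:
    """Expand ~ and follow symlinks so aliases cannot hide the real target."""
    return Path(os.path.realpath(os.path.expanduser(path)))


def is_bind_allowed(source: str, blocked: list[str] = BLOCKED_ROOTS) -> bool:
    """Reject a bind source that is, or sits beneath, a blocked root."""
    real = _resolve(source)
    for root in blocked:
        blocked_real = _resolve(root)
        if real == blocked_real or blocked_real in real.parents:
            return False
    return True
```

Resolving both sides matters: comparing raw strings would let a symlink such as `./innocent -> ~/.ssh` pass the check even though the mount exposes credentials.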
### sand-boxer lessons
1. **Backend / scope / workspaceAccess** vocabulary is proven — adopt in profile schema
2. **SSH remote-canonical** matches Custodian e2e-framework evolution path
3. **mirror vs remote** workspace modes belong in meta-framework API
4. OpenClaw integrates OpenShell as extension — validates extension-delegation model
---
## Hermes Agent
**What it is:** Agent harness from Nous Research with multi-backend terminal execution.
**Repo:** https://github.com/NousResearch/hermes-agent
### Terminal backends (six)
| Backend | Isolation | Persistence |
|---------|-----------|-------------|
| `local` | None | — |
| `docker` | Cap-drop ALL, pids-limit, tmpfs | Single long-lived labeled container |
| `ssh` | Network boundary | Persistent remote shell |
| `modal` | Cloud VM | Filesystem snapshots |
| `daytona` | Cloud container | Stop/resume |
| `singularity` | HPC namespaces | Writable overlay |
### Docker backend highlights
- **One container per task**, reused across sessions and Hermes process restarts
- Labels: `hermes-agent=1`, `hermes-task-id`, `hermes-profile`
- `docker_persist_across_processes: true` (default) — container survives process exit
- Resource limits: CPU, memory, disk, `lifetime_seconds` idle reaper
- `docker_forward_env` — secrets from host `.env`, not config YAML
- Parallel subagents **share** container unless per-task image override
### sand-boxer lessons
1. **Labeled reuse** beats cold provision per tool call for agent coding efficiency
2. Resource limits and idle reaper are profile-level concerns
3. Modal/Daytona as **extension backends** — Hermes consumes, does not own
4. Credential forwarding policy belongs in extension contract, not agent config
---
## NVIDIA OpenShell + NemoClaw (Hermes deployment)
**OpenShell:** Policy runtime for agent sandboxes — Landlock, seccomp, OPA egress.
**NemoClaw:** Reference stack deploying Hermes inside OpenShell.
### Three-layer model (industry pattern)
| Layer | Component | Responsibility |
|-------|-----------|----------------|
| Model | LLM provider | Reasoning |
| Harness | Hermes | Skills, memory, bridges, scheduling |
| Runtime | OpenShell | Filesystem/network policy, credential brokering |
sand-boxer maps to **runtime** only. glas-harness maps to **harness**.
### Policy model
Declarative YAML: allowed hosts, ports, HTTP methods, **binary-scoped** rules
(e.g. only `curl` may reach `api.github.com`). Credentials injected at egress
proxy — agent never sees Slack/Outlook tokens.
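A binary-scoped, default-deny rule check could look roughly like the sketch below. It is not OpenShell's policy engine or YAML format: the `EgressRule` fields and the `is_allowed` helper are hypothetical, chosen only to show how binary, host, port, and method combine into one decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EgressRule:
    """One allowlist entry: which binary may reach which destination."""
    binary: str
    host: str
    port: int = 443
    methods: frozenset[str] = frozenset({"GET"})


def is_allowed(
    rules: list[EgressRule], binary: str, host: str, port: int, method: str
) -> bool:
    """Default deny: a request passes only if some rule matches every field."""
    return any(
        rule.binary == binary
        and rule.host == host
        and rule.port == port
        and method.upper() in rule.methods
        for rule in rules
    )
```

Note what the check does not look at: the request body or the reason for the call. A permitted binary talking to a permitted host passes regardless of what it sends.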
### Snapshot / restore
NemoClaw ships `snapshot.sh` / `restore.sh` for agent state (skills, memories,
sessions) across redeploys. Credential filter excludes secrets from tarballs.
### Security research (Lasso, Apr 2026)
Demonstrated exfiltration via **policy-permitted** paths (git PR, npm postinstall
→ Discord). Policies enforced correctly; intent not evaluated.
**sand-boxer lesson:** OpenShell-class extensions should be offered; security
runbooks must state limits of egress allowlisting.
---
## Blitzy
**What it is:** AI-native code generation platform — **not** a sandbox runtime.
### "Blitzy Sandbox" GitHub org
Public demo repos for Explore members. Not execution infrastructure.
### Real isolation model: Environments
https://docs.blitzy.com/administration/environments
- Natural-language **setup instructions** (toolchain, build, run, test)
- **Variables** (plaintext) vs **Secrets** (encrypted, masked, **never sent to AI**)
- Multi-environment priority merge (base + project override)
- Validation in configured environment after code generation
### sand-boxer lessons (environment metadata, not runtime)
| Blitzy pattern | sand-boxer mapping |
|----------------|-------------------|
| Environment config | Profile `setup` metadata block |
| Secrets never to AI | `secret_refs` resolved at provision boundary |
| Setup instructions | Profile runbook for extension bootstrap |
| Human review gates | Out of scope — **snuggle-inventor** / PR workflow |
Blitzy validates that **describing how to boot an environment** is as important
as **where it runs**. sand-boxer profiles carry both.
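One way to picture `secret_refs` being resolved at the provision boundary is to build two views of the environment: the real one handed to the provisioner and a masked one that anything agent-facing may see. The `build_environment` function and the resolver callback are assumptions for illustration; no secret store integration is implied.

```python
from typing import Callable, Mapping


def build_environment(
    variables: Mapping[str, str],
    secret_refs: list[str],
    resolve: Callable[[str], str],
) -> tuple[dict[str, str], dict[str, str]]:
    """Return (runtime_env, agent_visible) for one sandbox.

    runtime_env carries resolved secret values for the provisioner only;
    agent_visible carries plaintext variables plus masked secret names.
    """
    runtime_env = dict(variables)
    for ref in secret_refs:
        runtime_env[ref] = resolve(ref)  # resolved once, at provision time

    agent_visible = dict(variables)
    for ref in secret_refs:
        agent_visible[ref] = "***"  # the name is visible, the value is not
    return runtime_env, agent_visible
```

In this sketch the resolver would be supplied by whatever owns secrets, so profile definitions and logs only ever contain references, never values.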
---
## Hosted platforms as extension targets
sand-boxer extensions may delegate to SaaS providers. Initial extension candidates:
| Extension id | Provider | Self-host alt | Payments |
|--------------|----------|---------------|----------|
| `ext.e2b` | E2B | — | Per-second SaaS |
| `ext.modal` | Modal | — | Per-second + GPU |
| `ext.daytona` | Daytona cloud | `ext.daytona-self` (OSS) | SaaS or infra cost |
| `ext.openshell` | — | OpenShell local/k3s | Infra cost |
| `ext.compose-ssh` | — | sandboxer01 / CoulombCore | Infra cost |
| `ext.vm-packer` | — | build-machines lineage | Infra cost |
ComputeSDK (https://github.com/computesdk/computesdk) is a useful reference for
normalizing provider differences behind one client API.
---
## OpenRouter analogy
| OpenRouter | sand-boxer |
|------------|------------|
| Unified LLM API | Unified sandbox API |
| Routes to OpenAI, Anthropic, … | Routes to E2B, Modal, self-hosted compose, … |
| API keys / credits / billing | Payments layer for SaaS consumption |
| Model metadata (context, price) | Profile metadata (isolation, cost, latency) |
| Fallback / routing policy | Host placement + extension fallback |
sand-boxer does not run inference; it runs **isolation**. The routing and
payments patterns transfer directly.
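The routing row of the analogy can be sketched as a small selection function. It assumes a policy that prefers healthy self-hosted backends and falls back to SaaS only when the caller explicitly opts in; the `Backend` type, the `choose_backend` name, and the health flag are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Backend:
    extension: str        # e.g. "ext.compose-ssh" or "ext.e2b"
    self_hosted: bool
    healthy: bool = True  # assumed to come from some health probe


def choose_backend(
    candidates: list[Backend], allow_saas: bool = False
) -> Optional[Backend]:
    """Pick the first healthy self-hosted backend, else SaaS if permitted."""
    healthy = [b for b in candidates if b.healthy]
    for backend in healthy:
        if backend.self_hosted:
            return backend
    if allow_saas:
        for backend in healthy:
            if not backend.self_hosted:
                return backend
    return None  # nothing eligible: caller must fail or queue the request
```

Making `allow_saas` default to `False` means metered spend never happens as a silent side effect of a self-hosted outage; the caller has to ask for it.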
---
## Anti-patterns to avoid
| Anti-pattern | Why |
|--------------|-----|
| Rebuild OpenClaw/Hermes gateway in sand-boxer | glas-harness scope |
| Embed e2e test orchestration in provisioner | wise-validator scope |
| Generate code inside sandbox API | snuggle-inventor scope |
| Own SSH tunnels or CA | ops-bridge / ops-warden scope |
| Claim sandbox = safe from prompt injection | Isolation limits blast radius; allowed tool paths can still be abused |
## Related reading
- [01-agent-sandbox-landscape.md](01-agent-sandbox-landscape.md)
- [03-meta-framework-synthesis.md](03-meta-framework-synthesis.md)
- `INTENT.md` — normative charter


@@ -0,0 +1,294 @@
# Meta-framework synthesis
Design notes distilled from landscape research for sand-boxer's unified sandbox
API, extension model, payments layer, and Coulomb project boundaries.
---
## Core thesis
sand-boxer is a **meta-framework for establishing sandboxes**, much as OpenRouter
is a meta-framework for accessing LLMs:
- One **consistent API** for consumers (`adm`, `agt`, `atm`, domain services)
- Many **extensions** that delegate to self-hosted or SaaS sandbox systems
- **Integrated payments** when consuming metered external services
- **Registry-first** profiles and capabilities via reuse-surface
- **Later:** a Coulomb-native "best of brands" runtime built from operational
experience — not day one
sand-boxer provisions **where and how code runs**. It does not provision **how
agents think**, **what tests mean**, or **what code gets written**.
---
## Coulomb project boundaries
These sibling projects are **planned Coulomb repos** with an explicit split of
authority. sand-boxer must not absorb their concerns.
```mermaid
flowchart LR
subgraph establish [sand-boxer]
SB[Establish sandbox]
end
subgraph harness [glas-harness]
GH[Agent harness: gateway tools memory channels]
end
subgraph validate [wise-validator]
WV[E2E tests health checks validation orchestration]
end
subgraph generate [snuggle-inventor]
SI[Code generation modernization]
end
GH -->|request sandbox| SB
WV -->|request sandbox| SB
SI -->|request sandbox| SB
WV -.->|runs tests in| SB
GH -.->|executes tools in| SB
SI -.->|validates output in| SB
```
| Project | Owns | Does not own |
|---------|------|--------------|
| **sand-boxer** | Sandbox profiles, provision/teardown, extension routing, placement, lifecycle registration, payments for sandbox consumption | Agent memory, channels, tool policies, test definitions, code generation |
| **glas-harness** | Agent gateway, harness, skills, subagents, tool orchestration, channel bridges | Sandbox runtime, isolation enforcement, host placement |
| **wise-validator** | E2E test orchestration, health check semantics, validation workflows, result reporting | Sandbox provisioning, agent conversation state |
| **snuggle-inventor** | Code generation, tech specs, AAP-style planning, PR-oriented output | Sandbox infrastructure, test harness canon |
### Integration contracts (intended)
**glas-harness → sand-boxer**
```
POST /v1/sandboxes
profile: "profile.agent-dev"
scope: session | agent | shared
workspace: { mode: mirror | remote, access: none | ro | rw }
consumer: { actor: agt, harness: glas-harness, session_id }
```
The harness receives `sandbox_id`, a reachability descriptor (SSH endpoint, tunnel
ref), and a lifecycle webhook or poll URL. It executes tools **inside** the sandbox
via an agreed exec channel; sand-boxer does not parse tool calls.
**wise-validator → sand-boxer**
```
POST /v1/sandboxes
profile: "profile.compose-e2e"
inputs: { repo_ref, compose_bundle_ref }
ttl: 2h
consumer: { actor: atm, harness: wise-validator, run_id }
```
wise-validator owns `e2e.yml` semantics, health check definitions, test commands,
and pass/fail interpretation. sand-boxer delivers an environment; wise-validator
runs the validation story **on top**.
**snuggle-inventor → sand-boxer**
```
POST /v1/sandboxes
profile: "profile.build"
setup_metadata: { instructions_ref, secret_refs }
consumer: { actor: agt, harness: snuggle-inventor, job_id }
```
snuggle-inventor may attach Blitzy-style setup instructions as profile inputs.
sand-boxer resolves secrets at the provision boundary; generated code never flows
through sand-boxer APIs.
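
The three requests above share one shape: a profile, a `consumer` block, and a few caller-specific fields. A minimal sketch of a client-side builder follows. The function name `build_create_request`, the `correlation` parameter, and the choice of JSON as the wire format are assumptions for illustration; the contracts above do not fix an encoding.

```python
import json

# Actor types named in this document.
ACTORS = {"adm", "agt", "atm"}


def build_create_request(profile: str, actor: str, harness: str,
                         correlation: dict, **fields) -> str:
    """Serialize a create request body.

    `correlation` carries the caller's own id (session_id, run_id, or job_id);
    `fields` carries extras such as ttl, inputs, or setup_metadata.
    """
    if actor not in ACTORS:
        raise ValueError(f"unknown actor type: {actor}")
    body = {
        "profile": profile,
        "consumer": {"actor": actor, "harness": harness, **correlation},
        **fields,
    }
    return json.dumps(body, sort_keys=True)
```

For example, `build_create_request("profile.compose-e2e", "atm", "wise-validator", {"run_id": "r-1"}, ttl="2h")` would produce a body matching the wise-validator request.
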
### Migration from the-custodian
| Legacy | New owner |
|--------|-----------|
| `e2e-framework/` provision/teardown | sand-boxer `ext.compose-ssh` |
| `e2e-framework/` test run + report | wise-validator (calls sand-boxer) |
| Agent tool sandbox config | glas-harness (calls sand-boxer) |
| `infra/build-machines/` | sand-boxer `ext.vm-packer` |
---
## Meta-framework API (conceptual)
### Resources
| Resource | Description |
|----------|-------------|
| `Profile` | Named, versioned sandbox recipe (image, isolation, network, TTL, extension) |
| `Extension` | Backend adapter (self-hosted or SaaS) |
| `Host` | Registered placement target for self-hosted extensions |
| `Sandbox` | Running instance of a profile |
| `Snapshot` | Point-in-time workspace checkpoint (optional) |
| `Route` | Extension selection policy (cost, latency, capability) |
| `Meter` | Usage record for payments layer |
### Sandbox lifecycle states
```
requested → provisioning → ready → active → { expired | failed } → destroying → destroyed
```
All transitions emit State Hub events. `ready` means reachability probe succeeded.
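
One way to enforce this state diagram is a small transition table. The sketch below follows the diagram literally; the names `SandboxState`, `ALLOWED`, and `transition` are hypothetical, and it is an assumption that no other edges exist (for example, whether `provisioning` can move to `failed` is not stated here, so that edge is not modelled).

```python
from enum import Enum


class SandboxState(str, Enum):
    REQUESTED = "requested"
    PROVISIONING = "provisioning"
    READY = "ready"
    ACTIVE = "active"
    EXPIRED = "expired"
    FAILED = "failed"
    DESTROYING = "destroying"
    DESTROYED = "destroyed"


S = SandboxState

# Edges copied from the diagram; anything not listed is rejected.
ALLOWED = {
    S.REQUESTED: {S.PROVISIONING},
    S.PROVISIONING: {S.READY},
    S.READY: {S.ACTIVE},
    S.ACTIVE: {S.EXPIRED, S.FAILED},
    S.EXPIRED: {S.DESTROYING},
    S.FAILED: {S.DESTROYING},
    S.DESTROYING: {S.DESTROYED},
    S.DESTROYED: set(),
}


def transition(current: SandboxState, target: SandboxState) -> SandboxState:
    """Return target if the move is legal, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

A real implementation would emit the State Hub event at the point where `transition` returns.
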
### Core operations
| Operation | Description |
|-----------|-------------|
| `create` | Provision from profile + inputs |
| `get` / `list` | Inspect status |
| `exec` | Run command in sandbox (optional — may be harness-owned) |
| `extend_ttl` | Explicit persistence extension |
| `snapshot` / `restore` | Checkpoint workspace |
| `recreate` | Destroy and reprovision from seed |
| `destroy` | Idempotent teardown |
Early versions may expose only `create`, `get`, `destroy`, `recreate`; harnesses
can own `exec` via SSH/tunnel without sand-boxer proxying every command.
### Profile schema (minimum)
```yaml
id: profile.compose-e2e
version: "1.0.0"
extension: ext.compose-ssh
isolation:
  level: container          # container | microvm | policy
network:
  default: deny
  egress: []                # extension interprets
workspace:
  mode: remote-canonical    # mirror | remote-canonical
  access: rw
scope_default: session
ttl:
  default: 4h
  max: 24h
  idle_reap: null
resources:
  cpu: null
  memory_mb: null
setup:
  instructions: ""          # Blitzy-style natural language for extension bootstrap
  secret_refs: []           # resolved at provision; never in agent context
placement:
  prefer: [sandboxer01]
  fallback: [coulombcore]
reachability:
  tunnel: ops-bridge
  identity: ops-warden
metadata:
  cost_class: self-hosted   # self-hosted | saas-metered
  latency_class: standard
```
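
A loader might validate a few of these fields before handing the profile to an extension. The sketch below is one possible check, not the project's schema. `parse_duration`, `Ttl`, and `check_profile` are hypothetical names, the duration grammar (`4h`, `24h`) is inferred from the example values, and the `ext.` prefix rule is an assumption based on the ids used in this document.

```python
import re
from dataclasses import dataclass

_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}
ISOLATION_LEVELS = {"container", "microvm", "policy"}


def parse_duration(text: str) -> int:
    """Convert a duration such as '4h' or '30m' to seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


@dataclass(frozen=True)
class Ttl:
    default_s: int
    max_s: int

    @classmethod
    def from_mapping(cls, raw: dict) -> "Ttl":
        ttl = cls(parse_duration(raw["default"]), parse_duration(raw["max"]))
        if ttl.default_s > ttl.max_s:
            raise ValueError("ttl.default must not exceed ttl.max")
        return ttl


def check_profile(profile: dict) -> Ttl:
    """Validate the fields this sketch knows about and return the parsed TTL."""
    level = profile["isolation"]["level"]
    if level not in ISOLATION_LEVELS:
        raise ValueError(f"unknown isolation level: {level}")
    if not profile["extension"].startswith("ext."):
        raise ValueError("extension ids are expected to start with 'ext.'")
    return Ttl.from_mapping(profile["ttl"])
```

The input is the profile already parsed into a plain `dict`; reading the YAML file itself is left out so the sketch stays within the standard library.
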
### Extension interface (contract)
Each extension implements:
```text
provision(profile, inputs, placement) → sandbox_handle
wait_ready(sandbox_handle) → reachability
teardown(sandbox_handle) → cleanup_report
snapshot?(sandbox_handle) → snapshot_id
restore?(snapshot_id) → sandbox_handle
estimate_cost?(profile, duration) → meter_quote
```
Extensions register in `registry/` with capability vectors (isolation level,
regions, GPU, persistence, pricing model).
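
In Python, the required part of this contract could be expressed as a structural `Protocol`. This is a sketch under assumptions: the types `SandboxHandle` and `Reachability`, their fields, and the in-memory `FakeExtension` are all hypothetical and exist only to show how a caller could be tested without a real backend.

```python
import itertools
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class SandboxHandle:
    sandbox_id: str
    host: str


@dataclass(frozen=True)
class Reachability:
    ssh_endpoint: str


class Extension(Protocol):
    def provision(self, profile: dict, inputs: dict, placement: list[str]) -> SandboxHandle: ...
    def wait_ready(self, handle: SandboxHandle) -> Reachability: ...
    def teardown(self, handle: SandboxHandle) -> dict: ...


class FakeExtension:
    """In-memory stand-in, useful only for exercising callers in tests."""

    def __init__(self) -> None:
        self.live: set[str] = set()
        self._seq = itertools.count(1)

    def provision(self, profile, inputs, placement):
        # Takes the first placement entry; real fallback handling is omitted.
        handle = SandboxHandle(f"sbx-{next(self._seq)}", placement[0])
        self.live.add(handle.sandbox_id)
        return handle

    def wait_ready(self, handle):
        return Reachability(f"{handle.host}:22")

    def teardown(self, handle):
        # discard() keeps teardown idempotent: a second call is a no-op.
        removed = handle.sandbox_id in self.live
        self.live.discard(handle.sandbox_id)
        return {"sandbox_id": handle.sandbox_id, "removed": removed}
```

The optional operations (`snapshot`, `restore`, `estimate_cost`) are left out; an extension could advertise them through its capability vector instead of the base protocol.
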
**Bundled extensions (roadmap):**
| Priority | Extension | Type |
|----------|-----------|------|
| P0 | `ext.compose-ssh` | Self-hosted (e2e-framework lineage) |
| P1 | `ext.vm-packer` | Self-hosted (build-machines lineage) |
| P2 | `ext.daytona-self` | Self-hosted OSS |
| P3 | `ext.e2b`, `ext.modal`, `ext.daytona` | SaaS + payments |
| P4 | `ext.openshell` | Policy runtime wrapper |
---
## Payments layer
For SaaS extensions, sand-boxer provides an **integrated payments and metering
layer** analogous to OpenRouter credits:
| Concern | sand-boxer approach |
|---------|---------------------|
| Account credits | Org/workspace balance for sandbox consumption |
| Metering | Per-second, per-creation, GPU surcharge — per extension quote |
| Provider keys | BYOK optional; platform keys for convenience |
| Cost visibility | `estimate_cost` before create; actuals on destroy |
| Billing events | Export to fin-hub / external billing (consumer, not owner) |
Self-hosted extensions bill **infra cost only** (host allocation) — no SaaS meter.
Payments is a **facility inside sand-boxer**, not a general payment processor.
Domain billing authority remains elsewhere.
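
As a sketch of how a per-extension quote could turn into an estimate, the example below combines a creation fee, a per-second rate, and a GPU surcharge. `MeterQuote`, `estimate_cost`, and every rate shown are hypothetical; the numbers are made up for illustration and are not any provider's pricing. `Decimal` is used so that small per-second rates do not accumulate floating-point error.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class MeterQuote:
    per_second: Decimal                      # base rate while the sandbox exists
    per_creation: Decimal = Decimal("0")     # flat fee charged once at create
    gpu_per_second: Decimal = Decimal("0")   # surcharge added when a GPU is attached


def estimate_cost(quote: MeterQuote, seconds: int, gpu: bool = False) -> Decimal:
    """Creation fee plus time-based charges, in the quote's currency."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    rate = quote.per_second + (quote.gpu_per_second if gpu else Decimal("0"))
    return quote.per_creation + rate * seconds


# Illustrative rates only.
quote = MeterQuote(
    per_second=Decimal("0.0001"),
    per_creation=Decimal("0.01"),
    gpu_per_second=Decimal("0.0004"),
)
one_hour = estimate_cost(quote, 3600)            # 0.01 + 0.0001 * 3600
one_hour_gpu = estimate_cost(quote, 3600, True)  # 0.01 + 0.0005 * 3600
```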
---
## Routing policy (OpenRouter-style)
When multiple extensions satisfy a profile capability:
```yaml
route:
  strategy: prefer-self-hosted   # prefer-self-hosted | lowest-cost | lowest-latency | explicit
  fallback: [ext.compose-ssh, ext.daytona]
  constraints:
    max_cost_per_hour: null
    require_isolation: microvm
    region: eu
```
Default Coulomb posture: **prefer-self-hosted** on sandboxer01; SaaS for burst
or capability gaps (GPU, desktop) once extensions exist.
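
A selection function for such a policy could filter by constraints first and rank by strategy second. The sketch below is one possible reading, not a specified algorithm: `Candidate`, `choose`, the tie-break by cost, and all candidate values (cost, latency, isolation) are assumptions for illustration. With the sample data, `choose(CANDIDATES, "prefer-self-hosted")` picks `ext.compose-ssh`, while adding `require_isolation="microvm"` leaves only the SaaS candidate.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

STRATEGIES = {"prefer-self-hosted", "lowest-cost", "lowest-latency", "explicit"}


@dataclass(frozen=True)
class Candidate:
    ext_id: str
    self_hosted: bool
    isolation: str
    cost_per_hour: float
    latency_ms: int


def choose(candidates: Sequence[Candidate], strategy: str, *,
           require_isolation: Optional[str] = None,
           max_cost_per_hour: Optional[float] = None,
           explicit: Optional[str] = None) -> Optional[Candidate]:
    """Filter by constraints, then rank by strategy; None means no match."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    pool = [c for c in candidates
            if (require_isolation is None or c.isolation == require_isolation)
            and (max_cost_per_hour is None or c.cost_per_hour <= max_cost_per_hour)]
    if strategy == "explicit":
        pool = [c for c in pool if c.ext_id == explicit]
    if not pool:
        return None
    if strategy == "lowest-latency":
        return min(pool, key=lambda c: c.latency_ms)
    if strategy == "prefer-self-hosted":
        # False sorts before True, so self-hosted candidates rank first.
        return min(pool, key=lambda c: (not c.self_hosted, c.cost_per_hour))
    return min(pool, key=lambda c: c.cost_per_hour)  # lowest-cost and explicit


# Made-up capability values, for illustration only.
CANDIDATES = [
    Candidate("ext.compose-ssh", True, "container", 0.0, 40),
    Candidate("ext.e2b", False, "microvm", 0.18, 15),
]
```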
---
## Security posture (documented limits)
sand-boxer commits to:
1. Default-deny network unless profile explicitly allows egress
2. Secrets resolved at provision boundary via ops-warden / secret refs
3. Blast-radius isolation on dedicated hosts away from Railiance01 production
4. Observable lifecycle and attributable actors (`adm` / `agt` / `atm`)
5. Honest documentation: **allowed tool paths can be abused by compromised agents**
sand-boxer does **not** commit to intent-aware egress filtering in v1.
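
Commitment 1 reduces to a simple decision rule, sketched below. This shows only the policy check; actual enforcement would happen host-side in the extension. The function name `egress_allowed`, the exact-match comparison, and the treatment of a non-`deny` default as open egress are assumptions, since the profile schema leaves the `egress` entry format to each extension.

```python
def egress_allowed(network: dict, destination: str) -> bool:
    """Default-deny: a destination passes only if the profile lists it."""
    if network.get("default", "deny") != "deny":
        # Assumption: any other default means the profile opted into open egress.
        return True
    return destination in network.get("egress", [])


# A missing `network` block is treated as deny-all.
blocked = egress_allowed({}, "pypi.org")
allowed = egress_allowed({"default": "deny", "egress": ["pypi.org"]}, "pypi.org")
```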
---
## Phased maturity
| Phase | Deliverable |
|-------|-------------|
| **0** | Charter, research, profile schema, `ext.compose-ssh` design |
| **1** | Unified API + self-hosted compose-ssh + State Hub registration |
| **2** | Extension SDK + vm-packer + registry entries + routing |
| **3** | SaaS extensions + payments layer |
| **4** | Snapshot/restore + checkpoint profiles |
| **5** | Coulomb-native runtime ("best of brands") informed by extension ops data |
Phase 5 is explicitly **later**: learn from routing, billing, failure modes, and
latency data before building an owned microVM and control-plane stack.
---
## Open questions (for workplans)
1. Does `exec` live in sand-boxer API or only in glas-harness via SSH?
2. Payments: integrate with existing fin-hub or standalone credits first?
3. Profile authorship: repo-local YAML vs hub-managed catalog?
4. wise-validator: fork e2e-framework reporter or new contract from day one?
These belong in SAND-WP-0002+ design workplans, not INTENT.md.

research/README.md Normal file

@@ -0,0 +1,22 @@
# sand-boxer research
Research informing the sand-boxer meta-framework charter and implementation
roadmap. These documents are **inputs to design**, not normative specs — see
`INTENT.md` for authority and boundaries.
## Index
| Document | Contents |
|----------|----------|
| [01-agent-sandbox-landscape.md](01-agent-sandbox-landscape.md) | Market survey: isolation technologies, providers, convergence trends |
| [02-reference-frameworks.md](02-reference-frameworks.md) | Deep dives: OpenClaw, Hermes, Blitzy, OpenShell, hosted platforms |
| [03-meta-framework-synthesis.md](03-meta-framework-synthesis.md) | Design synthesis: API shape, extensions, payments, Coulomb boundaries |
## How to use
1. Read `INTENT.md` for the governing charter.
2. Use `03-meta-framework-synthesis.md` when designing profiles, extensions, or
the unified API.
3. Use `01` and `02` when evaluating a backend extension or security posture.
Last updated: 2026-06-22


@@ -0,0 +1,56 @@
---
id: SAND-WP-0001
type: workplan
title: "Bootstrap State Hub integration"
domain: infotech
repo: sand-boxer
owner: codex
topic_slug: custodian
created: "2026-06-22"
updated: "2026-06-22"
status: active
---
# Bootstrap State Hub integration
Sandboxing facility for agentic coding.
## Review Generated Integration Files
```task
id: SAND-WP-0001-T01
status: done
priority: high
```
Review `INTENT.md`, `SCOPE.md`, `AGENTS.md`, and `.custodian-brief.md`.
Replace generated placeholders with repo-specific facts where needed.
## Verify Local Developer Workflow
```task
id: SAND-WP-0001-T02
status: todo
priority: high
```
Identify the repo's install, test, lint, build, and run commands. Add or refine
those commands in the agent instructions so future coding sessions can verify
changes confidently.
## Seed First Real Workplan
```task
id: SAND-WP-0001-T03
status: done
priority: medium
```
Create the first implementation workplan for the repository's most important
next change. Created `workplans/SAND-WP-0002-meta-framework-foundation.md`.
After workplan file updates, run from `~/state-hub`:
```bash
make fix-consistency REPO=sand-boxer
```


@@ -0,0 +1,246 @@
---
id: SAND-WP-0002
type: workplan
title: "Meta-framework foundation and first extension"
domain: infotech
repo: sand-boxer
status: ready
owner: codex
topic_slug: custodian
created: "2026-06-22"
updated: "2026-06-22"
---
# Meta-framework foundation and first extension
Establish sand-boxer as a meta-framework: unified API contract, profile catalog,
extension platform, and the first self-hosted backend (`ext.compose-ssh`) migrated
from `the-custodian/e2e-framework/`.
**Charter:** `INTENT.md`
**Research:** `research/03-meta-framework-synthesis.md`
**Predecessor:** SAND-WP-0001 (bootstrap; its T02 dev-workflow task should complete
in parallel with or before T03 here)
## Design meta-framework contracts
```task
id: SAND-WP-0002-T01
status: todo
priority: high
```
Author `docs/meta-framework.md` (or `specs/meta-framework.md`) defining:
- Resource model: Profile, Extension, Host, Sandbox, Snapshot, Route, Meter
- Lifecycle states and State Hub event mapping
- Core API operations: `create`, `get`, `list`, `extend_ttl`, `recreate`,
`destroy` (snapshot/restore deferred to SAND-WP-0003)
- Consumer attribution schema (`adm` / `agt` / `atm`, calling project id)
- Extension interface: `provision`, `wait_ready`, `teardown`, optional
`estimate_cost`
- Routing policy vocabulary (`prefer-self-hosted`, `lowest-cost`, `explicit`)
- Security limits statement (blast-radius vs intent — per research)
Derive from `research/03-meta-framework-synthesis.md`; do not duplicate harness,
validator, or codegen concerns.
## Define profile and extension schemas
```task
id: SAND-WP-0002-T02
status: todo
priority: high
```
Add machine-readable schemas (JSON Schema or Python pydantic models) for:
- `Profile` — extension binding, isolation, network, workspace mode, scope,
TTL, resources, setup metadata, placement, reachability, cost class
- `Extension` — id, capabilities, isolation levels, pricing model, regions
- `SandboxCreateRequest` / `SandboxStatus` response shapes
Ship `profiles/profile.compose-e2e.yaml` as the reference profile (successor to
`e2e/e2e.yml` inputs; validation semantics stay with wise-validator).
Register extension stub `extensions/ext.compose-ssh.yaml` with capability
metadata.
## Scaffold package and developer workflow
```task
id: SAND-WP-0002-T03
status: todo
priority: high
```
Create Python package layout (aligned with e2e-framework lineage):
```
src/sandboxer/        # or sandboxer/ at repo root; pick one, document in AGENTS.md
  api/
  profiles/
  extensions/
  lifecycle/
tests/
pyproject.toml
```
Document in `AGENTS.md`: install (`uv sync` or equivalent), test, lint, format,
and CLI entry point. Satisfies SAND-WP-0001-T02 if not already done.
## Implement extension registry and loader
```task
id: SAND-WP-0002-T04
status: todo
priority: high
```
Implement extension discovery and registration:
- Load extension definitions from `extensions/`
- Plugin entry-point or explicit registry for `ext.compose-ssh`
- Validate extension declares required capability fields before registration
- Unit tests for load failures and duplicate ids
No SaaS extensions in this workplan — self-hosted only.
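
The validation and duplicate-id rules above could be covered by a registry as small as the following sketch. `ExtensionRegistry`, `RegistryError`, and the field names in `REQUIRED_FIELDS` are hypothetical; the real required fields are to be defined by the extension schema in T02. Reading the definition files from `extensions/` is omitted, so definitions arrive as already-parsed dicts.

```python
# Assumed field names, loosely following the Extension schema bullet in T02.
REQUIRED_FIELDS = ("id", "isolation_levels", "pricing_model", "regions")


class RegistryError(ValueError):
    """Raised when a definition is incomplete or its id is already taken."""


class ExtensionRegistry:
    def __init__(self) -> None:
        self._by_id: dict[str, dict] = {}

    def register(self, definition: dict) -> None:
        missing = [f for f in REQUIRED_FIELDS if f not in definition]
        if missing:
            raise RegistryError(f"missing capability fields: {', '.join(missing)}")
        ext_id = definition["id"]
        if ext_id in self._by_id:
            raise RegistryError(f"duplicate extension id: {ext_id}")
        self._by_id[ext_id] = definition

    def get(self, ext_id: str) -> dict:
        return self._by_id[ext_id]

    def ids(self) -> list[str]:
        return sorted(self._by_id)
```

Raising on the first bad definition keeps load failures visible; a loader that should continue past bad files could catch `RegistryError` per file and report them together.
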
## Implement ext.compose-ssh (e2e-framework lineage)
```task
id: SAND-WP-0002-T05
status: todo
priority: high
```
Extract and adapt provision/teardown from `the-custodian/e2e-framework/`:
- SSH to configured host; isolated directory per sandbox id
- Unique compose project name; `compose up` / `compose down` (idempotent)
- Default-deny network posture per profile (document host-side requirements)
- Host placement: read `placement.prefer` / `fallback` from profile
- **Do not** port test execution, health polling, or State Hub result reporting
— those are wise-validator responsibilities
Provide a compatibility note in extension README for interim `make e2e` callers.
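
The per-sandbox project name and the remote commands can be built without touching the network, which makes them easy to unit test. The sketch below only constructs the `ssh` argv and does not run it. It assumes the host has Docker Compose v2 (`docker compose`) and that each sandbox directory holds its own compose file; `compose_project_name`, `remote_compose`, the `sbx-` prefix, and the chosen `down` flags are hypothetical.

```python
import re
import shlex


def compose_project_name(sandbox_id: str) -> str:
    """Derive a project name using only lowercase letters, digits, '-' and '_'."""
    slug = re.sub(r"[^a-z0-9_-]", "-", sandbox_id.lower()).strip("-_")
    if not slug:
        raise ValueError("sandbox id has no usable characters")
    # The fixed prefix guarantees the name starts with a letter.
    return f"sbx-{slug}"


def remote_compose(host: str, workdir: str, sandbox_id: str, action: str) -> list[str]:
    """Build the local argv that runs `docker compose` on the host over SSH."""
    verbs = {
        "up": ["up", "-d"],
        "down": ["down", "--volumes", "--remove-orphans"],
    }
    if action not in verbs:
        raise ValueError(f"unsupported action: {action}")
    project = compose_project_name(sandbox_id)
    remote = ["docker", "compose", "-p", project, *verbs[action]]
    # cd into the per-sandbox directory so compose finds its compose file there.
    script = f"cd {shlex.quote(workdir)} && {shlex.join(remote)}"
    return ["ssh", host, script]
```

The returned list can be passed to `subprocess.run(argv, check=True)`. Because the project name is a pure function of the sandbox id, a repeated `down` targets the same project, which is what teardown needs in order to be safely repeatable.
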
## Implement API v0 and CLI
```task
id: SAND-WP-0002-T06
status: todo
priority: high
```
Ship minimal establishment surface:
**CLI** (primary for v0):
```bash
sandbox create --profile profile.compose-e2e --input repo=/path/to/repo
sandbox get <id>
sandbox list
sandbox recreate <id>
sandbox destroy <id>
```
**HTTP** (optional in v0; stub acceptable if CLI calls core library directly):
- `POST /v1/sandboxes`, `GET /v1/sandboxes/{id}`, `DELETE /v1/sandboxes/{id}`
Core library must be harness-agnostic — glas-harness, wise-validator, and
snuggle-inventor call the same functions.
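
The CLI surface listed above maps naturally onto `argparse` subcommands. This is a sketch of the parsing layer only, assuming the standard library is acceptable for v0; `build_parser` and `parse_inputs` are hypothetical names, and dispatch into the core library is left out.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="sandbox")
    sub = parser.add_subparsers(dest="command", required=True)

    create = sub.add_parser("create", help="provision a sandbox from a profile")
    create.add_argument("--profile", required=True)
    # --input may be repeated; each value is a KEY=VALUE pair.
    create.add_argument("--input", action="append", default=[], metavar="KEY=VALUE")

    sub.add_parser("list", help="list known sandboxes")
    for name in ("get", "recreate", "destroy"):
        sub.add_parser(name).add_argument("id")
    return parser


def parse_inputs(pairs: list[str]) -> dict[str, str]:
    """Turn ['repo=/path'] into {'repo': '/path'}; reject malformed pairs."""
    inputs: dict[str, str] = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected KEY=VALUE, got {pair!r}")
        inputs[key] = value
    return inputs
```

Keeping parsing separate from the core library means an HTTP handler can later call the same functions with inputs it has decoded itself.
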
## State Hub lifecycle registration
```task
id: SAND-WP-0002-T07
status: todo
priority: medium
```
On sandbox state transitions, emit State Hub progress events (or dedicated
registration endpoint when available):
- `requested`, `provisioning`, `ready`, `active`, `destroying`, `destroyed`
- Include: `sandbox_id`, `profile_id`, `extension_id`, `host`, `consumer`,
`actor_type`, timestamps
Extend the `build-agent` self-register pattern sketch for generic sandbox
identities. Document contract in meta-framework spec.
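
An event payload carrying the fields listed in this task could be assembled as in the sketch below. The State Hub endpoint and its expected schema are not described here, so this builds only a plain dict; `lifecycle_event`, the key names, and the single `timestamp` key are assumptions.

```python
from datetime import datetime, timezone
from typing import Optional

# States listed in this task, and the actor types used across the charter.
EVENT_STATES = ("requested", "provisioning", "ready", "active", "destroying", "destroyed")
ACTOR_TYPES = ("adm", "agt", "atm")


def lifecycle_event(state: str, sandbox_id: str, profile_id: str, extension_id: str,
                    host: str, consumer: str, actor_type: str,
                    now: Optional[datetime] = None) -> dict:
    """Build one lifecycle event; `now` is injectable so tests are deterministic."""
    if state not in EVENT_STATES:
        raise ValueError(f"unknown lifecycle state: {state}")
    if actor_type not in ACTOR_TYPES:
        raise ValueError(f"unknown actor type: {actor_type}")
    stamp = now or datetime.now(timezone.utc)
    return {
        "state": state,
        "sandbox_id": sandbox_id,
        "profile_id": profile_id,
        "extension_id": extension_id,
        "host": host,
        "consumer": consumer,
        "actor_type": actor_type,
        "timestamp": stamp.isoformat(),
    }
```
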
## Document sibling integration contracts
```task
id: SAND-WP-0002-T08
status: todo
priority: medium
```
Add `docs/integrations/` with one page per planned sibling:
| Doc | Contents |
|-----|----------|
| `glas-harness.md` | Sandbox handle + reachability; harness owns exec |
| `wise-validator.md` | `profile.compose-e2e`; validator owns e2e.yml + health + tests |
| `snuggle-inventor.md` | Setup metadata + secret_refs; no codegen in sand-boxer |
Each doc: example request, response fields, ownership table, out-of-scope list.
Cross-link from `INTENT.md` Coulomb boundaries section.
## Register capability and fix registry scaffold
```task
id: SAND-WP-0002-T09
status: todo
priority: medium
```
- Author `registry/capabilities/execution.sandbox-provision.md`
- Add row to `registry/indexes/capabilities.yaml` (`domain: infotech`)
- Run `reuse-surface validate` when CLI available
- Notify operator: `make fix-consistency REPO=sand-boxer` from `~/state-hub`
## Verification and migration smoke test
```task
id: SAND-WP-0002-T10
status: todo
priority: medium
```
End-to-end proof on CoulombCore or sandboxer01 (when reachable):
1. `sandbox create` with `profile.compose-e2e` for a repo with `e2e/` layout
2. Confirm `ready` state and reachability descriptor
3. Manual or scripted compose health check (not wise-validator — just proves
environment exists)
4. `sandbox destroy` — confirm idempotent cleanup (no leftover compose projects
or `/tmp` dirs)
5. Document runbook in `docs/runbooks/profile-compose-e2e.md`
Record gaps for wise-validator migration (SAND-WP-0003) and `the-custodian`
shim (SAND-WP-0004).
---
## Out of scope (follow-on workplans)
| Item | Target workplan |
|------|-----------------|
| wise-validator extraction + e2e test orchestration | SAND-WP-0003 |
| `the-custodian` Makefile shim + deprecation timeline | SAND-WP-0004 |
| `ext.vm-packer` (build-machines) | SAND-WP-0005 |
| SaaS extensions + payments layer | SAND-WP-0006 |
| Snapshot / restore / checkpoint profiles | SAND-WP-0007 |
| Coulomb-native runtime (phase 5) | Backlog |
## Completion criteria
- Meta-framework spec and schemas merged
- `ext.compose-ssh` provisions and tears down a compose sandbox via CLI
- State Hub receives lifecycle events for at least one full create→destroy cycle
- Sibling integration docs published
- `capability.execution.sandbox-provision` registered and validated
- All tasks `done`; workplan `status: finished`; operator runs fix-consistency