generated from coulomb/repo-seed
docs: charter meta-framework vision, research, and SAND-WP-0002
Rewrite INTENT.md as the sand-boxer meta-framework charter (OpenRouter-style sandbox API, extensions, payments, Coulomb sibling boundaries). Add research under research/, update SCOPE.md, bootstrap workplans SAND-WP-0001/0002, and State Hub integration files from the bootstrap pass.
20
.claude/rules/agents.md
Normal file
@@ -0,0 +1,20 @@
## Kaizen Agents

Specialized agent personas available on demand via the state-hub MCP.

**Discover:** `list_kaizen_agents()` — returns all agents with name, description, category
**Load:** `get_kaizen_agent("tdd-workflow")` — returns full instructions; read and follow them

Common agents:

| Agent | Category | When to use |
|-------|----------|-------------|
| `tdd-workflow` | testing | Step-by-step TDD workflow for any feature |
| `code-refactoring` | quality | Code quality analysis and safe refactoring |
| `test-maintenance` | testing | Diagnose and fix failing tests |
| `requirements-engineering` | process | Prevent interface/mock mismatches upfront |
| `keepaTodofile` | process | Maintain TODO.md during work |
| `project-management` | process | Track status, determine next steps |
| `datamodel-optimization` | quality | Optimize dataclasses and data structures |

All 17 agents: call `list_kaizen_agents()` for the full list.
8
.claude/rules/architecture.md
Normal file
@@ -0,0 +1,8 @@
## Architecture

<!-- TODO: Describe the key design decisions and component structure.
     Key modules, data flows, external integrations, state machines, etc. -->

## Quick Reference

`~/state-hub/mcp_server/TOOLS.md` — MCP tool reference
50
.claude/rules/credential-routing.md
Normal file
@@ -0,0 +1,50 @@
# Credential and access routing

**Audience:** Codex, Claude Code, Grok, and custodian agents that call **llm-connect**
for inference. Run this check **before** requesting secrets, API keys, SSH access,
login tokens, or database passwords — in any repo, not only `ops-warden`.

ops-warden **issues SSH certificates only** (`warden sign`, `cert_command`). Every
other credential need belongs to another subsystem. **Do not** message
`ops-warden` on State Hub expecting a secret value; the reply is a pointer, not a key.

### Lookup (do this first)

```bash
warden route find "<describe your need>" --json
warden route show <catalog-id> --json
```

Requires the `warden` CLI from `~/ops-warden` (`uv tool install .` or `uv run warden`).

| Agent runtime | How to orient |
| --- | --- |
| **Codex / Grok** (shell, HTTP State Hub) | `warden route` commands above; inbox `to_agent=sand-boxer` is for coordination, not secret vending |
| **Claude Code** (MCP when available) | `get_domain_summary("custodian")` for workstreams; **still** use `warden route` for credential ownership |
| **llm-connect** (inference service) | Never put secret retrieval in prompts; route custody to OpenBao/operator paths surfaced by `warden route` |

### Quick routing table

| I need… | Owner | ops-warden executes? |
| --- | --- | --- |
| SSH cert (`adm`/`agt`/`atm`) | ops-warden | **Yes** — `warden sign` |
| API key, DB password, provider token | OpenBao (`railiance-platform`) | No — route only |
| Login / OIDC / MFA | key-cape / Keycloak | No — route only |
| Authorization decision | flex-auth | No — route only |
| activity-core → issue-core emission | activity-core + issue-core | No — `warden route show activity-core-issue-sink` |
| SSH tunnel | ops-bridge (+ `cert_command` from warden) | No — route only |

### Anti-patterns (do not do these)

- `POST /messages/` to `ops-warden` asking for `ISSUE_CORE_API_KEY`, `OPENROUTER_API_KEY`, etc.
- Inventing `warden secret`, `warden login`, `warden bao`, `warden tunnel` — they do not exist
- Pasting secrets into Git, State Hub, workplans, logs, or chat

### Other capabilities (reuse-surface)

Non-credential capabilities are usually discovered through **reuse-surface** federation
(`reuse-surface` registry / `capability.*` indexes). Credential routing is inlined in
every repo's agent instructions because it is high-frequency, high-risk, and easy to
get wrong.

**Canon:** `~/ops-warden/wiki/CredentialRouting.md` · catalog `~/ops-warden/registry/routing/catalog.yaml`
38
.claude/rules/first-session.md
Normal file
@@ -0,0 +1,38 @@
## First Session Protocol

Triggered when `get_domain_summary("infotech")` shows **no workstreams**.
The project is registered but work has not yet been structured.

**Step 1 — Read, don't write**
- `~/the-custodian/canon/projects/infotech/project_charter_v0.1.md` — purpose, scope
- `~/the-custodian/canon/projects/infotech/roadmap_v0.1.md` — planned phases
- Scan repo root: README, directory structure, existing code or docs

**Step 2 — Survey in-progress work**
Look for TODOs, open branches, half-finished files. Note done vs. started but incomplete.

**Step 3 — Propose workstreams to Bernd**
Propose 1–3 workstreams — each a coherent strand, weeks to months, anchored to a
roadmap phase. **Wait for approval before creating.**

**Step 4 — Create workplan file first, then DB record (ADR-001)**
```
workplans/SAND-WP-NNNN-<slug>.md ← write this first
```
Then register in the hub:
```
create_workstream(topic_id="cee7bedf-2b48-46ef-8601-006474f2ad7a", title="...", owner="...", description="...")
create_task(workstream_id="<id>", title="...", priority="high|medium|low")
```

**Step 5 — Record the setup**
```
add_progress_event(
    summary="First session: structured infotech into N workstreams, M tasks",
    event_type="milestone",
    topic_id="cee7bedf-2b48-46ef-8601-006474f2ad7a",
    detail={"workstreams": [...], "tasks_created": M}
)
```

<!-- Delete or archive this file once past first session -->
8
.claude/rules/repo-boundary.md
Normal file
@@ -0,0 +1,8 @@
## Repo boundary

This repo owns **sand-boxer** only. It does not own:

<!-- TODO: List what belongs in adjacent repos, e.g.:
- SSH key management → railiance-infra/
- State hub code → state-hub/
-->
5
.claude/rules/repo-identity.md
Normal file
@@ -0,0 +1,5 @@
**Purpose:** Sandboxing for agentic coding facility.

**Domain:** infotech
**Repo slug:** sand-boxer
**Topic ID:** cee7bedf-2b48-46ef-8601-006474f2ad7a
85
.claude/rules/session-protocol.md
Normal file
@@ -0,0 +1,85 @@
## Session Protocol

Dev Hub (State Hub API): http://127.0.0.1:8000
MCP server name in `~/.claude.json`: `dev-hub`

**Step 1 — Orient**

Read the offline-safe brief first — it works without a live hub connection:
```bash
cat .custodian-brief.md
```
Then call the MCP tool for richer cross-domain context when MCP tools are exposed:
```
get_domain_summary("infotech")
```
If MCP tools are unavailable in the current agent session, use the REST API:
```bash
curl -s "http://127.0.0.1:8000/state/summary" | python3 -m json.tool
```
If the hub is offline: `cd ~/state-hub && make api`

**Step 2 — Check inbox**
With MCP tools:
```
get_messages(to_agent="sand-boxer", unread_only=True)
```
Mark read with `mark_message_read(message_id)`. Reply or act on coordination
requests before proceeding.

Without MCP tools:
```bash
curl -s "http://127.0.0.1:8000/messages/?to_agent=sand-boxer&unread_only=true" \
  | python3 -m json.tool
curl -s -X PATCH "http://127.0.0.1:8000/messages/<id>/read" \
  -H "Content-Type: application/json" -d '{}'
```

**Step 3 — Scan workplans**
```bash
ls workplans/
```
For each file with `status: ready`, `active`, or `blocked`, note pending
`wait`/`todo`/`progress` tasks.
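
The Step 3 scan can also be scripted. The following is a minimal Python sketch, not part of State Hub: it assumes every workplan opens with a `---`-delimited frontmatter block containing a plain `status:` line, and the helper names `frontmatter_status` and `open_workplans` are hypothetical. Calling `open_workplans()` from the repo root returns the top-level workplans that still need attention; files under `workplans/archived/` are not visited.

```python
from pathlib import Path
from typing import List, Optional, Tuple

# Frontmatter statuses that Step 3 asks the agent to look at.
OPEN_STATUSES = {"ready", "active", "blocked"}


def frontmatter_status(text: str) -> Optional[str]:
    """Return the `status:` value from a leading `---` frontmatter block, if any."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for line in lines[1:]:
        if line.strip() == "---":  # end of the frontmatter block
            break
        key, sep, value = line.partition(":")
        if sep and key.strip() == "status":
            # Drop a trailing comment and surrounding quotes.
            return value.split("#", 1)[0].strip().strip('"')
    return None


def open_workplans(directory: str = "workplans") -> List[Tuple[str, str]]:
    """List (filename, status) for top-level workplans with an open status."""
    found = []
    for path in sorted(Path(directory).glob("*.md")):
        status = frontmatter_status(path.read_text(encoding="utf-8"))
        if status in OPEN_STATUSES:
            found.append((path.name, status))
    return found
```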

**Step 4 — Present brief**

1. **Active workstreams** for `infotech` — title, task counts, blocking decisions
2. **Pending tasks** from `workplans/` + any `[repo:sand-boxer]` hub tasks
3. **Goal guidance** — if `goal_guidance` in summary:
   - `needs_workplan`: surface as top action — *"Repo goal '{title}' has no workplan yet"*
   - `alignment_warnings`: flag if active work is not aligned with current goal
4. **Suggested next action** — highest-priority open item
5. **SBOM status** — flag if `last_sbom_at` is unset for this repo

If no workstreams: follow First Session Protocol (`first-session.md`).

**During work:** `record_decision()` · `add_progress_event()` · `resolve_decision()`

> State Hub is a *read model*. Bootstrap tools (`create_workstream`, `create_task`)
> are First Session Protocol only. Work structure belongs in repo files (ADR-001).

**Session close:**
With MCP tools:
```
add_progress_event(summary="...", topic_id="cee7bedf-2b48-46ef-8601-006474f2ad7a", workstream_id="<uuid>")
```
Without MCP tools:
```bash
curl -s -X POST http://127.0.0.1:8000/progress/ \
  -H "Content-Type: application/json" \
  -d '{"topic_id":"cee7bedf-2b48-46ef-8601-006474f2ad7a","workstream_id":"<uuid>","event_type":"note","summary":"what changed","author":"codex"}'
```
If workplan files were modified, ensure the local copy is up to date first:
```bash
git -C <repo_path> pull --ff-only
cd ~/state-hub && make fix-consistency REPO=sand-boxer
```
For repos where implementation runs on a remote machine (e.g. CoulombCore),
use the combined target which pulls before fixing:
```bash
cd ~/state-hub && make fix-consistency-remote REPO=sand-boxer
```
**C-15** (DB task ahead of file) is normal in multi-machine workflows — writeback
will sync the file to match DB. **C-16** (repo behind remote) blocks all writes
until you pull — intentional to prevent clobbering remote progress.
19
.claude/rules/stack-and-commands.md
Normal file
@@ -0,0 +1,19 @@
## Stack

<!-- TODO: Fill in language, frameworks, and key dependencies -->
- **Language:**
- **Key deps:**

## Dev Commands

```bash
# TODO: Fill in the standard commands for this repo

# Install dependencies

# Run tests

# Lint / type check

# Build / package (if applicable)
```
40
.claude/rules/workplan-convention.md
Normal file
@@ -0,0 +1,40 @@
## Workplan Convention (ADR-001)

File location: `workplans/SAND-WP-NNNN-<slug>.md`
ID prefix: `SAND-WP-`

Work items originate as files in this repo **before** being registered in the hub.

Canonical workplan/workstream frontmatter statuses are:
`proposed`, `ready`, `active`, `blocked`, `backlog`, `finished`, `archived`.
Use `proposed` for a newly drafted plan, `ready` after review against current
repo state, and `finished` when implementation is complete. `stalled` and
`needs_review` are derived health labels, not stored statuses.

Closed workplans may be moved to `workplans/archived/` with a completion-date
prefix: `YYMMDD-SAND-WP-NNNN-<slug>.md`. The frontmatter id remains
unchanged; the prefix is only for quick visual reference.

Small opportunistic tasks discovered during another session use **Ad Hoc Tasks**:
`workplans/ADHOC-YYYY-MM-DD.md`, workstream slug `adhoc-YYYY-MM-DD`, and task ids
`ADHOC-YYYY-MM-DD-T01`, `T02`, etc. Use adhocs only for low-risk work completed
directly. Promote anything requiring analysis, design, approval, dependencies, or
multiple planned phases into a normal workplan.
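
The naming rules above (archive prefix and ad hoc ids) can be sketched in Python as follows. This is an illustration only: the helper names `archived_name` and `adhoc_names` are hypothetical, and the filename check assumes a four-digit `NNNN` and a lowercase, hyphenated slug, which the convention does not state explicitly.

```python
import re
from datetime import date

# Assumption: NNNN is four digits and the slug is lowercase alphanumerics and hyphens.
WORKPLAN_RE = re.compile(r"^SAND-WP-\d{4}-[a-z0-9-]+\.md$")


def archived_name(filename: str, completed: date) -> str:
    """Prefix a workplan filename with its completion date as YYMMDD."""
    if not WORKPLAN_RE.match(filename):
        raise ValueError(f"not a SAND-WP workplan filename: {filename}")
    return f"{completed:%y%m%d}-{filename}"


def adhoc_names(day: date, task_number: int) -> dict:
    """Build the file path, workstream slug, and task id for one ad hoc task."""
    stamp = day.isoformat()  # YYYY-MM-DD
    return {
        "file": f"workplans/ADHOC-{stamp}.md",
        "slug": f"adhoc-{stamp}",
        "task_id": f"ADHOC-{stamp}-T{task_number:02d}",
    }
```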

Ecosystem todos from other agents arrive as `[repo:sand-boxer]` hub tasks —
visible at session start. Pick one up by creating the workplan file, then registering
the workstream.

Task blocks use this shape:

```task
id: SAND-WP-NNNN-T01
status: wait | todo | progress | done | cancel
priority: high | medium | low
state_hub_task_id: "<uuid>" # written by fix-consistency — do not edit
```

Status progression is `todo` → `progress` → `done`; use `wait` for waiting or
blocked work and `cancel` for stopped work.
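
A task block of the shape above can be checked mechanically. The sketch below is a hypothetical helper, not the parser that `fix-consistency` uses: it assumes the block body is plain `key: value` lines with optional `#` comments, and it only validates `status` and `priority` against the values listed in this convention.

```python
TASK_STATUSES = {"wait", "todo", "progress", "done", "cancel"}
PRIORITIES = {"high", "medium", "low"}
FENCE = "`" * 3  # fence marker, built this way to keep the example fence-safe


def parse_task_block(block: str) -> dict:
    """Parse the `key: value` lines of one task block and validate its fields."""
    fields = {}
    for raw in block.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith(FENCE):
            continue  # skip blank lines and the fence lines themselves
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip().strip('"')
    if fields.get("status") not in TASK_STATUSES:
        raise ValueError(f"unknown status: {fields.get('status')!r}")
    if fields.get("priority") not in PRIORITIES:
        raise ValueError(f"unknown priority: {fields.get('priority')!r}")
    return fields
```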

<!-- Ralph Loop rules and HEUREKA sequence: ~/.claude/CLAUDE.md — do not duplicate here -->
31
.custodian-brief.md
Normal file
@@ -0,0 +1,31 @@
<!-- custodian-brief: generated by statehub register; fix-consistency may replace this file -->
# Custodian Brief - sand-boxer

**Project:** sand-boxer
**Domain:** infotech
**State Hub:** http://127.0.0.1:8000
**Topic ID:** `cee7bedf-2b48-46ef-8601-006474f2ad7a`

## Open Workplans

### Bootstrap State Hub integration

Workplan file: `workplans/SAND-WP-0001-statehub-bootstrap.md` (status: active)

Open tasks:
- T02 - Verify local developer workflow

### Meta-framework foundation and first extension

Workplan file: `workplans/SAND-WP-0002-meta-framework-foundation.md` (status: ready)

Next: T01 - Design meta-framework contracts

## Session Start

1. Read `INTENT.md`, `SCOPE.md`, and `AGENTS.md`.
2. Check inbox: `GET /messages/?to_agent=sand-boxer&unread_only=true`.
3. Scan `workplans/`.
4. Update task statuses in workplan files as work progresses.

Last generated: 2026-06-22
219
AGENTS.md
Normal file
@@ -0,0 +1,219 @@
# sand-boxer — Agent Instructions

## Repo Identity

**Purpose:** Sandboxing for agentic coding facility.

**Domain:** infotech
**Repo slug:** sand-boxer
**Topic ID:** `cee7bedf-2b48-46ef-8601-006474f2ad7a`
**Workplan prefix:** `SAND-WP-`

---

## State Hub Integration

The Custodian State Hub tracks work across all domains. Interact via HTTP REST —
there is no MCP server for Codex agents.

| Context | URL |
|---------|-----|
| Local workstation | `http://127.0.0.1:8000` |
| Remote via tunnel | `http://127.0.0.1:18000` |

### Orient at session start

```bash
# Offline brief — works without hub connection
cat .custodian-brief.md

# Active workstreams for this domain
curl -s "http://127.0.0.1:8000/workstreams/?topic_id=cee7bedf-2b48-46ef-8601-006474f2ad7a&status=active" \
  | python3 -m json.tool

# Check inbox
curl -s "http://127.0.0.1:8000/messages/?to_agent=sand-boxer&unread_only=true" \
  | python3 -m json.tool
```

Mark a message read:
```bash
curl -s -X PATCH "http://127.0.0.1:8000/messages/<id>/read" \
  -H "Content-Type: application/json" -d '{}'
```

### Log progress (required at session close)

```bash
curl -s -X POST http://127.0.0.1:8000/progress/ \
  -H "Content-Type: application/json" \
  -d '{
    "summary": "what was done",
    "event_type": "note",
    "author": "codex",
    "workstream_id": "<uuid>",
    "task_id": "<uuid>"
  }'
```

Omit `workstream_id` / `task_id` when not applicable.
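
For agents that prefer Python over `curl`, the same request can be assembled with the standard library. This is a sketch under assumptions: the function name `progress_request` is hypothetical, and only the fields shown in the `curl` example above are used. Optional ids are left out of the body when they are `None`.

```python
import json
from typing import Optional
from urllib.request import Request


def progress_request(
    summary: str,
    author: str = "codex",
    event_type: str = "note",
    workstream_id: Optional[str] = None,
    task_id: Optional[str] = None,
    base_url: str = "http://127.0.0.1:8000",
) -> Request:
    """Build (but do not send) the POST /progress/ request."""
    payload = {
        "summary": summary,
        "event_type": event_type,
        "author": author,
        "workstream_id": workstream_id,
        "task_id": task_id,
    }
    # Omit ids that do not apply instead of sending null values.
    body = {key: value for key, value in payload.items() if value is not None}
    return Request(
        f"{base_url}/progress/",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends it; building the request separately keeps the payload easy to inspect before anything reaches the hub.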

### Update task status

```bash
curl -s -X PATCH "http://127.0.0.1:8000/tasks/<task_id>" \
  -H "Content-Type: application/json" \
  -d '{"status": "progress"}'
# values: wait | todo | progress | done | cancel
```

### Flag a task for human review

```bash
curl -s -X PATCH "http://127.0.0.1:8000/tasks/<task_id>" \
  -H "Content-Type: application/json" \
  -d '{"needs_human": true, "intervention_note": "reason"}'
```

---

## Session Protocol

**Start:**
1. `cat .custodian-brief.md` — domain goal and open workstreams (offline-safe)
2. Check inbox: `GET /messages/?to_agent=sand-boxer&unread_only=true`; mark read
3. Scan workplans: `ls workplans/` — note `status: ready`, `active`, or `blocked` files and open tasks
4. Check human-needed tasks: `GET /tasks/?needs_human=true`

**During work:**
- Update task statuses in workplan files as tasks progress
- Record significant decisions via `POST /decisions/`

**Close:**
1. Update workplan file task statuses to reflect progress
2. Log: `POST /progress/` with a summary of what changed
3. Note for the custodian operator: after workplan file changes, run from
   `~/state-hub`:
   ```bash
   make fix-consistency REPO=sand-boxer
   ```
   This syncs task status from files into the hub DB.

---

## Credential and access routing

**Audience:** Codex, Claude Code, Grok, and custodian agents that call **llm-connect**
for inference. Run this check **before** requesting secrets, API keys, SSH access,
login tokens, or database passwords — in any repo, not only `ops-warden`.

ops-warden **issues SSH certificates only** (`warden sign`, `cert_command`). Every
other credential need belongs to another subsystem. **Do not** message
`ops-warden` on State Hub expecting a secret value; the reply is a pointer, not a key.

### Lookup (do this first)

```bash
warden route find "<describe your need>" --json
warden route show <catalog-id> --json
```

Requires the `warden` CLI from `~/ops-warden` (`uv tool install .` or `uv run warden`).

| Agent runtime | How to orient |
| --- | --- |
| **Codex / Grok** (shell, HTTP State Hub) | `warden route` commands above; inbox `to_agent=sand-boxer` is for coordination, not secret vending |
| **Claude Code** (MCP when available) | `get_domain_summary("custodian")` for workstreams; **still** use `warden route` for credential ownership |
| **llm-connect** (inference service) | Never put secret retrieval in prompts; route custody to OpenBao/operator paths surfaced by `warden route` |

### Quick routing table

| I need… | Owner | ops-warden executes? |
| --- | --- | --- |
| SSH cert (`adm`/`agt`/`atm`) | ops-warden | **Yes** — `warden sign` |
| API key, DB password, provider token | OpenBao (`railiance-platform`) | No — route only |
| Login / OIDC / MFA | key-cape / Keycloak | No — route only |
| Authorization decision | flex-auth | No — route only |
| activity-core → issue-core emission | activity-core + issue-core | No — `warden route show activity-core-issue-sink` |
| SSH tunnel | ops-bridge (+ `cert_command` from warden) | No — route only |

### Anti-patterns (do not do these)

- `POST /messages/` to `ops-warden` asking for `ISSUE_CORE_API_KEY`, `OPENROUTER_API_KEY`, etc.
- Inventing `warden secret`, `warden login`, `warden bao`, `warden tunnel` — they do not exist
- Pasting secrets into Git, State Hub, workplans, logs, or chat

### Other capabilities (reuse-surface)

Non-credential capabilities are usually discovered through **reuse-surface** federation
(`reuse-surface` registry / `capability.*` indexes). Credential routing is inlined in
every repo's agent instructions because it is high-frequency, high-risk, and easy to
get wrong.

**Canon:** `~/ops-warden/wiki/CredentialRouting.md` · catalog `~/ops-warden/registry/routing/catalog.yaml`

<!-- REPO-AGENTS-EXTENSIONS -->
<!-- Append repo-specific agent instructions below this marker.
     The state-hub template sync preserves content after this line. -->

---

## Workplan Convention (ADR-001)

Work items originate as files in this repo — not in the hub. The hub is a
read/cache/index layer that rebuilds from files.

**File location:** `workplans/SAND-WP-NNNN-<slug>.md`

**Archived location:** finished workplans may move to
`workplans/archived/YYMMDD-SAND-WP-NNNN-<slug>.md`. The `YYMMDD` prefix is
the completion/archive date; the frontmatter `id` does not change.

**Ad Hoc Tasks:** small opportunistic fixes discovered during a session use
`workplans/ADHOC-YYYY-MM-DD.md` with task ids `ADHOC-YYYY-MM-DD-T01`, etc. Use
this only for low-risk work completed directly; create a normal workplan for
anything needing analysis, design, approval, dependencies, or multiple phases.

**Frontmatter:**

```yaml
---
id: SAND-WP-NNNN
type: workplan
title: "..."
domain: infotech
repo: sand-boxer
status: proposed | ready | active | blocked | backlog | finished | archived
owner: codex
topic_slug: ...
created: "YYYY-MM-DD"
updated: "YYYY-MM-DD"
state_hub_workstream_id: "<uuid>" # written by fix-consistency — do not edit
---
```

Use `proposed` for a new draft, `ready` after review against current repo
state, and `finished` after implementation. `stalled` and `needs_review` are
derived health labels, not frontmatter statuses.

**Task block format** (one per `##` section):

```
## Task Title

` ` `task
id: SAND-WP-NNNN-T01
status: wait | todo | progress | done | cancel
priority: high | medium | low
state_hub_task_id: "<uuid>" # written by fix-consistency — do not edit
` ` `

Task description text.
```

Status progression: `todo` → `progress` → `done`; use `wait` for waiting/blocked work and `cancel` for stopped work.

To create a new workplan:
1. Write the file following the format above
2. Notify the custodian operator to run `make fix-consistency REPO=sand-boxer`
   (or send a message to the hub agent via `POST /messages/`)
12
CLAUDE.md
Normal file
@@ -0,0 +1,12 @@
# sand-boxer — Claude Code Instructions

@SCOPE.md
@.claude/rules/repo-identity.md
@.claude/rules/session-protocol.md
@.claude/rules/first-session.md
@.claude/rules/workplan-convention.md
@.claude/rules/stack-and-commands.md
@.claude/rules/architecture.md
@.claude/rules/repo-boundary.md
@.claude/rules/credential-routing.md
@.claude/rules/agents.md
384
INTENT.md
@@ -1,180 +1,338 @@
 ---
-domain: custodian
+domain: infotech
 repo: sand-boxer
-updated: "2026-06-21"
+updated: "2026-06-22"
 ---

 # INTENT

-> This file explains why sand-boxer exists, what problem it solves in the
-> Custodian ecosystem, and where its authority begins and ends.
+> sand-boxer is the Coulomb **meta-framework for establishing sandboxes** — a
+> unified API and extension platform for provisioning every variation of isolated
+> execution environment, from self-hosted compose stacks to metered SaaS
+> runtimes. This file is the charter: why it exists, what it owns, and where
+> sibling projects begin.

+Research backing this charter lives in `research/`.

 ---

 ## Why it exists

 Custodian automation is moving from **workstation-anchored** execution to
-**Railiance01-scheduled** orchestration. That shift is right for reliability:
-activity-core on Railiance01 can fire maintenance and coordination jobs on a
-stable clock. It does not, by itself, give agents a safe place to **develop,
-build, and test** without the laptop filesystem, sleep cycles, and single-user
-blast radius.
+**Railiance01-scheduled** orchestration. That shift improves reliability but does
+not, by itself, answer the harder question: **where can agentic and deterministic
+work run safely** without the laptop filesystem, sleep cycles, and single-user
+blast radius?

-sand-boxer exists to provide **isolated execution environments** — sandboxes —
-where agentic and deterministic work can run on dedicated infrastructure while
-remaining observable and governable from State Hub.
+The industry has exploded with sandbox answers — E2B, Modal, Daytona, OpenShell,
+OpenClaw-style Docker/SSH backends, hyperscaler interpreters — each with
+different APIs, billing models, and isolation postures. Coulomb needs **one place
+to establish sandboxes** regardless of backend, not a new integration per agent
+harness, validator, or codegen pipeline.

-The goal is progress without requiring the workstation as a runtime: repos are
-checked out, tools run, tests execute, and artifacts return through controlled
-channels. The laptop becomes optional for operations, not the hub of all
-execution.
+sand-boxer exists to be that place: **OpenRouter for sandboxes, not for models.**

+Consumers call one API. Extensions delegate to the sandbox system that fits —
+self-hosted on sandboxer01, inherited compose-ssh from `the-custodian`, or a
+metered cloud provider. An integrated **payments layer** handles SaaS consumption
+when Coulomb uses external capacity. Over time, operational learning may justify
+a Coulomb-native **best-of-brands runtime** — but that is a later phase built on
+evidence, not day-one ambition.

+The workstation becomes optional for **runtime**. Railiance01 decides *when*
+work runs (via activity-core). sand-boxer decides *where* isolated execution
+happens. State Hub records *what* changed.

 ---

 ## The governing principle

-sand-boxer is the **execution isolation and provisioning service** for agentic
-development and related workloads.
+sand-boxer is the **sandbox establishment service** — profiles, provisioning,
+extension routing, placement, lifecycle, and metering. Nothing more.

-It should answer:
+It answers:

-1. **Where can this work run safely?** Profile selection (compose stack, VM,
-   future cluster worker) and host placement.
-2. **How is isolation enforced?** Networks, TTL, resource limits, teardown, and
-   cleanup guarantees.
-3. **How does the sandbox phone home?** Reachability via ops-bridge tunnels and
-   SSH identity via ops-warden — without owning either.
-4. **What happened?** Registration, health, and lifecycle events visible to
-   State Hub and reuse-surface consumers.
+1. **Which sandbox recipe applies?** Profile selection and version resolution.
+2. **Which backend fulfills it?** Extension routing (self-hosted vs SaaS).
+3. **Where does it run?** Host placement and blast-radius policy.
+4. **How is isolation enforced?** Network default-deny, TTL, resource limits,
+   teardown guarantees — as declared by profile + extension.
+5. **How does it become reachable?** Consumer integration with ops-bridge and
+   ops-warden — without owning tunnels or certificates.
+6. **What happened?** Lifecycle events, usage meters, State Hub registration.
+7. **What did it cost?** Payments and credits for metered extensions.

-It should not become the scheduler, the work-state database, the connectivity
-authority, or production application hosting on Railiance01.
+It must **not** become the agent harness, the e2e validator, the code generator,
+the scheduler, the work-state database, the connectivity authority, or production
+hosting on Railiance01.

 ---

-## Strategic context
+## The OpenRouter analogy

-### Workstation automation is interim, not the target
+| OpenRouter | sand-boxer |
+|------------|------------|
+| Unified LLM access API | Unified sandbox establishment API |
+| Routes across model providers | Routes across sandbox extensions |
+| Provider metadata (price, context) | Profile metadata (isolation, cost, latency) |
+| API keys, credits, usage billing | Payments layer for SaaS sandbox consumption |
+| BYOK supported | BYOK for extension provider keys |
+| Does not train models | Does not replace extension runtimes (until phase 5) |

-Local timers and laptop-resident scripts were useful for bootstrapping ADR-001
-consistency sync and similar jobs. They are not the long-term substrate.
-Railiance01-based activity-core schedules are the primary direction; workstation
-paths remain only where no sandbox or cluster alternative exists yet.
+sand-boxer is **infrastructure routing**, not product UX. Harnesses, validators,
+and inventors are customers.

-### Railiance01 vs sandbox hosts
+---

-| Layer | Role |
-|-------|------|
-| **Railiance01** | Production k3s, activity-core, Temporal, stable custodian schedules |
-| **sandboxer01** (or equivalent) | Dedicated VM for dev/agent sandboxes — **isolated blast radius** |
-| **CoulombCore** | Acceptable interim sandbox host during migration; not a substitute for deliberate isolation from production |
-| **Workstation (WSL)** | Control plane anchor today; **not** the desired execution surface |
+## Coulomb sibling boundaries

-sand-boxer owns the **abstraction and lifecycle** of sandboxes. It does not own
-Railiance01 cluster operations (see `railiance-cluster` / `railiance-apps`).
+sand-boxer stays inside the **sandboxing boundary**. Three sibling Coulomb
+projects own adjacent concerns. Integration is contractual — they **request**
+sandboxes; sand-boxer **establishes** them.

-### Lineage
+### glas-harness — agent harness

-This repository consolidates and generalizes patterns that today live split and
-unregistered in `the-custodian`:
+**Owns:** Gateway, tool orchestration, skills, memory, channels, subagent
+delegation, session semantics, sandbox *consumption* from the agent's perspective.

-- **E2E sandbox framework** (`e2e-framework/`) — SSH to remote host, isolated
-  directory, docker compose, teardown (`CUST-WP-0028`).
-- **Build machines** (`infra/build-machines/`) — reproducible VM images,
-  reverse tunnels, State Hub capability registration (`CUST-WP-0032`).
+**Does not own:** Sandbox runtimes, profile catalog authority, host placement,
+extension adapters, isolation enforcement.

-sand-boxer extracts a **reusable platform** from those precedents so
-`the-custodian` can stay governance-focused with a small operational surface.
+glas-harness configures *when* tools run in a sandbox (OpenClaw-style
+`mode` / `scope` / `workspaceAccess`). sand-boxer provides the sandbox handle
+and reachability descriptor.

+### wise-validator — e2e test and health

+**Owns:** Validation workflows, health check semantics, test orchestration,
+pass/fail interpretation, structured result reporting to State Hub and CI.

+**Does not own:** Remote host provisioning, compose lifecycle, port isolation,
+sandbox teardown.

+wise-validator replaces the validation half of `the-custodian/e2e-framework/`.
+It requests `profile.compose-e2e` (or successors), runs tests inside the
+established environment, and owns the `e2e.yml` contract.

+### snuggle-inventor — code generation

+**Owns:** Code generation, modernization pipelines, tech-spec and planning
+artifacts, PR-oriented output, human-in-the-loop review gates.

+**Does not own:** Sandbox infrastructure, environment bootstrapping authority,
+secret stores, runtime metering.

+snuggle-inventor may attach Blitzy-style **setup instructions** and secret
+references as profile inputs. sand-boxer resolves secrets at the provision
|
||||
boundary; generated code never transits sand-boxer APIs.
|
||||
|
||||
### Boundary diagram
|
||||
|
||||
```
|
||||
glas-harness wise-validator snuggle-inventor
|
||||
(agent harness) (e2e + health) (code generation)
|
||||
│ │ │
|
||||
└─────────────────────┼──────────────────────┘
|
||||
│ POST /v1/sandboxes
|
||||
▼
|
||||
sand-boxer
|
||||
(establish sandboxes)
|
||||
│
|
||||
┌───────────────┼───────────────┐
|
||||
▼ ▼ ▼
|
||||
ext.compose-ssh ext.modal ext.e2b …
|
||||
(self-hosted) (SaaS+meter) (SaaS+meter)
|
||||
```
|
||||
|
||||
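As a rough illustration of the request at the centre of the diagram, the sketch below builds a body for `POST /v1/sandboxes`. Only the path, the profile name `profile.compose-e2e`, and the actor types `adm` / `agt` / `atm` come from this charter; the field names (`profile`, `consumer`, `ttl_seconds`) and the helper `build_create_request` are assumptions made for illustration, not a defined API.

```python
import json

# Actor types named in the charter.
ACTOR_TYPES = {"adm", "agt", "atm"}


def build_create_request(profile, actor_type, project_id, ttl_seconds=3600):
    """Build a hypothetical JSON body for POST /v1/sandboxes."""
    if actor_type not in ACTOR_TYPES:
        raise ValueError(f"unknown actor type: {actor_type}")
    return {
        "profile": profile,  # named, versioned recipe from the catalog
        "consumer": {"actor_type": actor_type, "project": project_id},
        "ttl_seconds": ttl_seconds,  # assumed default; sandboxes are TTL-bound
    }


body = build_create_request("profile.compose-e2e", "atm", "wise-validator")
print("POST /v1/sandboxes")
print(json.dumps(body, indent=2))
```

The sibling only names a profile and identifies itself; it does not choose a backend, which matches the request/establish split described above.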
### Existing Custodian repos (unchanged)
|
||||
|
||||
| Concern | Owner |
|
||||
|---------|--------|
|
||||
| Workstream, task, progress state | `state-hub` |
|
||||
| Cron and orchestration | `activity-core` |
|
||||
| SSH reverse tunnels | `ops-bridge` |
|
||||
| SSH certificate issuance | `ops-warden` |
|
||||
| Canon and agent instruction canon | `the-custodian` |
|
||||
| Capability federation hub | `reuse-surface` |
|
||||
| Production on Railiance01 | `railiance-apps` / domain repos |
|
||||
| ADR-001 reconciliation | `state-hub` |
|
||||
|
||||
sand-boxer **consumes** ops-bridge and ops-warden; it does not subsume them.
|
||||
|
||||
---
|
||||
|
||||
## What it is
|
||||
|
||||
sand-boxer is the **sandbox provisioning and profile catalog** for Custodian.
|
||||
sand-boxer is a **meta-framework** with four pillars:
|
||||
|
||||
It is intended to contain:
|
||||
### 1. Unified establishment API
|
||||
|
||||
- **Sandbox profiles** — e.g. compose-based e2e stacks, VM images, future
|
||||
container-on-worker patterns
|
||||
- **Provision / wait / teardown** lifecycle — TTL, idempotent cleanup, port and
|
||||
network conventions
|
||||
- **Host placement policy** — which profiles run on sandboxer01, coulombcore
|
||||
interim, or other registered hosts
|
||||
- **CLI and/or API** for operators and agents to request isolated environments
|
||||
- **State Hub registration contract** — extend the `build-agent` self-register
|
||||
pattern to generic sandbox identities
|
||||
- **Capability registry entries** in this repo's `registry/` for federation via
|
||||
reuse-surface (e.g. `capability.execution.sandbox-provision`)
|
||||
- Runbooks, templates (Packer, compose bundles), and tests for the above
|
||||
One consistent surface for all sandbox variations:
|
||||
|
||||
- Create, inspect, extend, snapshot, recreate, destroy
|
||||
- Profile-driven inputs (repo ref, compose bundle, setup metadata, secret refs)
|
||||
- Consumer attribution (`adm` / `agt` / `atm` + calling project id)
|
||||
- Lifecycle states: `requested → provisioning → ready → active → expired → destroyed`
|
||||
|
||||
Early versions may expose a subset; the API shape is designed for completeness.
|
||||
|
||||
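The lifecycle states listed above can be read as a small state machine. The following is a minimal sketch under stated assumptions: the forward chain is taken from the charter, while the early-destroy edges and the `ready → expired` edge are assumptions (explicit release before expiry, and a sandbox that expires without ever being used). The names `SandboxState`, `ALLOWED_TRANSITIONS`, and `transition` are hypothetical.

```python
from enum import Enum


class SandboxState(str, Enum):
    REQUESTED = "requested"
    PROVISIONING = "provisioning"
    READY = "ready"
    ACTIVE = "active"
    EXPIRED = "expired"
    DESTROYED = "destroyed"


S = SandboxState

# Forward chain from the charter, plus assumed edges: destroy is allowed
# from every non-terminal state, and a ready sandbox may expire unused.
ALLOWED_TRANSITIONS = {
    S.REQUESTED: {S.PROVISIONING, S.DESTROYED},
    S.PROVISIONING: {S.READY, S.DESTROYED},
    S.READY: {S.ACTIVE, S.EXPIRED, S.DESTROYED},
    S.ACTIVE: {S.EXPIRED, S.DESTROYED},
    S.EXPIRED: {S.DESTROYED},
    S.DESTROYED: set(),  # terminal
}


def transition(current: SandboxState, target: SandboxState) -> SandboxState:
    """Return the target state, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Keeping the table explicit makes every state change easy to attribute and log, which is what the observable-lifecycle principle asks for.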
### 2. Profile catalog
|
||||
|
||||
Named, versioned recipes — not one-off containers:
|
||||
|
||||
- Extension binding (`ext.compose-ssh`, `ext.vm-packer`, `ext.e2b`, …)
|
||||
- Isolation level, network policy, workspace mode (`mirror` | `remote-canonical`)
|
||||
- Scope default (`agent` | `session` | `shared`)
|
||||
- TTL, resource limits, placement preference
|
||||
- Setup metadata (natural-language bootstrap instructions for extensions)
|
||||
- Registered in `registry/` and federated via reuse-surface
|
||||
|
||||
Profiles collect good ideas from OpenClaw (backend/scope/workspace), Hermes
|
||||
(labeled reuse, resource limits), Blitzy (setup instructions, secret boundary),
|
||||
and hosted platforms (checkpoint, persistence classes) into **one schema**.
|
||||
|
||||
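To make the "one schema" idea concrete, here is a hedged sketch of what a profile record might look like. The value sets for workspace mode and scope come from the list above and `isolation_level` is the field name used in `research/`; every other field name, the defaults, and the class `SandboxProfile` are assumptions, since the profile schema is still a draft.

```python
from dataclasses import dataclass

# Value sets taken from the charter text.
WORKSPACE_MODES = {"mirror", "remote-canonical"}
SCOPES = {"agent", "session", "shared"}


@dataclass(frozen=True)
class SandboxProfile:
    name: str                     # e.g. "profile.compose-e2e"
    version: str                  # hypothetical version string
    extension: str                # e.g. "ext.compose-ssh"
    isolation_level: str          # free-form here; a real schema would enumerate it
    workspace_mode: str = "mirror"   # assumed default
    scope: str = "session"           # assumed default
    ttl_seconds: int = 3600          # assumed default
    egress_allowlist: tuple = ()     # empty means default-deny
    setup_instructions: str = ""     # natural-language bootstrap notes
    secret_refs: tuple = ()          # references only, never secret values

    def __post_init__(self):
        if self.workspace_mode not in WORKSPACE_MODES:
            raise ValueError(f"invalid workspace_mode: {self.workspace_mode}")
        if self.scope not in SCOPES:
            raise ValueError(f"invalid scope: {self.scope}")
        if self.ttl_seconds <= 0:
            raise ValueError("ttl_seconds must be positive")


e2e = SandboxProfile(
    name="profile.compose-e2e",
    version="0.1.0",
    extension="ext.compose-ssh",
    isolation_level="container",
)
```

The record is frozen so that a registered profile version cannot be mutated in place; a change would mean a new version.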
### 3. Extension platform
|
||||
|
||||
Extensions **delegate** to sandbox systems and services:
|
||||
|
||||
| Class | Examples | Billing |
|
||||
|-------|----------|---------|
|
||||
| **Self-hosted** | compose-ssh, vm-packer, Daytona OSS, OpenShell | Infra allocation |
|
||||
| **SaaS consumption** | E2B, Modal, Daytona cloud, future providers | Payments layer |
|
||||
|
||||
Each extension implements a provision / ready / teardown contract (optional
|
||||
snapshot / cost estimate). Extensions ship as plugins; third-party and Coulomb-
|
||||
native backends use the same interface.
|
||||
|
||||
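The provision / ready / teardown contract could be expressed as a structural interface. This is a sketch only: the method names, signatures, and the toy `InMemoryExtension` are assumptions for illustration, and the optional snapshot and cost-estimate hooks are omitted.

```python
import uuid
from typing import Protocol, runtime_checkable


@runtime_checkable
class SandboxExtension(Protocol):
    """Hypothetical minimal contract every backend adapter would satisfy."""

    def provision(self, profile_name: str, inputs: dict) -> str: ...

    def is_ready(self, handle: str) -> bool: ...

    def teardown(self, handle: str) -> None: ...


class InMemoryExtension:
    """Toy backend that only tracks handles; useful for testing callers."""

    def __init__(self):
        self._live = set()

    def provision(self, profile_name, inputs):
        handle = f"{profile_name}/{uuid.uuid4().hex[:8]}"
        self._live.add(handle)
        return handle

    def is_ready(self, handle):
        return handle in self._live

    def teardown(self, handle):
        # discard() makes teardown idempotent: repeating it is a no-op.
        self._live.discard(handle)


ext = InMemoryExtension()
handle = ext.provision("profile.compose-e2e", {})
```

Because the protocol is structural, a third-party backend would not need to inherit from anything in sand-boxer to qualify; it only has to provide the same methods.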
### 4. Payments and metering
|
||||
|
||||
For metered SaaS extensions:
|
||||
|
||||
- Org/workspace credits and usage accounting
|
||||
- Pre-create cost estimates; post-destroy actuals
|
||||
- BYOK for provider API keys where supported
|
||||
- Export to domain billing systems — sand-boxer meters sandbox consumption,
|
||||
not general payments
|
||||
|
||||
Self-hosted extensions record **allocation** (host, duration), not external spend.
|
||||
|
||||
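The estimate-before-create and actuals-after-destroy flow might look like the following. This is a sketch under assumptions: a flat per-second rate, a reserve-then-settle credit model, and the names `estimate_cost` and `CreditLedger` are all hypothetical; real provider pricing is likely to be more involved.

```python
from dataclasses import dataclass
from decimal import Decimal


def estimate_cost(rate_per_second: Decimal, ttl_seconds: int) -> Decimal:
    """Upper-bound estimate: assume the sandbox lives for its full TTL."""
    return rate_per_second * ttl_seconds


@dataclass
class CreditLedger:
    balance: Decimal
    reserved: Decimal = Decimal("0")

    def reserve(self, estimate: Decimal) -> None:
        """Hold credits before create; refuse if the workspace cannot cover it."""
        if estimate > self.balance - self.reserved:
            raise RuntimeError("insufficient credits for estimated cost")
        self.reserved += estimate

    def settle(self, estimate: Decimal, actual: Decimal) -> None:
        """Release the hold and charge the metered actual after destroy."""
        self.reserved -= estimate
        self.balance -= actual


ledger = CreditLedger(balance=Decimal("10"))
estimate = estimate_cost(Decimal("0.01"), 600)
ledger.reserve(estimate)
ledger.settle(estimate, actual=Decimal("2.50"))
```

`Decimal` is used instead of floats so that repeated metering does not accumulate rounding error.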
---
|
||||
|
||||
## What it is not
|
||||
|
||||
| Concern | Owner |
|
||||
|---------|--------|
|
||||
| Workstream, task, and progress state | `state-hub` |
|
||||
| Cron and event-triggered orchestration | `activity-core` |
|
||||
| SSH reverse tunnels and tunnel health | `ops-bridge` |
|
||||
| SSH certificate issuance | `ops-warden` |
|
||||
| Canon, charters, agent instruction canon | `the-custodian` |
|
||||
| Capability index federation hub | `reuse-surface` |
|
||||
| Production service deployment on Railiance01 | `railiance-apps` / domain repos |
|
||||
| ADR-001 workplan ↔ DB reconciliation | `state-hub` (`consistency_check.py`) |
|
||||
| Concern | Owner | sand-boxer role |
|
||||
|---------|--------|-----------------|
|
||||
| Agent gateway, tools, memory, channels | **glas-harness** | Customer API |
|
||||
| E2e tests, health checks, validation | **wise-validator** | Customer API |
|
||||
| Code generation, tech specs, AAP | **snuggle-inventor** | Customer API |
|
||||
| When work runs | `activity-core` | None |
|
||||
| What tasks exist | `state-hub` | Registers lifecycle only |
|
||||
| Tunnels | `ops-bridge` | Consumer |
|
||||
| Certs | `ops-warden` | Consumer |
|
||||
| Intent-aware egress / prompt security | Research frontier | Document limits only |
|
||||
|
||||
sand-boxer may **consume** connectivity and certificates; it must not duplicate
|
||||
or subsume those authorities.
|
||||
sand-boxer provides **blast-radius isolation and governed reachability**. It does
|
||||
not protect against a compromised agent abusing **allowed** egress paths (git,
|
||||
npm, curl to allowlisted hosts). Security runbooks must state this explicitly.
|
||||
|
||||
---
|
||||
|
||||
## Strategic context
|
||||
|
||||
### Workstation automation is interim
|
||||
|
||||
Local timers and laptop scripts bootstrapped ADR-001 sync. Railiance01
|
||||
activity-core schedules are the direction. Workstation paths remain only where no
|
||||
sandbox alternative exists yet.
|
||||
|
||||
### Host topology
|
||||
|
||||
| Layer | Role |
|
||||
|-------|------|
|
||||
| **Railiance01** | Production k3s, activity-core, Temporal — **not** agent dev runtime |
|
||||
| **sandboxer01** | Dedicated sandbox host — preferred blast-radius isolation |
|
||||
| **CoulombCore** | Interim sandbox host during migration |
|
||||
| **Workstation (WSL)** | Control-plane anchor today — **not** target execution surface |
|
||||
| **SaaS extensions** | Burst / capability gap (GPU, desktop) via payments layer |
|
||||
|
||||
### Lineage
|
||||
|
||||
sand-boxer generalizes patterns split across `the-custodian`:
|
||||
|
||||
| Legacy | sand-boxer | Sibling |
|
||||
|--------|------------|---------|
|
||||
| `e2e-framework/` provision/teardown | `ext.compose-ssh` | wise-validator owns test run |
|
||||
| `e2e-framework/` health + test + report | — | wise-validator |
|
||||
| `infra/build-machines/` | `ext.vm-packer` | — |
|
||||
| Agent sandbox config (future) | API consumer | glas-harness |
|
||||
|
||||
`the-custodian` stays governance-focused; sand-boxer becomes the execution
|
||||
venue catalog.
|
||||
|
||||
### Phase 5: Coulomb-native runtime (later)
|
||||
|
||||
After operating extensions in production — observing latency, cost, failure
|
||||
modes, isolation gaps — sand-boxer may ship an owned **best-of-brands**
|
||||
sandboxing solution combining:
|
||||
|
||||
- Persistent labeled workspaces (Hermes pattern)
|
||||
- Default-deny policy layer (OpenShell lessons)
|
||||
- Fast resume / checkpoint (industry baseline)
|
||||
- Self-hosted economics (Daytona/OpenSandbox lessons)
|
||||
|
||||
This is **not** v1 scope. Extensions and payments come first; native runtime
|
||||
follows evidence.
|
||||
|
||||
---
|
||||
|
||||
## Intended users
|
||||
|
||||
- **Human operators (`adm`)** — provision sandboxes, manage profiles and hosts,
|
||||
inspect lifecycle and cleanup
|
||||
- **LLM agents (`agt`)** — request isolated environments for coding, testing,
|
||||
and verification without laptop filesystem dependence
|
||||
- **Deterministic automations (`atm`)** — activity-core instructions and CI
|
||||
hooks that need a bounded execution venue
|
||||
- **Human operators (`adm`)** — profiles, hosts, extensions, credits, lifecycle
|
||||
- **LLM agents (`agt`)** — via glas-harness, snuggle-inventor, or direct API
|
||||
- **Deterministic automations (`atm`)** — via wise-validator, activity-core, CI
|
||||
- **Extension authors** — implement backend adapters against the extension contract
|
||||
- **Platform integrators** — register capabilities, federate via reuse-surface
|
||||
|
||||
---
|
||||
|
||||
## Design principles
|
||||
|
||||
- **Blast radius isolation** — sandbox workloads must not jeopardize Railiance01
|
||||
production stability; prefer dedicated hosts (sandboxer01) for agentic dev
|
||||
- **Profiles over one-offs** — every sandbox type is a named, versioned profile
|
||||
with documented inputs, outputs, and teardown
|
||||
- **Reachability, not ownership** — use ops-bridge for tunnels and ops-warden
|
||||
for SSH identity; sand-boxer orchestrates, it does not issue certs or run
|
||||
tunnel daemons
|
||||
- **Observable lifecycle** — create, ready, active, expired, and destroyed states
|
||||
are attributable and queryable
|
||||
- **Disposable by default** — sandboxes are TTL-bound; persistence is explicit
|
||||
and exceptional
|
||||
- **Registry-first reuse** — register capabilities in this repo and federate
|
||||
through reuse-surface before ad hoc duplication elsewhere
|
||||
- **Meta-framework, not monolith** — one API; many extensions; optional native runtime later
|
||||
- **Profiles over one-offs** — every sandbox type is named, versioned, registered
|
||||
- **Prefer self-hosted** — SaaS via explicit routing policy, not silent default
|
||||
- **Blast-radius isolation** — dedicated hosts; never jeopardize Railiance01 production
|
||||
- **Reachability, not ownership** — ops-bridge + ops-warden as consumers
|
||||
- **Secrets at the boundary** — resolve at provision; never in agent-visible workspace
|
||||
- **Observable lifecycle** — every state transition attributable and queryable
|
||||
- **Disposable by default** — TTL-bound; persistence and checkpoint are explicit
|
||||
- **Honest security** — sandboxing limits blast radius; it is not intent enforcement
|
||||
- **Registry-first reuse** — capabilities in `registry/` before ad hoc duplication
|
||||
- **Payments transparency** — estimate before create; meter on destroy for SaaS
|
||||
|
||||
---
|
||||
|
||||
## Near-term outcomes
|
||||
|
||||
A first useful version of sand-boxer should:
|
||||
|
||||
1. Define at least one **production-oriented profile** (e.g. compose sandbox on
|
||||
sandboxer01 or CoulombCore interim) with documented provision/teardown
|
||||
2. Register **`capability.execution.sandbox-provision`** (or equivalent) in
|
||||
`registry/` and pass reuse-surface validation
|
||||
3. Integrate with **ops-bridge** reachability and **State Hub** registration
|
||||
4. Provide a clear migration path for e2e-framework and build-machines callers
|
||||
5. Enable activity-core and agents to request sandboxes without workstation repo
|
||||
paths as a hard dependency
|
||||
1. **Charter and research** — `INTENT.md`, `research/`, profile schema draft
|
||||
2. **First self-hosted extension** — `ext.compose-ssh` from e2e-framework lineage
|
||||
3. **Unified API v0** — create / get / destroy / recreate + State Hub registration
|
||||
4. **First profile** — `profile.compose-e2e` for wise-validator migration
|
||||
5. **Registry entry** — `capability.execution.sandbox-provision` via reuse-surface
|
||||
6. **Extension SDK sketch** — contract for P1 backends (vm-packer, Daytona OSS)
|
||||
7. **Sibling integration notes** — glas-harness, wise-validator, snuggle-inventor API expectations documented
|
||||
|
||||
---
|
||||
|
||||
## Maturity target
|
||||
|
||||
A mature sand-boxer should be the **standard execution venue** for agentic
|
||||
development in Custodian: Railiance01 decides *when* work runs; sand-boxer
|
||||
decides *where* isolated execution happens; State Hub records *what* changed.
|
||||
The workstation is optional — used for human preference, not as a single point
|
||||
of runtime failure.
|
||||
A mature sand-boxer is Coulomb's **default way to establish any sandbox**:
|
||||
|
||||
- glas-harness requests agent dev sandboxes without choosing Docker vs Modal vs SSH
|
||||
- wise-validator requests validation environments without owning provisioners
|
||||
- snuggle-inventor requests build sandboxes with setup metadata and secret refs
|
||||
- activity-core and CI request bounded venues with consistent lifecycle visibility
|
||||
- Operators route spend across self-hosted and SaaS with one credits model
|
||||
- A Coulomb-native runtime — if warranted — wins on ops data, not speculation
|
||||
|
||||
The workstation is optional. The harness is not sand-boxer. The validator is not
|
||||
sand-boxer. The inventor is not sand-boxer. **Establishing the box is.**
|
||||
235
SCOPE.md
Normal file
@@ -0,0 +1,235 @@
|
||||
---
|
||||
domain: infotech
|
||||
repo: sand-boxer
|
||||
updated: "2026-06-22"
|
||||
---
|
||||
|
||||
# SCOPE
|
||||
|
||||
> This file helps you quickly understand what this repository is about,
|
||||
> when it is relevant, and when it is not.
|
||||
> It is intentionally lightweight and may be incomplete until implementation lands.
|
||||
|
||||
---
|
||||
|
||||
## One-liner
|
||||
|
||||
Sandbox provisioning and profile catalog for Custodian — isolated execution
|
||||
environments where agents and automations can develop, build, and test without
|
||||
depending on the workstation filesystem or blast radius.
|
||||
|
||||
---
|
||||
|
||||
## Core Idea
|
||||
|
||||
sand-boxer is the **execution isolation and provisioning service** for agentic
|
||||
development and related workloads in the Custodian ecosystem. It answers where
|
||||
work can run safely, how isolation is enforced, how sandboxes phone home, and
|
||||
what happened during their lifecycle.
|
||||
|
||||
A **sandbox profile** is a named, versioned recipe (compose stack, VM image,
|
||||
future cluster worker) with documented inputs, outputs, host placement, TTL,
|
||||
and teardown guarantees. Operators and agents request a profile; sand-boxer
|
||||
provisions an isolated environment on a registered host, exposes reachability
|
||||
through ops-bridge (without owning tunnels), registers lifecycle state with
|
||||
State Hub, and tears down on expiry or explicit release.
|
||||
|
||||
The repo consolidates patterns today split across `the-custodian`:
|
||||
`e2e-framework/` (SSH + compose sandboxes for cross-repo e2e) and
|
||||
`infra/build-machines/` (Packer VMs with build-agent self-registration).
|
||||
|
||||
---
|
||||
|
||||
## In Scope
|
||||
|
||||
- **Sandbox profile catalog** — versioned definitions for compose-based e2e
|
||||
stacks, VM images, and future worker patterns; inputs, outputs, and teardown
|
||||
contracts documented per profile
|
||||
- **Provision / wait / teardown lifecycle** — TTL, idempotent cleanup, port and
|
||||
network conventions, observable states (create → ready → active → expired →
|
||||
destroyed)
|
||||
- **Host placement policy** — which profiles run on sandboxer01, CoulombCore
|
||||
interim, or other registered hosts; blast-radius isolation from Railiance01
|
||||
production
|
||||
- **CLI and/or API** — request, inspect, and release sandboxes for operators
|
||||
(`adm`), agents (`agt`), and automations (`atm`)
|
||||
- **State Hub registration contract** — extend the `build-agent` self-register
|
||||
pattern to generic sandbox identities and lifecycle events
|
||||
- **Capability registry entries** in `registry/` for federation via
|
||||
reuse-surface (e.g. `capability.execution.sandbox-provision`)
|
||||
- **Runbooks, templates, and tests** — Packer/compose bundles, operator
|
||||
runbooks, and automated tests for profile lifecycle
|
||||
- **Migration path** — documented cutover from `the-custodian/e2e-framework`
|
||||
and `infra/build-machines` callers to sand-boxer profiles
|
||||
- **Agent and workplan metadata** — `INTENT.md`, `AGENTS.md`, `workplans/`,
|
||||
and State Hub progress/decision logging per ADR-001
|
||||
|
||||
---
|
||||
|
||||
## Out of Scope
|
||||
|
||||
| Concern | Owner |
|
||||
|---------|--------|
|
||||
| Workstream, task, and progress state | `state-hub` |
|
||||
| Cron and event-triggered orchestration | `activity-core` |
|
||||
| SSH reverse tunnels and tunnel health | `ops-bridge` |
|
||||
| SSH certificate issuance | `ops-warden` |
|
||||
| Canon, charters, agent instruction canon | `the-custodian` |
|
||||
| Capability index federation hub | `reuse-surface` |
|
||||
| Production service deployment on Railiance01 | `railiance-apps` / domain repos |
|
||||
| Railiance01 cluster operations | `railiance-cluster` / `railiance-infra` |
|
||||
| ADR-001 workplan ↔ DB reconciliation | `state-hub` (`consistency_check.py`) |
|
||||
|
||||
sand-boxer may **consume** connectivity (ops-bridge) and certificates
|
||||
(ops-warden); it must not duplicate or subsume those authorities.
|
||||
|
||||
Additional boundaries:
|
||||
|
||||
- **Scheduling** — activity-core decides *when* work runs; sand-boxer decides
|
||||
*where* isolated execution happens
|
||||
- **Workstation as runtime** — the laptop/WSL anchor is interim control plane,
|
||||
not the target execution surface
|
||||
- **Irreversible operational decisions** — host provisioning, production
|
||||
cutovers, and CA policy changes require human approval
|
||||
|
||||
---
|
||||
|
||||
## Relevant When
|
||||
|
||||
- An agent or automation needs an isolated environment for coding, building, or
|
||||
testing without laptop filesystem dependence
|
||||
- Cross-repo e2e tests need a remote compose sandbox with guaranteed teardown
|
||||
- A build or verification workload should run on dedicated hardware
|
||||
(sandboxer01) rather than Railiance01 production or the workstation
|
||||
- activity-core or CI needs a bounded execution venue with State Hub visibility
|
||||
- Planning reuse of sandbox provisioning across repos (registry-first discovery)
|
||||
|
||||
---
|
||||
|
||||
## Not Relevant When
|
||||
|
||||
- All work runs locally with acceptable blast radius
|
||||
- Only tunnel connectivity is needed (use `ops-bridge` directly)
|
||||
- Only task/workstream state is needed (use `state-hub`)
|
||||
- Only scheduling or rule evaluation is needed (use `activity-core`)
|
||||
- Deploying or operating production services on Railiance01
|
||||
|
||||
---
|
||||
|
||||
## Current State
|
||||
|
||||
- **Status:** bootstrap — repo registered with State Hub; charter written;
|
||||
implementation not started
|
||||
- **Implementation:** ~0% — no CLI, API, profiles, provisioner, or tests in tree
|
||||
- **Docs:** `INTENT.md` (charter, 2026-06-21); `README.md` (one-liner);
|
||||
`AGENTS.md` and `.custodian-brief.md` (State Hub integration, generated)
|
||||
- **Registry:** scaffold present (`registry/indexes/capabilities.yaml` empty;
|
||||
`registry/capabilities/` placeholder); domain in index still `helix_forge`
|
||||
from scaffold — needs alignment to `infotech`
|
||||
- **Workplans:** `SAND-WP-0001` (State Hub bootstrap) in `ready`
|
||||
- **Lineage (external, not yet migrated):** `the-custodian/e2e-framework/`
|
||||
(CUST-WP-0028, completed) and `infra/build-machines/` (CUST-WP-0032)
|
||||
|
||||
---
|
||||
|
||||
## What Is Possible Now
|
||||
|
||||
- Read the charter (`INTENT.md`) and integration instructions (`AGENTS.md`)
|
||||
- Track bootstrap tasks via `workplans/SAND-WP-0001-statehub-bootstrap.md`
|
||||
- Log progress and decisions to State Hub when the hub is reachable
|
||||
- Use **interim** sandbox execution via `the-custodian` directly:
|
||||
- `make e2e REPO=<repo>` (e2e-framework on Railiance01/CoulombCore)
|
||||
- `infra/build-machines/` Packer VMs with build-agent registration
|
||||
|
||||
Nothing in **this repo** provisions or manages sandboxes yet.
|
||||
|
||||
---
|
||||
|
||||
## What Is Not Possible Yet
|
||||
|
||||
- Request a sandbox through sand-boxer CLI or API
|
||||
- Select a named, versioned profile from this repo's catalog
|
||||
- Register `capability.execution.sandbox-provision` (index entry absent)
|
||||
- Automatic lifecycle registration of generic sandbox identities in State Hub
|
||||
- Host placement on sandboxer01 via sand-boxer policy (host may not exist yet)
|
||||
- activity-core or agents invoking sand-boxer without workstation repo paths
|
||||
- Local install/test/lint/build commands documented for this repo (no package
|
||||
layout yet)
|
||||
|
||||
---
|
||||
|
||||
## How It Fits
|
||||
|
||||
```mermaid
|
||||
flowchart LR
|
||||
AC[activity-core] -->|when| SB[sand-boxer]
|
||||
AGT[agents / atm] -->|request sandbox| SB
|
||||
SB -->|provision / teardown| HOST[sandboxer01 / interim host]
|
||||
SB -->|lifecycle events| SH[state-hub]
|
||||
SB -->|reachability| OB[ops-bridge]
|
||||
SB -->|SSH identity| OW[ops-warden]
|
||||
RS[reuse-surface] -->|federate| REG[registry/]
|
||||
TC[the-custodian e2e + build-machines] -.->|migrate from| SB
|
||||
```
|
||||
|
||||
- **Upstream dependencies:** ops-bridge (tunnels), ops-warden (certs, optional),
|
||||
State Hub (registration API), registered sandbox hosts (SSH + Docker/Packer)
|
||||
- **Downstream consumers:** LLM agents, activity-core instructions, CI hooks,
|
||||
cross-repo e2e callers migrating off `the-custodian`
|
||||
- **Often used with:** `activity-core` (orchestration), `state-hub` (visibility),
|
||||
`reuse-surface` (capability discovery)
|
||||
|
||||
---
|
||||
|
||||
## Terminology
|
||||
|
||||
- **Profile** — named, versioned sandbox type with provision/teardown contract
|
||||
- **Sandbox** — a running isolated environment instance of a profile
|
||||
- **Host placement** — policy mapping profiles to sandboxer01, CoulombCore, etc.
|
||||
- **TTL** — time-to-live; sandboxes are disposable by default
|
||||
- **Phone home** — reachability and registration via ops-bridge + State Hub
|
||||
- Actor types (consumers): `adm` (operator), `agt` (LLM agent), `atm` (automation)
|
||||
|
||||
---
|
||||
|
||||
## Related / Overlapping
|
||||
|
||||
- `the-custodian` — current home of e2e-framework and build-machines; governance
|
||||
canon; sand-boxer extracts reusable execution platform from here
|
||||
- `ops-bridge` — SSH reverse tunnels; sand-boxer orchestrates reachability, does
|
||||
not run tunnel daemons
|
||||
- `ops-warden` — SSH CA and certificate issuance
|
||||
- `state-hub` — workstream/task state and sandbox lifecycle visibility
|
||||
- `activity-core` — schedules work; may request sandboxes as execution venue
|
||||
- `reuse-surface` — federates `registry/` capability entries
|
||||
- `railiance-cluster` / `railiance-apps` — production layer; explicitly not
|
||||
sandbox execution surface
|
||||
|
||||
---
|
||||
|
||||
## Provided Capabilities
|
||||
|
||||
*Planned — not yet registered in `registry/indexes/capabilities.yaml`.*
|
||||
|
||||
```capability
|
||||
type: execution
|
||||
title: Sandbox provisioning
|
||||
description: Isolated execution environments for agentic development, e2e testing, and bounded automations — profile-based provision, TTL teardown, and State Hub lifecycle registration.
|
||||
keywords: [sandbox, isolation, provision, e2e, agentic, execution, profile]
|
||||
```
|
||||
|
||||
Target registry id: `capability.execution.sandbox-provision` (or equivalent per
|
||||
reuse-surface naming).
|
||||
|
||||
---
|
||||
|
||||
## Getting Oriented
|
||||
|
||||
- Start with: `INTENT.md` (meta-framework charter)
|
||||
- Research: `research/` (landscape, reference systems, design synthesis)
|
||||
- Agent instructions: `AGENTS.md` (State Hub session protocol)
|
||||
- Offline brief: `.custodian-brief.md`
|
||||
- Workplans: `workplans/` (bootstrap: `SAND-WP-0001`)
|
||||
- Registry authoring: `registry/README.md`
|
||||
- Lineage reference (external): `the-custodian/e2e-framework/RUNBOOK.md`,
|
||||
`the-custodian/infra/build-machines/README.md`
|
||||
153
research/01-agent-sandbox-landscape.md
Normal file
@@ -0,0 +1,153 @@
|
||||
# Agent sandbox landscape (2026)
|
||||
|
||||
Survey of modern sandbox infrastructure for agentic coding — isolation
|
||||
technologies, provider models, and industry convergence patterns relevant to
|
||||
sand-boxer.
|
||||
|
||||
## Market definition
|
||||
|
||||
**AI agent sandboxes** are isolated execution environments for running
|
||||
AI-generated or agent-requested code safely. They optimize for:
|
||||
|
||||
- Fast create / resume / teardown
|
||||
- Programmatic lifecycle APIs
|
||||
- Isolation from host and peer workloads
|
||||
- Developer- and agent-friendly SDKs
|
||||
|
||||
This is distinct from general application hosting and from agent harnesses
|
||||
(memory, channels, tool orchestration).
|
||||
|
||||
## Provider landscape (summary)
|
||||
|
||||
| Platform | Model | Creation | Persist / checkpoint | Isolation | Notes |
|
||||
|----------|-------|----------|----------------------|-----------|-------|
|
||||
| **E2B** | Managed SaaS | ~150ms | Pause/resume, snapshots | Firecracker | Scale leader; template + sandbox API |
|
||||
| **Daytona** | Managed + OSS | ~90ms | Snapshots, fork | Docker/Kata | Open-source self-host path |
|
||||
| **Modal** | Serverless SaaS | Sub-second | Memory snapshots, volumes | gVisor | Strong GPU; code-defined runtime |
|
||||
| **Blaxel** | Managed | Sub-25ms resume | Hibernate | microVM | Zero idle compute billing |
|
||||
| **Vercel Sandbox** | Managed | ms | Snapshots, persistent default | Firecracker | Vercel ecosystem |
|
||||
| **Cloudflare Sandbox SDK** | Edge | seconds / ms (isolates) | DO state | Containers / V8 | Workers-native |
|
||||
| **AWS AgentCore** | Managed sessions | — | Session ≤8h | microVM | Hyperscaler bundling |
|
||||
| **Google Agent Sandbox** | Managed preview | Sub-second | TTL ≤14d | Hardened containers | Gemini Enterprise layer |
|
||||
| **OpenSandbox** | Self-hosted OSS | Pool pre-warm | Pause/resume, PVC | gVisor/Kata/Firecracker | K8s-scale; CNCF Landscape |
|
||||
| **OpenShell** | Policy runtime | — | Long-lived sandboxes | Landlock/seccomp/OPA | Governance layer, not hosted platform |
|
||||
| **Northflank** | BYOC + managed | ~200ms | Persistent | microVM/gVisor | VPC deployment |
|
||||
| **Runloop** | Managed | ~100ms exec | Snapshot, branch | Custom hypervisor | SWE-bench / eval focus |
|
||||
| **Sprites** | Managed | 1–2s | ~300ms checkpoints | Firecracker | Persistent-first |
|
||||
| **ComputeSDK** | Abstraction | Varies | Varies | Varies | Multi-provider router (9 backends) |
|
||||
|
||||
Sources: [Ry Walker research (Jun 2026)](https://rywalker.com/research/ai-agent-sandboxes),
|
||||
provider docs, Modal/E2B marketing materials. Treat vendor claims as directional.
|
||||
|
||||
## Isolation technology spectrum
|
||||
|
||||
| Technology | Used by | Security level | Performance |
|
||||
|------------|---------|----------------|-------------|
|
||||
| **Firecracker** | E2B, Sprites, Vercel | Hardware-level microVM | Fast |
|
||||
| **gVisor / Kata** | Modal, Northflank, OpenSandbox | Kernel-level | Very fast |
|
||||
| **Hardened Docker** | Daytona, AIO Sandbox | Container-level | Fastest setup |
|
||||
| **Landlock / seccomp / OPA** | OpenShell | Kernel policy | Native speed |
|
||||
| **V8 isolates** | Cloudflare Worker Loader | Process-level | Milliseconds |
|
||||
|
||||
**Implication for sand-boxer:** profile metadata must declare `isolation_level`
|
||||
so consumers can reason about blast radius. Extensions map profiles to concrete
|
||||
runtimes; the meta-framework does not mandate one technology.
|
||||
|
||||
## Convergence trends (2025 → 2026)
|
||||
|
||||
### 1. Ephemeral vs persistent collapsed
|
||||
|
||||
The early market split (E2B = ephemeral, Sprites = persistent) has collapsed. Most
|
||||
platforms now offer:
|
||||
|
||||
- Persistent workspace by default or as first-class option
|
||||
- Checkpoint / snapshot / hibernate for fast resume
|
||||
- TTL and explicit teardown still expected for cost and security
|
||||
|
||||
**sand-boxer takeaway:** profiles should support `persistence: ephemeral |
|
||||
persistent | checkpoint` as a first-class dimension, not a backend detail.
|
||||
|
||||
### 2. Checkpointing is table stakes
|
||||
|
||||
Sub-second to low-second restore times are becoming baseline for agent coding
|
||||
(workspace state, installed deps, shell history — not always live PIDs).
|
||||
|
||||
**sand-boxer takeaway:** lifecycle API needs `snapshot`, `restore`, `fork`
|
||||
operations even if early extensions only implement `recreate`.
|
||||
|
||||
### 3. Security stress-tests exposed limits
|
||||
|
||||
Research on AWS AgentCore and OpenShell/NemoClaw showed that **allowed egress
|
||||
paths** (git, npm, curl, node to allowlisted hosts) can be weaponized for
|
||||
exfiltration when agents are prompt-injected or tricked into malicious
|
||||
dependencies. Policy controls *destination*, not *intent*.
|
||||
|
||||
**sand-boxer takeaway:** document honestly that sandboxing is blast-radius
|
||||
control, not agent-behavior guarantee. Default-deny network; per-profile egress
|
||||
allowlists; secrets injected at boundary, never in agent-visible workspace.
|
||||
|
||||
### 4. Hyperscaler bundling pressures independents
|
||||
|
||||
AWS, Google, Cloudflare, Vercel entered the category in one quarter.
|
||||
Independents compete on multi-cloud neutrality, price, isolation depth, or
|
||||
open-source self-host.
|
||||
|
||||
**sand-boxer takeaway:** OpenRouter-style routing across self-hosted and SaaS
|
||||
backends is a defensible Coulomb position — no single-vendor lock-in.
|
||||
|
||||
### 5. Abstraction layers emerging
|
||||
|
||||
ComputeSDK routes one TypeScript API across E2B, Modal, Daytona, Runloop,
|
||||
Cloudflare, Vercel, etc. — "Terraform for running other people's code."
|
||||
|
||||
**sand-boxer takeaway:** validate the meta-framework API against this pattern;
|
||||
extensions are providers; sand-boxer core is router + policy + billing + registry.
|
||||
|
||||
## Architecture patterns (industry)

### Gateway / harness vs runtime (universal split)

```
[Agent gateway / harness]  ──orchestrates──▶  [Sandbox runtime]
 (host or control plane)                         (isolated)
```

OpenClaw and Hermes both keep the gateway on the host and run **tool execution**
in the sandbox. sand-boxer owns the runtime side only; **glas-harness** owns the
gateway/harness side (see `03-meta-framework-synthesis.md`).

### Profile + backend + scope (OpenClaw / Hermes consensus)

| Dimension | Examples |
|-----------|----------|
| **Backend** | docker, ssh, openshell, modal, daytona, compose-ssh |
| **Scope** | per-agent, per-session, shared |
| **Workspace** | isolated, ro-mount, rw-mount; mirror vs remote-canonical |
| **Network** | default deny; optional allowlist |
| **TTL** | mandatory; idle reaper optional |

### Credential and reachability boundary

Best practice: credentials brokered at sandbox edge (OpenShell proxy, Blitzy
secrets-never-to-AI, ops-warden certs). Agent process never holds production
tokens for unrelated systems.

sand-boxer integrates **ops-bridge** (reachability) and **ops-warden**
(identity) as consumers — does not replace them.

## What sand-boxer should adopt vs defer

| Adopt now (meta-framework) | Defer (extension or phase 2) |
|----------------------------|------------------------------|
| Unified provision/teardown API | GPU profiles |
| Named versioned profiles | Browser sandbox profiles |
| Extension plugin interface | Intent-aware egress filtering |
| Self-hosted compose-ssh (e2e lineage) | Native Firecracker control plane |
| State Hub lifecycle registration | Multi-region routing |
| Default-deny network policy | Computer Use / desktop sandboxes |
| Payments routing for SaaS backends | Owned hyperscale sandbox fleet |

## Related reading

- [02-reference-frameworks.md](02-reference-frameworks.md) — OpenClaw, Hermes, Blitzy
- [03-meta-framework-synthesis.md](03-meta-framework-synthesis.md) — API and extensions

204
research/02-reference-frameworks.md
Normal file
@@ -0,0 +1,204 @@
# Reference frameworks and platforms

Deep dives on systems sand-boxer should learn from — especially OpenClaw,
Hermes Agent, Blitzy, and OpenShell — plus hosted platforms as extension
targets.

---

## OpenClaw

**What it is:** Personal AI assistant with optional tool sandboxing.
**Docs:** https://docs.openclaw.ai/gateway/sandboxing

### Role in the stack

OpenClaw is an **agent harness** (gateway, channels, skills, memory). Sandboxing
is optional configuration on tool execution — not the product core. This is the
same boundary sand-boxer draws vs **glas-harness**.

### Sandbox architecture

**What gets sandboxed:** `exec`, `read`, `write`, `edit`, `apply_patch`,
`process`, optional sandboxed browser. Gateway stays on host.

**Backends:**

| Backend | Where | Workspace model |
|---------|-------|-----------------|
| `docker` | Local container | Bind-mount or copy; default `network: "none"` |
| `ssh` | Remote SSH host | Remote-canonical: seed once, exec remotely |
| `openshell` | OpenShell-managed | `mirror` (local canonical) or `remote` (remote canonical) |

**Scope:** `agent` (default) | `session` | `shared` — controls container count.

**Mode:** `off` | `non-main` | `all` — when sandboxing applies.

**Workspace access:** `none` | `ro` | `rw` — what tools can see.

### Security patterns worth copying

- Default Docker network **none**
- Bind-mount blocklist: `docker.sock`, `/etc`, `~/.ssh`, `~/.aws`, credential roots
- Symlink-aware path validation before bind approval
- `tools.elevated` as explicit sandbox bypass (audited escape hatch)
- Honest disclaimer: reduces blast radius, not perfect boundary

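The blocklist and symlink-aware validation listed above could be sketched as follows. This is a minimal illustration, not OpenClaw's implementation: the function name `is_bind_allowed` and the exact blocklist entries are assumptions chosen to mirror the bullets.

```python
import os
from pathlib import Path

# Hypothetical blocklist modeled on the roots named above; adjust per host.
BLOCKED_ROOTS = [
    Path("/var/run/docker.sock"),
    Path("/etc"),
    Path(os.path.expanduser("~/.ssh")),
    Path(os.path.expanduser("~/.aws")),
]


def is_bind_allowed(candidate, blocked=BLOCKED_ROOTS):
    """Return False if the resolved path is a blocked root or lives under one."""
    # realpath follows symlinks, so a link that points into a blocked
    # root is rejected even though its own path looks harmless.
    resolved = Path(os.path.realpath(candidate))
    for root in blocked:
        root_resolved = Path(os.path.realpath(root))
        if resolved == root_resolved or root_resolved in resolved.parents:
            return False
    return True
```

Resolving both the candidate and the blocked roots before comparing keeps the check consistent on hosts where a blocked root is itself a symlink.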
### sand-boxer lessons

1. **Backend / scope / workspaceAccess** vocabulary is proven — adopt in profile schema
2. **SSH remote-canonical** matches Custodian e2e-framework evolution path
3. **mirror vs remote** workspace modes belong in meta-framework API
4. OpenClaw integrates OpenShell as extension — validates extension-delegation model

---

## Hermes Agent

**What it is:** Agent harness from Nous Research with multi-backend terminal execution.
**Repo:** https://github.com/NousResearch/hermes-agent

### Terminal backends (six)

| Backend | Isolation | Persistence |
|---------|-----------|-------------|
| `local` | None | — |
| `docker` | Cap-drop ALL, pids-limit, tmpfs | Single long-lived labeled container |
| `ssh` | Network boundary | Persistent remote shell |
| `modal` | Cloud VM | Filesystem snapshots |
| `daytona` | Cloud container | Stop/resume |
| `singularity` | HPC namespaces | Writable overlay |

### Docker backend highlights

- **One container per task**, reused across sessions and Hermes process restarts
- Labels: `hermes-agent=1`, `hermes-task-id`, `hermes-profile`
- `docker_persist_across_processes: true` (default) — container survives process exit
- Resource limits: CPU, memory, disk, `lifetime_seconds` idle reaper
- `docker_forward_env` — secrets from host `.env`, not config YAML
- Parallel subagents **share** container unless per-task image override

### sand-boxer lessons

1. **Labeled reuse** beats cold provision per tool call for agent coding efficiency
2. Resource limits and idle reaper are profile-level concerns
3. Modal/Daytona as **extension backends** — Hermes consumes, does not own
4. Credential forwarding policy belongs in extension contract, not agent config

---

## NVIDIA OpenShell + NemoClaw (Hermes deployment)

**OpenShell:** Policy runtime for agent sandboxes — Landlock, seccomp, OPA egress.
**NemoClaw:** Reference stack deploying Hermes inside OpenShell.

### Three-layer model (industry pattern)

| Layer | Component | Responsibility |
|-------|-----------|----------------|
| Model | LLM provider | Reasoning |
| Harness | Hermes | Skills, memory, bridges, scheduling |
| Runtime | OpenShell | Filesystem/network policy, credential brokering |

sand-boxer maps to **runtime** only. glas-harness maps to **harness**.

### Policy model

Declarative YAML: allowed hosts, ports, HTTP methods, **binary-scoped** rules
(e.g. only `curl` may reach `api.github.com`). Credentials injected at egress
proxy — agent never sees Slack/Outlook tokens.

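A binary-scoped, default-deny egress check of the kind described above might look like the sketch below. It is not OpenShell's policy schema: `EgressRule`, its fields, and `is_allowed` are hypothetical names used only to show the matching logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EgressRule:
    """One allow rule; every field must match for a request to pass."""
    binary: str                      # executable the rule applies to
    host: str
    port: int = 443
    methods: frozenset = frozenset({"GET"})


def is_allowed(rules, binary, host, port, method):
    """Default deny: allow only if some rule matches all four fields."""
    return any(
        r.binary == binary
        and r.host == host
        and r.port == port
        and method in r.methods
        for r in rules
    )


# Example policy: only curl may reach api.github.com over HTTPS.
RULES = [EgressRule("curl", "api.github.com", 443, frozenset({"GET", "POST"}))]
```

Note that the check evaluates only the destination and the calling binary. A permitted `curl` call to a permitted host passes regardless of what data it carries.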
### Snapshot / restore

NemoClaw ships `snapshot.sh` / `restore.sh` for agent state (skills, memories,
sessions) across redeploys. Credential filter excludes secrets from tarballs.

### Security research (Lasso, Apr 2026)

Demonstrated exfiltration via **policy-permitted** paths (git PR, npm postinstall
→ Discord). Policies enforced correctly; intent not evaluated.

**sand-boxer lesson:** OpenShell-class extensions should be offered; security
runbooks must state limits of egress allowlisting.

---

## Blitzy

**What it is:** AI-native code generation platform — **not** a sandbox runtime.

### "Blitzy Sandbox" GitHub org

Public demo repos for Explore members. Not execution infrastructure.

### Real isolation model: Environments

https://docs.blitzy.com/administration/environments

- Natural-language **setup instructions** (toolchain, build, run, test)
- **Variables** (plaintext) vs **Secrets** (encrypted, masked, **never sent to AI**)
- Multi-environment priority merge (base + project override)
- Validation in configured environment after code generation

### sand-boxer lessons (environment metadata, not runtime)

| Blitzy pattern | sand-boxer mapping |
|----------------|-------------------|
| Environment config | Profile `setup` metadata block |
| Secrets never to AI | `secret_refs` resolved at provision boundary |
| Setup instructions | Profile runbook for extension bootstrap |
| Human review gates | Out of scope — **snuggle-inventor** / PR workflow |

Blitzy validates that **describing how to boot an environment** is as important
as **where it runs**. sand-boxer profiles carry both.

---

## Hosted platforms as extension targets

sand-boxer extensions may delegate to SaaS providers. Initial extension candidates:

| Extension id | Provider | Self-host alt | Payments |
|--------------|----------|---------------|----------|
| `ext.e2b` | E2B | — | Per-second SaaS |
| `ext.modal` | Modal | — | Per-second + GPU |
| `ext.daytona` | Daytona cloud | `ext.daytona-self` (OSS) | SaaS or infra cost |
| `ext.openshell` | — | OpenShell local/k3s | Infra cost |
| `ext.compose-ssh` | — | sandboxer01 / CoulombCore | Infra cost |
| `ext.vm-packer` | — | build-machines lineage | Infra cost |

ComputeSDK (https://github.com/computesdk/computesdk) is a useful reference for
normalizing provider differences behind one client API.

---

## OpenRouter analogy

| OpenRouter | sand-boxer |
|------------|------------|
| Unified LLM API | Unified sandbox API |
| Routes to OpenAI, Anthropic, … | Routes to E2B, Modal, self-hosted compose, … |
| API keys / credits / billing | Payments layer for SaaS consumption |
| Model metadata (context, price) | Profile metadata (isolation, cost, latency) |
| Fallback / routing policy | Host placement + extension fallback |

sand-boxer does not run inference; it runs **isolation**. The routing and
payments patterns transfer directly.

---

## Anti-patterns to avoid

| Anti-pattern | Why |
|--------------|-----|
| Rebuild OpenClaw/Hermes gateway in sand-boxer | glas-harness scope |
| Embed e2e test orchestration in provisioner | wise-validator scope |
| Generate code inside sandbox API | snuggle-inventor scope |
| Own SSH tunnels or CA | ops-bridge / ops-warden scope |
| Claim sandbox = safe from prompt injection | Research disproves |

## Related reading

- [01-agent-sandbox-landscape.md](01-agent-sandbox-landscape.md)
- [03-meta-framework-synthesis.md](03-meta-framework-synthesis.md)
- `INTENT.md` — normative charter

294
research/03-meta-framework-synthesis.md
Normal file
@@ -0,0 +1,294 @@
# Meta-framework synthesis

Design notes distilled from landscape research for sand-boxer's unified sandbox
API, extension model, payments layer, and Coulomb project boundaries.

---

## Core thesis

sand-boxer is a **meta-framework for establishing sandboxes** — like OpenRouter
is a meta-framework for accessing LLM models:

- One **consistent API** for consumers (`adm`, `agt`, `atm`, domain services)
- Many **extensions** that delegate to self-hosted or SaaS sandbox systems
- **Integrated payments** when consuming metered external services
- **Registry-first** profiles and capabilities via reuse-surface
- **Later:** a Coulomb-native "best of brands" runtime built from operational
  experience — not day one

sand-boxer provisions **where and how code runs**. It does not provision **how
agents think**, **what tests mean**, or **what code gets written**.

---

## Coulomb project boundaries

These sibling projects are **planned Coulomb repos** with explicit authority
split. sand-boxer must not absorb their concerns.

```mermaid
flowchart LR
    subgraph establish [sand-boxer]
        SB[Establish sandbox]
    end

    subgraph harness [glas-harness]
        GH[Agent harness: gateway tools memory channels]
    end

    subgraph validate [wise-validator]
        WV[E2E tests health checks validation orchestration]
    end

    subgraph generate [snuggle-inventor]
        SI[Code generation modernization]
    end

    GH -->|request sandbox| SB
    WV -->|request sandbox| SB
    SI -->|request sandbox| SB
    WV -.->|runs tests in| SB
    GH -.->|executes tools in| SB
    SI -.->|validates output in| SB
```

| Project | Owns | Does not own |
|---------|------|--------------|
| **sand-boxer** | Sandbox profiles, provision/teardown, extension routing, placement, lifecycle registration, payments for sandbox consumption | Agent memory, channels, tool policies, test definitions, code generation |
| **glas-harness** | Agent gateway, harness, skills, subagents, tool orchestration, channel bridges | Sandbox runtime, isolation enforcement, host placement |
| **wise-validator** | E2E test orchestration, health check semantics, validation workflows, result reporting | Sandbox provisioning, agent conversation state |
| **snuggle-inventor** | Code generation, tech specs, AAP-style planning, PR-oriented output | Sandbox infrastructure, test harness canon |

### Integration contracts (intended)

**glas-harness → sand-boxer**

```
POST /v1/sandboxes
  profile: "profile.agent-dev"
  scope: session | agent | shared
  workspace: { mode: mirror | remote, access: none | ro | rw }
  consumer: { actor: agt, harness: glas-harness, session_id }
```

Harness receives: `sandbox_id`, reachability descriptor (SSH endpoint, tunnel ref),
lifecycle webhook or poll URL. Harness executes tools **inside** sandbox via
agreed exec channel — sand-boxer does not parse tool calls.

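A harness-side helper that assembles the request above could be sketched like this. The JSON field layout is an assumption taken from the pseudo-request (the wire format is not specified here), and `build_create_request` is a hypothetical name.

```python
import json

# Enumerations copied from the request sketch above.
ALLOWED_SCOPES = {"session", "agent", "shared"}
ALLOWED_MODES = {"mirror", "remote"}
ALLOWED_ACCESS = {"none", "ro", "rw"}


def build_create_request(profile, scope, mode, access, session_id):
    """Validate the enumerated fields and return a JSON-serializable body."""
    if scope not in ALLOWED_SCOPES:
        raise ValueError(f"unknown scope: {scope!r}")
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unknown workspace mode: {mode!r}")
    if access not in ALLOWED_ACCESS:
        raise ValueError(f"unknown workspace access: {access!r}")
    return {
        "profile": profile,
        "scope": scope,
        "workspace": {"mode": mode, "access": access},
        "consumer": {
            "actor": "agt",
            "harness": "glas-harness",
            "session_id": session_id,
        },
    }


body = build_create_request("profile.agent-dev", "session", "mirror", "rw", "s-123")
payload = json.dumps(body)  # what a client would send to POST /v1/sandboxes
```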
**wise-validator → sand-boxer**

```
POST /v1/sandboxes
  profile: "profile.compose-e2e"
  inputs: { repo_ref, compose_bundle_ref }
  ttl: 2h
  consumer: { actor: atm, harness: wise-validator, run_id }
```

wise-validator owns `e2e.yml` semantics, health check definitions, test commands,
and pass/fail interpretation. sand-boxer delivers an environment; wise-validator
runs the validation story **on top**.

**snuggle-inventor → sand-boxer**

```
POST /v1/sandboxes
  profile: "profile.build"
  setup_metadata: { instructions_ref, secret_refs }
  consumer: { actor: agt, harness: snuggle-inventor, job_id }
```

snuggle-inventor may attach Blitzy-style setup instructions as profile inputs.
sand-boxer resolves secrets at boundary; generated code never flows through
sand-boxer APIs.

### Migration from the-custodian

| Legacy | New owner |
|--------|-----------|
| `e2e-framework/` provision/teardown | sand-boxer `ext.compose-ssh` |
| `e2e-framework/` test run + report | wise-validator (calls sand-boxer) |
| Agent tool sandbox config | glas-harness (calls sand-boxer) |
| `infra/build-machines/` | sand-boxer `ext.vm-packer` |

---

## Meta-framework API (conceptual)

### Resources

| Resource | Description |
|----------|-------------|
| `Profile` | Named, versioned sandbox recipe (image, isolation, network, TTL, extension) |
| `Extension` | Backend adapter (self-hosted or SaaS) |
| `Host` | Registered placement target for self-hosted extensions |
| `Sandbox` | Running instance of a profile |
| `Snapshot` | Point-in-time workspace checkpoint (optional) |
| `Route` | Extension selection policy (cost, latency, capability) |
| `Meter` | Usage record for payments layer |

### Sandbox lifecycle states

```
requested → provisioning → ready → active → { expired | failed } → destroying → destroyed
```

All transitions emit State Hub events. `ready` means reachability probe succeeded.

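The lifecycle above can be expressed as a small transition table. The sketch below is illustrative only: the main chain follows the documented states, while the extra edges (`provisioning` to `failed`, `active` straight to `destroying`) and the `emit` callback standing in for a State Hub event are assumptions.

```python
# Allowed transitions. The main chain follows the documented lifecycle; the
# edges provisioning -> failed and active -> destroying are assumptions.
TRANSITIONS = {
    "requested": {"provisioning"},
    "provisioning": {"ready", "failed"},
    "ready": {"active"},
    "active": {"expired", "failed", "destroying"},
    "expired": {"destroying"},
    "failed": {"destroying"},
    "destroying": {"destroyed"},
    "destroyed": set(),
}


def advance(current, target, emit):
    """Move to `target` if the transition is legal and report it via `emit`."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {target}")
    emit({"from": current, "to": target})  # stand-in for a State Hub event
    return target
```

Keeping the table as data makes it easy to reject out-of-order calls, for example a `restore` arriving after `destroyed`, before any extension code runs.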
### Core operations

| Operation | Description |
|-----------|-------------|
| `create` | Provision from profile + inputs |
| `get` / `list` | Inspect status |
| `exec` | Run command in sandbox (optional — may be harness-owned) |
| `extend_ttl` | Explicit persistence extension |
| `snapshot` / `restore` | Checkpoint workspace |
| `recreate` | Destroy and reprovision from seed |
| `destroy` | Idempotent teardown |

Early versions may expose only `create`, `get`, `destroy`, `recreate`; harnesses
can own `exec` via SSH/tunnel without sand-boxer proxying every command.

### Profile schema (minimum)

```yaml
id: profile.compose-e2e
version: "1.0.0"
extension: ext.compose-ssh
isolation:
  level: container          # container | microvm | policy
network:
  default: deny
  egress: []                # extension interprets
workspace:
  mode: remote-canonical    # mirror | remote-canonical
  access: rw
scope_default: session
ttl:
  default: 4h
  max: 24h
  idle_reap: null
resources:
  cpu: null
  memory_mb: null
setup:
  instructions: ""          # Blitzy-style natural language for extension bootstrap
  secret_refs: []           # resolved at provision; never in agent context
placement:
  prefer: [sandboxer01]
  fallback: [coulombcore]
reachability:
  tunnel: ops-bridge
  identity: ops-warden
metadata:
  cost_class: self-hosted   # self-hosted | saas-metered
  latency_class: standard
```

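As one example of how a loader might interpret the `ttl` block above, the sketch below parses the `4h` / `24h` style durations and caps a requested TTL at the profile maximum. The duration grammar (`s`, `m`, `h`, `d` suffixes) and the choice to clamp rather than reject are assumptions, not part of the schema.

```python
import re
from datetime import timedelta
from typing import Optional

_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_duration(text: str) -> timedelta:
    """Parse strings such as '4h' or '30m' into a timedelta."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if not match:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(value)})


def effective_ttl(ttl_block: dict, requested: Optional[str] = None) -> timedelta:
    """Use the profile default unless a TTL is requested; never exceed max."""
    default = parse_duration(ttl_block["default"])
    maximum = parse_duration(ttl_block["max"])
    if default > maximum:
        raise ValueError("ttl.default must not exceed ttl.max")
    wanted = parse_duration(requested) if requested else default
    return min(wanted, maximum)
```

With the reference profile, a request for `48h` would be capped at 24 hours, and a request with no TTL would receive the 4 hour default.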
### Extension interface (contract)

Each extension implements:

```text
provision(profile, inputs, placement) → sandbox_handle
wait_ready(sandbox_handle) → reachability
teardown(sandbox_handle) → cleanup_report
snapshot?(sandbox_handle) → snapshot_id
restore?(snapshot_id) → sandbox_handle
estimate_cost?(profile, duration) → meter_quote
```

Extensions register in `registry/` with capability vectors (isolation level,
regions, GPU, persistence, pricing model).

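In Python, the contract above might be captured with a structural `Protocol`. This is a sketch under stated assumptions: `SandboxHandle`, the argument types, and the `supports` probe for the optional operations (marked `?` above) are illustrative, and `InMemoryExtension` exists only to exercise the shape.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass(frozen=True)
class SandboxHandle:
    sandbox_id: str
    extension_id: str


@runtime_checkable
class Extension(Protocol):
    """Required operations; optional ones are probed with `supports`."""

    def provision(self, profile: dict, inputs: dict, placement: list) -> SandboxHandle: ...
    def wait_ready(self, handle: SandboxHandle) -> dict: ...
    def teardown(self, handle: SandboxHandle) -> dict: ...


def supports(ext: object, operation: str) -> bool:
    """True if the extension implements an optional operation such as `snapshot`."""
    return callable(getattr(ext, operation, None))


class InMemoryExtension:
    """Toy extension used only to exercise the contract."""

    def __init__(self):
        self.live = set()

    def provision(self, profile, inputs, placement):
        handle = SandboxHandle(f"sbx-{len(self.live) + 1}", profile["extension"])
        self.live.add(handle.sandbox_id)
        return handle

    def wait_ready(self, handle):
        return {"host": "localhost", "ready": handle.sandbox_id in self.live}

    def teardown(self, handle):
        # discard() keeps teardown idempotent: a second call is a no-op.
        self.live.discard(handle.sandbox_id)
        return {"sandbox_id": handle.sandbox_id, "remaining": len(self.live)}
```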
**Bundled extensions (roadmap):**

| Priority | Extension | Type |
|----------|-----------|------|
| P0 | `ext.compose-ssh` | Self-hosted (e2e-framework lineage) |
| P1 | `ext.vm-packer` | Self-hosted (build-machines lineage) |
| P2 | `ext.daytona-self` | Self-hosted OSS |
| P3 | `ext.e2b`, `ext.modal`, `ext.daytona` | SaaS + payments |
| P4 | `ext.openshell` | Policy runtime wrapper |

---

## Payments layer

For SaaS extensions, sand-boxer provides an **integrated payments and metering
layer** analogous to OpenRouter credits:

| Concern | sand-boxer approach |
|---------|---------------------|
| Account credits | Org/workspace balance for sandbox consumption |
| Metering | Per-second, per-creation, GPU surcharge — per extension quote |
| Provider keys | BYOK optional; platform keys for convenience |
| Cost visibility | `estimate_cost` before create; actuals on destroy |
| Billing events | Export to fin-hub / external billing (consumer, not owner) |

Self-hosted extensions bill **infra cost only** (host allocation) — no SaaS meter.

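The metering row in the table (per-second, per-creation, GPU surcharge) reduces to simple arithmetic. The sketch below shows one way to quote before create and settle on destroy; the function names and every rate in the example are hypothetical, not real provider prices.

```python
from decimal import Decimal


def quote(duration_s, per_second, per_creation=Decimal("0"),
          gpu_surcharge_per_second=Decimal("0")):
    """Estimated charge: fixed creation fee plus time-based usage."""
    return per_creation + Decimal(duration_s) * (per_second + gpu_surcharge_per_second)


def settle(balance, actual):
    """Deduct the actual charge from an org/workspace credit balance."""
    if actual > balance:
        raise ValueError("insufficient credits")
    return balance - actual


# Hypothetical rates: one hour at 0.0001 credits/s plus a 0.01 creation fee.
estimate = quote(3600, Decimal("0.0001"), per_creation=Decimal("0.01"))
remaining = settle(Decimal("5.00"), estimate)
```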
Payments is a **facility inside sand-boxer**, not a general payment processor.
Domain billing authority remains elsewhere.

---

## Routing policy (OpenRouter-style)

When multiple extensions satisfy a profile capability:

```yaml
route:
  strategy: prefer-self-hosted | lowest-cost | lowest-latency | explicit
  fallback: [ext.compose-ssh, ext.daytona]
  constraints:
    max_cost_per_hour: null
    require_isolation: microvm
    region: eu
```

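The strategies and constraints in the policy above could be applied with a selector like the following sketch. The `Candidate` fields and the tie-break for `prefer-self-hosted` (self-hosted first, then cheapest) are assumptions; the cost and latency figures used with it would come from extension metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    ext_id: str
    self_hosted: bool
    isolation: str
    region: str
    cost_per_hour: float
    latency_ms: float


def select(candidates, strategy, require_isolation=None, region=None,
           max_cost_per_hour=None, explicit=None):
    """Filter by constraints, then pick one candidate according to strategy."""
    pool = [
        c for c in candidates
        if (require_isolation is None or c.isolation == require_isolation)
        and (region is None or c.region == region)
        and (max_cost_per_hour is None or c.cost_per_hour <= max_cost_per_hour)
    ]
    if not pool:
        return None  # caller decides whether to try the fallback list
    if strategy == "explicit":
        return next((c for c in pool if c.ext_id == explicit), None)
    if strategy == "lowest-cost":
        return min(pool, key=lambda c: c.cost_per_hour)
    if strategy == "lowest-latency":
        return min(pool, key=lambda c: c.latency_ms)
    if strategy == "prefer-self-hosted":
        # False sorts before True, so self-hosted candidates win; cost breaks ties.
        return min(pool, key=lambda c: (not c.self_hosted, c.cost_per_hour))
    raise ValueError(f"unknown strategy: {strategy!r}")
```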
Default Coulomb posture: **prefer-self-hosted** on sandboxer01; SaaS for burst
or capability gaps (GPU, desktop) once extensions exist.

---

## Security posture (documented limits)

sand-boxer commits to:

1. Default-deny network unless profile explicitly allows egress
2. Secrets resolved at provision boundary via ops-warden / secret refs
3. Blast-radius isolation on dedicated hosts away from Railiance01 production
4. Observable lifecycle and attributable actors (`adm` / `agt` / `atm`)
5. Honest documentation: **allowed tool paths can be abused by compromised agents**

sand-boxer does **not** commit to intent-aware egress filtering in v1.

---

## Phased maturity

| Phase | Deliverable |
|-------|-------------|
| **0** | Charter, research, profile schema, `ext.compose-ssh` design |
| **1** | Unified API + self-hosted compose-ssh + State Hub registration |
| **2** | Extension SDK + vm-packer + registry entries + routing |
| **3** | SaaS extensions + payments layer |
| **4** | Snapshot/restore + checkpoint profiles |
| **5** | Coulomb-native runtime ("best of brands") informed by extension ops data |

Phase 5 is explicitly **later** — learn from routing, billing, failure modes, and
latency before building owned microVM/control-plane.

---

## Open questions (for workplans)

1. Does `exec` live in sand-boxer API or only in glas-harness via SSH?
2. Payments: integrate with existing fin-hub or standalone credits first?
3. Profile authorship: repo-local YAML vs hub-managed catalog?
4. wise-validator: fork e2e-framework reporter or new contract from day one?

These belong in SAND-WP-0002+ design workplans, not INTENT.md.

22
research/README.md
Normal file
@@ -0,0 +1,22 @@
# sand-boxer research

Research informing the sand-boxer meta-framework charter and implementation
roadmap. These documents are **inputs to design**, not normative specs — see
`INTENT.md` for authority and boundaries.

## Index

| Document | Contents |
|----------|----------|
| [01-agent-sandbox-landscape.md](01-agent-sandbox-landscape.md) | Market survey: isolation technologies, providers, convergence trends |
| [02-reference-frameworks.md](02-reference-frameworks.md) | Deep dives: OpenClaw, Hermes, Blitzy, OpenShell, hosted platforms |
| [03-meta-framework-synthesis.md](03-meta-framework-synthesis.md) | Design synthesis: API shape, extensions, payments, Coulomb boundaries |

## How to use

1. Read `INTENT.md` for the governing charter.
2. Use `03-meta-framework-synthesis.md` when designing profiles, extensions, or
   the unified API.
3. Use `01` and `02` when evaluating a backend extension or security posture.

Last updated: 2026-06-22

56
workplans/SAND-WP-0001-statehub-bootstrap.md
Normal file
@@ -0,0 +1,56 @@
---
id: SAND-WP-0001
type: workplan
title: "Bootstrap State Hub integration"
domain: infotech
repo: sand-boxer
status: ready
owner: codex
topic_slug: custodian
created: "2026-06-22"
updated: "2026-06-22"
status: active
---

# Bootstrap State Hub integration

Sandboxing for agentic coding facility.

## Review Generated Integration Files

```task
id: SAND-WP-0001-T01
status: done
priority: high
```

Review `INTENT.md`, `SCOPE.md`, `AGENTS.md`, and `.custodian-brief.md`.
Replace generated placeholders with repo-specific facts where needed.

## Verify Local Developer Workflow

```task
id: SAND-WP-0001-T02
status: todo
priority: high
```

Identify the repo's install, test, lint, build, and run commands. Add or refine
those commands in the agent instructions so future coding sessions can verify
changes confidently.

## Seed First Real Workplan

```task
id: SAND-WP-0001-T03
status: done
priority: medium
```

Create the first implementation workplan for the repository's most important
next change. Created `workplans/SAND-WP-0002-meta-framework-foundation.md`.
After workplan file updates, run from `~/state-hub`:

```bash
make fix-consistency REPO=sand-boxer
```

246
workplans/SAND-WP-0002-meta-framework-foundation.md
Normal file
@@ -0,0 +1,246 @@
---
id: SAND-WP-0002
type: workplan
title: "Meta-framework foundation and first extension"
domain: infotech
repo: sand-boxer
status: ready
owner: codex
topic_slug: custodian
created: "2026-06-22"
updated: "2026-06-22"
---

# Meta-framework foundation and first extension

Establish sand-boxer as a meta-framework: unified API contract, profile catalog,
extension platform, and the first self-hosted backend (`ext.compose-ssh`) migrated
from `the-custodian/e2e-framework/`.

**Charter:** `INTENT.md`
**Research:** `research/03-meta-framework-synthesis.md`
**Predecessor:** SAND-WP-0001 (bootstrap; T02 dev workflow should complete in
parallel or before T03 here)

## Design meta-framework contracts

```task
id: SAND-WP-0002-T01
status: todo
priority: high
```

Author `docs/meta-framework.md` (or `specs/meta-framework.md`) defining:

- Resource model: Profile, Extension, Host, Sandbox, Snapshot, Route, Meter
- Lifecycle states and State Hub event mapping
- Core API operations: `create`, `get`, `list`, `extend_ttl`, `recreate`,
  `destroy` (snapshot/restore deferred to SAND-WP-0003)
- Consumer attribution schema (`adm` / `agt` / `atm`, calling project id)
- Extension interface: `provision`, `wait_ready`, `teardown`, optional
  `estimate_cost`
- Routing policy vocabulary (`prefer-self-hosted`, `lowest-cost`, `explicit`)
- Security limits statement (blast-radius vs intent — per research)

Derive from `research/03-meta-framework-synthesis.md`; do not duplicate harness,
validator, or codegen concerns.

## Define profile and extension schemas

```task
id: SAND-WP-0002-T02
status: todo
priority: high
```

Add machine-readable schemas (JSON Schema or Python pydantic models) for:

- `Profile` — extension binding, isolation, network, workspace mode, scope,
  TTL, resources, setup metadata, placement, reachability, cost class
- `Extension` — id, capabilities, isolation levels, pricing model, regions
- `SandboxCreateRequest` / `SandboxStatus` response shapes

Ship `profiles/profile.compose-e2e.yaml` as the reference profile (successor to
`e2e/e2e.yml` inputs; validation semantics stay with wise-validator).

Register extension stub `extensions/ext.compose-ssh.yaml` with capability
metadata.

## Scaffold package and developer workflow

```task
id: SAND-WP-0002-T03
status: todo
priority: high
```

Create Python package layout (aligned with e2e-framework lineage):

```
src/sandboxer/          # or sandboxer/ at repo root — pick one, document in AGENTS.md
  api/
  profiles/
  extensions/
  lifecycle/
tests/
pyproject.toml
```

Document in `AGENTS.md`: install (`uv sync` or equivalent), test, lint, format,
and CLI entry point. Satisfies SAND-WP-0001-T02 if not already done.

## Implement extension registry and loader

```task
id: SAND-WP-0002-T04
status: todo
priority: high
```

Implement extension discovery and registration:

- Load extension definitions from `extensions/`
- Plugin entry-point or explicit registry for `ext.compose-ssh`
- Validate extension declares required capability fields before registration
- Unit tests for load failures and duplicate ids

No SaaS extensions in this workplan — self-hosted only.

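A minimal registry covering the validation and duplicate-id cases listed above might look like this sketch. The required field names are assumptions derived from the `Extension` schema bullet in T02, and loading definitions from `extensions/` is left out so the example stays self-contained.

```python
# Hypothetical required keys, mirroring the Extension schema fields in T02.
REQUIRED_FIELDS = ("id", "isolation_levels", "pricing_model", "regions")


class RegistryError(ValueError):
    """Raised when an extension definition cannot be registered."""


class ExtensionRegistry:
    def __init__(self):
        self._by_id = {}

    def register(self, definition: dict) -> None:
        """Validate required capability fields, then reject duplicate ids."""
        missing = [f for f in REQUIRED_FIELDS if f not in definition]
        if missing:
            raise RegistryError(f"missing fields: {', '.join(missing)}")
        ext_id = definition["id"]
        if ext_id in self._by_id:
            raise RegistryError(f"duplicate extension id: {ext_id}")
        self._by_id[ext_id] = definition

    def get(self, ext_id: str) -> dict:
        return self._by_id[ext_id]
```

Validating before the duplicate check means a malformed definition never occupies an id, which keeps the two unit-test cases independent.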
## Implement ext.compose-ssh (e2e-framework lineage)

```task
id: SAND-WP-0002-T05
status: todo
priority: high
```

Extract and adapt provision/teardown from `the-custodian/e2e-framework/`:

- SSH to configured host; isolated directory per sandbox id
- Unique compose project name; `compose up` / `compose down` (idempotent)
- Default-deny network posture per profile (document host-side requirements)
- Host placement: read `placement.prefer` / `fallback` from profile
- **Do not** port test execution, health polling, or State Hub result reporting
  — those are wise-validator responsibilities

Provide a compatibility note in extension README for interim `make e2e` callers.

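The per-sandbox project name and the remote `compose up` / `compose down` calls could be assembled as in the sketch below. It only builds argument lists and does not run anything; the `sbx-` prefix, the directory layout, and the helper names are assumptions rather than the e2e-framework's actual behavior.

```python
import re
import shlex


def project_name(sandbox_id: str) -> str:
    """Derive a Compose-safe project name (lowercase letters, digits, - and _)."""
    slug = re.sub(r"[^a-z0-9_-]", "-", sandbox_id.lower())
    return f"sbx-{slug}"


def remote_command(host: str, workdir: str, sandbox_id: str, action: str) -> list:
    """Build the ssh argv that runs compose up/down in the sandbox directory."""
    compose_args = {
        "up": ["up", "-d"],
        "down": ["down", "--remove-orphans"],
    }[action]
    remote = ["docker", "compose", "-p", project_name(sandbox_id), *compose_args]
    # Quote the directory and join the argv so the remote shell sees one safe line.
    script = f"cd {shlex.quote(workdir)} && {shlex.join(remote)}"
    return ["ssh", host, script]
```

The resulting list can be passed to `subprocess.run` without `shell=True`. Re-running `up -d` or `down` against the same project name is generally safe, which is what makes the operations idempotent in practice.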
## Implement API v0 and CLI

```task
id: SAND-WP-0002-T06
status: todo
priority: high
```

Ship minimal establishment surface:

**CLI** (primary for v0):

```bash
sandbox create --profile profile.compose-e2e --input repo=/path/to/repo
sandbox get <id>
sandbox list
sandbox recreate <id>
sandbox destroy <id>
```

**HTTP** (optional in v0; stub acceptable if CLI calls core library directly):

- `POST /v1/sandboxes`, `GET /v1/sandboxes/{id}`, `DELETE /v1/sandboxes/{id}`

Core library must be harness-agnostic: glas-harness, wise-validator, and
snuggle-inventor call the same functions.

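The CLI surface listed above maps naturally onto `argparse` subcommands. The following is a sketch under stated assumptions: the subcommand and flag names come from the examples in this task, while `build_parser`, `parse_inputs`, and the decision to allow `--input` to repeat are illustrative choices, not a settled design.

```python
import argparse


def parse_inputs(pairs: list) -> dict:
    """Turn repeated KEY=VALUE strings into a dict, rejecting malformed pairs."""
    inputs = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected KEY=VALUE, got {pair!r}")
        inputs[key] = value
    return inputs


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="sandbox")
    sub = parser.add_subparsers(dest="command", required=True)

    create = sub.add_parser("create", help="provision a sandbox from a profile")
    create.add_argument("--profile", required=True)
    # Assumption: --input may be given several times.
    create.add_argument("--input", action="append", default=[], metavar="KEY=VALUE")

    # get / recreate / destroy all take a single sandbox id.
    for name in ("get", "recreate", "destroy"):
        sub.add_parser(name).add_argument("id")

    sub.add_parser("list", help="list known sandboxes")
    return parser
```

A thin `main()` would parse the arguments and dispatch on `args.command` to the same core-library functions the HTTP stub would call, which keeps the CLI free of provisioning logic.
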
## State Hub lifecycle registration

```task
id: SAND-WP-0002-T07
status: todo
priority: medium
```

On sandbox state transitions, emit State Hub progress events (or call a
dedicated registration endpoint when one is available):

- `requested`, `provisioning`, `ready`, `active`, `destroying`, `destroyed`
- Include: `sandbox_id`, `profile_id`, `extension_id`, `host`, `consumer`,
  `actor_type`, timestamps

Extend the `build-agent` self-register pattern sketch for generic sandbox
identities. Document the contract in the meta-framework spec.

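The state list and event fields above suggest a small transition table plus an event record. In the sketch below the state names and identity fields come from this task; the allowed-transition table, the `previous_state` / `state` / `timestamp` field names, and the `transition` helper are assumptions made for illustration, and nothing here talks to State Hub.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Assumed transition table: any pre-destroy state may move to "destroying".
ALLOWED = {
    "requested": {"provisioning", "destroying"},
    "provisioning": {"ready", "destroying"},
    "ready": {"active", "destroying"},
    "active": {"destroying"},
    "destroying": {"destroyed"},
    "destroyed": set(),
}


@dataclass(frozen=True)
class LifecycleEvent:
    sandbox_id: str
    profile_id: str
    extension_id: str
    host: str
    consumer: str
    actor_type: str
    previous_state: str  # hypothetical field name
    state: str           # hypothetical field name
    timestamp: str       # ISO 8601, UTC


def transition(current: str, target: str, **identity: str) -> LifecycleEvent:
    """Validate a state change and build the event that would be emitted."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {target}")
    return LifecycleEvent(
        previous_state=current,
        state=target,
        timestamp=datetime.now(timezone.utc).isoformat(),
        **identity,
    )
```

Building the event as a plain record keeps the emitter swappable: the same `LifecycleEvent` could be posted as a progress event today and sent to a dedicated registration endpoint later.
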
## Document sibling integration contracts

```task
id: SAND-WP-0002-T08
status: todo
priority: medium
```

Add `docs/integrations/` with one page per planned sibling:

| Doc | Contents |
|-----|----------|
| `glas-harness.md` | Sandbox handle + reachability; harness owns exec |
| `wise-validator.md` | `profile.compose-e2e`; validator owns e2e.yml + health + tests |
| `snuggle-inventor.md` | Setup metadata + secret_refs; no codegen in sand-boxer |

Each doc: example request, response fields, ownership table, out-of-scope list.
Cross-link from `INTENT.md` Coulomb boundaries section.

## Register capability and fix registry scaffold

```task
id: SAND-WP-0002-T09
status: todo
priority: medium
```

- Author `registry/capabilities/execution.sandbox-provision.md`
- Add row to `registry/indexes/capabilities.yaml` (`domain: infotech`)
- Run `reuse-surface validate` when CLI available
- Notify operator: `make fix-consistency REPO=sand-boxer` from `~/state-hub`

## Verification and migration smoke test

```task
id: SAND-WP-0002-T10
status: todo
priority: medium
```

End-to-end proof on CoulombCore or sandboxer01 (when reachable):

1. `sandbox create` with `profile.compose-e2e` for a repo with `e2e/` layout
2. Confirm `ready` state and reachability descriptor
3. Manual or scripted compose health check (not wise-validator; this only
   proves the environment exists)
4. `sandbox destroy`: confirm idempotent cleanup (no leftover compose projects
   or `/tmp` dirs)
5. Document runbook in `docs/runbooks/profile-compose-e2e.md`

Record gaps for wise-validator migration (SAND-WP-0003) and `the-custodian`
shim (SAND-WP-0004).

---

## Out of scope (follow-on workplans)

| Item | Target workplan |
|------|-----------------|
| wise-validator extraction + e2e test orchestration | SAND-WP-0003 |
| `the-custodian` Makefile shim + deprecation timeline | SAND-WP-0004 |
| `ext.vm-packer` (build-machines) | SAND-WP-0005 |
| SaaS extensions + payments layer | SAND-WP-0006 |
| Snapshot / restore / checkpoint profiles | SAND-WP-0007 |
| Coulomb-native runtime (phase 5) | Backlog |

## Completion criteria

- Meta-framework spec and schemas merged
- `ext.compose-ssh` provisions and tears down a compose sandbox via CLI
- State Hub receives lifecycle events for at least one full create→destroy cycle
- Sibling integration docs published
- `capability.execution.sandbox-provision` registered and validated
- All tasks `done`; workplan `status: finished`; operator runs fix-consistency