Meta-framework synthesis
Design notes distilled from landscape research for sand-boxer's unified sandbox API, extension model, payments layer, and Coulomb project boundaries.
Core thesis
sand-boxer is a meta-framework for establishing sandboxes, in the way OpenRouter is a meta-framework for accessing LLMs:
- One consistent API for consumers (adm, agt, atm, domain services)
- Many extensions that delegate to self-hosted or SaaS sandbox systems
- Integrated payments when consuming metered external services
- Registry-first profiles and capabilities via reuse-surface
- Later: a Coulomb-native "best of brands" runtime built from operational experience — not day one
sand-boxer provisions where and how code runs. It does not provision how agents think, what tests mean, or what code gets written.
Coulomb project boundaries
These sibling projects are planned Coulomb repos with an explicit authority split. sand-boxer must not absorb their concerns.
flowchart LR
  subgraph establish [sand-boxer]
    SB[Establish sandbox]
  end
  subgraph harness [glas-harness]
    GH[Agent harness: gateway tools memory channels]
  end
  subgraph validate [wise-validator]
    WV[E2E tests health checks validation orchestration]
  end
  subgraph generate [snuggle-inventor]
    SI[Code generation modernization]
  end
  GH -->|request sandbox| SB
  WV -->|request sandbox| SB
  SI -->|request sandbox| SB
  WV -.->|runs tests in| SB
  GH -.->|executes tools in| SB
  SI -.->|validates output in| SB
| Project | Owns | Does not own |
|---|---|---|
| sand-boxer | Sandbox profiles, provision/teardown, extension routing, placement, lifecycle registration, payments for sandbox consumption | Agent memory, channels, tool policies, test definitions, code generation |
| glas-harness | Agent gateway, harness, skills, subagents, tool orchestration, channel bridges | Sandbox runtime, isolation enforcement, host placement |
| wise-validator | E2E test orchestration, health check semantics, validation workflows, result reporting | Sandbox provisioning, agent conversation state |
| snuggle-inventor | Code generation, tech specs, AAP-style planning, PR-oriented output | Sandbox infrastructure, test harness canon |
Integration contracts (intended)
glas-harness → sand-boxer
POST /v1/sandboxes
  profile: "profile.agent-dev"
  scope: session | agent | shared
  workspace: { mode: mirror | remote, access: none | ro | rw }
  consumer: { actor: agt, harness: glas-harness, session_id }
Harness receives: sandbox_id, reachability descriptor (SSH endpoint, tunnel ref),
lifecycle webhook or poll URL. Harness executes tools inside sandbox via
agreed exec channel — sand-boxer does not parse tool calls.
wise-validator → sand-boxer
POST /v1/sandboxes
  profile: "profile.compose-e2e"
  inputs: { repo_ref, compose_bundle_ref }
  ttl: 2h
  consumer: { actor: atm, harness: wise-validator, run_id }
wise-validator owns e2e.yml semantics, health check definitions, test commands,
and pass/fail interpretation. sand-boxer delivers an environment; wise-validator
runs the validation story on top.
snuggle-inventor → sand-boxer
POST /v1/sandboxes
  profile: "profile.build"
  setup_metadata: { instructions_ref, secret_refs }
  consumer: { actor: agt, harness: snuggle-inventor, job_id }
snuggle-inventor may attach Blitzy-style setup instructions as profile inputs. sand-boxer resolves secrets at boundary; generated code never flows through sand-boxer APIs.
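The three request shapes above share a profile name and a consumer envelope, and differ only in their extra fields. The following is a minimal sketch of how a client might assemble such a body. The `Consumer` class, the `build_create_request` helper, and the sample identifier values are hypothetical and not part of any published sand-boxer client; only the field names come from the contracts above.

```python
from dataclasses import dataclass
from typing import Any, Dict

ACTORS = {"adm", "agt", "atm"}  # actor kinds named in this document


@dataclass(frozen=True)
class Consumer:
    actor: str                   # one of ACTORS
    harness: str                 # e.g. "glas-harness"
    correlation: Dict[str, str]  # e.g. {"session_id": "..."} or {"run_id": "..."}


def build_create_request(profile: str, consumer: Consumer, **fields: Any) -> Dict[str, Any]:
    """Assemble a body for POST /v1/sandboxes (hypothetical helper)."""
    if consumer.actor not in ACTORS:
        raise ValueError(f"unknown actor: {consumer.actor}")
    reserved = {"profile", "consumer"} & fields.keys()
    if reserved:
        raise ValueError(f"reserved fields passed as extras: {sorted(reserved)}")
    body: Dict[str, Any] = {
        "profile": profile,
        "consumer": {
            "actor": consumer.actor,
            "harness": consumer.harness,
            **consumer.correlation,
        },
    }
    body.update(fields)  # per-consumer extras: scope, workspace, inputs, ttl, ...
    return body


# wise-validator style request; the ref values are placeholders.
request = build_create_request(
    "profile.compose-e2e",
    Consumer("atm", "wise-validator", {"run_id": "run-123"}),
    inputs={"repo_ref": "main", "compose_bundle_ref": "bundle-1"},
    ttl="2h",
)
```

Keeping the consumer envelope separate from the per-profile extras reflects the boundary described above: sand-boxer needs to attribute the request, but it does not need to understand what the harness will do inside the sandbox.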
Migration from the-custodian
| Legacy | New owner |
|---|---|
| e2e-framework/ provision/teardown | sand-boxer ext.compose-ssh |
| e2e-framework/ test run + report | wise-validator (calls sand-boxer) |
| Agent tool sandbox config | glas-harness (calls sand-boxer) |
| infra/build-machines/ | sand-boxer ext.vm-packer |
Meta-framework API (conceptual)
Resources
| Resource | Description |
|---|---|
| Profile | Named, versioned sandbox recipe (image, isolation, network, TTL, extension) |
| Extension | Backend adapter (self-hosted or SaaS) |
| Host | Registered placement target for self-hosted extensions |
| Sandbox | Running instance of a profile |
| Snapshot | Point-in-time workspace checkpoint (optional) |
| Route | Extension selection policy (cost, latency, capability) |
| Meter | Usage record for payments layer |
Sandbox lifecycle states
requested → provisioning → ready → active → { expired | failed } → destroying → destroyed
All transitions emit State Hub events. ready means reachability probe succeeded.
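The lifecycle chain above can be read as a small state machine. The sketch below encodes exactly the transitions written in the chain and calls a callback on each one. The `emit` callback is a hypothetical stand-in for State Hub event publishing, and the event shape is invented for illustration. The chain does not say whether `failed` can be entered from `provisioning`, so this sketch does not allow it.

```python
from enum import Enum
from typing import Callable, Dict, FrozenSet


class State(str, Enum):
    REQUESTED = "requested"
    PROVISIONING = "provisioning"
    READY = "ready"
    ACTIVE = "active"
    EXPIRED = "expired"
    FAILED = "failed"
    DESTROYING = "destroying"
    DESTROYED = "destroyed"


# Allowed transitions, taken literally from the lifecycle chain.
TRANSITIONS: Dict[State, FrozenSet[State]] = {
    State.REQUESTED: frozenset({State.PROVISIONING}),
    State.PROVISIONING: frozenset({State.READY}),
    State.READY: frozenset({State.ACTIVE}),
    State.ACTIVE: frozenset({State.EXPIRED, State.FAILED}),
    State.EXPIRED: frozenset({State.DESTROYING}),
    State.FAILED: frozenset({State.DESTROYING}),
    State.DESTROYING: frozenset({State.DESTROYED}),
    State.DESTROYED: frozenset(),
}


def advance(current: State, target: State, emit: Callable[[dict], None]) -> State:
    """Validate one transition and emit an event for it (hypothetical event shape)."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    emit({"from": current.value, "to": target.value})
    return target
```

A caller would hold the current state per sandbox and route every change through `advance`, so that no transition can happen without a corresponding event.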
Core operations
| Operation | Description |
|---|---|
| create | Provision from profile + inputs |
| get / list | Inspect status |
| exec | Run command in sandbox (optional; may be harness-owned) |
| extend_ttl | Explicit persistence extension |
| snapshot / restore | Checkpoint workspace |
| recreate | Destroy and reprovision from seed |
| destroy | Idempotent teardown |
Early versions may expose only create, get, destroy, recreate; harnesses
can own exec via SSH/tunnel without sand-boxer proxying every command.
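To make the early-version surface concrete, here is a minimal in-memory sketch of `create`, `get`, `destroy`, and `recreate`. It is not the sand-boxer implementation: the class name, the record shape, and the ID format are hypothetical, and it assumes that `recreate` yields a new sandbox ID, which the document does not specify.

```python
import itertools
from typing import Any, Dict, Optional


class InMemorySandboxes:
    """Toy stand-in for the four early-version operations."""

    def __init__(self) -> None:
        self._records: Dict[str, Dict[str, Any]] = {}
        self._ids = itertools.count(1)

    def create(self, profile: str, inputs: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        sandbox_id = f"sbx-{next(self._ids)}"  # hypothetical ID format
        record = {"id": sandbox_id, "profile": profile,
                  "inputs": dict(inputs or {}), "state": "ready"}
        self._records[sandbox_id] = record
        return dict(record)

    def get(self, sandbox_id: str) -> Dict[str, Any]:
        return dict(self._records[sandbox_id])  # KeyError for unknown IDs

    def destroy(self, sandbox_id: str) -> Dict[str, Any]:
        # Idempotent: destroying an already destroyed sandbox is a no-op.
        record = self._records[sandbox_id]
        record["state"] = "destroyed"
        return dict(record)

    def recreate(self, sandbox_id: str) -> Dict[str, Any]:
        # Destroy, then reprovision from the same seed (profile + inputs).
        old = self.destroy(sandbox_id)
        return self.create(old["profile"], old["inputs"])


api = InMemorySandboxes()
first = api.create("profile.build", {"repo_ref": "main"})
api.destroy(first["id"])
api.destroy(first["id"])  # second call succeeds and changes nothing
second = api.recreate(first["id"])
```

Because `exec` is absent here, a harness would use the reachability descriptor returned at creation to run commands itself, which matches the option described above.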
Profile schema (minimum)
id: profile.compose-e2e
version: "1.0.0"
extension: ext.compose-ssh
isolation:
  level: container          # container | microvm | policy
network:
  default: deny
  egress: []                # extension interprets
workspace:
  mode: remote-canonical    # mirror | remote-canonical
  access: rw
scope_default: session
ttl:
  default: 4h
  max: 24h
  idle_reap: null
resources:
  cpu: null
  memory_mb: null
setup:
  instructions: ""          # Blitzy-style natural language for extension bootstrap
  secret_refs: []           # resolved at provision; never in agent context
placement:
  prefer: [sandboxer01]
  fallback: [coulombcore]
reachability:
  tunnel: ops-bridge
  identity: ops-warden
metadata:
  cost_class: self-hosted   # self-hosted | saas-metered
  latency_class: standard
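The `ttl` block in the schema implies some resolution logic when a request asks for its own TTL. A minimal sketch follows, assuming duration strings of the form `<integer><unit>` with units `s`, `m`, `h`, `d`, and assuming that a request above `max` is clamped rather than rejected. Neither assumption is stated in the schema, and both function names are hypothetical.

```python
import re
from datetime import timedelta
from typing import Dict, Optional

_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_ttl(text: str) -> timedelta:
    """Parse strings such as '4h' or '30m' (assumed format)."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if match is None:
        raise ValueError(f"unrecognised TTL: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def resolve_ttl(profile_ttl: Dict[str, str], requested: Optional[str] = None) -> timedelta:
    """Use the profile default when nothing is requested; never exceed max."""
    default = parse_ttl(profile_ttl["default"])
    maximum = parse_ttl(profile_ttl["max"])
    wanted = parse_ttl(requested) if requested is not None else default
    return min(wanted, maximum)  # clamping is an assumption of this sketch


ttl_block = {"default": "4h", "max": "24h"}
print(resolve_ttl(ttl_block))         # 4:00:00
print(resolve_ttl(ttl_block, "2h"))   # 2:00:00
print(resolve_ttl(ttl_block, "48h"))  # 1 day, 0:00:00
```

The same bound would naturally apply to `extend_ttl`, though whether extensions are measured from creation time or from the time of the call is left open here.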
Extension interface (contract)
Each extension implements:
provision(profile, inputs, placement) → sandbox_handle
wait_ready(sandbox_handle) → reachability
teardown(sandbox_handle) → cleanup_report
snapshot?(sandbox_handle) → snapshot_id
restore?(snapshot_id) → sandbox_handle
estimate_cost?(profile, duration) → meter_quote
Extensions register in registry/ with capability vectors (isolation level,
regions, GPU, persistence, pricing model).
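One way to express this contract in code is an abstract base class whose required methods are abstract and whose optional methods (`snapshot?`, `restore?`, `estimate_cost?`) have "not supported" defaults. This is a sketch under assumptions: the Python signatures, the handle and reachability shapes, the `supports` helper, and the `StubExtension` class are all hypothetical and do not describe any real extension.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional, Set


class Extension(ABC):
    @abstractmethod
    def provision(self, profile: Dict[str, Any], inputs: Dict[str, Any], placement: str) -> str:
        """Return an opaque sandbox handle."""

    @abstractmethod
    def wait_ready(self, handle: str) -> Dict[str, Any]:
        """Return a reachability descriptor once the sandbox responds."""

    @abstractmethod
    def teardown(self, handle: str) -> Dict[str, Any]:
        """Return a cleanup report."""

    # Optional operations: the defaults signal "not supported".
    def snapshot(self, handle: str) -> str:
        raise NotImplementedError("snapshot not supported")

    def restore(self, snapshot_id: str) -> str:
        raise NotImplementedError("restore not supported")

    def estimate_cost(self, profile: Dict[str, Any], duration_seconds: int) -> Optional[Dict[str, Any]]:
        return None


def supports(ext: Extension, operation: str) -> bool:
    """True when the extension's class overrides the base method."""
    return getattr(type(ext), operation) is not getattr(Extension, operation)


class StubExtension(Extension):
    """In-memory stub used only to exercise the contract."""

    def __init__(self) -> None:
        self._live: Set[str] = set()

    def provision(self, profile, inputs, placement):
        handle = f"{placement}/{profile['id']}/{len(self._live) + 1}"
        self._live.add(handle)
        return handle

    def wait_ready(self, handle):
        if handle not in self._live:
            raise KeyError(handle)
        return {"handle": handle, "ssh": "stub-host:22"}  # placeholder endpoint

    def teardown(self, handle):
        removed = handle in self._live
        self._live.discard(handle)
        return {"handle": handle, "removed": removed}
```

A registry entry could derive part of an extension's capability vector from `supports`, for example advertising persistence only when `snapshot` and `restore` are both overridden.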
Bundled extensions (roadmap):
| Priority | Extension | Type |
|---|---|---|
| P0 | ext.compose-ssh | Self-hosted (e2e-framework lineage) |
| P1 | ext.vm-packer | Self-hosted (build-machines lineage) |
| P2 | ext.daytona-self | Self-hosted OSS |
| P3 | ext.e2b, ext.modal, ext.daytona | SaaS + payments |
| P4 | ext.openshell | Policy runtime wrapper |
Payments layer
For SaaS extensions, sand-boxer provides an integrated payments and metering layer analogous to OpenRouter credits:
| Concern | sand-boxer approach |
|---|---|
| Account credits | Org/workspace balance for sandbox consumption |
| Metering | Per-second, per-creation, GPU surcharge — per extension quote |
| Provider keys | BYOK optional; platform keys for convenience |
| Cost visibility | estimate_cost before create; actuals on destroy |
| Billing events | Export to fin-hub / external billing (consumer, not owner) |
Self-hosted extensions bill infra cost only (host allocation) — no SaaS meter.
Payments is a facility inside sand-boxer, not a general payment processor. Domain billing authority remains elsewhere.
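The metering row above (per-second, per-creation, GPU surcharge) suggests a simple cost formula: a fixed creation fee plus a per-second rate, with an extra per-second rate when a GPU is attached. The sketch below works through that arithmetic. The `MeterQuote` fields, the rate values, and the `settle` helper are hypothetical; real quotes would come from each extension.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class MeterQuote:
    per_second: Decimal
    per_creation: Decimal
    gpu_surcharge_per_second: Decimal = Decimal("0")


def estimate_cost(quote: MeterQuote, duration_seconds: int, gpu: bool = False) -> Decimal:
    """cost = per_creation + (per_second [+ gpu surcharge]) * duration"""
    rate = quote.per_second
    if gpu:
        rate += quote.gpu_surcharge_per_second
    return quote.per_creation + rate * duration_seconds


def settle(balance: Decimal, actual_cost: Decimal) -> Decimal:
    """Deduct actuals from an org/workspace credit balance."""
    if actual_cost > balance:
        raise ValueError("insufficient credits")
    return balance - actual_cost


# Made-up rates, for illustration only.
quote = MeterQuote(
    per_second=Decimal("0.0001"),
    per_creation=Decimal("0.01"),
    gpu_surcharge_per_second=Decimal("0.0005"),
)
two_hours = 2 * 60 * 60
cpu_only = estimate_cost(quote, two_hours)            # 0.01 + 0.0001 * 7200 = 0.73
with_gpu = estimate_cost(quote, two_hours, gpu=True)  # 0.01 + 0.0006 * 7200 = 4.33
remaining = settle(Decimal("10.00"), cpu_only)        # 9.27
```

`Decimal` is used instead of floats so that repeated small charges do not accumulate rounding error in a credit balance.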
Routing policy (OpenRouter-style)
When multiple extensions satisfy a profile capability:
route:
  strategy: prefer-self-hosted | lowest-cost | lowest-latency | explicit
  fallback: [ext.compose-ssh, ext.daytona]
  constraints:
    max_cost_per_hour: null
    require_isolation: microvm
    region: eu
Default Coulomb posture: prefer-self-hosted on sandboxer01; SaaS for burst or capability gaps (GPU, desktop) once extensions exist.
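The routing policy can be sketched as a filter on constraints followed by a strategy-specific choice. In the sketch below, the `Candidate` fields, the candidate names `ext.alpha` and `ext.beta`, and all numbers are hypothetical. Isolation is matched exactly rather than by an ordering of levels, and the `fallback` list is left out; both are simplifications of this sketch, not statements about sand-boxer.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    self_hosted: bool
    isolation: str        # container | microvm | policy
    region: str
    cost_per_hour: float
    latency_ms: float


def select_extension(
    candidates: Iterable[Candidate],
    strategy: str,
    *,
    require_isolation: Optional[str] = None,
    region: Optional[str] = None,
    max_cost_per_hour: Optional[float] = None,
    explicit: Optional[str] = None,
) -> Optional[Candidate]:
    # Step 1: drop candidates that violate any constraint.
    eligible = [
        c for c in candidates
        if (require_isolation is None or c.isolation == require_isolation)
        and (region is None or c.region == region)
        and (max_cost_per_hour is None or c.cost_per_hour <= max_cost_per_hour)
    ]
    if not eligible:
        return None
    # Step 2: pick among the survivors according to the strategy.
    if strategy == "explicit":
        return next((c for c in eligible if c.name == explicit), None)
    if strategy == "prefer-self-hosted":
        return min(eligible, key=lambda c: (not c.self_hosted, c.cost_per_hour))
    if strategy == "lowest-cost":
        return min(eligible, key=lambda c: c.cost_per_hour)
    if strategy == "lowest-latency":
        return min(eligible, key=lambda c: c.latency_ms)
    raise ValueError(f"unknown strategy: {strategy}")


pool = [
    Candidate("ext.alpha", True, "container", "eu", 0.05, 900.0),
    Candidate("ext.beta", False, "microvm", "eu", 0.20, 300.0),
]
default_pick = select_extension(pool, "prefer-self-hosted")
microvm_pick = select_extension(pool, "prefer-self-hosted", require_isolation="microvm")
```

With no constraints the self-hosted candidate wins; once `require_isolation: microvm` is set, only the second candidate qualifies, which illustrates the "SaaS for capability gaps" posture.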
Security posture (documented limits)
sand-boxer commits to:
- Default-deny network unless profile explicitly allows egress
- Secrets resolved at provision boundary via ops-warden / secret refs
- Blast-radius isolation on dedicated hosts away from Railiance01 production
- Observable lifecycle and attributable actors (adm/agt/atm)
- Honest documentation: allowed tool paths can be abused by compromised agents
sand-boxer does not commit to intent-aware egress filtering in v1.
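The default-deny commitment can be illustrated with a small check over a profile's `network` block. The document leaves the interpretation of `egress` entries to each extension, so this sketch assumes the simplest case of exact hostname matches. The `egress_allowed` function is hypothetical, and treating a missing `default` as `deny` is an assumption of the sketch.

```python
from typing import Any, Dict


def egress_allowed(network: Dict[str, Any], host: str) -> bool:
    """Allow a destination only if it is listed, or if the default is 'allow'."""
    allowed = set(network.get("egress") or [])
    if host in allowed:
        return True
    # A missing or unrecognised default is treated as deny.
    return network.get("default", "deny") == "allow"


locked_down = {"default": "deny", "egress": []}
with_index = {"default": "deny", "egress": ["pypi.org"]}

print(egress_allowed(locked_down, "pypi.org"))     # False
print(egress_allowed(with_index, "pypi.org"))      # True
print(egress_allowed(with_index, "example.com"))   # False
```

A check like this is purely list-based, which is consistent with the stated limit: it says nothing about whether an allowed destination is being used for its intended purpose.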
Phased maturity
| Phase | Deliverable |
|---|---|
| 0 | Charter, research, profile schema, ext.compose-ssh design |
| 1 | Unified API + self-hosted compose-ssh + State Hub registration |
| 2 | Extension SDK + vm-packer + registry entries + routing |
| 3 | SaaS extensions + payments layer |
| 4 | Snapshot/restore + checkpoint profiles |
| 5 | Coulomb-native runtime ("best of brands") informed by extension ops data |
Phase 5 is explicitly later: learn from routing, billing, failure modes, and latency before building an owned microVM runtime or control plane.
Open questions (for workplans)
- Does exec live in the sand-boxer API or only in glas-harness via SSH?
- Payments: integrate with existing fin-hub or standalone credits first?
- Profile authorship: repo-local YAML vs hub-managed catalog?
- wise-validator: fork e2e-framework reporter or new contract from day one?
These belong in SAND-WP-0002+ design workplans, not INTENT.md.