| domain | repo | updated |
|---|---|---|
| infotech | sand-boxer | 2026-06-22 |
INTENT
sand-boxer is the Coulomb meta-framework for establishing sandboxes — a unified API and extension platform for provisioning every variation of isolated execution environment, from self-hosted compose stacks to metered SaaS runtimes. This file is the charter: why it exists, what it owns, and where sibling projects begin.
Research backing this charter lives in research/.
Why it exists
Custodian automation is moving from workstation-anchored execution to Railiance01-scheduled orchestration. That shift improves reliability but does not, by itself, answer the harder question: where can agentic and deterministic work run safely without the laptop filesystem, sleep cycles, and single-user blast radius?
The industry has exploded with sandbox answers — E2B, Modal, Daytona, OpenShell, OpenClaw-style Docker/SSH backends, hyperscaler interpreters — each with different APIs, billing models, and isolation postures. Coulomb needs one place to establish sandboxes regardless of backend, not a new integration per agent harness, validator, or codegen pipeline.
sand-boxer exists to be that place: OpenRouter for sandboxes, not for models.
Consumers call one API. Extensions delegate to the sandbox system that fits —
self-hosted on sandboxer01, inherited compose-ssh from the-custodian, or a
metered cloud provider. An integrated payments layer handles SaaS consumption
when Coulomb uses external capacity. Over time, operational learning may justify
a Coulomb-native best-of-brands runtime — but that is a later phase built on
evidence, not day-one ambition.
The workstation becomes optional for runtime. Railiance01 decides when work runs (via activity-core). sand-boxer decides where isolated execution happens. State Hub records what changed.
The governing principle
sand-boxer is the sandbox establishment service — profiles, provisioning, extension routing, placement, lifecycle, and metering. Nothing more.
It answers:
- Which sandbox recipe applies? Profile selection and version resolution.
- Which backend fulfills it? Extension routing (self-hosted vs SaaS).
- Where does it run? Host placement and blast-radius policy.
- How is isolation enforced? Network default-deny, TTL, resource limits, teardown guarantees — as declared by profile + extension.
- How does it become reachable? Consumer integration with ops-bridge and ops-warden — without owning tunnels or certificates.
- What happened? Lifecycle events, usage meters, State Hub registration.
- What did it cost? Payments and credits for metered extensions.
It must not become the agent harness, the e2e validator, the code generator, the scheduler, the work-state database, the connectivity authority, or production hosting on Railiance01.
The OpenRouter analogy
| OpenRouter | sand-boxer |
|---|---|
| Unified LLM access API | Unified sandbox establishment API |
| Routes across model providers | Routes across sandbox extensions |
| Provider metadata (price, context) | Profile metadata (isolation, cost, latency) |
| API keys, credits, usage billing | Payments layer for SaaS sandbox consumption |
| BYOK supported | BYOK for extension provider keys |
| Does not train models | Does not replace extension runtimes (until phase 5) |
sand-boxer is infrastructure routing, not product UX. Harnesses, validators, and inventors are customers.
Coulomb sibling boundaries
sand-boxer stays inside the sandboxing boundary. Three sibling Coulomb projects own adjacent concerns. Integration is contractual — they request sandboxes; sand-boxer establishes them.
Per-sibling integration contracts: docs/integrations/ (glas-harness,
wise-validator, snuggle-inventor).
glas-harness — agent harness
Owns: Gateway, tool orchestration, skills, memory, channels, subagent delegation, session semantics, sandbox consumption from the agent's perspective.
Does not own: Sandbox runtimes, profile catalog authority, host placement, extension adapters, isolation enforcement.
glas-harness configures when tools run in a sandbox (OpenClaw-style
mode / scope / workspaceAccess). sand-boxer provides the sandbox handle
and reachability descriptor.
wise-validator — e2e test and health
Owns: Validation workflows, health check semantics, test orchestration, pass/fail interpretation, structured result reporting to State Hub and CI.
Does not own: Remote host provisioning, compose lifecycle, port isolation, sandbox teardown.
wise-validator replaces the validation half of the-custodian/e2e-framework/.
It requests profile.compose-e2e (or successors), runs tests inside the
established environment, and owns the e2e.yml contract.
snuggle-inventor — code generation
Owns: Code generation, modernization pipelines, tech-spec and planning artifacts, PR-oriented output, human-in-the-loop review gates.
Does not own: Sandbox infrastructure, environment bootstrapping authority, secret stores, runtime metering.
snuggle-inventor may attach Blitzy-style setup instructions and secret references as profile inputs. sand-boxer resolves secrets at the provision boundary; generated code never transits sand-boxer APIs.
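A minimal sketch of resolving secrets at the provision boundary, as the paragraph above describes. The `secret://` reference scheme, the function name, and the in-memory store are all assumptions made for illustration; the charter does not define them.

```python
from typing import Mapping

SECRET_PREFIX = "secret://"  # hypothetical reference scheme, not defined by the charter


def resolve_secret_refs(inputs: Mapping[str, str], store: Mapping[str, str]) -> dict[str, str]:
    """Replace secret references with their values at provision time.

    Non-secret inputs pass through unchanged. An unknown reference raises
    KeyError, so a sandbox is never provisioned with a dangling ref.
    """
    resolved = {}
    for key, value in inputs.items():
        if value.startswith(SECRET_PREFIX):
            resolved[key] = store[value[len(SECRET_PREFIX):]]
        else:
            resolved[key] = value
    return resolved
```

Under this sketch the caller only ever handles references; the resolved values exist solely in the mapping handed to the extension at provision time.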
Boundary diagram
    glas-harness         wise-validator        snuggle-inventor
    (agent harness)      (e2e + health)        (code generation)
          │                     │                      │
          └─────────────────────┼──────────────────────┘
                                │ POST /v1/sandboxes
                                ▼
                           sand-boxer
                      (establish sandboxes)
                                │
                ┌───────────────┼───────────────┐
                ▼               ▼               ▼
         ext.compose-ssh    ext.modal        ext.e2b …
          (self-hosted)   (SaaS+meter)    (SaaS+meter)
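As a rough illustration of what a sibling might send to `POST /v1/sandboxes`, the sketch below builds a request body. Only the endpoint path, the profile name, and the `adm`/`agt`/`atm` attribution codes come from this charter; the class and every field name are assumptions, not the published API.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class CreateSandboxRequest:
    # All field names are illustrative assumptions, not the published API.
    profile: str                  # e.g. "profile.compose-e2e"
    consumer_kind: str            # "adm", "agt", or "atm"
    project_id: str               # calling project, e.g. "wise-validator"
    inputs: dict = field(default_factory=dict)

    def to_json(self) -> str:
        """Validate the attribution code and serialize the request body."""
        if self.consumer_kind not in {"adm", "agt", "atm"}:
            raise ValueError(f"unknown consumer kind: {self.consumer_kind}")
        return json.dumps(asdict(self), sort_keys=True)


body = CreateSandboxRequest(
    profile="profile.compose-e2e",
    consumer_kind="atm",
    project_id="wise-validator",
    inputs={"repo_ref": "main"},
).to_json()
```

The request names a profile rather than a backend, which matches the routing shown in the diagram: the consumer does not pick the extension.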
Existing Custodian repos (unchanged)
| Concern | Owner |
|---|---|
| Workstream, task, progress state | state-hub |
| Cron and orchestration | activity-core |
| SSH reverse tunnels | ops-bridge |
| SSH certificate issuance | ops-warden |
| Canon and agent instruction canon | the-custodian |
| Capability federation hub | reuse-surface |
| Production on Railiance01 | railiance-apps / domain repos |
| ADR-001 reconciliation | state-hub |
sand-boxer consumes ops-bridge and ops-warden; it does not subsume them.
What it is
sand-boxer is a meta-framework with four pillars:
1. Unified establishment API
One consistent surface for all sandbox variations:
- Create, inspect, extend, snapshot, recreate, destroy
- Profile-driven inputs (repo ref, compose bundle, setup metadata, secret refs)
- Consumer attribution (adm/agt/atm + calling project id)
- Lifecycle states: requested → provisioning → ready → active → expired → destroyed
Early versions may expose a subset; the API shape is designed for completeness.
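The lifecycle states listed above can be sketched as a small state machine. This is a minimal illustration that encodes only the linear path given in the list; failure or early-destroy edges are not described in this charter and are deliberately left out. The names `SandboxState` and `advance` are hypothetical.

```python
from enum import Enum


class SandboxState(str, Enum):
    REQUESTED = "requested"
    PROVISIONING = "provisioning"
    READY = "ready"
    ACTIVE = "active"
    EXPIRED = "expired"
    DESTROYED = "destroyed"


# Each state may only move to the next one in definition order (assumption:
# the charter's arrow sequence is the complete set of transitions).
_ORDER = list(SandboxState)
ALLOWED = {a: {b} for a, b in zip(_ORDER, _ORDER[1:])}
ALLOWED[SandboxState.DESTROYED] = set()  # terminal state


def advance(current: SandboxState, target: SandboxState) -> SandboxState:
    """Return target if the transition is allowed, else raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Keeping the transition table as data makes it easy to add edges later without touching `advance`.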
2. Profile catalog
Named, versioned recipes — not one-off containers:
- Extension binding (ext.compose-ssh, ext.vm-packer, ext.e2b, …)
- Isolation level, network policy, workspace mode (mirror | remote-canonical)
- Scope default (agent | session | shared)
- TTL, resource limits, placement preference
- Setup metadata (natural-language bootstrap instructions for extensions)
- Registered in registry/ and federated via reuse-surface
Profiles collect good ideas from OpenClaw (backend/scope/workspace), Hermes (labeled reuse, resource limits), Blitzy (setup instructions, secret boundary), and hosted platforms (checkpoint, persistence classes) into one schema.
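A profile record along the lines of the list above might look like the following. This is a sketch using standard-library dataclasses rather than the project's actual schema; the field names, defaults, and version string are assumptions, while the allowed workspace modes and scopes are taken from the list.

```python
from dataclasses import dataclass

WORKSPACE_MODES = {"mirror", "remote-canonical"}
SCOPES = {"agent", "session", "shared"}


@dataclass(frozen=True)
class Profile:
    # Field names and defaults are assumptions drawn from the list above.
    name: str
    version: str
    extension: str
    workspace_mode: str = "mirror"
    scope: str = "session"
    ttl_seconds: int = 3600
    network_policy: str = "default-deny"

    def __post_init__(self):
        if not self.extension.startswith("ext."):
            raise ValueError("extension binding must name an ext.* backend")
        if self.workspace_mode not in WORKSPACE_MODES:
            raise ValueError(f"workspace_mode must be one of {sorted(WORKSPACE_MODES)}")
        if self.scope not in SCOPES:
            raise ValueError(f"scope must be one of {sorted(SCOPES)}")
        if self.ttl_seconds <= 0:
            raise ValueError("ttl_seconds must be positive")


# Hypothetical instance; the version number is made up for the example.
compose_e2e = Profile(name="profile.compose-e2e", version="0.1.0", extension="ext.compose-ssh")
```

Validating at construction time means an invalid recipe fails when it is registered, not when a consumer first requests it.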
3. Extension platform
Extensions delegate to sandbox systems and services:
| Class | Examples | Billing |
|---|---|---|
| Self-hosted | compose-ssh, vm-packer, Daytona OSS, OpenShell | Infra allocation |
| SaaS consumption | E2B, Modal, Daytona cloud, future providers | Payments layer |
Each extension implements a provision / ready / teardown contract (optional snapshot / cost estimate). Extensions ship as plugins; third-party and Coulomb-native backends use the same interface.
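The provision / ready / teardown contract could be expressed as a structural interface. The sketch below is one possible shape, not the real extension SDK: the method signatures are assumptions, and `InMemoryExtension` is a toy backend that starts nothing and exists only to exercise the contract.

```python
from typing import Protocol


class Extension(Protocol):
    name: str

    def provision(self, profile_name: str, inputs: dict) -> str: ...
    def ready(self, handle: str) -> bool: ...
    def teardown(self, handle: str) -> None: ...


class InMemoryExtension:
    """Toy backend used only to exercise the contract; it starts nothing."""

    name = "ext.in-memory"  # hypothetical, not a real sand-boxer extension

    def __init__(self) -> None:
        self._live: set[str] = set()
        self._counter = 0

    def provision(self, profile_name: str, inputs: dict) -> str:
        self._counter += 1
        handle = f"{profile_name}-{self._counter}"
        self._live.add(handle)
        return handle

    def ready(self, handle: str) -> bool:
        return handle in self._live

    def teardown(self, handle: str) -> None:
        # Idempotent, so a repeated teardown never raises.
        self._live.discard(handle)


def run_once(ext: Extension, profile_name: str) -> bool:
    """Provision, check readiness, and always tear down."""
    handle = ext.provision(profile_name, {})
    try:
        return ext.ready(handle)
    finally:
        ext.teardown(handle)
```

Using a `Protocol` rather than a base class lets third-party backends satisfy the contract without importing anything from sand-boxer.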
4. Payments and metering
For metered SaaS extensions:
- Org/workspace credits and usage accounting
- Pre-create cost estimates; post-destroy actuals
- BYOK for provider API keys where supported
- Export to domain billing systems — sand-boxer meters sandbox consumption, not general payments
Self-hosted extensions record allocation (host, duration), not external spend.
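The estimate-then-actuals flow for metered extensions can be illustrated with a small credit ledger. The hold-and-settle model, the class name, and the sandbox id are assumptions; the charter only states that estimates precede creation and actuals follow destruction.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class CreditLedger:
    """Hold an estimate at create time, then settle to the actual at destroy."""

    balance: Decimal
    holds: dict = field(default_factory=dict)

    def reserve(self, sandbox_id: str, estimate: Decimal) -> None:
        """Place a hold for the pre-create cost estimate."""
        if estimate > self.balance:
            raise ValueError("insufficient credits for estimated cost")
        self.balance -= estimate
        self.holds[sandbox_id] = estimate

    def settle(self, sandbox_id: str, actual: Decimal) -> Decimal:
        """Release the hold, charge the actual cost, return the new balance."""
        held = self.holds.pop(sandbox_id)
        self.balance += held - actual
        return self.balance


ledger = CreditLedger(balance=Decimal("10.00"))
ledger.reserve("sbx-1", Decimal("2.50"))               # pre-create estimate
remaining = ledger.settle("sbx-1", Decimal("1.75"))    # post-destroy actual
```

`Decimal` is used instead of floats so that credit arithmetic stays exact; here the final balance is 8.25.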
What it is not
| Concern | Owner | sand-boxer role |
|---|---|---|
| Agent gateway, tools, memory, channels | glas-harness | Customer API |
| E2e tests, health checks, validation | wise-validator | Customer API |
| Code generation, tech specs, AAP | snuggle-inventor | Customer API |
| When work runs | activity-core | None |
| What tasks exist | state-hub | Registers lifecycle only |
| Tunnels | ops-bridge | Consumer |
| Certs | ops-warden | Consumer |
| Intent-aware egress / prompt security | Research frontier | Document limits only |
sand-boxer provides blast-radius isolation and governed reachability. It does not protect against a compromised agent abusing allowed egress paths (git, npm, curl to allowlisted hosts). Security runbooks must state this explicitly.
Strategic context
Workstation automation is interim
Local timers and laptop scripts bootstrapped ADR-001 sync. Railiance01 activity-core schedules are the direction. Workstation paths remain only where no sandbox alternative exists yet.
Host topology
| Layer | Role |
|---|---|
| Railiance01 | Production k3s, activity-core, Temporal — not agent dev runtime |
| sandboxer01 | Dedicated sandbox host — preferred blast-radius isolation |
| CoulombCore | Interim sandbox host during migration |
| Workstation (WSL) | Control-plane anchor today — not target execution surface |
| SaaS extensions | Burst / capability gap (GPU, desktop) via payments layer |
Lineage
sand-boxer generalizes patterns split across the-custodian:
| Legacy | sand-boxer | Sibling |
|---|---|---|
| e2e-framework/ provision/teardown | ext.compose-ssh | wise-validator owns test run |
| e2e-framework/ health + test + report | (none) | wise-validator |
| infra/build-machines/ | ext.vm-packer | (none) |
| Agent sandbox config (future) | API consumer | glas-harness |
the-custodian stays governance-focused; sand-boxer becomes the execution
venue catalog.
Phase 5: Coulomb-native runtime (later)
After operating extensions in production — observing latency, cost, failure modes, isolation gaps — sand-boxer may ship an owned best-of-brands sandboxing solution combining:
- Persistent labeled workspaces (Hermes pattern)
- Default-deny policy layer (OpenShell lessons)
- Fast resume / checkpoint (industry baseline)
- Self-hosted economics (Daytona/OpenSandbox lessons)
This is not v1 scope. Extensions and payments come first; native runtime follows evidence.
Intended users
- Human operators (adm): profiles, hosts, extensions, credits, lifecycle
- LLM agents (agt): via glas-harness, snuggle-inventor, or direct API
- Deterministic automations (atm): via wise-validator, activity-core, CI
- Extension authors: implement backend adapters against the extension contract
- Platform integrators: register capabilities, federate via reuse-surface
Design principles
- Meta-framework, not monolith: one API; many extensions; optional native runtime later
- Profiles over one-offs: every sandbox type is named, versioned, registered
- Prefer self-hosted: SaaS via explicit routing policy, not silent default
- Blast-radius isolation: dedicated hosts; never jeopardize Railiance01 production
- Reachability, not ownership: ops-bridge + ops-warden as consumers
- Secrets at the boundary: resolve at provision; never in agent-visible workspace
- Observable lifecycle: every state transition attributable and queryable
- Disposable by default: TTL-bound; persistence and checkpoint are explicit
- Honest security: sandboxing limits blast radius; it is not intent enforcement
- Registry-first reuse: capabilities in registry/ before ad hoc duplication
- Payments transparency: estimate before create; meter on destroy for SaaS
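The "prefer self-hosted" principle above can be sketched as a routing function in which SaaS is reachable only through an explicit flag. The `Candidate` type, the `allow_saas` parameter, and the first-match ordering are assumptions made for illustration, not the project's routing policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    extension: str
    self_hosted: bool
    available: bool


def route(candidates: list[Candidate], allow_saas: bool = False) -> Candidate:
    """Pick the first available self-hosted backend; use SaaS only if allowed."""
    live = [c for c in candidates if c.available]
    for c in live:
        if c.self_hosted:
            return c
    # No self-hosted backend is available; fall back only on explicit opt-in.
    if allow_saas and live:
        return live[0]
    raise LookupError("no eligible extension; SaaS requires explicit allow_saas=True")
```

Because `allow_saas` defaults to `False`, a caller that forgets to set a policy gets an error rather than silent metered spend.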
Near-term outcomes
- Charter and research: INTENT.md, research/, profile schema draft
- First self-hosted extension: ext.compose-ssh from e2e-framework lineage
- Unified API v0: create / get / destroy / recreate + State Hub registration
- First profile: profile.compose-e2e for wise-validator migration
- Registry entry: capability.execution.sandbox-provision via reuse-surface
- Extension SDK sketch: contract for P1 backends (vm-packer, Daytona OSS)
- Sibling integration notes: glas-harness, wise-validator, snuggle-inventor API expectations documented
Maturity target
A mature sand-boxer is Coulomb's default way to establish any sandbox:
- glas-harness requests agent dev sandboxes without choosing Docker vs Modal vs SSH
- wise-validator requests validation environments without owning provisioners
- snuggle-inventor requests build sandboxes with setup metadata and secret refs
- activity-core and CI request bounded venues with consistent lifecycle visibility
- Operators route spend across self-hosted and SaaS with one credits model
- A Coulomb-native runtime — if warranted — wins on ops data, not speculation
The workstation is optional. The harness is not sand-boxer. The validator is not sand-boxer. The inventor is not sand-boxer. Establishing the box is.