tegwick d6d3155792 Implement SAND-WP-0002 meta-framework foundation (T01–T09)

Add meta-framework spec, pydantic schemas, profile/extension YAML, extension
registry, ext.compose-ssh backend, SandboxManager with State Hub events, CLI
commands, integration docs, capability registry entry, and compose-e2e runbook.
Nine unit tests pass. T10 remote smoke test remains for operator.

2026-06-22 23:27:31 +02:00


domain	repo	updated
infotech	sand-boxer	2026-06-22

INTENT

sand-boxer is the Coulomb meta-framework for establishing sandboxes — a unified API and extension platform for provisioning every variation of isolated execution environment, from self-hosted compose stacks to metered SaaS runtimes. This file is the charter: why it exists, what it owns, and where sibling projects begin.

Research backing this charter lives in research/.

Why it exists

Custodian automation is moving from workstation-anchored execution to Railiance01-scheduled orchestration. That shift improves reliability but does not, by itself, answer the harder question: where can agentic and deterministic work run safely without the laptop filesystem, sleep cycles, and single-user blast radius?

The industry has exploded with sandbox answers — E2B, Modal, Daytona, OpenShell, OpenClaw-style Docker/SSH backends, hyperscaler interpreters — each with different APIs, billing models, and isolation postures. Coulomb needs one place to establish sandboxes regardless of backend, not a new integration per agent harness, validator, or codegen pipeline.

sand-boxer exists to be that place: OpenRouter for sandboxes, not for models.

Consumers call one API. Extensions delegate to the sandbox system that fits — self-hosted on sandboxer01, inherited compose-ssh from the-custodian, or a metered cloud provider. An integrated payments layer handles SaaS consumption when Coulomb uses external capacity. Over time, operational learning may justify a Coulomb-native best-of-brands runtime — but that is a later phase built on evidence, not day-one ambition.

The workstation becomes optional for runtime. Railiance01 decides when work runs (via activity-core). sand-boxer decides where isolated execution happens. State Hub records what changed.

The governing principle

sand-boxer is the sandbox establishment service — profiles, provisioning, extension routing, placement, lifecycle, and metering. Nothing more.

It answers:

Which sandbox recipe applies? Profile selection and version resolution.
Which backend fulfills it? Extension routing (self-hosted vs SaaS).
Where does it run? Host placement and blast-radius policy.
How is isolation enforced? Network default-deny, TTL, resource limits, teardown guarantees — as declared by profile + extension.
How does it become reachable? Consumer integration with ops-bridge and ops-warden — without owning tunnels or certificates.
What happened? Lifecycle events, usage meters, State Hub registration.
What did it cost? Payments and credits for metered extensions.

It must not become the agent harness, the e2e validator, the code generator, the scheduler, the work-state database, the connectivity authority, or production hosting on Railiance01.
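As an illustration of the first two questions (profile selection, then extension routing), here is a minimal Python sketch. The `PROFILES` mapping, the `route_extension` function, the `allow_saas` flag, and the `profile.gpu-burst` entry are hypothetical names chosen for the example. Only the extension ids and `profile.compose-e2e` appear in this charter.

```python
# Extension ids named in this charter, grouped by class.
SELF_HOSTED = {"ext.compose-ssh", "ext.vm-packer"}
SAAS = {"ext.e2b", "ext.modal"}

# Hypothetical catalog: profile name -> candidate extensions, preferred first.
PROFILES = {
    "profile.compose-e2e": ["ext.compose-ssh"],
    "profile.gpu-burst": ["ext.vm-packer", "ext.modal"],  # illustrative only
}


class RoutingError(Exception):
    """Raised when no eligible extension can fulfil a profile."""


def route_extension(profile: str, available: set, allow_saas: bool = False) -> str:
    """Return the first available candidate; SaaS only when explicitly allowed."""
    try:
        candidates = PROFILES[profile]
    except KeyError:
        raise RoutingError(f"unknown profile: {profile}") from None
    for ext in candidates:
        if ext not in available:
            continue  # backend not installed or not healthy
        if ext in SAAS and not allow_saas:
            continue  # SaaS is never a silent default
        return ext
    raise RoutingError(f"no eligible extension for {profile}")
```

Under these assumptions, a request for `profile.gpu-burst` with only `ext.modal` available fails unless the caller passes `allow_saas=True`. That is one way to make the self-hosted preference explicit rather than implied.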

The OpenRouter analogy

OpenRouter	sand-boxer
Unified LLM access API	Unified sandbox establishment API
Routes across model providers	Routes across sandbox extensions
Provider metadata (price, context)	Profile metadata (isolation, cost, latency)
API keys, credits, usage billing	Payments layer for SaaS sandbox consumption
BYOK supported	BYOK for extension provider keys
Does not train models	Does not replace extension runtimes (until phase 5)

sand-boxer is infrastructure routing, not product UX. Harnesses, validators, and inventors are customers.

Coulomb sibling boundaries

sand-boxer stays inside the sandboxing boundary. Three sibling Coulomb projects own adjacent concerns. Integration is contractual — they request sandboxes; sand-boxer establishes them.

Per-sibling integration contracts: docs/integrations/ (glas-harness, wise-validator, snuggle-inventor).

glas-harness — agent harness

Owns: Gateway, tool orchestration, skills, memory, channels, subagent delegation, session semantics, sandbox consumption from the agent's perspective.

Does not own: Sandbox runtimes, profile catalog authority, host placement, extension adapters, isolation enforcement.

glas-harness configures when tools run in a sandbox (OpenClaw-style mode / scope / workspaceAccess). sand-boxer provides the sandbox handle and reachability descriptor.

wise-validator — e2e test and health

Owns: Validation workflows, health check semantics, test orchestration, pass/fail interpretation, structured result reporting to State Hub and CI.

Does not own: Remote host provisioning, compose lifecycle, port isolation, sandbox teardown.

wise-validator replaces the validation half of the-custodian/e2e-framework/. It requests profile.compose-e2e (or successors), runs tests inside the established environment, and owns the e2e.yml contract.

snuggle-inventor — code generation

Owns: Code generation, modernization pipelines, tech-spec and planning artifacts, PR-oriented output, human-in-the-loop review gates.

Does not own: Sandbox infrastructure, environment bootstrapping authority, secret stores, runtime metering.

snuggle-inventor may attach Blitzy-style setup instructions and secret references as profile inputs. sand-boxer resolves secrets at the provision boundary; generated code never transits sand-boxer APIs.
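Resolving secrets at the provision boundary could look roughly like the sketch below. The `secret://` reference scheme, the `resolve_inputs` function, and the `lookup` callable are assumptions made for illustration. The charter states only that references are resolved at provision time and do not reach the agent-visible workspace.

```python
from typing import Callable, Dict, Mapping

SECRET_PREFIX = "secret://"  # hypothetical reference scheme, not from the charter


def resolve_inputs(inputs: Mapping[str, str], lookup: Callable[[str], str]) -> Dict[str, str]:
    """Return a copy of the inputs with secret references replaced by values.

    The caller keeps the original mapping (references only) for logging and
    lifecycle events; the resolved copy is handed to the extension and then
    discarded.
    """
    resolved = {}
    for key, value in inputs.items():
        if value.startswith(SECRET_PREFIX):
            resolved[key] = lookup(value[len(SECRET_PREFIX):])
        else:
            resolved[key] = value
    return resolved


# Usage with an in-memory store standing in for a real secret backend.
store = {"npm-token": "s3cr3t"}
profile_inputs = {"NPM_TOKEN": "secret://npm-token", "NODE_ENV": "test"}
provision_env = resolve_inputs(profile_inputs, store.__getitem__)
```

Keeping the unresolved mapping as the only thing that is recorded means a State Hub event can cite `secret://npm-token` without ever carrying the value.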

Boundary diagram

  glas-harness          wise-validator         snuggle-inventor
  (agent harness)       (e2e + health)         (code generation)
        │                     │                      │
        └─────────────────────┼──────────────────────┘
                              │  POST /v1/sandboxes
                              ▼
                        sand-boxer
                   (establish sandboxes)
                              │
              ┌───────────────┼───────────────┐
              ▼               ▼               ▼
        ext.compose-ssh   ext.modal      ext.e2b …
        (self-hosted)     (SaaS+meter)   (SaaS+meter)
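
A request body for the `POST /v1/sandboxes` call in the diagram might be assembled as in the sketch below. The field names (`profile`, `consumer`, `actor`, `project`, `inputs`) and the `build_create_request` helper are hypothetical. The charter fixes only the endpoint path and the adm / agt / atm attribution classes.

```python
import json
from typing import Optional

ACTOR_CLASSES = {"adm", "agt", "atm"}  # operator, LLM agent, automation


def build_create_request(profile: str, actor: str, project: str,
                         inputs: Optional[dict] = None) -> str:
    """Serialize a sandbox request with consumer attribution (assumed shape)."""
    if actor not in ACTOR_CLASSES:
        raise ValueError(f"unknown actor class: {actor}")
    body = {
        "profile": profile,
        "consumer": {"actor": actor, "project": project},
        "inputs": inputs or {},
    }
    return json.dumps(body, sort_keys=True)


# Example: wise-validator, acting as an automation, asks for an e2e environment.
payload = build_create_request("profile.compose-e2e", "atm", "wise-validator")
```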

Existing Custodian repos (unchanged)

Concern	Owner
Workstream, task, progress state	state-hub
Cron and orchestration	activity-core
SSH reverse tunnels	ops-bridge
SSH certificate issuance	ops-warden
Canon and agent instruction canon	the-custodian
Capability federation hub	reuse-surface
Production on Railiance01	railiance-apps / domain repos
ADR-001 reconciliation	state-hub

sand-boxer consumes ops-bridge and ops-warden; it does not subsume them.

What it is

sand-boxer is a meta-framework with four pillars:

1. Unified establishment API

One consistent surface for all sandbox variations:

Create, inspect, extend, snapshot, recreate, destroy
Profile-driven inputs (repo ref, compose bundle, setup metadata, secret refs)
Consumer attribution (adm / agt / atm + calling project id)
Lifecycle states: requested → provisioning → ready → active → expired → destroyed

Early versions may expose a subset; the API shape is designed for completeness.
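
The lifecycle states listed above can be modelled as a small state machine. The sketch below assumes the states advance strictly in the listed order. It also assumes that an explicit destroy is allowed from any state except `destroyed`, which the charter does not specify.

```python
from enum import Enum


class State(str, Enum):
    REQUESTED = "requested"
    PROVISIONING = "provisioning"
    READY = "ready"
    ACTIVE = "active"
    EXPIRED = "expired"
    DESTROYED = "destroyed"


ORDER = list(State)  # Enum iteration preserves definition order


def allowed(current: State, target: State) -> bool:
    """True if the transition is legal under the assumptions stated above."""
    if current is State.DESTROYED:
        return False  # terminal state
    if target is State.DESTROYED:
        return True  # assumption: early destroy from any live state
    return ORDER.index(target) == ORDER.index(current) + 1


def advance(current: State, target: State) -> State:
    """Return the new state, or raise if the transition is illegal."""
    if not allowed(current, target):
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Routing every change through a single `advance` function is one way to make each transition attributable, since that function is the natural place to emit a lifecycle event.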

2. Profile catalog

Named, versioned recipes — not one-off containers:

Extension binding (ext.compose-ssh, ext.vm-packer, ext.e2b, …)
Isolation level, network policy, workspace mode (mirror | remote-canonical)
Scope default (agent | session | shared)
TTL, resource limits, placement preference
Setup metadata (natural-language bootstrap instructions for extensions)
Registered in registry/ and federated via reuse-surface

Profiles collect good ideas from OpenClaw (backend/scope/workspace), Hermes (labeled reuse, resource limits), Blitzy (setup instructions, secret boundary), and hosted platforms (checkpoint, persistence classes) into one schema.
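
A profile record along these lines could be sketched with a frozen dataclass. The repository's actual schemas are pydantic models that are not shown here, so the field names, defaults, and the `ref` helper below are assumptions. Only the enumerated workspace modes and scopes come from the list above.

```python
from dataclasses import dataclass

WORKSPACE_MODES = {"mirror", "remote-canonical"}
SCOPES = {"agent", "session", "shared"}


@dataclass(frozen=True)
class Profile:
    name: str                              # e.g. "profile.compose-e2e"
    version: str                           # version string; format is an assumption
    extension: str                         # e.g. "ext.compose-ssh"
    workspace_mode: str = "mirror"         # assumed default
    scope: str = "session"                 # assumed default
    network_policy: str = "default-deny"   # assumed default
    ttl_seconds: int = 3600                # assumed default
    setup: str = ""                        # natural-language bootstrap notes

    def __post_init__(self):
        # Reject values outside the enumerations given in the charter.
        if self.workspace_mode not in WORKSPACE_MODES:
            raise ValueError(f"invalid workspace mode: {self.workspace_mode}")
        if self.scope not in SCOPES:
            raise ValueError(f"invalid scope: {self.scope}")
        if self.ttl_seconds <= 0:
            raise ValueError("ttl_seconds must be positive")

    @property
    def ref(self) -> str:
        """Name plus version, so a request can pin an exact recipe."""
        return f"{self.name}@{self.version}"


compose_e2e = Profile("profile.compose-e2e", "0.1.0", "ext.compose-ssh")
```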

3. Extension platform

Extensions delegate to sandbox systems and services:

Class	Examples	Billing
Self-hosted	compose-ssh, vm-packer, Daytona OSS, OpenShell	Infra allocation
SaaS consumption	E2B, Modal, Daytona cloud, future providers	Payments layer

Each extension implements a provision / ready / teardown contract, with optional snapshot and cost-estimate hooks. Extensions ship as plugins; third-party and Coulomb-native backends use the same interface.
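
One possible shape for that contract is an abstract base class with three required methods and two optional hooks. The method signatures, the string handle, and the `InMemoryExtension` test double are illustrative assumptions, not the interface shipped in the repository.

```python
from abc import ABC, abstractmethod
from typing import Optional


class Extension(ABC):
    ext_id: str

    @abstractmethod
    def provision(self, profile_ref: str, inputs: dict) -> str:
        """Start provisioning and return an opaque sandbox handle."""

    @abstractmethod
    def ready(self, handle: str) -> bool:
        """Report whether the sandbox behind the handle is usable."""

    @abstractmethod
    def teardown(self, handle: str) -> None:
        """Destroy the sandbox; should be safe to call more than once."""

    def snapshot(self, handle: str) -> Optional[str]:
        return None  # optional hook: not supported by default

    def estimate_cost(self, profile_ref: str, ttl_seconds: int) -> Optional[float]:
        return None  # optional hook: self-hosted backends have no external spend


class InMemoryExtension(Extension):
    """Hypothetical test double that tracks handles in a set."""

    ext_id = "ext.in-memory"

    def __init__(self):
        self._live = set()
        self._count = 0

    def provision(self, profile_ref, inputs):
        self._count += 1
        handle = f"{self.ext_id}/{self._count}"
        self._live.add(handle)
        return handle

    def ready(self, handle):
        return handle in self._live

    def teardown(self, handle):
        self._live.discard(handle)  # idempotent: unknown handles are ignored
```

Making `teardown` idempotent in the double reflects the teardown guarantees the charter asks of profiles and extensions; a retry after a partial failure should not raise.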

4. Payments and metering

For metered SaaS extensions:

Org/workspace credits and usage accounting
Pre-create cost estimates; post-destroy actuals
BYOK for provider API keys where supported
Export to domain billing systems — sand-boxer meters sandbox consumption, not general payments

Self-hosted extensions record allocation (host, duration), not external spend.
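
For metered extensions, the pre-create estimate and post-destroy actual could be reconciled against a credit balance as sketched below. The reserve-then-settle pattern, the `CreditLedger` class, and the hourly rate model are assumptions. The charter specifies only that estimates precede creation and actuals follow destruction.

```python
from decimal import Decimal


class InsufficientCredits(Exception):
    pass


def estimate(rate_per_hour: Decimal, ttl_seconds: int) -> Decimal:
    """Upper-bound cost for a TTL-bound sandbox, rounded to cents."""
    return (rate_per_hour * ttl_seconds / Decimal(3600)).quantize(Decimal("0.01"))


class CreditLedger:
    def __init__(self, balance: Decimal):
        self.balance = balance
        self.holds = {}  # sandbox id -> reserved amount

    def reserve(self, sandbox_id: str, amount: Decimal) -> None:
        """Hold the estimate before the sandbox is created."""
        if amount > self.balance:
            raise InsufficientCredits(f"need {amount}, have {self.balance}")
        self.balance -= amount
        self.holds[sandbox_id] = amount

    def settle(self, sandbox_id: str, actual: Decimal) -> Decimal:
        """On destroy, release the hold and charge the metered actual."""
        held = self.holds.pop(sandbox_id)
        self.balance += held - actual
        return actual


ledger = CreditLedger(Decimal("10"))
quote = estimate(Decimal("0.50"), 7200)    # two hours at 0.50 per hour
ledger.reserve("sbx-1", quote)
ledger.settle("sbx-1", Decimal("0.40"))    # destroyed early, charged less
```

`Decimal` is used instead of `float` so that credit arithmetic stays exact when exported to a domain billing system.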

What it is not

Concern	Owner	sand-boxer role
Agent gateway, tools, memory, channels	glas-harness	Customer API
E2e tests, health checks, validation	wise-validator	Customer API
Code generation, tech specs, AAP	snuggle-inventor	Customer API
When work runs	activity-core	None
What tasks exist	state-hub	Registers lifecycle only
Tunnels	ops-bridge	Consumer
Certs	ops-warden	Consumer
Intent-aware egress / prompt security	Research frontier	Document limits only

sand-boxer provides blast-radius isolation and governed reachability. It does not protect against a compromised agent abusing allowed egress paths (git, npm, curl to allowlisted hosts). Security runbooks must state this explicitly.

Strategic context

Workstation automation is interim

Local timers and laptop scripts bootstrapped ADR-001 sync. The target is scheduling by activity-core on Railiance01. Workstation paths remain only where no sandbox alternative exists yet.

Host topology

Layer	Role
Railiance01	Production k3s, activity-core, Temporal — not agent dev runtime
sandboxer01	Dedicated sandbox host — preferred blast-radius isolation
CoulombCore	Interim sandbox host during migration
Workstation (WSL)	Control-plane anchor today — not target execution surface
SaaS extensions	Burst / capability gap (GPU, desktop) via payments layer

Lineage

sand-boxer generalizes patterns split across the-custodian:

Legacy	sand-boxer	Sibling
e2e-framework/ provision/teardown	ext.compose-ssh	wise-validator owns test run
e2e-framework/ health + test + report	—	wise-validator
infra/build-machines/	ext.vm-packer	—
Agent sandbox config (future)	API consumer	glas-harness

the-custodian stays governance-focused; sand-boxer becomes the execution venue catalog.

Phase 5: Coulomb-native runtime (later)

After operating extensions in production — observing latency, cost, failure modes, isolation gaps — sand-boxer may ship an owned best-of-brands sandboxing solution combining:

Persistent labeled workspaces (Hermes pattern)
Default-deny policy layer (OpenShell lessons)
Fast resume / checkpoint (industry baseline)
Self-hosted economics (Daytona/OpenSandbox lessons)

This is not v1 scope. Extensions and payments come first; native runtime follows evidence.

Intended users

Human operators (adm) — profiles, hosts, extensions, credits, lifecycle
LLM agents (agt) — via glas-harness, snuggle-inventor, or direct API
Deterministic automations (atm) — via wise-validator, activity-core, CI
Extension authors — implement backend adapters against the extension contract
Platform integrators — register capabilities, federate via reuse-surface

Design principles

Meta-framework, not monolith — one API; many extensions; optional native runtime later
Profiles over one-offs — every sandbox type is named, versioned, registered
Prefer self-hosted — SaaS via explicit routing policy, not silent default
Blast-radius isolation — dedicated hosts; never jeopardize Railiance01 production
Reachability, not ownership — sand-boxer consumes ops-bridge and ops-warden; it does not own tunnels or certificates
Secrets at the boundary — resolve at provision; never in agent-visible workspace
Observable lifecycle — every state transition attributable and queryable
Disposable by default — TTL-bound; persistence and checkpoint are explicit
Honest security — sandboxing limits blast radius; it is not intent enforcement
Registry-first reuse — capabilities in registry/ before ad hoc duplication
Payments transparency — estimate before create; meter on destroy for SaaS
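
The disposable-by-default principle implies a periodic sweep that finds sandboxes past their TTL. A minimal sketch follows, assuming each record carries a creation time, a TTL in seconds, and an explicit persistence flag. The record layout and the `due_for_teardown` function are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def expires_at(created_at: datetime, ttl_seconds: int) -> datetime:
    return created_at + timedelta(seconds=ttl_seconds)


def due_for_teardown(sandboxes: dict, now: datetime) -> list:
    """Return ids of non-persistent sandboxes whose TTL has elapsed.

    `sandboxes` maps id -> (created_at, ttl_seconds, persistent).
    """
    return sorted(
        sid
        for sid, (created_at, ttl, persistent) in sandboxes.items()
        if not persistent and now >= expires_at(created_at, ttl)
    )


t0 = datetime(2026, 1, 1, tzinfo=timezone.utc)
boxes = {
    "a": (t0, 60, False),    # expired two minutes in
    "b": (t0, 3600, False),  # still within TTL
    "c": (t0, 60, True),     # persistence was requested explicitly
}
expired = due_for_teardown(boxes, t0 + timedelta(seconds=120))
```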

Near-term outcomes

Charter and research — INTENT.md, research/, profile schema draft
First self-hosted extension — ext.compose-ssh from e2e-framework lineage
Unified API v0 — create / get / destroy / recreate + State Hub registration
First profile — profile.compose-e2e for wise-validator migration
Registry entry — capability.execution.sandbox-provision via reuse-surface
Extension SDK sketch — contract for P1 backends (vm-packer, Daytona OSS)
Sibling integration notes — glas-harness, wise-validator, snuggle-inventor API expectations documented

Maturity target

A mature sand-boxer is Coulomb's default way to establish any sandbox:

glas-harness requests agent dev sandboxes without choosing Docker vs Modal vs SSH
wise-validator requests validation environments without owning provisioners
snuggle-inventor requests build sandboxes with setup metadata and secret refs
activity-core and CI request bounded venues with consistent lifecycle visibility
Operators route spend across self-hosted and SaaS with one credits model
A Coulomb-native runtime — if warranted — wins on ops data, not speculation

The workstation is optional. The harness is not sand-boxer. The validator is not sand-boxer. The inventor is not sand-boxer. Establishing the box is.
