sand-boxer/INTENT.md
tegwick d6d3155792 Implement SAND-WP-0002 meta-framework foundation (T01–T09)
Add meta-framework spec, pydantic schemas, profile/extension YAML, extension
registry, ext.compose-ssh backend, SandboxManager with State Hub events, CLI
commands, integration docs, capability registry entry, and compose-e2e runbook.
Nine unit tests pass. T10 remote smoke test remains for operator.
2026-06-22 23:27:31 +02:00

| domain | repo | updated |
| --- | --- | --- |
| infotech | sand-boxer | 2026-06-22 |

INTENT

sand-boxer is the Coulomb meta-framework for establishing sandboxes — a unified API and extension platform for provisioning every variation of isolated execution environment, from self-hosted compose stacks to metered SaaS runtimes. This file is the charter: why it exists, what it owns, and where sibling projects begin.

Research backing this charter lives in research/.


Why it exists

Custodian automation is moving from workstation-anchored execution to Railiance01-scheduled orchestration. That shift improves reliability but does not, by itself, answer the harder question: where can agentic and deterministic work run safely without the laptop filesystem, sleep cycles, and single-user blast radius?

The industry has exploded with sandbox answers — E2B, Modal, Daytona, OpenShell, OpenClaw-style Docker/SSH backends, hyperscaler interpreters — each with different APIs, billing models, and isolation postures. Coulomb needs one place to establish sandboxes regardless of backend, not a new integration per agent harness, validator, or codegen pipeline.

sand-boxer exists to be that place: OpenRouter for sandboxes, not for models.

Consumers call one API. Extensions delegate to the sandbox system that fits — self-hosted on sandboxer01, inherited compose-ssh from the-custodian, or a metered cloud provider. An integrated payments layer handles SaaS consumption when Coulomb uses external capacity. Over time, operational learning may justify a Coulomb-native best-of-brands runtime — but that is a later phase built on evidence, not day-one ambition.

The workstation becomes optional for runtime. Railiance01 decides when work runs (via activity-core). sand-boxer decides where isolated execution happens. State Hub records what changed.


The governing principle

sand-boxer is the sandbox establishment service — profiles, provisioning, extension routing, placement, lifecycle, and metering. Nothing more.

It answers:

  1. Which sandbox recipe applies? Profile selection and version resolution.
  2. Which backend fulfills it? Extension routing (self-hosted vs SaaS).
  3. Where does it run? Host placement and blast-radius policy.
  4. How is isolation enforced? Network default-deny, TTL, resource limits, teardown guarantees — as declared by profile + extension.
  5. How does it become reachable? Consumer integration with ops-bridge and ops-warden — without owning tunnels or certificates.
  6. What happened? Lifecycle events, usage meters, State Hub registration.
  7. What did it cost? Payments and credits for metered extensions.

It must not become the agent harness, the e2e validator, the code generator, the scheduler, the work-state database, the connectivity authority, or production hosting on Railiance01.


The OpenRouter analogy

| OpenRouter | sand-boxer |
| --- | --- |
| Unified LLM access API | Unified sandbox establishment API |
| Routes across model providers | Routes across sandbox extensions |
| Provider metadata (price, context) | Profile metadata (isolation, cost, latency) |
| API keys, credits, usage billing | Payments layer for SaaS sandbox consumption |
| BYOK supported | BYOK for extension provider keys |
| Does not train models | Does not replace extension runtimes (until phase 5) |

sand-boxer is infrastructure routing, not product UX. Harnesses, validators, and inventors are customers.


Coulomb sibling boundaries

sand-boxer stays inside the sandboxing boundary. Three sibling Coulomb projects own adjacent concerns. Integration is contractual — they request sandboxes; sand-boxer establishes them.

Per-sibling integration contracts: docs/integrations/ (glas-harness, wise-validator, snuggle-inventor).

glas-harness — agent harness

Owns: Gateway, tool orchestration, skills, memory, channels, subagent delegation, session semantics, sandbox consumption from the agent's perspective.

Does not own: Sandbox runtimes, profile catalog authority, host placement, extension adapters, isolation enforcement.

glas-harness configures when tools run in a sandbox (OpenClaw-style mode / scope / workspaceAccess). sand-boxer provides the sandbox handle and reachability descriptor.

wise-validator — e2e test and health

Owns: Validation workflows, health check semantics, test orchestration, pass/fail interpretation, structured result reporting to State Hub and CI.

Does not own: Remote host provisioning, compose lifecycle, port isolation, sandbox teardown.

wise-validator replaces the validation half of the-custodian/e2e-framework/. It requests profile.compose-e2e (or successors), runs tests inside the established environment, and owns the e2e.yml contract.

snuggle-inventor — code generation

Owns: Code generation, modernization pipelines, tech-spec and planning artifacts, PR-oriented output, human-in-the-loop review gates.

Does not own: Sandbox infrastructure, environment bootstrapping authority, secret stores, runtime metering.

snuggle-inventor may attach Blitzy-style setup instructions and secret references as profile inputs. sand-boxer resolves secrets at the provision boundary; generated code never transits sand-boxer APIs.
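
A minimal sketch of resolving secret references at the provision boundary. The `secret://` reference scheme, the `ProvisionInputs` shape, and the dictionary-backed store are assumptions made for illustration; the charter does not define them.

```python
from dataclasses import dataclass, field

# Hypothetical reference scheme; the real profile schema may differ.
SECRET_PREFIX = "secret://"


@dataclass
class ProvisionInputs:
    """Profile inputs as a consumer might submit them (illustrative only)."""
    setup_instructions: str
    env: dict[str, str] = field(default_factory=dict)


def resolve_secrets(inputs: ProvisionInputs, store: dict[str, str]) -> dict[str, str]:
    """Build the environment handed to the extension, with refs resolved.

    The consumer-supplied inputs are not mutated, so anything echoed back
    to the caller still holds references only, never secret values.
    """
    resolved: dict[str, str] = {}
    for key, value in inputs.env.items():
        if value.startswith(SECRET_PREFIX):
            name = value[len(SECRET_PREFIX):]
            if name not in store:
                raise KeyError(f"unknown secret reference: {name}")
            resolved[key] = store[name]
        else:
            resolved[key] = value
    return resolved


inputs = ProvisionInputs(
    setup_instructions="install deps, run migrations",
    env={"NPM_TOKEN": "secret://npm-token", "NODE_ENV": "test"},
)
runtime_env = resolve_secrets(inputs, store={"npm-token": "s3cr3t"})
```

Keeping the resolved values in a separate mapping is one way to make the boundary visible in code: the request object can be logged or returned to the consumer without leaking anything.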

Boundary diagram

  glas-harness          wise-validator         snuggle-inventor
  (agent harness)       (e2e + health)         (code generation)
        │                     │                      │
        └─────────────────────┼──────────────────────┘
                              │  POST /v1/sandboxes
                              ▼
                        sand-boxer
                   (establish sandboxes)
                              │
              ┌───────────────┼───────────────┐
              ▼               ▼               ▼
        ext.compose-ssh   ext.modal      ext.e2b …
        (self-hosted)     (SaaS+meter)   (SaaS+meter)
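
The fan-out at the bottom of the diagram could be sketched as a small routing function. This is an illustration only: `ExtensionInfo`, `route`, and the `allow_saas` flag are hypothetical names, and the real routing policy may weigh cost, latency, or placement as well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtensionInfo:
    name: str               # e.g. "ext.compose-ssh"
    self_hosted: bool       # False means a metered SaaS backend
    available: bool = True


def route(candidates: list[ExtensionInfo], allow_saas: bool = False) -> ExtensionInfo:
    """Prefer an available self-hosted extension; use SaaS only when allowed."""
    usable = [ext for ext in candidates if ext.available]
    for ext in usable:
        if ext.self_hosted:
            return ext
    # Only SaaS extensions remain; they require an explicit opt-in.
    if allow_saas and usable:
        return usable[0]
    raise LookupError("no eligible extension; SaaS routing was not enabled")


extensions = [
    ExtensionInfo("ext.modal", self_hosted=False),
    ExtensionInfo("ext.compose-ssh", self_hosted=True),
]
chosen = route(extensions)
```

With both backends available, `chosen` is the self-hosted `ext.compose-ssh` even though the SaaS entry is listed first. If the self-hosted backend is unavailable, the call fails unless the caller passes `allow_saas=True`.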

Existing Custodian repos (unchanged)

| Concern | Owner |
| --- | --- |
| Workstream, task, progress state | state-hub |
| Cron and orchestration | activity-core |
| SSH reverse tunnels | ops-bridge |
| SSH certificate issuance | ops-warden |
| Canon and agent instruction canon | the-custodian |
| Capability federation hub | reuse-surface |
| Production on Railiance01 | railiance-apps / domain repos |
| ADR-001 reconciliation | state-hub |

sand-boxer consumes ops-bridge and ops-warden; it does not subsume them.


What it is

sand-boxer is a meta-framework with four pillars:

1. Unified establishment API

One consistent surface for all sandbox variations:

  • Create, inspect, extend, snapshot, recreate, destroy
  • Profile-driven inputs (repo ref, compose bundle, setup metadata, secret refs)
  • Consumer attribution (adm / agt / atm + calling project id)
  • Lifecycle states: requested → provisioning → ready → active → expired → destroyed

Early versions may expose a subset; the API shape is designed for completeness.
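
The lifecycle chain above can be sketched as a small state machine. The linear order comes from the list; allowing a jump to `destroyed` from any earlier state is an assumption added here to model an explicit destroy call, and `advance` is a hypothetical helper name.

```python
from enum import Enum


class State(str, Enum):
    REQUESTED = "requested"
    PROVISIONING = "provisioning"
    READY = "ready"
    ACTIVE = "active"
    EXPIRED = "expired"
    DESTROYED = "destroyed"


_ORDER = list(State)  # Enum iteration preserves definition order

# Each state may move to its successor in the chain.
# Assumption: any non-terminal state may also go straight to DESTROYED.
ALLOWED: dict[State, set[State]] = {state: set() for state in _ORDER}
for current, following in zip(_ORDER, _ORDER[1:]):
    ALLOWED[current].add(following)
    ALLOWED[current].add(State.DESTROYED)


def advance(current: State, target: State) -> State:
    """Return the target state, or raise if the transition is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


state = State.REQUESTED
for step in (State.PROVISIONING, State.READY, State.ACTIVE):
    state = advance(state, step)
```

After the loop, `state` is `State.ACTIVE`. Moving backwards, for example from `ready` to `requested`, raises `ValueError`.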

2. Profile catalog

Named, versioned recipes — not one-off containers:

  • Extension binding (ext.compose-ssh, ext.vm-packer, ext.e2b, …)
  • Isolation level, network policy, workspace mode (mirror | remote-canonical)
  • Scope default (agent | session | shared)
  • TTL, resource limits, placement preference
  • Setup metadata (natural-language bootstrap instructions for extensions)
  • Registered in registry/ and federated via reuse-surface

Profiles collect good ideas from OpenClaw (backend/scope/workspace), Hermes (labeled reuse, resource limits), Blitzy (setup instructions, secret boundary), and hosted platforms (checkpoint, persistence classes) into one schema.
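
A rough sketch of how such a profile record might be modeled. Field names, defaults, and the version string are hypothetical; only the allowed workspace modes and scopes are taken from the list above. It uses plain dataclasses rather than the repository's actual schemas.

```python
from dataclasses import dataclass, field

WORKSPACE_MODES = {"mirror", "remote-canonical"}
SCOPES = {"agent", "session", "shared"}


@dataclass(frozen=True)
class Profile:
    name: str                         # e.g. "profile.compose-e2e"
    version: str                      # hypothetical version string
    extension: str                    # extension binding, e.g. "ext.compose-ssh"
    workspace_mode: str = "mirror"
    scope: str = "session"
    network_policy: str = "default-deny"
    ttl_seconds: int = 3600           # assumed default, not from the charter
    resource_limits: dict = field(default_factory=dict)
    setup: str = ""                   # natural-language bootstrap instructions

    def __post_init__(self) -> None:
        # Validate the enumerated fields and the TTL on construction.
        if self.workspace_mode not in WORKSPACE_MODES:
            raise ValueError(f"unknown workspace mode: {self.workspace_mode}")
        if self.scope not in SCOPES:
            raise ValueError(f"unknown scope: {self.scope}")
        if self.ttl_seconds <= 0:
            raise ValueError("ttl_seconds must be positive")


compose_e2e = Profile(
    name="profile.compose-e2e",
    version="0.1.0",
    extension="ext.compose-ssh",
    resource_limits={"cpus": 2, "memory_mb": 4096},
)
```

Freezing the dataclass reflects the idea that a named, versioned recipe should not change after registration; a change would be a new version.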

3. Extension platform

Extensions delegate to sandbox systems and services:

| Class | Examples | Billing |
| --- | --- | --- |
| Self-hosted | compose-ssh, vm-packer, Daytona OSS, OpenShell | Infra allocation |
| SaaS consumption | E2B, Modal, Daytona cloud, future providers | Payments layer |

Each extension implements a provision / ready / teardown contract (optional snapshot / cost estimate). Extensions ship as plugins; third-party and Coulomb-native backends use the same interface.
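
One way to picture that contract is an abstract base class with three required methods and two optional hooks. The class and method names below are assumptions for illustration, not the shipped extension SDK, and `InMemoryExtension` is a toy backend used only to exercise the interface.

```python
from abc import ABC, abstractmethod
from typing import Optional


class ExtensionBackend(ABC):
    """Hypothetical shape of the provision / ready / teardown contract."""

    @abstractmethod
    def provision(self, profile_name: str) -> str:
        """Start a sandbox and return an opaque handle."""

    @abstractmethod
    def ready(self, handle: str) -> bool:
        """Report whether the sandbox behind the handle is usable."""

    @abstractmethod
    def teardown(self, handle: str) -> None:
        """Destroy the sandbox; safe to call more than once."""

    # Optional hooks: backends override these only if they support them.
    def snapshot(self, handle: str) -> Optional[str]:
        return None

    def estimate_cost(self, profile_name: str) -> Optional[int]:
        return None


class InMemoryExtension(ExtensionBackend):
    """Toy backend that tracks handles in a set instead of real hosts."""

    def __init__(self) -> None:
        self._live: set[str] = set()
        self._counter = 0

    def provision(self, profile_name: str) -> str:
        self._counter += 1
        handle = f"mem-{self._counter}"
        self._live.add(handle)
        return handle

    def ready(self, handle: str) -> bool:
        return handle in self._live

    def teardown(self, handle: str) -> None:
        self._live.discard(handle)


backend = InMemoryExtension()
handle = backend.provision("profile.compose-e2e")
was_ready = backend.ready(handle)
backend.teardown(handle)
```

Returning `None` from the optional hooks lets a caller distinguish "not supported" from a real snapshot id or estimate without probing for attributes.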

4. Payments and metering

For metered SaaS extensions:

  • Org/workspace credits and usage accounting
  • Pre-create cost estimates; post-destroy actuals
  • BYOK for provider API keys where supported
  • Export to domain billing systems — sand-boxer meters sandbox consumption, not general payments

Self-hosted extensions record allocation (host, duration), not external spend.
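
The estimate-then-actual flow could be sketched as a credit ledger that places a hold at create time and settles at destroy time. The hold model, the `CreditLedger` name, and integer credit units are all assumptions; the charter only states that estimates precede creation and actuals follow destruction.

```python
from dataclasses import dataclass, field


@dataclass
class CreditLedger:
    """Hypothetical org-level ledger, in whole credit units."""
    balance: int
    holds: dict[str, int] = field(default_factory=dict)

    def reserve(self, sandbox_id: str, estimate: int) -> None:
        """Place a hold for the pre-create estimate."""
        if estimate > self.balance:
            raise ValueError("insufficient credits for estimate")
        self.balance -= estimate
        self.holds[sandbox_id] = estimate

    def settle(self, sandbox_id: str, actual: int) -> int:
        """Release the hold and charge the post-destroy actual.

        Returns the difference between the hold and the actual charge
        (positive when the sandbox cost less than estimated).
        """
        held = self.holds.pop(sandbox_id)
        self.balance += held - actual
        return held - actual


ledger = CreditLedger(balance=1000)
ledger.reserve("sbx-1", estimate=300)   # balance drops to 700 while held
refund = ledger.settle("sbx-1", actual=220)
```

Here the sandbox cost less than estimated, so `refund` is 80 and the balance ends at 780. An overrun would produce a negative difference and reduce the balance further.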


What it is not

| Concern | Owner | sand-boxer role |
| --- | --- | --- |
| Agent gateway, tools, memory, channels | glas-harness | Customer API |
| E2e tests, health checks, validation | wise-validator | Customer API |
| Code generation, tech specs, AAP | snuggle-inventor | Customer API |
| When work runs | activity-core | None |
| What tasks exist | state-hub | Registers lifecycle only |
| Tunnels | ops-bridge | Consumer |
| Certs | ops-warden | Consumer |
| Intent-aware egress / prompt security | Research frontier | Document limits only |

sand-boxer provides blast-radius isolation and governed reachability. It does not protect against a compromised agent abusing allowed egress paths (git, npm, curl to allowlisted hosts). Security runbooks must state this explicitly.
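
A small illustration of why a host allowlist limits blast radius without enforcing intent. The allowlist contents and the `egress_allowed` helper are hypothetical; real enforcement would sit in the network layer, not in Python.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist for a default-deny profile.
ALLOWED_HOSTS = frozenset({"github.com", "registry.npmjs.org"})


def egress_allowed(url: str, allowed: frozenset = ALLOWED_HOSTS) -> bool:
    """Default-deny check: permit only URLs whose host is on the allowlist."""
    host = urlsplit(url).hostname  # None when the URL carries no host
    return host is not None and host in allowed


blocked = egress_allowed("https://exfil.example.net/upload")
# The policy sees only the host, so a push to an attacker-controlled
# repository on an allowlisted host passes the same check.
abusable = egress_allowed("https://github.com/attacker/exfil.git")
```

`blocked` is `False` and `abusable` is `True`: the check cannot tell a legitimate clone from data exfiltration over an allowed path, which is the limit the paragraph above asks runbooks to state.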


Strategic context

Workstation automation is interim

Local timers and laptop scripts bootstrapped ADR-001 sync. Railiance01 activity-core schedules are the direction. Workstation paths remain only where no sandbox alternative exists yet.

Host topology

| Layer | Role |
| --- | --- |
| Railiance01 | Production k3s, activity-core, Temporal; not agent dev runtime |
| sandboxer01 | Dedicated sandbox host; preferred blast-radius isolation |
| CoulombCore | Interim sandbox host during migration |
| Workstation (WSL) | Control-plane anchor today; not target execution surface |
| SaaS extensions | Burst / capability gap (GPU, desktop) via payments layer |

Lineage

sand-boxer generalizes patterns split across the-custodian:

| Legacy | sand-boxer | Sibling |
| --- | --- | --- |
| e2e-framework/ provision/teardown | ext.compose-ssh | wise-validator owns test run |
| e2e-framework/ health + test + report | | wise-validator |
| infra/build-machines/ | ext.vm-packer | |
| Agent sandbox config (future) | API consumer | glas-harness |

the-custodian stays governance-focused; sand-boxer becomes the execution venue catalog.

Phase 5: Coulomb-native runtime (later)

After operating extensions in production — observing latency, cost, failure modes, isolation gaps — sand-boxer may ship an owned best-of-brands sandboxing solution combining:

  • Persistent labeled workspaces (Hermes pattern)
  • Default-deny policy layer (OpenShell lessons)
  • Fast resume / checkpoint (industry baseline)
  • Self-hosted economics (Daytona/OpenSandbox lessons)

This is not v1 scope. Extensions and payments come first; native runtime follows evidence.


Intended users

  • Human operators (adm) — profiles, hosts, extensions, credits, lifecycle
  • LLM agents (agt) — via glas-harness, snuggle-inventor, or direct API
  • Deterministic automations (atm) — via wise-validator, activity-core, CI
  • Extension authors — implement backend adapters against the extension contract
  • Platform integrators — register capabilities, federate via reuse-surface

Design principles

  • Meta-framework, not monolith — one API; many extensions; optional native runtime later
  • Profiles over one-offs — every sandbox type is named, versioned, registered
  • Prefer self-hosted — SaaS via explicit routing policy, not silent default
  • Blast-radius isolation — dedicated hosts; never jeopardize Railiance01 production
  • Reachability, not ownership — ops-bridge + ops-warden as consumers
  • Secrets at the boundary — resolve at provision; never in agent-visible workspace
  • Observable lifecycle — every state transition attributable and queryable
  • Disposable by default — TTL-bound; persistence and checkpoint are explicit
  • Honest security — sandboxing limits blast radius; it is not intent enforcement
  • Registry-first reuse — capabilities in registry/ before ad hoc duplication
  • Payments transparency — estimate before create; meter on destroy for SaaS

Near-term outcomes

  1. Charter and research: INTENT.md, research/, profile schema draft
  2. First self-hosted extension: ext.compose-ssh from e2e-framework lineage
  3. Unified API v0: create / get / destroy / recreate + State Hub registration
  4. First profile: profile.compose-e2e for wise-validator migration
  5. Registry entry: capability.execution.sandbox-provision via reuse-surface
  6. Extension SDK sketch: contract for P1 backends (vm-packer, Daytona OSS)
  7. Sibling integration notes: glas-harness, wise-validator, snuggle-inventor API expectations documented

Maturity target

A mature sand-boxer is Coulomb's default way to establish any sandbox:

  • glas-harness requests agent dev sandboxes without choosing Docker vs Modal vs SSH
  • wise-validator requests validation environments without owning provisioners
  • snuggle-inventor requests build sandboxes with setup metadata and secret refs
  • activity-core and CI request bounded venues with consistent lifecycle visibility
  • Operators route spend across self-hosted and SaaS with one credits model
  • A Coulomb-native runtime — if warranted — wins on ops data, not speculation

The workstation is optional. The harness is not sand-boxer. The validator is not sand-boxer. The inventor is not sand-boxer. Establishing the box is.