sand-boxer/INTENT.md
tegwick 9054d33e46 Clarify INTENT.md: sand-boxer self-sufficiency and sibling boundaries
Document that sand-boxer is self-sustained without wise-validator, that
validation is an optional downstream consumer, and update near-term outcomes
to reflect completed SAND-WP-0002 work.
2026-06-23 21:23:39 +02:00


| domain | repo | updated |
| --- | --- | --- |
| infotech | sand-boxer | 2026-06-23 |

INTENT

sand-boxer is the Coulomb meta-framework for establishing sandboxes — a unified API and extension platform for provisioning every variation of isolated execution environment, from self-hosted compose stacks to metered SaaS runtimes. This file is the charter: why it exists, what it owns, and where sibling projects begin.

Research backing this charter lives in research/.


Why it exists

Custodian automation is moving from workstation-anchored execution to Railiance01-scheduled orchestration. That shift improves reliability but does not, by itself, answer the harder question: where can agentic and deterministic work run safely without the laptop filesystem, sleep cycles, and single-user blast radius?

The industry has exploded with sandbox answers — E2B, Modal, Daytona, OpenShell, OpenClaw-style Docker/SSH backends, hyperscaler interpreters — each with different APIs, billing models, and isolation postures. Coulomb needs one place to establish sandboxes regardless of backend, not a new integration per agent harness, validator, or codegen pipeline.

sand-boxer exists to be that place: OpenRouter for sandboxes, not for models.

Consumers call one API. Extensions delegate to the sandbox system that fits — self-hosted on sandboxer01, inherited compose-ssh from the-custodian, or a metered cloud provider. An integrated payments layer handles SaaS consumption when Coulomb uses external capacity. Over time, operational learning may justify a Coulomb-native best-of-brands runtime — but that is a later phase built on evidence, not day-one ambition.

The workstation becomes optional for runtime. Railiance01 decides when work runs (via activity-core). sand-boxer decides where isolated execution happens. State Hub records what changed.


The governing principle

sand-boxer is the sandbox establishment service — profiles, provisioning, extension routing, placement, lifecycle, and metering. Nothing more.

It answers:

  1. Which sandbox recipe applies? Profile selection and version resolution.
  2. Which backend fulfills it? Extension routing (self-hosted vs SaaS).
  3. Where does it run? Host placement and blast-radius policy.
  4. How is isolation enforced? Network default-deny, TTL, resource limits, teardown guarantees — as declared by profile + extension.
  5. How does it become reachable? Consumer integration with ops-bridge and ops-warden — without owning tunnels or certificates.
  6. What happened? Lifecycle events, usage meters, State Hub registration.
  7. What did it cost? Payments and credits for metered extensions.

It must not become the agent harness, the e2e validator, the code generator, the scheduler, the work-state database, the connectivity authority, or production hosting on Railiance01.


Self-sufficiency

sand-boxer is self-sustained. It ships a complete establishment surface — profiles, extensions, CLI, lifecycle registration, and host telemetry (canary self-deploy) — without depending on wise-validator or any other sibling project.

| sand-boxer does | sand-boxer does not require |
| --- | --- |
| Provision and teardown sandboxes | wise-validator to exist or run |
| Prove reachability (ready) | Repo e2e/e2e.yml or test contracts |
| Emit sandbox lifecycle to State Hub | Validation pass/fail from another service |
| Dogfood via profile.sandbox-canary | Cross-repo use-case orchestration |

wise-validator is an optional downstream consumer, not a co-requisite. If wise-validator were never built, sand-boxer would still provision agent dev environments, compose stacks, and operator smoke paths. Conversely, wise-validator depends on sand-boxer (or a compatible establishment API) for environments — never the reverse.

Other peers (glas-harness, snuggle-inventor, activity-core, CI) are equally optional consumers of the same API.


The OpenRouter analogy

| OpenRouter | sand-boxer |
| --- | --- |
| Unified LLM access API | Unified sandbox establishment API |
| Routes across model providers | Routes across sandbox extensions |
| Provider metadata (price, context) | Profile metadata (isolation, cost, latency) |
| API keys, credits, usage billing | Payments layer for SaaS sandbox consumption |
| BYOK supported | BYOK for extension provider keys |
| Does not train models | Does not replace extension runtimes (until phase 5) |

sand-boxer is infrastructure routing, not product UX. Harnesses, validators, and inventors are customers.


Coulomb sibling boundaries

sand-boxer stays inside the sandboxing boundary. Three sibling Coulomb projects own adjacent concerns. Integration is contractual — they request sandboxes; sand-boxer establishes them.

Per-sibling integration contracts: docs/integrations/ (glas-harness, wise-validator, snuggle-inventor).

glas-harness — agent harness

Owns: Gateway, tool orchestration, skills, memory, channels, subagent delegation, session semantics, sandbox consumption from the agent's perspective.

Does not own: Sandbox runtimes, profile catalog authority, host placement, extension adapters, isolation enforcement.

glas-harness configures when tools run in a sandbox (OpenClaw-style mode / scope / workspaceAccess). sand-boxer provides the sandbox handle and reachability descriptor.

wise-validator — cross-repo use-case validation (optional consumer)

wise-validator owns: Use-case validation orchestration across the Coulomb ecosystem — health check semantics, test execution, pass/fail interpretation, structured validation results to State Hub and CI. It stabilizes use cases that may not run daily by detecting silent degeneration (dependency drift, host changes, cross-repo breakage) before someone depends on a stale path again.

sand-boxer does not own: Any of the above. sand-boxer does not parse e2e/e2e.yml, poll HTTP health endpoints, run test_command, or emit validation pass/fail. That boundary is intentional so establishment stays independent of validation.

Relationship: wise-validator is a separate project that may call sand-boxer to obtain environments (profile.compose-e2e, etc.), then runs the validation story inside them. sand-boxer establishes the box; wise-validator proves use cases still work. sand-boxer neither waits for nor requires wise-validator.

Lineage: wise-validator replaces the validation half of the-custodian/e2e-framework/; sand-boxer already owns the provision/teardown half (ext.compose-ssh).

snuggle-inventor — code generation

Owns: Code generation, modernization pipelines, tech-spec and planning artifacts, PR-oriented output, human-in-the-loop review gates.

Does not own: Sandbox infrastructure, environment bootstrapping authority, secret stores, runtime metering.

snuggle-inventor may attach Blitzy-style setup instructions and secret references as profile inputs. sand-boxer resolves secrets at the provision boundary; generated code never transits sand-boxer APIs.

Boundary diagram

  glas-harness          wise-validator         snuggle-inventor
  (agent harness)       (e2e + health)         (code generation)
        │                     │                      │
        └─────────────────────┼──────────────────────┘
                              │  POST /v1/sandboxes
                              ▼
                        sand-boxer
                   (establish sandboxes)
                              │
              ┌───────────────┼───────────────┐
              ▼               ▼               ▼
        ext.compose-ssh   ext.modal      ext.e2b …
        (self-hosted)     (SaaS+meter)   (SaaS+meter)

Existing Custodian repos (unchanged)

| Concern | Owner |
| --- | --- |
| Workstream, task, progress state | state-hub |
| Cron and orchestration | activity-core |
| SSH reverse tunnels | ops-bridge |
| SSH certificate issuance | ops-warden |
| Canon and agent instruction canon | the-custodian |
| Capability federation hub | reuse-surface |
| Production on Railiance01 | railiance-apps / domain repos |
| ADR-001 reconciliation | state-hub |

sand-boxer consumes ops-bridge and ops-warden; it does not subsume them.


What it is

sand-boxer is a meta-framework with four pillars:

1. Unified establishment API

One consistent surface for all sandbox variations:

  • Create, inspect, extend, snapshot, recreate, destroy
  • Profile-driven inputs (repo ref, compose bundle, setup metadata, secret refs)
  • Consumer attribution (adm / agt / atm + calling project id)
  • Lifecycle states: requested → provisioning → ready → active → expired → destroyed

Early versions may expose a subset; the API shape is designed for completeness.
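
The lifecycle listed above can be read as a small state machine. The sketch below encodes only the linear order given in this charter; the names `SandboxState`, `next_state`, and `can_transition` are hypothetical, and the rule that each state has exactly one legal successor is an assumption (a real service would likely also need failure and early-destroy paths, which the charter does not specify).

```python
from enum import Enum


class SandboxState(Enum):
    REQUESTED = "requested"
    PROVISIONING = "provisioning"
    READY = "ready"
    ACTIVE = "active"
    EXPIRED = "expired"
    DESTROYED = "destroyed"


# Enum members iterate in definition order, which matches the charter's order.
_ORDER = list(SandboxState)


def next_state(current: SandboxState) -> SandboxState:
    """Return the single successor of `current`; destroyed is terminal."""
    index = _ORDER.index(current)
    if index == len(_ORDER) - 1:
        raise ValueError("destroyed is terminal")
    return _ORDER[index + 1]


def can_transition(current: SandboxState, target: SandboxState) -> bool:
    """Assumption: only the immediate successor is a legal target."""
    if current is SandboxState.DESTROYED:
        return False
    return next_state(current) is target
```

Keeping the transition check in one place is what would make every state change attributable: a caller that cannot pass `can_transition` has no event to emit.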

2. Profile catalog

Named, versioned recipes — not one-off containers:

  • Extension binding (ext.compose-ssh, ext.vm-packer, ext.e2b, …)
  • Isolation level, network policy, workspace mode (mirror | remote-canonical)
  • Scope default (agent | session | shared)
  • TTL, resource limits, placement preference
  • Setup metadata (natural-language bootstrap instructions for extensions)
  • Registered in registry/ and federated via reuse-surface

Profiles collect good ideas from OpenClaw (backend/scope/workspace), Hermes (labeled reuse, resource limits), Blitzy (setup instructions, secret boundary), and hosted platforms (checkpoint, persistence classes) into one schema.
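
As a rough illustration of what one catalog entry could carry, the sketch below models a profile as an immutable record. The class name `Profile`, its field names, the use of seconds for TTL, and the example values are all assumptions; the charter names the concepts (extension binding, workspace mode, scope default, TTL, setup metadata) but not a concrete schema.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed values as listed in the profile catalog bullets.
WORKSPACE_MODES = {"mirror", "remote-canonical"}
SCOPES = {"agent", "session", "shared"}


@dataclass(frozen=True)
class Profile:
    name: str                      # e.g. "profile.sandbox-canary"
    version: str                   # hypothetical version string
    extension: str                 # extension binding, e.g. "ext.compose-ssh"
    workspace_mode: str            # mirror | remote-canonical
    scope_default: str             # agent | session | shared
    ttl_seconds: int               # assumption: TTL expressed in seconds
    setup_metadata: Optional[str] = None  # natural-language bootstrap notes

    def __post_init__(self) -> None:
        # Reject recipes that fall outside the documented value sets.
        if self.workspace_mode not in WORKSPACE_MODES:
            raise ValueError(f"bad workspace_mode: {self.workspace_mode!r}")
        if self.scope_default not in SCOPES:
            raise ValueError(f"bad scope_default: {self.scope_default!r}")
        if self.ttl_seconds <= 0:
            raise ValueError("ttl_seconds must be positive")


# Illustrative values only; the real canary profile may bind differently.
canary = Profile(
    name="profile.sandbox-canary",
    version="1",
    extension="ext.compose-ssh",
    workspace_mode="mirror",
    scope_default="shared",
    ttl_seconds=900,
)
```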

3. Extension platform

Extensions delegate to sandbox systems and services:

| Class | Examples | Billing |
| --- | --- | --- |
| Self-hosted | compose-ssh, vm-packer, Daytona OSS, OpenShell | Infra allocation |
| SaaS consumption | E2B, Modal, Daytona cloud, future providers | Payments layer |

Each extension implements a provision / ready / teardown contract (optional snapshot / cost estimate). Extensions ship as plugins; third-party and Coulomb-native backends use the same interface.
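
One way to picture the extension contract is an abstract base class with three required hooks and two optional ones. This is a sketch under stated assumptions, not the actual plugin interface: the class names, method signatures, the string handle, and the toy `InMemoryExtension` backend are all hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Optional


class Extension(ABC):
    """Required hooks: provision, ready, teardown. Optional: snapshot, cost estimate."""

    @abstractmethod
    def provision(self, profile: str, inputs: dict) -> str:
        """Create a sandbox and return an opaque handle."""

    @abstractmethod
    def ready(self, handle: str) -> bool:
        """Report whether the sandbox is reachable."""

    @abstractmethod
    def teardown(self, handle: str) -> None:
        """Destroy the sandbox; safe to call more than once."""

    def snapshot(self, handle: str) -> Optional[str]:
        """Optional hook; None means the backend does not support it."""
        return None

    def estimate_cost(self, profile: str) -> Optional[int]:
        """Optional hook; None means no estimate is available."""
        return None


class InMemoryExtension(Extension):
    """Toy backend for illustration only; it provisions nothing real."""

    def __init__(self) -> None:
        self._live = set()
        self._counter = 0

    def provision(self, profile: str, inputs: dict) -> str:
        self._counter += 1
        handle = f"{profile}#{self._counter}"
        self._live.add(handle)
        return handle

    def ready(self, handle: str) -> bool:
        return handle in self._live

    def teardown(self, handle: str) -> None:
        self._live.discard(handle)
```

Giving the optional hooks default implementations means a self-hosted and a SaaS backend can share one interface while differing only in which hooks they override.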

4. Payments and metering

For metered SaaS extensions:

  • Org/workspace credits and usage accounting
  • Pre-create cost estimates; post-destroy actuals
  • BYOK for provider API keys where supported
  • Export to domain billing systems — sand-boxer meters sandbox consumption, not general payments

Self-hosted extensions record allocation (host, duration), not external spend.
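
The estimate-then-actuals flow for metered extensions could be sketched as a credit ledger that places a hold at create time and settles at destroy time. The hold-and-settle mechanism, the `CreditLedger` class, and integer credit units are assumptions for illustration; the charter only states that estimates come before create and actuals after destroy.

```python
from dataclasses import dataclass, field


@dataclass
class CreditLedger:
    balance: int                      # assumption: whole credit units
    holds: dict = field(default_factory=dict)

    def reserve(self, sandbox_id: str, estimate: int) -> None:
        """Pre-create: hold the estimated cost against the balance."""
        if estimate > self.balance:
            raise ValueError("insufficient credits")
        self.balance -= estimate
        self.holds[sandbox_id] = estimate

    def settle(self, sandbox_id: str, actual: int) -> int:
        """Post-destroy: swap the hold for the actual cost.

        Returns the refund; a negative value means the run exceeded its estimate.
        """
        estimate = self.holds.pop(sandbox_id)
        refund = estimate - actual
        self.balance += refund
        return refund


ledger = CreditLedger(balance=1000)
ledger.reserve("sbx-1", 300)          # balance is now 700
refund = ledger.settle("sbx-1", 220)  # refund 80, balance 780
```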


What it is not

| Concern | Owner | sand-boxer role |
| --- | --- | --- |
| Agent gateway, tools, memory, channels | glas-harness | Customer API |
| E2e tests, health checks, validation | wise-validator | Customer API |
| Code generation, tech specs, AAP | snuggle-inventor | Customer API |
| When work runs | activity-core | None |
| What tasks exist | state-hub | Registers lifecycle only |
| Tunnels | ops-bridge | Consumer |
| Certs | ops-warden | Consumer |
| Intent-aware egress / prompt security | Research frontier | Document limits only |

sand-boxer provides blast-radius isolation and governed reachability. It does not protect against a compromised agent abusing allowed egress paths (git, npm, curl to allowlisted hosts). Security runbooks must state this explicitly.
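
The limit described above is easy to see in a default-deny egress check. In this minimal sketch the allowlist contents and the function name `egress_allowed` are hypothetical; the point is that a host-level check has no view of intent, so any request to an allowlisted host passes.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; real profiles would declare their own.
ALLOWED_HOSTS = frozenset({"github.com", "registry.npmjs.org"})


def egress_allowed(url: str, allowed_hosts: frozenset = ALLOWED_HOSTS) -> bool:
    """Default-deny: permit only exact host matches on the allowlist."""
    host = urlsplit(url).hostname  # None when the string has no network location
    return host is not None and host in allowed_hosts


# Denied: the host is not on the allowlist.
blocked = egress_allowed("https://evil.example/payload")

# Allowed: the check cannot tell a legitimate push from exfiltration
# to an attacker-controlled repository on the same allowlisted host.
leaky = egress_allowed("https://github.com/attacker/exfil.git")
```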


Strategic context

Workstation automation is interim

Local timers and laptop scripts bootstrapped ADR-001 sync. Railiance01 activity-core schedules are the direction. Workstation paths remain only where no sandbox alternative exists yet.

Host topology

| Layer | Role |
| --- | --- |
| Railiance01 | Production k3s, activity-core, Temporal — not agent dev runtime |
| sandboxer01 | Dedicated sandbox host — preferred blast-radius isolation |
| CoulombCore | Interim sandbox host during migration |
| Workstation (WSL) | Control-plane anchor today — not target execution surface |
| SaaS extensions | Burst / capability gap (GPU, desktop) via payments layer |

Lineage

sand-boxer generalizes patterns split across the-custodian:

| Legacy | sand-boxer | Sibling |
| --- | --- | --- |
| e2e-framework/ provision/teardown | ext.compose-ssh | wise-validator owns test run |
| e2e-framework/ health + test + report | | wise-validator |
| infra/build-machines/ | ext.vm-packer | |
| Agent sandbox config (future) | API consumer | glas-harness |

the-custodian stays governance-focused; sand-boxer becomes the execution venue catalog.

Phase 5: Coulomb-native runtime (later)

After operating extensions in production — observing latency, cost, failure modes, isolation gaps — sand-boxer may ship an owned best-of-brands sandboxing solution combining:

  • Persistent labeled workspaces (Hermes pattern)
  • Default-deny policy layer (OpenShell lessons)
  • Fast resume / checkpoint (industry baseline)
  • Self-hosted economics (Daytona/OpenSandbox lessons)

This is not v1 scope. Extensions and payments come first; native runtime follows evidence.


Intended users

  • Human operators (adm) — profiles, hosts, extensions, credits, lifecycle
  • LLM agents (agt) — via glas-harness, snuggle-inventor, or direct API
  • Deterministic automations (atm) — via wise-validator, activity-core, CI
  • Extension authors — implement backend adapters against the extension contract
  • Platform integrators — register capabilities, federate via reuse-surface

Design principles

  • Meta-framework, not monolith — one API; many extensions; optional native runtime later
  • Profiles over one-offs — every sandbox type is named, versioned, registered
  • Prefer self-hosted — SaaS via explicit routing policy, not silent default
  • Blast-radius isolation — dedicated hosts; never jeopardize Railiance01 production
  • Reachability, not ownership — ops-bridge + ops-warden as consumers
  • Secrets at the boundary — resolve at provision; never in agent-visible workspace
  • Observable lifecycle — every state transition attributable and queryable
  • Disposable by default — TTL-bound; persistence and checkpoint are explicit
  • Honest security — sandboxing limits blast radius; it is not intent enforcement
  • Registry-first reuse — capabilities in registry/ before ad hoc duplication
  • Payments transparency — estimate before create; meter on destroy for SaaS
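
The "prefer self-hosted" principle can be illustrated with a small routing function in which SaaS is reachable only through an explicit flag. The names `ExtensionInfo`, `route`, and `allow_saas` are hypothetical, and a real routing policy would weigh more than this single attribute.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtensionInfo:
    name: str
    self_hosted: bool


def route(candidates: list, allow_saas: bool = False) -> ExtensionInfo:
    """Pick a self-hosted candidate first; fall back to SaaS only when allowed."""
    for ext in candidates:
        if ext.self_hosted:
            return ext
    if allow_saas and candidates:
        # No self-hosted option exists, and the caller opted in explicitly.
        return candidates[0]
    raise LookupError("no eligible extension under the current routing policy")


candidates = [
    ExtensionInfo("ext.e2b", self_hosted=False),
    ExtensionInfo("ext.compose-ssh", self_hosted=True),
]
chosen = route(candidates)  # self-hosted wins regardless of list order
```

Raising instead of silently falling back is the design choice that matters here: metered spend can only begin after someone sets `allow_saas=True`.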

Near-term outcomes

  1. Charter and research — done (INTENT.md, research/, meta-framework spec)
  2. First self-hosted extension — done (ext.compose-ssh, SAND-WP-0002)
  3. Unified API v0 — done (CLI + HTTP stub, State Hub lifecycle)
  4. Profile catalog start: profile.compose-e2e, profile.sandbox-canary
  5. Registry entry: capability.execution.sandbox-provision
  6. Sibling integration notes: docs/integrations/
  7. Extension SDK sketch — contract for P1 backends (vm-packer, Daytona OSS)
  8. wise-validator — separate repo/workplan (SAND-WP-0003); not a sand-boxer dependency

Maturity target

A mature sand-boxer is Coulomb's default way to establish any sandbox:

  • glas-harness requests agent dev sandboxes without choosing Docker vs Modal vs SSH
  • wise-validator may request validation environments; sand-boxer does not depend on it
  • snuggle-inventor requests build sandboxes with setup metadata and secret refs
  • activity-core and CI request bounded venues with consistent lifecycle visibility
  • Operators route spend across self-hosted and SaaS with one credits model
  • A Coulomb-native runtime — if warranted — wins on ops data, not speculation

The workstation is optional. The harness is not sand-boxer. The validator is not sand-boxer. The inventor is not sand-boxer. Establishing the box is.