
tegwick 9054d33e46 Clarify INTENT.md: sand-boxer self-sufficiency and sibling boundaries

Document that sand-boxer is self-sustained without wise-validator, that
validation is an optional downstream consumer, and update near-term outcomes
to reflect completed SAND-WP-0002 work.

2026-06-23 21:23:39 +02:00

16 KiB



domain	repo	updated
infotech	sand-boxer	2026-06-23

INTENT

sand-boxer is the Coulomb meta-framework for establishing sandboxes — a unified API and extension platform for provisioning every variation of isolated execution environment, from self-hosted compose stacks to metered SaaS runtimes. This file is the charter: why it exists, what it owns, and where sibling projects begin.

Research backing this charter lives in research/.

Why it exists

Custodian automation is moving from workstation-anchored execution to Railiance01-scheduled orchestration. That shift improves reliability but does not, by itself, answer the harder question: where can agentic and deterministic work run safely without the laptop filesystem, sleep cycles, and single-user blast radius?

The industry has exploded with sandbox answers — E2B, Modal, Daytona, OpenShell, OpenClaw-style Docker/SSH backends, hyperscaler interpreters — each with different APIs, billing models, and isolation postures. Coulomb needs one place to establish sandboxes regardless of backend, not a new integration per agent harness, validator, or codegen pipeline.

sand-boxer exists to be that place: OpenRouter for sandboxes, not for models.

Consumers call one API. Extensions delegate to the sandbox system that fits — self-hosted on sandboxer01, inherited compose-ssh from the-custodian, or a metered cloud provider. An integrated payments layer handles SaaS consumption when Coulomb uses external capacity. Over time, operational learning may justify a Coulomb-native best-of-brands runtime — but that is a later phase built on evidence, not day-one ambition.

The workstation becomes optional for runtime. Railiance01 decides when work runs (via activity-core). sand-boxer decides where isolated execution happens. State Hub records what changed.

The governing principle

sand-boxer is the sandbox establishment service — profiles, provisioning, extension routing, placement, lifecycle, and metering. Nothing more.

It answers:

Which sandbox recipe applies? Profile selection and version resolution.
Which backend fulfills it? Extension routing (self-hosted vs SaaS).
Where does it run? Host placement and blast-radius policy.
How is isolation enforced? Network default-deny, TTL, resource limits, teardown guarantees — as declared by profile + extension.
How does it become reachable? Consumer integration with ops-bridge and ops-warden — without owning tunnels or certificates.
What happened? Lifecycle events, usage meters, State Hub registration.
What did it cost? Payments and credits for metered extensions.

It must not become the agent harness, the e2e validator, the code generator, the scheduler, the work-state database, the connectivity authority, or production hosting on Railiance01.

Self-sufficiency

sand-boxer is self-sustained. It ships a complete establishment surface — profiles, extensions, CLI, lifecycle registration, and host telemetry (canary self-deploy) — without depending on wise-validator or any other sibling project.

sand-boxer does	sand-boxer does not require
Provision and teardown sandboxes	wise-validator to exist or run
Prove reachability (`ready`)	Repo `e2e/e2e.yml` or test contracts
Emit sandbox lifecycle to State Hub	Validation pass/fail from another service
Dogfood via `profile.sandbox-canary`	Cross-repo use-case orchestration

wise-validator is an optional downstream consumer, not a co-requisite. If wise-validator were never built, sand-boxer would still provision agent dev environments, compose stacks, and operator smoke paths. Conversely, wise-validator depends on sand-boxer (or a compatible establishment API) for environments — never the reverse.

Other peers (glas-harness, snuggle-inventor, activity-core, CI) are equally optional consumers of the same API.

The OpenRouter analogy

OpenRouter	sand-boxer
Unified LLM access API	Unified sandbox establishment API
Routes across model providers	Routes across sandbox extensions
Provider metadata (price, context)	Profile metadata (isolation, cost, latency)
API keys, credits, usage billing	Payments layer for SaaS sandbox consumption
BYOK supported	BYOK for extension provider keys
Does not train models	Does not replace extension runtimes (until phase 5)

sand-boxer is infrastructure routing, not product UX. Harnesses, validators, and inventors are customers.

Coulomb sibling boundaries

sand-boxer stays inside the sandboxing boundary. Three sibling Coulomb projects own adjacent concerns. Integration is contractual — they request sandboxes; sand-boxer establishes them.

Per-sibling integration contracts: docs/integrations/ (glas-harness, wise-validator, snuggle-inventor).

glas-harness — agent harness

Owns: Gateway, tool orchestration, skills, memory, channels, subagent delegation, session semantics, sandbox consumption from the agent's perspective.

Does not own: Sandbox runtimes, profile catalog authority, host placement, extension adapters, isolation enforcement.

glas-harness configures when tools run in a sandbox (OpenClaw-style mode / scope / workspaceAccess). sand-boxer provides the sandbox handle and reachability descriptor.

wise-validator — cross-repo use-case validation (optional consumer)

wise-validator owns: Use-case validation orchestration across the Coulomb ecosystem — health check semantics, test execution, pass/fail interpretation, structured validation results to State Hub and CI. It stabilizes use cases that may not run daily by detecting silent degeneration (dependency drift, host changes, cross-repo breakage) before someone depends on a stale path again.

sand-boxer does not own: Any of the above. sand-boxer does not parse e2e/e2e.yml, poll HTTP health endpoints, run test_command, or emit validation pass/fail. That boundary is intentional so establishment stays independent of validation.

Relationship: wise-validator is a separate project that may call sand-boxer to obtain environments (profile.compose-e2e, etc.), then runs the validation story inside them. sand-boxer establishes the box; wise-validator proves use cases still work. sand-boxer neither waits for nor requires wise-validator.

Lineage: wise-validator replaces the validation half of the-custodian/e2e-framework/; sand-boxer already owns the provision/teardown half (ext.compose-ssh).

snuggle-inventor — code generation

Owns: Code generation, modernization pipelines, tech-spec and planning artifacts, PR-oriented output, human-in-the-loop review gates.

Does not own: Sandbox infrastructure, environment bootstrapping authority, secret stores, runtime metering.

snuggle-inventor may attach Blitzy-style setup instructions and secret references as profile inputs. sand-boxer resolves secrets at the provision boundary; generated code never transits sand-boxer APIs.

Boundary diagram

  glas-harness          wise-validator         snuggle-inventor
  (agent harness)       (e2e + health)         (code generation)
        │                     │                      │
        └─────────────────────┼──────────────────────┘
                              │  POST /v1/sandboxes
                              ▼
                        sand-boxer
                   (establish sandboxes)
                              │
              ┌───────────────┼───────────────┐
              ▼               ▼               ▼
        ext.compose-ssh   ext.modal      ext.e2b …
        (self-hosted)     (SaaS+meter)   (SaaS+meter)

Existing Custodian repos (unchanged)

Concern	Owner
Workstream, task, progress state	`state-hub`
Cron and orchestration	`activity-core`
SSH reverse tunnels	`ops-bridge`
SSH certificate issuance	`ops-warden`
Canon and agent instruction canon	`the-custodian`
Capability federation hub	`reuse-surface`
Production on Railiance01	`railiance-apps` / domain repos
ADR-001 reconciliation	`state-hub`

sand-boxer consumes ops-bridge and ops-warden; it does not subsume them.

What it is

sand-boxer is a meta-framework with four pillars:

1. Unified establishment API

One consistent surface for all sandbox variations:

Create, inspect, extend, snapshot, recreate, destroy
Profile-driven inputs (repo ref, compose bundle, setup metadata, secret refs)
Consumer attribution (adm / agt / atm + calling project id)
Lifecycle states: requested → provisioning → ready → active → expired → destroyed

Early versions may expose a subset; the API shape is designed for completeness.
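The lifecycle states listed above can be read as a small state machine. The following is a minimal sketch, not the sand-boxer implementation: the names `SandboxState`, `ALLOWED`, and `transition` are hypothetical, and allowing teardown from any live state is an assumption of this example rather than something the charter specifies.

```python
from enum import Enum


class SandboxState(Enum):
    REQUESTED = "requested"
    PROVISIONING = "provisioning"
    READY = "ready"
    ACTIVE = "active"
    EXPIRED = "expired"
    DESTROYED = "destroyed"


# Forward transitions follow the order given in the charter.
# Assumption: any state that is not yet destroyed may also go
# straight to DESTROYED (early teardown).
ALLOWED = {
    SandboxState.REQUESTED: {SandboxState.PROVISIONING, SandboxState.DESTROYED},
    SandboxState.PROVISIONING: {SandboxState.READY, SandboxState.DESTROYED},
    SandboxState.READY: {SandboxState.ACTIVE, SandboxState.DESTROYED},
    SandboxState.ACTIVE: {SandboxState.EXPIRED, SandboxState.DESTROYED},
    SandboxState.EXPIRED: {SandboxState.DESTROYED},
    SandboxState.DESTROYED: set(),
}


def transition(current: SandboxState, target: SandboxState) -> SandboxState:
    """Return the new state, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Keeping the transition table as data makes each state change easy to log and attribute, which fits the "observable lifecycle" principle stated later in this charter.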

2. Profile catalog

Named, versioned recipes — not one-off containers:

Extension binding (ext.compose-ssh, ext.vm-packer, ext.e2b, …)
Isolation level, network policy, workspace mode (mirror | remote-canonical)
Scope default (agent | session | shared)
TTL, resource limits, placement preference
Setup metadata (natural-language bootstrap instructions for extensions)
Registered in registry/ and federated via reuse-surface

Profiles collect good ideas from OpenClaw (backend/scope/workspace), Hermes (labeled reuse, resource limits), Blitzy (setup instructions, secret boundary), and hosted platforms (checkpoint, persistence classes) into one schema.
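To make the profile fields above concrete, here is a minimal sketch of how one catalog entry might be modeled. The class `SandboxProfile`, its field names, and every example value are assumptions for illustration; the actual schema is whatever registry/ defines.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

WORKSPACE_MODES = {"mirror", "remote-canonical"}
SCOPES = {"agent", "session", "shared"}


@dataclass(frozen=True)
class SandboxProfile:
    """Hypothetical in-memory shape of a named, versioned profile."""
    name: str                      # e.g. "profile.compose-e2e"
    version: str
    extension: str                 # extension binding, e.g. "ext.compose-ssh"
    isolation: str
    network_policy: str
    workspace_mode: str            # mirror | remote-canonical
    scope: str                     # agent | session | shared
    ttl_seconds: int
    resource_limits: Dict[str, str] = field(default_factory=dict)
    placement: Optional[str] = None
    setup_instructions: str = ""   # natural-language bootstrap notes

    def __post_init__(self) -> None:
        # Reject values outside the enumerations named in the charter.
        if self.workspace_mode not in WORKSPACE_MODES:
            raise ValueError(f"unknown workspace mode: {self.workspace_mode}")
        if self.scope not in SCOPES:
            raise ValueError(f"unknown scope: {self.scope}")
        if self.ttl_seconds <= 0:
            raise ValueError("ttl_seconds must be positive")


# Illustrative values only; not taken from the real catalog.
example = SandboxProfile(
    name="profile.compose-e2e",
    version="0.1.0",
    extension="ext.compose-ssh",
    isolation="container",
    network_policy="default-deny",
    workspace_mode="mirror",
    scope="session",
    ttl_seconds=3600,
)
```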

3. Extension platform

Extensions delegate to sandbox systems and services:

Class	Examples	Billing
Self-hosted	compose-ssh, vm-packer, Daytona OSS, OpenShell	Infra allocation
SaaS consumption	E2B, Modal, Daytona cloud, future providers	Payments layer

Each extension implements a provision / ready / teardown contract (optional snapshot / cost estimate). Extensions ship as plugins; third-party and Coulomb-native backends use the same interface.
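The provision / ready / teardown contract could be expressed as an abstract base class. This is a sketch under stated assumptions: the class `Extension`, its method signatures, and the toy `InMemoryExtension` backend are hypothetical and do not describe the shipped extension SDK.

```python
import itertools
from abc import ABC, abstractmethod
from typing import Dict, Optional


class Extension(ABC):
    """Hypothetical adapter interface: three required calls, two optional."""

    @abstractmethod
    def provision(self, profile_name: str, inputs: Dict[str, str]) -> str:
        """Create a sandbox and return an opaque handle."""

    @abstractmethod
    def ready(self, handle: str) -> bool:
        """Report whether the sandbox is reachable."""

    @abstractmethod
    def teardown(self, handle: str) -> None:
        """Destroy the sandbox and release its resources."""

    # Optional capabilities default to "not supported".
    def snapshot(self, handle: str) -> Optional[str]:
        return None

    def estimate_cost(self, profile_name: str, ttl_seconds: int) -> Optional[float]:
        return None


class InMemoryExtension(Extension):
    """Toy backend used only to exercise the contract; it starts nothing."""

    def __init__(self) -> None:
        self._live: Dict[str, str] = {}
        self._ids = itertools.count(1)

    def provision(self, profile_name: str, inputs: Dict[str, str]) -> str:
        handle = f"mem-{next(self._ids)}"
        self._live[handle] = profile_name
        return handle

    def ready(self, handle: str) -> bool:
        return handle in self._live

    def teardown(self, handle: str) -> None:
        # Idempotent: tearing down an unknown handle is a no-op.
        self._live.pop(handle, None)
```

Because the optional methods return `None` by default, a self-hosted backend can ignore cost estimation while a metered one overrides it, and callers can treat both through the same interface.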

4. Payments and metering

For metered SaaS extensions:

Org/workspace credits and usage accounting
Pre-create cost estimates; post-destroy actuals
BYOK for provider API keys where supported
Export to domain billing systems — sand-boxer meters sandbox consumption, not general payments

Self-hosted extensions record allocation (host, duration), not external spend.
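One way to picture "estimate before create, actuals after destroy" is a ledger that places a hold for the estimate and settles the real cost at teardown. The sketch below is illustrative only: `CreditLedger`, `hold`, and `settle` are hypothetical names, and the hold-then-settle mechanism is an assumption of this example, not a description of the payments layer.

```python
from decimal import Decimal
from typing import Dict


class InsufficientCredits(Exception):
    pass


class CreditLedger:
    """Hypothetical per-workspace credit ledger."""

    def __init__(self, balance: Decimal) -> None:
        self.balance = balance
        self._holds: Dict[str, Decimal] = {}

    @property
    def available(self) -> Decimal:
        # Credits not yet reserved by outstanding estimates.
        return self.balance - sum(self._holds.values(), Decimal("0"))

    def hold(self, sandbox_id: str, estimate: Decimal) -> None:
        """Reserve the pre-create estimate; refuse if credits are short."""
        if estimate > self.available:
            raise InsufficientCredits(sandbox_id)
        self._holds[sandbox_id] = estimate

    def settle(self, sandbox_id: str, actual: Decimal) -> Decimal:
        """Release the hold and charge the post-destroy actual cost.

        Raises KeyError if no hold exists for sandbox_id.
        """
        self._holds.pop(sandbox_id)
        self.balance -= actual
        return self.balance


ledger = CreditLedger(Decimal("10"))
ledger.hold("sb-1", Decimal("4"))               # before create
remaining = ledger.settle("sb-1", Decimal("3.5"))  # after destroy
```

`Decimal` is used instead of `float` so that repeated small charges do not accumulate rounding error.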

What it is not

Concern	Owner	sand-boxer role
Agent gateway, tools, memory, channels	glas-harness	Customer API
E2e tests, health checks, validation	wise-validator	Customer API
Code generation, tech specs, AAP	snuggle-inventor	Customer API
When work runs	`activity-core`	None
What tasks exist	`state-hub`	Registers lifecycle only
Tunnels	`ops-bridge`	Consumer
Certs	`ops-warden`	Consumer
Intent-aware egress / prompt security	Research frontier	Document limits only

sand-boxer provides blast-radius isolation and governed reachability. It does not protect against a compromised agent abusing allowed egress paths (git, npm, curl to allowlisted hosts). Security runbooks must state this explicitly.

Strategic context

Workstation automation is interim

Local timers and laptop scripts bootstrapped ADR-001 sync. Railiance01 activity-core schedules are the direction. Workstation paths remain only where no sandbox alternative exists yet.

Host topology

Layer	Role
Railiance01	Production k3s, activity-core, Temporal — not agent dev runtime
sandboxer01	Dedicated sandbox host — preferred blast-radius isolation
CoulombCore	Interim sandbox host during migration
Workstation (WSL)	Control-plane anchor today — not target execution surface
SaaS extensions	Burst / capability gap (GPU, desktop) via payments layer
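The topology above implies a placement preference: the dedicated sandbox host first, the interim host second, and SaaS only for capability gaps under explicit policy. A minimal sketch of that ordering follows. The function `choose_venue`, the capability labels, and the `"saas"` return value are hypothetical; Railiance01 is deliberately absent from the candidate list because it is not an agent dev runtime.

```python
from typing import Dict, FrozenSet

# Self-hosted candidates in preference order, per the host topology table.
SELF_HOSTED = ["sandboxer01", "CoulombCore"]


def choose_venue(
    required: FrozenSet[str],
    capabilities: Dict[str, FrozenSet[str]],
    allow_saas: bool = False,
) -> str:
    """Pick the first self-hosted host that covers `required`.

    `capabilities` maps a host name to the capability labels it offers
    (hosts missing from the map are treated as unavailable). SaaS is
    chosen only when no self-hosted host fits and policy allows it.
    """
    for host in SELF_HOSTED:
        if host in capabilities and required <= capabilities[host]:
            return host
    if allow_saas:
        return "saas"
    raise LookupError("no self-hosted venue fits and SaaS routing is not allowed")


# Illustrative inventory; the capability labels are assumptions.
inventory = {
    "sandboxer01": frozenset({"compose"}),
    "CoulombCore": frozenset({"compose"}),
}
venue = choose_venue(frozenset({"compose"}), inventory)
burst = choose_venue(frozenset({"gpu"}), inventory, allow_saas=True)
```

Making `allow_saas` default to `False` mirrors the "prefer self-hosted" principle: external capacity is an explicit routing decision, never a silent fallback.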

Lineage

sand-boxer generalizes patterns split across the-custodian:

Legacy	sand-boxer	Sibling
`e2e-framework/` provision/teardown	`ext.compose-ssh`	wise-validator owns test run
`e2e-framework/` health + test + report	—	wise-validator
`infra/build-machines/`	`ext.vm-packer`	—
Agent sandbox config (future)	API consumer	glas-harness

the-custodian stays governance-focused; sand-boxer becomes the execution venue catalog.

Phase 5: Coulomb-native runtime (later)

After operating extensions in production — observing latency, cost, failure modes, isolation gaps — sand-boxer may ship an owned best-of-brands sandboxing solution combining:

Persistent labeled workspaces (Hermes pattern)
Default-deny policy layer (OpenShell lessons)
Fast resume / checkpoint (industry baseline)
Self-hosted economics (Daytona/OpenSandbox lessons)

This is not v1 scope. Extensions and payments come first; native runtime follows evidence.

Intended users

Human operators (adm) — profiles, hosts, extensions, credits, lifecycle
LLM agents (agt) — via glas-harness, snuggle-inventor, or direct API
Deterministic automations (atm) — via wise-validator, activity-core, CI
Extension authors — implement backend adapters against the extension contract
Platform integrators — register capabilities, federate via reuse-surface

Design principles

Meta-framework, not monolith — one API; many extensions; optional native runtime later
Profiles over one-offs — every sandbox type is named, versioned, registered
Prefer self-hosted — SaaS via explicit routing policy, not silent default
Blast-radius isolation — dedicated hosts; never jeopardize Railiance01 production
Reachability, not ownership — ops-bridge + ops-warden as consumers
Secrets at the boundary — resolve at provision; never in agent-visible workspace
Observable lifecycle — every state transition attributable and queryable
Disposable by default — TTL-bound; persistence and checkpoint are explicit
Honest security — sandboxing limits blast radius; it is not intent enforcement
Registry-first reuse — capabilities in registry/ before ad hoc duplication
Payments transparency — estimate before create; meter on destroy for SaaS

Near-term outcomes

~~Charter and research~~ — done (INTENT.md, research/, meta-framework spec)
~~First self-hosted extension~~ — done (ext.compose-ssh, SAND-WP-0002)
~~Unified API v0~~ — done (CLI + HTTP stub, State Hub lifecycle)
~~Profile catalog start~~ — profile.compose-e2e, profile.sandbox-canary
~~Registry entry~~ — capability.execution.sandbox-provision
~~Sibling integration notes~~ — docs/integrations/
Extension SDK sketch — contract for P1 backends (vm-packer, Daytona OSS)
wise-validator — separate repo/workplan (SAND-WP-0003); not a sand-boxer dependency

Maturity target

A mature sand-boxer is Coulomb's default way to establish any sandbox:

glas-harness requests agent dev sandboxes without choosing Docker vs Modal vs SSH
wise-validator may request validation environments; sand-boxer does not depend on it
snuggle-inventor requests build sandboxes with setup metadata and secret refs
activity-core and CI request bounded venues with consistent lifecycle visibility
Operators route spend across self-hosted and SaaS with one credits model
A Coulomb-native runtime — if warranted — wins on ops data, not speculation

The workstation is optional. The harness is not sand-boxer. The validator is not sand-boxer. The inventor is not sand-boxer. Establishing the box is.
