sand-boxer/research/03-meta-framework-synthesis.md
tegwick f33cff5363 docs: charter meta-framework vision, research, and SAND-WP-0002
Rewrite INTENT.md as the sand-boxer meta-framework charter (OpenRouter-style
sandbox API, extensions, payments, Coulomb sibling boundaries). Add research
under research/, update SCOPE.md, bootstrap workplans SAND-WP-0001/0002, and
State Hub integration files from the bootstrap pass.
2026-06-22 21:32:32 +02:00


Meta-framework synthesis

Design notes distilled from landscape research for sand-boxer's unified sandbox API, extension model, payments layer, and Coulomb project boundaries.


Core thesis

sand-boxer is a meta-framework for establishing sandboxes, much as OpenRouter is a meta-framework for accessing LLMs:

  • One consistent API for consumers (adm, agt, atm, domain services)
  • Many extensions that delegate to self-hosted or SaaS sandbox systems
  • Integrated payments when consuming metered external services
  • Registry-first profiles and capabilities via reuse-surface
  • Later: a Coulomb-native "best of brands" runtime built from operational experience — not day one

sand-boxer provisions where and how code runs. It does not provision how agents think, what tests mean, or what code gets written.


Coulomb project boundaries

These sibling projects are planned Coulomb repos with an explicit split of authority. sand-boxer must not absorb their concerns.

flowchart LR
  subgraph establish [sand-boxer]
    SB[Establish sandbox]
  end

  subgraph harness [glas-harness]
    GH[Agent harness: gateway tools memory channels]
  end

  subgraph validate [wise-validator]
    WV[E2E tests health checks validation orchestration]
  end

  subgraph generate [snuggle-inventor]
    SI[Code generation modernization]
  end

  GH -->|request sandbox| SB
  WV -->|request sandbox| SB
  SI -->|request sandbox| SB
  WV -.->|runs tests in| SB
  GH -.->|executes tools in| SB
  SI -.->|validates output in| SB

| Project | Owns | Does not own |
| --- | --- | --- |
| sand-boxer | Sandbox profiles, provision/teardown, extension routing, placement, lifecycle registration, payments for sandbox consumption | Agent memory, channels, tool policies, test definitions, code generation |
| glas-harness | Agent gateway, harness, skills, subagents, tool orchestration, channel bridges | Sandbox runtime, isolation enforcement, host placement |
| wise-validator | E2E test orchestration, health check semantics, validation workflows, result reporting | Sandbox provisioning, agent conversation state |
| snuggle-inventor | Code generation, tech specs, AAP-style planning, PR-oriented output | Sandbox infrastructure, test harness canon |

Integration contracts (intended)

glas-harness → sand-boxer

POST /v1/sandboxes
  profile: "profile.agent-dev"
  scope: session | agent | shared
  workspace: { mode: mirror | remote, access: none | ro | rw }
  consumer: { actor: agt, harness: glas-harness, session_id }

The harness receives a sandbox_id, a reachability descriptor (SSH endpoint, tunnel ref), and a lifecycle webhook or poll URL. It executes tools inside the sandbox over an agreed exec channel; sand-boxer does not parse tool calls.
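As a minimal sketch of the request side of this contract, the body a harness sends could be modelled with typed records and validated before serialisation. The class names, the `validate` and `to_json` helpers, and the JSON encoding are assumptions made for illustration; only the field names and the enumerated values come from the request shown above.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Workspace:
    mode: str    # "mirror" | "remote" (values taken from the request above)
    access: str  # "none" | "ro" | "rw"


@dataclass(frozen=True)
class Consumer:
    actor: str       # e.g. "agt"
    harness: str     # e.g. "glas-harness"
    session_id: str


@dataclass(frozen=True)
class CreateSandboxRequest:
    """Hypothetical client-side model of POST /v1/sandboxes."""
    profile: str
    scope: str       # "session" | "agent" | "shared"
    workspace: Workspace
    consumer: Consumer

    def validate(self) -> None:
        # Reject values outside the enumerations before anything is sent.
        if self.scope not in {"session", "agent", "shared"}:
            raise ValueError(f"unknown scope: {self.scope}")
        if self.workspace.mode not in {"mirror", "remote"}:
            raise ValueError(f"unknown workspace mode: {self.workspace.mode}")
        if self.workspace.access not in {"none", "ro", "rw"}:
            raise ValueError(f"unknown workspace access: {self.workspace.access}")

    def to_json(self) -> str:
        # JSON is an assumption; the charter does not fix a wire format.
        self.validate()
        return json.dumps(asdict(self), sort_keys=True)
```

Validating on the client keeps malformed requests out of sand-boxer without requiring it to understand anything about the harness beyond the consumer block.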

wise-validator → sand-boxer

POST /v1/sandboxes
  profile: "profile.compose-e2e"
  inputs: { repo_ref, compose_bundle_ref }
  ttl: 2h
  consumer: { actor: atm, harness: wise-validator, run_id }

wise-validator owns e2e.yml semantics, health check definitions, test commands, and pass/fail interpretation. sand-boxer delivers an environment; wise-validator runs the validation story on top.

snuggle-inventor → sand-boxer

POST /v1/sandboxes
  profile: "profile.build"
  setup_metadata: { instructions_ref, secret_refs }
  consumer: { actor: agt, harness: snuggle-inventor, job_id }

snuggle-inventor may attach Blitzy-style setup instructions as profile inputs. sand-boxer resolves secrets at the provision boundary; generated code never flows through sand-boxer APIs.

Migration from the-custodian

| Legacy | New owner |
| --- | --- |
| e2e-framework/ provision/teardown | sand-boxer ext.compose-ssh |
| e2e-framework/ test run + report | wise-validator (calls sand-boxer) |
| Agent tool sandbox config | glas-harness (calls sand-boxer) |
| infra/build-machines/ | sand-boxer ext.vm-packer |

Meta-framework API (conceptual)

Resources

| Resource | Description |
| --- | --- |
| Profile | Named, versioned sandbox recipe (image, isolation, network, TTL, extension) |
| Extension | Backend adapter (self-hosted or SaaS) |
| Host | Registered placement target for self-hosted extensions |
| Sandbox | Running instance of a profile |
| Snapshot | Point-in-time workspace checkpoint (optional) |
| Route | Extension selection policy (cost, latency, capability) |
| Meter | Usage record for payments layer |

Sandbox lifecycle states

requested → provisioning → ready → active → { expired | failed } → destroying → destroyed

All transitions emit State Hub events. A sandbox is ready once its reachability probe has succeeded.
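The lifecycle can be sketched as a small transition table with an event hook. This is an illustration under assumptions: it encodes only the edges drawn in the chain above (so there is no edge such as provisioning to failed, which the chain does not show), and the event payload shape and the `emit` callback are hypothetical stand-ins for State Hub integration.

```python
from enum import Enum
from typing import Callable


class State(str, Enum):
    REQUESTED = "requested"
    PROVISIONING = "provisioning"
    READY = "ready"
    ACTIVE = "active"
    EXPIRED = "expired"
    FAILED = "failed"
    DESTROYING = "destroying"
    DESTROYED = "destroyed"


# Only the edges that appear in the lifecycle chain are allowed.
TRANSITIONS = {
    State.REQUESTED: {State.PROVISIONING},
    State.PROVISIONING: {State.READY},
    State.READY: {State.ACTIVE},
    State.ACTIVE: {State.EXPIRED, State.FAILED},
    State.EXPIRED: {State.DESTROYING},
    State.FAILED: {State.DESTROYING},
    State.DESTROYING: {State.DESTROYED},
    State.DESTROYED: set(),
}


def advance(
    current: State,
    target: State,
    emit: Callable[[dict], None] = lambda event: None,
) -> State:
    """Move to `target` if the edge exists, emitting one event per transition."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    # Hypothetical event shape; the real State Hub schema is not defined here.
    emit({"event": "sandbox.state_changed", "from": current.value, "to": target.value})
    return target
```

Keeping the table as data makes it easy to add edges later if workplans decide that failures or explicit destroys can occur from earlier states.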

Core operations

| Operation | Description |
| --- | --- |
| create | Provision from profile + inputs |
| get / list | Inspect status |
| exec | Run command in sandbox (optional; may be harness-owned) |
| extend_ttl | Explicit persistence extension |
| snapshot / restore | Checkpoint workspace |
| recreate | Destroy and reprovision from seed |
| destroy | Idempotent teardown |

Early versions may expose only create, get, destroy, recreate; harnesses can own exec via SSH/tunnel without sand-boxer proxying every command.
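A toy in-memory model of that minimal surface shows what idempotent teardown and recreate-from-seed might look like. Everything here is hypothetical: the class name, the ID format, the record layout, and the choice that `recreate` returns a new sandbox ID rather than reusing the old one.

```python
import uuid
from typing import Optional


class InMemorySandboxes:
    """Illustrative stand-in for the early API: create, get, destroy, recreate."""

    def __init__(self) -> None:
        self._sandboxes: dict = {}

    def create(self, profile: str, inputs: Optional[dict] = None) -> str:
        sandbox_id = f"sbx-{uuid.uuid4().hex[:8]}"  # assumed ID format
        self._sandboxes[sandbox_id] = {
            "profile": profile,
            "inputs": dict(inputs or {}),
            "state": "ready",
        }
        return sandbox_id

    def get(self, sandbox_id: str) -> dict:
        # Return a copy so callers cannot mutate internal state.
        return dict(self._sandboxes[sandbox_id])

    def destroy(self, sandbox_id: str) -> bool:
        """Idempotent: unknown or already destroyed sandboxes are a no-op."""
        record = self._sandboxes.get(sandbox_id)
        if record is None or record["state"] == "destroyed":
            return False
        record["state"] = "destroyed"
        return True

    def recreate(self, sandbox_id: str) -> str:
        """Destroy, then reprovision from the original profile and inputs."""
        seed = self._sandboxes[sandbox_id]
        self.destroy(sandbox_id)
        return self.create(seed["profile"], seed["inputs"])
```

The boolean returned by `destroy` only reports whether work was done; callers can safely repeat the call after a timeout or retry.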

Profile schema (minimum)

id: profile.compose-e2e
version: "1.0.0"
extension: ext.compose-ssh
isolation:
  level: container          # container | microvm | policy
network:
  default: deny
  egress: []                # extension interprets
workspace:
  mode: remote-canonical    # mirror | remote-canonical
  access: rw
scope_default: session
ttl:
  default: 4h
  max: 24h
  idle_reap: null
resources:
  cpu: null
  memory_mb: null
setup:
  instructions: ""          # Blitzy-style natural language for extension bootstrap
  secret_refs: []           # resolved at provision; never in agent context
placement:
  prefer: [sandboxer01]
  fallback: [coulombcore]
reachability:
  tunnel: ops-bridge
  identity: ops-warden
metadata:
  cost_class: self-hosted   # self-hosted | saas-metered
  latency_class: standard
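The ttl block implies a small amount of arithmetic at provision time. The sketch below assumes durations are written as an integer followed by s, m, h, or d (the schema only shows values such as 4h and 24h), and assumes that a request above `max` is clamped rather than rejected; both are illustrative choices, and `parse_duration` and `effective_ttl` are hypothetical helper names.

```python
import re
from typing import Optional

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(text: str) -> int:
    """Convert a duration such as '4h' into seconds (assumed grammar)."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def effective_ttl(ttl: dict, requested: Optional[str] = None) -> int:
    """Resolve the TTL in seconds from the profile's ttl block and an optional request."""
    default = parse_duration(ttl["default"])
    maximum = parse_duration(ttl["max"])
    if default > maximum:
        raise ValueError("ttl.default must not exceed ttl.max")
    wanted = default if requested is None else parse_duration(requested)
    # Assumption: over-long requests are clamped to ttl.max, not rejected.
    return min(wanted, maximum)
```

With the profile above, `effective_ttl({"default": "4h", "max": "24h"})` yields 14400 seconds, and a requested `"48h"` is capped at 86400.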

Extension interface (contract)

Each extension implements:

provision(profile, inputs, placement) → sandbox_handle
wait_ready(sandbox_handle) → reachability
teardown(sandbox_handle) → cleanup_report
snapshot?(sandbox_handle) → snapshot_id
restore?(snapshot_id) → sandbox_handle
estimate_cost?(profile, duration) → meter_quote

Extensions register in registry/ with capability vectors (isolation level, regions, GPU, persistence, pricing model).
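One way to express this contract in code is an abstract base class whose required methods are abstract and whose optional methods (those marked with `?` above) default to unsupported. The argument and return types, the `supports` helper, and the `FakeExtension` stand-in are assumptions for illustration; the charter fixes only the operation names and their rough shape.

```python
from abc import ABC, abstractmethod


class Extension(ABC):
    """Sketch of the extension contract; types are illustrative."""

    @abstractmethod
    def provision(self, profile: dict, inputs: dict, placement: list) -> str: ...

    @abstractmethod
    def wait_ready(self, sandbox_handle: str) -> dict: ...

    @abstractmethod
    def teardown(self, sandbox_handle: str) -> dict: ...

    # Optional operations: extensions override only what they support.
    def snapshot(self, sandbox_handle: str) -> str:
        raise NotImplementedError("snapshot not supported")

    def restore(self, snapshot_id: str) -> str:
        raise NotImplementedError("restore not supported")

    def estimate_cost(self, profile: dict, duration_seconds: int) -> dict:
        raise NotImplementedError("estimate_cost not supported")


def supports(ext: Extension, operation: str) -> bool:
    """True if the extension's class overrides the named optional operation."""
    return getattr(type(ext), operation) is not getattr(Extension, operation)


class FakeExtension(Extension):
    """In-memory stand-in used only to exercise the contract."""

    def __init__(self) -> None:
        self._live = set()

    def provision(self, profile: dict, inputs: dict, placement: list) -> str:
        handle = f"{profile['id']}@{placement[0]}"
        self._live.add(handle)
        return handle

    def wait_ready(self, sandbox_handle: str) -> dict:
        if sandbox_handle not in self._live:
            raise KeyError(sandbox_handle)
        return {"handle": sandbox_handle}  # placeholder reachability descriptor

    def teardown(self, sandbox_handle: str) -> dict:
        removed = sandbox_handle in self._live
        self._live.discard(sandbox_handle)
        return {"handle": sandbox_handle, "removed": removed}
```

A `supports` check of this kind could feed the capability vector at registration time, so routing never selects an extension for an operation it cannot perform.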

Bundled extensions (roadmap):

| Priority | Extension | Type |
| --- | --- | --- |
| P0 | ext.compose-ssh | Self-hosted (e2e-framework lineage) |
| P1 | ext.vm-packer | Self-hosted (build-machines lineage) |
| P2 | ext.daytona-self | Self-hosted OSS |
| P3 | ext.e2b, ext.modal, ext.daytona | SaaS + payments |
| P4 | ext.openshell | Policy runtime wrapper |

Payments layer

For SaaS extensions, sand-boxer provides an integrated payments and metering layer analogous to OpenRouter credits:

| Concern | sand-boxer approach |
| --- | --- |
| Account credits | Org/workspace balance for sandbox consumption |
| Metering | Per-second, per-creation, GPU surcharge (per extension quote) |
| Provider keys | BYOK optional; platform keys for convenience |
| Cost visibility | estimate_cost before create; actuals on destroy |
| Billing events | Export to fin-hub / external billing (consumer, not owner) |

Self-hosted extensions bill infra cost only (host allocation) — no SaaS meter.

Payments is a facility inside sand-boxer, not a general payment processor. Domain billing authority remains elsewhere.
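The metering dimensions named above (per-second, per-creation, GPU surcharge) combine into a simple estimate. The sketch below is a hypothetical shape for a meter quote, not a pricing model from any provider; the field names and the additive formula are assumptions, and `Decimal` is used to avoid floating-point drift in currency arithmetic.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class MeterQuote:
    """Hypothetical per-extension quote; all rates are in account credits."""
    per_second: Decimal
    per_creation: Decimal
    gpu_surcharge_per_second: Decimal = Decimal("0")


def estimate_cost(quote: MeterQuote, duration_seconds: int, uses_gpu: bool = False) -> Decimal:
    """Flat creation fee plus a per-second rate, with an optional GPU surcharge."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    rate = quote.per_second
    if uses_gpu:
        rate += quote.gpu_surcharge_per_second
    return quote.per_creation + rate * duration_seconds
```

The same function could produce the estimate before create and the actuals on destroy, differing only in whether the duration is the requested TTL or the measured lifetime.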


Routing policy (OpenRouter-style)

When multiple extensions satisfy a profile capability:

route:
  strategy: prefer-self-hosted   # prefer-self-hosted | lowest-cost | lowest-latency | explicit
  fallback: [ext.compose-ssh, ext.daytona]
  constraints:
    max_cost_per_hour: null
    require_isolation: microvm
    region: eu

Default Coulomb posture: prefer-self-hosted on sandboxer01; SaaS for burst or capability gaps (GPU, desktop) once extensions exist.
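A routing decision of this kind can be sketched as a filter followed by a ranking. The `Candidate` fields, the function signature, and the tie-breaking rule (self-hosted first, then cheapest) are assumptions for illustration; the `fallback` list from the policy is not modelled here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

STRATEGIES = {"prefer-self-hosted", "lowest-cost", "lowest-latency", "explicit"}


@dataclass(frozen=True)
class Candidate:
    """Hypothetical capability vector for one registered extension."""
    name: str
    self_hosted: bool
    cost_per_hour: float
    latency_ms: float
    isolation: str
    region: str


def select_extension(
    candidates: Sequence[Candidate],
    strategy: str,
    *,
    require_isolation: Optional[str] = None,
    region: Optional[str] = None,
    max_cost_per_hour: Optional[float] = None,
    explicit: Optional[str] = None,
) -> Optional[Candidate]:
    """Apply constraints first, then rank the survivors by strategy."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    eligible = [
        c for c in candidates
        if (require_isolation is None or c.isolation == require_isolation)
        and (region is None or c.region == region)
        and (max_cost_per_hour is None or c.cost_per_hour <= max_cost_per_hour)
    ]
    if not eligible:
        return None
    if strategy == "explicit":
        return next((c for c in eligible if c.name == explicit), None)
    if strategy == "lowest-cost":
        return min(eligible, key=lambda c: c.cost_per_hour)
    if strategy == "lowest-latency":
        return min(eligible, key=lambda c: c.latency_ms)
    # prefer-self-hosted: self-hosted sorts first, cost breaks ties (assumed).
    return min(eligible, key=lambda c: (not c.self_hosted, c.cost_per_hour))
```

Treating constraints as hard filters and the strategy as a soft preference means a constraint such as `require_isolation: microvm` can never be overridden by a cheaper or closer extension.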


Security posture (documented limits)

sand-boxer commits to:

  1. Default-deny network unless profile explicitly allows egress
  2. Secrets resolved at provision boundary via ops-warden / secret refs
  3. Blast-radius isolation on dedicated hosts away from Railiance01 production
  4. Observable lifecycle and attributable actors (adm / agt / atm)
  5. Honest documentation: allowed tool paths can be abused by compromised agents

sand-boxer does not commit to intent-aware egress filtering in v1.
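The first commitment amounts to a destination allowlist with a deny default. The sketch below assumes egress entries are exact hostnames and that `network.default` is either "deny" or "allow"; in practice each extension interprets the egress list itself, so this only illustrates the intended default. It checks where traffic goes, not why, which is the limit stated above.

```python
def egress_allowed(network: dict, host: str) -> bool:
    """Return True only if the profile's network block permits traffic to `host`."""
    allowed = network.get("egress") or []
    if host in allowed:  # assumption: entries are exact hostnames
        return True
    # A missing or unrecognised default is treated as deny.
    return network.get("default", "deny") == "allow"
```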


Phased maturity

| Phase | Deliverable |
| --- | --- |
| 0 | Charter, research, profile schema, ext.compose-ssh design |
| 1 | Unified API + self-hosted compose-ssh + State Hub registration |
| 2 | Extension SDK + vm-packer + registry entries + routing |
| 3 | SaaS extensions + payments layer |
| 4 | Snapshot/restore + checkpoint profiles |
| 5 | Coulomb-native runtime ("best of brands") informed by extension ops data |

Phase 5 is explicitly later: learn from routing, billing, failure modes, and latency before building an owned microVM runtime and control plane.


Open questions (for workplans)

  1. Does exec live in sand-boxer API or only in glas-harness via SSH?
  2. Payments: integrate with existing fin-hub or standalone credits first?
  3. Profile authorship: repo-local YAML vs hub-managed catalog?
  4. wise-validator: fork e2e-framework reporter or new contract from day one?

These belong in SAND-WP-0002+ design workplans, not INTENT.md.