# Meta-framework synthesis

Design notes distilled from landscape research for sand-boxer's unified sandbox API, extension model, payments layer, and Coulomb project boundaries.

---

## Core thesis

sand-boxer is a **meta-framework for establishing sandboxes**, like OpenRouter is a meta-framework for accessing LLM models:

- One **consistent API** for consumers (`adm`, `agt`, `atm`, domain services)
- Many **extensions** that delegate to self-hosted or SaaS sandbox systems
- **Integrated payments** when consuming metered external services
- **Registry-first** profiles and capabilities via reuse-surface
- **Later:** a Coulomb-native "best of brands" runtime built from operational experience, not day one

sand-boxer provisions **where and how code runs**. It does not provision **how agents think**, **what tests mean**, or **what code gets written**.

---

## Coulomb project boundaries

These sibling projects are **planned Coulomb repos** with explicit authority split. sand-boxer must not absorb their concerns.

```mermaid
flowchart LR
  subgraph establish [sand-boxer]
    SB[Establish sandbox]
  end
  subgraph harness [glas-harness]
    GH[Agent harness: gateway tools memory channels]
  end
  subgraph validate [wise-validator]
    WV[E2E tests health checks validation orchestration]
  end
  subgraph generate [snuggle-inventor]
    SI[Code generation modernization]
  end
  GH -->|request sandbox| SB
  WV -->|request sandbox| SB
  SI -->|request sandbox| SB
  WV -.->|runs tests in| SB
  GH -.->|executes tools in| SB
  SI -.->|validates output in| SB
```

| Project | Owns | Does not own |
|---------|------|--------------|
| **sand-boxer** | Sandbox profiles, provision/teardown, extension routing, placement, lifecycle registration, payments for sandbox consumption | Agent memory, channels, tool policies, test definitions, code generation |
| **glas-harness** | Agent gateway, harness, skills, subagents, tool orchestration, channel bridges | Sandbox runtime, isolation enforcement, host placement |
| **wise-validator** | E2E test orchestration, health check semantics, validation workflows, result reporting | Sandbox provisioning, agent conversation state |
| **snuggle-inventor** | Code generation, tech specs, AAP-style planning, PR-oriented output | Sandbox infrastructure, test harness canon |

### Integration contracts (intended)

**glas-harness → sand-boxer**

```
POST /v1/sandboxes
  profile: "profile.agent-dev"
  scope: session | agent | shared
  workspace: { mode: mirror | remote, access: none | ro | rw }
  consumer: { actor: agt, harness: glas-harness, session_id }
```

Harness receives: `sandbox_id`, reachability descriptor (SSH endpoint, tunnel ref), lifecycle webhook or poll URL. Harness executes tools **inside** sandbox via agreed exec channel; sand-boxer does not parse tool calls.

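
As an illustration of the glas-harness contract above, the following Python sketch builds and validates a request body before it is sent. The field names and allowed values come from the contract; the JSON encoding, the helper name `build_sandbox_request`, and the sample `session_id` are assumptions, since these notes do not fix a wire format.

```python
import json

# Allowed values copied from the intended contract above.
SCOPES = {"session", "agent", "shared"}
WORKSPACE_MODES = {"mirror", "remote"}
WORKSPACE_ACCESS = {"none", "ro", "rw"}


def build_sandbox_request(profile, scope, mode, access, session_id):
    """Build the body for POST /v1/sandboxes as a plain dict.

    Assumption: the body is JSON; the notes do not specify an encoding.
    """
    if scope not in SCOPES:
        raise ValueError(f"unknown scope: {scope!r}")
    if mode not in WORKSPACE_MODES:
        raise ValueError(f"unknown workspace mode: {mode!r}")
    if access not in WORKSPACE_ACCESS:
        raise ValueError(f"unknown workspace access: {access!r}")
    return {
        "profile": profile,
        "scope": scope,
        "workspace": {"mode": mode, "access": access},
        "consumer": {
            "actor": "agt",
            "harness": "glas-harness",
            "session_id": session_id,
        },
    }


# Hypothetical session id, for illustration only.
body = build_sandbox_request("profile.agent-dev", "session", "mirror", "rw", "sess-123")
print(json.dumps(body, indent=2))
```

Rejecting unknown enum values on the harness side keeps malformed requests from reaching sand-boxer, though the service would presumably validate them again.
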
**wise-validator → sand-boxer**

```
POST /v1/sandboxes
  profile: "profile.compose-e2e"
  inputs: { repo_ref, compose_bundle_ref }
  ttl: 2h
  consumer: { actor: atm, harness: wise-validator, run_id }
```

wise-validator owns `e2e.yml` semantics, health check definitions, test commands, and pass/fail interpretation. sand-boxer delivers an environment; wise-validator runs the validation story **on top**.

**snuggle-inventor → sand-boxer**

```
POST /v1/sandboxes
  profile: "profile.build"
  setup_metadata: { instructions_ref, secret_refs }
  consumer: { actor: agt, harness: snuggle-inventor, job_id }
```

snuggle-inventor may attach Blitzy-style setup instructions as profile inputs. sand-boxer resolves secrets at boundary; generated code never flows through sand-boxer APIs.

### Migration from the-custodian

| Legacy | New owner |
|--------|-----------|
| `e2e-framework/` provision/teardown | sand-boxer `ext.compose-ssh` |
| `e2e-framework/` test run + report | wise-validator (calls sand-boxer) |
| Agent tool sandbox config | glas-harness (calls sand-boxer) |
| `infra/build-machines/` | sand-boxer `ext.vm-packer` |

---

## Meta-framework API (conceptual)

### Resources

| Resource | Description |
|----------|-------------|
| `Profile` | Named, versioned sandbox recipe (image, isolation, network, TTL, extension) |
| `Extension` | Backend adapter (self-hosted or SaaS) |
| `Host` | Registered placement target for self-hosted extensions |
| `Sandbox` | Running instance of a profile |
| `Snapshot` | Point-in-time workspace checkpoint (optional) |
| `Route` | Extension selection policy (cost, latency, capability) |
| `Meter` | Usage record for payments layer |

### Sandbox lifecycle states

```
requested → provisioning → ready → active → { expired | failed } → destroying → destroyed
```

All transitions emit State Hub events. `ready` means reachability probe succeeded.

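
The lifecycle above can be read as a small state machine. The Python sketch below encodes only the edges drawn in the diagram and emits one event per transition. The names `State`, `TRANSITIONS`, and `advance`, and the event dictionary shape, are hypothetical; the State Hub event format is not defined in these notes. Edges the diagram does not show (for example `provisioning → failed`, or `active → destroying` on an explicit `destroy`) are deliberately left out rather than guessed.

```python
from enum import Enum


class State(str, Enum):
    REQUESTED = "requested"
    PROVISIONING = "provisioning"
    READY = "ready"
    ACTIVE = "active"
    EXPIRED = "expired"
    FAILED = "failed"
    DESTROYING = "destroying"
    DESTROYED = "destroyed"


# Only the edges shown in the lifecycle diagram.
TRANSITIONS = {
    State.REQUESTED: {State.PROVISIONING},
    State.PROVISIONING: {State.READY},
    State.READY: {State.ACTIVE},
    State.ACTIVE: {State.EXPIRED, State.FAILED},
    State.EXPIRED: {State.DESTROYING},
    State.FAILED: {State.DESTROYING},
    State.DESTROYING: {State.DESTROYED},
    State.DESTROYED: set(),
}


def advance(current, target, emit=print):
    """Move to `target` if the diagram allows it, emitting one event per transition."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    # Hypothetical event shape; stands in for a State Hub event.
    emit({"event": "sandbox.state_changed", "from": current.value, "to": target.value})
    return target


# Walk one sandbox through the expiry path, collecting events in a list.
events = []
state = State.REQUESTED
for nxt in (State.PROVISIONING, State.READY, State.ACTIVE,
            State.EXPIRED, State.DESTROYING, State.DESTROYED):
    state = advance(state, nxt, emit=events.append)
print(state.value, len(events))
```

Keeping the transition table as data makes it straightforward to add edges once the workplans settle which additional paths are legal.
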
### Core operations

| Operation | Description |
|-----------|-------------|
| `create` | Provision from profile + inputs |
| `get` / `list` | Inspect status |
| `exec` | Run command in sandbox (optional; may be harness-owned) |
| `extend_ttl` | Explicit persistence extension |
| `snapshot` / `restore` | Checkpoint workspace |
| `recreate` | Destroy and reprovision from seed |
| `destroy` | Idempotent teardown |

Early versions may expose only `create`, `get`, `destroy`, `recreate`; harnesses can own `exec` via SSH/tunnel without sand-boxer proxying every command.

### Profile schema (minimum)

```yaml
id: profile.compose-e2e
version: "1.0.0"
extension: ext.compose-ssh
isolation:
  level: container          # container | microvm | policy
network:
  default: deny
  egress: []                # extension interprets
workspace:
  mode: remote-canonical    # mirror | remote-canonical
  access: rw
scope_default: session
ttl:
  default: 4h
  max: 24h
  idle_reap: null
resources:
  cpu: null
  memory_mb: null
setup:
  instructions: ""          # Blitzy-style natural language for extension bootstrap
  secret_refs: []           # resolved at provision; never in agent context
placement:
  prefer: [sandboxer01]
  fallback: [coulombcore]
reachability:
  tunnel: ops-bridge
  identity: ops-warden
metadata:
  cost_class: self-hosted   # self-hosted | saas-metered
  latency_class: standard
```

### Extension interface (contract)

Each extension implements:

```text
provision(profile, inputs, placement) → sandbox_handle
wait_ready(sandbox_handle) → reachability
teardown(sandbox_handle) → cleanup_report
snapshot?(sandbox_handle) → snapshot_id
restore?(snapshot_id) → sandbox_handle
estimate_cost?(profile, duration) → meter_quote
```

Extensions register in `registry/` with capability vectors (isolation level, regions, GPU, persistence, pricing model).

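
One way the extension contract could look in code is sketched below in Python: the three required operations are abstract, and the `?`-marked optional ones default to "unsupported". The class names `Extension`, `SandboxHandle`, and `FakeExtension`, the `supports` helper, the `ext.fake` id, and the shapes of the returned dictionaries are all hypothetical; the notes define only the operation names and their inputs and outputs.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
import itertools


@dataclass(frozen=True)
class SandboxHandle:
    extension_id: str
    sandbox_id: str


class Extension(ABC):
    """Required operations are abstract; optional ones default to unsupported."""

    extension_id: str

    @abstractmethod
    def provision(self, profile: dict, inputs: dict, placement: dict) -> SandboxHandle: ...

    @abstractmethod
    def wait_ready(self, handle: SandboxHandle) -> dict: ...

    @abstractmethod
    def teardown(self, handle: SandboxHandle) -> dict: ...

    # Optional operations (marked with `?` in the contract).
    def snapshot(self, handle):
        raise NotImplementedError

    def restore(self, snapshot_id):
        raise NotImplementedError

    def estimate_cost(self, profile, duration):
        raise NotImplementedError

    def supports(self, operation: str) -> bool:
        """True if the concrete class overrides the named optional operation."""
        return getattr(type(self), operation) is not getattr(Extension, operation)


class FakeExtension(Extension):
    """In-memory stand-in used only to exercise the interface."""

    extension_id = "ext.fake"

    def __init__(self):
        self._ids = itertools.count(1)
        self._live = set()

    def provision(self, profile, inputs, placement):
        handle = SandboxHandle(self.extension_id, f"sbx-{next(self._ids)}")
        self._live.add(handle)
        return handle

    def wait_ready(self, handle):
        if handle not in self._live:
            raise KeyError(handle.sandbox_id)
        # Hypothetical reachability descriptor.
        return {"kind": "ssh", "endpoint": "127.0.0.1:2222"}

    def teardown(self, handle):
        # Idempotent: tearing down an unknown handle is not an error.
        removed = handle in self._live
        self._live.discard(handle)
        return {"sandbox_id": handle.sandbox_id, "removed": removed}


ext = FakeExtension()
handle = ext.provision({"id": "profile.compose-e2e"}, {}, {"prefer": ["sandboxer01"]})
reach = ext.wait_ready(handle)
first = ext.teardown(handle)
second = ext.teardown(handle)
print(first, second, ext.supports("snapshot"))
```

The `supports` check is one possible way to derive part of a capability vector from the class itself; a registry entry could equally declare capabilities explicitly.
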
**Bundled extensions (roadmap):**

| Priority | Extension | Type |
|----------|-----------|------|
| P0 | `ext.compose-ssh` | Self-hosted (e2e-framework lineage) |
| P1 | `ext.vm-packer` | Self-hosted (build-machines lineage) |
| P2 | `ext.daytona-self` | Self-hosted OSS |
| P3 | `ext.e2b`, `ext.modal`, `ext.daytona` | SaaS + payments |
| P4 | `ext.openshell` | Policy runtime wrapper |

---

## Payments layer

For SaaS extensions, sand-boxer provides an **integrated payments and metering layer** analogous to OpenRouter credits:

| Concern | sand-boxer approach |
|---------|---------------------|
| Account credits | Org/workspace balance for sandbox consumption |
| Metering | Per-second, per-creation, GPU surcharge (per extension quote) |
| Provider keys | BYOK optional; platform keys for convenience |
| Cost visibility | `estimate_cost` before create; actuals on destroy |
| Billing events | Export to fin-hub / external billing (consumer, not owner) |

Self-hosted extensions bill **infra cost only** (host allocation); no SaaS meter.

Payments is a **facility inside sand-boxer**, not a general payment processor. Domain billing authority remains elsewhere.

---

## Routing policy (OpenRouter-style)

When multiple extensions satisfy a profile capability:

```yaml
route:
  strategy: prefer-self-hosted | lowest-cost | lowest-latency | explicit
  fallback: [ext.compose-ssh, ext.daytona]
  constraints:
    max_cost_per_hour: null
    require_isolation: microvm
    region: eu
```

Default Coulomb posture: **prefer-self-hosted** on sandboxer01; SaaS for burst or capability gaps (GPU, desktop) once extensions exist.

---

## Security posture (documented limits)

sand-boxer commits to:

1. Default-deny network unless profile explicitly allows egress
2. Secrets resolved at provision boundary via ops-warden / secret refs
3. Blast-radius isolation on dedicated hosts away from Railiance01 production
4. Observable lifecycle and attributable actors (`adm` / `agt` / `atm`)
5. Honest documentation: **allowed tool paths can be abused by compromised agents**

sand-boxer does **not** commit to intent-aware egress filtering in v1.

---

## Phased maturity

| Phase | Deliverable |
|-------|-------------|
| **0** | Charter, research, profile schema, `ext.compose-ssh` design |
| **1** | Unified API + self-hosted compose-ssh + State Hub registration |
| **2** | Extension SDK + vm-packer + registry entries + routing |
| **3** | SaaS extensions + payments layer |
| **4** | Snapshot/restore + checkpoint profiles |
| **5** | Coulomb-native runtime ("best of brands") informed by extension ops data |

Phase 5 is explicitly **later**: learn from routing, billing, failure modes, and latency before building owned microVM/control-plane.

---

## Open questions (for workplans)

1. Does `exec` live in sand-boxer API or only in glas-harness via SSH?
2. Payments: integrate with existing fin-hub or standalone credits first?
3. Profile authorship: repo-local YAML vs hub-managed catalog?
4. wise-validator: fork e2e-framework reporter or new contract from day one?

These belong in SAND-WP-0002+ design workplans, not INTENT.md.