docs: charter meta-framework vision, research, and SAND-WP-0002

Rewrite INTENT.md as the sand-boxer meta-framework charter (OpenRouter-style sandbox API, extensions, payments, Coulomb sibling boundaries). Add research under research/, update SCOPE.md, bootstrap workplans SAND-WP-0001/0002, and State Hub integration files from the bootstrap pass.
2026-06-22 21:32:32 +02:00
parent e248f669a3
commit f33cff5363
20 changed files with 2016 additions and 113 deletions
--- a/research/03-meta-framework-synthesis.md
+++ b/research/03-meta-framework-synthesis.md
@@ -0,0 +1,294 @@
+# Meta-framework synthesis
+
+Design notes distilled from landscape research for sand-boxer's unified sandbox
+API, extension model, payments layer, and Coulomb project boundaries.
+
+---
+
+## Core thesis
+
+sand-boxer is a **meta-framework for establishing sandboxes**, in the same way
+that OpenRouter is a meta-framework for accessing LLMs:
+
+- One **consistent API** for consumers (`adm`, `agt`, `atm`, domain services)
+- Many **extensions** that delegate to self-hosted or SaaS sandbox systems
+- **Integrated payments** when consuming metered external services
+- **Registry-first** profiles and capabilities via reuse-surface
+- **Later:** a Coulomb-native "best of brands" runtime built from operational
+  experience — not day one
+
+sand-boxer provisions **where and how code runs**. It does not provision **how
+agents think**, **what tests mean**, or **what code gets written**.
+
+---
+
+## Coulomb project boundaries
+
+These sibling projects are **planned Coulomb repos** with an explicit split of
+authority. sand-boxer must not absorb their concerns.
+
+```mermaid
+flowchart LR
+  subgraph establish [sand-boxer]
+    SB[Establish sandbox]
+  end
+
+  subgraph harness [glas-harness]
+    GH[Agent harness: gateway tools memory channels]
+  end
+
+  subgraph validate [wise-validator]
+    WV[E2E tests health checks validation orchestration]
+  end
+
+  subgraph generate [snuggle-inventor]
+    SI[Code generation modernization]
+  end
+
+  GH -->|request sandbox| SB
+  WV -->|request sandbox| SB
+  SI -->|request sandbox| SB
+  WV -.->|runs tests in| SB
+  GH -.->|executes tools in| SB
+  SI -.->|validates output in| SB
+```
+
+| Project | Owns | Does not own |
+|---------|------|--------------|
+| **sand-boxer** | Sandbox profiles, provision/teardown, extension routing, placement, lifecycle registration, payments for sandbox consumption | Agent memory, channels, tool policies, test definitions, code generation |
+| **glas-harness** | Agent gateway, harness, skills, subagents, tool orchestration, channel bridges | Sandbox runtime, isolation enforcement, host placement |
+| **wise-validator** | E2E test orchestration, health check semantics, validation workflows, result reporting | Sandbox provisioning, agent conversation state |
+| **snuggle-inventor** | Code generation, tech specs, AAP-style planning, PR-oriented output | Sandbox infrastructure, test harness canon |
+
+### Integration contracts (intended)
+
+**glas-harness → sand-boxer**
+
+```
+POST /v1/sandboxes
+  profile: "profile.agent-dev"
+  scope: session | agent | shared
+  workspace: { mode: mirror | remote-canonical, access: none | ro | rw }
+  consumer: { actor: agt, harness: glas-harness, session_id }
+```
+
+The harness receives a `sandbox_id`, a reachability descriptor (SSH endpoint,
+tunnel ref), and a lifecycle webhook or poll URL. It executes tools **inside**
+the sandbox over an agreed exec channel; sand-boxer does not parse tool calls.
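+
+A minimal sketch of what that response could carry. Only `sandbox_id` is named
+in these notes; every other key, and the poll path, is an illustrative
+assumption, since the response shape is not yet specified.
+
+```yaml
+# Hypothetical create response; key names are placeholders, not a contract.
+sandbox_id: "<sandbox-id>"
+reachability:
+  ssh_endpoint: "<host>:<port>"   # assumed key for the SSH endpoint
+  tunnel_ref: "<tunnel-ref>"      # assumed key for the tunnel reference
+lifecycle:
+  webhook_url: null               # assumed: set if the consumer registers a webhook
+  poll_url: "/v1/sandboxes/<sandbox-id>"   # assumed path, mirrors the create route
+```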
+
+**wise-validator → sand-boxer**
+
+```
+POST /v1/sandboxes
+  profile: "profile.compose-e2e"
+  inputs: { repo_ref, compose_bundle_ref }
+  ttl: 2h
+  consumer: { actor: atm, harness: wise-validator, run_id }
+```
+
+wise-validator owns `e2e.yml` semantics, health check definitions, test commands,
+and pass/fail interpretation. sand-boxer delivers an environment; wise-validator
+runs the validation story **on top**.
+
+**snuggle-inventor → sand-boxer**
+
+```
+POST /v1/sandboxes
+  profile: "profile.build"
+  setup_metadata: { instructions_ref, secret_refs }
+  consumer: { actor: agt, harness: snuggle-inventor, job_id }
+```
+
+snuggle-inventor may attach Blitzy-style setup instructions as profile inputs.
+sand-boxer resolves secrets at the provision boundary; generated code never
+flows through sand-boxer APIs.
+
+### Migration from the-custodian
+
+| Legacy | New owner |
+|--------|-----------|
+| `e2e-framework/` provision/teardown | sand-boxer `ext.compose-ssh` |
+| `e2e-framework/` test run + report | wise-validator (calls sand-boxer) |
+| Agent tool sandbox config | glas-harness (calls sand-boxer) |
+| `infra/build-machines/` | sand-boxer `ext.vm-packer` |
+
+---
+
+## Meta-framework API (conceptual)
+
+### Resources
+
+| Resource | Description |
+|----------|-------------|
+| `Profile` | Named, versioned sandbox recipe (image, isolation, network, TTL, extension) |
+| `Extension` | Backend adapter (self-hosted or SaaS) |
+| `Host` | Registered placement target for self-hosted extensions |
+| `Sandbox` | Running instance of a profile |
+| `Snapshot` | Point-in-time workspace checkpoint (optional) |
+| `Route` | Extension selection policy (cost, latency, capability) |
+| `Meter` | Usage record for payments layer |
+
+### Sandbox lifecycle states
+
+```
+requested → provisioning → ready → active → { expired | failed } → destroying → destroyed
+```
+
+All transitions emit State Hub events. `ready` means the reachability probe has
+succeeded.
+
+### Core operations
+
+| Operation | Description |
+|-----------|-------------|
+| `create` | Provision from profile + inputs |
+| `get` / `list` | Inspect status |
+| `exec` | Run command in sandbox (optional — may be harness-owned) |
+| `extend_ttl` | Explicit persistence extension |
+| `snapshot` / `restore` | Checkpoint workspace |
+| `recreate` | Destroy and reprovision from seed |
+| `destroy` | Idempotent teardown |
+
+Early versions may expose only `create`, `get`, `destroy`, and `recreate`;
+harnesses can own `exec` via SSH/tunnel without sand-boxer proxying every command.
+
+### Profile schema (minimum)
+
+```yaml
+id: profile.compose-e2e
+version: "1.0.0"
+extension: ext.compose-ssh
+isolation:
+  level: container          # container | microvm | policy
+network:
+  default: deny
+  egress: []                # extension interprets
+workspace:
+  mode: remote-canonical    # mirror | remote-canonical
+  access: rw
+scope_default: session
+ttl:
+  default: 4h
+  max: 24h
+  idle_reap: null
+resources:
+  cpu: null
+  memory_mb: null
+setup:
+  instructions: ""          # Blitzy-style natural language for extension bootstrap
+  secret_refs: []           # resolved at provision; never in agent context
+placement:
+  prefer: [sandboxer01]
+  fallback: [coulombcore]
+reachability:
+  tunnel: ops-bridge
+  identity: ops-warden
+metadata:
+  cost_class: self-hosted   # self-hosted | saas-metered
+  latency_class: standard
+```
+
+### Extension interface (contract)
+
+Each extension implements:
+
+```text
+provision(profile, inputs, placement) → sandbox_handle
+wait_ready(sandbox_handle) → reachability
+teardown(sandbox_handle) → cleanup_report
+snapshot?(sandbox_handle) → snapshot_id
+restore?(snapshot_id) → sandbox_handle
+estimate_cost?(profile, duration) → meter_quote
+```
+
+Extensions register in `registry/` with capability vectors (isolation level,
+regions, GPU, persistence, pricing model).
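+
+As an illustration, a registry entry for `ext.compose-ssh` might look like the
+sketch below. The file layout, every key name, and all values are assumptions;
+only the capability dimensions (isolation level, regions, GPU, persistence,
+pricing model) come from these notes.
+
+```yaml
+# Hypothetical registry entry; keys and values are placeholders, not a schema.
+id: ext.compose-ssh
+type: self-hosted               # self-hosted | saas
+capabilities:
+  isolation: [container]        # levels this extension can enforce
+  regions: []                   # left empty; no regions are defined yet
+  gpu: false                    # placeholder value
+  persistence: false            # placeholder value
+  pricing_model: infra-only     # self-hosted extensions carry no SaaS meter
+optional_operations: []         # e.g. snapshot, restore, estimate_cost
+```
+
+A router could then match a profile's `isolation.level` and routing constraints
+against these vectors before choosing an extension.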
+
+**Bundled extensions (roadmap):**
+
+| Priority | Extension | Type |
+|----------|-----------|------|
+| P0 | `ext.compose-ssh` | Self-hosted (e2e-framework lineage) |
+| P1 | `ext.vm-packer` | Self-hosted (build-machines lineage) |
+| P2 | `ext.daytona-self` | Self-hosted OSS |
+| P3 | `ext.e2b`, `ext.modal`, `ext.daytona` | SaaS + payments |
+| P4 | `ext.openshell` | Policy runtime wrapper |
+
+---
+
+## Payments layer
+
+For SaaS extensions, sand-boxer provides an **integrated payments and metering
+layer** analogous to OpenRouter credits:
+
+| Concern | sand-boxer approach |
+|---------|---------------------|
+| Account credits | Org/workspace balance for sandbox consumption |
+| Metering | Per-second, per-creation, GPU surcharge — per extension quote |
+| Provider keys | BYOK optional; platform keys for convenience |
+| Cost visibility | `estimate_cost` before create; actuals on destroy |
+| Billing events | Export to fin-hub / external billing (consumer, not owner) |
+
+Self-hosted extensions bill **infra cost only** (host allocation) — no SaaS meter.
+
+Payments is a **facility inside sand-boxer**, not a general payment processor.
+Domain billing authority remains elsewhere.
+
+---
+
+## Routing policy (OpenRouter-style)
+
+When multiple extensions satisfy a profile capability:
+
+```yaml
+route:
+  strategy: prefer-self-hosted   # prefer-self-hosted | lowest-cost | lowest-latency | explicit
+  fallback: [ext.compose-ssh, ext.daytona]
+  constraints:
+    max_cost_per_hour: null
+    require_isolation: microvm
+    region: eu
+```
+
+Default Coulomb posture: **prefer-self-hosted** on sandboxer01; SaaS for burst
+or capability gaps (GPU, desktop) once extensions exist.
+
+---
+
+## Security posture (documented limits)
+
+sand-boxer commits to:
+
+1. Default-deny network unless profile explicitly allows egress
+2. Secrets resolved at provision boundary via ops-warden / secret refs
+3. Blast-radius isolation on dedicated hosts away from Railiance01 production
+4. Observable lifecycle and attributable actors (`adm` / `agt` / `atm`)
+5. Honest documentation: **allowed tool paths can be abused by compromised agents**
+
+sand-boxer does **not** commit to intent-aware egress filtering in v1.
+
+---
+
+## Phased maturity
+
+| Phase | Deliverable |
+|-------|-------------|
+| **0** | Charter, research, profile schema, `ext.compose-ssh` design |
+| **1** | Unified API + self-hosted compose-ssh + State Hub registration |
+| **2** | Extension SDK + vm-packer + registry entries + routing |
+| **3** | SaaS extensions + payments layer |
+| **4** | Snapshot/restore + checkpoint profiles |
+| **5** | Coulomb-native runtime ("best of brands") informed by extension ops data |
+
+Phase 5 is explicitly **later** — learn from routing, billing, failure modes, and
+latency before building owned microVM/control-plane.
+
+---
+
+## Open questions (for workplans)
+
+1. Does `exec` live in sand-boxer API or only in glas-harness via SSH?
+2. Payments: integrate with existing fin-hub or standalone credits first?
+3. Profile authorship: repo-local YAML vs hub-managed catalog?
+4. wise-validator: fork e2e-framework reporter or new contract from day one?
+
+These belong in SAND-WP-0002+ design workplans, not INTENT.md.