
tegwick f33cff5363 docs: charter meta-framework vision, research, and SAND-WP-0002

Rewrite INTENT.md as the sand-boxer meta-framework charter (OpenRouter-style
sandbox API, extensions, payments, Coulomb sibling boundaries). Add research
under research/, update SCOPE.md, bootstrap workplans SAND-WP-0001/0002, and
State Hub integration files from the bootstrap pass.

2026-06-22 21:32:32 +02:00

9.6 KiB


Meta-framework synthesis

Design notes distilled from landscape research for sand-boxer's unified sandbox API, extension model, payments layer, and Coulomb project boundaries.

Core thesis

sand-boxer is a meta-framework for establishing sandboxes, much as OpenRouter is a meta-framework for accessing LLMs:

- One consistent API for consumers (adm, agt, atm, domain services)
- Many extensions that delegate to self-hosted or SaaS sandbox systems
- Integrated payments when consuming metered external services
- Registry-first profiles and capabilities via reuse-surface
- Later: a Coulomb-native "best of brands" runtime built from operational experience, not on day one

sand-boxer provisions where and how code runs. It does not provision how agents think, what tests mean, or what code gets written.

Coulomb project boundaries

These sibling projects are planned Coulomb repos with explicit authority split. sand-boxer must not absorb their concerns.

```mermaid
flowchart LR
  subgraph establish [sand-boxer]
    SB[Establish sandbox]
  end

  subgraph harness [glas-harness]
    GH[Agent harness: gateway tools memory channels]
  end

  subgraph validate [wise-validator]
    WV[E2E tests health checks validation orchestration]
  end

  subgraph generate [snuggle-inventor]
    SI[Code generation modernization]
  end

  GH -->|request sandbox| SB
  WV -->|request sandbox| SB
  SI -->|request sandbox| SB
  WV -.->|runs tests in| SB
  GH -.->|executes tools in| SB
  SI -.->|validates output in| SB
```

| Project | Owns | Does not own |
|---|---|---|
| sand-boxer | Sandbox profiles, provision/teardown, extension routing, placement, lifecycle registration, payments for sandbox consumption | Agent memory, channels, tool policies, test definitions, code generation |
| glas-harness | Agent gateway, harness, skills, subagents, tool orchestration, channel bridges | Sandbox runtime, isolation enforcement, host placement |
| wise-validator | E2E test orchestration, health check semantics, validation workflows, result reporting | Sandbox provisioning, agent conversation state |
| snuggle-inventor | Code generation, tech specs, AAP-style planning, PR-oriented output | Sandbox infrastructure, test harness canon |

Integration contracts (intended)

glas-harness → sand-boxer

```text
POST /v1/sandboxes
  profile: "profile.agent-dev"
  scope: session | agent | shared
  workspace: { mode: mirror | remote, access: none | ro | rw }
  consumer: { actor: agt, harness: glas-harness, session_id }
```

The harness receives a `sandbox_id`, a reachability descriptor (SSH endpoint, tunnel ref), and a lifecycle webhook or poll URL. The harness executes tools inside the sandbox via an agreed exec channel; sand-boxer does not parse tool calls.
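A minimal sketch of the harness side of this contract, assuming a JSON body with exactly the fields above and a poll-based readiness check. `get_state` is a hypothetical callable standing in for the poll URL; treating `active` as usable and the chosen set of terminal states are assumptions, not settled API.

```python
import time
from typing import Callable

# Allowed values copied from the glas-harness contract above.
SCOPES = {"session", "agent", "shared"}
WORKSPACE_MODES = {"mirror", "remote"}
ACCESS_LEVELS = {"none", "ro", "rw"}


def build_agent_sandbox_request(session_id: str, scope: str = "session",
                                mode: str = "mirror", access: str = "rw") -> dict:
    """Build the body for POST /v1/sandboxes as sent by glas-harness."""
    if scope not in SCOPES:
        raise ValueError(f"unknown scope: {scope}")
    if mode not in WORKSPACE_MODES or access not in ACCESS_LEVELS:
        raise ValueError("invalid workspace mode or access")
    return {
        "profile": "profile.agent-dev",
        "scope": scope,
        "workspace": {"mode": mode, "access": access},
        "consumer": {"actor": "agt", "harness": "glas-harness",
                     "session_id": session_id},
    }


def wait_until_ready(get_state: Callable[[str], str], sandbox_id: str,
                     timeout_s: float = 300.0, interval_s: float = 2.0) -> None:
    """Poll the lifecycle state until ready; fail fast on terminal states."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        state = get_state(sandbox_id)
        if state in ("ready", "active"):
            return
        if state in ("failed", "expired", "destroying", "destroyed"):
            raise RuntimeError(f"sandbox {sandbox_id} ended in state {state}")
        time.sleep(interval_s)
    raise TimeoutError(f"sandbox {sandbox_id} not ready after {timeout_s}s")
```

Injecting `get_state` keeps the sketch transport-agnostic: the same loop works whether the harness polls over HTTP or reads a cached webhook result.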

wise-validator → sand-boxer

```text
POST /v1/sandboxes
  profile: "profile.compose-e2e"
  inputs: { repo_ref, compose_bundle_ref }
  ttl: 2h
  consumer: { actor: atm, harness: wise-validator, run_id }
```

wise-validator owns `e2e.yml` semantics, health check definitions, test commands, and pass/fail interpretation. sand-boxer delivers an environment; wise-validator runs the validation story on top.
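The `ttl: 2h` in this request has to be reconciled with the profile's own `ttl.default` and `ttl.max` (4h and 24h in the `profile.compose-e2e` schema further down). A sketch of one possible rule, assuming durations are a single integer plus an `s`/`m`/`h`/`d` suffix, and assuming an over-limit request is rejected rather than silently clamped:

```python
import re
from datetime import timedelta
from typing import Optional

# Assumption: durations look like "30m", "2h", "24h" (integer + unit suffix).
_DURATION = re.compile(r"^(\d+)([smhd])$")
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(text: str) -> timedelta:
    match = _DURATION.match(text.strip())
    if not match:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = match.groups()
    return timedelta(seconds=int(value) * _UNIT_SECONDS[unit])


def effective_ttl(requested: Optional[str], default: str, maximum: str) -> timedelta:
    """Use the requested TTL if present, else the profile default; reject values above max."""
    ttl = parse_duration(requested) if requested else parse_duration(default)
    if ttl > parse_duration(maximum):
        raise ValueError(f"ttl {ttl} exceeds profile max {maximum}")
    return ttl
```

Rejecting instead of clamping makes a mismatch visible to wise-validator at request time; `extend_ttl` remains the explicit path for longer runs.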

snuggle-inventor → sand-boxer

```text
POST /v1/sandboxes
  profile: "profile.build"
  setup_metadata: { instructions_ref, secret_refs }
  consumer: { actor: agt, harness: snuggle-inventor, job_id }
```

snuggle-inventor may attach Blitzy-style setup instructions as profile inputs. sand-boxer resolves secrets at the provision boundary; generated code never flows through sand-boxer APIs.
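A sketch of "resolve at the boundary": secret values exist only in the plan handed to the extension, while anything echoed back to the consumer carries references. `resolve_secret` is a hypothetical stand-in for an ops-warden lookup, and both the `vault/...` ref format and the env-var naming rule are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ProvisionPlan:
    """What the extension receives at provision time (hypothetical shape)."""
    instructions_ref: str
    env: Dict[str, str] = field(repr=False)  # resolved secrets, kept out of repr/logs


def resolve_at_boundary(setup_metadata: dict,
                        resolve_secret: Callable[[str], str]) -> ProvisionPlan:
    """Resolve secret refs only when handing off to the extension."""
    env = {}
    for ref in setup_metadata.get("secret_refs", []):
        # Assumption: env var name = last path segment, upper-cased, '-' -> '_'.
        name = ref.rsplit("/", 1)[-1].upper().replace("-", "_")
        env[name] = resolve_secret(ref)
    return ProvisionPlan(setup_metadata["instructions_ref"], env)


def api_echo(setup_metadata: dict) -> dict:
    """What the API returns to snuggle-inventor: references, never values."""
    return {"instructions_ref": setup_metadata["instructions_ref"],
            "secret_refs": list(setup_metadata.get("secret_refs", []))}
```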

Migration from the-custodian

| Legacy | New owner |
|---|---|
| `e2e-framework/` provision/teardown | sand-boxer `ext.compose-ssh` |
| `e2e-framework/` test run + report | wise-validator (calls sand-boxer) |
| Agent tool sandbox config | glas-harness (calls sand-boxer) |
| `infra/build-machines/` | sand-boxer `ext.vm-packer` |

Meta-framework API (conceptual)

Resources

| Resource | Description |
|---|---|
| `Profile` | Named, versioned sandbox recipe (image, isolation, network, TTL, extension) |
| `Extension` | Backend adapter (self-hosted or SaaS) |
| `Host` | Registered placement target for self-hosted extensions |
| `Sandbox` | Running instance of a profile |
| `Snapshot` | Point-in-time workspace checkpoint (optional) |
| `Route` | Extension selection policy (cost, latency, capability) |
| `Meter` | Usage record for payments layer |
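A minimal sketch of how a few of these resources might relate, assuming a `Sandbox` pins exactly one profile version and a `Meter` is recorded per sandbox. Field names beyond those in the table (`native_id`-style details are omitted; `host`, `seconds`, `amount_cents`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Profile:
    id: str          # e.g. "profile.compose-e2e"
    version: str     # e.g. "1.0.0"; a sandbox pins one version
    extension: str   # e.g. "ext.compose-ssh"


@dataclass
class Sandbox:
    sandbox_id: str
    profile: Profile                 # running instance of exactly one profile
    host: Optional[str] = None       # set only for self-hosted placement
    state: str = "requested"


@dataclass(frozen=True)
class Meter:
    sandbox_id: str
    extension: str
    seconds: int
    amount_cents: int  # assumption: integer minor units to avoid float drift
```

Freezing `Profile` reflects that profiles are versioned recipes: a change means a new version, not an edit under a running sandbox.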

Sandbox lifecycle states

```text
requested → provisioning → ready → active → { expired | failed } → destroying → destroyed
```

All transitions emit State Hub events. `ready` means the reachability probe succeeded.
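The lifecycle reads naturally as a transition table. This sketch encodes the chain above literally (so, for example, `failed` is only reachable from `active`, and whether `destroy` may jump from `active` straight to `destroying` is left open), gates `ready` on a successful probe, and emits one event per transition. `emit` is a hypothetical stand-in for a State Hub publisher.

```python
from typing import Callable, Dict, FrozenSet

# Allowed transitions, read literally from the lifecycle chain above.
TRANSITIONS: Dict[str, FrozenSet[str]] = {
    "requested": frozenset({"provisioning"}),
    "provisioning": frozenset({"ready"}),
    "ready": frozenset({"active"}),
    "active": frozenset({"expired", "failed"}),
    "expired": frozenset({"destroying"}),
    "failed": frozenset({"destroying"}),
    "destroying": frozenset({"destroyed"}),
    "destroyed": frozenset(),
}


class Lifecycle:
    def __init__(self, sandbox_id: str, emit: Callable[[dict], None]):
        self.sandbox_id = sandbox_id
        self.state = "requested"
        self._emit = emit  # hypothetical State Hub publisher

    def transition(self, target: str, probe_ok: bool = False) -> None:
        if target not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {target}")
        # `ready` is only reachable once the reachability probe has succeeded.
        if target == "ready" and not probe_ok:
            raise ValueError("cannot enter ready before the reachability probe succeeds")
        previous, self.state = self.state, target
        # Every accepted transition emits exactly one event.
        self._emit({"sandbox_id": self.sandbox_id, "from": previous, "to": target})
```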

Core operations

| Operation | Description |
|---|---|
| `create` | Provision from profile + inputs |
| `get` / `list` | Inspect status |
| `exec` | Run command in sandbox (optional; may be harness-owned) |
| `extend_ttl` | Explicit persistence extension |
| `snapshot` / `restore` | Checkpoint workspace |
| `recreate` | Destroy and reprovision from seed |
| `destroy` | Idempotent teardown |

Early versions may expose only `create`, `get`, `destroy`, and `recreate`; harnesses can own `exec` over SSH or a tunnel without sand-boxer proxying every command.
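A toy in-memory sketch of that minimal surface, mainly to pin down two semantics from the table: `destroy` is idempotent (repeating it, or targeting an unknown id, is not an error) and `recreate` reprovisions from the stored seed (profile + inputs). The class and its storage are hypothetical, and it deliberately collapses the provisioning and teardown lifecycle into single steps.

```python
import itertools
from typing import Dict, Optional


class InMemorySandboxService:
    """Toy stand-in for the early API surface: create, get, destroy, recreate."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._sandboxes: Dict[str, dict] = {}

    def create(self, profile: str, inputs: Optional[dict] = None) -> str:
        sandbox_id = f"sb-{next(self._ids)}"
        # Store profile + inputs as the seed; toy skips the provisioning states.
        self._sandboxes[sandbox_id] = {"profile": profile,
                                       "inputs": dict(inputs or {}),
                                       "state": "ready"}
        return sandbox_id

    def get(self, sandbox_id: str) -> Optional[dict]:
        return self._sandboxes.get(sandbox_id)

    def destroy(self, sandbox_id: str) -> bool:
        """Idempotent: unknown or already-destroyed ids are a no-op, not an error."""
        record = self._sandboxes.get(sandbox_id)
        if record is None or record["state"] == "destroyed":
            return False  # nothing to do; callers can retry safely
        record["state"] = "destroyed"  # toy collapses destroying -> destroyed
        return True

    def recreate(self, sandbox_id: str) -> str:
        """Destroy, then provision a fresh sandbox from the same seed."""
        record = self._sandboxes[sandbox_id]
        self.destroy(sandbox_id)
        return self.create(record["profile"], record["inputs"])
```

Returning `False` instead of raising on a repeat `destroy` lets callers such as wise-validator run teardown in cleanup paths without tracking whether it already ran.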

Profile schema (minimum)

```yaml
id: profile.compose-e2e
version: "1.0.0"
extension: ext.compose-ssh
isolation:
  level: container          # container | microvm | policy
network:
  default: deny
  egress: []                # extension interprets
workspace:
  mode: remote-canonical    # mirror | remote-canonical
  access: rw
scope_default: session
ttl:
  default: 4h
  max: 24h
  idle_reap: null
resources:
  cpu: null
  memory_mb: null
setup:
  instructions: ""          # Blitzy-style natural language for extension bootstrap
  secret_refs: []           # resolved at provision; never in agent context
placement:
  prefer: [sandboxer01]
  fallback: [coulombcore]
reachability:
  tunnel: ops-bridge
  identity: ops-warden
metadata:
  cost_class: self-hosted   # self-hosted | saas-metered
  latency_class: standard
```
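A sketch of a minimum validator over the parsed profile (a plain dict, as a YAML loader such as PyYAML's `yaml.safe_load` would produce). The allowed value sets come from the comments in the schema; the required-field list and the strict rule that `network.default` must be `deny` (egress allowances going in `network.egress`) are assumptions.

```python
from typing import Dict, List, Set

# Allowed values taken from the comments in the profile schema above.
ALLOWED: Dict[str, Set[str]] = {
    "isolation.level": {"container", "microvm", "policy"},
    "workspace.mode": {"mirror", "remote-canonical"},
    "metadata.cost_class": {"self-hosted", "saas-metered"},
}
# Assumption: the minimum set of fields a profile must define.
REQUIRED = ["id", "version", "extension", "isolation.level", "network.default"]


def lookup(doc: dict, dotted: str):
    """Follow a dotted path such as 'isolation.level'; None if any key is missing."""
    node = doc
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def validate_profile(doc: dict) -> List[str]:
    """Return a list of problems; an empty list means the profile passes."""
    problems = [f"missing {path}" for path in REQUIRED if lookup(doc, path) is None]
    for path, allowed in ALLOWED.items():
        value = lookup(doc, path)
        if value is not None and value not in allowed:
            problems.append(f"{path}={value!r} not in {sorted(allowed)}")
    # Assumption: strict default-deny; allowances belong in network.egress.
    if lookup(doc, "network.default") not in (None, "deny"):
        problems.append("network.default must be 'deny'")
    return problems
```

Returning all problems at once, rather than failing on the first, suits repo-local profile authoring where a linter reports every issue in one pass.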

Extension interface (contract)

Each extension implements:

```text
provision(profile, inputs, placement) → sandbox_handle
wait_ready(sandbox_handle) → reachability
teardown(sandbox_handle) → cleanup_report
snapshot?(sandbox_handle) → snapshot_id
restore?(snapshot_id) → sandbox_handle
estimate_cost?(profile, duration) → meter_quote
```

Extensions register in `registry/` with capability vectors (isolation level, regions, GPU, persistence, pricing model).
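One way the contract could look as a Python abstract base class: the three required operations are abstract, and the `?` operations become optional overrides whose absence callers can detect. The class names, the `SandboxHandle` fields, and the `supports` helper are hypothetical; the fake extension at the end only shows the minimum a P0 adapter would implement.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class SandboxHandle:
    extension: str
    native_id: str  # backend-specific id (container, VM, or SaaS sandbox)


class Extension(ABC):
    """Sketch of the extension contract; `?` methods are optional overrides."""

    name: str = "ext.unnamed"

    @abstractmethod
    def provision(self, profile: Dict[str, Any], inputs: Dict[str, Any],
                  placement: Dict[str, Any]) -> SandboxHandle: ...

    @abstractmethod
    def wait_ready(self, handle: SandboxHandle) -> Dict[str, Any]: ...

    @abstractmethod
    def teardown(self, handle: SandboxHandle) -> Dict[str, Any]: ...

    # Optional capabilities: defaults raise so absence is explicit.
    def snapshot(self, handle: SandboxHandle) -> str:
        raise NotImplementedError

    def restore(self, snapshot_id: str) -> SandboxHandle:
        raise NotImplementedError

    def estimate_cost(self, profile: Dict[str, Any], duration_s: int) -> Dict[str, Any]:
        raise NotImplementedError

    def supports(self, method: str) -> bool:
        """True if the concrete class overrides the named method."""
        return getattr(type(self), method) is not getattr(Extension, method)


class FakeComposeSsh(Extension):
    """Illustrative stub, not the real ext.compose-ssh."""
    name = "ext.compose-ssh"

    def provision(self, profile, inputs, placement):
        return SandboxHandle(self.name, "compose-1")

    def wait_ready(self, handle):
        return {"ssh": "sandboxer01:2222"}  # hypothetical reachability descriptor

    def teardown(self, handle):
        return {"removed": [handle.native_id]}
```

The `supports` check is one way to derive part of the capability vector from code rather than maintaining it by hand in `registry/`.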

Bundled extensions (roadmap):

| Priority | Extension | Type |
|---|---|---|
| P0 | `ext.compose-ssh` | Self-hosted (e2e-framework lineage) |
| P1 | `ext.vm-packer` | Self-hosted (build-machines lineage) |
| P2 | `ext.daytona-self` | Self-hosted OSS |
| P3 | `ext.e2b`, `ext.modal`, `ext.daytona` | SaaS + payments |
| P4 | `ext.openshell` | Policy runtime wrapper |

Payments layer

For SaaS extensions, sand-boxer provides an integrated payments and metering layer analogous to OpenRouter credits:

| Concern | sand-boxer approach |
|---|---|
| Account credits | Org/workspace balance for sandbox consumption |
| Metering | Per-second, per-creation, GPU surcharge (quoted per extension) |
| Provider keys | BYOK optional; platform keys for convenience |
| Cost visibility | `estimate_cost` before create; actuals on destroy |
| Billing events | Export to fin-hub / external billing (consumer, not owner) |

Self-hosted extensions bill infra cost only (host allocation) — no SaaS meter.

Payments is a facility inside sand-boxer, not a general payment processor. Domain billing authority remains elsewhere.
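A sketch of the quote arithmetic implied by the metering row: creation fee plus per-second runtime, plus a per-second GPU surcharge when requested, checked against the org balance before `create`. The `PriceSheet` shape, the example rates, and rounding up to whole cents are assumptions; `Decimal` is used so money never passes through binary floats.

```python
from dataclasses import dataclass
from decimal import ROUND_UP, Decimal


@dataclass(frozen=True)
class PriceSheet:
    """Hypothetical per-extension price sheet; amounts in one currency unit."""
    per_second: Decimal
    per_creation: Decimal
    gpu_surcharge_per_second: Decimal = Decimal("0")


def quote(prices: PriceSheet, duration_s: int, gpu: bool = False) -> Decimal:
    """Estimate before create: creation fee + runtime (+ GPU surcharge)."""
    rate = prices.per_second + (prices.gpu_surcharge_per_second if gpu else Decimal("0"))
    total = prices.per_creation + rate * duration_s
    # Round up to whole cents so a quote never understates the charge.
    return total.quantize(Decimal("0.01"), rounding=ROUND_UP)


def authorize(balance: Decimal, estimate: Decimal) -> bool:
    """Allow create only if the org/workspace credits cover the estimate."""
    return balance >= estimate
```

With a per-second rate of 0.0001 and a 0.01 creation fee, a one-hour quote is 0.01 + 0.36 = 0.37. A self-hosted extension would carry an all-zero sheet, matching the "no SaaS meter" rule.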

Routing policy (OpenRouter-style)

When multiple extensions satisfy a profile capability:

```yaml
route:
  strategy: prefer-self-hosted | lowest-cost | lowest-latency | explicit
  fallback: [ext.compose-ssh, ext.daytona]
  constraints:
    max_cost_per_hour: null
    require_isolation: microvm
    region: eu
```

Default Coulomb posture: prefer-self-hosted on sandboxer01; SaaS for burst or capability gaps (GPU, desktop) once extensions exist.
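A sketch of route selection: filter candidates by the constraints, then order them by the strategy. Assumptions: isolation strength ranks `policy < container < microvm`, `explicit` tries the `fallback` list in the order written, and the other strategies rank every eligible candidate (ties broken by cost for `prefer-self-hosted`). The `Candidate` fields and numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: stronger isolation ranks higher.
ISOLATION_RANK = {"policy": 0, "container": 1, "microvm": 2}


@dataclass(frozen=True)
class Candidate:
    name: str
    self_hosted: bool
    isolation: str
    region: str
    cost_per_hour: float
    latency_ms: float


def route(candidates: List[Candidate], strategy: str, fallback: List[str],
          require_isolation: Optional[str] = None, region: Optional[str] = None,
          max_cost_per_hour: Optional[float] = None) -> List[str]:
    """Return extension names in the order they should be tried."""
    def eligible(c: Candidate) -> bool:
        if require_isolation and ISOLATION_RANK[c.isolation] < ISOLATION_RANK[require_isolation]:
            return False
        if region and c.region != region:
            return False
        if max_cost_per_hour is not None and c.cost_per_hour > max_cost_per_hour:
            return False
        return True

    pool = [c for c in candidates if eligible(c)]
    if strategy == "explicit":
        names = {c.name for c in pool}
        return [name for name in fallback if name in names]
    keys = {
        "prefer-self-hosted": lambda c: (not c.self_hosted, c.cost_per_hour),
        "lowest-cost": lambda c: c.cost_per_hour,
        "lowest-latency": lambda c: c.latency_ms,
    }
    if strategy not in keys:
        raise ValueError(f"unknown strategy: {strategy}")
    return [c.name for c in sorted(pool, key=keys[strategy])]
```

Under this reading, the example route above (`require_isolation: microvm`) would filter out a container-level `ext.compose-ssh` even though it appears in `fallback`, so constraints and fallback lists need to be authored together.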

Security posture (documented limits)

sand-boxer commits to:

- Default-deny network unless profile explicitly allows egress
- Secrets resolved at provision boundary via ops-warden / secret refs
- Blast-radius isolation on dedicated hosts away from Railiance01 production
- Observable lifecycle and attributable actors (adm / agt / atm)
- Honest documentation: allowed tool paths can be abused by compromised agents

sand-boxer does not commit to intent-aware egress filtering in v1.
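Since the profile schema leaves `network.egress` for the extension to interpret, here is one possible interpretation as a sketch: entries are assumed to be `host:port` or `*.suffix:port`, matched with shell-style wildcards. It is a destination filter only, consistent with the v1 limit above: it cannot tell legitimate traffic to an allowed host from abuse of that same path.

```python
from fnmatch import fnmatch
from typing import List


def egress_allowed(egress: List[str], host: str, port: int) -> bool:
    """Default-deny: allow only destinations matching an explicit profile entry.

    Assumption: entries look like "host:port" or "*.suffix:port". This checks
    where traffic goes, never what it is for.
    """
    for entry in egress:
        pattern, _, allowed_port = entry.rpartition(":")
        if allowed_port.isdigit() and int(allowed_port) == port and fnmatch(host, pattern):
            return True
    return False  # an empty list (the schema default) denies everything
```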

Phased maturity

| Phase | Deliverable |
|---|---|
| 0 | Charter, research, profile schema, `ext.compose-ssh` design |
| 1 | Unified API + self-hosted compose-ssh + State Hub registration |
| 2 | Extension SDK + vm-packer + registry entries + routing |
| 3 | SaaS extensions + payments layer |
| 4 | Snapshot/restore + checkpoint profiles |
| 5 | Coulomb-native runtime ("best of brands") informed by extension ops data |

Phase 5 is explicitly deferred: learn from routing, billing, failure modes, and latency before building an owned microVM runtime and control plane.

Open questions (for workplans)

- Does `exec` live in the sand-boxer API or only in glas-harness via SSH?
- Payments: integrate with existing fin-hub or standalone credits first?
- Profile authorship: repo-local YAML vs hub-managed catalog?
- wise-validator: fork the e2e-framework reporter or define a new contract from day one?

These belong in SAND-WP-0002+ design workplans, not INTENT.md.
