
tegwick f33cff5363 docs: charter meta-framework vision, research, and SAND-WP-0002

Rewrite INTENT.md as the sand-boxer meta-framework charter (OpenRouter-style
sandbox API, extensions, payments, Coulomb sibling boundaries). Add research
under research/, update SCOPE.md, bootstrap workplans SAND-WP-0001/0002, and
State Hub integration files from the bootstrap pass.

2026-06-22 21:32:32 +02:00

7.2 KiB


Agent sandbox landscape (2026)

Survey of modern sandbox infrastructure for agentic coding — isolation technologies, provider models, and industry convergence patterns relevant to sand-boxer.

Market definition

AI agent sandboxes are isolated execution environments for running AI-generated or agent-requested code safely. They optimize for:

Fast create / resume / teardown
Programmatic lifecycle APIs
Isolation from host and peer workloads
Developer- and agent-friendly SDKs

This is distinct from general application hosting and from agent harnesses (memory, channels, tool orchestration).

Provider landscape (summary)

Platform	Model	Start / resume (vendor-reported)	Persistence / checkpoint	Isolation	Notes
E2B	Managed SaaS	~150ms	Pause/resume, snapshots	Firecracker	Scale leader; template + sandbox API
Daytona	Managed + OSS	~90ms	Snapshots, fork	Docker/Kata	Open-source self-host path
Modal	Serverless SaaS	Sub-second	Memory snapshots, volumes	gVisor	Strong GPU; code-defined runtime
Blaxel	Managed	Sub-25ms resume	Hibernate	microVM	Zero idle compute billing
Vercel Sandbox	Managed	ms	Snapshots, persistent default	Firecracker	Vercel ecosystem
Cloudflare Sandbox SDK	Edge	Seconds (containers) / ms (isolates)	Durable Objects state	Containers / V8	Workers-native
AWS AgentCore	Managed sessions	—	Session ≤8h	microVM	Hyperscaler bundling
Google Agent Sandbox	Managed preview	Sub-second	TTL ≤14d	Hardened containers	Gemini Enterprise layer
OpenSandbox	Self-hosted OSS	Pool pre-warm	Pause/resume, PVC	gVisor/Kata/Firecracker	K8s-scale; CNCF Landscape
OpenShell	Policy runtime	—	Long-lived sandboxes	Landlock/seccomp/OPA	Governance layer, not hosted platform
Northflank	BYOC + managed	~200ms	Persistent	microVM/gVisor	VPC deployment
Runloop	Managed	~100ms exec	Snapshot, branch	Custom hypervisor	SWE-bench / eval focus
Sprites	Managed	1–2s	~300ms checkpoints	Firecracker	Persistent-first
ComputeSDK	Abstraction	Varies	Varies	Varies	Multi-provider router (9 backends)

Sources: Ry Walker research (Jun 2026), provider docs, Modal/E2B marketing materials. Treat vendor claims as directional.

Isolation technology spectrum

Technology	Used by	Isolation boundary	Performance
Firecracker	E2B, Sprites, Vercel	Hardware-virtualized microVM	Fast
gVisor / Kata	Modal, Northflank, OpenSandbox	User-space kernel (gVisor) or lightweight VM (Kata)	Very fast
Hardened Docker	Daytona, AIO Sandbox	Container (shared host kernel)	Fastest setup
Landlock / seccomp / OPA	OpenShell	Kernel policy	Native speed
V8 isolates	Cloudflare Worker Loader	In-process runtime isolate	Milliseconds

Implication for sand-boxer: profile metadata must declare isolation_level so consumers can reason about blast radius. Extensions map profiles to concrete runtimes; the meta-framework does not mandate one technology.
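To make the `isolation_level` requirement concrete, here is a minimal TypeScript sketch of profile metadata plus a blast-radius check a consumer could run before accepting a profile. The level names, their ordering, the `ISOLATION_RANK` table, and the `meetsIsolation` helper are all hypothetical; the ranking simplifies part of the spectrum above and is an assumption for illustration, not a security rating.

```typescript
// Hypothetical isolation levels, ordered from weakest to strongest boundary.
// The ordering is a simplifying assumption, not a security rating.
type IsolationLevel = "isolate" | "container" | "user_space_kernel" | "microvm";

const ISOLATION_RANK: Record<IsolationLevel, number> = {
  isolate: 0,
  container: 1,
  user_space_kernel: 2,
  microvm: 3,
};

interface ProfileMetadata {
  profile: string;                 // named profile, e.g. "untrusted-code"
  version: string;                 // profiles are versioned
  isolation_level: IsolationLevel; // declared so consumers can reason about blast radius
  runtime: string;                 // concrete runtime chosen by the extension
}

// True when the profile's boundary is at least as strong as the consumer requires.
function meetsIsolation(meta: ProfileMetadata, minimum: IsolationLevel): boolean {
  return ISOLATION_RANK[meta.isolation_level] >= ISOLATION_RANK[minimum];
}

const untrustedCode: ProfileMetadata = {
  profile: "untrusted-code",
  version: "1.0.0",
  isolation_level: "microvm",
  runtime: "firecracker",
};
const quickLint: ProfileMetadata = {
  profile: "quick-lint",
  version: "1.0.0",
  isolation_level: "container",
  runtime: "docker",
};

console.log(meetsIsolation(untrustedCode, "user_space_kernel")); // true
console.log(meetsIsolation(quickLint, "user_space_kernel"));     // false
```

The consumer only sees the declared level, never the runtime choice, which keeps the meta-framework technology-neutral while still letting callers refuse a profile that is too weak.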

Convergence trends (2025 → 2026)

1. Ephemeral vs persistent collapsed

The early market split (E2B = ephemeral, Sprites = persistent) has collapsed. Most platforms now offer:

Persistent workspace by default or as first-class option
Checkpoint / snapshot / hibernate for fast resume
TTL and explicit teardown still expected for cost and security

sand-boxer takeaway: profiles should support persistence: ephemeral | persistent | checkpoint as a first-class dimension, not a backend detail.
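A sketch of persistence as a profile field rather than a backend detail, assuming the core hands each extension an abstract teardown plan and leaves the mechanics to it. The `PersistenceSpec` type, the step names, and `teardownPlan` are hypothetical. TTL stays mandatory in every mode, matching the point above that TTL and explicit teardown are still expected.

```typescript
// Persistence is a first-class profile dimension.
type Persistence = "ephemeral" | "persistent" | "checkpoint";

// Hypothetical abstract steps; a real extension decides how to realize each one.
type TeardownStep = "snapshot" | "stop" | "delete_workspace" | "retain_workspace";

interface PersistenceSpec {
  persistence: Persistence;
  ttlSeconds: number; // required whatever the persistence mode
}

// What the core asks an extension to do at TTL expiry or explicit teardown.
function teardownPlan(spec: PersistenceSpec): TeardownStep[] {
  if (spec.ttlSeconds <= 0) throw new Error("ttlSeconds must be positive");
  switch (spec.persistence) {
    case "ephemeral":
      return ["stop", "delete_workspace"];
    case "persistent":
      return ["stop", "retain_workspace"];
    case "checkpoint":
      // State survives in the snapshot, so the live workspace can be released.
      return ["snapshot", "stop", "delete_workspace"];
    default: {
      // Compile-time exhaustiveness check: adding a mode without a case fails to type-check.
      const unreachable: never = spec.persistence;
      throw new Error(`unknown persistence mode: ${String(unreachable)}`);
    }
  }
}

console.log(teardownPlan({ persistence: "checkpoint", ttlSeconds: 3600 }));
```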

2. Checkpointing is table stakes

Sub-second to low-second restore times are becoming the baseline for agent coding. What gets restored is typically workspace state, installed dependencies, and shell history; live processes (PIDs) are not always preserved.

sand-boxer takeaway: the lifecycle API needs snapshot, restore, and fork operations, even if early extensions only implement recreate.
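One way to keep snapshot, restore, and fork in the API while letting early extensions implement only recreate is to make those operations optional and let the core degrade. The `LifecycleExtension` interface, `SandboxHandle`, and `forkOrRecreate` below are a hypothetical sketch, not an existing sand-boxer or provider API.

```typescript
interface SandboxHandle {
  id: string;
  profile: string;
}

// Hypothetical extension contract: provision/teardown are required,
// snapshot/restore/fork are optional capabilities.
interface LifecycleExtension {
  provision(profile: string): Promise<SandboxHandle>;
  teardown(handle: SandboxHandle): Promise<void>;
  snapshot?(handle: SandboxHandle): Promise<string>; // returns a snapshot id
  restore?(snapshotId: string): Promise<SandboxHandle>;
  fork?(handle: SandboxHandle): Promise<SandboxHandle>;
}

// Core-side fork: prefer the native operation, then snapshot + restore,
// and finally recreate a fresh sandbox from the same profile (state is lost).
async function forkOrRecreate(
  ext: LifecycleExtension,
  handle: SandboxHandle,
): Promise<{ handle: SandboxHandle; stateCarried: boolean }> {
  if (ext.fork) {
    return { handle: await ext.fork(handle), stateCarried: true };
  }
  if (ext.snapshot && ext.restore) {
    const snapshotId = await ext.snapshot(handle);
    return { handle: await ext.restore(snapshotId), stateCarried: true };
  }
  return { handle: await ext.provision(handle.profile), stateCarried: false };
}

// A minimal in-memory extension that only supports recreate.
let counter = 0;
const recreateOnly: LifecycleExtension = {
  async provision(profile) {
    counter += 1;
    return { id: `sbx-${counter}`, profile };
  },
  async teardown() {},
};

forkOrRecreate(recreateOnly, { id: "sbx-0", profile: "node-20" }).then((r) => {
  console.log(r.handle.id, r.stateCarried); // sbx-1 false
});
```

The `stateCarried` flag lets callers distinguish a real fork from a fresh sandbox, so the fallback is visible rather than silent.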

3. Security stress-tests exposed limits

Research on AWS AgentCore and OpenShell/NemoClaw showed that allowed egress paths (git, npm, curl, or node reaching allowlisted hosts) can be weaponized for exfiltration when agents are prompt-injected or tricked into installing malicious dependencies. Policy controls the destination, not the intent.

sand-boxer takeaway: document honestly that sandboxing is blast-radius control, not a guarantee of agent behavior. Deny network egress by default, allowlist egress per profile, and inject secrets at the boundary, never into the agent-visible workspace.
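A minimal sketch of a default-deny egress check with a per-profile allowlist, assuming a hypothetical `EgressPolicy` shape in which `*.domain` matches subdomains only. It inspects the destination host and nothing else, which is exactly the limitation described above: an allowlisted registry remains a possible exfiltration path.

```typescript
interface EgressPolicy {
  // Hostnames the profile may reach. "*.example.com" matches subdomains only.
  allow: string[];
}

// Default deny: a host is reachable only if some allowlist entry matches it.
function isEgressAllowed(policy: EgressPolicy, host: string): boolean {
  const h = host.toLowerCase().replace(/\.$/, ""); // normalize case and trailing dot
  return policy.allow.some((entry) => {
    const e = entry.toLowerCase();
    if (e.startsWith("*.")) {
      const suffix = e.slice(1); // "*.github.com" -> ".github.com"
      return h.endsWith(suffix) && h.length > suffix.length;
    }
    return h === e;
  });
}

const npmOnly: EgressPolicy = { allow: ["registry.npmjs.org", "*.github.com"] };
console.log(isEgressAllowed(npmOnly, "registry.npmjs.org")); // true
console.log(isEgressAllowed(npmOnly, "api.github.com"));     // true
console.log(isEgressAllowed(npmOnly, "github.com"));         // false: wildcard covers subdomains only
console.log(isEgressAllowed(npmOnly, "evil.example"));       // false
console.log(isEgressAllowed({ allow: [] }, "registry.npmjs.org")); // false: empty list denies all
```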

4. Hyperscaler bundling pressures independents

AWS, Google, Cloudflare, and Vercel entered the category within a single quarter. Independents compete on multi-cloud neutrality, price, isolation depth, or open-source self-hosting.

sand-boxer takeaway: OpenRouter-style routing across self-hosted and SaaS backends is a defensible Coulomb position because it avoids single-vendor lock-in.

5. Abstraction layers emerging

ComputeSDK routes one TypeScript API across E2B, Modal, Daytona, Runloop, Cloudflare, Vercel, etc. — "Terraform for running other people's code."

sand-boxer takeaway: validate the meta-framework API against this pattern. Extensions play the provider role; sand-boxer core is the router, policy, billing, and registry layer.
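The router and registry roles can be sketched as follows, assuming hypothetical `ProviderInfo` fields (an isolation rank, snapshot support, a price) and a simple routing policy: filter by hard constraints, optionally prefer self-hosted backends, then pick the cheapest. The `saas-microvm` ID and all prices are placeholders, and billing is left out.

```typescript
interface ProviderInfo {
  id: string;               // extension id, e.g. "compose-ssh"
  selfHosted: boolean;
  isolationRank: number;    // hypothetical 0..3, higher = stronger boundary
  supportsSnapshot: boolean;
  pricePerHourUsd: number;  // placeholder figures, not real prices
}

interface RouteRequest {
  minIsolationRank: number;
  needsSnapshot: boolean;
  preferSelfHosted?: boolean;
}

class ProviderRegistry {
  private readonly providers: ProviderInfo[] = [];

  register(p: ProviderInfo): void {
    if (this.providers.some((q) => q.id === p.id)) {
      throw new Error(`duplicate provider: ${p.id}`);
    }
    this.providers.push(p);
  }

  // Router: apply policy constraints, then choose the cheapest remaining backend.
  route(req: RouteRequest): ProviderInfo | undefined {
    const candidates = this.providers.filter(
      (p) => p.isolationRank >= req.minIsolationRank && (!req.needsSnapshot || p.supportsSnapshot),
    );
    // Preference, not constraint: fall back to SaaS when no self-hosted backend qualifies.
    const pool =
      req.preferSelfHosted && candidates.some((p) => p.selfHosted)
        ? candidates.filter((p) => p.selfHosted)
        : candidates;
    return pool.slice().sort((a, b) => a.pricePerHourUsd - b.pricePerHourUsd)[0];
  }
}

const registry = new ProviderRegistry();
registry.register({ id: "compose-ssh", selfHosted: true, isolationRank: 1, supportsSnapshot: false, pricePerHourUsd: 0.02 });
registry.register({ id: "saas-microvm", selfHosted: false, isolationRank: 3, supportsSnapshot: true, pricePerHourUsd: 0.1 });

console.log(registry.route({ minIsolationRank: 1, needsSnapshot: false })?.id); // compose-ssh
console.log(registry.route({ minIsolationRank: 2, needsSnapshot: false })?.id); // saas-microvm
```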

Architecture patterns (industry)

Gateway / harness vs runtime (universal split)

[Agent gateway / harness]  ──orchestrates──▶  [Sandbox runtime]
      (host or control plane)                    (isolated)

OpenClaw and Hermes both keep the gateway on the host and run tool execution in the sandbox. sand-boxer owns the runtime side only; glas-harness owns the gateway/harness side (see 03-meta-framework-synthesis.md).

Profile + backend + scope (OpenClaw / Hermes consensus)

Dimension	Examples
Backend	docker, ssh, openshell, modal, daytona, compose-ssh
Scope	per-agent, per-session, shared
Workspace	isolated, ro-mount, rw-mount; mirror vs remote-canonical
Network	default deny; optional allowlist
TTL	mandatory; idle reaper optional
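The dimensions in this table can be expressed as a typed profile config. This is a sketch under assumptions: the field names, the `validateProfile` rules (TTL must be a positive integer; an idle reaper, if set, should fire before TTL), and treating an absent `network` block as deny-all are illustrative choices, not a published schema.

```typescript
type Backend = "docker" | "ssh" | "openshell" | "modal" | "daytona" | "compose-ssh";
type Scope = "per-agent" | "per-session" | "shared";
type WorkspaceMode = "isolated" | "ro-mount" | "rw-mount";
type WorkspaceSync = "mirror" | "remote-canonical";

interface ProfileConfig {
  backend: Backend;
  scope: Scope;
  workspace: { mode: WorkspaceMode; sync: WorkspaceSync };
  network?: { allow: string[] }; // omitted = default deny
  ttlSeconds: number;            // mandatory
  idleReapSeconds?: number;      // optional idle reaper
}

// Returns a list of problems; an empty list means the profile is acceptable.
function validateProfile(p: ProfileConfig): string[] {
  const errors: string[] = [];
  if (!Number.isInteger(p.ttlSeconds) || p.ttlSeconds <= 0) {
    errors.push("ttlSeconds must be a positive integer");
  }
  if (p.idleReapSeconds !== undefined && p.idleReapSeconds >= p.ttlSeconds) {
    errors.push("idleReapSeconds should be shorter than ttlSeconds");
  }
  return errors;
}

// Default deny: a profile without a network block gets an empty allowlist.
function effectiveAllowlist(p: ProfileConfig): string[] {
  return p.network?.allow ?? [];
}

const sessionProfile: ProfileConfig = {
  backend: "compose-ssh",
  scope: "per-session",
  workspace: { mode: "isolated", sync: "mirror" },
  ttlSeconds: 3600,
  idleReapSeconds: 900,
};

console.log(validateProfile(sessionProfile));    // []
console.log(effectiveAllowlist(sessionProfile)); // []
```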

Credential and reachability boundary

Best practice: broker credentials at the sandbox edge (OpenShell proxy, Blitzy's secrets-never-to-AI approach, ops-warden certificates). The agent process never holds production tokens for unrelated systems.

sand-boxer integrates ops-bridge (reachability) and ops-warden (identity) as consumers; it does not replace them.
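A minimal sketch of edge credential brokering, assuming a hypothetical egress proxy that calls `EdgeCredentialBroker.forward` for each request leaving the sandbox. Tokens live only in the broker; any `Authorization` header set inside the sandbox is dropped, and the real one is attached per destination host. The bearer scheme and the literal secrets map are placeholders for a real secret store or ops-warden-issued credentials.

```typescript
interface OutboundRequest {
  host: string;
  path: string;
  headers: Record<string, string>;
}

// Runs outside the sandbox; the agent process never sees secretsByHost.
class EdgeCredentialBroker {
  private readonly secretsByHost: Record<string, string>;

  constructor(secretsByHost: Record<string, string>) {
    this.secretsByHost = secretsByHost;
  }

  // Called by the (hypothetical) egress proxy for each outbound request.
  forward(req: OutboundRequest): OutboundRequest {
    const headers: Record<string, string> = {};
    for (const [name, value] of Object.entries(req.headers)) {
      // Drop any credential the agent tried to attach itself.
      if (name.toLowerCase() !== "authorization") headers[name] = value;
    }
    const token = this.secretsByHost[req.host];
    if (token !== undefined) headers["Authorization"] = `Bearer ${token}`;
    return { ...req, headers };
  }
}

const broker = new EdgeCredentialBroker({ "api.github.com": "example-token" });
const fromSandbox: OutboundRequest = {
  host: "api.github.com",
  path: "/user",
  headers: { Accept: "application/json" },
};

const forwarded = broker.forward(fromSandbox);
console.log(forwarded.headers["Authorization"]);   // Bearer example-token
console.log(fromSandbox.headers["Authorization"]); // undefined: the sandbox side never holds the token
```

Pairing this with a default-deny egress allowlist keeps both halves at the boundary: which hosts are reachable, and which credentials travel with the request.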

What sand-boxer should adopt vs defer

Adopt now (meta-framework)	Defer (extension or phase 2)
Unified provision/teardown API	GPU profiles
Named versioned profiles	Browser sandbox profiles
Extension plugin interface	Intent-aware egress filtering
Self-hosted compose-ssh (e2e lineage)	Native Firecracker control plane
State Hub lifecycle registration	Multi-region routing
Default-deny network policy	Computer Use / desktop sandboxes
Payments routing for SaaS backends	Owned hyperscale sandbox fleet

02-reference-frameworks.md — OpenClaw, Hermes, Blitzy
03-meta-framework-synthesis.md — API and extensions
