Agent sandbox landscape (2026)
Survey of modern sandbox infrastructure for agentic coding — isolation technologies, provider models, and industry convergence patterns relevant to sand-boxer.
Market definition
AI agent sandboxes are isolated execution environments for running AI-generated or agent-requested code safely. They optimize for:
- Fast create / resume / teardown
- Programmatic lifecycle APIs
- Isolation from host and peer workloads
- Developer- and agent-friendly SDKs
This is distinct from general application hosting and from agent harnesses (memory, channels, tool orchestration).
Provider landscape (summary)
| Platform | Model | Creation | Persist / checkpoint | Isolation | Notes |
|---|---|---|---|---|---|
| E2B | Managed SaaS | ~150ms | Pause/resume, snapshots | Firecracker | Scale leader; template + sandbox API |
| Daytona | Managed + OSS | ~90ms | Snapshots, fork | Docker/Kata | Open-source self-host path |
| Modal | Serverless SaaS | Sub-second | Memory snapshots, volumes | gVisor | Strong GPU; code-defined runtime |
| Blaxel | Managed | Sub-25ms resume | Hibernate | microVM | Zero idle compute billing |
| Vercel Sandbox | Managed | ms | Snapshots, persistent default | Firecracker | Vercel ecosystem |
| Cloudflare Sandbox SDK | Edge | Seconds (containers) / ms (isolates) | Durable Object state | Containers / V8 isolates | Workers-native |
| AWS AgentCore | Managed sessions | — | Session ≤8h | microVM | Hyperscaler bundling |
| Google Agent Sandbox | Managed preview | Sub-second | TTL ≤14d | Hardened containers | Gemini Enterprise layer |
| OpenSandbox | Self-hosted OSS | Pool pre-warm | Pause/resume, PVC | gVisor/Kata/Firecracker | K8s-scale; CNCF Landscape |
| OpenShell | Policy runtime | — | Long-lived sandboxes | Landlock/seccomp/OPA | Governance layer, not hosted platform |
| Northflank | BYOC + managed | ~200ms | Persistent | microVM/gVisor | VPC deployment |
| Runloop | Managed | ~100ms exec | Snapshot, branch | Custom hypervisor | SWE-bench / eval focus |
| Sprites | Managed | 1–2s | ~300ms checkpoints | Firecracker | Persistent-first |
| ComputeSDK | Abstraction | Varies | Varies | Varies | Multi-provider router (9 backends) |
Sources: Ry Walker research (Jun 2026), provider docs, Modal/E2B marketing materials. The latency and persistence figures are vendor-reported; treat them as directional rather than measured.
Isolation technology spectrum
| Technology | Used by | Security level | Performance |
|---|---|---|---|
| Firecracker | E2B, Sprites, Vercel | Hardware-level microVM | Fast |
| gVisor / Kata | Modal, Northflank, OpenSandbox | Kernel-level | Very fast |
| Hardened Docker | Daytona, AIO Sandbox | Container-level | Fastest setup |
| Landlock / seccomp / OPA | OpenShell | Kernel policy | Native speed |
| V8 isolates | Cloudflare Worker Loader | Process-level | Milliseconds |
Implication for sand-boxer: profile metadata must declare isolation_level
so consumers can reason about blast radius. Extensions map profiles to concrete
runtimes; the meta-framework does not mandate one technology.
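A minimal sketch of what declaring `isolation_level` in profile metadata could look like. The `IsolationLevel` enum, the `Profile` record, and `meets_minimum` are hypothetical names, not part of any existing sand-boxer API, and the weakest-to-strongest ranking simply mirrors the table above read bottom to top; the real ordering is a policy decision.

```python
from dataclasses import dataclass
from enum import IntEnum


class IsolationLevel(IntEnum):
    """Hypothetical ranking, weakest to strongest (an assumption, not a standard)."""
    PROCESS = 1          # e.g. V8 isolates
    KERNEL_POLICY = 2    # e.g. Landlock / seccomp / OPA
    CONTAINER = 3        # e.g. hardened Docker
    KERNEL = 4           # e.g. gVisor / Kata
    MICROVM = 5          # e.g. Firecracker


@dataclass(frozen=True)
class Profile:
    name: str
    version: str
    isolation_level: IsolationLevel


def meets_minimum(profile: Profile, minimum: IsolationLevel) -> bool:
    """True if the profile isolates at least as strongly as the caller requires."""
    return profile.isolation_level >= minimum
```

A consumer that refuses to run untrusted code below container isolation would call `meets_minimum(profile, IsolationLevel.CONTAINER)` before provisioning, without needing to know which runtime the extension maps the profile to.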
Convergence trends (2025 → 2026)
1. Ephemeral vs persistent collapsed
The early market split (E2B = ephemeral, Sprites = persistent) has closed. Most platforms now offer:
- Persistent workspace by default or as first-class option
- Checkpoint / snapshot / hibernate for fast resume
- TTL and explicit teardown still expected for cost and security
sand-boxer takeaway: profiles should support persistence: ephemeral | persistent | checkpoint as a first-class dimension, not a backend detail.
2. Checkpointing is table stakes
Restore times from sub-second to a few seconds are becoming the baseline for agent coding. What is restored is typically workspace state, installed dependencies, and shell history; live processes (PIDs) do not always survive.
sand-boxer takeaway: the lifecycle API needs snapshot, restore, and fork
operations, even if early extensions implement only recreate.
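One way to keep snapshot, restore, and fork in the API while letting early extensions implement only recreate is to make them optional methods with a fallback. This is a sketch under assumptions: `SandboxExtension`, `UnsupportedOperation`, `RecreateOnlyExtension`, and `checkpoint_or_recreate` are illustrative names, and the in-memory bookkeeping stands in for a real backend.

```python
from abc import ABC, abstractmethod


class UnsupportedOperation(Exception):
    """Raised when an extension does not implement an optional lifecycle call."""


class SandboxExtension(ABC):
    """Hypothetical extension interface; method names are illustrative."""

    @abstractmethod
    def provision(self, profile: str) -> str:
        """Create a sandbox and return its id."""

    @abstractmethod
    def teardown(self, sandbox_id: str) -> None:
        """Destroy a sandbox."""

    # Optional operations: part of the API, but extensions may decline them.
    def snapshot(self, sandbox_id: str) -> str:
        raise UnsupportedOperation("snapshot")

    def restore(self, snapshot_id: str) -> str:
        raise UnsupportedOperation("restore")

    def fork(self, sandbox_id: str) -> str:
        raise UnsupportedOperation("fork")


class RecreateOnlyExtension(SandboxExtension):
    """Early-stage extension: provision and teardown only."""

    def __init__(self) -> None:
        self.live = {}
        self._counter = 0

    def provision(self, profile: str) -> str:
        self._counter += 1
        sandbox_id = f"sbx-{self._counter}"
        self.live[sandbox_id] = profile
        return sandbox_id

    def teardown(self, sandbox_id: str) -> None:
        self.live.pop(sandbox_id, None)


def checkpoint_or_recreate(ext: SandboxExtension, sandbox_id: str, profile: str) -> str:
    """Prefer snapshot + restore; fall back to teardown + provision."""
    try:
        return ext.restore(ext.snapshot(sandbox_id))
    except UnsupportedOperation:
        ext.teardown(sandbox_id)
        return ext.provision(profile)
```

Callers written against the full lifecycle keep working when a backend gains real snapshot support, since only the extension changes.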
3. Security stress-tests exposed limits
Research on AWS AgentCore and OpenShell/NemoClaw showed that allowed egress paths (git, npm, curl, node to allowlisted hosts) can be weaponized for exfiltration when agents are prompt-injected or tricked into installing malicious dependencies. Policy controls the destination, not the intent.
sand-boxer takeaway: document honestly that sandboxing is blast-radius control, not a guarantee of agent behavior. Use a default-deny network with per-profile egress allowlists, and inject secrets at the boundary, never into the agent-visible workspace.
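A default-deny, per-profile egress allowlist can be sketched in a few lines. `EgressPolicy` and its `permits` method are hypothetical names; the only library call used is `urllib.parse.urlsplit` from the Python standard library.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit


@dataclass(frozen=True)
class EgressPolicy:
    """Hypothetical per-profile policy: an empty allowlist denies everything."""
    allowed_hosts: frozenset = field(default_factory=frozenset)

    def permits(self, url: str) -> bool:
        # urlsplit().hostname is lowercased, or None if the URL has no host.
        host = urlsplit(url).hostname
        return host is not None and host in self.allowed_hosts
```

The check makes the limitation above concrete: it answers where a request goes, so a request to an allowlisted registry passes whether it fetches a dependency or carries exfiltrated data.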
4. Hyperscaler bundling pressures independents
AWS, Google, Cloudflare, and Vercel all entered the category within a single quarter. Independents compete on multi-cloud neutrality, price, isolation depth, or open-source self-hosting.
sand-boxer takeaway: OpenRouter-style routing across self-hosted and SaaS backends is a defensible Coulomb position — no single-vendor lock-in.
5. Abstraction layers emerging
ComputeSDK routes one TypeScript API across E2B, Modal, Daytona, Runloop, Cloudflare, Vercel, etc. — "Terraform for running other people's code."
sand-boxer takeaway: validate the meta-framework API against this pattern; extensions are providers; sand-boxer core is router + policy + billing + registry.
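The "extensions are providers, core is router + registry" split can be illustrated with a small in-memory sketch. `Router`, `register_extension`, and `register_profile` are hypothetical names, and policy and billing are deliberately left out to keep the routing step visible.

```python
from typing import Callable, Dict

# A provider is modeled as a function from profile name to sandbox id.
Provisioner = Callable[[str], str]


class Router:
    """Hypothetical core: a registry of extensions plus profile-based routing."""

    def __init__(self) -> None:
        self._extensions: Dict[str, Provisioner] = {}
        self._profile_backend: Dict[str, str] = {}

    def register_extension(self, backend: str, provision: Provisioner) -> None:
        self._extensions[backend] = provision

    def register_profile(self, profile: str, backend: str) -> None:
        # Reject profiles that point at a backend no extension provides.
        if backend not in self._extensions:
            raise KeyError(f"no extension registered for backend {backend!r}")
        self._profile_backend[profile] = backend

    def provision(self, profile: str) -> str:
        backend = self._profile_backend[profile]
        return self._extensions[backend](profile)
```

Consumers ask for a profile and never name a backend, which is what allows a profile to move between a self-hosted and a SaaS extension without changing callers.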
Architecture patterns (industry)
Gateway / harness vs runtime (universal split)
[Agent gateway / harness] ──orchestrates──▶ [Sandbox runtime]
(host or control plane) (isolated)
OpenClaw and Hermes both keep the gateway on the host and run tool execution
in the sandbox. sand-boxer owns the runtime side only; glas-harness owns the
gateway/harness side (see 03-meta-framework-synthesis.md).
Profile + backend + scope (OpenClaw / Hermes consensus)
| Dimension | Examples |
|---|---|
| Backend | docker, ssh, openshell, modal, daytona, compose-ssh |
| Scope | per-agent, per-session, shared |
| Workspace | isolated, ro-mount, rw-mount; mirror vs remote-canonical |
| Network | default deny; optional allowlist |
| TTL | mandatory; idle reaper optional |
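The dimensions in the table translate naturally into a profile validator. This is a sketch under assumptions: the key names (`backend`, `scope`, `workspace`, `network_default`, `ttl_seconds`) and the `validate_profile` function are hypothetical, and only the allowed values are taken from the table.

```python
from typing import List

BACKENDS = {"docker", "ssh", "openshell", "modal", "daytona", "compose-ssh"}
SCOPES = {"per-agent", "per-session", "shared"}
WORKSPACES = {"isolated", "ro-mount", "rw-mount"}


def validate_profile(cfg: dict) -> List[str]:
    """Return a list of problems; an empty list means the profile is acceptable."""
    errors = []
    if cfg.get("backend") not in BACKENDS:
        errors.append("unknown backend")
    if cfg.get("scope") not in SCOPES:
        errors.append("unknown scope")
    if cfg.get("workspace") not in WORKSPACES:
        errors.append("unknown workspace mode")
    # Network is default deny: an allowlist may be added, "allow all" may not.
    if cfg.get("network_default", "deny") != "deny":
        errors.append("network must default to deny")
    # TTL is mandatory; bool is excluded because it is a subclass of int.
    ttl = cfg.get("ttl_seconds")
    if isinstance(ttl, bool) or not isinstance(ttl, int) or ttl <= 0:
        errors.append("ttl_seconds is mandatory and must be a positive integer")
    return errors
```

Returning every problem at once, rather than raising on the first, makes the validator usable both at registration time and in profile linting.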
Credential and reachability boundary
Best practice: credentials are brokered at the sandbox edge (OpenShell proxy, Blitzy secrets-never-to-AI, ops-warden certs). The agent process never holds production tokens for unrelated systems.
sand-boxer consumes ops-bridge (reachability) and ops-warden (identity); it does not replace them.
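Brokering at the edge means the credential is attached to the outbound request by a component the agent cannot read. The sketch below is an illustration only: `EdgeBroker` and `outbound_headers` are hypothetical names and do not describe how OpenShell, Blitzy, or ops-warden actually work.

```python
from typing import Dict


class EdgeBroker:
    """Hypothetical broker that lives outside the agent-visible workspace."""

    def __init__(self, tokens_by_host: Dict[str, str]) -> None:
        # Tokens stay in the broker; they are never written into the sandbox.
        self._tokens = dict(tokens_by_host)

    def outbound_headers(self, host: str, headers: Dict[str, str]) -> Dict[str, str]:
        """Return a copy of the headers, with a credential added if the host has one."""
        out = dict(headers)
        token = self._tokens.get(host)
        if token is not None:
            out["Authorization"] = f"Bearer {token}"
        return out
```

Because the token is keyed by destination host, a request to any other system leaves the edge without credentials, which is the "no production tokens for unrelated systems" property in miniature.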
What sand-boxer should adopt vs defer
| Adopt now (meta-framework) | Defer (extension or phase 2) |
|---|---|
| Unified provision/teardown API | GPU profiles |
| Named versioned profiles | Browser sandbox profiles |
| Extension plugin interface | Intent-aware egress filtering |
| Self-hosted compose-ssh (e2e lineage) | Native Firecracker control plane |
| State Hub lifecycle registration | Multi-region routing |
| Default-deny network policy | Computer Use / desktop sandboxes |
| Payments routing for SaaS backends | Owned hyperscale sandbox fleet |
Related reading
- 02-reference-frameworks.md — OpenClaw, Hermes, Blitzy
- 03-meta-framework-synthesis.md — API and extensions