sand-boxer/research/01-agent-sandbox-landscape.md
tegwick f33cff5363 docs: charter meta-framework vision, research, and SAND-WP-0002
Rewrite INTENT.md as the sand-boxer meta-framework charter (OpenRouter-style
sandbox API, extensions, payments, Coulomb sibling boundaries). Add research
under research/, update SCOPE.md, bootstrap workplans SAND-WP-0001/0002, and
State Hub integration files from the bootstrap pass.
2026-06-22 21:32:32 +02:00

# Agent sandbox landscape (2026)
Survey of modern sandbox infrastructure for agentic coding — isolation
technologies, provider models, and industry convergence patterns relevant to
sand-boxer.
## Market definition
**AI agent sandboxes** are isolated execution environments for running
AI-generated or agent-requested code safely. They optimize for:
- Fast create / resume / teardown
- Programmatic lifecycle APIs
- Isolation from host and peer workloads
- Developer- and agent-friendly SDKs
This is distinct from general application hosting and from agent harnesses
(memory, channels, tool orchestration).
## Provider landscape (summary)
| Platform | Model | Creation | Persist / checkpoint | Isolation | Notes |
|----------|-------|----------|----------------------|-----------|-------|
| **E2B** | Managed SaaS | ~150ms | Pause/resume, snapshots | Firecracker | Scale leader; template + sandbox API |
| **Daytona** | Managed + OSS | ~90ms | Snapshots, fork | Docker/Kata | Open-source self-host path |
| **Modal** | Serverless SaaS | Sub-second | Memory snapshots, volumes | gVisor | Strong GPU; code-defined runtime |
| **Blaxel** | Managed | Sub-25ms resume | Hibernate | microVM | Zero idle compute billing |
| **Vercel Sandbox** | Managed | ms | Snapshots, persistent default | Firecracker | Vercel ecosystem |
| **Cloudflare Sandbox SDK** | Edge | seconds / ms (isolates) | DO state | Containers / V8 | Workers-native |
| **AWS AgentCore** | Managed sessions | — | Session ≤8h | microVM | Hyperscaler bundling |
| **Google Agent Sandbox** | Managed preview | Sub-second | TTL ≤14d | Hardened containers | Gemini Enterprise layer |
| **OpenSandbox** | Self-hosted OSS | Pool pre-warm | Pause/resume, PVC | gVisor/Kata/Firecracker | K8s-scale; CNCF Landscape |
| **OpenShell** | Policy runtime | — | Long-lived sandboxes | Landlock/seccomp/OPA | Governance layer, not hosted platform |
| **Northflank** | BYOC + managed | ~200ms | Persistent | microVM/gVisor | VPC deployment |
| **Runloop** | Managed | ~100ms exec | Snapshot, branch | Custom hypervisor | SWE-bench / eval focus |
| **Sprites** | Managed | 12s | ~300ms checkpoints | Firecracker | Persistent-first |
| **ComputeSDK** | Abstraction | Varies | Varies | Varies | Multi-provider router (9 backends) |
Sources: [Ry Walker research (Jun 2026)](https://rywalker.com/research/ai-agent-sandboxes),
provider docs, Modal/E2B marketing materials. Treat vendor claims as directional.
## Isolation technology spectrum
| Technology | Used by | Security level | Performance |
|------------|---------|----------------|-------------|
| **Firecracker** | E2B, Sprites, Vercel | Hardware-level microVM | Fast |
| **gVisor / Kata** | Modal, Northflank, OpenSandbox | User-space kernel (gVisor) / lightweight VM (Kata) | Very fast |
| **Hardened Docker** | Daytona, AIO Sandbox | Container-level | Fastest setup |
| **Landlock / seccomp / OPA** | OpenShell | Kernel policy | Native speed |
| **V8 isolates** | Cloudflare Worker Loader | In-process (runtime-level) | Millisecond startup |
**Implication for sand-boxer:** profile metadata must declare `isolation_level`
so consumers can reason about blast radius. Extensions map profiles to concrete
runtimes; the meta-framework does not mandate one technology.
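
One way a consumer might reason about blast radius from profile metadata is sketched below. Only the `isolation_level` field name comes from the text above; the `IsolationLevel` members, their ranking, the `Profile` class, and `meets_minimum` are hypothetical, and the ordering in particular is an illustrative assumption rather than a sand-boxer specification.

```python
from dataclasses import dataclass
from enum import IntEnum


class IsolationLevel(IntEnum):
    # Illustrative ranking only (weakest to strongest); the order is an assumption.
    PROCESS = 1        # e.g. V8 isolates
    KERNEL_POLICY = 2  # e.g. Landlock / seccomp / OPA
    CONTAINER = 3      # e.g. hardened Docker
    KERNEL = 4         # e.g. gVisor
    MICROVM = 5        # e.g. Firecracker


@dataclass(frozen=True)
class Profile:
    name: str
    isolation_level: IsolationLevel


def meets_minimum(profile: Profile, required: IsolationLevel) -> bool:
    """Return True when the profile isolates at least as strongly as required."""
    return profile.isolation_level >= required


untrusted = Profile(name="untrusted-code", isolation_level=IsolationLevel.MICROVM)
lint_only = Profile(name="lint-only", isolation_level=IsolationLevel.CONTAINER)
```

A consumer could then refuse to schedule untrusted work on `lint_only` by calling `meets_minimum(lint_only, IsolationLevel.MICROVM)`, without knowing which runtime an extension maps the profile to.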
## Convergence trends (2025 → 2026)
### 1. Ephemeral vs persistent collapsed
The early market split (E2B = ephemeral, Sprites = persistent) has collapsed. Most
platforms now offer:
- Persistent workspace by default or as first-class option
- Checkpoint / snapshot / hibernate for fast resume
- TTL and explicit teardown still expected for cost and security
**sand-boxer takeaway:** profiles should support `persistence: ephemeral |
persistent | checkpoint` as a first-class dimension, not a backend detail.
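
A minimal sketch of treating persistence as a typed profile dimension follows. The three values come from the takeaway above; the `Persistence` enum and `parse_persistence` helper are hypothetical names used for illustration.

```python
from enum import Enum


class Persistence(Enum):
    EPHEMERAL = "ephemeral"
    PERSISTENT = "persistent"
    CHECKPOINT = "checkpoint"


def parse_persistence(raw: str) -> Persistence:
    """Validate a profile's persistence value, rejecting anything off-list."""
    try:
        return Persistence(raw.strip().lower())
    except ValueError:
        allowed = ", ".join(p.value for p in Persistence)
        raise ValueError(f"persistence must be one of: {allowed}") from None
```

Validating the value when a profile is loaded means a backend never receives a persistence mode it has to guess at.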
### 2. Checkpointing is table stakes
Sub-second to low-second restore times are becoming baseline for agent coding
(workspace state, installed deps, shell history — not always live PIDs).
**sand-boxer takeaway:** lifecycle API needs `snapshot`, `restore`, `fork`
operations even if early extensions only implement `recreate`.
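
The shape of such a lifecycle API could look like the sketch below, assuming an abstract base class in which `recreate` is required and the other operations are optional. The operation names come from the takeaway above; `SandboxExtension`, `UnsupportedOperation`, `RecreateOnlyExtension`, `capabilities`, and the `name@generation` ID format are all hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Set


class UnsupportedOperation(Exception):
    """Raised when an extension does not implement an optional lifecycle call."""


class SandboxExtension(ABC):
    @abstractmethod
    def recreate(self, sandbox_id: str) -> str:
        """Rebuild the sandbox from its profile and return the new ID."""

    # Optional operations: part of the API surface, unsupported by default.
    def snapshot(self, sandbox_id: str) -> str:
        raise UnsupportedOperation("snapshot")

    def restore(self, snapshot_id: str) -> str:
        raise UnsupportedOperation("restore")

    def fork(self, sandbox_id: str) -> str:
        raise UnsupportedOperation("fork")


class RecreateOnlyExtension(SandboxExtension):
    """An early extension that can only rebuild from scratch."""

    def __init__(self) -> None:
        self._generation = 0

    def recreate(self, sandbox_id: str) -> str:
        self._generation += 1
        base = sandbox_id.split("@")[0]
        return f"{base}@{self._generation}"


def capabilities(ext: SandboxExtension) -> Set[str]:
    """List the lifecycle operations the extension actually overrides."""
    found = {"recreate"}
    for name in ("snapshot", "restore", "fork"):
        if getattr(type(ext), name) is not getattr(SandboxExtension, name):
            found.add(name)
    return found
```

Callers can inspect `capabilities(ext)` up front and fall back to `recreate` instead of discovering missing support at restore time.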
### 3. Security stress-tests exposed limits
Research on AWS AgentCore and OpenShell/NemoClaw showed that **allowed egress
paths** (git, npm, curl, node to allowlisted hosts) can be weaponized for
exfiltration when agents are prompt-injected or tricked into installing
malicious dependencies. Policy controls *destination*, not *intent*.
**sand-boxer takeaway:** document honestly that sandboxing is blast-radius
control, not a guarantee of agent behavior. Default-deny network; per-profile
egress allowlists; secrets injected at the boundary, never placed in the
agent-visible workspace.
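
A default-deny, per-profile egress check might be sketched as follows. `EgressPolicy` and `permits` are hypothetical names, and the allowlist is assumed to hold lower-case hostnames. Note that the check decides on destination alone, which is exactly the limitation described above: a permitted host can still carry exfiltrated data.

```python
from dataclasses import dataclass
from typing import FrozenSet
from urllib.parse import urlsplit


@dataclass(frozen=True)
class EgressPolicy:
    # Empty by default, so a profile with no allowlist denies all egress.
    allowed_hosts: FrozenSet[str] = frozenset()

    def permits(self, url: str) -> bool:
        """Allow only exact hostname matches; unparseable URLs are denied."""
        host = urlsplit(url).hostname  # lower-cased by urlsplit, or None
        return host is not None and host in self.allowed_hosts


deny_all = EgressPolicy()
npm_only = EgressPolicy(frozenset({"registry.npmjs.org"}))
```

Exact matching, rather than suffix or substring matching, keeps a lookalike such as `registry.npmjs.org.evil.example` out of the allowlist.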
### 4. Hyperscaler bundling pressures independents
AWS, Google, Cloudflare, and Vercel all entered the category within a single
quarter. Independents compete on multi-cloud neutrality, price, isolation
depth, or an open-source self-host path.
**sand-boxer takeaway:** OpenRouter-style routing across self-hosted and SaaS
backends is a defensible Coulomb position — no single-vendor lock-in.
### 5. Abstraction layers emerging
ComputeSDK routes one TypeScript API across E2B, Modal, Daytona, Runloop,
Cloudflare, Vercel, etc. — "Terraform for running other people's code."
**sand-boxer takeaway:** validate the meta-framework API against this pattern;
extensions are providers; sand-boxer core is router + policy + billing + registry.
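
The "extensions are providers, core is router plus registry" split can be illustrated with a small sketch. It omits policy and billing entirely, and the `Router` class, its methods, and the returned ID format are hypothetical, not the sand-boxer or ComputeSDK API.

```python
from typing import Callable, Dict

# An extension is modeled here as a function from profile name to sandbox ID.
ProvisionFn = Callable[[str], str]


class Router:
    """Registry of backend extensions plus a single provisioning entry point."""

    def __init__(self) -> None:
        self._extensions: Dict[str, ProvisionFn] = {}

    def register(self, backend: str, provision: ProvisionFn) -> None:
        if backend in self._extensions:
            raise ValueError(f"backend {backend!r} is already registered")
        self._extensions[backend] = provision

    def provision(self, backend: str, profile: str) -> str:
        try:
            extension = self._extensions[backend]
        except KeyError:
            raise LookupError(f"no extension registered for {backend!r}") from None
        return extension(profile)


router = Router()
router.register("compose-ssh", lambda profile: f"compose-ssh:{profile}")
```

With this shape, adding a SaaS backend is a `register` call from an extension, and callers keep using `router.provision(backend, profile)` unchanged.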
## Architecture patterns (industry)
### Gateway / harness vs runtime (universal split)
```
[Agent gateway / harness] ──orchestrates──▶ [Sandbox runtime]
(host or control plane) (isolated)
```
OpenClaw and Hermes both keep the gateway on the host and run **tool execution**
in the sandbox. sand-boxer owns the runtime side only; **glas-harness** owns the
gateway/harness side (see `03-meta-framework-synthesis.md`).
### Profile + backend + scope (OpenClaw / Hermes consensus)
| Dimension | Examples |
|-----------|----------|
| **Backend** | docker, ssh, openshell, modal, daytona, compose-ssh |
| **Scope** | per-agent, per-session, shared |
| **Workspace** | isolated, ro-mount, rw-mount; mirror vs remote-canonical |
| **Network** | default deny; optional allowlist |
| **TTL** | mandatory; idle reaper optional |
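
The dimensions in the table could be captured in a single validated record, sketched below. The scope and workspace values are taken from the table; the `SandboxSpec` class and its field names are hypothetical. TTL has no default, so omitting it fails at construction, and an empty allowlist stands for default deny.

```python
from dataclasses import dataclass
from typing import Tuple

SCOPES = ("per-agent", "per-session", "shared")
WORKSPACES = ("isolated", "ro-mount", "rw-mount")


@dataclass(frozen=True)
class SandboxSpec:
    backend: str                             # e.g. "docker", "compose-ssh"
    scope: str
    workspace: str
    ttl_seconds: int                         # mandatory: no default value
    egress_allowlist: Tuple[str, ...] = ()   # empty tuple means default deny

    def __post_init__(self) -> None:
        if self.scope not in SCOPES:
            raise ValueError(f"scope must be one of {SCOPES}")
        if self.workspace not in WORKSPACES:
            raise ValueError(f"workspace must be one of {WORKSPACES}")
        if self.ttl_seconds <= 0:
            raise ValueError("ttl_seconds must be positive")


spec = SandboxSpec(
    backend="compose-ssh",
    scope="per-session",
    workspace="isolated",
    ttl_seconds=900,
)
```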
### Credential and reachability boundary
Best practice: broker credentials at the sandbox edge (OpenShell proxy, Blitzy
secrets-never-to-AI, ops-warden certs). The agent process never holds
production tokens for unrelated systems.
sand-boxer integrates **ops-bridge** (reachability) and **ops-warden**
(identity) as consumers — does not replace them.
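
Brokering credentials at the sandbox edge can be illustrated with a toy sketch: the agent only ever sees an opaque placeholder, and a proxy at the boundary swaps in the real token on the way out. `CredentialBroker` and its methods are hypothetical and do not reflect how OpenShell, Blitzy, or ops-warden implement this.

```python
import secrets
from typing import Dict


class CredentialBroker:
    """Lives outside the sandbox; maps opaque handles to real tokens."""

    def __init__(self) -> None:
        self._real: Dict[str, str] = {}

    def issue_placeholder(self, real_token: str) -> str:
        """Return a random handle that is safe to expose inside the sandbox."""
        handle = "sbx-" + secrets.token_hex(8)
        self._real[handle] = real_token
        return handle

    def rewrite_headers(self, headers: Dict[str, str]) -> Dict[str, str]:
        """Swap a known placeholder for the real token on outbound requests."""
        out = dict(headers)
        scheme, _, value = out.get("Authorization", "").partition(" ")
        if value in self._real:
            out["Authorization"] = f"{scheme} {self._real[value]}"
        return out
```

Because the substitution happens at the edge, a prompt-injected agent that dumps its environment leaks only the placeholder, which is meaningless outside the broker.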
## What sand-boxer should adopt vs defer
| Adopt now (meta-framework) | Defer (extension or phase 2) |
|----------------------------|------------------------------|
| Unified provision/teardown API | GPU profiles |
| Named versioned profiles | Browser sandbox profiles |
| Extension plugin interface | Intent-aware egress filtering |
| Self-hosted compose-ssh (e2e lineage) | Native Firecracker control plane |
| State Hub lifecycle registration | Multi-region routing |
| Default-deny network policy | Computer Use / desktop sandboxes |
| Payments routing for SaaS backends | Owned hyperscale sandbox fleet |
## Related reading
- [02-reference-frameworks.md](02-reference-frameworks.md) — OpenClaw, Hermes, Blitzy
- [03-meta-framework-synthesis.md](03-meta-framework-synthesis.md) — API and extensions