sand-boxer/research/01-agent-sandbox-landscape.md

# Agent sandbox landscape (2026)

Survey of modern sandbox infrastructure for agentic coding — isolation
technologies, provider models, and industry convergence patterns relevant to
sand-boxer.

## Market definition

**AI agent sandboxes** are isolated execution environments for running
AI-generated or agent-requested code safely. They optimize for:

- Fast create / resume / teardown
- Programmatic lifecycle APIs
- Isolation from host and peer workloads
- Developer- and agent-friendly SDKs

This is distinct from general application hosting and from agent harnesses
(memory, channels, tool orchestration).

## Provider landscape (summary)

| Platform | Model | Creation | Persist / checkpoint | Isolation | Notes |
|----------|-------|----------|----------------------|-----------|-------|
| **E2B** | Managed SaaS | ~150ms | Pause/resume, snapshots | Firecracker | Scale leader; template + sandbox API |
| **Daytona** | Managed + OSS | ~90ms | Snapshots, fork | Docker/Kata | Open-source self-host path |
| **Modal** | Serverless SaaS | Sub-second | Memory snapshots, volumes | gVisor | Strong GPU; code-defined runtime |
| **Blaxel** | Managed | Sub-25ms resume | Hibernate | microVM | Zero idle compute billing |
| **Vercel Sandbox** | Managed | ms | Snapshots, persistent default | Firecracker | Vercel ecosystem |
| **Cloudflare Sandbox SDK** | Edge | seconds / ms (isolates) | DO state | Containers / V8 | Workers-native |
| **AWS AgentCore** | Managed sessions | — | Session ≤8h | microVM | Hyperscaler bundling |
| **Google Agent Sandbox** | Managed preview | Sub-second | TTL ≤14d | Hardened containers | Gemini Enterprise layer |
| **OpenSandbox** | Self-hosted OSS | Pool pre-warm | Pause/resume, PVC | gVisor/Kata/Firecracker | K8s-scale; CNCF Landscape |
| **OpenShell** | Policy runtime | — | Long-lived sandboxes | Landlock/seccomp/OPA | Governance layer, not hosted platform |
| **Northflank** | BYOC + managed | ~200ms | Persistent | microVM/gVisor | VPC deployment |
| **Runloop** | Managed | ~100ms exec | Snapshot, branch | Custom hypervisor | SWE-bench / eval focus |
| **Sprites** | Managed | 1–2s | ~300ms checkpoints | Firecracker | Persistent-first |
| **ComputeSDK** | Abstraction | Varies | Varies | Varies | Multi-provider router (9 backends) |

Sources: [Ry Walker research (Jun 2026)](https://rywalker.com/research/ai-agent-sandboxes),
provider docs, Modal/E2B marketing materials. Treat vendor claims as directional.

## Isolation technology spectrum

| Technology | Used by | Security level | Performance |
|------------|---------|----------------|-------------|
| **Firecracker** | E2B, Sprites, Vercel | Hardware-level microVM | Fast |
| **gVisor / Kata** | Modal, Northflank, OpenSandbox | User-space kernel (gVisor) / lightweight VM (Kata) | Very fast |
| **Hardened Docker** | Daytona, AIO Sandbox | Container-level | Fastest setup |
| **Landlock / seccomp / OPA** | OpenShell | Kernel policy | Native speed |
| **V8 isolates** | Cloudflare Worker Loader | In-process (language runtime) | Millisecond startup |

**Implication for sand-boxer:** profile metadata must declare `isolation_level`
so consumers can reason about blast radius. Extensions map profiles to concrete
runtimes; the meta-framework does not mandate one technology.
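
One way to make `isolation_level` something consumers can reason about is to give it an ordering. The sketch below is illustrative only: the `Profile` and `IsolationLevel` names, the level names, and the weakest-to-strongest ordering are assumptions, not part of any existing sand-boxer API.

```python
from dataclasses import dataclass
from enum import IntEnum


class IsolationLevel(IntEnum):
    """Hypothetical ordering, weakest to strongest boundary (an assumption)."""
    KERNEL_POLICY = 1      # Landlock / seccomp / OPA on a shared kernel
    CONTAINER = 2          # hardened Docker
    USERSPACE_KERNEL = 3   # gVisor
    MICROVM = 4            # Firecracker, Kata


@dataclass(frozen=True)
class Profile:
    name: str
    version: str
    isolation_level: IsolationLevel


def meets_minimum(profile: Profile, required: IsolationLevel) -> bool:
    """True when the declared boundary is at least as strong as required."""
    return profile.isolation_level >= required


untrusted = Profile("untrusted-code", "1.0.0", IsolationLevel.MICROVM)
dev = Profile("dev-shell", "1.0.0", IsolationLevel.CONTAINER)
print(meets_minimum(untrusted, IsolationLevel.CONTAINER))
print(meets_minimum(dev, IsolationLevel.MICROVM))
```

A consumer could then refuse to schedule untrusted code on any profile below a chosen minimum without knowing which runtime the extension maps it to.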

## Convergence trends (2025 → 2026)

### 1. Ephemeral vs persistent collapsed

The early market split (E2B = ephemeral, Sprites = persistent) has collapsed. Most
platforms now offer:

- A persistent workspace, by default or as a first-class option
- Checkpoint / snapshot / hibernate for fast resume
- TTLs and explicit teardown, which are still expected for cost and security

**sand-boxer takeaway:** profiles should support `persistence: ephemeral |
persistent | checkpoint` as a first-class dimension, not a backend detail.
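
As a minimal sketch of treating persistence as a declared dimension, a profile loader could reject profiles that omit it or use an unknown value. The `Persistence` enum and `parse_persistence` helper are hypothetical names, and the profile is assumed to arrive as a plain mapping.

```python
from enum import Enum


class Persistence(str, Enum):
    EPHEMERAL = "ephemeral"
    PERSISTENT = "persistent"
    CHECKPOINT = "checkpoint"


def parse_persistence(profile: dict) -> Persistence:
    """Read the persistence mode, failing loudly when absent or unknown."""
    raw = profile.get("persistence")
    if raw is None:
        raise ValueError("profile must declare 'persistence'")
    try:
        return Persistence(raw)
    except ValueError:
        allowed = ", ".join(mode.value for mode in Persistence)
        raise ValueError(
            f"unknown persistence {raw!r}; expected one of: {allowed}"
        ) from None


print(parse_persistence({"name": "py312", "persistence": "checkpoint"}))
```

Requiring the field, rather than defaulting it, keeps the choice visible in the profile instead of being inherited from whichever backend happens to serve it.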

### 2. Checkpointing is table stakes

Restore times from sub-second to a few seconds are becoming the baseline for
agent coding. What is restored is typically workspace state, installed
dependencies, and shell history; live processes are not always preserved.

**sand-boxer takeaway:** lifecycle API needs `snapshot`, `restore`, `fork`
operations even if early extensions only implement `recreate`.
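
One possible shape for such an API, sketched under assumptions: optional operations exist on the base interface but raise until an extension overrides them, so callers can probe capabilities up front. The class names, method signatures, and the in-memory `RecreateOnlyExtension` are all hypothetical.

```python
from abc import ABC, abstractmethod


class UnsupportedOperation(Exception):
    """Raised when an extension lacks an optional lifecycle operation."""


class SandboxExtension(ABC):
    OPTIONAL_OPS = ("snapshot", "restore", "fork")

    @abstractmethod
    def provision(self, profile: str) -> str:
        """Create a sandbox and return its id."""

    @abstractmethod
    def teardown(self, sandbox_id: str) -> None:
        """Destroy the sandbox."""

    def snapshot(self, sandbox_id: str) -> str:
        raise UnsupportedOperation("snapshot")

    def restore(self, snapshot_id: str) -> str:
        raise UnsupportedOperation("restore")

    def fork(self, sandbox_id: str) -> str:
        raise UnsupportedOperation("fork")

    def capabilities(self) -> set:
        """Names of the optional operations this extension overrides."""
        return {
            op for op in self.OPTIONAL_OPS
            if getattr(type(self), op) is not getattr(SandboxExtension, op)
        }


class RecreateOnlyExtension(SandboxExtension):
    """In-memory stand-in for an early extension that can only recreate."""

    def __init__(self) -> None:
        self._live = {}
        self._counter = 0

    def provision(self, profile: str) -> str:
        self._counter += 1
        sandbox_id = f"sbx-{self._counter}"
        self._live[sandbox_id] = profile
        return sandbox_id

    def teardown(self, sandbox_id: str) -> None:
        self._live.pop(sandbox_id, None)

    def recreate(self, sandbox_id: str) -> str:
        """Tear down and provision again from the same profile; state is lost."""
        profile = self._live[sandbox_id]
        self.teardown(sandbox_id)
        return self.provision(profile)
```

A router can call `capabilities()` before offering `fork` to a consumer, rather than discovering the gap through a failed call.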

### 3. Security stress-tests exposed limits

Research on AWS AgentCore and OpenShell/NemoClaw showed that **allowed egress
paths** (git, npm, curl, node to allowlisted hosts) can be weaponized for
exfiltration when agents are prompt-injected or tricked into installing
malicious dependencies. Policy controls *destination*, not *intent*.

**sand-boxer takeaway:** document honestly that sandboxing is blast-radius
control, not a guarantee of agent behavior. Default to deny-all networking, use
per-profile egress allowlists, and inject secrets at the boundary, never into
the agent-visible workspace.
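
A minimal sketch of a default-deny, per-profile egress check follows. `EgressPolicy` is a hypothetical name and matching is exact-host only. The check decides on destination alone, so it shows the limitation described above: traffic to an allowlisted host is permitted whatever the payload carries.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit


@dataclass(frozen=True)
class EgressPolicy:
    # An empty allowlist means every outbound request is denied.
    allowed_hosts: frozenset = field(default_factory=frozenset)

    def permits(self, url: str) -> bool:
        """Allow only URLs whose host is exactly on the allowlist."""
        host = urlsplit(url).hostname  # None when the URL has no authority part
        return host is not None and host in self.allowed_hosts


deny_all = EgressPolicy()
npm_only = EgressPolicy(frozenset({"registry.npmjs.org"}))

print(deny_all.permits("https://registry.npmjs.org/left-pad"))
print(npm_only.permits("https://registry.npmjs.org/left-pad"))
print(npm_only.permits("https://evil.example/?q=registry.npmjs.org"))
```

Matching on the parsed hostname, not on a substring of the URL, avoids the trivial bypass in the last call.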

### 4. Hyperscaler bundling pressures independents

AWS, Google, Cloudflare, and Vercel all entered the category within a single
quarter. Independents compete on multi-cloud neutrality, price, isolation
depth, or an open-source self-host path.

**sand-boxer takeaway:** OpenRouter-style routing across self-hosted and SaaS
backends is a defensible Coulomb position — no single-vendor lock-in.

### 5. Abstraction layers emerging

ComputeSDK routes one TypeScript API across E2B, Modal, Daytona, Runloop,
Cloudflare, Vercel, etc. — "Terraform for running other people's code."

**sand-boxer takeaway:** validate the meta-framework API against this pattern;
extensions are providers; sand-boxer core is router + policy + billing + registry.
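
The router-plus-registry part of that split can be sketched as follows. This is an illustration under assumptions: the `Router` class and its method names are hypothetical, a provider is reduced to a callable that takes a profile name and returns a sandbox id, and policy and billing are left out.

```python
from typing import Callable, Dict

# A provider takes a profile name and returns a sandbox id (assumed shape).
Provider = Callable[[str], str]


class Router:
    def __init__(self) -> None:
        self._providers: Dict[str, Provider] = {}   # backend name -> extension
        self._routes: Dict[str, str] = {}           # profile name -> backend name

    def register(self, backend: str, provider: Provider) -> None:
        """Add an extension to the registry under its backend name."""
        self._providers[backend] = provider

    def route(self, profile: str, backend: str) -> None:
        """Bind a profile to a registered backend."""
        if backend not in self._providers:
            raise KeyError(f"no extension registered for backend {backend!r}")
        self._routes[profile] = backend

    def provision(self, profile: str) -> str:
        """Dispatch to whichever extension the profile is routed to."""
        backend = self._routes[profile]
        return self._providers[backend](profile)


router = Router()
router.register("docker", lambda profile: f"docker:{profile}")
router.route("py312", "docker")
print(router.provision("py312"))
```

Callers name a profile, never a backend, so rebinding a profile to a different extension does not change consumer code.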

## Architecture patterns (industry)

### Gateway / harness vs runtime (universal split)

```
[Agent gateway / harness]  ──orchestrates──▶  [Sandbox runtime]
      (host or control plane)                    (isolated)
```

OpenClaw and Hermes both keep the gateway on the host and run **tool execution**
in the sandbox. sand-boxer owns the runtime side only; **glas-harness** owns the
gateway/harness side (see `03-meta-framework-synthesis.md`).

### Profile + backend + scope (OpenClaw / Hermes consensus)

| Dimension | Examples |
|-----------|----------|
| **Backend** | docker, ssh, openshell, modal, daytona, compose-ssh |
| **Scope** | per-agent, per-session, shared |
| **Workspace** | isolated, ro-mount, rw-mount; mirror vs remote-canonical |
| **Network** | default deny; optional allowlist |
| **TTL** | mandatory; idle reaper optional |
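
The dimensions in this table can be read as one specification record. The sketch below assumes hypothetical field names; it makes the TTL mandatory at construction time and treats the idle reaper as optional.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

SCOPES = frozenset({"per-agent", "per-session", "shared"})


@dataclass(frozen=True)
class SandboxSpec:
    backend: str                                  # e.g. "docker", "compose-ssh"
    scope: str                                    # one of SCOPES
    ttl_seconds: int                              # mandatory hard lifetime
    idle_timeout_seconds: Optional[int] = None    # optional idle reaper
    egress_allowlist: Tuple[str, ...] = ()        # empty means default deny

    def __post_init__(self) -> None:
        if self.scope not in SCOPES:
            raise ValueError(f"unknown scope {self.scope!r}")
        if self.ttl_seconds <= 0:
            raise ValueError("ttl_seconds must be positive")
        if self.idle_timeout_seconds is not None and self.idle_timeout_seconds <= 0:
            raise ValueError("idle_timeout_seconds must be positive when set")


def should_reap(spec: SandboxSpec, created_at: float,
                last_active_at: float, now: float) -> bool:
    """True once the TTL has elapsed, or the idle timeout if one is set."""
    if now - created_at >= spec.ttl_seconds:
        return True
    idle = spec.idle_timeout_seconds
    return idle is not None and now - last_active_at >= idle


spec = SandboxSpec("docker", "per-session", ttl_seconds=3600,
                   idle_timeout_seconds=600)
print(should_reap(spec, created_at=0, last_active_at=100, now=500))
print(should_reap(spec, created_at=0, last_active_at=100, now=700))
```

Because `ttl_seconds` has no default, a spec without a lifetime cannot be constructed at all.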

### Credential and reachability boundary

Best practice: broker credentials at the sandbox edge (OpenShell proxy, Blitzy
secrets-never-to-AI, ops-warden certs). The agent process never holds production
tokens for unrelated systems.
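
As a rough illustration of brokering at the edge, the sketch below attaches a token to outbound requests per destination host and discards any `Authorization` header the agent supplied. `CredentialBroker` is a hypothetical name and does not describe how OpenShell, Blitzy, or ops-warden work internally.

```python
class CredentialBroker:
    """Runs outside the sandbox; the agent never sees the stored tokens."""

    def __init__(self, secrets: dict) -> None:
        self._secrets = dict(secrets)   # destination host -> bearer token

    def outbound_headers(self, host: str, agent_headers: dict) -> dict:
        """Return the headers to send, with credentials set only by the broker."""
        # Drop whatever Authorization value the agent tried to send.
        headers = {
            key: value for key, value in agent_headers.items()
            if key.lower() != "authorization"
        }
        token = self._secrets.get(host)
        if token is not None:
            headers["Authorization"] = f"Bearer {token}"
        return headers


broker = CredentialBroker({"api.github.com": "example-token"})
print(broker.outbound_headers("api.github.com", {"Accept": "application/json"}))
print(broker.outbound_headers("evil.example", {"Authorization": "Bearer forged"}))
```

A token is scoped to one host, so a request to any other destination leaves the sandbox with no credential attached.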

sand-boxer integrates **ops-bridge** (reachability) and **ops-warden**
(identity) as consumers; it does not replace them.

## What sand-boxer should adopt vs defer

| Adopt now (meta-framework) | Defer (extension or phase 2) |
|----------------------------|------------------------------|
| Unified provision/teardown API | GPU profiles |
| Named versioned profiles | Browser sandbox profiles |
| Extension plugin interface | Intent-aware egress filtering |
| Self-hosted compose-ssh (e2e lineage) | Native Firecracker control plane |
| State Hub lifecycle registration | Multi-region routing |
| Default-deny network policy | Computer Use / desktop sandboxes |
| Payments routing for SaaS backends | Owned hyperscale sandbox fleet |

## Related reading

- [02-reference-frameworks.md](02-reference-frameworks.md) — OpenClaw, Hermes, Blitzy
- [03-meta-framework-synthesis.md](03-meta-framework-synthesis.md) — API and extensions