# Agent sandbox landscape (2026)

Survey of modern sandbox infrastructure for agentic coding: isolation
technologies, provider models, and industry convergence patterns relevant to
sand-boxer.

## Market definition

**AI agent sandboxes** are isolated execution environments for running
AI-generated or agent-requested code safely. They optimize for:

- Fast create / resume / teardown
- Programmatic lifecycle APIs
- Isolation from host and peer workloads
- Developer- and agent-friendly SDKs

This is distinct from general application hosting and from agent harnesses
(memory, channels, tool orchestration).

## Provider landscape (summary)

| Platform | Model | Creation | Persist / checkpoint | Isolation | Notes |
|----------|-------|----------|----------------------|-----------|-------|
| **E2B** | Managed SaaS | ~150ms | Pause/resume, snapshots | Firecracker | Scale leader; template + sandbox API |
| **Daytona** | Managed + OSS | ~90ms | Snapshots, fork | Docker/Kata | Open-source self-host path |
| **Modal** | Serverless SaaS | Sub-second | Memory snapshots, volumes | gVisor | Strong GPU; code-defined runtime |
| **Blaxel** | Managed | Sub-25ms resume | Hibernate | microVM | Zero idle compute billing |
| **Vercel Sandbox** | Managed | Milliseconds | Snapshots, persistent default | Firecracker | Vercel ecosystem |
| **Cloudflare Sandbox SDK** | Edge | Seconds (containers) / ms (isolates) | DO state | Containers / V8 | Workers-native |
| **AWS AgentCore** | Managed sessions | n/a | Session ≤8h | microVM | Hyperscaler bundling |
| **Google Agent Sandbox** | Managed preview | Sub-second | TTL ≤14d | Hardened containers | Gemini Enterprise layer |
| **OpenSandbox** | Self-hosted OSS | Pool pre-warm | Pause/resume, PVC | gVisor/Kata/Firecracker | K8s-scale; CNCF Landscape |
| **OpenShell** | Policy runtime | n/a | Long-lived sandboxes | Landlock/seccomp/OPA | Governance layer, not hosted platform |
| **Northflank** | BYOC + managed | ~200ms | Persistent | microVM/gVisor | VPC deployment |
| **Runloop** | Managed | ~100ms exec | Snapshot, branch | Custom hypervisor | SWE-bench / eval focus |
| **Sprites** | Managed | 1–2s | ~300ms checkpoints | Firecracker | Persistent-first |
| **ComputeSDK** | Abstraction | Varies | Varies | Varies | Multi-provider router (9 backends) |

Sources: [Ry Walker research (Jun 2026)](https://rywalker.com/research/ai-agent-sandboxes),
provider docs, Modal/E2B marketing materials. Treat vendor claims as directional.

## Isolation technology spectrum

| Technology | Used by | Security level | Performance |
|------------|---------|----------------|-------------|
| **Firecracker** | E2B, Sprites, Vercel | Hardware-level microVM | Fast |
| **gVisor / Kata** | Modal, Northflank, OpenSandbox | Kernel-level | Very fast |
| **Hardened Docker** | Daytona, AIO Sandbox | Container-level | Fastest setup |
| **Landlock / seccomp / OPA** | OpenShell | Kernel policy | Native speed |
| **V8 isolates** | Cloudflare Worker Loader | Process-level | Milliseconds |

**Implication for sand-boxer:** profile metadata must declare `isolation_level`
so consumers can reason about blast radius. Extensions map profiles to concrete
runtimes; the meta-framework does not mandate one technology.

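A minimal sketch of how a declared `isolation_level` could let a consumer filter profiles, assuming a Python data model. The `Profile` class, the enum member names, the example profile names, and `profiles_meeting` are all hypothetical; only the `isolation_level` field name and the security-level labels come from this document.

```python
from dataclasses import dataclass
from enum import Enum


class IsolationLevel(Enum):
    # Values mirror the "Security level" column; member names are hypothetical.
    MICROVM = "hardware-level microVM"
    KERNEL = "kernel-level"
    CONTAINER = "container-level"
    KERNEL_POLICY = "kernel policy"
    PROCESS = "process-level"


@dataclass(frozen=True)
class Profile:
    name: str
    isolation_level: IsolationLevel
    runtime: str  # concrete runtime an extension maps this profile to


def profiles_meeting(profiles, accepted):
    """Return the profiles whose declared isolation level the consumer accepts."""
    return [p for p in profiles if p.isolation_level in accepted]


# Illustrative profiles only; none of these names exist in sand-boxer today.
PROFILES = [
    Profile("untrusted-code", IsolationLevel.MICROVM, "firecracker"),
    Profile("ci-build", IsolationLevel.CONTAINER, "docker"),
    Profile("edge-snippet", IsolationLevel.PROCESS, "v8-isolate"),
]

strict = profiles_meeting(
    PROFILES, {IsolationLevel.MICROVM, IsolationLevel.KERNEL}
)
print([p.name for p in strict])
```

The sketch filters on an explicit set of accepted levels rather than a numeric ranking, since the survey does not rank the technologies against each other.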
## Convergence trends (2025 → 2026)

### 1. Ephemeral vs persistent collapsed

The early market split (E2B = ephemeral, Sprites = persistent) has closed. Most
platforms now offer:

- Persistent workspace by default or as first-class option
- Checkpoint / snapshot / hibernate for fast resume
- TTL and explicit teardown still expected for cost and security

**sand-boxer takeaway:** profiles should support
`persistence: ephemeral | persistent | checkpoint` as a first-class dimension,
not a backend detail.

### 2. Checkpointing is table stakes

Sub-second to low-second restore times are becoming baseline for agent coding
(workspace state, installed deps, shell history; not always live PIDs).

**sand-boxer takeaway:** lifecycle API needs `snapshot`, `restore`, `fork`
operations even if early extensions only implement `recreate`.

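One way to reserve these operations in the interface while letting early extensions implement only `recreate` is sketched below. The class names, signatures, and the `capabilities` helper are assumptions made for illustration; only the four operation names come from the takeaway above.

```python
from abc import ABC, abstractmethod


class UnsupportedOperation(Exception):
    """Raised when an extension has not implemented an optional lifecycle call."""


class SandboxExtension(ABC):
    """Hypothetical extension interface; only `recreate` is required."""

    @abstractmethod
    def recreate(self, sandbox_id: str) -> str:
        """Tear down and provision a fresh sandbox, returning the new id."""

    def snapshot(self, sandbox_id: str) -> str:
        raise UnsupportedOperation("snapshot")

    def restore(self, snapshot_id: str) -> str:
        raise UnsupportedOperation("restore")

    def fork(self, sandbox_id: str) -> str:
        raise UnsupportedOperation("fork")

    def capabilities(self) -> set:
        """Report which optional operations this extension overrides."""
        optional = ("snapshot", "restore", "fork")
        return {
            name
            for name in optional
            if getattr(type(self), name) is not getattr(SandboxExtension, name)
        }


class RecreateOnly(SandboxExtension):
    """Stand-in for an early extension; it only fabricates ids."""

    def __init__(self):
        self._generation = 0

    def recreate(self, sandbox_id: str) -> str:
        self._generation += 1
        return f"{sandbox_id}-gen{self._generation}"


ext = RecreateOnly()
new_id = ext.recreate("sbx-1")
caps = ext.capabilities()
print(new_id, caps)
```

Callers can check `capabilities()` before choosing between a restore and a full recreate, so the API surface stays stable as extensions add support.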
### 3. Security stress-tests exposed limits

Research on AWS AgentCore and OpenShell/NemoClaw showed that **allowed egress
paths** (git, npm, curl, node to allowlisted hosts) can be weaponized for
exfiltration when agents are prompt-injected or tricked into malicious
dependencies. Policy controls *destination*, not *intent*.

**sand-boxer takeaway:** document honestly that sandboxing is blast-radius
control, not agent-behavior guarantee. Default-deny network; per-profile egress
allowlists; secrets injected at boundary, never in agent-visible workspace.

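The default-deny allowlist, and the limit described above, can be illustrated with a small host check. This is a sketch under stated assumptions: `EgressPolicy` and its fields are hypothetical names, and the URLs are made up for the example.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit


@dataclass(frozen=True)
class EgressPolicy:
    """Hypothetical per-profile policy: an empty allowlist denies everything."""

    allowed_hosts: frozenset = field(default_factory=frozenset)

    def permits(self, url: str) -> bool:
        # Only the destination host is inspected; the payload is not.
        host = urlsplit(url).hostname
        return host is not None and host in self.allowed_hosts


default_policy = EgressPolicy()
npm_policy = EgressPolicy(frozenset({"registry.npmjs.org"}))

denied_by_default = default_policy.permits("https://registry.npmjs.org/left-pad")
normal_install = npm_policy.permits("https://registry.npmjs.org/left-pad")
# Same allowlisted host, but the query string carries data out of the sandbox.
# The path and parameter here are invented for illustration.
looks_like_exfil = npm_policy.permits(
    "https://registry.npmjs.org/some-package?note=contents-of-dotenv"
)
other_host = npm_policy.permits("https://attacker.example/upload")

print(denied_by_default, normal_install, looks_like_exfil, other_host)
```

The third request is permitted because the check sees only the host, which is the sense in which policy controls destination and not intent.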
### 4. Hyperscaler bundling pressures independents

AWS, Google, Cloudflare, and Vercel all entered the category within a single
quarter. Independents compete on multi-cloud neutrality, price, isolation depth,
or open-source self-host.

**sand-boxer takeaway:** OpenRouter-style routing across self-hosted and SaaS
backends is a defensible Coulomb position: no single-vendor lock-in.

### 5. Abstraction layers emerging

ComputeSDK routes one TypeScript API across E2B, Modal, Daytona, Runloop,
Cloudflare, Vercel, and others: "Terraform for running other people's code."

**sand-boxer takeaway:** validate the meta-framework API against this pattern;
extensions are providers; sand-boxer core is router + policy + billing + registry.

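The router-plus-registry part of that split might look like the sketch below, with policy and billing left out. Every name here (`Router`, `register`, `provision`, the backend keys) is hypothetical, and the registered extensions are stubs that return fake ids instead of calling any provider.

```python
from typing import Callable, Dict

# An extension is modeled as a function from profile name to sandbox id.
ProvisionFn = Callable[[str], str]


class Router:
    """Hypothetical core: a registry of extensions plus dispatch by backend."""

    def __init__(self) -> None:
        self._extensions: Dict[str, ProvisionFn] = {}

    def register(self, backend: str, provision: ProvisionFn) -> None:
        if backend in self._extensions:
            raise ValueError(f"backend {backend!r} is already registered")
        self._extensions[backend] = provision

    def backends(self) -> list:
        return sorted(self._extensions)

    def provision(self, backend: str, profile: str) -> str:
        try:
            extension = self._extensions[backend]
        except KeyError:
            raise LookupError(
                f"no extension registered for backend {backend!r}"
            ) from None
        return extension(profile)


router = Router()
# Stub extensions: a real one would call a self-hosted or SaaS backend.
router.register("docker", lambda profile: f"docker:{profile}")
router.register("e2b", lambda profile: f"e2b:{profile}")

sandbox_id = router.provision("docker", "ci-build")
print(router.backends(), sandbox_id)
```

Keeping the core unaware of how each extension provisions is what lets self-hosted and SaaS backends sit behind the same call.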
## Architecture patterns (industry)

### Gateway / harness vs runtime (universal split)

```
[Agent gateway / harness] ──orchestrates──▶ [Sandbox runtime]
(host or control plane) (isolated)
```

OpenClaw and Hermes both keep the gateway on the host and run **tool execution**
in the sandbox. sand-boxer owns the runtime side only; **glas-harness** owns the
gateway/harness side (see `03-meta-framework-synthesis.md`).

### Profile + backend + scope (OpenClaw / Hermes consensus)

| Dimension | Examples |
|-----------|----------|
| **Backend** | docker, ssh, openshell, modal, daytona, compose-ssh |
| **Scope** | per-agent, per-session, shared |
| **Workspace** | isolated, ro-mount, rw-mount; mirror vs remote-canonical |
| **Network** | default deny; optional allowlist |
| **TTL** | mandatory; idle reaper optional |

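These dimensions could be carried in a single validated record, sketched here with the TTL as the only field that has no default and cannot be zero. The class and field names are assumptions for illustration; the enum values are the examples from the table.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Scope(Enum):
    PER_AGENT = "per-agent"
    PER_SESSION = "per-session"
    SHARED = "shared"


class Workspace(Enum):
    ISOLATED = "isolated"
    RO_MOUNT = "ro-mount"
    RW_MOUNT = "rw-mount"


@dataclass(frozen=True)
class SandboxProfile:
    """Hypothetical profile record covering the five dimensions above."""

    backend: str
    scope: Scope
    workspace: Workspace
    ttl_seconds: int  # mandatory: no default value
    egress_allowlist: Tuple[str, ...] = ()  # empty tuple means default deny
    idle_reap_seconds: Optional[int] = None  # idle reaper is optional

    def __post_init__(self) -> None:
        if self.ttl_seconds <= 0:
            raise ValueError("ttl_seconds must be a positive number of seconds")
        if self.idle_reap_seconds is not None and self.idle_reap_seconds <= 0:
            raise ValueError("idle_reap_seconds must be positive when set")


profile = SandboxProfile(
    backend="compose-ssh",
    scope=Scope.PER_SESSION,
    workspace=Workspace.ISOLATED,
    ttl_seconds=900,
)
print(profile.egress_allowlist, profile.idle_reap_seconds)
```

Omitting `ttl_seconds` fails at construction time, which keeps "mandatory TTL" a property of the data model instead of a convention.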
### Credential and reachability boundary

Best practice: credentials are brokered at the sandbox edge (OpenShell proxy,
Blitzy secrets-never-to-AI, ops-warden certs). The agent process never holds
production tokens for unrelated systems.

sand-boxer integrates **ops-bridge** (reachability) and **ops-warden**
(identity) as consumers; it does not replace them.

## What sand-boxer should adopt vs defer

| Adopt now (meta-framework) | Defer (extension or phase 2) |
|----------------------------|------------------------------|
| Unified provision/teardown API | GPU profiles |
| Named versioned profiles | Browser sandbox profiles |
| Extension plugin interface | Intent-aware egress filtering |
| Self-hosted compose-ssh (e2e lineage) | Native Firecracker control plane |
| State Hub lifecycle registration | Multi-region routing |
| Default-deny network policy | Computer Use / desktop sandboxes |
| Payments routing for SaaS backends | Owned hyperscale sandbox fleet |

## Related reading

- [02-reference-frameworks.md](02-reference-frameworks.md): OpenClaw, Hermes, Blitzy
- [03-meta-framework-synthesis.md](03-meta-framework-synthesis.md): API and extensions