sand-boxer/research/01-agent-sandbox-landscape.md
tegwick f33cff5363 docs: charter meta-framework vision, research, and SAND-WP-0002
Rewrite INTENT.md as the sand-boxer meta-framework charter (OpenRouter-style
sandbox API, extensions, payments, Coulomb sibling boundaries). Add research
under research/, update SCOPE.md, bootstrap workplans SAND-WP-0001/0002, and
State Hub integration files from the bootstrap pass.
2026-06-22 21:32:32 +02:00


Agent sandbox landscape (2026)

Survey of modern sandbox infrastructure for agentic coding — isolation technologies, provider models, and industry convergence patterns relevant to sand-boxer.

Market definition

AI agent sandboxes are isolated execution environments for running AI-generated or agent-requested code safely. They optimize for:

  • Fast create / resume / teardown
  • Programmatic lifecycle APIs
  • Isolation from host and peer workloads
  • Developer- and agent-friendly SDKs

This is distinct from general application hosting and from agent harnesses (memory, channels, tool orchestration).

Provider landscape (summary)

| Platform | Model | Creation | Persist / checkpoint | Isolation | Notes |
|---|---|---|---|---|---|
| E2B | Managed SaaS | ~150ms | Pause/resume, snapshots | Firecracker | Scale leader; template + sandbox API |
| Daytona | Managed + OSS | ~90ms | Snapshots, fork | Docker/Kata | Open-source self-host path |
| Modal | Serverless SaaS | Sub-second | Memory snapshots, volumes | gVisor | Strong GPU; code-defined runtime |
| Blaxel | Managed | Sub-25ms resume | Hibernate | microVM | Zero idle compute billing |
| Vercel Sandbox | Managed | ms | Snapshots, persistent default | Firecracker | Vercel ecosystem |
| Cloudflare Sandbox SDK | Edge | seconds / ms (isolates) | DO state | Containers / V8 | Workers-native |
| AWS AgentCore | Managed sessions | | Session ≤8h | microVM | Hyperscaler bundling |
| Google Agent Sandbox | Managed preview | Sub-second | TTL ≤14d | Hardened containers | Gemini Enterprise layer |
| OpenSandbox | Self-hosted OSS | Pool pre-warm | Pause/resume, PVC | gVisor/Kata/Firecracker | K8s-scale; CNCF Landscape |
| OpenShell | Policy runtime | | Long-lived sandboxes | Landlock/seccomp/OPA | Governance layer, not hosted platform |
| Northflank | BYOC + managed | ~200ms | Persistent | microVM/gVisor | VPC deployment |
| Runloop | Managed | ~100ms exec | Snapshot, branch | Custom hypervisor | SWE-bench / eval focus |
| Sprites | Managed | 12s | ~300ms checkpoints | Firecracker | Persistent-first |
| ComputeSDK | Abstraction | Varies | Varies | Varies | Multi-provider router (9 backends) |

Sources: Ry Walker research (Jun 2026), provider docs, Modal/E2B marketing materials. Treat vendor claims as directional.

Isolation technology spectrum

| Technology | Used by | Security level | Performance |
|---|---|---|---|
| Firecracker | E2B, Sprites, Vercel | Hardware-level microVM | Fast |
| gVisor / Kata | Modal, Northflank, OpenSandbox | Kernel-level | Very fast |
| Hardened Docker | Daytona, AIO Sandbox | Container-level | Fastest setup |
| Landlock / seccomp / OPA | OpenShell | Kernel policy | Native speed |
| V8 isolates | Cloudflare Worker Loader | Process-level | Milliseconds |

Implication for sand-boxer: profile metadata must declare isolation_level so consumers can reason about blast radius. Extensions map profiles to concrete runtimes; the meta-framework does not mandate one technology.
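
As a rough illustration of what declaring isolation_level in profile metadata could look like, here is a minimal sketch. The `Profile` class, the enum values, and `select_profiles` are hypothetical names chosen for this example; they are not part of any existing sand-boxer API. The enum deliberately carries no ordering, so consumers state which levels they accept rather than relying on a ranking.

```python
from dataclasses import dataclass
from enum import Enum


class IsolationLevel(Enum):
    """Hypothetical labels mirroring the 'Security level' column above."""
    MICROVM = "microvm"              # e.g. Firecracker
    KERNEL = "kernel"                # e.g. gVisor / Kata
    CONTAINER = "container"          # e.g. hardened Docker
    KERNEL_POLICY = "kernel-policy"  # e.g. Landlock / seccomp / OPA
    PROCESS = "process"              # e.g. V8 isolates


@dataclass(frozen=True)
class Profile:
    name: str
    version: str
    isolation_level: IsolationLevel  # declared by the profile, not inferred


def select_profiles(profiles, accepted):
    """Return the profiles whose declared isolation level is in `accepted`."""
    return [p for p in profiles if p.isolation_level in accepted]


# Assumed example data; profile names are made up.
profiles = [
    Profile("py-untrusted", "1.0.0", IsolationLevel.MICROVM),
    Profile("lint-fast", "0.3.1", IsolationLevel.CONTAINER),
]
strict = select_profiles(profiles, {IsolationLevel.MICROVM, IsolationLevel.KERNEL})
print([p.name for p in strict])  # ['py-untrusted']
```

A consumer that runs untrusted code can then filter on the declared level before provisioning, without knowing which runtime an extension uses underneath.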

1. Ephemeral vs persistent collapsed

The early market split (E2B = ephemeral, Sprites = persistent) has collapsed. Most platforms now offer:

  • Persistent workspace by default or as first-class option
  • Checkpoint / snapshot / hibernate for fast resume
  • TTL and explicit teardown still expected for cost and security

sand-boxer takeaway: profiles should support persistence: ephemeral | persistent | checkpoint as a first-class dimension, not a backend detail.
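
One way to treat persistence as a first-class, validated field is sketched below. The field names (`persistence`, `ttl_seconds`) and the `parse_profile` helper are assumptions made for illustration, not an existing schema. The sketch keeps a TTL mandatory in every mode, in line with the point that teardown is still expected.

```python
from dataclasses import dataclass
from enum import Enum


class Persistence(Enum):
    EPHEMERAL = "ephemeral"
    PERSISTENT = "persistent"
    CHECKPOINT = "checkpoint"


@dataclass(frozen=True)
class ProfileSpec:
    name: str
    persistence: Persistence
    ttl_seconds: int  # required regardless of persistence mode


def parse_profile(raw: dict) -> ProfileSpec:
    """Validate a raw profile mapping; raises ValueError on bad input."""
    name = raw.get("name")
    if not isinstance(name, str) or not name:
        raise ValueError("name must be a non-empty string")
    try:
        persistence = Persistence(raw.get("persistence"))
    except ValueError as exc:
        raise ValueError(
            "persistence must be one of: ephemeral, persistent, checkpoint"
        ) from exc
    ttl = raw.get("ttl_seconds")
    if not isinstance(ttl, int) or ttl <= 0:
        raise ValueError("ttl_seconds must be a positive integer")
    return ProfileSpec(name, persistence, ttl)


spec = parse_profile(
    {"name": "py-dev", "persistence": "checkpoint", "ttl_seconds": 3600}
)
print(spec.persistence.value)  # checkpoint
```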

2. Checkpointing is table stakes

Sub-second to low-second restore times are becoming the baseline for agent coding. A restore typically brings back workspace state, installed dependencies, and shell history, but not always live processes (PIDs).

sand-boxer takeaway: the lifecycle API needs snapshot, restore, and fork operations, even if early extensions implement only recreate.
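
The sketch below shows one possible shape for such a lifecycle interface, where the API surface includes snapshot, restore, and fork from the start while an early extension backs them with plain recreation. All class and method names are hypothetical, and the in-memory dictionaries stand in for a real backend.

```python
from abc import ABC, abstractmethod


class SandboxExtension(ABC):
    """Hypothetical extension interface; not an existing sand-boxer API."""

    @abstractmethod
    def provision(self, profile: str) -> str: ...

    @abstractmethod
    def teardown(self, sandbox_id: str) -> None: ...

    def snapshot(self, sandbox_id: str) -> str:
        raise NotImplementedError("snapshot not supported by this extension")

    def restore(self, snapshot_id: str) -> str:
        raise NotImplementedError("restore not supported by this extension")

    def fork(self, sandbox_id: str) -> str:
        # Default fork: snapshot the source, then restore into a new sandbox.
        return self.restore(self.snapshot(sandbox_id))


class RecreateOnlyExtension(SandboxExtension):
    """Early-stage extension: a 'snapshot' only remembers the profile,
    and 'restore' provisions a fresh sandbox from it (no state is kept)."""

    def __init__(self) -> None:
        self.sandboxes = {}   # sandbox_id -> profile
        self.snapshots = {}   # snapshot_id -> profile
        self._counter = 0

    def _next_id(self, prefix: str) -> str:
        self._counter += 1
        return f"{prefix}-{self._counter}"

    def provision(self, profile: str) -> str:
        sandbox_id = self._next_id("sbx")
        self.sandboxes[sandbox_id] = profile
        return sandbox_id

    def teardown(self, sandbox_id: str) -> None:
        self.sandboxes.pop(sandbox_id, None)

    def snapshot(self, sandbox_id: str) -> str:
        snapshot_id = self._next_id("snap")
        self.snapshots[snapshot_id] = self.sandboxes[sandbox_id]
        return snapshot_id

    def restore(self, snapshot_id: str) -> str:
        return self.provision(self.snapshots[snapshot_id])


ext = RecreateOnlyExtension()
original = ext.provision("python-3.12")
clone = ext.fork(original)
print(original, clone)  # sbx-1 sbx-3
```

Callers code against the full interface from day one, and an extension can later swap the recreate fallback for real checkpointing without changing the API.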

3. Security stress-tests exposed limits

Research on AWS AgentCore and OpenShell/NemoClaw showed that allowed egress paths (git, npm, curl, node to allowlisted hosts) can be weaponized for exfiltration when agents are prompt-injected or tricked into installing malicious dependencies. Policy controls destination, not intent.

sand-boxer takeaway: document honestly that sandboxing is blast-radius control, not a guarantee of agent behavior. Use default-deny networking and per-profile egress allowlists, and inject secrets at the boundary, never into the agent-visible workspace.
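
To make the "destination, not intent" limit concrete, here is a minimal default-deny allowlist check. `EgressPolicy` and the example hosts are assumptions for illustration; a real enforcement point would sit in a proxy or network layer rather than in application code.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit


@dataclass(frozen=True)
class EgressPolicy:
    """Default-deny: an empty allowlist permits nothing."""
    allowed_hosts: frozenset = field(default_factory=frozenset)

    def permits(self, url: str) -> bool:
        host = urlsplit(url).hostname  # None when the URL has no authority part
        return host is not None and host in self.allowed_hosts


deny_all = EgressPolicy()
policy = EgressPolicy(frozenset({"github.com", "registry.npmjs.org"}))

print(deny_all.permits("https://github.com/org/repo.git"))       # False
print(policy.permits("https://registry.npmjs.org/left-pad"))     # True
print(policy.permits("https://evil.example/upload"))             # False
# The limit described above: an allowlisted host is permitted
# no matter what the request is for.
print(policy.permits("https://github.com/attacker/exfil.git"))   # True
```

The last call is the point of the example: the policy cannot tell a legitimate clone from a push to an attacker-controlled repository on the same host.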

4. Hyperscaler bundling pressures independents

AWS, Google, Cloudflare, and Vercel all entered the category within a single quarter. Independents compete on multi-cloud neutrality, price, isolation depth, or open-source self-hosting.

sand-boxer takeaway: OpenRouter-style routing across self-hosted and SaaS backends is a defensible Coulomb position — no single-vendor lock-in.

5. Abstraction layers emerging

ComputeSDK routes one TypeScript API across E2B, Modal, Daytona, Runloop, Cloudflare, Vercel, etc. — "Terraform for running other people's code."

sand-boxer takeaway: validate the meta-framework API against this pattern; extensions are providers; sand-boxer core is router + policy + billing + registry.
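
The "extensions are providers, core is router" split can be sketched as a small registry that tries backends in preference order. The `Router` class, the backend names used as keys, and the lambda provisioners are all hypothetical; policy, billing, and a real registry are left out to keep the example short.

```python
from typing import Callable, Dict, List

# A provisioner takes a profile name and returns a sandbox identifier.
ProvisionFn = Callable[[str], str]


class Router:
    """Hypothetical core router: maps backend names to extension provisioners."""

    def __init__(self) -> None:
        self._extensions: Dict[str, ProvisionFn] = {}

    def register(self, backend: str, provision: ProvisionFn) -> None:
        self._extensions[backend] = provision

    def provision(self, profile_name: str, preferred_backends: List[str]) -> str:
        # Use the first preferred backend that has a registered extension.
        for backend in preferred_backends:
            provision = self._extensions.get(backend)
            if provision is not None:
                return provision(profile_name)
        raise LookupError(
            f"no registered extension for any of {preferred_backends}"
        )


router = Router()
router.register("compose-ssh", lambda profile: f"compose-ssh:{profile}")
router.register("e2b", lambda profile: f"e2b:{profile}")

# "modal" is not registered here, so the router falls through to "e2b".
print(router.provision("python-3.12", ["modal", "e2b"]))  # e2b:python-3.12
```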

Architecture patterns (industry)

Gateway / harness vs runtime (universal split)

[Agent gateway / harness]  ──orchestrates──▶  [Sandbox runtime]
      (host or control plane)                    (isolated)

OpenClaw and Hermes both keep the gateway on the host and run tool execution in the sandbox. sand-boxer owns the runtime side only; glas-harness owns the gateway/harness side (see 03-meta-framework-synthesis.md).

Profile + backend + scope (OpenClaw / Hermes consensus)

| Dimension | Examples |
|---|---|
| Backend | docker, ssh, openshell, modal, daytona, compose-ssh |
| Scope | per-agent, per-session, shared |
| Workspace | isolated, ro-mount, rw-mount; mirror vs remote-canonical |
| Network | default deny; optional allowlist |
| TTL | mandatory; idle reaper optional |
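
The TTL row can be read as "a hard lifetime is always set, an idle timeout only sometimes". A minimal sketch of that rule follows; the `Lease` class and its field names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    created_at: float                              # seconds since epoch
    last_active_at: float                          # seconds since epoch
    ttl_seconds: float                             # mandatory hard lifetime
    idle_timeout_seconds: Optional[float] = None   # optional idle reaper

    def should_reap(self, now: float) -> bool:
        # The hard TTL always applies.
        if now - self.created_at >= self.ttl_seconds:
            return True
        # The idle reaper applies only when configured.
        if self.idle_timeout_seconds is not None:
            return now - self.last_active_at >= self.idle_timeout_seconds
        return False


lease = Lease(created_at=0, last_active_at=100, ttl_seconds=3600,
              idle_timeout_seconds=300)
print(lease.should_reap(now=200))    # False: within TTL and recently active
print(lease.should_reap(now=500))    # True: idle for 400s, over the 300s limit
print(lease.should_reap(now=4000))   # True: past the hard TTL
```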

Credential and reachability boundary

Best practice: broker credentials at the sandbox edge (OpenShell proxy, Blitzy secrets-never-to-AI, ops-warden certs). The agent process never holds production tokens for unrelated systems.

sand-boxer integrates ops-bridge (reachability) and ops-warden (identity) as consumers; it does not replace them.
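
Brokering credentials at the sandbox edge can be pictured as a proxy step that runs outside the sandbox and attaches the token there. The sketch below is a simplified assumption of that idea: `broker_request` and the per-host secret map are hypothetical, and it does not represent how OpenShell, ops-warden, or any other named tool works.

```python
def broker_request(host: str, headers: dict, secrets: dict) -> dict:
    """Return outbound headers with the credential for `host` attached at the edge.

    Runs outside the sandbox. Any Authorization header supplied from inside
    the sandbox is dropped, so the agent never needs to hold the real token.
    """
    outbound = {
        key: value for key, value in headers.items()
        if key.lower() != "authorization"
    }
    token = secrets.get(host)
    if token is not None:
        outbound["Authorization"] = f"Bearer {token}"
    return outbound


# Held by the broker only; the token value here is a placeholder.
edge_secrets = {"api.internal.example": "placeholder-token"}

agent_headers = {"Accept": "application/json", "authorization": "Bearer guessed"}
print(broker_request("api.internal.example", agent_headers, edge_secrets))
print(broker_request("other.example", agent_headers, edge_secrets))
```

For a host with no brokered secret, the request goes out without any Authorization header at all.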

What sand-boxer should adopt vs defer

| Adopt now (meta-framework) | Defer (extension or phase 2) |
|---|---|
| Unified provision/teardown API | GPU profiles |
| Named versioned profiles | Browser sandbox profiles |
| Extension plugin interface | Intent-aware egress filtering |
| Self-hosted compose-ssh (e2e lineage) | Native Firecracker control plane |
| State Hub lifecycle registration | Multi-region routing |
| Default-deny network policy | Computer Use / desktop sandboxes |
| Payments routing for SaaS backends | Owned hyperscale sandbox fleet |