generated from coulomb/repo-seed
docs: charter meta-framework vision, research, and SAND-WP-0002
Rewrite INTENT.md as the sand-boxer meta-framework charter (OpenRouter-style sandbox API, extensions, payments, Coulomb sibling boundaries). Add research under research/, update SCOPE.md, bootstrap workplans SAND-WP-0001/0002, and State Hub integration files from the bootstrap pass.
research/01-agent-sandbox-landscape.md
@@ -0,0 +1,153 @@
# Agent sandbox landscape (2026)

Survey of modern sandbox infrastructure for agentic coding — isolation
technologies, provider models, and industry convergence patterns relevant to
sand-boxer.

## Market definition

**AI agent sandboxes** are isolated execution environments for running
AI-generated or agent-requested code safely. They optimize for:

- Fast create / resume / teardown
- Programmatic lifecycle APIs
- Isolation from host and peer workloads
- Developer- and agent-friendly SDKs

This is distinct from general application hosting and from agent harnesses
(memory, channels, tool orchestration).

## Provider landscape (summary)

| Platform | Model | Creation | Persist / checkpoint | Isolation | Notes |
|----------|-------|----------|----------------------|-----------|-------|
| **E2B** | Managed SaaS | ~150ms | Pause/resume, snapshots | Firecracker | Scale leader; template + sandbox API |
| **Daytona** | Managed + OSS | ~90ms | Snapshots, fork | Docker/Kata | Open-source self-host path |
| **Modal** | Serverless SaaS | Sub-second | Memory snapshots, volumes | gVisor | Strong GPU; code-defined runtime |
| **Blaxel** | Managed | Sub-25ms resume | Hibernate | microVM | Zero idle compute billing |
| **Vercel Sandbox** | Managed | ms | Snapshots, persistent default | Firecracker | Vercel ecosystem |
| **Cloudflare Sandbox SDK** | Edge | seconds / ms (isolates) | DO state | Containers / V8 | Workers-native |
| **AWS AgentCore** | Managed sessions | — | Session ≤8h | microVM | Hyperscaler bundling |
| **Google Agent Sandbox** | Managed preview | Sub-second | TTL ≤14d | Hardened containers | Gemini Enterprise layer |
| **OpenSandbox** | Self-hosted OSS | Pool pre-warm | Pause/resume, PVC | gVisor/Kata/Firecracker | K8s-scale; CNCF Landscape |
| **OpenShell** | Policy runtime | — | Long-lived sandboxes | Landlock/seccomp/OPA | Governance layer, not hosted platform |
| **Northflank** | BYOC + managed | ~200ms | Persistent | microVM/gVisor | VPC deployment |
| **Runloop** | Managed | ~100ms exec | Snapshot, branch | Custom hypervisor | SWE-bench / eval focus |
| **Sprites** | Managed | 1–2s | ~300ms checkpoints | Firecracker | Persistent-first |
| **ComputeSDK** | Abstraction | Varies | Varies | Varies | Multi-provider router (9 backends) |

Sources: [Ry Walker research (Jun 2026)](https://rywalker.com/research/ai-agent-sandboxes),
provider docs, Modal/E2B marketing materials. Treat vendor claims as directional.

## Isolation technology spectrum

| Technology | Used by | Security level | Performance |
|------------|---------|----------------|-------------|
| **Firecracker** | E2B, Sprites, Vercel | Hardware-level microVM | Fast |
| **gVisor / Kata** | Modal, Northflank, OpenSandbox | User-space kernel (gVisor) / lightweight VM (Kata) | Very fast |
| **Hardened Docker** | Daytona, AIO Sandbox | Container-level | Fastest setup |
| **Landlock / seccomp / OPA** | OpenShell | Kernel policy | Native speed |
| **V8 isolates** | Cloudflare Worker Loader | Process-level | Milliseconds |

**Implication for sand-boxer:** profile metadata must declare `isolation_level`
so consumers can reason about blast radius. Extensions map profiles to concrete
runtimes; the meta-framework does not mandate one technology.

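
A minimal sketch of how a consumer might use a declared `isolation_level` to
filter profiles. The three level names, their ranking, and the profile names
are assumptions for illustration only; the research does not define an
ordering, and a real ranking would need a security review.

```python
from enum import IntEnum


class IsolationLevel(IntEnum):
    """Hypothetical ranking: a higher value means a smaller assumed blast radius."""
    POLICY = 1      # kernel policy on a shared kernel (Landlock / seccomp / OPA)
    CONTAINER = 2   # hardened container
    MICROVM = 3     # hardware-virtualized microVM (e.g. Firecracker)


def eligible_profiles(profiles: dict[str, str], required: str) -> list[str]:
    """Return the names of profiles whose declared level meets the required floor."""
    floor = IsolationLevel[required.upper()]
    return sorted(
        name
        for name, level in profiles.items()
        if IsolationLevel[level.upper()] >= floor
    )


# Hypothetical profile metadata: profile name -> declared isolation_level.
declared = {
    "profile.agent-dev": "container",
    "profile.build": "microvm",
    "profile.lint": "policy",
}
```

Calling `eligible_profiles(declared, "container")` keeps the container and
microVM profiles and drops the policy-only one. An unknown level name raises
`KeyError`, which fails closed rather than guessing.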
## Convergence trends (2025 → 2026)

### 1. Ephemeral vs persistent collapsed

Early market split (E2B = ephemeral, Sprites = persistent) has merged. Most
platforms now offer:

- Persistent workspace by default or as first-class option
- Checkpoint / snapshot / hibernate for fast resume
- TTL and explicit teardown still expected for cost and security

**sand-boxer takeaway:** profiles should support
`persistence: ephemeral | persistent | checkpoint` as a first-class dimension,
not a backend detail.

### 2. Checkpointing is table stakes

Sub-second to low-second restore times are becoming baseline for agent coding
(workspace state, installed deps, shell history — not always live PIDs).

**sand-boxer takeaway:** lifecycle API needs `snapshot`, `restore`, `fork`
operations even if early extensions only implement `recreate`.

### 3. Security stress-tests exposed limits

Research on AWS AgentCore and OpenShell/NemoClaw showed that **allowed egress
paths** (git, npm, curl, node to allowlisted hosts) can be weaponized for
exfiltration when agents are prompt-injected or tricked into malicious
dependencies. Policy controls *destination*, not *intent*.

**sand-boxer takeaway:** document honestly that sandboxing is blast-radius
control, not agent-behavior guarantee. Default-deny network; per-profile egress
allowlists; secrets injected at boundary, never in agent-visible workspace.

### 4. Hyperscaler bundling pressures independents

AWS, Google, Cloudflare, and Vercel all entered the category within a single
quarter. Independents compete on multi-cloud neutrality, price, isolation
depth, or open-source self-host.

**sand-boxer takeaway:** OpenRouter-style routing across self-hosted and SaaS
backends is a defensible Coulomb position — no single-vendor lock-in.

### 5. Abstraction layers emerging

ComputeSDK routes one TypeScript API across E2B, Modal, Daytona, Runloop,
Cloudflare, Vercel, etc. — "Terraform for running other people's code."

**sand-boxer takeaway:** validate the meta-framework API against this pattern;
extensions are providers; sand-boxer core is router + policy + billing + registry.

## Architecture patterns (industry)

### Gateway / harness vs runtime (universal split)

```
[Agent gateway / harness]  ──orchestrates──▶  [Sandbox runtime]
 (host or control plane)                        (isolated)
```

OpenClaw and Hermes both keep the gateway on the host and run **tool execution**
in the sandbox. sand-boxer owns the runtime side only; **glas-harness** owns the
gateway/harness side (see `03-meta-framework-synthesis.md`).

### Profile + backend + scope (OpenClaw / Hermes consensus)

| Dimension | Examples |
|-----------|----------|
| **Backend** | docker, ssh, openshell, modal, daytona, compose-ssh |
| **Scope** | per-agent, per-session, shared |
| **Workspace** | isolated, ro-mount, rw-mount; mirror vs remote-canonical |
| **Network** | default deny; optional allowlist |
| **TTL** | mandatory; idle reaper optional |

### Credential and reachability boundary

Best practice: credentials brokered at sandbox edge (OpenShell proxy, Blitzy
secrets-never-to-AI, ops-warden certs). Agent process never holds production
tokens for unrelated systems.

sand-boxer integrates **ops-bridge** (reachability) and **ops-warden**
(identity) as a consumer; it does not replace them.

## What sand-boxer should adopt vs defer

| Adopt now (meta-framework) | Defer (extension or phase 2) |
|----------------------------|------------------------------|
| Unified provision/teardown API | GPU profiles |
| Named versioned profiles | Browser sandbox profiles |
| Extension plugin interface | Intent-aware egress filtering |
| Self-hosted compose-ssh (e2e lineage) | Native Firecracker control plane |
| State Hub lifecycle registration | Multi-region routing |
| Default-deny network policy | Computer Use / desktop sandboxes |
| Payments routing for SaaS backends | Owned hyperscale sandbox fleet |

## Related reading

- [02-reference-frameworks.md](02-reference-frameworks.md) — OpenClaw, Hermes, Blitzy
- [03-meta-framework-synthesis.md](03-meta-framework-synthesis.md) — API and extensions

research/02-reference-frameworks.md
@@ -0,0 +1,204 @@
# Reference frameworks and platforms

Deep dives on systems sand-boxer should learn from — especially OpenClaw,
Hermes Agent, Blitzy, and OpenShell — plus hosted platforms as extension
targets.

---

## OpenClaw

**What it is:** Personal AI assistant with optional tool sandboxing.
**Docs:** https://docs.openclaw.ai/gateway/sandboxing

### Role in the stack

OpenClaw is an **agent harness** (gateway, channels, skills, memory). Sandboxing
is optional configuration on tool execution — not the product core. This is the
same boundary sand-boxer draws vs **glas-harness**.

### Sandbox architecture

**What gets sandboxed:** `exec`, `read`, `write`, `edit`, `apply_patch`,
`process`, optional sandboxed browser. Gateway stays on host.

**Backends:**

| Backend | Where | Workspace model |
|---------|-------|-----------------|
| `docker` | Local container | Bind-mount or copy; default `network: "none"` |
| `ssh` | Remote SSH host | Remote-canonical: seed once, exec remotely |
| `openshell` | OpenShell-managed | `mirror` (local canonical) or `remote` (remote canonical) |

**Scope:** `agent` (default) | `session` | `shared` — controls container count.

**Mode:** `off` | `non-main` | `all` — when sandboxing applies.

**Workspace access:** `none` | `ro` | `rw` — what tools can see.

### Security patterns worth copying

- Default Docker network **none**
- Bind-mount blocklist: `docker.sock`, `/etc`, `~/.ssh`, `~/.aws`, credential roots
- Symlink-aware path validation before bind approval
- `tools.elevated` as explicit sandbox bypass (audited escape hatch)
- Honest disclaimer: reduces blast radius, not perfect boundary

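
The blocklist and symlink-aware validation above can be sketched as follows.
This is an illustration of the technique, not OpenClaw's implementation: the
function names and the exact blocklist entries (including the two
`docker.sock` paths) are assumptions.

```python
import os

# Hypothetical blocklist modeled on the roots named in the list above.
BLOCKED_ROOTS = [
    "/etc",
    "/var/run/docker.sock",
    "/run/docker.sock",
    "~/.ssh",
    "~/.aws",
]


def _resolve(path: str) -> str:
    """Expand ~ and follow symlinks so aliases collapse to one real path."""
    return os.path.realpath(os.path.expanduser(path))


def is_bind_allowed(source: str, blocked_roots=BLOCKED_ROOTS) -> bool:
    """Reject a bind-mount source that resolves to, or inside, a blocked root."""
    real = _resolve(source)
    for root in blocked_roots:
        blocked = _resolve(root)
        if real == blocked or real.startswith(blocked + os.sep):
            return False
    return True
```

Resolving both sides before comparing is what makes the check symlink-aware: a
link named `./innocent` that points at `~/.ssh` resolves to the blocked path
and is rejected. The check is still subject to races if the filesystem changes
between validation and mount.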
### sand-boxer lessons

1. **Backend / scope / workspaceAccess** vocabulary is proven — adopt in profile schema
2. **SSH remote-canonical** matches Custodian e2e-framework evolution path
3. **mirror vs remote** workspace modes belong in meta-framework API
4. OpenClaw integrates OpenShell as extension — validates extension-delegation model

---

## Hermes Agent

**What it is:** Agent harness from Nous Research with multi-backend terminal execution.
**Repo:** https://github.com/NousResearch/hermes-agent

### Terminal backends (six)

| Backend | Isolation | Persistence |
|---------|-----------|-------------|
| `local` | None | — |
| `docker` | Cap-drop ALL, pids-limit, tmpfs | Single long-lived labeled container |
| `ssh` | Network boundary | Persistent remote shell |
| `modal` | Cloud VM | Filesystem snapshots |
| `daytona` | Cloud container | Stop/resume |
| `singularity` | HPC namespaces | Writable overlay |

### Docker backend highlights

- **One container per task**, reused across sessions and Hermes process restarts
- Labels: `hermes-agent=1`, `hermes-task-id`, `hermes-profile`
- `docker_persist_across_processes: true` (default) — container survives process exit
- Resource limits: CPU, memory, disk, `lifetime_seconds` idle reaper
- `docker_forward_env` — secrets from host `.env`, not config YAML
- Parallel subagents **share** container unless per-task image override

### sand-boxer lessons

1. **Labeled reuse** beats cold provision per tool call for agent coding efficiency
2. Resource limits and idle reaper are profile-level concerns
3. Modal/Daytona as **extension backends** — Hermes consumes, does not own
4. Credential forwarding policy belongs in extension contract, not agent config

---

## NVIDIA OpenShell + NemoClaw (Hermes deployment)

**OpenShell:** Policy runtime for agent sandboxes — Landlock, seccomp, OPA egress.
**NemoClaw:** Reference stack deploying Hermes inside OpenShell.

### Three-layer model (industry pattern)

| Layer | Component | Responsibility |
|-------|-----------|----------------|
| Model | LLM provider | Reasoning |
| Harness | Hermes | Skills, memory, bridges, scheduling |
| Runtime | OpenShell | Filesystem/network policy, credential brokering |

sand-boxer maps to **runtime** only. glas-harness maps to **harness**.

### Policy model

Declarative YAML: allowed hosts, ports, HTTP methods, **binary-scoped** rules
(e.g. only `curl` may reach `api.github.com`). Credentials injected at egress
proxy — agent never sees Slack/Outlook tokens.

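
A minimal sketch of a binary-scoped, default-deny egress check. The
`EgressRule` shape and `is_allowed` function are hypothetical and do not
reproduce OpenShell's actual policy schema or evaluation engine; they only
illustrate the idea that a rule binds a binary to a destination.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EgressRule:
    """Hypothetical rule: which binary may reach which host, port, and methods."""
    binary: str
    host: str
    port: int = 443
    methods: frozenset = frozenset({"GET"})


def is_allowed(rules, binary: str, host: str, port: int, method: str) -> bool:
    """Default deny: permit only when one rule matches every field."""
    return any(
        rule.binary == binary
        and rule.host == host
        and rule.port == port
        and method in rule.methods
        for rule in rules
    )


# Example from the text: only curl may reach api.github.com.
RULES = [EgressRule("curl", "api.github.com", 443, frozenset({"GET", "POST"}))]
```

With these rules, `node` reaching `api.github.com` is denied while `curl` is
permitted. A permitted `POST` passes regardless of what the request body
contains, which is the gap the security research in this document describes:
the check sees the destination, not the intent.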
### Snapshot / restore

NemoClaw ships `snapshot.sh` / `restore.sh` for agent state (skills, memories,
sessions) across redeploys. Credential filter excludes secrets from tarballs.

### Security research (Lasso, Apr 2026)

Demonstrated exfiltration via **policy-permitted** paths (git PR, npm postinstall
→ Discord). Policies enforced correctly; intent not evaluated.

**sand-boxer lesson:** OpenShell-class extensions should be offered; security
runbooks must state limits of egress allowlisting.

---

## Blitzy

**What it is:** AI-native code generation platform — **not** a sandbox runtime.

### "Blitzy Sandbox" GitHub org

Public demo repos for Explore members. Not execution infrastructure.

### Real isolation model: Environments

https://docs.blitzy.com/administration/environments

- Natural-language **setup instructions** (toolchain, build, run, test)
- **Variables** (plaintext) vs **Secrets** (encrypted, masked, **never sent to AI**)
- Multi-environment priority merge (base + project override)
- Validation in configured environment after code generation

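
A minimal sketch of the priority merge and the variables-versus-secrets split.
Blitzy's real merge semantics are not described in this research, so the
last-layer-wins rule, the dictionary layout, and both function names are
assumptions.

```python
def merge_environments(*layers: dict) -> dict:
    """Merge layers in priority order; later layers (e.g. project) override earlier ones."""
    merged = {"variables": {}, "secrets": {}}
    for layer in layers:
        merged["variables"].update(layer.get("variables", {}))
        merged["secrets"].update(layer.get("secrets", {}))
    return merged


def model_visible_view(env: dict) -> dict:
    """Build the view that may reach the model: plaintext variables, masked secrets."""
    return {
        "variables": dict(env["variables"]),
        "secrets": {name: "***" for name in env["secrets"]},
    }


# Hypothetical layers: a shared base and a project-level override.
base = {
    "variables": {"NODE_ENV": "test", "PORT": "3000"},
    "secrets": {"NPM_TOKEN": "abc"},
}
project = {
    "variables": {"PORT": "8080"},
    "secrets": {"DB_PASSWORD": "hunter2"},
}
env = merge_environments(base, project)
view = model_visible_view(env)
```

Here `view` keeps `PORT` at the project value `8080` and exposes secret names
only, so a prompt built from `view` cannot leak a secret value.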
### sand-boxer lessons (environment metadata, not runtime)

| Blitzy pattern | sand-boxer mapping |
|----------------|-------------------|
| Environment config | Profile `setup` metadata block |
| Secrets never to AI | `secret_refs` resolved at provision boundary |
| Setup instructions | Profile runbook for extension bootstrap |
| Human review gates | Out of scope — **snuggle-inventor** / PR workflow |

Blitzy validates that **describing how to boot an environment** is as important
as **where it runs**. sand-boxer profiles carry both.

---

## Hosted platforms as extension targets

sand-boxer extensions may delegate to SaaS providers. Initial extension candidates:

| Extension id | Provider | Self-host alt | Payments |
|--------------|----------|---------------|----------|
| `ext.e2b` | E2B | — | Per-second SaaS |
| `ext.modal` | Modal | — | Per-second + GPU |
| `ext.daytona` | Daytona cloud | `ext.daytona-self` (OSS) | SaaS or infra cost |
| `ext.openshell` | — | OpenShell local/k3s | Infra cost |
| `ext.compose-ssh` | — | sandboxer01 / CoulombCore | Infra cost |
| `ext.vm-packer` | — | build-machines lineage | Infra cost |

ComputeSDK (https://github.com/computesdk/computesdk) is a useful reference for
normalizing provider differences behind one client API.

---

## OpenRouter analogy

| OpenRouter | sand-boxer |
|------------|------------|
| Unified LLM API | Unified sandbox API |
| Routes to OpenAI, Anthropic, … | Routes to E2B, Modal, self-hosted compose, … |
| API keys / credits / billing | Payments layer for SaaS consumption |
| Model metadata (context, price) | Profile metadata (isolation, cost, latency) |
| Fallback / routing policy | Host placement + extension fallback |

sand-boxer does not run inference; it runs **isolation**. The routing and
payments patterns transfer directly.

---

## Anti-patterns to avoid

| Anti-pattern | Why |
|--------------|-----|
| Rebuild OpenClaw/Hermes gateway in sand-boxer | glas-harness scope |
| Embed e2e test orchestration in provisioner | wise-validator scope |
| Generate code inside sandbox API | snuggle-inventor scope |
| Own SSH tunnels or CA | ops-bridge / ops-warden scope |
| Claim sandbox = safe from prompt injection | Research disproves |

## Related reading

- [01-agent-sandbox-landscape.md](01-agent-sandbox-landscape.md)
- [03-meta-framework-synthesis.md](03-meta-framework-synthesis.md)
- `INTENT.md` — normative charter

research/03-meta-framework-synthesis.md
@@ -0,0 +1,294 @@
# Meta-framework synthesis

Design notes distilled from landscape research for sand-boxer's unified sandbox
API, extension model, payments layer, and Coulomb project boundaries.

---

## Core thesis

sand-boxer is a **meta-framework for establishing sandboxes** — like OpenRouter
is a meta-framework for accessing LLM models:

- One **consistent API** for consumers (`adm`, `agt`, `atm`, domain services)
- Many **extensions** that delegate to self-hosted or SaaS sandbox systems
- **Integrated payments** when consuming metered external services
- **Registry-first** profiles and capabilities via reuse-surface
- **Later:** a Coulomb-native "best of brands" runtime built from operational
  experience — not day one

sand-boxer provisions **where and how code runs**. It does not provision **how
agents think**, **what tests mean**, or **what code gets written**.

---

## Coulomb project boundaries

These sibling projects are **planned Coulomb repos** with an explicit authority
split. sand-boxer must not absorb their concerns.

```mermaid
flowchart LR
  subgraph establish [sand-boxer]
    SB[Establish sandbox]
  end

  subgraph harness [glas-harness]
    GH[Agent harness: gateway tools memory channels]
  end

  subgraph validate [wise-validator]
    WV[E2E tests health checks validation orchestration]
  end

  subgraph generate [snuggle-inventor]
    SI[Code generation modernization]
  end

  GH -->|request sandbox| SB
  WV -->|request sandbox| SB
  SI -->|request sandbox| SB
  WV -.->|runs tests in| SB
  GH -.->|executes tools in| SB
  SI -.->|validates output in| SB
```

| Project | Owns | Does not own |
|---------|------|--------------|
| **sand-boxer** | Sandbox profiles, provision/teardown, extension routing, placement, lifecycle registration, payments for sandbox consumption | Agent memory, channels, tool policies, test definitions, code generation |
| **glas-harness** | Agent gateway, harness, skills, subagents, tool orchestration, channel bridges | Sandbox runtime, isolation enforcement, host placement |
| **wise-validator** | E2E test orchestration, health check semantics, validation workflows, result reporting | Sandbox provisioning, agent conversation state |
| **snuggle-inventor** | Code generation, tech specs, AAP-style planning, PR-oriented output | Sandbox infrastructure, test harness canon |

### Integration contracts (intended)

**glas-harness → sand-boxer**

```
POST /v1/sandboxes
  profile: "profile.agent-dev"
  scope: session | agent | shared
  workspace: { mode: mirror | remote, access: none | ro | rw }
  consumer: { actor: agt, harness: glas-harness, session_id }
```

The harness receives a `sandbox_id`, a reachability descriptor (SSH endpoint,
tunnel ref), and a lifecycle webhook or poll URL. It executes tools **inside**
the sandbox via an agreed exec channel; sand-boxer does not parse tool calls.

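
A minimal sketch of how a harness might build and validate the request body
before sending it. The contract above is only intended, so the JSON encoding,
the `SandboxRequest` class, and its flat field names are assumptions; the
allowed values are taken from the block above.

```python
import json
from dataclasses import dataclass

SCOPES = {"session", "agent", "shared"}
WORKSPACE_MODES = {"mirror", "remote"}
WORKSPACE_ACCESS = {"none", "ro", "rw"}


@dataclass
class SandboxRequest:
    """Hypothetical client-side view of a POST /v1/sandboxes body."""
    profile: str
    scope: str
    workspace_mode: str
    workspace_access: str
    actor: str
    harness: str
    session_id: str

    def to_body(self) -> dict:
        """Validate enumerated fields, then nest them as in the contract."""
        if self.scope not in SCOPES:
            raise ValueError(f"unknown scope: {self.scope}")
        if self.workspace_mode not in WORKSPACE_MODES:
            raise ValueError(f"unknown workspace mode: {self.workspace_mode}")
        if self.workspace_access not in WORKSPACE_ACCESS:
            raise ValueError(f"unknown workspace access: {self.workspace_access}")
        return {
            "profile": self.profile,
            "scope": self.scope,
            "workspace": {"mode": self.workspace_mode, "access": self.workspace_access},
            "consumer": {
                "actor": self.actor,
                "harness": self.harness,
                "session_id": self.session_id,
            },
        }


request = SandboxRequest(
    profile="profile.agent-dev",
    scope="session",
    workspace_mode="mirror",
    workspace_access="rw",
    actor="agt",
    harness="glas-harness",
    session_id="sess-123",  # hypothetical identifier
)
payload = json.dumps(request.to_body())
```

Validating on the client keeps obviously malformed requests from reaching the
API; the server would still need to validate on its side.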
**wise-validator → sand-boxer**

```
POST /v1/sandboxes
  profile: "profile.compose-e2e"
  inputs: { repo_ref, compose_bundle_ref }
  ttl: 2h
  consumer: { actor: atm, harness: wise-validator, run_id }
```

wise-validator owns `e2e.yml` semantics, health check definitions, test commands,
and pass/fail interpretation. sand-boxer delivers an environment; wise-validator
runs the validation story **on top**.

**snuggle-inventor → sand-boxer**

```
POST /v1/sandboxes
  profile: "profile.build"
  setup_metadata: { instructions_ref, secret_refs }
  consumer: { actor: agt, harness: snuggle-inventor, job_id }
```

snuggle-inventor may attach Blitzy-style setup instructions as profile inputs.
sand-boxer resolves secrets at boundary; generated code never flows through
sand-boxer APIs.

### Migration from the-custodian

| Legacy | New owner |
|--------|-----------|
| `e2e-framework/` provision/teardown | sand-boxer `ext.compose-ssh` |
| `e2e-framework/` test run + report | wise-validator (calls sand-boxer) |
| Agent tool sandbox config | glas-harness (calls sand-boxer) |
| `infra/build-machines/` | sand-boxer `ext.vm-packer` |

---

## Meta-framework API (conceptual)

### Resources

| Resource | Description |
|----------|-------------|
| `Profile` | Named, versioned sandbox recipe (image, isolation, network, TTL, extension) |
| `Extension` | Backend adapter (self-hosted or SaaS) |
| `Host` | Registered placement target for self-hosted extensions |
| `Sandbox` | Running instance of a profile |
| `Snapshot` | Point-in-time workspace checkpoint (optional) |
| `Route` | Extension selection policy (cost, latency, capability) |
| `Meter` | Usage record for payments layer |

### Sandbox lifecycle states

```
requested → provisioning → ready → active → { expired | failed } → destroying → destroyed
```

All transitions emit State Hub events. `ready` means the reachability probe
succeeded.

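
A minimal sketch of the lifecycle as a transition table that emits one event
per transition. It encodes only the arrows shown above; edges such as
`provisioning → failed` are plausible but not in the source, so they are left
out. The `Sandbox` class, the event shape, and the `emit` callback standing in
for State Hub are all assumptions.

```python
# Allowed next states, taken directly from the lifecycle line above.
TRANSITIONS = {
    "requested": {"provisioning"},
    "provisioning": {"ready"},
    "ready": {"active"},
    "active": {"expired", "failed"},
    "expired": {"destroying"},
    "failed": {"destroying"},
    "destroying": {"destroyed"},
    "destroyed": set(),
}


class Sandbox:
    """Hypothetical lifecycle tracker; `emit` stands in for a State Hub client."""

    def __init__(self, sandbox_id: str, emit) -> None:
        self.sandbox_id = sandbox_id
        self.state = "requested"
        self._emit = emit

    def transition(self, new_state: str) -> None:
        """Move to new_state if the table allows it, then emit an event."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition: {self.state} -> {new_state}")
        previous, self.state = self.state, new_state
        self._emit({"sandbox_id": self.sandbox_id, "from": previous, "to": new_state})


# Usage: collect events in a list instead of sending them anywhere.
events = []
sandbox = Sandbox("sbx-1", events.append)
for step in ["provisioning", "ready", "active", "expired", "destroying", "destroyed"]:
    sandbox.transition(step)
```

Rejecting unknown edges up front means every recorded event corresponds to a
transition the table permits, which keeps the event log auditable.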
### Core operations

| Operation | Description |
|-----------|-------------|
| `create` | Provision from profile + inputs |
| `get` / `list` | Inspect status |
| `exec` | Run command in sandbox (optional — may be harness-owned) |
| `extend_ttl` | Explicit persistence extension |
| `snapshot` / `restore` | Checkpoint workspace |
| `recreate` | Destroy and reprovision from seed |
| `destroy` | Idempotent teardown |

Early versions may expose only `create`, `get`, `destroy`, `recreate`; harnesses
can own `exec` via SSH/tunnel without sand-boxer proxying every command.

### Profile schema (minimum)

```yaml
id: profile.compose-e2e
version: "1.0.0"
extension: ext.compose-ssh
isolation:
  level: container          # container | microvm | policy
network:
  default: deny
  egress: []                # extension interprets
workspace:
  mode: remote-canonical    # mirror | remote-canonical
  access: rw
scope_default: session
ttl:
  default: 4h
  max: 24h
  idle_reap: null
resources:
  cpu: null
  memory_mb: null
setup:
  instructions: ""          # Blitzy-style natural language for extension bootstrap
  secret_refs: []           # resolved at provision; never in agent context
placement:
  prefer: [sandboxer01]
  fallback: [coulombcore]
reachability:
  tunnel: ops-bridge
  identity: ops-warden
metadata:
  cost_class: self-hosted   # self-hosted | saas-metered
  latency_class: standard
```

### Extension interface (contract)

Each extension implements:

```text
provision(profile, inputs, placement) → sandbox_handle
wait_ready(sandbox_handle) → reachability
teardown(sandbox_handle) → cleanup_report
snapshot?(sandbox_handle) → snapshot_id
restore?(snapshot_id) → sandbox_handle
estimate_cost?(profile, duration) → meter_quote
```

Extensions register in `registry/` with capability vectors (isolation level,
regions, GPU, persistence, pricing model).

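
One possible Python rendering of this contract, offered as a sketch rather
than the project's extension SDK. The class names, the use of plain
dictionaries and strings for the return types, and the `supports` helper are
assumptions; the operation names and the optional `?` operations come from the
contract above.

```python
from abc import ABC, abstractmethod


class Extension(ABC):
    """Hypothetical base class for sandbox backend adapters."""

    @abstractmethod
    def provision(self, profile: dict, inputs: dict, placement: dict) -> str:
        """Start a sandbox and return an opaque sandbox handle."""

    @abstractmethod
    def wait_ready(self, sandbox_handle: str) -> dict:
        """Return a reachability descriptor once the sandbox is reachable."""

    @abstractmethod
    def teardown(self, sandbox_handle: str) -> dict:
        """Destroy the sandbox and return a cleanup report."""

    # Optional operations (marked "?" in the contract) raise unless overridden.
    def snapshot(self, sandbox_handle: str) -> str:
        raise NotImplementedError("snapshot not supported")

    def restore(self, snapshot_id: str) -> str:
        raise NotImplementedError("restore not supported")

    def estimate_cost(self, profile: dict, duration_s: int) -> dict:
        raise NotImplementedError("estimate_cost not supported")

    def supports(self, operation: str) -> bool:
        """True when the subclass overrides the named operation."""
        return getattr(type(self), operation) is not getattr(Extension, operation)


class InMemoryExtension(Extension):
    """Toy backend that only tracks handles; it isolates nothing."""

    def __init__(self) -> None:
        self._live: set[str] = set()
        self._counter = 0

    def provision(self, profile, inputs, placement):
        self._counter += 1
        handle = f"{profile['id']}-{self._counter}"
        self._live.add(handle)
        return handle

    def wait_ready(self, sandbox_handle):
        if sandbox_handle not in self._live:
            raise KeyError(sandbox_handle)
        return {"sandbox": sandbox_handle, "state": "ready"}

    def teardown(self, sandbox_handle):
        # Idempotent: tearing down an unknown handle reports removed=False.
        existed = sandbox_handle in self._live
        self._live.discard(sandbox_handle)
        return {"sandbox": sandbox_handle, "removed": existed}
```

The `supports` helper gives the router a way to build a capability vector from
the class itself, so an extension cannot claim `snapshot` without implementing
it.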
**Bundled extensions (roadmap):**

| Priority | Extension | Type |
|----------|-----------|------|
| P0 | `ext.compose-ssh` | Self-hosted (e2e-framework lineage) |
| P1 | `ext.vm-packer` | Self-hosted (build-machines lineage) |
| P2 | `ext.daytona-self` | Self-hosted OSS |
| P3 | `ext.e2b`, `ext.modal`, `ext.daytona` | SaaS + payments |
| P4 | `ext.openshell` | Policy runtime wrapper |

---

## Payments layer

For SaaS extensions, sand-boxer provides an **integrated payments and metering
layer** analogous to OpenRouter credits:

| Concern | sand-boxer approach |
|---------|---------------------|
| Account credits | Org/workspace balance for sandbox consumption |
| Metering | Per-second, per-creation, GPU surcharge — per extension quote |
| Provider keys | BYOK optional; platform keys for convenience |
| Cost visibility | `estimate_cost` before create; actuals on destroy |
| Billing events | Export to fin-hub / external billing (consumer, not owner) |

Self-hosted extensions bill **infra cost only** (host allocation) — no SaaS meter.

Payments is a **facility inside sand-boxer**, not a general payment processor.
Domain billing authority remains elsewhere.

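
A minimal sketch of the metering arithmetic: a per-creation fee plus a
per-second rate, with an optional GPU surcharge. The `MeterQuote` shape and
every rate below are made-up illustration values, not any provider's pricing.
`Decimal` is used so that currency amounts are not subject to binary
floating-point rounding.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class MeterQuote:
    """Hypothetical quote returned by an extension's estimate_cost."""
    per_second: Decimal
    per_creation: Decimal = Decimal("0")
    gpu_surcharge_per_second: Decimal = Decimal("0")

    def cost(self, seconds: int, gpu: bool = False) -> Decimal:
        """Creation fee plus the per-second rate over the given duration."""
        rate = self.per_second
        if gpu:
            rate += self.gpu_surcharge_per_second
        return self.per_creation + rate * seconds


def can_afford(balance: Decimal, quote: MeterQuote, seconds: int, gpu: bool = False) -> bool:
    """Check an org balance against the estimate before create."""
    return balance >= quote.cost(seconds, gpu)


# Illustrative rates only.
quote = MeterQuote(
    per_second=Decimal("0.0001"),
    per_creation=Decimal("0.01"),
    gpu_surcharge_per_second=Decimal("0.0005"),
)
estimate = quote.cost(3600)
```

The same `cost` method can serve both sides of the table above: called with
the requested TTL it gives the estimate before create, and called with the
measured runtime it gives the actuals on destroy.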
---

## Routing policy (OpenRouter-style)

When multiple extensions satisfy a profile capability:

```yaml
route:
  strategy: prefer-self-hosted   # prefer-self-hosted | lowest-cost | lowest-latency | explicit
  fallback: [ext.compose-ssh, ext.daytona]
  constraints:
    max_cost_per_hour: null
    require_isolation: microvm
    region: eu
```

Default Coulomb posture: **prefer-self-hosted** on sandboxer01; SaaS for burst
or capability gaps (GPU, desktop) once extensions exist.

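
A minimal sketch of how a router might apply the strategy and constraints
above. The `Candidate` fields, the exact-match treatment of
`require_isolation`, the cost tie-break for `prefer-self-hosted`, and all
numbers are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    """Hypothetical capability record for one registered extension."""
    ext_id: str
    self_hosted: bool
    isolation: str
    region: str
    cost_per_hour: float
    latency_ms: float


def meets(candidate: Candidate, constraints: dict) -> bool:
    """A constraint set to None (or absent) is treated as 'no limit'."""
    max_cost = constraints.get("max_cost_per_hour")
    if max_cost is not None and candidate.cost_per_hour > max_cost:
        return False
    isolation = constraints.get("require_isolation")
    if isolation is not None and candidate.isolation != isolation:
        return False
    region = constraints.get("region")
    if region is not None and candidate.region != region:
        return False
    return True


def select_extension(candidates, strategy: str, constraints: dict,
                     explicit: Optional[str] = None) -> Optional[Candidate]:
    """Filter by constraints, then pick one candidate according to the strategy."""
    eligible = [c for c in candidates if meets(c, constraints)]
    if not eligible:
        return None
    if strategy == "explicit":
        return next((c for c in eligible if c.ext_id == explicit), None)
    if strategy == "lowest-cost":
        return min(eligible, key=lambda c: c.cost_per_hour)
    if strategy == "lowest-latency":
        return min(eligible, key=lambda c: c.latency_ms)
    if strategy == "prefer-self-hosted":
        # Self-hosted sorts first; cost breaks ties within each group.
        return min(eligible, key=lambda c: (not c.self_hosted, c.cost_per_hour))
    raise ValueError(f"unknown strategy: {strategy}")


# Made-up capability data for three extension ids from this document.
CANDIDATES = [
    Candidate("ext.compose-ssh", True, "container", "eu", 0.0, 900.0),
    Candidate("ext.daytona-self", True, "container", "eu", 0.02, 300.0),
    Candidate("ext.e2b", False, "microvm", "eu", 0.10, 150.0),
]
```

With no constraints, `prefer-self-hosted` picks `ext.compose-ssh`. Adding
`require_isolation: microvm` leaves only the SaaS candidate in this data set,
which illustrates the "SaaS for capability gaps" posture.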
---

## Security posture (documented limits)

sand-boxer commits to:

1. Default-deny network unless profile explicitly allows egress
2. Secrets resolved at provision boundary via ops-warden / secret refs
3. Blast-radius isolation on dedicated hosts away from Railiance01 production
4. Observable lifecycle and attributable actors (`adm` / `agt` / `atm`)
5. Honest documentation: **allowed tool paths can be abused by compromised agents**

sand-boxer does **not** commit to intent-aware egress filtering in v1.

---

## Phased maturity

| Phase | Deliverable |
|-------|-------------|
| **0** | Charter, research, profile schema, `ext.compose-ssh` design |
| **1** | Unified API + self-hosted compose-ssh + State Hub registration |
| **2** | Extension SDK + vm-packer + registry entries + routing |
| **3** | SaaS extensions + payments layer |
| **4** | Snapshot/restore + checkpoint profiles |
| **5** | Coulomb-native runtime ("best of brands") informed by extension ops data |

Phase 5 is explicitly **later** — learn from routing, billing, failure modes, and
latency before building owned microVM/control-plane.

---

## Open questions (for workplans)

1. Does `exec` live in sand-boxer API or only in glas-harness via SSH?
2. Payments: integrate with existing fin-hub or standalone credits first?
3. Profile authorship: repo-local YAML vs hub-managed catalog?
4. wise-validator: fork e2e-framework reporter or new contract from day one?

These belong in SAND-WP-0002+ design workplans, not INTENT.md.

research/README.md
@@ -0,0 +1,22 @@
# sand-boxer research

Research informing the sand-boxer meta-framework charter and implementation
roadmap. These documents are **inputs to design**, not normative specs — see
`INTENT.md` for authority and boundaries.

## Index

| Document | Contents |
|----------|----------|
| [01-agent-sandbox-landscape.md](01-agent-sandbox-landscape.md) | Market survey: isolation technologies, providers, convergence trends |
| [02-reference-frameworks.md](02-reference-frameworks.md) | Deep dives: OpenClaw, Hermes, Blitzy, OpenShell, hosted platforms |
| [03-meta-framework-synthesis.md](03-meta-framework-synthesis.md) | Design synthesis: API shape, extensions, payments, Coulomb boundaries |

## How to use

1. Read `INTENT.md` for the governing charter.
2. Use `03-meta-framework-synthesis.md` when designing profiles, extensions, or
   the unified API.
3. Use `01` and `02` when evaluating a backend extension or security posture.

Last updated: 2026-06-22