# Reference frameworks and platforms

Deep dives on systems sand-boxer should learn from (especially OpenClaw, Hermes Agent, Blitzy, and OpenShell), plus hosted platforms as extension targets.

---

## OpenClaw

**What it is:** Personal AI assistant with optional tool sandboxing.

**Docs:** https://docs.openclaw.ai/gateway/sandboxing

### Role in the stack

OpenClaw is an **agent harness** (gateway, channels, skills, memory). Sandboxing is optional configuration on tool execution, not the product core. This is the same boundary sand-boxer draws vs **glas-harness**.

### Sandbox architecture

**What gets sandboxed:** `exec`, `read`, `write`, `edit`, `apply_patch`, `process`, optional sandboxed browser. Gateway stays on host.

**Backends:**

| Backend | Where | Workspace model |
|---------|-------|-----------------|
| `docker` | Local container | Bind-mount or copy; default `network: "none"` |
| `ssh` | Remote SSH host | Remote-canonical: seed once, exec remotely |
| `openshell` | OpenShell-managed | `mirror` (local canonical) or `remote` (remote canonical) |

**Scope:** `agent` (default) | `session` | `shared` (controls container count).

**Mode:** `off` | `non-main` | `all` (when sandboxing applies).

**Workspace access:** `none` | `ro` | `rw` (what tools can see).

### Security patterns worth copying

- Default Docker network **none**
- Bind-mount blocklist: `docker.sock`, `/etc`, `~/.ssh`, `~/.aws`, credential roots
- Symlink-aware path validation before bind approval
- `tools.elevated` as explicit sandbox bypass (audited escape hatch)
- Honest disclaimer: reduces blast radius, not perfect boundary

### sand-boxer lessons

1. **Backend / scope / workspaceAccess** vocabulary is proven; adopt in profile schema
2. **SSH remote-canonical** matches Custodian e2e-framework evolution path
3. **mirror vs remote** workspace modes belong in meta-framework API
4. OpenClaw integrates OpenShell as extension; validates extension-delegation model

---

## Hermes Agent

**What it is:** Agent harness from Nous Research with multi-backend terminal execution.

**Repo:** https://github.com/NousResearch/hermes-agent

### Terminal backends (six)

| Backend | Isolation | Persistence |
|---------|-----------|-------------|
| `local` | None | — |
| `docker` | Cap-drop ALL, pids-limit, tmpfs | Single long-lived labeled container |
| `ssh` | Network boundary | Persistent remote shell |
| `modal` | Cloud VM | Filesystem snapshots |
| `daytona` | Cloud container | Stop/resume |
| `singularity` | HPC namespaces | Writable overlay |

### Docker backend highlights

- **One container per task**, reused across sessions and Hermes process restarts
- Labels: `hermes-agent=1`, `hermes-task-id`, `hermes-profile`
- `docker_persist_across_processes: true` (default): container survives process exit
- Resource limits: CPU, memory, disk, `lifetime_seconds` idle reaper
- `docker_forward_env`: secrets from host `.env`, not config YAML
- Parallel subagents **share** container unless per-task image override

### sand-boxer lessons

1. **Labeled reuse** beats cold provision per tool call for agent coding efficiency
2. Resource limits and idle reaper are profile-level concerns
3. Modal/Daytona as **extension backends**; Hermes consumes, does not own
4. Credential forwarding policy belongs in extension contract, not agent config

---

## NVIDIA OpenShell + NemoClaw (Hermes deployment)

**OpenShell:** Policy runtime for agent sandboxes: Landlock, seccomp, OPA egress.

**NemoClaw:** Reference stack deploying Hermes inside OpenShell.

### Three-layer model (industry pattern)

| Layer | Component | Responsibility |
|-------|-----------|----------------|
| Model | LLM provider | Reasoning |
| Harness | Hermes | Skills, memory, bridges, scheduling |
| Runtime | OpenShell | Filesystem/network policy, credential brokering |

sand-boxer maps to **runtime** only.
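
To illustrate what sits at the runtime layer, the hardening options listed for the Hermes `docker` backend (cap-drop ALL, a pids limit, tmpfs, labels) and OpenClaw's default of no network can be combined into a `docker run` argument list. This is a minimal sketch, not Hermes or OpenClaw code: `ContainerProfile`, `build_run_args`, `find_existing_filter`, and the default limit values are hypothetical. Only the Docker CLI flags and the three label keys are taken from Docker and from the sections above, and `sleep infinity` assumes the image ships a `sleep` that accepts it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerProfile:
    """Hypothetical profile; field names are illustrative, not a real schema."""
    image: str
    task_id: str
    profile: str = "default"
    network: str = "none"   # mirrors OpenClaw's documented Docker default
    pids_limit: int = 256   # assumed value
    memory: str = "2g"      # assumed value
    cpus: str = "2"         # assumed value


def build_run_args(p: ContainerProfile) -> list[str]:
    """Assemble `docker run` arguments for a long-lived, labeled container."""
    return [
        "docker", "run", "--detach",
        "--cap-drop", "ALL",
        "--pids-limit", str(p.pids_limit),
        "--memory", p.memory,
        "--cpus", p.cpus,
        "--tmpfs", "/tmp",
        "--network", p.network,
        "--label", "hermes-agent=1",
        "--label", f"hermes-task-id={p.task_id}",
        "--label", f"hermes-profile={p.profile}",
        p.image,
        "sleep", "infinity",  # keep the container alive for later exec calls
    ]


def find_existing_filter(task_id: str) -> list[str]:
    """Arguments for `docker ps` that look up a reusable container by label."""
    return ["docker", "ps", "--quiet", "--filter", f"label=hermes-task-id={task_id}"]
```

Running the `docker ps` lookup before `docker run` is one way to get labeled reuse instead of a cold provision per tool call; the lists can be passed to `subprocess.run` unchanged.
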
glas-harness maps to **harness**.

### Policy model

Declarative YAML: allowed hosts, ports, HTTP methods, **binary-scoped** rules (e.g. only `curl` may reach `api.github.com`). Credentials injected at egress proxy; agent never sees Slack/Outlook tokens.

### Snapshot / restore

NemoClaw ships `snapshot.sh` / `restore.sh` for agent state (skills, memories, sessions) across redeploys. Credential filter excludes secrets from tarballs.

### Security research (Lasso, Apr 2026)

Demonstrated exfiltration via **policy-permitted** paths (git PR, npm postinstall → Discord). Policies enforced correctly; intent not evaluated.

**sand-boxer lesson:** OpenShell-class extensions should be offered; security runbooks must state limits of egress allowlisting.

---

## Blitzy

**What it is:** AI-native code generation platform, **not** a sandbox runtime.

### "Blitzy Sandbox" GitHub org

Public demo repos for Explore members. Not execution infrastructure.

### Real isolation model: Environments

https://docs.blitzy.com/administration/environments

- Natural-language **setup instructions** (toolchain, build, run, test)
- **Variables** (plaintext) vs **Secrets** (encrypted, masked, **never sent to AI**)
- Multi-environment priority merge (base + project override)
- Validation in configured environment after code generation

### sand-boxer lessons (environment metadata, not runtime)

| Blitzy pattern | sand-boxer mapping |
|----------------|-------------------|
| Environment config | Profile `setup` metadata block |
| Secrets never to AI | `secret_refs` resolved at provision boundary |
| Setup instructions | Profile runbook for extension bootstrap |
| Human review gates | Out of scope: **snuggle-inventor** / PR workflow |

Blitzy validates that **describing how to boot an environment** is as important as **where it runs**. sand-boxer profiles carry both.

---

## Hosted platforms as extension targets

sand-boxer extensions may delegate to SaaS providers.
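
Delegation could take the shape of a small common contract that every extension implements, with secrets resolved only at the provision boundary as in the Blitzy mapping above. The following is a minimal sketch under stated assumptions, not sand-boxer's actual API: `Profile`, `SandboxHandle`, `Extension`, `InMemoryExtension`, and `provision` are hypothetical names, and the in-memory class stands in for a real provider client.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol


@dataclass(frozen=True)
class Profile:
    """Hypothetical profile: plain variables plus references to secrets."""
    name: str
    variables: dict[str, str] = field(default_factory=dict)
    secret_refs: tuple[str, ...] = ()


@dataclass(frozen=True)
class SandboxHandle:
    extension_id: str
    sandbox_id: str


class Extension(Protocol):
    """Hypothetical contract each extension would implement."""
    extension_id: str

    def provision(self, profile: Profile, env: dict[str, str]) -> SandboxHandle: ...


class InMemoryExtension:
    """Stand-in for a provider client; records the env it was asked to start."""

    def __init__(self, extension_id: str) -> None:
        self.extension_id = extension_id
        self.started: list[dict[str, str]] = []

    def provision(self, profile: Profile, env: dict[str, str]) -> SandboxHandle:
        self.started.append(env)
        return SandboxHandle(self.extension_id, f"{profile.name}-{len(self.started)}")


def provision(
    ext: Extension,
    profile: Profile,
    resolve_secret: Callable[[str], str],
) -> SandboxHandle:
    """Resolve secret_refs at the provision boundary, then delegate."""
    env = dict(profile.variables)
    for ref in profile.secret_refs:
        env[ref] = resolve_secret(ref)  # secret values never live in the profile
    return ext.provision(profile, env)
```

Because the profile holds only references, it can be logged or shown to an agent without exposing secret values; the resolver is the single place where a vault or host `.env` lookup would plug in.
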
Initial extension candidates:

| Extension id | Provider | Self-host alt | Payments |
|--------------|----------|---------------|----------|
| `ext.e2b` | E2B | — | Per-second SaaS |
| `ext.modal` | Modal | — | Per-second + GPU |
| `ext.daytona` | Daytona cloud | `ext.daytona-self` (OSS) | SaaS or infra cost |
| `ext.openshell` | — | OpenShell local/k3s | Infra cost |
| `ext.compose-ssh` | — | sandboxer01 / CoulombCore | Infra cost |
| `ext.vm-packer` | — | build-machines lineage | Infra cost |

ComputeSDK (https://github.com/computesdk/computesdk) is a useful reference for normalizing provider differences behind one client API.

---

## OpenRouter analogy

| OpenRouter | sand-boxer |
|------------|------------|
| Unified LLM API | Unified sandbox API |
| Routes to OpenAI, Anthropic, … | Routes to E2B, Modal, self-hosted compose, … |
| API keys / credits / billing | Payments layer for SaaS consumption |
| Model metadata (context, price) | Profile metadata (isolation, cost, latency) |
| Fallback / routing policy | Host placement + extension fallback |

sand-boxer does not run inference; it runs **isolation**. The routing and payments patterns transfer directly.

---

## Anti-patterns to avoid

| Anti-pattern | Why |
|--------------|-----|
| Rebuild OpenClaw/Hermes gateway in sand-boxer | glas-harness scope |
| Embed e2e test orchestration in provisioner | wise-validator scope |
| Generate code inside sandbox API | snuggle-inventor scope |
| Own SSH tunnels or CA | ops-bridge / ops-warden scope |
| Claim sandbox = safe from prompt injection | Research disproves |

## Related reading

- [01-agent-sandbox-landscape.md](01-agent-sandbox-landscape.md)
- [03-meta-framework-synthesis.md](03-meta-framework-synthesis.md)
- `INTENT.md`: normative charter
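
The "Fallback / routing policy" row of the OpenRouter analogy can be made concrete with a small selection function: filter extensions by profile metadata, order them by cost, and try each in turn. This is only a sketch under assumptions. `Candidate`, `route`, and `provision_with_fallback` are hypothetical names, the metadata fields are illustrative, and any numbers supplied to it would be placeholders rather than real provider pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical routing entry built from profile metadata."""
    extension_id: str
    isolation: str          # e.g. "container" or "vm"
    cost_per_hour: float
    available: bool = True


def route(candidates: list[Candidate], required_isolation: str) -> list[Candidate]:
    """Return usable candidates, cheapest first, as an ordered fallback chain."""
    usable = [
        c for c in candidates
        if c.available and c.isolation == required_isolation
    ]
    return sorted(usable, key=lambda c: c.cost_per_hour)


def provision_with_fallback(chain, start):
    """Try each candidate in order; `start` is a caller-supplied callable."""
    errors = []
    for c in chain:
        try:
            return c.extension_id, start(c)
        except RuntimeError as exc:  # treat provider failure as "try the next one"
            errors.append(f"{c.extension_id}: {exc}")
    raise RuntimeError("all extensions failed: " + "; ".join(errors))
```

Keeping selection (`route`) separate from execution (`provision_with_fallback`) means the routing policy can be tested without touching any provider, which is the same split OpenRouter-style gateways make between model metadata and request dispatch.
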