sand-boxer/research/02-reference-frameworks.md
tegwick f33cff5363 docs: charter meta-framework vision, research, and SAND-WP-0002
Rewrite INTENT.md as the sand-boxer meta-framework charter (OpenRouter-style
sandbox API, extensions, payments, Coulomb sibling boundaries). Add research
under research/, update SCOPE.md, bootstrap workplans SAND-WP-0001/0002, and
State Hub integration files from the bootstrap pass.
2026-06-22 21:32:32 +02:00


Reference frameworks and platforms

Deep dives on systems sand-boxer should learn from — especially OpenClaw, Hermes Agent, Blitzy, and OpenShell — plus hosted platforms as extension targets.


OpenClaw

What it is: Personal AI assistant with optional tool sandboxing. Docs: https://docs.openclaw.ai/gateway/sandboxing

Role in the stack

OpenClaw is an agent harness (gateway, channels, skills, memory). Sandboxing is an optional configuration on tool execution, not the product core. sand-boxer draws the same boundary against glas-harness.

Sandbox architecture

What gets sandboxed: exec, read, write, edit, apply_patch, process, and an optional sandboxed browser. The gateway itself stays on the host.

Backends:

| Backend | Where | Workspace model |
| --- | --- | --- |
| docker | Local container | Bind-mount or copy; default network: "none" |
| ssh | Remote SSH host | Remote-canonical: seed once, exec remotely |
| openshell | OpenShell-managed | mirror (local canonical) or remote (remote canonical) |

Scope: agent (default) | session | shared — controls container count.

Mode: off | non-main | all — when sandboxing applies.

Workspace access: none | ro | rw — what tools can see.
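
The backend / scope / mode / workspace-access vocabulary above can be sketched as a typed profile. This is a minimal illustration, not OpenClaw's configuration format: the `SandboxProfile` class, the `workspaceAccess` key spelling, the `container_key` naming, and the reading of `non-main` as "every session except the main one" are all assumptions. Only the enum values and the `agent` default come from the text.

```python
from dataclasses import dataclass
from enum import Enum


class Backend(str, Enum):
    DOCKER = "docker"
    SSH = "ssh"
    OPENSHELL = "openshell"


class Scope(str, Enum):
    AGENT = "agent"
    SESSION = "session"
    SHARED = "shared"


class Mode(str, Enum):
    OFF = "off"
    NON_MAIN = "non-main"
    ALL = "all"


class WorkspaceAccess(str, Enum):
    NONE = "none"
    RO = "ro"
    RW = "rw"


@dataclass(frozen=True)
class SandboxProfile:
    """Hypothetical profile record; field names are illustrative."""
    backend: Backend
    mode: Mode
    workspace_access: WorkspaceAccess
    scope: Scope = Scope.AGENT  # "agent" is the documented default

    def applies_to(self, is_main_session: bool) -> bool:
        # Assumption: "non-main" sandboxes every session except the main one.
        if self.mode is Mode.OFF:
            return False
        if self.mode is Mode.ALL:
            return True
        return not is_main_session

    def container_key(self, agent_id: str, session_id: str) -> str:
        # Scope controls container count: one shared, one per session, or one per agent.
        if self.scope is Scope.SHARED:
            return "shared"
        if self.scope is Scope.SESSION:
            return f"session:{session_id}"
        return f"agent:{agent_id}"


def parse_profile(raw: dict) -> SandboxProfile:
    # Enum lookups raise ValueError on unknown values, so typos fail early.
    return SandboxProfile(
        backend=Backend(raw["backend"]),
        mode=Mode(raw["mode"]),
        workspace_access=WorkspaceAccess(raw["workspaceAccess"]),
        scope=Scope(raw.get("scope", "agent")),
    )
```

Keeping the vocabulary as closed enums means an unknown backend or mode is rejected when the profile is parsed rather than when a container is provisioned.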

Security patterns worth copying

  • Default Docker network none
  • Bind-mount blocklist: docker.sock, /etc, ~/.ssh, ~/.aws, credential roots
  • Symlink-aware path validation before bind approval
  • tools.elevated as explicit sandbox bypass (audited escape hatch)
  • Honest disclaimer: sandboxing reduces blast radius; it is not a perfect boundary
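
The blocklist and symlink-aware validation bullets can be combined into one check. The sketch below is an assumption about how such a check might look, not OpenClaw's code: `BLOCKED_ROOTS` and `is_bind_allowed` are hypothetical names, and the list only mirrors the examples named above. The key step is resolving symlinks on both sides before comparing paths.

```python
import os
from pathlib import Path

# Hypothetical blocklist modeled on the bullets above; not OpenClaw's actual list.
BLOCKED_ROOTS = ("/var/run/docker.sock", "/etc", "~/.ssh", "~/.aws")


def _resolve(path: str) -> Path:
    # expanduser handles "~"; realpath follows symlinks, even when the
    # final path components do not exist yet.
    return Path(os.path.realpath(os.path.expanduser(path)))


def is_bind_allowed(candidate: str, blocked_roots=BLOCKED_ROOTS) -> bool:
    """Return False if mounting `candidate` would expose a blocked path."""
    target = _resolve(candidate)
    for root in blocked_roots:
        blocked = _resolve(root)
        if target == blocked:
            return False  # the blocked path itself
        if blocked in target.parents:
            return False  # something underneath a blocked root
        if target in blocked.parents:
            return False  # an ancestor that would contain the blocked root
    return True
```

Rejecting ancestors matters as much as rejecting descendants: a bind of `/` or of the home directory would expose `/etc` or `~/.ssh` without naming them.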

sand-boxer lessons

  1. Backend / scope / workspaceAccess vocabulary is proven — adopt in profile schema
  2. SSH remote-canonical matches Custodian e2e-framework evolution path
  3. mirror vs remote workspace modes belong in meta-framework API
  4. OpenClaw integrates OpenShell as extension — validates extension-delegation model

Hermes Agent

What it is: Agent harness from Nous Research with multi-backend terminal execution. Repo: https://github.com/NousResearch/hermes-agent

Terminal backends (six)

| Backend | Isolation | Persistence |
| --- | --- | --- |
| local | None | |
| docker | Cap-drop ALL, pids-limit, tmpfs | Single long-lived labeled container |
| ssh | Network boundary | Persistent remote shell |
| modal | Cloud VM | Filesystem snapshots |
| daytona | Cloud container | Stop/resume |
| singularity | HPC namespaces | Writable overlay |

Docker backend highlights

  • One container per task, reused across sessions and Hermes process restarts
  • Labels: hermes-agent=1, hermes-task-id, hermes-profile
  • docker_persist_across_processes: true (default) — container survives process exit
  • Resource limits: CPU, memory, disk, plus an idle reaper controlled by lifetime_seconds
  • docker_forward_env — secrets from host .env, not config YAML
  • Parallel subagents share container unless per-task image override
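
Labeled reuse can be sketched as "look up by label, start if found, otherwise create". The code below only builds Docker CLI argument lists and is an illustration under assumptions, not Hermes's implementation: the function names, the `pids_limit` default, the `/tmp` tmpfs mount, and the `sleep infinity` keep-alive command are hypothetical. The label keys and the hardening flags are the ones listed above.

```python
def task_labels(task_id: str, profile: str) -> dict[str, str]:
    # Label keys taken from the list above.
    return {
        "hermes-agent": "1",
        "hermes-task-id": task_id,
        "hermes-profile": profile,
    }


def find_args(labels: dict[str, str]) -> list[str]:
    """`docker ps` invocation that lists matching containers, running or stopped."""
    args = ["docker", "ps", "--all", "--quiet"]
    for key, value in labels.items():
        args += ["--filter", f"label={key}={value}"]
    return args


def run_args(image: str, labels: dict[str, str], pids_limit: int = 256) -> list[str]:
    """`docker run` invocation for a new long-lived, labeled container."""
    args = [
        "docker", "run", "--detach",
        "--cap-drop", "ALL",
        "--pids-limit", str(pids_limit),  # illustrative limit
        "--tmpfs", "/tmp",                # illustrative mount point
    ]
    for key, value in labels.items():
        args += ["--label", f"{key}={value}"]
    # Assumption: the image provides a `sleep` that accepts "infinity".
    return args + [image, "sleep", "infinity"]


def plan_container(existing_ids: list[str], image: str, labels: dict[str, str]) -> list[str]:
    # Reuse the first match (starting an already-running container is a no-op);
    # provision a fresh container only when nothing carries the labels.
    if existing_ids:
        return ["docker", "start", existing_ids[0]]
    return run_args(image, labels)
```

In use, the output of `find_args` would be run through something like `subprocess.run(..., capture_output=True, text=True)`, its stdout split into container ids, and the ids passed to `plan_container`. Because the lookup key lives in container labels rather than in process memory, the container can be found again after the agent process restarts.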

sand-boxer lessons

  1. Labeled reuse beats cold provision per tool call for agent coding efficiency
  2. Resource limits and idle reaper are profile-level concerns
  3. Modal/Daytona as extension backends — Hermes consumes, does not own
  4. Credential forwarding policy belongs in extension contract, not agent config

NVIDIA OpenShell + NemoClaw (Hermes deployment)

OpenShell: Policy runtime for agent sandboxes (Landlock, seccomp, OPA egress).

NemoClaw: Reference stack deploying Hermes inside OpenShell.

Three-layer model (industry pattern)

| Layer | Component | Responsibility |
| --- | --- | --- |
| Model | LLM provider | Reasoning |
| Harness | Hermes | Skills, memory, bridges, scheduling |
| Runtime | OpenShell | Filesystem/network policy, credential brokering |

sand-boxer maps to runtime only. glas-harness maps to harness.

Policy model

Declarative YAML: allowed hosts, ports, HTTP methods, binary-scoped rules (e.g. only curl may reach api.github.com). Credentials are injected at the egress proxy, so the agent never sees tokens such as those for Slack or Outlook.
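
A binary-scoped allowlist of this kind reduces to a default-deny match over (binary, host, port, method). The sketch below shows that evaluation order only; `EgressRule` and `is_allowed` are hypothetical names, and the rule shape is an assumption for illustration, not OpenShell's policy schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EgressRule:
    """Hypothetical rule shape; not OpenShell's actual schema."""
    host: str
    port: int
    methods: frozenset
    binaries: frozenset = frozenset()  # empty means "any binary"


def is_allowed(rules, *, binary: str, host: str, port: int, method: str) -> bool:
    """Default deny: a request passes only if some rule matches every field."""
    for rule in rules:
        if rule.host != host or rule.port != port:
            continue
        if method.upper() not in rule.methods:
            continue
        if rule.binaries and binary not in rule.binaries:
            continue
        return True
    return False


# Example from the text: only curl may reach api.github.com.
RULES = [
    EgressRule(
        host="api.github.com",
        port=443,
        methods=frozenset({"GET", "POST"}),
        binaries=frozenset({"/usr/bin/curl"}),
    ),
]
```

With `RULES` as above, `is_allowed(RULES, binary="/usr/bin/curl", host="api.github.com", port=443, method="GET")` passes, while the same request from any other binary, or to any other host, is denied.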

Snapshot / restore

NemoClaw ships snapshot.sh / restore.sh for agent state (skills, memories, sessions) across redeploys. Credential filter excludes secrets from tarballs.
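
The credential-filter idea can be illustrated with Python's `tarfile` module, whose `add()` accepts a filter callback that drops a member when it returns `None`. This is a sketch of the technique, not a port of NemoClaw's snapshot.sh: the `SECRET_PATTERNS` list, the `snapshot` function, and the `state` archive prefix are assumptions chosen for the example.

```python
import fnmatch
import tarfile
from pathlib import PurePosixPath

# Hypothetical patterns; a real filter would be driven by the deployment's secret layout.
SECRET_PATTERNS = (".env", "*.pem", "*.key", "credentials*")


def _drop_secrets(info: tarfile.TarInfo):
    # Returning None excludes the member; for a directory, its contents are skipped too.
    name = PurePosixPath(info.name).name
    if any(fnmatch.fnmatch(name, pattern) for pattern in SECRET_PATTERNS):
        return None
    return info


def snapshot(state_dir: str, out_path: str) -> None:
    """Write a gzip tarball of state_dir, omitting anything that looks like a secret."""
    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(state_dir, arcname="state", filter=_drop_secrets)
```

Filtering by file name is a coarse heuristic: a secret stored under an unremarkable name would still be archived, so a filter like this complements rather than replaces keeping secrets out of the state directory.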

Security research (Lasso, Apr 2026)

Lasso demonstrated exfiltration through policy-permitted paths (a git PR, an npm postinstall script posting to Discord). The policies were enforced correctly; intent was not evaluated.

sand-boxer lesson: OpenShell-class extensions should be offered; security runbooks must state the limits of egress allowlisting.


Blitzy

What it is: AI-native code generation platform — not a sandbox runtime.

"Blitzy Sandbox" GitHub org

Public demo repos for Explore members. Not execution infrastructure.

Real isolation model: Environments

https://docs.blitzy.com/administration/environments

  • Natural-language setup instructions (toolchain, build, run, test)
  • Variables (plaintext) vs Secrets (encrypted, masked, never sent to AI)
  • Multi-environment priority merge (base + project override)
  • Validation in configured environment after code generation

sand-boxer lessons (environment metadata, not runtime)

| Blitzy pattern | sand-boxer mapping |
| --- | --- |
| Environment config | Profile setup metadata block |
| Secrets never to AI | secret_refs resolved at provision boundary |
| Setup instructions | Profile runbook for extension bootstrap |
| Human review gates | Out of scope (snuggle-inventor / PR workflow) |

Blitzy validates that describing how to boot an environment is as important as where it runs. sand-boxer profiles carry both.
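
A profile metadata block that carries setup instructions, plaintext variables, and secret references might look like the sketch below. It is an illustration under assumptions: `EnvMetadata`, `merge`, `model_view`, and `resolve_secrets` are hypothetical names, override-wins is an assumed merge priority, and the reference strings are opaque placeholders rather than a real secret-store scheme.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class EnvMetadata:
    """Hypothetical setup metadata block for a sand-boxer profile."""
    setup: tuple = ()                                # ordered setup instructions
    variables: dict = field(default_factory=dict)    # plaintext, safe to display
    secret_refs: dict = field(default_factory=dict)  # name -> opaque reference, never a value


def merge(base: EnvMetadata, override: EnvMetadata) -> EnvMetadata:
    # Assumed priority: the project override wins on key conflicts;
    # setup steps run base first, then override.
    return EnvMetadata(
        setup=base.setup + override.setup,
        variables={**base.variables, **override.variables},
        secret_refs={**base.secret_refs, **override.secret_refs},
    )


def model_view(env: EnvMetadata) -> dict:
    """What may be shown to a model: secret names are masked, references are omitted."""
    return {
        "setup": list(env.setup),
        "variables": dict(env.variables),
        "secrets": {name: "***" for name in env.secret_refs},
    }


def resolve_secrets(env: EnvMetadata, lookup: Callable[[str], str]) -> dict:
    # Called only at the provision boundary; `lookup` stands in for a secret store client.
    return {name: lookup(ref) for name, ref in env.secret_refs.items()}
```

Separating `model_view` from `resolve_secrets` keeps the two audiences apart: the agent sees what exists and how to boot the environment, while only the provisioner ever holds secret values.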


Hosted platforms as extension targets

sand-boxer extensions may delegate to SaaS providers. Initial extension candidates:

| Extension id | Provider | Self-host alt | Payments |
| --- | --- | --- | --- |
| ext.e2b | E2B | | Per-second SaaS |
| ext.modal | Modal | | Per-second + GPU |
| ext.daytona | Daytona cloud | ext.daytona-self (OSS) | SaaS or infra cost |
| ext.openshell | | OpenShell local/k3s | Infra cost |
| ext.compose-ssh | | sandboxer01 / CoulombCore | Infra cost |
| ext.vm-packer | | build-machines lineage | Infra cost |

ComputeSDK (https://github.com/computesdk/computesdk) is a useful reference for normalizing provider differences behind one client API.


OpenRouter analogy

| OpenRouter | sand-boxer |
| --- | --- |
| Unified LLM API | Unified sandbox API |
| Routes to OpenAI, Anthropic, … | Routes to E2B, Modal, self-hosted compose, … |
| API keys / credits / billing | Payments layer for SaaS consumption |
| Model metadata (context, price) | Profile metadata (isolation, cost, latency) |
| Fallback / routing policy | Host placement + extension fallback |

sand-boxer does not run inference; it runs isolation. The routing and payments patterns transfer directly.
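
The routing and fallback half of the analogy can be sketched as "filter extensions by profile metadata, try the cheapest first, fall through on failure". Everything named below is hypothetical: `Extension`, `ProvisionError`, `provision_with_fallback`, the `isolation` labels, and the cost field are illustrative stand-ins, not a sand-boxer API.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


class ProvisionError(Exception):
    """Raised by an extension that cannot provide a sandbox right now."""


@dataclass(frozen=True)
class Extension:
    id: str
    isolation: str                    # illustrative label, e.g. "container"
    cost_per_hour: float              # illustrative metadata used for ordering
    provision: Callable[[str], str]   # image -> sandbox handle


def provision_with_fallback(
    extensions: Sequence[Extension], image: str, required_isolation: str
) -> tuple:
    """Try matching extensions cheapest-first; return (extension id, sandbox handle)."""
    candidates = sorted(
        (ext for ext in extensions if ext.isolation == required_isolation),
        key=lambda ext: ext.cost_per_hour,
    )
    errors = []
    for ext in candidates:
        try:
            return ext.id, ext.provision(image)
        except ProvisionError as exc:
            errors.append(f"{ext.id}: {exc}")  # remember why, then fall through
    raise ProvisionError("no extension could provision: " + "; ".join(errors))
```

Cost is only one possible ordering key; latency or host placement could replace it without changing the fallback loop.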


Anti-patterns to avoid

| Anti-pattern | Why |
| --- | --- |
| Rebuild OpenClaw/Hermes gateway in sand-boxer | glas-harness scope |
| Embed e2e test orchestration in provisioner | wise-validator scope |
| Generate code inside sandbox API | snuggle-inventor scope |
| Own SSH tunnels or CA | ops-bridge / ops-warden scope |
| Claim sandbox = safe from prompt injection | Research disproves |