docs: charter meta-framework vision, research, and SAND-WP-0002

Rewrite INTENT.md as the sand-boxer meta-framework charter (OpenRouter-style sandbox API, extensions, payments, Coulomb sibling boundaries). Add research under research/, update SCOPE.md, bootstrap workplans SAND-WP-0001/0002, and add State Hub integration files from the bootstrap pass.
2026-06-22 21:32:32 +02:00
parent e248f669a3
commit f33cff5363
20 changed files with 2016 additions and 113 deletions
--- a/INTENT.md
+++ b/INTENT.md
@@ -1,180 +1,338 @@
 ---
-domain: custodian
+domain: infotech
 repo: sand-boxer
-updated: "2026-06-21"
+updated: "2026-06-22"
 ---

 # INTENT

-> This file explains why sand-boxer exists, what problem it solves in the
-> Custodian ecosystem, and where its authority begins and ends.
+> sand-boxer is the Coulomb **meta-framework for establishing sandboxes** — a
+> unified API and extension platform for provisioning every variation of isolated
+> execution environment, from self-hosted compose stacks to metered SaaS
+> runtimes. This file is the charter: why it exists, what it owns, and where
+> sibling projects begin.
+
+Research backing this charter lives in `research/`.

 ---

 ## Why it exists

 Custodian automation is moving from **workstation-anchored** execution to
-**Railiance01-scheduled** orchestration. That shift is right for reliability:
-activity-core on Railiance01 can fire maintenance and coordination jobs on a
-stable clock. It does not, by itself, give agents a safe place to **develop,
-build, and test** without the laptop filesystem, sleep cycles, and single-user
-blast radius.
+**Railiance01-scheduled** orchestration. That shift improves reliability but does
+not, by itself, answer the harder question: **where can agentic and deterministic
+work run safely** without the laptop filesystem, sleep cycles, and single-user
+blast radius?

-sand-boxer exists to provide **isolated execution environments** — sandboxes —
-where agentic and deterministic work can run on dedicated infrastructure while
-remaining observable and governable from State Hub.
+The industry has exploded with sandbox answers — E2B, Modal, Daytona, OpenShell,
+OpenClaw-style Docker/SSH backends, hyperscaler interpreters — each with
+different APIs, billing models, and isolation postures. Coulomb needs **one place
+to establish sandboxes** regardless of backend, not a new integration per agent
+harness, validator, or codegen pipeline.

-The goal is progress without requiring the workstation as a runtime: repos are
-checked out, tools run, tests execute, and artifacts return through controlled
-channels. The laptop becomes optional for operations, not the hub of all
-execution.
+sand-boxer exists to be that place: **OpenRouter for sandboxes, not for models.**
+
+Consumers call one API. Extensions delegate to the sandbox system that fits —
+self-hosted on sandboxer01, inherited compose-ssh from `the-custodian`, or a
+metered cloud provider. An integrated **payments layer** handles SaaS consumption
+when Coulomb uses external capacity. Over time, operational learning may justify
+a Coulomb-native **best-of-brands runtime** — but that is a later phase built on
+evidence, not day-one ambition.
+
+The workstation becomes optional for **runtime**. Railiance01 decides *when*
+work runs (via activity-core). sand-boxer decides *where* isolated execution
+happens. State Hub records *what* changed.

 ---

 ## The governing principle

-sand-boxer is the **execution isolation and provisioning service** for agentic
-development and related workloads.
+sand-boxer is the **sandbox establishment service** — profiles, provisioning,
+extension routing, placement, lifecycle, and metering. Nothing more.

-It should answer:
+It answers:

-1. **Where can this work run safely?** Profile selection (compose stack, VM,
-   future cluster worker) and host placement.
-2. **How is isolation enforced?** Networks, TTL, resource limits, teardown, and
-   cleanup guarantees.
-3. **How does the sandbox phone home?** Reachability via ops-bridge tunnels and
-   SSH identity via ops-warden — without owning either.
-4. **What happened?** Registration, health, and lifecycle events visible to
-   State Hub and reuse-surface consumers.
+1. **Which sandbox recipe applies?** Profile selection and version resolution.
+2. **Which backend fulfills it?** Extension routing (self-hosted vs SaaS).
+3. **Where does it run?** Host placement and blast-radius policy.
+4. **How is isolation enforced?** Network default-deny, TTL, resource limits,
+   teardown guarantees — as declared by profile + extension.
+5. **How does it become reachable?** Consumer integration with ops-bridge and
+   ops-warden — without owning tunnels or certificates.
+6. **What happened?** Lifecycle events, usage meters, State Hub registration.
+7. **What did it cost?** Payments and credits for metered extensions.

-It should not become the scheduler, the work-state database, the connectivity
-authority, or production application hosting on Railiance01.
+It must **not** become the agent harness, the e2e validator, the code generator,
+the scheduler, the work-state database, the connectivity authority, or production
+hosting on Railiance01.

 ---

-## Strategic context
+## The OpenRouter analogy

-### Workstation automation is interim, not the target
+| OpenRouter | sand-boxer |
+|------------|------------|
+| Unified LLM access API | Unified sandbox establishment API |
+| Routes across model providers | Routes across sandbox extensions |
+| Provider metadata (price, context) | Profile metadata (isolation, cost, latency) |
+| API keys, credits, usage billing | Payments layer for SaaS sandbox consumption |
+| BYOK supported | BYOK for extension provider keys |
+| Does not train models | Does not replace extension runtimes (until phase 5) |

-Local timers and laptop-resident scripts were useful for bootstrapping ADR-001
-consistency sync and similar jobs. They are not the long-term substrate.
-Railiance01-based activity-core schedules are the primary direction; workstation
-paths remain only where no sandbox or cluster alternative exists yet.
+sand-boxer is **infrastructure routing**, not product UX. Harnesses, validators,
+and inventors are customers.

-### Railiance01 vs sandbox hosts
+---

-| Layer | Role |
-|-------|------|
-| **Railiance01** | Production k3s, activity-core, Temporal, stable custodian schedules |
-| **sandboxer01** (or equivalent) | Dedicated VM for dev/agent sandboxes — **isolated blast radius** |
-| **CoulombCore** | Acceptable interim sandbox host during migration; not a substitute for deliberate isolation from production |
-| **Workstation (WSL)** | Control plane anchor today; **not** the desired execution surface |
+## Coulomb sibling boundaries

-sand-boxer owns the **abstraction and lifecycle** of sandboxes. It does not own
-Railiance01 cluster operations (see `railiance-cluster` / `railiance-apps`).
+sand-boxer stays inside the **sandboxing boundary**. Three sibling Coulomb
+projects own adjacent concerns. Integration is contractual — they **request**
+sandboxes; sand-boxer **establishes** them.

-### Lineage
+### glas-harness — agent harness

-This repository consolidates and generalizes patterns that today live split and
-unregistered in `the-custodian`:
+**Owns:** Gateway, tool orchestration, skills, memory, channels, subagent
+delegation, session semantics, sandbox *consumption* from the agent's perspective.

- **E2E sandbox framework** (`e2e-framework/`) — SSH to remote host, isolated
-  directory, docker compose, teardown (`CUST-WP-0028`).
- **Build machines** (`infra/build-machines/`) — reproducible VM images,
-  reverse tunnels, State Hub capability registration (`CUST-WP-0032`).
+**Does not own:** Sandbox runtimes, profile catalog authority, host placement,
+extension adapters, isolation enforcement.

-sand-boxer extracts a **reusable platform** from those precedents so
-`the-custodian` can stay governance-focused with a small operational surface.
+glas-harness configures *when* tools run in a sandbox (OpenClaw-style
+`mode` / `scope` / `workspaceAccess`). sand-boxer provides the sandbox handle
+and reachability descriptor.
+
+### wise-validator — e2e test and health
+
+**Owns:** Validation workflows, health check semantics, test orchestration,
+pass/fail interpretation, structured result reporting to State Hub and CI.
+
+**Does not own:** Remote host provisioning, compose lifecycle, port isolation,
+sandbox teardown.
+
+wise-validator replaces the validation half of `the-custodian/e2e-framework/`.
+It requests `profile.compose-e2e` (or successors), runs tests inside the
+established environment, and owns the `e2e.yml` contract.
+
+### snuggle-inventor — code generation
+
+**Owns:** Code generation, modernization pipelines, tech-spec and planning
+artifacts, PR-oriented output, human-in-the-loop review gates.
+
+**Does not own:** Sandbox infrastructure, environment bootstrapping authority,
+secret stores, runtime metering.
+
+snuggle-inventor may attach Blitzy-style **setup instructions** and secret
+references as profile inputs. sand-boxer resolves secrets at the provision
+boundary; generated code never transits sand-boxer APIs.
+
+### Boundary diagram
+
+```
+  glas-harness          wise-validator         snuggle-inventor
+  (agent harness)       (e2e + health)         (code generation)
+        │                     │                      │
+        └─────────────────────┼──────────────────────┘
+                              │  POST /v1/sandboxes
+                              ▼
+                        sand-boxer
+                   (establish sandboxes)
+                              │
+              ┌───────────────┼───────────────┐
+              ▼               ▼               ▼
+        ext.compose-ssh   ext.modal      ext.e2b …
+        (self-hosted)     (SaaS+meter)   (SaaS+meter)
+```
+
+### Existing Custodian repos (unchanged)
+
+| Concern | Owner |
+|---------|--------|
+| Workstream, task, progress state | `state-hub` |
+| Cron and orchestration | `activity-core` |
+| SSH reverse tunnels | `ops-bridge` |
+| SSH certificate issuance | `ops-warden` |
+| Canon and agent instruction canon | `the-custodian` |
+| Capability federation hub | `reuse-surface` |
+| Production on Railiance01 | `railiance-apps` / domain repos |
+| ADR-001 reconciliation | `state-hub` |
+
+sand-boxer **consumes** ops-bridge and ops-warden; it does not subsume them.

 ---

 ## What it is

-sand-boxer is the **sandbox provisioning and profile catalog** for Custodian.
+sand-boxer is a **meta-framework** with four pillars:

-It is intended to contain:
+### 1. Unified establishment API

- **Sandbox profiles** — e.g. compose-based e2e stacks, VM images, future
-  container-on-worker patterns
- **Provision / wait / teardown** lifecycle — TTL, idempotent cleanup, port and
-  network conventions
- **Host placement policy** — which profiles run on sandboxer01, coulombcore
-  interim, or other registered hosts
- **CLI and/or API** for operators and agents to request isolated environments
- **State Hub registration contract** — extend the `build-agent` self-register
-  pattern to generic sandbox identities
- **Capability registry entries** in this repo's `registry/` for federation via
-  reuse-surface (e.g. `capability.execution.sandbox-provision`)
- Runbooks, templates (Packer, compose bundles), and tests for the above
+One consistent surface for all sandbox variations:
+
+- Create, inspect, extend, snapshot, recreate, destroy
+- Profile-driven inputs (repo ref, compose bundle, setup metadata, secret refs)
+- Consumer attribution (`adm` / `agt` / `atm` + calling project id)
+- Lifecycle states: `requested → provisioning → ready → active → expired → destroyed`
+
+Early versions may expose a subset; the API shape is designed for completeness.
+
+### 2. Profile catalog
+
+Named, versioned recipes — not one-off containers:
+
+- Extension binding (`ext.compose-ssh`, `ext.vm-packer`, `ext.e2b`, …)
+- Isolation level, network policy, workspace mode (`mirror` | `remote-canonical`)
+- Scope default (`agent` | `session` | `shared`)
+- TTL, resource limits, placement preference
+- Setup metadata (natural-language bootstrap instructions for extensions)
+- Registered in `registry/` and federated via reuse-surface
+
+Profiles collect good ideas from OpenClaw (backend/scope/workspace), Hermes
+(labeled reuse, resource limits), Blitzy (setup instructions, secret boundary),
+and hosted platforms (checkpoint, persistence classes) into **one schema**.
+
+### 3. Extension platform
+
+Extensions **delegate** to sandbox systems and services:
+
+| Class | Examples | Billing |
+|-------|----------|---------|
+| **Self-hosted** | compose-ssh, vm-packer, Daytona OSS, OpenShell | Infra allocation |
+| **SaaS consumption** | E2B, Modal, Daytona cloud, future providers | Payments layer |
+
+Each extension implements a provision / ready / teardown contract, with optional
+snapshot and cost-estimate operations. Extensions ship as plugins; third-party
+and Coulomb-native backends use the same interface.
+
+### 4. Payments and metering
+
+For metered SaaS extensions:
+
+- Org/workspace credits and usage accounting
+- Pre-create cost estimates; post-destroy actuals
+- BYOK for provider API keys where supported
+- Export to domain billing systems — sand-boxer meters sandbox consumption,
+  not general payments
+
+Self-hosted extensions record **allocation** (host, duration), not external spend.

 ---

 ## What it is not

-| Concern | Owner |
-|---------|--------|
-| Workstream, task, and progress state | `state-hub` |
-| Cron and event-triggered orchestration | `activity-core` |
-| SSH reverse tunnels and tunnel health | `ops-bridge` |
-| SSH certificate issuance | `ops-warden` |
-| Canon, charters, agent instruction canon | `the-custodian` |
-| Capability index federation hub | `reuse-surface` |
-| Production service deployment on Railiance01 | `railiance-apps` / domain repos |
-| ADR-001 workplan ↔ DB reconciliation | `state-hub` (`consistency_check.py`) |
+| Concern | Owner | sand-boxer role |
+|---------|--------|-----------------|
+| Agent gateway, tools, memory, channels | **glas-harness** | Customer API |
+| E2e tests, health checks, validation | **wise-validator** | Customer API |
+| Code generation, tech specs, AAP | **snuggle-inventor** | Customer API |
+| When work runs | `activity-core` | None |
+| What tasks exist | `state-hub` | Registers lifecycle only |
+| Tunnels | `ops-bridge` | Consumer |
+| Certs | `ops-warden` | Consumer |
+| Intent-aware egress / prompt security | Research frontier | Document limits only |

-sand-boxer may **consume** connectivity and certificates; it must not duplicate
-or subsume those authorities.
+sand-boxer provides **blast-radius isolation and governed reachability**. It does
+not protect against a compromised agent abusing **allowed** egress paths (git,
+npm, curl to allowlisted hosts). Security runbooks must state this explicitly.
+
+---
+
+## Strategic context
+
+### Workstation automation is interim
+
+Local timers and laptop scripts bootstrapped ADR-001 sync. Railiance01
+activity-core schedules are the direction. Workstation paths remain only where no
+sandbox alternative exists yet.
+
+### Host topology
+
+| Layer | Role |
+|-------|------|
+| **Railiance01** | Production k3s, activity-core, Temporal — **not** agent dev runtime |
+| **sandboxer01** | Dedicated sandbox host — preferred blast-radius isolation |
+| **CoulombCore** | Interim sandbox host during migration |
+| **Workstation (WSL)** | Control-plane anchor today — **not** target execution surface |
+| **SaaS extensions** | Burst / capability gap (GPU, desktop) via payments layer |
+
+### Lineage
+
+sand-boxer generalizes patterns split across `the-custodian`:
+
+| Legacy | sand-boxer | Sibling |
+|--------|------------|---------|
+| `e2e-framework/` provision/teardown | `ext.compose-ssh` | wise-validator owns test run |
+| `e2e-framework/` health + test + report | — | wise-validator |
+| `infra/build-machines/` | `ext.vm-packer` | — |
+| Agent sandbox config (future) | API consumer | glas-harness |
+
+`the-custodian` stays governance-focused; sand-boxer becomes the execution
+venue catalog.
+
+### Phase 5: Coulomb-native runtime (later)
+
+After operating extensions in production — observing latency, cost, failure
+modes, isolation gaps — sand-boxer may ship an owned **best-of-brands**
+sandboxing solution combining:
+
+- Persistent labeled workspaces (Hermes pattern)
+- Default-deny policy layer (OpenShell lessons)
+- Fast resume / checkpoint (industry baseline)
+- Self-hosted economics (Daytona/OpenSandbox lessons)
+
+This is **not** v1 scope. Extensions and payments come first; native runtime
+follows evidence.

 ---

 ## Intended users

- **Human operators (`adm`)** — provision sandboxes, manage profiles and hosts,
-  inspect lifecycle and cleanup
- **LLM agents (`agt`)** — request isolated environments for coding, testing,
-  and verification without laptop filesystem dependence
- **Deterministic automations (`atm`)** — activity-core instructions and CI
-  hooks that need a bounded execution venue
+- **Human operators (`adm`)** — profiles, hosts, extensions, credits, lifecycle
+- **LLM agents (`agt`)** — via glas-harness, snuggle-inventor, or direct API
+- **Deterministic automations (`atm`)** — via wise-validator, activity-core, CI
+- **Extension authors** — implement backend adapters against the extension contract
+- **Platform integrators** — register capabilities, federate via reuse-surface

 ---

 ## Design principles

- **Blast radius isolation** — sandbox workloads must not jeopardize Railiance01
-  production stability; prefer dedicated hosts (sandboxer01) for agentic dev
- **Profiles over one-offs** — every sandbox type is a named, versioned profile
-  with documented inputs, outputs, and teardown
- **Reachability, not ownership** — use ops-bridge for tunnels and ops-warden
-  for SSH identity; sand-boxer orchestrates, it does not issue certs or run
-  tunnel daemons
- **Observable lifecycle** — create, ready, active, expired, and destroyed states
-  are attributable and queryable
- **Disposable by default** — sandboxes are TTL-bound; persistence is explicit
-  and exceptional
- **Registry-first reuse** — register capabilities in this repo and federate
-  through reuse-surface before ad hoc duplication elsewhere
+- **Meta-framework, not monolith** — one API; many extensions; optional native runtime later
+- **Profiles over one-offs** — every sandbox type is named, versioned, registered
+- **Prefer self-hosted** — SaaS via explicit routing policy, not silent default
+- **Blast-radius isolation** — dedicated hosts; never jeopardize Railiance01 production
+- **Reachability, not ownership** — consume ops-bridge and ops-warden; never own tunnels or certificates
+- **Secrets at the boundary** — resolve at provision; never in agent-visible workspace
+- **Observable lifecycle** — every state transition attributable and queryable
+- **Disposable by default** — TTL-bound; persistence and checkpoint are explicit
+- **Honest security** — sandboxing limits blast radius; it is not intent enforcement
+- **Registry-first reuse** — capabilities in `registry/` before ad hoc duplication
+- **Payments transparency** — estimate before create; meter on destroy for SaaS

 ---

 ## Near-term outcomes

-A first useful version of sand-boxer should:
-
-1. Define at least one **production-oriented profile** (e.g. compose sandbox on
-   sandboxer01 or coulombcore interim) with documented provision/teardown
-2. Register **`capability.execution.sandbox-provision`** (or equivalent) in
-   `registry/` and pass reuse-surface validation
-3. Integrate with **ops-bridge** reachability and **State Hub** registration
-4. Provide a clear migration path for e2e-framework and build-machines callers
-5. Enable activity-core and agents to request sandboxes without workstation repo
-   paths as a hard dependency
+1. **Charter and research** — `INTENT.md`, `research/`, profile schema draft
+2. **First self-hosted extension** — `ext.compose-ssh` from e2e-framework lineage
+3. **Unified API v0** — create / get / destroy / recreate + State Hub registration
+4. **First profile** — `profile.compose-e2e` for wise-validator migration
+5. **Registry entry** — `capability.execution.sandbox-provision` via reuse-surface
+6. **Extension SDK sketch** — contract for P1 backends (vm-packer, Daytona OSS)
+7. **Sibling integration notes** — glas-harness, wise-validator, snuggle-inventor API expectations documented

 ---

 ## Maturity target

-A mature sand-boxer should be the **standard execution venue** for agentic
-development in Custodian: Railiance01 decides *when* work runs; sand-boxer
-decides *where* isolated execution happens; State Hub records *what* changed.
-The workstation is optional — used for human preference, not as a single point
-of runtime failure.
+A mature sand-boxer is Coulomb's **default way to establish any sandbox**:
+
+- glas-harness requests agent dev sandboxes without choosing Docker vs Modal vs SSH
+- wise-validator requests validation environments without owning provisioners
+- snuggle-inventor requests build sandboxes with setup metadata and secret refs
+- activity-core and CI request bounded venues with consistent lifecycle visibility
+- Operators route spend across self-hosted and SaaS with one credits model
+- A Coulomb-native runtime — if warranted — wins on ops data, not speculation
+
+The workstation is optional. The harness is not sand-boxer. The validator is not
+sand-boxer. The inventor is not sand-boxer. **Establishing the box is.**
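
Note on the lifecycle line in the new charter (`requested → provisioning → ready → active → expired → destroyed`): it can be read as a simple linear state machine. The following is a minimal sketch under that reading, not the project's implementation. The names `SandboxState`, `next_state`, and `can_transition` are hypothetical, and strict linearity is an assumption; a real service would probably also allow an early destroy from any state.

```python
from enum import Enum


class SandboxState(Enum):
    """Lifecycle states, in the order the charter lists them."""
    REQUESTED = "requested"
    PROVISIONING = "provisioning"
    READY = "ready"
    ACTIVE = "active"
    EXPIRED = "expired"
    DESTROYED = "destroyed"


# Enum members iterate in definition order, which mirrors the charter's arrow chain.
# Assumption: transitions are strictly linear (no skipping, no going back).
_ORDER = list(SandboxState)


def next_state(current: SandboxState):
    """Return the state that follows `current`, or None if `current` is terminal."""
    idx = _ORDER.index(current)
    return _ORDER[idx + 1] if idx + 1 < len(_ORDER) else None


def can_transition(src: SandboxState, dst: SandboxState) -> bool:
    """True only when `dst` is the immediate successor of `src`."""
    return next_state(src) is dst
```

Under these assumptions, `can_transition(SandboxState.READY, SandboxState.ACTIVE)` holds while `can_transition(SandboxState.READY, SandboxState.DESTROYED)` does not, which is exactly where a real API would need an explicit early-teardown rule.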