generated from coulomb/repo-seed
Implement SAND-WP-0008: host telemetry and self-canary
Add profile.sandbox-canary, HostSnapshot/inventory/stale schemas, SSH collectors, before/after provision deltas, telemetry export to State Hub and local JSON, default `sandboxer create` self-deploy, inspect/reap-stale CLI, runbook, and CoulombCore verification (26 tests pass).
95
docs/host-telemetry.md
Normal file
@@ -0,0 +1,95 @@
# Host telemetry contract

Version 0.1 (SAND-WP-0008). Extends the Host resource defined in `docs/meta-framework.md` with read-only observability. sand-boxer collects and exports telemetry; it does not own long-term metrics storage.

---

## Types

### HostSnapshot

Point-in-time host metrics, collected over SSH in at most 10 seconds and safe to run without root.

| Field | Description |
|-------|-------------|
| `load_1m`, `load_5m`, `load_15m` | 1-, 5- and 15-minute load averages from `/proc/loadavg` |
| `cpu_count` | Number of logical CPUs |
| `mem_total_mb`, `mem_available_mb` | Total and available memory in MB, from `free -m` |
| `disk_root_used_pct`, `disk_root_avail_gb` | Root filesystem usage (percent) and free space (GB) |
| `running_containers` | All running containers (podman or docker) |
| `sandbox_containers` | Containers whose compose project label matches `sbx-*` |
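
The contract names the sources (`/proc/loadavg`, `free -m`) but not the parsing. A minimal sketch, assuming the SSH layer hands back raw command output as strings; `parse_loadavg`, `parse_free_m` and `build_snapshot` are hypothetical names, and only a subset of the HostSnapshot fields is modelled:

```python
from dataclasses import dataclass


@dataclass
class HostSnapshot:
    # Subset of the documented fields; names mirror the contract.
    load_1m: float
    load_5m: float
    load_15m: float
    mem_total_mb: int
    mem_available_mb: int


def parse_loadavg(text: str) -> tuple[float, float, float]:
    """Parse /proc/loadavg, e.g. '0.52 0.58 0.59 1/467 12345'."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def parse_free_m(text: str) -> tuple[int, int]:
    """Return (total, available) in MB from `free -m` output.

    The 'Mem:' row has one more token than the header (the row label),
    so values are matched to header names rather than fixed positions.
    """
    rows = [line.split() for line in text.strip().splitlines()]
    header = rows[0]
    mem_row = next(row for row in rows if row and row[0] == "Mem:")
    values = dict(zip(header, mem_row[1:]))
    return int(values["total"]), int(values["available"])


def build_snapshot(loadavg_out: str, free_out: str) -> HostSnapshot:
    l1, l5, l15 = parse_loadavg(loadavg_out)
    total, available = parse_free_m(free_out)
    return HostSnapshot(l1, l5, l15, total, available)
```

Matching by header name keeps the sketch working if `free` adds or reorders columns; older procps releases without an `available` column would raise `KeyError` here.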

### SandboxInventory

Sandbox artifacts found on a host.

| Entry type | Source |
|------------|--------|
| `directory` | `{base_dir}/{sandbox_id}` |
| `compose_project` | Compose project labelled `sbx-*` (or legacy `e2e-*`) |

Each entry has an `id`, `path`, `age_hours`, and `profile_hint` (inferred from the project name).
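
A sketch of turning raw host listings into inventory entries, assuming the collector has already gathered directory mtimes and compose project creation times as epoch seconds. The helper names are hypothetical, and so is the `profile_hint` rule: the contract only says the hint is inferred from the project name.

```python
import fnmatch
import time
from dataclasses import dataclass
from typing import Optional

# Compose project patterns recognised as sandboxes (current and legacy).
SANDBOX_PROJECT_PATTERNS = ("sbx-*", "e2e-*")


@dataclass
class InventoryEntry:
    kind: str                    # "directory" or "compose_project"
    id: str
    path: str                    # empty for compose projects in this sketch
    age_hours: float
    profile_hint: Optional[str]


def is_sandbox_project(name: str) -> bool:
    return any(fnmatch.fnmatchcase(name, p) for p in SANDBOX_PROJECT_PATTERNS)


def infer_profile_hint(project: str) -> Optional[str]:
    # Hypothetical rule: legacy e2e-* projects get a fixed hint,
    # sbx-* names are treated as carrying no profile information.
    return "legacy-e2e" if project.startswith("e2e-") else None


def directory_entries(base_dir: str, dirs: dict[str, float],
                      now: Optional[float] = None) -> list[InventoryEntry]:
    """dirs maps sandbox_id -> directory mtime (epoch seconds)."""
    now = time.time() if now is None else now
    return [
        InventoryEntry("directory", sid, f"{base_dir}/{sid}", (now - mtime) / 3600, None)
        for sid, mtime in sorted(dirs.items())
    ]


def compose_entries(projects: dict[str, float],
                    now: Optional[float] = None) -> list[InventoryEntry]:
    """projects maps compose project name -> creation time (epoch seconds)."""
    now = time.time() if now is None else now
    return [
        InventoryEntry("compose_project", name, "", (now - created) / 3600,
                       infer_profile_hint(name))
        for name, created in sorted(projects.items())
        if is_sandbox_project(name)  # skip unrelated projects on the host
    ]
```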

### StaleCandidate

| Kind | Meaning | Suggested action |
|------|---------|------------------|
| `orphan_dir` | Directory on the host with no record in the local store | `reap` |
| `orphan_compose` | Compose project on the host with no record in the local store | `reap` |
| `zombie_record` | Store record not marked `destroyed`, but missing on the host | `inspect` |
| `aged_dir` | Directory older than the age threshold | `reap` |

Possible actions are `reap`, `inspect`, and `ignore`. Candidates are recommendations only; the CLI reaps nothing unless `--apply` is passed.
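
The four kinds can be derived by comparing host listings with the local store. A minimal sketch under two assumptions not stated in the contract: compose project names are `sbx-{sandbox_id}`, and each directory is reported once (orphan takes precedence over aged). `find_stale` is a hypothetical name.

```python
from dataclasses import dataclass

SUGGESTED_ACTION = {
    "orphan_dir": "reap",
    "orphan_compose": "reap",
    "zombie_record": "inspect",
    "aged_dir": "reap",
}


@dataclass(frozen=True)
class StaleCandidate:
    kind: str
    id: str
    action: str


def _candidate(kind: str, ident: str) -> StaleCandidate:
    return StaleCandidate(kind, ident, SUGGESTED_ACTION[kind])


def find_stale(host_dirs: dict[str, float], host_projects: list[str],
               store: dict[str, str], older_than_hours: float) -> list[StaleCandidate]:
    """host_dirs: sandbox_id -> age_hours; store: sandbox_id -> record state."""
    out = []
    for sid, age in sorted(host_dirs.items()):
        if sid not in store:
            out.append(_candidate("orphan_dir", sid))
        elif age > older_than_hours:
            out.append(_candidate("aged_dir", sid))
    for project in sorted(host_projects):
        # Assumption: f"sbx-{sandbox_id}"; legacy e2e-* names never match
        # a store id here, so they surface as orphans.
        if project.removeprefix("sbx-") not in store:
            out.append(_candidate("orphan_compose", project))
    for sid, state in sorted(store.items()):
        if state != "destroyed" and sid not in host_dirs:
            out.append(_candidate("zombie_record", sid))
    return out
```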

### ProvisionDelta

A `before`/`after` pair of HostSnapshots taken around provisioning, plus computed deltas (after minus before):

- `load_1m_delta`, `mem_available_mb_delta`, `running_containers_delta`
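
A sketch of computing the three deltas from two snapshots. The `Snapshot` type below carries only the fields the deltas need, and rounding the load delta to two decimals is an assumption made to suppress float noise:

```python
from dataclasses import asdict, dataclass


@dataclass
class Snapshot:
    # Only the fields the three documented deltas need.
    load_1m: float
    mem_available_mb: int
    running_containers: int


def provision_delta(before: Snapshot, after: Snapshot) -> dict:
    """Build a ProvisionDelta-shaped dict; every delta is after minus before."""
    return {
        "before": asdict(before),
        "after": asdict(after),
        # Round to hide float noise such as 0.12000000000000011.
        "load_1m_delta": round(after.load_1m - before.load_1m, 2),
        "mem_available_mb_delta": after.mem_available_mb - before.mem_available_mb,
        "running_containers_delta": after.running_containers - before.running_containers,
    }
```

With this sign convention, memory consumed by the new sandbox shows up as a negative `mem_available_mb_delta`.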

### IntrospectionReport

The bundled canary output, attached to `SandboxStatus.telemetry` when the sandbox reaches `ready`:

```json
{
  "schema_version": "0.1",
  "host": "92.205.130.254",
  "sandbox_id": "abc12345",
  "profile_id": "profile.sandbox-canary",
  "collected_at": "2026-06-23T...",
  "provision_delta": { "before": {}, "after": {}, "load_1m_delta": 0.1 },
  "inventory": { "entries": [], "host": "..." },
  "stale_candidates": []
}
```
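
A minimal sketch of assembling a report with these top-level keys, assuming the collectors already return plain dicts. `build_report` and `to_json` are hypothetical names, and the `collected_at` format (UTC, ISO 8601, second precision) is an assumption since the sample value is elided:

```python
import json
from datetime import datetime, timezone


def build_report(host, sandbox_id, provision_delta, inventory, stale_candidates,
                 profile_id="profile.sandbox-canary"):
    """Assemble a dict with the IntrospectionReport top-level keys (schema 0.1)."""
    return {
        "schema_version": "0.1",
        "host": host,
        "sandbox_id": sandbox_id,
        "profile_id": profile_id,
        # Assumed format: UTC ISO 8601, e.g. 2026-06-23T12:00:00+00:00
        "collected_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "provision_delta": provision_delta,
        "inventory": inventory,
        "stale_candidates": stale_candidates,
    }


def to_json(report: dict) -> str:
    """Serialize for the local artifact or a State Hub event detail."""
    return json.dumps(report, indent=2)
```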

---

## Privacy and retention

- Reports never include secret paths, env files, or full `docker inspect` dumps.
- Telemetry JSON is retained locally under `~/.local/share/sandboxer/telemetry/`.
- State Hub events carry the report in their `detail` field; the same redaction rules apply.
- Operators can set `SANDBOXER_NO_STATE_HUB=1` to skip remote emission.
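
Local retention pairs the directory above with one file per sandbox (`<sandbox_id>.json`, as in the canary runbook). A sketch of that layout; `artifact_path` and `write_artifact` are hypothetical helpers, and the optional `base` override exists only so the sketch can be exercised without touching the home directory:

```python
import json
from pathlib import Path
from typing import Optional

# Retention directory named in this contract.
DEFAULT_TELEMETRY_DIR = Path.home() / ".local" / "share" / "sandboxer" / "telemetry"


def artifact_path(sandbox_id: str, base: Optional[Path] = None) -> Path:
    """Per-sandbox artifact location: <base>/<sandbox_id>.json."""
    return (base or DEFAULT_TELEMETRY_DIR) / f"{sandbox_id}.json"


def write_artifact(report: dict, base: Optional[Path] = None) -> Path:
    """Write an already-redacted report as pretty-printed JSON."""
    path = artifact_path(report["sandbox_id"], base)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(report, indent=2, sort_keys=True) + "\n")
    return path
```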

---

## Export sinks

| Sink | Status |
|------|--------|
| State Hub `progress/` | Implemented |
| Local JSON artifact | Implemented |
| `TelemetrySink` protocol | Stub for artifact-store, Prometheus, or ClickHouse sinks |
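
The table does not give the `TelemetrySink` signature. One way such a protocol could look, assuming a single `export(report)` method (hypothetical) and applying the `SANDBOXER_NO_STATE_HUB` opt-out to remote sinks only; `MemorySink` is a toy stand-in for a real backend:

```python
import os
from typing import Iterable, Mapping, Protocol


class TelemetrySink(Protocol):
    """Structural type for sinks; the method name `export` is an assumption."""

    def export(self, report: dict) -> None: ...


class MemorySink:
    """Toy sink that keeps reports in memory (no real backend)."""

    def __init__(self) -> None:
        self.reports: list[dict] = []

    def export(self, report: dict) -> None:
        self.reports.append(report)


def state_hub_enabled(env: Mapping[str, str] = os.environ) -> bool:
    # Assumption: only the literal value "1" opts out of remote emission.
    return env.get("SANDBOXER_NO_STATE_HUB") != "1"


def fan_out(report: dict, local: Iterable[TelemetrySink],
            remote: Iterable[TelemetrySink],
            env: Mapping[str, str] = os.environ) -> int:
    """Send the report to every enabled sink; return how many received it."""
    sinks = list(local) + (list(remote) if state_hub_enabled(env) else [])
    for sink in sinks:
        sink.export(report)
    return len(sinks)
```

Using `Protocol` keeps new sinks decoupled: any class with a matching `export` method qualifies without inheriting from a base class.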

---

## Profile trigger

Telemetry collection runs when either:

- the profile id is `profile.sandbox-canary`, or
- `profile.metadata.observability` is `canary`.
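
The trigger is a simple disjunction. A sketch, assuming profile metadata is available as a plain mapping; `should_collect_telemetry` is a hypothetical name:

```python
from typing import Any, Mapping

CANARY_PROFILE_ID = "profile.sandbox-canary"


def should_collect_telemetry(profile_id: str, metadata: Mapping[str, Any]) -> bool:
    """True when either documented trigger condition holds."""
    return profile_id == CANARY_PROFILE_ID or metadata.get("observability") == "canary"
```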

@@ -14,7 +14,7 @@ agent harnessing, validation, and code generation.
 |----------|-------------|
 | **Profile** | Named, versioned sandbox recipe: extension binding, isolation, network, TTL, placement |
 | **Extension** | Backend adapter implementing provision / wait_ready / teardown |
-| **Host** | Registered placement target for self-hosted extensions |
+| **Host** | Registered placement target for self-hosted extensions; read-only telemetry via `profile.sandbox-canary` (see `docs/host-telemetry.md`) |
 | **Sandbox** | Running instance of a profile |
 | **Snapshot** | Point-in-time workspace checkpoint (deferred — SAND-WP-0003) |
 | **Route** | Extension selection policy when multiple backends qualify |
58
docs/runbooks/profile-sandbox-canary.md
Normal file
@@ -0,0 +1,58 @@
# Runbook: profile.sandbox-canary

Self-deploy sand-boxer onto a host to verify the host's health and return telemetry.

## Quick start

```bash
export SANDBOXER_HOST=coulombcore
export SANDBOXER_COMPOSE_CMD=podman-compose  # CoulombCore

sandboxer create  # no args: canary self-deploy + IntrospectionReport
```

## What you get on `ready`

`SandboxStatus.telemetry` contains:

- **provision_delta**: host load, memory, and container counts before vs. after provisioning
- **inventory**: sandbox directories and compose projects on the host
- **stale_candidates**: orphaned and aged sandboxes (dry-run recommendations)

A human-readable summary is printed to stderr:

```
Telemetry: load Δ +0.12, mem avail Δ -48 MB, stale candidates: 0
```
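
A sketch that reproduces the sample summary line, assuming two decimal places for the load delta and a signed integer for memory; `format_summary` is a hypothetical name:

```python
import sys


def format_summary(load_1m_delta: float, mem_available_mb_delta: int,
                   stale_count: int) -> str:
    """Render the one-line telemetry summary in the sample format."""
    return (
        f"Telemetry: load Δ {load_1m_delta:+.2f}, "
        f"mem avail Δ {mem_available_mb_delta:+d} MB, "
        f"stale candidates: {stale_count}"
    )


def print_summary(load_1m_delta: float, mem_available_mb_delta: int,
                  stale_count: int) -> None:
    # Summary goes to stderr so stdout stays free for machine-readable output.
    print(format_summary(load_1m_delta, mem_available_mb_delta, stale_count),
          file=sys.stderr)
```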

Artifacts: `~/.local/share/sandboxer/telemetry/<sandbox_id>.json`

## Inspect without creating

```bash
sandboxer inspect host
sandboxer inspect stale --older-than 24
sandboxer reap-stale --dry-run
sandboxer reap-stale --apply --older-than 48  # destructive: review the dry-run output first
```
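
The dry-run/apply split can be modelled as "plan, then optionally execute". A sketch under the assumption that `--older-than` filters `reap` candidates by age; the helper names and the injected `remove` callback are hypothetical:

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Candidate:
    kind: str
    id: str
    age_hours: float
    action: str  # "reap", "inspect" or "ignore"


def plan_reap(candidates: Iterable[Candidate], older_than_hours: float) -> list[Candidate]:
    """Candidates that would be removed: suggested `reap` and old enough."""
    return [c for c in candidates if c.action == "reap" and c.age_hours > older_than_hours]


def reap_stale(candidates: Iterable[Candidate], older_than_hours: float, *,
               apply: bool = False,
               remove: Callable[[Candidate], None]) -> list[Candidate]:
    """Return the plan; call `remove` on each entry only when apply=True."""
    plan = plan_reap(candidates, older_than_hours)
    if apply:
        for c in plan:
            remove(c)  # the only destructive step
    return plan
```

Because the dry run and the real run share `plan_reap`, the list reviewed in the dry run is exactly what `--apply` would act on, provided the host state has not changed in between.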

## Destroy

```bash
sandboxer destroy <sandbox_id>
```

Destroy telemetry includes a **destroy_delta** showing how far host load and memory recover after teardown.

## Verification checklist (SAND-WP-0008-T10)

1. `sandboxer create` returns `ready` with `telemetry.provision_delta` populated.
2. `sandboxer inspect host` reports metrics consistent with the create report.
3. Create a fake stale directory with `ssh host 'mkdir -p /tmp/sandboxer/fake99'`; it should appear in `sandboxer inspect stale`.
4. `sandboxer destroy` produces a `destroy_delta` showing load and memory recovery.

## Optimization notes (activity-core follow-up)

- Schedule a periodic `sandboxer create` canary on sandboxer01.
- Reap policy: `--older-than 24`, with `--apply` only after human approval.
- Raise disk-pressure alerts when `disk_root_avail_gb` falls below a threshold.
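
A disk-pressure check needs `disk_root_used_pct` and `disk_root_avail_gb`. A sketch assuming the collector runs `df -Pk /` (POSIX format, 1024-byte blocks) and that "GB" means GiB; both are assumptions, and no threshold value is implied since the notes leave it open:

```python
def parse_df_root(text: str) -> tuple[float, float]:
    """Return (used_pct, avail_gb) from `df -Pk /` output.

    POSIX columns: Filesystem, 1024-blocks, Used, Available, Capacity, Mounted on.
    """
    fields = text.strip().splitlines()[-1].split()
    avail_kib = int(fields[3])
    used_pct = float(fields[4].rstrip("%"))
    return used_pct, avail_kib / (1024 * 1024)  # KiB -> GiB


def disk_pressure(avail_gb: float, threshold_gb: float) -> bool:
    """True when free space on the root filesystem is below the threshold."""
    return avail_gb < threshold_gb
```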