Implement SAND-WP-0008: host telemetry and self-canary

Add profile.sandbox-canary, HostSnapshot/inventory/stale schemas, SSH
collectors, before/after provision deltas, telemetry export to State Hub
and local JSON, default `sandboxer create` self-deploy, inspect/reap-stale
CLI, runbook, and CoulombCore verification (26 tests pass).
2026-06-23 19:53:51 +02:00
parent 582c1dd3c6
commit c0a9261cdc
22 changed files with 1047 additions and 26 deletions

docs/host-telemetry.md

@@ -0,0 +1,95 @@
# Host telemetry contract
Version 0.1 — SAND-WP-0008. Extends `docs/meta-framework.md` Host resource with
read-only observability. sand-boxer collects and exports telemetry; it does not
own long-term metrics storage.
---
## Types
### HostSnapshot
Point-in-time host metrics collected over SSH (≤10s, non-root-safe).
| Field | Description |
|-------|-------------|
| `load_1m`, `load_5m`, `load_15m` | `/proc/loadavg` |
| `cpu_count` | Logical CPUs |
| `mem_total_mb`, `mem_available_mb` | From `free -m` |
| `disk_root_used_pct`, `disk_root_avail_gb` | Root filesystem |
| `running_containers` | All running containers (podman/docker) |
| `sandbox_containers` | Containers with `sbx-*` compose project label |
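A minimal sketch of how the load and memory fields might be derived from the two sources named in the table. The function names `parse_loadavg` and `parse_free_m` are hypothetical and not taken from the sand-boxer code base; the sketch assumes a `free` version that prints an `available` column.

```python
def parse_loadavg(text: str) -> dict:
    """Parse the contents of /proc/loadavg into the three load fields."""
    # The file starts with the 1, 5 and 15 minute load averages.
    one, five, fifteen = text.split()[:3]
    return {
        "load_1m": float(one),
        "load_5m": float(five),
        "load_15m": float(fifteen),
    }


def parse_free_m(text: str) -> dict:
    """Parse `free -m` output into mem_total_mb and mem_available_mb."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    for line in lines[1:]:
        parts = line.split()
        if parts and parts[0] == "Mem:":
            # Pair values with header names so column order does not matter.
            values = dict(zip(header, parts[1:]))
            return {
                "mem_total_mb": int(values["total"]),
                "mem_available_mb": int(values["available"]),
            }
    raise ValueError("no 'Mem:' row in free output")
```

Reading columns by header name rather than by position keeps the parser tolerant of layout differences between `free` versions, although a version without an `available` column would still raise `KeyError` here.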
### SandboxInventory
Known sandbox artifacts on a host.
| Entry type | Source |
|------------|--------|
| `directory` | `{base_dir}/{sandbox_id}` |
| `compose_project` | `sbx-*` or legacy `e2e-*` compose labels |
Each entry: `id`, `path`, `age_hours`, `profile_hint` (inferred from project name).
### StaleCandidate
| Kind | Meaning | Suggested action |
|------|---------|------------------|
| `orphan_dir` | Dir on host, not in local store | `reap` |
| `orphan_compose` | Compose project on host, not in store | `reap` |
| `zombie_record` | Store record not `destroyed`, missing on host | `inspect` |
| `aged_dir` | Dir older than threshold | `reap` |
Actions: `reap`, `inspect`, `ignore`. A reap is carried out only when `--apply` is passed on the CLI.
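The classification in the table can be sketched as a pure function over three inputs. Everything here is illustrative: `classify_stale`, the input shapes, and the rule that a directory gets at most one kind (orphan taking precedence over aged) are assumptions, as is treating "missing on host" as "neither a directory nor a compose project".

```python
from typing import NamedTuple


class StaleCandidate(NamedTuple):
    kind: str
    id: str
    suggested_action: str


def classify_stale(host_dirs, host_projects, store, older_than_hours):
    """Classify sandbox artifacts.

    host_dirs:      {sandbox_id: age_hours} for directories found on the host
    host_projects:  set of sandbox ids that have a compose project on the host
    store:          {sandbox_id: state} from the local store
    """
    candidates = []
    for sid, age in sorted(host_dirs.items()):
        if sid not in store:
            candidates.append(StaleCandidate("orphan_dir", sid, "reap"))
        elif age > older_than_hours:
            candidates.append(StaleCandidate("aged_dir", sid, "reap"))
    for sid in sorted(host_projects):
        if sid not in store:
            candidates.append(StaleCandidate("orphan_compose", sid, "reap"))
    for sid, state in sorted(store.items()):
        on_host = sid in host_dirs or sid in host_projects
        if state != "destroyed" and not on_host:
            candidates.append(StaleCandidate("zombie_record", sid, "inspect"))
    return candidates
```

The function only produces recommendations; nothing is deleted, which matches the dry-run nature of the suggested actions.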
### ProvisionDelta
`before` and `after` HostSnapshot pair with computed deltas:
- `load_1m_delta`, `mem_available_mb_delta`, `running_containers_delta`
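As an illustration, the three deltas could be computed as `after` minus `before`. The helper name `compute_deltas` and the rounding to two decimals are assumptions; the sign convention is inferred from the runbook's sample summary, where available memory drops after provisioning.

```python
DELTA_FIELDS = ("load_1m", "mem_available_mb", "running_containers")


def compute_deltas(before: dict, after: dict) -> dict:
    """Return <field>_delta = after - before for each tracked field."""
    return {
        f"{field}_delta": round(after[field] - before[field], 2)
        for field in DELTA_FIELDS
    }
```

With this convention a positive `load_1m_delta` means the host became busier, and a negative `mem_available_mb_delta` means less memory is available after provisioning.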
### IntrospectionReport
Bundled canary output attached to `SandboxStatus.telemetry` on `ready`:
```json
{
"schema_version": "0.1",
"host": "92.205.130.254",
"sandbox_id": "abc12345",
"profile_id": "profile.sandbox-canary",
"collected_at": "2026-06-23T...",
"provision_delta": { "before": {}, "after": {}, "load_1m_delta": 0.1 },
"inventory": { "entries": [], "host": "..." },
"stale_candidates": []
}
```
---
## Privacy and retention
- No secret paths, env files, or full `docker inspect` dumps
- Telemetry JSON retained locally under `~/.local/share/sandboxer/telemetry/`
- State Hub events carry the report in `detail`; the same redaction rules apply
- Operators may set `SANDBOXER_NO_STATE_HUB=1` to skip remote emission
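A small sketch of how the opt-out and the local retention path might be resolved. Both helper names are hypothetical, and treating only the exact value `1` as "disabled" is an assumption; the `<sandbox_id>.json` file name follows the runbook.

```python
import os
from pathlib import Path


def state_hub_enabled(env=None) -> bool:
    """Remote emission is skipped when SANDBOXER_NO_STATE_HUB=1 is set."""
    env = os.environ if env is None else env
    return env.get("SANDBOXER_NO_STATE_HUB") != "1"


def telemetry_path(sandbox_id: str, home=None) -> Path:
    """Local artifact path under ~/.local/share/sandboxer/telemetry/."""
    home = Path.home() if home is None else Path(home)
    return home / ".local/share/sandboxer/telemetry" / f"{sandbox_id}.json"
```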
---
## Export sinks
| Sink | Status |
|------|--------|
| State Hub `progress/` | Implemented |
| Local JSON artifact | Implemented |
| `TelemetrySink` protocol | Stub for artifact-store / Prometheus / ClickHouse |
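One possible shape for the sink protocol, shown only as a sketch. The single `emit` method and the `LocalJsonSink` class are assumptions; the actual `TelemetrySink` stub in the commit may look different.

```python
import json
from pathlib import Path
from typing import Any, Protocol


class TelemetrySink(Protocol):
    """Hypothetical interface: anything with emit(report) can act as a sink."""

    def emit(self, report: dict[str, Any]) -> None: ...


class LocalJsonSink:
    """Writes one JSON file per sandbox into a directory."""

    def __init__(self, directory: Path) -> None:
        self.directory = Path(directory)

    def emit(self, report: dict[str, Any]) -> None:
        self.directory.mkdir(parents=True, exist_ok=True)
        path = self.directory / f"{report['sandbox_id']}.json"
        path.write_text(json.dumps(report, indent=2))
```

Because `Protocol` uses structural typing, a future Prometheus or ClickHouse sink would not need to inherit from anything; it would only need a compatible `emit` method.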
---
## Profile trigger
Telemetry collection runs when:
- Profile id is `profile.sandbox-canary`, or
- `profile.metadata.observability` is `canary`
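The two trigger conditions above can be expressed as a single predicate. The function name and the dict-shaped `metadata` argument are illustrative assumptions.

```python
CANARY_PROFILE_ID = "profile.sandbox-canary"


def should_collect_telemetry(profile_id: str, metadata=None) -> bool:
    """True when the profile is the canary or opts in via metadata."""
    if profile_id == CANARY_PROFILE_ID:
        return True
    return (metadata or {}).get("observability") == "canary"
```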


@@ -14,7 +14,7 @@ agent harnessing, validation, and code generation.
|----------|-------------|
| **Profile** | Named, versioned sandbox recipe: extension binding, isolation, network, TTL, placement |
| **Extension** | Backend adapter implementing provision / wait_ready / teardown |
-| **Host** | Registered placement target for self-hosted extensions |
+| **Host** | Registered placement target for self-hosted extensions; read-only telemetry via `profile.sandbox-canary` (see `docs/host-telemetry.md`) |
| **Sandbox** | Running instance of a profile |
| **Snapshot** | Point-in-time workspace checkpoint (deferred — SAND-WP-0003) |
| **Route** | Extension selection policy when multiple backends qualify |


@@ -0,0 +1,58 @@
# Runbook: profile.sandbox-canary
Self-deploy sand-boxer to verify host health and return telemetry.
## Quick start
```bash
export SANDBOXER_HOST=coulombcore
export SANDBOXER_COMPOSE_CMD=podman-compose # CoulombCore
sandboxer create # no args — canary self-deploy + IntrospectionReport
```
## What you get on `ready`
`SandboxStatus.telemetry` contains:
- **provision_delta** — host load/memory/container counts before vs after
- **inventory** — sandbox dirs and compose projects on host
- **stale_candidates** — orphans and aged sandboxes (dry-run recommendations)
Human summary prints to stderr:
```
Telemetry: load Δ +0.12, mem avail Δ -48 MB, stale candidates: 0
```
Artifacts: `~/.local/share/sandboxer/telemetry/<sandbox_id>.json`
## Inspect without creating
```bash
sandboxer inspect host
sandboxer inspect stale --older-than 24
sandboxer reap-stale --dry-run
sandboxer reap-stale --apply --older-than 48 # destructive — review dry-run first
```
## Destroy
```bash
sandboxer destroy <sandbox_id>
```
Destroy telemetry includes **destroy_delta** (load recovery after teardown).
## Verification checklist (SAND-WP-0008-T10)
1. `sandboxer create` → `ready` + `telemetry.provision_delta`
2. `sandboxer inspect host` → metrics consistent with create report
3. Fake stale dir: `ssh host 'mkdir -p /tmp/sandboxer/fake99'` → appears in `inspect stale`
4. `sandboxer destroy` → `destroy_delta` shows load/mem recovery
## Optimization notes (activity-core follow-up)
- Schedule periodic `sandboxer create` canary on sandboxer01
- Reap policy: `--older-than 24` with human-approved `--apply`
- Disk pressure alerts when `disk_root_avail_gb` < threshold