Add profile.sandbox-canary, HostSnapshot/inventory/stale schemas, SSH collectors, before/after provision deltas, telemetry export to State Hub and local JSON, default `sandboxer create` self-deploy, inspect/reap-stale CLI, runbook, and CoulombCore verification (26 tests pass).
| id | type | title | domain | repo | status | owner | topic_slug | created | updated | state_hub_workstream_id |
|---|---|---|---|---|---|---|---|---|---|---|
| SAND-WP-0008 | workplan | Host telemetry and self-canary introspection | infotech | sand-boxer | finished | codex | custodian | 2026-06-23 | 2026-06-23 | afbcbc84-5ec7-4f8b-ae21-4cbda0d05195 |
Host telemetry and self-canary introspection
Use sand-boxer as its own trial deployment to prove provision/teardown and return actionable host and sandbox intelligence: resource metrics, load before/after, stale sandbox inventory, and structured telemetry for centralized analysis.
Charter: INTENT.md (host topology, observable lifecycle)
Spec: docs/meta-framework.md (Host resource, Meter — extend for self-hosted)
Predecessor: SAND-WP-0002 (ext.compose-ssh, CLI v0, State Hub events)
Related: SAND-WP-0002-T10 (remote smoke), activity-core (scheduled reap jobs)
Problem
Today `sandboxer create` proves SSH + compose for an arbitrary repo but returns only lifecycle state and reachability. Operators lack:

- Host load and capacity data before accepting new sandboxes
- After-provision metrics to quantify what a sandbox costs
- An inventory of stale sandboxes (`/tmp/sandboxer/*`, orphaned compose projects)
- A default smoke path that does not depend on another repo's `e2e/` layout
sand-boxer should dogfood itself: deploy the sand-boxer tree, run a bounded introspection bundle on the remote host, and emit telemetry suitable for a central datastore (State Hub first; export to artifact-store or metrics pipeline later).
Design host telemetry contract
id: SAND-WP-0008-T01
status: done
priority: high
state_hub_task_id: "8f7b46e3-045e-481c-81bd-1c61734c6eb3"
Author `docs/host-telemetry.md` defining:

- `HostSnapshot`: point-in-time host metrics (load, CPU%, mem, disk, docker stats summary)
- `SandboxInventory`: known sandboxes on the host (compose projects matching `sbx-*`, directories under the configured `base_dir`, age, owning profile if inferable)
- `StaleCandidate`: entries exceeding the TTL or idle threshold, or missing a store record
- `ProvisionDelta`: before/after `HostSnapshot` pair around create/destroy
- `IntrospectionReport`: bundled output attached to the sandbox `ready` response
- Retention and privacy rules (no secret paths, no full `docker inspect` dumps by default)

Extend the meta-framework spec with Host observability fields (read-only; sand-boxer does not own a long-term metrics DB).
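Here is a minimal sketch of how these shapes might serialize. It uses stdlib dataclasses in place of the Pydantic models that T02 calls for. The model names come from this contract, and the envelope fields (`host`, `sandbox_id`, `profile_id`, `collected_at`, schema version) come from T08. Every other field name and the `SCHEMA_VERSION` value are hypothetical.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field

SCHEMA_VERSION = "1"  # hypothetical schema version tag


@dataclass(frozen=True)
class HostSnapshot:
    """Point-in-time host metrics; field names are illustrative."""
    host: str
    collected_at: str          # ISO-8601 UTC timestamp
    load_1m: float
    cpu_count: int
    mem_available_bytes: int
    mem_total_bytes: int
    root_disk_used_pct: float
    running_containers: int


@dataclass(frozen=True)
class StaleCandidate:
    name: str                  # directory or compose project name
    reason: str                # e.g. "orphan", "zombie", "ttl"
    suggested_action: str      # "reap" | "inspect" | "ignore"


@dataclass
class IntrospectionReport:
    host: str
    sandbox_id: str
    profile_id: str
    before: HostSnapshot | None = None
    after: HostSnapshot | None = None
    stale_candidates: list[StaleCandidate] = field(default_factory=list)
    schema_version: str = SCHEMA_VERSION

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses and lists. The model has no
        # fields for secret paths or raw docker inspect output, by design.
        return json.dumps(asdict(self), sort_keys=True)
```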
Define profile.sandbox-canary and introspection schema
id: SAND-WP-0008-T02
status: done
priority: high
state_hub_task_id: "732bae4e-2dd9-4500-a86d-e869007bb383"
Add:

- `profiles/profile.sandbox-canary.yaml`: lightweight compose or no-compose introspection profile bound to `ext.compose-ssh` (or a thin `ext.ssh-introspect` if compose is unnecessary for the canary)
- Pydantic models: `HostSnapshot`, `SandboxInventory`, `StaleCandidate`, `ProvisionDelta`, `IntrospectionReport`
- Default inputs: `repo` optional; when omitted, resolve to the sand-boxer repo root (package parent path or `SANDBOXER_REPO_ROOT`)

Canary deliverable on `ready`: a JSON `IntrospectionReport` in the sandbox status detail / reachability extension field.
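The default-input rule can be sketched as follows. The sketch assumes the src layout implied by `src/sandboxer/telemetry/host_snapshot.py`, with the package at `<root>/src/sandboxer/`. It also assumes that `SANDBOXER_REPO_ROOT`, when set, overrides the package-derived path. `resolve_default_repo` and `resolve_repo` are hypothetical helper names.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from pathlib import Path

REPO_ROOT_ENV = "SANDBOXER_REPO_ROOT"


def resolve_default_repo(package_file: str, env: Mapping[str, str] | None = None) -> Path:
    """Repo to deploy when the `repo` input is omitted (hypothetical helper)."""
    env = os.environ if env is None else env
    override = env.get(REPO_ROOT_ENV)
    if override:
        # Explicit environment override wins over the package-derived path.
        return Path(override).expanduser().resolve()
    # Assumed src layout: <root>/src/sandboxer/__init__.py, so parents[2] is <root>.
    return Path(package_file).resolve().parents[2]


def resolve_repo(inputs: Mapping[str, str], package_file: str,
                 env: Mapping[str, str] | None = None) -> Path:
    """An explicit `--input repo=...` always wins; otherwise self-deploy."""
    repo = inputs.get("repo")
    if repo:
        return Path(repo).expanduser().resolve()
    return resolve_default_repo(package_file, env)
```

In the CLI, this would be called as `resolve_repo(inputs, sandboxer.__file__)`.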
Implement remote host metrics collector
id: SAND-WP-0008-T03
status: done
priority: high
state_hub_task_id: "7bd22f27-5058-4c19-98b6-b923909a8815"
SSH-side collection (shell + structured parse, no extra daemon on host):
- Load average, CPU count, mem available/total, root disk use
- `docker system df` / running container count
- Optional: `docker stats --no-stream` aggregate for `sbx-*` projects only
- Bounded runtime (e.g. ≤10s) and non-root-safe commands

Module: `src/sandboxer/telemetry/host_snapshot.py`, with unit tests using fixture command output.
Implement stale sandbox discovery
id: SAND-WP-0008-T04
status: done
priority: high
state_hub_task_id: "c2d19bb7-9322-4744-a71e-75f7701a6fb2"
Scan the remote host for:

- Directories under `base_dir` (default `/tmp/sandboxer`) with mtime age
- `docker compose ls` projects matching `sbx-*` / `e2e-*` legacy patterns
- Cross-check against the local `SandboxStore`: flag orphans (on host, not in store) and zombies (in store, not on host)

Output a `StaleCandidate` list with a suggested action: `reap`, `inspect`, or `ignore`.
No automatic deletion in this task; dry-run only.
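A dry-run classifier over already-collected data might look like this. The input shape (name mapped to age in seconds), the TTL rule, and the mapping to `reap` / `inspect` / `ignore` are assumptions for illustration, not settled policy. The function only returns suggestions and deletes nothing.

```python
from __future__ import annotations

import fnmatch
from dataclasses import dataclass

# sbx-* is the current naming; e2e-* covers the legacy pattern named above.
SANDBOX_PATTERNS = ("sbx-*", "e2e-*")


@dataclass(frozen=True)
class StaleCandidate:
    name: str
    kind: str                  # "orphan" | "expired" | "zombie"
    age_seconds: float | None  # None when nothing exists on the host
    suggested_action: str      # "reap" | "inspect" | "ignore"


def classify(host_entries: dict[str, float], store_ids: set[str],
             ttl_seconds: float) -> list[StaleCandidate]:
    """Dry-run only: returns suggestions, never deletes anything.

    host_entries maps directory/compose-project name -> age in seconds.
    """
    out: list[StaleCandidate] = []
    for name, age in sorted(host_entries.items()):
        if not any(fnmatch.fnmatch(name, p) for p in SANDBOX_PATTERNS):
            continue  # not a sandbox; never report foreign projects
        if name not in store_ids:
            # Orphan: on host, not in store. A young orphan may be an in-flight create.
            action = "reap" if age > ttl_seconds else "ignore"
            out.append(StaleCandidate(name, "orphan", age, action))
        elif age > ttl_seconds:
            # Tracked but past TTL: a human should look before reaping.
            out.append(StaleCandidate(name, "expired", age, "inspect"))
    for name in sorted(store_ids - set(host_entries)):
        # Zombie: store record with nothing on the host; fix the record, not the host.
        out.append(StaleCandidate(name, "zombie", None, "inspect"))
    return out
```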
Capture before/after load around provision
id: SAND-WP-0008-T05
status: done
priority: medium
state_hub_task_id: "b6b02289-d36e-4ee1-9ff7-dc59a1d24886"
Integrate into `SandboxManager.create` / `destroy` when profile metadata requests telemetry (`metadata.observability: canary` or profile id `profile.sandbox-canary`):

- `HostSnapshot` before extension `provision`
- Run `provision` + `wait_ready`
- `HostSnapshot` after `ready`
- Compute `ProvisionDelta` (load/mem/disk/container deltas)

Apply the same pattern on `destroy` to measure teardown impact. Tests mock the SSH collector.
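The before/after pattern amounts to wrapping an action between two collector calls. This structure also makes the SSH collector easy to mock. `Snapshot`, `measure`, `telemetry_requested`, and the delta field names are hypothetical. The trigger condition mirrors the `metadata.observability: canary` / profile-id rule above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

CANARY_PROFILE = "profile.sandbox-canary"


@dataclass(frozen=True)
class Snapshot:
    load_1m: float
    mem_available_mb: int
    disk_used_pct: float
    containers: int


@dataclass(frozen=True)
class ProvisionDelta:
    before: Snapshot
    after: Snapshot

    def as_dict(self) -> dict[str, float]:
        # Positive values mean the metric grew between the two snapshots.
        b, a = self.before, self.after
        return {
            "load_1m": round(a.load_1m - b.load_1m, 2),
            "mem_available_mb": a.mem_available_mb - b.mem_available_mb,
            "disk_used_pct": round(a.disk_used_pct - b.disk_used_pct, 2),
            "containers": a.containers - b.containers,
        }


def telemetry_requested(profile_id: str, metadata: dict) -> bool:
    return metadata.get("observability") == "canary" or profile_id == CANARY_PROFILE


def measure(collect: Callable[[], Snapshot], action: Callable[[], None],
            enabled: bool) -> ProvisionDelta | None:
    """Snapshot, run action (provision + wait_ready, or destroy), snapshot again."""
    if not enabled:
        action()
        return None
    before = collect()
    action()
    after = collect()
    return ProvisionDelta(before, after)
```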
Default repo: deploy sand-boxer itself
id: SAND-WP-0008-T06
status: done
priority: high
state_hub_task_id: "d9941d93-a662-45c0-820b-88d32266c653"
When `create` has no `repo` input:

- Resolve the default to the sand-boxer repository root (`SANDBOXER_REPO_ROOT` override)
- Use `profile.sandbox-canary` as the default profile when `--profile` is omitted and no `repo` is given (document precedence: explicit flags win)
- Ship a minimal `e2e/e2e.yml` or `docker-compose.canary.yml` in the sand-boxer repo if compose-up is required for parity with `ext.compose-ssh`
CLI examples:

```sh
sandboxer create                                    # canary self-deploy
sandboxer create --profile profile.sandbox-canary
sandboxer create --input repo=/other/repo           # unchanged behavior
```
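The precedence rule behind these three invocations fits in a few lines. `select_profile` and `configured_default` are hypothetical names; `configured_default` stands for whatever profile `create` used before this change. Only `profile.sandbox-canary` comes from this workplan.

```python
from __future__ import annotations

from collections.abc import Mapping

CANARY_PROFILE = "profile.sandbox-canary"


def select_profile(cli_profile: str | None, inputs: Mapping[str, str],
                   configured_default: str) -> str:
    """Explicit flags win; a bare `create` falls back to the canary profile."""
    if cli_profile:
        return cli_profile        # `--profile ...` always wins
    if not inputs.get("repo"):
        return CANARY_PROFILE     # `sandboxer create` with no repo: self-deploy
    return configured_default     # `--input repo=...`: behavior unchanged
```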
Wire introspection into canary provision flow
id: SAND-WP-0008-T07
status: done
priority: high
state_hub_task_id: "76430452-c98e-44e5-b625-e243dc12b8a5"
After `wait_ready` for the canary profile:

- Either include the `src/sandboxer/telemetry/introspection` entry script in the rsync, or invoke collector modules via an SSH one-liner
- Assemble `IntrospectionReport` (inventory + deltas + stale candidates)
- Attach it to `SandboxStatus` (new optional `telemetry` field)
- Print a human-readable summary in the CLI (load delta, stale count, disk headroom)
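Here is a possible shape for the one-line CLI summary. It assumes the report already exposes a load delta, a stale-candidate count, and free/total root-disk bytes. The function name and output format are illustrative only.

```python
def format_summary(load_delta: float, stale_count: int,
                   disk_free_bytes: int, disk_total_bytes: int) -> str:
    """One-line operator summary; the format is illustrative."""
    # Guard against a zero total (e.g. a failed df parse) instead of dividing by zero.
    headroom_pct = 100.0 * disk_free_bytes / disk_total_bytes if disk_total_bytes else 0.0
    free_gib = disk_free_bytes / 1024**3
    return (
        f"load Δ {load_delta:+.2f} | stale candidates: {stale_count} | "
        f"disk headroom: {free_gib:.1f} GiB ({headroom_pct:.0f}%)"
    )
```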
Telemetry export for centralized analysis
id: SAND-WP-0008-T08
status: done
priority: medium
state_hub_task_id: "4ee4b95b-e7b5-4893-b78e-914f808bc00a"
Emit structured telemetry to:
- State Hub: progress/events with `detail` containing `IntrospectionReport` (extend the existing lifecycle emitter)
- Local artifact: `~/.local/share/sandboxer/telemetry/<sandbox_id>.json` for offline analysis
- Export hook (stub): `TelemetrySink` protocol for a future artifact-store / Prometheus / ClickHouse export; document the contract only

Include: `host`, `sandbox_id`, `profile_id`, `collected_at`, schema version.

activity-core may schedule periodic canary runs later; that is out of scope here.
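Here is a sketch of the `TelemetrySink` contract together with the local-artifact sink. It assumes a structural `typing.Protocol` with a single `emit(record)` method. The envelope keys follow the Include list above. `SCHEMA_VERSION`, `envelope`, `LocalJsonSink`, and `export` are hypothetical. A State Hub sink is omitted because its API is not specified here.

```python
from __future__ import annotations

import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Protocol

SCHEMA_VERSION = "1"  # hypothetical version tag


class TelemetrySink(Protocol):
    """Structural contract: anything with emit(record) qualifies."""

    def emit(self, record: dict[str, Any]) -> None: ...


def envelope(host: str, sandbox_id: str, profile_id: str,
             report: dict[str, Any]) -> dict[str, Any]:
    """Wrap an IntrospectionReport payload with the fields every sink receives."""
    return {
        "schema_version": SCHEMA_VERSION,
        "host": host,
        "sandbox_id": sandbox_id,
        "profile_id": profile_id,
        "collected_at": datetime.now(timezone.utc).isoformat(),
        "report": report,
    }


class LocalJsonSink:
    """Writes <base>/<sandbox_id>.json for offline analysis."""

    def __init__(self, base: Path | None = None) -> None:
        self.base = base or Path.home() / ".local/share/sandboxer/telemetry"

    def emit(self, record: dict[str, Any]) -> None:
        self.base.mkdir(parents=True, exist_ok=True)
        target = self.base / f"{record['sandbox_id']}.json"
        # Write a temp file in the same directory, then rename it over the target,
        # so a concurrent reader never sees a half-written artifact.
        fd, tmp = tempfile.mkstemp(dir=self.base, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(record, fh, indent=2, sort_keys=True)
        os.replace(tmp, target)


def export(record: dict[str, Any], sinks: list[TelemetrySink]) -> None:
    # Future artifact-store / Prometheus / ClickHouse sinks would plug in here.
    for sink in sinks:
        sink.emit(record)
```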
CLI inspect and stale reap commands
id: SAND-WP-0008-T09
status: done
priority: medium
state_hub_task_id: "6ea8eda6-491b-460a-a526-7565962f449e"
```sh
sandboxer inspect host [--host coulombcore]         # HostSnapshot + inventory, no create
sandboxer inspect stale [--host ...] [--json]       # StaleCandidate list
sandboxer reap-stale --dry-run [--host ...]         # report only
sandboxer reap-stale --apply [--older-than 24h]     # T10+; gated behind --apply
```

`inspect` does not require a running sandbox: it uses only SSH and read-only collectors.
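The `--older-than` gate can be sketched as a duration parser plus a selection step that calls a delete callback only when `--apply` is set. The accepted units (`s`, `m`, `h`, `d`) and the helper names are assumptions; the workplan only shows `24h`.

```python
from __future__ import annotations

import re
from datetime import timedelta
from typing import Callable

_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}
_DURATION = re.compile(r"(\d+)([smhd])")


def parse_older_than(text: str) -> timedelta:
    """Parse values like '30m', '24h', '7d' into a timedelta."""
    match = _DURATION.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"invalid duration {text!r}; expected e.g. 30m, 24h, 7d")
    value, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(value)})


def reap_stale(candidates: list[tuple[str, float]], older_than: timedelta,
               apply: bool, delete: Callable[[str], None]) -> list[str]:
    """Select candidates at least `older_than` old; delete only when apply=True."""
    cutoff = older_than.total_seconds()
    selected = [name for name, age_s in candidates if age_s >= cutoff]
    if apply:  # the destructive path exists only behind --apply
        for name in selected:
            delete(name)
    return selected
```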
Runbook, tests, and CoulombCore verification
id: SAND-WP-0008-T10
status: done
priority: medium
state_hub_task_id: "435a3993-d8d3-4280-b68a-c37e34d20312"
- `docs/runbooks/profile-sandbox-canary.md`
- Integration test: mock SSH fixtures for full report assembly
- Manual proof on CoulombCore:
  - `sandboxer create` (no args) → `ready` + `IntrospectionReport`
  - `sandboxer inspect host` matches the report's host metrics
  - Introduce a fake stale dir → it appears in `inspect stale`
  - `destroy` → the after-snapshot shows load recovery
- Satisfies the SAND-WP-0002-T10 smoke variant when the canary path is used
Record optimization hypotheses (disk pressure, stale reap policy) for phase-2 automation via activity-core.
Out of scope
| Item | Target |
|---|---|
| Long-term metrics database / dashboards | artifact-store or observability stack (separate workplan) |
| Automatic scheduled reap without human gate | activity-core instruction (after dry-run proven) |
| wise-validator migration | SAND-WP-0003 |
| SaaS metering | SAND-WP-0006 |
Completion criteria
- `sandboxer create` with no `repo` deploys sand-boxer and returns `IntrospectionReport` on `ready`
- Before/after host snapshots captured for canary creates
- Stale sandbox inventory with dry-run reap CLI
- Telemetry lands in State Hub `detail` and the local JSON artifact
- Runbook and tests merged; operator runs `make fix-consistency REPO=sand-boxer`
Operator note
After merging task status updates:
```sh
cd ~/state-hub && make fix-consistency REPO=sand-boxer
```
Verification record (2026-06-23)
CoulombCore remote proof:
- `sandboxer create` (no args) → `ready` + `telemetry.provision_delta`
- `sandboxer inspect host` → load/mem metrics returned
- Stale orphans from prior runs detected in `stale_candidates`
- `sandboxer destroy` → `destroy_delta` with load Δ -0.09, mem +54 MB