generated from coulomb/repo-seed
Implement SAND-WP-0008: host telemetry and self-canary
Add profile.sandbox-canary, HostSnapshot/inventory/stale schemas, SSH collectors, before/after provision deltas, telemetry export to State Hub and local JSON, default `sandboxer create` self-deploy, inspect/reap-stale CLI, runbook, and CoulombCore verification (26 tests pass).
@@ -36,11 +36,16 @@ make cli-version # smoke test: sandboxer version
Sandbox CLI (v0):

```bash
sandboxer create  # canary self-deploy (profile.sandbox-canary)
sandboxer create --profile profile.compose-e2e --input repo=/path/to/repo
sandboxer get <id>
sandboxer list
sandboxer destroy <id>
sandboxer recreate <id>
sandboxer inspect host
sandboxer inspect stale
sandboxer reap-stale  # dry-run; add --apply to remove
export SANDBOXER_COMPOSE_CMD=podman-compose  # required on CoulombCore
```

Equivalent `uv` invocations without Make:

@@ -172,8 +172,13 @@ make lint # ruff check
make format # ruff format
make build # uv build
make cli-version # smoke test: sandboxer version
make smoke-remote # SAND-WP-0002 compose-e2e smoke
```

Canary self-deploy (SAND-WP-0008): `sandboxer create` with no args deploys
sand-boxer and returns `telemetry` (host metrics, stale inventory). See
`docs/runbooks/profile-sandbox-canary.md`.

Canonical detail: `.claude/rules/stack-and-commands.md`.

---

4
SCOPE.md
@@ -126,8 +126,8 @@ Additional boundaries:
- **Registry:** scaffold present (`registry/indexes/capabilities.yaml` empty;
  `registry/capabilities/` placeholder); domain in index still `helix_forge`
  from scaffold — needs alignment to `infotech`
- **Workplans:** `SAND-WP-0001` finished; `SAND-WP-0002` finished;
  `SAND-WP-0008` ready (host telemetry / self-canary)
- **Workplans:** `SAND-WP-0001`–`0002` finished; `SAND-WP-0008` finished
  (host telemetry / self-canary)
- **Lineage (external, not yet migrated):** `the-custodian/e2e-framework/`
  (CUST-WP-0028, completed) and `infra/build-machines/` (CUST-WP-0032)

95
docs/host-telemetry.md
Normal file
@@ -0,0 +1,95 @@
# Host telemetry contract

Version 0.1 — SAND-WP-0008. Extends `docs/meta-framework.md` Host resource with
read-only observability. sand-boxer collects and exports telemetry; it does not
own long-term metrics storage.

---

## Types

### HostSnapshot

Point-in-time host metrics collected over SSH (≤10s, non-root-safe).

| Field | Description |
|-------|-------------|
| `load_1m`, `load_5m`, `load_15m` | `/proc/loadavg` |
| `cpu_count` | Logical CPUs |
| `mem_total_mb`, `mem_available_mb` | From `free -m` |
| `disk_root_used_pct`, `disk_root_avail_gb` | Root filesystem |
| `running_containers` | All running containers (podman/docker) |
| `sandbox_containers` | Containers with `sbx-*` compose project label |

### SandboxInventory

Known sandbox artifacts on a host.

| Entry type | Source |
|------------|--------|
| `directory` | `{base_dir}/{sandbox_id}` |
| `compose_project` | `sbx-*` or legacy `e2e-*` compose labels |

Each entry: `id`, `path`, `age_hours`, `profile_hint` (inferred from project name).

### StaleCandidate

| Kind | Meaning | Suggested action |
|------|---------|------------------|
| `orphan_dir` | Dir on host, not in local store | `reap` |
| `orphan_compose` | Compose project on host, not in store | `reap` |
| `zombie_record` | Store record not `destroyed`, missing on host | `inspect` |
| `aged_dir` | Dir older than threshold | `reap` |

Actions: `reap`, `inspect`, `ignore`. Automatic reap requires `--apply` on CLI.
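
The table above can be read as a small classification routine. The following is a minimal sketch under stated assumptions: `Candidate` and `classify` are hypothetical names, the inputs are plain dicts instead of the real store and SSH scan, `orphan_compose` is left out for brevity, and the choice to report an unknown directory only as `orphan_dir` (never also as `aged_dir`) is an assumption, not confirmed behaviour.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    kind: str
    sandbox_id: str
    action: str


def classify(
    host_dirs: dict[str, float],
    store: dict[str, str],
    stale_hours: float = 24.0,
) -> list[Candidate]:
    """host_dirs maps sandbox id -> age in hours; store maps id -> state."""
    found: list[Candidate] = []
    for sid, age in sorted(host_dirs.items()):
        if sid not in store:
            # Directory exists on the host but the local store never saw it.
            found.append(Candidate("orphan_dir", sid, "reap"))
        elif age > stale_hours:
            # Known sandbox that has outlived the threshold.
            found.append(Candidate("aged_dir", sid, "reap"))
    for sid, state in sorted(store.items()):
        if state != "destroyed" and sid not in host_dirs:
            # Record claims the sandbox is alive, but nothing is on the host.
            found.append(Candidate("zombie_record", sid, "inspect"))
    return found


candidates = classify(
    host_dirs={"fake99": 1.0, "abc12345": 30.0, "new00001": 0.5},
    store={
        "abc12345": "ready",
        "new00001": "ready",
        "gone0002": "ready",
        "old00003": "destroyed",
    },
)
```

Only `zombie_record` maps to `inspect`: there is nothing on the host to remove, so the fix is on the store side.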

### ProvisionDelta

`before` and `after` HostSnapshot pair with computed deltas:

- `load_1m_delta`, `mem_available_mb_delta`, `running_containers_delta`,
  `sandbox_containers_delta`
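
To make the delta arithmetic concrete, here is a small sketch that computes the load, available-memory, and running-container deltas from two snapshot dicts. The dict-based signature and the sample numbers are assumptions for illustration; the commit's own `compute_delta` works on `HostSnapshot` models.

```python
def compute_delta(before: dict, after: dict) -> dict:
    """Subtract `before` from `after` for the fields tracked across a provision."""
    return {
        # Load is a float, so round to keep the report readable.
        "load_1m_delta": round(after["load_1m"] - before["load_1m"], 3),
        "mem_available_mb_delta": after["mem_available_mb"] - before["mem_available_mb"],
        "running_containers_delta": after["running_containers"] - before["running_containers"],
    }


# Invented sample snapshots.
before = {"load_1m": 0.30, "mem_available_mb": 11230, "running_containers": 4}
after = {"load_1m": 0.42, "mem_available_mb": 11182, "running_containers": 5}
delta = compute_delta(before, after)
```

A negative `mem_available_mb_delta` means the provision consumed memory; after a destroy the same calculation should trend back toward zero or positive.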

### IntrospectionReport

Bundled canary output attached to `SandboxStatus.telemetry` on `ready`:

```json
{
  "schema_version": "0.1",
  "host": "92.205.130.254",
  "sandbox_id": "abc12345",
  "profile_id": "profile.sandbox-canary",
  "collected_at": "2026-06-23T...",
  "provision_delta": { "before": {}, "after": {}, "load_1m_delta": 0.1 },
  "inventory": { "entries": [], "host": "..." },
  "stale_candidates": []
}
```

---

## Privacy and retention

- No secret paths, env files, or full `docker inspect` dumps
- Telemetry JSON retained locally under `~/.local/share/sandboxer/telemetry/`
- State Hub events include report in `detail` — same redaction rules apply
- Operators may set `SANDBOXER_NO_STATE_HUB=1` to skip remote emission

---

## Export sinks

| Sink | Status |
|------|--------|
| State Hub `progress/` | Implemented |
| Local JSON artifact | Implemented |
| `TelemetrySink` protocol | Stub for artifact-store / Prometheus / ClickHouse |

---

## Profile trigger

Telemetry collection runs when:

- Profile id is `profile.sandbox-canary`, or
- `profile.metadata.observability` is `canary`
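
Expressed as code, the trigger is a two-way check. This sketch uses a hypothetical `wants_telemetry` helper that takes plain strings; the commit's `profile_wants_telemetry` takes a `Profile` model instead, and `profile.custom` is an invented id.

```python
CANARY_PROFILE_ID = "profile.sandbox-canary"


def wants_telemetry(profile_id: str, observability: str = "none") -> bool:
    """True for the canary profile itself or any profile that opts in via metadata."""
    return profile_id == CANARY_PROFILE_ID or observability == "canary"


canary = wants_telemetry("profile.sandbox-canary")
opted_in = wants_telemetry("profile.custom", observability="canary")
plain = wants_telemetry("profile.compose-e2e")
```

Matching on the id as well as the metadata means the canary profile still reports even if its `observability` field is edited away.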
|
||||
@@ -14,7 +14,7 @@ agent harnessing, validation, and code generation.
|----------|-------------|
| **Profile** | Named, versioned sandbox recipe: extension binding, isolation, network, TTL, placement |
| **Extension** | Backend adapter implementing provision / wait_ready / teardown |
| **Host** | Registered placement target for self-hosted extensions |
| **Host** | Registered placement target for self-hosted extensions; read-only telemetry via `profile.sandbox-canary` (see `docs/host-telemetry.md`) |
| **Sandbox** | Running instance of a profile |
| **Snapshot** | Point-in-time workspace checkpoint (deferred — SAND-WP-0003) |
| **Route** | Extension selection policy when multiple backends qualify |

58
docs/runbooks/profile-sandbox-canary.md
Normal file
@@ -0,0 +1,58 @@
# Runbook: profile.sandbox-canary

Self-deploy sand-boxer to verify host health and return telemetry.

## Quick start

```bash
export SANDBOXER_HOST=coulombcore
export SANDBOXER_COMPOSE_CMD=podman-compose # CoulombCore

sandboxer create # no args — canary self-deploy + IntrospectionReport
```

## What you get on `ready`

`SandboxStatus.telemetry` contains:

- **provision_delta** — host load/memory/container counts before vs after
- **inventory** — sandbox dirs and compose projects on host
- **stale_candidates** — orphans and aged sandboxes (dry-run recommendations)

Human summary prints to stderr:

```
Telemetry: load Δ +0.120, mem avail Δ -48 MB, stale candidates: 0
```

Artifacts: `~/.local/share/sandboxer/telemetry/<sandbox_id>.json`

## Inspect without creating

```bash
sandboxer inspect host
sandboxer inspect stale --older-than 24
sandboxer reap-stale # dry-run is the default; the CLI defines no --dry-run flag
sandboxer reap-stale --apply --older-than 48 # destructive — review dry-run first
```

## Destroy

```bash
sandboxer destroy <sandbox_id>
```

Destroy telemetry includes **destroy_delta** (load recovery after teardown).

## Verification checklist (SAND-WP-0008-T10)

1. `sandboxer create` → `ready` + `telemetry.provision_delta`
2. `sandboxer inspect host` → metrics consistent with create report
3. Fake stale dir: `ssh host 'mkdir -p /tmp/sandboxer/fake99'` → appears in `inspect stale`
4. `sandboxer destroy` → `destroy_delta` shows load/mem recovery

## Optimization notes (activity-core follow-up)

- Schedule periodic `sandboxer create` canary on sandboxer01
- Reap policy: `--older-than 24` with human-approved `--apply`
- Disk pressure alerts when `disk_root_avail_gb` < threshold
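
The disk-pressure note could be prototyped as a simple threshold check over a snapshot. This is only a sketch: `disk_pressure` is a hypothetical helper, the 10 GB default is an invented placeholder (the runbook does not fix a threshold), and the snapshot is a plain dict using the `HostSnapshot` field name.

```python
def disk_pressure(snapshot: dict, min_avail_gb: float = 10.0) -> bool:
    """Return True when free space on the root filesystem is below the threshold."""
    # A missing field is treated as 0 GB, so an incomplete snapshot raises the alert.
    return float(snapshot.get("disk_root_avail_gb", 0.0)) < min_avail_gb


healthy = disk_pressure({"disk_root_avail_gb": 42.5})
tight = disk_pressure({"disk_root_avail_gb": 3.2})
```

Treating a missing value as pressure errs on the side of alerting, which suits a canary whose job is to surface host problems early.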
|
||||
32
profiles/profile.sandbox-canary.yaml
Normal file
@@ -0,0 +1,32 @@
|
||||
id: profile.sandbox-canary
|
||||
version: "1.0.0"
|
||||
extension: ext.compose-ssh
|
||||
isolation:
|
||||
level: container
|
||||
network:
|
||||
default: deny
|
||||
egress: []
|
||||
workspace:
|
||||
mode: remote-canonical
|
||||
access: rw
|
||||
scope_default: session
|
||||
ttl:
|
||||
default: 1h
|
||||
max: 4h
|
||||
idle_reap: null
|
||||
resources:
|
||||
cpu: null
|
||||
memory_mb: null
|
||||
setup:
|
||||
instructions: ""
|
||||
secret_refs: []
|
||||
placement:
|
||||
prefer: [sandboxer01]
|
||||
fallback: [coulombcore]
|
||||
reachability:
|
||||
tunnel: ops-bridge
|
||||
identity: ops-warden
|
||||
metadata:
|
||||
cost_class: self-hosted
|
||||
latency_class: standard
|
||||
observability: canary
|
||||
@@ -9,13 +9,22 @@ import typer
|
||||
|
||||
from sandboxer import __version__
|
||||
from sandboxer.core.manager import SandboxManager
|
||||
from sandboxer.defaults import resolve_create_defaults
|
||||
from sandboxer.models import ActorType, Consumer, SandboxCreateRequest
|
||||
from sandboxer.placement import resolve_host
|
||||
from sandboxer.profiles.loader import load_profile
|
||||
from sandboxer.telemetry.export import export_telemetry
|
||||
from sandboxer.telemetry.introspection import build_introspection_report, collect_host_snapshot
|
||||
from sandboxer.telemetry.inventory import HostInventoryScanner
|
||||
from sandboxer.telemetry.reap import reap_stale
|
||||
|
||||
app = typer.Typer(
|
||||
name="sandboxer",
|
||||
help="Provision and manage isolated sandbox environments.",
|
||||
no_args_is_help=True,
|
||||
)
|
||||
inspect_app = typer.Typer(help="Host introspection without provisioning.")
|
||||
app.add_typer(inspect_app, name="inspect")
|
||||
|
||||
|
||||
@app.callback()
|
||||
@@ -39,13 +48,36 @@ def _parse_inputs(values: list[str]) -> dict[str, str]:
|
||||
return inputs
|
||||
|
||||
|
||||
def _print_status(status: object) -> None:
|
||||
typer.echo(json.dumps(status, default=str, indent=2))
|
||||
def _print_json(data: object) -> None:
|
||||
typer.echo(json.dumps(data, default=str, indent=2))
|
||||
|
||||
|
||||
def _print_telemetry_summary(telemetry: dict | None) -> None:
|
||||
if not telemetry:
|
||||
return
|
||||
delta = telemetry.get("provision_delta") or telemetry.get("destroy_delta")
|
||||
stale = telemetry.get("stale_candidates", [])
|
||||
if delta:
|
||||
typer.echo(
|
||||
f"\nTelemetry: load Δ {delta.get('load_1m_delta', 0):+.3f}, "
|
||||
f"mem avail Δ {delta.get('mem_available_mb_delta', 0):+d} MB, "
|
||||
f"stale candidates: {len(stale)}",
|
||||
err=True,
|
||||
)
|
||||
after = delta.get("after") if delta else None
|
||||
if after:
|
||||
typer.echo(
|
||||
f" host load={after.get('load_1m')} mem_avail={after.get('mem_available_mb')} MB "
|
||||
f"disk_free={after.get('disk_root_avail_gb')} GB",
|
||||
err=True,
|
||||
)
|
||||
|
||||
|
||||
@app.command("create")
|
||||
def sandbox_create(
|
||||
profile: Annotated[str, typer.Option("--profile", help="Profile id")],
|
||||
profile: Annotated[
|
||||
str | None, typer.Option("--profile", help="Profile id (default: canary self-deploy)")
|
||||
] = None,
|
||||
input: Annotated[
|
||||
list[str] | None,
|
||||
typer.Option("--input", help="Input key=value (repeatable)"),
|
||||
@@ -54,10 +86,12 @@ def sandbox_create(
|
||||
project: Annotated[str, typer.Option(help="Calling project id")] = "sand-boxer",
|
||||
host: Annotated[str | None, typer.Option(help="Override placement host")] = None,
|
||||
) -> None:
|
||||
"""Provision a sandbox from a profile."""
|
||||
"""Provision a sandbox. No args → canary self-deploy of sand-boxer."""
|
||||
parsed = _parse_inputs(input or [])
|
||||
resolved_profile, resolved_inputs = resolve_create_defaults(profile, parsed)
|
||||
request = SandboxCreateRequest(
|
||||
profile=profile,
|
||||
inputs=_parse_inputs(input or []),
|
||||
profile=resolved_profile,
|
||||
inputs=resolved_inputs,
|
||||
consumer=Consumer(actor=ActorType(actor), project=project),
|
||||
)
|
||||
manager = SandboxManager()
|
||||
@@ -66,7 +100,9 @@ def sandbox_create(
|
||||
except Exception as exc:
|
||||
typer.echo(f"Error: {exc}", err=True)
|
||||
raise typer.Exit(code=1) from exc
|
||||
_print_status(status.model_dump(mode="json"))
|
||||
payload = status.model_dump(mode="json")
|
||||
_print_json(payload)
|
||||
_print_telemetry_summary(status.telemetry)
|
||||
|
||||
|
||||
@app.command("get")
|
||||
@@ -76,7 +112,7 @@ def sandbox_get(sandbox_id: str) -> None:
|
||||
if not status:
|
||||
typer.echo(f"Sandbox not found: {sandbox_id}", err=True)
|
||||
raise typer.Exit(code=1)
|
||||
_print_status(status.model_dump(mode="json"))
|
||||
_print_json(status.model_dump(mode="json"))
|
||||
|
||||
|
||||
@app.command("list")
|
||||
@@ -87,7 +123,7 @@ def sandbox_list(
|
||||
items = SandboxManager().list()
|
||||
if state:
|
||||
items = [s for s in items if s.state.value == state]
|
||||
_print_status([s.model_dump(mode="json") for s in items])
|
||||
_print_json([s.model_dump(mode="json") for s in items])
|
||||
|
||||
|
||||
@app.command("destroy")
|
||||
@@ -99,7 +135,8 @@ def sandbox_destroy(sandbox_id: str) -> None:
|
||||
except KeyError as exc:
|
||||
typer.echo(str(exc), err=True)
|
||||
raise typer.Exit(code=1) from exc
|
||||
_print_status(status.model_dump(mode="json"))
|
||||
_print_json(status.model_dump(mode="json"))
|
||||
_print_telemetry_summary(status.telemetry)
|
||||
|
||||
|
||||
@app.command("recreate")
|
||||
@@ -111,7 +148,72 @@ def sandbox_recreate(sandbox_id: str) -> None:
|
||||
except (KeyError, Exception) as exc:
|
||||
typer.echo(f"Error: {exc}", err=True)
|
||||
raise typer.Exit(code=1) from exc
|
||||
_print_status(status.model_dump(mode="json"))
|
||||
_print_json(status.model_dump(mode="json"))
|
||||
|
||||
|
||||
@inspect_app.command("host")
|
||||
def inspect_host(
|
||||
host: Annotated[str | None, typer.Option(help="Sandbox host")] = None,
|
||||
profile_id: Annotated[
|
||||
str, typer.Option(help="Profile for placement resolution")
|
||||
] = "profile.sandbox-canary",
|
||||
) -> None:
|
||||
"""Host snapshot and inventory (no sandbox create)."""
|
||||
profile = load_profile(profile_id)
|
||||
resolved = resolve_host(profile, override=host)
|
||||
snapshot = collect_host_snapshot(resolved)
|
||||
scanner = HostInventoryScanner(resolved)
|
||||
inventory = scanner.scan_inventory()
|
||||
stale = scanner.find_stale(SandboxManager().store)
|
||||
report = build_introspection_report(
|
||||
host=resolved,
|
||||
sandbox_id="inspect",
|
||||
profile=profile,
|
||||
provision_before=snapshot,
|
||||
provision_after=snapshot,
|
||||
store=SandboxManager().store,
|
||||
)
|
||||
export_telemetry(report)
|
||||
_print_json(
|
||||
{
|
||||
"host_snapshot": snapshot.model_dump(mode="json"),
|
||||
"inventory": inventory.model_dump(mode="json"),
|
||||
"stale_candidates": [s.model_dump(mode="json") for s in stale],
|
||||
}
|
||||
)
|
||||
|
||||
|
||||
@inspect_app.command("stale")
|
||||
def inspect_stale(
|
||||
host: Annotated[str | None, typer.Option(help="Sandbox host")] = None,
|
||||
older_than: Annotated[float, typer.Option(help="Stale threshold hours")] = 24.0,
|
||||
) -> None:
|
||||
"""List stale sandbox candidates."""
|
||||
profile = load_profile("profile.sandbox-canary")
|
||||
resolved = resolve_host(profile, override=host)
|
||||
scanner = HostInventoryScanner(resolved, stale_hours=older_than)
|
||||
stale = scanner.find_stale(SandboxManager().store, stale_hours=older_than)
|
||||
_print_json([s.model_dump(mode="json") for s in stale])
|
||||
|
||||
|
||||
@app.command("reap-stale")
|
||||
def reap_stale_cmd(
|
||||
host: Annotated[str | None, typer.Option(help="Sandbox host")] = None,
|
||||
older_than: Annotated[float, typer.Option(help="Reap threshold hours")] = 24.0,
|
||||
apply: Annotated[bool, typer.Option("--apply", help="Actually remove stale resources")] = False,
|
||||
) -> None:
|
||||
"""Report or remove stale sandboxes on host (default: dry-run)."""
|
||||
profile = load_profile("profile.sandbox-canary")
|
||||
resolved = resolve_host(profile, override=host)
|
||||
results = reap_stale(
|
||||
resolved,
|
||||
SandboxManager().store,
|
||||
dry_run=not apply,
|
||||
stale_hours=older_than,
|
||||
)
|
||||
mode = "apply" if apply else "dry-run"
|
||||
typer.echo(f"reap-stale ({mode}): {len(results)} candidate(s)", err=True)
|
||||
_print_json([r.model_dump(mode="json") for r in results])
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
|
||||
@@ -13,6 +13,12 @@ from sandboxer.models import (
|
||||
)
|
||||
from sandboxer.placement import resolve_host
|
||||
from sandboxer.profiles.loader import load_profile
|
||||
from sandboxer.telemetry.export import export_telemetry
|
||||
from sandboxer.telemetry.introspection import (
|
||||
build_introspection_report,
|
||||
collect_host_snapshot,
|
||||
profile_wants_telemetry,
|
||||
)
|
||||
|
||||
|
||||
class SandboxManager:
|
||||
@@ -24,6 +30,8 @@ class SandboxManager:
|
||||
extension = load_extension(profile.extension)
|
||||
backend = resolve_backend(extension)
|
||||
resolved_host = resolve_host(profile, override=host)
|
||||
wants_telemetry = profile_wants_telemetry(profile)
|
||||
base_dir = extension.config.get("base_dir", "/tmp/sandboxer")
|
||||
|
||||
now = utcnow()
|
||||
status = SandboxStatus(
|
||||
@@ -43,6 +51,10 @@ class SandboxManager:
|
||||
status.updated_at = utcnow()
|
||||
emit_lifecycle_event(status, event_type=event_type_for_state(status.state))
|
||||
|
||||
provision_before = None
|
||||
if wants_telemetry:
|
||||
provision_before = collect_host_snapshot(resolved_host)
|
||||
|
||||
try:
|
||||
handle = backend.provision(profile, request.inputs, resolved_host)
|
||||
status.sandbox_id = handle["sandbox_id"]
|
||||
@@ -54,6 +66,21 @@ class SandboxManager:
|
||||
status.state = SandboxState.READY
|
||||
status.ready_at = utcnow()
|
||||
status.updated_at = status.ready_at
|
||||
|
||||
if wants_telemetry and provision_before:
|
||||
provision_after = collect_host_snapshot(resolved_host)
|
||||
report = build_introspection_report(
|
||||
host=resolved_host,
|
||||
sandbox_id=status.sandbox_id,
|
||||
profile=profile,
|
||||
provision_before=provision_before,
|
||||
provision_after=provision_after,
|
||||
store=self.store,
|
||||
base_dir=base_dir,
|
||||
)
|
||||
status.telemetry = report.model_dump(mode="json")
|
||||
export_telemetry(report)
|
||||
|
||||
self.store.save(status)
|
||||
emit_lifecycle_event(status, event_type=event_type_for_state(status.state))
|
||||
return status
|
||||
@@ -86,6 +113,12 @@ class SandboxManager:
|
||||
profile = load_profile(status.profile_id)
|
||||
extension = load_extension(profile.extension)
|
||||
backend = resolve_backend(extension)
|
||||
wants_telemetry = profile_wants_telemetry(profile)
|
||||
base_dir = extension.config.get("base_dir", "/tmp/sandboxer")
|
||||
|
||||
destroy_before = None
|
||||
if wants_telemetry and status.host:
|
||||
destroy_before = collect_host_snapshot(status.host)
|
||||
|
||||
status.state = SandboxState.DESTROYING
|
||||
status.updated_at = utcnow()
|
||||
@@ -106,6 +139,21 @@ class SandboxManager:
|
||||
status.state = SandboxState.DESTROYED
|
||||
status.destroyed_at = utcnow()
|
||||
status.updated_at = status.destroyed_at
|
||||
|
||||
if wants_telemetry and destroy_before and status.host:
|
||||
destroy_after = collect_host_snapshot(status.host)
|
||||
report = build_introspection_report(
|
||||
host=status.host,
|
||||
sandbox_id=status.sandbox_id,
|
||||
profile=profile,
|
||||
destroy_before=destroy_before,
|
||||
destroy_after=destroy_after,
|
||||
store=self.store,
|
||||
base_dir=base_dir,
|
||||
)
|
||||
status.telemetry = report.model_dump(mode="json")
|
||||
export_telemetry(report)
|
||||
|
||||
self.store.save(status)
|
||||
emit_lifecycle_event(status, event_type=event_type_for_state(status.state))
|
||||
return status
|
||||
|
||||
30
src/sandboxer/defaults.py
Normal file
@@ -0,0 +1,30 @@
"""Default paths and profile resolution for CLI."""

from __future__ import annotations

import os
from pathlib import Path

DEFAULT_CANARY_PROFILE = "profile.sandbox-canary"
DEFAULT_COMPOSE_PROFILE = "profile.compose-e2e"


def repo_root() -> Path:
    override = os.environ.get("SANDBOXER_REPO_ROOT")
    if override:
        return Path(override).expanduser().resolve()
    return Path(__file__).resolve().parents[2]


def resolve_create_defaults(
    profile: str | None,
    inputs: dict[str, str],
) -> tuple[str, dict[str, str]]:
    """Apply default profile and repo per SAND-WP-0008-T06."""
    resolved = dict(inputs)
    user_repo = "repo" in resolved
    if not user_repo:
        resolved["repo"] = str(repo_root())
    if profile is None:
        profile = DEFAULT_COMPOSE_PROFILE if user_repo else DEFAULT_CANARY_PROFILE
    return profile, resolved

|
||||
@@ -39,6 +39,7 @@ def emit_lifecycle_event(
|
||||
"actor_type": status.consumer.actor.value,
|
||||
"state": status.state.value,
|
||||
"reachability": status.reachability.model_dump() if status.reachability else None,
|
||||
"telemetry": status.telemetry,
|
||||
"timestamps": {
|
||||
"created_at": status.created_at.isoformat(),
|
||||
"updated_at": status.updated_at.isoformat(),
|
||||
|
||||
@@ -83,6 +83,7 @@ class ReachabilitySpec(BaseModel):
|
||||
class ProfileMetadata(BaseModel):
|
||||
cost_class: Literal["self-hosted", "saas-metered"] = "self-hosted"
|
||||
latency_class: str = "standard"
|
||||
observability: Literal["none", "canary"] = "none"
|
||||
|
||||
|
||||
class Profile(BaseModel):
|
||||
@@ -141,6 +142,7 @@ class SandboxStatus(BaseModel):
|
||||
reachability: Reachability | None = None
|
||||
inputs: dict[str, str] = Field(default_factory=dict)
|
||||
error: str | None = None
|
||||
telemetry: dict | None = None # IntrospectionReport JSON when canary
|
||||
created_at: datetime
|
||||
updated_at: datetime
|
||||
ready_at: datetime | None = None
|
||||
|
||||
1
src/sandboxer/telemetry/__init__.py
Normal file
@@ -0,0 +1 @@
|
||||
"""Host telemetry and introspection."""
|
||||
64
src/sandboxer/telemetry/export.py
Normal file
@@ -0,0 +1,64 @@
"""Telemetry export sinks."""

from __future__ import annotations

import json
import os
from pathlib import Path
from typing import Protocol

import httpx

from sandboxer.lifecycle.state_hub import hub_url
from sandboxer.telemetry.models import IntrospectionReport


class TelemetrySink(Protocol):
    """Future export target (artifact-store, Prometheus, ClickHouse)."""

    def publish(self, report: IntrospectionReport) -> None: ...


def telemetry_dir() -> Path:
    base = Path(os.environ.get("XDG_DATA_HOME", Path.home() / ".local" / "share"))
    path = base / "sandboxer" / "telemetry"
    path.mkdir(parents=True, exist_ok=True)
    return path


def export_local_artifact(report: IntrospectionReport) -> Path:
    path = telemetry_dir() / f"{report.sandbox_id}.json"
    path.write_text(json.dumps(report.model_dump(mode="json"), indent=2, default=str))
    return path


def export_state_hub(report: IntrospectionReport) -> dict | None:
    if os.environ.get("SANDBOXER_NO_STATE_HUB", "").lower() in ("1", "true", "yes"):
        return None
    payload = {
        "event_type": "note",
        "summary": (
            f"Telemetry {report.sandbox_id}: load Δ "
            f"{report.provision_delta.load_1m_delta if report.provision_delta else 0}, "
            f"stale={len(report.stale_candidates)}"
        ),
        "author": "sandboxer",
        "detail": report.model_dump(mode="json"),
    }
    try:
        response = httpx.post(f"{hub_url()}/progress/", json=payload, timeout=10.0)
        response.raise_for_status()
        return response.json()
    except httpx.HTTPError:
        return None


def export_telemetry(report: IntrospectionReport) -> Path:
    path = export_local_artifact(report)
    export_state_hub(report)
    return path


class NoopTelemetrySink:
    def publish(self, report: IntrospectionReport) -> None:
        export_telemetry(report)
|
||||
122
src/sandboxer/telemetry/host_snapshot.py
Normal file
@@ -0,0 +1,122 @@
"""Collect HostSnapshot over SSH."""

from __future__ import annotations

from datetime import datetime

from sandboxer.extensions.ssh import SSHConfig
from sandboxer.lifecycle.store import utcnow
from sandboxer.telemetry.models import HostSnapshot


def parse_loadavg(text: str) -> tuple[float, float, float]:
    parts = text.strip().split()
    return float(parts[0]), float(parts[1]), float(parts[2])


def parse_meminfo(text: str) -> tuple[int, int]:
    total = avail = 0
    for line in text.splitlines():
        if line.startswith("MemTotal:"):
            total = int(line.split()[1]) // 1024
        elif line.startswith("MemAvailable:"):
            avail = int(line.split()[1]) // 1024
    return total, avail


def parse_free_m(text: str) -> tuple[int, int]:
    for line in text.splitlines():
        if line.startswith("Mem:"):
            parts = line.split()
            return int(parts[1]), int(parts[6])
    return 0, 0


def parse_df_root(text: str) -> tuple[float, float]:
    line = text.strip().splitlines()[-1]
    parts = line.split()
    used_pct = float(parts[4].rstrip("%"))
    avail = parts[3]
    mult = 1.0
    if avail[-1] in "KMGT":
        mult = {"K": 1 / 1e6, "M": 1 / 1e3, "G": 1.0, "T": 1000.0}[avail[-1]]
        avail = avail[:-1]
    return used_pct, float(avail) * mult


def parse_container_count(text: str) -> int:
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    return len(lines)


class HostSnapshotCollector:
    def __init__(
        self, host: str, *, ssh_user: str | None = None, ssh_key: str | None = None
    ) -> None:
        self.ssh = SSHConfig.from_env(host, user=ssh_user, key=ssh_key)
        self.host = host

    def collect(self, *, collected_at: datetime | None = None) -> HostSnapshot:
        when = collected_at or utcnow()
        runtime = self._detect_runtime()
        load = self._run("cat /proc/loadavg")
        cpu = self._run("nproc")
        mem = self._run("free -m | awk '/^Mem:/{print $2\" \"$7}'")
        disk = self._run("df -h / | tail -1")
        running = self._run(f"{runtime} ps -q 2>/dev/null")
        sandbox = self._run(
            f"{runtime} ps -q --filter label=io.podman.compose.project=sbx 2>/dev/null"
        )
        if sandbox == "" and runtime == "docker":
            sandbox = self._run(
                "docker ps -q --filter label=com.docker.compose.project=sbx 2>/dev/null"
            )

        load_vals = parse_loadavg(load) if load else (0.0, 0.0, 0.0)
        cpu_count = int(cpu.strip()) if cpu.strip().isdigit() else 0
        if mem and mem.strip():
            parts = mem.strip().split()
            mem_total, mem_avail = int(parts[0]), int(parts[1])
        else:
            mem_total, mem_avail = 0, 0
        disk_used, disk_avail = parse_df_root(disk) if disk else (0.0, 0.0)

        return HostSnapshot(
            collected_at=when,
            host=self.host,
            load_1m=load_vals[0],
            load_5m=load_vals[1],
            load_15m=load_vals[2],
            cpu_count=cpu_count,
            mem_total_mb=mem_total,
            mem_available_mb=mem_avail,
            disk_root_used_pct=disk_used,
            disk_root_avail_gb=disk_avail,
            running_containers=parse_container_count(running),
            sandbox_containers=parse_container_count(sandbox),
            container_runtime=runtime,
        )

    def _detect_runtime(self) -> str:
        rc, _ = self.ssh.run("command -v podman")
        if rc == 0:
            return "podman"
        rc, _ = self.ssh.run("command -v docker")
        if rc == 0:
            return "docker"
        return "unknown"

    def _run(self, cmd: str) -> str:
        rc, out = self.ssh.run(cmd, timeout=10)
        if rc != 0:
            return ""
        return out


def compute_delta(before: HostSnapshot, after: HostSnapshot) -> dict[str, float | int]:
    return {
        "load_1m_delta": round(after.load_1m - before.load_1m, 3),
        "mem_available_mb_delta": after.mem_available_mb - before.mem_available_mb,
        "running_containers_delta": after.running_containers - before.running_containers,
        "sandbox_containers_delta": after.sandbox_containers - before.sandbox_containers,
    }
|
||||
65
src/sandboxer/telemetry/introspection.py
Normal file
@@ -0,0 +1,65 @@
"""Assemble IntrospectionReport for canary profiles."""

from __future__ import annotations

from sandboxer.lifecycle.store import SandboxStore, utcnow
from sandboxer.models import Profile
from sandboxer.telemetry.host_snapshot import HostSnapshot, compute_delta
from sandboxer.telemetry.inventory import HostInventoryScanner
from sandboxer.telemetry.models import IntrospectionReport, ProvisionDelta


def profile_wants_telemetry(profile: Profile) -> bool:
    if profile.id == "profile.sandbox-canary":
        return True
    return profile.metadata.observability == "canary"


def build_introspection_report(
    *,
    host: str,
    sandbox_id: str,
    profile: Profile,
    store: SandboxStore,
    base_dir: str = "/tmp/sandboxer",
    provision_before: HostSnapshot | None = None,
    provision_after: HostSnapshot | None = None,
    destroy_before: HostSnapshot | None = None,
    destroy_after: HostSnapshot | None = None,
) -> IntrospectionReport:
    scanner = HostInventoryScanner(host, base_dir=base_dir)
    inventory = scanner.scan_inventory()
    stale = scanner.find_stale(store)

    provision_delta = None
    if provision_before and provision_after:
        provision_delta = ProvisionDelta(
            before=provision_before,
            after=provision_after,
            **compute_delta(provision_before, provision_after),
        )

    destroy_delta = None
    if destroy_before and destroy_after:
        destroy_delta = ProvisionDelta(
            before=destroy_before,
            after=destroy_after,
            **compute_delta(destroy_before, destroy_after),
        )

    return IntrospectionReport(
        host=host,
        sandbox_id=sandbox_id,
        profile_id=profile.id,
        collected_at=utcnow(),
        provision_delta=provision_delta,
        destroy_delta=destroy_delta,
        inventory=inventory,
        stale_candidates=stale,
    )


def collect_host_snapshot(host: str) -> HostSnapshot:
    from sandboxer.telemetry.host_snapshot import HostSnapshotCollector

    return HostSnapshotCollector(host).collect()
|
||||
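Review note: `build_introspection_report` spreads the result of `compute_delta(before, after)` into `ProvisionDelta`, so the helper presumably returns a mapping keyed by the four `*_delta` fields. Its body lives in `host_snapshot.py` and is not part of this excerpt. The following is a minimal stand-alone sketch, assuming each delta is simply "after minus before" (which agrees with the values asserted in `tests/test_telemetry.py`). `Snapshot` and `compute_delta_sketch` are hypothetical names used only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical stand-in for HostSnapshot, limited to delta-relevant fields."""

    load_1m: float = 0.0
    mem_available_mb: int = 0
    running_containers: int = 0
    sandbox_containers: int = 0


def compute_delta_sketch(before: Snapshot, after: Snapshot) -> dict[str, float | int]:
    """Return after-minus-before for each tracked metric (assumed semantics)."""
    return {
        # Rounded so float noise does not leak into the report.
        "load_1m_delta": round(after.load_1m - before.load_1m, 2),
        "mem_available_mb_delta": after.mem_available_mb - before.mem_available_mb,
        "running_containers_delta": after.running_containers - before.running_containers,
        "sandbox_containers_delta": after.sandbox_containers - before.sandbox_containers,
    }
```

Returning a plain mapping keeps the call site short (`ProvisionDelta(before=..., after=..., **delta)`), at the cost of the keys having to match the model's field names exactly.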
177
src/sandboxer/telemetry/inventory.py
Normal file
@@ -0,0 +1,177 @@
"""Sandbox inventory and stale candidate discovery."""

from __future__ import annotations

import re
from datetime import UTC, datetime

from sandboxer.extensions.ssh import SSHConfig
from sandboxer.lifecycle.store import SandboxStore, utcnow
from sandboxer.models import SandboxState
from sandboxer.telemetry.models import InventoryEntry, SandboxInventory, StaleCandidate

_PROJECT_RE = re.compile(r"^(sbx-.+|e2e-.+)$")


def _age_hours(epoch_str: str) -> float | None:
    try:
        # stat format %Y
        ts = int(epoch_str.strip())
        return round((datetime.now(UTC).timestamp() - ts) / 3600, 2)
    except ValueError:
        return None


def _profile_hint_from_project(project: str) -> str | None:
    if project.startswith("sbx-"):
        parts = project.split("-")
        if len(parts) >= 3:
            return f"profile.{parts[1]}"
    return None


class HostInventoryScanner:
    def __init__(
        self,
        host: str,
        *,
        base_dir: str = "/tmp/sandboxer",
        ssh_user: str | None = None,
        stale_hours: float = 24.0,
    ) -> None:
        self.host = host
        self.base_dir = base_dir
        self.ssh = SSHConfig.from_env(host, user=ssh_user)
        self.stale_hours = stale_hours

    def scan_inventory(self) -> SandboxInventory:
        when = utcnow()
        entries: list[InventoryEntry] = []
        entries.extend(self._scan_directories())
        entries.extend(self._scan_compose_projects())
        return SandboxInventory(
            host=self.host,
            base_dir=self.base_dir,
            collected_at=when,
            entries=entries,
        )

    def find_stale(
        self,
        store: SandboxStore,
        *,
        stale_hours: float | None = None,
    ) -> list[StaleCandidate]:
        threshold = stale_hours if stale_hours is not None else self.stale_hours
        inventory = self.scan_inventory()
        on_host_ids = {e.id for e in inventory.entries}
        store_by_id = {
            s.sandbox_id: s
            for s in store.list_all()
            if s.state != SandboxState.DESTROYED
        }

        candidates: list[StaleCandidate] = []

        for entry in inventory.entries:
            in_store = entry.id in store_by_id
            if not in_store:
                kind = "orphan_dir" if entry.kind == "directory" else "orphan_compose"
                candidates.append(
                    StaleCandidate(
                        kind=kind,
                        id=entry.id,
                        path=entry.path,
                        age_hours=entry.age_hours,
                        action="reap",
                        reason="present on host but absent from local store",
                    )
                )
            elif entry.age_hours is not None and entry.age_hours >= threshold:
                candidates.append(
                    StaleCandidate(
                        kind="aged_dir" if entry.kind == "directory" else "orphan_compose",
                        id=entry.id,
                        path=entry.path,
                        age_hours=entry.age_hours,
                        action="reap",
                        reason=f"older than {threshold}h threshold",
                    )
                )

        for sid, status in store_by_id.items():
            if sid not in on_host_ids and status.host == self.host:
                candidates.append(
                    StaleCandidate(
                        kind="zombie_record",
                        id=sid,
                        path=status.reachability.remote_dir if status.reachability else None,
                        age_hours=None,
                        action="inspect",
                        reason="recorded in store but missing on host",
                    )
                )

        return candidates

    def _scan_directories(self) -> list[InventoryEntry]:
        # GNU find -printf: %f is the basename and %T@ is the mtime in epoch
        # seconds (with a fractional part). find's %Y is the file type, not the
        # mtime, so using it here would leave every age_hours as None.
        cmd = (
            f"if [ -d {self.base_dir} ]; then "
            f"find {self.base_dir} -mindepth 1 -maxdepth 1 -type d "
            f"-printf '%f %T@\\n'; fi"
        )
        rc, out = self.ssh.run(cmd, timeout=15)
        if rc != 0 or not out.strip():
            return []
        entries = []
        for line in out.strip().splitlines():
            parts = line.split()
            if len(parts) < 2:
                continue
            # Drop the fractional seconds so _age_hours can parse an integer.
            sid, mtime = parts[0], parts[1].split(".")[0]
            entries.append(
                InventoryEntry(
                    kind="directory",
                    id=sid,
                    path=f"{self.base_dir}/{sid}",
                    age_hours=_age_hours(mtime),
                )
            )
        return entries

    def _scan_compose_projects(self) -> list[InventoryEntry]:
        runtime = "podman" if self._has_podman() else "docker"
        if runtime == "podman":
            cmd = (
                "podman ps -a --format '{{.Labels}}' 2>/dev/null | "
                "grep -o 'io.podman.compose.project=sbx[^, ]*' | "
                "sed 's/io.podman.compose.project=//' | sort -u"
            )
        else:
            cmd = (
                "docker ps -a --format '{{.Label \"com.docker.compose.project\"}}' 2>/dev/null | "
                "grep '^sbx' | sort -u"
            )
        rc, out = self.ssh.run(cmd, timeout=15)
        if rc != 0 or not out.strip():
            return []
        entries = []
        for project in out.strip().splitlines():
            project = project.strip()
            if not _PROJECT_RE.match(project):
                continue
            sandbox_id = project.rsplit("-", 1)[-1]
            entries.append(
                InventoryEntry(
                    kind="compose_project",
                    id=sandbox_id,
                    path=f"compose:{project}",
                    age_hours=None,
                    profile_hint=_profile_hint_from_project(project),
                )
            )
        return entries

    def _has_podman(self) -> bool:
        rc, _ = self.ssh.run("command -v podman")
        return rc == 0
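Review note: `find_stale` distinguishes three cases: entries on the host that the store does not know, known entries older than the threshold, and store records with nothing left on the host. The sketch below restates that decision logic on plain dicts and sets so it can be read (and tested) without SSH. It is a simplification, not the project's code: `classify_stale` is a hypothetical name, and the real method additionally splits orphans into `orphan_dir` and `orphan_compose` by entry kind and filters store records by host.

```python
from __future__ import annotations


def classify_stale(
    host_entries: dict[str, float | None],
    store_ids: set[str],
    threshold_hours: float = 24.0,
) -> dict[str, str]:
    """Map sandbox id -> stale kind, mirroring the three cases in find_stale.

    host_entries maps an id found on the host to its age in hours (None if unknown).
    store_ids holds the ids of non-destroyed sandboxes recorded for this host.
    """
    result: dict[str, str] = {}
    for sid, age in host_entries.items():
        if sid not in store_ids:
            result[sid] = "orphan"  # on the host, unknown to the store
        elif age is not None and age >= threshold_hours:
            result[sid] = "aged"  # known, but older than the threshold
    for sid in store_ids - set(host_entries):
        result[sid] = "zombie_record"  # recorded, but gone from the host
    return result
```

Note that an entry with an unknown age is never reported as aged, which matches the `entry.age_hours is not None` guard in the method above.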
70
src/sandboxer/telemetry/models.py
Normal file
@@ -0,0 +1,70 @@
"""Telemetry and introspection schemas."""

from __future__ import annotations

from datetime import datetime

from pydantic import BaseModel, Field

SCHEMA_VERSION = "0.1"


class HostSnapshot(BaseModel):
    collected_at: datetime
    host: str
    load_1m: float = 0.0
    load_5m: float = 0.0
    load_15m: float = 0.0
    cpu_count: int = 0
    mem_total_mb: int = 0
    mem_available_mb: int = 0
    disk_root_used_pct: float = 0.0
    disk_root_avail_gb: float = 0.0
    running_containers: int = 0
    sandbox_containers: int = 0
    container_runtime: str = "unknown"


class InventoryEntry(BaseModel):
    kind: str  # directory | compose_project
    id: str
    path: str | None = None
    age_hours: float | None = None
    profile_hint: str | None = None


class SandboxInventory(BaseModel):
    host: str
    base_dir: str
    collected_at: datetime
    entries: list[InventoryEntry] = Field(default_factory=list)


class StaleCandidate(BaseModel):
    kind: str  # orphan_dir | orphan_compose | zombie_record | aged_dir
    id: str
    path: str | None = None
    age_hours: float | None = None
    action: str  # reap | inspect | ignore
    reason: str


class ProvisionDelta(BaseModel):
    before: HostSnapshot
    after: HostSnapshot
    load_1m_delta: float = 0.0
    mem_available_mb_delta: int = 0
    running_containers_delta: int = 0
    sandbox_containers_delta: int = 0


class IntrospectionReport(BaseModel):
    schema_version: str = SCHEMA_VERSION
    host: str
    sandbox_id: str
    profile_id: str
    collected_at: datetime
    provision_delta: ProvisionDelta | None = None
    destroy_delta: ProvisionDelta | None = None
    inventory: SandboxInventory | None = None
    stale_candidates: list[StaleCandidate] = Field(default_factory=list)
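Review note: the commit message mentions exporting telemetry to local JSON, but the export code is not part of this excerpt. As a rough illustration of what a versioned report looks like once serialized, here is a stdlib-only sketch. `Report`, `StaleItem` and `report_to_json` are hypothetical stand-ins that use dataclasses so the example runs without Pydantic; with the Pydantic models above (assuming Pydantic v2), `report.model_dump_json()` would be the more natural call.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class StaleItem:
    """Hypothetical, reduced counterpart of StaleCandidate."""

    kind: str
    id: str
    action: str
    reason: str


@dataclass
class Report:
    """Hypothetical, reduced counterpart of IntrospectionReport."""

    host: str
    sandbox_id: str
    profile_id: str
    collected_at: datetime
    schema_version: str = "0.1"
    stale_candidates: list[StaleItem] = field(default_factory=list)


def report_to_json(report: Report) -> str:
    """Serialize a report; the datetime becomes an ISO 8601 string."""
    payload = asdict(report)  # nested dataclasses become plain dicts
    payload["collected_at"] = report.collected_at.isoformat()
    return json.dumps(payload, indent=2, sort_keys=True)
```

Carrying `schema_version` inside every payload lets a consumer reject or migrate reports written by an older release instead of guessing at their shape.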
36
src/sandboxer/telemetry/reap.py
Normal file
@@ -0,0 +1,36 @@
"""Stale sandbox reap (dry-run and apply)."""

from __future__ import annotations

import os

from sandboxer.extensions.ssh import SSHConfig
from sandboxer.lifecycle.store import SandboxStore
from sandboxer.telemetry.inventory import HostInventoryScanner
from sandboxer.telemetry.models import StaleCandidate


def reap_stale(
    host: str,
    store: SandboxStore,
    *,
    dry_run: bool = True,
    stale_hours: float = 24.0,
    base_dir: str = "/tmp/sandboxer",
) -> list[StaleCandidate]:
    scanner = HostInventoryScanner(host, base_dir=base_dir, stale_hours=stale_hours)
    candidates = [
        c for c in scanner.find_stale(store, stale_hours=stale_hours) if c.action == "reap"
    ]
    if dry_run:
        return candidates

    compose_cmd = os.environ.get("SANDBOXER_COMPOSE_CMD", "podman-compose")
    ssh = SSHConfig.from_env(host)
    for item in candidates:
        if item.kind in ("orphan_dir", "aged_dir") and item.path:
            ssh.run(f"rm -rf {item.path}", timeout=30)
        elif item.kind == "orphan_compose" and item.path and item.path.startswith("compose:"):
            project = item.path.split(":", 1)[1]
            ssh.run(f"{compose_cmd} -p {project} down -v 2>/dev/null || true", timeout=60)
    return candidates
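Review note: `reap_stale` interpolates `item.path` and the compose project name into shell commands without quoting. Directory names come from the remote host, so a name containing spaces or shell metacharacters could change what the command does. One possible hardening, sketched here under the assumption that the commands run through a POSIX shell, is to build them separately with `shlex.quote`. `plan_reap_commands` is a hypothetical helper, not part of the commit.

```python
from __future__ import annotations

import shlex


def plan_reap_commands(
    candidates: list[tuple[str, str]],
    compose_cmd: str = "podman-compose",
) -> list[str]:
    """Build (but do not run) shell commands for (kind, path) candidates."""
    commands: list[str] = []
    for kind, path in candidates:
        if kind in ("orphan_dir", "aged_dir") and path:
            # "--" stops option parsing in case a name starts with "-".
            commands.append(f"rm -rf -- {shlex.quote(path)}")
        elif kind == "orphan_compose" and path.startswith("compose:"):
            project = path.split(":", 1)[1]
            commands.append(f"{compose_cmd} -p {shlex.quote(project)} down -v")
    return commands
```

Separating planning from execution also gives the dry-run mode something concrete to print: the exact commands that `--apply` would run.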
12
tests/test_defaults.py
Normal file
@@ -0,0 +1,12 @@
"""Default profile resolution."""

from sandboxer.defaults import DEFAULT_CANARY_PROFILE, DEFAULT_COMPOSE_PROFILE, repo_root


def test_repo_root_points_at_sand_boxer() -> None:
    assert repo_root().name == "sand-boxer"


def test_canary_profile_constant() -> None:
    assert DEFAULT_CANARY_PROFILE == "profile.sandbox-canary"
    assert DEFAULT_COMPOSE_PROFILE == "profile.compose-e2e"
87
tests/test_telemetry.py
Normal file
@@ -0,0 +1,87 @@
"""Telemetry parsing and introspection tests."""

from __future__ import annotations

from datetime import UTC, datetime
from pathlib import Path
from unittest.mock import patch

import pytest

from sandboxer.defaults import resolve_create_defaults
from sandboxer.profiles.loader import load_profile
from sandboxer.telemetry.host_snapshot import (
    parse_container_count,
    parse_df_root,
    parse_loadavg,
)
from sandboxer.telemetry.introspection import profile_wants_telemetry
from sandboxer.telemetry.models import HostSnapshot, SandboxInventory


def test_parse_loadavg() -> None:
    assert parse_loadavg("0.52 0.48 0.45 1/234 999") == (0.52, 0.48, 0.45)


def test_parse_df_root() -> None:
    line = "Filesystem Size Used Avail Use% Mounted\n/dev/sda1 100G 40G 55G 43% /"
    used, avail = parse_df_root(line)
    assert used == 43.0
    assert avail == 55.0


def test_parse_container_count() -> None:
    assert parse_container_count("abc\ndef\n") == 2


def test_profile_wants_telemetry_canary() -> None:
    profile = load_profile("profile.sandbox-canary")
    assert profile_wants_telemetry(profile) is True


def test_profile_wants_telemetry_compose_e2e() -> None:
    profile = load_profile("profile.compose-e2e")
    assert profile_wants_telemetry(profile) is False


def test_resolve_create_defaults_no_args() -> None:
    profile, inputs = resolve_create_defaults(None, {})
    assert profile == "profile.sandbox-canary"
    assert "repo" in inputs
    assert Path(inputs["repo"]).name == "sand-boxer"


def test_resolve_create_defaults_explicit_repo() -> None:
    profile, inputs = resolve_create_defaults(None, {"repo": "/tmp/foo"})
    assert profile == "profile.compose-e2e"
    assert inputs["repo"] == "/tmp/foo"


def test_build_introspection_report_mocked(tmp_path: Path) -> None:
    from sandboxer.lifecycle.store import SandboxStore
    from sandboxer.telemetry.introspection import build_introspection_report

    now = datetime.now(UTC)
    snap = HostSnapshot(collected_at=now, host="h1", load_1m=1.0, mem_available_mb=1000)
    snap2 = HostSnapshot(collected_at=now, host="h1", load_1m=1.5, mem_available_mb=900)
    profile = load_profile("profile.sandbox-canary")
    store = SandboxStore(path=tmp_path / "sandboxes.json")

    with patch("sandboxer.telemetry.introspection.HostInventoryScanner") as scanner_cls:
        scanner = scanner_cls.return_value
        scanner.scan_inventory.return_value = SandboxInventory(
            host="h1", base_dir="/tmp/sandboxer", collected_at=now, entries=[]
        )
        scanner.find_stale.return_value = []
        report = build_introspection_report(
            host="h1",
            sandbox_id="abc",
            profile=profile,
            provision_before=snap,
            provision_after=snap2,
            store=store,
        )

    assert report.provision_delta is not None
    assert report.provision_delta.load_1m_delta == pytest.approx(0.5)
    assert report.provision_delta.mem_available_mb_delta == -100
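Review note: the tests above pin down the behaviour of `parse_loadavg`, `parse_df_root` and `parse_container_count`, whose implementations sit in `host_snapshot.py` outside this excerpt. For readers who want a feel for what such parsers involve, here is one way functions satisfying exactly these test inputs could look. The `_sketch` names are hypothetical, and the `df` parser assumes the last line is the root filesystem and that the available size is reported in gigabytes with a `G` suffix.

```python
from __future__ import annotations


def parse_loadavg_sketch(text: str) -> tuple[float, float, float]:
    """Parse the first three fields of /proc/loadavg-style text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def parse_df_root_sketch(text: str) -> tuple[float, float]:
    """Return (used_pct, avail_gb) from df output that has a header row."""
    fields = text.strip().splitlines()[-1].split()
    used_pct = float(fields[4].rstrip("%"))  # "43%" -> 43.0
    avail_gb = float(fields[3].rstrip("G"))  # "55G" -> 55.0 (G suffix assumed)
    return used_pct, avail_gb


def parse_container_count_sketch(text: str) -> int:
    """Count non-empty lines, for example one container id per line."""
    return sum(1 for line in text.splitlines() if line.strip())
```

Keeping the parsers as pure string-to-value functions is what lets the test suite cover them without an SSH connection.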
@@ -4,7 +4,7 @@ type: workplan
 title: "Host telemetry and self-canary introspection"
 domain: infotech
 repo: sand-boxer
-status: ready
+status: finished
 owner: codex
 topic_slug: custodian
 created: "2026-06-23"
@@ -42,7 +42,7 @@ later).
 
 ```task
 id: SAND-WP-0008-T01
-status: todo
+status: done
 priority: high
 state_hub_task_id: "8f7b46e3-045e-481c-81bd-1c61734c6eb3"
 ```
@@ -64,7 +64,7 @@ does not own long-term metrics DB).
 
 ```task
 id: SAND-WP-0008-T02
-status: todo
+status: done
 priority: high
 state_hub_task_id: "732bae4e-2dd9-4500-a86d-e869007bb383"
 ```
@@ -86,7 +86,7 @@ Canary deliverable on `ready`: JSON `IntrospectionReport` in sandbox status
 
 ```task
 id: SAND-WP-0008-T03
-status: todo
+status: done
 priority: high
 state_hub_task_id: "7bd22f27-5058-4c19-98b6-b923909a8815"
 ```
@@ -105,7 +105,7 @@ command output.
 
 ```task
 id: SAND-WP-0008-T04
-status: todo
+status: done
 priority: high
 state_hub_task_id: "c2d19bb7-9322-4744-a71e-75f7701a6fb2"
 ```
@@ -124,7 +124,7 @@ No automatic deletion in this task — dry-run only.
 
 ```task
 id: SAND-WP-0008-T05
-status: todo
+status: done
 priority: medium
 state_hub_task_id: "b6b02289-d36e-4ee1-9ff7-dc59a1d24886"
 ```
@@ -143,7 +143,7 @@ Same pattern on `destroy` for teardown impact. Tests mock SSH collector.
 
 ```task
 id: SAND-WP-0008-T06
-status: todo
+status: done
 priority: high
 state_hub_task_id: "d9941d93-a662-45c0-820b-88d32266c653"
 ```
@@ -168,7 +168,7 @@ sandboxer create --input repo=/other/repo # unchanged behavior
 
 ```task
 id: SAND-WP-0008-T07
-status: todo
+status: done
 priority: high
 state_hub_task_id: "76430452-c98e-44e5-b625-e243dc12b8a5"
 ```
@@ -185,7 +185,7 @@ After `wait_ready` for canary profile:
 
 ```task
 id: SAND-WP-0008-T08
-status: todo
+status: done
 priority: medium
 state_hub_task_id: "4ee4b95b-e7b5-4893-b78e-914f808bc00a"
 ```
@@ -207,7 +207,7 @@ activity-core may schedule periodic canary runs later — out of scope here.
 
 ```task
 id: SAND-WP-0008-T09
-status: todo
+status: done
 priority: medium
 state_hub_task_id: "6ea8eda6-491b-460a-a526-7565962f449e"
 ```
@@ -225,7 +225,7 @@ sandboxer reap-stale --apply [--older-than 24h] # T10+; gated behind --apply
 
 ```task
 id: SAND-WP-0008-T10
-status: todo
+status: done
 priority: medium
 state_hub_task_id: "435a3993-d8d3-4280-b68a-c37e34d20312"
 ```
@@ -268,4 +268,13 @@ After merging task status updates:
 
 ```bash
 cd ~/state-hub && make fix-consistency REPO=sand-boxer
-```
+```
+
+## Verification record (2026-06-23)
+
+CoulombCore remote proof:
+
+1. `sandboxer create` (no args) → `ready` + `telemetry.provision_delta`
+2. `sandboxer inspect host` → load/mem metrics returned
+3. Stale orphans from prior runs detected in `stale_candidates`
+4. `sandboxer destroy` → `destroy_delta` with load Δ -0.09, mem +54 MB