From df658e7ef989686c1ca466b6f1f4717c5b6acfb0 Mon Sep 17 00:00:00 2001 From: tegwick Date: Wed, 24 Jun 2026 12:44:04 +0200 Subject: [PATCH] feat: TTL enforcement and operational hardening (SAND-WP-0009) Add TTL parser, expires_at on create, extend_ttl and expire/reap APIs, activity-core integration doc, repo classification, registry refresh, HTTP parity, and 69 tests. --- .repo-classification.yaml | 25 ++ SCOPE.md | 15 +- docs/integrations/activity-core.md | 42 +++ docs/meta-framework.md | 5 +- docs/migration-gaps.md | 2 +- docs/security.md | 23 ++ docs/ttl.md | 65 +++++ registry/README.md | 11 +- .../execution.sandbox-provision.md | 31 +- src/sandboxer/api/app.py | 29 +- src/sandboxer/cli.py | 29 ++ src/sandboxer/core/manager.py | 108 +++++++ src/sandboxer/lifecycle/expire.py | 77 +++++ src/sandboxer/lifecycle/state_hub.py | 4 +- src/sandboxer/lifecycle/ttl.py | 121 ++++++++ src/sandboxer/models.py | 13 + tests/test_api.py | 62 +++- tests/test_snapshots.py | 2 - tests/test_ttl.py | 265 ++++++++++++++++++ ...D-WP-0009-ttl-and-operational-hardening.md | 23 +- 20 files changed, 913 insertions(+), 39 deletions(-) create mode 100644 .repo-classification.yaml create mode 100644 docs/integrations/activity-core.md create mode 100644 docs/security.md create mode 100644 docs/ttl.md create mode 100644 src/sandboxer/lifecycle/expire.py create mode 100644 src/sandboxer/lifecycle/ttl.py create mode 100644 tests/test_ttl.py diff --git a/.repo-classification.yaml b/.repo-classification.yaml new file mode 100644 index 0000000..dcd381e --- /dev/null +++ b/.repo-classification.yaml @@ -0,0 +1,25 @@ +repo_classification: + standard: Repo Classification Standard + version: '1.0' + classified_at: '2026-06-24' + classified_by: codex + category: tooling + domain: infotech + secondary_domains: + - agents + capability_tags: + - sandbox + - isolation + - provision + - execution + - orchestration + business_stake: + - technology + - execution + - automation + business_mechanics: + - 
operation + - coordination + notes: > + Sandbox establishment meta-framework — profiles, extensions, routing, + lifecycle, TTL, and host telemetry for agentic and deterministic workloads. \ No newline at end of file diff --git a/SCOPE.md b/SCOPE.md index 7220c9d..725184a 100644 --- a/SCOPE.md +++ b/SCOPE.md @@ -44,7 +44,7 @@ orchestration from `create` remains deferred. ## In Scope - **Unified establishment API** — CLI v0 + HTTP stub (`create`, `get`, `list`, - `destroy`, `recreate`, `snapshot`, `restore`); `extend_ttl` planned + `destroy`, `recreate`, `snapshot`, `restore`, `extend-ttl`, `expire`) - **Profile catalog** — six profiles: compose e2e/checkpoint, sandbox canary, vm-haskell-build, saas-stub, burst-sandbox - **Extension platform** — `ext.compose-ssh`, `ext.vm-packer`, `ext.saas-stub`; @@ -128,12 +128,12 @@ own tunnels or CAs. - **Docs:** `meta-framework`, `extension-sdk`, `host-telemetry`, `routing`, `payments`, `snapshots`, `migration-gaps`, `migration-build-machines` - **Registry:** `capability.execution.sandbox-provision` indexed (draft) -- **Tests:** 54 pytest cases; `make check` green +- **Tests:** 69 pytest cases; `make check` green - **Siblings:** wise-validator `validate run` (SAND-WP-0003); the-custodian `make e2e REPO=` shim (SAND-WP-0004) Latest gap analysis: `history/2026-06-24-post-wp0007-intent-scope-gap-analysis.md` -Next workplan: **SAND-WP-0009** (TTL enforcement and operational hardening). +Latest workplan: **SAND-WP-0009** (TTL enforcement — finished). --- @@ -149,6 +149,9 @@ sandboxer get / list / destroy / recreate sandboxer snapshot [--name LABEL] sandboxer restore sandboxer snapshots list / snapshots get +sandboxer extend-ttl --duration 2h +sandboxer expire [--apply] +sandboxer create --ttl 2h ... 
sandboxer credits show / credits add sandboxer inspect host / inspect stale / reap-stale [--apply] make smoke-remote # CoulombCore compose smoke (SANDBOXER_HOST) @@ -168,14 +171,14 @@ cd ~/the-custodian && make e2e REPO=activity-core ## What Is Not Possible Yet -- TTL auto-expiry / `extend_ttl` enforcement +- ~~TTL auto-expiry / `extend_ttl` enforcement~~ — done (SAND-WP-0009) - Packer build orchestration from `create` (attach-only today) - Real E2B / Modal / Daytona adapters (in-repo stub only) - Cross-host snapshot transfer - Formal ops-bridge tunnel attachment in reachability descriptor - Dedicated sandboxer01 host (CoulombCore interim only today) - `reuse-surface validate` / federation publish workflow -- `.repo-classification.yaml` (State Hub C-24 hygiene) +- ~~`.repo-classification.yaml`~~ — done (SAND-WP-0009) - fin-hub billing export for metered usage --- @@ -239,6 +242,8 @@ see `registry/capabilities/execution.sandbox-provision.md`. | `docs/routing.md` | Backend selection strategies | | `docs/payments.md` | Credits and metering | | `docs/snapshots.md` | Checkpoint snapshot/restore | +| `docs/ttl.md` | TTL extend and expire/reap | +| `docs/security.md` | Blast-radius vs intent enforcement | | `docs/migration-gaps.md` | Legacy cutover status | | `docs/integrations/` | Consumer contracts | | `workplans/` | ADR-001 work structure | diff --git a/docs/integrations/activity-core.md b/docs/integrations/activity-core.md new file mode 100644 index 0000000..cf3842c --- /dev/null +++ b/docs/integrations/activity-core.md @@ -0,0 +1,42 @@ +# activity-core integration + +activity-core schedules bounded work on Railiance01. sand-boxer provides +**sandbox venues** with TTL enforcement; activity-core owns **when** expire runs. 
+ +## Scheduled TTL reap + +Run periodically (cron, Temporal activity, or CI): + +```bash +sandboxer expire --apply +``` + +HTTP equivalent: + +```http +POST /v1/sandboxes/expire?apply=true +``` + +Returns a list of `ExpireActionResult` entries (`dry-run`, `destroyed`, `failed`). + +## Lifecycle events + +Each expired sandbox emits a State Hub progress event: + +- `state`: `expired` (`event_type`: `milestone`) +- Followed by `destroying` → `destroyed` + +Event `detail` includes `ttl`, `expires_at`, and reachability fields. + +## What sand-boxer does not do + +- No Temporal workflows or activity-core code in this repo +- No push webhook to activity-core on expiry (poll/schedule only in v0) +- TTL parsing and destroy orchestration live in sand-boxer + +## Consumer pattern + +1. activity-core activity provisions via `sandboxer create` (or HTTP) +2. Work runs in the sandbox (glas-harness, wise-validator, etc.) +3. Scheduled `sandboxer expire --apply` reaps past-TTL sandboxes +4. State Hub records full lifecycle for audit \ No newline at end of file diff --git a/docs/meta-framework.md b/docs/meta-framework.md index 301a824..4218471 100644 --- a/docs/meta-framework.md +++ b/docs/meta-framework.md @@ -82,7 +82,7 @@ Extends the `build-agent` self-register pattern: generic sandbox identities carr | `create` | Provision from profile + inputs | **Yes** | | `get` | Inspect sandbox status | **Yes** | | `list` | List sandboxes (filter by consumer optional) | **Yes** | -| `extend_ttl` | Extend time-to-live | Stub | +| `extend_ttl` | Extend time-to-live | **Yes** | | `recreate` | Destroy and reprovision from stored seed | **Yes** | | `destroy` | Idempotent teardown | **Yes** | | `snapshot` / `restore` | Checkpoint workspace | **Yes** (compose-ssh, saas-stub) | @@ -97,6 +97,9 @@ HTTP surface (optional v0; CLI calls core library directly): - `POST /v1/sandboxes/{id}/snapshot` — checkpoint - `POST /v1/snapshots/{id}/restore` — restore - `GET /v1/snapshots` — list checkpoints +- `POST 
/v1/sandboxes/{id}/recreate` — recreate +- `PATCH /v1/sandboxes/{id}/ttl` — extend TTL +- `POST /v1/sandboxes/expire` — TTL reap (query `apply=true`) --- diff --git a/docs/migration-gaps.md b/docs/migration-gaps.md index 0ecf10f..df4d345 100644 --- a/docs/migration-gaps.md +++ b/docs/migration-gaps.md @@ -46,4 +46,4 @@ Deferred: Packer orchestration from API, `make remote-build` shim. | ~~SaaS extensions + payments v0~~ | SAND-WP-0006 — stub + routing + credits | | E2B / Modal real adapters | Post SAND-WP-0006 | | ~~Snapshot / restore~~ | SAND-WP-0007 — `docs/snapshots.md` | -| TTL enforcement + scheduled reap | **SAND-WP-0009** | \ No newline at end of file +| ~~TTL enforcement + scheduled reap~~ | SAND-WP-0009 — `docs/ttl.md` | \ No newline at end of file diff --git a/docs/security.md b/docs/security.md new file mode 100644 index 0000000..75f53e5 --- /dev/null +++ b/docs/security.md @@ -0,0 +1,23 @@ +# Security posture + +sand-boxer limits **blast radius** — it does not enforce **intent**. + +## What sandboxing provides + +- Isolated compose projects and workspace directories on placement hosts +- Profile-declared network default-deny (declarative in v0; enforcement varies by extension) +- TTL-bound disposable venues with automated expire/reap +- Consumer attribution (`adm` / `agt` / `atm`) on lifecycle events + +## What sandboxing does not provide + +- Protection against a malicious or compromised agent *inside* the sandbox +- Guarantee that an agent follows instructions or policy +- Replacement for secrets management (use OpenBao / operator paths via `warden route`) +- Production isolation on Railiance01 (sandboxes run on sandboxer01 / CoulombCore) + +Per INTENT: *"Honest security — sandboxing limits blast radius; it is not intent +enforcement."* + +Operators should combine sand-boxer with flex-auth, credential routing, and +harness-level controls for end-to-end safety. 
\ No newline at end of file diff --git a/docs/ttl.md b/docs/ttl.md new file mode 100644 index 0000000..1f84009 --- /dev/null +++ b/docs/ttl.md @@ -0,0 +1,65 @@ +# Time-to-live (TTL) + +Disposable-by-default sandboxes — SAND-WP-0009. + +## Semantics + +Each ready sandbox has: + +| Field | Meaning | +|-------|---------| +| `ttl` | Active duration string (e.g. `4h`) | +| `expires_at` | UTC timestamp when the sandbox should be reaped | + +On `create`, TTL comes from `SandboxCreateRequest.ttl` or the profile +`ttl.default`, capped at `ttl.max`. Anchor is `ready_at`. + +Duration format: positive integer + unit — `s`, `m`, `h`, `d` (e.g. `30m`, `4h`). + +## extend_ttl + +Extend a live sandbox (`ready` or `active`): + +```bash +sandboxer extend-ttl --duration 2h +``` + +HTTP: `PATCH /v1/sandboxes/{id}/ttl` with `{"duration": "2h"}`. + +Extension adds to the current `expires_at` (or now if already past). Total lifetime +from `ready_at` cannot exceed profile `ttl.max`. + +## expire / reap + +TTL reap is distinct from host inventory `reap-stale`: + +| Command | Purpose | +|---------|---------| +| `sandboxer expire` | Sandboxes past `expires_at` or profile `ttl.idle_reap` | +| `sandboxer reap-stale` | Orphan host resources vs store inventory | + +```bash +sandboxer expire # dry-run (default) +sandboxer expire --apply # mark expired, destroy +``` + +HTTP: `POST /v1/sandboxes/expire?apply=true` + +Flow on `--apply`: + +1. Transition to `expired` (State Hub milestone) +2. `destroy` (idempotent teardown) + +## Profile fields + +```yaml +ttl: + default: 4h + max: 24h + idle_reap: null # optional; reap when updated_at + idle_reap elapsed +``` + +## activity-core + +Scheduled jobs should invoke `sandboxer expire --apply` (or HTTP equivalent). +See `docs/integrations/activity-core.md`. 
\ No newline at end of file diff --git a/registry/README.md b/registry/README.md index 569abe9..1d9c1f6 100644 --- a/registry/README.md +++ b/registry/README.md @@ -6,7 +6,16 @@ Markdown-first capability index for federation and reuse planning. 1. Copy a capability entry template (see reuse-surface `templates/capability-entry.template.md`). 2. Add the row to `indexes/capabilities.yaml`. -3. Run `reuse-surface validate` from a checkout with the CLI installed. +3. Run `reuse-surface validate` from a checkout with the CLI installed: + + ```bash + cd ~/reuse-surface + reuse-surface validate --repo ~/sand-boxer + ``` + 4. Merge to `main` and verify publish with `reuse-surface establish --publish-check`. +sand-boxer v0 maturity (post SAND-WP-0009): D5/A4/C4 — see +`registry/capabilities/execution.sandbox-provision.md`. + Federation contract: reuse-surface `docs/RegistryFederation.md`. diff --git a/registry/capabilities/execution.sandbox-provision.md b/registry/capabilities/execution.sandbox-provision.md index cab453d..0b63f08 100644 --- a/registry/capabilities/execution.sandbox-provision.md +++ b/registry/capabilities/execution.sandbox-provision.md @@ -9,32 +9,35 @@ tags: [sandbox, isolation, provision, e2e, agentic, execution, profile] maturity: discovery: - current: D4 + current: D5 target: D6 confidence: high rationale: > - Charter (INTENT.md), meta-framework spec (docs/meta-framework.md), and - research synthesis define scope. First extension (ext.compose-ssh) in progress. + Charter (INTENT.md), meta-framework spec, extension SDK, integration docs, + and research synthesis. Capability indexed in registry/. availability: - current: A2 + current: A4 target: A5 - confidence: medium + confidence: high rationale: > - CLI v0 and ext.compose-ssh scaffold land in SAND-WP-0002. SaaS extensions - and payments deferred. + CLI v0 (create/destroy/snapshot/TTL), HTTP API, CoulombCore remote smoke. + SaaS stub + routing + credits shipped (SAND-WP-0006). 
external_evidence: completeness: - level: C2 - name: Partial - confidence: medium + level: C4 + name: Substantial + confidence: high basis: scope_vs_intent_and_consumer_expectations satisfied_expectations: - - profile-based create/destroy via CLI + - profile-based create/destroy/snapshot/restore via CLI + - TTL extend and expire/reap (SAND-WP-0009) - State Hub lifecycle events on transitions + - wise-validator and the-custodian migration arc complete + - extension SDK with compose-ssh, vm-packer attach, saas-stub broken_expectations: - - Real E2B/Modal adapters not yet built (saas-stub + credits v0 done) - - wise-validator migration not complete + - Real E2B/Modal adapters not yet built + - sandboxer01 dedicated host not live (CoulombCore interim) out_of_scope_expectations: - agent harness and tool orchestration (glas-harness) - e2e test semantics (wise-validator) @@ -42,4 +45,4 @@ external_evidence: consumption_modes: - CLI (sandboxer) - core library (Python) - - HTTP API (planned) \ No newline at end of file + - HTTP API (uvicorn sandboxer.api.app:app) \ No newline at end of file diff --git a/src/sandboxer/api/app.py b/src/sandboxer/api/app.py index 8138425..1ef147d 100644 --- a/src/sandboxer/api/app.py +++ b/src/sandboxer/api/app.py @@ -6,6 +6,8 @@ from fastapi import FastAPI, HTTPException from sandboxer.core.manager import SandboxManager from sandboxer.models import ( + ExpireActionResult, + ExtendTtlRequest, SandboxCreateRequest, SandboxStatus, SnapshotRecord, @@ -82,4 +84,29 @@ def get_snapshot(snapshot_id: str) -> SnapshotRecord: record = _manager.get_snapshot(snapshot_id) if not record: raise HTTPException(status_code=404, detail="snapshot not found") - return record \ No newline at end of file + return record + + +@app.post("/v1/sandboxes/{sandbox_id}/recreate", response_model=SandboxStatus) +def recreate_sandbox(sandbox_id: str) -> SandboxStatus: + try: + return _manager.recreate(sandbox_id) + except KeyError as exc: + raise 
HTTPException(status_code=404, detail=str(exc)) from exc + except Exception as exc: + raise HTTPException(status_code=400, detail=str(exc)) from exc + + +@app.patch("/v1/sandboxes/{sandbox_id}/ttl", response_model=SandboxStatus) +def extend_sandbox_ttl(sandbox_id: str, request: ExtendTtlRequest) -> SandboxStatus: + try: + return _manager.extend_ttl(sandbox_id, request.duration) + except KeyError as exc: + raise HTTPException(status_code=404, detail=str(exc)) from exc + except (RuntimeError, ValueError) as exc: + raise HTTPException(status_code=400, detail=str(exc)) from exc + + +@app.post("/v1/sandboxes/expire", response_model=list[ExpireActionResult]) +def expire_sandboxes(apply: bool = False) -> list[ExpireActionResult]: + return _manager.expire(apply=apply) \ No newline at end of file diff --git a/src/sandboxer/cli.py b/src/sandboxer/cli.py index 925e2b7..f8ec401 100644 --- a/src/sandboxer/cli.py +++ b/src/sandboxer/cli.py @@ -90,6 +90,7 @@ def sandbox_create( actor: Annotated[str, typer.Option(help="Consumer actor type")] = "adm", project: Annotated[str, typer.Option(help="Calling project id")] = "sand-boxer", host: Annotated[str | None, typer.Option(help="Override placement host")] = None, + ttl: Annotated[str | None, typer.Option(help="TTL override (e.g. 4h)")] = None, ) -> None: """Provision a sandbox. No args → canary self-deploy of sand-boxer.""" parsed = _parse_inputs(input or []) @@ -98,6 +99,7 @@ def sandbox_create( profile=resolved_profile, inputs=resolved_inputs, consumer=Consumer(actor=ActorType(actor), project=project), + ttl=ttl, ) manager = SandboxManager() try: @@ -196,6 +198,33 @@ def snapshots_get(snapshot_id: str) -> None: _print_json(record.model_dump(mode="json")) +@app.command("extend-ttl") +def sandbox_extend_ttl( + sandbox_id: str, + duration: Annotated[str, typer.Option("--duration", help="Extension duration (e.g. 
2h)")], +) -> None: + """Extend sandbox time-to-live (capped at profile max).""" + manager = SandboxManager() + try: + status = manager.extend_ttl(sandbox_id, duration) + except (KeyError, RuntimeError, ValueError) as exc: + typer.echo(f"Error: {exc}", err=True) + raise typer.Exit(code=1) from exc + _print_json(status.model_dump(mode="json")) + + +@app.command("expire") +def sandbox_expire( + apply: Annotated[bool, typer.Option("--apply", help="Destroy expired sandboxes")] = False, +) -> None: + """Report or destroy sandboxes past TTL or idle-reap threshold.""" + manager = SandboxManager() + results = manager.expire(apply=apply) + mode = "apply" if apply else "dry-run" + typer.echo(f"expire ({mode}): {len(results)} candidate(s)", err=True) + _print_json([r.model_dump(mode="json") for r in results]) + + @app.command("recreate") def sandbox_recreate(sandbox_id: str) -> None: """Destroy and reprovision from stored inputs.""" diff --git a/src/sandboxer/core/manager.py b/src/sandboxer/core/manager.py index 87b2933..2cb5cc0 100644 --- a/src/sandboxer/core/manager.py +++ b/src/sandboxer/core/manager.py @@ -3,10 +3,17 @@ from __future__ import annotations from sandboxer.extensions.registry import load_extension, resolve_backend +from sandboxer.lifecycle.expire import ( + ExpireCandidate, + apply_expired_state, + find_expire_candidates, +) from sandboxer.lifecycle.state_hub import emit_lifecycle_event, event_type_for_state from sandboxer.lifecycle.store import SandboxStore, utcnow +from sandboxer.lifecycle.ttl import expires_at_from, extend_expires_at, resolve_initial_ttl from sandboxer.models import ( Consumer, + ExpireActionResult, MeterRecord, Reachability, SandboxCreateRequest, @@ -60,6 +67,18 @@ class SandboxManager: return extension.config.get("provider", "saas") return resolve_host(profile, override=host_override) + @staticmethod + def _assign_ttl( + status: SandboxStatus, + profile, + *, + request_ttl: str | None, + ) -> None: + ttl_str = 
resolve_initial_ttl(profile, request_ttl) + anchor = status.ready_at or utcnow() + status.ttl = ttl_str + status.expires_at = expires_at_from(anchor, ttl_str) + def create(self, request: SandboxCreateRequest, *, host: str | None = None) -> SandboxStatus: profile = load_profile(request.profile) extension = resolve_extension(profile, request.inputs, host_override=host) @@ -119,6 +138,7 @@ class SandboxManager: status.state = SandboxState.READY status.ready_at = utcnow() status.updated_at = status.ready_at + self._assign_ttl(status, profile, request_ttl=request.ttl) if wants_telemetry and provision_before: provision_after = collect_host_snapshot(resolved_host) @@ -224,11 +244,98 @@ class SandboxManager: profile=existing.profile_id, inputs=dict(existing.inputs), consumer=existing.consumer, + ttl=existing.ttl, ) if existing.state != SandboxState.DESTROYED: self.destroy(sandbox_id) return self.create(request, host=existing.host) + def extend_ttl(self, sandbox_id: str, duration: str) -> SandboxStatus: + status = self.store.get(sandbox_id) + if not status: + raise KeyError(f"Sandbox not found: {sandbox_id}") + if status.state not in (SandboxState.READY, SandboxState.ACTIVE): + raise RuntimeError( + f"Cannot extend TTL for sandbox in state {status.state.value}" + ) + if not status.expires_at or not status.ready_at: + raise RuntimeError("Sandbox has no expiry metadata") + + profile = load_profile(status.profile_id) + new_expires, applied = extend_expires_at( + status.expires_at, + anchor=status.ready_at, + extension=duration, + max_duration=profile.ttl.max, + ) + status.expires_at = new_expires + status.ttl = applied + status.updated_at = utcnow() + self.store.save(status) + emit_lifecycle_event( + status, + summary=f"TTL extended by {applied} (expires {new_expires.isoformat()})", + event_type="note", + ) + return status + + def expire( + self, + *, + apply: bool = False, + now=None, + ) -> list[ExpireActionResult]: + candidates = find_expire_candidates(self.store, now=now) 
+ results: list[ExpireActionResult] = [] + + for candidate in candidates: + if not apply: + results.append( + ExpireActionResult( + sandbox_id=candidate.sandbox_id, + reason=candidate.reason, + action="dry-run", + ) + ) + continue + + try: + status = self.store.get(candidate.sandbox_id) + if not status or status.state not in ( + SandboxState.READY, + SandboxState.ACTIVE, + ): + continue + status = apply_expired_state(status, now=now) + self.store.save(status) + emit_lifecycle_event( + status, + summary=f"Sandbox expired ({candidate.reason})", + event_type=event_type_for_state(status.state), + ) + self.destroy(candidate.sandbox_id) + results.append( + ExpireActionResult( + sandbox_id=candidate.sandbox_id, + reason=candidate.reason, + action="destroyed", + ) + ) + except Exception as exc: + results.append( + ExpireActionResult( + sandbox_id=candidate.sandbox_id, + reason=candidate.reason, + action="failed", + error=str(exc), + ) + ) + + return results + + def list_expire_candidates(self, *, now=None) -> list[ExpireCandidate]: + return find_expire_candidates(self.store, now=now) + def snapshot(self, sandbox_id: str, *, name: str | None = None) -> SnapshotRecord: status = self.store.get(sandbox_id) if not status: @@ -345,6 +452,7 @@ class SandboxManager: status.state = SandboxState.READY status.ready_at = utcnow() status.updated_at = status.ready_at + self._assign_ttl(status, profile, request_ttl=None) self.store.save(status) emit_lifecycle_event( status, diff --git a/src/sandboxer/lifecycle/expire.py b/src/sandboxer/lifecycle/expire.py new file mode 100644 index 0000000..0d2fb39 --- /dev/null +++ b/src/sandboxer/lifecycle/expire.py @@ -0,0 +1,77 @@ +"""TTL and idle-reap expiry candidate selection.""" + +from __future__ import annotations + +from dataclasses import dataclass +from datetime import UTC, datetime +from typing import Literal + +from sandboxer.lifecycle.store import SandboxStore +from sandboxer.lifecycle.ttl import is_idle_expired, is_past_expiry +from 
sandboxer.models import SandboxState, SandboxStatus +from sandboxer.profiles.loader import load_profile + +ExpireReason = Literal["ttl", "idle"] + + +@dataclass +class ExpireCandidate: + sandbox_id: str + profile_id: str + reason: ExpireReason + expires_at: datetime | None = None + updated_at: datetime | None = None + + +_LIVE_STATES = frozenset({SandboxState.READY, SandboxState.ACTIVE}) + + +def find_expire_candidates( + store: SandboxStore, + *, + now: datetime | None = None, +) -> list[ExpireCandidate]: + ref = now or datetime.now(UTC) + candidates: list[ExpireCandidate] = [] + seen: set[str] = set() + + for status in store.list_all(): + if status.state not in _LIVE_STATES: + continue + + if is_past_expiry(status.expires_at, now=ref): + candidates.append( + ExpireCandidate( + sandbox_id=status.sandbox_id, + profile_id=status.profile_id, + reason="ttl", + expires_at=status.expires_at, + ) + ) + seen.add(status.sandbox_id) + continue + + try: + profile = load_profile(status.profile_id) + except FileNotFoundError: + continue + + if is_idle_expired(status.updated_at, profile.ttl.idle_reap, now=ref): + candidates.append( + ExpireCandidate( + sandbox_id=status.sandbox_id, + profile_id=status.profile_id, + reason="idle", + updated_at=status.updated_at, + ) + ) + seen.add(status.sandbox_id) + + return sorted(candidates, key=lambda c: c.sandbox_id) + + +def apply_expired_state(status: SandboxStatus, *, now: datetime | None = None) -> SandboxStatus: + ref = now or datetime.now(UTC) + status.state = SandboxState.EXPIRED + status.updated_at = ref + return status \ No newline at end of file diff --git a/src/sandboxer/lifecycle/state_hub.py b/src/sandboxer/lifecycle/state_hub.py index c9b1087..99a086e 100644 --- a/src/sandboxer/lifecycle/state_hub.py +++ b/src/sandboxer/lifecycle/state_hub.py @@ -38,6 +38,8 @@ def emit_lifecycle_event( "consumer": status.consumer.model_dump(), "actor_type": status.consumer.actor.value, "state": status.state.value, + "ttl": status.ttl, + 
"expires_at": status.expires_at.isoformat() if status.expires_at else None, "reachability": status.reachability.model_dump() if status.reachability else None, "telemetry": status.telemetry, "timestamps": { @@ -58,6 +60,6 @@ def emit_lifecycle_event( def event_type_for_state(state: SandboxState) -> str: - if state in (SandboxState.READY, SandboxState.DESTROYED): + if state in (SandboxState.READY, SandboxState.DESTROYED, SandboxState.EXPIRED): return "milestone" return "note" \ No newline at end of file diff --git a/src/sandboxer/lifecycle/ttl.py b/src/sandboxer/lifecycle/ttl.py new file mode 100644 index 0000000..2b7eef7 --- /dev/null +++ b/src/sandboxer/lifecycle/ttl.py @@ -0,0 +1,121 @@ +"""TTL duration parsing and expiry calculation.""" + +from __future__ import annotations + +import re +from datetime import UTC, datetime, timedelta + +from sandboxer.models import Profile + +_DURATION_RE = re.compile(r"^(\d+)([smhd])$", re.IGNORECASE) +_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400} + + +def parse_duration(value: str) -> timedelta: + """Parse a duration string like ``4h``, ``30m``, ``1d``.""" + raw = value.strip() + match = _DURATION_RE.match(raw) + if not match: + raise ValueError(f"Invalid duration: {value!r} (expected e.g. 
4h, 30m, 1d)") + amount = int(match.group(1)) + if amount <= 0: + raise ValueError(f"Duration must be positive: {value!r}") + unit = match.group(2).lower() + return timedelta(seconds=amount * _UNIT_SECONDS[unit]) + + +def duration_seconds(value: str) -> int: + return int(parse_duration(value).total_seconds()) + + +def resolve_initial_ttl(profile: Profile, request_ttl: str | None) -> str: + """Pick create TTL from request override or profile default, capped at profile max.""" + requested = request_ttl or profile.ttl.default + return cap_duration(requested, profile.ttl.max) + + +def cap_duration(requested: str, maximum: str) -> str: + """Return ``requested`` if within ``maximum``; otherwise return ``maximum``.""" + req_s = duration_seconds(requested) + max_s = duration_seconds(maximum) + if req_s > max_s: + return maximum + return requested + + +def expires_at_from(base: datetime, duration: str) -> datetime: + if base.tzinfo is None: + base = base.replace(tzinfo=UTC) + return base + parse_duration(duration) + + +def cap_expires_at( + candidate: datetime, + *, + anchor: datetime, + max_duration: str, +) -> datetime: + """Cap ``candidate`` so it does not exceed ``anchor + max_duration``.""" + ceiling = expires_at_from(anchor, max_duration) + if candidate.tzinfo is None: + candidate = candidate.replace(tzinfo=UTC) + return min(candidate, ceiling) + + +def extend_expires_at( + current: datetime, + *, + anchor: datetime, + extension: str, + max_duration: str, +) -> tuple[datetime, str]: + """Add ``extension`` to ``current`` and cap at ``anchor + max_duration``.""" + now = datetime.now(UTC) + base = max(current, now) + proposed = expires_at_from(base, extension) + capped = cap_expires_at(proposed, anchor=anchor, max_duration=max_duration) + applied = extension + if capped < proposed: + remaining = capped - base + if remaining.total_seconds() <= 0: + raise ValueError(f"Cannot extend: already at profile max ({max_duration})") + applied = format_timedelta(remaining) + return 
capped, applied + + +def format_timedelta(delta: timedelta) -> str: + seconds = int(delta.total_seconds()) + if seconds <= 0: + raise ValueError("Duration must be positive") + if seconds >= 86400 and seconds % 86400 == 0: + return f"{seconds // 86400}d" + if seconds >= 3600 and seconds % 3600 == 0: + return f"{seconds // 3600}h" + if seconds >= 60 and seconds % 60 == 0: + return f"{seconds // 60}m" + return f"{seconds}s" + + +def is_past_expiry(expires_at: datetime | None, *, now: datetime | None = None) -> bool: + if expires_at is None: + return False + ref = now or datetime.now(UTC) + if expires_at.tzinfo is None: + expires_at = expires_at.replace(tzinfo=UTC) + return expires_at <= ref + + +def is_idle_expired( + updated_at: datetime, + idle_reap: str | None, + *, + now: datetime | None = None, +) -> bool: + if not idle_reap: + return False + ref = now or datetime.now(UTC) + if updated_at.tzinfo is None: + updated_at = updated_at.replace(tzinfo=UTC) + return updated_at + parse_duration(idle_reap) <= ref + + diff --git a/src/sandboxer/models.py b/src/sandboxer/models.py index 27e1660..6963161 100644 --- a/src/sandboxer/models.py +++ b/src/sandboxer/models.py @@ -164,6 +164,8 @@ class SandboxStatus(BaseModel): host: str | None = None reachability: Reachability | None = None inputs: dict[str, str] = Field(default_factory=dict) + ttl: str | None = None + expires_at: datetime | None = None error: str | None = None meter: MeterRecord | None = None telemetry: dict | None = None # IntrospectionReport JSON when canary @@ -173,6 +175,17 @@ class SandboxStatus(BaseModel): destroyed_at: datetime | None = None +class ExtendTtlRequest(BaseModel): + duration: str + + +class ExpireActionResult(BaseModel): + sandbox_id: str + reason: Literal["ttl", "idle"] + action: Literal["dry-run", "expired", "destroyed", "failed"] + error: str | None = None + + class SnapshotRestoreRequest(BaseModel): host: str | None = None consumer: Consumer | None = None diff --git a/tests/test_api.py 
b/tests/test_api.py index b118538..6c23061 100644 --- a/tests/test_api.py +++ b/tests/test_api.py @@ -88,4 +88,64 @@ def test_restore_snapshot() -> None: json={"consumer": {"actor": "adm", "project": "sand-boxer"}}, ) assert resp.status_code == 200 - assert resp.json()["sandbox_id"] == "restored1" \ No newline at end of file + assert resp.json()["sandbox_id"] == "restored1" + + +def test_recreate_sandbox() -> None: + from datetime import UTC, datetime + + status = SandboxStatus( + sandbox_id="new12345", + profile_id="profile.compose-e2e", + extension_id="ext.compose-ssh", + state=SandboxState.READY, + consumer=Consumer(actor=ActorType.ADM, project="sand-boxer"), + created_at=datetime.now(UTC), + updated_at=datetime.now(UTC), + ) + with patch("sandboxer.api.app._manager") as mgr: + mgr.recreate.return_value = status + client = TestClient(app) + resp = client.post("/v1/sandboxes/abc12345/recreate") + assert resp.status_code == 200 + assert resp.json()["sandbox_id"] == "new12345" + + +def test_extend_ttl() -> None: + from datetime import UTC, datetime + + now = datetime.now(UTC) + status = SandboxStatus( + sandbox_id="abc12345", + profile_id="profile.compose-e2e", + extension_id="ext.compose-ssh", + state=SandboxState.READY, + consumer=Consumer(actor=ActorType.ADM, project="sand-boxer"), + ttl="2h", + expires_at=now, + created_at=now, + updated_at=now, + ready_at=now, + ) + with patch("sandboxer.api.app._manager") as mgr: + mgr.extend_ttl.return_value = status + client = TestClient(app) + resp = client.patch( + "/v1/sandboxes/abc12345/ttl", + json={"duration": "2h"}, + ) + assert resp.status_code == 200 + assert resp.json()["ttl"] == "2h" + + +def test_expire_sandboxes() -> None: + from sandboxer.models import ExpireActionResult + + with patch("sandboxer.api.app._manager") as mgr: + mgr.expire.return_value = [ + ExpireActionResult(sandbox_id="x", reason="ttl", action="dry-run") + ] + client = TestClient(app) + resp = client.post("/v1/sandboxes/expire") + assert 
resp.status_code == 200 + assert resp.json()[0]["action"] == "dry-run" \ No newline at end of file diff --git a/tests/test_snapshots.py b/tests/test_snapshots.py index 82c0db3..bae3b28 100644 --- a/tests/test_snapshots.py +++ b/tests/test_snapshots.py @@ -126,9 +126,7 @@ def test_manager_snapshot_and_restore(store: SandboxStore, snapshots: SnapshotSt with ( patch("sandboxer.core.manager.resolve_backend", return_value=backend), - patch("sandboxer.core.manager.load_extension"), patch("sandboxer.core.manager.emit_lifecycle_event", return_value=None), - patch("sandboxer.core.manager.load_profile"), patch("sandboxer.core.manager.resolve_host", return_value="coulombcore"), ): record = manager.snapshot("test1234", name="pre-test") diff --git a/tests/test_ttl.py b/tests/test_ttl.py new file mode 100644 index 0000000..a47f555 --- /dev/null +++ b/tests/test_ttl.py @@ -0,0 +1,265 @@ +"""TTL parsing, extend, and expire tests.""" + +from __future__ import annotations + +from datetime import UTC, datetime, timedelta +from pathlib import Path +from unittest.mock import patch + +import pytest + +from sandboxer.core.manager import SandboxManager +from sandboxer.lifecycle.expire import find_expire_candidates +from sandboxer.lifecycle.store import SandboxStore +from sandboxer.lifecycle.ttl import ( + cap_duration, + extend_expires_at, + format_timedelta, + is_idle_expired, + is_past_expiry, + parse_duration, + resolve_initial_ttl, +) +from sandboxer.models import ( + ActorType, + Consumer, + Profile, + Reachability, + SandboxCreateRequest, + SandboxState, + SandboxStatus, +) + + +def _profile(**ttl_overrides) -> Profile: + ttl_data = {"default": "4h", "max": "24h", "idle_reap": None} + ttl_data.update(ttl_overrides) + return Profile.model_validate( + { + "id": "profile.compose-e2e", + "version": "1.0.0", + "extension": "ext.compose-ssh", + "ttl": ttl_data, + } + ) + + +def test_parse_duration_units() -> None: + assert parse_duration("30m") == timedelta(minutes=30) + assert 
parse_duration("4h") == timedelta(hours=4) + assert parse_duration("1d") == timedelta(days=1) + assert parse_duration("90s") == timedelta(seconds=90) + + +def test_parse_duration_invalid() -> None: + with pytest.raises(ValueError, match="Invalid duration"): + parse_duration("4hours") + with pytest.raises(ValueError, match="positive"): + parse_duration("0h") + + +def test_cap_duration() -> None: + assert cap_duration("4h", "24h") == "4h" + assert cap_duration("48h", "24h") == "24h" + + +def test_resolve_initial_ttl() -> None: + profile = _profile() + assert resolve_initial_ttl(profile, None) == "4h" + assert resolve_initial_ttl(profile, "2h") == "2h" + assert resolve_initial_ttl(profile, "48h") == "24h" + + +def test_extend_expires_at_caps_at_max() -> None: + anchor = datetime(2026, 6, 24, 10, 0, tzinfo=UTC) + current = anchor + timedelta(hours=23) + new_expires, applied = extend_expires_at( + current, + anchor=anchor, + extension="4h", + max_duration="24h", + ) + assert new_expires == anchor + timedelta(hours=24) + assert applied == "1h" + + +def test_extend_expires_at_at_max_raises() -> None: + anchor = datetime(2026, 6, 24, 10, 0, tzinfo=UTC) + current = anchor + timedelta(hours=24) + with pytest.raises(ValueError, match="profile max"): + extend_expires_at( + current, + anchor=anchor, + extension="1h", + max_duration="24h", + ) + + +def test_format_timedelta() -> None: + assert format_timedelta(timedelta(hours=2)) == "2h" + assert format_timedelta(timedelta(minutes=30)) == "30m" + + +def test_is_past_expiry_and_idle() -> None: + now = datetime(2026, 6, 24, 12, 0, tzinfo=UTC) + assert is_past_expiry(now - timedelta(minutes=1), now=now) + assert not is_past_expiry(now + timedelta(minutes=1), now=now) + updated = now - timedelta(hours=2) + assert is_idle_expired(updated, "1h", now=now) + assert not is_idle_expired(updated, "4h", now=now) + + +@pytest.fixture +def store(tmp_path: Path) -> SandboxStore: + return SandboxStore(path=tmp_path / "sandboxes.json") + + +def 
test_find_expire_candidates_ttl_and_idle(store: SandboxStore) -> None: + now = datetime(2026, 6, 24, 12, 0, tzinfo=UTC) + store.save( + SandboxStatus( + sandbox_id="expired1", + profile_id="profile.compose-e2e", + extension_id="ext.compose-ssh", + state=SandboxState.READY, + consumer=Consumer(actor=ActorType.ADM, project="sand-boxer"), + expires_at=now - timedelta(minutes=5), + created_at=now - timedelta(hours=5), + updated_at=now - timedelta(hours=5), + ready_at=now - timedelta(hours=5), + ) + ) + store.save( + SandboxStatus( + sandbox_id="idle1", + profile_id="profile.sandbox-canary", + extension_id="ext.compose-ssh", + state=SandboxState.READY, + consumer=Consumer(actor=ActorType.ADM, project="sand-boxer"), + expires_at=now + timedelta(hours=2), + created_at=now - timedelta(hours=5), + updated_at=now - timedelta(hours=3), + ready_at=now - timedelta(hours=5), + ) + ) + + with patch("sandboxer.lifecycle.expire.load_profile") as load_profile: + load_profile.side_effect = lambda pid: _profile( + idle_reap="2h" if pid == "profile.sandbox-canary" else None + ) + candidates = find_expire_candidates(store, now=now) + + reasons = {c.sandbox_id: c.reason for c in candidates} + assert reasons["expired1"] == "ttl" + assert reasons["idle1"] == "idle" + + +class FakeBackend: + def provision(self, profile, inputs, host): + return { + "sandbox_id": "test1234", + "host": host, + "remote_dir": "/tmp/sandboxer/test1234", + "compose_project": "sbx-e2e-test1234", + "compose_file": "docker-compose.yml", + "ssh_user": "root", + } + + def wait_ready(self, handle): + return { + "ssh": f"root@{handle['host']}", + "remote_dir": handle["remote_dir"], + "compose_project": handle["compose_project"], + "host": handle["host"], + } + + def teardown(self, handle): + return {"compose_removed": "True", "remote_dir_removed": "True"} + + +def test_manager_create_sets_expires_at(store: SandboxStore) -> None: + manager = SandboxManager(store=store) + request = SandboxCreateRequest( + 
profile="profile.compose-e2e", + inputs={"repo": "/tmp/repo"}, + consumer=Consumer(actor=ActorType.ADM, project="sand-boxer"), + ttl="2h", + ) + fake = FakeBackend() + with ( + patch("sandboxer.core.manager.resolve_backend", return_value=fake), + patch("sandboxer.core.manager.emit_lifecycle_event", return_value=None), + patch("sandboxer.core.manager.resolve_host", return_value="coulombcore"), + ): + status = manager.create(request) + assert status.ttl == "2h" + assert status.expires_at is not None + assert status.ready_at is not None + assert status.expires_at > status.ready_at + + +def test_manager_extend_ttl(store: SandboxStore) -> None: + now = datetime.now(UTC) + store.save( + SandboxStatus( + sandbox_id="live1234", + profile_id="profile.compose-e2e", + extension_id="ext.compose-ssh", + state=SandboxState.READY, + consumer=Consumer(actor=ActorType.ADM, project="sand-boxer"), + host="coulombcore", + reachability=Reachability(remote_dir="/tmp/x", host="coulombcore"), + ttl="4h", + expires_at=now + timedelta(hours=1), + created_at=now - timedelta(hours=1), + updated_at=now, + ready_at=now - timedelta(hours=1), + ) + ) + manager = SandboxManager(store=store) + with patch("sandboxer.core.manager.emit_lifecycle_event", return_value=None): + extended = manager.extend_ttl("live1234", "2h") + assert extended.expires_at > now + timedelta(hours=1) + + +def test_manager_expire_dry_run_and_apply(store: SandboxStore) -> None: + now = datetime.now(UTC) + store.save( + SandboxStatus( + sandbox_id="gone5678", + profile_id="profile.compose-e2e", + extension_id="ext.compose-ssh", + state=SandboxState.READY, + consumer=Consumer(actor=ActorType.ADM, project="sand-boxer"), + host="coulombcore", + reachability=Reachability( + remote_dir="/tmp/sandboxer/gone5678", + compose_project="sbx-e2e-gone5678", + host="coulombcore", + ), + inputs={"compose_file": "docker-compose.yml"}, + ttl="1h", + expires_at=now - timedelta(minutes=1), + created_at=now - timedelta(hours=2), + updated_at=now - 
timedelta(hours=2), + ready_at=now - timedelta(hours=2), + ) + ) + manager = SandboxManager(store=store) + fake = FakeBackend() + + dry = manager.expire(apply=False, now=now) + assert len(dry) == 1 + assert dry[0].action == "dry-run" + assert manager.get("gone5678").state == SandboxState.READY + + with ( + patch("sandboxer.core.manager.resolve_backend", return_value=fake), + patch("sandboxer.core.manager.emit_lifecycle_event", return_value=None), + patch("sandboxer.core.manager.load_extension"), + patch("sandboxer.core.manager.load_profile"), + ): + applied = manager.expire(apply=True, now=now) + + assert applied[0].action == "destroyed" + assert manager.get("gone5678").state == SandboxState.DESTROYED \ No newline at end of file diff --git a/workplans/SAND-WP-0009-ttl-and-operational-hardening.md b/workplans/SAND-WP-0009-ttl-and-operational-hardening.md index ab4f6b7..87ae4d2 100644 --- a/workplans/SAND-WP-0009-ttl-and-operational-hardening.md +++ b/workplans/SAND-WP-0009-ttl-and-operational-hardening.md @@ -4,7 +4,7 @@ type: workplan title: "TTL enforcement and operational hardening" domain: infotech repo: sand-boxer -status: ready +status: finished owner: codex topic_slug: custodian created: "2026-06-24" @@ -30,7 +30,7 @@ consumer profiles), SAND-WP-0012 (Packer orchestration) ```task id: SAND-WP-0009-T01 -status: todo +status: done priority: high state_hub_task_id: "44cee754-2874-40eb-9cb3-168e5bc8dd54" ``` @@ -43,7 +43,7 @@ max-cap enforcement. ```task id: SAND-WP-0009-T02 -status: todo +status: done priority: high state_hub_task_id: "a5a6503c-56a3-4876-8211-e06b9eed6292" ``` @@ -56,7 +56,7 @@ Persist in `SandboxStore`. Emit expiry in State Hub `detail`. ```task id: SAND-WP-0009-T03 -status: todo +status: done priority: high state_hub_task_id: "ff32a3e5-0bf6-479c-8373-d601588461e7" ``` @@ -69,7 +69,7 @@ HTTP: `PATCH /v1/sandboxes/{id}/ttl` with body `{"duration": "2h"}`. 
```task id: SAND-WP-0009-T04 -status: todo +status: done priority: high state_hub_task_id: "ce597f28-a2f3-44ed-8e85-f8bd254bc4ce" ``` @@ -83,7 +83,7 @@ with existing `reap-stale` docs (host inventory vs TTL are distinct concerns). ```task id: SAND-WP-0009-T05 -status: todo +status: done priority: medium state_hub_task_id: "9ad34d90-bbc7-4ede-8549-f4291e27ba22" ``` @@ -96,7 +96,7 @@ state; no Temporal code in this repo. ```task id: SAND-WP-0009-T06 -status: todo +status: done priority: medium state_hub_task_id: "ffde8196-18e3-4762-8cfd-1b69874e51e1" ``` @@ -110,7 +110,7 @@ run validate if reuse-surface CLI available in environment. ```task id: SAND-WP-0009-T07 -status: todo +status: done priority: medium state_hub_task_id: "69b192c7-8599-46e7-bb63-8457bfb72a81" ``` @@ -122,21 +122,20 @@ Align OpenAPI with CLI surface from SAND-WP-0007. ```task id: SAND-WP-0009-T08 -status: todo +status: done priority: medium state_hub_task_id: "69d1a23f-b3a3-4aa7-846c-e953f02977f3" ``` `docs/ttl.md` — semantics, extend, expire, profile fields. Update `docs/meta-framework.md`, `SCOPE.md`, `docs/migration-gaps.md`. Brief security -note in `docs/runbooks/` or `docs/security.md`: sandbox limits blast radius, not -intent enforcement (INTENT design principle). +note in `docs/security.md`: sandbox limits blast radius, not intent enforcement. ## Tests ```task id: SAND-WP-0009-T09 -status: todo +status: done priority: high state_hub_task_id: "0683b09a-0dd9-4880-9bd0-13003e3621a6" ```
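Reviewer note: the hunks above exercise `parse_duration` and `cap_duration` from `src/sandboxer/lifecycle/ttl.py`, but the bodies of those functions are not visible in this part of the patch. The following is a minimal sketch of the behaviour the tests in `tests/test_ttl.py` pin down (units `s`/`m`/`h`/`d`, rejection of `4hours` and `0h`, capping `48h` at `24h`). The function names with the `_sketch` suffix, the regular expression, and the unit table are assumptions made for illustration, not the repository's actual implementation.

```python
import re
from datetime import timedelta

# Hypothetical unit table: the tests only show s, m, h and d being accepted.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}
_DURATION_RE = re.compile(r"(\d+)([smhd])")


def parse_duration_sketch(text: str) -> timedelta:
    """Parse strings such as '30m' or '4h' into a timedelta (illustrative only)."""
    match = _DURATION_RE.fullmatch(text.strip())
    if match is None:
        # Mirrors the "Invalid duration" message the tests match on.
        raise ValueError(f"Invalid duration: {text!r}")
    seconds = int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
    if seconds <= 0:
        # Mirrors the "positive" message the tests match on for '0h'.
        raise ValueError("Duration must be positive")
    return timedelta(seconds=seconds)


def cap_duration_sketch(requested: str, maximum: str) -> str:
    """Return the requested duration string, or the maximum if it is exceeded."""
    if parse_duration_sketch(requested) > parse_duration_sketch(maximum):
        return maximum
    return requested
```

Under these assumptions, `cap_duration_sketch("48h", "24h")` yields `"24h"` while `cap_duration_sketch("4h", "24h")` leaves the request untouched. Comparing parsed `timedelta` values rather than strings keeps mixed units such as `1d` versus `24h` consistent.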