diff --git a/.custodian-brief.md b/.custodian-brief.md
index 86520fc..73e88e2 100644
--- a/.custodian-brief.md
+++ b/.custodian-brief.md
@@ -2,23 +2,12 @@ # Custodian Brief — ops-bridge
 
 **Domain:** custodian
-**Last synced:** 2026-05-15 07:39 UTC
+**Last synced:** 2026-05-15 10:19 UTC
 **State Hub:** http://127.0.0.1:8000 *(adjust if running on a remote machine)*
 
 ## Active Workstreams
 
-### OpsWarden Initial Implementation
-Progress: 0/10 done | workstream_id: `c3118cc6-adfb-428c-a9c6-edd0ee152ae6`
-
-**Open tasks:**
-- · T1 — Repository bootstrap `6d643e9d`
-- · T2 — Models and config `c66fc65a`
-- · T3 — LocalCA backend `a5a41e58`
-- · T4 — VaultCA backend `b2067ee6`
-- · T5 — Principals inventory `6d13f8cd`
-- · T6 — CLI commands `656a4615`
-- · T7 — Scorecard runner `7818bcc5`
-- … and 3 more open tasks
+*(none — repo may need first-session setup)*
 
 ---
 
 ## MCP Orientation (when available)
diff --git a/workplans/WARDEN-WP-0001-initial-implementation.md b/workplans/WARDEN-WP-0001-initial-implementation.md
deleted file mode 100644
index c673631..0000000
--- a/workplans/WARDEN-WP-0001-initial-implementation.md
+++ /dev/null
@@ -1,333 +0,0 @@
----
-id: WARDEN-WP-0001
-type: workplan
-title: "OpsWarden Initial Implementation"
-domain: custodian
-repo: ops-warden
-status: active
-owner: Bernd
-topic_slug: custodian
-created: "2026-03-28"
-updated: "2026-03-28"
-state_hub_workstream_id: "c3118cc6-adfb-428c-a9c6-edd0ee152ae6"
----
-
-# WARDEN-WP-0001 — OpsWarden Initial Implementation
-
-> **Note:** This workplan is authored in `ops-bridge` because `ops-warden` does not yet exist.
-> Move it to `workplans/WARDEN-WP-0001-initial-implementation.md` in the new repo as the
-> first commit action.
-
-**Scope:** Bootstrap the `ops-warden` repository and deliver a working `warden` CLI that
-implements the SSH CA and certificate lifecycle defined in `wiki/AccessManagementDirective.md`.
-
-**Out of scope:** Vault HA/cluster setup, Ansible playbooks for host principal deployment
-(those live in `railiance-infra`), session recording, and SSO integration (trigger §6.2 of
-the directive when scale requires it).
-
----
-
-## Goal
-
-Create a new `ops-warden` repository that owns **credential issuance only** — the CA,
-certificate signing, actor identity registry, and scorecard tooling. Its sole public surface
-to sibling repos is a well-defined `cert_command` interface that any tool (principally
-`ops-bridge`) can call to obtain a short-lived, CA-signed SSH certificate for a named actor.
-
----
-
-## Reference Documents
-
-| Document | Location |
-|---|---|
-| AccessManagementDirective | `ops-bridge/wiki/AccessManagementDirective.md` |
-| ops-bridge SCOPE.md | `ops-bridge/SCOPE.md` |
-
----
-
-## Architecture
-
-```
-ops-warden/
-├── SCOPE.md
-├── CLAUDE.md
-├── pyproject.toml
-├── src/warden/
-│   ├── cli.py          # Typer CLI: sign / issue / status / inventory / scorecard
-│   ├── models.py       # ActorType enum, CertSpec, CertRecord, PrincipalsInventory
-│   ├── ca.py           # LocalCA backend (file-based, for dev / non-Vault)
-│   ├── vault.py        # VaultCA backend (Vault SSH engine, for production)
-│   ├── inventory.py    # YAML principals inventory read/write
-│   ├── scorecard.py    # §5 compliance checks
-│   └── config.py       # ~/.config/warden/warden.yaml loader
-├── tests/
-└── wiki/               # (symlink or copy of AccessManagementDirective.md)
-```
-
-**Backends are swappable.** Config key `backend: local | vault` selects which CA
-implementation is used. This means the tool is fully functional without Vault for local lab
-use, and production-grade with Vault — the same CLI surface, the same `cert_command`
-interface, the same principals inventory format.
-
-**cert_command interface contract:**
-```
-warden sign --pubkey
-```
-Writes the signed certificate to stdout (the cert text). Exits non-zero on failure.
-`ops-bridge` calls this verbatim via `cert_command` in `tunnels.yaml`.
-
----
-
-## Stack
-
-- **Language:** Python 3.11+
-- **CLI framework:** Typer
-- **Dependencies:** typer, pyyaml, httpx, cryptography (for cert parsing / TTL reading)
-- **Vault SDK:** `hvac` (optional; only required for vault backend)
-- **Packaging:** `uv tool install`
-
----
-
-## Tasks
-
-### T1 — Repository bootstrap
-
-```task
-id: WARDEN-WP-0001-T1
-state_hub_task_id: 6d643e9d-5e97-4224-9d82-87267b5ba6bc
-status: todo
-priority: high
-```
-
-- [ ] Create `ops-warden` repo; copy CLAUDE.md template from `ops-bridge`; add
-  `workplans/WARDEN-WP-0001-initial-implementation.md` (this file)
-- [ ] Write `SCOPE.md` (see template in §SCOPE below)
-- [ ] `pyproject.toml`: `[project.scripts] warden = "warden.cli:app"`
-- [ ] Register repo with state-hub (`register_repo`)
-- [ ] Create state-hub workstream for this workplan
-
-### T2 — Models and config
-
-```task
-id: WARDEN-WP-0001-T2
-state_hub_task_id: c66fc65a-0b16-4ba2-9e70-a83d875572ec
-status: todo
-priority: high
-```
-
-- [ ] `models.py`: `ActorType` enum (`adm | agt | atm`); `CertSpec` (actor_name, pubkey_path,
-  ttl_hours, principals); `CertRecord` (identity, valid_before, cert_path, signed_at)
-- [ ] `config.py`: load `~/.config/warden/warden.yaml`; required fields: `backend`,
-  `ca_key` (local) or `vault_addr` + `vault_role_map` (vault); optional:
-  `inventory_path`, `state_dir`
-- [ ] Validate actor name prefix matches `ActorType` (`adm-*`, `agt-*`, `atm-*`)
-
-### T3 — LocalCA backend
-
-```task
-id: WARDEN-WP-0001-T3
-state_hub_task_id: a5a41e58-1c6d-42a9-9b11-2088f17c29b5
-status: todo
-priority: high
-```
-
-- [ ] `ca.py`: `LocalCA.sign(spec: CertSpec) -> CertRecord`
-  - Calls `ssh-keygen -s -I -n -V +h `
-  - Parses `ssh-keygen -L -f ` output to extract `Valid before`, `Key ID`,
-    `Principals`
-  - Returns `CertRecord`; writes cert to `~/.local/state/warden/.cert.pub`
-- [ ] Default TTLs enforced per `ActorType`: adm → 48 h, agt → 24 h, atm → 8 h
-  (overridable per actor in inventory)
-- [ ] `LocalCA.generate_keypair(actor_name) -> (privkey_path, pubkey_path)` — for agt/atm
-  actors that do not bring their own key
-
-### T4 — VaultCA backend
-
-```task
-id: WARDEN-WP-0001-T4
-state_hub_task_id: b2067ee6-c9ce-423b-9d60-0d28069fb304
-status: todo
-priority: medium
-```
-
-- [ ] `vault.py`: `VaultCA.sign(spec: CertSpec) -> CertRecord`
-  - `POST /v1/ssh/sign/` with `public_key`, `valid_principals`, `ttl`
-  - Parse response `signed_key` field; write to state dir; extract metadata via
-    `ssh-keygen -L`
-- [ ] Role map in config: `vault_role_map: {adm: adm-role, agt: agt-role, atm: atm-role}`
-- [ ] Graceful error message when Vault is unreachable (with `--backend local` fallback hint)
-
-### T5 — Principals inventory
-
-```task
-id: WARDEN-WP-0001-T5
-state_hub_task_id: 6d13f8cd-1850-44c9-b769-b21250348319
-status: todo
-priority: high
-```
-
-- [ ] `inventory.py`: load/save `inventory.yaml` (format mirrors §4.1 of directive):
-  ```yaml
-  actors:
-    agt-state-hub-bridge:
-      type: agt
-      principals: [agt-task-bridge]
-      ttl_hours: 24
-      description: "ops-bridge tunnel actor"
-  hosts:
-    coulombcore:
-      allowed_principals:
-        agt: [agt-task-bridge]
-        atm: [atm-backup-daily]
-  ```
-- [ ] `warden inventory list` — print table
-- [ ] `warden inventory add --type --principals <...>`
-- [ ] `warden inventory remove `
-
-### T6 — CLI commands
-
-```task
-id: WARDEN-WP-0001-T6
-state_hub_task_id: 656a4615-92bb-4b5d-9406-e86d24fa15d0
-status: todo
-priority: high
-```
-
-- [ ] `warden sign --pubkey ` — sign existing pubkey; write cert to
-  stdout (the `cert_command` interface for ops-bridge)
-- [ ] `warden issue ` — generate keypair + sign; output JSON with
-  `privkey`, `cert`, `valid_before`, `identity`
-- [ ] `warden status [actor-name]` — show cert validity, identity, principals, TTL
-  remaining; `--all` flag to show all actors in state dir
-- [ ] `warden scorecard` — run §5 checks (see T7)
-- [ ] `warden inventory ` (list / add / remove)
-
-### T7 — Scorecard runner
-
-```task
-id: WARDEN-WP-0001-T7
-state_hub_task_id: 7818bcc5-f40e-4793-b117-d36f653ffeed
-status: todo
-priority: medium
-```
-
-- [ ] `scorecard.py`: implement each §5 row as a named check function returning
-  `CheckResult(name, passed, detail)`
-- [ ] Checks in scope for `ops-warden` (local checks, not host-side):
-  - All certs in state dir respect TTL policy for their `ActorType`
-  - No actor in inventory lacks a `principals` entry
-  - Actor name prefix matches declared type
-  - No cert expired by more than 5 min still present in state dir (stale cleanup)
-- [ ] Host-side checks (password auth disabled, root login disabled, etc.) are out of scope
-  — those live in the Ansible `ssh-access-audit.yml` playbook in `railiance-infra`
-- [ ] `warden scorecard --json` for machine-readable output
-
-### T8 — ops-ssh-wrapper script
-
-```task
-id: WARDEN-WP-0001-T8
-state_hub_task_id: e9c28152-5785-4995-83a5-439985ed3db9
-status: todo
-priority: medium
-```
-
-- [ ] Ship `scripts/ops-ssh-wrapper` (the Python snippet from §4.1, hardened):
-  - Reads `WARDEN_ACTOR` and `SSH_PUBKEY` env vars
-  - Calls `warden sign $WARDEN_ACTOR --pubkey $SSH_PUBKEY`
-  - Loads cert via `ssh-add`; execs the given command
-- [ ] Install as part of `uv tool install` entry points
-
-### T9 — Tests
-
-```task
-id: WARDEN-WP-0001-T9
-state_hub_task_id: 950139ab-cc17-4f1d-9a17-d5744e402ddf
-status: todo
-priority: high
-```
-
-- [ ] Unit tests for `LocalCA` (mock `ssh-keygen` subprocess)
-- [ ] Unit tests for inventory YAML round-trip
-- [ ] Unit tests for actor name prefix validation
-- [ ] Integration test: `LocalCA.sign` on a real test keypair (requires `ssh-keygen` in PATH)
-- [ ] Scorecard unit tests (mock cert records)
-
-### T10 — Documentation
-
-```task
-id: WARDEN-WP-0001-T10
-state_hub_task_id: 271d6759-e359-41ce-80e4-76c574634a87
-status: todo
-priority: medium
-```
-
-- [ ] `SCOPE.md` (see below)
-- [ ] `wiki/AccessManagementDirective.md` — copy from `ops-bridge/wiki/`
-- [ ] `wiki/OpsWardenConfig.md` — annotated `warden.yaml` reference
-- [ ] `wiki/CertCommandInterface.md` — contract for `cert_command` callers (ops-bridge etc.)
-
----
-
-## SCOPE.md Template
-
-```
-# SCOPE
-
-## One-liner
-SSH Certificate Authority and credential issuance for the ops fleet —
-signs short-lived certs for adm/agt/atm actors; provides the cert_command
-interface consumed by ops-bridge and other tooling.
-
-## Core Idea
-Implements AccessManagementDirective §§1–5. Owns the CA key, actor inventory,
-signing logic, and scorecard. Does not own tunnel lifecycle, host provisioning,
-or SSH key generation for humans.
-
-## In Scope
-- Local CA backend (ssh-keygen -s) for lab / non-Vault use
-- Vault SSH engine backend for production
-- Actor identity registry (inventory.yaml)
-- cert_command CLI interface: `warden sign --pubkey `
-- TTL policy enforcement per ActorType (adm/agt/atm)
-- Certificate status and stale-cert cleanup
-- Scorecard checks (local / cert-side only)
-- ops-ssh-wrapper script for agt/atm startup automation
-
-## Out of Scope
-- Host-side principal deployment (railiance-infra Ansible)
-- SSH key generation for human admins (self-service: ssh-keygen)
-- Vault cluster setup / HA
-- Session recording, audit forwarding to SIEM (host-side)
-- Tunnel lifecycle (ops-bridge)
-- SSO / Teleport (trigger when §6.2 scale thresholds are hit)
-
-## Relevant When
-- Issuing or refreshing a cert for any adm/agt/atm actor
-- Checking cert validity / scorecard compliance
-- ops-bridge needs cert_command to be defined
-- Adding a new actor to the principals inventory
-
-## Not Relevant When
-- Managing tunnel lifecycle (ops-bridge)
-- Deploying SSH config to hosts (railiance-infra)
-- All access is via static keys with no TTL (legacy mode)
-
-## Current State
-Status: planned (WARDEN-WP-0001 not yet started)
-
-## Related Repositories
-- ops-bridge — primary consumer of cert_command interface
-- railiance-infra — owns host-side principal deployment
-- the-custodian/state-hub — registers domain/workstreams
-```
-
----
-
-## Acceptance Criteria
-
-- [ ] `warden sign agt-test-actor --pubkey /tmp/test.pub` outputs a valid cert (local backend)
-- [ ] `warden status agt-test-actor` shows correct identity, principals, and time-to-expiry
-- [ ] `warden scorecard` returns 5/5 on a clean test inventory
-- [ ] `warden sign` called from ops-bridge `cert_command` in an integration test tunnel
-- [ ] All tests pass: `uv run pytest`
-- [ ] All lints pass: `uv run ruff check .`
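
Reviewer sketch (not part of the patch): the removed workplan states the `cert_command` contract as "certificate text on stdout, non-zero exit on failure". A caller honouring that contract might look roughly like the following. `run_cert_command` and `CertCommandError` are hypothetical names introduced here for illustration, and the empty-output check is an extra assumption that the workplan does not state.

```python
import subprocess
from typing import Sequence


class CertCommandError(RuntimeError):
    """Hypothetical error type: the cert_command failed or produced no cert."""


def run_cert_command(argv: Sequence[str], timeout: float = 30.0) -> str:
    """Run a cert_command and return the certificate text it wrote to stdout.

    Per the workplan's contract, a non-zero exit status means failure.
    Treating empty stdout as a failure is an assumption added in this sketch.
    """
    result = subprocess.run(
        list(argv),
        capture_output=True,  # collect stdout (the cert) and stderr (diagnostics)
        text=True,
        timeout=timeout,
        check=False,  # inspect the return code ourselves for a clearer error
    )
    if result.returncode != 0:
        raise CertCommandError(
            f"cert_command exited {result.returncode}: {result.stderr.strip()}"
        )
    cert = result.stdout.strip()
    if not cert:
        raise CertCommandError("cert_command succeeded but wrote no certificate")
    return cert
```

With `warden` on the PATH, the first acceptance criterion would correspond to `run_cert_command(["warden", "sign", "agt-test-actor", "--pubkey", "/tmp/test.pub"])`. Passing the command as an argument list rather than a shell string avoids quoting problems with actor names and key paths.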