ops-warden/workplans/WARDEN-WP-0001-initial-implementation.md
2026-03-28 00:45:43 +00:00


id: WARDEN-WP-0001
type: workplan
title: OpsWarden Initial Implementation
domain: custodian
repo: ops-warden
status: draft
owner: Bernd
topic_slug: custodian
created: 2026-03-28
updated: 2026-03-28

WARDEN-WP-0001 — OpsWarden Initial Implementation

Scope: Deliver a working warden CLI that implements the SSH CA and certificate lifecycle defined in wiki/AccessManagementDirective.md. Scaffolding (models, config, CA backends, inventory, scorecard, CLI) is already present in the repo; this workplan tracks the remaining implementation, testing, and integration work.

Out of scope: Vault HA/cluster setup, Ansible playbooks for host principal deployment (those live in railiance-infra), session recording, and SSO integration (deferred; invoke §6.2 of the directive when scale requires it).


Goal

After this workplan:

  1. warden sign agt-test --pubkey /tmp/test.pub outputs a valid cert (local backend).
  2. warden status agt-test shows correct identity, principals, and time-to-expiry.
  3. warden scorecard returns 4/4 on a clean test inventory.
  4. warden sign called from ops-bridge cert_command works end-to-end in an integration test tunnel.
  5. All tests pass (uv run pytest) and lints pass (uv run ruff check .).

Reference Documents

Document                        Location
AccessManagementDirective       wiki/AccessManagementDirective.md
cert_command interface          wiki/CertCommandInterface.md
Config reference                wiki/OpsWardenConfig.md
ops-bridge alignment workplan   ../ops-bridge/workplans/BRIDGE-WP-0004-directive-alignment.md

Architecture Summary

~/.config/warden/warden.yaml      # backend, ca_key, inventory_path, state_dir
~/.config/warden/inventory.yaml   # actor registry (name → type, principals, ttl_hours)
~/.local/state/warden/            # signed certs (*-cert.pub); keypairs (keys/)

Two swappable CA backends — both expose the same sign(spec) -> CertRecord interface:

  • LocalCA: ssh-keygen -s; no Vault dependency; default for dev/lab
  • VaultCA: Vault SSH engine via httpx
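
The shared sign(spec) -> CertRecord contract could be expressed as a structural protocol so that callers never depend on a concrete backend. The sketch below is illustrative only: the field names on CertSpec and CertRecord, the CABackend protocol name, and the FakeCA stand-in are assumptions and are not taken from the repo's models.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class CertSpec:
    """What to sign (hypothetical fields, modelled on the inventory columns)."""
    actor: str
    public_key: str
    principals: tuple[str, ...]
    ttl_hours: int


@dataclass(frozen=True)
class CertRecord:
    """Result of a signing call (hypothetical fields)."""
    actor: str
    cert_text: str
    backend: str


class CABackend(Protocol):
    """Anything with a matching sign() method satisfies this protocol."""

    def sign(self, spec: CertSpec) -> CertRecord: ...


class FakeCA:
    """In-memory stand-in, used only to show that backends are interchangeable."""

    def __init__(self, name: str) -> None:
        self.name = name

    def sign(self, spec: CertSpec) -> CertRecord:
        # A real backend would call ssh-keygen -s or the Vault SSH engine here.
        return CertRecord(spec.actor, f"fake-cert-for {spec.public_key}", self.name)


def issue(backend: CABackend, spec: CertSpec) -> CertRecord:
    # The caller sees only the protocol, so swapping backends needs no code change.
    return backend.sign(spec)
```

With this shape, a unit test can pass FakeCA wherever LocalCA or VaultCA would go, which keeps most tests independent of ssh-keygen and Vault.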

cert_command interface (consumed by ops-bridge):

warden sign <actor-name> --pubkey <path>   # → cert text to stdout
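
From the caller's side, the contract amounts to "run the command, read the certificate from stdout". The following is a minimal sketch of such a caller, not ops-bridge's actual code: the function name run_cert_command, the timeout, the "non-zero exit means failure" rule, and the tilde expansion are all assumptions made for illustration.

```python
import os
import shlex
import subprocess


def run_cert_command(cert_command: str, timeout: float = 30.0) -> str:
    """Run a cert_command string and return the certificate text from stdout."""
    # No shell is involved, so "~" is expanded here explicitly (assumption:
    # the real caller may handle this differently).
    argv = [os.path.expanduser(arg) for arg in shlex.split(cert_command)]
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    if result.returncode != 0:
        raise RuntimeError(
            f"cert_command failed ({result.returncode}): {result.stderr.strip()}"
        )
    cert = result.stdout.strip()
    if not cert:
        raise RuntimeError("cert_command produced no output")
    return cert
```

Keeping diagnostics on stderr and only the certificate on stdout is what makes this kind of caller simple; anything else printed to stdout would end up inside the cert text.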

Tasks

T1 — Repository registration

  • Register repo with state-hub (register_repo); assign Repo ID; update .claude/rules/repo-identity.md
  • Create state-hub workstream for this workplan

T2 — LocalCA integration test

  • Generate a test CA key: ssh-keygen -t ed25519 -f /tmp/test-ca -N ""
  • Run warden sign against a real pubkey with the test CA (requires ssh-keygen in PATH)
  • Verify cert parses correctly with ssh-keygen -L
  • Add to tests/test_ca.py as an integration test (skipped if ssh-keygen not in PATH)
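
The steps above can be sketched as a helper that drives ssh-keygen directly, which is the tool the LocalCA backend is described as wrapping. This is not the repo's test code: the helper names, the one-hour validity, and the file names are assumptions, and a real test would call warden sign rather than ssh-keygen -s.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Used to skip the check on machines without OpenSSH installed.
HAVE_SSH_KEYGEN = shutil.which("ssh-keygen") is not None


def sign_with_test_ca(workdir: Path, identity: str, principal: str) -> str:
    """Create a throwaway CA and user key, sign the key, return `ssh-keygen -L` output."""
    ca = workdir / "test-ca"
    user = workdir / "test"
    for key in (ca, user):
        subprocess.run(
            ["ssh-keygen", "-q", "-t", "ed25519", "-f", str(key), "-N", ""],
            check=True,
        )
    # -s: CA private key, -I: key identity, -n: principals, -V: validity window
    subprocess.run(
        ["ssh-keygen", "-q", "-s", str(ca), "-I", identity, "-n", principal,
         "-V", "+1h", str(user) + ".pub"],
        check=True,
    )
    cert = workdir / "test-cert.pub"  # ssh-keygen writes <name>-cert.pub next to the key
    listing = subprocess.run(
        ["ssh-keygen", "-L", "-f", str(cert)],
        check=True, capture_output=True, text=True,
    )
    return listing.stdout


def check_local_signing() -> bool:
    """True if a freshly signed cert lists the expected identity and principal."""
    with tempfile.TemporaryDirectory() as tmp:
        out = sign_with_test_ca(Path(tmp), "agt-test", "agt-task-test")
    return "agt-test" in out and "agt-task-test" in out
```

In tests/test_ca.py the same HAVE_SSH_KEYGEN flag could feed a pytest skipif marker so the test is skipped, not failed, when ssh-keygen is missing.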

T3 — VaultCA integration test

  • Set up a local Vault dev server (vault server -dev)
  • Enable SSH secrets engine: vault secrets enable ssh
  • Configure a signing role for agt
  • Run warden sign with backend: vault set in warden.yaml
  • Add to tests/test_vault.py as an integration test (skipped if Vault not reachable)
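
As a rough illustration of the Vault side, the sketch below builds a request against the SSH engine's sign endpoint and checks whether a server is reachable at all, which is the condition for skipping the test. It deliberately uses the standard library's urllib instead of httpx so that it runs without extra dependencies. The role name "agt", the default mount path "ssh", and the "1h" TTL are assumptions; the function names are hypothetical.

```python
import json
import urllib.error
import urllib.request


def vault_reachable(addr: str, timeout: float = 2.0) -> bool:
    """True if something answers on Vault's health endpoint at addr."""
    try:
        urllib.request.urlopen(f"{addr}/v1/sys/health", timeout=timeout).close()
        return True
    except urllib.error.HTTPError:
        return True   # Vault answered, just not with 200 (e.g. sealed or standby)
    except OSError:
        return False  # connection refused, DNS failure, timeout


def build_sign_request(addr, token, role, public_key, principals,
                       ttl="1h", mount="ssh"):
    """Build (but do not send) a POST to the SSH engine's sign endpoint."""
    payload = {
        "public_key": public_key,
        "valid_principals": ",".join(principals),
        "ttl": ttl,
    }
    return urllib.request.Request(
        f"{addr}/v1/{mount}/sign/{role}",
        data=json.dumps(payload).encode(),
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
        method="POST",
    )


def sign_with_vault(request, timeout: float = 10.0) -> str:
    """Send the request and return the signed certificate text."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["data"]["signed_key"]
```

A test in tests/test_vault.py could call vault_reachable first and skip when it returns False, so the suite still passes on machines without a dev server.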

T4 — CLI end-to-end smoke tests

  • warden inventory add agt-test --type agt --principal agt-task-test
  • warden inventory list shows the actor
  • warden issue agt-test (local backend) produces keypair + cert
  • warden status agt-test shows valid cert
  • warden scorecard returns 4/4
  • warden inventory remove agt-test removes actor
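
The sequence above lends itself to a small scripted runner. The sketch below lists the commands exactly as written in this task and stops at the first failure; it assumes that each warden subcommand exits with status 0 on success, which this workplan does not state. The runner is injectable so the ordering logic can be tested without the CLI installed.

```python
import subprocess
from typing import Callable, Sequence

SMOKE_STEPS: list[list[str]] = [
    ["warden", "inventory", "add", "agt-test", "--type", "agt",
     "--principal", "agt-task-test"],
    ["warden", "inventory", "list"],
    ["warden", "issue", "agt-test"],
    ["warden", "status", "agt-test"],
    ["warden", "scorecard"],
    ["warden", "inventory", "remove", "agt-test"],
]


def _run(argv: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit status."""
    return subprocess.run(list(argv)).returncode


def run_smoke(run: Callable[[Sequence[str]], int] = _run) -> list[str]:
    """Run each step in order; return the failing command (empty list means success)."""
    failures: list[str] = []
    for argv in SMOKE_STEPS:
        if run(argv) != 0:
            failures.append(" ".join(argv))
            break  # later steps depend on earlier ones, so stop here
    return failures
```

Exit status alone does not prove that the list output contains the actor or that the scorecard reads 4/4; a fuller test would also inspect stdout for those steps.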

T5 — ops-bridge cert_command integration

  • Add agt-state-hub-bridge to inventory (or use existing from ops-bridge config)
  • Set cert_command: "warden sign agt-state-hub-bridge --pubkey ~/.ssh/agt-state-hub-bridge_ed25519.pub" in a test tunnels.yaml
  • Run bridge up state-hub-coulombcore; confirm cert is present in ~/.local/state/bridge/ and cert_identity appears in the audit log
  • Document result in a progress event

T6 — CI/CD setup

  • Add .github/workflows/ci.yml (or equivalent) running uv run pytest and uv run ruff check . on push
  • Tests must pass without Vault (VaultCA integration tests skipped via pytest marker)

T7 — Documentation

  • wiki/OpsWardenConfig.md — annotated warden.yaml reference (already stubbed)
  • wiki/CertCommandInterface.md — contract for cert_command callers (already stubbed)
  • Ensure wiki/AccessManagementDirective.md is in sync with ops-bridge/wiki/

Acceptance Criteria

  • warden sign agt-test --pubkey /tmp/test.pub → valid cert on stdout (local backend)
  • warden status agt-test → identity, principals, time-to-expiry shown correctly
  • warden scorecard → 4/4 on clean inventory
  • warden sign works as cert_command in ops-bridge tunnel config
  • All unit tests pass: uv run pytest
  • All lints pass: uv run ruff check .
  • No secrets (CA private key, certs) committed to repo
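
The last criterion can be partly automated. The following sketch scans a working tree for files that look like private keys or signed certificates. It is a heuristic under stated assumptions: it checks only the first 4 KiB of each file, relies on the *-cert.pub naming used for the state directory, and inspects the working tree only, not git history. The function and constant names are hypothetical.

```python
from pathlib import Path

# PEM-style headers that mark common private key formats.
PRIVATE_KEY_MARKERS = (
    b"-----BEGIN OPENSSH PRIVATE KEY-----",
    b"-----BEGIN RSA PRIVATE KEY-----",
    b"-----BEGIN EC PRIVATE KEY-----",
)


def find_secrets(root: Path) -> list[Path]:
    """Return files under root that look like private keys or SSH certificates."""
    hits: list[Path] = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        if ".git" in path.relative_to(root).parts:
            continue  # skip git's own object store
        if path.name.endswith("-cert.pub"):
            hits.append(path)
            continue
        head = path.read_bytes()[:4096]
        if any(marker in head for marker in PRIVATE_KEY_MARKERS):
            hits.append(path)
    return hits
```

A file containing this scanner would match its own markers, so it would need to be excluded or kept outside the scanned tree.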