ops-bridge/workplans/WARDEN-WP-0001-initial-implementation.md
tegwick 22601ef3e6 chore(workplans): sync BRIDGE-WP-0004 and WARDEN-WP-0001 tasks to state hub
Both workplans had been registered as active workstreams but tasks were
never ingested — the markdown checkbox format was invisible to the
consistency checker, which requires task code blocks. Activated both
workplans (draft→active) and added task blocks with state_hub_task_id
for all 19 tasks (9 + 10).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 00:29:51 +02:00


id: WARDEN-WP-0001
type: workplan
title: OpsWarden Initial Implementation
domain: custodian
repo: ops-warden
status: active
owner: Bernd
topic_slug: custodian
created: 2026-03-28
updated: 2026-03-28
state_hub_workstream_id: c3118cc6-adfb-428c-a9c6-edd0ee152ae6

WARDEN-WP-0001 — OpsWarden Initial Implementation

Note: This workplan is authored in ops-bridge because ops-warden does not yet exist. Move it to workplans/WARDEN-WP-0001-initial-implementation.md in the new repo as the first commit action.

Scope: Bootstrap the ops-warden repository and deliver a working warden CLI that implements the SSH CA and certificate lifecycle defined in wiki/AccessManagementDirective.md.

Out of scope: Vault HA/cluster setup, Ansible playbooks for host principal deployment (those live in railiance-infra), session recording, and SSO integration (trigger §6.2 of the directive when scale requires it).


Goal

Create a new ops-warden repository that owns credential issuance only — the CA, certificate signing, actor identity registry, and scorecard tooling. Its sole public surface to sibling repos is a well-defined cert_command interface that any tool (principally ops-bridge) can call to obtain a short-lived, CA-signed SSH certificate for a named actor.


Reference Documents

Document                     Location
AccessManagementDirective    ops-bridge/wiki/AccessManagementDirective.md
ops-bridge SCOPE.md          ops-bridge/SCOPE.md

Architecture

ops-warden/
├── SCOPE.md
├── CLAUDE.md
├── pyproject.toml
├── src/warden/
│   ├── cli.py          # Typer CLI: sign / issue / status / inventory / scorecard
│   ├── models.py       # ActorType enum, CertSpec, CertRecord, PrincipalsInventory
│   ├── ca.py           # LocalCA backend (file-based, for dev / non-Vault)
│   ├── vault.py        # VaultCA backend (Vault SSH engine, for production)
│   ├── inventory.py    # YAML principals inventory read/write
│   ├── scorecard.py    # §5 compliance checks
│   └── config.py       # ~/.config/warden/warden.yaml loader
├── tests/
└── wiki/               # (symlink or copy of AccessManagementDirective.md)

Backends are swappable: the config key backend (local | vault) selects which CA implementation is used. The tool is therefore fully functional without Vault for local lab use and production-grade with Vault, with the same CLI surface, the same cert_command interface, and the same principals inventory format in both cases.
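A minimal sketch of how the backend switch could look, assuming both backends expose the same sign method. The CABackend protocol and the make_backend helper are hypothetical names introduced for illustration; only LocalCA, VaultCA, and the config keys come from this workplan.

```python
from typing import Any, Protocol


class CABackend(Protocol):
    """Hypothetical shared interface; the workplan only names LocalCA and VaultCA."""

    def sign(self, spec: Any) -> Any: ...


class LocalCA:
    def __init__(self, ca_key: str) -> None:
        self.ca_key = ca_key

    def sign(self, spec: Any) -> Any:
        raise NotImplementedError("placeholder, see T3")


class VaultCA:
    def __init__(self, vault_addr: str, vault_role_map: dict[str, str]) -> None:
        self.vault_addr = vault_addr
        self.vault_role_map = vault_role_map

    def sign(self, spec: Any) -> Any:
        raise NotImplementedError("placeholder, see T4")


def make_backend(config: dict[str, Any]) -> CABackend:
    """Pick the CA implementation from the 'backend' config key."""
    backend = config.get("backend")
    if backend == "local":
        return LocalCA(ca_key=config["ca_key"])
    if backend == "vault":
        return VaultCA(config["vault_addr"], config["vault_role_map"])
    raise ValueError(f"unknown backend {backend!r}; expected 'local' or 'vault'")
```

Keeping the selection in one function means the CLI commands never need to know which backend is active.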

cert_command interface contract:

warden sign <actor-name> --pubkey <path>

Writes the signed certificate to stdout (the cert text). Exits non-zero on failure. ops-bridge calls this verbatim via cert_command in tunnels.yaml.
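From the caller's side, the contract above could be consumed roughly as follows. This is a sketch only: how ops-bridge actually parses and runs cert_command is not specified here, so the shlex splitting, the timeout, and the names run_cert_command and CertCommandError are assumptions.

```python
import shlex
import subprocess


class CertCommandError(RuntimeError):
    """Raised when cert_command fails (hypothetical name)."""


def run_cert_command(cert_command: str, timeout: float = 30.0) -> str:
    """Run a cert_command string and return the certificate text from stdout."""
    argv = shlex.split(cert_command)
    result = subprocess.run(
        argv, capture_output=True, text=True, timeout=timeout, check=False
    )
    # The contract: non-zero exit means failure, stdout carries the cert.
    if result.returncode != 0:
        raise CertCommandError(
            f"cert_command exited {result.returncode}: {result.stderr.strip()}"
        )
    cert = result.stdout.strip()
    if not cert:
        raise CertCommandError("cert_command succeeded but wrote nothing to stdout")
    return cert
```

A caller would pass something like "warden sign agt-test-actor --pubkey /tmp/test.pub" and write the returned text next to the private key.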


Stack

  • Language: Python 3.11+
  • CLI framework: Typer
  • Dependencies: typer, pyyaml, httpx, cryptography (for cert parsing / TTL reading)
  • Vault SDK: hvac (optional; only required for vault backend)
  • Packaging: uv tool install

Tasks

T1 — Repository bootstrap

id: WARDEN-WP-0001-T1
state_hub_task_id: 6d643e9d-5e97-4224-9d82-87267b5ba6bc
status: todo
priority: high
  • Create ops-warden repo; copy CLAUDE.md template from ops-bridge; add workplans/WARDEN-WP-0001-initial-implementation.md (this file)
  • Write SCOPE.md (see template in §SCOPE below)
  • pyproject.toml: [project.scripts] warden = "warden.cli:app"
  • Register repo with state-hub (register_repo)
  • Create state-hub workstream for this workplan

T2 — Models and config

id: WARDEN-WP-0001-T2
state_hub_task_id: c66fc65a-0b16-4ba2-9e70-a83d875572ec
status: todo
priority: high
  • models.py: ActorType enum (adm | agt | atm); CertSpec (actor_name, pubkey_path, ttl_hours, principals); CertRecord (identity, valid_before, cert_path, signed_at)
  • config.py: load ~/.config/warden/warden.yaml; required fields: backend, ca_key (local) or vault_addr + vault_role_map (vault); optional: inventory_path, state_dir
  • Validate actor name prefix matches ActorType (adm-*, agt-*, atm-*)
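The models and the prefix rule could be sketched as below. Field names follow the task list; the field types, the frozen dataclasses, and the helper actor_type_from_name are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from pathlib import Path


class ActorType(str, Enum):
    ADM = "adm"
    AGT = "agt"
    ATM = "atm"


def actor_type_from_name(actor_name: str) -> ActorType:
    """Derive the ActorType from an 'adm-*', 'agt-*' or 'atm-*' name."""
    prefix, sep, rest = actor_name.partition("-")
    if not sep or not rest:
        raise ValueError(f"actor name {actor_name!r} must look like '<type>-<name>'")
    try:
        return ActorType(prefix)
    except ValueError:
        raise ValueError(
            f"actor name {actor_name!r} has unknown prefix {prefix!r}"
        ) from None


@dataclass(frozen=True)
class CertSpec:
    actor_name: str
    pubkey_path: Path
    ttl_hours: int
    principals: tuple[str, ...]

    def __post_init__(self) -> None:
        actor_type_from_name(self.actor_name)  # rejects bad prefixes early
        if self.ttl_hours <= 0:
            raise ValueError("ttl_hours must be positive")
        if not self.principals:
            raise ValueError("at least one principal is required")


@dataclass(frozen=True)
class CertRecord:
    identity: str
    valid_before: datetime
    cert_path: Path
    signed_at: datetime
```

Validating in __post_init__ keeps invalid specs from ever reaching a backend.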

T3 — LocalCA backend

id: WARDEN-WP-0001-T3
state_hub_task_id: a5a41e58-1c6d-42a9-9b11-2088f17c29b5
status: todo
priority: high
  • ca.py: LocalCA.sign(spec: CertSpec) -> CertRecord
      - Calls ssh-keygen -s <ca_key> -I <identity> -n <principals> -V +<ttl>h <pubkey>
      - Parses ssh-keygen -L -f <cert> output to extract Valid before, Key ID, Principals
      - Returns CertRecord; writes cert to ~/.local/state/warden/<actor>.cert.pub
  • Default TTLs enforced per ActorType: adm → 48 h, agt → 24 h, atm → 8 h (overridable per actor in inventory)
  • LocalCA.generate_keypair(actor_name) -> (privkey_path, pubkey_path) — for agt/atm actors that do not bring their own key
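The signing call and the listing parser described in T3 might be sketched like this. It assumes the certificate identity equals the actor name and that ssh-keygen -L prints the usual OpenSSH layout (a Key ID line, a "Valid: from ... to ..." line, and principals indented under a Principals: header). The function names are hypothetical, and certificates listed as valid forever are not handled.

```python
import re
from datetime import datetime

# Default TTLs per actor type from T3; the inventory may override them.
DEFAULT_TTL_HOURS = {"adm": 48, "agt": 24, "atm": 8}

_KEY_ID_RE = re.compile(r'^\s*Key ID: "(.*)"\s*$', re.MULTILINE)
_VALID_RE = re.compile(r"^\s*Valid: from (\S+) to (\S+)\s*$", re.MULTILINE)


def build_sign_argv(ca_key, identity, principals, pubkey, ttl_hours=None):
    """Build the ssh-keygen argv for signing; does not execute anything."""
    if ttl_hours is None:
        # Assumption: the identity carries the actor-type prefix.
        ttl_hours = DEFAULT_TTL_HOURS[identity.split("-", 1)[0]]
    return [
        "ssh-keygen", "-s", ca_key, "-I", identity,
        "-n", ",".join(principals), "-V", f"+{ttl_hours}h", pubkey,
    ]


def parse_cert_listing(listing: str) -> dict:
    """Extract identity, expiry and principals from 'ssh-keygen -L' text."""
    key_id = _KEY_ID_RE.search(listing)
    valid = _VALID_RE.search(listing)
    if key_id is None or valid is None:
        raise ValueError("unrecognised ssh-keygen -L output")

    principals = []
    header_indent = None
    for line in listing.splitlines():
        indent = len(line) - len(line.lstrip())
        if header_indent is None:
            if line.strip().startswith("Principals:"):
                header_indent = indent
            continue
        # Principals are indented deeper than their header line.
        if indent <= header_indent or not line.strip():
            break
        principals.append(line.strip())

    return {
        "identity": key_id.group(1),
        # Naive timestamp as printed; timezone handling is left out here.
        "valid_before": datetime.fromisoformat(valid.group(2)),
        "principals": principals,
    }
```

Separating argv construction from execution makes the mocked unit tests in T9 straightforward.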

T4 — VaultCA backend

id: WARDEN-WP-0001-T4
state_hub_task_id: b2067ee6-c9ce-423b-9d60-0d28069fb304
status: todo
priority: medium
  • vault.py: VaultCA.sign(spec: CertSpec) -> CertRecord
      - POST /v1/ssh/sign/<role> with public_key, valid_principals, ttl
      - Parse response signed_key field; write to state dir; extract metadata via ssh-keygen -L
  • Role map in config: vault_role_map: {adm: adm-role, agt: agt-role, atm: atm-role}
  • Graceful error message when Vault is unreachable (with --backend local fallback hint)
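The request and response handling for the Vault backend could be sketched without any network code as below. The helper names are hypothetical, the comma-joined principals and the "<n>h" TTL string are assumptions about how the fields would be filled, and the response shape (signed_key nested under data) is taken from Vault's standard response envelope.

```python
from typing import Any


def build_sign_request(
    vault_addr: str,
    role: str,
    public_key: str,
    principals: list[str],
    ttl_hours: int,
) -> tuple[str, dict[str, str]]:
    """Return (url, json_payload) for POST /v1/ssh/sign/<role>."""
    url = f"{vault_addr.rstrip('/')}/v1/ssh/sign/{role}"
    payload = {
        "public_key": public_key,
        "valid_principals": ",".join(principals),
        "ttl": f"{ttl_hours}h",
    }
    return url, payload


def extract_signed_key(response_json: Any) -> str:
    """Pull the certificate text out of a Vault sign response body."""
    try:
        signed_key = response_json["data"]["signed_key"]
    except (KeyError, TypeError):
        raise ValueError("Vault response has no data.signed_key field") from None
    return signed_key.strip()


def role_for_actor(actor_name: str, vault_role_map: dict[str, str]) -> str:
    """Map an actor name to its Vault role via the type prefix."""
    prefix = actor_name.split("-", 1)[0]
    try:
        return vault_role_map[prefix]
    except KeyError:
        raise ValueError(f"no Vault role configured for type {prefix!r}") from None
```

Sending the request would then be a single httpx.post(url, json=payload, headers={"X-Vault-Token": token}) call, with connection errors caught and turned into the --backend local hint mentioned above.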

T5 — Principals inventory

id: WARDEN-WP-0001-T5
state_hub_task_id: 6d13f8cd-1850-44c9-b769-b21250348319
status: todo
priority: high
  • inventory.py: load/save inventory.yaml (format mirrors §4.1 of directive):

      actors:
        agt-state-hub-bridge:
          type: agt
          principals: [agt-task-bridge]
          ttl_hours: 24
          description: "ops-bridge tunnel actor"
      hosts:
        coulombcore:
          allowed_principals:
            agt: [agt-task-bridge]
            atm: [atm-backup-daily]
  • warden inventory list — print table
  • warden inventory add <actor-name> --type <adm|agt|atm> --principals <...>
  • warden inventory remove <actor-name>
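The add and remove operations could work on the parsed inventory mapping, as in the sketch below. The function names and the exact validation rules are assumptions; persistence is left out so the example stays dependency-free.

```python
from typing import Any, Optional

ACTOR_TYPES = ("adm", "agt", "atm")


def add_actor(
    inventory: dict[str, Any],
    name: str,
    actor_type: str,
    principals: list[str],
    ttl_hours: Optional[int] = None,
    description: str = "",
) -> None:
    """Insert a new actor entry into the parsed inventory mapping."""
    if actor_type not in ACTOR_TYPES:
        raise ValueError(f"unknown actor type {actor_type!r}")
    if not name.startswith(f"{actor_type}-"):
        raise ValueError(f"actor name {name!r} must start with '{actor_type}-'")
    if not principals:
        raise ValueError("at least one principal is required")
    actors = inventory.setdefault("actors", {})
    if name in actors:
        raise ValueError(f"actor {name!r} already exists")

    entry: dict[str, Any] = {"type": actor_type, "principals": list(principals)}
    if ttl_hours is not None:
        entry["ttl_hours"] = ttl_hours  # per-actor override of the default TTL
    if description:
        entry["description"] = description
    actors[name] = entry


def remove_actor(inventory: dict[str, Any], name: str) -> None:
    """Delete an actor entry; raises KeyError if it is not present."""
    actors = inventory.get("actors") or {}
    if name not in actors:
        raise KeyError(f"actor {name!r} not found in inventory")
    del actors[name]
```

With PyYAML, the mapping would typically come from yaml.safe_load and be written back with yaml.safe_dump.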

T6 — CLI commands

id: WARDEN-WP-0001-T6
state_hub_task_id: 656a4615-92bb-4b5d-9406-e86d24fa15d0
status: todo
priority: high
  • warden sign <actor-name> --pubkey <path> — sign existing pubkey; write cert to stdout (the cert_command interface for ops-bridge)
  • warden issue <actor-name> — generate keypair + sign; output JSON with privkey, cert, valid_before, identity
  • warden status [actor-name] — show cert validity, identity, principals, TTL remaining; --all flag to show all actors in state dir
  • warden scorecard — run §5 checks (see T7)
  • warden inventory <subcommand> (list / add / remove)
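For warden status, the "TTL remaining" column could be computed as in this small sketch. The helper names and the "22h 30m" display format are assumptions, not part of the workplan.

```python
from datetime import datetime, timedelta


def ttl_remaining(valid_before: datetime, now: datetime) -> timedelta:
    """Time left until expiry, clamped at zero for expired certs."""
    return max(valid_before - now, timedelta(0))


def format_remaining(delta: timedelta) -> str:
    """Render a remaining-time value for the status table."""
    if delta <= timedelta(0):
        return "expired"
    hours, rest = divmod(int(delta.total_seconds()), 3600)
    return f"{hours}h {rest // 60:02d}m"
```

Passing now explicitly, rather than calling datetime.now() inside the helper, keeps the function deterministic under test.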

T7 — Scorecard runner

id: WARDEN-WP-0001-T7
state_hub_task_id: 7818bcc5-f40e-4793-b117-d36f653ffeed
status: todo
priority: medium
  • scorecard.py: implement each §5 row as a named check function returning CheckResult(name, passed, detail)
  • Checks in scope for ops-warden (local checks, not host-side):
      - All certs in state dir respect TTL policy for their ActorType
      - No actor in inventory lacks a principals entry
      - Actor name prefix matches declared type
      - No cert expired by more than 5 min still present in state dir (stale cleanup)
  • Host-side checks (password auth disabled, root login disabled, etc.) are out of scope — those live in the Ansible ssh-access-audit.yml playbook in railiance-infra
  • warden scorecard --json for machine-readable output
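Two of the local checks and the JSON output could be sketched as follows. CheckResult mirrors the signature given above; the check function names, their inputs, and the JSON layout are assumptions.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timedelta

STALE_GRACE = timedelta(minutes=5)  # "expired by more than 5 min"


@dataclass(frozen=True)
class CheckResult:
    name: str
    passed: bool
    detail: str


def check_no_stale_certs(
    valid_before_by_actor: dict[str, datetime], now: datetime
) -> CheckResult:
    """Fail if any cert in the state dir expired more than 5 minutes ago."""
    stale = sorted(
        actor
        for actor, valid_before in valid_before_by_actor.items()
        if now - valid_before > STALE_GRACE
    )
    detail = "ok" if not stale else "stale: " + ", ".join(stale)
    return CheckResult("no_stale_certs", not stale, detail)


def check_principals_present(actors: dict[str, dict]) -> CheckResult:
    """Fail if any inventory actor has no principals entry."""
    missing = sorted(n for n, entry in actors.items() if not entry.get("principals"))
    detail = "ok" if not missing else "missing principals: " + ", ".join(missing)
    return CheckResult("principals_present", not missing, detail)


def to_json(results: list[CheckResult]) -> str:
    """Machine-readable scorecard, as 'warden scorecard --json' might emit."""
    return json.dumps(
        {
            "passed": sum(1 for r in results if r.passed),
            "total": len(results),
            "checks": [asdict(r) for r in results],
        },
        indent=2,
    )
```

Each check takes plain data rather than reading the state dir itself, which matches the mocked cert records planned for the scorecard unit tests.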

T8 — ops-ssh-wrapper script

id: WARDEN-WP-0001-T8
state_hub_task_id: e9c28152-5785-4995-83a5-439985ed3db9
status: todo
priority: medium
  • Ship scripts/ops-ssh-wrapper (the Python snippet from §4.1, hardened):
      - Reads WARDEN_ACTOR and SSH_PUBKEY env vars
      - Calls warden sign $WARDEN_ACTOR --pubkey $SSH_PUBKEY
      - Loads cert via ssh-add; execs the given command
  • Install as part of uv tool install entry points

T9 — Tests

id: WARDEN-WP-0001-T9
state_hub_task_id: 950139ab-cc17-4f1d-9a17-d5744e402ddf
status: todo
priority: high
  • Unit tests for LocalCA (mock ssh-keygen subprocess)
  • Unit tests for inventory YAML round-trip
  • Unit tests for actor name prefix validation
  • Integration test: LocalCA.sign on a real test keypair (requires ssh-keygen in PATH)
  • Scorecard unit tests (mock cert records)
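The mocked ssh-keygen tests could take roughly this shape. The function under test, sign_with_ssh_keygen, is a hypothetical stand-in for LocalCA.sign, and the tests use only unittest.mock, so they run without ssh-keygen installed.

```python
import subprocess
from unittest import mock


def sign_with_ssh_keygen(ca_key, identity, principals, ttl_hours, pubkey):
    """Hypothetical stand-in for LocalCA.sign: runs ssh-keygen -s."""
    argv = [
        "ssh-keygen", "-s", ca_key, "-I", identity,
        "-n", ",".join(principals), "-V", f"+{ttl_hours}h", pubkey,
    ]
    subprocess.run(argv, check=True, capture_output=True, text=True)
    return argv


def test_sign_invokes_ssh_keygen_with_expected_flags():
    with mock.patch("subprocess.run") as fake_run:
        fake_run.return_value = subprocess.CompletedProcess(
            args=[], returncode=0, stdout="", stderr=""
        )
        sign_with_ssh_keygen(
            "/tmp/ca", "agt-test-actor", ["agt-task-bridge"], 24, "/tmp/test.pub"
        )
    # The first positional argument of subprocess.run is the argv list.
    argv = fake_run.call_args.args[0]
    assert argv[0] == "ssh-keygen"
    assert argv[argv.index("-I") + 1] == "agt-test-actor"
    assert argv[argv.index("-V") + 1] == "+24h"
    assert fake_run.call_args.kwargs["check"] is True


def test_sign_propagates_ssh_keygen_failure():
    error = subprocess.CalledProcessError(1, "ssh-keygen")
    with mock.patch("subprocess.run", side_effect=error):
        try:
            sign_with_ssh_keygen("/tmp/ca", "agt-x", ["p"], 24, "/tmp/test.pub")
        except subprocess.CalledProcessError:
            return
    raise AssertionError("expected CalledProcessError")
```

Under pytest the second test would more naturally use pytest.raises; the try/except form is used here only to keep the sketch free of third-party imports.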

T10 — Documentation

id: WARDEN-WP-0001-T10
state_hub_task_id: 271d6759-e359-41ce-80e4-76c574634a87
status: todo
priority: medium
  • SCOPE.md (see below)
  • wiki/AccessManagementDirective.md — copy from ops-bridge/wiki/
  • wiki/OpsWardenConfig.md — annotated warden.yaml reference
  • wiki/CertCommandInterface.md — contract for cert_command callers (ops-bridge etc.)

SCOPE.md Template

# SCOPE

## One-liner
SSH Certificate Authority and credential issuance for the ops fleet —
signs short-lived certs for adm/agt/atm actors; provides the cert_command
interface consumed by ops-bridge and other tooling.

## Core Idea
Implements AccessManagementDirective §§1-5. Owns the CA key, actor inventory,
signing logic, and scorecard. Does not own tunnel lifecycle, host provisioning,
or SSH key generation for humans.

## In Scope
- Local CA backend (ssh-keygen -s) for lab / non-Vault use
- Vault SSH engine backend for production
- Actor identity registry (inventory.yaml)
- cert_command CLI interface: `warden sign <actor> --pubkey <path>`
- TTL policy enforcement per ActorType (adm/agt/atm)
- Certificate status and stale-cert cleanup
- Scorecard checks (local / cert-side only)
- ops-ssh-wrapper script for agt/atm startup automation

## Out of Scope
- Host-side principal deployment (railiance-infra Ansible)
- SSH key generation for human admins (self-service: ssh-keygen)
- Vault cluster setup / HA
- Session recording, audit forwarding to SIEM (host-side)
- Tunnel lifecycle (ops-bridge)
- SSO / Teleport (trigger when §6.2 scale thresholds are hit)

## Relevant When
- Issuing or refreshing a cert for any adm/agt/atm actor
- Checking cert validity / scorecard compliance
- ops-bridge needs cert_command to be defined
- Adding a new actor to the principals inventory

## Not Relevant When
- Managing tunnel lifecycle (ops-bridge)
- Deploying SSH config to hosts (railiance-infra)
- All access is via static keys with no TTL (legacy mode)

## Current State
Status: planned (WARDEN-WP-0001 not yet started)

## Related Repositories
- ops-bridge — primary consumer of cert_command interface
- railiance-infra — owns host-side principal deployment
- the-custodian/state-hub — registers domain/workstreams

Acceptance Criteria

  • warden sign agt-test-actor --pubkey /tmp/test.pub outputs a valid cert (local backend)
  • warden status agt-test-actor shows correct identity, principals, and time-to-expiry
  • warden scorecard returns 5/5 on a clean test inventory
  • warden sign is called successfully from the ops-bridge cert_command in an integration test tunnel
  • All tests pass: uv run pytest
  • All lints pass: uv run ruff check .