WARDEN-WP-0004: repo hygiene and hub sync
Update SCOPE and README to reflect the shipped warden CLI, fill agent rules for stack/architecture/boundary, archive finished workplans 0001–0003, and register WP-0004 in State Hub.
@@ -0,0 +1,333 @@
---
id: WARDEN-WP-0001
type: workplan
title: "OpsWarden Initial Implementation"
domain: custodian
repo: ops-warden
status: archived
owner: Bernd
topic_slug: custodian
created: "2026-03-28"
updated: "2026-03-28"
state_hub_workstream_id: "c3118cc6-adfb-428c-a9c6-edd0ee152ae6"
---

# WARDEN-WP-0001 — OpsWarden Initial Implementation

> **Note:** This workplan is authored in `ops-bridge` because `ops-warden` does not yet exist.
> Move it to `workplans/WARDEN-WP-0001-initial-implementation.md` in the new repo as the
> first commit action.

**Scope:** Bootstrap the `ops-warden` repository and deliver a working `warden` CLI that
implements the SSH CA and certificate lifecycle defined in `wiki/AccessManagementDirective.md`.

**Out of scope:** Vault HA/cluster setup, Ansible playbooks for host principal deployment
(those live in `railiance-infra`), session recording, and SSO integration (trigger §6.2 of
the directive when scale requires it).

---

## Goal

Create a new `ops-warden` repository that owns **credential issuance only** — the CA,
certificate signing, actor identity registry, and scorecard tooling. Its sole public surface
to sibling repos is a well-defined `cert_command` interface that any tool (principally
`ops-bridge`) can call to obtain a short-lived, CA-signed SSH certificate for a named actor.

---

## Reference Documents

| Document | Location |
|---|---|
| AccessManagementDirective | `ops-bridge/wiki/AccessManagementDirective.md` |
| ops-bridge SCOPE.md | `ops-bridge/SCOPE.md` |

---

## Architecture

```
ops-warden/
├── SCOPE.md
├── CLAUDE.md
├── pyproject.toml
├── src/warden/
│   ├── cli.py          # Typer CLI: sign / issue / status / inventory / scorecard
│   ├── models.py       # ActorType enum, CertSpec, CertRecord, PrincipalsInventory
│   ├── ca.py           # LocalCA backend (file-based, for dev / non-Vault)
│   ├── vault.py        # VaultCA backend (Vault SSH engine, for production)
│   ├── inventory.py    # YAML principals inventory read/write
│   ├── scorecard.py    # §5 compliance checks
│   └── config.py       # ~/.config/warden/warden.yaml loader
├── tests/
└── wiki/               # (symlink or copy of AccessManagementDirective.md)
```

**Backends are swappable.** Config key `backend: local | vault` selects which CA
implementation is used. This means the tool is fully functional without Vault for local lab
use, and production-grade with Vault — the same CLI surface, the same `cert_command`
interface, the same principals inventory format.

**cert_command interface contract:**
```
warden sign <actor-name> --pubkey <path>
```
Writes the signed certificate to stdout (the cert text). Exits non-zero on failure.
`ops-bridge` calls this verbatim via `cert_command` in `tunnels.yaml`.
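
A caller of this contract might look roughly like the sketch below. It is not taken from `ops-bridge`; the helper names `build_sign_argv` and `fetch_cert` are hypothetical, and the only behaviour relied on is what the contract states: cert text on stdout, non-zero exit on failure.

```python
import subprocess
from pathlib import Path
from typing import Callable


def build_sign_argv(actor: str, pubkey: Path) -> list[str]:
    """Build the argv for the contract: warden sign <actor-name> --pubkey <path>."""
    return ["warden", "sign", actor, "--pubkey", str(pubkey)]


def fetch_cert(
    actor: str,
    pubkey: Path,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> str:
    """Run the command and return the certificate text read from stdout.

    `run` is injectable so the function can be exercised without warden installed.
    """
    result = run(build_sign_argv(actor, pubkey), capture_output=True, text=True)
    if result.returncode != 0:
        # The contract only promises a non-zero exit; using stderr for detail is an assumption.
        raise RuntimeError(
            f"warden sign failed ({result.returncode}): {(result.stderr or '').strip()}"
        )
    return result.stdout
```

Keeping the argv builder separate from the subprocess call makes the contract itself easy to assert on in a caller's unit tests.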

---

## Stack

- **Language:** Python 3.11+
- **CLI framework:** Typer
- **Dependencies:** typer, pyyaml, httpx, cryptography (for cert parsing / TTL reading)
- **Vault SDK:** `hvac` (optional; only required for vault backend)
- **Packaging:** `uv tool install`

---

## Tasks

### T1 — Repository bootstrap

```task
id: WARDEN-WP-0001-T1
state_hub_task_id: 6d643e9d-5e97-4224-9d82-87267b5ba6bc
status: done
priority: high
```

- [x] Create `ops-warden` repo; copy CLAUDE.md template from `ops-bridge`; add
      `workplans/WARDEN-WP-0001-initial-implementation.md` (this file)
- [x] Write `SCOPE.md` (see the "SCOPE.md Template" section below)
- [x] `pyproject.toml`: `[project.scripts] warden = "warden.cli:app"`
- [x] Register repo with state-hub (`register_repo`)
- [x] Create state-hub workstream for this workplan

### T2 — Models and config

```task
id: WARDEN-WP-0001-T2
state_hub_task_id: c66fc65a-0b16-4ba2-9e70-a83d875572ec
status: done
priority: high
```

- [x] `models.py`: `ActorType` enum (`adm | agt | atm`); `CertSpec` (actor_name, pubkey_path,
      ttl_hours, principals); `CertRecord` (identity, valid_before, cert_path, signed_at)
- [x] `config.py`: load `~/.config/warden/warden.yaml`; required fields: `backend`,
      `ca_key` (local) or `vault_addr` + `vault_role_map` (vault); optional:
      `inventory_path`, `state_dir`
- [x] Validate actor name prefix matches `ActorType` (`adm-*`, `agt-*`, `atm-*`)
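
The prefix rule in the last item can be sketched as follows. This is an illustration under assumptions, not the shipped `models.py`: the helper names `actor_type_from_name` and `validate_actor` are hypothetical, and raising `ValueError` is a stand-in for whatever error type the real code uses.

```python
from enum import Enum


class ActorType(str, Enum):
    """The three actor classes named in the workplan."""
    ADM = "adm"
    AGT = "agt"
    ATM = "atm"


def actor_type_from_name(actor_name: str) -> ActorType:
    """Derive the ActorType from an `adm-*`, `agt-*` or `atm-*` actor name."""
    prefix, sep, rest = actor_name.partition("-")
    if not sep or not rest:
        raise ValueError(f"actor name {actor_name!r} must look like '<type>-<name>'")
    try:
        return ActorType(prefix)
    except ValueError:
        raise ValueError(f"unknown actor prefix {prefix!r} in {actor_name!r}") from None


def validate_actor(actor_name: str, declared: ActorType) -> None:
    """Raise if the name prefix disagrees with the declared type."""
    actual = actor_type_from_name(actor_name)
    if actual is not declared:
        raise ValueError(
            f"{actor_name!r} has prefix {actual.value!r} but is declared as {declared.value!r}"
        )
```

For example, `validate_actor("agt-state-hub-bridge", ActorType.AGT)` returns without error, while declaring the same name as `ActorType.ADM` raises.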

### T3 — LocalCA backend

```task
id: WARDEN-WP-0001-T3
state_hub_task_id: a5a41e58-1c6d-42a9-9b11-2088f17c29b5
status: done
priority: high
```

- [x] `ca.py`: `LocalCA.sign(spec: CertSpec) -> CertRecord`
  - Calls `ssh-keygen -s <ca_key> -I <identity> -n <principals> -V +<ttl>h <pubkey>`
  - Parses `ssh-keygen -L -f <cert>` output to extract `Valid before`, `Key ID`,
    `Principals`
  - Returns `CertRecord`; writes cert to `~/.local/state/warden/<actor>.cert.pub`
- [x] Default TTLs enforced per `ActorType`: adm → 48 h, agt → 24 h, atm → 8 h
      (overridable per actor in inventory)
- [x] `LocalCA.generate_keypair(actor_name) -> (privkey_path, pubkey_path)` — for agt/atm
      actors that do not bring their own key
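
To make the signing call concrete, here is a minimal sketch of how the `ssh-keygen` argument list could be assembled from the pieces listed above. The function name `build_ssh_keygen_sign_argv` is hypothetical, and deriving the actor type from the identity prefix is an assumption made to keep the example self-contained.

```python
from pathlib import Path
from typing import Optional

# Default TTLs per actor type, as stated in the task list above.
DEFAULT_TTL_HOURS = {"adm": 48, "agt": 24, "atm": 8}


def build_ssh_keygen_sign_argv(
    ca_key: Path,
    identity: str,
    principals: list[str],
    pubkey: Path,
    ttl_hours: Optional[int] = None,
) -> list[str]:
    """Return argv for: ssh-keygen -s <ca_key> -I <identity> -n <principals> -V +<ttl>h <pubkey>."""
    actor_type = identity.split("-", 1)[0]  # assumption: identity carries the type prefix
    ttl = ttl_hours if ttl_hours is not None else DEFAULT_TTL_HOURS[actor_type]
    return [
        "ssh-keygen",
        "-s", str(ca_key),
        "-I", identity,
        "-n", ",".join(principals),  # ssh-keygen takes principals as a comma-separated list
        "-V", f"+{ttl}h",
        str(pubkey),
    ]
```

Passing the result to `subprocess.run(argv, check=True)` would perform the actual signing; building the list separately keeps the TTL defaulting testable without a CA key.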

### T4 — VaultCA backend

```task
id: WARDEN-WP-0001-T4
state_hub_task_id: b2067ee6-c9ce-423b-9d60-0d28069fb304
status: done
priority: medium
```

- [x] `vault.py`: `VaultCA.sign(spec: CertSpec) -> CertRecord`
  - `POST /v1/ssh/sign/<role>` with `public_key`, `valid_principals`, `ttl`
  - Parse response `signed_key` field; write to state dir; extract metadata via
    `ssh-keygen -L`
- [x] Role map in config: `vault_role_map: {adm: adm-role, agt: agt-role, atm: atm-role}`
- [x] Graceful error message when Vault is unreachable (with `--backend local` fallback hint)
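
The request described above could be assembled along these lines. This is a sketch only: `build_vault_sign_request` is a hypothetical helper, it builds the URL and JSON body without sending anything, and it assumes the SSH engine is mounted at `ssh/` as in the path shown in the task list.

```python
def build_vault_sign_request(
    vault_addr: str,
    role_map: dict[str, str],
    actor_type: str,
    public_key: str,
    principals: list[str],
    ttl_hours: int,
) -> tuple[str, dict[str, str]]:
    """Return (url, json_body) for POST /v1/ssh/sign/<role>."""
    if actor_type not in role_map:
        # Fail before any HTTP call when the role map has no entry for this type.
        raise KeyError(f"no Vault role configured for actor type {actor_type!r}")
    url = f"{vault_addr.rstrip('/')}/v1/ssh/sign/{role_map[actor_type]}"
    body = {
        "public_key": public_key,
        "valid_principals": ",".join(principals),  # comma-separated string
        "ttl": f"{ttl_hours}h",
    }
    return url, body
```

The returned pair can be handed to any HTTP client, for example `httpx.post(url, json=body, headers=...)`; the signed certificate is then read from the `signed_key` field of the response as described in the task.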

### T5 — Principals inventory

```task
id: WARDEN-WP-0001-T5
state_hub_task_id: 6d13f8cd-1850-44c9-b769-b21250348319
status: done
priority: high
```

- [x] `inventory.py`: load/save `inventory.yaml` (format mirrors §4.1 of directive):
  ```yaml
  actors:
    agt-state-hub-bridge:
      type: agt
      principals: [agt-task-bridge]
      ttl_hours: 24
      description: "ops-bridge tunnel actor"
  hosts:
    coulombcore:
      allowed_principals:
        agt: [agt-task-bridge]
        atm: [atm-backup-daily]
  ```
- [x] `warden inventory list` — print table
- [x] `warden inventory add <actor-name> --type <adm|agt|atm> --principals <...>`
- [x] `warden inventory remove <actor-name>`

### T6 — CLI commands

```task
id: WARDEN-WP-0001-T6
state_hub_task_id: 656a4615-92bb-4b5d-9406-e86d24fa15d0
status: done
priority: high
```

- [x] `warden sign <actor-name> --pubkey <path>` — sign existing pubkey; write cert to
      stdout (the `cert_command` interface for ops-bridge)
- [x] `warden issue <actor-name>` — generate keypair + sign; output JSON with
      `privkey`, `cert`, `valid_before`, `identity`
- [x] `warden status [actor-name]` — show cert validity, identity, principals, TTL
      remaining; `--all` flag to show all actors in state dir
- [x] `warden scorecard` — run §5 checks (see T7)
- [x] `warden inventory <subcommand>` (list / add / remove)

### T7 — Scorecard runner

```task
id: WARDEN-WP-0001-T7
state_hub_task_id: 7818bcc5-f40e-4793-b117-d36f653ffeed
status: done
priority: medium
```

- [x] `scorecard.py`: implement each §5 row as a named check function returning
      `CheckResult(name, passed, detail)`
- [x] Checks in scope for `ops-warden` (local checks, not host-side):
  - All certs in state dir respect TTL policy for their `ActorType`
  - No actor in inventory lacks a `principals` entry
  - Actor name prefix matches declared type
  - No cert expired by more than 5 min still present in state dir (stale cleanup)
- [x] Host-side checks (password auth disabled, root login disabled, etc.) are out of scope
      — those live in the Ansible `ssh-access-audit.yml` playbook in `railiance-infra`
- [x] `warden scorecard --json` for machine-readable output
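
The "named check function returning `CheckResult`" pattern might be sketched like this for the two inventory-only checks. The check names, the `run_checks` helper, and the plain-dict inventory shape (mirroring the YAML in T5) are assumptions for illustration, not the shipped `scorecard.py`.

```python
from dataclasses import asdict, dataclass
import json


@dataclass(frozen=True)
class CheckResult:
    name: str
    passed: bool
    detail: str


def check_principals_present(actors: dict[str, dict]) -> CheckResult:
    """Fail when any actor has no (or an empty) `principals` entry."""
    missing = sorted(name for name, spec in actors.items() if not spec.get("principals"))
    detail = "ok" if not missing else "no principals: " + ", ".join(missing)
    return CheckResult("principals_present", not missing, detail)


def check_prefix_matches_type(actors: dict[str, dict]) -> CheckResult:
    """Fail when an actor name does not start with '<declared type>-'."""
    bad = sorted(
        name for name, spec in actors.items()
        if not name.startswith(f"{spec.get('type')}-")
    )
    detail = "ok" if not bad else "prefix/type mismatch: " + ", ".join(bad)
    return CheckResult("prefix_matches_type", not bad, detail)


def run_checks(actors: dict[str, dict]) -> list[CheckResult]:
    """Run every check and return the results in a stable order."""
    return [check_principals_present(actors), check_prefix_matches_type(actors)]


def to_json(results: list[CheckResult]) -> str:
    """Machine-readable form, in the spirit of `warden scorecard --json`."""
    return json.dumps([asdict(r) for r in results])
```

Because every check returns the same small record, a table renderer and a JSON renderer can share one result list.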

### T8 — ops-ssh-wrapper script

```task
id: WARDEN-WP-0001-T8
state_hub_task_id: e9c28152-5785-4995-83a5-439985ed3db9
status: done
priority: medium
```

- [x] Ship `scripts/ops-ssh-wrapper` (the Python snippet from §4.1, hardened):
  - Reads `WARDEN_ACTOR` and `SSH_PUBKEY` env vars
  - Calls `warden sign $WARDEN_ACTOR --pubkey $SSH_PUBKEY`
  - Loads cert via `ssh-add`; execs the given command
- [x] Install as part of `uv tool install` entry points

### T9 — Tests

```task
id: WARDEN-WP-0001-T9
state_hub_task_id: 950139ab-cc17-4f1d-9a17-d5744e402ddf
status: done
priority: high
```

- [x] Unit tests for `LocalCA` (mock `ssh-keygen` subprocess)
- [x] Unit tests for inventory YAML round-trip
- [x] Unit tests for actor name prefix validation
- [x] Integration test: `LocalCA.sign` on a real test keypair (requires `ssh-keygen` in PATH)
- [x] Scorecard unit tests (mock cert records)

### T10 — Documentation

```task
id: WARDEN-WP-0001-T10
state_hub_task_id: 271d6759-e359-41ce-80e4-76c574634a87
status: done
priority: medium
```

- [x] `SCOPE.md` (see below)
- [x] `wiki/AccessManagementDirective.md` — copy from `ops-bridge/wiki/`
- [x] `wiki/OpsWardenConfig.md` — annotated `warden.yaml` reference
- [x] `wiki/CertCommandInterface.md` — contract for `cert_command` callers (ops-bridge etc.)

---

## SCOPE.md Template

```
# SCOPE

## One-liner
SSH Certificate Authority and credential issuance for the ops fleet —
signs short-lived certs for adm/agt/atm actors; provides the cert_command
interface consumed by ops-bridge and other tooling.

## Core Idea
Implements AccessManagementDirective §§1–5. Owns the CA key, actor inventory,
signing logic, and scorecard. Does not own tunnel lifecycle, host provisioning,
or SSH key generation for humans.

## In Scope
- Local CA backend (ssh-keygen -s) for lab / non-Vault use
- Vault SSH engine backend for production
- Actor identity registry (inventory.yaml)
- cert_command CLI interface: `warden sign <actor> --pubkey <path>`
- TTL policy enforcement per ActorType (adm/agt/atm)
- Certificate status and stale-cert cleanup
- Scorecard checks (local / cert-side only)
- ops-ssh-wrapper script for agt/atm startup automation

## Out of Scope
- Host-side principal deployment (railiance-infra Ansible)
- SSH key generation for human admins (self-service: ssh-keygen)
- Vault cluster setup / HA
- Session recording, audit forwarding to SIEM (host-side)
- Tunnel lifecycle (ops-bridge)
- SSO / Teleport (trigger when §6.2 scale thresholds are hit)

## Relevant When
- Issuing or refreshing a cert for any adm/agt/atm actor
- Checking cert validity / scorecard compliance
- ops-bridge needs cert_command to be defined
- Adding a new actor to the principals inventory

## Not Relevant When
- Managing tunnel lifecycle (ops-bridge)
- Deploying SSH config to hosts (railiance-infra)
- All access is via static keys with no TTL (legacy mode)

## Current State
Status: planned (WARDEN-WP-0001 not yet started)

## Related Repositories
- ops-bridge — primary consumer of cert_command interface
- railiance-infra — owns host-side principal deployment
- the-custodian/state-hub — registers domain/workstreams
```

---

## Acceptance Criteria

- [x] `warden sign agt-test-actor --pubkey /tmp/test.pub` outputs a valid cert (local backend)
- [x] `warden status agt-test-actor` shows correct identity, principals, and time-to-expiry
- [x] `warden scorecard` returns 5/5 on a clean test inventory
- [x] `warden sign` called from ops-bridge `cert_command` in an integration test tunnel
- [x] All tests pass: `uv run pytest`
- [x] All lints pass: `uv run ruff check .`
@@ -0,0 +1,176 @@
---
id: WARDEN-WP-0002
type: workplan
title: "OpsWarden Correctness and Operational Completeness"
domain: custodian
repo: ops-warden
status: archived
owner: Bernd
topic_slug: custodian
planning_priority: high
planning_order: 2
created: "2026-05-15"
updated: "2026-05-15"
state_hub_workstream_id: "5a9fba2c-6161-49a4-a231-e750fa4ab572"
---

# WARDEN-WP-0002 — Correctness and Operational Completeness

**Scope:** Fix three functional gaps identified after WARDEN-WP-0001: TTL max
enforcement (directive compliance), stale cert cleanup (SCOPE.md promises it),
and an outgoing signatures log (audit traceability for every signing operation).

**Out of scope:** Test coverage improvements (WARDEN-WP-0003), Vault cluster
setup, host-side principal deployment.

---

## Goal

After this workplan:

1. `warden sign` and `warden issue` reject TTLs that exceed the type maximum
   defined in the AccessManagementDirective — no cert can be silently issued
   with a longer-than-allowed validity window.
2. Stale/expired certs do not accumulate in the state dir. `warden cleanup`
   provides an on-demand sweep; `LocalCA.sign()` auto-evicts the previous cert
   for the same actor before writing the new one.
3. Every successful signing operation is recorded in an append-only
   `signatures.log` in the state dir. `warden log` provides a human-readable
   and machine-readable view of the signing history.

---

## Reference Documents

| Document | Location |
|---|---|
| AccessManagementDirective | `wiki/AccessManagementDirective.md` |
| WARDEN-WP-0001 | `workplans/WARDEN-WP-0001-initial-implementation.md` |
| SCOPE.md | `SCOPE.md` |

---

## Design Decisions

### TTL enforcement: reject, don't clamp

When `spec.ttl_hours > DEFAULT_TTL_HOURS[actor_type]`, raise `CAError` rather
than silently clamping. A silent clamp would mask configuration errors and hide
directive violations from operators. An explicit error forces a deliberate
decision.

The check lives in `CABackend.sign()` before any signing call (subprocess or
HTTP) so it applies to both `LocalCA` and `VaultCA`. Vault's own role `max_ttl`
provides a second layer; this check is the warden-side gate.
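
A minimal sketch of the reject-don't-clamp rule, assuming the TTL values given in WARDEN-WP-0001 (adm 48 h, agt 24 h, atm 8 h). The `CertSpec` below is a reduced stand-in with only the fields the check needs, and `CAError` is declared locally so the example runs on its own.

```python
from dataclasses import dataclass

# Per-type maxima; same values as the defaults from WARDEN-WP-0001.
MAX_TTL_HOURS = {"adm": 48, "agt": 24, "atm": 8}


class CAError(Exception):
    """Stand-in for warden's signing error type."""


@dataclass
class CertSpec:
    # Reduced stand-in: the real CertSpec also carries pubkey_path and principals.
    actor_name: str
    actor_type: str
    ttl_hours: int


def enforce_ttl(spec: CertSpec) -> None:
    """Raise CAError when the requested TTL exceeds the maximum for the actor type."""
    limit = MAX_TTL_HOURS[spec.actor_type]
    if spec.ttl_hours > limit:
        raise CAError(
            f"{spec.actor_name}: ttl {spec.ttl_hours}h exceeds {spec.actor_type} max of {limit}h"
        )
```

A request at exactly the limit passes; anything above it fails loudly instead of being shortened, which is the behaviour the paragraph above argues for.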

### Cleanup: proactive (on sign) + reactive (on demand)

`LocalCA.sign()` removes the previous cert for the same actor before writing the
new one — this keeps state_dir from growing unboundedly under normal operation.
`warden cleanup` handles the edge cases: certs whose actor is no longer in the
inventory, certs from aborted sessions, certs left by actors that were renamed.

`VaultCA.sign()` also evicts before writing (same logic, same helper function).

### Signatures log: JSONL, append-only, in state_dir

One line per signing event, written after a successful `CertRecord` is produced.
Format: `{"timestamp": ..., "actor": ..., "actor_type": ..., "identity": ...,
"principals": [...], "ttl_hours": ..., "valid_before": ..., "backend": ...}`.

The log lives alongside certs in `state_dir` so a single directory backup
captures the full operational history. No rotation at this scope — add rotation
in a follow-up if the file grows beyond a few MB in practice.

`warden log` is read-only. No deletion via CLI — the log is an audit artefact.

---

## Tasks

### T1 — TTL max enforcement per ActorType

```task
id: WARDEN-WP-0002-T1
state_hub_task_id: b0d0b5f7-a181-4590-be26-c48ae28cd964
status: done
priority: high
```

- [x] `models.py`: add `MAX_TTL_HOURS = DEFAULT_TTL_HOURS` alias (same values,
      explicit name signals policy intent); add helper
      `enforce_ttl(spec: CertSpec) -> None` that raises `CAError` when
      `spec.ttl_hours > MAX_TTL_HOURS[spec.actor_type]`
- [x] `ca.py`: call `enforce_ttl(spec)` at the top of `CABackend.sign()` base
      (or in both `LocalCA.sign()` and `VaultCA.sign()` if no shared base call)
- [x] `scorecard.py`: add `check_ttl_policy(state_dir, inventory)` — parse each
      cert in state_dir via `ssh-keygen -L`; compare cert validity window
      duration against `MAX_TTL_HOURS[actor_type]`; flag if exceeded
- [x] Add `check_ttl_policy` to `run_scorecard()`
- [x] Update tests: `test_ca.py` — assert `CAError` raised when `ttl_hours`
      exceeds max for each type; assert no error at exactly the max

### T2 — Stale cert cleanup command

```task
id: WARDEN-WP-0002-T2
state_hub_task_id: aeeefbad-c0bd-4ae8-a3fe-9f72321b4caa
status: done
priority: medium
```

- [x] `ca.py`: extract `_evict_cert(actor_name, state_dir)` — removes
      `state_dir/<actor_name>-cert.pub` if it exists; call at the top of
      `LocalCA.sign()` and `VaultCA.sign()` before writing the new cert
- [x] `cli.py`: add `warden cleanup [actor-name]` command
  - No actor-name: iterate `state_dir/*.cert.pub`, remove any whose
    `valid_before < now - 5 min`
  - With actor-name: remove only that actor's cert if stale
  - `--dry-run`: print what would be removed without deleting
  - Exit 0 always (cleanup is idempotent; nothing to clean is not an error)
- [x] Update `check_no_stale_certs` scorecard check detail message to suggest
      running `warden cleanup`
- [x] Update tests: verify `_evict_cert` is called during sign; verify cleanup
      command removes stale file; verify `--dry-run` does not delete
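
The sweep logic above (five-minute grace period, optional dry run) can be sketched as below. To stay independent of cert parsing and file naming, the sketch takes an already-built mapping from cert path to `valid_before`; that input shape and the function names are assumptions, not the shipped CLI code.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path

# A cert counts as stale once it has been expired for more than five minutes.
STALE_GRACE = timedelta(minutes=5)


def find_stale(valid_before_by_cert: dict[Path, datetime], now: datetime) -> list[Path]:
    """Return the cert paths whose valid_before is earlier than now - 5 min."""
    cutoff = now - STALE_GRACE
    return sorted(path for path, vb in valid_before_by_cert.items() if vb < cutoff)


def cleanup(
    valid_before_by_cert: dict[Path, datetime],
    now: datetime,
    dry_run: bool = False,
) -> list[Path]:
    """Remove stale certs (or only report them when dry_run is set)."""
    stale = find_stale(valid_before_by_cert, now)
    if not dry_run:
        for path in stale:
            # missing_ok keeps the sweep idempotent if a file is already gone.
            path.unlink(missing_ok=True)
    return stale
```

Returning the same list in both modes lets `--dry-run` and the real run share one reporting path, and an empty list is simply "nothing to clean".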

### T3 — Outgoing signatures log

```task
id: WARDEN-WP-0002-T3
state_hub_task_id: 0194d24f-a8fe-4f6d-88e6-addea3542c0e
status: done
priority: medium
```

- [x] `ca.py`: after a successful `CertRecord` is produced in `LocalCA.sign()`
      and `VaultCA.sign()`, call `_append_signature_log(record, spec, state_dir,
      backend)` which appends a JSONL line to
      `state_dir/signatures.log`
      Fields: `timestamp` (ISO 8601 UTC), `actor`, `actor_type`, `identity`,
      `principals`, `ttl_hours`, `valid_before`, `cert_path`, `backend`
- [x] `cli.py`: add `warden log [actor-name]` command
  - Reads `state_dir/signatures.log` (empty list if absent)
  - `--last N` (default 20): show last N entries
  - `--actor <name>`: filter by actor
  - `--json`: output newline-delimited JSON; default: Rich table
  - Exit 0 always
- [x] Update tests: verify log entry written after sign; verify log not written
      on CAError; verify `warden log` filters correctly
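
An append-only JSONL log with the read behaviour listed above might be sketched as follows. The function names are hypothetical, the entry is passed as a plain dict rather than a `CertRecord`, and applying the actor filter before taking the last N entries is an assumption the task list does not settle.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

LOG_NAME = "signatures.log"


def append_signature_log(state_dir: Path, entry: dict) -> None:
    """Append one JSON object per line; the file is opened in append mode only."""
    record = {"timestamp": datetime.now(timezone.utc).isoformat(), **entry}
    with (state_dir / LOG_NAME).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")


def read_signature_log(
    state_dir: Path,
    actor: Optional[str] = None,
    last: int = 20,
) -> list[dict]:
    """Return up to `last` most recent entries, optionally filtered by actor."""
    path = state_dir / LOG_NAME
    if not path.exists():
        return []  # an absent log is treated as an empty history
    lines = path.read_text(encoding="utf-8").splitlines()
    entries = [json.loads(line) for line in lines if line.strip()]
    if actor is not None:
        entries = [e for e in entries if e.get("actor") == actor]
    return entries[-last:] if last > 0 else []
```

One JSON object per line means a partially written final line can never corrupt earlier entries, which suits an audit artefact that is only ever appended to.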

---

## Acceptance Criteria

- [x] `warden sign agt-test --pubkey /tmp/k.pub --ttl 100` raises `CAError`
      (agt max is 24h)
- [x] `warden sign agt-test --pubkey /tmp/k.pub --ttl 24` succeeds
- [x] `warden scorecard` includes TTL policy check; fails when a cert exceeds type max
- [x] After `warden sign`, `state_dir/signatures.log` has one new line; valid JSON
- [x] `warden log` renders a table; `warden log --json` is parseable
- [x] `warden log --actor agt-test` returns only entries for that actor
- [x] `warden cleanup --dry-run` lists stale certs without deleting
- [x] `warden cleanup` removes stale certs; scorecard `no_stale_certs` passes afterwards
- [x] Re-signing an actor replaces its cert file (no accumulation)
- [x] All tests pass: `uv run pytest`
- [x] All lints pass: `uv run ruff check .`
@@ -0,0 +1,209 @@
---
id: WARDEN-WP-0003
type: workplan
title: "OpsWarden Test Coverage and Code Quality"
domain: custodian
repo: ops-warden
status: archived
owner: Bernd
topic_slug: custodian
planning_priority: medium
planning_order: 3
created: "2026-05-15"
updated: "2026-05-15"
state_hub_workstream_id: "cb2bbf3c-848a-4af6-ba64-8361e64cd4d7"
---

# WARDEN-WP-0003 — Test Coverage and Code Quality

**Scope:** Close the test coverage gaps left after WARDEN-WP-0001: VaultCA has
zero tests, `generate_keypair` is untested, no CLI tests exist, and no real
`ssh-keygen` integration test was written. Also fix file permission enforcement
(security) and add `--state-dir` override to `warden status` (usability).

**Out of scope:** Functional behaviour changes (WARDEN-WP-0002), Vault cluster
setup, host-side tooling.

---

## Goal

After this workplan:

1. VaultCA has a full unit test suite covering success, auth failure, network
   failure, and role-map misconfiguration.
2. `generate_keypair` has direct unit tests alongside the existing `sign` tests.
3. A `tests/test_cli.py` covers every command's exit codes and output shape.
4. A `tests/test_integration.py` marked `@pytest.mark.integration` exercises
   `LocalCA.sign()` against a real `ssh-keygen` without any mocking.
5. Cert and private key files written by warden are always mode 600; a scorecard
   check flags group- or world-readable files.
6. `warden status --state-dir <path>` works without a `warden.yaml`.

---

## Reference Documents

| Document | Location |
|---|---|
| WARDEN-WP-0001 | `workplans/WARDEN-WP-0001-initial-implementation.md` |
| WARDEN-WP-0002 | `workplans/WARDEN-WP-0002-correctness-and-completeness.md` |
| CertCommandInterface | `wiki/CertCommandInterface.md` |

---

## Design Decisions

### Integration tests: separate marker, not separate directory

Use `@pytest.mark.integration` and skip when `ssh-keygen` is not in PATH. Tests
live in `tests/test_integration.py`. The unit suite (`uv run pytest`) excludes
integration tests via `addopts = "-m 'not integration'"` under
`[tool.pytest.ini_options]` in `pyproject.toml`;
`uv run pytest -m integration` runs them explicitly. This keeps CI fast while
making the real-ssh-keygen path easy to invoke manually.

### File permissions: at write time, not at read time

`os.chmod(path, 0o600)` is called immediately after each `path.write_text()` or
`shutil.copy2()` that writes a key or cert. No deferred or scheduled chmod. The
scorecard check catches files that were written by older versions of warden or by
external tools.

### `warden status --state-dir`: config bypass, not config optional

When `--state-dir` is provided, skip `load_config()` entirely — don't try to
load a partial config. This makes the flag useful on remote machines that have
received a cert via ops-bridge but have no warden installation.

---

## Tasks

### T1 — VaultCA tests

```task
id: WARDEN-WP-0003-T1
state_hub_task_id: eff074ce-c027-4df5-8006-0990296592ac
status: done
priority: high
```

- [x] Create `tests/test_vault.py`
- [x] Test `VaultCA.sign()` success: mock `httpx.post` returning a valid
      `signed_key`; assert `CertRecord` fields; assert cert file written to
      state_dir
- [x] Test HTTP 403: `httpx.HTTPStatusError` → `CAError` with status code in message
- [x] Test unreachable Vault: `httpx.RequestError` → `CAError` with fallback hint
- [x] Test missing `VAULT_TOKEN`: `_token()` raises `CAError` before HTTP call
- [x] Test missing role in `role_map`: `CAError` before HTTP call
- [x] Test missing pubkey file: `CAError` before HTTP call

### T2 — LocalCA.generate_keypair tests

```task
id: WARDEN-WP-0003-T2
state_hub_task_id: ddfe5331-0a3b-4783-bdf4-f5ebcdf7965c
status: done
priority: medium
```

- [x] Add `TestGenerateKeypair` class to `tests/test_ca.py`
- [x] Test success: mock `subprocess.run`; assert privkey and pubkey paths returned
- [x] Test ssh-keygen called with `-t ed25519`, `-N ""`, `-C actor_name`
- [x] Test existing files are unlinked before generation (write dummy files first)
- [x] Test `CAError` raised on non-zero ssh-keygen exit code
- [x] Test output files land in `state_dir/keys/`

### T3 — CLI tests

```task
id: WARDEN-WP-0003-T3
state_hub_task_id: 040ce3a1-0efb-4816-a2d9-357162dd1612
status: done
priority: high
```

- [x] Create `tests/test_cli.py` using `typer.testing.CliRunner`
- [x] `warden sign`: exits 0 and stdout is cert text (mock CA); exits 1 on
      unknown actor; exits 1 on config error
- [x] `warden issue`: exits 1 on vault backend; exits 0 on local backend (mock CA)
- [x] `warden status`: exits 0 and prints "no cert" message when state_dir empty;
      exits 1 when cert is expired (mock `parse_cert_metadata`)
- [x] `warden scorecard`: exits 0 on clean inventory + empty state_dir;
      exits 1 when a check fails
- [x] `warden inventory add / list / remove`: round-trip via tmp inventory file
- [x] `warden log`: exits 0 with empty output when no log; `--json` is valid JSON
      (after WARDEN-WP-0002 T3 adds the log command)
- [x] `warden cleanup --dry-run`: exits 0, no files deleted
      (after WARDEN-WP-0002 T2 adds cleanup)

### T4 — Real ssh-keygen integration test

```task
id: WARDEN-WP-0003-T4
state_hub_task_id: 434fb008-103f-410c-85fd-e77b33e61fe4
status: done
priority: medium
```

- [x] Create `tests/test_integration.py`
- [x] Mark all tests `@pytest.mark.integration`
- [x] Add a `[tool.pytest.ini_options]` table to `pyproject.toml` with
      `addopts = "-m 'not integration'"` so the unit suite skips them by default
- [x] Test `LocalCA.sign()` end-to-end: generate a real CA keypair and actor
      keypair via subprocess ssh-keygen in tmp_path; call `LocalCA.sign()`;
      assert `CertRecord.valid_before > datetime.now(utc)`; assert cert file
      exists; assert `parse_cert_metadata()` succeeds on it without mocking
- [x] Skip test if `shutil.which("ssh-keygen") is None`
- [x] Document in README: `uv run pytest -m integration` to run real-CA tests

### T5 — File permissions enforcement (mode 600)

```task
id: WARDEN-WP-0003-T5
state_hub_task_id: ac146fe6-d1fd-4186-91bd-6f098de72449
status: done
priority: medium
```

- [x] `ca.py` `LocalCA.sign()`: call `os.chmod(dest, 0o600)` after `shutil.copy2`
- [x] `ca.py` `LocalCA.generate_keypair()`: call `os.chmod(privkey, 0o600)` and
      `os.chmod(pubkey, 0o644)` after generation
- [x] `vault.py` `VaultCA.sign()`: call `os.chmod(dest, 0o600)` after `dest.write_text`
- [x] `scorecard.py`: add `check_file_permissions(state_dir)` — flag any
      `*-cert.pub` or `keys/*` file where `stat().st_mode & 0o044 != 0`
- [x] Add `check_file_permissions` to `run_scorecard()`
- [x] Update `test_ca.py`: assert `os.chmod` called with correct mode after sign
      and generate_keypair (patch os.chmod or check stat on actual files in
      tmp_path)
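
The write-then-chmod step and the `0o044` mask test can be illustrated with the sketch below. The helper names `write_private` and `too_permissive` are hypothetical, and the example assumes a POSIX filesystem where mode bits are honoured.

```python
import os
import stat
from pathlib import Path
from typing import Iterable

# Group-read (0o040) and other-read (0o004) bits, i.e. the 0o044 mask from the task list.
READABLE_BY_OTHERS = 0o044


def write_private(path: Path, text: str) -> None:
    """Write the file, then restrict it to owner read/write immediately."""
    path.write_text(text, encoding="utf-8")
    os.chmod(path, 0o600)


def too_permissive(paths: Iterable[Path]) -> list[Path]:
    """Return the files readable by group or others."""
    return [
        p for p in paths
        if stat.S_IMODE(p.stat().st_mode) & READABLE_BY_OTHERS
    ]
```

Note that under this mask a file set to `0o644` is also reported, since `0o644 & 0o044` is non-zero; a file at `0o600` passes.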

### T6 — warden status --state-dir override

```task
id: WARDEN-WP-0003-T6
state_hub_task_id: 1c9f1987-7b11-43c1-a5e3-c2fd8d1c1589
status: done
priority: low
```

- [x] `cli.py` `status()`: add
      `state_dir_override: Annotated[Optional[Path], typer.Option("--state-dir")] = None`
- [x] When `--state-dir` is provided: use it directly, skip `_load_cfg()` entirely
- [x] When absent: load config as today
- [x] Add test in `test_cli.py`: invoke `warden status --state-dir <tmp_path>`
      without a config file; assert exit 0

---

## Acceptance Criteria

- [x] `uv run pytest` runs unit suite only; all pass; VaultCA and generate_keypair
      covered
- [x] `uv run pytest -m integration` succeeds (requires ssh-keygen in PATH)
- [x] `test_cli.py` covers all commands; no mocked subprocess in CLI tests where
      avoidable (use tmp inventory files and mocked CA)
- [x] `ls -la ~/.local/state/warden/*.pub` shows mode 600 for newly signed certs
- [x] Scorecard `file_permissions` check passes on a clean state dir; fails on a
      world-readable cert
- [x] `warden status --state-dir /tmp/some-dir` runs without a `warden.yaml`
- [x] All lints pass: `uv run ruff check .`