feat(warden): implement WARDEN-WP-0002 correctness and operational completeness

T1 — TTL max enforcement:
  - models.py: MAX_TTL_HOURS policy constant
  - ca.py: _enforce_ttl() raises CAError when spec.ttl_hours > type max
  - Called at top of LocalCA.sign() and VaultCA.sign()
  - scorecard.py: check_ttl_policy() — flags certs with issued TTL > type max
  - run_scorecard() now returns 5 checks
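A minimal sketch of the two T1 checks: rejecting an over-long TTL at signing time, and flagging an already issued cert whose validity window is too long. It assumes `MAX_TTL_HOURS` is a dict keyed by actor type and that the spec carries `actor_type` and `ttl_hours`. The `CertSpec` dataclass and `ttl_exceeds_max` are hypothetical stand-ins, not the actual `models.py`/`ca.py`/`scorecard.py` code. The function is named after the workplan's `enforce_ttl`, and only the `agt` limit (24h) is taken from the workplan's acceptance criteria.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical policy table. Only the "agt" limit (24h) is stated in the
# workplan; a real table would cover every actor type.
MAX_TTL_HOURS: dict[str, int] = {"agt": 24}


class CAError(Exception):
    """Raised when a signing request violates CA policy."""


@dataclass
class CertSpec:
    # Hypothetical stand-in carrying only the fields the TTL check needs.
    actor_name: str
    actor_type: str
    ttl_hours: int


def enforce_ttl(spec: CertSpec) -> None:
    """Reject a signing request whose TTL exceeds its actor type's max.

    Intended to run at the top of sign(), before anything is signed.
    A TTL exactly equal to the max is allowed. Unknown actor types
    raise KeyError in this sketch.
    """
    max_hours = MAX_TTL_HOURS[spec.actor_type]
    if spec.ttl_hours > max_hours:
        raise CAError(
            f"ttl_hours={spec.ttl_hours} exceeds max {max_hours}h "
            f"for actor type {spec.actor_type!r}"
        )


def ttl_exceeds_max(actor_type: str, valid_from: datetime, valid_before: datetime) -> bool:
    """Scorecard-side check: is an issued cert's validity window too long?"""
    return valid_before - valid_from > timedelta(hours=MAX_TTL_HOURS[actor_type])
```

With this table, `enforce_ttl(CertSpec("agt-test", "agt", 24))` returns silently, while a `ttl_hours` of 100 raises `CAError`. This mirrors the first two acceptance criteria in the workplan.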

T2 — Stale cert cleanup:
  - ca.py: _evict_cert() removes existing cert before writing new one (no accumulation)
  - cli.py: warden cleanup [actor] [--dry-run] command
  - check_no_stale_certs detail suggests 'warden cleanup' when stale certs found

T3 — Outgoing signatures log:
  - ca.py: _append_signature_log() writes JSONL to state_dir/signatures.log
  - Called after every successful sign() in LocalCA and VaultCA
  - cli.py: warden log [actor] [--last N] [--json] command
  - parse_cert_metadata now also returns valid_from (needed for TTL policy check)

61 tests passing, ruff clean.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 15:53:10 +02:00
parent 66e93e5e5c
commit 9857ed1424
9 changed files with 494 additions and 37 deletions

@@ -4,7 +4,7 @@ type: workplan
 title: "OpsWarden Correctness and Operational Completeness"
 domain: custodian
 repo: ops-warden
-status: active
+status: done
 owner: Bernd
 topic_slug: custodian
 planning_priority: high
@@ -94,21 +94,21 @@ in a follow-up if the file grows beyond a few MB in practice.
 ```task
 id: WARDEN-WP-0002-T1
 state_hub_task_id: b0d0b5f7-a181-4590-be26-c48ae28cd964
-status: todo
+status: done
 priority: high
 ```
-- [ ] `models.py`: add `MAX_TTL_HOURS = DEFAULT_TTL_HOURS` alias (same values,
+- [x] `models.py`: add `MAX_TTL_HOURS = DEFAULT_TTL_HOURS` alias (same values,
 explicit name signals policy intent); add helper
 `enforce_ttl(spec: CertSpec) -> None` that raises `CAError` when
 `spec.ttl_hours > MAX_TTL_HOURS[spec.actor_type]`
-- [ ] `ca.py`: call `enforce_ttl(spec)` at the top of `CABackend.sign()` base
+- [x] `ca.py`: call `enforce_ttl(spec)` at the top of `CABackend.sign()` base
 (or in both `LocalCA.sign()` and `VaultCA.sign()` if no shared base call)
-- [ ] `scorecard.py`: add `check_ttl_policy(state_dir, inventory)` — parse each
+- [x] `scorecard.py`: add `check_ttl_policy(state_dir, inventory)` — parse each
 cert in state_dir via `ssh-keygen -L`; compare cert validity window
 duration against `MAX_TTL_HOURS[actor_type]`; flag if exceeded
-- [ ] Add `check_ttl_policy` to `run_scorecard()`
-- [ ] Update tests: `test_ca.py` — assert `CAError` raised when `ttl_hours`
+- [x] Add `check_ttl_policy` to `run_scorecard()`
+- [x] Update tests: `test_ca.py` — assert `CAError` raised when `ttl_hours`
 exceeds max for each type; assert no error at exactly the max
 ### T2 — Stale cert cleanup command
@@ -116,22 +116,22 @@ priority: high
 ```task
 id: WARDEN-WP-0002-T2
 state_hub_task_id: aeeefbad-c0bd-4ae8-a3fe-9f72321b4caa
-status: todo
+status: done
 priority: medium
 ```
-- [ ] `ca.py`: extract `_evict_cert(actor_name, state_dir)` — removes
+- [x] `ca.py`: extract `_evict_cert(actor_name, state_dir)` — removes
 `state_dir/<actor_name>-cert.pub` if it exists; call at the top of
 `LocalCA.sign()` and `VaultCA.sign()` before writing the new cert
-- [ ] `cli.py`: add `warden cleanup [actor-name]` command
+- [x] `cli.py`: add `warden cleanup [actor-name]` command
   - No actor-name: iterate `state_dir/*.cert.pub`, remove any whose
     `valid_before < now - 5 min`
   - With actor-name: remove only that actor's cert if stale
   - `--dry-run`: print what would be removed without deleting
   - Exit 0 always (cleanup is idempotent; nothing to clean is not an error)
-- [ ] Update `check_no_stale_certs` scorecard check detail message to suggest
+- [x] Update `check_no_stale_certs` scorecard check detail message to suggest
 running `warden cleanup`
-- [ ] Update tests: verify `_evict_cert` is called during sign; verify cleanup
+- [x] Update tests: verify `_evict_cert` is called during sign; verify cleanup
 command removes stale file; verify `--dry-run` does not delete
 ### T3 — Outgoing signatures log
@@ -139,38 +139,38 @@ priority: medium
 ```task
 id: WARDEN-WP-0002-T3
 state_hub_task_id: 0194d24f-a8fe-4f6d-88e6-addea3542c0e
-status: todo
+status: done
 priority: medium
 ```
-- [ ] `ca.py`: after a successful `CertRecord` is produced in `LocalCA.sign()`
+- [x] `ca.py`: after a successful `CertRecord` is produced in `LocalCA.sign()`
 and `VaultCA.sign()`, call `_append_signature_log(record, spec, state_dir,
 backend)` which appends a JSONL line to
 `state_dir/signatures.log`
 Fields: `timestamp` (ISO 8601 UTC), `actor`, `actor_type`, `identity`,
 `principals`, `ttl_hours`, `valid_before`, `cert_path`, `backend`
-- [ ] `cli.py`: add `warden log [actor-name]` command
+- [x] `cli.py`: add `warden log [actor-name]` command
   - Reads `state_dir/signatures.log` (empty list if absent)
   - `--last N` (default 20): show last N entries
   - `--actor <name>`: filter by actor
   - `--json`: output newline-delimited JSON; default: Rich table
   - Exit 0 always
-- [ ] Update tests: verify log entry written after sign; verify log not written
+- [x] Update tests: verify log entry written after sign; verify log not written
 on CAError; verify `warden log` filters correctly
 ---
 ## Acceptance Criteria
-- [ ] `warden sign agt-test --pubkey /tmp/k.pub --ttl 100` raises `CAError`
+- [x] `warden sign agt-test --pubkey /tmp/k.pub --ttl 100` raises `CAError`
 (agt max is 24h)
-- [ ] `warden sign agt-test --pubkey /tmp/k.pub --ttl 24` succeeds
-- [ ] `warden scorecard` includes TTL policy check; fails when a cert exceeds type max
-- [ ] After `warden sign`, `state_dir/signatures.log` has one new line; valid JSON
-- [ ] `warden log` renders a table; `warden log --json` is parseable
-- [ ] `warden log --actor agt-test` returns only entries for that actor
-- [ ] `warden cleanup --dry-run` lists stale certs without deleting
-- [ ] `warden cleanup` removes stale certs; scorecard `no_stale_certs` passes after
-- [ ] Re-signing an actor replaces its cert file (no accumulation)
-- [ ] All tests pass: `uv run pytest`
-- [ ] All lints pass: `uv run ruff check .`
+- [x] `warden sign agt-test --pubkey /tmp/k.pub --ttl 24` succeeds
+- [x] `warden scorecard` includes TTL policy check; fails when a cert exceeds type max
+- [x] After `warden sign`, `state_dir/signatures.log` has one new line; valid JSON
+- [x] `warden log` renders a table; `warden log --json` is parseable
+- [x] `warden log --actor agt-test` returns only entries for that actor
+- [x] `warden cleanup --dry-run` lists stale certs without deleting
+- [x] `warden cleanup` removes stale certs; scorecard `no_stale_certs` passes after
+- [x] Re-signing an actor replaces its cert file (no accumulation)
+- [x] All tests pass: `uv run pytest`
+- [x] All lints pass: `uv run ruff check .`