| id | type | title | domain | repo | status | owner | topic_slug | planning_priority | planning_order | created | updated | state_hub_workstream_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| WARDEN-WP-0002 | workplan | OpsWarden Correctness and Operational Completeness | custodian | ops-warden | done | Bernd | custodian | high | 2 | 2026-05-15 | 2026-05-15 | 5a9fba2c-6161-49a4-a231-e750fa4ab572 |
WARDEN-WP-0002 — Correctness and Operational Completeness
Scope: Fix three functional gaps identified after WARDEN-WP-0001: TTL max enforcement (directive compliance), stale cert cleanup (SCOPE.md promises it), and an outgoing signatures log (audit traceability for every signing operation).
Out of scope: Test coverage improvements (WARDEN-WP-0003), Vault cluster setup, host-side principal deployment.
Goal
After this workplan:
- `warden sign` and `warden issue` reject TTLs that exceed the type maximum defined in the AccessManagementDirective, so no cert can be silently issued with a longer-than-allowed validity window.
- Stale/expired certs do not accumulate in the state dir. `warden cleanup` provides an on-demand sweep; `LocalCA.sign()` auto-evicts the previous cert for the same actor before writing the new one.
- Every successful signing operation is recorded in an append-only `signatures.log` in the state dir. `warden log` provides a human-readable and machine-readable view of the signing history.
Reference Documents
| Document | Location |
|---|---|
| AccessManagementDirective | wiki/AccessManagementDirective.md |
| WARDEN-WP-0001 | workplans/WARDEN-WP-0001-initial-implementation.md |
| SCOPE.md | SCOPE.md |
Design Decisions
TTL enforcement: reject, don't clamp
When `spec.ttl_hours > DEFAULT_TTL_HOURS[actor_type]`, raise `CAError` rather
than silently clamping. A silent clamp would mask configuration errors and hide
directive violations from operators; an explicit error forces a deliberate
decision.
The check lives in `CABackend.sign()` before the subprocess call, so it applies
to both `LocalCA` and `VaultCA`. Vault's own role `max_ttl` provides a second
layer; this check is the warden-side gate.
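A minimal sketch of the reject-don't-clamp gate, written as the `enforce_ttl(spec)` helper that T1 describes. The `ActorType` members, the `CertSpec` fields, and the `HUMAN` limit are assumptions for illustration; only the 24-hour maximum for `agt` actors comes from this workplan's acceptance criteria.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class CAError(Exception):
    """Raised when a signing request violates policy or the CA call fails."""


class ActorType(str, Enum):
    # Hypothetical members; only "agt" is named in this workplan.
    AGENT = "agt"
    HUMAN = "hum"


# Per-type TTL ceiling in hours. AGENT=24 matches the acceptance criteria;
# HUMAN=8 is a placeholder value for illustration.
DEFAULT_TTL_HOURS: Dict[ActorType, int] = {
    ActorType.AGENT: 24,
    ActorType.HUMAN: 8,
}
# Same values, explicit name: this mapping is policy, not just a default.
MAX_TTL_HOURS = DEFAULT_TTL_HOURS


@dataclass(frozen=True)
class CertSpec:
    actor: str
    actor_type: ActorType
    ttl_hours: int


def enforce_ttl(spec: CertSpec) -> None:
    """Reject (never clamp) a TTL above the actor type's maximum."""
    limit = MAX_TTL_HOURS[spec.actor_type]
    if spec.ttl_hours > limit:
        raise CAError(
            f"{spec.actor}: ttl_hours={spec.ttl_hours} exceeds the {limit}h "
            f"maximum for actor type {spec.actor_type.value!r}"
        )
```

Because the comparison is a strict `>`, a request at exactly the maximum passes, which is what the T1 test requirement expects.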
Cleanup: proactive (on sign) + reactive (on demand)
`LocalCA.sign()` removes the previous cert for the same actor before writing the
new one, which keeps `state_dir` from growing unboundedly under normal operation.
`warden cleanup` handles the edge cases: certs whose actor is no longer in the
inventory, certs from aborted sessions, and certs left by actors that were renamed.
`VaultCA.sign()` also evicts before writing (same logic, same helper function).
Signatures log: JSONL, append-only, in state_dir
One line per signing event, written after a successful `CertRecord` is produced.
Format: `{"timestamp": ..., "actor": ..., "actor_type": ..., "identity": ..., "principals": [...], "ttl_hours": ..., "valid_before": ..., "cert_path": ..., "backend": ...}`.
The log lives alongside certs in `state_dir`, so a single directory backup
captures the full operational history. No rotation at this scope; add rotation
in a follow-up if the file grows beyond a few MB in practice.
`warden log` is read-only. There is no deletion via the CLI because the log is an audit artefact.
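A sketch of the append-only JSONL write. The workplan's `_append_signature_log(record, spec, state_dir, backend)` takes a `CertRecord` and a `CertSpec`; this hypothetical `append_signature_entry` flattens those into keyword arguments so it runs standalone. Field names follow the T3 task list, and the caller is assumed to invoke it only after signing succeeded, so a `CAError` never produces a log line.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, List


def append_signature_entry(
    state_dir: Path,
    *,
    actor: str,
    actor_type: str,
    identity: str,
    principals: List[str],
    ttl_hours: int,
    valid_before: str,
    cert_path: str,
    backend: str,
) -> Dict[str, Any]:
    """Append one JSONL line to state_dir/signatures.log and return the entry."""
    entry = {
        # ISO 8601 in UTC, e.g. 2026-05-15T10:00:00+00:00
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "actor": actor,
        "actor_type": actor_type,
        "identity": identity,
        "principals": principals,
        "ttl_hours": ttl_hours,
        "valid_before": valid_before,
        "cert_path": cert_path,
        "backend": backend,
    }
    log_path = state_dir / "signatures.log"
    # "a" mode only appends, so earlier entries are never rewritten.
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry, separators=(",", ":")) + "\n")
    return entry
```

Compact separators keep each event on one line, so the file stays valid newline-delimited JSON that `warden log --json` can pass through unchanged.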
Tasks
T1 — TTL max enforcement per ActorType
id: WARDEN-WP-0002-T1
state_hub_task_id: b0d0b5f7-a181-4590-be26-c48ae28cd964
status: done
priority: high
- `models.py`: add `MAX_TTL_HOURS = DEFAULT_TTL_HOURS` alias (same values; the explicit name signals policy intent); add helper `enforce_ttl(spec: CertSpec) -> None` that raises `CAError` when `spec.ttl_hours > MAX_TTL_HOURS[spec.actor_type]`
- `ca.py`: call `enforce_ttl(spec)` at the top of the `CABackend.sign()` base (or in both `LocalCA.sign()` and `VaultCA.sign()` if there is no shared base call)
- `scorecard.py`: add `check_ttl_policy(state_dir, inventory)`: parse each cert in `state_dir` via `ssh-keygen -L`, compare the cert's validity window duration against `MAX_TTL_HOURS[actor_type]`, and flag it if exceeded
- Add `check_ttl_policy` to `run_scorecard()`
- Update tests:
  - `test_ca.py`: assert `CAError` is raised when `ttl_hours` exceeds the max for each type; assert no error at exactly the max
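The duration comparison in `check_ttl_policy` hinges on the `Valid:` line that `ssh-keygen -L -f <cert>` prints (`Valid: from <start> to <end>`). The helpers below are a hypothetical sketch that work on that text, so they can be tested without a real cert; looking up `max_hours` from `MAX_TTL_HOURS[actor_type]` is left to the caller. Treating unbounded certs (`Valid: forever` and the one-sided `before`/`after` forms) as violations is a design assumption.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# Matches the bounded form printed by `ssh-keygen -L`, e.g.
#   Valid: from 2026-05-15T10:00:00 to 2026-05-16T10:00:00
_VALID_RE = re.compile(r"Valid: from (\S+) to (\S+)")


def cert_validity_window(keygen_output: str) -> Optional[timedelta]:
    """Return the cert's validity window, or None if it is not bounded on both ends."""
    match = _VALID_RE.search(keygen_output)
    if match is None:
        # "Valid: forever", "Valid: after ...", "Valid: before ...", or no Valid line.
        return None
    start, end = (datetime.fromisoformat(ts) for ts in match.groups())
    # Both timestamps are naive and printed in the same zone, so the
    # difference is the window length (DST shifts aside).
    return end - start


def exceeds_ttl_policy(keygen_output: str, max_hours: int) -> bool:
    """True if the issued validity window is longer than max_hours."""
    window = cert_validity_window(keygen_output)
    if window is None:
        return True  # unbounded (or unparseable) windows cannot satisfy a finite max
    return window > timedelta(hours=max_hours)
```

Measuring the issued window (end minus start) rather than the remaining lifetime means a cert stays flagged for its whole life if it was over-issued, which matches the "issued TTL > type max" intent.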
T2 — Stale cert cleanup command
id: WARDEN-WP-0002-T2
state_hub_task_id: aeeefbad-c0bd-4ae8-a3fe-9f72321b4caa
status: done
priority: medium
- `ca.py`: extract `_evict_cert(actor_name, state_dir)`, which removes `state_dir/<actor_name>-cert.pub` if it exists; call it at the top of `LocalCA.sign()` and `VaultCA.sign()` before writing the new cert
- `cli.py`: add `warden cleanup [actor-name]` command
  - No actor-name: iterate `state_dir/*-cert.pub` and remove any whose `valid_before < now - 5 min`
  - With actor-name: remove only that actor's cert, if stale
  - `--dry-run`: print what would be removed without deleting
  - Exit 0 always (cleanup is idempotent; nothing to clean is not an error)
- Update the `check_no_stale_certs` scorecard check detail message to suggest running `warden cleanup`
- Update tests: verify `_evict_cert` is called during sign; verify the cleanup command removes a stale file; verify `--dry-run` does not delete
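A sketch of the `warden cleanup` selection and deletion logic. `valid_before_of` is a hypothetical callable standing in for whatever cert-metadata parser the CLI uses, and `now` is passed in so the 5-minute grace window is testable. The glob assumes the `<actor_name>-cert.pub` naming used by `_evict_cert`.

```python
from datetime import datetime, timedelta
from pathlib import Path
from typing import Callable, List, Optional

# Expired certs get a short grace period before they count as stale.
GRACE = timedelta(minutes=5)


def find_stale_certs(
    state_dir: Path,
    valid_before_of: Callable[[Path], datetime],
    now: datetime,
    actor_name: Optional[str] = None,
) -> List[Path]:
    """Certs whose valid_before is earlier than now - 5 min."""
    if actor_name is not None:
        candidate = state_dir / f"{actor_name}-cert.pub"
        candidates = [candidate] if candidate.exists() else []
    else:
        candidates = sorted(state_dir.glob("*-cert.pub"))
    return [p for p in candidates if valid_before_of(p) < now - GRACE]


def cleanup(
    state_dir: Path,
    valid_before_of: Callable[[Path], datetime],
    now: datetime,
    actor_name: Optional[str] = None,
    dry_run: bool = False,
) -> List[Path]:
    """Remove (or, with dry_run, only report) stale certs; 'nothing to do' is not an error."""
    stale = find_stale_certs(state_dir, valid_before_of, now, actor_name)
    for path in stale:
        if dry_run:
            print(f"would remove {path}")
        else:
            path.unlink(missing_ok=True)  # idempotent if another sweep got there first
            print(f"removed {path}")
    return stale
```

Returning the stale list lets the CLI print a summary and still exit 0 when the list is empty.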
T3 — Outgoing signatures log
id: WARDEN-WP-0002-T3
state_hub_task_id: 0194d24f-a8fe-4f6d-88e6-addea3542c0e
status: done
priority: medium
- `ca.py`: after a successful `CertRecord` is produced in `LocalCA.sign()` and `VaultCA.sign()`, call `_append_signature_log(record, spec, state_dir, backend)`, which appends a JSONL line to `state_dir/signatures.log`
  - Fields: `timestamp` (ISO 8601 UTC), `actor`, `actor_type`, `identity`, `principals`, `ttl_hours`, `valid_before`, `cert_path`, `backend`
- `cli.py`: add `warden log [actor-name]` command
  - Reads `state_dir/signatures.log` (empty list if absent)
  - `--last N` (default 20): show the last N entries
  - `--actor <name>`: filter by actor
  - `--json`: output newline-delimited JSON; default: Rich table
  - Exit 0 always
- Update tests: verify a log entry is written after sign; verify no log entry is written on `CAError`; verify `warden log` filters correctly
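A sketch of the read side of `warden log`: load `signatures.log` (empty if absent), filter by actor, then keep the last N entries. `read_signature_log` is a hypothetical name; rendering as a Rich table or newline-delimited JSON would sit on top of the returned list. Filtering before slicing is a design choice here, so `--last 5 --actor agt-test` means that actor's five most recent signings.

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Optional


def read_signature_log(
    state_dir: Path, *, last: int = 20, actor: Optional[str] = None
) -> List[Dict[str, Any]]:
    """Return the most recent log entries, optionally restricted to one actor."""
    log_path = state_dir / "signatures.log"
    if not log_path.exists():
        return []  # nothing signed yet: empty history, not an error
    entries = [
        json.loads(line)
        for line in log_path.read_text(encoding="utf-8").splitlines()
        if line.strip()  # tolerate blank lines
    ]
    if actor is not None:
        entries = [e for e in entries if e.get("actor") == actor]
    # entries[-0:] would return everything, so handle last <= 0 explicitly.
    return entries[-last:] if last > 0 else []
```

The function never writes to the file, which keeps `warden log` read-only as the design requires.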
Acceptance Criteria
- `warden sign agt-test --pubkey /tmp/k.pub --ttl 100` raises `CAError` (agt max is 24h)
- `warden sign agt-test --pubkey /tmp/k.pub --ttl 24` succeeds
- `warden scorecard` includes the TTL policy check, which fails when a cert exceeds its type max
- After `warden sign`, `state_dir/signatures.log` has one new line containing valid JSON
- `warden log` renders a table; `warden log --json` output is parseable
- `warden log --actor agt-test` returns only entries for that actor
- `warden cleanup --dry-run` lists stale certs without deleting them
- `warden cleanup` removes stale certs; scorecard `no_stale_certs` passes afterwards
- Re-signing an actor replaces its cert file (no accumulation)
- All tests pass: `uv run pytest`
- All lints pass: `uv run ruff check .`