From c66cb1b0fef71498643d4a634769d9befd8a88c4 Mon Sep 17 00:00:00 2001 From: tegwick Date: Fri, 15 May 2026 15:28:31 +0200 Subject: [PATCH] chore(workplans): add WARDEN-WP-0002 and WARDEN-WP-0003 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit WP-0002 — Correctness and Operational Completeness (priority: high) T1: TTL max enforcement per ActorType T2: Stale cert cleanup command (warden cleanup) T3: Outgoing signatures log (warden log) WP-0003 — Test Coverage and Code Quality (priority: medium) T1: VaultCA tests T2: LocalCA.generate_keypair tests T3: CLI tests (test_cli.py) T4: Real ssh-keygen integration test T5: File permissions enforcement (mode 600) T6: warden status --state-dir override Both registered in Custodian State Hub under ops-warden repo (74df727e). Co-Authored-By: Claude Sonnet 4.6 --- ...EN-WP-0002-correctness-and-completeness.md | 174 +++++++++++++++ ...ARDEN-WP-0003-test-coverage-and-quality.md | 207 ++++++++++++++++++ 2 files changed, 381 insertions(+) create mode 100644 workplans/WARDEN-WP-0002-correctness-and-completeness.md create mode 100644 workplans/WARDEN-WP-0003-test-coverage-and-quality.md diff --git a/workplans/WARDEN-WP-0002-correctness-and-completeness.md b/workplans/WARDEN-WP-0002-correctness-and-completeness.md new file mode 100644 index 0000000..c4ae587 --- /dev/null +++ b/workplans/WARDEN-WP-0002-correctness-and-completeness.md @@ -0,0 +1,174 @@ +--- +id: WARDEN-WP-0002 +type: workplan +title: "OpsWarden Correctness and Operational Completeness" +domain: custodian +repo: ops-warden +status: active +owner: Bernd +topic_slug: custodian +created: "2026-05-15" +updated: "2026-05-15" +state_hub_workstream_id: "5a9fba2c-6161-49a4-a231-e750fa4ab572" +--- + +# WARDEN-WP-0002 — Correctness and Operational Completeness + +**Scope:** Fix three functional gaps identified after WARDEN-WP-0001: TTL max +enforcement (directive compliance), stale cert cleanup (SCOPE.md promises it), +and an outgoing signatures log (audit traceability for every signing operation). + +**Out of scope:** Test coverage improvements (WARDEN-WP-0003), Vault cluster +setup, host-side principal deployment. + +--- + +## Goal + +After this workplan: + +1. `warden sign` and `warden issue` reject TTLs that exceed the type maximum + defined in the AccessManagementDirective — no cert can be silently issued + with a longer-than-allowed validity window. +2. Stale/expired certs do not accumulate in the state dir. `warden cleanup` + provides an on-demand sweep; `LocalCA.sign()` auto-evicts the previous cert + for the same actor before writing the new one. +3. Every successful signing operation is recorded in an append-only + `signatures.log` in the state dir. `warden log` provides a human-readable + and machine-readable view of the signing history. + +--- + +## Reference Documents + +| Document | Location | +|---|---| +| AccessManagementDirective | `wiki/AccessManagementDirective.md` | +| WARDEN-WP-0001 | `workplans/WARDEN-WP-0001-initial-implementation.md` | +| SCOPE.md | `SCOPE.md` | + +--- + +## Design Decisions + +### TTL enforcement: reject, don't clamp + +When `spec.ttl_hours > DEFAULT_TTL_HOURS[actor_type]`, raise `CAError` rather +than silently clamping. A silent clamp would mask configuration errors and hide +directive violations from operators. An explicit error forces a deliberate +decision. + +The check lives in `CABackend.sign()` before the subprocess call so it applies +to both `LocalCA` and `VaultCA`. Vault's own role `max_ttl` provides a second +layer; this check is the warden-side gate. + +### Cleanup: proactive (on sign) + reactive (on demand) + +`LocalCA.sign()` removes the previous cert for the same actor before writing the +new one — this keeps state_dir from growing unboundedly under normal operation. +`warden cleanup` handles the edge cases: certs whose actor is no longer in the +inventory, certs from aborted sessions, certs left by actors that were renamed. + +`VaultCA.sign()` also evicts before writing (same logic, same helper function). + +### Signatures log: JSONL, append-only, in state_dir + +One line per signing event, written after a successful `CertRecord` is produced. +Format: `{"timestamp": ..., "actor": ..., "actor_type": ..., "identity": ..., +"principals": [...], "ttl_hours": ..., "valid_before": ..., "backend": ...}`. + +The log lives alongside certs in `state_dir` so a single directory backup +captures the full operational history. No rotation at this scope — add rotation +in a follow-up if the file grows beyond a few MB in practice. + +`warden log` is read-only. No deletion via CLI — the log is an audit artefact. + +--- + +## Tasks + +### T1 — TTL max enforcement per ActorType + +```task +id: WARDEN-WP-0002-T1 +state_hub_task_id: b0d0b5f7-a181-4590-be26-c48ae28cd964 +status: todo +priority: high +``` + +- [ ] `models.py`: add `MAX_TTL_HOURS = DEFAULT_TTL_HOURS` alias (same values, + explicit name signals policy intent); add helper + `enforce_ttl(spec: CertSpec) -> None` that raises `CAError` when + `spec.ttl_hours > MAX_TTL_HOURS[spec.actor_type]` +- [ ] `ca.py`: call `enforce_ttl(spec)` at the top of `CABackend.sign()` base + (or in both `LocalCA.sign()` and `VaultCA.sign()` if no shared base call) +- [ ] `scorecard.py`: add `check_ttl_policy(state_dir, inventory)` — parse each + cert in state_dir via `ssh-keygen -L`; compare cert validity window + duration against `MAX_TTL_HOURS[actor_type]`; flag if exceeded +- [ ] Add `check_ttl_policy` to `run_scorecard()` +- [ ] Update tests: `test_ca.py` — assert `CAError` raised when `ttl_hours` + exceeds max for each type; assert no error at exactly the max + +### T2 — Stale cert cleanup command + +```task +id: WARDEN-WP-0002-T2 +state_hub_task_id: aeeefbad-c0bd-4ae8-a3fe-9f72321b4caa +status: todo +priority: medium +``` + +- [ ] `ca.py`: extract `_evict_cert(actor_name, state_dir)` — removes + `state_dir/-cert.pub` if it exists; call at the top of + `LocalCA.sign()` and `VaultCA.sign()` before writing the new cert +- [ ] `cli.py`: add `warden cleanup [actor-name]` command + - No actor-name: iterate `state_dir/*.cert.pub`, remove any whose + `valid_before < now - 5 min` + - With actor-name: remove only that actor's cert if stale + - `--dry-run`: print what would be removed without deleting + - Exit 0 always (cleanup is idempotent; nothing to clean is not an error) +- [ ] Update `check_no_stale_certs` scorecard check detail message to suggest + running `warden cleanup` +- [ ] Update tests: verify `_evict_cert` is called during sign; verify cleanup + command removes stale file; verify `--dry-run` does not delete + +### T3 — Outgoing signatures log + +```task +id: WARDEN-WP-0002-T3 +state_hub_task_id: 0194d24f-a8fe-4f6d-88e6-addea3542c0e +status: todo +priority: medium +``` + +- [ ] `ca.py`: after a successful `CertRecord` is produced in `LocalCA.sign()` + and `VaultCA.sign()`, call `_append_signature_log(record, spec, state_dir, + backend)` which appends a JSONL line to + `state_dir/signatures.log` + Fields: `timestamp` (ISO 8601 UTC), `actor`, `actor_type`, `identity`, + `principals`, `ttl_hours`, `valid_before`, `cert_path`, `backend` +- [ ] `cli.py`: add `warden log [actor-name]` command + - Reads `state_dir/signatures.log` (empty list if absent) + - `--last N` (default 20): show last N entries + - `--actor `: filter by actor + - `--json`: output newline-delimited JSON; default: Rich table + - Exit 0 always +- [ ] Update tests: verify log entry written after sign; verify log not written + on CAError; verify `warden log` filters correctly + +--- + +## Acceptance Criteria + +- [ ] `warden sign agt-test --pubkey /tmp/k.pub --ttl 100` raises `CAError` + (agt max is 24h) +- [ ] `warden sign agt-test --pubkey /tmp/k.pub --ttl 24` succeeds +- [ ] `warden scorecard` includes TTL policy check; fails when a cert exceeds type max +- [ ] After `warden sign`, `state_dir/signatures.log` has one new line; valid JSON +- [ ] `warden log` renders a table; `warden log --json` is parseable +- [ ] `warden log --actor agt-test` returns only entries for that actor +- [ ] `warden cleanup --dry-run` lists stale certs without deleting +- [ ] `warden cleanup` removes stale certs; scorecard `no_stale_certs` passes after +- [ ] Re-signing an actor replaces its cert file (no accumulation) +- [ ] All tests pass: `uv run pytest` +- [ ] All lints pass: `uv run ruff check .` diff --git a/workplans/WARDEN-WP-0003-test-coverage-and-quality.md b/workplans/WARDEN-WP-0003-test-coverage-and-quality.md new file mode 100644 index 0000000..d1d7327 --- /dev/null +++ b/workplans/WARDEN-WP-0003-test-coverage-and-quality.md @@ -0,0 +1,207 @@ +--- +id: WARDEN-WP-0003 +type: workplan +title: "OpsWarden Test Coverage and Code Quality" +domain: custodian +repo: ops-warden +status: active +owner: Bernd +topic_slug: custodian +created: "2026-05-15" +updated: "2026-05-15" +state_hub_workstream_id: "cb2bbf3c-848a-4af6-ba64-8361e64cd4d7" +--- + +# WARDEN-WP-0003 — Test Coverage and Code Quality + +**Scope:** Close the test coverage gaps left after WARDEN-WP-0001: VaultCA has +zero tests, `generate_keypair` is untested, no CLI tests exist, and no real +`ssh-keygen` integration test was written. Also fix file permission enforcement +(security) and add `--state-dir` override to `warden status` (usability). + +**Out of scope:** Functional behaviour changes (WARDEN-WP-0002), Vault cluster +setup, host-side tooling. + +--- + +## Goal + +After this workplan: + +1. VaultCA has a full unit test suite covering success, auth failure, network + failure, and role-map misconfiguration. +2. `generate_keypair` has direct unit tests alongside the existing `sign` tests. +3. A `tests/test_cli.py` covers every command's exit codes and output shape. +4. A `tests/test_integration.py` marked `@pytest.mark.integration` exercises + `LocalCA.sign()` against a real `ssh-keygen` without any mocking. +5. Cert and key files written by warden are always mode 600; a scorecard check + flags world-readable files. +6. `warden status --state-dir ` works without a `warden.yaml`. + +--- + +## Reference Documents + +| Document | Location | +|---|---| +| WARDEN-WP-0001 | `workplans/WARDEN-WP-0001-initial-implementation.md` | +| WARDEN-WP-0002 | `workplans/WARDEN-WP-0002-correctness-and-completeness.md` | +| CertCommandInterface | `wiki/CertCommandInterface.md` | + +--- + +## Design Decisions + +### Integration tests: separate marker, not separate directory + +Use `@pytest.mark.integration` and skip when `ssh-keygen` is not in PATH. Tests +live in `tests/test_integration.py`. The unit suite (`uv run pytest`) excludes +integration tests via `pytest.ini_options addopts = "-m 'not integration'"`; +`uv run pytest -m integration` runs them explicitly. This keeps CI fast while +making the real-ssh-keygen path easy to invoke manually. + +### File permissions: at write time, not at read time + +`os.chmod(path, 0o600)` is called immediately after each `path.write_text()` or +`shutil.copy2()` that writes a key or cert. No deferred or scheduled chmod. The +scorecard check catches files that were written by older versions of warden or by +external tools. + +### `warden status --state-dir`: config bypass, not config optional + +When `--state-dir` is provided, skip `load_config()` entirely — don't try to +load a partial config. This makes the flag useful on remote machines that have +received a cert via ops-bridge but have no warden installation. + +--- + +## Tasks + +### T1 — VaultCA tests + +```task +id: WARDEN-WP-0003-T1 +state_hub_task_id: eff074ce-c027-4df5-8006-0990296592ac +status: todo +priority: high +``` + +- [ ] Create `tests/test_vault.py` +- [ ] Test `VaultCA.sign()` success: mock `httpx.post` returning a valid + `signed_key`; assert `CertRecord` fields; assert cert file written to + state_dir +- [ ] Test HTTP 403: `httpx.HTTPStatusError` → `CAError` with status code in message +- [ ] Test unreachable Vault: `httpx.RequestError` → `CAError` with fallback hint +- [ ] Test missing `VAULT_TOKEN`: `_token()` raises `CAError` before HTTP call +- [ ] Test missing role in `role_map`: `CAError` before HTTP call +- [ ] Test missing pubkey file: `CAError` before HTTP call + +### T2 — LocalCA.generate_keypair tests + +```task +id: WARDEN-WP-0003-T2 +state_hub_task_id: ddfe5331-0a3b-4783-bdf4-f5ebcdf7965c +status: todo +priority: medium +``` + +- [ ] Add `TestGenerateKeypair` class to `tests/test_ca.py` +- [ ] Test success: mock `subprocess.run`; assert privkey and pubkey paths returned +- [ ] Test ssh-keygen called with `-t ed25519`, `-N ""`, `-C actor_name` +- [ ] Test existing files are unlinked before generation (write dummy files first) +- [ ] Test `CAError` raised on non-zero ssh-keygen exit code +- [ ] Test output files land in `state_dir/keys/` + +### T3 — CLI tests + +```task +id: WARDEN-WP-0003-T3 +state_hub_task_id: 040ce3a1-0efb-4816-a2d9-357162dd1612 +status: todo +priority: high +``` + +- [ ] Create `tests/test_cli.py` using `typer.testing.CliRunner` +- [ ] `warden sign`: exits 0 and stdout is cert text (mock CA); exits 1 on + unknown actor; exits 1 on config error +- [ ] `warden issue`: exits 1 on vault backend; exits 0 on local backend (mock CA) +- [ ] `warden status`: exits 0 and prints "no cert" message when state_dir empty; + exits 1 when cert is expired (mock `parse_cert_metadata`) +- [ ] `warden scorecard`: exits 0 on clean inventory + empty state_dir; + exits 1 when a check fails +- [ ] `warden inventory add / list / remove`: round-trip via tmp inventory file +- [ ] `warden log`: exits 0 with empty output when no log; `--json` is valid JSON + (after WARDEN-WP-0002 T3 adds the log command) +- [ ] `warden cleanup --dry-run`: exits 0, no files deleted + (after WARDEN-WP-0002 T2 adds cleanup) + +### T4 — Real ssh-keygen integration test + +```task +id: WARDEN-WP-0003-T4 +state_hub_task_id: 434fb008-103f-410c-85fd-e77b33e61fe4 +status: todo +priority: medium +``` + +- [ ] Create `tests/test_integration.py` +- [ ] Mark all tests `@pytest.mark.integration` +- [ ] Add `pytest.ini_options` to `pyproject.toml`: + `addopts = "-m 'not integration'"` so unit suite skips them by default +- [ ] Test `LocalCA.sign()` end-to-end: generate a real CA keypair and actor + keypair via subprocess ssh-keygen in tmp_path; call `LocalCA.sign()`; + assert `CertRecord.valid_before > datetime.now(utc)`; assert cert file + exists; assert `parse_cert_metadata()` succeeds on it without mocking +- [ ] Skip test if `shutil.which("ssh-keygen") is None` +- [ ] Document in README: `uv run pytest -m integration` to run real-CA tests + +### T5 — File permissions enforcement (mode 600) + +```task +id: WARDEN-WP-0003-T5 +state_hub_task_id: ac146fe6-d1fd-4186-91bd-6f098de72449 +status: todo +priority: medium +``` + +- [ ] `ca.py` `LocalCA.sign()`: call `os.chmod(dest, 0o600)` after `shutil.copy2` +- [ ] `ca.py` `LocalCA.generate_keypair()`: call `os.chmod(privkey, 0o600)` and + `os.chmod(pubkey, 0o644)` after generation +- [ ] `vault.py` `VaultCA.sign()`: call `os.chmod(dest, 0o600)` after `dest.write_text` +- [ ] `scorecard.py`: add `check_file_permissions(state_dir)` — flag any + `*-cert.pub` or `keys/*` file where `stat().st_mode & 0o044 != 0` +- [ ] Add `check_file_permissions` to `run_scorecard()` +- [ ] Update `test_ca.py`: assert `os.chmod` called with correct mode after sign + and generate_keypair (patch os.chmod or check stat on actual files in + tmp_path) + +### T6 — warden status --state-dir override + +```task +id: WARDEN-WP-0003-T6 +state_hub_task_id: 1c9f1987-7b11-43c1-a5e3-c2fd8d1c1589 +status: todo +priority: low +``` + +- [ ] `cli.py` `status()`: add + `state_dir_override: Annotated[Optional[Path], typer.Option("--state-dir")] = None` +- [ ] When `--state-dir` is provided: use it directly, skip `_load_cfg()` entirely +- [ ] When absent: load config as today +- [ ] Add test in `test_cli.py`: invoke `warden status --state-dir ` + without a config file; assert exit 0 + +--- + +## Acceptance Criteria + +- [ ] `uv run pytest` runs unit suite only; all pass; VaultCA and generate_keypair + covered +- [ ] `uv run pytest -m integration` succeeds (requires ssh-keygen in PATH) +- [ ] `test_cli.py` covers all commands; no mocked subprocess in CLI tests where + avoidable (use tmp inventory files and mocked CA) +- [ ] `ls -la ~/.local/state/warden/*.pub` shows mode 600 for newly signed certs +- [ ] Scorecard `file_permissions` check passes on a clean state dir; fails on a + world-readable cert +- [ ] `warden status --state-dir /tmp/some-dir` runs without a `warden.yaml` +- [ ] All lints pass: `uv run ruff check .`