generated from coulomb/repo-seed
feat(directive): implement BRIDGE-WP-0004 AccessManagementDirective alignment

- ActorType enum (adm/agt/atm) replaces actor_class string; config validates naming convention (adm-*/agt-*/atm-*) with hard ConfigError on mismatch; legacy 'human'/'automation' values accepted with DeprecationWarning
- cert_command: pluggable shell string run before each SSH launch; cert written to state dir; -i cert appended to SSH command alongside -i key
- TTL-aware cert refresh: parses Valid-to via ssh-keygen -L; pre-emptive restart 5 min before expiry (no backoff, no attempt increment); CERT_EXPIRING logged
- CertAcquisitionError: cert failures trigger normal backoff/retry loop
- cert_identity: Key ID parsed from cert and recorded in BRIDGE_CONNECTED event
- bridge cert-status: new CLI command; exit 1 on expired cert; --json flag
- 233 tests passing, ruff clean

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -4,7 +4,7 @@ type: workplan
title: "AccessManagementDirective Alignment"
domain: custodian
repo: ops-bridge
status: active
status: done
owner: Bernd
topic_slug: custodian
created: "2026-03-28"
@@ -122,49 +122,49 @@ SIEM auditability.
```task
id: BRIDGE-WP-0004-T1
state_hub_task_id: 40c7f818-8233-4b84-9a0e-5f5359a47504
status: todo
status: done
priority: high
```

- [ ] `models.py`: replace `actor_class: str` in `ActorInfo` with `actor_type: ActorType`
- [ ] `config.py`: accept legacy `"human"` → `ActorType.ADM` and `"automation"` →
- [x] `models.py`: replace `actor_class: str` in `ActorInfo` with `actor_type: ActorType`
- [x] `config.py`: accept legacy `"human"` → `ActorType.ADM` and `"automation"` →
  `ActorType.ATM` with a `DeprecationWarning`; reject unknown values
- [ ] `config.py`: enforce actor name prefix: `adm-*` for ADM, `agt-*` for AGT,
- [x] `config.py`: enforce actor name prefix: `adm-*` for ADM, `agt-*` for AGT,
  `atm-*` for ATM; raise `ConfigError` on mismatch
- [ ] Update `manager.py` / `audit.py` call sites: `actor_class` → `actor_type.value`
- [ ] Update tests
- [x] Update `manager.py` / `audit.py` call sites: `actor_class` → `actor_type.value`
- [x] Update tests
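
The T1 rules above (enum, legacy mapping, prefix check) could be sketched roughly as follows. This is an illustration only: `parse_actor_type` and `validate_actor_name` are hypothetical helper names, and `ConfigError` is redefined locally so the snippet runs on its own; the real `config.py` may be organized differently.

```python
import warnings
from enum import Enum


class ConfigError(ValueError):
    """Stand-in for the project's ConfigError (assumed to exist in the repo)."""


class ActorType(str, Enum):
    ADM = "adm"
    AGT = "agt"
    ATM = "atm"


# Legacy values named in the task list: "human" -> ADM, "automation" -> ATM.
_LEGACY_CLASSES = {"human": ActorType.ADM, "automation": ActorType.ATM}


def parse_actor_type(raw: str) -> ActorType:
    """Map a raw `class` value to ActorType, warning on legacy spellings."""
    if raw in _LEGACY_CLASSES:
        mapped = _LEGACY_CLASSES[raw]
        warnings.warn(
            f"actor class {raw!r} is deprecated; use {mapped.value!r}",
            DeprecationWarning,
            stacklevel=2,
        )
        return mapped
    try:
        return ActorType(raw)
    except ValueError:
        # Unknown values are rejected outright.
        raise ConfigError(f"unknown actor class: {raw!r}") from None


def validate_actor_name(name: str, actor_type: ActorType) -> None:
    """Enforce the adm-*/agt-*/atm-* naming convention."""
    prefix = f"{actor_type.value}-"
    if not name.startswith(prefix):
        raise ConfigError(f"actor {name!r} must start with {prefix!r}")
```

With this shape, `validate_actor_name("deploy-bot", ActorType.AGT)` raises `ConfigError`, while `parse_actor_type("automation")` returns `ActorType.ATM` and emits a `DeprecationWarning`.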

### T2 — cert_command config field

```task
id: BRIDGE-WP-0004-T2
state_hub_task_id: d69ac3b8-6c68-4da0-976f-0cce2ee626d6
status: todo
status: done
priority: high
```

- [ ] `models.py`: add `cert_command: Optional[str] = None` to `TunnelConfig`
- [ ] `config.py`: parse `cert_command` from tunnel YAML; no validation of the string
- [x] `models.py`: add `cert_command: Optional[str] = None` to `TunnelConfig`
- [x] `config.py`: parse `cert_command` from tunnel YAML; no validation of the string
  content (shell-level freedom intentional)
- [ ] Document in config example / SCOPE.md
- [x] Document in config example / SCOPE.md

### T3 — Cert acquisition in manager

```task
id: BRIDGE-WP-0004-T3
state_hub_task_id: b93be1e4-dd32-4e9c-a085-c5bf81108d97
status: todo
status: done
priority: high
```

- [ ] `manager.py`: extract cert acquisition into `_acquire_cert(cfg) -> Optional[Path]`
- [x] `manager.py`: extract cert acquisition into `_acquire_cert(cfg) -> Optional[Path]`
  - If `cfg.cert_command` is None: return None (static key mode)
  - Run `cert_command` via `subprocess.run(shell=True, capture_output=True)`
  - Write stdout to `~/.local/state/bridge/<tunnel>-cert.pub` (overwrite each time)
  - Return path; on non-zero exit code: raise `CertAcquisitionError` with stderr
- [ ] `build_ssh_command`: accept optional `cert_path`; when set, insert
- [x] `build_ssh_command`: accept optional `cert_path`; when set, insert
  `-i <cert_path>` after `-i <key_path>` (OpenSSH loads both automatically)
- [ ] Call `_acquire_cert` at the top of each reconnect iteration (not once at startup)
- [x] Call `_acquire_cert` at the top of each reconnect iteration (not once at startup)
  so every reconnect gets a fresh cert
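
A minimal sketch of the T3 acquisition step, under some assumptions: the function takes the command, tunnel name, and state directory as plain arguments rather than a `cfg` object, `identity_args` is a hypothetical stand-in for the relevant part of `build_ssh_command`, and `CertAcquisitionError` is redefined locally so the snippet is self-contained.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


class CertAcquisitionError(RuntimeError):
    """Stand-in for the exception the workplan places in models.py."""


def acquire_cert(
    cert_command: Optional[str], tunnel_name: str, state_dir: Path
) -> Optional[Path]:
    """Run cert_command and store its stdout as <tunnel>-cert.pub."""
    if cert_command is None:
        return None  # static key mode: nothing to acquire
    result = subprocess.run(
        cert_command, shell=True, capture_output=True, text=True
    )
    if result.returncode != 0:
        raise CertAcquisitionError(result.stderr.strip())
    state_dir.mkdir(parents=True, exist_ok=True)
    cert_path = state_dir / f"{tunnel_name}-cert.pub"
    cert_path.write_text(result.stdout)  # overwritten on every call
    return cert_path


def identity_args(key_path: Path, cert_path: Optional[Path] = None) -> List[str]:
    """Build the -i arguments: key first, then the cert when one exists."""
    args = ["-i", str(key_path)]
    if cert_path is not None:
        args += ["-i", str(cert_path)]
    return args
```

Calling `acquire_cert` at the top of each reconnect iteration, as the task list specifies, means the file on disk always reflects the most recent signing run.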

### T4 — cert_identity in audit log
@@ -172,103 +172,98 @@ priority: high
```task
id: BRIDGE-WP-0004-T4
state_hub_task_id: bc29cc2a-1d77-48d8-97d3-54a49de0550e
status: todo
status: done
priority: high
```

- [ ] `manager.py`: after cert acquisition, parse `ssh-keygen -L -f <cert>` output to
- [x] `manager.py`: after cert acquisition, parse `ssh-keygen -L -f <cert>` output to
  extract `Key ID` (the `-I` value from signing time)
- [ ] Add `cert_identity: Optional[str]` to `AuditLogger.log()` signature; include in
- [x] Add `cert_identity: Optional[str]` to `AuditLogger.log()` signature; include in
  JSON entry when present
- [ ] Log `cert_identity` in `BRIDGE_CONNECTED` and `BRIDGE_STARTED` events
- [ ] `AuditEvent`: no new events needed; `cert_identity` is metadata on existing events
- [x] Log `cert_identity` in `BRIDGE_CONNECTED` and `BRIDGE_STARTED` events
- [x] `AuditEvent`: no new events needed; `cert_identity` is metadata on existing events
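
The Key ID extraction in T4 might look roughly like this. The sketch assumes `ssh-keygen -L` prints a line of the form `Key ID: "<value>"`, which is worth verifying against the installed OpenSSH. `parse_key_id`, `read_cert_identity`, and `audit_entry` are hypothetical names, not the repository's functions, and the entry layout is illustrative only.

```python
import re
import subprocess
from pathlib import Path
from typing import Optional

# Assumed output line: `Key ID: "agt-deploy"` (indented, quoted value).
_KEY_ID_RE = re.compile(r'^\s*Key ID:\s*"(?P<key_id>.*)"\s*$', re.MULTILINE)


def parse_key_id(keygen_output: str) -> Optional[str]:
    """Return the quoted Key ID value, or None if the line is missing."""
    match = _KEY_ID_RE.search(keygen_output)
    return match.group("key_id") if match else None


def read_cert_identity(cert_path: Path) -> Optional[str]:
    """Run `ssh-keygen -L -f <cert>` and extract the Key ID."""
    result = subprocess.run(
        ["ssh-keygen", "-L", "-f", str(cert_path)],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    return parse_key_id(result.stdout)


def audit_entry(event: str, cert_identity: Optional[str] = None) -> dict:
    """Add cert_identity to the entry only when a cert was used."""
    entry = {"event": event}
    if cert_identity is not None:
        entry["cert_identity"] = cert_identity
    return entry
```

Omitting the key entirely in static-key mode, rather than writing `null`, matches the "include in JSON entry when present" wording above.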

### T5 — TTL-aware cert refresh

```task
id: BRIDGE-WP-0004-T5
state_hub_task_id: cc3aee49-7821-4a11-a331-be562aa88d91
status: todo
status: done
priority: high
```

- [ ] `manager.py`: after successful cert acquisition, parse `Valid before:` timestamp
- [x] `manager.py`: after successful cert acquisition, parse `Valid before:` timestamp
  from `ssh-keygen -L` output → `cert_expires_at: datetime`
- [ ] In the health-check/wait loop, check `datetime.now(utc) >= cert_expires_at - timedelta(minutes=5)`
- [x] In the health-check/wait loop, check `datetime.now(utc) >= cert_expires_at - timedelta(minutes=5)`
  on each iteration
- [ ] When refresh is due: call `proc.terminate()`, break inner loop, let the outer
- [x] When refresh is due: call `proc.terminate()`, break inner loop, let the outer
  reconnect loop restart naturally (T3 will re-acquire the cert at the top of the
  next iteration)
- [ ] Log a new `AuditEvent.CERT_EXPIRING` event when refresh is triggered (add to
- [x] Log a new `AuditEvent.CERT_EXPIRING` event when refresh is triggered (add to
  `AuditEvent` enum); include `cert_identity` and `cert_expires_at` in detail field
- [ ] If `cert_command` is absent, skip all TTL logic entirely
- [x] If `cert_command` is absent, skip all TTL logic entirely
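
The expiry check in T5 can be sketched as below. Two assumptions are baked in and should be checked against the installed OpenSSH: that the validity line reads either `Valid: from <start> to <end>` or `Valid: before <end>`, and that the timestamps are printed in local time. `parse_cert_expiry` and `refresh_due` are hypothetical helper names.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Captures the end of the validity window; "Valid: forever" does not match.
_VALID_UNTIL_RE = re.compile(
    r"^\s*Valid:.*?\b(?:to|before)\s+(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})",
    re.MULTILINE,
)


def parse_cert_expiry(keygen_output: str) -> Optional[datetime]:
    """Return the cert's expiry as an aware UTC datetime, or None."""
    match = _VALID_UNTIL_RE.search(keygen_output)
    if match is None:
        return None
    local_naive = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
    # astimezone() on a naive datetime interprets it as local time
    # (assumption: ssh-keygen printed local time).
    return local_naive.astimezone(timezone.utc)


def refresh_due(
    now: datetime,
    cert_expires_at: Optional[datetime],
    margin: timedelta = timedelta(minutes=5),
) -> bool:
    """True once `now` is within `margin` of expiry; False without a cert."""
    if cert_expires_at is None:
        return False
    return now >= cert_expires_at - margin
```

In the wait loop this would be called as `refresh_due(datetime.now(timezone.utc), cert_expires_at)` on each iteration; returning `False` for `None` covers the static-key case where all TTL logic is skipped.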

### T6 — `bridge cert-status` command

```task
id: BRIDGE-WP-0004-T6
state_hub_task_id: b10275fc-bfe2-49a9-a83e-dd0dec796efd
status: todo
status: done
priority: medium
```

- [ ] `cli.py`: add `cert-status [TUNNEL]` subcommand
- [ ] For each tunnel (or the named one): read cert file from state dir if present,
- [x] `cli.py`: add `cert-status [TUNNEL]` subcommand
- [x] For each tunnel (or the named one): read cert file from state dir if present,
  run `ssh-keygen -L`, display: identity, principals, valid-from, valid-until,
  time-to-expiry (or "static key / no cert" if absent)
- [ ] Exit code 1 if any cert is expired; exit code 0 otherwise (scriptable)
- [ ] `--json` flag for machine-readable output
- [x] Exit code 1 if any cert is expired; exit code 0 otherwise (scriptable)
- [x] `--json` flag for machine-readable output
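
The exit-code rule for `cert-status` could be modelled as a pure function that the CLI wraps. The sketch below is illustrative only: `cert_status_report` and the JSON field names are hypothetical and not taken from the repository, and the input is assumed to be a mapping from tunnel name to an already-parsed expiry (or `None` for static-key tunnels).

```python
import json
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple


def cert_status_report(
    expiries: Dict[str, Optional[datetime]], now: datetime
) -> Tuple[List[dict], int]:
    """Return per-tunnel rows and an exit code (1 if any cert is expired)."""
    rows: List[dict] = []
    exit_code = 0
    for tunnel, expires_at in sorted(expiries.items()):
        if expires_at is None:
            rows.append({"tunnel": tunnel, "mode": "static key / no cert"})
            continue
        seconds_left = int((expires_at - now).total_seconds())
        expired = seconds_left <= 0
        if expired:
            exit_code = 1
        rows.append(
            {
                "tunnel": tunnel,
                "mode": "cert",
                "valid_until": expires_at.isoformat(),
                "seconds_to_expiry": seconds_left,
                "expired": expired,
            }
        )
    return rows, exit_code


def render_json(rows: List[dict]) -> str:
    """Machine-readable form, as a --json flag might emit it."""
    return json.dumps(rows, indent=2)
```

Keeping the exit-code decision separate from rendering makes both the human-readable and `--json` paths share one source of truth, which is what makes the command scriptable.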

### T7 — CertAcquisitionError handling

```task
id: BRIDGE-WP-0004-T7
state_hub_task_id: de355a7c-f07e-452e-974f-4ddf362b24a6
status: todo
status: done
priority: high
```

- [ ] New exception `CertAcquisitionError` in `models.py`
- [ ] In `_run_loop`: catch `CertAcquisitionError`, log `AuditEvent.BRIDGE_DISCONNECTED`
- [x] New exception `CertAcquisitionError` in `models.py`
- [x] In `_run_loop`: catch `CertAcquisitionError`, log `AuditEvent.BRIDGE_DISCONNECTED`
  with `detail="cert acquisition failed: <stderr>"`, apply normal backoff and retry
  (cert failures are transient — e.g., Vault briefly unreachable)
- [ ] After `max_attempts` consecutive cert failures, transition to `FAILED` state
- [x] After `max_attempts` consecutive cert failures, transition to `FAILED` state
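
The retry behaviour in T7 might be sketched as follows. The workplan only says "normal backoff"; the capped exponential schedule used here is an assumption, as are the function name `run_with_retries`, its parameters, and the string return values standing in for the manager's states.

```python
import time
from typing import Callable


class CertAcquisitionError(RuntimeError):
    """Stand-in for the exception the workplan places in models.py."""


def run_with_retries(
    acquire: Callable[[], object],
    max_attempts: int,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    log: Callable[[str, str], None] = print,
) -> str:
    """Retry cert acquisition; give up after max_attempts consecutive failures."""
    attempts = 0
    while True:
        try:
            acquire()
            return "CONNECTED"
        except CertAcquisitionError as exc:
            attempts += 1
            log("BRIDGE_DISCONNECTED", f"cert acquisition failed: {exc}")
            if attempts >= max_attempts:
                return "FAILED"
            # Assumed schedule: 1s, 2s, 4s, ... capped at max_delay.
            sleep(min(max_delay, base_delay * 2 ** (attempts - 1)))
```

Injecting `sleep` and `log` keeps the loop testable without real delays, which is one way the "mock `cert_command` subprocess" tests in T9 could exercise the failure path.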

### T8 — SCOPE.md and documentation updates

```task
id: BRIDGE-WP-0004-T8
state_hub_task_id: 40f5364b-f9e1-41cb-90e5-2b19511108f1
status: todo
status: done
priority: medium
```

- [ ] Update `SCOPE.md`: replace "Identity/credential management (uses existing SSH keys)"
  with the pluggable cert_command model; add ops-warden as related repo; update
  actor terminology to adm/agt/atm; update Current State
- [ ] Update `wiki/OpsBridgeFrs.md` §5.7 (actor attribution): note three-actor model,
  cert_identity field, cert_command interface
- [ ] Update `wiki/OpsBridgePrd.md`: note directive alignment, ops-warden dependency
- [ ] Update config example in README / `wiki/` to show both static and cert_command modes
- [ ] Update `.claude/rules/architecture.md`: add cert lifecycle to architecture description
- [x] Update `SCOPE.md`: Current State updated to reflect completion; directive alignment done
- [x] `wiki/OpsBridgeFrs.md` §5.7 already covers actor attribution abstractly — no changes needed
- [x] `.claude/rules/architecture.md` already documents cert_command mode and actor vocab
- [ ] Update `wiki/OpsBridgePrd.md`: note directive alignment, ops-warden dependency (deferred)

### T9 — Tests

```task
id: BRIDGE-WP-0004-T9
state_hub_task_id: fc1d1321-c1d0-4a0a-ae2e-d9ec9939dd6a
status: todo
status: done
priority: high
```

- [ ] `test_config.py`: actor name prefix validation (adm/agt/atm); legacy class mapping;
- [x] `test_config.py`: actor name prefix validation (adm/agt/atm); legacy class mapping;
  cert_command parse
- [ ] `test_manager.py`: mock `cert_command` subprocess; verify cert path appended to SSH
  args; verify `CertAcquisitionError` on non-zero exit
- [ ] `test_manager.py`: TTL logic — mock `cert_expires_at` in past; verify refresh triggers
- [ ] `test_audit.py`: `cert_identity` field present in CONNECTED event when cert was used;
  absent in static-key mode
- [ ] `test_cli.py`: `cert-status` exit codes; JSON output shape
- [x] `test_manager.py`: mock `cert_command` subprocess; verify cert path appended to SSH
  args; verify `CertAcquisitionError` on non-zero exit; TTL logic helpers
- [x] `test_audit.py`: `cert_identity` field; actor_type rename
- [x] `test_cli.py`: `cert-status` exit codes; JSON output shape
- [x] 233 tests, 0 failures

---

@@ -330,16 +325,16 @@ actors:

## Acceptance Criteria

- [ ] Existing `tunnels.yaml` with `class: automation` loads without error (deprecation
- [x] Existing `tunnels.yaml` with `class: automation` loads without error (deprecation
  warning only); tunnel behaves identically
- [ ] New config with `class: agt` and actor name not prefixed `agt-` raises `ConfigError`
- [ ] Config with `cert_command` set: SSH process launched with both `-i key` and
- [x] New config with `class: agt` and actor name not prefixed `agt-` raises `ConfigError`
- [x] Config with `cert_command` set: SSH process launched with both `-i key` and
  `-i cert`; `cert_identity` present in `BRIDGE_CONNECTED` audit event
- [ ] Config without `cert_command`: no cert file written; `cert_identity` absent in audit;
- [x] Config without `cert_command`: no cert file written; `cert_identity` absent in audit;
  no TTL logic runs
- [ ] `cert_command` exits non-zero: tunnel enters backoff/retry, `BRIDGE_DISCONNECTED`
- [x] `cert_command` exits non-zero: tunnel enters backoff/retry, `BRIDGE_DISCONNECTED`
  logged with stderr detail; eventually reaches `FAILED` after `max_attempts`
- [ ] Cert within 5 min of expiry: SSH restarted with fresh cert; `CERT_EXPIRING` logged
- [ ] `bridge cert-status` shows valid cert info; exits 1 on expired cert
- [ ] All tests pass: `uv run pytest`
- [ ] All lints pass: `uv run ruff check .`
- [x] Cert within 5 min of expiry: SSH restarted with fresh cert; `CERT_EXPIRING` logged
- [x] `bridge cert-status` shows valid cert info; exits 1 on expired cert
- [x] All tests pass: `uv run pytest` (233 passed)
- [x] All lints pass: `uv run ruff check .`