All 39 tasks marked done; both workstreams updated to completed status in the State Hub and workplan files. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
| id | type | title | domain | repo | status | owner | topic_slug | state_hub_workstream_id | created | updated |
|---|---|---|---|---|---|---|---|---|---|---|
| BRIDGE-WP-0001 | workplan | OpsBridge Initial Implementation | custodian | ops-bridge | completed | Bernd | custodian | 79112cff-9c0a-42ad-aa3d-916013001aee | 2026-03-11 | 2026-03-12 |
BRIDGE-WP-0001 — OpsBridge Initial Implementation
Scope: Full implementation of the bridge CLI tool as specified in the PRD and FRS.
Out of scope: OpsCatalog integration (deferred to a future workplan).
Goal
Deliver a working bridge CLI installable via uv tool install that manages named SSH reverse tunnels with auto-reconnect, optional HTTP health checks, actor attribution, and an operational audit log.
Reference Documents
| Document | Location |
|---|---|
| PRD | wiki/OpsBridgePrd.md |
| FRS | wiki/OpsBridgeFrs.md |
| CLAUDE.md | CLAUDE.md |
Architecture Summary
~/.config/bridge/tunnels.yaml   # static config: tunnels + actors
~/.local/state/bridge/          # runtime state
  <name>.pid                    # PID of tunnel subprocess manager
  <name>.log                    # reconnect + health event log
  <name>.state                  # current state string (for status cmd)
src/bridge/
  __init__.py
  cli.py                        # Typer app, all commands
  config.py                     # load + validate tunnels.yaml
  models.py                     # dataclasses: TunnelConfig, BridgeState, ActorInfo
  manager.py                    # TunnelManager: start/stop subprocess, reconnect loop
  health.py                     # HTTP health check via httpx
  state.py                      # read/write PID + state files
  audit.py                      # structured event log writer
Bridge state machine: stopped → starting → connected → degraded → failed
- degraded = SSH process alive but HTTP health check failing
- failed = reconnect attempts exhausted (configurable max)
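The state machine above can be sketched as an enum plus a transition table. This is only an illustration: the RECONNECTING state is taken from task T08, and the `ALLOWED` table and `can_transition` helper are assumptions, since the workplan names the states but does not spell out every permitted transition.

```python
from enum import Enum


class BridgeState(str, Enum):
    STOPPED = "stopped"
    STARTING = "starting"
    CONNECTED = "connected"
    DEGRADED = "degraded"
    RECONNECTING = "reconnecting"
    FAILED = "failed"


S = BridgeState

# Assumed transition table; adjust to match the FRS.
ALLOWED = {
    S.STOPPED: {S.STARTING},
    S.STARTING: {S.CONNECTED, S.RECONNECTING, S.FAILED, S.STOPPED},
    S.CONNECTED: {S.DEGRADED, S.RECONNECTING, S.STOPPED},
    S.DEGRADED: {S.CONNECTED, S.RECONNECTING, S.STOPPED},
    S.RECONNECTING: {S.CONNECTED, S.FAILED, S.STOPPED},
    S.FAILED: {S.STARTING, S.STOPPED},
}


def can_transition(src: BridgeState, dst: BridgeState) -> bool:
    """Return True if moving from src to dst is allowed by the table."""
    return dst in ALLOWED[src]
```

Deriving the enum from `str` lets the value be written to and read back from the `<name>.state` file as plain text, for example `BridgeState("degraded")`.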
Config Schema (~/.config/bridge/tunnels.yaml)
tunnels:
  state-hub-coulombcore:
    host: coulombcore.local
    remote_port: 18000
    local_port: 8000
    ssh_user: ubuntu
    ssh_key: ~/.ssh/id_ops
    actor: agent.claude-coulombcore
    health_check:
      url: http://127.0.0.1:18000/health  # checked from remote side
      interval_seconds: 30
      timeout_seconds: 5
    reconnect:
      max_attempts: 0  # 0 = infinite
      backoff_initial: 5
      backoff_max: 60
actors:
  agent.claude-coulombcore:
    class: automation
    description: Claude Code agent on CoulombCore
  operator.bernd:
    class: human
    description: Bernd Worsch
Phase 1 — Project Scaffolding
Acceptance: bridge --help lists all commands.
T01 — Create pyproject.toml
id: BRIDGE-WP-0001-T01
state_hub_task_id: 76c9ee58-10bf-4060-87bb-b73fa8cf25ea
status: done
priority: high
Set up [project], [project.scripts] (entry point bridge = bridge.cli:app), and dependencies: typer, pyyaml, httpx. Run uv lock.
T02 — Create package skeleton
id: BRIDGE-WP-0001-T02
state_hub_task_id: b2be974c-6173-457d-9276-080ac551c105
status: done
priority: high
Create src/bridge/__init__.py and empty module stubs: cli.py, config.py, models.py, manager.py, health.py, state.py, audit.py.
T03 — Verify uv tool install
id: BRIDGE-WP-0001-T03
state_hub_task_id: 82f70483-91ae-4545-88af-44fe693ecb79
status: done
priority: medium
Verify uv tool install -e . produces a working bridge --help.
Phase 2 — Config Loading (FR-2, FC-1)
Acceptance: config.load() returns typed config objects; clear error message on bad YAML.
T04 — Define config dataclasses in models.py
id: BRIDGE-WP-0001-T04
state_hub_task_id: 495e4257-40ad-4a1b-8a71-3a311476d41e
status: done
priority: high
Define TunnelConfig, ReconnectPolicy, HealthCheckConfig, ActorInfo as dataclasses.
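A minimal sketch of what these dataclasses might look like, with field names mirrored from the config schema. The default values, the `frozen=True` choice, and the `name` / `actor_class` attributes are assumptions (`class` is a reserved word in Python, so the YAML key needs a different attribute name).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class ReconnectPolicy:
    max_attempts: int = 0        # 0 = infinite, as in the config schema
    backoff_initial: float = 5   # seconds (assumed default)
    backoff_max: float = 60      # seconds (assumed default)


@dataclass(frozen=True)
class HealthCheckConfig:
    url: str
    interval_seconds: float = 30
    timeout_seconds: float = 5


@dataclass(frozen=True)
class ActorInfo:
    name: str
    actor_class: str             # maps to the YAML key "class"
    description: str = ""


@dataclass(frozen=True)
class TunnelConfig:
    name: str
    host: str
    remote_port: int
    local_port: int
    ssh_user: str
    ssh_key: str
    actor: str
    health_check: Optional[HealthCheckConfig] = None  # health checks are optional
    reconnect: ReconnectPolicy = field(default_factory=ReconnectPolicy)
```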
T05 — Implement config.py
id: BRIDGE-WP-0001-T05
state_hub_task_id: b6782df4-e692-49e1-b3a3-d65d07826907
status: done
priority: high
Load ~/.config/bridge/tunnels.yaml, validate required fields, raise clear errors. Support BRIDGE_CONFIG env var override for testing.
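The path override and required-field check could be sketched as follows. The sketch assumes the YAML has already been parsed into a plain dict (for example by `yaml.safe_load`); `ConfigError`, `resolve_config_path`, `validate_tunnels`, and the exact set of required fields are hypothetical names and choices, not taken from the FRS.

```python
import os
from pathlib import Path

DEFAULT_CONFIG = Path("~/.config/bridge/tunnels.yaml")
# Assumed set of required per-tunnel keys, based on the config schema example.
REQUIRED_TUNNEL_FIELDS = (
    "host", "remote_port", "local_port", "ssh_user", "ssh_key", "actor",
)


class ConfigError(Exception):
    """Raised when tunnels.yaml is missing required content."""


def resolve_config_path(env=None) -> Path:
    """Return the config path, honouring the BRIDGE_CONFIG override."""
    env = os.environ if env is None else env
    override = env.get("BRIDGE_CONFIG")
    return (Path(override) if override else DEFAULT_CONFIG).expanduser()


def validate_tunnels(raw: dict) -> dict:
    """Check that every tunnel entry carries the required fields."""
    tunnels = raw.get("tunnels") if isinstance(raw, dict) else None
    if not isinstance(tunnels, dict) or not tunnels:
        raise ConfigError("config must define at least one entry under 'tunnels'")
    for name, body in tunnels.items():
        missing = [f for f in REQUIRED_TUNNEL_FIELDS if f not in (body or {})]
        if missing:
            raise ConfigError(
                f"tunnel '{name}': missing required field(s): {', '.join(missing)}"
            )
    return tunnels
```

Passing the environment as an argument keeps the override easy to exercise in unit tests without mutating `os.environ`.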
T06 — Unit tests for config loading
id: BRIDGE-WP-0001-T06
state_hub_task_id: 341c866f-8f4b-4165-9fa5-f10fe37c9252
status: done
priority: medium
Test: valid config, missing required field, unknown tunnel name.
Phase 3 — State Management (FR-4, FR-7, FR-14)
Acceptance: State round-trips correctly; stale PIDs detected without error.
T07 — Implement state.py
id: BRIDGE-WP-0001-T07
state_hub_task_id: ae5e2566-a4b1-426f-9c32-4a2c025f2927
status: done
priority: high
Read/write PID file and state file under ~/.local/state/bridge/. Check if PID is alive. Create state dir on first write.
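One way to sketch the PID handling with only the standard library is shown below. The function names are illustrative; the liveness probe uses `os.kill(pid, 0)`, which sends no signal but reports whether the process exists (POSIX only). Note that a recycled PID would still be reported as alive, which this sketch does not guard against.

```python
import os
from pathlib import Path
from typing import Optional

STATE_DIR = Path("~/.local/state/bridge").expanduser()


def write_pid(name: str, pid: int, state_dir: Path = STATE_DIR) -> Path:
    """Write <name>.pid, creating the state directory on first use."""
    state_dir.mkdir(parents=True, exist_ok=True)
    path = state_dir / f"{name}.pid"
    path.write_text(f"{pid}\n")
    return path


def read_pid(name: str, state_dir: Path = STATE_DIR) -> Optional[int]:
    """Return the stored PID, or None if the file is missing or garbled."""
    try:
        return int((state_dir / f"{name}.pid").read_text().strip())
    except (FileNotFoundError, ValueError):
        return None


def pid_alive(pid: int) -> bool:
    """Probe a PID without raising; a stale PID simply yields False."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0: existence check only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True
```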
T08 — Define BridgeState enum
id: BRIDGE-WP-0001-T08
state_hub_task_id: 456a3cb5-50fa-4fed-9283-57e2d1c6fbb9
status: done
priority: medium
States: STOPPED, STARTING, CONNECTED, DEGRADED, RECONNECTING, FAILED.
T09 — Unit tests for state management
id: BRIDGE-WP-0001-T09
state_hub_task_id: 0accc0b7-d013-43ad-a810-3269e64fb096
status: done
priority: medium
Test: write/read state round-trip, stale PID detection without error.
Phase 4 — Tunnel Process Manager (FR-1, FR-3, FR-12, FR-13)
Acceptance: bridge up <name> starts tunnel; killing SSH process triggers reconnect; bridge down <name> stops cleanly.
T10 — Implement TunnelManager — SSH subprocess wrapper
id: BRIDGE-WP-0001-T10
state_hub_task_id: d0341e90-b48d-48ab-9e6d-82f4c365afec
status: done
priority: high
SSH command: ssh -N -R {remote_port}:127.0.0.1:{local_port} -i {key} -o ServerAliveInterval=10 -o ExitOnForwardFailure=yes {user}@{host}. Manager runs as a daemonised child process; parent writes PID and exits.
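The command line above can be assembled as an argument list so that no shell quoting is involved. This is a sketch: `build_ssh_command` is a hypothetical helper name, and expanding `~` in the key path is an assumption about how the key should be resolved.

```python
import os
from typing import List


def build_ssh_command(
    host: str, ssh_user: str, ssh_key: str, remote_port: int, local_port: int
) -> List[str]:
    """Build the argv for a reverse tunnel, suitable for subprocess.Popen."""
    return [
        "ssh",
        "-N",                                           # no remote command
        "-R", f"{remote_port}:127.0.0.1:{local_port}",  # reverse forward
        "-i", os.path.expanduser(ssh_key),
        "-o", "ServerAliveInterval=10",
        "-o", "ExitOnForwardFailure=yes",
        f"{ssh_user}@{host}",
    ]
```

With `ExitOnForwardFailure=yes`, ssh exits if the remote port cannot be bound, so the reconnect loop sees a process exit rather than a tunnel that is silently not forwarding.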
T11 — Implement reconnect backoff loop
id: BRIDGE-WP-0001-T11
state_hub_task_id: f5c91eff-fca3-4f66-b073-276a733b5a27
status: done
priority: high
Exponential backoff between backoff_initial and backoff_max. Respect max_attempts (0 = infinite). On disconnect: state → RECONNECTING, log event, restart SSH.
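The delay schedule might be expressed as a generator, as in the sketch below. The doubling factor is an assumption (the workplan only says "exponential"), and `backoff_delays` is a hypothetical name.

```python
from typing import Iterator


def backoff_delays(
    initial: float, maximum: float, max_attempts: int = 0
) -> Iterator[float]:
    """Yield reconnect delays in seconds, doubling up to a cap.

    max_attempts == 0 means the generator never ends.
    """
    attempt = 0
    delay = initial
    while max_attempts == 0 or attempt < max_attempts:
        yield min(delay, maximum)
        delay = min(delay * 2, maximum)
        attempt += 1
```

With the example config (`backoff_initial: 5`, `backoff_max: 60`) the first six delays are 5, 10, 20, 40, 60, 60. When the generator is exhausted, the manager would move the tunnel to FAILED.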
T12 — Implement graceful shutdown
id: BRIDGE-WP-0001-T12
state_hub_task_id: 3f4df535-0d6a-49e8-9d3a-c3926d7f230c
status: done
priority: medium
Catch SIGTERM/SIGINT, kill SSH subprocess, write STOPPED state.
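A possible shape for the shutdown path is sketched below. `ShutdownFlag`, `install_handlers`, and `shutdown` are hypothetical names, the five-second grace period is an assumption, and `write_state` stands in for whatever `state.py` exposes. The handler only sets a flag; the manager loop is expected to poll it and call `shutdown`.

```python
import signal
import subprocess
from typing import Callable


class ShutdownFlag:
    """Signal handler that records a stop request for the main loop."""

    def __init__(self) -> None:
        self.requested = False

    def __call__(self, signum, frame) -> None:
        self.requested = True


def install_handlers(flag: ShutdownFlag) -> None:
    """Route SIGTERM and SIGINT to the flag (must run in the main thread)."""
    signal.signal(signal.SIGTERM, flag)
    signal.signal(signal.SIGINT, flag)


def shutdown(
    proc: subprocess.Popen,
    write_state: Callable[[str], None],
    grace_seconds: float = 5.0,
) -> None:
    """Terminate the SSH child, escalate to kill if needed, then record STOPPED."""
    if proc.poll() is None:
        proc.terminate()
        try:
            proc.wait(timeout=grace_seconds)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
    write_state("stopped")
```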
Phase 5 — Health Monitoring (FR-15, FR-16, FR-17)
Acceptance: With a non-responsive health URL, bridge status shows degraded.
T13 — Implement health.py
id: BRIDGE-WP-0001-T13
state_hub_task_id: 5aaa0e35-f32a-4c68-8707-1a1e037b76f4
status: done
priority: medium
Async HTTP GET via httpx to configured health URL. Run health check loop inside manager process. On failure: state → DEGRADED; on recovery: state → CONNECTED.
T14 — Write health check result to state dir
id: BRIDGE-WP-0001-T14
state_hub_task_id: 599d4e28-88c8-4c2a-80ac-ca57824af467
status: done
priority: low
Persist timestamp, status, HTTP code or error for display in bridge status.
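As an illustration, the latest health result could be stored as a small JSON file next to the PID and state files. The `<name>.health` filename, the record keys, and the `write_health_result` helper are assumptions; the workplan only lists what must be persisted.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def write_health_result(
    state_dir: Path,
    name: str,
    ok: bool,
    http_code: Optional[int] = None,
    error: Optional[str] = None,
) -> Path:
    """Persist the most recent health check outcome for `bridge status`."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "status": "ok" if ok else "failed",
        "http_code": http_code,
        "error": error,
    }
    state_dir.mkdir(parents=True, exist_ok=True)
    path = state_dir / f"{name}.health"        # hypothetical filename
    tmp = state_dir / f"{name}.health.tmp"
    tmp.write_text(json.dumps(record))
    tmp.replace(path)  # rename so readers never see a half-written file
    return path
```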
Phase 6 — Audit Logging (FR-24, FR-25, FR-26)
Acceptance: All lifecycle events appear in the log with actor attribution.
T15 — Implement audit.py
id: BRIDGE-WP-0001-T15
state_hub_task_id: 2f124b16-f1e7-4e9f-ad23-9f08543db3b7
status: done
priority: medium
Append JSON-lines to ~/.local/state/bridge/<name>.log. Events: bridge_started, bridge_connected, bridge_disconnected, bridge_reconnecting, health_check_failed, health_check_recovered, bridge_stopped. Each entry: timestamp (ISO-8601), tunnel, actor, actor_class, event, detail.
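A minimal sketch of the JSON-lines writer, using the entry fields listed above. The `append_event` name and signature are assumptions; events are passed as plain strings such as `bridge_started`.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_event(
    log_path: Path,
    tunnel: str,
    actor: str,
    actor_class: str,
    event: str,
    detail: str = "",
) -> dict:
    """Append one audit entry as a single JSON line and return it."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),  # ISO-8601, UTC
        "tunnel": tunnel,
        "actor": actor,
        "actor_class": actor_class,
        "event": event,
        "detail": detail,
    }
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry
```

One entry per line keeps the file appendable and lets `bridge logs` tail it without parsing the whole log.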
Phase 7 — CLI Commands (FR-1, FR-5, FR-8, FR-10, FR-11)
Acceptance: All commands work end-to-end; --help on each command shows correct usage.
Status table columns: TUNNEL, STATE, ACTOR, HOST, UPTIME, HEALTH. Exit codes: 0 = success, 1 = tunnel not found / config error, 2 = tunnel already in requested state. --json flag on status for automation.
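The status output and exit codes described here could be sketched as below. The `ExitCode` member names, the lower-case row keys, and the JSON shape (a list of row objects) are assumptions made for the example.

```python
import json
from enum import IntEnum
from typing import Dict, List

COLUMNS = ("TUNNEL", "STATE", "ACTOR", "HOST", "UPTIME", "HEALTH")


class ExitCode(IntEnum):
    OK = 0
    NOT_FOUND_OR_CONFIG_ERROR = 1
    ALREADY_IN_STATE = 2


def render_status(rows: List[Dict[str, str]], as_json: bool = False) -> str:
    """Render status rows as an aligned table, or as JSON for automation."""
    if as_json:
        return json.dumps(rows, indent=2)
    # Column width = widest of the header and every cell in that column.
    widths = [
        max([len(col)] + [len(str(r.get(col.lower(), ""))) for r in rows])
        for col in COLUMNS
    ]

    def fmt(cells) -> str:
        return "  ".join(str(c).ljust(w) for c, w in zip(cells, widths)).rstrip()

    lines = [fmt(COLUMNS)]
    lines += [fmt([r.get(col.lower(), "") for col in COLUMNS]) for r in rows]
    return "\n".join(lines)
```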
T16 — CLI: bridge up
id: BRIDGE-WP-0001-T16
state_hub_task_id: 2c22b8fe-8a35-4887-89b2-f8fb7f43e0b6
status: done
priority: high
Start named tunnel or all tunnels if name omitted.
T17 — CLI: bridge down
id: BRIDGE-WP-0001-T17
state_hub_task_id: 768e1a8b-fdf7-4718-b00e-bc2401f57657
status: done
priority: high
Stop named tunnel or all tunnels if name omitted.
T18 — CLI: bridge restart
id: BRIDGE-WP-0001-T18
state_hub_task_id: 8fd6486d-af4f-4295-a57a-a5fabbf25681
status: done
priority: medium
Down then up for named tunnel or all.
T19 — CLI: bridge status
id: BRIDGE-WP-0001-T19
state_hub_task_id: 28f3f392-9e94-43e7-811a-fa036f588e10
status: done
priority: high
Table output with --json flag for automation.
T20 — CLI: bridge logs
id: BRIDGE-WP-0001-T20
state_hub_task_id: 43582657-b1b9-4113-88e1-2109b30f3732
status: done
priority: medium
Tail log file. Defaults to last 50 lines. --follow for live tail. --lines N to override.
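The default "last 50 lines" behaviour can be sketched with a bounded deque, which reads the file once and keeps only the final N lines in memory. `tail_lines` is a hypothetical helper; `--follow` is not covered here.

```python
from collections import deque
from pathlib import Path
from typing import List


def tail_lines(path: Path, lines: int = 50) -> List[str]:
    """Return the last `lines` lines of a log file, without trailing newlines."""
    with path.open(encoding="utf-8") as fh:
        return [line.rstrip("\n") for line in deque(fh, maxlen=lines)]
```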
Phase 8 — Integration Tests
Acceptance: uv run pytest passes cleanly.
T21 — Integration test: up/status/down cycle
id: BRIDGE-WP-0001-T21
state_hub_task_id: 5e3c7ac6-03fd-45e9-af64-11bde1d03ab8
status: done
priority: medium
Test fixture with minimal tunnels.yaml pointing to localhost. Test full up → status → down cycle against loopback SSH target or mocked subprocess.
T22 — Integration test: reconnect behaviour
id: BRIDGE-WP-0001-T22
state_hub_task_id: 8b6ac68e-d0ab-4826-8df5-ebdf30a1e23e
status: done
priority: medium
Test reconnect loop with a subprocess that exits immediately.
T23 — Integration test: health check degraded path
id: BRIDGE-WP-0001-T23
state_hub_task_id: c472bb1a-2fe2-4a88-aa6b-e18f732a3fde
status: done
priority: medium
Test degraded state with a mock HTTP server that returns failures.
FRS Traceability
| FRS Requirement Group | Phase |
|---|---|
| FR-1 to FR-4 — Bridge creation | 4 |
| FR-5 to FR-7 — Bridge termination | 4 |
| FR-8 to FR-9 — Bridge restart | 7 |
| FR-10 to FR-11 — Status inspection | 7 |
| FR-12 to FR-14 — Lifecycle monitoring | 4 |
| FR-15 to FR-17 — Health monitoring | 5 |
| FR-18 to FR-20 — Actor attribution | 2, 6 |
| FR-24 to FR-26 — Audit logging | 6 |
| FC-1 — Config dependency | 2 |
| FC-2 — External connectivity | 4 |
FR-21 to FR-23 (target discovery) and FR-27 to FR-29 (identity integration) are deferred — they depend on OpsCatalog and an identity provider respectively.
Deferred
- FR-21 to FR-23: Infrastructure target discovery (bridge targets); requires OpsCatalog
- FR-27 to FR-29: Identity provider integration (privacyIDEA / SSH CA); requires external identity infrastructure
- OpsCatalog: Separate workplan (BRIDGE-WP-0002)