diff --git a/workplans/ACTIVITY-WP-0007-ops-inventory-probe-runner.md b/workplans/ACTIVITY-WP-0007-ops-inventory-probe-runner.md
new file mode 100644
index 0000000..7d2d636
--- /dev/null
+++ b/workplans/ACTIVITY-WP-0007-ops-inventory-probe-runner.md
@@ -0,0 +1,218 @@
+---
+id: ACTIVITY-WP-0007
+type: workplan
+title: "Ops Inventory Probe Runner"
+domain: custodian
+repo: activity-core
+status: ready
+owner: codex
+topic_slug: custodian
+created: "2026-06-05"
+updated: "2026-06-05"
+---
+
+# ACTIVITY-WP-0007 - Ops Inventory Probe Runner
+
+## Context
+
+Custodian `CUST-WP-0047` introduced an inventory-first ops-hub slice: a
+non-secret service inventory file, a generated service catalog view, and a
+disabled draft ActivityDefinition for repeatable service inventory probes.
+State Hub message `d543c39c-1f04-4f1e-a1e4-0d5b40503525` handed the
+activity-core portion to this repo.
+
+The request fits activity-core only if it stays narrow. activity-core should
+provide scheduled policy, bounded context resolution, deterministic lightweight
+HTTP/HTTPS checks, and non-secret evidence emission. It must not become an ops
+executor, secret handler, Inter-Hub operator, k8s/ssh/tunnel runner, or service
+inventory authority.
+
+Existing activity-core capabilities that make this feasible:
+- cron/manual ActivityDefinitions and Temporal orchestration
+- external definition scanning via `ACTIVITY_DEFINITION_DIRS`
+- static context sources and pluggable context resolvers
+- State Hub context resolver and State Hub progress report sink pattern
+- working-memory sink, NATS/EventEnvelope routing, and event type registry
+
+Known gaps this workplan closes:
+- no `ops-inventory` resolver for `service-inventory.yml`
+- no deterministic endpoint/access-path probe result model
+- no non-LLM evidence sink for ops probe summaries
+- no ops evidence event definitions owned by activity-core
+- no Railiance projection for the Custodian probe definition or inventory input
+
+## Add Ops Inventory Context Resolver
+
+```task
+id: ACTIVITY-WP-0007-T01
+status: todo
+priority: high
+```
+
+Add a registered context resolver:
+
+- source type: `ops-inventory`
+- query: `probe_services`
+- params: `inventory_path`, `timeout_seconds`, `include_kinds`,
+  `allow_network`, `required`
+
+The resolver reads and validates a non-secret service inventory YAML file,
+initially `/home/worsch/the-custodian/ops/service-inventory.yml` when present.
+It produces compact structured output:
+
+```json
+{
+  "services": [],
+  "endpoints": [],
+  "summary": {"ok": 0, "degraded": 0, "down": 0, "skipped": 0},
+  "generated_at": "..."
+}
+```
+
+First implementation scope:
+- HTTP/HTTPS endpoint probes only
+- expected status and expected signal checks only
+- non-HTTP, k8s, ssh, tunnel, and authenticated access paths return
+  `skipped` / `unsupported`, not failed
+- missing optional inventory returns `{}` or a skipped summary unless the
+  context source is required
+- no response bodies, cookies, authorization headers, tokens, or command output
+  are stored
+
+Done when fixture-based resolver tests cover `ok`, expected-status mismatch,
+expected-signal mismatch, network/down, unsupported, and optional/required
+inventory failure behavior.
+
+## Add Ops Evidence Sink
+
+```task
+id: ACTIVITY-WP-0007-T02
+status: todo
+priority: high
+```
+
+Add a deterministic non-LLM evidence sink for compact probe results.
+
+Initial sink behavior:
+- sink type: `state-hub-progress`
+- State Hub event type: `ops_inventory_probe`
+- idempotency key: `activity_core_run_id + service_id + endpoint_id/access_path_id + event_type`
+- detail contains compact non-secret results only
+- one summary progress event per run is acceptable for the first version
+
+Prepare the contract for later Inter-Hub submission without making it mandatory:
+- event names: `ops-service-observed`, `ops-endpoint-verified`,
+  `ops-access-path-checked`, `ops-backup-verified`, `ops-inventory-drift`
+- Inter-Hub mode requires `INTER_HUB_URL`, `OPS_HUB_KEY` from Secret, and
+  widget/capability mapping config
+- missing Inter-Hub config skips cleanly with an explicit sink result
+
+Done when sink idempotency, State Hub fallback posting, missing Inter-Hub
+config, and no-secret-leak behavior are covered by tests.
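
Review note on T02: the idempotency key rule listed under "Initial sink behavior" could be sketched roughly as below. This is a minimal sketch, not part of the patch. It assumes Python as the implementation language, and the function name `evidence_idempotency_key`, the length-prefixed encoding, and the SHA-256 digest are all hypothetical choices; the workplan only names the four fields that make up the key.

```python
import hashlib


def evidence_idempotency_key(run_id: str, service_id: str,
                             target_id: str, event_type: str) -> str:
    """Build a stable key from the four fields named in the workplan.

    Hypothetical helper: `target_id` stands for either the endpoint id or
    the access path id, whichever applies to the probed target.
    """
    parts = [run_id, service_id, target_id, event_type]
    if any(not part for part in parts):
        raise ValueError("all idempotency fields must be non-empty")
    # Length-prefix each field so ("a", "bc") and ("ab", "c") cannot
    # produce the same canonical string.
    canonical = "|".join(f"{len(part)}:{part}" for part in parts)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    first = evidence_idempotency_key("run-1", "svc-a", "ep-1", "ops_inventory_probe")
    retry = evidence_idempotency_key("run-1", "svc-a", "ep-1", "ops_inventory_probe")
    print(first == retry)  # a retried post for the same run yields the same key
```

Hashing keeps the key at a fixed length and keeps raw identifiers out of any key column. A plain delimiter-joined string would serve equally well if readable keys are preferred.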
+
+## Register Ops Evidence Event Definitions
+
+```task
+id: ACTIVITY-WP-0007-T03
+status: todo
+priority: medium
+```
+
+Add activity-core-owned event type definitions for ops evidence so producers
+and future widgets have a stable contract:
+
+- `ops-service-observed`
+- `ops-endpoint-verified`
+- `ops-access-path-checked`
+- `ops-backup-verified`
+- `ops-inventory-drift`
+
+Each definition must document:
+- publisher intent
+- non-secret attribute schema
+- idempotency fields
+- examples for success, degraded, down, skipped, and drift where applicable
+- explicit forbidden payload material: secrets, auth headers, cookies, raw
+  response bodies, command output, and token-like values
+
+Done when event registry tests or parser coverage prove the definitions are
+valid and reviewable.
+
+## Wire Custodian Definition Safely
+
+```task
+id: ACTIVITY-WP-0007-T04
+status: todo
+priority: medium
+```
+
+Accept the Custodian-owned disabled draft definition:
+
+`/home/worsch/the-custodian/activity-definitions/ops-service-inventory-probes.md`
+
+Requirements:
+- support sync/parse with
+  `ACTIVITY_DEFINITION_DIRS=/home/worsch/the-custodian`
+- keep the definition disabled until resolver, sink, and deployment wiring pass
+- add parser/sync tests for external definition directories
+- ensure manual trigger of a disabled definition in test/dev can produce fixture
+  evidence without enabling the production schedule
+
+Done when activity-core can scan the Custodian definition path without enabling
+it prematurely.
+
+## Wire Railiance Runtime Inputs
+
+```task
+id: ACTIVITY-WP-0007-T05
+status: todo
+priority: medium
+```
+
+Wire the production deployment only after the local resolver/sink tests pass.
+
+Scope:
+- project the new disabled Custodian definition into
+  `actcore-external-activity-definitions`
+- decide how the worker sees the inventory input:
+  - generated ConfigMap from `service-inventory.yml`
+  - mounted repo snapshot
+  - State Hub endpoint if Custodian later exposes the inventory
+- add config placeholders for `OPS_INVENTORY_PATH`, `INTER_HUB_URL`, and widget
+  mapping
+- keep `OPS_HUB_KEY` in Secret only
+
+Done when the Railiance worker can see the disabled definition and inventory
+input without leaking secrets or activating the schedule early.
+
+## Close Safety And Handoff Gates
+
+```task
+id: ACTIVITY-WP-0007-T06
+status: wait
+priority: medium
+```
+
+Complete the operational handoff only after the safety gates are satisfied.
+
+Acceptance criteria:
+- a disabled manual trigger runs the ops inventory probe against fixture or
+  non-production inventory data and produces compact non-secret evidence
+- State Hub progress receives one `ops_inventory_probe` summary per run
+- Inter-Hub submission is either implemented behind config/secret gates or
+  explicitly deferred with a clean sink result
+- the activity-core worker can sync the Custodian definition without enabling it
+  prematurely
+- no k8s, ssh, tunnel, or authenticated command execution is required for the
+  first version
+- `CUST-WP-0047-T07` has enough evidence to move from `progress` toward done
+
+This task waits on the implementation tasks above and, for final Inter-Hub
+activation, the operator-gated ops-hub widget/API-key path in `CUST-WP-0047`.
+
+## Review Verdict
+
+activity-core should provide this as a bounded probe-and-evidence capability.
+It should not provide a general operational execution engine. The first useful
+slice is safe and valuable if it remains HTTP/HTTPS-only, non-secret, disabled
+until explicitly wired, and idempotent in its evidence output.
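
Review note on the HTTP/HTTPS-only boundary: the T01 scope rules (unsupported access paths are skipped rather than failed, and only status and signal checks apply) could be expressed as a pure classification step that is easy to cover with fixtures. The sketch below is an assumption-laden illustration, not the patch's implementation. `ProbeObservation`, `classify_probe`, and `summarize` are hypothetical names, and mapping a status or signal mismatch to `degraded` is an assumption, since the workplan names the four summary states without defining them.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass(frozen=True)
class ProbeObservation:
    """Hypothetical compact record of one probe; no body, headers, or cookies."""
    status_code: Optional[int]    # None when the request failed at the network level
    signal_found: Optional[bool]  # None when no expected signal is configured


def classify_probe(url: str, expected_status: int,
                   observation: Optional[ProbeObservation],
                   allow_network: bool = True) -> dict:
    """Map one observation to ok / degraded / down / skipped."""
    scheme = urlsplit(url).scheme.lower()
    if scheme not in ("http", "https"):
        # Non-HTTP access paths are out of scope: skipped, not failed.
        return {"state": "skipped", "reason": "unsupported"}
    if not allow_network or observation is None:
        return {"state": "skipped", "reason": "network_disabled"}
    if observation.status_code is None:
        return {"state": "down", "reason": "network_error"}
    if observation.status_code != expected_status:
        # Assumption: a mismatch counts as degraded, not down.
        return {"state": "degraded", "reason": "status_mismatch"}
    if observation.signal_found is False:
        return {"state": "degraded", "reason": "signal_mismatch"}
    return {"state": "ok", "reason": "matched"}


def summarize(results: list) -> dict:
    """Count results into the summary shape shown in the workplan's JSON."""
    summary = {"ok": 0, "degraded": 0, "down": 0, "skipped": 0}
    for result in results:
        summary[result["state"]] += 1
    return summary
```

Keeping classification free of network calls means the fixture cases listed in T01's "Done when" clause can each be driven by a hand-built `ProbeObservation`, with the actual HTTP request isolated in a thin caller.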