---
id: ACTIVITY-WP-0007
type: workplan
title: "Ops Inventory Probe Runner"
domain: custodian
repo: activity-core
status: active
owner: codex
topic_slug: custodian
created: "2026-06-05"
updated: "2026-06-07"
state_hub_workstream_id: "c91a0946-92f9-4b41-8a92-005b29952916"
---
# ACTIVITY-WP-0007 - Ops Inventory Probe Runner
## Context
Custodian `CUST-WP-0047` introduced an inventory-first ops-hub slice: a
non-secret service inventory file, a generated service catalog view, and a
disabled draft ActivityDefinition for repeatable service inventory probes.
State Hub message `d543c39c-1f04-4f1e-a1e4-0d5b40503525` handed the
activity-core portion to this repo.

The request fits activity-core only if it stays narrow. activity-core should
provide scheduled policy, bounded context resolution, deterministic lightweight
HTTP/HTTPS checks, and non-secret evidence emission. It must not become an ops
executor, secret handler, Inter-Hub operator, k8s/ssh/tunnel runner, or service
inventory authority.

Existing activity-core capabilities that make this feasible:

- cron/manual ActivityDefinitions and Temporal orchestration
- external definition scanning via `ACTIVITY_DEFINITION_DIRS`
- static context sources and pluggable context resolvers
- State Hub context resolver and State Hub progress report sink pattern
- working-memory sink, NATS/EventEnvelope routing, and event type registry

Known gaps this workplan closes:

- no `ops-inventory` resolver for `service-inventory.yml`
- no deterministic endpoint/access-path probe result model
- no non-LLM evidence sink for ops probe summaries
- no ops evidence event definitions owned by activity-core
- no Railiance projection for the Custodian probe definition or inventory input
## Add Ops Inventory Context Resolver
```task
id: ACTIVITY-WP-0007-T01
status: done
priority: high
state_hub_task_id: "dbe49dfb-f073-4245-8e86-d0355a6bb8bb"
```
Add a registered context resolver:

- source type: `ops-inventory`
- query: `probe_services`
- params: `inventory_path`, `timeout_seconds`, `include_kinds`,
`allow_network`, `required`

The resolver reads and validates a non-secret service inventory YAML file,
initially `/home/worsch/the-custodian/ops/service-inventory.yml` when present.

It produces compact structured output:
```json
{
  "services": [],
  "endpoints": [],
  "summary": {"ok": 0, "degraded": 0, "down": 0, "skipped": 0},
  "generated_at": "..."
}
```
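
The output shape above could be assembled from per-endpoint results roughly as follows. This is a minimal sketch, not the code in `ops_inventory.py`: the function name `build_probe_output` and the per-endpoint `status` field are assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime, timezone

# The four summary buckets shown in the workplan's output example.
STATUSES = ("ok", "degraded", "down", "skipped")


def build_probe_output(services, endpoints):
    """Assemble the compact resolver output.

    Assumption: each endpoint dict carries a "status" key holding one of
    STATUSES. The field name is hypothetical.
    """
    counts = Counter(endpoint["status"] for endpoint in endpoints)
    unknown = set(counts) - set(STATUSES)
    if unknown:
        raise ValueError(f"unexpected status values: {sorted(unknown)}")
    return {
        "services": list(services),
        "endpoints": list(endpoints),
        # Always emit all four buckets so consumers see a stable shape.
        "summary": {status: counts.get(status, 0) for status in STATUSES},
        "generated_at": datetime.now(timezone.utc).isoformat(),
    }
```

Emitting every bucket, including zero counts, keeps the summary comparable across runs.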
First implementation scope:
- HTTP/HTTPS endpoint probes only
- expected status and expected signal checks only
- non-HTTP, k8s, ssh, tunnel, and authenticated access paths return
`skipped` / `unsupported`, not failed
- a missing optional inventory returns `{}` or a skipped summary; a missing
inventory is a failure only when the context source is marked `required`
- no response bodies, cookies, authorization headers, tokens, or command output
are stored
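
The scope rules above can be sketched as a single classification function. This is an illustration under stated assumptions, not the shipped resolver: the name `classify_endpoint`, the injected `fetch` callable, the `requires_auth` flag, the `reason` strings, and the mapping of mismatches to `degraded` are all hypothetical.

```python
from urllib.parse import urlsplit


def classify_endpoint(url, expected_status, expected_signal, fetch, requires_auth=False):
    """Return a compact, non-secret result for one endpoint.

    `fetch` is an injected callable: fetch(url) -> (status_code, body_text).
    It is expected to raise OSError (or a subclass) on network failure.
    """
    scheme = urlsplit(url).scheme.lower()
    # Non-HTTP and authenticated paths are skipped, never reported as failed.
    if scheme not in ("http", "https") or requires_auth:
        return {"status": "skipped", "reason": "unsupported"}
    try:
        status_code, body = fetch(url)
    except OSError:
        return {"status": "down", "reason": "network"}
    if status_code != expected_status:
        return {
            "status": "degraded",
            "reason": "expected-status-mismatch",
            "http_status": status_code,
        }
    # The body is inspected for the expected signal but never stored.
    if expected_signal and expected_signal not in body:
        return {
            "status": "degraded",
            "reason": "expected-signal-mismatch",
            "http_status": status_code,
        }
    return {"status": "ok", "http_status": status_code}
```

Injecting `fetch` keeps the function deterministic in fixture-based tests, since no real network call is needed to exercise each branch.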
Done when fixture-based resolver tests cover `ok`, expected-status mismatch,
expected-signal mismatch, network/down, unsupported, and optional/required
inventory failure behavior.

2026-06-05: Completed the first resolver slice. Added
`src/activity_core/context_resolvers/ops_inventory.py`, registered source type
`ops-inventory`, and covered ok/degraded/down/skipped results plus required vs
optional inventory failure and no-secret output behavior.
## Add Ops Evidence Sink
```task
id: ACTIVITY-WP-0007-T02
status: done
priority: high
state_hub_task_id: "c6b5f49d-6f05-4be9-a968-de42195170cb"
```
Add a deterministic non-LLM evidence sink for compact probe results.

Initial sink behavior:
- sink type: `state-hub-progress`
- State Hub event type: `ops_inventory_probe`
- idempotency key: `activity_core_run_id + service_id + endpoint_id/access_path_id + event_type`
- detail contains compact non-secret results only
- one summary progress event per run is acceptable for the first version
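
One way to derive the idempotency key from the four parts listed above is sketched here. Hashing the parts into a fixed-length digest is an assumption of this sketch, since the workplan only names the components. The function name `idempotency_key` and the `target_id` parameter (standing in for `endpoint_id` or `access_path_id`) are hypothetical.

```python
import hashlib


def idempotency_key(run_id, service_id, target_id, event_type):
    """Build a stable key from the four components named in the workplan."""
    parts = (run_id, service_id, target_id, event_type)
    if any(not part for part in parts):
        raise ValueError("all idempotency key parts are required")
    # Length-prefix each part so ("a", "bc") and ("ab", "c") cannot collide.
    material = "".join(f"{len(part)}:{part};" for part in parts)
    return hashlib.sha256(material.encode("utf-8")).hexdigest()
```

Posting the same run, service, target, and event type twice yields the same key, so a sink that rejects duplicate keys can be retried safely.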
Prepare the contract for later Inter-Hub submission without making it mandatory:
- event names: `ops-service-observed`, `ops-endpoint-verified`,
`ops-access-path-checked`, `ops-backup-verified`, `ops-inventory-drift`
- Inter-Hub mode requires `INTER_HUB_URL`, `OPS_HUB_KEY` read from a Secret, and
widget/capability mapping config
- missing Inter-Hub config skips cleanly with an explicit sink result
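
The clean-skip behavior for missing Inter-Hub config might look like the sketch below. The result shape and the function name `inter_hub_sink_result` are assumptions. The setting names come from this workplan, with `OPS_HUB_WIDGET_MAPPING` taken from the runtime placeholders recorded under the Railiance wiring task.

```python
import os

# Setting names as used in this workplan.
REQUIRED_INTER_HUB_SETTINGS = ("INTER_HUB_URL", "OPS_HUB_KEY", "OPS_HUB_WIDGET_MAPPING")


def inter_hub_sink_result(env=None):
    """Report whether Inter-Hub submission can run, without exposing values."""
    env = os.environ if env is None else env
    # Empty placeholders count as missing, so an unprovisioned key skips cleanly.
    missing = [name for name in REQUIRED_INTER_HUB_SETTINGS if not env.get(name)]
    if missing:
        return {
            "sink": "inter-hub",
            "status": "skipped",
            "reason": "missing-config",
            "missing": missing,  # setting names only, never their values
        }
    return {"sink": "inter-hub", "status": "ready"}
```

Returning an explicit result instead of raising lets the State Hub fallback proceed while still recording why Inter-Hub was not used.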
Done when sink idempotency, State Hub fallback posting, missing Inter-Hub
config, and no-secret-leak behavior are covered by tests.

2026-06-05: Completed the State Hub fallback sink slice. Added
`src/activity_core/ops_evidence_sinks.py`, a `persist_ops_evidence` Temporal
activity, workflow/worker wiring, idempotent `ops_inventory_probe` progress
posting, missing-Inter-Hub-config skip behavior, and no-secret compaction tests.
## Register Ops Evidence Event Definitions
```task
id: ACTIVITY-WP-0007-T03
status: done
priority: medium
state_hub_task_id: "70eb470e-9b0a-448f-ae3a-f5b1bed49e04"
```
Add activity-core-owned event type definitions for ops evidence so producers
and future widgets have a stable contract:
- `ops-service-observed`
- `ops-endpoint-verified`
- `ops-access-path-checked`
- `ops-backup-verified`
- `ops-inventory-drift`

Each definition must document:
- publisher intent
- non-secret attribute schema
- idempotency fields
- examples for success, degraded, down, skipped, and drift where applicable
- explicit forbidden payload material: secrets, auth headers, cookies, raw
response bodies, command output, and token-like values
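
A payload check for the forbidden material listed above could be sketched as follows. This is a heuristic illustration only: the key list, the token pattern, and the function name `find_forbidden` are assumptions, and a real check would need to follow the schema each event definition documents.

```python
import re

# Hypothetical key names that suggest forbidden material.
FORBIDDEN_KEYS = {
    "authorization", "cookie", "set-cookie", "token", "password", "secret",
    "body", "response_body", "command_output", "stdout", "stderr",
}
# Heuristic: bearer credentials and JWT-shaped strings.
TOKEN_LIKE = re.compile(r"(?i)\bbearer\s+\S+|\beyJ[\w-]+\.[\w-]+\.[\w-]+")


def find_forbidden(payload, path=""):
    """Return the paths of entries that look like forbidden material."""
    hits = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            child = f"{path}.{key}" if path else str(key)
            if str(key).lower() in FORBIDDEN_KEYS:
                hits.append(child)
            else:
                hits.extend(find_forbidden(value, child))
    elif isinstance(payload, (list, tuple)):
        for index, item in enumerate(payload):
            hits.extend(find_forbidden(item, f"{path}[{index}]"))
    elif isinstance(payload, str) and TOKEN_LIKE.search(payload):
        hits.append(path)
    return hits
```

Reporting paths rather than values means the check itself cannot leak the material it flags.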
Done when event registry tests or parser coverage prove the definitions are
valid and reviewable.

2026-06-05: Completed. Added the five ops evidence event definitions under
`event-types/` and parser tests covering required fields and safety language.
## Wire Custodian Definition Safely
```task
id: ACTIVITY-WP-0007-T04
status: done
priority: medium
state_hub_task_id: "45132f9f-da3c-44f1-a488-195aa0e46428"
```
Accept the Custodian-owned disabled draft definition:

`/home/worsch/the-custodian/activity-definitions/ops-service-inventory-probes.md`

Requirements:
- support sync/parse with
`ACTIVITY_DEFINITION_DIRS=/home/worsch/the-custodian`
- keep the definition disabled until resolver, sink, and deployment wiring pass
- add parser/sync tests for external definition directories
- ensure manual trigger of a disabled definition in test/dev can produce fixture
evidence without enabling the production schedule
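
The scan and the disabled-definition gate described above can be sketched as below. Several details are assumptions rather than activity-core behavior: that multiple roots in `ACTIVITY_DEFINITION_DIRS` are separated by `os.pathsep`, that definitions live in an `activity-definitions/` subdirectory of each root (inferred from the Custodian path above), and the names `find_external_definitions` and `may_run`.

```python
import os
from pathlib import Path


def find_external_definitions(env=None, subdir="activity-definitions"):
    """List candidate definition files under each configured external root."""
    env = os.environ if env is None else env
    raw = env.get("ACTIVITY_DEFINITION_DIRS", "")
    found = []
    # Assumption: multiple roots are separated by os.pathsep (":" on POSIX).
    for root in (part.strip() for part in raw.split(os.pathsep)):
        if not root:
            continue
        base = Path(root) / subdir
        if base.is_dir():
            found.extend(sorted(base.glob("*.md")))
    return found


def may_run(enabled, trigger, environment):
    """Decide whether a run may start without touching the enabled flag."""
    if enabled:
        return True
    # Disabled definitions never run on schedule; manual runs stay in test/dev.
    return trigger == "manual" and environment in ("test", "dev")
```

Keeping the gate separate from the scan means a definition can be discovered and parsed while remaining disabled.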
Done when activity-core can scan the Custodian definition path without enabling
it prematurely.

2026-06-05: Started. Added test coverage that
`ACTIVITY_DEFINITION_DIRS=/home/worsch/the-custodian` style external roots can
scan a disabled `ops-service-inventory-probes.md` definition carrying an
`ops-inventory` context source and explicit `state-hub-progress` evidence sink.

2026-06-05: Completed. The Railiance-projected disabled definition now uses the
`ops-inventory` resolver and explicit `state-hub-progress` evidence sink. Tests
prove the disabled definition can resolve fixture inventory data and emit one
compact `ops_inventory_probe` State Hub progress event without enabling the
production schedule.
## Wire Railiance Runtime Inputs
```task
id: ACTIVITY-WP-0007-T05
status: done
priority: medium
state_hub_task_id: "474564be-a447-4bdf-b995-168f7a93e515"
```
Wire the production deployment only after the local resolver/sink tests pass.

Scope:

- project the new disabled Custodian definition into
`actcore-external-activity-definitions`
- decide how the worker sees the inventory input:
  - generated ConfigMap from `service-inventory.yml`
  - mounted repo snapshot
  - State Hub endpoint if Custodian later exposes the inventory
- add config placeholders for `OPS_INVENTORY_PATH`, `INTER_HUB_URL`, and widget
mapping
- keep `OPS_HUB_KEY` in Secret only

Done when the Railiance worker can see the disabled definition and inventory
input without leaking secrets or activating the schedule early.

2026-06-05: Completed the first production wiring slice. `20-runtime.yaml`
projects the disabled ops probe definition, runtime config placeholders
(`OPS_INVENTORY_PATH`, `INTER_HUB_URL`, `OPS_HUB_WIDGET_MAPPING`), and a
non-secret `actcore-ops-service-inventory` ConfigMap snapshot. The worker mounts
the inventory at `/etc/activity-core/ops`, and `bootstrap-secrets.sh` keeps
`OPS_HUB_KEY` as an empty Secret-only placeholder until operator provisioning.
## Close Safety And Handoff Gates
```task
id: ACTIVITY-WP-0007-T06
status: wait
priority: medium
state_hub_task_id: "d15fc947-3fbe-4269-93c6-d98577352149"
```
Complete the operational handoff only after the safety gates are satisfied.

Acceptance criteria:

- a disabled manual trigger runs the ops inventory probe against fixture or
non-production inventory data and produces compact non-secret evidence
- State Hub progress receives one `ops_inventory_probe` summary per run
- Inter-Hub submission is either implemented behind config/secret gates or
explicitly deferred with a clean sink result
- the activity-core worker can sync the Custodian definition without enabling it
prematurely
- no k8s, ssh, tunnel, or authenticated command execution is required for the
first version
- `CUST-WP-0047-T07` has enough evidence to move from `progress` toward done

This task waits on the implementation tasks above and, for final Inter-Hub
activation, the operator-gated ops-hub widget/API-key path in `CUST-WP-0047`.

2026-06-05: The local implementation gates are now satisfied and tested. Live
closure remains waiting on applying the updated Railiance manifests and on the
operator-gated Inter-Hub ops-hub widget/API-key path.

2026-06-07: Added the remaining deployment handoff for this gate while
investigating the missing daily WSJF run. The Railiance runtime projection now
includes the daily WSJF definition alongside the disabled ops probe definition,
schema/config support needed by the shared worker, and a working-memory PVC.
No live `ops_inventory_probe` event exists yet, and the cluster currently lacks
an `activity-core` namespace. Cross-repo closure tasks were posted via State
Hub to `railiance-cluster` (`53e78702`), `inter-hub` (`f3ec4a36`),
`the-custodian` (`7a5d4e62`), `state-hub` (`dc10704f`), and `activity-core`
(`28d11021`). This task remains waiting on live manifest application,
`actcore-sync`, a disabled manual probe trigger, State Hub
`ops_inventory_probe` evidence, and an Inter-Hub activation or explicit defer
decision.
## Review Verdict
activity-core should provide this as a bounded probe-and-evidence capability.
It should not provide a general operational execution engine. The first useful
slice is safe and valuable if it remains HTTP/HTTPS-only, non-secret, disabled
until explicitly wired, and idempotent in its evidence output.