Add automation status surface

2026-07-01 20:12:04 +02:00
parent 3f85274916
commit ffe10f098e
20 changed files with 1732 additions and 11 deletions
--- a/workplans/ACTIVITY-WP-0006-post-triage-operational-hardening.md
+++ b/workplans/ACTIVITY-WP-0006-post-triage-operational-hardening.md
@@ -4,11 +4,11 @@ type: workplan
 title: "Post-triage operational hardening"
 domain: custodian
 repo: activity-core
-status: active
+status: finished
 owner: codex
 topic_slug: custodian
 created: "2026-06-03"
-updated: "2026-06-27"
+updated: "2026-06-30"
 state_hub_workstream_id: "5646e13a-13af-4724-bca6-3c0d86f96733"
 ---

@@ -104,7 +104,7 @@ and emitted a validated `daily_triage` report plus working-memory note.

 ```task
 id: ACTIVITY-WP-0006-T03
-status: wait
+status: done
 priority: medium
 state_hub_task_id: "7cbf0a35-71a1-47ac-afc2-f51ad2180fd0"
 ```
@@ -203,6 +203,27 @@ ACTIVITY-WP-0016 output-robustness bundle and runtime prompt/token changes, not
 a missing schedule. T03 stays wait until a post-deployment smoke passes and three
 new clean scheduled runs are collected.

+2026-06-30 early checkpoint: two new clean scheduled runs exist after the
+validation failures. State Hub daily_triage progress shows 2026-06-28
+05:20:51Z run `6a44d6dd-3f02-53f2-a5d8-d42b76b0ef98` and 2026-06-29
+05:20:49Z run `1dfb47c9-07bf-551b-b778-1d21a40bd95c`, both with
+`output_validated=true` and working-memory notes written. The current local time
+was 2026-06-30 01:37 Europe/Berlin, before the expected 07:20 Berlin scheduled
+fire, so the three-clean-run gate cannot close yet. Recheck after 2026-06-30
+05:20Z; if that scheduled run validates, the clean streak is 06-28 / 06-29 /
+06-30 and T03 can close with calibration feedback.
+
+2026-06-30 closeout: the 07:20 Berlin scheduled run fired at 05:20:50Z as run
+`ac3d71a0-2f8f-50df-b3ce-7c60c2abb5c5` with `output_validated=true` and a
+working-memory note written. The post-failure clean streak is now complete:
+2026-06-28 (`6a44d6dd`), 2026-06-29 (`1dfb47c9`), and 2026-06-30 (`ac3d71a0`).
+Calibration feedback: the scheduler, worker, llm-connect route, State Hub sink,
+and working-memory sink are stable again; the recommendations were operationally
+useful but too dense at 10 items, repeatedly emphasizing human-dependency and
+infrastructure-unblock work. ACTIVITY-WP-0016 now owns the density/contract fix:
+Railiance runtime projection was aligned to a top-7 contract so the next live
+run can prove the bounded output posture. T03 is done.
+
 ## Rule Action Contract Documentation

 ```task
--- a/workplans/ACTIVITY-WP-0016-llm-output-robustness-trust-boundary.md
+++ b/workplans/ACTIVITY-WP-0016-llm-output-robustness-trust-boundary.md
@@ -8,7 +8,7 @@ status: active
 owner: codex
 topic_slug: custodian
 created: "2026-06-26"
-updated: "2026-06-27"
+updated: "2026-06-30"
 state_hub_workstream_id: "4ef0d53b-1777-41ae-80c6-1b69fdb34726"
 ---

@@ -144,11 +144,21 @@ Done when:
  `tests/fixtures/wp0016/daily_triage_2026-06-26_validation_failure.partial.json`
  (the 4000-char preview + validation error; full payload pending the remote pull).

+2026-06-30 local retention hardening: activity-core now preserves future
+llm-connect diagnostic metadata instead of dropping it at the client boundary.
+`LLMConnectClient.complete()` still returns the content string for compatibility,
+but records safe non-secret response fields such as `finish_reason` and `usage`
+on `last_response_metadata`; the executor copies that into report artifacts,
+State Hub progress detail, and working-memory notes. Invalid report raw previews
+were raised from 4000 to 12000 chars. This does not recover the historical
+06-26 full payload or producer-side `finish_reason`, so T01 remains `wait` on the
+remote llm-connect log pull, but the retention gap is closed for future failures.
+
 ## Schema + Prompt Redesign For Error Locality

 ```task
 id: ACTIVITY-WP-0016-T02
-status: progress
+status: done
 priority: high
 state_hub_task_id: "ae67ca8c-ee01-4a8d-9e8a-a0a36c999758"
 ```
@@ -209,6 +219,21 @@ Apply there:
 4. State the value vocabularies (`action`, `confidence`) the T04 guardrails will
   check.

+2026-06-30 live evidence check: the 2026-06-28 and 2026-06-29 scheduled
+`daily_triage` events validated successfully, which shows the runtime is no
+longer failing every day. However, the preserved State Hub reports still contain
+10 recommendations, not the requested bounded top-N of 7 / framed item contract.
+Treat that as evidence that the runtime-projected prompt/schema/max-token bundle
+has not fully absorbed the T02 handoff yet.
+
+2026-06-30 source projection closeout: patched `k8s/railiance/20-runtime.yaml`
+so the projected `daily-statehub-wsjf-triage.md` prompt now says at most 7
+recommendations and instructs the model to emit fewer well-formed items rather
+than more. The projected `daily-triage-report.json` now has `maxItems: 7` and
+`rank.maximum: 7`, aligned with the repo schema. `max_tokens: 1800` remains as
+headroom for the bounded report. T02 is done in source; live deployment and an
+observed <=7 recommendation run remain under T05.
+
 ## Boundary Parser — Verify & Mitigate (Posture B)

 ```task
@@ -368,6 +393,19 @@ Done when:
  is cluster/operator work outside this repo's SCOPE. T05 therefore stays
  `progress` until that live run exists; the in-repo deliverables are done.

+2026-06-30 follow-up: added forward-looking diagnostics so future validation
+failures carry llm-connect response metadata and a larger bounded raw-output
+preview in activity-core-owned evidence. Focused verification passed:
+`uv run pytest tests/test_llm_client.py tests/rules/test_executor.py tests/test_report_sinks.py -q`
+=> 39 passed. This improves future root-cause ability but does not replace the
+required live smoke proving graceful degradation on railiance01.
+
+2026-06-30 projection follow-up: local source projection now enforces the top-7
+prompt/schema contract. Remaining T05 proof is operational: deploy or sync the
+updated `k8s/railiance/20-runtime.yaml`, run `actcore-sync`/schedule smoke or wait
+for the next 07:20 Berlin fire, then confirm State Hub `daily_triage` evidence is
+`output_validated=true` with no more than 7 recommendations.
+
 ## Relationships

 - **Blocks / feeds:** `ACTIVITY-WP-0006-T03` (three clean scheduled runs) and
--- a/workplans/ACTIVITY-WP-0018-own-infra-automation-status.md
+++ b/workplans/ACTIVITY-WP-0018-own-infra-automation-status.md
@@ -0,0 +1,248 @@
+---
+id: ACTIVITY-WP-0018
+type: workplan
+title: "Own-infrastructure automation status surface"
+domain: infotech
+repo: activity-core
+status: finished
+owner: codex
+topic_slug: automation-observability
+created: "2026-06-29"
+updated: "2026-06-29"
+state_hub_workstream_id: "0220b38b-7c73-4601-9601-5f2c1a5b29e8"
+---
+
+# Own-infrastructure automation status surface
+
+## Goal
+
+Make activity-core's own scheduling and evidence infrastructure the explicit
+operating preference for durable automations, independent of any coding
+assistant-provided scheduler or reminder system.
+
+An operator should be able to answer a question like "How did our automations go
+since Friday?" with a repo-native command that does not require an LLM. Coding
+assistants may inspect or summarize that command's output, but they must not be
+the source of truth for scheduled execution, run history, or operational
+evidence.
+
+## Review notes
+
+The repo already owns the correct infrastructure direction:
+
+- `SCOPE.md` defines activity-core as the org-wide event bridge for cron,
+  one-off scheduled datetime, and event-triggered automation.
+- `Makefile` exposes sync and service targets, but no operator status target for
+  recent automation outcomes.
+- `docs/runbook.md` documents daily-triage verification through
+  `scripts/verify_daily_triage.py`, but that helper is activity-specific and
+  still reads like a checklist rather than the baseline answer surface for all
+  automations.
+- Existing workplan evidence shows the status question is operationally common:
+  2026-06-24 and 2026-06-25 daily triage runs were clean, while 2026-06-26 and
+  2026-06-27 fired on schedule but failed output validation. That distinction is
+  exactly what the baseline command must make obvious.
+
+## Task: Codify the own-infra scheduling preference
+
+```task
+id: ACTIVITY-WP-0018-T01
+status: done
+priority: high
+state_hub_task_id: "00127678-5ce4-4cb3-b81c-f42e04407c73"
+```
+
+Record the repository preference that durable automation scheduling, execution
+history, and run evidence belong to activity-core's own infrastructure: Temporal
+Schedules, NATS JetStream, activity-core run records, State Hub progress, and
+working-memory/report sinks.
+
+Acceptance:
+
+- `AGENTS.md` repo-specific instructions say not to use coding
+  assistant-provided automation tooling as the execution or evidence source for
+  activity-core automations.
+- `SCOPE.md` and `docs/runbook.md` describe coding assistants as callers or
+  summarizers of repo-native automation commands, not as schedulers.
+- The preference distinguishes durable automation from harmless local session
+  reminders: production/operational recurrence belongs to activity-core.
+- The text names the authoritative evidence sources and avoids tying the policy
+  to any one assistant product.
+
+2026-06-29 progress: Added the immediate repo-agent instruction in AGENTS.md
+that durable activity-core automations must use repo-owned infrastructure, not
+coding assistant automation/reminder/heartbeat tooling, as the execution or
+evidence source. Remaining T01 work is to carry the same preference into
+SCOPE.md and docs/runbook.md.
+
+## Task: Define the automation status evidence contract
+
+```task
+id: ACTIVITY-WP-0018-T02
+status: done
+priority: high
+state_hub_task_id: "17e6bb87-d4bf-4ef3-b91c-4bdfe2fe3492"
+```
+
+Define a small, deterministic report contract for answering recent automation
+status questions across all ActivityDefinitions.
+
+Acceptance:
+
+- The contract covers schedule state, expected fires in the requested window,
+  observed workflow runs, `activity_runs` rows, State Hub progress events,
+  working-memory/report sink evidence, and known validation or sink failures.
+- It defines normalized statuses such as `completed`, `running`, `retrying`,
+  `validation_failed`, `sink_failed`, `missed`, `disabled`, and `unknown`.
+- Partial data is explicit: if Temporal, Postgres, State Hub, or a sink path is
+  unavailable, the report includes warnings rather than silently passing or
+  failing the whole check.
+- The contract is safe for operator logs: no secrets, prompts, raw model output,
+  or credential-bearing URLs.
+- The contract can be emitted as JSON for scripts and rendered as concise text
+  for humans.
+
+## Task: Implement the non-LLM automation status CLI
+
+```task
+id: ACTIVITY-WP-0018-T03
+status: done
+priority: high
+state_hub_task_id: "7831f2fc-8b76-48fe-aa34-9dcc11ee84db"
+```
+
+Add a deterministic CLI, likely under `scripts/automation_status.py` or an
+`activity_core` module, that answers recent automation status questions without
+calling an LLM.
+
+Acceptance:
+
+- Supports `--since`, `--until`, activity name/id filters, JSON output, and a
+  concise human summary.
+- Accepts simple operator dates, including absolute dates and a documented
+  `friday`/`last-friday` style shortcut, resolving them to concrete dates in the
+  configured timezone.
+- Inspects all enabled scheduled ActivityDefinitions by default, not just daily
+  triage.
+- Uses live sources when configured: Postgres `activity_definitions` /
+  `activity_runs`, Temporal schedule and workflow visibility, State Hub
+  progress, and configured local report sink paths.
+- Degrades usefully when a source is unavailable and exits non-zero only for
+  real status failures or invalid input, not for optional evidence gaps that are
+  clearly reported.
+- Includes focused unit tests with fixture data for clean runs, validation
+  failures, missed runs, disabled schedules, and partial-source availability.
+
+## Task: Add the Make target baseline
+
+```task
+id: ACTIVITY-WP-0018-T04
+status: done
+priority: high
+state_hub_task_id: "451bdf62-b619-4ace-9262-46d20b912781"
+```
+
+Expose the CLI through a Make target that is easy for an operator or any coding
+assistant to run before attempting a prose summary.
+
+Acceptance:
+
+- `make automation-status SINCE=2026-06-26` prints the human-readable baseline.
+- `make automation-status SINCE=friday` is supported or documented with the
+  exact accepted shortcut.
+- A JSON form is available, either through `FORMAT=json` or a separate target
+  such as `make automation-status-json`.
+- The target does not require LLM credentials, coding assistant automation
+  tooling, or interactive prompts.
+- `make help` lists the target with a clear one-line description.
+
+## Task: Update operator docs and examples
+
+```task
+id: ACTIVITY-WP-0018-T05
+status: done
+priority: medium
+state_hub_task_id: "233659aa-e14a-4b3d-b156-d04f0fa16db6"
+```
+
+Update the runbook so "How did automations go since Friday?" has an obvious
+operator recipe.
+
+Acceptance:
+
+- `docs/runbook.md` has a short "Automation status" section near the scheduling
+  operations.
+- The docs include example output or a compact sample for the known daily
+  triage distinction: fired on time versus completed successfully versus output
+  validation failure.
+- The docs clarify that LLM summaries are optional convenience only; the Make
+  target output is the baseline evidence.
+- The daily-triage-specific helper is either kept as a lower-level diagnostic or
+  folded into the generalized status command.
+
+## Task: Verify against recent scheduled-run evidence
+
+```task
+id: ACTIVITY-WP-0018-T06
+status: done
+priority: medium
+state_hub_task_id: "24efbe9f-dfff-482f-9edc-456379c9a2aa"
+```
+
+Prove the new surface against the recent evidence that motivated this workplan.
+
+Acceptance:
+
+- Running the status command over the window starting Friday, 2026-06-26 shows
+  that the daily triage schedule fired on 2026-06-26 and 2026-06-27 but did not
+  produce clean validated reports.
+- The command distinguishes scheduling health from output/schema validation
+  failure.
+- Disabled or waiting schedules, such as the weekly coding retro gate when its
+  upstream read model is not available, are reported without being counted as
+  missed runs.
+- Verification results are recorded in this workplan and as a State Hub progress
+  note once the implementation lands.
+
+## Implementation Result
+
+Completed 2026-06-29: implemented the own-infrastructure automation status
+surface and codified the scheduling preference.
+
+Delivered:
+
+- `AGENTS.md` now states that durable activity-core automations use repo-owned
+  infrastructure, not coding assistant automation/reminder/heartbeat tooling, as
+  execution or evidence authority.
+- `SCOPE.md` and `docs/runbook.md` describe the deterministic status surface and
+  assistant boundary.
+- `src/activity_core/automation_status.py` and `scripts/automation_status.py`
+  provide the non-LLM CLI.
+- `make automation-status SINCE=...` and `make automation-status-json` expose the
+  baseline operator commands.
+- `tests/test_automation_status.py` covers date shortcuts, cron fire estimation,
+  completed runs, validation failures, missed runs, disabled schedules, partial
+  source availability, and working-memory evidence parsing.
+
+Verification:
+
+```bash
+python3 -m py_compile src/activity_core/automation_status.py scripts/automation_status.py tests/test_automation_status.py
+/home/worsch/.local/bin/uv run pytest tests/test_automation_status.py tests/test_daily_triage_verifier.py -q
+/home/worsch/.local/bin/uv run python scripts/automation_status.py \
+  --since 2026-06-26 --until 2026-06-27 --db-url '' \
+  --progress-event-type daily_triage --timeout-seconds 10 \
+  --working-memory-dir /tmp --format json
+```
+
+Results:
+
+- focused tests: `11 passed`;
+- `make help` lists `automation-status` and `automation-status-json`;
+- the 2026-06-26 through 2026-06-27 status run exited `1` as expected because
+  State Hub evidence classified daily triage activity
+  `6fca51fa-387a-4fd0-bc4e-d62c29eb859a` as `validation_failed` with two
+  non-secret evidence records: 2026-06-26 `Expecting ',' delimiter` and
+  2026-06-27 `Unterminated string`;
+- the same report classified the gated weekly coding retro as `disabled`, not
+  `missed`.
--- a/workplans/ACTIVITY-WP-0019-automation-schedule-inventory-targets.md
+++ b/workplans/ACTIVITY-WP-0019-automation-schedule-inventory-targets.md
@@ -0,0 +1,164 @@
+---
+id: ACTIVITY-WP-0019
+type: workplan
+title: "Automation schedule inventory Make targets"
+domain: infotech
+repo: activity-core
+status: ready
+owner: codex
+topic_slug: automation-inventory
+created: "2026-06-29"
+updated: "2026-06-29"
+state_hub_workstream_id: "21c73763-9adc-42f6-8fd2-1b8b33c2c770"
+---
+
+# Automation schedule inventory Make targets
+
+## Goal
+
+Provide a repo-native, non-LLM way to list every scheduled automation that
+activity-core knows about.
+
+`ACTIVITY-WP-0018` added the status surface for questions like "How did our
+automations go since Friday?". The next operator question is the inventory
+baseline: "What automations are scheduled at all?" That should be answerable
+through Make targets backed by activity-core's own ActivityDefinitions,
+database, and Temporal schedule metadata when available, independent of any
+coding assistant automation infrastructure.
+
+## Review notes
+
+- `Makefile` currently exposes `automation-status` and
+  `automation-status-json`, but no dedicated inventory/list target.
+- `scripts/automation_status.py` and `src/activity_core/automation_status.py`
+  already load scheduled ActivityDefinitions and compute their Temporal schedule
+  ids. The inventory target should reuse that parsing/loading posture where it
+  fits rather than creating a second discovery path.
+- `make sync-schedules` reconciles Temporal schedules from the
+  `activity_definitions` database, but it is an action target, not a read-only
+  operator inventory command.
+- The inventory command should remain useful in degraded local mode: file-backed
+  definitions are enough to list configured scheduled automations, while live
+  DB and Temporal visibility can enrich the output.
+
+## Task: Define the automation inventory contract
+
+```task
+id: ACTIVITY-WP-0019-T01
+status: todo
+priority: high
+state_hub_task_id: "8de24590-f9ee-4d0e-8692-b7ada9f232ed"
+```
+
+Define the fields and source precedence for a deterministic scheduled
+automation inventory report.
+
+Acceptance:
+
+- The report includes every ActivityDefinition with `trigger_type` of `cron` or
+  `scheduled`, including disabled definitions.
+- Each row includes id, name, enabled/disabled state, trigger type, schedule
+  expression or one-shot datetime, timezone, overlap/catchup policy when known,
+  and the derived Temporal schedule id.
+- The report identifies its source for each row: database, repo definition file,
+  Temporal visibility, or a combination.
+- If Temporal is reachable, the report adds paused/missing/drift hints without
+  mutating schedules.
+- Missing optional sources produce warnings, not silent omissions.
+- The JSON shape is stable enough for scripts and tests.
+
+## Task: Implement a non-mutating inventory CLI
+
+```task
+id: ACTIVITY-WP-0019-T02
+status: todo
+priority: high
+state_hub_task_id: "538cb9a5-48f3-470c-8518-29ee66c96678"
+```
+
+Add a deterministic CLI path for listing scheduled automations without requiring
+LLM credentials or coding assistant tooling.
+
+Acceptance:
+
+- A script or module command, likely sharing code with
+  `activity_core.automation_status`, supports human and JSON output.
+- The command is read-only: it does not call `sync-schedules`, upsert schedules,
+  delete schedules, enqueue workflows, or write State Hub evidence.
+- It supports filters by activity id, activity name, enabled state, and trigger
+  type.
+- It loads from the database when configured and falls back to repo definition
+  files when the database is unavailable or explicitly disabled.
+- It optionally enriches rows from Temporal when `TEMPORAL_HOST` is configured,
+  with bounded timeouts so an unreachable service does not hang the command.
+- Unit tests cover DB rows, file fallback, disabled definitions, Temporal
+  enrichment unavailable, and JSON output.
+
+## Task: Add Make targets
+
+```task
+id: ACTIVITY-WP-0019-T03
+status: todo
+priority: high
+state_hub_task_id: "f2001721-07f3-42f5-a15e-0c7d1b0ed801"
+```
+
+Expose the inventory command through Make targets that are easy for humans,
+scripts, and coding assistants to run before asking for a prose summary.
+
+Acceptance:
+
+- `make automation-list` prints a concise human-readable inventory.
+- `make automation-list-json` emits the same inventory as JSON.
+- Optional Make variables pass through cleanly, for example `ENABLED=true`,
+  `TRIGGER=cron`, `ACTIVITY_ID=<uuid>`, or `FORMAT=json`.
+- `make help` lists both targets with clear one-line descriptions.
+- The targets do not require LLM access, coding assistant automation tooling,
+  or interactive prompts.
+
+## Task: Document the inventory workflow
+
+```task
+id: ACTIVITY-WP-0019-T04
+status: todo
+priority: medium
+state_hub_task_id: "f687743b-3936-413e-ae50-d35484ae9a81"
+```
+
+Update operator documentation so the scheduled automation inventory path is
+discoverable next to the status path.
+
+Acceptance:
+
+- `docs/runbook.md` documents `make automation-list` and
+  `make automation-list-json`.
+- The docs distinguish inventory from status: inventory answers what is
+  configured; status answers what happened in a time window.
+- The docs state that the command is read-only and uses activity-core-owned
+  scheduling evidence.
+- The docs include a compact example of the expected human output.
+
+## Task: Verify against current repo and live/degraded sources
+
+```task
+id: ACTIVITY-WP-0019-T05
+status: todo
+priority: medium
+state_hub_task_id: "5317b532-5cef-4eff-b6d8-3e85bbca8e8a"
+```
+
+Prove the target against the current scheduled automation definitions and
+degraded local conditions.
+
+Acceptance:
+
+- `make automation-list` shows the current scheduled automations, including
+  daily triage and weekly scheduled definitions when present in the selected
+  source.
+- JSON output is valid and includes the same rows.
+- A DB-unavailable run falls back to repo definition files or reports a clear
+  warning if no definitions are discoverable.
+- A Temporal-unavailable run exits successfully with Temporal warnings rather
+  than hanging.
+- Focused tests pass and the result is recorded in this workplan before the
+  workplan is moved to `finished`.
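
The inventory contract (ACTIVITY-WP-0019-T01) and the database-to-file fallback (T02) can be illustrated with a minimal sketch. It is not the repository's implementation: `InventoryRow`, `build_inventory`, the row dictionary keys, and the sample definitions are hypothetical names and data chosen for illustration, and Temporal enrichment and schedule-id derivation are deliberately left out.

```python
import json
from dataclasses import asdict, dataclass

# Trigger types the workplan counts as "scheduled" automations.
SCHEDULED_TRIGGERS = {"cron", "scheduled"}


@dataclass
class InventoryRow:
    """Hypothetical row shape; field names are assumptions, not the repo contract."""
    id: str
    name: str
    enabled: bool
    trigger_type: str
    schedule: str
    source: str


def build_inventory(db_rows, file_rows):
    """Build a read-only inventory report.

    db_rows is None when the database is unavailable or disabled; in that
    case the report falls back to file-backed definitions and says so in a
    warning instead of silently omitting the source.
    """
    warnings = []
    if db_rows is None:
        warnings.append("database unavailable; using repo definition files")
        rows, source = file_rows, "file"
    else:
        rows, source = db_rows, "database"

    # Keep disabled definitions: the inventory lists what is configured,
    # not only what is currently active.
    inventory = [
        InventoryRow(
            id=r["id"],
            name=r["name"],
            enabled=r.get("enabled", True),
            trigger_type=r["trigger_type"],
            schedule=r.get("schedule", ""),
            source=source,
        )
        for r in rows
        if r.get("trigger_type") in SCHEDULED_TRIGGERS
    ]
    if not inventory:
        warnings.append("no scheduled definitions discovered")

    ordered = sorted(inventory, key=lambda row: row.name)
    return {"rows": [asdict(row) for row in ordered], "warnings": warnings}


if __name__ == "__main__":
    # Made-up sample definitions, used only to exercise the sketch.
    sample = [
        {"id": "a", "name": "example-daily", "enabled": True,
         "trigger_type": "cron", "schedule": "20 7 * * *"},
        {"id": "b", "name": "example-weekly", "enabled": False,
         "trigger_type": "cron", "schedule": "0 9 * * 1"},
        {"id": "c", "name": "example-event", "enabled": True,
         "trigger_type": "event"},
    ]
    print(json.dumps(build_inventory(None, sample), indent=2))
```

Returning warnings alongside the rows, rather than raising, matches the acceptance criterion that missing optional sources produce warnings and that a degraded run still exits successfully.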