Commit Graph

1758 Commits

Author SHA1 Message Date
b84e474ac5 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-06-18:
  - update .custodian-brief.md for activity-core
2026-06-18 13:16:24 +02:00
498d90b965 chore: promote coulomb-loop pilot schedule to daily stabilize phase 2026-06-18 12:09:25 +02:00
a2a6a30d8b chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-06-18:
  - update .custodian-brief.md for activity-core
2026-06-18 12:07:56 +02:00
9a72c9f210 fix: unwrap single-key kaizen resolver payloads in resolve_context
When discover_kaizen_projects returns {"projects": [...]} bound to
context.projects, for_each can iterate the list directly. Multi-key
summaries (e.g. repo SBOM bulk) remain unchanged.
2026-06-18 08:11:09 +02:00
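The unwrapping described in 9a72c9f210 can be sketched roughly as follows. This is a minimal illustration, not the code in resolve_context: the helper name unwrap_single_key is hypothetical, and restricting the unwrap to list values is an assumption based on the for_each use case in the message.

```python
from typing import Any


def unwrap_single_key(payload: Any) -> Any:
    """Return the inner list of a single-key dict; leave anything else untouched.

    Hypothetical helper: the name and the list-only rule are assumptions.
    """
    if isinstance(payload, dict) and len(payload) == 1:
        (value,) = payload.values()
        if isinstance(value, list):
            return value
    # Multi-key summaries (and non-dict payloads) pass through unchanged.
    return payload
```

With this shape, a payload like {"projects": [...]} becomes the bare list that for_each can iterate, while a multi-key summary is returned as-is.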
517bf9c133 Add kaizen context resolver for scheduled agent fleet discovery.
Implement discover_kaizen_scheduled_repos and discover_kaizen_projects per
kaizen-agentic ADR-005 contract: State Hub roster, roster.yaml filter, schedule
validation, and prepare_command emission. Register kaizen/resolver/shell source
types with unit tests and runbook dry-run instructions.
2026-06-18 07:46:46 +02:00
29bf87a44c Opt in to coulomb-loop kaizen bootstrap scheduling.
Add .kaizen/schedule.yml for coach and optimization agent runs during the
hourly bootstrap phase of the coulomb-loop engagement.
2026-06-18 04:53:51 +02:00
1a279e9f22 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-06-17:
  - update .custodian-brief.md for activity-core
2026-06-17 23:59:37 +02:00
bcddc88320 Close ops inventory probe handoff 2026-06-16 03:51:02 +02:00
7613f1e5c7 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-06-16:
  - update .custodian-brief.md for activity-core
2026-06-16 03:49:15 +02:00
1deb2999a1 Add capability registry with seed entry from reuse-surface
Bootstrap registry layout and migrate helix_forge capability owned by
this repository (REUSE-WP-0014-T02).
2026-06-16 01:46:52 +02:00
ab17378e0d Add schedule metadata artifacts 2026-06-07 21:09:08 +02:00
14b2d40eb7 Implement weekly coding retro schedule 2026-06-07 20:58:34 +02:00
992fe94034 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-06-07:
  - update .custodian-brief.md for activity-core
2026-06-07 20:56:40 +02:00
4e8ccbb344 Set up daily WSJF closure gates 2026-06-07 11:00:03 +02:00
418eb4ffda Add schedule smoke test routine 2026-06-06 15:32:57 +02:00
e926636617 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-06-05:
  - update .custodian-brief.md for activity-core
2026-06-05 23:41:24 +02:00
4b1b3e1b5f Wire ops inventory probes for Railiance 2026-06-05 23:40:25 +02:00
5838077327 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-06-05:
  - update .custodian-brief.md for activity-core
2026-06-05 23:17:52 +02:00
ebcaacc0b5 chore(consistency): renormalize lifecycle state [auto]
Updated by fix-consistency on 2026-06-05:
  - workplan status: ready → active
2026-06-05 23:17:48 +02:00
41d3e75a88 Implement ops inventory probe evidence slice 2026-06-05 23:16:40 +02:00
ee1f805c0b Sync ACTIVITY-WP-0007 with State Hub 2026-06-05 22:49:20 +02:00
15f495361e chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-06-05:
  - update .custodian-brief.md for activity-core
2026-06-05 22:47:56 +02:00
3b8bac26da Add ops inventory probe runner workplan 2026-06-05 22:46:11 +02:00
42e373aba1 Harden WSJF triage report recovery 2026-06-05 19:27:03 +02:00
20d4f26166 Implement post-triage operational hardening 2026-06-04 12:15:07 +02:00
8a33ec44b6 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-06-04:
  - update .custodian-brief.md for activity-core
2026-06-04 09:58:37 +02:00
b2d56624b2 Normalize legacy WP-0003 status 2026-06-03 15:28:59 +02:00
87d3979c20 Record State Hub IDs for WP-0006 2026-06-03 12:09:28 +02:00
33cc19ad7c chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-06-03:
  - update .custodian-brief.md for activity-core
2026-06-03 12:07:05 +02:00
30598fd1ad Expand rule actions for per-repo tasks
Add safe action interpolation and for_each binding for rule fan-out, update the weekly SBOM definition, cover the new evaluation path, and reconcile activity-core scope/workplans for the State Hub sync.
2026-06-03 11:58:24 +02:00
4b4e162c44 Log raw LLM output preview on instruction validation failure
The CUST-WP-0045 canary failed validation twice without leaving any
record of what the model actually returned. The warning logged only the
error message ($: missing required property 'summary'), not the JSON
shape that triggered it — so diagnosing required modifying code and
re-running. Log a 2KB preview of the offending raw output alongside the
error so the next failure of this shape is one grep away from diagnosis.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-02 13:06:43 +02:00
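One way to log a bounded preview of the raw model output, as 4b4e162c44 describes, is sketched below. The logger name, the function names, and the truncation marker are all assumptions for illustration; only the 2KB limit comes from the commit message.

```python
import logging

logger = logging.getLogger("instruction_validation")  # hypothetical logger name

PREVIEW_BYTES = 2048  # the 2KB limit mentioned in the commit message


def preview(raw: str, limit: int = PREVIEW_BYTES) -> str:
    """Return at most `limit` UTF-8 bytes of `raw`, marking any truncation."""
    encoded = raw.encode("utf-8", errors="replace")
    if len(encoded) <= limit:
        return raw
    # errors="ignore" drops a multi-byte character cut in half by the slice.
    return encoded[:limit].decode("utf-8", errors="ignore") + "...[truncated]"


def log_validation_failure(error: str, raw_output: str) -> None:
    """Log the validation error together with a preview of the offending output."""
    logger.warning(
        "instruction validation failed: %s | raw output preview: %r",
        error,
        preview(raw_output),
    )
```

Keeping the error and the preview in one log record means a single grep on the error text also surfaces the output that caused it.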
c79d0980a9 Make Temporal activity timeout env-configurable (ADHOC-2026-06-01-T03)
The CUST-WP-0045 daily triage canary on 2026-06-01 hit a BrokenPipeError
on the llm-connect side. Two 5-minute timeouts were racing:

- _ACTIVITY_TIMEOUT = timedelta(minutes=5) in workflows.py
- LLM_CONNECT_TIMEOUT_SECONDS default 300 in llm_client.py

The 10KB curated digest + max_depth:2 + JSON schema enforcement pushed
Claude past 5 minutes. Whichever timer fired first killed the httpx call;
the model's late response arrived to a closed socket.

Read _ACTIVITY_TIMEOUT from ACTIVITY_TIMEOUT_SECONDS env (default 900 —
15 minutes) so judgement-call activities have headroom for slow LLM runs.
Operators should also widen httpx via LLM_CONNECT_TIMEOUT_SECONDS=840 so
httpx still times out slightly before Temporal, preserving the
clean-error contract.

Tests: 120 passed, 1 skipped.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-02 08:10:24 +02:00
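The timeout arrangement in c79d0980a9 might look roughly like the sketch below. The environment variable names and the defaults (900 and 300) come from the commit message; the function names and the ordering check are hypothetical additions, not code from workflows.py or llm_client.py.

```python
import os
from datetime import timedelta
from typing import Mapping, Optional


def activity_timeout(env: Optional[Mapping[str, str]] = None) -> timedelta:
    """Read the Temporal activity timeout, defaulting to 900 seconds (15 minutes)."""
    env = os.environ if env is None else env
    return timedelta(seconds=int(env.get("ACTIVITY_TIMEOUT_SECONDS", "900")))


def http_timeout_seconds(env: Optional[Mapping[str, str]] = None) -> float:
    """Read the llm-connect HTTP timeout, defaulting to 300 seconds."""
    env = os.environ if env is None else env
    return float(env.get("LLM_CONNECT_TIMEOUT_SECONDS", "300"))


def http_fires_first(env: Optional[Mapping[str, str]] = None) -> bool:
    """True when the HTTP client times out strictly before the activity does.

    Hypothetical sanity check: it encodes the "httpx slightly before Temporal"
    ordering that the commit message recommends to operators.
    """
    return http_timeout_seconds(env) < activity_timeout(env).total_seconds()
```

Under these assumptions, setting LLM_CONNECT_TIMEOUT_SECONDS=840 against the 900-second default leaves a 60-second margin, whereas two equal 300-second timers (the original race) fail the check.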
a8d3cc2782 Fix repo_sbom_status resolver — close ADHOC-2026-06-01-T01
The state-hub resolver was calling GET /sbom/status?repo={slug}, which State
Hub does not expose. Real SBOM routes are /sbom/, /sbom/{slug},
/sbom/snapshots/, /sbom/snapshots/{id}, /sbom/ingest/, /sbom/report/licences/.
The weekly-sbom-staleness ActivityDefinition was passing params {repos: all}
and the resolver was reading params.get("repo_slug", ""), so the URL
collapsed to /sbom/status?repo= and 404'd. _fetch_json swallowed the error,
the rule context.repos.sbom_age_days > 30 evaluated against {} and never
matched, and the weekly SBOM check has been a silent no-op for as long as
the route mismatch has existed.

Resolver now supports two modes selected by params:
- single-repo: {repo_slug: foo} → GET /sbom/{foo}, returns
  {repo_slug, last_sbom_at, sbom_age_days, has_sbom}
- bulk: {repos: all} → GET /repos/, computes per-repo age, returns the
  worst repo's fields hoisted to the top of the result alongside
  stale_count, total_count, worst_* fields, and the full per-repo list

Never-scanned repos get a 99999 sentinel age so threshold rules treat
them as very stale without forcing the rule to special-case None.

Hoisting the worst entry to the top preserves the existing rule
expression context.repos.sbom_age_days > 30 (and target_repo:
context.repos.repo_slug, though that field is a separate interpolation
gap tracked as ADHOC-2026-06-01-T02). The integration tests'
aspirational per-repo iteration model is left intact.

Live validation against State Hub on 2026-06-01:
- single: activity-core → 36 days since 2026-04-26 ingest
- bulk: 48 repos total, 46 stale (>30d), worst is info-tech-canon (never
  scanned), rule expression evaluates True

Tests: 120 passed, 1 skipped.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-02 03:31:56 +02:00
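The bulk mode in a8d3cc2782 (sentinel age plus hoisting of the worst repo) can be illustrated with the sketch below. The field names repo_slug, last_sbom_at, sbom_age_days, has_sbom, stale_count and total_count, and the 99999 sentinel, come from the commit message. The function names, the input shape, and the "repos" key for the per-repo list are assumptions, and the worst_* fields are omitted because the message does not name them.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional

NEVER_SCANNED_AGE = 99999  # sentinel age from the commit message


def sbom_age_days(last_sbom_at: Optional[str], now: datetime) -> int:
    """Age in whole days since the last SBOM ingest, or the sentinel if never scanned."""
    if last_sbom_at is None:
        return NEVER_SCANNED_AGE
    scanned = datetime.fromisoformat(last_sbom_at)
    if scanned.tzinfo is None:
        # Assumption: naive timestamps are treated as UTC.
        scanned = scanned.replace(tzinfo=timezone.utc)
    return (now - scanned).days


def bulk_summary(
    repos: List[Dict[str, Any]], now: datetime, threshold_days: int = 30
) -> Dict[str, Any]:
    """Compute per-repo ages and hoist the worst repo's fields to the top level."""
    per_repo = []
    for repo in repos:
        last = repo.get("last_sbom_at")
        per_repo.append({
            "repo_slug": repo["repo_slug"],
            "last_sbom_at": last,
            "sbom_age_days": sbom_age_days(last, now),
            "has_sbom": last is not None,
        })
    summary: Dict[str, Any] = {
        "total_count": len(per_repo),
        "stale_count": sum(1 for e in per_repo if e["sbom_age_days"] > threshold_days),
        "repos": per_repo,  # hypothetical key for the full per-repo list
    }
    if per_repo:
        # Hoisting keeps a rule such as `sbom_age_days > 30` working on the summary.
        summary.update(max(per_repo, key=lambda e: e["sbom_age_days"]))
    return summary
```

Because a never-scanned repo carries the sentinel age, it always sorts as the worst entry, so a threshold rule fires without special-casing None.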
5d3fb33c6b Capture sbom_status resolver bug as ADHOC-2026-06-01
Surfaced while bringing up the dev worker for the CUST-WP-0045 T06 cutover.
weekly-sbom-staleness fires its state-hub resolver with query
repo_sbom_status, which hits GET /sbom/status?repo=. State Hub does not
expose that route, so _fetch_json returns {} and the rule
context.repos.sbom_age_days > 30 silently no-ops. The weekly SBOM check has
been a no-op for as long as the route mismatch has existed. Logged as a
low-priority adhoc rather than promoting to a workplan because the resolver
and definition both need a one-line decision (single-repo vs fan-out), not
multi-phase design.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-02 03:16:12 +02:00
ca6d80ec07 Enable hourly RecentlyOnScope rollout 2026-05-23 02:51:54 +02:00
5055f3eaca Add State Hub RecentlyOnScope invocation 2026-05-22 16:14:10 +02:00
f4c38e2d5f Record state hub IDs for railiance deployment 2026-05-22 13:51:51 +02:00
e2aac3ad8c Deploy activity-core on railiance01 2026-05-22 13:49:46 +02:00
cf92f0d686 Forward instruction schemas to llm-connect 2026-05-21 03:19:27 +02:00
5c4f96e7aa Pass instruction depth config to llm-connect 2026-05-19 20:55:35 +02:00
1ff8b14d1b Fix ActivityDefinition sync for daily triage canary 2026-05-19 20:13:23 +02:00
6cb0718e90 Add curated daily triage digest 2026-05-19 19:09:21 +02:00
3110399b11 Add instruction report sinks 2026-05-19 18:36:58 +02:00
0dc342eb1b Wire instruction report execution 2026-05-19 18:28:23 +02:00
0e7084207e Extend State Hub context resolver for daily triage 2026-05-19 15:59:12 +02:00
5bb61fdef5 Refresh agent instruction files 2026-05-18 16:55:39 +02:00
00e688bd8e fix(WP-0004): live deployment fixes from integration test
- Dockerfile: copy alembic.ini + migrations/ so actcore-migrate works
- docker-compose.railiance.yml:
    - Temporal: add dynamicconfig volume mount + correct DYNAMIC_CONFIG_FILE_PATH
    - Temporal: healthcheck uses 'temporal operator cluster health' (not tctl)
    - NATS: add monitoring port -m 8222 for wget-based healthcheck
    - actcore-api healthcheck: use Python urllib (curl absent from slim image)
- api.py: fix /health Temporal probe — Client has no describe_namespace;
    use workflow_service.get_system_info(GetSystemInfoRequest()) instead
- Makefile: grep -Eh to suppress filename prefix when MAKEFILE_LIST has
    multiple files (.env included via -include)

All 8 services start cleanly; /health returns {"status":"ok",...} HTTP 200;
SIGTERM drains worker cleanly within grace period; make help correct.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 09:23:14 +02:00
94bd34231c chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-05-15:
  - update .custodian-brief.md for activity-core
2026-05-15 00:06:45 +02:00
a9d2c12212 chore(WP-0004): mark workplan done
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 00:05:01 +02:00
2a8e6cfe7f feat(WP-0004): railiance deployment & service ops
- Dockerfile (multi-stage, uv-based, slim runtime)
- .dockerignore
- docker-compose.railiance.yml (Temporal + NATS + PG, no Elasticsearch)
- GET /health endpoint (db + temporal probes, 200/503)
- .env.example (complete env var reference)
- Makefile: migrate, sync-all, dev-up/down, railiance-up/down,
  start-worker, start-api, start-event-router, help targets;
  extracted sync-event-types Python to scripts/sync_event_types.py
- SIGTERM graceful shutdown in worker.py and event_router.py
- docs/runbook.md: Railiance deployment section

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 00:04:39 +02:00