42 Commits

Author SHA1 Message Date
bcddc88320 Close ops inventory probe handoff 2026-06-16 03:51:02 +02:00
14b2d40eb7 Implement weekly coding retro schedule 2026-06-07 20:58:34 +02:00
4e8ccbb344 Set up daily WSJF closure gates 2026-06-07 11:00:03 +02:00
418eb4ffda Add schedule smoke test routine 2026-06-06 15:32:57 +02:00
4b1b3e1b5f Wire ops inventory probes for Railiance 2026-06-05 23:40:25 +02:00
ebcaacc0b5 chore(consistency): renormalize lifecycle state [auto]
Updated by fix-consistency on 2026-06-05:
  - workplan status: ready → active
2026-06-05 23:17:48 +02:00
41d3e75a88 Implement ops inventory probe evidence slice 2026-06-05 23:16:40 +02:00
ee1f805c0b Sync ACTIVITY-WP-0007 with State Hub 2026-06-05 22:49:20 +02:00
3b8bac26da Add ops inventory probe runner workplan 2026-06-05 22:46:11 +02:00
42e373aba1 Harden WSJF triage report recovery 2026-06-05 19:27:03 +02:00
20d4f26166 Implement post-triage operational hardening 2026-06-04 12:15:07 +02:00
b2d56624b2 Normalize legacy WP-0003 status 2026-06-03 15:28:59 +02:00
87d3979c20 Record State Hub IDs for WP-0006 2026-06-03 12:09:28 +02:00
30598fd1ad Expand rule actions for per-repo tasks
Add safe action interpolation and for_each binding for rule fan-out, update the weekly SBOM definition, cover the new evaluation path, and reconcile activity-core scope/workplans for the State Hub sync.
2026-06-03 11:58:24 +02:00
c79d0980a9 Make Temporal activity timeout env-configurable (ADHOC-2026-06-01-T03)
The CUST-WP-0045 daily triage canary on 2026-06-01 hit a BrokenPipeError
on the llm-connect side. Two 5-minute timeouts were racing:

- _ACTIVITY_TIMEOUT = timedelta(minutes=5) in workflows.py
- LLM_CONNECT_TIMEOUT_SECONDS default 300 in llm_client.py

The 10KB curated digest + max_depth:2 + JSON schema enforcement pushed
Claude past 5 minutes. Whichever timer fired first killed the httpx call;
the model's late response arrived to a closed socket.

Read _ACTIVITY_TIMEOUT from ACTIVITY_TIMEOUT_SECONDS env (default 900 —
15 minutes) so judgement-call activities have headroom for slow LLM runs.
Operators should also widen httpx via LLM_CONNECT_TIMEOUT_SECONDS=840 so
httpx still times out slightly before Temporal, preserving the
clean-error contract.

Tests: 120 passed, 1 skipped.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-02 08:10:24 +02:00
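Note on c79d0980a9: the timeout ordering described above can be sketched roughly as follows. This is an illustration only, not the code in workflows.py or llm_client.py. The helper names load_timeouts and httpx_fires_first are hypothetical; only the env var names and the defaults (900 and 300) come from the commit message.

```python
import os
from datetime import timedelta


def load_timeouts(env=None):
    """Read both timeouts from the environment (hypothetical helper).

    Returns (activity_timeout, llm_connect_seconds). Defaults follow the
    commit message: 900 s for the Temporal activity, 300 s for httpx.
    """
    env = os.environ if env is None else env
    activity_seconds = int(env.get("ACTIVITY_TIMEOUT_SECONDS", "900"))
    llm_seconds = int(env.get("LLM_CONNECT_TIMEOUT_SECONDS", "300"))
    return timedelta(seconds=activity_seconds), llm_seconds


def httpx_fires_first(activity_timeout, llm_seconds):
    """True when the HTTP client gives up strictly before the activity timer.

    That ordering is what lets the client raise a clean error instead of the
    activity being killed while a response is still in flight.
    """
    return llm_seconds < activity_timeout.total_seconds()


if __name__ == "__main__":
    activity_timeout, llm_seconds = load_timeouts()
    print(activity_timeout, llm_seconds, httpx_fires_first(activity_timeout, llm_seconds))
```

With the recommended LLM_CONNECT_TIMEOUT_SECONDS=840 and the new 900 s default, the check holds with a 60 s margin. With both timers at 300 s, as before this commit, it does not.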
a8d3cc2782 Fix repo_sbom_status resolver — close ADHOC-2026-06-01-T01
The state-hub resolver was calling GET /sbom/status?repo={slug}, which State
Hub does not expose. Real SBOM routes are /sbom/, /sbom/{slug},
/sbom/snapshots/, /sbom/snapshots/{id}, /sbom/ingest/, /sbom/report/licences/.
The weekly-sbom-staleness ActivityDefinition was passing params {repos: all}
and the resolver was reading params.get("repo_slug", ""), so the URL
collapsed to /sbom/status?repo= and 404'd. _fetch_json swallowed the error,
the rule context.repos.sbom_age_days > 30 evaluated against {} and never
matched, and the weekly SBOM check has been a silent no-op for as long as
the route mismatch has existed.

Resolver now supports two modes selected by params:
- single-repo: {repo_slug: foo} → GET /sbom/{foo}, returns
  {repo_slug, last_sbom_at, sbom_age_days, has_sbom}
- bulk: {repos: all} → GET /repos/, computes per-repo age, returns the
  worst repo's fields hoisted to the top of the result alongside
  stale_count, total_count, worst_* fields, and the full per-repo list

Never-scanned repos get a 99999 sentinel age so threshold rules treat
them as very stale without forcing the rule to special-case None.

Hoisting the worst entry to the top preserves the existing rule
expression context.repos.sbom_age_days > 30 (and target_repo:
context.repos.repo_slug, though that field is a separate interpolation
gap tracked as ADHOC-2026-06-01-T02). The integration tests'
aspirational per-repo iteration model is left intact.

Live validation against State Hub on 2026-06-01:
- single: activity-core → 36 days since 2026-04-26 ingest
- bulk: 48 repos total, 46 stale (>30d), worst is info-tech-canon (never
  scanned), rule expression evaluates True

Tests: 120 passed, 1 skipped.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-02 03:31:56 +02:00
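Note on a8d3cc2782: a minimal sketch of the bulk-mode aggregation described above, assuming the per-repo data has already been fetched. It is not the resolver's actual code. The function names, the "repos" key for the per-repo list, and the input shape are assumptions; the 99999 sentinel, the 30-day threshold, and the hoisting of the worst entry come from the commit message.

```python
from datetime import datetime, timezone

NEVER_SCANNED_AGE_DAYS = 99999  # sentinel named in the commit message


def sbom_age_days(last_sbom_at, now):
    """Age in whole days, or the sentinel when the repo was never scanned."""
    if last_sbom_at is None:
        return NEVER_SCANNED_AGE_DAYS
    return (now - last_sbom_at).days


def summarize_bulk(repos, now, threshold_days=30):
    """repos is a list of (repo_slug, last_sbom_at or None) pairs (assumed shape)."""
    per_repo = [
        {
            "repo_slug": slug,
            "last_sbom_at": ts,
            "sbom_age_days": sbom_age_days(ts, now),
            "has_sbom": ts is not None,
        }
        for slug, ts in repos
    ]
    summary = {
        "stale_count": sum(r["sbom_age_days"] > threshold_days for r in per_repo),
        "total_count": len(per_repo),
        "repos": per_repo,  # hypothetical key for the full per-repo list
    }
    if per_repo:
        # Hoist the worst repo's fields to the top level so an expression such
        # as `context.repos.sbom_age_days > 30` keeps working unchanged.
        worst = max(per_repo, key=lambda r: r["sbom_age_days"])
        summary = {**worst, **summary}
    return summary


if __name__ == "__main__":
    now = datetime(2026, 6, 1, tzinfo=timezone.utc)
    ingest = datetime(2026, 4, 26, tzinfo=timezone.utc)
    print(summarize_bulk([("activity-core", ingest), ("info-tech-canon", None)], now))
```

Because the sentinel is an ordinary large integer, a never-scanned repo sorts as the worst entry and trips any threshold rule without a None check.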
5d3fb33c6b Capture sbom_status resolver bug as ADHOC-2026-06-01
Surfaced while bringing up the dev worker for the CUST-WP-0045 T06 cutover.
weekly-sbom-staleness fires its state-hub resolver with query
repo_sbom_status, which hits GET /sbom/status?repo=. State Hub does not
expose that route, so _fetch_json returns {} and the rule
context.repos.sbom_age_days > 30 silently no-ops. The weekly SBOM check has
been a no-op for as long as the route mismatch has existed. Logged as a
low-priority adhoc rather than promoting to a workplan because the resolver
and definition both need a one-line decision (single-repo vs fan-out), not
multi-phase design.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-02 03:16:12 +02:00
f4c38e2d5f Record state hub IDs for railiance deployment 2026-05-22 13:51:51 +02:00
e2aac3ad8c Deploy activity-core on railiance01 2026-05-22 13:49:46 +02:00
a9d2c12212 chore(WP-0004): mark workplan done
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 00:05:01 +02:00
2a8e6cfe7f feat(WP-0004): railiance deployment & service ops
- Dockerfile (multi-stage, uv-based, slim runtime)
- .dockerignore
- docker-compose.railiance.yml (Temporal + NATS + PG, no Elasticsearch)
- GET /health endpoint (db + temporal probes, 200/503)
- .env.example (complete env var reference)
- Makefile: migrate, sync-all, dev-up/down, railiance-up/down,
  start-worker, start-api, start-event-router, help targets;
  extracted sync-event-types Python to scripts/sync_event_types.py
- SIGTERM graceful shutdown in worker.py and event_router.py
- docs/runbook.md: Railiance deployment section

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 00:04:39 +02:00
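Note on 2a8e6cfe7f: the 200/503 behaviour of the health endpoint could look roughly like the sketch below. It is framework-agnostic and is not the project's handler. The function names, the probe signature (an async callable that raises on failure), and the 2-second timeout are assumptions.

```python
import asyncio


async def run_probe(probe, timeout=2.0):
    """Run one async probe; report 'ok' or a short error label."""
    try:
        await asyncio.wait_for(probe(), timeout)
        return "ok"
    except Exception as exc:  # any failure, including a timeout, marks the probe unhealthy
        return f"error: {type(exc).__name__}"


async def health(probes):
    """probes maps a name (e.g. 'db', 'temporal') to an async callable.

    Returns (status_code, checks): 200 only when every probe succeeded,
    503 otherwise.
    """
    names = list(probes)
    results = await asyncio.gather(*(run_probe(probes[name]) for name in names))
    checks = dict(zip(names, results))
    status = 200 if all(value == "ok" for value in checks.values()) else 503
    return status, checks


if __name__ == "__main__":
    async def ok():
        return None

    async def broken():
        raise ConnectionError("temporal unreachable")

    print(asyncio.run(health({"db": ok, "temporal": broken})))
```

Running the probes concurrently keeps the endpoint's latency bounded by the slowest probe rather than the sum of all probes.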
827ef9c1a0 feat(WP-0003c): context adapters, first ActivityDefinition, full test suite
T51: ContextResolver ABC + CONTEXT_RESOLVER_REGISTRY; resolve_context activity
updated to dispatch via registry (warns + binds {} on failure, never aborts run).
T52: RepoScopingContextResolver with 5-min in-process cache.
T53: StateHubContextResolver (no cache) for domain_summary and repo_sbom_status.
T54: activity-definitions/weekly-sbom-staleness.md (Monday 09:00 Berlin, cron
trigger, flag-stale-sbom rule at >30 days) + tasks/sbom-rescan.md template.
T55: 51 parametrized evaluator tests — all whitelisted operators, unsafe
expression rejection, empty condition, missing attribute, nested context access.
T56: 15 executor safety tests — UntrustedFieldError, object-type rejection,
injection fixture, LLM retry on bad JSON, review_required field.
T57: 6 integration tests — parses real definition, evaluates rule per-repo
(stale/fresh boundary), emits via NullSink, verifies spawn log entries.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-14 23:24:48 +02:00
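Note on 827ef9c1a0 (T51): the registry dispatch with "warns + binds {} on failure" semantics might be sketched as below. The names ContextResolver and CONTEXT_RESOLVER_REGISTRY come from the commit message. The resolve() signature, the source dict keys ("name", "type", "params"), and the synchronous style are assumptions made to keep the sketch short; the real activity is presumably async.

```python
import logging
from abc import ABC, abstractmethod

logger = logging.getLogger("context")


class ContextResolver(ABC):
    """Base class for context adapters (method signature is assumed)."""

    @abstractmethod
    def resolve(self, params: dict) -> dict:
        ...


# Maps a source type string to a resolver instance.
CONTEXT_RESOLVER_REGISTRY: dict = {}


def resolve_context(sources):
    """Resolve every source; a failing source binds {} instead of aborting."""
    context = {}
    for source in sources:
        name = source["name"]
        try:
            resolver = CONTEXT_RESOLVER_REGISTRY[source["type"]]
            context[name] = resolver.resolve(source.get("params", {}))
        except Exception as exc:  # unknown type or resolver failure
            logger.warning("context source %r failed: %s", name, exc)
            context[name] = {}
    return context
```

Binding {} keeps the run alive, but as a8d3cc2782 shows, it also lets a broken resolver go unnoticed, so the warning is the only signal that a rule evaluated against empty context.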
9abb69d179 chore(consistency): sync task status from DB [auto] 2026-05-14 23:03:29 +02:00
c3a256509b feat(event-bridge): WP-0003a — domain model, rules module, event type registry
Implements phases 7–8 of the Event Bridge architecture (custodian-WP-0003a).

Domain model (T34, T40):
- Added RuleDef, InstructionDef, ActionDef to models.py
- Updated ActivityDefinition with rules/instructions fields (task_templates deprecated)
- Formalized EventEnvelope: id, type, version, timestamp, publisher, attributes
- Added from_nats_message() and from_webhook_payload() classmethods

Rules module (T35, T36, T37):
- src/activity_core/rules/ skeleton with boundary enforcement
- evaluate_condition() — sandboxed AST walker, whitelisted nodes only, never exec()
- execute_instruction() — LLM task generation with trusted_fields injection guard
- tests/rules/test_boundary.py verifies no cross-boundary imports

Infrastructure (T38, T39):
- Alembic migrations 0004 (task_spawn_log) and 0005 (event_types)
- IssueSink ABC + IssueCoreRestSink (REST) + NullSink (testing)
- TaskSpawnLog and EventType ORM models

Event type registry (T41, T42, T43):
- event_type_registry.py: file scanner, parser, DB sync, in-process lookup
- ACTIVITY_CURATOR_GATE env var (disabled|required) + approve endpoint
- Three org event type definitions: org.repo.registered, org.workstream.completed,
  org.activity.run.completed

All 10 tests pass. Boundary test confirms rules/ isolation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-14 22:01:15 +02:00
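Note on c3a256509b (T35): a whitelisting AST walker of the kind described above can be sketched as follows. This is not the code in src/activity_core/rules/. The exception name, the exact set of allowed nodes, and the choice to treat a comparison against a missing value as False are all assumptions.

```python
import ast
import operator

# Comparison operators the sketch accepts; anything else is rejected.
_COMPARE = {
    ast.Gt: operator.gt, ast.GtE: operator.ge,
    ast.Lt: operator.lt, ast.LtE: operator.le,
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
}


class UnsafeExpressionError(ValueError):
    """Raised when an expression contains a node outside the whitelist."""


def evaluate_condition(expression, context):
    """Evaluate a rule condition against nested dicts without eval/exec."""
    tree = ast.parse(expression, mode="eval")
    return bool(_walk(tree.body, {"context": context}))


def _walk(node, names):
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        if node.id not in names:
            raise UnsafeExpressionError(f"unknown name: {node.id}")
        return names[node.id]
    if isinstance(node, ast.Attribute):
        base = _walk(node.value, names)
        # Attribute access is only ever a dict lookup, never getattr().
        if not isinstance(base, dict) or node.attr.startswith("_"):
            raise UnsafeExpressionError(f"illegal attribute: {node.attr}")
        return base.get(node.attr)
    if isinstance(node, ast.Compare) and len(node.ops) == 1:
        op = _COMPARE.get(type(node.ops[0]))
        if op is None:
            raise UnsafeExpressionError("operator not allowed")
        try:
            return op(_walk(node.left, names), _walk(node.comparators[0], names))
        except TypeError:
            return False  # e.g. a missing value (None) compared with a number
    if isinstance(node, ast.BoolOp):
        values = [_walk(value, names) for value in node.values]
        return all(values) if isinstance(node.op, ast.And) else any(values)
    raise UnsafeExpressionError(f"node not allowed: {type(node).__name__}")
```

Calls, subscripts, lambdas and every other node type fall through to the final raise, so the whitelist fails closed when new syntax appears.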
ee81adb2fa chore(workplan): split WP-0003 into three context-window-sized parts
WP-0003 (24 tasks) exceeds the single-run limit. Split by build phase:
  0003a — phases 7–8: domain model, rules module, IssueSink, event type registry (10 tasks)
  0003b — phases 9–10: ActivityDefinition parser, workflow wiring, triggers, webhooks (7 tasks)
  0003c — phases 11–12: context adapters, first ActivityDefinition, test suite (7 tasks)

Original WP-0003 marked superseded. Hub task IDs and workstream ID unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-14 19:04:50 +02:00
4a8e1a76b8 chore(workplan): add WP-0003 Event Bridge Implementation
24 tasks (T34-T57) across 6 phases: domain model refactor
(rules/instructions), sandboxed rule evaluator, instruction executor,
IssueSink adapter, task_spawn_log migration, event type registry,
ActivityDefinition file parser, one-off scheduled trigger,
Gitea webhook receiver, context resolver adapters (repo-scoping,
state-hub), first real ActivityDefinition, and full test suite.

Hub workstream: b4eb45a9-69e3-4ab0-b00c-67a53c3117c5

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-14 17:46:30 +02:00
0818ce3eb1 chore(workplan): close WP-0001 Foundation — all 21 tasks done
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-14 15:03:26 +02:00
ea5fbe0bf3 feat(WP-0002): complete Triggers & Ops workstream
Delivers all 12 tasks (T22–T33): Temporal Schedule manager + startup
sync, NATS JetStream event router, FastAPI CRUD + manual trigger,
Prometheus metrics wiring, custom search-attribute tagging, and
operational runbook. Marks workplan status as done.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 01:04:43 +01:00
4457d6d6b9 chore: add WP-0002 handoff note for CoulombCore continuation
All 12 tasks unblocked (broker decision resolved: NATS + JetStream).
Work interrupted on workstation due to WSL2 Docker pull issues.
Note captures build order, file names, key design decisions, and
state hub IDs for seamless pickup on CoulombCore.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-28 00:10:33 +01:00
8d8a353901 feat(e2e): add e2e contract and test script (closes T21)
CUST-WP-0028-T03/T04:
- e2e/e2e.yml: declares stack (docker-compose.dev.yml), Temporal UI
  health check, test command
- e2e/tests/test_full_flow.py: automates WP-0001 T21 — seeds DB, starts
  workers, triggers RunActivityWorkflow, polls completion, asserts
  ActivityRun + TaskInstances written to DB

Run via: make e2e REPO=activity-core  (from ~/the-custodian)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-27 00:52:47 +01:00
34aa70cbd9 feat(workflows): TaskExecutorWorkflow stub + wire worker — T19/T20
activities.py — persist_task_instance (new):
  Idempotent INSERT ... ON CONFLICT (id) DO NOTHING on task_instances.
  task_id passed in from workflow (derived from workflow_id via uuid5).
  Registered on task-execution-tq.

workflows.py — TaskExecutorWorkflow (T19):
  Derives stable task_id = uuid5(NAMESPACE_URL, workflow_id).
  Calls persist_task_instance → status=done, returns immediately.
  Real execution logic to replace stub in a later workstream.

worker.py — T20:
  Registers persist_task_instance on task-execution-tq Worker.
  Both queues fully wired: orchestrator-tq and task-execution-tq.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 22:30:50 +00:00
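Note on 34aa70cbd9 (T19): deriving a stable task_id from the workflow ID can be illustrated with the standard library alone. The function name derive_task_id is hypothetical; the uuid5(NAMESPACE_URL, workflow_id) derivation is taken from the commit message.

```python
import uuid


def derive_task_id(workflow_id: str) -> uuid.UUID:
    """Name-based (SHA-1) UUID: the same workflow_id always yields the same id."""
    return uuid.uuid5(uuid.NAMESPACE_URL, workflow_id)


if __name__ == "__main__":
    first = derive_task_id("task-executor/example")
    replay = derive_task_id("task-executor/example")
    print(first == replay)
```

Because the ID is a pure function of the workflow ID, a replayed or retried workflow produces the same primary key, which is what lets ON CONFLICT (id) DO NOTHING turn the insert into a no-op.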
da7de6ea3b feat(workflows): implement RunActivityWorkflow — T18
workflows.py — RunActivityWorkflow:
  1. load_activity_definition(activity_id)
  2. resolve_context(context_sources)
  3. evaluate_templates (pure, called in-workflow)
  4. log_run({run_id, ...}) — run_id = uuid5(NAMESPACE_URL, activity_id:trigger_key)
  5. start_child_workflow(TaskExecutorWorkflow, ...) per task spec
     ABANDON parent-close policy (fire-and-forget)
  Returns {"run_id": str, "tasks_spawned": int}

activities.py — log_run updated:
  - now accepts run_id in run_payload (deterministic, passed from workflow)
  - uses pg INSERT ... ON CONFLICT (run_id) DO NOTHING for idempotency

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 22:25:19 +00:00
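Note on da7de6ea3b (T18): the idempotent log_run pattern can be demonstrated with an in-memory SQLite database standing in for PostgreSQL. This is a stand-in sketch, not the project's activity. The reduced table schema, the function names, and the literal "activity_id:trigger_key" string format are assumptions. SQLite accepts the same ON CONFLICT ... DO NOTHING clause from version 3.24.

```python
import sqlite3
import uuid


def derive_run_id(activity_id: str, trigger_key: str) -> str:
    """Deterministic run id; the colon-joined name is an assumed format."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{activity_id}:{trigger_key}"))


def create_schema(conn):
    # Reduced stand-in for the activity_runs table.
    conn.execute(
        "CREATE TABLE activity_runs ("
        "run_id TEXT PRIMARY KEY, activity_id TEXT NOT NULL, tasks_spawned INTEGER NOT NULL)"
    )


def log_run(conn, activity_id, trigger_key, tasks_spawned):
    """Insert the run row once; repeated calls with the same key are no-ops."""
    run_id = derive_run_id(activity_id, trigger_key)
    conn.execute(
        "INSERT INTO activity_runs (run_id, activity_id, tasks_spawned) "
        "VALUES (?, ?, ?) ON CONFLICT (run_id) DO NOTHING",
        (run_id, activity_id, tasks_spawned),
    )
    conn.commit()
    return run_id


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    log_run(conn, "example-heartbeat", "2026-03-26T22:00:00Z", 1)
    log_run(conn, "example-heartbeat", "2026-03-26T22:00:00Z", 1)  # retried activity
    print(conn.execute("SELECT COUNT(*) FROM activity_runs").fetchone()[0])
```

Passing the run_id in from the workflow, rather than generating it inside the activity, is what makes a retried activity collide with its own earlier row.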
068780224e feat(activities): implement log_run — T17
Inserts an ActivityRun row via the shared session factory.
Accepts run_payload dict with activity_id, scheduled_for (ISO-8601 or
None), context_snapshot, tasks_spawned, version_used.
Returns run_id as a str UUID.
fired_at is set server-side to now(UTC).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 22:19:12 +00:00
bac3efee89 feat(activities): resolve_context stub + evaluate_templates — T15/T16
activities.py — resolve_context (T15):
  - dispatches on source.type: 'static' returns config["value"]
  - 'http_get' / 'db_query' raise ApplicationError(non_retryable=True)
  - unknown types raise ApplicationError(non_retryable=True)

template_engine.py — evaluate_templates (T16, pure function):
  - evaluates optional condition expressions against context snapshot
    (restricted eval, no builtins)
  - interpolates {context.<name>.<key>} placeholders via str.format_map
  - returns list[{task_type, params}] with falsy-condition rows omitted

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 22:06:09 +00:00
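Note on bac3efee89 (T16): a rough sketch of how placeholder interpolation via str.format_map and condition filtering could fit together. It is not the code in template_engine.py. The template keys ("task_type", "params", "condition") and the _AttrView wrapper are assumptions. The eval call with empty builtins mirrors the "restricted eval" wording and is not a real sandbox; a later commit, c3a256509b, introduces an AST walker for rule conditions.

```python
class _AttrView:
    """Expose nested dicts through attribute access, as str.format expects."""

    def __init__(self, data):
        self._data = data

    def __getattr__(self, name):
        try:
            value = self._data[name]
        except KeyError:
            raise AttributeError(name) from None
        return _AttrView(value) if isinstance(value, dict) else value


def evaluate_templates(templates, context):
    """Return [{task_type, params}], omitting rows whose condition is falsy."""
    view = {"context": _AttrView(context)}
    tasks = []
    for template in templates:
        condition = template.get("condition")
        # Restricted eval with no builtins: illustrative only, not a security boundary.
        if condition and not eval(condition, {"__builtins__": {}}, view):
            continue
        params = {
            key: value.format_map(view) if isinstance(value, str) else value
            for key, value in template.get("params", {}).items()
        }
        tasks.append({"task_type": template["task_type"], "params": params})
    return tasks


if __name__ == "__main__":
    context = {"heartbeat": {"message": "hello", "count": 0}}
    templates = [
        {"task_type": "log_message", "params": {"text": "say {context.heartbeat.message}"}},
        {"task_type": "log_message", "condition": "context.heartbeat.count > 0",
         "params": {"text": "never emitted"}},
    ]
    print(evaluate_templates(templates, context))
```

The function touches no I/O or clock, which fits the commit's description of it as a pure function called in-workflow.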
5e4dc6c946 feat(activities): implement load_activity_definition — T14
activities.py:
- init_session_factory(url): module-level async_sessionmaker init,
  called once from worker.py before workers start
- load_activity_definition(activity_id): queries activity_definitions
  by UUID, returns JSON-serialisable dict; raises ApplicationError
  (non_retryable=True) if row not found

worker.py:
- reads ACTCORE_DB_URL at startup, fails fast if missing
- calls init_session_factory() before connecting to Temporal

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 22:02:15 +00:00
21edc313db feat(worker): scaffold activities, workflows, worker entrypoint — T13
src/activity_core/activities.py:
  - load_activity_definition, resolve_context, log_run — @activity.defn
    stubs (raise NotImplementedError, bodies in T14–T17)

src/activity_core/workflows.py:
  - RunActivityWorkflow (orchestrator-tq) — @workflow.defn stub (T18)
  - TaskExecutorWorkflow (task-execution-tq) — @workflow.defn stub (T19)

src/activity_core/worker.py:
  - Connects to Temporal via TEMPORAL_HOST / TEMPORAL_NAMESPACE env vars
  - Spawns two Workers: orchestrator-tq and task-execution-tq
  - Runs until cancelled (python -m activity_core.worker)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 21:57:56 +00:00
027e41dbc0 feat(db): add dev seed script for ActivityDefinition — T12
src/activity_core/seed.py: inserts one example ActivityDefinition
('example-heartbeat', cron every minute, static context source,
log_message task template). Idempotent — skips by name on re-run.

Run with:
  ACTCORE_DB_URL=postgresql+asyncpg://actcore:actcore@localhost:5433/actcore \
      python -m activity_core.seed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 21:53:59 +00:00
cb7cf3bc8c feat(db): ORM models + Alembic migrations 0001–0003 — T09/T10/T11
SQLAlchemy ORM (src/activity_core/orm.py):
  - ActivityDefinition, ActivityRun, TaskInstance mapped to Base.metadata
  - Wired into migrations/env.py for autogenerate support

Migrations (chained 0001 → 0002 → 0003):
  - 0001: activity_definitions (id, name, enabled, trigger_type,
          trigger_config JSONB, context_sources JSONB, task_templates JSONB,
          dedupe_key_strategy, version, created_at, updated_at)
  - 0002: activity_runs (run_id, activity_id FK→activity_definitions,
          scheduled_for, fired_at, context_snapshot JSONB, tasks_spawned,
          version_used) + index on activity_id
  - 0003: task_instances (id, run_id FK→activity_runs CASCADE,
          type, params JSONB, status, created_at) + index on run_id

Apply with: ACTCORE_DB_URL=... alembic upgrade head

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 21:51:01 +00:00
f55f497107 feat(db): init Alembic (async) + SQLAlchemy declarative base — T08
- alembic init -t async migrations
- alembic.ini: dev fallback URL postgresql+asyncpg://…:5433/actcore;
  ACTCORE_DB_URL env var overrides at runtime; src/ added to sys.path
- migrations/env.py: reads ACTCORE_DB_URL, wires target_metadata to Base.metadata
- src/activity_core/db.py: DeclarativeBase subclass + make_engine() helper

Tool choice: Alembic + SQLAlchemy[asyncio] (already declared in pyproject.toml).
Migrations run with: ACTCORE_DB_URL=... alembic upgrade head

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 21:45:40 +00:00
c39c32abcd fix(docker): use fully-qualified docker.io image refs; mark T07 done
Prefix all image names with docker.io/ to avoid registry ambiguity
on hosts where containerd/Podman default to docker.io but the pull
fails without an explicit registry prefix.

Also marks T07 (smoke-test Temporal cluster and UI) as done in the
workplan now that the stack boots cleanly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 21:26:33 +00:00
045461282d chore: add .custodian-brief.md protocol and unblock T07
- CLAUDE.md: read .custodian-brief.md as Step 1 (offline-safe orientation
  before MCP call); matches pattern now standard across all domain repos
- T07 (Smoke test Temporal): remove stale Docker TLS blocking_reason;
  status → todo (WSL2 MTU issue resolved by implementing on CoulombCore)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-26 21:03:56 +01:00
6f9132314f Add project scaffold: contracts, schemas, docker-compose, workplans
Phase 0 contracts (event envelope, ActivityDefinition, idempotency doc,
naming conventions) and Phase 1 Temporal cluster setup (docker-compose.dev.yml,
Temporal dynamic config) are complete. Includes Pydantic models, JSON schemas,
wiki architecture docs, and ADR-001 workplan files for both workstreams.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-04 22:45:40 +01:00