1740 Commits

Author SHA1 Message Date
ebcaacc0b5 chore(consistency): renormalize lifecycle state [auto]
Updated by fix-consistency on 2026-06-05:
  - workplan status: ready → active
2026-06-05 23:17:48 +02:00
41d3e75a88 Implement ops inventory probe evidence slice 2026-06-05 23:16:40 +02:00
ee1f805c0b Sync ACTIVITY-WP-0007 with State Hub 2026-06-05 22:49:20 +02:00
15f495361e chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-06-05:
  - update .custodian-brief.md for activity-core
2026-06-05 22:47:56 +02:00
3b8bac26da Add ops inventory probe runner workplan 2026-06-05 22:46:11 +02:00
42e373aba1 Harden WSJF triage report recovery 2026-06-05 19:27:03 +02:00
20d4f26166 Implement post-triage operational hardening 2026-06-04 12:15:07 +02:00
8a33ec44b6 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-06-04:
  - update .custodian-brief.md for activity-core
2026-06-04 09:58:37 +02:00
b2d56624b2 Normalize legacy WP-0003 status 2026-06-03 15:28:59 +02:00
87d3979c20 Record State Hub IDs for WP-0006 2026-06-03 12:09:28 +02:00
33cc19ad7c chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-06-03:
  - update .custodian-brief.md for activity-core
2026-06-03 12:07:05 +02:00
30598fd1ad Expand rule actions for per-repo tasks
Add safe action interpolation and for_each binding for rule fan-out, update the weekly SBOM definition, cover the new evaluation path, and reconcile activity-core scope/workplans for the State Hub sync.
2026-06-03 11:58:24 +02:00
4b4e162c44 Log raw LLM output preview on instruction validation failure
The CUST-WP-0045 canary failed validation twice without leaving any
record of what the model actually returned. The warning logged only the
error message ($: missing required property 'summary'), not the JSON
shape that triggered it — so diagnosing required modifying code and
re-running. Log a 2KB preview of the offending raw output alongside the
error so the next failure of this shape is one grep away from diagnosis.
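
A minimal sketch of the preview logging described above, not the repository's actual code. The helper names, the logger name, and the truncation marker are assumptions; only the 2KB limit comes from the commit message.

```python
import logging

logger = logging.getLogger("instruction_validation")  # hypothetical logger name

PREVIEW_LIMIT_BYTES = 2048  # the "2KB preview" from the commit message


def preview_raw_output(raw: str, limit: int = PREVIEW_LIMIT_BYTES) -> str:
    """Return at most `limit` UTF-8 bytes of `raw`, dropping any split character."""
    encoded = raw.encode("utf-8")
    if len(encoded) <= limit:
        return raw
    # errors="ignore" discards a multi-byte character cut in half by the slice.
    return encoded[:limit].decode("utf-8", errors="ignore") + "...[truncated]"


def log_validation_failure(error: str, raw: str) -> None:
    """Log the validation error together with a bounded preview of the raw output."""
    logger.warning(
        "instruction validation failed: %s | raw preview: %s",
        error,
        preview_raw_output(raw),
    )
```

Bounding the preview by bytes rather than characters keeps each log line at a predictable size even when the model returns non-ASCII text.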

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-02 13:06:43 +02:00
c79d0980a9 Make Temporal activity timeout env-configurable (ADHOC-2026-06-01-T03)
The CUST-WP-0045 daily triage canary on 2026-06-01 hit a BrokenPipeError
on the llm-connect side. Two 5-minute timeouts were racing:

- _ACTIVITY_TIMEOUT = timedelta(minutes=5) in workflows.py
- LLM_CONNECT_TIMEOUT_SECONDS default 300 in llm_client.py

The 10KB curated digest + max_depth:2 + JSON schema enforcement pushed
Claude past 5 minutes. Whichever timer fired first killed the httpx call;
the model's late response arrived to a closed socket.

Read _ACTIVITY_TIMEOUT from ACTIVITY_TIMEOUT_SECONDS env (default 900 —
15 minutes) so judgement-call activities have headroom for slow LLM runs.
Operators should also widen httpx via LLM_CONNECT_TIMEOUT_SECONDS=840 so
httpx still times out slightly before Temporal, preserving the
clean-error contract.
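
The timeout arrangement above could be sketched as follows. The environment variable names and the 900 and 300 second defaults come from this message; the function names and the fallback on invalid values are assumptions for illustration.

```python
import os
from datetime import timedelta
from typing import Mapping

DEFAULT_ACTIVITY_TIMEOUT_SECONDS = 900  # 15 minutes, per the commit message
DEFAULT_LLM_CONNECT_TIMEOUT_SECONDS = 300  # default cited for llm_client.py


def activity_timeout(env: Mapping[str, str] = os.environ) -> timedelta:
    """Read ACTIVITY_TIMEOUT_SECONDS, falling back to the default on bad input."""
    try:
        seconds = int(env.get("ACTIVITY_TIMEOUT_SECONDS", ""))
    except ValueError:
        # Assumption: unset or non-numeric values fall back to the default.
        seconds = DEFAULT_ACTIVITY_TIMEOUT_SECONDS
    if seconds <= 0:
        seconds = DEFAULT_ACTIVITY_TIMEOUT_SECONDS
    return timedelta(seconds=seconds)


def http_times_out_first(env: Mapping[str, str] = os.environ) -> bool:
    """True when the HTTP client timeout is strictly shorter than the activity timeout."""
    http_seconds = float(
        env.get("LLM_CONNECT_TIMEOUT_SECONDS", DEFAULT_LLM_CONNECT_TIMEOUT_SECONDS)
    )
    return http_seconds < activity_timeout(env).total_seconds()
```

With `LLM_CONNECT_TIMEOUT_SECONDS=840` and the 900 second default, the check passes. With both timers at 300 seconds it fails, which is the race described above.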

Tests: 120 passed, 1 skipped.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-02 08:10:24 +02:00
a8d3cc2782 Fix repo_sbom_status resolver — close ADHOC-2026-06-01-T01
The state-hub resolver was calling GET /sbom/status?repo={slug}, which State
Hub does not expose. Real SBOM routes are /sbom/, /sbom/{slug},
/sbom/snapshots/, /sbom/snapshots/{id}, /sbom/ingest/, /sbom/report/licences/.
The weekly-sbom-staleness ActivityDefinition was passing params {repos: all}
and the resolver was reading params.get("repo_slug", ""), so the URL
collapsed to /sbom/status?repo= and 404'd. _fetch_json swallowed the error,
the rule context.repos.sbom_age_days > 30 evaluated against {} and never
matched, and the weekly SBOM check has been a silent no-op for as long as
the route mismatch has existed.

Resolver now supports two modes selected by params:
- single-repo: {repo_slug: foo} → GET /sbom/{foo}, returns
  {repo_slug, last_sbom_at, sbom_age_days, has_sbom}
- bulk: {repos: all} → GET /repos/, computes per-repo age, returns the
  worst repo's fields hoisted to the top of the result alongside
  stale_count, total_count, worst_* fields, and the full per-repo list

Never-scanned repos get a 99999 sentinel age so threshold rules treat
them as very stale without forcing the rule to special-case None.

Hoisting the worst entry to the top preserves the existing rule
expression context.repos.sbom_age_days > 30 (and target_repo:
context.repos.repo_slug, though that field is a separate interpolation
gap tracked as ADHOC-2026-06-01-T02). The integration tests'
aspirational per-repo iteration model is left intact.
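
A rough sketch of the sentinel age and the worst-entry hoisting in bulk mode, under stated assumptions: the function names, the input shape, and the `repos` key for the per-repo list are hypothetical, and the worst_* fields are omitted for brevity. The field names repo_slug, last_sbom_at, sbom_age_days, has_sbom, stale_count and total_count are taken from this message.

```python
from datetime import datetime, timezone
from typing import Any, Optional

NEVER_SCANNED_AGE_DAYS = 99999  # sentinel from the commit message
STALE_THRESHOLD_DAYS = 30


def sbom_age_days(last_sbom_at: Optional[datetime], now: datetime) -> int:
    """Age in whole days; never-scanned repos get the sentinel instead of None."""
    if last_sbom_at is None:
        return NEVER_SCANNED_AGE_DAYS
    return (now - last_sbom_at).days


def summarize(last_scans: dict[str, Optional[datetime]], now: datetime) -> dict[str, Any]:
    """Build the bulk result with the worst repo's fields hoisted to the top level."""
    per_repo = [
        {
            "repo_slug": slug,
            "last_sbom_at": scanned_at,
            "sbom_age_days": sbom_age_days(scanned_at, now),
            "has_sbom": scanned_at is not None,
        }
        for slug, scanned_at in last_scans.items()
    ]
    if not per_repo:
        return {"stale_count": 0, "total_count": 0, "repos": []}
    worst = max(per_repo, key=lambda entry: entry["sbom_age_days"])
    stale = [e for e in per_repo if e["sbom_age_days"] > STALE_THRESHOLD_DAYS]
    return {
        **worst,  # hoisted so a rule can read sbom_age_days directly
        "stale_count": len(stale),
        "total_count": len(per_repo),
        "repos": per_repo,
    }


now = datetime(2026, 6, 1, tzinfo=timezone.utc)
result = summarize(
    {
        "activity-core": datetime(2026, 4, 26, tzinfo=timezone.utc),
        "info-tech-canon": None,
    },
    now,
)
```

Because the sentinel is an ordinary integer, a threshold comparison such as `sbom_age_days > 30` needs no special case for repos that were never scanned.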

Live validation against State Hub on 2026-06-01:
- single: activity-core → 36 days since 2026-04-26 ingest
- bulk: 48 repos total, 46 stale (>30d), worst is info-tech-canon (never
  scanned), rule expression evaluates True

Tests: 120 passed, 1 skipped.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-02 03:31:56 +02:00
5d3fb33c6b Capture sbom_status resolver bug as ADHOC-2026-06-01
Surfaced while bringing up the dev worker for the CUST-WP-0045 T06 cutover.
weekly-sbom-staleness fires its state-hub resolver with query
repo_sbom_status, which hits GET /sbom/status?repo=. State Hub does not
expose that route, so _fetch_json returns {} and the rule
context.repos.sbom_age_days > 30 silently no-ops. The weekly SBOM check has
been a no-op for as long as the route mismatch has existed. Logged as a
low-priority adhoc rather than promoting to a workplan because the resolver
and definition both need a one-line decision (single-repo vs fan-out), not
multi-phase design.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-02 03:16:12 +02:00
ca6d80ec07 Enable hourly RecentlyOnScope rollout 2026-05-23 02:51:54 +02:00
5055f3eaca Add State Hub RecentlyOnScope invocation 2026-05-22 16:14:10 +02:00
f4c38e2d5f Record state hub IDs for railiance deployment 2026-05-22 13:51:51 +02:00
e2aac3ad8c Deploy activity-core on railiance01 2026-05-22 13:49:46 +02:00
cf92f0d686 Forward instruction schemas to llm-connect 2026-05-21 03:19:27 +02:00
5c4f96e7aa Pass instruction depth config to llm-connect 2026-05-19 20:55:35 +02:00
1ff8b14d1b Fix ActivityDefinition sync for daily triage canary 2026-05-19 20:13:23 +02:00
6cb0718e90 Add curated daily triage digest 2026-05-19 19:09:21 +02:00
3110399b11 Add instruction report sinks 2026-05-19 18:36:58 +02:00
0dc342eb1b Wire instruction report execution 2026-05-19 18:28:23 +02:00
0e7084207e Extend State Hub context resolver for daily triage 2026-05-19 15:59:12 +02:00
5bb61fdef5 Refresh agent instruction files 2026-05-18 16:55:39 +02:00
00e688bd8e fix(WP-0004): live deployment fixes from integration test
- Dockerfile: copy alembic.ini + migrations/ so actcore-migrate works
- docker-compose.railiance.yml:
    - Temporal: add dynamicconfig volume mount + correct DYNAMIC_CONFIG_FILE_PATH
    - Temporal: healthcheck uses 'temporal operator cluster health' (not tctl)
    - NATS: add monitoring port -m 8222 for wget-based healthcheck
    - actcore-api healthcheck: use Python urllib (curl absent from slim image)
- api.py: fix /health Temporal probe — Client has no describe_namespace;
    use workflow_service.get_system_info(GetSystemInfoRequest()) instead
- Makefile: grep -Eh to suppress filename prefix when MAKEFILE_LIST has
    multiple files (.env included via -include)

All 8 services start cleanly; /health returns {"status":"ok",...} HTTP 200;
SIGTERM drains worker cleanly within grace period; make help correct.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 09:23:14 +02:00
94bd34231c chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-05-15:
  - update .custodian-brief.md for activity-core
2026-05-15 00:06:45 +02:00
a9d2c12212 chore(WP-0004): mark workplan done
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 00:05:01 +02:00
2a8e6cfe7f feat(WP-0004): railiance deployment & service ops
- Dockerfile (multi-stage, uv-based, slim runtime)
- .dockerignore
- docker-compose.railiance.yml (Temporal + NATS + PG, no Elasticsearch)
- GET /health endpoint (db + temporal probes, 200/503)
- .env.example (complete env var reference)
- Makefile: migrate, sync-all, dev-up/down, railiance-up/down,
  start-worker, start-api, start-event-router, help targets;
  extracted sync-event-types Python to scripts/sync_event_types.py
- SIGTERM graceful shutdown in worker.py and event_router.py
- docs/runbook.md: Railiance deployment section

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-15 00:04:39 +02:00
987cf5a75c chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-05-14:
  - update .custodian-brief.md for activity-core
2026-05-14 23:51:06 +02:00
9f8cc43129 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-05-14:
  - update .custodian-brief.md for activity-core
2026-05-14 23:36:03 +02:00
827ef9c1a0 feat(WP-0003c): context adapters, first ActivityDefinition, full test suite
T51: ContextResolver ABC + CONTEXT_RESOLVER_REGISTRY; resolve_context activity
updated to dispatch via registry (warns + binds {} on failure, never aborts run).
T52: RepoScopingContextResolver with 5-min in-process cache.
T53: StateHubContextResolver (no cache) for domain_summary and repo_sbom_status.
T54: activity-definitions/weekly-sbom-staleness.md (Monday 09:00 Berlin, cron
trigger, flag-stale-sbom rule at >30 days) + tasks/sbom-rescan.md template.
T55: 51 parametrized evaluator tests — all whitelisted operators, unsafe
expression rejection, empty condition, missing attribute, nested context access.
T56: 15 executor safety tests — UntrustedFieldError, object-type rejection,
injection fixture, LLM retry on bad JSON, review_required field.
T57: 6 integration tests — parses real definition, evaluates rule per-repo
(stale/fresh boundary), emits via NullSink, verifies spawn log entries.
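
The registry dispatch from T51 might look roughly like this. It is a synchronous sketch for illustration: the real resolve_context is a Temporal activity and may well be async, and the `resolve` method signature and the `source` key are assumptions. ContextResolver and CONTEXT_RESOLVER_REGISTRY are the names used above.

```python
import logging
from abc import ABC, abstractmethod
from typing import Any

logger = logging.getLogger("context_resolution")  # hypothetical logger name


class ContextResolver(ABC):
    """Base class for context sources; the method signature is an assumption."""

    @abstractmethod
    def resolve(self, query: str, params: dict[str, Any]) -> dict[str, Any]:
        ...


CONTEXT_RESOLVER_REGISTRY: dict[str, ContextResolver] = {}


def resolve_context(source: str, query: str, params: dict[str, Any]) -> dict[str, Any]:
    """Dispatch via the registry; warn and bind {} on failure, never abort the run."""
    resolver = CONTEXT_RESOLVER_REGISTRY.get(source)
    if resolver is None:
        logger.warning("no context resolver registered for %r", source)
        return {}
    try:
        return resolver.resolve(query, params)
    except Exception as exc:  # deliberately broad: a bad source must not kill the run
        logger.warning("context resolver %r failed: %s", source, exc)
        return {}
```

Binding `{}` keeps the run alive, but it also means a rule that reads a missing field simply never matches, so the warning log is the only signal that a source is failing.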

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-14 23:24:48 +02:00
fd8d0827d7 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-05-14:
  - update .custodian-brief.md for activity-core
2026-05-14 23:21:10 +02:00
df73d28f55 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-05-14:
  - update .custodian-brief.md for activity-core
2026-05-14 23:06:22 +02:00
9abb69d179 chore(consistency): sync task status from DB [auto] 2026-05-14 23:03:29 +02:00
176867cbe3 feat(WP-0003b): parser, workflow wiring, triggers, webhooks
T44: ActivityDefinition markdown file parser (definition_parser.py)
  - Scans activity-definitions/*.md and ACTIVITY_DEFINITION_DIRS paths
  - Parses YAML frontmatter + fenced rule/instruction blocks
  - Raises ParseError on any malformed file — never silently skips

T45: ActivityDefinition sync command
  - Migration 0006: adds rules_json/instructions_json JSONB columns
  - sync_activity_definitions.py + make sync-activity-definitions
  - Called at worker startup before schedule sync

T46: Rule/instruction pipeline wired into RunActivityWorkflow
  - New evaluate_rules and emit_tasks Temporal activities
  - Workflow passes event_envelope_json to enable rule evaluation
  - EventRouter now passes full envelope JSON as 4th workflow arg
  - IssueSink.emit() writes task_spawn_log rows per task

T47: ScheduledTriggerConfig model (one-off future datetime trigger)

T48: One-off Temporal Schedule support
  - Fixed timezone_name → time_zone_name (was causing all schedule tests to fail)
  - Added ScheduleCalendarSpec-based one-off schedule with remaining_actions=1
  - cancel_scheduled() for admin cancellation
  - Fixed backfill() call to use *args unpacking (not list wrapper)
  - Fixed ScheduleAlreadyRunningError catch in upsert_schedule
  - sync_schedules now handles ScheduledTriggerConfig definitions

T49: Webhook receiver
  - POST /webhooks/gitea  — HMAC-SHA256 via X-Gitea-Signature-256
  - POST /webhooks/github — HMAC-SHA256 via X-Hub-Signature-256
  - Normalisers: repo.created, push, issue.closed → EventEnvelope
  - Publishes to NATS activity.{type} subject after registry validation
  - Mounted in api.py at /webhooks prefix
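
A minimal sketch of HMAC-SHA256 verification for the webhook receiver in T49, using only the standard library. The function name is hypothetical, and accepting an optional `sha256=` prefix is an assumption made so that one helper can serve both header formats.

```python
import hashlib
import hmac


def signature_is_valid(secret: bytes, body: bytes, header_value: str) -> bool:
    """Compare the HMAC-SHA256 of the raw request body with the received signature."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    received = header_value.strip().lower()
    if received.startswith("sha256="):
        received = received[len("sha256="):]
    # Compare as bytes in constant time to avoid leaking how many characters match.
    return hmac.compare_digest(expected.encode("ascii"), received.encode("utf-8"))
```

The digest has to be computed over the raw request bytes, before any JSON parsing, because re-serializing the payload can change whitespace or key order and invalidate the signature.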

T50: Gitea event type definitions
  - gitea.repo.created.md, gitea.push.md, gitea.issue.closed.md
  - Each includes normaliser field mapping in Consumer Notes

Tests: 18 passed, 1 skipped (integration). Fixed embedded Temporal
server visibility latency in test_upsert_schedule_creates_schedule.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-14 23:02:33 +02:00
dc20c44a44 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-05-14:
  - update .custodian-brief.md for activity-core
2026-05-14 22:52:19 +02:00
c25c711e3c chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-05-14:
  - update .custodian-brief.md for activity-core
2026-05-14 22:36:00 +02:00
7c1c4441d1 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-05-14:
  - update .custodian-brief.md for activity-core
2026-05-14 22:22:06 +02:00
0cb98af18c chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-05-14:
  - update .custodian-brief.md for activity-core
2026-05-14 22:06:11 +02:00
c3a256509b feat(event-bridge): WP-0003a — domain model, rules module, event type registry
Implements phases 7–8 of the Event Bridge architecture (custodian-WP-0003a).

Domain model (T34, T40):
- Added RuleDef, InstructionDef, ActionDef to models.py
- Updated ActivityDefinition with rules/instructions fields (task_templates deprecated)
- Formalized EventEnvelope: id, type, version, timestamp, publisher, attributes
- Added from_nats_message() and from_webhook_payload() classmethods

Rules module (T35, T36, T37):
- src/activity_core/rules/ skeleton with boundary enforcement
- evaluate_condition() — sandboxed AST walker, whitelisted nodes only, never exec()
- execute_instruction() — LLM task generation with trusted_fields injection guard
- tests/rules/test_boundary.py verifies no cross-boundary imports
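
To illustrate the idea of a sandboxed AST walker with whitelisted nodes, here is a small sketch. It is not the repository's evaluate_condition(): the signature, the exception name, the set of allowed nodes, and the choice to treat missing attributes as non-matching are all assumptions.

```python
import ast
import operator
from typing import Any, Mapping

_COMPARATORS = {
    ast.Gt: operator.gt, ast.GtE: operator.ge,
    ast.Lt: operator.lt, ast.LtE: operator.le,
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
}


class UnsafeExpressionError(ValueError):
    """Raised when an expression uses a node outside the whitelist."""


def evaluate_condition(expression: str, names: Mapping[str, Any]) -> bool:
    """Evaluate a boolean expression by walking its AST; exec() and eval() are never used."""
    tree = ast.parse(expression, mode="eval")
    return bool(_walk(tree.body, names))


def _walk(node: ast.AST, names: Mapping[str, Any]) -> Any:
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        if node.id not in names:
            raise UnsafeExpressionError(f"unknown name: {node.id}")
        return names[node.id]
    if isinstance(node, ast.Attribute):
        base = _walk(node.value, names)
        # Dotted access maps to dict lookup only; real object attributes are never read.
        return base.get(node.attr) if isinstance(base, Mapping) else None
    if isinstance(node, ast.BoolOp):
        results = (_walk(value, names) for value in node.values)
        return all(results) if isinstance(node.op, ast.And) else any(results)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
        return not _walk(node.operand, names)
    if isinstance(node, ast.Compare):
        left = _walk(node.left, names)
        for op, comparator in zip(node.ops, node.comparators):
            compare = _COMPARATORS.get(type(op))
            if compare is None:
                raise UnsafeExpressionError(f"disallowed operator: {type(op).__name__}")
            right = _walk(comparator, names)
            try:
                if not compare(left, right):
                    return False
            except TypeError:
                return False  # e.g. a missing field (None) compared with a number
            left = right
        return True
    raise UnsafeExpressionError(f"disallowed node: {type(node).__name__}")
```

Anything not listed, including function calls, subscripts, and lambdas, falls through to the final `raise`, so the whitelist fails closed.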

Infrastructure (T38, T39):
- Alembic migrations 0004 (task_spawn_log) and 0005 (event_types)
- IssueSink ABC + IssueCoreRestSink (REST) + NullSink (testing)
- TaskSpawnLog and EventType ORM models

Event type registry (T41, T42, T43):
- event_type_registry.py: file scanner, parser, DB sync, in-process lookup
- ACTIVITY_CURATOR_GATE env var (disabled|required) + approve endpoint
- Three org event type definitions: org.repo.registered, org.workstream.completed,
  org.activity.run.completed

All 10 tests pass. Boundary test confirms rules/ isolation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-14 22:01:15 +02:00
ee81adb2fa chore(workplan): split WP-0003 into three context-window-sized parts
WP-0003 (24 tasks) exceeds the single-run limit. Split by build phase:
  0003a — phases 7–8: domain model, rules module, IssueSink, event type registry (10 tasks)
  0003b — phases 9–10: ActivityDefinition parser, workflow wiring, triggers, webhooks (7 tasks)
  0003c — phases 11–12: context adapters, first ActivityDefinition, test suite (7 tasks)

Original WP-0003 marked superseded. Hub task IDs and workstream ID unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-14 19:04:50 +02:00
803770a899 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-05-14:
  - update .custodian-brief.md for activity-core
2026-05-14 17:49:34 +02:00
4a8e1a76b8 chore(workplan): add WP-0003 Event Bridge Implementation
24 tasks (T34-T57) across 6 phases: domain model refactor
(rules/instructions), sandboxed rule evaluator, instruction executor,
IssueSink adapter, task_spawn_log migration, event type registry,
ActivityDefinition file parser, one-off scheduled trigger,
Gitea webhook receiver, context resolver adapters (repo-scoping,
state-hub), first real ActivityDefinition, and full test suite.

Hub workstream: b4eb45a9-69e3-4ab0-b00c-67a53c3117c5

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-14 17:46:30 +02:00
91a9073448 docs: write INTENT.md and rewrite SCOPE.md for Event Bridge architecture
INTENT.md: articulates why activity-core exists, the governing
three-question principle (when/what/where), what it is and is not,
and the design values (markdown-as-definition, rules before instructions,
no task state ownership, publisher-declared governance).

SCOPE.md: rewritten from stale pre-alpha state to reflect WP-0001/0002
completion and the ACT-ADR-001/002/003 architecture. Adds rule/instruction
model, event type registry, task emission adapter, webhook receiver, and
updated current state, terminology, and architecture decision references.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-14 16:56:07 +02:00
617b2420d3 docs(adr): establish three foundational ADRs for Event Bridge architecture
ADR-001: activity-core as org-wide Event Bridge — boundaries, NATS as
org infrastructure, state hub delegation, rules-core module-first,
issue-core adapter interface, capabilities domain assignment.

ADR-002: markdown-as-definition format for event types and
ActivityDefinitions — co-located intent/schema/logic/debugging,
publisher-declared governance with environment-configurable curator gate,
attribute type system, task template files.

ADR-003: Rule vs. Instruction model and expression DSL — sandboxed
Python-like AST evaluator for Rules, trusted-fields prompt injection
protection for Instructions, output schema enforcement, audit trail,
testing strategy, rules-core module boundary.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-14 16:48:42 +02:00
0818ce3eb1 chore(workplan): close WP-0001 Foundation — all 21 tasks done
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-14 15:03:26 +02:00