activity-core

Author	SHA1	Message	Date
tegwick	bf877b7f0d	test(ACTIVITY-WP-0016-T05): regression coverage incl. real 06-26 payload + over-depth Add a test driving the actual captured 2026-06-26 failure payload (tests/fixtures/wp0016/...partial.json): it now recovers 6+ valid recommendations and quarantines the truncated tail, where before WP-0016 it discarded the whole run. Add an over-depth guardrail test. Together with T03/T04 the regression set now covers truncation, one-bad-item, oversized-string, over-depth, allow-list/injection-shaped, and happy-path count cap. In-repo portion of T05 complete; the live railiance01 graceful-degradation smoke is operator-owned cluster work (deploy-coupled with the T02 bundle changes) and remains outstanding. Hand-back notes posted to WP-0006-T03 and WP-0010-T04. Full suite: 220 passed, 1 skipped. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-26 18:18:37 +02:00
tegwick	9be4ddbdb7	feat(ACTIVITY-WP-0016-T04): producer trust-boundary guardrails + ADR-004 Add ADR-004 documenting the producer trust boundary: untrusted producers (LLM, agent, human; erroneous and malicious), the trust-but-handle vs verify-and-mitigate postures, error-locality and quarantine-with-provenance principles, and the concrete activity-core mechanisms. Implement producer-agnostic guardrails in executor.py, applied uniformly on the happy path and the recovery path via _partition_items: structural-type -> schema -> structural caps (_MAX_DEPTH, _MAX_STRING_LEN) -> reference allow-list -> count cap. Each quarantine carries a reason. Closes the happy-path maxItems count cap deferred from T03 (valid 9-item report keeps 7, quarantines 2). Reference allow-list reads context["known_candidates"] via _allow_list_from_context; inert until a resolver populates it. SCOPE.md updated (executor bullet + ADR list); no INTENT drift. New tests: happy-path count cap, oversized-string guardrail, allow-list rejection. Full suite: 218 passed, 1 skipped. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-26 18:10:17 +02:00
tegwick	a70c00a789	feat(ACTIVITY-WP-0016-T03): resilient per-item report recovery with quarantine lane When the whole-document parse + one retry still fail, report instructions now run _resilient_report before the total-loss path. A brace/quote-aware scanner (_extract_object_spans) recovers each recommendation object whether pretty-printed across many lines or NDJSON one-per-line; a truncated tail gets a best-effort _try_repair; _partition_items validates each recovered object against the T02 item schema. Valid items survive (output_validated=True, partial=True), malformed/over-maxItems items are quarantined with provenance (index, error, raw, reason), capped at 20. Error locality now matches the unit of work: one bad item costs one item, not the whole report. Verified against the real 06-26 shape: 7 valid recommendations + a truncated tail now recovers all 7 and quarantines the broken tail (previously the whole run was discarded). Happy-path maxItems top-N enforcement is deferred to T04 (count caps). Full suite: 215 passed, 1 skipped. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-26 17:56:28 +02:00
tegwick	61f278d643	feat(ACTIVITY-WP-0016-T02): strict bounded daily-triage output schema Replace the accept-anything recommendations.items ({type: object}) with a strict per-item contract (required [rank, candidate, action, why] + typed wsjf) and a maxItems:7 hint. Strict item structure is what lets the T03 boundary parser validate each recommendation independently and quarantine only malformed ones. maxItems is a producer hint (prompt + llm-connect json_schema + T03 mitigation), NOT a hard reject — a hard maxItems reject would discard a whole 16-item report, the blast-radius bug WP-0016 removes. DEPLOY COUPLING: the strict schema is also consumed by the current whole-doc validator, so it must ship with T03's per-item quarantine parser; until then it increases whole-doc hard-fails. Prompt + max_tokens headroom + NDJSON framing are documented as a runtime-bundle handoff. Updated four tests to the strict contract; the forwarded-schema test now reads the live schema file instead of hard-coding it. Full suite: 213 passed, 1 skipped. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-26 17:36:24 +02:00
tegwick	4e8ccbb344	Set up daily WSJF closure gates	2026-06-07 11:00:03 +02:00
tegwick	418eb4ffda	Add schedule smoke test routine	2026-06-06 15:32:57 +02:00
tegwick	42e373aba1	Harden WSJF triage report recovery	2026-06-05 19:27:03 +02:00
tegwick	cf92f0d686	Forward instruction schemas to llm-connect	2026-05-21 03:19:27 +02:00
tegwick	5c4f96e7aa	Pass instruction depth config to llm-connect	2026-05-19 20:55:35 +02:00
tegwick	0dc342eb1b	Wire instruction report execution	2026-05-19 18:28:23 +02:00
tegwick	827ef9c1a0	feat(WP-0003c): context adapters, first ActivityDefinition, full test suite T51: ContextResolver ABC + CONTEXT_RESOLVER_REGISTRY; resolve_context activity updated to dispatch via registry (warns + binds {} on failure, never aborts run). T52: RepoScopingContextResolver with 5-min in-process cache. T53: StateHubContextResolver (no cache) for domain_summary and repo_sbom_status. T54: activity-definitions/weekly-sbom-staleness.md (Monday 09:00 Berlin, cron trigger, flag-stale-sbom rule at >30 days) + tasks/sbom-rescan.md template. T55: 51 parametrized evaluator tests — all whitelisted operators, unsafe expression rejection, empty condition, missing attribute, nested context access. T56: 15 executor safety tests — UntrustedFieldError, object-type rejection, injection fixture, LLM retry on bad JSON, review_required field. T57: 6 integration tests — parses real definition, evaluates rule per-repo (stale/fresh boundary), emits via NullSink, verifies spawn log entries. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-14 23:24:48 +02:00

11 Commits
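
The T03 and T04 messages describe a brace/quote-aware scanner followed by per-item validation with a quarantine lane. The sketch below is one way that flow could look. It is not the code in executor.py: the function names only echo `_extract_object_spans` and `_partition_items` from the log, the `MAX_STRING_LEN` value is a guess, and the truncated tail is quarantined rather than repaired (the log mentions a best-effort `_try_repair`, which is omitted here). The required keys, the 7-item cap, and the 20-entry quarantine cap are taken from the commit messages.

```python
import json

REQUIRED = ("rank", "candidate", "action", "why")  # per the T02 message
MAX_ITEMS = 7            # maxItems hint from the T02 message
MAX_QUARANTINE = 20      # quarantine cap from the T03 message
MAX_STRING_LEN = 2000    # hypothetical value; the log names _MAX_STRING_LEN only


def extract_object_spans(text):
    """Find complete outermost {...} objects, ignoring braces inside strings.

    Returns (spans, tail): spans are (start, end) offsets of closed objects,
    tail is the raw text of an unclosed trailing object, or None.
    """
    spans, depth, start = [], 0, None
    in_string = escaped = False
    for i, ch in enumerate(text):
        if in_string:
            if escaped:
                escaped = False          # character after a backslash is literal
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                spans.append((start, i + 1))
    tail = text[start:] if depth > 0 else None
    return spans, tail


def partition_items(text):
    """Split recovered objects into valid items and quarantined records."""
    spans, tail = extract_object_spans(text)
    valid, quarantined = [], []
    for index, (lo, hi) in enumerate(spans):
        raw = text[lo:hi]
        reason = None
        try:
            item = json.loads(raw)
        except json.JSONDecodeError as exc:
            reason = f"parse: {exc.msg}"
        else:
            missing = [k for k in REQUIRED if k not in item]
            if missing:
                reason = "schema: missing " + ", ".join(missing)
            elif any(isinstance(v, str) and len(v) > MAX_STRING_LEN
                     for v in item.values()):
                reason = "oversized-string"
            elif len(valid) >= MAX_ITEMS:
                reason = "over-maxItems"   # count cap applied after validity
        if reason is None:
            valid.append(item)
        else:
            quarantined.append({"index": index, "reason": reason, "raw": raw})
    if tail is not None:
        quarantined.append({"index": len(spans), "reason": "truncated", "raw": tail})
    return valid, quarantined[:MAX_QUARANTINE]
```

Because each object is parsed and checked on its own, one bad item costs one item, which is the error-locality property the T03 message calls out. Scanning for outermost braces means the same function handles both a pretty-printed array and NDJSON, since the surrounding brackets, commas, and newlines are simply skipped.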