Commit Graph

10 Commits

Author SHA1 Message Date
bf877b7f0d test(ACTIVITY-WP-0016-T05): regression coverage incl. real 06-26 payload + over-depth
Add a test driving the actual captured 2026-06-26 failure payload
(tests/fixtures/wp0016/...partial.json): it now recovers 6+ valid recommendations
and quarantines the truncated tail, where before WP-0016 it discarded the whole run.
Add an over-depth guardrail test. Together with T03/T04 the regression set now covers
truncation, one-bad-item, oversized-string, over-depth, allow-list/injection-shaped,
and happy-path count cap.

In-repo portion of T05 complete; the live railiance01 graceful-degradation smoke is
operator-owned cluster work (deploy-coupled with the T02 bundle changes) and remains
outstanding. Hand-back notes posted to WP-0006-T03 and WP-0010-T04. Full suite: 220
passed, 1 skipped.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-26 18:18:37 +02:00
9be4ddbdb7 feat(ACTIVITY-WP-0016-T04): producer trust-boundary guardrails + ADR-004
Add ADR-004 documenting the producer trust boundary: untrusted producers (LLM,
agent, human; erroneous and malicious), the trust-but-handle vs verify-and-mitigate
postures, error-locality and quarantine-with-provenance principles, and the concrete
activity-core mechanisms.

Implement producer-agnostic guardrails in executor.py, applied uniformly on the
happy path and the recovery path via _partition_items: structural-type -> schema ->
structural caps (_MAX_DEPTH, _MAX_STRING_LEN) -> reference allow-list -> count cap.
Each quarantine carries a reason. Closes the happy-path maxItems count cap deferred
from T03 (valid 9-item report keeps 7, quarantines 2). Reference allow-list reads
context["known_candidates"] via _allow_list_from_context; inert until a resolver
populates it. SCOPE.md updated (executor bullet + ADR list); no INTENT drift.

New tests: happy-path count cap, oversized-string guardrail, allow-list rejection.
Full suite: 218 passed, 1 skipped.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-26 18:10:17 +02:00
53dc0f6e93 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-06-26:
  - ACTIVITY-WP-0016-T03: progress → done
2026-06-26 18:03:50 +02:00
960fb05268 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-06-26:
  - ACTIVITY-WP-0016-T03: todo → progress
2026-06-26 17:52:30 +02:00
b7b0b5bf6e chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-06-26:
  - ACTIVITY-WP-0016-T02: todo → progress
2026-06-26 17:52:29 +02:00
14f76fb6d9 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-06-26:
  - ACTIVITY-WP-0016-T01: todo → wait
2026-06-26 17:52:28 +02:00
caa2608092 chore(consistency): renormalize lifecycle state [auto]
Updated by fix-consistency on 2026-06-26:
  - workplan status: proposed → active
2026-06-26 17:52:28 +02:00
61f278d643 feat(ACTIVITY-WP-0016-T02): strict bounded daily-triage output schema
Replace the accept-anything recommendations.items ({type: object}) with a strict
per-item contract (required [rank, candidate, action, why] + typed wsjf) and a
maxItems:7 hint. Strict item structure is what lets the T03 boundary parser
validate each recommendation independently and quarantine only malformed ones.

maxItems is a producer hint (prompt + llm-connect json_schema + T03 mitigation),
NOT a hard reject — a hard maxItems reject would discard a whole 16-item report,
the blast-radius bug WP-0016 removes. DEPLOY COUPLING: the strict schema is also
consumed by the current whole-doc validator, so it must ship with T03's per-item
quarantine parser; until then it increases whole-doc hard-fails. Prompt + max_tokens
headroom + NDJSON framing are documented as a runtime-bundle handoff.

Updated four tests to the strict contract; the forwarded-schema test now reads the
live schema file instead of hard-coding it. Full suite: 213 passed, 1 skipped.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-26 17:36:24 +02:00
0e9e18a59a chore(ACTIVITY-WP-0016-T01): record root-cause findings + partial failure fixture
Local analysis of the 2026-06-26 daily-triage validation failure: the unbounded
~1-recommendation-per-workstream list (16 active workstreams; JSON break at char
5268, ~rank 8-9) is the structural cause; both the first attempt and the retry
failed. The exact offending token and finish_reason are unrecoverable from
activity-core data — complete() drops finish_reason/usage, the report sink caps
raw output at 4000 chars (< 5268), and the log preview at 2000. Confirming the
exact token needs llm-connect producer-side logs on railiance01 (operator-owned);
mitigation (T02/T03) is identical regardless. Partial fixture captured.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-26 15:04:27 +02:00
5eb33bd3bb feat(ACTIVITY-WP-0016): register LLM output robustness & producer trust boundary workplan
Add WP-0016 to make the instruction-executor output contract robust after the
2026-06-26 daily-triage validation failure (one malformed delimiter discarded a
whole report). Per-item framing for error locality, verify-and-mitigate boundary
parsing with a quarantine lane, producer-trust-boundary guardrails (ADR-004), and
regression/calibration tests. Unblocks WP-0006-T03 / WP-0010-T04.

Also record the 06-26 recheck outcome (streak reset at two) in WP-0006-T03.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-26 14:39:21 +02:00