Make Temporal activity timeout env-configurable (ADHOC-2026-06-01-T03)

The CUST-WP-0045 daily triage canary on 2026-06-01 hit a BrokenPipeError
on the llm-connect side. Two 5-minute timeouts were racing:

- _ACTIVITY_TIMEOUT = timedelta(minutes=5) in workflows.py
- LLM_CONNECT_TIMEOUT_SECONDS default 300 in llm_client.py

The 10KB curated digest + max_depth:2 + JSON schema enforcement pushed
Claude past 5 minutes. Whichever timer fired first killed the httpx call,
so the model's late response arrived at a closed socket.

Read _ACTIVITY_TIMEOUT from the ACTIVITY_TIMEOUT_SECONDS env var (default
900, i.e. 15 minutes) so judgement-call activities have headroom for slow
LLM runs. Operators should also raise the httpx timeout via
LLM_CONNECT_TIMEOUT_SECONDS=840 so httpx still times out 60 seconds
before Temporal, preserving the clean-error contract.
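
The ordering requirement above (httpx must fire before Temporal) could be
checked at startup. The following is a minimal sketch, not part of this
commit: the helper name load_timeouts and the ValueError check are
hypothetical. Only the two env var names and their defaults (900 and 300)
come from the message above.

```python
import os
from datetime import timedelta


def load_timeouts(env=None):
    """Return (activity_timeout, http_timeout) read from the environment.

    Hypothetical helper: raises ValueError unless the HTTP client timeout
    is strictly shorter than the Temporal activity timeout.
    """
    env = os.environ if env is None else env
    activity = timedelta(seconds=int(env.get("ACTIVITY_TIMEOUT_SECONDS", "900")))
    http = timedelta(seconds=int(env.get("LLM_CONNECT_TIMEOUT_SECONDS", "300")))
    if http >= activity:
        # If both timers can fire at the same moment, the race described
        # above comes back, so refuse to start with such a configuration.
        raise ValueError(
            "LLM_CONNECT_TIMEOUT_SECONDS must be less than ACTIVITY_TIMEOUT_SECONDS"
        )
    return activity, http
```

With ACTIVITY_TIMEOUT_SECONDS unset and LLM_CONNECT_TIMEOUT_SECONDS=840,
this returns a 900-second activity timeout and an 840-second HTTP timeout.
Setting both to 300 (the pre-change values) raises ValueError.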

Tests: 120 passed, 1 skipped.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-02 08:10:24 +02:00
parent a8d3cc2782
commit c79d0980a9
2 changed files with 34 additions and 1 deletions

@@ -11,6 +11,7 @@ Workflow IDs follow the conventions in docs/conventions.md:
 from __future__ import annotations
+import os
 import uuid
 from datetime import timedelta
@@ -42,7 +43,9 @@ _RETRY_POLICY = RetryPolicy(
     maximum_attempts=10,
 )
-_ACTIVITY_TIMEOUT = timedelta(minutes=5)
+_ACTIVITY_TIMEOUT = timedelta(
+    seconds=int(os.environ.get("ACTIVITY_TIMEOUT_SECONDS", "900"))
+)
 _TASK_QUEUE = "task-execution-tq"