generated from coulomb/repo-seed
Make Temporal activity timeout env-configurable (ADHOC-2026-06-01-T03)
The CUST-WP-0045 daily triage canary on 2026-06-01 hit a BrokenPipeError on the llm-connect side. Two 5-minute timeouts were racing:

- _ACTIVITY_TIMEOUT = timedelta(minutes=5) in workflows.py
- LLM_CONNECT_TIMEOUT_SECONDS default 300 in llm_client.py

The 10KB curated digest + max_depth:2 + JSON schema enforcement pushed Claude past 5 minutes. Whichever timer fired first killed the httpx call; the model's late response arrived to a closed socket.

Read _ACTIVITY_TIMEOUT from ACTIVITY_TIMEOUT_SECONDS env (default 900, i.e. 15 minutes) so judgement-call activities have headroom for slow LLM runs. Operators should also widen httpx via LLM_CONNECT_TIMEOUT_SECONDS=840 so httpx still times out slightly before Temporal, preserving the clean-error contract.

Tests: 120 passed, 1 skipped.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
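The ordering described above (httpx gives up slightly before Temporal does) could be verified once at worker startup. The following is a minimal sketch, assuming both values come from the environment variables named in this message; the helper name `check_timeout_ordering` and the 60-second margin are hypothetical and not part of this commit.

```python
import os


def check_timeout_ordering(env=None, min_margin_seconds=60):
    """Return (activity_seconds, llm_seconds) or raise if httpx would not fire first.

    Hypothetical helper: the name and the margin are illustrative assumptions.
    """
    env = os.environ if env is None else env
    # Defaults mirror the commit message: 900 s for Temporal, 300 s for httpx.
    activity = int(env.get("ACTIVITY_TIMEOUT_SECONDS", "900"))
    llm = int(env.get("LLM_CONNECT_TIMEOUT_SECONDS", "300"))
    # httpx must time out at least `min_margin_seconds` before the activity does,
    # otherwise Temporal may kill the activity before a clean client error surfaces.
    if llm + min_margin_seconds > activity:
        raise ValueError(
            f"LLM_CONNECT_TIMEOUT_SECONDS={llm} leaves less than "
            f"{min_margin_seconds}s before ACTIVITY_TIMEOUT_SECONDS={activity}"
        )
    return activity, llm
```

With the values recommended here (900 and 840) the check passes with exactly 60 seconds of margin; the previous 300/300 pairing would be rejected.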
@@ -11,6 +11,7 @@ Workflow IDs follow the conventions in docs/conventions.md:
 
 from __future__ import annotations
 
+import os
 import uuid
 from datetime import timedelta
 
@@ -42,7 +43,9 @@ _RETRY_POLICY = RetryPolicy(
     maximum_attempts=10,
 )
 
-_ACTIVITY_TIMEOUT = timedelta(minutes=5)
+_ACTIVITY_TIMEOUT = timedelta(
+    seconds=int(os.environ.get("ACTIVITY_TIMEOUT_SECONDS", "900"))
+)
 _TASK_QUEUE = "task-execution-tq"
 
 
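The change parses ACTIVITY_TIMEOUT_SECONDS with a bare int() at import time, so a malformed value would surface as a generic ValueError when the module loads. A possible stricter variant is sketched below; the function name `activity_timeout_from_env` and the rejection of non-positive values are assumptions for illustration, not what this commit does.

```python
import os
from datetime import timedelta


def activity_timeout_from_env(env=None, default_seconds=900):
    """Parse ACTIVITY_TIMEOUT_SECONDS into a timedelta with explicit error messages.

    Hypothetical helper; the commit itself inlines int(os.environ.get(...)).
    """
    env = os.environ if env is None else env
    raw = env.get("ACTIVITY_TIMEOUT_SECONDS", str(default_seconds))
    try:
        seconds = int(raw)
    except ValueError:
        # Name the variable in the error so operators can see what to fix.
        raise ValueError(
            f"ACTIVITY_TIMEOUT_SECONDS must be an integer, got {raw!r}"
        ) from None
    if seconds <= 0:
        raise ValueError(
            f"ACTIVITY_TIMEOUT_SECONDS must be positive, got {seconds}"
        )
    return timedelta(seconds=seconds)
```

Under these assumptions the module-level assignment would become `_ACTIVITY_TIMEOUT = activity_timeout_from_env()`, with the same 15-minute default when the variable is unset.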