feat(infospace,llm): stabilize free-tier eval workflow
Some checks failed
Test Suite / code-quality (push) Has been cancelled
Test Suite / security-scan (push) Has been cancelled
Test Suite / unit-tests (3.11) (push) Has been cancelled
Test Suite / unit-tests (3.12) (push) Has been cancelled
Test Suite / integration-tests (push) Has been cancelled
Test Suite / e2e-tests (push) Has been cancelled
Test Suite / performance-tests (push) Has been cancelled
Test Suite / test-summary (push) Has been cancelled
Some checks failed
Test Suite / code-quality (push) Has been cancelled
Test Suite / security-scan (push) Has been cancelled
Test Suite / unit-tests (3.11) (push) Has been cancelled
Test Suite / unit-tests (3.12) (push) Has been cancelled
Test Suite / integration-tests (push) Has been cancelled
Test Suite / e2e-tests (push) Has been cancelled
Test Suite / performance-tests (push) Has been cancelled
Test Suite / test-summary (push) Has been cancelled
Five improvements that eliminate most of the agent-in-the-loop friction
observed while closing out the 988-entity WoN evaluation (C.1):
1. Gemini adapter now retries on 429 + 5xx with exponential backoff
(same pattern already used by OpenRouter/OpenAI). Removes the need
for shell-level retry wrappers when hitting free-tier rate limits.
2. evaluate CLI prints the underlying error ("ERROR — HTTP 503 …")
instead of a bare "ERROR", so agents don't have to drop into Python
to diagnose transient failures.
3. --entity/--chapter now respect existing evaluation files by default
(previously only the full-collection pass did). New --force flag
opts into re-evaluation. Stops silently burning free-tier quota on
re-runs of the same slug.
4. --entity accepts hyphenated slugs (matching entity filenames) and
normalizes them to the underscore form used on disk. On a miss the
CLI suggests near matches instead of a bare "not found".
5. eval-summary --update-metrics is no longer destructive:
read_metrics_file/write_metrics_file preserve structured values
(type_distribution) and don't flatten ints to floats. Fixes a
silent data loss observed on every run.
Bonus: the evaluator field in written evaluation frontmatter now
falls back from run_config.model_name to the adapter's resolved model
(or the model echoed back in the API response), so rows no longer
show `evaluator: null` when --model is omitted.
Tests: new tests/unit/llm/test_gemini.py covers retry behavior;
tests/unit/infospace/test_history.py gains a round-trip test that
pins the type_distribution / int-preservation invariants.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
@@ -9,7 +9,11 @@ from markitect.llm.adapter import LLMAdapter
|
||||
from markitect.llm.models import RunConfig, LLMResponse
|
||||
from markitect.llm.config import resolve_api_key, find_project_root
|
||||
from markitect.llm._http import post_json
|
||||
from markitect.llm.exceptions import LLMConfigurationError
|
||||
from markitect.llm.exceptions import (
|
||||
LLMConfigurationError,
|
||||
LLMAPIError,
|
||||
LLMRateLimitError,
|
||||
)
|
||||
|
||||
_DEFAULT_MODEL = "gemini-2.5-flash"
|
||||
_API_BASE = "https://generativelanguage.googleapis.com/v1beta"
|
||||
@@ -26,10 +30,12 @@ class GeminiAdapter(LLMAdapter):
|
||||
model: Optional[str] = None,
|
||||
api_key: Optional[str] = None,
|
||||
system_prompt: Optional[str] = None,
|
||||
max_retries: int = 3,
|
||||
**_kwargs: Any,
|
||||
):
|
||||
self._model = model or _DEFAULT_MODEL
|
||||
self._system_prompt = system_prompt
|
||||
self._max_retries = max_retries
|
||||
|
||||
root = find_project_root()
|
||||
key_file_paths = [root / "apikey-geminifree.txt"] if root else []
|
||||
@@ -77,7 +83,7 @@ class GeminiAdapter(LLMAdapter):
|
||||
url = f"{_API_BASE}/models/{model}:generateContent?key={self._api_key}"
|
||||
|
||||
start = time.time()
|
||||
data = post_json(url, payload, timeout=config.timeout_seconds)
|
||||
data = self._post_with_retries(url, payload, timeout=config.timeout_seconds)
|
||||
latency = time.time() - start
|
||||
|
||||
# Parse Gemini response
|
||||
@@ -113,3 +119,27 @@ class GeminiAdapter(LLMAdapter):
|
||||
if not (0.0 <= config.temperature <= 2.0):
|
||||
return False
|
||||
return True
|
||||
|
||||
# ── Internals ───────────────────────────────────────────────────
|
||||
|
||||
def _post_with_retries(
|
||||
self,
|
||||
url: str,
|
||||
payload: Dict[str, Any],
|
||||
timeout: int,
|
||||
) -> Dict[str, Any]:
|
||||
last_exc: Optional[Exception] = None
|
||||
for attempt in range(self._max_retries + 1):
|
||||
try:
|
||||
return post_json(url, payload, timeout=timeout)
|
||||
except LLMRateLimitError as exc:
|
||||
last_exc = exc
|
||||
if attempt < self._max_retries:
|
||||
time.sleep(2 ** attempt)
|
||||
except LLMAPIError as exc:
|
||||
if exc.status_code in (502, 503, 504) and attempt < self._max_retries:
|
||||
last_exc = exc
|
||||
time.sleep(2 ** attempt)
|
||||
else:
|
||||
raise
|
||||
raise last_exc # type: ignore[misc]
|
||||
|
||||
Reference in New Issue
Block a user