feat(infospace,llm): stabilize free-tier eval workflow

Five improvements that eliminate most of the agent-in-the-loop friction
observed while closing out the 988-entity WoN evaluation (C.1):

1. The Gemini adapter now retries on HTTP 429 and on 502/503/504 with
   exponential backoff (the same pattern the OpenRouter/OpenAI adapters
   already use). This removes the need for shell-level retry wrappers
   when hitting free-tier rate limits.

2. The evaluate CLI now prints the underlying error ("ERROR — HTTP 503 …")
   instead of a bare "ERROR", so agents no longer have to drop into
   Python to diagnose transient failures.

3. --entity/--chapter now respect existing evaluation files by default
   (previously only the full-collection pass did). A new --force flag
   opts into re-evaluation. This stops silently burning free-tier quota
   when the same slug is re-run.

4. --entity accepts hyphenated slugs (matching entity filenames) and
   normalizes them to the underscore form used on disk. On a miss, the
   CLI suggests near matches instead of a bare "not found".

5. eval-summary --update-metrics is no longer destructive:
   read_metrics_file/write_metrics_file now preserve structured values
   (type_distribution) and no longer flatten ints to floats. This fixes
   a silent data loss that was observed on every run.

Bonus: the evaluator field in the frontmatter of written evaluations now
falls back from run_config.model_name to the adapter's resolved model
(or the model echoed back in the API response), so rows no longer show
`evaluator: null` when --model is omitted.

Tests: the new tests/unit/llm/test_gemini.py covers the retry behavior;
tests/unit/infospace/test_history.py gains a round-trip test that pins
the type_distribution and int-preservation invariants.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
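The skip-unless-forced default from item 3 amounts to a filter over the requested slugs. This is a minimal sketch, not the CLI's code: it assumes, hypothetically, that each evaluation lives at `<eval_dir>/<slug>.md`, and `pending_slugs` is an illustrative helper name.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List


def pending_slugs(slugs: Iterable[str], eval_dir: Path, force: bool = False) -> List[str]:
    """Return the slugs that still need an evaluation run.

    Hypothetical layout: one evaluation file per slug at eval_dir/<slug>.md.
    Without force, slugs that already have a file are skipped so re-runs
    spend no API quota; with force, every slug is evaluated again.
    """
    return [s for s in slugs if force or not (eval_dir / f"{s}.md").exists()]


# Usage with a throwaway directory and illustrative slugs.
with tempfile.TemporaryDirectory() as tmp:
    eval_dir = Path(tmp)
    (eval_dir / "ada_lovelace.md").write_text("already evaluated\n")
    print(pending_slugs(["ada_lovelace", "alan_turing"], eval_dir))              # ['alan_turing']
    print(pending_slugs(["ada_lovelace", "alan_turing"], eval_dir, force=True))  # ['ada_lovelace', 'alan_turing']
```

Checking for the file before calling the API is what keeps an accidental re-run of the same slug free.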
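Item 4's slug handling can be sketched with the standard library's `difflib.get_close_matches`. This is an illustration under assumptions: the only normalization shown is the hyphen-to-underscore mapping described in item 4, `known` stands in for the on-disk entity slugs, and `cutoff=0.6` is difflib's default, not a value taken from the CLI. `normalize_slug` and `resolve_entity` are hypothetical names.

```python
import difflib
from typing import List, Optional, Tuple


def normalize_slug(slug: str) -> str:
    """Convert a filename-style slug (ada-lovelace) to the on-disk form (ada_lovelace)."""
    return slug.strip().replace("-", "_")


def resolve_entity(slug: str, known: List[str], n: int = 3) -> Tuple[Optional[str], List[str]]:
    """Return (match, []) on a hit, or (None, near_matches) on a miss."""
    key = normalize_slug(slug)
    if key in known:
        return key, []
    # Suggestions are ranked best-first by SequenceMatcher similarity.
    return None, difflib.get_close_matches(key, known, n=n, cutoff=0.6)


known = ["ada_lovelace", "alan_turing", "grace_hopper"]  # illustrative slugs
print(resolve_entity("ada-lovelace", known))  # ('ada_lovelace', [])

match, hints = resolve_entity("alan-turin", known)
if match is None:
    print(f"not found; did you mean: {', '.join(hints)}?")  # not found; did you mean: alan_turing?
```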
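The invariant behind item 5's round-trip test can be illustrated with a stand-in serializer. The real read_metrics_file/write_metrics_file format is not shown in this commit, so the sketch assumes JSON; the metric names and values are illustrative, and `lossy_read` only mimics the kind of flattening that was fixed. Note that a plain `==` comparison cannot catch int-to-float flattening, because `3 == 3.0` is true in Python, so the check also compares types.

```python
import json
from typing import Any, Dict


def write_metrics(metrics: Dict[str, Any]) -> str:
    # Stand-in serializer (assumed JSON): keeps ints, floats and nested dicts distinct.
    return json.dumps(metrics, sort_keys=True)


def read_metrics(text: str) -> Dict[str, Any]:
    return json.loads(text)


def lossy_read(text: str) -> Dict[str, Any]:
    # The old failure mode: numbers coerced to float, structured values dropped.
    return {k: float(v) for k, v in json.loads(text).items() if isinstance(v, (int, float))}


def same_values_and_types(a: Any, b: Any) -> bool:
    """Recursive equality that also distinguishes 3 from 3.0."""
    if type(a) is not type(b):
        return False
    if isinstance(a, dict):
        return a.keys() == b.keys() and all(same_values_and_types(a[k], b[k]) for k in a)
    return a == b


# Illustrative values, not real evaluation results.
metrics = {"entity_count": 3, "mean_score": 0.5, "type_distribution": {"person": 2, "place": 1}}
text = write_metrics(metrics)
print(same_values_and_types(read_metrics(text), metrics))          # True
print(lossy_read(text) == {"entity_count": 3, "mean_score": 0.5})  # True, even though data was lost
```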
2026-04-22 00:51:00 +02:00
parent 965508ec06
commit c0615c2d50
6 changed files with 210 additions and 27 deletions
--- a/markitect/llm/gemini.py
+++ b/markitect/llm/gemini.py
@@ -9,7 +9,11 @@ from markitect.llm.adapter import LLMAdapter
 from markitect.llm.models import RunConfig, LLMResponse
 from markitect.llm.config import resolve_api_key, find_project_root
 from markitect.llm._http import post_json
-from markitect.llm.exceptions import LLMConfigurationError
+from markitect.llm.exceptions import (
+    LLMConfigurationError,
+    LLMAPIError,
+    LLMRateLimitError,
+)

 _DEFAULT_MODEL = "gemini-2.5-flash"
 _API_BASE = "https://generativelanguage.googleapis.com/v1beta"
@@ -26,10 +30,12 @@ class GeminiAdapter(LLMAdapter):
        model: Optional[str] = None,
        api_key: Optional[str] = None,
        system_prompt: Optional[str] = None,
+        max_retries: int = 3,
        **_kwargs: Any,
    ):
        self._model = model or _DEFAULT_MODEL
        self._system_prompt = system_prompt
+        self._max_retries = max_retries

        root = find_project_root()
        key_file_paths = [root / "apikey-geminifree.txt"] if root else []
@@ -77,7 +83,7 @@ class GeminiAdapter(LLMAdapter):
        url = f"{_API_BASE}/models/{model}:generateContent?key={self._api_key}"

        start = time.time()
-        data = post_json(url, payload, timeout=config.timeout_seconds)
+        data = self._post_with_retries(url, payload, timeout=config.timeout_seconds)
        latency = time.time() - start

        # Parse Gemini response
@@ -113,3 +119,27 @@ class GeminiAdapter(LLMAdapter):
        if not (0.0 <= config.temperature <= 2.0):
            return False
        return True
+
+    # ── Internals ───────────────────────────────────────────────────
+
+    def _post_with_retries(
+        self,
+        url: str,
+        payload: Dict[str, Any],
+        timeout: int,
+    ) -> Dict[str, Any]:
+        last_exc: Optional[Exception] = None
+        for attempt in range(self._max_retries + 1):
+            try:
+                return post_json(url, payload, timeout=timeout)
+            except LLMRateLimitError as exc:
+                last_exc = exc
+                if attempt < self._max_retries:
+                    time.sleep(2 ** attempt)
+            except LLMAPIError as exc:
+                if exc.status_code in (502, 503, 504) and attempt < self._max_retries:
+                    last_exc = exc
+                    time.sleep(2 ** attempt)
+                else:
+                    raise
+        raise last_exc  # type: ignore[misc]
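The backoff schedule in `_post_with_retries` is easier to see in isolation. The sketch below reproduces equivalent behavior outside the adapter, using stand-in exception classes (the real `LLMAPIError`/`LLMRateLimitError` constructors are not part of this diff) and an injectable `sleep` so the delays can be observed. With the default `max_retries=3`, a request is attempted at most four times, sleeping 1 s, 2 s and 4 s in between; any other status, including a 500, is re-raised immediately.

```python
import time
from typing import Any, Callable, Dict, List, Optional


class APIError(Exception):
    """Stand-in for LLMAPIError: only the HTTP status code is modeled."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


class RateLimitError(APIError):
    """Stand-in for LLMRateLimitError, modeled here as an HTTP 429."""

    def __init__(self) -> None:
        super().__init__(429)


RETRYABLE_STATUS = (502, 503, 504)


def post_with_retries(
    post: Callable[[], Dict[str, Any]],
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call post(), retrying rate limits and 502/503/504 with 1s, 2s, 4s... backoff."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    last_exc: Optional[Exception] = None
    for attempt in range(max_retries + 1):
        try:
            return post()
        except RateLimitError as exc:  # must precede APIError: it is a subclass
            last_exc = exc
        except APIError as exc:
            if exc.status_code not in RETRYABLE_STATUS:
                raise  # e.g. 400, 401 or 500: fail fast
            last_exc = exc
        if attempt < max_retries:
            sleep(2 ** attempt)  # no sleep after the final attempt
    assert last_exc is not None  # the loop only ends after a retryable failure
    raise last_exc


def make_flaky(failures: List[Exception]) -> Callable[[], Dict[str, Any]]:
    """Build a fake post() that raises each queued error once, then succeeds."""
    queue = list(failures)

    def post() -> Dict[str, Any]:
        if queue:
            raise queue.pop(0)
        return {"ok": True}

    return post


delays: List[float] = []
result = post_with_retries(make_flaky([APIError(503), RateLimitError()]), sleep=delays.append)
print(result, delays)  # {'ok': True} [1, 2]
```

Injecting `sleep` is also what lets a unit test exercise the retry path without actually waiting.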