Add adaptive cost-quality routing primitives

2026-05-17 21:32:27 +02:00
parent bf86a03c5d
commit c4ad4bb9f2
17 changed files with 2480 additions and 25 deletions
--- a/contracts/functional/shadowing-adapter.md
+++ b/contracts/functional/shadowing-adapter.md
@@ -0,0 +1,84 @@
+# Contract: ShadowingAdapter
+
+**layer:** Functional
+**maturity:** Beta
+**module:** `llm_connect.shadowing`
+**since:** WP-0004
+
+## Purpose
+
+Collect quality observations without changing caller-visible model behavior.
+`ShadowingAdapter` wraps a candidate adapter, returns the candidate response to
+the caller, and, for a sampled share of calls, runs baseline and grading work
+that appends `QualityObservation` records to a `QualityLedger`.
+
+## Public surface
+
+```python
+@dataclass
+class ShadowingAdapter(LLMAdapter):
+    candidate_adapter: LLMAdapter
+    baseline_adapter: LLMAdapter
+    grader: BaselineGrader
+    ledger: QualityLedger
+    task_type: str
+    adapter_id: str
+    model_id: Optional[str] = None
+    baseline_adapter_id: Optional[str] = None
+    shadow_rate: float = 1.0
+    async_shadow: bool = False
+    tags: Mapping[str, Any] = field(default_factory=dict)
+    on_shadow_error: Optional[Callable[[Exception], None]] = None
+
+    def execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse: ...
+    async def async_execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse: ...
+    def flush(self, timeout: Optional[float] = None) -> None: ...
+    def shutdown(self, wait: bool = True) -> None: ...
+```
+
+## Invariants
+
+1. The candidate adapter is always called first.
+2. The response returned by `execute_prompt()` and `async_execute_prompt()` is
+   always the candidate response.
+3. Shadow failures from the baseline adapter, grader, or ledger writer are
+   isolated from the caller. They are sent to `on_shadow_error` when configured.
+4. `shadow_rate=0.0` records no observations; `1.0` shadows every successful
+   candidate call. Values in between sample per call using `random_source`.
+5. Shadow grading reuses the candidate response already returned by the wrapped
+   candidate adapter; it does not make a second candidate model call.
+6. Shadow calls use a copy of `RunConfig` with `budget_tracker=None`, so
+   observation collection cannot consume the caller's foreground token budget.
+7. `async_shadow=True` schedules shadow work on a background thread. `flush()`
+   waits for currently queued work, and `shutdown()` releases the executor.
+
+## Observation mapping
+
+The appended observation uses:
+
+- `task_type` from the wrapper configuration
+- `adapter_id` from the wrapper configuration
+- `model_id` from the wrapper configuration, then candidate response model, then
+  `RunConfig.model_name`
+- `quality_score` from the `GradingResult`
+- `cost_usd` from response metadata keys `cost_usd`, `estimated_cost_usd`, or
+  `cost`, falling back to `0.0`
+- token counts from candidate response usage keys `prompt_tokens` and
+  `completion_tokens`
+- `baseline_adapter_id` and `tags` from wrapper configuration
+
+## Error contract
+
+| Condition | Exception |
+|-----------|-----------|
+| Empty `task_type` | `ValueError` |
+| Empty `adapter_id` | `ValueError` |
+| `shadow_rate` outside `[0.0, 1.0]` | `ValueError` |
+| Candidate adapter failure | Original exception propagates |
+| Shadow baseline/grading/ledger failure | Suppressed; passed to `on_shadow_error` if set |
+
+## Privacy note
+
+The wrapper does not store prompt or response text in the ledger by default.
+Callers that need regime tracking should store non-sensitive fingerprints in
+`tags`, for example `prompt_fingerprint` or `template_version`.
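
Invariants 1 to 4 of the contract above can be illustrated with a minimal sketch. This is not the `llm_connect.shadowing` implementation: `SketchShadower` is a hypothetical stand-in, adapters and the grader are reduced to plain callables, a list stands in for the `QualityLedger`, and the `random_source` field is assumed to be a `random.Random` instance.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class SketchShadower:
    """Hypothetical stand-in for ShadowingAdapter (illustration only)."""

    candidate: Callable[[str], str]          # assumed: prompt -> response text
    baseline: Callable[[str], str]           # assumed: prompt -> response text
    grade: Callable[[str, str], float]       # assumed: (candidate, baseline) -> score
    shadow_rate: float = 1.0
    on_shadow_error: Optional[Callable[[Exception], None]] = None
    random_source: random.Random = field(default_factory=random.Random)
    observations: List[float] = field(default_factory=list)  # stands in for the ledger

    def __post_init__(self) -> None:
        # Error contract: reject rates outside the closed interval [0.0, 1.0].
        if not 0.0 <= self.shadow_rate <= 1.0:
            raise ValueError("shadow_rate must be within 0..1")

    def execute_prompt(self, prompt: str) -> str:
        # Invariant 1: the candidate runs first; its exceptions propagate.
        response = self.candidate(prompt)

        # Invariant 4: random() is in [0.0, 1.0), so a rate of 0.0 never
        # shadows and a rate of 1.0 always does.
        if self.random_source.random() < self.shadow_rate:
            try:
                baseline_text = self.baseline(prompt)
                self.observations.append(self.grade(response, baseline_text))
            except Exception as exc:
                # Invariant 3: shadow failures never reach the caller.
                if self.on_shadow_error is not None:
                    self.on_shadow_error(exc)

        # Invariant 2: the caller always receives the candidate response.
        return response


def failing_baseline(prompt: str) -> str:
    raise RuntimeError("baseline unavailable")


errors: List[Exception] = []
adapter = SketchShadower(
    candidate=lambda p: p.upper(),
    baseline=failing_baseline,
    grade=lambda cand, base: 1.0,
    on_shadow_error=errors.append,
)
result = adapter.execute_prompt("hi")  # baseline fails, caller is unaffected
```

In this sketch the failing baseline produces no observation and one callback invocation, while the caller still gets the candidate output. The asynchronous path, budget isolation, and observation mapping described in the contract are deliberately left out.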