Add adaptive cost-quality routing primitives

2026-05-17 21:32:27 +02:00
parent bf86a03c5d
commit c4ad4bb9f2
17 changed files with 2480 additions and 25 deletions
--- a/contracts/functional/adaptive-routing-policy.md
+++ b/contracts/functional/adaptive-routing-policy.md
@@ -0,0 +1,87 @@
+# Contract: AdaptiveRoutingPolicy
+
+**layer:** Functional
+**maturity:** Beta
+**module:** `llm_connect.routing`
+**since:** WP-0004
+
+## Purpose
+
+Select the cheapest adapter whose observed mean quality for a task type clears
+a caller-supplied quality floor. The policy builds on `RoutingPolicy`: static
+rules remain the cold-start and failure fallback, while adaptive selection is
+used only when the ledger has enough qualifying observations.
+
+## Public surface
+
+```python
+@dataclass
+class AdaptiveRoutingPolicy(RoutingPolicy):
+    ledger: Optional[QualityLedger] = None
+    adapters_by_id: Mapping[str, LLMAdapter] = field(default_factory=dict)
+    window_size: int = 20
+    min_observations: int = 1
+    max_age: Optional[timedelta] = None
+
+    def resolve(
+        self,
+        task_type: str,
+        estimated_cost_per_1k: Optional[float] = None,
+        *,
+        quality_floor: Optional[float] = None,
+    ) -> LLMAdapter: ...
+```
+
+## Candidate identity
+
+Observations are keyed by `(task_type, adapter_id)`. Callers should pass
+`adapters_by_id` so the policy can map ledger observations back to concrete
+`LLMAdapter` instances. If a static rule adapter is not present in
+`adapters_by_id`, the policy also checks common string attributes
+`adapter_id`, `id`, and `name`.
+
+## Invariants
+
+1. If `quality_floor is None` or `ledger is None`, resolution is exactly the
+   same as `RoutingPolicy.resolve()`.
+2. `quality_floor` must be between `0` and `1`, inclusive.
+3. Each candidate is evaluated over the newest `window_size` observations for
+   the requested `task_type` and adapter id.
+4. `max_age`, when provided, filters out observations older than that age.
+5. A candidate is considered only when it has at least `min_observations` after
+   filtering.
+6. A candidate qualifies when its mean `quality_score` is greater than or equal
+   to `quality_floor`.
+7. Among qualifying candidates, the policy chooses the lowest mean observed
+   `cost_usd`.
+8. If mean observed cost ties exactly, the policy prefers the matching static
+   rule's explicit `prefer` adapter.
+9. If there are still ties, stable candidate order is used.
+10. If no candidate qualifies, resolution falls through to
+    `RoutingPolicy.resolve(task_type, estimated_cost_per_1k)`.
+
+## Sample-size and freshness trade-off
+
+Small `window_size` values react quickly to model or prompt changes but can be
+noisy. Larger windows are more stable but may preserve stale behavior after a
+provider update or prompt template change. `min_observations` lets callers avoid
+acting on a single lucky sample, while `max_age` bounds how long old observations
+can influence routing. Callers that change prompts materially should also filter
+by a prompt fingerprint in observation tags before writing comparable samples to
+the same ledger regime.
+
+## Error contract
+
+| Condition | Exception |
+|-----------|-----------|
+| `quality_floor` outside `0..1` | `ValueError` |
+| `window_size <= 0` | `ValueError` |
+| `min_observations <= 0` | `ValueError` |
+| `max_age < 0` | `ValueError` |
+| No qualifying adaptive candidate and no static fallback | `LookupError` |
+
+## Non-goals
+
+The policy does not define a task taxonomy, set task quality floors, decide
+which baseline is authoritative, or perform billing-grade accounting. Those are
+consumer policy choices.