# Contract: AdaptiveRoutingPolicy **layer:** Functional **maturity:** Beta **module:** `llm_connect.routing` **since:** WP-0004 ## Purpose Select the cheapest adapter whose observed mean quality for a task type clears a caller-supplied quality floor. The policy builds on `RoutingPolicy`: static rules remain the cold-start and failure fallback, while adaptive selection is used only when the ledger has enough qualifying observations. ## Public surface ```python @dataclass class AdaptiveRoutingPolicy(RoutingPolicy): ledger: Optional[QualityLedger] = None adapters_by_id: Mapping[str, LLMAdapter] = field(default_factory=dict) window_size: int = 20 min_observations: int = 1 max_age: Optional[timedelta] = None def resolve( self, task_type: str, estimated_cost_per_1k: Optional[float] = None, *, quality_floor: Optional[float] = None, ) -> LLMAdapter: ... ``` ## Candidate identity Observations are keyed by `(task_type, adapter_id)`. Callers should pass `adapters_by_id` so the policy can map ledger observations back to concrete `LLMAdapter` instances. If a static rule adapter is not present in `adapters_by_id`, the policy also checks common string attributes `adapter_id`, `id`, and `name`. ## Invariants 1. If `quality_floor is None` or `ledger is None`, resolution is exactly the same as `RoutingPolicy.resolve()`. 2. `quality_floor` must be between `0` and `1`, inclusive. 3. Each candidate is evaluated over the newest `window_size` observations for the requested `task_type` and adapter id. 4. `max_age`, when provided, filters out observations older than that age. 5. A candidate is considered only when it has at least `min_observations` after filtering. 6. A candidate qualifies when its mean `quality_score` is greater than or equal to `quality_floor`. 7. Among qualifying candidates, the policy chooses the lowest mean observed `cost_usd`. 8. If mean observed cost ties exactly, the policy prefers the matching static rule's explicit `prefer` adapter. 9. If there are still ties, stable candidate order is used. 10. If no candidate qualifies, resolution falls through to `RoutingPolicy.resolve(task_type, estimated_cost_per_1k)`. ## Sample-size and freshness trade-off Small `window_size` values react quickly to model or prompt changes but can be noisy. Larger windows are more stable but may preserve stale behavior after a provider update or prompt template change. `min_observations` lets callers avoid acting on a single lucky sample, while `max_age` bounds how long old observations can influence routing. Callers that change prompts materially should also filter by a prompt fingerprint in observation tags before writing comparable samples to the same ledger regime. ## Error contract | Condition | Exception | |-----------|-----------| | `quality_floor` outside `0..1` | `ValueError` | | `window_size <= 0` | `ValueError` | | `min_observations <= 0` | `ValueError` | | `max_age < 0` | `ValueError` | | No qualifying adaptive candidate and no static fallback | `LookupError` | ## Non-goals The policy does not define a task taxonomy, set task quality floors, decide which baseline is authoritative, or perform billing-grade accounting. Those are consumer policy choices.