llm-connect/contracts/functional/adaptive-routing-policy.md

# Contract: AdaptiveRoutingPolicy

**layer:** Functional
**maturity:** Beta
**module:** `llm_connect.routing`
**since:** WP-0004

## Purpose

Select the cheapest adapter whose observed mean quality for a task type clears
a caller-supplied quality floor. The policy builds on `RoutingPolicy`: static
rules remain the cold-start and failure fallback, while adaptive selection is
used only when the ledger has enough qualifying observations.

## Public surface

```python
@dataclass
class AdaptiveRoutingPolicy(RoutingPolicy):
    ledger: Optional[QualityLedger] = None
    adapters_by_id: Mapping[str, LLMAdapter] = field(default_factory=dict)
    window_size: int = 20
    min_observations: int = 1
    max_age: Optional[timedelta] = None

    def resolve(
        self,
        task_type: str,
        estimated_cost_per_1k: Optional[float] = None,
        *,
        quality_floor: Optional[float] = None,
    ) -> LLMAdapter: ...
```

## Candidate identity

Observations are keyed by `(task_type, adapter_id)`. Callers should pass
`adapters_by_id` so the policy can map ledger observations back to concrete
`LLMAdapter` instances. If a static rule adapter is not present in
`adapters_by_id`, the policy falls back to reading its id from the common
string attributes `adapter_id`, `id`, and `name`.
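
The lookup described above can be sketched as follows. This is a minimal
illustration, not the module's implementation: `resolve_adapter_id` is a
hypothetical helper name, and trying the attributes in the order listed is an
assumption.

```python
from typing import Any, Mapping, Optional


def resolve_adapter_id(
    adapter: Any, adapters_by_id: Mapping[str, Any]
) -> Optional[str]:
    """Hypothetical helper: find the ledger id for an adapter instance."""
    # Prefer the explicit mapping, matching on object identity.
    for adapter_id, candidate in adapters_by_id.items():
        if candidate is adapter:
            return adapter_id
    # Otherwise fall back to common string attributes.
    # Assumption: they are tried in the order the contract lists them.
    for attr in ("adapter_id", "id", "name"):
        value = getattr(adapter, attr, None)
        if isinstance(value, str) and value:
            return value
    # No usable id: the adapter cannot be matched to ledger observations.
    return None
```

An adapter that resolves to no id cannot take part in adaptive selection, which
is one reason to pass `adapters_by_id` explicitly.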

## Invariants

1. If `quality_floor is None` or `ledger is None`, resolution is exactly the
   same as `RoutingPolicy.resolve()`.
2. `quality_floor` must be between `0` and `1`, inclusive.
3. Each candidate is evaluated over the newest `window_size` observations for
   the requested `task_type` and adapter id.
4. `max_age`, when provided, filters out observations older than that age.
5. A candidate is considered only when it has at least `min_observations` after
   filtering.
6. A candidate qualifies when its mean `quality_score` is greater than or equal
   to `quality_floor`.
7. Among qualifying candidates, the policy chooses the lowest mean observed
   `cost_usd`.
8. If mean observed cost ties exactly, the policy prefers the matching static
   rule's explicit `prefer` adapter.
9. If there are still ties, stable candidate order is used.
10. If no candidate qualifies, resolution falls through to
    `RoutingPolicy.resolve(task_type, estimated_cost_per_1k)`.
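
Invariants 3 to 9 can be read as one selection pass. The sketch below is
illustrative only: `Observation`, `pick_adapter_id`, `preferred_id`, and
`observed_at` are hypothetical names, and applying the window before the age
filter is an assumption taken from the order of the invariants. Returning
`None` stands in for the fall-through of invariant 10.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from statistics import fmean
from typing import Optional, Sequence


@dataclass(frozen=True)
class Observation:
    """Hypothetical ledger record; the real ledger entry may differ."""
    task_type: str
    adapter_id: str
    quality_score: float
    cost_usd: float
    observed_at: datetime


def pick_adapter_id(
    observations: Sequence[Observation],
    candidate_ids: Sequence[str],
    task_type: str,
    quality_floor: float,
    *,
    window_size: int = 20,
    min_observations: int = 1,
    max_age: Optional[timedelta] = None,
    preferred_id: Optional[str] = None,
    now: Optional[datetime] = None,
) -> Optional[str]:
    """Return the winning adapter id, or None to signal static fallback."""
    if now is None:
        now = datetime.now(timezone.utc)
    best_key, best_id = None, None
    for order, adapter_id in enumerate(candidate_ids):
        # Invariant 3: newest `window_size` observations for this pair.
        rows = sorted(
            (o for o in observations
             if o.task_type == task_type and o.adapter_id == adapter_id),
            key=lambda o: o.observed_at,
            reverse=True,
        )[:window_size]
        # Invariant 4 (assumed to apply after windowing): drop stale rows.
        if max_age is not None:
            rows = [o for o in rows if now - o.observed_at <= max_age]
        # Invariant 5: require enough observations after filtering.
        if len(rows) < max(min_observations, 1):
            continue
        # Invariant 6: mean quality must clear the floor.
        if fmean([o.quality_score for o in rows]) < quality_floor:
            continue
        # Invariants 7-9: lowest mean cost, then the static rule's preferred
        # adapter, then stable candidate order.
        key = (
            fmean([o.cost_usd for o in rows]),
            adapter_id != preferred_id,
            order,
        )
        if best_key is None or key < best_key:
            best_key, best_id = key, adapter_id
    return best_id
```

Encoding the tie-breakers as a sort key keeps the ordering rules in one place:
`False` sorts before `True`, so the preferred adapter wins an exact cost tie
before candidate order is consulted.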

## Sample-size and freshness trade-off

Small `window_size` values react quickly to model or prompt changes but can be
noisy. Larger windows are more stable but may preserve stale behavior after a
provider update or prompt template change. `min_observations` lets callers avoid
acting on a single lucky sample, while `max_age` bounds how long old observations
can influence routing. Callers that change prompts materially should also tag
observations with a prompt fingerprint and filter on it, so that samples from
different prompt versions are not treated as comparable within the same ledger
regime.
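
A small worked example makes the window trade-off concrete. The scores and the
`windowed_mean` helper are made up for illustration: an adapter scores 1.0 for
sixteen calls, then regresses to 0.5 for the four most recent ones.

```python
from statistics import fmean


def windowed_mean(scores_oldest_first, window_size):
    """Mean of the newest `window_size` scores (illustrative helper)."""
    return fmean(scores_oldest_first[-window_size:])


# Sixteen good samples followed by four poor ones after a regression.
scores = [1.0] * 16 + [0.5] * 4

recent = windowed_mean(scores, window_size=4)    # 0.5: reflects the regression
smooth = windowed_mean(scores, window_size=20)   # 0.9: still dominated by old samples
```

With a `quality_floor` of `0.8`, the four-sample window disqualifies the adapter
immediately, while the twenty-sample window keeps it qualified until more poor
samples accumulate.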

## Error contract

| Condition | Exception |
|-----------|-----------|
| `quality_floor` outside `0..1` | `ValueError` |
| `window_size <= 0` | `ValueError` |
| `min_observations <= 0` | `ValueError` |
| `max_age < timedelta(0)` | `ValueError` |
| No qualifying adaptive candidate and no static fallback | `LookupError` |
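
The `ValueError` rows above could be checked along these lines. This is a
sketch under assumptions: `validate_adaptive_settings` is a hypothetical
function, and the contract does not say whether the real policy validates at
construction time or inside `resolve()`.

```python
from datetime import timedelta
from typing import Optional


def validate_adaptive_settings(
    *,
    quality_floor: Optional[float],
    window_size: int,
    min_observations: int,
    max_age: Optional[timedelta],
) -> None:
    """Hypothetical validator mirroring the ValueError rows of the table."""
    # `None` means "no adaptive selection", so only check a supplied floor.
    # The chained comparison also rejects NaN.
    if quality_floor is not None and not (0.0 <= quality_floor <= 1.0):
        raise ValueError(f"quality_floor must be within 0..1, got {quality_floor!r}")
    if window_size <= 0:
        raise ValueError(f"window_size must be positive, got {window_size!r}")
    if min_observations <= 0:
        raise ValueError(f"min_observations must be positive, got {min_observations!r}")
    # timedelta does not compare against a bare int, so compare with timedelta(0).
    if max_age is not None and max_age < timedelta(0):
        raise ValueError(f"max_age must not be negative, got {max_age!r}")
```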

## Non-goals

The policy does not define a task taxonomy, set task quality floors, decide
which baseline is authoritative, or perform billing-grade accounting. Those are
consumer policy choices.