Files
llm-connect/contracts/functional/adaptive-routing-policy.md
tegwick c4ad4bb9f2
Some checks failed
CI / test (3.10) (push) Has been cancelled
CI / test (3.11) (push) Has been cancelled
CI / test (3.12) (push) Has been cancelled
Add adaptive cost-quality routing primitives
2026-05-17 21:32:27 +02:00

88 lines
3.2 KiB
Markdown

# Contract: AdaptiveRoutingPolicy
**layer:** Functional
**maturity:** Beta
**module:** `llm_connect.routing`
**since:** WP-0004
## Purpose
Select the cheapest adapter whose observed mean quality for a task type clears
a caller-supplied quality floor. The policy builds on `RoutingPolicy`: static
rules remain the cold-start and failure fallback, while adaptive selection is
used only when the ledger has enough qualifying observations.
## Public surface
```python
@dataclass
class AdaptiveRoutingPolicy(RoutingPolicy):
ledger: Optional[QualityLedger] = None
adapters_by_id: Mapping[str, LLMAdapter] = field(default_factory=dict)
window_size: int = 20
min_observations: int = 1
max_age: Optional[timedelta] = None
def resolve(
self,
task_type: str,
estimated_cost_per_1k: Optional[float] = None,
*,
quality_floor: Optional[float] = None,
) -> LLMAdapter: ...
```
## Candidate identity
Observations are keyed by `(task_type, adapter_id)`. Callers should pass
`adapters_by_id` so the policy can map ledger observations back to concrete
`LLMAdapter` instances. If a static rule adapter is not present in
`adapters_by_id`, the policy also checks common string attributes
`adapter_id`, `id`, and `name`.
## Invariants
1. If `quality_floor is None` or `ledger is None`, resolution is exactly the
same as `RoutingPolicy.resolve()`.
2. `quality_floor` must be between `0` and `1`, inclusive.
3. Each candidate is evaluated over the newest `window_size` observations for
the requested `task_type` and adapter id.
4. `max_age`, when provided, filters out observations older than that age.
5. A candidate is considered only when it has at least `min_observations` after
filtering.
6. A candidate qualifies when its mean `quality_score` is greater than or equal
to `quality_floor`.
7. Among qualifying candidates, the policy chooses the lowest mean observed
`cost_usd`.
8. If mean observed cost ties exactly, the policy prefers the matching static
rule's explicit `prefer` adapter.
9. If there are still ties, stable candidate order is used.
10. If no candidate qualifies, resolution falls through to
`RoutingPolicy.resolve(task_type, estimated_cost_per_1k)`.
## Sample-size and freshness trade-off
Small `window_size` values react quickly to model or prompt changes but can be
noisy. Larger windows are more stable but may preserve stale behavior after a
provider update or prompt template change. `min_observations` lets callers avoid
acting on a single lucky sample, while `max_age` bounds how long old observations
can influence routing. Callers that change prompts materially should also filter
by a prompt fingerprint in observation tags before writing comparable samples to
the same ledger regime.
## Error contract
| Condition | Exception |
|-----------|-----------|
| `quality_floor` outside `0..1` | `ValueError` |
| `window_size <= 0` | `ValueError` |
| `min_observations <= 0` | `ValueError` |
| `max_age < 0` | `ValueError` |
| No qualifying adaptive candidate and no static fallback | `LookupError` |
## Non-goals
The policy does not define a task taxonomy, set task quality floors, decide
which baseline is authoritative, or perform billing-grade accounting. Those are
consumer policy choices.