3.2 KiB
Contract: AdaptiveRoutingPolicy
layer: Functional
maturity: Beta
module: llm_connect.routing
since: WP-0004
Purpose
Select the cheapest adapter whose observed mean quality for a task type clears
a caller-supplied quality floor. The policy builds on RoutingPolicy: static
rules remain the cold-start and failure fallback, while adaptive selection is
used only when the ledger has enough qualifying observations.
Public surface
@dataclass
class AdaptiveRoutingPolicy(RoutingPolicy):
ledger: Optional[QualityLedger] = None
adapters_by_id: Mapping[str, LLMAdapter] = field(default_factory=dict)
window_size: int = 20
min_observations: int = 1
max_age: Optional[timedelta] = None
def resolve(
self,
task_type: str,
estimated_cost_per_1k: Optional[float] = None,
*,
quality_floor: Optional[float] = None,
) -> LLMAdapter: ...
Candidate identity
Observations are keyed by (task_type, adapter_id). Callers should pass
adapters_by_id so the policy can map ledger observations back to concrete
LLMAdapter instances. If a static rule adapter is not present in
adapters_by_id, the policy also checks common string attributes
adapter_id, id, and name.
Invariants
- If
quality_floor is Noneorledger is None, resolution is exactly the same asRoutingPolicy.resolve(). quality_floormust be between0and1, inclusive.- Each candidate is evaluated over the newest
window_sizeobservations for the requestedtask_typeand adapter id. max_age, when provided, filters out observations older than that age.- A candidate is considered only when it has at least
min_observationsafter filtering. - A candidate qualifies when its mean
quality_scoreis greater than or equal toquality_floor. - Among qualifying candidates, the policy chooses the lowest mean observed
cost_usd. - If mean observed cost ties exactly, the policy prefers the matching static
rule's explicit
preferadapter. - If there are still ties, stable candidate order is used.
- If no candidate qualifies, resolution falls through to
RoutingPolicy.resolve(task_type, estimated_cost_per_1k).
Sample-size and freshness trade-off
Small window_size values react quickly to model or prompt changes but can be
noisy. Larger windows are more stable but may preserve stale behavior after a
provider update or prompt template change. min_observations lets callers avoid
acting on a single lucky sample, while max_age bounds how long old observations
can influence routing. Callers that change prompts materially should also filter
by a prompt fingerprint in observation tags before writing comparable samples to
the same ledger regime.
Error contract
| Condition | Exception |
|---|---|
quality_floor outside 0..1 |
ValueError |
window_size <= 0 |
ValueError |
min_observations <= 0 |
ValueError |
max_age < 0 |
ValueError |
| No qualifying adaptive candidate and no static fallback | LookupError |
Non-goals
The policy does not define a task taxonomy, set task quality floors, decide which baseline is authoritative, or perform billing-grade accounting. Those are consumer policy choices.