Contract: AdaptiveRoutingPolicy

layer: Functional
maturity: Beta
module: llm_connect.routing
since: WP-0004

Purpose

Select the cheapest adapter whose observed mean quality for a task type meets a caller-supplied quality floor. The policy extends RoutingPolicy. Its static rules remain the fallback for cold starts and for cases where adaptive selection finds no candidate. Adaptive selection is used only when the ledger holds enough recent observations for at least one candidate to qualify.

Public surface

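```python
from dataclasses import dataclass, field
from datetime import timedelta
from typing import Mapping, Optional

# RoutingPolicy, QualityLedger, and LLMAdapter are project types not shown here.


@dataclass
class AdaptiveRoutingPolicy(RoutingPolicy):
    ledger: Optional[QualityLedger] = None
    adapters_by_id: Mapping[str, LLMAdapter] = field(default_factory=dict)
    window_size: int = 20
    min_observations: int = 1
    max_age: Optional[timedelta] = None

    def resolve(
        self,
        task_type: str,
        estimated_cost_per_1k: Optional[float] = None,
        *,
        quality_floor: Optional[float] = None,
    ) -> LLMAdapter: ...
```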

Candidate identity

Observations are keyed by (task_type, adapter_id). Callers should pass adapters_by_id so the policy can map adapter ids recorded in the ledger back to concrete LLMAdapter instances. If an adapter named by a static rule is not present in adapters_by_id, the policy looks for its id in the adapter's common string attributes adapter_id, id, and name.
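The id lookup described above can be sketched as follows. This is a minimal illustration, not the module's implementation: the helper name candidate_id, the StubAdapter class, and the precedence order (mapping first, then adapter_id, id, name) are assumptions.

```python
from typing import Any, Mapping, Optional

# Attribute names the contract lists; checking them in this order is an
# assumption of this sketch.
_ID_ATTRIBUTES = ("adapter_id", "id", "name")


def candidate_id(adapter: Any, adapters_by_id: Mapping[str, Any]) -> Optional[str]:
    """Return the ledger id for an adapter, or None if none can be derived."""
    # First, look the adapter up by identity in the caller-supplied mapping.
    for adapter_id, mapped in adapters_by_id.items():
        if mapped is adapter:
            return adapter_id
    # Otherwise, fall back to the first non-empty string attribute.
    for attr in _ID_ATTRIBUTES:
        value = getattr(adapter, attr, None)
        if isinstance(value, str) and value:
            return value
    return None


class StubAdapter:
    """Hypothetical stand-in for LLMAdapter, used only for this example."""

    def __init__(self, name: str) -> None:
        self.name = name


small = StubAdapter("small-model")
large = StubAdapter("large-model")
print(candidate_id(small, {"fast": small}))  # fast
print(candidate_id(large, {"fast": small}))  # large-model
```

Passing adapters_by_id explicitly avoids relying on attribute conventions that different adapter classes may not follow.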

Invariants

If quality_floor is None or ledger is None, resolution is exactly the same as RoutingPolicy.resolve().
quality_floor must be between 0 and 1, inclusive.
Each candidate is evaluated over the newest window_size observations for the requested task_type and adapter id.
max_age, when provided, filters out observations older than that age.
A candidate is considered only when it has at least min_observations after filtering.
A candidate qualifies when its mean quality_score is greater than or equal to quality_floor.
Among qualifying candidates, the policy chooses the lowest mean observed cost_usd.
If mean observed costs tie exactly, the policy prefers the adapter named by the matching static rule's explicit prefer setting.
If candidates are still tied, the earliest candidate in stable candidate order wins.
If no candidate qualifies, resolution falls through to RoutingPolicy.resolve(task_type, estimated_cost_per_1k).
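The selection rules above can be sketched as a pure function over per-candidate summaries. CandidateStats, select_adaptive, and the preferred_id parameter are hypothetical names. Returning None stands in for falling through to RoutingPolicy.resolve(), which this sketch does not implement.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class CandidateStats:
    """Hypothetical per-(task_type, adapter_id) summary after windowing/filtering."""
    adapter_id: str
    n: int                # observations remaining after window and max_age filtering
    mean_quality: float   # mean quality_score
    mean_cost_usd: float  # mean observed cost_usd


def select_adaptive(
    candidates: Sequence[CandidateStats],
    quality_floor: float,
    min_observations: int,
    preferred_id: Optional[str] = None,
) -> Optional[str]:
    """Return the chosen adapter id, or None to fall through to static rules."""
    if not 0.0 <= quality_floor <= 1.0:
        raise ValueError("quality_floor must be within [0, 1]")
    qualifying = [
        (index, c)
        for index, c in enumerate(candidates)  # index preserves stable candidate order
        if c.n >= min_observations and c.mean_quality >= quality_floor
    ]
    if not qualifying:
        return None  # the caller falls back to RoutingPolicy.resolve()
    # Ordering: lowest mean cost, then the static rule's prefer adapter
    # (False sorts before True), then stable candidate order.
    _, best = min(
        qualifying,
        key=lambda pair: (
            pair[1].mean_cost_usd,
            pair[1].adapter_id != preferred_id,
            pair[0],
        ),
    )
    return best.adapter_id


stats = [
    CandidateStats("large", n=12, mean_quality=0.92, mean_cost_usd=0.010),
    CandidateStats("small", n=12, mean_quality=0.81, mean_cost_usd=0.002),
    CandidateStats("medium", n=1, mean_quality=0.95, mean_cost_usd=0.004),
]
print(select_adaptive(stats, quality_floor=0.8, min_observations=3))   # small
print(select_adaptive(stats, quality_floor=0.9, min_observations=3))   # large
print(select_adaptive(stats, quality_floor=0.99, min_observations=3))  # None
```

Here "medium" has the best quality but is never considered, because its single observation falls below min_observations.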

Sample-size and freshness trade-off

Small window_size values react quickly to model or prompt changes but can be noisy. Larger windows are more stable but may preserve stale behavior after a provider update or prompt template change. min_observations lets callers avoid acting on a single lucky sample, while max_age bounds how long old observations can influence routing. Callers that materially change prompts should also record a prompt fingerprint in observation tags and filter on it, so that samples produced under different prompts are not treated as comparable within the same ledger.
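A minimal sketch of how window_size, max_age, and a prompt fingerprint tag might narrow the observations for one (task_type, adapter_id) pair. The Observation record, the prompt_fingerprint tag key, and summarize are assumptions. The contract does not say whether the window is taken before or after age filtering; this sketch takes the newest window_size observations first and then drops stale ones. The returned count is what would be compared against min_observations.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from statistics import fmean
from typing import Mapping, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Observation:
    """Hypothetical ledger record for one (task_type, adapter_id) pair."""
    recorded_at: datetime
    quality_score: float
    cost_usd: float
    tags: Mapping[str, str] = field(default_factory=dict)


def summarize(
    observations: Sequence[Observation],
    window_size: int,
    now: datetime,
    max_age: Optional[timedelta] = None,
    prompt_fingerprint: Optional[str] = None,
) -> Tuple[int, Optional[float], Optional[float]]:
    """Return (count, mean quality, mean cost) for the usable observations."""
    usable = list(observations)
    if prompt_fingerprint is not None:
        # Only compare samples produced under the same prompt version
        # (the "prompt_fingerprint" tag key is hypothetical).
        usable = [o for o in usable if o.tags.get("prompt_fingerprint") == prompt_fingerprint]
    # Assumed order: keep the newest window_size observations first...
    usable = sorted(usable, key=lambda o: o.recorded_at, reverse=True)[:window_size]
    if max_age is not None:
        # ...then drop anything older than max_age relative to `now`.
        usable = [o for o in usable if now - o.recorded_at <= max_age]
    if not usable:
        return 0, None, None
    return (
        len(usable),
        fmean(o.quality_score for o in usable),
        fmean(o.cost_usd for o in usable),
    )


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
obs = [
    Observation(now - timedelta(days=1), 1.0, 0.002, {"prompt_fingerprint": "v2"}),
    Observation(now - timedelta(days=2), 0.5, 0.002, {"prompt_fingerprint": "v2"}),
    Observation(now - timedelta(days=30), 0.2, 0.001, {"prompt_fingerprint": "v1"}),
]
print(summarize(obs, window_size=20, now=now, max_age=timedelta(days=7)))  # (2, 0.75, 0.002)
```

The 30-day-old sample from the earlier prompt version is excluded by max_age here; filtering on prompt_fingerprint="v2" would exclude it even without an age limit.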

Error contract

| Condition | Exception |
| --- | --- |
| `quality_floor` outside `0..1` | `ValueError` |
| `window_size <= 0` | `ValueError` |
| `min_observations <= 0` | `ValueError` |
| `max_age` is negative | `ValueError` |
| No qualifying adaptive candidate and no static fallback | `LookupError` |
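The ValueError rows can be checked with a helper like the one below. validate_settings is a hypothetical name, and this contract does not specify whether the real class validates in __post_init__ or in resolve(). The LookupError case depends on the static rules and is not shown.

```python
from datetime import timedelta
from typing import Optional


def validate_settings(
    window_size: int,
    min_observations: int,
    max_age: Optional[timedelta],
    quality_floor: Optional[float] = None,
) -> None:
    """Raise ValueError for the conditions listed in the error contract."""
    # `not 0 <= x <= 1` also rejects NaN, since every NaN comparison is False.
    if quality_floor is not None and not 0.0 <= quality_floor <= 1.0:
        raise ValueError(f"quality_floor must be within [0, 1], got {quality_floor!r}")
    if window_size <= 0:
        raise ValueError(f"window_size must be positive, got {window_size!r}")
    if min_observations <= 0:
        raise ValueError(f"min_observations must be positive, got {min_observations!r}")
    # timedelta cannot be compared with the int 0, so compare with timedelta(0).
    if max_age is not None and max_age < timedelta(0):
        raise ValueError(f"max_age must not be negative, got {max_age!r}")
```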

Non-goals

The policy does not define a task taxonomy, set task quality floors, decide which baseline is authoritative, or perform billing-grade accounting. Those are consumer policy choices.
