# Contract: AdaptiveRoutingPolicy

**layer:** Functional

**maturity:** Beta

**module:** `llm_connect.routing`

**since:** WP-0004

## Purpose

For a given task type, select the adapter with the lowest mean observed cost among those whose mean observed quality clears a caller-supplied quality floor. The policy builds on `RoutingPolicy`: static rules remain the fallback both at cold start and whenever adaptive selection finds no qualifying adapter, and adaptive selection is used only when the ledger holds enough qualifying observations.

## Public surface
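
```python
from dataclasses import dataclass, field
from datetime import timedelta
from typing import Mapping, Optional

# RoutingPolicy, QualityLedger and LLMAdapter are project types defined
# outside this snippet; their import paths are not part of this contract.


@dataclass
class AdaptiveRoutingPolicy(RoutingPolicy):
    ledger: Optional[QualityLedger] = None  # None disables adaptive selection
    adapters_by_id: Mapping[str, LLMAdapter] = field(default_factory=dict)
    window_size: int = 20  # newest observations evaluated per candidate
    min_observations: int = 1  # minimum samples required after filtering
    max_age: Optional[timedelta] = None  # None means no age limit

    def resolve(
        self,
        task_type: str,
        estimated_cost_per_1k: Optional[float] = None,
        *,
        quality_floor: Optional[float] = None,
    ) -> LLMAdapter: ...
```
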
## Candidate identity

Observations are keyed by `(task_type, adapter_id)`. Callers should pass `adapters_by_id` so the policy can map ledger observations back to concrete `LLMAdapter` instances. If an adapter named by a static rule is not present in `adapters_by_id`, the policy tries to derive its id from the string attributes `adapter_id`, `id`, and `name` on the adapter.

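
A minimal sketch of that id lookup follows. It assumes that "present in `adapters_by_id`" means the same object appears as a value, and that the attributes are tried in the order listed; `derive_adapter_id` and `FakeAdapter` are hypothetical names, not part of `llm_connect.routing`.

```python
from typing import Mapping, Optional


def derive_adapter_id(adapter: object, adapters_by_id: Mapping[str, object]) -> Optional[str]:
    """Return the ledger id for `adapter`, or None if none can be derived."""
    # Assumption: an adapter is "present" when the same object is a mapping value.
    for adapter_id, known in adapters_by_id.items():
        if known is adapter:
            return adapter_id
    # Fall back to common string attributes, tried in the order the contract lists them.
    for attr in ("adapter_id", "id", "name"):
        value = getattr(adapter, attr, None)
        if isinstance(value, str) and value:  # ignore missing or empty values
            return value
    return None


class FakeAdapter:
    # Hypothetical adapter that exposes only a `name` attribute.
    def __init__(self, name: str) -> None:
        self.name = name


fast = FakeAdapter("fast-small")
strong = FakeAdapter("strong-large")
print(derive_adapter_id(fast, {"fast": fast}))  # fast
print(derive_adapter_id(strong, {"fast": fast}))  # strong-large
```

Passing `adapters_by_id` explicitly avoids relying on attribute conventions that adapters may not follow.
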
## Invariants

1. If `quality_floor is None` or `ledger is None`, resolution is identical to
   `RoutingPolicy.resolve(task_type, estimated_cost_per_1k)`.
2. `quality_floor` must lie between `0` and `1`, inclusive.
3. Each candidate is evaluated over the newest `window_size` observations for
   the requested `task_type` and adapter id.
4. `max_age`, when provided, excludes observations older than `max_age`.
5. A candidate is considered only if at least `min_observations` observations
   remain after filtering.
6. A candidate qualifies when its mean `quality_score` is greater than or equal
   to `quality_floor`.
7. Among qualifying candidates, the policy chooses the one with the lowest mean
   observed `cost_usd`.
8. If mean observed costs tie exactly, the policy prefers the matching static
   rule's explicit `prefer` adapter.
9. Any remaining ties are broken by stable candidate order.
10. If no candidate qualifies, resolution falls through to
    `RoutingPolicy.resolve(task_type, estimated_cost_per_1k)`.

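
Invariants 3 to 10 can be sketched as a single pass over the candidates. This is a minimal sketch, not the `llm_connect.routing` implementation: `Observation`, its `recorded_at` field, and `select_adaptive` are hypothetical; each adapter's history is assumed to be ordered oldest first; `prefer` is assumed to be an adapter id; and returning `None` stands in for falling through to `RoutingPolicy.resolve()`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import fmean
from typing import Mapping, Optional, Sequence


@dataclass(frozen=True)
class Observation:
    # Hypothetical stand-in for one ledger record of a (task_type, adapter_id) key.
    quality_score: float
    cost_usd: float
    recorded_at: datetime


def select_adaptive(
    history: Mapping[str, Sequence[Observation]],  # adapter id -> records, oldest first
    candidate_order: Sequence[str],
    quality_floor: float,
    *,
    window_size: int = 20,
    min_observations: int = 1,
    max_age: Optional[timedelta] = None,
    prefer: Optional[str] = None,  # adapter id named by the static rule's `prefer`
    now: Optional[datetime] = None,
) -> Optional[str]:
    """Return the chosen adapter id, or None to fall through to static rules."""
    now = now or datetime.now()
    ranked = []  # (mean cost, not preferred, stable position, adapter id)
    for position, adapter_id in enumerate(candidate_order):
        # Invariant 3: keep only the newest `window_size` observations.
        window = list(history.get(adapter_id, ()))[-window_size:]
        # Invariant 4: drop observations older than `max_age`.
        if max_age is not None:
            window = [o for o in window if now - o.recorded_at <= max_age]
        # Invariant 5: require at least `min_observations` after filtering.
        if not window or len(window) < min_observations:
            continue
        # Invariant 6: mean quality must reach the floor (inclusive).
        if fmean(o.quality_score for o in window) < quality_floor:
            continue
        mean_cost = fmean(o.cost_usd for o in window)
        ranked.append((mean_cost, adapter_id != prefer, position, adapter_id))
    # Invariant 10: None signals "use RoutingPolicy.resolve() instead".
    if not ranked:
        return None
    # Invariants 7-9: lowest mean cost, then the preferred adapter, then stable order.
    return min(ranked)[3]


t0 = datetime(2024, 1, 1, 12, 0)
history = {
    "small": [Observation(0.5, 0.001, t0), Observation(1.0, 0.001, t0)],  # mean quality 0.75
    "large": [Observation(0.95, 0.010, t0)],
}
order = ["large", "small"]
print(select_adaptive(history, order, 0.75, now=t0))  # small
print(select_adaptive(history, order, 0.90, now=t0))  # large
print(select_adaptive(history, order, 0.99, now=t0))  # None
```

Because both the window and the age filter keep a most-recent suffix of a time-ordered history, applying them in either order selects the same observations.
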
## Sample-size and freshness trade-off

Small `window_size` values react quickly to model or prompt changes but are noisy: with `window_size=5`, a single observation carries 20% of the mean. Larger windows are more stable but can preserve stale behavior after a provider update or prompt template change. `min_observations` keeps callers from acting on a single lucky sample, and `max_age` bounds how long old observations can influence routing. Callers that change prompts materially should also tag observations with a prompt fingerprint and filter on it, so that only samples produced under comparable prompts are pooled in the same ledger regime.

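
The prompt-fingerprint advice can be sketched with a truncated SHA-256 digest of the prompt template. This assumes observations carry a string-keyed `tags` mapping; `prompt_fingerprint`, `comparable`, and the `prompt_fingerprint` tag key are hypothetical names, not part of the ledger API.

```python
import hashlib
from typing import Iterable, List, Mapping


def prompt_fingerprint(template: str) -> str:
    """Return a short, stable digest of a prompt template."""
    return hashlib.sha256(template.encode("utf-8")).hexdigest()[:12]


def comparable(observations: Iterable[Mapping], fingerprint: str) -> List[Mapping]:
    """Keep only observations whose tags carry the given prompt fingerprint."""
    # Assumption: each observation has a `tags` mapping; the key name is illustrative.
    return [
        o
        for o in observations
        if o.get("tags", {}).get("prompt_fingerprint") == fingerprint
    ]


v1 = prompt_fingerprint("Summarize:\n{text}")
v2 = prompt_fingerprint("Summarize in three bullets:\n{text}")
observations = [
    {"quality_score": 0.9, "tags": {"prompt_fingerprint": v1}},
    {"quality_score": 0.6, "tags": {"prompt_fingerprint": v2}},
]
print([o["quality_score"] for o in comparable(observations, v2)])  # [0.6]
```

The 12-character digest length is an arbitrary choice; any stable hash that changes whenever the template changes serves the same purpose.
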
## Error contract

| Condition | Exception |
|-----------|-----------|
| `quality_floor` outside `[0, 1]` | `ValueError` |
| `window_size <= 0` | `ValueError` |
| `min_observations <= 0` | `ValueError` |
| Negative `max_age` | `ValueError` |
| No qualifying adaptive candidate and no static fallback | `LookupError` |

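
The four `ValueError` rows amount to simple range checks. The contract does not say whether they run at construction or on each `resolve()` call, so the sketch below is a hypothetical standalone `validate_settings` function rather than the policy's own code.

```python
from datetime import timedelta
from typing import Optional


def validate_settings(
    window_size: int,
    min_observations: int,
    max_age: Optional[timedelta],
    quality_floor: Optional[float],
) -> None:
    """Raise ValueError for each invalid setting in the error-contract table."""
    # `not 0 <= x <= 1` also rejects NaN, since every comparison with NaN is False.
    if quality_floor is not None and not 0.0 <= quality_floor <= 1.0:
        raise ValueError(f"quality_floor must be in [0, 1], got {quality_floor}")
    if window_size <= 0:
        raise ValueError(f"window_size must be positive, got {window_size}")
    if min_observations <= 0:
        raise ValueError(f"min_observations must be positive, got {min_observations}")
    if max_age is not None and max_age < timedelta(0):
        raise ValueError(f"max_age must not be negative, got {max_age}")


validate_settings(20, 1, None, 0.8)  # valid settings: returns None
try:
    validate_settings(20, 1, None, 1.5)
except ValueError as exc:
    print(exc)  # quality_floor must be in [0, 1], got 1.5
```
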
## Non-goals

The policy does not define a task taxonomy, set task quality floors, decide which baseline is authoritative, or perform billing-grade accounting. Those are consumer policy choices.