generated from coulomb/repo-seed
Add adaptive cost-quality routing primitives
contracts/functional/adaptive-routing-policy.md
@@ -0,0 +1,87 @@
# Contract: AdaptiveRoutingPolicy

**layer:** Functional
**maturity:** Beta
**module:** `llm_connect.routing`
**since:** WP-0004

## Purpose

Select the cheapest adapter whose observed mean quality for a task type clears
a caller-supplied quality floor. The policy builds on `RoutingPolicy`: static
rules remain the cold-start and failure fallback, while adaptive selection is
used only when the ledger has enough qualifying observations.

## Public surface

```python
@dataclass
class AdaptiveRoutingPolicy(RoutingPolicy):
    ledger: Optional[QualityLedger] = None
    adapters_by_id: Mapping[str, LLMAdapter] = field(default_factory=dict)
    window_size: int = 20
    min_observations: int = 1
    max_age: Optional[timedelta] = None

    def resolve(
        self,
        task_type: str,
        estimated_cost_per_1k: Optional[float] = None,
        *,
        quality_floor: Optional[float] = None,
    ) -> LLMAdapter: ...
```

## Candidate identity

Observations are keyed by `(task_type, adapter_id)`. Callers should pass
`adapters_by_id` so the policy can map ledger observations back to concrete
`LLMAdapter` instances. If a static rule adapter is not present in
`adapters_by_id`, the policy also checks common string attributes
`adapter_id`, `id`, and `name`.

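The lookup described above can be sketched as follows. This is a minimal illustration, not the module's implementation: `candidate_id`, `ID_ATTRIBUTES`, and `NamedAdapter` are hypothetical names, and both the identity-based match against `adapters_by_id` and the attribute order are assumptions.

```python
from typing import Mapping, Optional

# Assumed lookup order; the contract lists the attributes but not their priority.
ID_ATTRIBUTES = ("adapter_id", "id", "name")


def candidate_id(adapter: object, adapters_by_id: Mapping[str, object]) -> Optional[str]:
    """Return the ledger key for ``adapter``, or None if it cannot be identified."""
    # Prefer the explicit mapping: match on object identity.
    for known_id, known_adapter in adapters_by_id.items():
        if known_adapter is adapter:
            return known_id
    # Otherwise fall back to the first non-empty string attribute.
    for attr in ID_ATTRIBUTES:
        value = getattr(adapter, attr, None)
        if isinstance(value, str) and value:
            return value
    return None


class NamedAdapter:
    """Hypothetical stand-in adapter exposing only a ``name`` attribute."""

    def __init__(self, name: str) -> None:
        self.name = name


registered = NamedAdapter("ignored-name")
unregistered = NamedAdapter("small-model")
mapping = {"large-model": registered}

print(candidate_id(registered, mapping))    # large-model
print(candidate_id(unregistered, mapping))  # small-model
```

Passing `adapters_by_id` explicitly keeps ledger keys independent of whatever attributes an adapter class happens to expose.
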
## Invariants

1. If `quality_floor is None` or `ledger is None`, resolution is exactly the
   same as `RoutingPolicy.resolve()`.
2. `quality_floor` must be between `0` and `1`, inclusive.
3. Each candidate is evaluated over the newest `window_size` observations for
   the requested `task_type` and adapter id.
4. `max_age`, when provided, filters out observations older than that age.
5. A candidate is considered only when it has at least `min_observations` after
   filtering.
6. A candidate qualifies when its mean `quality_score` is greater than or equal
   to `quality_floor`.
7. Among qualifying candidates, the policy chooses the lowest mean observed
   `cost_usd`.
8. If mean observed cost ties exactly, the policy prefers the matching static
   rule's explicit `prefer` adapter.
9. If there are still ties, stable candidate order is used.
10. If no candidate qualifies, resolution falls through to
    `RoutingPolicy.resolve(task_type, estimated_cost_per_1k)`.

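Invariants 3 through 9 can be sketched as one selection function. This is an illustration under stated assumptions, not the module's code: the `Observation` record, `pick_adapter_id`, and its `preferred_id` and `now` parameters are hypothetical, and the observations are assumed to be pre-filtered to a single `task_type`. Returning `None` stands in for the fall-through to the static rules in invariant 10.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from statistics import fmean
from typing import Optional, Sequence


@dataclass(frozen=True)
class Observation:  # hypothetical record shape
    adapter_id: str
    quality_score: float
    cost_usd: float
    observed_at: datetime


def pick_adapter_id(
    observations: Sequence[Observation],  # assumed: already limited to one task_type
    candidate_ids: Sequence[str],         # stable candidate order (invariant 9)
    quality_floor: float,
    *,
    window_size: int = 20,
    min_observations: int = 1,            # assumed validated as >= 1
    max_age: Optional[timedelta] = None,
    preferred_id: Optional[str] = None,   # the static rule's `prefer` adapter
    now: Optional[datetime] = None,
) -> Optional[str]:
    now = now or datetime.now(timezone.utc)
    best = None
    for order, adapter_id in enumerate(candidate_ids):
        # Invariant 3: newest `window_size` observations for this adapter.
        rows = sorted(
            (o for o in observations if o.adapter_id == adapter_id),
            key=lambda o: o.observed_at,
            reverse=True,
        )[:window_size]
        # Invariant 4: drop observations older than `max_age`.
        if max_age is not None:
            rows = [o for o in rows if now - o.observed_at <= max_age]
        # Invariant 5: require enough samples after filtering.
        if len(rows) < min_observations:
            continue
        # Invariant 6: mean quality must clear the floor.
        if fmean(o.quality_score for o in rows) < quality_floor:
            continue
        # Invariants 7-9: lowest mean cost, then preferred adapter, then order.
        key = (fmean(o.cost_usd for o in rows), adapter_id != preferred_id, order)
        if best is None or key < best[0]:
            best = (key, adapter_id)
    return best[1] if best else None


NOW = datetime(2024, 1, 1, tzinfo=timezone.utc)
HOUR_AGO = NOW - timedelta(hours=1)
SAMPLES = [
    Observation("cheap", 0.6, 0.01, HOUR_AGO),
    Observation("cheap", 0.7, 0.01, HOUR_AGO),
    Observation("mid", 0.9, 0.02, HOUR_AGO),
    Observation("mid", 0.85, 0.02, HOUR_AGO),
    Observation("premium", 0.95, 0.10, HOUR_AGO),
]
ORDER = ["cheap", "mid", "premium"]

print(pick_adapter_id(SAMPLES, ORDER, 0.8, now=NOW))   # mid
print(pick_adapter_id(SAMPLES, ORDER, 0.93, now=NOW))  # premium
print(pick_adapter_id(SAMPLES, ORDER, 0.99, now=NOW))  # None
```

Because rows are sorted newest first, applying the window before or after the age filter yields the same set, so the sketch does not need to pick an order between invariants 3 and 4.
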
## Sample-size and freshness trade-off

Small `window_size` values react quickly to model or prompt changes but can be
noisy. Larger windows are more stable but may preserve stale behavior after a
provider update or prompt template change. `min_observations` lets callers avoid
acting on a single lucky sample, while `max_age` bounds how long old observations
can influence routing. Callers that change prompts materially should also record
a prompt fingerprint in observation tags and filter by it, so that only
comparable samples are written to the same ledger regime.

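A small worked example of the window trade-off, using made-up scores: eight good samples followed by two poor ones after a hypothetical provider update. `windowed_mean` is an illustrative helper, not part of `llm_connect.routing`.

```python
from statistics import fmean
from typing import Sequence


def windowed_mean(scores_oldest_first: Sequence[float], window_size: int) -> float:
    """Mean of the newest ``window_size`` scores (``window_size`` must be positive)."""
    return fmean(scores_oldest_first[-window_size:])


# Fabricated illustration data: quality drops after the last two samples.
history = [0.9] * 8 + [0.5, 0.5]
quality_floor = 0.8

small_window = windowed_mean(history, 2)   # sees only the post-update samples
large_window = windowed_mean(history, 10)  # still dominated by older samples

print(small_window >= quality_floor)  # False
print(large_window >= quality_floor)  # True
```

With a window of 2 the adapter stops qualifying immediately, while a window of 10 keeps it above the floor until more poor samples arrive.
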
## Error contract

| Condition | Exception |
|-----------|-----------|
| `quality_floor` outside `0..1` | `ValueError` |
| `window_size <= 0` | `ValueError` |
| `min_observations <= 0` | `ValueError` |
| `max_age < 0` | `ValueError` |
| No qualifying adaptive candidate and no static fallback | `LookupError` |

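The `ValueError` rows of the table can be sketched as a single validation helper. `validate_settings` and `is_rejected` are hypothetical names, and the error messages and the point at which the real policy validates (construction versus `resolve()`) are assumptions. The `LookupError` row depends on the static rules and is not covered here.

```python
from datetime import timedelta
from typing import Optional


def validate_settings(
    window_size: int,
    min_observations: int,
    max_age: Optional[timedelta] = None,
    quality_floor: Optional[float] = None,
) -> None:
    """Raise ValueError for any setting the error contract rejects."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if min_observations <= 0:
        raise ValueError("min_observations must be positive")
    if max_age is not None and max_age < timedelta(0):
        raise ValueError("max_age must not be negative")
    if quality_floor is not None and not 0 <= quality_floor <= 1:
        raise ValueError("quality_floor must be within 0..1")


def is_rejected(**settings) -> bool:
    """Return True when the given overrides fail validation."""
    defaults = {"window_size": 20, "min_observations": 1}
    try:
        validate_settings(**{**defaults, **settings})
    except ValueError:
        return True
    return False


print(is_rejected(window_size=0))       # True
print(is_rejected(quality_floor=1.0))   # False
```
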
## Non-goals

The policy does not define a task taxonomy, set task quality floors, decide
which baseline is authoritative, or perform billing-grade accounting. Those are
consumer policy choices.