generated from coulomb/repo-seed
Add adaptive cost-quality routing primitives
84
contracts/functional/shadowing-adapter.md
Normal file
@@ -0,0 +1,84 @@
# Contract: ShadowingAdapter

**layer:** Functional
**maturity:** Beta
**module:** `llm_connect.shadowing`
**since:** WP-0004

## Purpose

Collect quality observations without changing caller-visible model behavior.
`ShadowingAdapter` wraps a candidate adapter and returns the candidate response
to the caller. For a sampled fraction of calls it also runs baseline and grading
work, appending `QualityObservation` records to a `QualityLedger`.

## Public surface

```python
@dataclass
class ShadowingAdapter(LLMAdapter):
    candidate_adapter: LLMAdapter
    baseline_adapter: LLMAdapter
    grader: BaselineGrader
    ledger: QualityLedger
    task_type: str
    adapter_id: str
    model_id: Optional[str] = None
    baseline_adapter_id: Optional[str] = None
    shadow_rate: float = 1.0
    async_shadow: bool = False
    tags: Mapping[str, Any] = field(default_factory=dict)
    on_shadow_error: Optional[Callable[[Exception], None]] = None

    def execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse: ...
    async def async_execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse: ...
    def flush(self, timeout: Optional[float] = None) -> None: ...
    def shutdown(self, wait: bool = True) -> None: ...
```

## Invariants

1. The candidate adapter is always called first.
2. The response returned by `execute_prompt()` and `async_execute_prompt()` is
   always the candidate response.
3. Shadow failures from the baseline adapter, grader, or ledger writer are
   isolated from the caller. They are sent to `on_shadow_error` when configured.
4. `shadow_rate=0.0` records no observations. `shadow_rate=1.0` shadows every
   successful candidate call. Intermediate values sample with `random_source`.
5. Shadow grading reuses the candidate response already returned by the wrapped
   candidate adapter; it does not make a second candidate model call.
6. Shadow calls use a copy of `RunConfig` with `budget_tracker=None`, so
   observation collection cannot consume the caller's foreground token budget.
7. `async_shadow=True` schedules shadow work on a background thread. `flush()`
   waits for currently queued work, and `shutdown()` releases the executor.

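The invariants above can be illustrated with a minimal sketch. It is not the `llm_connect.shadowing` implementation: `ShadowSketch` and the `RunConfig` stand-in are hypothetical, adapters and the grader are modeled as plain callables returning strings and floats, the ledger is a plain list, and exposing `random_source` as a constructor field is an assumption.

```python
import random
from concurrent import futures
from dataclasses import dataclass, field, replace
from typing import Any, Callable, List, Optional


@dataclass
class RunConfig:  # hypothetical stand-in for the real RunConfig
    model_name: str = "demo-model"
    budget_tracker: Optional[Any] = None


@dataclass
class ShadowSketch:  # hypothetical; adapters are plain callables here
    candidate: Callable[[str, RunConfig], str]
    baseline: Callable[[str, RunConfig], str]
    grade: Callable[[str, str], float]  # (candidate_text, baseline_text) -> score
    ledger: List[float]                 # stand-in for QualityLedger
    shadow_rate: float = 1.0
    async_shadow: bool = False
    on_shadow_error: Optional[Callable[[Exception], None]] = None
    random_source: random.Random = field(default_factory=random.Random)

    def __post_init__(self) -> None:
        # Invariant 7: background work runs on an executor thread.
        self._executor = (
            futures.ThreadPoolExecutor(max_workers=1) if self.async_shadow else None
        )
        self._pending: List[futures.Future] = []

    def execute_prompt(self, prompt: str, config: RunConfig) -> str:
        # Invariant 1: the candidate runs first; its failure propagates unchanged.
        response = self.candidate(prompt, config)
        # Invariant 4: random() is in [0, 1), so 0.0 never samples and 1.0 always does.
        if self.random_source.random() < self.shadow_rate:
            if self._executor is not None:
                self._pending.append(
                    self._executor.submit(self._shadow, prompt, config, response)
                )
            else:
                self._shadow(prompt, config, response)
        # Invariant 2: the caller always receives the candidate response.
        return response

    def _shadow(self, prompt: str, config: RunConfig, response: str) -> None:
        try:
            # Invariant 6: the shadow call gets a copy without the budget tracker.
            shadow_config = replace(config, budget_tracker=None)
            baseline_text = self.baseline(prompt, shadow_config)
            # Invariant 5: grade the response we already have; no second candidate call.
            self.ledger.append(self.grade(response, baseline_text))
        except Exception as exc:
            # Invariant 3: shadow failures never reach the caller.
            if self.on_shadow_error is not None:
                self.on_shadow_error(exc)

    def flush(self, timeout: Optional[float] = None) -> None:
        # Wait for the work queued so far; later submissions are not covered.
        pending, self._pending = self._pending, []
        futures.wait(pending, timeout=timeout)

    def shutdown(self, wait: bool = True) -> None:
        if self._executor is not None:
            self._executor.shutdown(wait=wait)
```

In this sketch a failing baseline leaves the ledger untouched and only triggers the callback, while the caller still receives the candidate text. A callback that itself raises is not guarded here; a real implementation would need to decide how to handle that case.
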
## Observation mapping

The appended observation uses:

- `task_type` from the wrapper configuration
- `adapter_id` from the wrapper configuration
- `model_id` from the wrapper configuration, then candidate response model, then
  `RunConfig.model_name`
- `quality_score` from the `GradingResult`
- `cost_usd` from response metadata keys `cost_usd`, `estimated_cost_usd`, or
  `cost`, falling back to `0.0`
- token counts from candidate response usage keys `prompt_tokens` and
  `completion_tokens`
- `baseline_adapter_id` and `tags` from wrapper configuration

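The two fallback chains in the list above (for `model_id` and `cost_usd`) could be resolved roughly as follows. The helper names `resolve_model_id` and `resolve_cost_usd` are hypothetical, and treating a `None` or empty value as "absent" is an assumption; the contract does not say how such values are handled.

```python
from typing import Any, Mapping, Optional

# Metadata keys checked in the order given by the contract.
COST_KEYS = ("cost_usd", "estimated_cost_usd", "cost")


def resolve_model_id(
    configured: Optional[str],
    response_model: Optional[str],
    config_model_name: Optional[str],
) -> Optional[str]:
    """Return the first non-empty model id: wrapper, response, then RunConfig."""
    for candidate in (configured, response_model, config_model_name):
        if candidate:
            return candidate
    return None


def resolve_cost_usd(metadata: Mapping[str, Any]) -> float:
    """Return the first cost value present in the metadata, else 0.0."""
    for key in COST_KEYS:
        value = metadata.get(key)
        if value is not None:
            return float(value)
    return 0.0
```

Because the wrapper-level `model_id` wins whenever it is set, a caller who pins it will not see the model name reported by the provider in the ledger.
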
## Error contract

| Condition | Exception |
|-----------|-----------|
| Empty `task_type` | `ValueError` |
| Empty `adapter_id` | `ValueError` |
| `shadow_rate` outside `0..1` | `ValueError` |
| Candidate adapter failure | Original exception propagates |
| Shadow baseline/grading/ledger failure | Suppressed; optional callback |

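The three `ValueError` rows in the table describe construction-time validation. One plausible shape for it, assuming the checks run in `__post_init__` (the contract does not say where they run), is sketched below with a hypothetical `ShadowSettings` class that carries only the validated fields.

```python
from dataclasses import dataclass


@dataclass
class ShadowSettings:  # hypothetical; holds only the fields the table validates
    task_type: str
    adapter_id: str
    shadow_rate: float = 1.0

    def __post_init__(self) -> None:
        if not self.task_type:
            raise ValueError("task_type must be a non-empty string")
        if not self.adapter_id:
            raise ValueError("adapter_id must be a non-empty string")
        # A chained comparison also rejects NaN, which compares false to everything.
        if not 0.0 <= self.shadow_rate <= 1.0:
            raise ValueError("shadow_rate must be within 0..1")
```

Failing at construction keeps misconfiguration out of the request path, where the remaining two rows of the table apply.
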
## Privacy note

The wrapper does not store prompt or response text in the ledger by default.
Callers that need regime tracking should store non-sensitive fingerprints in
`tags`, for example `prompt_fingerprint` or `template_version`.
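
As one way to produce such a fingerprint, the sketch below hashes the prompt template rather than the filled-in prompt. The `prompt_fingerprint` helper, the 16-character truncation, and the example tag values are assumptions for illustration, not part of the contract.

```python
import hashlib


def prompt_fingerprint(template: str, length: int = 16) -> str:
    """Return a short, stable hex digest identifying a prompt template."""
    digest = hashlib.sha256(template.encode("utf-8")).hexdigest()
    return digest[:length]


# Hypothetical tags a caller might pass to the wrapper.
tags = {
    "prompt_fingerprint": prompt_fingerprint("Summarize the document: {document}"),
    "template_version": "v3",
}
```

Hashing the template keeps user-supplied text out of the digest input. A plain hash of a short or guessable string can still be reversed by enumeration, so sensitive inputs should not be fingerprinted this way.
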