Contract: ShadowingAdapter
layer: Functional
maturity: Beta
module: llm_connect.shadowing
since: WP-0004
Purpose
Collect quality observations without changing caller-visible model behavior.
ShadowingAdapter wraps a candidate adapter, returns the candidate response to
the caller, and samples extra baseline/grading work that appends
QualityObservation records to a QualityLedger.
Public surface

    @dataclass
    class ShadowingAdapter(LLMAdapter):
        candidate_adapter: LLMAdapter
        baseline_adapter: LLMAdapter
        grader: BaselineGrader
        ledger: QualityLedger
        task_type: str
        adapter_id: str
        model_id: Optional[str] = None
        baseline_adapter_id: Optional[str] = None
        shadow_rate: float = 1.0
        async_shadow: bool = False
        tags: Mapping[str, Any] = field(default_factory=dict)
        on_shadow_error: Optional[Callable[[Exception], None]] = None

        def execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse: ...
        async def async_execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse: ...
        def flush(self, timeout: Optional[float] = None) -> None: ...
        def shutdown(self, wait: bool = True) -> None: ...

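The flush() and shutdown() methods in the surface above suggest an executor-backed queue for background shadow work. The following is a minimal sketch of that pattern, not the module's actual implementation: ShadowQueue, its single worker thread, and the wait_for_tasks parameter name are all hypothetical.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor, wait
from typing import Callable, List, Optional


class ShadowQueue:
    """Hypothetical helper: runs shadow tasks off the caller's thread."""

    def __init__(self) -> None:
        # One worker keeps shadow tasks in submission order (an assumption).
        self._executor = ThreadPoolExecutor(max_workers=1)
        self._pending: List[Future] = []
        self._lock = threading.Lock()

    def submit(self, task: Callable[[], None]) -> None:
        future = self._executor.submit(task)
        with self._lock:
            self._pending.append(future)

    def flush(self, timeout: Optional[float] = None) -> None:
        # Snapshot the queue so only work submitted before this call is awaited.
        with self._lock:
            pending, self._pending = self._pending, []
        # wait() does not re-raise task exceptions; they stay on the futures.
        wait(pending, timeout=timeout)

    def shutdown(self, wait_for_tasks: bool = True) -> None:
        # Releases the worker thread; no further submits are accepted.
        self._executor.shutdown(wait=wait_for_tasks)
```

In this sketch, a flush() that times out simply stops tracking the futures it was waiting on; a real implementation might keep them for a later flush.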
Invariants
- The candidate adapter is always called first.
- The response returned by execute_prompt() and async_execute_prompt() is always the candidate response.
- Shadow failures from the baseline adapter, grader, or ledger writer are isolated from the caller. They are sent to on_shadow_error when configured.
- shadow_rate=0.0 records no observations. shadow_rate=1.0 shadows every successful candidate call. Intermediate values sample with random_source.
- Shadow grading reuses the candidate response already returned by the wrapped candidate adapter; it does not make a second candidate model call.
- Shadow calls use a copy of RunConfig with budget_tracker=None, so observation collection cannot consume the caller's foreground token budget.
- async_shadow=True schedules shadow work on a background thread. flush() waits for currently queued work, and shutdown() releases the executor.
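The synchronous part of these invariants can be illustrated with a small, self-contained sketch. It is not the module's code: shadowed_call, the plain callables standing in for adapters and the grader, the list standing in for the ledger, and the rng parameter are hypothetical simplifications.

```python
import random
from typing import Callable, List, Optional


def shadowed_call(
    prompt: str,
    candidate: Callable[[str], str],
    baseline: Callable[[str], str],
    grade: Callable[[str, str], float],
    ledger: List[float],
    shadow_rate: float = 1.0,
    rng: Optional[random.Random] = None,
    on_shadow_error: Optional[Callable[[Exception], None]] = None,
) -> str:
    """Return the candidate response; optionally record a shadow score."""
    rng = rng or random.Random()
    # The candidate runs first; its exceptions propagate to the caller.
    response = candidate(prompt)
    # rng.random() lies in [0.0, 1.0), so a rate of 0.0 never shadows
    # and a rate of 1.0 always does.
    if rng.random() < shadow_rate:
        try:
            reference = baseline(prompt)
            # Grade the response already in hand; no second candidate call.
            ledger.append(grade(response, reference))
        except Exception as exc:
            # Shadow failures are isolated and reported only via the callback.
            if on_shadow_error is not None:
                on_shadow_error(exc)
    return response
```

Because the try block wraps only the shadow work, a failing baseline or grader cannot alter or delay the exception behavior of the candidate call itself.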
Observation mapping
The appended observation uses:
- task_type from the wrapper configuration
- adapter_id from the wrapper configuration
- model_id from the wrapper configuration, then candidate response model, then RunConfig.model_name
- quality_score from the GradingResult
- cost_usd from response metadata keys cost_usd, estimated_cost_usd, or cost, falling back to 0.0
- token counts from candidate response usage keys prompt_tokens and completion_tokens
- baseline_adapter_id and tags from wrapper configuration
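The two fallback chains in this mapping (model_id and cost_usd) can be sketched as below. The helper names are hypothetical, and two details are assumptions rather than documented behavior: empty or None values are treated as missing for model_id, and cost values are assumed to be numeric.

```python
from typing import Any, Mapping, Optional


def resolve_model_id(
    configured: Optional[str],
    response_model: Optional[str],
    run_config_model_name: Optional[str],
) -> Optional[str]:
    # First non-empty value wins: wrapper config, then response, then RunConfig.
    for value in (configured, response_model, run_config_model_name):
        if value:
            return value
    return None


def resolve_cost_usd(metadata: Mapping[str, Any]) -> float:
    # Keys are checked in the order listed in the mapping above.
    for key in ("cost_usd", "estimated_cost_usd", "cost"):
        value = metadata.get(key)
        if value is not None:
            return float(value)
    # No cost key present: fall back to 0.0.
    return 0.0
```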
Error contract
| Condition | Exception |
|---|---|
| Empty task_type | ValueError |
| Empty adapter_id | ValueError |
| shadow_rate outside 0..1 | ValueError |
| Candidate adapter failure | Original exception propagates |
| Shadow baseline/grading/ledger failure | Suppressed; optional callback |
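The three ValueError rows describe construction-time validation. One plausible way to express them on a dataclass is a __post_init__ check, sketched here on a hypothetical stand-in (ShadowSettings) rather than the real ShadowingAdapter; the error messages are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class ShadowSettings:
    """Hypothetical stand-in carrying only the validated fields."""

    task_type: str
    adapter_id: str
    shadow_rate: float = 1.0

    def __post_init__(self) -> None:
        if not self.task_type:
            raise ValueError("task_type must be non-empty")
        if not self.adapter_id:
            raise ValueError("adapter_id must be non-empty")
        # The chained comparison also rejects NaN, which fails both bounds.
        if not 0.0 <= self.shadow_rate <= 1.0:
            raise ValueError("shadow_rate must be within 0..1")
```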
Privacy note
The wrapper does not store prompt or response text in the ledger by default.
Callers that need regime tracking should store non-sensitive fingerprints in
tags, for example prompt_fingerprint or template_version.
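As an illustration of such tags, a caller might derive a fingerprint from the prompt template with a standard hash. This is a sketch under assumptions: the prompt_fingerprint helper, the 16-character truncation, the example template, and the "v3" version string are all hypothetical.

```python
import hashlib


def prompt_fingerprint(template: str, length: int = 16) -> str:
    # Hash the template text and keep a short hex prefix as the tag value.
    digest = hashlib.sha256(template.encode("utf-8")).hexdigest()
    return digest[:length]


tags = {
    "prompt_fingerprint": prompt_fingerprint("Summarize the following text: {text}"),
    "template_version": "v3",  # hypothetical version label
}
```

A hash of a short or predictable prompt can still be guessed by hashing candidates, so fingerprinting the template rather than the filled-in prompt is the safer choice when the prompt may contain user data.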