# Contract: ShadowingAdapter **layer:** Functional **maturity:** Beta **module:** `llm_connect.shadowing` **since:** WP-0004 ## Purpose Collect quality observations without changing caller-visible model behavior. `ShadowingAdapter` wraps a candidate adapter, returns the candidate response to the caller, and samples extra baseline/grading work that appends `QualityObservation` records to a `QualityLedger`. ## Public surface ```python @dataclass class ShadowingAdapter(LLMAdapter): candidate_adapter: LLMAdapter baseline_adapter: LLMAdapter grader: BaselineGrader ledger: QualityLedger task_type: str adapter_id: str model_id: Optional[str] = None baseline_adapter_id: Optional[str] = None shadow_rate: float = 1.0 async_shadow: bool = False tags: Mapping[str, Any] = field(default_factory=dict) on_shadow_error: Optional[Callable[[Exception], None]] = None def execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse: ... async def async_execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse: ... def flush(self, timeout: Optional[float] = None) -> None: ... def shutdown(self, wait: bool = True) -> None: ... ``` ## Invariants 1. The candidate adapter is always called first. 2. The response returned by `execute_prompt()` and `async_execute_prompt()` is always the candidate response. 3. Shadow failures from the baseline adapter, grader, or ledger writer are isolated from the caller. They are sent to `on_shadow_error` when configured. 4. `shadow_rate=0.0` records no observations. `shadow_rate=1.0` shadows every successful candidate call. Intermediate values sample with `random_source`. 5. Shadow grading reuses the candidate response already returned by the wrapped candidate adapter; it does not make a second candidate model call. 6. Shadow calls use a copy of `RunConfig` with `budget_tracker=None`, so observation collection cannot consume the caller's foreground token budget. 7. `async_shadow=True` schedules shadow work on a background thread. `flush()` waits for currently queued work, and `shutdown()` releases the executor. ## Observation mapping The appended observation uses: - `task_type` from the wrapper configuration - `adapter_id` from the wrapper configuration - `model_id` from the wrapper configuration, then candidate response model, then `RunConfig.model_name` - `quality_score` from the `GradingResult` - `cost_usd` from response metadata keys `cost_usd`, `estimated_cost_usd`, or `cost`, falling back to `0.0` - token counts from candidate response usage keys `prompt_tokens` and `completion_tokens` - `baseline_adapter_id` and `tags` from wrapper configuration ## Error contract | Condition | Exception | |-----------|-----------| | Empty `task_type` | `ValueError` | | Empty `adapter_id` | `ValueError` | | `shadow_rate` outside `0..1` | `ValueError` | | Candidate adapter failure | Original exception propagates | | Shadow baseline/grading/ledger failure | Suppressed; optional callback | ## Privacy note The wrapper does not store prompt or response text in the ledger by default. Callers that need regime tracking should store non-sensitive fingerprints in `tags`, for example `prompt_fingerprint` or `template_version`.