llm-connect/contracts/functional/shadowing-adapter.md
tegwick c4ad4bb9f2
Add adaptive cost-quality routing primitives
2026-05-17 21:32:27 +02:00

Contract: ShadowingAdapter

layer: Functional
maturity: Beta
module: llm_connect.shadowing
since: WP-0004

Purpose

Collect quality observations without changing caller-visible model behavior. ShadowingAdapter wraps a candidate adapter, returns the candidate response to the caller, and samples extra baseline/grading work that appends QualityObservation records to a QualityLedger.

Public surface

from dataclasses import dataclass, field
from typing import Any, Callable, Mapping, Optional

@dataclass
class ShadowingAdapter(LLMAdapter):
    candidate_adapter: LLMAdapter
    baseline_adapter: LLMAdapter
    grader: BaselineGrader
    ledger: QualityLedger
    task_type: str
    adapter_id: str
    model_id: Optional[str] = None
    baseline_adapter_id: Optional[str] = None
    shadow_rate: float = 1.0
    async_shadow: bool = False
    tags: Mapping[str, Any] = field(default_factory=dict)
    on_shadow_error: Optional[Callable[[Exception], None]] = None

    def execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse: ...
    async def async_execute_prompt(self, prompt: str, config: RunConfig) -> LLMResponse: ...
    def flush(self, timeout: Optional[float] = None) -> None: ...
    def shutdown(self, wait: bool = True) -> None: ...

Invariants

  1. The candidate adapter is always called first.
  2. The response returned by execute_prompt() and async_execute_prompt() is always the candidate response.
  3. Shadow failures from the baseline adapter, grader, or ledger writer are isolated from the caller. They are sent to on_shadow_error when configured.
  4. shadow_rate=0.0 records no observations. shadow_rate=1.0 shadows every successful candidate call. Intermediate values shadow each successful candidate call with that probability, drawing from random_source.
  5. Shadow grading reuses the candidate response already returned by the wrapped candidate adapter; it does not make a second candidate model call.
  6. Shadow calls use a copy of RunConfig with budget_tracker=None, so observation collection cannot consume the caller's foreground token budget.
  7. async_shadow=True schedules shadow work on a background thread. flush() waits for currently queued work, and shutdown() releases the executor.
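
Invariants 1 through 6 can be illustrated with a small synchronous sketch. It is not the llm_connect implementation: ShadowSketch, the stand-in RunConfig, and the plain callables used in place of adapters, grader, and ledger are all hypothetical, and the real RunConfig may not be a dataclass. Background scheduling (invariant 7) is left out.

```python
import random
from dataclasses import dataclass, field, replace
from typing import Any, Callable, List, Optional


@dataclass
class RunConfig:
    # Hypothetical stand-in; only the field this sketch needs.
    budget_tracker: Optional[Any] = None


@dataclass
class ShadowSketch:
    # Plain callables stand in for the adapters and the grader (assumption).
    candidate: Callable[[str, RunConfig], str]
    baseline: Callable[[str, RunConfig], str]
    grade: Callable[[str, str], float]
    # A list of scores stands in for the QualityLedger (assumption).
    ledger: List[float] = field(default_factory=list)
    shadow_rate: float = 1.0
    random_source: random.Random = field(default_factory=random.Random)
    on_shadow_error: Optional[Callable[[Exception], None]] = None

    def execute_prompt(self, prompt: str, config: RunConfig) -> str:
        # Invariant 1: the candidate runs first; its exceptions propagate.
        response = self.candidate(prompt, config)
        # Invariant 4: sample only after a successful candidate call.
        # random() returns a value in [0.0, 1.0), so rate 1.0 always shadows.
        if self.shadow_rate > 0.0 and self.random_source.random() < self.shadow_rate:
            self._shadow(prompt, response, config)
        # Invariant 2: the caller always receives the candidate response.
        return response

    def _shadow(self, prompt: str, response: str, config: RunConfig) -> None:
        try:
            # Invariant 6: shadow work runs without the caller's budget tracker.
            shadow_config = replace(config, budget_tracker=None)
            baseline_response = self.baseline(prompt, shadow_config)
            # Invariant 5: grade the response we already have; no second call.
            self.ledger.append(self.grade(response, baseline_response))
        except Exception as exc:
            # Invariant 3: shadow failures never reach the caller.
            if self.on_shadow_error is not None:
                self.on_shadow_error(exc)
```

Catching Exception rather than BaseException in the shadow path is a choice made for this sketch; it lets KeyboardInterrupt and SystemExit through while still isolating ordinary failures.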

Observation mapping

The appended observation uses:

  • task_type from the wrapper configuration
  • adapter_id from the wrapper configuration
  • model_id from the wrapper configuration, then candidate response model, then RunConfig.model_name
  • quality_score from the GradingResult
  • cost_usd from response metadata keys cost_usd, estimated_cost_usd, or cost, falling back to 0.0
  • token counts from candidate response usage keys prompt_tokens and completion_tokens
  • baseline_adapter_id and tags from wrapper configuration
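
The fallback chains above can be sketched as small helpers. The function names are hypothetical, the key precedence is assumed to follow the order in which the keys are listed, and defaulting missing token counts to 0 is an assumption this contract does not state.

```python
from typing import Any, Mapping, Optional, Tuple

# Assumed precedence: the order in which the contract lists the keys.
COST_KEYS = ("cost_usd", "estimated_cost_usd", "cost")


def resolve_model_id(
    configured: Optional[str],
    response_model: Optional[str],
    config_model_name: Optional[str],
) -> Optional[str]:
    # Wrapper configuration wins, then the candidate response model,
    # then RunConfig.model_name.
    for candidate in (configured, response_model, config_model_name):
        if candidate:
            return candidate
    return None


def resolve_cost_usd(metadata: Mapping[str, Any]) -> float:
    # First cost key that is present wins; otherwise fall back to 0.0.
    for key in COST_KEYS:
        value = metadata.get(key)
        if value is not None:
            return float(value)
    return 0.0


def resolve_token_counts(usage: Mapping[str, Any]) -> Tuple[int, int]:
    # Missing counts default to 0 here (an assumption of this sketch).
    return int(usage.get("prompt_tokens", 0)), int(usage.get("completion_tokens", 0))
```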

Error contract

| Condition | Exception |
|---|---|
| Empty task_type | ValueError |
| Empty adapter_id | ValueError |
| shadow_rate outside 0..1 | ValueError |
| Candidate adapter failure | Original exception propagates |
| Shadow baseline/grading/ledger failure | Suppressed; optional callback |
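
The three ValueError conditions amount to checks like the following. This is a sketch only: validate_shadowing_config is a hypothetical helper, and where the real class performs these checks (for example in __post_init__) and what its messages say are not specified by this contract.

```python
def validate_shadowing_config(task_type: str, adapter_id: str, shadow_rate: float) -> None:
    # Hypothetical helper mirroring the ValueError rows of the error contract.
    if not task_type:
        raise ValueError("task_type must not be empty")
    if not adapter_id:
        raise ValueError("adapter_id must not be empty")
    # The chained comparison is False for NaN, so NaN is rejected as well.
    if not 0.0 <= shadow_rate <= 1.0:
        raise ValueError(f"shadow_rate must be within 0..1, got {shadow_rate!r}")
```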

Privacy note

The wrapper does not store prompt or response text in the ledger by default. Callers that need regime tracking should store non-sensitive fingerprints in tags, for example prompt_fingerprint or template_version.
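
One way to build such tags, as a minimal sketch: hash the prompt template rather than the filled-in prompt, and keep a short prefix of the digest. The helper name, the 16-character length, the example template, and the "v3" version string are illustrative assumptions, not part of llm_connect.

```python
import hashlib


def prompt_fingerprint(template: str, length: int = 16) -> str:
    # Truncated SHA-256 hex digest; stable for identical template text.
    digest = hashlib.sha256(template.encode("utf-8")).hexdigest()
    return digest[:length]


# Hypothetical tags mapping to pass as the wrapper's tags field.
tags = {
    "prompt_fingerprint": prompt_fingerprint("Summarize the following text: {document}"),
    "template_version": "v3",
}
```

A hash of a short or guessable string can be reversed by brute force, so fingerprinting the template (which carries no user data) is safer than fingerprinting the rendered prompt.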