Add adaptive cost-quality routing primitives
contracts/functional/quality-ledger.md
@@ -0,0 +1,87 @@
# Contract: QualityObservation and QualityLedger

**layer:** Functional
**maturity:** Beta
**module:** `llm_connect.quality`
**since:** WP-0004

## Purpose

Record observed quality, cost, latency, and token outcomes for a logical task
type so consumers can build adaptive routing policy without putting
consumer-specific thresholds into llm-connect.

## Public surface

```python
@dataclass(frozen=True)
class QualityObservation:
    task_type: str
    adapter_id: str
    model_id: str
    cost_usd: float
    quality_score: float
    latency_ms: float
    tokens_in: int
    tokens_out: int
    baseline_adapter_id: str | None = None
    recorded_at: datetime = field(default_factory=...)
    tags: dict[str, Any] = field(default_factory=dict)

    @property
    def total_tokens(self) -> int: ...
    def to_dict(self) -> dict[str, Any]: ...
    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "QualityObservation": ...

class QualityLedger:
    def __init__(self, path: str | Path): ...
    @property
    def path(self) -> Path: ...
    def append(self, observation: QualityObservation) -> None: ...
    def read_all(self) -> list[QualityObservation]: ...
    def malformed_count(self) -> int: ...
    def by_task_type(self, task_type: str) -> list[QualityObservation]: ...
    def recent(...) -> list[QualityObservation]: ...
    def mean_quality(...) -> float | None: ...
    def prune_before(self, timestamp: datetime) -> int: ...

def is_stale(observation: QualityObservation, max_age: timedelta, *, now: datetime | None = None) -> bool: ...
```

## Invariants

1. `quality_score` is a normalised `0.0..1.0` score where `1.0` means the
   candidate fully meets the grader's quality bar and `0.0` means complete
   failure for that grader.
2. `task_type`, `adapter_id`, and `model_id` must be non-empty strings.
3. `cost_usd`, `latency_ms`, `tokens_in`, and `tokens_out` are non-negative.
4. `recorded_at` is normalised to UTC. Naive datetimes are interpreted as UTC.
5. Ledger records are JSON Lines. Each line is one `QualityObservation.to_dict()`.
6. `QualityLedger.append()` performs a process-local lock plus an advisory file
   lock around each write.
7. Read/query helpers skip malformed lines instead of failing the whole ledger.
   `malformed_count()` exposes how many lines were skipped.
8. `prune_before()` removes only valid observations older than the cutoff.
   Malformed lines are preserved.
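
Invariants 5 to 8 can be sketched roughly as follows. This is not the `llm_connect.quality` implementation. It is a minimal stand-in under several assumptions: records are plain dicts rather than `QualityObservation` objects, `append`, `read_all`, and `prune_before` are free functions rather than methods, the advisory lock uses POSIX `fcntl.flock`, and a line counts as valid when it parses as JSON and carries an ISO 8601 `recorded_at`.

```python
import fcntl  # POSIX-only; assumed mechanism for the advisory file lock
import json
import tempfile
import threading
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

_WRITE_LOCK = threading.Lock()  # process-local lock (invariant 6)


def _timestamp(record: dict) -> datetime:
    """Parse recorded_at; naive values are interpreted as UTC."""
    when = datetime.fromisoformat(record["recorded_at"])
    if when.tzinfo is None:
        return when.replace(tzinfo=timezone.utc)
    return when.astimezone(timezone.utc)


def _parse(line: str) -> Optional[dict]:
    """Return the record, or None when the line is malformed."""
    try:
        record = json.loads(line)
        _timestamp(record)
        return record
    except (ValueError, KeyError, TypeError):
        return None


def append(path: Path, record: dict) -> None:
    """Write one JSON line under both locks."""
    line = json.dumps(record, sort_keys=True) + "\n"
    with _WRITE_LOCK, open(path, "a", encoding="utf-8") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)
        try:
            fh.write(line)
            fh.flush()
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)


def read_all(path: Path) -> tuple:
    """Return (valid records, number of malformed lines skipped)."""
    valid, malformed = [], 0
    for line in path.read_text(encoding="utf-8").splitlines():
        record = _parse(line)
        if record is None:
            malformed += 1
        else:
            valid.append(record)
    return valid, malformed


def prune_before(path: Path, cutoff: datetime) -> int:
    """Drop valid records older than cutoff; keep malformed lines verbatim."""
    kept, removed = [], 0
    with _WRITE_LOCK:
        for line in path.read_text(encoding="utf-8").splitlines():
            record = _parse(line)
            if record is not None and _timestamp(record) < cutoff:
                removed += 1
            else:
                kept.append(line)
        path.write_text("".join(item + "\n" for item in kept), encoding="utf-8")
    return removed


# Demo: two valid records with one corrupt line between them.
ledger = Path(tempfile.mkdtemp()) / "quality.jsonl"
append(ledger, {"task_type": "summarise", "quality_score": 0.9,
                "recorded_at": "2024-01-01T00:00:00+00:00"})
with open(ledger, "a", encoding="utf-8") as fh:
    fh.write("{not json\n")
append(ledger, {"task_type": "summarise", "quality_score": 0.7,
                "recorded_at": "2024-06-01T00:00:00+00:00"})

records, skipped = read_all(ledger)
removed = prune_before(ledger, datetime(2024, 3, 1, tzinfo=timezone.utc))
remaining, still_skipped = read_all(ledger)
```

In this sketch the corrupt line survives pruning because only lines that parse cleanly are ever candidates for removal, which mirrors invariant 8. A real implementation would likely also hold the file lock while rewriting the ledger.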

## Error contract

| Condition | Exception |
|-----------|-----------|
| Invalid observation field | `ValueError` |
| Invalid datetime field | `TypeError` or `ValueError` |
| Negative recent limit | `ValueError` |
| `mean_quality(min_observations <= 0)` | `ValueError` |
| `is_stale(max_age < 0)` | `ValueError` |
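
To make the table concrete, here is a minimal sketch of how field validation and `is_stale` might raise these exceptions. The `Observation` class is a hypothetical, trimmed stand-in for `QualityObservation`, the error messages are invented, and the strict `>` comparison in `is_stale` is an assumption, since the contract does not say whether an observation exactly `max_age` old counts as stale.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional


def _utc_now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass(frozen=True)
class Observation:  # hypothetical stand-in for QualityObservation
    task_type: str
    adapter_id: str
    model_id: str
    cost_usd: float
    quality_score: float
    latency_ms: float
    tokens_in: int
    tokens_out: int
    recorded_at: datetime = field(default_factory=_utc_now)

    def __post_init__(self) -> None:
        for name in ("task_type", "adapter_id", "model_id"):
            value = getattr(self, name)
            if not isinstance(value, str) or not value:
                raise ValueError(f"{name} must be a non-empty string")
        # The chained comparison also rejects NaN.
        if not 0.0 <= self.quality_score <= 1.0:
            raise ValueError("quality_score must be within 0.0..1.0")
        for name in ("cost_usd", "latency_ms", "tokens_in", "tokens_out"):
            if getattr(self, name) < 0:
                raise ValueError(f"{name} must be non-negative")
        if not isinstance(self.recorded_at, datetime):
            raise TypeError("recorded_at must be a datetime")
        when = self.recorded_at
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)  # naive means UTC
        else:
            when = when.astimezone(timezone.utc)
        # Frozen dataclasses need object.__setattr__ for normalisation.
        object.__setattr__(self, "recorded_at", when)


def is_stale(obs: Observation, max_age: timedelta, *,
             now: Optional[datetime] = None) -> bool:
    if max_age < timedelta(0):
        raise ValueError("max_age must be non-negative")
    current = now if now is not None else _utc_now()
    return current - obs.recorded_at > max_age


# Demo with placeholder adapter and model names.
obs = Observation("summarise", "adapter-a", "model-a", 0.002, 0.8, 420.0,
                  100, 50, recorded_at=datetime(2024, 1, 1))
check_time = datetime(2024, 1, 3, tzinfo=timezone.utc)
stale_after_one_day = is_stale(obs, timedelta(days=1), now=check_time)
stale_after_one_week = is_stale(obs, timedelta(days=7), now=check_time)
```

Validating in `__post_init__` keeps invalid observations from ever existing, so the ledger never has to re-check a record that was built through the constructor.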

## Known consumers

- `infospace-bench` is the first intended consumer. It is expected to provide
  task taxonomy, thresholds, and baseline choice.
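
As an illustration of that split of responsibilities, a consumer-side policy might look like the sketch below. Everything here is hypothetical: `choose_adapter`, its parameters, and the adapter names are not part of llm-connect or `infospace-bench`, and the dict keys are assumed to match the `QualityObservation` field names. The point is only that the threshold, the minimum sample size, and the baseline all come from the caller.

```python
from collections import defaultdict
from statistics import fmean


def choose_adapter(observations, *, task_type, min_quality,
                   min_observations, baseline):
    """Pick the cheapest adapter that clears the consumer's quality bar.

    Falls back to the consumer-chosen baseline when no adapter has enough
    observations at or above min_quality.
    """
    scores = defaultdict(list)
    costs = defaultdict(list)
    for obs in observations:
        if obs["task_type"] == task_type:
            scores[obs["adapter_id"]].append(obs["quality_score"])
            costs[obs["adapter_id"]].append(obs["cost_usd"])

    eligible = [
        (fmean(costs[adapter]), adapter)
        for adapter, values in scores.items()
        if len(values) >= min_observations and fmean(values) >= min_quality
    ]
    return min(eligible)[1] if eligible else baseline


# Illustrative data only; the numbers are made up.
history = [
    {"task_type": "summarise", "adapter_id": "cheap", "quality_score": 0.8, "cost_usd": 0.001},
    {"task_type": "summarise", "adapter_id": "cheap", "quality_score": 0.9, "cost_usd": 0.001},
    {"task_type": "summarise", "adapter_id": "premium", "quality_score": 0.95, "cost_usd": 0.01},
    {"task_type": "summarise", "adapter_id": "premium", "quality_score": 0.97, "cost_usd": 0.01},
]

relaxed = choose_adapter(history, task_type="summarise", min_quality=0.8,
                         min_observations=2, baseline="premium")
strict = choose_adapter(history, task_type="summarise", min_quality=0.9,
                        min_observations=2, baseline="premium")
too_few = choose_adapter(history, task_type="summarise", min_quality=0.8,
                         min_observations=3, baseline="fallback")
```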

## Notes

The ledger intentionally stores only observation metadata in this slice. Callers
that need prompt or response digests can place those in `tags`, for example
`prompt_fingerprint`.
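
One way a caller could produce such a tag is sketched below. The choice of SHA-256 and the helper name `prompt_fingerprint` are assumptions for illustration; the contract only names the tag key, not how its value is computed.

```python
import hashlib


def prompt_fingerprint(prompt: str) -> str:
    """Return a stable hex digest so the prompt text itself is never stored."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


# The resulting dict would be passed as the observation's `tags`.
tags = {"prompt_fingerprint": prompt_fingerprint("Summarise the attached report.")}
```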