Implement-LLM-WP-0005-cost-model-estimators

2026-05-19 05:02:20 +02:00
parent 0054afe689
commit c11c6afa3f
16 changed files with 1525 additions and 10 deletions
--- a/contracts/functional/costs.md
+++ b/contracts/functional/costs.md
@@ -0,0 +1,25 @@
+# Cost Estimates
+
+`llm_connect.costs` converts token estimates or observed token counts into
+USD estimates using `ModelRateRegistry`.
+
+## Contract
+
+```python
+from llm_connect import estimate_cost
+
+estimate = estimate_cost("openai/gpt-4o-mini", 28_000, 7_500)
+```
+
+For known models the result is:
+
+- `cost_usd`: sum of the prompt and completion components.
+- `prompt_cost_usd`: prompt-token component.
+- `completion_cost_usd`: completion-token component.
+- `cost_source`: `rate_table:<model_id>`.
+
+Unknown models return `CostEstimate(cost_usd=None, cost_source="unknown")`.
+Missing rates are never silently treated as zero cost.
+
+The module also exposes `CostModel(registry=...)` for callers that prefer to
+carry a registry object and call `model.estimate_cost(...)`.
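
Review note on `contracts/functional/costs.md`: the arithmetic behind this contract can be sketched as below. This is a minimal illustration, not the code in this commit. It assumes rates are quoted in USD per 1,000 tokens (as stated in `rates.md`), and `estimate_cost_sketch`, the plain `dict` rate table, and the local `CostEstimate` stand-in are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass(frozen=True)
class CostEstimate:
    """Stand-in mirroring the fields named in the contract (assumed shape)."""
    cost_usd: Optional[float]
    cost_source: str
    prompt_cost_usd: Optional[float] = None
    completion_cost_usd: Optional[float] = None


def estimate_cost_sketch(
    rates: Mapping[str, Tuple[float, float]],
    model_id: str,
    prompt_tokens: int,
    completion_tokens: int,
) -> CostEstimate:
    """Hypothetical helper: rates maps model_id -> (prompt, completion) USD per 1k tokens."""
    rate = rates.get(model_id)
    if rate is None:
        # Unknown model: report "no estimate" instead of pretending the cost is zero.
        return CostEstimate(cost_usd=None, cost_source="unknown")
    prompt_per_1k, completion_per_1k = rate
    prompt_cost = prompt_tokens / 1000 * prompt_per_1k
    completion_cost = completion_tokens / 1000 * completion_per_1k
    return CostEstimate(
        cost_usd=prompt_cost + completion_cost,
        cost_source=f"rate_table:{model_id}",
        prompt_cost_usd=prompt_cost,
        completion_cost_usd=completion_cost,
    )


rates = {"openai/gpt-4o-mini": (0.00015, 0.00060)}
known = estimate_cost_sketch(rates, "openai/gpt-4o-mini", 28_000, 7_500)
unknown = estimate_cost_sketch(rates, "example/not-in-table", 28_000, 7_500)
```

With the example rates for `openai/gpt-4o-mini`, 28,000 prompt tokens and 7,500 completion tokens come to roughly 0.0042 + 0.0045 = 0.0087 USD, while the unknown model yields `cost_usd=None` so callers can tell "free" apart from "no rate available".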
--- a/contracts/functional/problem-classes.md
+++ b/contracts/functional/problem-classes.md
@@ -0,0 +1,46 @@
+# Problem Classes
+
+`llm_connect.problem_classes` provides generic token estimators for recurring
+LLM workflow shapes.
+
+## Contract
+
+Every problem class exposes:
+
+- `name`: stable registry key.
+- `base_dimensions`: required dimension names supplied by consumers.
+- `tunable_params`: parameters that can be overridden or fitted.
+- `estimate(dimensions, params=None) -> TokenEstimate`.
+- `fit(observations, min_observations=3) -> ProblemClass`.
+
+`TokenEstimate` contains `prompt_tokens`, `completion_tokens`, and a
+`confidence` score from `0` to `1`.
+
+## Built-Ins
+
+| Name | Dimensions | Tunable params |
+|---|---|---|
+| `chunk-summarization` | `chunk_words`, `template_words` | `completion_ratio` |
+| `entity-extraction` | `chunk_words`, `template_words`, `expected_entities` | `tokens_per_entity` |
+| `relation-extraction` | `chunk_words`, `template_words`, `expected_relations` | `tokens_per_relation` |
+| `judge-eval` | `artifact_words`, `template_words`, `n_criteria` | `tokens_per_criterion` |
+| `report-synthesis` | `n_chunks`, `n_entities`, `n_relations`, `template_words` | `base_completion_tokens` |
+
+## Observations
+
+`fit()` accepts either `Observation` objects or `QualityObservation` rows whose
+`tags` include:
+
+```python
+{
+    "problem_class": "entity-extraction",
+    "dimensions": {
+        "chunk_words": 900,
+        "template_words": 200,
+        "expected_entities": 4,
+    },
+}
+```
+
+When fewer than `min_observations` usable rows are present, `fit()` falls back
+to the current parameters instead of fitting new ones.
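
Review note on `contracts/functional/problem-classes.md`: one way to read the `estimate`/`fit` contract is sketched below for the `chunk-summarization` shape. Everything numeric here is an assumption made for illustration: the tokens-per-word factor, the default `completion_ratio`, the fixed `confidence`, and the averaging rule in `fit` are not taken from this commit, and `ChunkSummarizationSketch` and the local `Observation` are hypothetical stand-ins.

```python
from dataclasses import dataclass, replace

TOKENS_PER_WORD = 1.3  # assumption: rough heuristic, not from the contract


@dataclass(frozen=True)
class TokenEstimate:
    prompt_tokens: int
    completion_tokens: int
    confidence: float  # score from 0 to 1


@dataclass(frozen=True)
class Observation:
    """Stand-in observation: dimensions plus the completion tokens actually used."""
    dimensions: dict
    completion_tokens: int


@dataclass(frozen=True)
class ChunkSummarizationSketch:
    completion_ratio: float = 0.25  # hypothetical default
    name: str = "chunk-summarization"
    base_dimensions: tuple = ("chunk_words", "template_words")
    tunable_params: tuple = ("completion_ratio",)

    def estimate(self, dimensions, params=None):
        missing = [d for d in self.base_dimensions if d not in dimensions]
        if missing:
            raise ValueError(f"missing dimensions: {missing}")
        # A per-call override wins over the stored parameter.
        ratio = (params or {}).get("completion_ratio", self.completion_ratio)
        chunk_tokens = dimensions["chunk_words"] * TOKENS_PER_WORD
        template_tokens = dimensions["template_words"] * TOKENS_PER_WORD
        return TokenEstimate(
            prompt_tokens=round(chunk_tokens + template_tokens),
            completion_tokens=round(chunk_tokens * ratio),
            confidence=0.5,  # placeholder value for the sketch
        )

    def fit(self, observations, min_observations=3):
        usable = [o for o in observations if o.dimensions.get("chunk_words", 0) > 0]
        if len(usable) < min_observations:
            return self  # too few usable rows: keep the current parameters
        ratios = [
            o.completion_tokens / (o.dimensions["chunk_words"] * TOKENS_PER_WORD)
            for o in usable
        ]
        # Return a new instance; the original estimator is left untouched.
        return replace(self, completion_ratio=sum(ratios) / len(ratios))
```

Returning a new instance from `fit` matches the `-> ProblemClass` signature in the contract and keeps a fitted estimator from mutating one that other callers may share.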
--- a/contracts/functional/rates.md
+++ b/contracts/functional/rates.md
@@ -0,0 +1,30 @@
+# Model Rate Registry
+
+`llm_connect.rates` owns static model list prices used for planning and
+post-hoc estimates.
+
+## Contract
+
+- `ModelRate` records `model_id`, prompt and completion rates in USD per
+  1,000 tokens, `currency`, `source_url`, and `captured_at`.
+- `ModelRateRegistry.default()` returns the bundled OpenRouter snapshot
+  captured on `2026-05-17`.
+- `ModelRateRegistry.from_yaml(path)` accepts the package/consumer override
+  shape:
+
+```yaml
+schema_version: 1
+currency: USD
+source_url: https://openrouter.ai/models
+captured_at: "2026-05-17"
+rates:
+  openai/gpt-4o-mini:
+    prompt_per_1k: 0.00015
+    completion_per_1k: 0.00060
+```
+
+- `merged_with(override)` returns a new registry where matching override
+  entries replace default entries by `model_id`.
+
+Rates are a static snapshot. Consumers decide whether `captured_at` is fresh
+enough for their workflow.
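
Review note on `contracts/functional/rates.md`: the override semantics of `merged_with` and a consumer-side freshness check might look like the sketch below. It is a minimal illustration under stated assumptions: `RateRegistrySketch`, its `get` method, and `is_fresh` are hypothetical names, the `prompt_per_1k`/`completion_per_1k` field names are borrowed from the YAML shape above, and `captured_at` is assumed to be an ISO `YYYY-MM-DD` string.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ModelRate:
    """Stand-in for the record described in the contract (assumed field names)."""
    model_id: str
    prompt_per_1k: float       # USD per 1,000 prompt tokens
    completion_per_1k: float   # USD per 1,000 completion tokens
    currency: str = "USD"
    source_url: str = ""
    captured_at: str = ""      # assumed ISO date, e.g. "2026-05-17"


class RateRegistrySketch:
    def __init__(self, rates):
        self._rates = {rate.model_id: rate for rate in rates}

    def get(self, model_id):
        """Hypothetical lookup: returns None for unknown models."""
        return self._rates.get(model_id)

    def merged_with(self, override):
        # Copy first so neither input registry is mutated.
        merged = dict(self._rates)
        # Matching model_ids are replaced; new model_ids are added.
        merged.update(override._rates)
        return RateRegistrySketch(merged.values())


def is_fresh(rate, today, max_age_days):
    """Consumer-side policy: is the snapshot recent enough for this workflow?"""
    age = today - date.fromisoformat(rate.captured_at)
    return age.days <= max_age_days
```

Keeping the freshness decision in a separate helper mirrors the contract's stance that the registry only reports `captured_at` and leaves the staleness threshold to each consumer.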