# Task-Type Taxonomy for Routing

Workplan: IB-WP-0018 (T01)

Depends on: llm-connect LLM-WP-0004 (`RoutingPolicy`, `AdaptiveRoutingPolicy`)

This file names the task types that infospace-bench emits when it routes each generation stage through llm-connect. The names are the consumer side of LLM-WP-0004's scope guardrail: llm-connect ships the routing primitives; infospace-bench owns the taxonomy.

## Default identity mapping

By default, `RoutingAssistedGenerationAdapter` (see `src/infospace_bench/routing.py`) maps each stage id to the task type of the same name, as listed below. Callers can override individual entries by passing `RoutingAssistedGenerationAdapter(..., stage_to_task_type={...})`.

| Stage id | Task type | Notes |
|---|---|---|
| `summarize-source` | `summarize-source` | One call per source chunk. Cheap and high-volume; small models usually clear the bar. |
| `extract-entities` | `extract-entities` | One call per source chunk. Quality matters most here: bad extractions cascade downstream. |
| `extract-relations` | `extract-relations` | One call per source chunk. Quality bar close to entity extraction's; relations rely on entity titles being stable. |
| `evaluate-entity` | `evaluate-entity` | One call per generated entity. Cheap; often routed to a different model than extraction to avoid self-grading. |
| `synthesize-report` | `synthesize-report` | One call at the end of the run. Quality matters; at a volume of one, cost is negligible. |

## Quality expectations

`AdaptiveRoutingPolicy.resolve(task_type, quality_floor=...)` picks the cheapest adapter whose mean quality, as observed in the ledger, clears the floor. Recommended starting floors:

| Task type | Quality floor | Rationale |
|---|---|---|
| `summarize-source` | 0.70 | Summaries are intermediate; slight quality loss is recoverable downstream. |
| `extract-entities` | 0.85 | Entities are the durable output. Be strict. |
| `extract-relations` | 0.80 | Relations depend on entities; a slightly looser floor is acceptable as long as the evidence is intact. |
| `evaluate-entity` | 0.80 | Judge-level reliability. Self-grading bias is a bigger concern than the absolute score. |
| `synthesize-report` | 0.70 | The report is a review surface; looser language is an acceptable trade for cheaper models. |

These are starting points, not rules enforced by this taxonomy. Bind them at the call site, for example `RoutingAssistedGenerationAdapter(..., quality_floor=0.85)` for the entity-extraction stage.

## Common overrides

Callers may want to **collapse** task types so related stages share observations, or **split** a stage out under its own task type to pin a specific model to a narrow workload. Two illustrative overrides:

```python
# Collapse both extraction stages into one task type so a single set of
# ledger observations drives routing for both.
stage_to_task_type = {
    "extract-entities": "extraction",
    "extract-relations": "extraction",
}
```

```python
# Move entity evaluation onto a dedicated `judge` task type so a RoutingPolicy
# rule can pin a specific judge model to it. A finer split by entity category
# (useful when a profile has very different quality bars per category, e.g.
# trading-literature's `evidence_bearing_claim` is harder to judge than
# `instrument`) cannot be expressed by this mapping alone, because it is keyed
# by stage id and every category shares the `evaluate-entity` stage.
stage_to_task_type = {
    "evaluate-entity": "judge",
}
```

Anything not in the override map falls through to the identity mapping.

## What this taxonomy does NOT decide

- **Which adapter serves each task type.** That belongs to the caller's `RoutingPolicy` rule list.
- **Where the quality ledger lives.** The caller supplies the path on the `AdaptiveRoutingPolicy`.
- **When to refresh observations.** The caller decides, via the ledger's TTL helpers in llm-connect.
- **What a quality score means.** Each judge defines its own.
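The two rules this document relies on, override-then-identity stage mapping and "cheapest adapter whose observed mean quality clears the floor", can be sketched in plain Python. This is a minimal illustration, not llm-connect's or infospace-bench's implementation: `resolve_task_type`, `Candidate`, `cheapest_above_floor`, the `cost_per_call` field, and the model names and numbers are all hypothetical. Treating "clears" as `>=`, skipping adapters with no observations, and returning `None` when nothing qualifies are assumptions; the real policy may differ on each.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Mapping, Sequence

# Starting floors copied from the quality-expectations table.
STARTING_FLOORS = {
    "summarize-source": 0.70,
    "extract-entities": 0.85,
    "extract-relations": 0.80,
    "evaluate-entity": 0.80,
    "synthesize-report": 0.70,
}


def resolve_task_type(stage_id: str, overrides: Mapping[str, str]) -> str:
    """Hypothetical helper: use the override if one exists, else the identity mapping."""
    return overrides.get(stage_id, stage_id)


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for an adapter plus its ledger statistics."""

    name: str
    cost_per_call: float        # assumed: any consistent cost unit
    mean_quality: float | None  # None means no ledger observations yet


def cheapest_above_floor(candidates: Sequence[Candidate], floor: float) -> Candidate | None:
    """Return the cheapest candidate whose observed mean quality is >= floor.

    Candidates without observations are skipped (an assumption). Returns None
    when no candidate qualifies; a real policy might fall back or raise instead.
    """
    eligible = [
        c for c in candidates
        if c.mean_quality is not None and c.mean_quality >= floor
    ]
    return min(eligible, key=lambda c: c.cost_per_call, default=None)


if __name__ == "__main__":
    # Made-up pool purely for illustration.
    pool = [
        Candidate("small-model", 1.0, 0.78),
        Candidate("mid-model", 3.0, 0.86),
        Candidate("large-model", 10.0, 0.93),
    ]
    overrides = {"extract-entities": "extraction", "extract-relations": "extraction"}
    for stage in STARTING_FLOORS:
        task_type = resolve_task_type(stage, overrides)
        choice = cheapest_above_floor(pool, STARTING_FLOORS[stage])
        print(stage, "->", task_type, "->", choice.name if choice else None)
```

Keeping the lookup as `overrides.get(stage_id, stage_id)` mirrors the fall-through rule: only the stages named in the override map change, and every other stage keeps its own name as its task type.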