infospace-bench

Author	SHA1	Message	Date
tegwick	3ca891de4a	fix: review findings from Lefevre live smoke Two small fixes informed by the 2026-05-18 live OpenRouter chapter-I run. 1. extract-entities templates (trading-literature and general-knowledge): the # Entity Title placeholder was interpreted by gpt-4o-mini as a literal heading prefix, so every entity came back as "# Entity Title: Bucket Shop" etc. The instruction now spells the placeholder out with concrete examples and an explicit "not the literal string" note, so smaller models hit the intended shape. 2. generate plan grows --model <id>. When supplied, the cost estimate pulls per-prompt and per-completion rates from the bundled model_rates.yaml instead of multiplying a single blended --cost-per-1k value across all tokens. The summary now also returns a separate estimated_completion_tokens field plus a cost_source tag ("rate_table:<model>" | "cost_per_1k_blended" | None). This is a stopgap. LLM-WP-0005 (proposed in llm-connect this round) will move the rate registry and token-shape problem classes upstream so consumers stop re-implementing them. The live smoke ran 28k prompt tokens / 7.5k completion / $0.0088 actual. With --model openai/gpt-4o-mini the plan estimate now lands at $0.0076 (within 14% of actual) versus the prior $8.40 estimate at --cost-per-1k 0.30. 181 tests pass, 2 skipped (both live OpenRouter smokes). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-19 04:30:33 +02:00
tegwick	b0d67ae79e	IB-WP-0020-T05: shadow-mode CLI flags; close IB-WP-0020 Add --shadow-baseline <id> and --shadow-rate <float> opt-in flags to generate run, generate resume, and generate from-source. When --shadow-baseline names a candidate id from the routing config, build_routing_policy_from_config wraps every other candidate in an llm-connect ShadowingAdapter using that baseline plus a PairedGrader(ExactMatchJudge()) and the workspace-resolved QualityLedger. The baseline candidate itself is never wrapped — that would shadow it against itself. --shadow-rate defaults to 0.1 when --shadow-baseline is set; passing --shadow-rate without --shadow-baseline fails fast with shadow_rate_without_baseline. Setting --shadow-baseline without a ledger_path in the config fails with missing_routing_ledger_for_shadow so observations have a place to land before any call goes out. run_generation grew shadow_baseline + shadow_rate kwargs and _adapter_for("routing", ...) plumbs them into build_routing_policy_from_config. The wrapped ShadowingAdapter slots into the policy's prefer/fallback per task type via a (candidate_id, task_type) reverse lookup, and adapters_by_id on the adaptive policy gets the string-keyed entries. Five new tests cover: shadow_rate without baseline fails fast, shadow mode without a ledger fails fast, unknown shadow baseline id fails fast, structural assertion that ShadowingAdapter wraps non-baseline candidates and leaves the baseline raw, and a behavioural check that shadow_rate=1.0 calls the baseline on every call while shadow_rate=0.0 skips entirely. Test forces async_shadow=False so the call counter is deterministic. Closes IB-WP-0020: T01-T05 all done. Workplan status flips from active to finished. 179 tests pass, 2 skipped (both live OpenRouter smokes). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-18 23:30:36 +02:00
tegwick	debd2b8e69	IB-WP-0020-T04: example routing config + live routing smoke examples/routing/trading-literature.yaml is the checked-in starting config for a Lefevre-style run. It applies the IB-WP-0018 task-type taxonomy: cheap candidates for summary + evaluation, smart candidates for entity + relation extraction, and a separate baseline rule wiring claude_code for a follow-on T05 ShadowingAdapter step. Workspace-relative ledger_path keeps adaptive observations with the workspace. tests/test_routing_config.py gains a regression test that asserts the shipped example parses cleanly, every stage in stage_to_task_type maps to a declared task type, and the baseline candidate uses the claude_code provider, so the example will not bit-rot silently. tests/test_openrouter_live.py gains test_provider_routing_one_chapter_live_smoke gated on the same INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER + OPENROUTER_API_KEY opt-in as the existing static smoke. It builds a one-candidate routing config, runs a single chapter through --provider routing, and asserts the per-stage adapter-choices report section names the routed model and the routed artifacts carry adapter_id provenance. docs/generic-source-generator.md gains a "Live runs with --provider routing" subsection that walks through the one-command routed run, explains the --quality-floor override, and points at the parallel live smoke test. 174 tests pass, 2 skipped (both live smokes, correctly gated). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-18 22:19:54 +02:00
tegwick	d3562454d7	IB-WP-0020-T03: routing CLI flags Add --provider routing, --routing-config <yaml>, and --quality-floor <float> to generate run, generate resume, and generate from-source. The CLI flag wiring constructs a RoutingAssistedGenerationAdapter from the parsed config, with the workspace handed in so any ledger_path in the config resolves relative to it. --quality-floor overrides the config-level default_quality_floor for a single invocation. run_generation gains routing_config + quality_floor kwargs and _adapter_for grew a "routing" branch. Missing --routing-config with --provider routing fails fast with InfospaceError("missing_routing_config"); missing API key for any candidate fails fast with InfospaceError("missing_routing_api_key"). Two small bug fixes surfaced while writing T03: - routing._identify_adapter now also reads ``_model`` from llm-connect adapters (they keep the model id in a private attribute), so the per-stage adapter-choice line shows the model id rather than just the class name. - budget.TOKEN_EVENTS_PATH corrected from /state/token-events to the state-hub HTTP endpoint /token-events/ that actually exists; the failure-isolation in emit_token_event already kept the prior typo from breaking runs, but the hub never saw the events. Five new tests cover: _adapter_for refusal on missing config, _adapter_for happy path, run_generation end-to-end through routing with a stubbed OpenRouterAdapter.execute_prompt (no network), workspace-relative ledger resolution, and a CLI subprocess smoke asserting fast-fail on missing API key. 173 tests pass, 1 skipped. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-18 22:08:51 +02:00
tegwick	82468c2165	IB-WP-0020-T02: routing config loader build_routing_policy_from_config(config, *, workspace=None, env=None, adapter_factory=None) materialises a parsed RoutingConfig into a live llm-connect routing policy: - Static RoutingPolicy when the config has no adaptive signals; one RoutingRule per task type, prefer = first candidate, fallback = second candidate (when present), max_cost_per_1k pulled from the preferred candidate. - AdaptiveRoutingPolicy when default_quality_floor, any per-task quality_floor, or ledger_path is set. ledger_path resolves relative to the supplied workspace; parent directory is created so the ledger writes never fail on first call. - API-key resolution from env (default os.environ) using the per-provider DEFAULT_API_KEY_ENV map; candidate.api_key_env overrides the default. Missing key raises InfospaceError("missing_routing_api_key") before any provider constructor runs. - claude_code candidates need no API key (shells out to the local CLI). - adapter_factory hook lets tests inject a sentinel-returning factory so policy construction stays network- and llm-adapter-free. Eight new tests cover: static-policy default, adaptive selection via ledger_path, adaptive selection via quality_floor, multi-candidate fallback rule, real-factory smoke (OpenRouterAdapter constructed with env API key), missing-key fast-fail, claude_code zero-key path, and custom api_key_env override. 168 tests pass, 1 skipped. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-18 19:58:15 +02:00
tegwick	c11a942bb7	IB-WP-0020-T01: routing config schema and parser Add a small YAML routing config schema (schema_version 1) and a parser-only loader at src/infospace_bench/routing_config.py. The loader validates the declarative shape — task_types with candidates, optional per-task quality_floor, optional default_quality_floor, optional ledger_path, optional stage_to_task_type override map — and refuses bad shapes before any network or workspace work happens. Supported provider names: openrouter, claude_code, openai, gemini. Unknown providers, missing required candidate fields, out-of-range quality floors, negative max_cost_per_1k, duplicate candidate ids within a task type, and non-mapping stage_to_task_type all raise focused InfospaceError codes that callers can pattern-match. docs/routing-config.md documents the schema with two annotated examples (OpenRouter-only two-tier, and adaptive with a ClaudeCode baseline) plus the full "what fails fast" list. 16 parser tests cover happy-path round-trip, file load, missing file, malformed YAML, and every validation surface (wrong/missing schema version, empty task_types, empty candidates, missing required fields, unsupported provider, negative cost, out-of-range quality_floor, duplicate ids, non-mapping stage_map, non-string ledger_path). T02 will turn a RoutingConfig into a live llm-connect RoutingPolicy / AdaptiveRoutingPolicy with constructed LLMAdapter instances. 160 tests pass, 1 skipped. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-18 18:09:28 +02:00
tegwick	f818acfc62	IB-WP-0018-T03+T04: shadow sampling + report/CLI surfacing; close IB-WP-0018 T03 — wrap_with_shadow_sampling() helper in routing.py: builds a llm-connect ShadowingAdapter around any candidate LLMAdapter with a caller-supplied baseline, grader, and QualityLedger. async_shadow=True by default so production load is not doubled; on_shadow_error escape hatch keeps caller logs informed when a baseline outage swallows the shadow path. The returned adapter is still an LLMAdapter so it slots into a RoutingPolicy rule without further code change. T04 — generation report enrichment plus a small CLI helper: - _collect_adapter_choices walks artifact provenance, groups by (stage_id, adapter_id), and surfaces calls + prompt/completion tokens per (stage, adapter) pair in a new ## Per-stage adapter choices section. Runs that did not go through the bridge have no provider_metadata.adapter_id and emit an empty list, so fixture-only reports stay terse. - summarise_quality_ledger() rolls a llm-connect QualityLedger up by (task_type, adapter_id) with mean quality, mean cost, observations, and cumulative tokens. - infospace-bench routing ledger <path> CLI prints the rollup as JSON. Five new tests cover shadow happy-path, shadow failure isolation, ledger rollup, the routing CLI, and the report's adapter-choice aggregation. Closes IB-WP-0018: T01-T05 are all done and the workplan status flips from blocked to done now that LLM-WP-0004's primitives have shipped. 144 tests pass, 1 skipped (the OpenRouter live smoke, gated as before). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-18 11:52:05 +02:00
tegwick	0a83e908ce	IB-WP-0018-T01+T02+T05: routing bridge to llm-connect T01: task-type taxonomy. docs/routing-task-types.md names the five generation stages as the default identity-mapped task types (summarize-source, extract-entities, extract-relations, evaluate-entity, synthesize-report) and records the recommended quality floors per stage. The taxonomy explicitly does not decide which adapter ships per task type, where the ledger lives, or what a quality score means; those stay with the caller per the LLM-WP-0004 scope guardrail. T02: RoutingAssistedGenerationAdapter bridge in src/infospace_bench/routing.py. Wraps any llm-connect RoutingPolicy or AdaptiveRoutingPolicy as an infospace-bench AssistedGenerationAdapter: maps stage_id -> task_type (overridable), resolves an LLMAdapter, delegates execute_prompt with a configurable RunConfig, and surfaces the resolved adapter id, task type, model, usage, and finish_reason back on AssistedGenerationResult.metadata. Provider tag stays back-compatible with the strings already used in run records and the budget rollup (openrouter / claude_code / openai / gemini / mock / routing). T05: eight tests in tests/test_routing_adapter.py cover: static-policy per-stage resolution, stage_to_task_type overrides, default-mapping completeness, fall-through for unmapped stage ids, the adaptive path selecting the cheaper qualifying adapter when a quality_floor is set, adaptive policy falling back to static when no floor is set, response metadata round-trip with provider tagging, and estimated_cost_per_1k pass-through. Adds llm-connect as a path dependency on pyproject.toml and to the pytest pythonpath. Static OpenRouter and fixture paths are unchanged; this commit only adds the option of routing. 139 tests pass, 1 skipped (the OpenRouter live smoke, gated as before). T03 (shadow-mode integration) and T04 (CLI + per-stage chosen-adapter in the generation report) follow next. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-18 11:33:58 +02:00
tegwick	1d62dffae9	IB-WP-0016-T07: review report and output policy; close IB-WP-0016 Enrich reports/generation-summary.md with the review-oriented sections that the 2026-05-17 smoke run flagged as missing: ## Chapter coverage (per-chapter source/entity/relation/anchor counts), ## Entities (the deduped title list), ## Unmapped source chunks (sources with no downstream generated artifact), and ## Page anchors (total plus deterministic sample). Sections are conditional on data being present so generic non-Lefevre runs stay terse. Add docs/lefevre-readiness.md as the final sign-off document for IB-WP-0016: what is wired (T01-T06 recap), an output policy table (checked-in fixture sources vs disposable generated infospaces vs archive targets), a seven-item reviewer checklist (duplicate entities, relation endpoints, weak evidence, overgeneralization, anchor coverage, unmapped sources, plan-vs-actual variance), a scale-up plan from one-chapter to full-book, and the load-bearing risks still outstanding (cross-chunk dedup, whole-run resume, adaptive routing deferred to LLM-WP-0004 / IB-WP-0018, rate-table drift). Closes IB-WP-0016 (Lefevre EPUB3 Infospace Readiness Pilot): T01-T07 all done; the workplan is set to status=done. 131 tests pass, 1 skipped (live OpenRouter smoke, correctly gated). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-18 01:22:41 +02:00
tegwick	ab23c5873e	IB-WP-0016-T06: OpenRouter live-run guardrails Add --chapter / --from-chapter / --to-chapter / --chunk selection flags to generate init and generate from-source, plumb them into init_generation_infospace via a new _filter_chunks_by_chapter helper, and refuse to create an infospace when the filters reject every chunk (InfospaceError "empty_chapter_selection"). The flags use the same T03/T02 plumbing (chapter labels, roman numerals, chunk ids) so a single-chapter selection is a one-flag command. OpenRouter run-record metadata (model, request_id, usage tokens, retry_count, duration_seconds) already lands in output/workflows/runs/*.yaml; this task just adds the smoke test that proves it stays there, plus the parallel guarantee that the same provider metadata reaches generated artifact provenance via provenance.provider_metadata. tests/test_openrouter_live.py covers: - chapter-filter, from/to-chapter range, and empty-selection failure on init (non-live, deterministic) - CLI smoke through generate from-source with --chapter - a pytest-skipped live OpenRouter one-chapter end-to-end gated by OPENROUTER_API_KEY + INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER, with INFOSPACE_BENCH_LIVE_MODEL override (default openai/gpt-4o-mini) docs/generic-source-generator.md gains a "Live OpenRouter runs (handle with care)" section that walks plan-before-run, single-chapter live run, the budget/usage artifacts, and the checks a reviewer should run before scaling to the full book. 129 tests pass, 1 skipped (the live smoke, correctly gated). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 23:04:19 +02:00
tegwick	348deca9f2	IB-WP-0016-T05: deterministic Lefevre acceptance fixture Check in a small Lefevre-shaped EPUB fixture as separate source files under tests/fixtures/lefevre/sources/ (container.xml, OPF, nav, cover, PG header, three roman-numeral chapters with page anchors, transcriber notes, license, PG footer). The test helper assembles these into an EPUB at test time so the inputs stay inspectable in git. Fixture responses tuned to the trading-literature profile (T04) live at tests/fixtures/lefevre/responses.yaml: trader / institution / strategy categories on entities, strategy_outcome / actor_venue relation types, and all four trading-tuned evaluation criteria. Three tests cover the acceptance: - end-to-end Python pipeline: stable chapter-NN source slugs, full artifact tree (entities, relations, evaluations, metrics, history, generation report), budget registry persisted, chapter_number provenance round-trips through artifacts/index.yaml - regression: PG boilerplate (cover, nav, header, notes, license, footer) is excluded by default and only appears under include_non_body=True - CLI smoke through generate from-source --profile trading-literature --fixture-responses ... 125 tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 22:31:17 +02:00
tegwick	bb70b2f4b9	IB-WP-0019-T07: archive integration; close IB-WP-0019 The default archive include set already pulls output/ in wholesale, so output/budget/ already lands inside the archive package with no code change. Add a budget_summary block to ArchiveRecord.metadata so catalog-level tools can see plans_count, runs_count, total_tokens, total_cost_usd_known, total_cost_usd_estimated, and the latest_snapshot_id without unpacking the archive. An infospace with no budget data still archives cleanly with an empty metadata dict. Closes IB-WP-0019 (Budget and Usage Registry): T01-T07 all done. Three-layer design landed end-to-end — layer 1 (per-infospace plans.yaml / usage.yaml / summary.yaml) and layer 3 (state-hub record_token_event emission with failure isolation) live here; layer 2 (cross-application QualityLedger for adaptive routing) is parked in llm-connect LLM-WP-0004 and infospace-bench IB-WP-0018 awaits it. 122 tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 21:53:28 +02:00
tegwick	816a95b3ef	IB-WP-0019-T06: workspace budget CLI infospace-bench budget list <workspace> walks <workspace>/infospaces/* and prints one row per infospace with slug, plans_count, runs_count, total_tokens, total_cost_usd_known, total_cost_usd_estimated, last_run_at, and latest_snapshot_id. infospace-bench budget show <root> dumps the full plans/usage/summary structure for a single infospace. Missing budget directories are treated as zero rows rather than errors, so the CLI is safe to run on partially-populated or fresh workspaces. 120 tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 20:44:40 +02:00
tegwick	110c78b9ad	IB-WP-0019-T05: state-hub token-event emission with failure isolation Emit one record_token_event payload per completed generate run, derived from the just-recorded usage rollup. tokens_in/out come from the rollup, model defaults to the dominant model used (or "mixed" when buckets disagree), agent="infospace-bench", ref_type="session", and ref_id="<slug>/run-<run_index>". The note carries the infospace slug, workspace, snapshot_id, and any known/estimated cost so the hub event is self-describing. Failure isolation: any exception from the HTTP poster (hub down, timeout, 5xx) is caught, logged to stderr, and reported as status=failed; the generate run still completes. INFOSPACE_BENCH_HUB_URL overrides the default http://127.0.0.1:8000 base; INFOSPACE_BENCH_DISABLE_HUB_TOKEN_EVENTS skips emission entirely. Tests cover the happy path, the disable env var, poster failure, the no-usage skip, multi-model coalescing to "mixed", and an end-to-end run_generation against an unbindable hub port to prove the run survives when the hub is unreachable. 116 tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 20:33:29 +02:00
tegwick	d4c9c56f5c	IB-WP-0019-T04: plan-vs-actual variance and surfacing After every generate run, compute variance between the executing plan snapshot and the just-recorded usage rollup, persist it to output/budget/summary.yaml (overwrite-on-run), and surface it both in the generate status JSON (new budget_summary field) and as a "Plan variance" line in reports/generation-summary.md. Variance fields: calls / prompt_tokens / total_tokens each carry {estimated, actual, delta, ratio}; cost_usd carries {estimated, actual_known, actual_estimated_from_rates, actual_total, delta, ratio}; per_workflow rolls the per-bucket usage up to the same workflow_id grain the plan reports. Runs whose snapshot_id cannot be resolved (no prior plan, or pruned from the retention window) still record a variance row with null comparison fields and snapshot_resolved=false, so the consumer always sees a current summary. Reordered run_generation so usage and variance are written before the generation report, allowing the report to embed the variance line on the same pass. 110 tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 20:06:19 +02:00
tegwick	a4dde53fc3	IB-WP-0019-T03: rate-table cost computation Ship a starter model rate table at src/infospace_bench/model_rates.yaml (prompt_per_1k / completion_per_1k for the OpenRouter models we have actually touched: gpt-4o, gpt-4o-mini, gpt-4-turbo, claude 3.5 sonnet and haiku, claude 3 opus, gemini 1.5 flash/pro, llama 3.1 70b) and a load_rate_table() / estimate_cost_usd() pair that overlays an optional <workspace>/model-rates.yaml on top of the bundled defaults. generate run now passes a workspace-aware cost_resolver into record_run_usage, so cost_usd_estimated lands on every usage bucket whose model matches the table. Adapter-returned cost still wins (cost_status="known"); rate-table cost is reported under cost_status="estimated"; unmatched models are recorded as cost_status="unknown" rather than silently zeroed. Rate-table file is listed in pyproject.toml package-data so pip-installed users keep the defaults. 106 tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 19:54:30 +02:00
tegwick	678508226a	IB-WP-0019-T02: usage rollup from run records Every completed generate run now aggregates per-call adapter usage from the workflow-engine run records into output/budget/usage.yaml. Per-call data is bucketed by (workflow_id, stage_id, provider, model) with running totals for calls, prompt_tokens, completion_tokens, total_tokens, and cost_usd_known (sum of adapter-reported cost when the provider returns it; usually zero today). A run-level entry captures run_index, started_at, completed_at, duration_seconds, the executing plan snapshot_id (resolved from the latest plans.yaml entry), and the workflow-level run_id / stage_count summaries. cost_usd_estimated is left as None for this task; T03 wires the rate-table resolver so the same bucket gets a model-priced fallback when the adapter does not return cost directly. Fixture-mode runs are recorded with provider='fixture', zero tokens, and cost_status='unknown' rather than silently skipped, so the rollup honestly reflects which stages actually ran. 102 tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 19:46:40 +02:00
tegwick	182f7011bb	IB-WP-0019-T01: plan snapshot persistence Every generate plan invocation now appends its compact summary to output/budget/plans.yaml with a deterministic 12-char snapshot_id hashed over the selection filters and the estimated call/token/cost totals. Identical-fingerprint plans refresh the most recent entry's recorded_at instead of stacking duplicates. Retention defaults to the last 50 snapshots; older entries are pruned and counted on a top-level pruned_count field. The summary now echoes its input filters (chapter_filter, chunk_filter, from_chapter, to_chapter) so reviewers can read the snapshot without cross-referencing the CLI invocation. New module src/infospace_bench/budget.py owns layer 1 (per-infospace recording) of the IB-WP-0019 three-layer design; layer 2 still belongs in llm-connect LLM-WP-0004 and layer 3 in state-hub. 99 tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 19:19:35 +02:00
tegwick	df87e212a2	IB-WP-0016-T04: trading-literature profile Ship a specialized profile for trading memoirs and market-structure texts. The profile names eight entity categories (trader, market, strategy, error, psychological_pattern, institution, instrument, evidence_bearing_claim), five relation types (cause_effect, lesson_evidence, risk_mitigation, actor_venue, strategy_outcome), and four evaluation criteria (groundedness, lesson_clarity, historical_context, overgeneralization_risk). Each is reflected in the prompts and contracts so the LLM is steered toward operator-level findings rather than biographical detail or moralising. The generic profile remains the default. A 2-chapter Lefevre smoke run with --profile trading-literature completes end-to-end with viable metrics; 93 tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 18:59:45 +02:00
tegwick	13f9c1895c	IB-WP-0016-T03: scale-aware planning Replace generate plan's full-prompt dump with a compact summary that reports selected-chunk counts, selected chapter numbers, per-workflow call counts, prompt-word and token estimates, and a rough USD cost when --cost-per-1k is supplied. Selection filters --chapter (label or number, repeatable), --from-chapter / --to-chapter (numeric range), and --chunk (repeatable id) shape the estimate. Budget caps --max-calls and --cost-cap are reported as exceeds_* booleans so callers can fail fast before run. The old full per-workflow plan with prompts remains available behind --full so deep inspection is opt-in instead of the default. Whole-Lefevre estimate at default max_words=800: 146 chunks, 730 calls, ~518k prompt tokens, ~$155 at $0.30/1k. Chapters 3-5 only: 19 chunks, 95 calls, ~64k tokens. 87 tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 18:18:09 +02:00
tegwick	b9173b6569	IB-WP-0016-T02: chapter-aware chunking and stable IDs Resolve chapter labels from EPUB nav entries (when present) and from the first in-document h1/h2/h3 heading, parse roman-numeral and "Chapter N" labels into numeric chapter indices, and generate stable IDs of the form chapter-NN with -part-NNN suffix when a chapter exceeds max_words. The chunker now operates on cleaned body text, distributes id="Page_*" page anchors per part via inline markers extracted before splitting, and supports a configurable overlap_words evidence window between adjacent parts of the same chapter. Reclassify body sections whose chapter label matches contents/transcriber-notes/license/colophon tokens so they leave the body stream by default. Strip <head>...</head> from HTML body extraction to stop the <title> tag from duplicating heading text in the chunk markdown. Real Lefevre EPUB now detects all 24 roman-numeral chapters with stable chapter-NN IDs, distributes Page_N anchors across multi-part chapters, and reclassifies Contents and Transcriber's Notes out of body (role histogram body=67, cover=1, header=1, toc=1, notes=1, footer=2). 82 tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 15:52:47 +02:00
tegwick	5b6a63fb7a	IB-WP-0016-T01: spine-aware EPUB3 intake Parse META-INF/container.xml and the OPF package document, then iterate documents in spine reading order instead of archive-name sort. Classify each spine item (body, cover, nav, toc, header, footer, notes, license, auxiliary) and exclude non-body sections by default; include_non_body=True opts them back in for inspection. Capture OPF book metadata (title, creator, language, subjects, rights, identifier, source_url, modified) onto every chunk and propagate it through source artifact provenance. Preserve the legacy zip-without-OPF fallback for malformed EPUBs. Real Lefevre EPUB now yields 148 body chunks in spine order (was 155 mixed, archive-sorted) with cover=1, header=1, footer=4 detected and dropped. 78 tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 13:52:24 +02:00
tegwick	37c28d2298	archive: include contracts/, schemas/; report skipped top-level dirs Two of yesterday's archives silently dropped infospace content: the default include set was missing contracts/, so wealth-vsm-generation-pilot (16 files) and wealth-vsm-legacy-slice (12 files) were preserved as 14 and 10 files respectively. Fix the include set and make silent drops visible. - DEFAULT_INCLUDE now: infospace.yaml, artifacts, contracts, schemas, workflows, output, reports, exports - ArchiveRecord gains skipped_top_level: top-level entries present in the live root that are not in the include set, not excluded, and not auto-hidden (hidden dotfiles, empty dirs, .store/index.yaml). Surfaces in index.yaml only when non-empty. - Re-archived the two affected pilots with correct counts. Prior records remain in each index.yaml as history. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 12:21:19 +02:00
tegwick	ddefd69f71	IB-WP-0014: archive-list, restore, retention annotation, docs (T03-T05) Round out IB-WP-0014 with the remaining archive operations and docs. - restore_archive() and `infospace-bench restore <pkg> --target <dir>` round-trip a finalized package's bytes back to disk. Refuses to overwrite a non-empty target unless --force. --from <infospace-root> resolves the store location. - archive-list CLI with --with-retention flag; annotate_retention() opens the per-infospace registry and joins each record with its current retention state (effective class, expires, holds, eligibility). - docs/archive-integration.md covers when to archive, the include set, retention classes, storage layout, credentials policy, and the explicit non-goal that S3/git backends live in artifact-store. - SCOPE.md cross-links the new doc. - Workplan flipped to status: done. Full pytest suite: 72 passed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 11:46:23 +02:00
tegwick	36bfa33fb9	IB-WP-0014: archive integration with artifact-store (T01+T02) Reframe IB-WP-0014 from "in-repo S3/git backend adapters" to "durable archive surface via artifact-store". The live infospace stays in a local working folder; finalized snapshots are bundled into content-addressed artifact-store packages. - New module infospace_bench.archive: archive_infospace(), list_archives(), ArchiveRecord. Self-bootstraps a SQLite + local-FS registry under output/archives/.store/ when no Registry is passed in. - New output/archives/index.yaml records each archive event (package id, manifest digest, retention class, included paths, file count, note). - artifactstore added as a path dep; Python floor bumped to 3.12 to match. - Makefile for venv-based dev setup; stack-and-commands.md updated. - tests/test_archive.py covers index write, list, recursive-capture guard, caller-supplied include, and empty-include error. Full suite 65 passed. Remaining tasks (T03 list CLI, T04 restore, T05 docs) tracked in the workplan. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 11:30:49 +02:00
tegwick	c3b62a6ec3	Agentic memory profile	2026-05-15 16:01:35 +02:00
tegwick	46aad3cce8	generic source-to-infospace generator	2026-05-14 19:33:22 +02:00
tegwick	a729a7643e	infospace pipeline for wealth of nations example	2026-05-14 18:04:38 +02:00
tegwick	3de72eb0d2	command parity and migration guide	2026-05-14 17:16:39 +02:00
tegwick	5d53c33d3e	Kontextual Engine Integration Boundary	2026-05-14 16:43:29 +02:00
tegwick	fc70acb257	engine and lifecycle	2026-05-14 16:26:42 +02:00
tegwick	55405d8a5a	acceptance matrix and workflow generation	2026-05-14 16:01:32 +02:00
tegwick	7f54dec585	eval history and metrics	2026-05-14 15:35:04 +02:00
tegwick	9627d03c1a	entity relationship model	2026-05-14 15:06:17 +02:00
tegwick	6eb3c6a0fb	markitect-tool integration	2026-05-14 14:53:16 +02:00
tegwick	916a895a85	Initial implementation	2026-05-14 11:32:25 +02:00

36 Commits
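
The cost-estimate change described in 3ca891de4a (per-prompt and per-completion rates from a rate table, falling back to a single blended --cost-per-1k value) can be sketched roughly as below. This is a minimal illustration, not the code in budget.py: the function name estimate_cost and the RATES dict are hypothetical, and the gpt-4o-mini rates are assumed values for illustration, not read from the bundled model_rates.yaml. Only the prompt_per_1k / completion_per_1k field names and the cost_source tags come from the commit messages.

```python
from typing import Optional, Tuple

# Assumed per-1k-token USD rates, for illustration only
# (not taken from the bundled model_rates.yaml).
RATES = {
    "openai/gpt-4o-mini": {"prompt_per_1k": 0.00015, "completion_per_1k": 0.0006},
}


def estimate_cost(
    prompt_tokens: int,
    completion_tokens: int,
    model: Optional[str] = None,
    cost_per_1k: Optional[float] = None,
) -> Tuple[Optional[float], Optional[str]]:
    """Return (cost_usd, cost_source) for a planned run (hypothetical helper)."""
    if model is not None and model in RATES:
        rate = RATES[model]
        # Price prompt and completion tokens separately.
        cost = (
            prompt_tokens / 1000 * rate["prompt_per_1k"]
            + completion_tokens / 1000 * rate["completion_per_1k"]
        )
        return cost, f"rate_table:{model}"
    if cost_per_1k is not None:
        # One blended rate applied to every token.
        total = prompt_tokens + completion_tokens
        return total / 1000 * cost_per_1k, "cost_per_1k_blended"
    # No rate information supplied: no estimate, no source tag.
    return None, None


# Token counts from the live smoke: 28k prompt, 7.5k completion.
rated, rated_source = estimate_cost(28_000, 7_500, model="openai/gpt-4o-mini")
# Blended rate over the 28k prompt tokens only.
blended, blended_source = estimate_cost(28_000, 0, cost_per_1k=0.30)
```

Under these assumed rates the rate-table path gives about $0.0087 for the live-run token counts, close to the $0.0088 actual reported in the commit, while the blended path at 0.30 per 1k over 28k prompt tokens gives about $8.40.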