infospace-bench

Author	SHA1	Message	Date
tegwick	816a95b3ef	IB-WP-0019-T06: workspace budget CLI infospace-bench budget list <workspace> walks <workspace>/infospaces/* and prints one row per infospace with slug, plans_count, runs_count, total_tokens, total_cost_usd_known, total_cost_usd_estimated, last_run_at, and latest_snapshot_id. infospace-bench budget show <root> dumps the full plans/usage/summary structure for a single infospace. Missing budget directories are treated as zero rows rather than errors, so the CLI is safe to run on partially-populated or fresh workspaces. 120 tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 20:44:40 +02:00
tegwick	110c78b9ad	IB-WP-0019-T05: state-hub token-event emission with failure isolation Emit one record_token_event payload per completed generate run, derived from the just-recorded usage rollup. tokens_in/out come from the rollup, model defaults to the dominant model used (or "mixed" when buckets disagree), agent="infospace-bench", ref_type="session", and ref_id="<slug>/run-<run_index>". The note carries the infospace slug, workspace, snapshot_id, and any known/estimated cost so the hub event is self-describing. Failure isolation: any exception from the HTTP poster (hub down, timeout, 5xx) is caught, logged to stderr, and reported as status=failed; the generate run still completes. INFOSPACE_BENCH_HUB_URL overrides the default http://127.0.0.1:8000 base; INFOSPACE_BENCH_DISABLE_HUB_TOKEN_EVENTS skips emission entirely. Tests cover the happy path, the disable env var, poster failure, the no-usage skip, multi-model coalescing to "mixed", and an end-to-end run_generation against an unbindable hub port to prove the run survives when the hub is unreachable. 116 tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 20:33:29 +02:00
tegwick	d4c9c56f5c	IB-WP-0019-T04: plan-vs-actual variance and surfacing After every generate run, compute variance between the executing plan snapshot and the just-recorded usage rollup, persist it to output/budget/summary.yaml (overwrite-on-run), and surface it both in the generate status JSON (new budget_summary field) and as a "Plan variance" line in reports/generation-summary.md. Variance fields: calls / prompt_tokens / total_tokens each carry {estimated, actual, delta, ratio}; cost_usd carries {estimated, actual_known, actual_estimated_from_rates, actual_total, delta, ratio}; per_workflow rolls the per-bucket usage up to the same workflow_id grain the plan reports. Runs whose snapshot_id cannot be resolved (no prior plan, or pruned from the retention window) still record a variance row with null comparison fields and snapshot_resolved=false, so the consumer always sees a current summary. Reordered run_generation so usage and variance are written before the generation report, allowing the report to embed the variance line on the same pass. 110 tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 20:06:19 +02:00
tegwick	a4dde53fc3	IB-WP-0019-T03: rate-table cost computation Ship a starter model rate table at src/infospace_bench/model_rates.yaml (prompt_per_1k / completion_per_1k for the OpenRouter models we have actually touched: gpt-4o, gpt-4o-mini, gpt-4-turbo, claude 3.5 sonnet and haiku, claude 3 opus, gemini 1.5 flash/pro, llama 3.1 70b) and a load_rate_table() / estimate_cost_usd() pair that overlays an optional <workspace>/model-rates.yaml on top of the bundled defaults. generate run now passes a workspace-aware cost_resolver into record_run_usage, so cost_usd_estimated lands on every usage bucket whose model matches the table. Adapter-returned cost still wins (cost_status="known"); rate-table cost is reported under cost_status="estimated"; unmatched models are recorded as cost_status="unknown" rather than silently zeroed. Rate-table file is listed in pyproject.toml package-data so pip-installed users keep the defaults. 106 tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 19:54:30 +02:00
tegwick	678508226a	IB-WP-0019-T02: usage rollup from run records Every completed generate run now aggregates per-call adapter usage from the workflow-engine run records into output/budget/usage.yaml. Per-call data is bucketed by (workflow_id, stage_id, provider, model) with running totals for calls, prompt_tokens, completion_tokens, total_tokens, and cost_usd_known (sum of adapter-reported cost when the provider returns it; usually zero today). A run-level entry captures run_index, started_at, completed_at, duration_seconds, the executing plan snapshot_id (resolved from the latest plans.yaml entry), and the workflow-level run_id / stage_count summaries. cost_usd_estimated is left as None for this task; T03 wires the rate-table resolver so the same bucket gets a model-priced fallback when the adapter does not return cost directly. Fixture-mode runs are recorded with provider='fixture', zero tokens, and cost_status='unknown' rather than silently skipped, so the rollup honestly reflects which stages actually ran. 102 tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 19:46:40 +02:00
tegwick	182f7011bb	IB-WP-0019-T01: plan snapshot persistence Every generate plan invocation now appends its compact summary to output/budget/plans.yaml with a deterministic 12-char snapshot_id hashed over the selection filters and the estimated call/token/cost totals. Identical-fingerprint plans refresh the most recent entry's recorded_at instead of stacking duplicates. Retention defaults to the last 50 snapshots; older entries are pruned and counted on a top-level pruned_count field. The summary now echoes its input filters (chapter_filter, chunk_filter, from_chapter, to_chapter) so reviewers can read the snapshot without cross-referencing the CLI invocation. New module src/infospace_bench/budget.py owns layer 1 (per-infospace recording) of the IB-WP-0019 three-layer design; layer 2 still belongs in llm-connect LLM-WP-0004 and layer 3 in state-hub. 99 tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 19:19:35 +02:00
tegwick	df87e212a2	IB-WP-0016-T04: trading-literature profile Ship a specialized profile for trading memoirs and market-structure texts. The profile names eight entity categories (trader, market, strategy, error, psychological_pattern, institution, instrument, evidence_bearing_claim), five relation types (cause_effect, lesson_evidence, risk_mitigation, actor_venue, strategy_outcome), and four evaluation criteria (groundedness, lesson_clarity, historical_context, overgeneralization_risk). Each is reflected in the prompts and contracts so the LLM is steered toward operator-level findings rather than biographical detail or moralising. The generic profile remains the default. A 2-chapter Lefevre smoke run with --profile trading-literature completes end-to-end with viable metrics; 93 tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 18:59:45 +02:00
tegwick	13f9c1895c	IB-WP-0016-T03: scale-aware planning Replace generate plan's full-prompt dump with a compact summary that reports selected-chunk counts, selected chapter numbers, per-workflow call counts, prompt-word and token estimates, and a rough USD cost when --cost-per-1k is supplied. Selection filters --chapter (label or number, repeatable), --from-chapter / --to-chapter (numeric range), and --chunk (repeatable id) shape the estimate. Budget caps --max-calls and --cost-cap are reported as exceeds_* booleans so callers can fail fast before run. The old full per-workflow plan with prompts remains available behind --full so deep inspection is opt-in instead of the default. Whole-Lefevre estimate at default max_words=800: 146 chunks, 730 calls, ~518k prompt tokens, ~$155 at $0.30/1k. Chapters 3-5 only: 19 chunks, 95 calls, ~64k tokens. 87 tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 18:18:09 +02:00
tegwick	b9173b6569	IB-WP-0016-T02: chapter-aware chunking and stable IDs Resolve chapter labels from EPUB nav entries (when present) and from the first in-document h1/h2/h3 heading, parse roman-numeral and "Chapter N" labels into numeric chapter indices, and generate stable IDs of the form chapter-NN with -part-NNN suffix when a chapter exceeds max_words. The chunker now operates on cleaned body text, distributes id="Page_*" page anchors per part via inline markers extracted before splitting, and supports a configurable overlap_words evidence window between adjacent parts of the same chapter. Reclassify body sections whose chapter label matches contents/transcriber-notes/license/colophon tokens so they leave the body stream by default. Strip <head>...</head> from HTML body extraction to stop the <title> tag from duplicating heading text in the chunk markdown. Real Lefevre EPUB now detects all 24 roman-numeral chapters with stable chapter-NN IDs, distributes Page_N anchors across multi-part chapters, and reclassifies Contents and Transcriber's Notes out of body (role histogram body=67, cover=1, header=1, toc=1, notes=1, footer=2). 82 tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 15:52:47 +02:00
tegwick	5b6a63fb7a	IB-WP-0016-T01: spine-aware EPUB3 intake Parse META-INF/container.xml and the OPF package document, then iterate documents in spine reading order instead of archive-name sort. Classify each spine item (body, cover, nav, toc, header, footer, notes, license, auxiliary) and exclude non-body sections by default; include_non_body=True opts them back in for inspection. Capture OPF book metadata (title, creator, language, subjects, rights, identifier, source_url, modified) onto every chunk and propagate it through source artifact provenance. Preserve the legacy zip-without-OPF fallback for malformed EPUBs. Real Lefevre EPUB now yields 148 body chunks in spine order (was 155 mixed, archive-sorted) with cover=1, header=1, footer=4 detected and dropped. 78 tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 13:52:24 +02:00
tegwick	37c28d2298	archive: include contracts/, schemas/; report skipped top-level dirs Two of yesterday's archives silently dropped infospace content: the default include set was missing contracts/, so wealth-vsm-generation-pilot (16 files) and wealth-vsm-legacy-slice (12 files) were preserved as 14 and 10 files respectively. Fix the include set and make silent drops visible. - DEFAULT_INCLUDE now: infospace.yaml, artifacts, contracts, schemas, workflows, output, reports, exports - ArchiveRecord gains skipped_top_level: top-level entries present in the live root that are not in the include set, not excluded, and not auto-hidden (hidden dotfiles, empty dirs, .store/index.yaml). Surfaces in index.yaml only when non-empty. - Re-archived the two affected pilots with correct counts. Prior records remain in each index.yaml as history. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 12:21:19 +02:00
tegwick	ddefd69f71	IB-WP-0014: archive-list, restore, retention annotation, docs (T03-T05) Round out IB-WP-0014 with the remaining archive operations and docs. - restore_archive() and `infospace-bench restore <pkg> --target <dir>` round-trip a finalized package's bytes back to disk. Refuses to overwrite a non-empty target unless --force. --from <infospace-root> resolves the store location. - archive-list CLI with --with-retention flag; annotate_retention() opens the per-infospace registry and joins each record with its current retention state (effective class, expires, holds, eligibility). - docs/archive-integration.md covers when to archive, the include set, retention classes, storage layout, credentials policy, and the explicit non-goal that S3/git backends live in artifact-store. - SCOPE.md cross-links the new doc. - Workplan flipped to status: done. Full pytest suite: 72 passed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 11:46:23 +02:00
tegwick	36bfa33fb9	IB-WP-0014: archive integration with artifact-store (T01+T02) Reframe IB-WP-0014 from "in-repo S3/git backend adapters" to "durable archive surface via artifact-store". The live infospace stays in a local working folder; finalized snapshots are bundled into content-addressed artifact-store packages. - New module infospace_bench.archive: archive_infospace(), list_archives(), ArchiveRecord. Self-bootstraps a SQLite + local-FS registry under output/archives/.store/ when no Registry is passed in. - New output/archives/index.yaml records each archive event (package id, manifest digest, retention class, included paths, file count, note). - artifactstore added as a path dep; Python floor bumped to 3.12 to match. - Makefile for venv-based dev setup; stack-and-commands.md updated. - tests/test_archive.py covers index write, list, recursive-capture guard, caller-supplied include, and empty-include error. Full suite 65 passed. Remaining tasks (T03 list CLI, T04 restore, T05 docs) tracked in the workplan. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 11:30:49 +02:00
tegwick	c3b62a6ec3	Agentic memory profile	2026-05-15 16:01:35 +02:00
tegwick	46aad3cce8	generic source-to-infospace generator	2026-05-14 19:33:22 +02:00
tegwick	a729a7643e	infospace pipeline for wealth of nations example	2026-05-14 18:04:38 +02:00
tegwick	3de72eb0d2	command parity and migration guide	2026-05-14 17:16:39 +02:00
tegwick	5d53c33d3e	Kontextual Engine Integration Boundary	2026-05-14 16:43:29 +02:00
tegwick	fc70acb257	engine and lifecycle	2026-05-14 16:26:42 +02:00
tegwick	55405d8a5a	acceptance matrix and workflow generation	2026-05-14 16:01:32 +02:00
tegwick	7f54dec585	eval history and metrics	2026-05-14 15:35:04 +02:00
tegwick	9627d03c1a	entity relationship model	2026-05-14 15:06:17 +02:00
tegwick	6eb3c6a0fb	markitect-tool integration	2026-05-14 14:53:16 +02:00
tegwick	916a895a85	Initial implementation	2026-05-14 11:32:25 +02:00

24 Commits
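
The IB-WP-0019-T03 commit describes a cost precedence: adapter-reported cost wins (cost_status="known"), a rate-table match yields cost_status="estimated", and unmatched models are recorded as cost_status="unknown" rather than zeroed. The sketch below illustrates that precedence only; it is not the code in src/infospace_bench/budget.py. The function name `resolve_cost`, the `EXAMPLE_RATES` table and its prices, and the choice to treat `None` as "adapter returned no cost" are all assumptions made for illustration. Only the `prompt_per_1k` / `completion_per_1k` key names and the three status strings come from the commit messages.

```python
from typing import Optional, Tuple

# Hypothetical rate table in USD per 1,000 tokens. The model name and prices
# are placeholders, not the values shipped in model_rates.yaml.
EXAMPLE_RATES = {
    "example-model": {"prompt_per_1k": 0.30, "completion_per_1k": 0.60},
}


def resolve_cost(
    model: str,
    prompt_tokens: int,
    completion_tokens: int,
    adapter_cost_usd: Optional[float] = None,
    rates: dict = EXAMPLE_RATES,
) -> Tuple[Optional[float], str]:
    """Return (cost_usd, cost_status) using the precedence from the T03 commit.

    Assumption: adapter_cost_usd is None when the provider reports no cost.
    """
    # 1. Adapter-reported cost always wins.
    if adapter_cost_usd is not None:
        return adapter_cost_usd, "known"

    # 2. No rate-table entry: report "unknown" instead of a silent zero.
    rate = rates.get(model)
    if rate is None:
        return None, "unknown"

    # 3. Otherwise price the tokens from the rate table.
    cost = (
        prompt_tokens / 1000 * rate["prompt_per_1k"]
        + completion_tokens / 1000 * rate["completion_per_1k"]
    )
    return round(cost, 6), "estimated"


estimated = resolve_cost("example-model", 518_000, 0)
unknown = resolve_cost("unlisted-model", 10, 10)
known = resolve_cost("example-model", 1_000, 1_000, adapter_cost_usd=0.05)
```

With the placeholder prompt rate of $0.30 per 1k tokens, 518,000 prompt tokens price out at $155.40, which lines up with the "~518k prompt tokens, ~$155 at $0.30/1k" whole-book estimate quoted in the IB-WP-0016-T03 commit.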