36 Commits

Author SHA1 Message Date
3ca891de4a fix: review findings from Lefevre live smoke
Two small fixes informed by the 2026-05-18 live OpenRouter chapter-I run.

1. extract-entities templates (trading-literature and general-knowledge):
   the # Entity Title placeholder was interpreted by gpt-4o-mini as a
   literal heading prefix, so every entity came back as "# Entity Title:
   Bucket Shop" etc. The instruction now spells the placeholder out
   with concrete examples and an explicit "not the literal string"
   note, so smaller models hit the intended shape.

2. generate plan grows --model <id>. When supplied, the cost estimate
   pulls per-prompt and per-completion rates from the bundled
   model_rates.yaml instead of multiplying a single blended
   --cost-per-1k value across all tokens. The summary now also returns
   a separate estimated_completion_tokens field plus a cost_source tag
   ("rate_table:<model>" | "cost_per_1k_blended" | None).

This is a stopgap. LLM-WP-0005 (proposed in llm-connect this round)
will move the rate registry and token-shape problem classes upstream
so consumers stop re-implementing them.

The live smoke ran 28k prompt tokens / 7.5k completion / $0.0088
actual. With --model openai/gpt-4o-mini the plan estimate now lands at
$0.0076 (within 14% of actual) versus the prior $8.40 estimate at
--cost-per-1k 0.30.
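
The gap between the two estimates is mostly arithmetic: 28k prompt
tokens at a blended 0.30 per 1k gives the old $8.40. A minimal sketch of
both paths follows. The function names are hypothetical and the rates
are illustrative assumptions, not values read from the bundled
model_rates.yaml.

```python
def blended_cost(total_tokens: int, cost_per_1k: float) -> float:
    """Old path: one blended rate multiplied across all tokens."""
    return total_tokens / 1000 * cost_per_1k


def split_cost(prompt_tokens: int, completion_tokens: int,
               prompt_per_1k: float, completion_per_1k: float) -> float:
    """New path: separate per-prompt and per-completion rates."""
    return (prompt_tokens / 1000 * prompt_per_1k
            + completion_tokens / 1000 * completion_per_1k)


# Assumed, illustrative rates in USD per 1k tokens (not from model_rates.yaml).
ASSUMED_RATES = {"prompt_per_1k": 0.00015, "completion_per_1k": 0.0006}

old = blended_cost(28_000, 0.30)  # 28k prompt tokens at --cost-per-1k 0.30
new = split_cost(28_000, 7_500, **ASSUMED_RATES)
print(f"blended: ${old:.2f}  split: ${new:.4f}")
```

A blended rate sized for expensive models overshoots by orders of
magnitude on a cheap model, which is why the per-model table matters.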

181 tests pass, 2 skipped (both live OpenRouter smokes).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-19 04:30:33 +02:00
b0d67ae79e IB-WP-0020-T05: shadow-mode CLI flags; close IB-WP-0020
Add --shadow-baseline <id> and --shadow-rate <float> opt-in flags to
generate run, generate resume, and generate from-source. When
--shadow-baseline names a candidate id from the routing config,
build_routing_policy_from_config wraps every other candidate in an
llm-connect ShadowingAdapter using that baseline plus a
PairedGrader(ExactMatchJudge()) and the workspace-resolved
QualityLedger. The baseline candidate itself is never wrapped — that
would shadow it against itself. --shadow-rate defaults to 0.1 when
--shadow-baseline is set; passing --shadow-rate without
--shadow-baseline fails fast with shadow_rate_without_baseline.
Setting --shadow-baseline without a ledger_path in the config fails
with missing_routing_ledger_for_shadow so observations have a place to
land before any call goes out.
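
The flag rules above can be sketched as a single validation step. This
is an illustration only: resolve_shadow_options and the InfospaceError
stand-in are hypothetical, the order of the checks is assumed, and the
unknown-baseline code string is made up because the commit does not
name it.

```python
class InfospaceError(Exception):
    """Stand-in for the project's error type; carries a short error code."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.code = code


DEFAULT_SHADOW_RATE = 0.1  # default stated in the commit message


def resolve_shadow_options(shadow_baseline, shadow_rate, ledger_path, candidate_ids):
    """Hypothetical helper: validate the flag combination before any call."""
    if shadow_baseline is None:
        if shadow_rate is not None:
            raise InfospaceError("shadow_rate_without_baseline")
        return None  # shadow mode stays off
    if ledger_path is None:
        raise InfospaceError("missing_routing_ledger_for_shadow")
    if shadow_baseline not in candidate_ids:
        # Assumed code string; the real one is not given in the log.
        raise InfospaceError("unknown_shadow_baseline")
    rate = DEFAULT_SHADOW_RATE if shadow_rate is None else shadow_rate
    # Every candidate except the baseline itself gets wrapped.
    wrapped = [cid for cid in candidate_ids if cid != shadow_baseline]
    return {"baseline": shadow_baseline, "rate": rate, "wrap": wrapped}
```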

run_generation grew shadow_baseline + shadow_rate kwargs and
_adapter_for("routing", ...) plumbs them into
build_routing_policy_from_config. The wrapped ShadowingAdapter slots
into the policy's prefer/fallback per task type via a
(candidate_id, task_type) reverse lookup, and adapters_by_id on the
adaptive policy gets the string-keyed entries.

Five new tests cover: shadow_rate without baseline fails fast, shadow
mode without a ledger fails fast, unknown shadow baseline id fails
fast, structural assertion that ShadowingAdapter wraps non-baseline
candidates and leaves the baseline raw, and a behavioural check that
shadow_rate=1.0 calls the baseline on every call while shadow_rate=0.0
skips entirely. The behavioural test forces async_shadow=False so its
call counter is deterministic.

Closes IB-WP-0020: T01-T05 all done. Workplan status flips from active
to finished. 179 tests pass, 2 skipped (both live OpenRouter smokes).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-18 23:30:36 +02:00
debd2b8e69 IB-WP-0020-T04: example routing config + live routing smoke
examples/routing/trading-literature.yaml is the checked-in starting
config for a Lefevre-style run. It applies the IB-WP-0018 task-type
taxonomy: cheap candidates for summary + evaluation, smart candidates
for entity + relation extraction, and a separate baseline rule wiring
claude_code for a follow-on T05 ShadowingAdapter step. Workspace-
relative ledger_path keeps adaptive observations with the workspace.

tests/test_routing_config.py gains a regression test that asserts the
shipped example parses cleanly, every stage in stage_to_task_type maps
to a declared task type, and the baseline candidate uses the
claude_code provider — so the example will not bit-rot silently.

tests/test_openrouter_live.py gains test_provider_routing_one_chapter_live_smoke
gated on the same INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER + OPENROUTER_API_KEY
opt-in as the existing static smoke. It builds a one-candidate routing
config, runs a single chapter through --provider routing, and asserts
the per-stage adapter-choices report section names the routed model
and the routed artifacts carry adapter_id provenance.

docs/generic-source-generator.md gains a "Live runs with --provider
routing" subsection that walks through the one-command routed run,
explains the --quality-floor override, and points at the parallel
live smoke test.

174 tests pass, 2 skipped (both live smokes, correctly gated).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-18 22:19:54 +02:00
d3562454d7 IB-WP-0020-T03: routing CLI flags
Add --provider routing, --routing-config <yaml>, and --quality-floor
<float> to generate run, generate resume, and generate from-source.
The CLI flag wiring constructs a RoutingAssistedGenerationAdapter from
the parsed config, with the workspace handed in so any ledger_path in
the config resolves relative to it. --quality-floor overrides the
config-level default_quality_floor for a single invocation.

run_generation gains routing_config + quality_floor kwargs and
_adapter_for grew a "routing" branch. Missing --routing-config with
--provider routing fails fast with InfospaceError("missing_routing_config");
missing API key for any candidate fails fast with
InfospaceError("missing_routing_api_key").

Two small bug fixes surfaced while writing T03:

- routing._identify_adapter now also reads ``_model`` from llm-connect
  adapters (they keep the model id on a private attribute), so the
  per-stage adapter-choice line shows the model id rather than just the
  class name.
- budget.TOKEN_EVENTS_PATH corrected from /state/token-events to the
  state-hub HTTP endpoint /token-events/ that actually exists; the
  failure-isolation in emit_token_event already kept the prior typo
  from breaking runs, but the hub never saw the events.

Five new tests cover: _adapter_for refusal on missing config,
_adapter_for happy path, run_generation end-to-end through routing
with a stubbed OpenRouterAdapter.execute_prompt (no network),
workspace-relative ledger resolution, and a CLI subprocess smoke
asserting fast-fail on missing API key.

173 tests pass, 1 skipped.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-18 22:08:51 +02:00
82468c2165 IB-WP-0020-T02: routing config loader
build_routing_policy_from_config(config, *, workspace=None, env=None,
adapter_factory=None) materialises a parsed RoutingConfig into a live
llm-connect routing policy:

- Static RoutingPolicy when the config has no adaptive signals; one
  RoutingRule per task type, prefer = first candidate, fallback =
  second candidate (when present), max_cost_per_1k pulled from the
  preferred candidate.
- AdaptiveRoutingPolicy when default_quality_floor, any per-task
  quality_floor, or ledger_path is set. ledger_path resolves relative
  to the supplied workspace; parent directory is created so the
  ledger writes never fail on first call.
- API-key resolution from env (default os.environ) using the
  per-provider DEFAULT_API_KEY_ENV map; candidate.api_key_env overrides
  the default. Missing key raises InfospaceError("missing_routing_api_key")
  before any provider constructor runs.
- claude_code candidates need no API key (shells out to the local CLI).
- adapter_factory hook lets tests inject a sentinel-returning factory
  so policy construction stays network- and llm-adapter-free.
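
A rough sketch of two of the decisions listed above: static versus
adaptive selection, and API-key resolution. The helper names and the
dict-shaped config are hypothetical. Only OPENROUTER_API_KEY appears in
this log; the other entries in the map are guesses.

```python
import os


class InfospaceError(Exception):
    """Stand-in for the project's error type; carries a short error code."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.code = code


# Assumed contents; the real DEFAULT_API_KEY_ENV map may differ.
DEFAULT_API_KEY_ENV = {
    "openrouter": "OPENROUTER_API_KEY",
    "openai": "OPENAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
}


def policy_kind(config: dict) -> str:
    """Adaptive when any quality floor or a ledger_path is set, else static."""
    per_task = any(
        task.get("quality_floor") is not None
        for task in config.get("task_types", {}).values()
    )
    if config.get("default_quality_floor") is not None or per_task or config.get("ledger_path"):
        return "adaptive"
    return "static"


def resolve_api_key(provider: str, api_key_env=None, env=None):
    """Return the key for a candidate, or None for claude_code (local CLI)."""
    env = os.environ if env is None else env
    if provider == "claude_code":
        return None
    variable = api_key_env or DEFAULT_API_KEY_ENV[provider]
    key = env.get(variable)
    if not key:
        # Raised before any provider constructor would run.
        raise InfospaceError("missing_routing_api_key")
    return key
```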

Eight new tests cover: static-policy default, adaptive selection via
ledger_path, adaptive selection via quality_floor, multi-candidate
fallback rule, real-factory smoke (OpenRouterAdapter constructed with
env API key), missing-key fast-fail, claude_code zero-key path, and
custom api_key_env override.

168 tests pass, 1 skipped.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-18 19:58:15 +02:00
c11a942bb7 IB-WP-0020-T01: routing config schema and parser
Add a small YAML routing config schema (schema_version 1) and a
parser-only loader at src/infospace_bench/routing_config.py. The
loader validates the declarative shape — task_types with candidates,
optional per-task quality_floor, optional default_quality_floor,
optional ledger_path, optional stage_to_task_type override map — and
refuses bad shapes before any network or workspace work happens.

Supported provider names: openrouter, claude_code, openai, gemini.
Unknown providers, missing required candidate fields, out-of-range
quality floors, negative max_cost_per_1k, duplicate candidate ids
within a task type, and non-mapping stage_to_task_type all raise
focused InfospaceError codes that callers can pattern-match.
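
To make the validation surface concrete, here is a hedged sketch of a
per-task-type check. Everything except the provider list is an
assumption: the error code strings are invented, the required fields
are guessed to be id and provider, and quality floors are assumed to
live in the range 0 to 1.

```python
class InfospaceError(Exception):
    """Stand-in for the project's error type; carries a short error code."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.code = code


SUPPORTED_PROVIDERS = {"openrouter", "claude_code", "openai", "gemini"}


def validate_task_type(spec: dict) -> None:
    """Hypothetical validator for one task_types entry (all codes made up)."""
    floor = spec.get("quality_floor")
    if floor is not None and not 0.0 <= floor <= 1.0:
        raise InfospaceError("quality_floor_out_of_range")
    candidates = spec.get("candidates") or []
    if not candidates:
        raise InfospaceError("empty_candidates")
    seen = set()
    for candidate in candidates:
        if "id" not in candidate or "provider" not in candidate:
            raise InfospaceError("missing_candidate_field")
        if candidate["provider"] not in SUPPORTED_PROVIDERS:
            raise InfospaceError("unsupported_provider")
        if candidate.get("max_cost_per_1k", 0) < 0:
            raise InfospaceError("negative_max_cost_per_1k")
        if candidate["id"] in seen:
            raise InfospaceError("duplicate_candidate_id")
        seen.add(candidate["id"])
```

Raising a distinct code per failure is what lets callers pattern-match
on the cause instead of parsing a message string.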

docs/routing-config.md documents the schema with two annotated
examples (OpenRouter-only two-tier, and adaptive with a ClaudeCode
baseline) plus the full "what fails fast" list.

16 parser tests cover happy-path round-trip, file load, missing file,
malformed YAML, and every validation surface (wrong/missing schema
version, empty task_types, empty candidates, missing required fields,
unsupported provider, negative cost, out-of-range quality_floor,
duplicate ids, non-mapping stage_map, non-string ledger_path).

T02 will turn a RoutingConfig into a live llm-connect RoutingPolicy /
AdaptiveRoutingPolicy with constructed LLMAdapter instances.

160 tests pass, 1 skipped.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-18 18:09:28 +02:00
f818acfc62 IB-WP-0018-T03+T04: shadow sampling + report/CLI surfacing; close IB-WP-0018
T03 — wrap_with_shadow_sampling() helper in routing.py: builds a
llm-connect ShadowingAdapter around any candidate LLMAdapter with a
caller-supplied baseline, grader, and QualityLedger. async_shadow=True
by default so production load is not doubled; on_shadow_error escape
hatch keeps caller logs informed when a baseline outage swallows the
shadow path. The returned adapter is still an LLMAdapter so it slots
into a RoutingPolicy rule without further code change.

T04 — generation report enrichment plus a small CLI helper:

- _collect_adapter_choices walks artifact provenance, groups by
  (stage_id, adapter_id), and surfaces calls + prompt/completion tokens
  per (stage, adapter) pair in a new ## Per-stage adapter choices
  section. Runs that did not go through the bridge have no
  provider_metadata.adapter_id and emit an empty list, so fixture-only
  reports stay terse.
- summarise_quality_ledger() rolls a llm-connect QualityLedger up by
  (task_type, adapter_id) with mean quality, mean cost, observations,
  and cumulative tokens.
- infospace-bench routing ledger <path> CLI prints the rollup as JSON.
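
The ledger rollup can be pictured as a plain group-by. The sketch below
assumes observations are dicts with task_type, adapter_id, quality,
cost, and tokens keys; the real QualityLedger record shape in
llm-connect may differ, and rollup_ledger is a hypothetical name.

```python
from collections import defaultdict


def rollup_ledger(observations):
    """Group observations by (task_type, adapter_id) and aggregate them."""
    groups = defaultdict(list)
    for obs in observations:
        groups[(obs["task_type"], obs["adapter_id"])].append(obs)

    rows = []
    for (task_type, adapter_id), items in sorted(groups.items()):
        count = len(items)
        rows.append({
            "task_type": task_type,
            "adapter_id": adapter_id,
            "observations": count,
            "mean_quality": sum(o["quality"] for o in items) / count,
            "mean_cost": sum(o["cost"] for o in items) / count,
            "total_tokens": sum(o["tokens"] for o in items),
        })
    return rows
```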

Five new tests cover shadow happy-path, shadow failure isolation,
ledger rollup, the routing CLI, and the report's adapter-choice
aggregation. Closes IB-WP-0018: T01-T05 are all done and the workplan
status flips from blocked to done now that LLM-WP-0004's primitives
have shipped.

144 tests pass, 1 skipped (the OpenRouter live smoke, gated as before).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-18 11:52:05 +02:00
0a83e908ce IB-WP-0018-T01+T02+T05: routing bridge to llm-connect
T01 — task-type taxonomy. docs/routing-task-types.md names the five
generation stages as the default identity-mapped task types
(summarize-source, extract-entities, extract-relations,
evaluate-entity, synthesize-report) and records the recommended quality
floors per stage. The taxonomy explicitly does not decide which adapter
ships per task type, where the ledger lives, or what a quality score
means — those stay with the caller per the LLM-WP-0004 scope guardrail.

T02 — RoutingAssistedGenerationAdapter bridge in
src/infospace_bench/routing.py. Wraps any llm-connect RoutingPolicy or
AdaptiveRoutingPolicy as an infospace-bench AssistedGenerationAdapter:
maps stage_id -> task_type (overridable), resolves an LLMAdapter,
delegates execute_prompt with a configurable RunConfig, and surfaces
the resolved adapter id, task type, model, usage, and finish_reason
back on AssistedGenerationResult.metadata. Provider tag stays
back-compatible with the strings already used in run records and the
budget rollup (openrouter / claude_code / openai / gemini / mock /
routing).

T05 — eight tests in tests/test_routing_adapter.py cover: static-policy
per-stage resolution, stage_to_task_type overrides, default-mapping
completeness, fall-through for unmapped stage ids, the adaptive path
selecting the cheaper qualifying adapter when a quality_floor is set,
adaptive policy falling back to static when no floor is set, response
metadata round-trip with provider tagging, and estimated_cost_per_1k
pass-through.

Adds llm-connect as a path dependency in pyproject.toml and to the
pytest pythonpath. Static OpenRouter and fixture paths are unchanged;
this commit only adds the option of routing.

139 tests pass, 1 skipped (the OpenRouter live smoke, gated as before).

T03 (shadow-mode integration) and T04 (CLI + per-stage chosen-adapter
in the generation report) follow next.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-18 11:33:58 +02:00
1d62dffae9 IB-WP-0016-T07: review report and output policy; close IB-WP-0016
Enrich reports/generation-summary.md with the review-oriented sections
that the 2026-05-17 smoke run flagged as missing: ## Chapter coverage
(per-chapter source/entity/relation/anchor counts), ## Entities (the
deduped title list), ## Unmapped source chunks (sources with no
downstream generated artifact), and ## Page anchors (total plus
deterministic sample). Sections are conditional on data being present
so generic non-Lefevre runs stay terse.

Add docs/lefevre-readiness.md as the final sign-off document for
IB-WP-0016: what is wired (T01-T06 recap), an output policy table
(checked-in fixture sources vs disposable generated infospaces vs
archive targets), a seven-item reviewer checklist (duplicate entities,
relation endpoints, weak evidence, overgeneralization, anchor
coverage, unmapped sources, plan-vs-actual variance), a scale-up plan
from one-chapter to full-book, and the load-bearing risks still
outstanding (cross-chunk dedup, whole-run resume, adaptive routing
deferred to LLM-WP-0004 / IB-WP-0018, rate-table drift).

Closes IB-WP-0016 (Lefevre EPUB3 Infospace Readiness Pilot): T01-T07
all done; the workplan is set to status=done.

131 tests pass, 1 skipped (live OpenRouter smoke, correctly gated).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-18 01:22:41 +02:00
ab23c5873e IB-WP-0016-T06: OpenRouter live-run guardrails
Add --chapter / --from-chapter / --to-chapter / --chunk selection flags
to generate init and generate from-source, plumb them into
init_generation_infospace via a new _filter_chunks_by_chapter helper,
and refuse to create an infospace when the filters reject every chunk
(InfospaceError "empty_chapter_selection"). The flags use the same
T03/T02 plumbing (chapter labels, roman numerals, chunk ids) so a
single-chapter selection is a one-flag command.

OpenRouter run-record metadata (model, request_id, usage tokens,
retry_count, duration_seconds) already lands in
output/workflows/runs/*.yaml; this task just adds the smoke test that
proves it stays there, plus the parallel guarantee that the same
provider metadata reaches generated artifact provenance via
provenance.provider_metadata.

tests/test_openrouter_live.py covers:
- chapter-filter, from/to-chapter range, and empty-selection failure on
  init (non-live, deterministic)
- CLI smoke through generate from-source with --chapter
- a pytest-skipped live OpenRouter one-chapter end-to-end gated by
  OPENROUTER_API_KEY + INFOSPACE_BENCH_ENABLE_LIVE_OPENROUTER, with
  INFOSPACE_BENCH_LIVE_MODEL override (default openai/gpt-4o-mini)

docs/generic-source-generator.md gains a "Live OpenRouter runs (handle
with care)" section that walks plan-before-run, single-chapter live
run, the budget/usage artifacts, and the checks a reviewer should run
before scaling to the full book.

129 tests pass, 1 skipped (the live smoke, correctly gated).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 23:04:19 +02:00
348deca9f2 IB-WP-0016-T05: deterministic Lefevre acceptance fixture
Check in a small Lefevre-shaped EPUB fixture as separate source files
under tests/fixtures/lefevre/sources/ (container.xml, OPF, nav, cover,
PG header, three roman-numeral chapters with page anchors,
transcriber notes, license, PG footer). The test helper assembles
these into an EPUB at test time so the inputs stay inspectable in git.

Fixture responses tuned to the trading-literature profile (T04) live
at tests/fixtures/lefevre/responses.yaml: trader / institution /
strategy categories on entities, strategy_outcome / actor_venue
relation types, and all four trading-tuned evaluation criteria.

Three tests cover the acceptance:
- end-to-end Python pipeline: stable chapter-NN source slugs, full
  artifact tree (entities, relations, evaluations, metrics, history,
  generation report), budget registry persisted, chapter_number
  provenance round-trips through artifacts/index.yaml
- regression: PG boilerplate (cover, nav, header, notes, license,
  footer) is excluded by default and only appears under
  include_non_body=True
- CLI smoke through generate from-source --profile trading-literature
  --fixture-responses ...

125 tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 22:31:17 +02:00
bb70b2f4b9 IB-WP-0019-T07: archive integration; close IB-WP-0019
The default archive include set already pulls output/ in wholesale, so
output/budget/ lands inside the archive package with no code
change. Add a budget_summary block to ArchiveRecord.metadata so
catalog-level tools can see plans_count, runs_count, total_tokens,
total_cost_usd_known, total_cost_usd_estimated, and the
latest_snapshot_id without unpacking the archive. An infospace with no
budget data still archives cleanly with an empty metadata dict.

Closes IB-WP-0019 (Budget and Usage Registry): T01-T07 all done.
Three-layer design landed end-to-end — layer 1 (per-infospace
plans.yaml / usage.yaml / summary.yaml) and layer 3 (state-hub
record_token_event emission with failure isolation) live here; layer 2
(cross-application QualityLedger for adaptive routing) is parked in
llm-connect LLM-WP-0004 and infospace-bench IB-WP-0018 awaits it.

122 tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 21:53:28 +02:00
816a95b3ef IB-WP-0019-T06: workspace budget CLI
infospace-bench budget list <workspace> walks <workspace>/infospaces/*
and prints one row per infospace with slug, plans_count, runs_count,
total_tokens, total_cost_usd_known, total_cost_usd_estimated,
last_run_at, and latest_snapshot_id. infospace-bench budget show
<root> dumps the full plans/usage/summary structure for a single
infospace.

Missing budget directories are treated as zero rows rather than errors,
so the CLI is safe to run on partially-populated or fresh workspaces.

120 tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 20:44:40 +02:00
110c78b9ad IB-WP-0019-T05: state-hub token-event emission with failure isolation
Emit one record_token_event payload per completed generate run, derived
from the just-recorded usage rollup. tokens_in/out come from the
rollup, model defaults to the dominant model used (or "mixed" when
buckets disagree), agent="infospace-bench", ref_type="session", and
ref_id="<slug>/run-<run_index>". The note carries the infospace slug,
workspace, snapshot_id, and any known/estimated cost so the hub event
is self-describing.

Failure isolation: any exception from the HTTP poster (hub down,
timeout, 5xx) is caught, logged to stderr, and reported as
status=failed; the generate run still completes. INFOSPACE_BENCH_HUB_URL
overrides the default http://127.0.0.1:8000 base;
INFOSPACE_BENCH_DISABLE_HUB_TOKEN_EVENTS skips emission entirely.
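
The failure-isolation pattern can be sketched as a wrapper around an
injected poster callable. The function name and signature are
hypothetical, and apart from status=failed the status strings are
assumptions.

```python
import sys


def emit_token_event_safely(payload, poster, disabled=False):
    """Post one token event; never let a hub failure reach the caller."""
    if disabled:
        # Mirrors the effect of INFOSPACE_BENCH_DISABLE_HUB_TOKEN_EVENTS.
        return {"status": "skipped"}
    try:
        poster(payload)
    except Exception as exc:  # hub down, timeout, 5xx, ...
        print(f"token event emission failed: {exc}", file=sys.stderr)
        return {"status": "failed", "error": str(exc)}
    return {"status": "ok"}
```

Returning a status instead of raising is what lets the generate run
complete and still report that the hub never saw the event.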

Tests cover the happy path, the disable env var, poster failure, the
no-usage skip, multi-model coalescing to "mixed", and an end-to-end
run_generation against an unbindable hub port to prove the run survives
when the hub is unreachable. 116 tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 20:33:29 +02:00
d4c9c56f5c IB-WP-0019-T04: plan-vs-actual variance and surfacing
After every generate run, compute variance between the executing plan
snapshot and the just-recorded usage rollup, persist it to
output/budget/summary.yaml (overwrite-on-run), and surface it both in
the generate status JSON (new budget_summary field) and as a "Plan
variance" line in reports/generation-summary.md.

Variance fields: calls / prompt_tokens / total_tokens each carry
{estimated, actual, delta, ratio}; cost_usd carries {estimated,
actual_known, actual_estimated_from_rates, actual_total, delta, ratio};
per_workflow rolls the per-bucket usage up to the same workflow_id grain
the plan reports. Runs whose snapshot_id cannot be resolved (no prior
plan, or pruned from the retention window) still record a variance row
with null comparison fields and snapshot_resolved=false, so the
consumer always sees a current summary.
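
A small sketch of the {estimated, actual, delta, ratio} shape, assuming
delta means actual minus estimated and ratio means actual divided by
estimated (the commit does not spell out either definition). Function
names are hypothetical.

```python
def compare(estimated, actual):
    """Build one {estimated, actual, delta, ratio} block."""
    if estimated is None:
        # Unresolved snapshot: keep the actual, null the comparison fields.
        return {"estimated": None, "actual": actual, "delta": None, "ratio": None}
    ratio = actual / estimated if estimated else None  # avoid division by zero
    return {"estimated": estimated, "actual": actual,
            "delta": actual - estimated, "ratio": ratio}


def variance_row(plan, usage):
    """Compare a plan snapshot (or None when unresolved) with the usage rollup."""
    fields = ("calls", "prompt_tokens", "total_tokens")
    row = {name: compare(plan.get(name) if plan else None, usage[name])
           for name in fields}
    row["snapshot_resolved"] = plan is not None
    return row
```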

Reordered run_generation so usage and variance are written before the
generation report, allowing the report to embed the variance line on
the same pass.

110 tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 20:06:19 +02:00
a4dde53fc3 IB-WP-0019-T03: rate-table cost computation
Ship a starter model rate table at src/infospace_bench/model_rates.yaml
(prompt_per_1k / completion_per_1k for the OpenRouter models we have
actually touched: gpt-4o, gpt-4o-mini, gpt-4-turbo, claude 3.5 sonnet
and haiku, claude 3 opus, gemini 1.5 flash/pro, llama 3.1 70b) and a
load_rate_table() / estimate_cost_usd() pair that overlays an optional
<workspace>/model-rates.yaml on top of the bundled defaults.

generate run now passes a workspace-aware cost_resolver into
record_run_usage, so cost_usd_estimated lands on every usage bucket
whose model matches the table. Adapter-returned cost still wins
(cost_status="known"); rate-table cost is reported under
cost_status="estimated"; unmatched models are recorded as
cost_status="unknown" rather than silently zeroed. Rate-table file is
listed in pyproject.toml package-data so pip-installed users keep the
defaults.
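
The precedence described above (adapter cost, then rate table, then
unknown) might look like the following. This is a sketch under stated
assumptions: overlay_rates and price_bucket are hypothetical names, and
the rate tables are modelled as plain dicts keyed by model id.

```python
def overlay_rates(bundled, workspace_overlay=None):
    """Workspace entries override bundled defaults, model by model."""
    return {**bundled, **(workspace_overlay or {})}


def price_bucket(adapter_cost, model, prompt_tokens, completion_tokens, rate_table):
    """Pick the cost source for one usage bucket."""
    if adapter_cost is not None:
        # Adapter-returned cost always wins.
        return {"cost_status": "known", "cost_usd_known": adapter_cost,
                "cost_usd_estimated": None}
    rates = rate_table.get(model)
    if rates is None:
        # Unmatched models are flagged, not silently zeroed.
        return {"cost_status": "unknown", "cost_usd_known": None,
                "cost_usd_estimated": None}
    estimated = (prompt_tokens / 1000 * rates["prompt_per_1k"]
                 + completion_tokens / 1000 * rates["completion_per_1k"])
    return {"cost_status": "estimated", "cost_usd_known": None,
            "cost_usd_estimated": estimated}
```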

106 tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 19:54:30 +02:00
678508226a IB-WP-0019-T02: usage rollup from run records
Every completed generate run now aggregates per-call adapter usage from
the workflow-engine run records into output/budget/usage.yaml. Per-call
data is bucketed by (workflow_id, stage_id, provider, model) with
running totals for calls, prompt_tokens, completion_tokens,
total_tokens, and cost_usd_known (sum of adapter-reported cost when the
provider returns it; usually zero today). A run-level entry captures
run_index, started_at, completed_at, duration_seconds, the executing
plan snapshot_id (resolved from the latest plans.yaml entry), and the
workflow-level run_id / stage_count summaries.
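
The bucketing step is a keyed accumulation. A minimal sketch, assuming
each per-call record is a dict with the key fields plus token counts
and an optional cost_usd; the real run-record layout is not shown in
this log and bucket_usage is a hypothetical name.

```python
from collections import defaultdict


def _empty_bucket():
    return {"calls": 0, "prompt_tokens": 0, "completion_tokens": 0,
            "total_tokens": 0, "cost_usd_known": 0.0}


def bucket_usage(calls):
    """Aggregate per-call usage by (workflow_id, stage_id, provider, model)."""
    buckets = defaultdict(_empty_bucket)
    for call in calls:
        key = (call["workflow_id"], call["stage_id"],
               call["provider"], call["model"])
        bucket = buckets[key]
        bucket["calls"] += 1
        bucket["prompt_tokens"] += call["prompt_tokens"]
        bucket["completion_tokens"] += call["completion_tokens"]
        bucket["total_tokens"] += call["prompt_tokens"] + call["completion_tokens"]
        # Providers that report no cost contribute zero to the known total.
        bucket["cost_usd_known"] += call.get("cost_usd") or 0.0
    return dict(buckets)
```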

cost_usd_estimated is left as None for this task; T03 wires the
rate-table resolver so the same bucket gets a model-priced fallback
when the adapter does not return cost directly.

Fixture-mode runs are recorded with provider='fixture', zero tokens,
and cost_status='unknown' rather than silently skipped, so the rollup
honestly reflects which stages actually ran.

102 tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 19:46:40 +02:00
182f7011bb IB-WP-0019-T01: plan snapshot persistence
Every generate plan invocation now appends its compact summary to
output/budget/plans.yaml with a deterministic 12-char snapshot_id
hashed over the selection filters and the estimated call/token/cost
totals. Identical-fingerprint plans refresh the most recent entry's
recorded_at instead of stacking duplicates. Retention defaults to the
last 50 snapshots; older entries are pruned and counted on a top-level
pruned_count field.
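
One way to get a deterministic 12-character id is to hash a canonical
JSON rendering of the inputs. The sketch below assumes SHA-256 and an
in-memory dict store, and it treats "identical fingerprint" as matching
the most recent entry only; none of these choices is confirmed by the
commit, and the function names are hypothetical.

```python
import hashlib
import json

RETENTION = 50  # default stated in the commit message


def snapshot_id(filters: dict, totals: dict) -> str:
    """Hash filters plus estimated totals into a stable 12-char id."""
    canonical = json.dumps({"filters": filters, "totals": totals},
                           sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def record_plan(store: dict, filters: dict, totals: dict, recorded_at: str,
                retention: int = RETENTION) -> str:
    """Append a plan snapshot, refreshing duplicates and pruning old entries."""
    sid = snapshot_id(filters, totals)
    plans = store.setdefault("plans", [])
    store.setdefault("pruned_count", 0)
    if plans and plans[-1]["snapshot_id"] == sid:
        plans[-1]["recorded_at"] = recorded_at  # refresh instead of stacking
        return sid
    plans.append({"snapshot_id": sid, "recorded_at": recorded_at,
                  "filters": filters, "totals": totals})
    overflow = len(plans) - retention
    if overflow > 0:
        del plans[:overflow]  # drop the oldest entries
        store["pruned_count"] += overflow
    return sid
```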

The summary now echoes its input filters (chapter_filter, chunk_filter,
from_chapter, to_chapter) so reviewers can read the snapshot without
cross-referencing the CLI invocation.

New module src/infospace_bench/budget.py owns layer 1 (per-infospace
recording) of the IB-WP-0019 three-layer design; layer 2 still belongs
in llm-connect LLM-WP-0004 and layer 3 in state-hub.

99 tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 19:19:35 +02:00
df87e212a2 IB-WP-0016-T04: trading-literature profile
Ship a specialized profile for trading memoirs and market-structure
texts. The profile names eight entity categories (trader, market,
strategy, error, psychological_pattern, institution, instrument,
evidence_bearing_claim), five relation types (cause_effect,
lesson_evidence, risk_mitigation, actor_venue, strategy_outcome), and
four evaluation criteria (groundedness, lesson_clarity,
historical_context, overgeneralization_risk). Each is reflected in the
prompts and contracts so the LLM is steered toward operator-level
findings rather than biographical detail or moralising.

The generic profile remains the default. A 2-chapter Lefevre smoke run
with --profile trading-literature completes end-to-end with viable
metrics; 93 tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 18:59:45 +02:00
13f9c1895c IB-WP-0016-T03: scale-aware planning
Replace generate plan's full-prompt dump with a compact summary that
reports selected-chunk counts, selected chapter numbers, per-workflow
call counts, prompt-word and token estimates, and a rough USD cost when
--cost-per-1k is supplied. Selection filters --chapter (label or number,
repeatable), --from-chapter / --to-chapter (numeric range), and --chunk
(repeatable id) shape the estimate. Budget caps --max-calls and
--cost-cap are reported as exceeds_* booleans so callers can fail fast
before run.
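
A sketch of the cap reporting, assuming five calls per chunk (which
matches the 146 chunks / 730 calls figure quoted in this commit) and
assuming the booleans are named exceeds_max_calls and exceeds_cost_cap;
the log only says exceeds_*. plan_summary is a hypothetical name.

```python
def plan_summary(chunks, calls_per_chunk, prompt_tokens,
                 cost_per_1k=None, max_calls=None, cost_cap=None):
    """Compact plan summary with budget-cap booleans."""
    calls = chunks * calls_per_chunk
    cost = None if cost_per_1k is None else prompt_tokens / 1000 * cost_per_1k
    return {
        "selected_chunks": chunks,
        "calls": calls,
        "prompt_tokens": prompt_tokens,
        "estimated_cost_usd": cost,
        # Caps are reported, not enforced; the caller decides whether to abort.
        "exceeds_max_calls": max_calls is not None and calls > max_calls,
        "exceeds_cost_cap": (cost is not None and cost_cap is not None
                             and cost > cost_cap),
    }


summary = plan_summary(146, 5, 518_000, cost_per_1k=0.30,
                       max_calls=500, cost_cap=200.0)
```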

The old full per-workflow plan with prompts remains available behind
--full so deep inspection is opt-in instead of the default.

Whole-Lefevre estimate at default max_words=800: 146 chunks, 730 calls,
~518k prompt tokens, ~$155 at $0.30/1k. Chapters 3-5 only: 19 chunks,
95 calls, ~64k tokens. 87 tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 18:18:09 +02:00
b9173b6569 IB-WP-0016-T02: chapter-aware chunking and stable IDs
Resolve chapter labels from EPUB nav entries (when present) and from the
first in-document h1/h2/h3 heading, parse roman-numeral and "Chapter N"
labels into numeric chapter indices, and generate stable IDs of the form
chapter-NN with -part-NNN suffix when a chapter exceeds max_words. The
chunker now operates on cleaned body text, distributes id="Page_*" page
anchors per part via inline markers extracted before splitting, and
supports a configurable overlap_words evidence window between adjacent
parts of the same chapter. Reclassify body sections whose chapter label
matches contents/transcriber-notes/license/colophon tokens so they leave
the body stream by default. Strip <head>...</head> from HTML body
extraction to stop the <title> tag from duplicating heading text in the
chunk markdown.
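
The label-to-id step can be illustrated with a small parser. This is a
sketch, not the chunker's actual code: the helper names are
hypothetical, and the roman-numeral conversion does not reject
malformed numerals such as "IIII".

```python
import re

_ROMAN = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(text):
    """Convert a roman numeral to an int, or None if other characters appear."""
    s = text.strip().upper()
    if not s or any(ch not in _ROMAN for ch in s):
        return None
    total = 0
    for ch, nxt in zip(s, s[1:] + " "):
        value = _ROMAN[ch]
        # Subtractive notation: a smaller value before a larger one is negated.
        total += -value if nxt != " " and _ROMAN[nxt] > value else value
    return total


def chapter_number(label):
    """Parse 'Chapter 3', 'Chapter IV', or a bare 'XXIV.' into an int."""
    match = re.fullmatch(r"\s*chapter\s+(\d+)\s*", label, re.IGNORECASE)
    if match:
        return int(match.group(1))
    match = re.fullmatch(r"\s*(?:chapter\s+)?([ivxlcdm]+)\.?\s*", label,
                         re.IGNORECASE)
    if match:
        return roman_to_int(match.group(1))
    return None


def chunk_id(chapter, part=None):
    """Stable id: chapter-NN, with a -part-NNN suffix for split chapters."""
    base = f"chapter-{chapter:02d}"
    return base if part is None else f"{base}-part-{part:03d}"
```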

Real Lefevre EPUB now detects all 24 roman-numeral chapters with stable
chapter-NN IDs, distributes Page_N anchors across multi-part chapters,
and reclassifies Contents and Transcriber's Notes out of body
(role histogram body=67, cover=1, header=1, toc=1, notes=1, footer=2).
82 tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 15:52:47 +02:00
5b6a63fb7a IB-WP-0016-T01: spine-aware EPUB3 intake
Parse META-INF/container.xml and the OPF package document, then iterate
documents in spine reading order instead of archive-name sort. Classify
each spine item (body, cover, nav, toc, header, footer, notes, license,
auxiliary) and exclude non-body sections by default; include_non_body=True
opts them back in for inspection. Capture OPF book metadata (title,
creator, language, subjects, rights, identifier, source_url, modified)
onto every chunk and propagate it through source artifact provenance.
Preserve the legacy zip-without-OPF fallback for malformed EPUBs.
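
For readers unfamiliar with the EPUB layout, spine-order iteration can
be sketched with the standard library alone. spine_documents is a
hypothetical name; the sketch skips role classification, metadata
capture, and the zip-without-OPF fallback.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}


def spine_documents(epub):
    """Return archive paths of spine items in reading order.

    `epub` is a path or a file-like object accepted by zipfile.ZipFile.
    """
    with zipfile.ZipFile(epub) as zf:
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", NS)
        opf_path = rootfile.attrib["full-path"]
        package = ET.fromstring(zf.read(opf_path))

    # Manifest maps item ids to hrefs relative to the OPF file.
    manifest = {item.attrib["id"]: item.attrib["href"]
                for item in package.findall("opf:manifest/opf:item", NS)}
    base = posixpath.dirname(opf_path)
    # The spine lists item ids in reading order, independent of file names.
    return [posixpath.normpath(posixpath.join(base, manifest[ref.attrib["idref"]]))
            for ref in package.findall("opf:spine/opf:itemref", NS)]
```

Sorting by archive name would put files in alphabetical order, which is
the ordering problem this commit removes.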

Real Lefevre EPUB now yields 148 body chunks in spine order (was 155
mixed, archive-sorted) with cover=1, header=1, footer=4 detected and
dropped. 78 tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 13:52:24 +02:00
37c28d2298 archive: include contracts/, schemas/; report skipped top-level dirs
Two of yesterday's archives silently dropped infospace content: the default
include set was missing contracts/, so wealth-vsm-generation-pilot (16 files)
and wealth-vsm-legacy-slice (12 files) were preserved as 14 and 10 files
respectively. Fix the include set and make silent drops visible.

- DEFAULT_INCLUDE now: infospace.yaml, artifacts, contracts, schemas,
  workflows, output, reports, exports
- ArchiveRecord gains skipped_top_level: top-level entries present in the
  live root that are not in the include set, not excluded, and not auto-
  hidden (hidden dotfiles, empty dirs, .store/index.yaml). Surfaces in
  index.yaml only when non-empty.
- Re-archived the two affected pilots with correct counts. Prior records
  remain in each index.yaml as history.
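
The skipped_top_level rule can be approximated as a directory scan. A
sketch only: the function below is hypothetical, it covers the dotfile
and empty-directory cases, and it does not reproduce whatever special
handling the real code applies to .store/index.yaml.

```python
from pathlib import Path

DEFAULT_INCLUDE = ("infospace.yaml", "artifacts", "contracts", "schemas",
                   "workflows", "output", "reports", "exports")


def skipped_top_level(root, include=DEFAULT_INCLUDE, exclude=()):
    """List top-level entries that an archive would silently leave out."""
    skipped = []
    for entry in sorted(Path(root).iterdir()):
        name = entry.name
        if name in include or name in exclude:
            continue  # archived, or deliberately left out
        if name.startswith("."):
            continue  # hidden dotfiles are auto-hidden
        if entry.is_dir() and not any(entry.iterdir()):
            continue  # empty directories carry no content
        skipped.append(name)
    return skipped
```

Run against the old include set (without contracts), a scan like this
would have reported contracts for both affected pilots.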

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 12:21:19 +02:00
ddefd69f71 IB-WP-0014: archive-list, restore, retention annotation, docs (T03-T05)
Round out IB-WP-0014 with the remaining archive operations and docs.

- restore_archive() and `infospace-bench restore <pkg> --target <dir>` round-trip
  a finalized package's bytes back to disk. Refuses to overwrite a non-empty
  target unless --force. --from <infospace-root> resolves the store location.
- archive-list CLI with --with-retention flag; annotate_retention() opens the
  per-infospace registry and joins each record with its current retention
  state (effective class, expires, holds, eligibility).
- docs/archive-integration.md covers when to archive, the include set,
  retention classes, storage layout, credentials policy, and the explicit
  non-goal that S3/git backends live in artifact-store.
- SCOPE.md cross-links the new doc.
- Workplan flipped to status: done. Full pytest suite: 72 passed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 11:46:23 +02:00
36bfa33fb9 IB-WP-0014: archive integration with artifact-store (T01+T02)
Reframe IB-WP-0014 from "in-repo S3/git backend adapters" to "durable archive
surface via artifact-store". The live infospace stays in a local working folder;
finalized snapshots are bundled into content-addressed artifact-store packages.

- New module infospace_bench.archive: archive_infospace(), list_archives(),
  ArchiveRecord. Self-bootstraps a SQLite + local-FS registry under
  output/archives/.store/ when no Registry is passed in.
- New output/archives/index.yaml records each archive event (package id,
  manifest digest, retention class, included paths, file count, note).
- artifactstore added as a path dep; Python floor bumped to 3.12 to match.
- Makefile for venv-based dev setup; stack-and-commands.md updated.
- tests/test_archive.py covers index write, list, recursive-capture guard,
  caller-supplied include, and empty-include error. Full suite 65 passed.

Remaining tasks (T03 list CLI, T04 restore, T05 docs) tracked in the workplan.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 11:30:49 +02:00
c3b62a6ec3 Agentic memory profile 2026-05-15 16:01:35 +02:00
46aad3cce8 generic source-to-infospace generator 2026-05-14 19:33:22 +02:00
a729a7643e infospace pipeline for wealth of nations example 2026-05-14 18:04:38 +02:00
3de72eb0d2 command parity and migration guide 2026-05-14 17:16:39 +02:00
5d53c33d3e Kontextual Engine Integration Boundary 2026-05-14 16:43:29 +02:00
fc70acb257 engine and lifecycle 2026-05-14 16:26:42 +02:00
55405d8a5a acceptance matrix and workflow generation 2026-05-14 16:01:32 +02:00
7f54dec585 eval history and metrics 2026-05-14 15:35:04 +02:00
9627d03c1a entity relationship model 2026-05-14 15:06:17 +02:00
6eb3c6a0fb markitect-tool integration 2026-05-14 14:53:16 +02:00
916a895a85 Initial implementation 2026-05-14 11:32:25 +02:00