infospace-bench

Author	SHA1	Message	Date
tegwick	df87e212a2	IB-WP-0016-T04: trading-literature profile. Ship a specialized profile for trading memoirs and market-structure texts. The profile names eight entity categories (trader, market, strategy, error, psychological_pattern, institution, instrument, evidence_bearing_claim), five relation types (cause_effect, lesson_evidence, risk_mitigation, actor_venue, strategy_outcome), and four evaluation criteria (groundedness, lesson_clarity, historical_context, overgeneralization_risk). Each is reflected in the prompts and contracts so the LLM is steered toward operator-level findings rather than biographical detail or moralising. The generic profile remains the default. A 2-chapter Lefevre smoke run with --profile trading-literature completes end-to-end with viable metrics; 93 tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 18:59:45 +02:00
tegwick	13f9c1895c	IB-WP-0016-T03: scale-aware planning. Replace generate plan's full-prompt dump with a compact summary that reports selected-chunk counts, selected chapter numbers, per-workflow call counts, prompt-word and token estimates, and a rough USD cost when --cost-per-1k is supplied. Selection filters --chapter (label or number, repeatable), --from-chapter / --to-chapter (numeric range), and --chunk (repeatable id) shape the estimate. Budget caps --max-calls and --cost-cap are reported as exceeds_* booleans so callers can fail fast before a run. The old full per-workflow plan with prompts remains available behind --full, so deep inspection is opt-in instead of the default. Whole-Lefevre estimate at default max_words=800: 146 chunks, 730 calls, ~518k prompt tokens, ~$155 at $0.30/1k. Chapters 3-5 only: 19 chunks, 95 calls, ~64k tokens. 87 tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 18:18:09 +02:00
tegwick	001b64d67b	IB-WP-0016: refresh validation baseline after T01/T02 smoke run. Run a fixture-backed end-to-end smoke against the real Lefevre EPUB (max-chunks 3) and capture the result in the validation note and the workplan. The pipeline produces a complete infospace with stable chapter-01-part-NNN source IDs, full chapter/book/anchor provenance on every source artifact, viable metrics, and exact-title entity dedupe. Refresh the workplan validation baseline to reflect the post-T01/T02 state, and add a remaining-gaps section that maps the open issues to the right follow-on tasks: cost/scope controls and plan preview to T03, the trading-literature profile to T04, chunk-level resume to T06, and a richer generation-summary report (entity titles, chapter coverage, anchor links) to T07. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 16:13:39 +02:00
tegwick	b9173b6569	IB-WP-0016-T02: chapter-aware chunking and stable IDs. Resolve chapter labels from EPUB nav entries (when present) and from the first in-document h1/h2/h3 heading, parse roman-numeral and "Chapter N" labels into numeric chapter indices, and generate stable IDs of the form chapter-NN, with a -part-NNN suffix when a chapter exceeds max_words. The chunker now operates on cleaned body text, distributes id="Page_*" page anchors per part via inline markers extracted before splitting, and supports a configurable overlap_words evidence window between adjacent parts of the same chapter. Reclassify body sections whose chapter label matches contents/transcriber-notes/license/colophon tokens so they leave the body stream by default. Strip <head>...</head> from HTML body extraction to stop the <title> tag from duplicating heading text in the chunk markdown. Real Lefevre EPUB now detects all 24 roman-numeral chapters with stable chapter-NN IDs, distributes Page_N anchors across multi-part chapters, and reclassifies Contents and Transcriber's Notes out of body (role histogram body=67, cover=1, header=1, toc=1, notes=1, footer=2). 82 tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 15:52:47 +02:00
tegwick	5b6a63fb7a	IB-WP-0016-T01: spine-aware EPUB3 intake. Parse META-INF/container.xml and the OPF package document, then iterate documents in spine reading order instead of archive-name sort. Classify each spine item (body, cover, nav, toc, header, footer, notes, license, auxiliary) and exclude non-body sections by default; include_non_body=True opts them back in for inspection. Capture OPF book metadata (title, creator, language, subjects, rights, identifier, source_url, modified) onto every chunk and propagate it through source artifact provenance. Preserve the legacy zip-without-OPF fallback for malformed EPUBs. Real Lefevre EPUB now yields 148 body chunks in spine order (was 155 mixed, archive-sorted) with cover=1, header=1, footer=4 detected and dropped. 78 tests pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 13:52:24 +02:00
tegwick	ddefd69f71	IB-WP-0014: archive-list, restore, retention annotation, docs (T03-T05). Round out IB-WP-0014 with the remaining archive operations and docs. - restore_archive() and `infospace-bench restore <pkg> --target <dir>` round-trip a finalized package's bytes back to disk. Refuses to overwrite a non-empty target unless --force. --from <infospace-root> resolves the store location. - archive-list CLI with --with-retention flag; annotate_retention() opens the per-infospace registry and joins each record with its current retention state (effective class, expires, holds, eligibility). - docs/archive-integration.md covers when to archive, the include set, retention classes, storage layout, credentials policy, and the explicit non-goal that S3/git backends live in artifact-store. - SCOPE.md cross-links the new doc. - Workplan flipped to status: done. Full pytest suite: 72 passed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-17 11:46:23 +02:00
tegwick	c3b62a6ec3	Agentic memory profile	2026-05-15 16:01:35 +02:00
tegwick	9d1a2088aa	Workplan for practical example	2026-05-14 22:05:10 +02:00
tegwick	46aad3cce8	generic source-to-infospace generator	2026-05-14 19:33:22 +02:00
tegwick	a729a7643e	infospace pipeline for wealth of nations example	2026-05-14 18:04:38 +02:00
tegwick	3de72eb0d2	command parity and migration guide	2026-05-14 17:16:39 +02:00
tegwick	5d53c33d3e	Kontextual Engine Integration Boundary	2026-05-14 16:43:29 +02:00
tegwick	fc70acb257	engine and lifecycle	2026-05-14 16:26:42 +02:00
tegwick	55405d8a5a	acceptance matrix and workflow generation	2026-05-14 16:01:32 +02:00
tegwick	7f54dec585	eval history and metrics	2026-05-14 15:35:04 +02:00
tegwick	9627d03c1a	entity relationship model	2026-05-14 15:06:17 +02:00
tegwick	6eb3c6a0fb	markitect-tool integration	2026-05-14 14:53:16 +02:00
tegwick	28de86f13e	docs and stuff	2026-05-14 13:47:36 +02:00
tegwick	9d643f6e99	Reestablishing intent based goals and workplans	2026-05-14 13:09:58 +02:00
tegwick	916a895a85	Initial implementation	2026-05-14 11:32:25 +02:00
tegwick	f25bd2cf84	State-hub connect and initial workplans	2026-05-03 20:43:56 +02:00

21 Commits
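The chapter-aware chunking described in IB-WP-0016-T02 can be sketched roughly as below. This is a minimal illustration, not the infospace-bench implementation: the helper names (roman_to_int, parse_chapter_number, split_chapter) are hypothetical, and the assumption that overlap_words are prepended on top of each part's max_words new words is ours. It shows roman-numeral and "Chapter N" label parsing, the bare chapter-NN ID for chapters that fit, and chapter-NN-part-NNN IDs with an overlap window when they do not.

```python
from __future__ import annotations

import re

# Hypothetical sketch; names and overlap semantics are assumptions, not the project's API.
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}
_LABEL_RE = re.compile(r"^\s*(?:chapter\s+)?([IVXLCDM]+|\d+)\.?\s*$", re.IGNORECASE)


def roman_to_int(token: str) -> int:
    """Convert a Roman numeral such as 'XXIV' to 24 using the subtractive rule."""
    total, prev = 0, 0
    for ch in reversed(token.upper()):
        value = ROMAN_VALUES[ch]
        if value < prev:
            total -= value  # e.g. the I in IV subtracts from the following V
        else:
            total += value
            prev = value
    return total


def parse_chapter_number(label: str) -> int | None:
    """Return the numeric index for labels like 'XIV' or 'Chapter 7'; None otherwise."""
    match = _LABEL_RE.match(label)
    if match is None:
        return None  # e.g. 'Contents' is not a chapter label
    token = match.group(1)
    return int(token) if token.isdigit() else roman_to_int(token)


def split_chapter(number: int, words: list[str], max_words: int = 800,
                  overlap_words: int = 0) -> list[tuple[str, str]]:
    """Split one chapter into (stable_id, text) parts.

    A chapter that fits in max_words keeps the bare 'chapter-NN' ID. Longer
    chapters get '-part-NNN' suffixes; each later part also repeats the last
    overlap_words words of the previous part as an evidence window.
    """
    base = f"chapter-{number:02d}"
    if len(words) <= max_words:
        return [(base, " ".join(words))]
    parts = []
    start, index = 0, 1
    while start < len(words):
        window_start = max(0, start - overlap_words)  # reach back into the previous part
        end = start + max_words
        parts.append((f"{base}-part-{index:03d}", " ".join(words[window_start:end])))
        start, index = end, index + 1
    return parts
```

For example, split_chapter(3, words, max_words=4, overlap_words=1) on a ten-word chapter yields chapter-03-part-001 through chapter-03-part-003, and the second part starts with the last word of the first. Because the IDs depend only on the chapter number and part position, re-running the chunker on the same text reproduces them.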
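The budget arithmetic behind the IB-WP-0016-T03 plan summary can be checked with a small sketch. The log's figures imply five workflow calls per chunk (730 / 146 = 95 / 19 = 5), and ~518k prompt tokens at $0.30 per 1k tokens gives roughly $155. The PlanEstimate fields and the estimate_plan signature below are hypothetical; only the --cost-per-1k, --max-calls, and --cost-cap options and the exceeds_* naming come from the commit message, and the token count is taken as an input rather than derived from prompt words.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PlanEstimate:
    # Hypothetical field names mirroring the summary described in T03.
    chunks: int
    calls: int
    prompt_tokens: int
    usd: float | None
    exceeds_max_calls: bool
    exceeds_cost_cap: bool


def estimate_plan(selected_chunks: int, workflows_per_chunk: int, prompt_tokens: int,
                  cost_per_1k: float | None = None, max_calls: int | None = None,
                  cost_cap: float | None = None) -> PlanEstimate:
    """Estimate calls and cost for a selection, flagging budget-cap overruns."""
    calls = selected_chunks * workflows_per_chunk  # one call per workflow per chunk
    usd = None if cost_per_1k is None else prompt_tokens / 1000 * cost_per_1k
    return PlanEstimate(
        chunks=selected_chunks,
        calls=calls,
        prompt_tokens=prompt_tokens,
        usd=usd,
        # A cap that was not supplied can never be exceeded.
        exceeds_max_calls=max_calls is not None and calls > max_calls,
        exceeds_cost_cap=cost_cap is not None and usd is not None and usd > cost_cap,
    )


# Whole-Lefevre figures from the log: 146 chunks, ~518k prompt tokens, $0.30/1k.
whole = estimate_plan(146, 5, 518_000, cost_per_1k=0.30, max_calls=500)
print(whole.calls, round(whole.usd, 2), whole.exceeds_max_calls)
```

Reporting the caps as booleans rather than raising lets a caller decide whether to abort before starting a run, which matches the fail-fast intent stated in the commit.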