Blocked stub that names the dependency on llm-connect WP-0004 (adaptive
cost-quality routing). Activates once T01..T03 of that workplan land
and the QualityLedger / BaselineGrader / AdaptiveRoutingPolicy APIs are
stable.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Run a fixture-backed end-to-end smoke against the real Lefevre EPUB
(max-chunks 3) and capture the result in the validation note and the
workplan. The pipeline produces a complete infospace with stable
chapter-01-part-NNN source IDs, full chapter/book/anchor provenance on
every source artifact, viable metrics, and exact-title entity dedupe.
Refresh the workplan validation baseline to reflect the post-T01/T02
state, and add a remaining-gaps section that maps the open issues to the
right follow-on tasks: cost/scope controls and plan preview to T03, the
trading-literature profile to T04, chunk-level resume to T06, and a
richer generation-summary report (entity titles, chapter coverage,
anchor links) to T07.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Resolve chapter labels from EPUB nav entries (when present) and from the
first in-document h1/h2/h3 heading, parse roman-numeral and "Chapter N"
labels into numeric chapter indices, and generate stable IDs of the form
chapter-NN with -part-NNN suffix when a chapter exceeds max_words. The
chunker now operates on cleaned body text, distributes id="Page_*" page
anchors per part via inline markers extracted before splitting, and
supports a configurable overlap_words evidence window between adjacent
parts of the same chapter. Reclassify body sections whose chapter label
matches contents/transcriber-notes/license/colophon tokens so they leave
the body stream by default. Strip <head>...</head> from HTML body
extraction to stop the <title> tag from duplicating heading text in the
chunk markdown.
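
A minimal sketch of the label parsing and ID scheme described above. The helper names (roman_to_int, parse_chapter_index, chunk_id) are hypothetical and are not taken from the repository; only the chapter-NN and -part-NNN ID shapes come from this message.

```python
from __future__ import annotations

import re

_ROMAN = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}
# Accept only well-formed numerals, so words such as "Contents" are rejected.
_ROMAN_RE = re.compile(
    r"^(?=[IVXLCDM])M*(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$"
)


def roman_to_int(text: str) -> int | None:
    """Return the value of a well-formed roman numeral, else None."""
    s = text.strip().upper().rstrip(".")
    if not _ROMAN_RE.match(s):
        return None
    total = 0
    # Subtract a symbol when a larger symbol follows it (IV, IX, XL, ...).
    for ch, nxt in zip(s, s[1:] + " "):
        value = _ROMAN[ch]
        total += -value if _ROMAN.get(nxt, 0) > value else value
    return total


def parse_chapter_index(label: str) -> int | None:
    """Map 'Chapter 3', 'CHAPTER XII', or a bare 'XXIV' to a number."""
    m = re.match(r"^\s*chapter\s+(\S+)", label, re.IGNORECASE)
    token = (m.group(1) if m else label).strip().rstrip(".:")
    if token.isdigit():
        return int(token)
    return roman_to_int(token)


def chunk_id(chapter: int, part: int | None = None) -> str:
    """Build chapter-NN, with a -part-NNN suffix for split chapters."""
    base = f"chapter-{chapter:02d}"
    return base if part is None else f"{base}-part-{part:03d}"
```

Note that a bare label which happens to be a valid numeral (for example "Mix") would also parse as a number under this sketch; a real implementation may want extra context checks.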
On the real Lefevre EPUB, the pipeline now detects all 24 roman-numeral
chapters with stable chapter-NN IDs, distributes Page_N anchors across
multi-part chapters, and reclassifies Contents and Transcriber's Notes out
of body (role histogram body=67, cover=1, header=1, toc=1, notes=1, footer=2).
82 tests pass.
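
The overlap_words evidence window between adjacent parts can be pictured with the sketch below. It assumes whitespace-tokenized words and a hypothetical split_with_overlap helper; the actual chunker also tracks page-anchor markers, which this sketch omits.

```python
def split_with_overlap(
    words: list[str], max_words: int, overlap_words: int = 0
) -> list[list[str]]:
    """Split words into parts of at most max_words, repeating the last
    overlap_words of each part at the start of the next one."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap_words < max_words:
        raise ValueError("overlap_words must be in [0, max_words)")
    if not words:
        return []
    parts: list[list[str]] = []
    step = max_words - overlap_words  # how far each new part advances
    start = 0
    while True:
        parts.append(words[start:start + max_words])
        if start + max_words >= len(words):
            break  # this part already reaches the end of the chapter
        start += step
    return parts
```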
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Parse META-INF/container.xml and the OPF package document, then iterate
documents in spine reading order instead of archive-name sort. Classify
each spine item (body, cover, nav, toc, header, footer, notes, license,
auxiliary) and exclude non-body sections by default; include_non_body=True
opts them back in for inspection. Capture OPF book metadata (title,
creator, language, subjects, rights, identifier, source_url, modified)
onto every chunk and propagate it through source artifact provenance.
Preserve the legacy zip-without-OPF fallback for malformed EPUBs.
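
For reference, spine-order iteration with the zip-without-OPF fallback can be sketched with the standard library alone. The function name spine_documents is hypothetical; the two XML namespaces are the standard OCF container and OPF package namespaces. The sketch does not percent-decode manifest hrefs and does not classify section roles.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}


def spine_documents(epub) -> list[str]:
    """Return archive member names in spine reading order.

    Falls back to archive-name sort when container.xml or the OPF
    package document is missing or unparsable.
    """
    with zipfile.ZipFile(epub) as zf:
        try:
            container = ET.fromstring(zf.read("META-INF/container.xml"))
            opf_path = container.find(".//c:rootfile", NS).attrib["full-path"]
            package = ET.fromstring(zf.read(opf_path))
        except (KeyError, AttributeError, ET.ParseError):
            # Legacy fallback: no usable OPF, so sort HTML members by name.
            return sorted(
                n for n in zf.namelist()
                if n.lower().endswith((".xhtml", ".html", ".htm"))
            )
        # Manifest maps item ids to hrefs relative to the OPF location.
        manifest = {
            item.attrib["id"]: item.attrib["href"]
            for item in package.findall("opf:manifest/opf:item", NS)
            if "id" in item.attrib and "href" in item.attrib
        }
        base = posixpath.dirname(opf_path)
        return [
            posixpath.normpath(posixpath.join(base, manifest[ref.attrib["idref"]]))
            for ref in package.findall("opf:spine/opf:itemref", NS)
            if ref.attrib.get("idref") in manifest
        ]
```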
Real Lefevre EPUB now yields 148 body chunks in spine order (was 155
mixed, archive-sorted) with cover=1, header=1, footer=4 detected and
dropped. 78 tests pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two of yesterday's archives silently dropped infospace content: the default
include set was missing contracts/, so wealth-vsm-generation-pilot (16 files)
and wealth-vsm-legacy-slice (12 files) were preserved as 14 and 10 files
respectively. Fix the include set and make silent drops visible.
- DEFAULT_INCLUDE now: infospace.yaml, artifacts, contracts, schemas,
  workflows, output, reports, exports
- ArchiveRecord gains skipped_top_level: top-level entries present in the
  live root that are not in the include set, not excluded, and not
  auto-hidden (hidden dotfiles, empty dirs, .store/index.yaml). Surfaces in
  index.yaml only when non-empty.
- Re-archived the two affected pilots with correct counts. Prior records
  remain in each index.yaml as history.
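
The skipped_top_level rule can be illustrated roughly as follows. This is a sketch under assumptions: the function signature and the exclude parameter are hypothetical, and only the include names, the dotfile rule, and the empty-directory rule are taken from this message.

```python
from pathlib import Path

# Include set as listed in this commit message.
DEFAULT_INCLUDE = (
    "infospace.yaml", "artifacts", "contracts", "schemas",
    "workflows", "output", "reports", "exports",
)


def skipped_top_level(root: Path, include=DEFAULT_INCLUDE, exclude=()) -> list[str]:
    """List top-level entries an archive would silently leave behind."""
    skipped = []
    for entry in sorted(root.iterdir()):
        name = entry.name
        if name in include or name in exclude:
            continue  # captured, or deliberately left out
        if name.startswith("."):
            continue  # hidden dotfiles and dot-directories are auto-hidden
        if entry.is_dir() and not any(entry.iterdir()):
            continue  # empty directories carry no content
        skipped.append(name)
    return skipped
```

With the earlier include set that lacked contracts, a root containing contracts/ would report it here instead of dropping it without notice.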
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Capture the in-flight and legacy pilots as artifact-store packages, all at
retention class release-evidence (default expiry 2033-05-15).
- wealth-vsm-generation-pilot — pkg ed977a9c, 14 files (in flight, IB-WP-0013)
- wealth-vsm-legacy-slice — pkg 9d114264, 10 files (legacy parity ref)
- bootstrap-pilot — pkg fb31721e, 9 files (initial scaffold ref)
Each infospace now has its own self-contained .store/ (gitignored) and an
output/archives/index.yaml pointer log (tracked).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Run archive_infospace() against infospaces/agentic-memory-profile-pilot to
preserve the v1 frozen state. 13 files captured at retention class
release-evidence (default expiry 2033-05-15). The pointer index.yaml is
tracked; the self-contained artifact-store registry under .store/ is
gitignored, since its bytes and event log are reconstructable from the
artifact-store deployment.
- Package id: d3c1ff32-2eed-4b9c-9868-dbff0af723b4
- Manifest digest: blake3:5ff7fc5d7974d5f4fd4b66a181cda729f61d399f7b3c8e7dea2aa9af8fd2025b
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Round out IB-WP-0014 with the remaining archive operations and docs.
- restore_archive() and `infospace-bench restore <pkg> --target <dir>` round-trip
  a finalized package's bytes back to disk. Refuses to overwrite a non-empty
  target unless --force. --from <infospace-root> resolves the store location.
- archive-list CLI with --with-retention flag; annotate_retention() opens the
  per-infospace registry and joins each record with its current retention
  state (effective class, expires, holds, eligibility).
- docs/archive-integration.md covers when to archive, the include set,
  retention classes, storage layout, credentials policy, and the explicit
  non-goal that S3/git backends live in artifact-store.
- SCOPE.md cross-links the new doc.
- Workplan flipped to status: done. Full pytest suite: 72 passed.
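
The overwrite guard on restore can be sketched as below. The helper name ensure_restorable_target is hypothetical and this is not the body of restore_archive(); it only illustrates the "refuse a non-empty target unless forced" rule.

```python
from pathlib import Path


def ensure_restorable_target(target: Path, force: bool = False) -> Path:
    """Create the target directory, refusing to reuse a non-empty one
    unless force is set (mirrors the --force CLI flag)."""
    if target.exists() and not target.is_dir():
        raise NotADirectoryError(f"{target} exists and is not a directory")
    if target.is_dir() and any(target.iterdir()) and not force:
        raise FileExistsError(f"{target} is not empty; pass force=True to overwrite")
    target.mkdir(parents=True, exist_ok=True)
    return target
```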
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Reframe IB-WP-0014 from "in-repo S3/git backend adapters" to "durable archive
surface via artifact-store". The live infospace stays in a local working folder;
finalized snapshots are bundled into content-addressed artifact-store packages.
- New module infospace_bench.archive: archive_infospace(), list_archives(),
  ArchiveRecord. Self-bootstraps a SQLite + local-FS registry under
  output/archives/.store/ when no Registry is passed in.
- New output/archives/index.yaml records each archive event (package id,
  manifest digest, retention class, included paths, file count, note).
- artifactstore added as a path dep; Python floor bumped to 3.12 to match.
- Makefile for venv-based dev setup; stack-and-commands.md updated.
- tests/test_archive.py covers index write, list, recursive-capture guard,
  caller-supplied include, and empty-include error. Full suite 65 passed.
Remaining tasks (T03 list CLI, T04 restore, T05 docs) tracked in the workplan.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>