Commit Graph

69 Commits

Author SHA1 Message Date
745edc8b81 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-05-17:
  - update .custodian-brief.md for infospace-bench
2026-05-17 16:00:10 +02:00
b9173b6569 IB-WP-0016-T02: chapter-aware chunking and stable IDs
Resolve chapter labels from EPUB nav entries (when present) and from the
first in-document h1/h2/h3 heading, parse roman-numeral and "Chapter N"
labels into numeric chapter indices, and generate stable IDs of the form
chapter-NN with -part-NNN suffix when a chapter exceeds max_words. The
chunker now operates on cleaned body text, distributes id="Page_*" page
anchors per part via inline markers extracted before splitting, and
supports a configurable overlap_words evidence window between adjacent
parts of the same chapter. Reclassify body sections whose chapter label
matches contents/transcriber-notes/license/colophon tokens so they leave
the body stream by default. Strip <head>...</head> from HTML body
extraction to stop the <title> tag from duplicating heading text in the
chunk markdown.

Real Lefevre EPUB now detects all 24 roman-numeral chapters with stable
chapter-NN IDs, distributes Page_N anchors across multi-part chapters,
and reclassifies Contents and Transcriber's Notes out of body
(role histogram body=67, cover=1, header=1, toc=1, notes=1, footer=2).
82 tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 15:52:47 +02:00
ef19aa6de7 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-05-17:
  - update .custodian-brief.md for infospace-bench
2026-05-17 13:55:50 +02:00
a696f75280 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-05-17:
  - IB-WP-0016-T02: todo → in_progress
2026-05-17 13:55:49 +02:00
5b6a63fb7a IB-WP-0016-T01: spine-aware EPUB3 intake
Parse META-INF/container.xml and the OPF package document, then iterate
documents in spine reading order instead of archive-name sort. Classify
each spine item (body, cover, nav, toc, header, footer, notes, license,
auxiliary) and exclude non-body sections by default; include_non_body=True
opts them back in for inspection. Capture OPF book metadata (title,
creator, language, subjects, rights, identifier, source_url, modified)
onto every chunk and propagate it through source artifact provenance.
Preserve the legacy zip-without-OPF fallback for malformed EPUBs.

Real Lefevre EPUB now yields 148 body chunks in spine order (was 155
mixed, archive-sorted) with cover=1, header=1, footer=4 detected and
dropped. 78 tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 13:52:24 +02:00
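
For readers unfamiliar with EPUB packaging, the spine-order intake described in 5b6a63fb7a can be sketched with the standard library alone. This is an illustrative sketch, not the project's implementation: `spine_documents` is a hypothetical name, the fallback here is a plain name sort over HTML-like entries, and section classification and metadata capture are left out.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile

# XML namespaces used by the OCF container file and the OPF package document.
NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}


def spine_documents(epub_path: str) -> list[str]:
    """Return archive paths of content documents in spine reading order.

    Falls back to archive-name sort when container.xml or the OPF
    package document is missing or unparseable.
    """
    with zipfile.ZipFile(epub_path) as zf:
        try:
            container = ET.fromstring(zf.read("META-INF/container.xml"))
            rootfile = container.find("c:rootfiles/c:rootfile", NS)
            opf_path = rootfile.attrib["full-path"]
            package = ET.fromstring(zf.read(opf_path))
        except (KeyError, AttributeError, ET.ParseError):
            # Legacy fallback for malformed EPUBs (assumed behaviour).
            return sorted(
                name for name in zf.namelist()
                if name.lower().endswith((".xhtml", ".html", ".htm"))
            )

        # manifest: item id -> href, relative to the OPF file's directory
        manifest = {
            item.get("id"): item.get("href")
            for item in package.findall("opf:manifest/opf:item", NS)
        }
        base = posixpath.dirname(opf_path)

        ordered = []
        for itemref in package.findall("opf:spine/opf:itemref", NS):
            href = manifest.get(itemref.get("idref"))
            if href is not None:
                ordered.append(posixpath.normpath(posixpath.join(base, href)))
        return ordered
```

The difference from archive-name sort shows up whenever file names do not follow reading order, which is the "mixed, archive-sorted" situation the commit message reports.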
ead2f335f3 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-05-17:
  - update .custodian-brief.md for infospace-bench
2026-05-17 13:38:30 +02:00
e05cdab042 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-05-17:
  - update .custodian-brief.md for infospace-bench
2026-05-17 12:28:39 +02:00
2bcd9396f8 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-05-17:
  - IB-WP-0016-T01: todo → in_progress
2026-05-17 12:28:38 +02:00
37c28d2298 archive: include contracts/, schemas/; report skipped top-level dirs
Two of yesterday's archives silently dropped infospace content: the default
include set was missing contracts/, so wealth-vsm-generation-pilot (16 files)
and wealth-vsm-legacy-slice (12 files) were preserved as 14 and 10 files
respectively. Fix the include set and make silent drops visible.

- DEFAULT_INCLUDE now: infospace.yaml, artifacts, contracts, schemas,
  workflows, output, reports, exports
- ArchiveRecord gains skipped_top_level: top-level entries present in the
  live root that are not in the include set, not excluded, and not auto-
  hidden (hidden dotfiles, empty dirs, .store/index.yaml). Surfaces in
  index.yaml only when non-empty.
- Re-archived the two affected pilots with correct counts. Prior records
  remain in each index.yaml as history.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 12:21:19 +02:00
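
The `skipped_top_level` check from 37c28d2298 can be pictured as a filter over the live root's top-level entries. The sketch below is a guess at the rule, not the actual `ArchiveRecord` logic: the function signature is hypothetical, and "auto-hidden" is simplified to dotfiles (which also covers `.store/`) and empty directories.

```python
from pathlib import Path

# Include set as listed in the commit message.
DEFAULT_INCLUDE = (
    "infospace.yaml", "artifacts", "contracts", "schemas",
    "workflows", "output", "reports", "exports",
)


def skipped_top_level(root, include=DEFAULT_INCLUDE, exclude=()):
    """List top-level entries that an archive run would silently leave out."""
    skipped = []
    for entry in sorted(Path(root).iterdir()):
        name = entry.name
        if name in include or name in exclude:
            continue  # archived, or deliberately excluded by the caller
        if name.startswith("."):
            continue  # hidden dotfiles and dot-directories (assumed rule)
        if entry.is_dir() and not any(entry.iterdir()):
            continue  # empty directories carry no content to lose
        skipped.append(name)
    return skipped
```

With the earlier include set, which lacked `contracts`, a check like this would have flagged `contracts` for both affected pilots instead of dropping it silently.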
523db6d341 IB-WP-0014: archive remaining three infospaces
Capture the in-flight and legacy pilots as artifact-store packages, all at
retention class release-evidence (default expiry 2033-05-15).

- wealth-vsm-generation-pilot — pkg ed977a9c, 14 files (in flight, IB-WP-0013)
- wealth-vsm-legacy-slice     — pkg 9d114264, 10 files (legacy parity ref)
- bootstrap-pilot             — pkg fb31721e, 9  files (initial scaffold ref)

Each infospace now has its own self-contained .store/ (gitignored) and an
output/archives/index.yaml pointer log (tracked).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 12:14:54 +02:00
fb379e1a18 IB-WP-0014: first real archive — agentic memory profile pilot
Run archive_infospace() against infospaces/agentic-memory-profile-pilot to
preserve the v1 frozen state. 13 files captured at retention class
release-evidence (default expiry 2033-05-15). The pointer index.yaml is
tracked; the self-contained artifact-store registry under .store/ is
gitignored — bytes and event log are reconstructable from the artifact-store
deployment.

- Package id: d3c1ff32-2eed-4b9c-9868-dbff0af723b4
- Manifest digest: blake3:5ff7fc5d7974d5f4fd4b66a181cda729f61d399f7b3c8e7dea2aa9af8fd2025b

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 12:10:18 +02:00
e92b3ce1a6 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-05-17:
  - update .custodian-brief.md for infospace-bench
2026-05-17 11:52:21 +02:00
7825608307 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-05-17:
  - IB-WP-0014-T05: todo → done
2026-05-17 11:52:20 +02:00
e7be3f41b8 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-05-17:
  - IB-WP-0014-T04: todo → done
2026-05-17 11:52:20 +02:00
d31be49db6 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-05-17:
  - IB-WP-0014-T03: todo → done
2026-05-17 11:52:20 +02:00
ddefd69f71 IB-WP-0014: archive-list, restore, retention annotation, docs (T03-T05)
Round out IB-WP-0014 with the remaining archive operations and docs.

- restore_archive() and `infospace-bench restore <pkg> --target <dir>` round-trip
  a finalized package's bytes back to disk. Refuses to overwrite a non-empty
  target unless --force. --from <infospace-root> resolves the store location.
- archive-list CLI with --with-retention flag; annotate_retention() opens the
  per-infospace registry and joins each record with its current retention
  state (effective class, expires, holds, eligibility).
- docs/archive-integration.md covers when to archive, the include set,
  retention classes, storage layout, credentials policy, and the explicit
  non-goal that S3/git backends live in artifact-store.
- SCOPE.md cross-links the new doc.
- Workplan flipped to status: done. Full pytest suite: 72 passed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 11:46:23 +02:00
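
The overwrite guard that ddefd69f71 describes for `restore` ("refuses to overwrite a non-empty target unless --force") might be implemented along these lines. `check_restore_target` is a hypothetical helper, not part of `restore_archive()`, and what `--force` does to existing files is not stated in the commit message, so the sketch only decides whether the restore may proceed.

```python
from pathlib import Path


def check_restore_target(target, force: bool = False) -> Path:
    """Validate a restore target directory and create it if missing.

    Raises FileExistsError when the directory already has content and
    force is False. Existing files are never deleted here.
    """
    path = Path(target)
    if path.exists() and not path.is_dir():
        raise NotADirectoryError(f"restore target is not a directory: {path}")
    if path.is_dir() and any(path.iterdir()) and not force:
        raise FileExistsError(
            f"restore target is not empty: {path} (pass force=True to proceed)"
        )
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Checking before any bytes are written keeps a failed restore from leaving a half-populated directory mixed with unrelated files.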
e343443d77 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-05-17:
  - update .custodian-brief.md for infospace-bench
2026-05-17 11:36:36 +02:00
f1085e8571 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-05-17:
  - IB-WP-0014-T02: todo → done
2026-05-17 11:36:34 +02:00
a8177474d2 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-05-17:
  - IB-WP-0014-T01: in_progress → done
2026-05-17 11:36:34 +02:00
36bfa33fb9 IB-WP-0014: archive integration with artifact-store (T01+T02)
Reframe IB-WP-0014 from "in-repo S3/git backend adapters" to "durable archive
surface via artifact-store". The live infospace stays in a local working folder;
finalized snapshots are bundled into content-addressed artifact-store packages.

- New module infospace_bench.archive: archive_infospace(), list_archives(),
  ArchiveRecord. Self-bootstraps a SQLite + local-FS registry under
  output/archives/.store/ when no Registry is passed in.
- New output/archives/index.yaml records each archive event (package id,
  manifest digest, retention class, included paths, file count, note).
- artifactstore added as a path dep; Python floor bumped to 3.12 to match.
- Makefile for venv-based dev setup; stack-and-commands.md updated.
- tests/test_archive.py covers index write, list, recursive-capture guard,
  caller-supplied include, and empty-include error. Full suite 65 passed.

Remaining tasks (T03 list CLI, T04 restore, T05 docs) tracked in the workplan.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 11:30:49 +02:00
673ed6e274 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-05-17:
  - update .custodian-brief.md for infospace-bench
2026-05-17 06:16:14 +02:00
c3b62a6ec3 Agentic memory profile 2026-05-15 16:01:35 +02:00
9a03fd1606 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-05-15:
  - update .custodian-brief.md for infospace-bench
2026-05-15 11:36:20 +02:00
a2daf9a46b docs(workplans): add memory profile pilot plan 2026-05-15 00:23:29 +02:00
d07f349168 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-05-15:
  - update .custodian-brief.md for infospace-bench
2026-05-15 00:18:31 +02:00
9d1a2088aa Workplan for practical example 2026-05-14 22:05:10 +02:00
937acde0b7 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-05-14:
  - update .custodian-brief.md for infospace-bench
2026-05-14 19:51:29 +02:00
46aad3cce8 generic source-to-infospace generator 2026-05-14 19:33:22 +02:00
065e17f42e chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-05-14:
  - update .custodian-brief.md for infospace-bench
2026-05-14 18:51:45 +02:00
889e2b2266 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-05-14:
  - update .custodian-brief.md for infospace-bench
2026-05-14 18:37:40 +02:00
b442a2de47 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-05-14:
  - IB-WP-0015-T03: todo → in_progress
2026-05-14 18:37:40 +02:00
ca9929d659 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-05-14:
  - IB-WP-0015-T02: todo → in_progress
2026-05-14 18:37:40 +02:00
66cd85d0fc chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-05-14:
  - IB-WP-0015-T01: todo → in_progress
2026-05-14 18:37:40 +02:00
b0acd3725b Workplan for infospace creation 2026-05-14 18:30:44 +02:00
01c3c2d1ae chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-05-14:
  - update .custodian-brief.md for infospace-bench
2026-05-14 18:05:15 +02:00
a729a7643e infospace pipeline for wealth of nations example 2026-05-14 18:04:38 +02:00
8804461ca3 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-05-14:
  - update .custodian-brief.md for infospace-bench
2026-05-14 17:50:22 +02:00
7b5510a4c3 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-05-14:
  - IB-WP-0013-T02: todo → in_progress
2026-05-14 17:50:20 +02:00
97a9c3b155 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-05-14:
  - IB-WP-0013-T01: todo → in_progress
2026-05-14 17:50:20 +02:00
448b432942 Workplans to actually create infospaces 2026-05-14 17:46:48 +02:00
3de72eb0d2 command parity and migration guide 2026-05-14 17:16:39 +02:00
0753c32c1b chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-05-14:
  - update .custodian-brief.md for infospace-bench
2026-05-14 17:04:41 +02:00
5d53c33d3e Kontextual Engine Integration Boundary 2026-05-14 16:43:29 +02:00
e78e5d8f43 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-05-14:
  - update .custodian-brief.md for infospace-bench
2026-05-14 16:34:10 +02:00
fc70acb257 engine and lifecycle 2026-05-14 16:26:42 +02:00
c0a535f1d1 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-05-14:
  - update .custodian-brief.md for infospace-bench
2026-05-14 16:17:45 +02:00
55405d8a5a acceptance matrix and workflow generation 2026-05-14 16:01:32 +02:00
4026f34174 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-05-14:
  - update .custodian-brief.md for infospace-bench
2026-05-14 15:48:39 +02:00
7f54dec585 eval history and metrics 2026-05-14 15:35:04 +02:00
d0c1f82863 chore(consistency): sync task status from DB [auto]
Updated by fix-consistency on 2026-05-14:
  - update .custodian-brief.md for infospace-bench
2026-05-14 15:18:00 +02:00