generated from coulomb/repo-seed
Workplan for practical example
@@ -32,6 +32,7 @@ Start with:
 - `docs/replacement-readiness-decision.md`
 - `docs/wealth-vsm-generation-pipeline.md`
 - `docs/generic-source-generator.md`
+- `docs/lefevre-epub3-validation.md`
 - `infospaces/bootstrap-pilot/`
 - `infospaces/wealth-vsm-legacy-slice/`
 - `infospaces/wealth-vsm-generation-pilot/`
70
docs/lefevre-epub3-validation.md
Normal file
@@ -0,0 +1,70 @@
# Lefevre EPUB3 Validation

Date: 2026-05-14

## Source

Local source file:

`/mnt/c/Users/bernd.worsch/Downloads/LefevreEdwin-ReminiscencesOfAStockOperator.epub`

The EPUB is Project Gutenberg edition 60979, EPUB package version 3.0. The OPF
metadata identifies:

- title: `Reminiscences of a Stock Operator`
- creator: `Edwin Lefevre`
- subjects: `Speculation`, `New York Stock Exchange`, `Investments`
- rights: public domain in the USA

## Current Infrastructure Result

The current generic generator can initialize a disposable infospace from the
file and run non-provider metrics:

- disposable root:
  `/tmp/infospace-bench-lefevre-583mopy_/infospaces/reminiscences-stock-operator`
- source chunks: 155
- entities: 0
- relations: 0
- evaluations: 0
- stale status: false
- metrics snapshot: `5978ece0`

The source-only metrics were:

- redundancy ratio: `0.9225806451612903`
- coverage ratio: `1.0`
- coherence components: `155.0`
- consistency cycles: `0.0`
- granularity entropy: `-0.0`

## Findings

The EPUB intake works mechanically, but it is not ready for a serious full-book
OpenRouter generation run.

- EPUB spine order is visible in `OEBPS/content.opf`, but current intake reads
  XHTML files by archive-name sorting.
- Current titles mostly collapse to the same long Gutenberg page title instead
  of chapter labels such as `I`, `II`, and `III`.
- Current intake includes non-body material such as cover/header/footer/license
  candidates unless the caller manually filters after import.
- `generate plan` is not yet a compact cost/risk plan for a long book; a full
  all-stage run would imply hundreds of provider calls.
- Run-level resume state is enough for the small generic path, but a long ebook
  needs chunk-level retry, stale, and skip policy.
- Cross-chunk entity deduplication and merge/review policy are needed before a
  full narrative book becomes a coherent infospace.

## Desired Readiness Bar

Before building the real Lefevre infospace with OpenRouter, the CLI should be
able to show:

- book metadata and selected source sections
- body-only chapter order
- stable chapter/chunk IDs
- estimated provider call count and token/cost budget
- selected chapter or chunk filters for smoke runs
- deterministic fixture acceptance on a small Lefevre-like subset
- optional live one-chapter smoke run with explicit provider/model/cost caps
214
workplans/IB-WP-0016-lefevre-ebook-infospace-readiness.md
Normal file
@@ -0,0 +1,214 @@
---
id: IB-WP-0016
type: workplan
title: "Lefevre EPUB3 Infospace Readiness Pilot"
domain: markitect
repo: infospace-bench
status: active
owner: markitect
topic_slug: markitect
created: "2026-05-14"
updated: "2026-05-14"
state_hub_workstream_slug: "ib-wp-0016-lefevre-ebook-infospace-readiness"
state_hub_workstream_id: "23be7d20-b01f-4b17-9851-4d540e4c0984"
depends_on_workplans:
  - IB-WP-0015
related_workplans:
  - IB-WP-0014
---

# IB-WP-0016 - Lefevre EPUB3 Infospace Readiness Pilot

## Goal

Use Edwin Lefevre's `Reminiscences of a Stock Operator` EPUB3 as the next real
ebook example for `infospace-bench`, and close the gaps that prevent a serious
OpenRouter-backed full-book infospace build.

This workplan should leave us able to run a bounded command like:

```bash
infospace-bench generate from-source \
  /mnt/c/Users/bernd.worsch/Downloads/LefevreEdwin-ReminiscencesOfAStockOperator.epub \
  --slug reminiscences-stock-operator \
  --name "Reminiscences of a Stock Operator" \
  --profile trading-literature \
  --provider openrouter \
  --model <openrouter-model-id> \
  --chapter I \
  --cost-cap <cap> \
  --apply
```

and then scale from one reviewed chapter to the full book without losing
provenance, reviewability, or cost control.

## Validation Baseline

Validation note: `docs/lefevre-epub3-validation.md`.

Current WP-0015 infrastructure can initialize the local EPUB and run
source-only metrics in a disposable workspace:

- source chunks: 155
- entity count: 0
- relation count: 0
- evaluation count: 0
- source-only metrics history can be written without provider calls

The run proves the basic intake path works, but also shows why a live all-book
run should wait:

- most generated chunk titles collapse to the same Gutenberg page title
- EPUB spine/chapter metadata is not yet honored deeply enough
- archive-order sorting risks confusing reading order
- non-body sections such as cover/header/footer/license need explicit policy
- plan output is too prompt-heavy for cost review on a 155-chunk book
- long-book resume needs chunk-level state, not only whole-run skip
- generated entities need cross-chunk dedupe/merge policy

## Non-Goals

- Do not commit a full generated Lefevre infospace before review.
- Do not make live OpenRouter calls in the default test suite.
- Do not store API keys or provider secrets in the infospace.
- Do not build a general-purpose EPUB conversion suite beyond what the
  infospace generator needs.

## Tasks

### T01 - Spine-aware EPUB3 intake

```task
id: IB-WP-0016-T01
status: todo
priority: high
state_hub_task_id: "a672fcf9-1b80-4faf-b16d-84ca52601dc9"
```

- Parse `META-INF/container.xml` to find the package document
- Parse OPF metadata, manifest, and spine
- Follow spine reading order instead of archive-name sorting
- Preserve book title, creator, source URL, subjects, language, rights, and
  modified timestamp in source provenance
- Exclude or tag cover, nav, table-of-contents, Project Gutenberg header,
  transcriber notes, and license/footer material by explicit policy
- Add tests using a small EPUB3 fixture with nav, cover, body, notes, and footer
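
The first three bullets can be sketched with the Python standard library alone. This is a minimal illustration, not the project's intake code: the function name `spine_reading_order` is hypothetical, and the sketch assumes manifest hrefs are plain relative paths (no percent-encoding) and that every spine `idref` resolves in the manifest.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# Namespaces defined by the EPUB container and package formats.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def spine_reading_order(epub_file):
    """Return archive paths of spine documents in reading order.

    `epub_file` is a filesystem path or a binary file-like object.
    Hypothetical helper; assumes well-formed container and package files.
    """
    with zipfile.ZipFile(epub_file) as archive:
        # Step 1: container.xml names the package document (the OPF file).
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        opf_path = rootfile.attrib["full-path"]

        # Step 2: the manifest maps item ids to hrefs relative to the OPF.
        package = ET.fromstring(archive.read(opf_path))
        manifest = {
            item.attrib["id"]: item.attrib["href"]
            for item in package.findall("opf:manifest/opf:item", OPF_NS)
        }

        # Step 3: the spine lists item ids in reading order.
        opf_dir = posixpath.dirname(opf_path)
        return [
            posixpath.normpath(
                posixpath.join(opf_dir, manifest[ref.attrib["idref"]])
            )
            for ref in package.findall("opf:spine/opf:itemref", OPF_NS)
        ]
```

Comparing the returned list with `sorted(archive.namelist())` on the fixture EPUB would make the reading-order difference an explicit test assertion; tagging or excluding non-body items would be a separate policy step on top of this list.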

### T02 - Chapter-aware chunking and IDs

```task
id: IB-WP-0016-T02
status: todo
priority: high
state_hub_task_id: "47de1110-36d0-4d63-bf87-389746509e03"
```

- Resolve chapter labels from EPUB nav entries and in-document headings
- Generate stable IDs like `chapter-01`, `chapter-01-part-002`, not repeated
  Gutenberg document titles
- Chunk within chapter boundaries with a configurable word limit
- Consider overlap or evidence-window context without duplicating headings
- Preserve page anchors where available as optional provenance
- Add tests showing `Reminiscences`-style roman numeral chapters become stable
  ordered source chunks
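
One possible shape for the ID and chunking rules, as a sketch only. The helper names, the default word limit, and the choice to give a single-chunk chapter the bare `chapter-NN` ID are assumptions for illustration; overlap windows and page anchors are left out.

```python
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(label):
    """Convert a roman numeral chapter label such as 'XIV' to an integer."""
    label = label.strip().upper()
    if not label or any(ch not in ROMAN_VALUES for ch in label):
        raise ValueError(f"not a roman numeral: {label!r}")
    total = 0
    # Pair each symbol with its successor; the last one is paired with a space.
    for ch, nxt in zip(label, label[1:] + " "):
        value = ROMAN_VALUES[ch]
        # A smaller value before a larger one is subtractive (IV = 4).
        total += -value if ROMAN_VALUES.get(nxt, 0) > value else value
    return total


def chunk_chapter(label, text, word_limit=400):
    """Split one chapter into (chunk_id, chunk_text) pairs.

    Chunks never cross the chapter boundary because each call sees
    exactly one chapter. The word limit of 400 is an assumed default.
    """
    chapter_id = f"chapter-{roman_to_int(label):02d}"
    words = text.split()
    if len(words) <= word_limit:
        return [(chapter_id, " ".join(words))]
    parts = [words[i:i + word_limit] for i in range(0, len(words), word_limit)]
    return [
        (f"{chapter_id}-part-{n:03d}", " ".join(part))
        for n, part in enumerate(parts, start=1)
    ]
```

Because the IDs depend only on the chapter label and the part index, re-running intake on the same source with the same word limit yields the same IDs, which is what chunk-level resume needs.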

### T03 - Scale-aware generation planning

```task
id: IB-WP-0016-T03
status: todo
priority: high
state_hub_task_id: "bee5c38a-f052-4edb-9313-b3a2ee5a6c26"
```

- Add compact plan output for long sources
- Report estimated chunks, workflow stages, provider call count, prompt word or
  token estimate, and rough cost inputs
- Add CLI selection filters such as `--chapter`, `--chunk`, `--from-chapter`,
  `--to-chapter`, `--max-calls`, and `--cost-cap`
- Keep full prompt inspection available, but do not make it the default for
  large corpora
- Add tests proving plan output is compact and does not dump hundreds of prompts
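
The arithmetic behind a compact plan is small enough to sketch. Everything here is an assumption for illustration: the `PlanEstimate` fields, one provider call per chunk per stage, a flat tokens-per-word factor, and a single price per 1,000 prompt tokens. Real provider pricing and the generator's actual stage fan-out may differ.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanEstimate:
    chunks: int
    stages: int
    provider_calls: int
    prompt_tokens: int
    estimated_cost: float
    within_caps: bool


def estimate_plan(chunk_word_counts, stages, price_per_1k_tokens,
                  tokens_per_word=1.3, max_calls=None, cost_cap=None):
    """Summarize a run as counts and a rough cost instead of full prompts.

    Assumes every stage sends each chunk's text once; prompt templates
    and completion tokens are ignored, so this is a lower-bound sketch.
    """
    calls = len(chunk_word_counts) * len(stages)
    tokens = math.ceil(sum(chunk_word_counts) * tokens_per_word) * len(stages)
    cost = tokens / 1000 * price_per_1k_tokens
    within_caps = (max_calls is None or calls <= max_calls) and (
        cost_cap is None or cost <= cost_cap
    )
    return PlanEstimate(
        len(chunk_word_counts), len(stages), calls, tokens, cost, within_caps
    )
```

Under the one-call-per-chunk-per-stage assumption, the 155 chunks from the validation baseline and the four stages named in T05 (summary, entities, relations, evaluation) give 155 × 4 = 620 provider calls, which is the scale at which dumping every prompt stops being reviewable.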

### T04 - Trading-literature profile

```task
id: IB-WP-0016-T04
status: todo
priority: medium
state_hub_task_id: "1a1b8fde-773f-46a6-887a-3c87a425d7a3"
```

- Add or specialize a profile for trading memoir and market-structure texts
- Tune entity prompts for traders, markets, strategies, errors, psychological
  patterns, institutions, instruments, and evidence-bearing claims
- Tune relation prompts for cause/effect, lesson/evidence, risk/mitigation,
  actor/venue, and strategy/outcome links
- Tune evaluation criteria for groundedness, lesson clarity, historical context,
  and overgeneralization risk
- Keep the generic profile usable for non-trading books

### T05 - Deterministic Lefevre acceptance fixture

```task
id: IB-WP-0016-T05
status: todo
priority: high
state_hub_task_id: "c9bbc84e-691b-4530-a79a-6ecfa9c41fdd"
```

- Add a small checked-in EPUB-like or extracted chapter fixture derived from
  public-domain Lefevre structure
- Add deterministic fixture responses for source summary, entity extraction,
  relation extraction, and evaluation
- Prove the fixture generates a manifest-backed infospace with stable source,
  entity, relation, evaluation, metrics, history, and report artifacts
- Include a regression test for excluding Gutenberg boilerplate when requested
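
A deterministic provider stand-in could be as simple as a lookup table. The class below is a hypothetical sketch: `FixtureProvider`, its `complete(stage, chunk_id)` signature, and the stage names are assumptions, not the existing provider interface.

```python
class FixtureProvider:
    """Serve canned responses keyed by (stage, chunk_id); never use the network."""

    def __init__(self, responses):
        self._responses = dict(responses)
        self.calls = []  # recorded so tests can assert which calls were made

    def complete(self, stage, chunk_id):
        key = (stage, chunk_id)
        self.calls.append(key)
        if key not in self._responses:
            # Failing loudly keeps a missing fixture from silently going live.
            raise LookupError(f"no fixture response for {key}")
        return self._responses[key]
```

Raising on an unknown key, instead of falling back to a real provider, is the design choice that keeps the acceptance test deterministic and offline.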

### T06 - OpenRouter live-run guardrails

```task
id: IB-WP-0016-T06
status: todo
priority: high
state_hub_task_id: "c6bf97c3-1c2c-4993-8f4f-97a48e01cce2"
```

- Add an optional live smoke test path that is skipped unless credentials and an
  explicit opt-in environment variable are present
- Support a one-chapter OpenRouter run with selected model, bounded retries,
  cost/call cap, provider metadata, and resume
- Record provider model, request IDs, timing, usage, and retry counts in run
  records and generated artifact provenance
- Document how to run the smoke safely and how to stop before a full-book build
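
The opt-in gate and the call/cost cap might look like the sketch below. The variable name `INFOSPACE_BENCH_LIVE_SMOKE` is hypothetical, `OPENROUTER_API_KEY` is assumed to be where the credential lives, and `CallBudget` is an illustrative helper, not an existing class.

```python
import os

OPT_IN_VAR = "INFOSPACE_BENCH_LIVE_SMOKE"  # hypothetical opt-in switch
API_KEY_VAR = "OPENROUTER_API_KEY"         # assumed credential variable


def live_smoke_skip_reason(env=None):
    """Return why the live smoke run must be skipped, or None if it may run."""
    env = os.environ if env is None else env
    if env.get(OPT_IN_VAR) != "1":
        return f"set {OPT_IN_VAR}=1 to opt in to live provider calls"
    if not env.get(API_KEY_VAR):
        return f"{API_KEY_VAR} is not set"
    return None


class CallBudget:
    """Stop a run before it exceeds an explicit call or cost cap."""

    def __init__(self, max_calls, cost_cap):
        self.max_calls = max_calls
        self.cost_cap = cost_cap
        self.calls = 0
        self.cost = 0.0

    def charge(self, cost):
        """Record one finished provider call and its reported cost."""
        self.calls += 1
        self.cost += cost

    def allows_next_call(self):
        """True while both the call count and the spend are under their caps."""
        return self.calls < self.max_calls and self.cost < self.cost_cap
```

In a pytest suite the reason string can feed `pytest.mark.skipif(live_smoke_skip_reason() is not None, reason=...)`, so the default run reports the live test as skipped with an actionable message instead of failing.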

### T07 - Example output and review policy

```task
id: IB-WP-0016-T07
status: todo
priority: medium
state_hub_task_id: "5ff1f11e-49ad-4c2d-bd4c-b8cc261309bc"
```

- Define where generated Lefevre outputs live
- Decide what is committed, what remains disposable, and what needs human review
- Add a review checklist for duplicate entities, relation endpoints, weak
  evidence, and over-broad trading lessons
- Add a final readiness report before generating the full book
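
Two of the checklist items, duplicate entities and relation endpoints, lend themselves to mechanical pre-checks before human review. The sketch below assumes entities are dicts with `id` and `name` keys and relations are dicts with `source` and `target` keys; those shapes are hypothetical, and name normalization only surfaces candidates, it does not decide merges.

```python
import re
import unicodedata
from collections import defaultdict


def normalize_name(name):
    """Fold case, accents, and punctuation so near-identical names collide."""
    folded = unicodedata.normalize("NFKD", name)
    ascii_only = folded.encode("ascii", "ignore").decode("ascii")
    return " ".join(re.sub(r"[^a-z0-9]+", " ", ascii_only.lower()).split())


def duplicate_candidates(entities):
    """Group entity ids whose normalized names match; singletons are dropped."""
    groups = defaultdict(list)
    for entity in entities:
        groups[normalize_name(entity["name"])].append(entity["id"])
    return {name: ids for name, ids in groups.items() if len(ids) > 1}


def dangling_relations(entities, relations):
    """Return relations whose source or target id is not a known entity."""
    known = {entity["id"] for entity in entities}
    return [
        rel for rel in relations
        if rel["source"] not in known or rel["target"] not in known
    ]
```

Weak evidence and over-broad lessons still need a reviewer; these helpers only shrink the list a reviewer has to read.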

## Acceptance

- Current local EPUB can be inspected as EPUB3 with metadata and ordered body
  sections
- `generate init` can import the book as body-only ordered chapter chunks
- Chunk titles and IDs are stable, readable, and not dominated by Project
  Gutenberg boilerplate
- `generate plan` gives compact cost/call planning for the full book
- A deterministic Lefevre-style fixture generates a complete infospace without
  network access
- Optional one-chapter OpenRouter smoke run is explicit, bounded, resumable, and
  skipped by default
- A full-book run has documented review and output policy before execution