Workplans to actually create infospaces

2026-05-14 17:46:48 +02:00
parent 3de72eb0d2
commit 448b432942
2 changed files with 315 additions and 0 deletions

@@ -0,0 +1,158 @@
---
id: IB-WP-0013
type: workplan
title: "Wealth VSM Generation Pipeline Parity"
domain: markitect
repo: infospace-bench
status: planned
owner: markitect
topic_slug: markitect
created: "2026-05-14"
updated: "2026-05-14"
state_hub_workstream_slug: "ib-wp-0013-wealth-vsm-generation-pipeline-parity"
state_hub_workstream_id: "74dc579e-9b03-4a00-b739-84b1007cfb94"
---
# IB-WP-0013 - Wealth VSM Generation Pipeline Parity
## Goal
Make `infospace-bench` capable of regenerating the Adam Smith
`Wealth of Nations` / VSM infospace through explicit, auditable workflows.
This should replace the old `markitect-project` generation path without
copying its hidden provider calls, implicit output conventions, or monolithic
`process` command shape.
## Intent
The legacy implementation could run a chapter corpus through:
- entity extraction
- VSM mapping
- chapter-level analysis synthesis
- entity evaluation
- classification and relation enrichment
- collection metrics
The successor should express those stages as declared infospace workflows with
deterministic planning, fake-adapter tests, explicit assisted-generation
requests, stable manifest registration, and clear provenance.
## Non-Goals
- Recreate the old `process_chapters.py` script as-is.
- Hide provider-specific LLM calls behind a generic command.
- Require a live provider or network access for default tests.
- Commit the full regenerated Wealth/VSM output before a one-chapter pilot is
proven.
- Move durable runtime, retrieval, or audit responsibilities into
`infospace-bench`; those remain `kontextual-engine` concerns.
## Tasks
### T01 - Legacy pipeline decomposition and corpus map
```task
id: IB-WP-0013-T01
status: todo
priority: high
state_hub_task_id: "2c558d1e-290f-4e0e-abe6-37302cc31ac4"
```
- Map the stages of the legacy
`examples/infospace-with-history/process_chapters.py` script
- Inventory old templates: `extract-entities`, `map-to-vsm`,
`synthesize-analysis`, `evaluate-entity`, and `assess-metrics`
- Inventory source corpus, guidelines, VSM reference artifacts, generated
outputs, processing logs, and metrics files
- Record what must be migrated, reframed, delegated, deferred, or retired
- Pick the first one-chapter golden target, preferably Book I Chapter III so it
aligns with the current pruned legacy slice
### T02 - Assisted generation adapter and CLI boundary
```task
id: IB-WP-0013-T02
status: todo
priority: high
state_hub_task_id: "70beb49c-49a3-49f4-9b3a-a4c5bdb88485"
```
- Extend workflow execution so assisted stages can be executed through an
explicit adapter selected by the caller
- Keep dry-run planning as the default safe path
- Add a deterministic fake adapter for tests
- Persist assisted requests, provider metadata, generated outputs, and run
records
- Expose CLI/API behavior without embedding provider-specific code in core
workflow logic
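
The adapter boundary described in this task could look roughly like the sketch below. It is an illustration only: `AssistedRequest`, `AssistedResult`, `AssistedAdapter`, `FakeAdapter`, and `run_stage` are hypothetical names, not existing `infospace-bench` APIs. The sketch assumes that omitting the adapter means dry-run planning and that the fake adapter derives its output purely from the request.

```python
from dataclasses import dataclass, field
from hashlib import sha256
from typing import Optional, Protocol


@dataclass(frozen=True)
class AssistedRequest:
    """Hypothetical record of one assisted-stage request."""
    workflow: str
    stage: str
    prompt: str


@dataclass(frozen=True)
class AssistedResult:
    """Hypothetical record of what an adapter returned."""
    output: str
    provider: str
    metadata: dict = field(default_factory=dict)


class AssistedAdapter(Protocol):
    """Anything with a matching generate() can act as an adapter."""
    def generate(self, request: AssistedRequest) -> AssistedResult: ...


class FakeAdapter:
    """Deterministic test adapter: output depends only on the request."""

    def generate(self, request: AssistedRequest) -> AssistedResult:
        seed = f"{request.workflow}/{request.stage}/{request.prompt}"
        digest = sha256(seed.encode("utf-8")).hexdigest()[:12]
        return AssistedResult(output=f"fake-output-{digest}", provider="fake")


def run_stage(request: AssistedRequest,
              adapter: Optional[AssistedAdapter] = None) -> dict:
    """Plan only, unless the caller explicitly passes an adapter."""
    if adapter is None:
        return {"mode": "dry-run", "request": request, "result": None}
    return {"mode": "executed", "request": request,
            "result": adapter.generate(request)}
```

Because the core function only sees the protocol, a provider-backed adapter could live outside core workflow logic and be injected by the CLI. The returned dictionary stands in for whatever run record the real implementation persists.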
### T03 - Entity bundle splitting and manifest registration
```task
id: IB-WP-0013-T03
status: todo
priority: high
state_hub_task_id: "4a340077-f0ab-40fe-a0bc-0fa94a325774"
```
- Parse generated chapter-level entity bundles into individual entity artifacts
- Normalize stable artifact IDs and filenames
- Register each artifact in `artifacts/index.yaml`
- Preserve source chapter, workflow, stage, provider, and input provenance
- Make reruns idempotent: unchanged artifacts should not duplicate manifest
entries
- Add tests for malformed bundles, duplicate entities, and manifest updates
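
A minimal sketch of the split-and-register step follows. The bundle format is an assumption: each entity is taken to start at a `## Name` heading, which may not match the real generated output. The manifest is modeled as an in-memory dictionary rather than `artifacts/index.yaml`, and `slugify`, `split_bundle`, `register`, and the `artifacts/entities/` path are hypothetical.

```python
import re
from hashlib import sha256


def slugify(name: str) -> str:
    """Lowercase the name and collapse non-alphanumerics to single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def split_bundle(bundle: str) -> dict:
    """Split a bundle into {entity_name: body}, assuming '## Name' headings."""
    entities = {}
    current = None
    for line in bundle.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            if not current:
                raise ValueError("entity heading without a name")
            if current in entities:
                raise ValueError(f"duplicate entity: {current}")
            entities[current] = ""
        elif current is not None:
            entities[current] += line + "\n"
    if not entities:
        raise ValueError("no entities found in bundle")
    return entities


def register(manifest: dict, chapter: str, name: str, body: str) -> bool:
    """Upsert one artifact entry; return True only if the manifest changed."""
    artifact_id = f"{chapter}-{slugify(name)}"
    entry = {
        "path": f"artifacts/entities/{artifact_id}.md",
        "source_chapter": chapter,
        "digest": sha256(body.encode("utf-8")).hexdigest(),
    }
    if manifest.get(artifact_id) == entry:
        return False  # unchanged artifact: rerun is a no-op
    manifest[artifact_id] = entry
    return True
```

Keying the manifest by a stable `artifact_id` and comparing content digests is one way to make reruns idempotent. A real entry would also carry the workflow, stage, provider, and input provenance listed above.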
### T04 - VSM mapping, analysis, and evaluation workflows
```task
id: IB-WP-0013-T04
status: todo
priority: high
state_hub_task_id: "62696191-d6fa-4d34-bf18-97f390a31b61"
```
- Recreate `map-to-vsm` as an explicit assisted workflow
- Recreate `synthesize-analysis` as an explicit assisted workflow
- Recreate entity evaluation as an explicit assisted workflow that writes
evaluation files using successor `artifact_id` semantics
- Ensure generated mappings and relations can be parsed by current semantic
models or clearly identify required model extensions
- Connect generated evaluations to metrics/history and viability checks
### T05 - Wealth/VSM pilot and scale-up acceptance
```task
id: IB-WP-0013-T05
status: todo
priority: medium
state_hub_task_id: "fe8dd175-9630-4fe1-99aa-2f3e58172a52"
```
- Prove one-chapter regeneration end to end with deterministic tests
- Add a committed pilot report comparing regenerated successor output with the
legacy generated output shape
- Add docs for running a live provider-backed generation outside the default
test suite
- Document cost, rate-limit, resume, and reproducibility guidance
- Define the acceptance path for scaling from one chapter to the full corpus
## Acceptance
- A user can inspect, plan, and run the Wealth/VSM generation workflow over a
one-chapter pilot without using the old `markitect-project` process script
- Default tests use fake adapters and are deterministic
- Generated entities are split into stable files and registered in the manifest
- Evaluation outputs use successor `artifact_id` semantics and feed the
metrics history
- The workflow clearly distinguishes deterministic template stages from
assisted provider-backed stages
- Remaining full-corpus risks are documented before any large generation run
## Relationship To IB-WP-0014
This workplan can start on the current local-folder backend. It should avoid
hard-coding storage assumptions where reasonable, but it is not blocked by the
backend abstraction workplan.

@@ -0,0 +1,157 @@
---
id: IB-WP-0014
type: workplan
title: "Infospace Backend Abstraction"
domain: markitect
repo: infospace-bench
status: planned
owner: markitect
topic_slug: markitect
created: "2026-05-14"
updated: "2026-05-14"
state_hub_workstream_slug: "ib-wp-0014-infospace-backend-abstraction"
state_hub_workstream_id: "c2d23ee7-6b2b-4db0-b660-a9e295c94956"
---
# IB-WP-0014 - Infospace Backend Abstraction
## Goal
Allow an infospace to live behind a selectable backend instead of assuming only
a local filesystem directory.
Target backends:
- local folder
- remote or mounted folder
- S3-compatible bucket/prefix
- git repository
This is a new successor capability, not legacy parity. It should be designed so
generation, validation, evaluation, and inspection logic do not care where the
infospace is physically stored.
## Intent
The current repo is intentionally file-backed. That should remain the default.
The improvement is to formalize the storage boundary so the same lifecycle and
workflow APIs can operate on other backing stores through explicit adapters.
The design should keep `infospace-bench` as an application workspace, not a
durable storage engine. Credentials, remote locking, rich audit, and runtime
orchestration should be delegated or integrated carefully rather than invented
inside core application logic.
## Non-Goals
- Replace the existing local folder behavior.
- Require S3 or git dependencies for ordinary local use.
- Store secrets in `infospace.yaml`.
- Build a general database, sync server, or object storage service inside this
repo.
- Solve multi-writer conflict resolution beyond clear detection and reporting
in the first pass.
## Tasks
### T01 - Backend contract and URI model
```task
id: IB-WP-0014-T01
status: todo
priority: high
state_hub_task_id: "75b7df31-066a-47ac-bb94-a4ae908569fd"
```
- Define a backend-neutral infospace location model
- Support local paths without changing current user flows
- Define URI examples for local, mounted folder, S3-compatible, and git-backed
infospaces
- Define backend capabilities: read, write, list, exists, atomic write,
digest, version, sync, lock, and credentials-required
- Document where credentials and remote configuration are allowed to live
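
One possible shape for the location model and capability set is sketched below. The URI schemes (`file://`, `s3://`, `git+...`), the `Location` type, `parse_location`, and the per-backend capability assignments are all illustrative assumptions, not decisions recorded in this workplan. Strings without `://` are treated as plain local paths so current user flows stay unchanged.

```python
from dataclasses import dataclass
from enum import Flag, auto
from urllib.parse import urlsplit


class Capability(Flag):
    """The capabilities named in this task, as combinable flags."""
    READ = auto()
    WRITE = auto()
    LIST = auto()
    EXISTS = auto()
    ATOMIC_WRITE = auto()
    DIGEST = auto()
    VERSION = auto()
    SYNC = auto()
    LOCK = auto()
    CREDENTIALS_REQUIRED = auto()


@dataclass(frozen=True)
class Location:
    """Backend-neutral infospace location (hypothetical model)."""
    backend: str
    target: str


def parse_location(value: str) -> Location:
    """Map a path or URI to a backend name and a backend-specific target."""
    if "://" not in value:
        return Location("folder", value)  # plain local or mounted path
    parts = urlsplit(value)
    if parts.scheme == "file":
        return Location("folder", parts.path)
    if parts.scheme == "s3":
        return Location("s3", f"{parts.netloc}{parts.path}")
    if parts.scheme.startswith("git+"):
        return Location("git", value[len("git+"):])
    raise ValueError(f"unsupported infospace location: {value}")


_BASE = Capability.READ | Capability.WRITE | Capability.LIST | Capability.EXISTS

# Illustrative assignments only; the real contract is still to be defined.
CAPABILITIES = {
    "folder": _BASE | Capability.ATOMIC_WRITE | Capability.DIGEST,
    "s3": _BASE | Capability.DIGEST | Capability.VERSION
          | Capability.CREDENTIALS_REQUIRED,
    "git": _BASE | Capability.DIGEST | Capability.VERSION | Capability.SYNC,
}
```

Callers could then check, for example, `Capability.LOCK in CAPABILITIES[location.backend]` before relying on a feature. Note that no credentials appear in the location itself, in line with the non-goal of storing secrets in `infospace.yaml`.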
### T02 - Local and remote folder backend baseline
```task
id: IB-WP-0014-T02
status: todo
priority: high
state_hub_task_id: "2e33d98a-0cd0-4608-b7a1-76c5a7bb26ca"
```
- Refactor lifecycle reads and writes behind a backend adapter while preserving
current `Path`-based behavior
- Keep local folders as the default backend
- Treat mounted or remote folders as folder backends when the OS exposes them
as paths
- Add tests proving current pilots and CLI commands still work unchanged
- Add tests for backend errors such as missing files, write failures, and
unsafe paths
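
The folder baseline could be sketched as a thin wrapper over `pathlib.Path`, as below. `FolderBackend` and `BackendError` are hypothetical names. The sketch assumes that "unsafe paths" means relative paths that resolve outside the infospace root, and it uses a temp-file-plus-rename pattern as one way to approximate atomic writes on a local filesystem.

```python
import os
import tempfile
from pathlib import Path


class BackendError(Exception):
    """Single error type surfaced to callers (hypothetical)."""


class FolderBackend:
    def __init__(self, root):
        self.root = Path(root).resolve()

    def _resolve(self, relative: str) -> Path:
        """Resolve a path and reject anything that escapes the root."""
        candidate = (self.root / relative).resolve()
        if candidate != self.root and self.root not in candidate.parents:
            raise BackendError(f"unsafe path: {relative}")
        return candidate

    def read_text(self, relative: str) -> str:
        try:
            return self._resolve(relative).read_text(encoding="utf-8")
        except FileNotFoundError as exc:
            raise BackendError(f"missing file: {relative}") from exc

    def write_text(self, relative: str, content: str) -> None:
        target = self._resolve(relative)
        tmp_name = None
        try:
            target.parent.mkdir(parents=True, exist_ok=True)
            # Write to a sibling temp file, then rename over the target.
            fd, tmp_name = tempfile.mkstemp(dir=target.parent)
            with os.fdopen(fd, "w", encoding="utf-8") as handle:
                handle.write(content)
            os.replace(tmp_name, target)
        except OSError as exc:
            if tmp_name and os.path.exists(tmp_name):
                os.unlink(tmp_name)
            raise BackendError(f"write failed: {relative}") from exc
```

A mounted or remote folder that the OS exposes as a path would go through the same class, although rename atomicity on network filesystems is not guaranteed and would need to be documented as a limitation.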
### T03 - S3 object-store backend adapter
```task
id: IB-WP-0014-T03
status: todo
priority: high
state_hub_task_id: "e2ee9497-0a6c-419f-a045-fb994bf73b05"
```
- Design an optional S3-compatible backend adapter
- Use a fake in-memory or local test double for default tests
- Keep real credentials and network calls out of the default test suite
- Define object key layout for manifests, artifacts, reports, exports, and run
records
- Decide how digests, optimistic concurrency, and partial writes are reported
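
As a starting point for the test double, an in-memory object store with digest-based optimistic concurrency might look like the sketch below. Nothing here calls a real S3 API. `InMemoryObjectStore`, `ConflictError`, `object_key`, and the `prefix/kind/name` key layout are assumptions made for illustration.

```python
from hashlib import sha256

# Hypothetical top-level key segments, taken from the list in this task.
KINDS = ("manifests", "artifacts", "reports", "exports", "runs")


class ConflictError(Exception):
    """Raised when the stored object changed since the caller last read it."""


def object_key(prefix: str, kind: str, name: str) -> str:
    """Build an object key as '<prefix>/<kind>/<name>'."""
    if kind not in KINDS:
        raise ValueError(f"unknown object kind: {kind}")
    return f"{prefix.strip('/')}/{kind}/{name}"


class InMemoryObjectStore:
    """Fake object store for default tests; no network, no credentials."""

    def __init__(self):
        self._objects = {}

    def digest(self, key: str):
        """Return the SHA-256 hex digest of the object, or None if absent."""
        data = self._objects.get(key)
        return None if data is None else sha256(data).hexdigest()

    def put(self, key: str, data: bytes, expected_digest=None) -> str:
        """Write an object; if expected_digest is given, it must still match."""
        if expected_digest is not None and self.digest(key) != expected_digest:
            raise ConflictError(key)
        self._objects[key] = data
        return sha256(data).hexdigest()

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def list(self, prefix: str) -> list:
        return sorted(k for k in self._objects if k.startswith(prefix))
```

Passing the digest from the last read as `expected_digest` turns a lost-update race into an explicit `ConflictError`, which fits the first-pass goal of clear detection and reporting rather than conflict resolution.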
### T04 - Git repository backend adapter
```task
id: IB-WP-0014-T04
status: todo
priority: high
state_hub_task_id: "e2938c5b-e6c2-468a-b782-b39962e5a81b"
```
- Support opening or initializing an infospace backed by a git repository
- Prove behavior against local test repositories before any remote network
workflow
- Define when commits are created, when they are only suggested, and how dirty
trees are reported
- Keep automatic commits opt-in
- Preserve compatibility with the existing State Hub and workplan workflow
### T05 - Backend CLI docs and migration path
```task
id: IB-WP-0014-T05
status: todo
priority: medium
state_hub_task_id: "20d75d49-f62a-4236-a895-698cd2fae45a"
```
- Expose backend selection in CLI/API docs
- Add examples for local, mounted folder, S3-compatible, and git-backed
infospaces
- Document backend capabilities and limitations
- Add a migration guide for moving a local infospace to another backend
- Update acceptance docs so backend support is distinct from Wealth/VSM
generation parity
## Acceptance
- Existing local-folder behavior remains backward compatible
- Lifecycle, validation, inspection, workflow, metrics, history, and graph
commands can operate through the backend contract
- Default tests remain deterministic and do not require network credentials
- Backend-specific capabilities and failure modes are visible to callers
- S3 and git support are optional and clearly documented
- Storage backend concerns stay separate from generation workflow semantics
## Relationship To IB-WP-0013
`IB-WP-0013` should prove generation parity on the default local backend first.
This workplan then makes the same infospace operations portable across storage
backends.