From 448b4329421b2f7ae78250280fcec056e3fd8588 Mon Sep 17 00:00:00 2001
From: tegwick
Date: Thu, 14 May 2026 17:46:48 +0200
Subject: [PATCH] Workplans to actually create infospaces

---
 ...3-wealth-vsm-generation-pipeline-parity.md | 158 ++++++++++++++++++
 ...B-WP-0014-infospace-backend-abstraction.md | 157 +++++++++++++++++
 2 files changed, 315 insertions(+)
 create mode 100644 workplans/IB-WP-0013-wealth-vsm-generation-pipeline-parity.md
 create mode 100644 workplans/IB-WP-0014-infospace-backend-abstraction.md

diff --git a/workplans/IB-WP-0013-wealth-vsm-generation-pipeline-parity.md b/workplans/IB-WP-0013-wealth-vsm-generation-pipeline-parity.md
new file mode 100644
index 0000000..1fcae84
--- /dev/null
+++ b/workplans/IB-WP-0013-wealth-vsm-generation-pipeline-parity.md
@@ -0,0 +1,158 @@
+---
+id: IB-WP-0013
+type: workplan
+title: "Wealth VSM Generation Pipeline Parity"
+domain: markitect
+repo: infospace-bench
+status: planned
+owner: markitect
+topic_slug: markitect
+created: "2026-05-14"
+updated: "2026-05-14"
+state_hub_workstream_slug: "ib-wp-0013-wealth-vsm-generation-pipeline-parity"
+state_hub_workstream_id: "74dc579e-9b03-4a00-b739-84b1007cfb94"
+---
+
+# IB-WP-0013 - Wealth VSM Generation Pipeline Parity
+
+## Goal
+
+Make `infospace-bench` capable of regenerating the Adam Smith
+`Wealth of Nations` / VSM infospace through explicit, auditable workflows.
+
+This should replace the old `markitect-project` generation path without
+copying its hidden provider calls, implicit output conventions, or monolithic
+`process` command shape.
+
+## Intent
+
+The legacy implementation could run a chapter corpus through:
+
+- entity extraction
+- VSM mapping
+- chapter-level analysis synthesis
+- entity evaluation
+- classification and relation enrichment
+- collection metrics
+
+The successor should express those stages as declared infospace workflows with
+deterministic planning, fake-adapter tests, explicit assisted-generation
+requests, stable manifest registration, and clear provenance.
+
+## Non-Goals
+
+- Recreate the old `process_chapters.py` script as-is.
+- Hide provider-specific LLM calls behind a generic command.
+- Require a live provider or network access for default tests.
+- Commit the full regenerated Wealth/VSM output before a one-chapter pilot is
+  proven.
+- Move durable runtime, retrieval, or audit responsibilities into
+  `infospace-bench`; those remain `kontextual-engine` concerns.
+
+## Tasks
+
+### T01 - Legacy pipeline decomposition and corpus map
+
+```task
+id: IB-WP-0013-T01
+status: todo
+priority: high
+state_hub_task_id: "2c558d1e-290f-4e0e-abe6-37302cc31ac4"
+```
+
+- Map legacy `examples/infospace-with-history/process_chapters.py`
+- Inventory old templates: `extract-entities`, `map-to-vsm`,
+  `synthesize-analysis`, `evaluate-entity`, and `assess-metrics`
+- Inventory source corpus, guidelines, VSM reference artifacts, generated
+  outputs, processing logs, and metrics files
+- Record what must be migrated, reframed, delegated, deferred, or retired
+- Pick the first one-chapter golden target, preferably Book I Chapter III so it
+  aligns with the current pruned legacy slice
+
+### T02 - Assisted generation adapter and CLI boundary
+
+```task
+id: IB-WP-0013-T02
+status: todo
+priority: high
+state_hub_task_id: "70beb49c-49a3-49f4-9b3a-a4c5bdb88485"
+```
+
+- Extend workflow execution so assisted stages can be executed through an
+  explicit adapter selected by the caller
+- Keep dry-run planning as the default safe path
+- Add a deterministic fake adapter for tests
+- Persist assisted requests, provider metadata, generated outputs, and run
+  records
+- Expose CLI/API behavior without embedding provider-specific code in core
+  workflow logic
+
+### T03 - Entity bundle splitting and manifest registration
+
+```task
+id: IB-WP-0013-T03
+status: todo
+priority: high
+state_hub_task_id: "4a340077-f0ab-40fe-a0bc-0fa94a325774"
+```
+
+- Parse generated chapter-level entity bundles into individual entity artifacts
+- Normalize stable artifact IDs and filenames
+- Register each artifact in `artifacts/index.yaml`
+- Preserve source chapter, workflow, stage, provider, and input provenance
+- Make reruns idempotent: unchanged artifacts should not duplicate manifest
+  entries
+- Add tests for malformed bundles, duplicate entities, and manifest updates
+
+### T04 - VSM mapping analysis and evaluation workflows
+
+```task
+id: IB-WP-0013-T04
+status: todo
+priority: high
+state_hub_task_id: "62696191-d6fa-4d34-bf18-97f390a31b61"
+```
+
+- Recreate `map-to-vsm` as an explicit assisted workflow
+- Recreate `synthesize-analysis` as an explicit assisted workflow
+- Recreate entity evaluation as an explicit assisted workflow that writes
+  successor `artifact_id` evaluation files
+- Ensure generated mappings and relations can be parsed by current semantic
+  models or clearly identify required model extensions
+- Connect generated evaluations to metrics/history and viability checks
+
+### T05 - Wealth VSM pilot scale-up acceptance
+
+```task
+id: IB-WP-0013-T05
+status: todo
+priority: medium
+state_hub_task_id: "fe8dd175-9630-4fe1-99aa-2f3e58172a52"
+```
+
+- Prove one-chapter regeneration end to end with deterministic tests
+- Add a committed pilot report comparing regenerated successor output with the
+  legacy generated output shape
+- Add docs for running a live provider-backed generation outside the default
+  test suite
+- Document cost, rate-limit, resume, and reproducibility guidance
+- Define the acceptance path for scaling from one chapter to the full corpus
+
+## Acceptance
+
+- A user can inspect, plan, and run the Wealth/VSM generation workflow over a
+  one-chapter pilot without using the old `markitect-project` process script
+- Default tests use fake adapters and are deterministic
+- Generated entities are split into stable files and registered in the manifest
+- Evaluation outputs use successor `artifact_id` semantics and feed metrics
+  history
+- The workflow clearly distinguishes deterministic template stages from
+  assisted provider-backed stages
+- Remaining full-corpus risks are documented before any large generation run
+
+## Relationship To IB-WP-0014
+
+This workplan can start on the current local-folder backend. It should avoid
+hard-coding storage assumptions where reasonable, but it is not blocked by the
+backend abstraction workplan.
+
diff --git a/workplans/IB-WP-0014-infospace-backend-abstraction.md b/workplans/IB-WP-0014-infospace-backend-abstraction.md
new file mode 100644
index 0000000..e58c3ad
--- /dev/null
+++ b/workplans/IB-WP-0014-infospace-backend-abstraction.md
@@ -0,0 +1,157 @@
+---
+id: IB-WP-0014
+type: workplan
+title: "Infospace Backend Abstraction"
+domain: markitect
+repo: infospace-bench
+status: planned
+owner: markitect
+topic_slug: markitect
+created: "2026-05-14"
+updated: "2026-05-14"
+state_hub_workstream_slug: "ib-wp-0014-infospace-backend-abstraction"
+state_hub_workstream_id: "c2d23ee7-6b2b-4db0-b660-a9e295c94956"
+---
+
+# IB-WP-0014 - Infospace Backend Abstraction
+
+## Goal
+
+Allow an infospace to live behind a selectable backend instead of assuming only
+a local filesystem directory.
+
+Target backends:
+
+- local folder
+- remote or mounted folder
+- S3-compatible bucket/prefix
+- git repository
+
+This is a new successor capability, not legacy parity. It should be designed so
+generation, validation, evaluation, and inspection logic do not care where the
+infospace is physically stored.
+
+## Intent
+
+The current repo is intentionally file-backed. That should remain the default.
+The improvement is to formalize the storage boundary so the same lifecycle and
+workflow APIs can operate on other backing stores through explicit adapters.
+
+The design should keep `infospace-bench` as an application workspace, not a
+durable storage engine. Credentials, remote locking, rich audit, and runtime
+orchestration should be delegated or integrated carefully rather than invented
+inside core application logic.
+
+## Non-Goals
+
+- Replace the existing local folder behavior.
+- Require S3 or git dependencies for ordinary local use.
+- Store secrets in `infospace.yaml`.
+- Build a general database, sync server, or object storage service inside this
+  repo.
+- Solve multi-writer conflict resolution beyond clear detection and reporting
+  in the first pass.
+
+## Tasks
+
+### T01 - Backend contract and URI model
+
+```task
+id: IB-WP-0014-T01
+status: todo
+priority: high
+state_hub_task_id: "75b7df31-066a-47ac-bb94-a4ae908569fd"
+```
+
+- Define a backend-neutral infospace location model
+- Support local paths without changing current user flows
+- Define URI examples for local, mounted folder, S3-compatible, and git-backed
+  infospaces
+- Define backend capabilities: read, write, list, exists, atomic write,
+  digest, version, sync, lock, and credentials-required
+- Document where credentials and remote configuration are allowed to live
+
+### T02 - Local and remote folder backend baseline
+
+```task
+id: IB-WP-0014-T02
+status: todo
+priority: high
+state_hub_task_id: "2e33d98a-0cd0-4608-b7a1-76c5a7bb26ca"
+```
+
+- Refactor lifecycle reads and writes behind a backend adapter while preserving
+  current `Path`-based behavior
+- Keep local folders as the default backend
+- Treat mounted or remote folders as folder backends when the OS exposes them
+  as paths
+- Add tests proving current pilots and CLI commands still work unchanged
+- Add tests for backend errors such as missing files, write failures, and
+  unsafe paths
+
+### T03 - S3 object-store backend adapter
+
+```task
+id: IB-WP-0014-T03
+status: todo
+priority: high
+state_hub_task_id: "e2ee9497-0a6c-419f-a045-fb994bf73b05"
+```
+
+- Design an optional S3-compatible backend adapter
+- Use a fake in-memory or local test double for default tests
+- Keep real credentials and network calls out of the default test suite
+- Define object key layout for manifests, artifacts, reports, exports, and run
+  records
+- Decide how digests, optimistic concurrency, and partial writes are reported
+
+### T04 - Git repository backend adapter
+
+```task
+id: IB-WP-0014-T04
+status: todo
+priority: high
+state_hub_task_id: "e2938c5b-e6c2-468a-b782-b39962e5a81b"
+```
+
+- Support opening or initializing an infospace backed by a git repository
+- Prove behavior against local test repositories before any remote network
+  workflow
+- Define when commits are created, when they are only suggested, and how dirty
+  trees are reported
+- Keep automatic commits opt-in
+- Preserve compatibility with the existing State Hub and workplan workflow
+
+### T05 - Backend CLI docs and migration path
+
+```task
+id: IB-WP-0014-T05
+status: todo
+priority: medium
+state_hub_task_id: "20d75d49-f62a-4236-a895-698cd2fae45a"
+```
+
+- Expose backend selection in CLI/API docs
+- Add examples for local, mounted folder, S3-compatible, and git-backed
+  infospaces
+- Document backend capabilities and limitations
+- Add a migration guide for moving a local infospace to another backend
+- Update acceptance docs so backend support is distinct from Wealth/VSM
+  generation parity
+
+## Acceptance
+
+- Existing local-folder behavior remains backward compatible
+- Lifecycle, validation, inspection, workflow, metrics, history, and graph
+  commands can operate through the backend contract
+- Default tests remain deterministic and do not require network credentials
+- Backend-specific capabilities and failure modes are visible to callers
+- S3 and git support are optional and clearly documented
+- Storage backend concerns stay separate from generation workflow semantics
+
+## Relationship To IB-WP-0013
+
+`IB-WP-0013` should prove generation parity on the default local backend first.
+This workplan then makes the same infospace operations portable across storage
+backends.
+