# SCOPE

> This file helps humans and agents quickly understand what this repository is
> for, when it is relevant, and where its boundaries are.

---

## One-liner

File-backed application workspace and current CLI for creating, developing,
evaluating, inspecting, exporting, archiving, and iterating concrete structured
knowledge spaces.

---

## Core Idea

`infospace-bench` turns the infospace ideas that emerged in `markitect-main`
into a focused successor project. It hosts real infospaces, their manifests,
profiles, workflow declarations, deterministic fixtures, generation runs,
inspection outputs, budget records, archive records, exports, and pilot reports.

The strategic layer model remains:

- `markitect-tool` owns the syntax layer: markdown structure, structured
  markdown validation, document transformations, and reusable Markitect
  contracts.
- `kontextual-engine` owns the system layer: durable persistence,
  orchestration, permissions, retrieval, workflow runtime, and audit concerns.
- `infospace-bench` owns the application layer: concrete applied knowledge
  spaces, their lifecycle, evaluation methodology, and reference pilots.

Current supporting integrations include:

- `artifact-store` for durable content-addressed packages, archive identity,
  storage, retention, and archive backend concerns.
- `llm-connect` for reusable provider-routing primitives and quality-ledger
  policy mechanics.

These supporting integrations do not redefine the strategic layer ownership.
The default operating mode is file-backed and inspectable. Optional integrations
are explicit, reviewable adapters rather than hidden infrastructure drift.

`infospace-bench` should also act as a reference environment for applied
knowledge-engineering practice: concrete pilots, reviewable outputs, reusable
patterns, and clear evidence of what should move into lower layers.

---

## Primary Actors

- Knowledge engineers and developers building structured content systems
- Researchers and practitioners organizing or analyzing domain knowledge
- Automation systems (`atm`) executing scoped knowledge workflows
- LLM agents (`agt`) helping generate, review, and refine infospaces
- Human reviewers deciding when generated artifacts, metrics, and archives are
  good enough to trust or preserve

---

## In Scope

- Defining infospaces as first-class, manifest-backed project artifacts
- Populating infospaces from local sources, EPUB-like inputs, profiles, and
  domain-specific workflow templates
- Running deterministic fixture workflows and explicit live provider workflows
  for generation, extraction, relation mapping, evaluation, and reports
- Reviewing, pruning, revising, refining, and exporting knowledge artifacts as
  infospaces evolve
- Evaluating entity quality, collection quality, viability thresholds, metrics
  history, and plan-vs-actual generation behavior
- Inspecting entities, relations, artifact graphs, provenance, workflow runs,
  provider metadata, budget records, and archive records
- Capturing reusable applied patterns that may later move into lower-layer repos
- Maintaining reference pilots that make abstract infospace concepts concrete
- Documenting best-practice evidence for applied knowledge engineering without
  turning this repo into the reusable infrastructure layer
- Planning and recording one-way syncs from file-backed artifacts into an engine
  adapter while keeping the local manifest authoritative
- Archiving reviewed infospace snapshots through `artifact-store` without making
  archives a substitute for the working folder or git
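
The one-way sync item above can be pictured as a planning step that compares manifest entries with what an engine adapter already holds and only reports the resulting actions. This is a minimal sketch, assuming the manifest and the adapter store can each be reduced to a mapping of artifact id to content digest. `SyncAction`, `plan_sync`, and both mapping shapes are hypothetical names for illustration, not the code in `src/infospace_bench/`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncAction:
    artifact_id: str
    action: str  # "create", "update", or "skip"


def plan_sync(manifest: dict[str, str], engine: dict[str, str]) -> list[SyncAction]:
    """Plan a one-way sync from manifest entries to an engine store.

    Both arguments map artifact id -> content digest (hypothetical shape).
    Nothing is written. Entries that exist only in the engine are ignored,
    which is this sketch's reading of "the local manifest stays authoritative".
    """
    actions = []
    for artifact_id in sorted(manifest):
        if artifact_id not in engine:
            actions.append(SyncAction(artifact_id, "create"))
        elif engine[artifact_id] != manifest[artifact_id]:
            actions.append(SyncAction(artifact_id, "update"))
        else:
            actions.append(SyncAction(artifact_id, "skip"))
    return actions
```

Because the function returns a plan instead of applying it, the plan itself can be recorded and reviewed before any adapter write happens.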

---

## Out of Scope

- Low-level markdown parsing, schema syntax primitives, or rendering engines
- Generic persistence infrastructure, retrieval systems, permissions, audit, or
  workflow orchestration platforms
- Artifact storage backends, retention-policy implementation, replication, or
  backup operations
- General content management, publishing, or WYSIWYG editing
- Reusable provider-routing policy engines or cross-repo LLM infrastructure
- Secret management for provider keys, archive backends, or engine deployments
- Silent coupling to a single LLM vendor or runtime
- Final ownership of production domain artifacts once a dedicated domain repo
  should take over

---

## Relevant When

- A real corpus, book, project, or organization needs an explicit infospace
- Knowledge artifacts need systematic evaluation and iteration history
- Relationship structure, provenance, and quality metrics need inspection over
  time
- Agent-assisted knowledge development needs scoped project context
- A MarkiTect infospace experiment needs to be migrated, pruned, compared, or
  reimplemented
- Generation work needs deterministic fixture runs, guarded live model runs,
  routing observations, and budget evidence in one inspectable workspace
- A reviewed infospace milestone needs a content-addressed archive package

---

## Not Relevant When

- The work is only markdown syntax manipulation
- The work is engine/runtime infrastructure or durable memory persistence
- The work is only artifact-store backend, retention, or storage operations
- A finalized domain repository should own the production artifact
- A few simple documents only need ordinary editing
- A live provider run is being attempted without budget planning, review gates,
  or explicit secrets supplied outside the repo

---

## Current State

- Status: implemented application-layer workspace with a Python CLI, test suite,
  reference docs, committed pilots, and deterministic fixtures
- Package entry point: `infospace-bench` / `python3 -m infospace_bench`
- Service posture: `INTENT.md` still treats "service" as a strategic maturity
  target, but this repo does not currently ship a standalone server or API
  service; the implemented surface is file-backed workspaces plus the CLI
- Current CLI surface: lifecycle, artifact add/export/validate, readiness
  status, entity and relation listing, metrics/checks, history diffs, viability,
  graph export, workflow inspect/plan/run, source generation, routing ledger
  summaries, budget rollups, archive/restore, and engine sync planning
- Current infospaces:
  - `bootstrap-pilot`
  - `wealth-vsm-legacy-slice`
  - `wealth-vsm-generation-pilot`
  - `agentic-memory-profile-pilot`
  - `lefevre-reminiscences-of-a-stock-operator`
  - `patterns-of-it-securita-architecture`
- Current profiles: bundled `general-knowledge` and `trading-literature`, with
  the Lefevre infospace carrying a checked-in trading-literature profile copy
- Current provider posture: fixture runs are deterministic by default;
  OpenRouter and routed live runs are explicit, budget-aware, and guarded by
  environment-provided credentials
- Current archive posture: `infospace-bench archive`, `archive-list`, and
  `restore` integrate with `artifact-store` for reviewed snapshots
- Current engine posture: local file-backed manifests remain authoritative;
  engine sync is dry-run by default and currently uses an inspectable local
  adapter store
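
The provider posture above (deterministic fixtures by default, live runs only when explicit, budgeted, and credentialed from the environment) can be sketched as a small guard. This is an illustration under stated assumptions, not the CLI's actual implementation: `resolve_run_mode`, `LiveRunRefused`, and the variable name `OPENROUTER_API_KEY` are hypothetical.

```python
import os
from typing import Mapping, Optional


class LiveRunRefused(RuntimeError):
    """Raised when a live provider run lacks a credential or a budget."""


def resolve_run_mode(
    live: bool = False,
    budget_usd: Optional[float] = None,
    env: Optional[Mapping[str, str]] = None,
    key_var: str = "OPENROUTER_API_KEY",  # assumed variable name
) -> str:
    """Return "fixture" unless a live run is explicitly requested and guarded."""
    env = os.environ if env is None else env
    if not live:
        return "fixture"  # deterministic default, no credentials needed
    if not env.get(key_var):
        raise LiveRunRefused(f"{key_var} is not set in the environment")
    if budget_usd is None or budget_usd <= 0:
        raise LiveRunRefused("a positive budget is required for live runs")
    return "live"
```

The credential is read only from the environment mapping, so nothing secret has to live in the repo, and omitting the `live` flag always falls back to the fixture path.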

---

## Important Boundaries

- `artifacts/index.yaml` is the authoritative manifest for an infospace in this
  repo.
- Service or API behavior must be added explicitly; it should not be inferred
  from the strategic service wording in `INTENT.md`.
- Generated outputs, budget records, metrics, workflow runs, and reports are
  evidence for review; they do not silently become durable engine state.
- Live LLM output is review material. Scaling from a one-chapter or bounded run
  to a larger corpus requires explicit planning and human review.
- Archives are immutable evidence packages. Use git for in-flight working state
  and artifact-store archives for milestone preservation.
- Successful applied patterns may inform `markitect-tool`, `kontextual-engine`,
  `artifact-store`, or `llm-connect`, but this repo should not absorb their
  reusable infrastructure responsibilities.
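
To make the "immutable evidence package" boundary concrete, the sketch below shows what content addressing means for a folder snapshot: the identifier is derived only from relative paths and file bytes, so any change yields a different identifier. It uses SHA-256 from the Python standard library as an assumption. It is not `artifact-store`'s actual algorithm or package format, and `snapshot_digest` is a hypothetical helper.

```python
import hashlib
from pathlib import Path


def snapshot_digest(root: Path) -> str:
    """Return a SHA-256 hex digest over relative paths and file bytes."""
    hasher = hashlib.sha256()
    files = [p for p in root.rglob("*") if p.is_file()]
    # Sort by POSIX-style relative path so the order is stable across runs.
    files.sort(key=lambda p: p.relative_to(root).as_posix())
    for path in files:
        rel = path.relative_to(root).as_posix().encode("utf-8")
        data = path.read_bytes()
        # Length-prefix each field so different path/content splits cannot collide.
        hasher.update(len(rel).to_bytes(8, "big") + rel)
        hasher.update(len(data).to_bytes(8, "big") + data)
    return hasher.hexdigest()
```

Two folders with identical relative paths and bytes produce the same digest, while editing any file produces a new one, which is why an archived milestone can be treated as fixed evidence while git tracks in-flight changes.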

---

## Getting Oriented

- Start with: `README.md`, `INTENT.md`, and this file
- Product framing: `wiki/ProductRequirementsDocument.md`,
  `wiki/FunctionalRequirementsSpecification.md`
- Layout and lifecycle: `docs/infospace-layout.md`,
  `docs/evaluation-and-inspection.md`, `docs/entity-relation-model.md`
- Generation and pilots: `docs/generic-source-generator.md`,
  `docs/wealth-vsm-generation-pipeline.md`,
  `docs/agentic-memory-profile-pilot.md`, `docs/lefevre-readiness.md`
- Integrations and boundaries: `docs/markitect-tool-adapter.md`,
  `docs/kontextual-engine-boundary.md`, `docs/archive-integration.md`,
  `docs/routing-config.md`, `docs/replacement-readiness-decision.md`
- Code map: `src/infospace_bench/`
- Pilots: `infospaces/`
- Tests: `tests/`
- Workplans: `workplans/`
- State Hub and session rules: `AGENTS.md`, `CLAUDE.md`, and
  `.claude/rules/`