diff --git a/roadmap/infospace-tooling/viable-information-spaces.md b/roadmap/infospace-tooling/viable-information-spaces.md new file mode 100644 index 00000000..bc1178c9 --- /dev/null +++ b/roadmap/infospace-tooling/viable-information-spaces.md @@ -0,0 +1,381 @@ +# Viable Information Spaces + +*A preliminary introduction to the concepts, structure, and purpose of +viable information spaces as a framework for structured knowledge work.* + +--- + +## What is an Information Space? + +An information space is a curated collection of concepts — each precisely +defined, grounded in source material, and connected to the others — that +together explain a topic. It is not a database, not a knowledge graph in +the technical sense, and not a document collection. It is closer to what +a domain expert carries in their head: a working vocabulary of ideas, +their relationships, and the judgment to know which idea applies where. + +The difference is that an information space makes this vocabulary +**explicit, evaluable, and composable**. Every concept has a written +definition. Every relationship can be traced. The quality of the whole +collection can be measured and improved over time. + +We use the term **infospace** as shorthand. + +--- + +## Why "Viable"? + +The word comes from Stafford Beer's Viable System Model, but the idea +generalises beyond it. A viable system is one that can maintain a +separate existence — it is complete enough to function, coherent enough +to hold together, and adaptive enough to improve when circumstances +change. + +A **viable infospace** has the same properties: + +- **Complete enough** — it covers the topic well enough to answer the + questions it was built to answer. Not every detail, but every concept + that matters. +- **Coherent enough** — its concepts connect into an explanatory web, + not a disconnected list. You can trace how one idea leads to another. +- **Consistent enough** — concepts don't contradict each other. 
Terms + are used the same way throughout. Definitions don't go in circles. +- **Balanced enough** — concepts operate at comparable levels of + abstraction. The infospace doesn't mix foundational theories with + trivial observations without acknowledging the difference. +- **Non-redundant enough** — each concept earns its place. Two concepts + that mean the same thing should be one concept. + +None of these are absolute. "Enough" is defined by the purpose. An +infospace built for teaching needs different coverage than one built for +research. Viability is a profile of scores against thresholds that the +user sets. + +--- + +## The Anatomy of an Infospace + +### Topic + +Every infospace is built to explain something specific. The **topic** is +the subject matter: a text, a system, a body of knowledge, a problem +domain. In our first example, the topic is Adam Smith's *The Wealth of +Nations* — the economic ideas contained in that specific work. + +A topic sits within a broader **domain** (economics, biology, software +engineering) but is more focused. The domain provides context; the topic +provides the source material from which concepts are extracted. + +### Entities + +The atomic units of an infospace are its **entities** — the individual +concepts, mechanisms, and observations that constitute its vocabulary. +Each entity has: + +- A **name** and unique identifier +- A **definition** — precise, non-circular, distinguishable from + neighbouring concepts +- **Provenance** — where it came from (which chapter, passage, or data + source) +- A **domain placement** — which area of the topic it belongs to +- **Quality scores** — how well it is defined, grounded, and connected + +Entities are stored as individual files, one concept per file. This makes +them independently addressable, diffable, and composable. 
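The entity attributes listed above can be pictured as a small record type. The following is a minimal sketch, not the project's actual data model: the class name `Entity`, every field name, the `to_markdown` layout, and the example values (including the score) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Entity:
    """One concept in an infospace. All field names here are hypothetical."""
    entity_id: str          # unique identifier
    name: str
    definition: str         # precise, non-circular definition
    provenance: str         # chapter, passage, or data source
    domain_placement: str   # area of the topic the concept belongs to
    quality_scores: Dict[str, float] = field(default_factory=dict)

    def to_markdown(self) -> str:
        """Render the entity as a one-concept-per-file markdown document."""
        lines = [
            f"# {self.name}",
            "",
            f"- id: {self.entity_id}",
            f"- provenance: {self.provenance}",
            f"- domain: {self.domain_placement}",
            "",
            "## Definition",
            "",
            self.definition,
        ]
        if self.quality_scores:
            lines += ["", "## Quality scores", ""]
            lines += [
                f"- {key}: {value:.2f}"
                for key, value in sorted(self.quality_scores.items())
            ]
        return "\n".join(lines) + "\n"


# Illustrative entity; the definition wording and the score are made up.
example = Entity(
    entity_id="division-of-labour",
    name="Division of Labour",
    definition=(
        "The splitting of a production process into distinct tasks, "
        "each performed by a specialised worker."
    ),
    provenance="Book I, Chapter 1",
    domain_placement="production",
    quality_scores={"definition_precision": 0.8},
)
```

Calling `example.to_markdown()` yields the text of a single entity file, which is what makes each concept independently addressable and diffable.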
+ +### Schemas + +**Schemas** define what a well-formed entity looks like: which sections +it must have, what validation rules apply, what quality metrics are +evaluated. A schema is not code — it is a markdown document that both +humans and LLMs read as instructions. + +Schemas serve two purposes: + +1. **Structural** — they tell the extraction pipeline what to produce + (required sections, word count ranges, heading formats) +2. **Evaluative** — they define quality rubrics against which each entity + is scored (definition precision, source grounding, explanatory value) + +By changing a schema, you change what the infospace considers "good" +without changing any infrastructure. + +### Disciplines + +Here is where things get interesting. An infospace doesn't just catalogue +what's in the source material — it looks at the source through a +**lens**. We call this lens a **discipline**: a structured framework of +concepts from another domain, applied to illuminate the topic at hand. + +In our example, the discipline is Stafford Beer's Viable System Model — +a set of concepts from systems theory (System 1 through System 5, +recursion, variety, viability) applied to the economic ideas in Smith's +work. The VSM provides the analytical structure; Smith provides the raw +material. + +The key insight: **a discipline is itself an infospace.** The VSM +concepts (S1-S5, recursion, variety, algedonic signals) form their own +curated, evaluable collection of ideas. To use the VSM as a discipline, +it must first be a viable infospace in its own right — its concepts must +be well-defined, coherent, and complete. + +This leads to a recursive property: infospaces can be built on top of +other infospaces. The Wealth of Nations infospace, viewed through the +VSM lens, could itself become a discipline applied to analyse a modern +supply chain. Each layer adds structure without losing the detail +beneath it. 
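The recursive property can be sketched as a type that optionally refers to another value of the same type. This is an assumption-laden illustration only: the `Infospace` class, its `discipline` field, and the `lens_chain` helper are hypothetical names, and a real infospace would carry entities and schemas as well.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Infospace:
    """Hypothetical minimal model: a name plus an optional discipline."""
    name: str
    discipline: Optional["Infospace"] = None  # a discipline is itself an infospace

    def lens_chain(self) -> List[str]:
        """Names of the disciplines beneath this infospace, nearest first."""
        chain: List[str] = []
        current = self.discipline
        while current is not None:
            chain.append(current.name)
            current = current.discipline
        return chain


# The layering described in the text, built bottom-up.
vsm = Infospace("Viable System Model")
smith = Infospace("The Wealth of Nations", discipline=vsm)
supply_chain = Infospace("Modern supply chain", discipline=smith)
```

Here `supply_chain.lens_chain()` returns `["The Wealth of Nations", "Viable System Model"]`, showing that each layer keeps a path back to the detail beneath it.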
+ +--- + +## How Infospaces Are Built + +Building an infospace is an incremental process with four repeating +phases: + +### 1. Extract + +Source material is processed one unit at a time (a chapter, a document, +a dataset). For each unit, an LLM extracts entities according to the +schemas and guidelines. Entities that already exist are recognised and +skipped — the infospace grows by accumulation, not duplication. + +### 2. Map + +Extracted entities are mapped to the discipline. In our example, each +economic concept is mapped to a VSM system with a strength rating and +rationale. This is where the discipline lens does its work: it forces +the question "what role does this concept play in the larger system?" + +### 3. Evaluate + +After extraction and mapping, the infospace is evaluated at two levels: + +- **Per-entity**: each concept is scored against quality rubrics. Is the + definition precise? Is it grounded in the source? Does it connect + meaningfully to the discipline? +- **Collection-level**: the set of concepts is assessed for redundancy, + coverage, coherence, consistency, and granularity balance. + +Evaluation produces structured, machine-readable scores — not prose +narratives. These scores are tracked over time. + +### 4. Refine + +Evaluation reveals what needs improvement. Redundant concepts are merged +or archived. Coverage gaps are addressed by re-extracting with improved +guidelines. Inconsistencies are resolved by clarifying definitions. +Guidelines and schemas are updated. The cycle repeats. + +This loop — extract, map, evaluate, refine — is the heartbeat of a +viable infospace. Each iteration makes the infospace more viable: +more complete, more coherent, more consistent. + +--- + +## How Infospaces Are Evaluated + +Quality is assessed through two complementary mechanisms: + +### LLM Evaluation + +A language model reads an entity (or a pair of entities) and judges it +against defined rubrics. 
This captures qualitative aspects that can't be +computed mechanically: Is this definition actually precise? Does this +mapping rationale make sense? Are these two concepts really different? + +LLM evaluation is always **delegated** — it runs through prompt templates +and the platform's LLM integration, never through the human or agent +working on infrastructure. This separation keeps domain judgment in the +problem space. + +### Deterministic Aggregation + +Structured scores from LLM evaluation, plus metrics computed directly +from files (section counts, word lengths, graph properties, similarity +matrices), are aggregated into collection-level indicators. These are +numbers that can be tracked, diffed, and plotted: + +- **Redundancy ratio** — what fraction of concepts substantially overlap +- **Coverage ratio** — what fraction of the domain-discipline matrix is + populated +- **Graph density** — how connected the concept web is +- **Cycle count** — how many circular definition chains exist +- **Granularity entropy** — how balanced the abstraction levels are + +These indicators, compared against user-defined thresholds, determine +whether the infospace is **viable** for its intended purpose. + +--- + +## Five Concerns of Collection Quality + +Individual concept quality (is this definition good?) is necessary but +not sufficient. An infospace made of individually excellent concepts can +still fail as a collection. Five concerns capture what can go wrong: + +### Redundancy + +Do two concepts mean the same thing? Overlap wastes the reader's +attention and creates ambiguity about which concept to use. Redundancy is +detected through embedding similarity (are the definitions close in +meaning?) confirmed by LLM judgment (are they genuinely the same +concept, or merely related?). + +### Coverage + +Does the concept set cover the domain? Are there areas of the topic that +have no corresponding concepts? 
Coverage is assessed structurally (which +cells in the domain-discipline matrix are empty?) and functionally (can +the infospace answer the questions it was built to answer?). + +### Coherence + +Do the concepts form a connected web of explanations, or a fragmented +list of isolated ideas? Coherence is measured through graph analysis: +connected components (is everything reachable?), modularity (are there +meaningful clusters?), and bridge concepts (which ideas connect different +areas?). + +### Consistency + +Are concepts defined in terms of each other without contradiction? Are +there circular definition chains? Do definitions use terms that should +be concepts but aren't? Consistency is checked through dependency graph +analysis (cycles, undefined terms) and LLM pairwise judgment +(do related definitions contradict each other?). + +### Granularity Balance + +Are concepts at comparable levels of abstraction? An infospace that mixes +broad theoretical principles with narrow observations — without +acknowledging the difference — confuses more than it explains. Balance +is assessed by classifying each concept's abstraction level and measuring +the distribution. + +--- + +## Infospaces as Organisms + +The biological metaphor is deliberate. A viable organism maintains its +identity while exchanging material with its environment. It has internal +coherence (its parts work together), boundary integrity (it is +distinguishable from its surroundings), and adaptive capacity (it +responds to change). + +Infospaces exhibit the same properties: + +- **Internal coherence** — concepts connect and support each other +- **Boundary** — the topic and discipline define what belongs and what + doesn't +- **Adaptation** — evaluation and refinement allow the infospace to + improve + +And like organisms, infospaces don't exist in isolation. + +### Hierarchical Composition + +One infospace can serve as a discipline for another. 
The VSM infospace +provides the lens for the Wealth of Nations infospace, which could +provide the lens for a supply chain infospace. Each layer adds structure +and interpretive power. This is analogous to biological organisation: +cells compose into tissues, tissues into organs, organs into organisms. + +For this to work, the lower-level infospace must be viable — you can't +build reliable analysis on a shaky foundation. A discipline that is +incomplete or inconsistent will produce unreliable mappings. + +### Network Composition + +Infospaces can also relate laterally. Two infospaces at the same level +might share concepts, reference each other's entities, or provide +complementary views of overlapping domains. A Wealth of Nations infospace +and a Marx's Capital infospace might share economic entities while +differing in their analytical discipline. + +This networked structure mirrors how knowledge actually works: fields +overlap, vocabularies are shared and contested, and understanding grows +by connecting islands of well-organised thought. + +### Swarm Behaviour + +When many infospaces exist and interact, emergent properties appear. +Common entities across many infospaces become well-tested through +repeated evaluation in different contexts. Concepts that survive across +multiple disciplines are more likely to be fundamental. Gaps visible from +one perspective may be filled by insights from another. + +This is speculative territory for now, but the tooling should be designed +with it in mind: infospaces as first-class, composable, addressable +units of knowledge. + +--- + +## The Role of Tooling + +An infospace is a living artefact that requires ongoing maintenance. The +tooling must support every phase of the lifecycle: + +### Creating an infospace + +Declaring a topic, binding disciplines, defining schemas and competency +questions, setting viability thresholds. This should be a single +configuration step, not a programming exercise. 
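As a rough sketch of what such a single configuration step might contain, the example below declares a topic, disciplines, schemas, competency questions, and viability thresholds, then compares collection-level indicators against those thresholds. None of this is the actual tooling: `InfospaceConfig`, `Threshold`, `failing_indicators`, the file path, the competency question, and all numeric values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Threshold:
    """Acceptable range for one indicator; either bound may be omitted."""
    minimum: Optional[float] = None
    maximum: Optional[float] = None

    def accepts(self, value: float) -> bool:
        if self.minimum is not None and value < self.minimum:
            return False
        if self.maximum is not None and value > self.maximum:
            return False
        return True


@dataclass
class InfospaceConfig:
    topic: str
    disciplines: List[str]
    schemas: List[str]
    competency_questions: List[str]
    thresholds: Dict[str, Threshold]


def failing_indicators(
    config: InfospaceConfig, indicators: Dict[str, float]
) -> List[str]:
    """Names of thresholded indicators that are missing or out of range."""
    failures = []
    for name, threshold in config.thresholds.items():
        value = indicators.get(name)
        if value is None or not threshold.accepts(value):
            failures.append(name)
    return sorted(failures)


# Hypothetical configuration; every value below is illustrative.
config = InfospaceConfig(
    topic="The Wealth of Nations",
    disciplines=["Viable System Model"],
    schemas=["schemas/entity.md"],
    competency_questions=["How does the division of labour raise output?"],
    thresholds={
        "redundancy_ratio": Threshold(maximum=0.10),
        "coverage_ratio": Threshold(minimum=0.70),
        "cycle_count": Threshold(maximum=0),
    },
)

# Made-up indicator values, as an evaluation run might report them.
measured = {"redundancy_ratio": 0.05, "coverage_ratio": 0.60, "cycle_count": 0}
problems = failing_indicators(config, measured)
```

With these made-up numbers, `problems` is `["coverage_ratio"]`: the infospace would not yet count as viable for its stated purpose, and the failing indicator names where to refine. Treating a missing indicator as a failure is a design choice of this sketch, so that an unevaluated infospace is never reported as viable by default.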
+ +### Populating an infospace + +Processing source material through the extract-map pipeline, one unit at +a time. Progress is tracked. Each addition is committed to version +history. + +### Evaluating an infospace + +Running per-entity and collection-level checks. Producing structured, +machine-readable scores. Comparing against viability thresholds. +Identifying specific issues (this entity is redundant, this domain gap +needs filling, these definitions contradict). + +### Refining an infospace + +Acting on evaluation results: archiving redundant entities, re-extracting +with improved guidelines, updating schemas, re-evaluating. Every change +is traceable. + +### Composing infospaces + +Binding one infospace as a discipline for another. Checking that the +discipline is viable. Propagating changes when the discipline's concepts +are updated. + +### Monitoring an infospace + +Tracking metrics over time. Seeing how coverage, coherence, and +consistency evolve as content is added. Detecting regressions when a +re-extraction reduces quality. + +The tooling should present these operations as simple, well-documented +commands — not as infrastructure details. The user thinks in terms of +"evaluate my infospace" and "check for redundancy", not in terms of +embedding vectors and graph algorithms. + +--- + +## Where We Are + +We have built the first example infospace: 85 economic entities from +Adam Smith's *The Wealth of Nations*, mapped to Beer's Viable System +Model, with schemas, prompt templates, and a chapter-by-chapter +pipeline. + +This example has taught us what works (incremental extraction, +deduplication, flat canonical entity sets, transclusion views) and what's +missing (per-concept evaluation, collection-level checks, composition +model, clean tooling commands). 
+ +The work ahead is to generalise from this example: build the platform +capabilities needed, create the tooling layer that makes infospace +operations accessible, and then revisit the example as both a validation +and a tutorial. + +The goal is that anyone with a body of source material and an analytical +framework can create a viable infospace — and that infospaces, once +built, become reusable intellectual tools for future work.