docs: preliminary introduction to Viable Information Spaces

Conceptual overview of infospaces as structured, evaluable, composable knowledge collections. Establishes the vocabulary (topic, discipline, entity, viability), the build cycle (extract, map, evaluate, refine), the five collection quality concerns, and the composition model (hierarchical, networked, swarm).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 23:54:53 +01:00
parent 4ce856d4d0
commit b5e994b014
1 changed files with 381 additions and 0 deletions
--- a/roadmap/infospace-tooling/viable-information-spaces.md
+++ b/roadmap/infospace-tooling/viable-information-spaces.md
@@ -0,0 +1,381 @@
+# Viable Information Spaces
+
+*A preliminary introduction to the concepts, structure, and purpose of
+viable information spaces as a framework for structured knowledge work.*
+
+---
+
+## What is an Information Space?
+
+An information space is a curated collection of concepts — each precisely
+defined, grounded in source material, and connected to the others — that
+together explain a topic. It is not a database, not a knowledge graph in
+the technical sense, and not a document collection. It is closer to what
+a domain expert carries in their head: a working vocabulary of ideas,
+their relationships, and the judgment to know which idea applies where.
+
+The difference is that an information space makes this vocabulary
+**explicit, evaluable, and composable**. Every concept has a written
+definition. Every relationship can be traced. The quality of the whole
+collection can be measured and improved over time.
+
+We use the term **infospace** as shorthand.
+
+---
+
+## Why "Viable"?
+
+The word comes from Stafford Beer's Viable System Model, but the idea
+generalises beyond it. A viable system is one that can maintain a
+separate existence — it is complete enough to function, coherent enough
+to hold together, and adaptive enough to improve when circumstances
+change.
+
+A **viable infospace** has analogous properties:
+
+- **Complete enough** — it covers the topic well enough to answer the
+  questions it was built to answer. Not every detail, but every concept
+  that matters.
+- **Coherent enough** — its concepts connect into an explanatory web,
+  not a disconnected list. You can trace how one idea leads to another.
+- **Consistent enough** — concepts don't contradict each other. Terms
+  are used the same way throughout. Definitions don't go in circles.
+- **Balanced enough** — concepts operate at comparable levels of
+  abstraction. The infospace doesn't mix foundational theories with
+  trivial observations without acknowledging the difference.
+- **Non-redundant enough** — each concept earns its place. Two concepts
+  that mean the same thing should be one concept.
+
+None of these are absolute. "Enough" is defined by the purpose. An
+infospace built for teaching needs different coverage than one built for
+research. Viability is a profile of scores against thresholds that the
+user sets.
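
A minimal sketch of "a profile of scores against thresholds", assuming scores are plain numbers keyed by concern. The `Threshold` class, the concern names, and every numeric value are hypothetical illustrations, not part of any existing tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """A user-set limit for one concern (hypothetical structure)."""
    limit: float
    higher_is_better: bool = True


def viability_report(scores: dict[str, float],
                     thresholds: dict[str, Threshold]) -> dict[str, bool]:
    """Return pass/fail per concern; a missing score counts as a failure."""
    report = {}
    for concern, threshold in thresholds.items():
        score = scores.get(concern)
        if score is None:
            report[concern] = False
        elif threshold.higher_is_better:
            report[concern] = score >= threshold.limit
        else:
            report[concern] = score <= threshold.limit
    return report


def is_viable(scores: dict[str, float],
              thresholds: dict[str, Threshold]) -> bool:
    """An infospace is viable for a purpose when every concern passes."""
    return all(viability_report(scores, thresholds).values())


# Illustrative numbers only: the same scores judged against two purposes.
teaching = {"coverage": Threshold(0.80),
            "redundancy": Threshold(0.10, higher_is_better=False)}
research = {"coverage": Threshold(0.95),
            "redundancy": Threshold(0.05, higher_is_better=False)}
scores = {"coverage": 0.85, "redundancy": 0.08}

print(is_viable(scores, teaching))  # True
print(is_viable(scores, research))  # False
```

The same score profile passes one purpose and fails the other, which is the sense in which "enough" depends on what the infospace is for.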
+
+---
+
+## The Anatomy of an Infospace
+
+### Topic
+
+Every infospace is built to explain something specific. The **topic** is
+the subject matter: a text, a system, a body of knowledge, a problem
+domain. In our first example, the topic is Adam Smith's *The Wealth of
+Nations* — the economic ideas contained in that specific work.
+
+A topic sits within a broader **domain** (economics, biology, software
+engineering) but is more focused. The domain provides context; the topic
+provides the source material from which concepts are extracted.
+
+### Entities
+
+The atomic units of an infospace are its **entities** — the individual
+concepts, mechanisms, and observations that constitute its vocabulary.
+Each entity has:
+
+- A **name** and unique identifier
+- A **definition** — precise, non-circular, distinguishable from
+  neighbouring concepts
+- **Provenance** — where it came from (which chapter, passage, or data
+  source)
+- A **domain placement** — which area of the topic it belongs to
+- **Quality scores** — how well it is defined, grounded, and connected
+
+Entities are stored as individual files, one concept per file. This makes
+them independently addressable, diffable, and composable.
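
One way to picture an entity in memory is a small record with the five parts listed above. This is a sketch under assumptions: the field names, the `missing_fields` helper, and the example values are hypothetical and do not describe the actual file format.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    # Field names mirror the list above; they are assumptions, not a spec.
    identifier: str
    name: str
    definition: str
    provenance: str
    domain_placement: str
    quality_scores: dict[str, float] = field(default_factory=dict)

    def missing_fields(self) -> list[str]:
        """Names of required text fields that are empty or whitespace-only."""
        required = ["identifier", "name", "definition",
                    "provenance", "domain_placement"]
        return [name for name in required if not getattr(self, name).strip()]


# Illustrative values only; the wording is not taken from the real infospace.
division_of_labour = Entity(
    identifier="division-of-labour",
    name="Division of Labour",
    definition="Splitting a production process into distinct tasks "
               "performed by different workers.",
    provenance="Book I, Chapter 1",
    domain_placement="production",
)

print(division_of_labour.missing_fields())  # []
```

Keeping quality scores on the record but optional reflects that an entity can exist before it has been evaluated.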
+
+### Schemas
+
+**Schemas** define what a well-formed entity looks like: which sections
+it must have, what validation rules apply, what quality metrics are
+evaluated. A schema is not code — it is a markdown document that both
+humans and LLMs read as instructions.
+
+Schemas serve two purposes:
+
+1. **Structural** — they tell the extraction pipeline what to produce
+   (required sections, word count ranges, heading formats)
+2. **Evaluative** — they define quality rubrics against which each entity
+   is scored (definition precision, source grounding, explanatory value)
+
+By changing a schema, you change what the infospace considers "good"
+without changing any infrastructure.
+
+### Disciplines
+
+An infospace doesn't just catalogue what's in the source material. It
+looks at the source through a **lens**. We call this lens a
+**discipline**: a structured framework of concepts from another domain,
+applied to illuminate the topic at hand.
+
+In our example, the discipline is Stafford Beer's Viable System Model —
+a set of concepts from systems theory (System 1 through System 5,
+recursion, variety, viability) applied to the economic ideas in Smith's
+work. The VSM provides the analytical structure; Smith provides the raw
+material.
+
+The key insight: **a discipline is itself an infospace.** The VSM
+concepts (S1-S5, recursion, variety, algedonic signals) form their own
+curated, evaluable collection of ideas. To use the VSM as a discipline,
+it must first be a viable infospace in its own right — its concepts must
+be well-defined, coherent, and complete.
+
+This leads to a recursive property: infospaces can be built on top of
+other infospaces. The Wealth of Nations infospace, viewed through the
+VSM lens, could itself become a discipline applied to analyse a modern
+supply chain. Each layer adds structure without losing the detail
+beneath it.
+
+---
+
+## How Infospaces Are Built
+
+Building an infospace is an incremental process with four repeating
+phases:
+
+### 1. Extract
+
+Source material is processed one unit at a time (a chapter, a document,
+a dataset). For each unit, an LLM extracts entities according to the
+schemas and guidelines. Entities that already exist are recognised and
+skipped — the infospace grows by accumulation, not duplication.
+
+### 2. Map
+
+Extracted entities are mapped to the discipline. In our example, each
+economic concept is mapped to a VSM system with a strength rating and
+rationale. This is where the discipline lens does its work: it forces
+the question "what role does this concept play in the larger system?"
+
+### 3. Evaluate
+
+After extraction and mapping, the infospace is evaluated at two levels:
+
+- **Per-entity**: each concept is scored against quality rubrics. Is the
+  definition precise? Is it grounded in the source? Does it connect
+  meaningfully to the discipline?
+- **Collection-level**: the set of concepts is assessed for redundancy,
+  coverage, coherence, consistency, and granularity balance.
+
+Evaluation produces structured, machine-readable scores — not prose
+narratives. These scores are tracked over time.
+
+### 4. Refine
+
+Evaluation reveals what needs improvement. Redundant concepts are merged
+or archived. Coverage gaps are addressed by re-extracting with improved
+guidelines. Inconsistencies are resolved by clarifying definitions.
+Guidelines and schemas are updated. The cycle repeats.
+
+This loop — extract, map, evaluate, refine — is the heartbeat of a
+viable infospace. Each iteration should make the infospace more viable
+(more complete, more coherent, more consistent), and the tracked scores
+show when it does not.
+
+---
+
+## How Infospaces Are Evaluated
+
+Quality is assessed through two complementary mechanisms:
+
+### LLM Evaluation
+
+A language model reads an entity (or a pair of entities) and judges it
+against defined rubrics. This captures qualitative aspects that can't be
+computed mechanically: Is this definition actually precise? Does this
+mapping rationale make sense? Are these two concepts really different?
+
+LLM evaluation is always **delegated** — it runs through prompt templates
+and the platform's LLM integration, never through the human or agent
+working on infrastructure. This separation keeps domain judgment in the
+problem space.
+
+### Deterministic Aggregation
+
+Structured scores from LLM evaluation, plus metrics computed directly
+from files (section counts, word lengths, graph properties, similarity
+matrices), are aggregated into collection-level indicators. These are
+numbers that can be tracked, diffed, and plotted:
+
+- **Redundancy ratio** — what fraction of concepts substantially overlap
+- **Coverage ratio** — what fraction of the domain-discipline matrix is
+  populated
+- **Graph density** — how connected the concept web is
+- **Cycle count** — how many circular definition chains exist
+- **Granularity entropy** — how balanced the abstraction levels are
+
+These indicators, compared against user-defined thresholds, determine
+whether the infospace is **viable** for its intended purpose.
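
Two of these indicators can be sketched directly. The functions below assume that mappings are available as (domain area, discipline concept) pairs and that the concept web is treated as an undirected simple graph; both representations, and all names, are assumptions made for illustration.

```python
def coverage_ratio(mappings, domain_areas, discipline_concepts):
    """Fraction of (domain area, discipline concept) cells holding an entity."""
    cells = {(area, concept)
             for area in domain_areas
             for concept in discipline_concepts}
    if not cells:
        return 0.0
    # Ignore mappings that fall outside the declared matrix.
    populated = {pair for pair in mappings if pair in cells}
    return len(populated) / len(cells)


def graph_density(node_count, edges):
    """Density of an undirected simple graph: edges / possible edges."""
    if node_count < 2:
        return 0.0
    # Drop self-loops and treat (a, b) and (b, a) as the same edge.
    unique = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    return len(unique) / (node_count * (node_count - 1) / 2)


# Hypothetical 2 x 3 matrix with three populated cells (one mapping repeats).
mappings = [("production", "S1"), ("trade", "S1"),
            ("trade", "S3"), ("production", "S1")]
print(coverage_ratio(mappings, ["production", "trade"], ["S1", "S2", "S3"]))  # 0.5

# Four concepts, two distinct links out of six possible.
links = [("a", "b"), ("b", "c"), ("b", "a")]
print(round(graph_density(4, links), 2))  # 0.33
```

Both functions are deterministic, so their outputs can be committed alongside the entities and compared between runs.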
+
+---
+
+## Five Concerns of Collection Quality
+
+Individual concept quality (is this definition good?) is necessary but
+not sufficient. An infospace made of individually excellent concepts can
+still fail as a collection. Five concerns capture what can go wrong:
+
+### Redundancy
+
+Do two concepts mean the same thing? Overlap wastes the reader's
+attention and creates ambiguity about which concept to use. Redundancy is
+detected through embedding similarity (are the definitions close in
+meaning?) confirmed by LLM judgment (are they genuinely the same
+concept, or merely related?).
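
The first half of that check, embedding similarity, might look like the sketch below. It assumes each definition has already been embedded as a list of floats by some model; the toy vectors, the concept identifiers, and the 0.9 cut-off are hypothetical. The output is only a list of candidate pairs, which would still go to LLM judgment for confirmation.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def redundancy_candidates(embeddings, threshold=0.9):
    """Pairs of entity ids whose definition embeddings are at least
    `threshold` similar; candidates only, not confirmed duplicates."""
    pairs = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(sorted(embeddings.items()), 2):
        similarity = cosine(vec_a, vec_b)
        if similarity >= threshold:
            pairs.append((id_a, id_b, similarity))
    return pairs


# Toy two-dimensional vectors; real embeddings would be much longer.
toy = {
    "concept-a": [1.0, 0.0],
    "concept-b": [3.0, 4.0],
    "concept-c": [2.0, 0.0],
}
print(redundancy_candidates(toy))  # [('concept-a', 'concept-c', 1.0)]
```

Comparing every pair is quadratic in the number of entities, which is unproblematic at the scale of a few hundred concepts.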
+
+### Coverage
+
+Does the concept set cover the domain? Are there areas of the topic that
+have no corresponding concepts? Coverage is assessed structurally (which
+cells in the domain-discipline matrix are empty?) and functionally (can
+the infospace answer the questions it was built to answer?).
+
+### Coherence
+
+Do the concepts form a connected web of explanations, or a fragmented
+list of isolated ideas? Coherence is measured through graph analysis:
+connected components (is everything reachable?), modularity (are there
+meaningful clusters?), and bridge concepts (which ideas connect different
+areas?).
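
As a rough illustration of the first of these measures, connected components can be found with a breadth-first search over the concept graph. The sketch assumes links between concepts are available as undirected pairs of identifiers, which is an assumption about the data rather than a description of it; modularity and bridge detection are not covered here.

```python
from collections import defaultdict, deque


def connected_components(nodes, edges):
    """Group nodes into sets that are mutually reachable via undirected edges."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


# Five hypothetical concepts forming two islands.
concepts = ["a", "b", "c", "d", "e"]
links = [("a", "b"), ("b", "c"), ("d", "e")]
print(len(connected_components(concepts, links)))  # 2
```

A single component means every concept is reachable from every other; more than one points at islands that may need a connecting concept.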
+
+### Consistency
+
+Are concepts defined in terms of each other without contradiction? Are
+there circular definition chains? Do definitions use terms that should
+be concepts but aren't? Consistency is checked through dependency graph
+analysis (cycles, undefined terms) and LLM pairwise judgment
+(do related definitions contradict each other?).
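
The graph half of this check can be sketched with a depth-first search. The code assumes a mapping from each entity identifier to the identifiers its definition relies on; how those dependencies are obtained is left open, and the identifiers are placeholders. Note that this sketch reports one circular chain rather than counting all of them.

```python
def find_cycle(depends_on):
    """Return one circular definition chain as a list of ids, or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {node: WHITE for node in depends_on}
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for dep in depends_on.get(node, []):
            state = colour.get(dep, BLACK)  # undefined terms cannot form cycles
            if state == GREY:
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for node in depends_on:
        if colour[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


def undefined_terms(depends_on):
    """Terms that definitions rely on but that are not entities themselves."""
    return sorted({dep for deps in depends_on.values()
                   for dep in deps if dep not in depends_on})


# Placeholder ids: a -> b -> c -> a is circular; d relies on an undefined x.
dependencies = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["x"]}
print(find_cycle(dependencies))       # ['a', 'b', 'c', 'a']
print(undefined_terms(dependencies))  # ['x']
```

The recursion is fine for collections of a few hundred entities; a much larger graph would call for an iterative traversal.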
+
+### Granularity Balance
+
+Are concepts at comparable levels of abstraction? An infospace that mixes
+broad theoretical principles with narrow observations — without
+acknowledging the difference — confuses more than it explains. Balance
+is assessed by classifying each concept's abstraction level and measuring
+the distribution.
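
One plausible way to measure that distribution is normalised Shannon entropy over the assigned levels. This is a sketch, not the defined metric: the level labels are hypothetical, and whether a high or a low value counts as "balanced enough" is left to the threshold the user sets.

```python
import math
from collections import Counter


def granularity_entropy(levels):
    """Normalised Shannon entropy of abstraction levels.

    0.0 means every concept sits at one level; 1.0 means the concepts are
    spread evenly across the levels that occur.
    """
    counts = Counter(levels)
    if len(counts) < 2:
        return 0.0
    total = len(levels)
    entropy = -sum((count / total) * math.log2(count / total)
                   for count in counts.values())
    return entropy / math.log2(len(counts))


# Hypothetical level labels assigned by a classification step.
print(granularity_entropy(["principle"] * 4))                  # 0.0
print(granularity_entropy(["principle", "principle",
                           "mechanism", "mechanism"]))         # 1.0
```

Normalising by the number of levels that occur keeps the value between 0 and 1, so one threshold can be reused across infospaces with different level schemes.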
+
+---
+
+## Infospaces as Organisms
+
+The biological metaphor is deliberate. A viable organism maintains its
+identity while exchanging material with its environment. It has internal
+coherence (its parts work together), boundary integrity (it is
+distinguishable from its surroundings), and adaptive capacity (it
+responds to change).
+
+Infospaces exhibit the same properties:
+
+- **Internal coherence** — concepts connect and support each other
+- **Boundary** — the topic and discipline define what belongs and what
+  doesn't
+- **Adaptation** — evaluation and refinement allow the infospace to
+  improve
+
+And like organisms, infospaces don't exist in isolation.
+
+### Hierarchical Composition
+
+One infospace can serve as a discipline for another. The VSM infospace
+provides the lens for the Wealth of Nations infospace, which could
+provide the lens for a supply chain infospace. Each layer adds structure
+and interpretive power. This is analogous to biological organisation:
+cells compose into tissues, tissues into organs, organs into organisms.
+
+For this to work, the lower-level infospace must be viable — you can't
+build reliable analysis on a shaky foundation. A discipline that is
+incomplete or inconsistent will produce unreliable mappings.
+
+### Network Composition
+
+Infospaces can also relate laterally. Two infospaces at the same level
+might share concepts, reference each other's entities, or provide
+complementary views of overlapping domains. An infospace for *The Wealth
+of Nations* and one for Marx's *Capital* might share economic entities
+while differing in their analytical discipline.
+
+This networked structure mirrors how knowledge actually works: fields
+overlap, vocabularies are shared and contested, and understanding grows
+by connecting islands of well-organised thought.
+
+### Swarm Behaviour
+
+When many infospaces exist and interact, emergent properties appear.
+Common entities across many infospaces become well-tested through
+repeated evaluation in different contexts. Concepts that survive across
+multiple disciplines are more likely to be fundamental. Gaps visible from
+one perspective may be filled by insights from another.
+
+This is speculative territory for now, but the tooling should be designed
+with it in mind: infospaces as first-class, composable, addressable
+units of knowledge.
+
+---
+
+## The Role of Tooling
+
+An infospace is a living artefact that requires ongoing maintenance. The
+tooling must support every phase of the lifecycle:
+
+### Creating an infospace
+
+Declaring a topic, binding disciplines, defining schemas and competency
+questions, setting viability thresholds. This should be a single
+configuration step, not a programming exercise.
+
+### Populating an infospace
+
+Processing source material through the extract-map pipeline, one unit at
+a time. Progress is tracked. Each addition is committed to version
+history.
+
+### Evaluating an infospace
+
+Running per-entity and collection-level checks. Producing structured,
+machine-readable scores. Comparing against viability thresholds.
+Identifying specific issues (this entity is redundant, this domain gap
+needs filling, these definitions contradict).
+
+### Refining an infospace
+
+Acting on evaluation results: archiving redundant entities, re-extracting
+with improved guidelines, updating schemas, re-evaluating. Every change
+is traceable.
+
+### Composing infospaces
+
+Binding one infospace as a discipline for another. Checking that the
+discipline is viable. Propagating changes when the discipline's concepts
+are updated.
+
+### Monitoring an infospace
+
+Tracking metrics over time. Seeing how coverage, coherence, and
+consistency evolve as content is added. Detecting regressions when a
+re-extraction reduces quality.
+
+The tooling should present these operations as simple, well-documented
+commands — not as infrastructure details. The user thinks in terms of
+"evaluate my infospace" and "check for redundancy", not in terms of
+embedding vectors and graph algorithms.
+
+---
+
+## Where We Are
+
+We have built the first example infospace: 85 economic entities from
+Adam Smith's *The Wealth of Nations*, mapped to Beer's Viable System
+Model, with schemas, prompt templates, and a chapter-by-chapter
+pipeline.
+
+This example has taught us what works (incremental extraction,
+deduplication, flat canonical entity sets, transclusion views) and what's
+missing (per-entity evaluation, collection-level checks, composition
+model, clean tooling commands).
+
+The work ahead is to generalise from this example: build the platform
+capabilities needed, create the tooling layer that makes infospace
+operations accessible, and then revisit the example as both a validation
+and a tutorial.
+
+The goal is that anyone with a body of source material and an analytical
+framework can create a viable infospace — and that infospaces, once
+built, become reusable intellectual tools for future work.