docs: preliminary introduction to Viable Information Spaces

Conceptual overview of infospaces as structured, evaluable, composable
knowledge collections. Establishes the vocabulary (topic, discipline,
entity, viability), the build cycle (extract, map, evaluate, refine),
the five collection quality concerns, and the composition model
(hierarchical, networked, swarm).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 23:54:53 +01:00
parent 4ce856d4d0
commit b5e994b014


@@ -0,0 +1,381 @@
# Viable Information Spaces
*A preliminary introduction to the concepts, structure, and purpose of
viable information spaces as a framework for structured knowledge work.*

---
## What is an Information Space?
An information space is a curated collection of concepts — each precisely
defined, grounded in source material, and connected to the others — that
together explain a topic. It is not a database, not a knowledge graph in
the technical sense, and not a document collection. It is closer to what
a domain expert carries in their head: a working vocabulary of ideas,
their relationships, and the judgment to know which idea applies where.
The difference is that an information space makes this vocabulary
**explicit, evaluable, and composable**. Every concept has a written
definition. Every relationship can be traced. The quality of the whole
collection can be measured and improved over time.
We use the term **infospace** as shorthand.

---
## Why "Viable"?
The word comes from Stafford Beer's Viable System Model, but the idea
generalises beyond it. A viable system is one that can maintain a
separate existence — it is complete enough to function, coherent enough
to hold together, and adaptive enough to improve when circumstances
change.
A **viable infospace** has analogous properties:
- **Complete enough** — it covers the topic well enough to answer the
questions it was built to answer. Not every detail, but every concept
that matters.
- **Coherent enough** — its concepts connect into an explanatory web,
not a disconnected list. You can trace how one idea leads to another.
- **Consistent enough** — concepts don't contradict each other. Terms
are used the same way throughout. Definitions don't go in circles.
- **Balanced enough** — concepts operate at comparable levels of
abstraction. The infospace doesn't mix foundational theories with
trivial observations without acknowledging the difference.
- **Non-redundant enough** — each concept earns its place. Two concepts
that mean the same thing should be one concept.

None of these are absolute. "Enough" is defined by the purpose. An
infospace built for teaching needs different coverage than one built for
research. Viability is a profile of scores against thresholds that the
user sets.

---
## The Anatomy of an Infospace
### Topic
Every infospace is built to explain something specific. The **topic** is
the subject matter: a text, a system, a body of knowledge, a problem
domain. In our first example, the topic is Adam Smith's *The Wealth of
Nations* — the economic ideas contained in that specific work.
A topic sits within a broader **domain** (economics, biology, software
engineering) but is more focused. The domain provides context; the topic
provides the source material from which concepts are extracted.
### Entities
The atomic units of an infospace are its **entities** — the individual
concepts, mechanisms, and observations that constitute its vocabulary.
Each entity has:
- A **name** and unique identifier
- A **definition** — precise, non-circular, distinguishable from
neighbouring concepts
- **Provenance** — where it came from (which chapter, passage, or data
source)
- A **domain placement** — which area of the topic it belongs to
- **Quality scores** — how well it is defined, grounded, and connected
Entities are stored as individual files, one concept per file. This makes
them independently addressable, diffable, and composable.
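As a rough illustration of the fields listed above, an entity could be modelled as a small record that renders itself into the text of its own file. This is only a sketch: the field names, the file layout, and the sample values (including the score) are assumptions made for illustration, not the project's actual format.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """One concept in an infospace (all field names are hypothetical)."""
    entity_id: str
    name: str
    definition: str
    provenance: str
    domain: str
    quality_scores: dict[str, float] = field(default_factory=dict)

    def to_markdown(self) -> str:
        """Render the entity as the text of its own file."""
        lines = [
            f"# {self.name}",
            "",
            f"- id: {self.entity_id}",
            f"- domain: {self.domain}",
            f"- provenance: {self.provenance}",
            "",
            "## Definition",
            "",
            self.definition,
        ]
        if self.quality_scores:
            lines += ["", "## Quality scores", ""]
            lines += [f"- {k}: {v:.2f}" for k, v in sorted(self.quality_scores.items())]
        return "\n".join(lines) + "\n"


# Illustrative values only; the definition wording and the score are made up.
division_of_labour = Entity(
    entity_id="division-of-labour",
    name="Division of Labour",
    definition="The splitting of production into specialised tasks.",
    provenance="Book I, Chapter 1",
    domain="production",
    quality_scores={"definition_precision": 0.8},
)
print(division_of_labour.to_markdown())
```

Keeping one record per file means a change to a single definition shows up as a single-file diff.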
### Schemas
**Schemas** define what a well-formed entity looks like: which sections
it must have, what validation rules apply, what quality metrics are
evaluated. A schema is not code — it is a markdown document that both
humans and LLMs read as instructions.
Schemas serve two purposes:
1. **Structural** — they tell the extraction pipeline what to produce
(required sections, word count ranges, heading formats)
2. **Evaluative** — they define quality rubrics against which each entity
is scored (definition precision, source grounding, explanatory value)

By changing a schema, you change what the infospace considers "good"
without changing any infrastructure.
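The structural side of a schema can be pictured as a check over an entity file. In the sketch below the rules are written as a Python dictionary purely for illustration; in the text above they live in a markdown schema, and the section names, word limits, and the `validate` helper are all hypothetical.

```python
import re

# Hypothetical structural rules standing in for a markdown schema.
SCHEMA = {
    "required_sections": ["Definition", "Provenance"],
    "word_count": {"Definition": (10, 120)},
}


def split_sections(markdown: str) -> dict[str, str]:
    """Map each '## Heading' to the body text that follows it."""
    parts = re.split(r"^##[ \t]+(.+?)[ \t]*$", markdown, flags=re.MULTILINE)
    # parts = [preamble, heading1, body1, heading2, body2, ...]
    return {parts[i]: parts[i + 1].strip() for i in range(1, len(parts) - 1, 2)}


def validate(markdown: str, schema: dict) -> list[str]:
    """Return structural problems; an empty list means well-formed."""
    sections = split_sections(markdown)
    problems = []
    for name in schema["required_sections"]:
        if name not in sections:
            problems.append(f"missing section: {name}")
    for name, (low, high) in schema["word_count"].items():
        if name in sections:
            count = len(sections[name].split())
            if not low <= count <= high:
                problems.append(f"{name}: {count} words, expected {low}-{high}")
    return problems


entity_text = """# Division of Labour

## Definition

Splitting production into specialised tasks.
"""
print(validate(entity_text, SCHEMA))
```

Changing the limits in `SCHEMA` changes which files pass without touching `validate`, which mirrors the point that the schema, not the infrastructure, defines "good".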
### Disciplines
Here is where things get interesting. An infospace doesn't just catalogue
what's in the source material — it looks at the source through a
**lens**. We call this lens a **discipline**: a structured framework of
concepts from another domain, applied to illuminate the topic at hand.
In our example, the discipline is Stafford Beer's Viable System Model —
a set of concepts from systems theory (System 1 through System 5,
recursion, variety, viability) applied to the economic ideas in Smith's
work. The VSM provides the analytical structure; Smith provides the raw
material.
The key insight: **a discipline is itself an infospace.** The VSM
concepts (S1-S5, recursion, variety, algedonic signals) form their own
curated, evaluable collection of ideas. To use the VSM as a discipline,
it must first be a viable infospace in its own right — its concepts must
be well-defined, coherent, and complete.
This leads to a recursive property: infospaces can be built on top of
other infospaces. The Wealth of Nations infospace, viewed through the
VSM lens, could itself become a discipline applied to analyse a modern
supply chain. Each layer adds structure without losing the detail
beneath it.

---
## How Infospaces Are Built
Building an infospace is an incremental process with four repeating
phases:
### 1. Extract
Source material is processed one unit at a time (a chapter, a document,
a dataset). For each unit, an LLM extracts entities according to the
schemas and guidelines. Entities that already exist are recognised and
skipped — the infospace grows by accumulation, not duplication.
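Growth by accumulation can be sketched as a merge that ignores entities already present. In the described pipeline the recognition step is done by the LLM during extraction; the name normalisation below is a deliberately simple stand-in, and both function names are hypothetical.

```python
def normalise(name: str) -> str:
    """Reduce a concept name to a comparable key (hypothetical rule)."""
    return "-".join(name.lower().split())


def accumulate(existing: dict[str, str], extracted: list[tuple[str, str]]) -> list[str]:
    """Add newly extracted (name, definition) pairs and skip known entities.

    Returns the keys that were added in this pass.
    """
    added = []
    for name, definition in extracted:
        key = normalise(name)
        if key in existing:
            continue  # already recognised in an earlier unit
        existing[key] = definition
        added.append(key)
    return added


infospace: dict[str, str] = {}
accumulate(infospace, [("Division of Labour", "..."), ("Market Price", "...")])
second_pass = accumulate(infospace, [("division of labour", "..."), ("Natural Price", "...")])
print(second_pass)
```

The second pass adds only the concept that was not seen before, so processing the same unit twice leaves the infospace unchanged.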
### 2. Map
Extracted entities are mapped to the discipline. In our example, each
economic concept is mapped to a VSM system with a strength rating and
rationale. This is where the discipline lens does its work: it forces
the question "what role does this concept play in the larger system?"
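A mapping record might look like the following sketch. The three-step strength scale, the class name, and the sample mapping are assumptions; the text only says that each mapping carries a VSM system, a strength rating, and a rationale.

```python
from collections import defaultdict
from dataclasses import dataclass

VSM_SYSTEMS = ("S1", "S2", "S3", "S4", "S5")
STRENGTHS = ("weak", "moderate", "strong")  # hypothetical rating scale


@dataclass(frozen=True)
class Mapping:
    entity_id: str
    system: str
    strength: str
    rationale: str

    def __post_init__(self) -> None:
        # Reject mappings that could not be evaluated later.
        if self.system not in VSM_SYSTEMS:
            raise ValueError(f"unknown VSM system: {self.system}")
        if self.strength not in STRENGTHS:
            raise ValueError(f"unknown strength: {self.strength}")
        if not self.rationale.strip():
            raise ValueError("a mapping needs a rationale")


def by_system(mappings: list[Mapping]) -> dict[str, list[str]]:
    """Group entity ids under the VSM system they are mapped to."""
    grouped = defaultdict(list)
    for m in mappings:
        grouped[m.system].append(m.entity_id)
    return dict(grouped)


# Illustrative mapping only; not taken from the example infospace.
example = Mapping("division-of-labour", "S1", "strong",
                  "Describes how operational work is organised.")
print(by_system([example]))
```

Requiring a non-empty rationale keeps every mapping open to later review, by a human or by an LLM evaluation pass.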
### 3. Evaluate
After extraction and mapping, the infospace is evaluated at two levels:
- **Per-entity**: each concept is scored against quality rubrics. Is the
definition precise? Is it grounded in the source? Does it connect
meaningfully to the discipline?
- **Collection-level**: the set of concepts is assessed for redundancy,
coverage, coherence, consistency, and granularity balance.

Evaluation produces structured, machine-readable scores, not prose
narratives. These scores are tracked over time.
### 4. Refine
Evaluation reveals what needs improvement. Redundant concepts are merged
or archived. Coverage gaps are addressed by re-extracting with improved
guidelines. Inconsistencies are resolved by clarifying definitions.
Guidelines and schemas are updated. The cycle repeats.
This loop — extract, map, evaluate, refine — is the heartbeat of a
viable infospace. Each iteration makes the infospace more viable:
more complete, more coherent, more consistent.

---
## How Infospaces Are Evaluated
Quality is assessed through two complementary mechanisms:
### LLM Evaluation
A language model reads an entity (or a pair of entities) and judges it
against defined rubrics. This captures qualitative aspects that can't be
computed mechanically: Is this definition actually precise? Does this
mapping rationale make sense? Are these two concepts really different?
LLM evaluation is always **delegated** — it runs through prompt templates
and the platform's LLM integration, never through the human or agent
working on infrastructure. This separation keeps domain judgment in the
problem space.
### Deterministic Aggregation
Structured scores from LLM evaluation, plus metrics computed directly
from files (section counts, word lengths, graph properties, similarity
matrices), are aggregated into collection-level indicators. These are
numbers that can be tracked, diffed, and plotted:
- **Redundancy ratio** — what fraction of concepts substantially overlap
- **Coverage ratio** — what fraction of the domain-discipline matrix is
populated
- **Graph density** — how connected the concept web is
- **Cycle count** — how many circular definition chains exist
- **Granularity entropy** — how balanced the abstraction levels are
These indicators, compared against user-defined thresholds, determine
whether the infospace is **viable** for its intended purpose.
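The comparison against thresholds can be sketched as a per-indicator pass or fail profile, with the infospace counted as viable only when every indicator passes. The threshold values, the "min"/"max" convention, and the function names are illustrative assumptions; real thresholds are whatever the user sets.

```python
# Each threshold is (direction, limit): "min" means the score must be at
# least the limit, "max" means it must not exceed it. Values are illustrative.
THRESHOLDS = {
    "coverage_ratio": ("min", 0.70),
    "graph_density": ("min", 0.05),
    "redundancy_ratio": ("max", 0.10),
    "cycle_count": ("max", 0),
}


def viability_profile(indicators: dict[str, float],
                      thresholds: dict[str, tuple[str, float]]) -> dict[str, bool]:
    """Return, per indicator, whether it meets its threshold."""
    profile = {}
    for name, (direction, limit) in thresholds.items():
        value = indicators[name]
        profile[name] = value >= limit if direction == "min" else value <= limit
    return profile


def is_viable(profile: dict[str, bool]) -> bool:
    """An infospace is viable for its purpose when every indicator passes."""
    return all(profile.values())


indicators = {
    "coverage_ratio": 0.82,
    "graph_density": 0.08,
    "redundancy_ratio": 0.14,  # above the illustrative limit of 0.10
    "cycle_count": 0,
}
profile = viability_profile(indicators, THRESHOLDS)
print(profile, is_viable(profile))
```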

---
## Five Concerns of Collection Quality
Individual concept quality (is this definition good?) is necessary but
not sufficient. An infospace made of individually excellent concepts can
still fail as a collection. Five concerns capture what can go wrong:
### Redundancy
Do two concepts mean the same thing? Overlap wastes the reader's
attention and creates ambiguity about which concept to use. Redundancy is
detected through embedding similarity (are the definitions close in
meaning?) confirmed by LLM judgment (are they genuinely the same
concept, or merely related?).
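The first, mechanical half of this check might look like the sketch below: compute cosine similarity between definition embeddings and hand close pairs to the LLM for confirmation. The embeddings here are toy two-dimensional vectors, and the 0.9 cut-off and function names are assumptions.

```python
import math
from itertools import combinations


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def redundancy_candidates(embeddings: dict[str, list[float]],
                          threshold: float = 0.9) -> list[tuple[str, str]]:
    """Return entity pairs whose definition embeddings are close in direction."""
    return [
        (a, b)
        for a, b in combinations(sorted(embeddings), 2)
        if cosine(embeddings[a], embeddings[b]) >= threshold
    ]


# Toy vectors standing in for real definition embeddings.
toy = {
    "market-price": [1.0, 0.0],
    "price-in-the-market": [0.9, 0.1],
    "division-of-labour": [0.0, 1.0],
}
print(redundancy_candidates(toy))
```

The output is a list of candidates, not verdicts: similar wording can describe related but distinct concepts, which is why the LLM judgment step follows.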
### Coverage
Does the concept set cover the topic? Are there areas of the topic that
have no corresponding concepts? Coverage is assessed structurally (which
cells in the domain-discipline matrix, topic areas crossed with discipline
concepts, are empty?) and functionally (can
the infospace answer the questions it was built to answer?).
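The structural half can be sketched as a scan for empty cells. The sketch assumes the matrix rows are topic areas and the columns are discipline concepts such as VSM systems; the area names and the `coverage` function are hypothetical.

```python
def coverage(domains: list[str], systems: list[str],
             mappings: list[tuple[str, str]]) -> tuple[float, list[tuple[str, str]]]:
    """Return the populated fraction of the matrix and its empty cells.

    mappings holds one (domain, system) pair per mapped entity.
    """
    cells = [(d, s) for d in domains for s in systems]
    if not cells:
        raise ValueError("the matrix needs at least one domain and one system")
    filled = set(mappings)
    empty = [cell for cell in cells if cell not in filled]
    return (len(cells) - len(empty)) / len(cells), empty


ratio, gaps = coverage(
    domains=["production", "trade"],
    systems=["S1", "S2"],
    mappings=[("production", "S1"), ("trade", "S1"), ("production", "S1")],
)
print(ratio, gaps)
```

The list of empty cells is the more actionable output, since each one names a place where re-extraction might look for missing concepts.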
### Coherence
Do the concepts form a connected web of explanations, or a fragmented
list of isolated ideas? Coherence is measured through graph analysis:
connected components (is everything reachable?), modularity (are there
meaningful clusters?), and bridge concepts (which ideas connect different
areas?).
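Two of these graph measures are simple enough to sketch with the standard library: connected components and density. Modularity and bridge detection are left out. The sketch treats relationships as undirected links between entity ids, which is an assumption about how the concept web is stored.

```python
from collections import defaultdict


def connected_components(nodes: list[str],
                         edges: list[tuple[str, str]]) -> list[set[str]]:
    """Group nodes into sets that are mutually reachable via undirected edges."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen: set[str] = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        components.append(component)
    return components


def graph_density(nodes: list[str], edges: list[tuple[str, str]]) -> float:
    """Share of possible undirected links that are actually present."""
    n = len(nodes)
    if n < 2:
        return 0.0
    links = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    return 2 * len(links) / (n * (n - 1))


concepts = ["a", "b", "c", "d"]
links = [("a", "b"), ("b", "c")]
print(connected_components(concepts, links), graph_density(concepts, links))
```

More than one component means some concepts cannot be reached from the others, which is the "fragmented list" case described above.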
### Consistency
Are concepts defined in terms of each other without contradiction? Are
there circular definition chains? Do definitions use terms that should
be concepts but aren't? Consistency is checked through dependency graph
analysis (cycles, undefined terms) and LLM pairwise judgment
(do related definitions contradict each other?).
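The dependency-graph half of this check can be sketched with a depth-first search. The version below reports one cycle per back edge it meets, which is enough to flag circular chains but is not an exhaustive enumeration of all simple cycles. The graph layout (entity id mapped to the ids its definition uses) and the toy dependencies are assumptions.

```python
def find_cycles(depends_on: dict[str, list[str]]) -> list[list[str]]:
    """Return circular definition chains found by a depth-first search."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {node: WHITE for node in depends_on}
    path: list[str] = []
    cycles: list[list[str]] = []

    def visit(node: str) -> None:
        colour[node] = GREY
        path.append(node)
        for dep in depends_on.get(node, ()):
            state = colour.get(dep)  # None for terms that are not entities
            if state == GREY:
                cycles.append(path[path.index(dep):] + [dep])
            elif state == WHITE:
                visit(dep)
        path.pop()
        colour[node] = BLACK

    for node in depends_on:
        if colour[node] == WHITE:
            visit(node)
    return cycles


def undefined_terms(depends_on: dict[str, list[str]]) -> list[str]:
    """Terms used in definitions that have no entity of their own."""
    used = {dep for deps in depends_on.values() for dep in deps}
    return sorted(used - set(depends_on))


# Toy dependency graph; not taken from the example infospace.
toy_graph = {
    "value": ["labour"],
    "labour": ["wages"],
    "wages": ["value"],
    "rent": ["land"],
}
print(find_cycles(toy_graph), undefined_terms(toy_graph))
```

Contradictions between definitions are not visible to this kind of analysis, which is why the pairwise LLM judgment is a separate step.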
### Granularity Balance
Are concepts at comparable levels of abstraction? An infospace that mixes
broad theoretical principles with narrow observations — without
acknowledging the difference — confuses more than it explains. Balance
is assessed by classifying each concept's abstraction level and measuring
the distribution.
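Measuring the distribution could be done with normalised Shannon entropy, as in the sketch below. Values near 1 mean concepts are spread evenly across the abstraction levels and values near 0 mean they cluster in one; which of these an infospace should aim for is left to its thresholds. The three level names and the function name are hypothetical.

```python
import math
from collections import Counter

LEVELS = ("principle", "mechanism", "observation")  # hypothetical levels


def granularity_entropy(levels: list[str], all_levels: tuple[str, ...] = LEVELS) -> float:
    """Shannon entropy of the level distribution, scaled to the range 0..1."""
    if not levels or len(all_levels) < 2:
        return 0.0
    total = len(levels)
    counts = Counter(levels)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return entropy / math.log2(len(all_levels))


even = ["principle", "principle", "mechanism", "mechanism", "observation", "observation"]
skewed = ["observation"] * 6
print(granularity_entropy(even), granularity_entropy(skewed))
```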

---
## Infospaces as Organisms
The biological metaphor is deliberate. A viable organism maintains its
identity while exchanging material with its environment. It has internal
coherence (its parts work together), boundary integrity (it is
distinguishable from its surroundings), and adaptive capacity (it
responds to change).
Infospaces exhibit the same properties:
- **Internal coherence** — concepts connect and support each other
- **Boundary** — the topic and discipline define what belongs and what
doesn't
- **Adaptation** — evaluation and refinement allow the infospace to
improve

And like organisms, infospaces don't exist in isolation.
### Hierarchical Composition
One infospace can serve as a discipline for another. The VSM infospace
provides the lens for the Wealth of Nations infospace, which could
provide the lens for a supply chain infospace. Each layer adds structure
and interpretive power. This is analogous to biological organisation:
cells compose into tissues, tissues into organs, organs into organisms.
For this to work, the lower-level infospace must be viable — you can't
build reliable analysis on a shaky foundation. A discipline that is
incomplete or inconsistent will produce unreliable mappings.
### Network Composition
Infospaces can also relate laterally. Two infospaces at the same level
might share concepts, reference each other's entities, or provide
complementary views of overlapping domains. An infospace built on *The
Wealth of Nations* and one built on Marx's *Capital* might share economic
entities while differing in their analytical discipline.
This networked structure mirrors how knowledge actually works: fields
overlap, vocabularies are shared and contested, and understanding grows
by connecting islands of well-organised thought.
### Swarm Behaviour
When many infospaces exist and interact, emergent properties appear.
Common entities across many infospaces become well-tested through
repeated evaluation in different contexts. Concepts that survive across
multiple disciplines are more likely to be fundamental. Gaps visible from
one perspective may be filled by insights from another.
This is speculative territory for now, but the tooling should be designed
with it in mind: infospaces as first-class, composable, addressable
units of knowledge.

---
## The Role of Tooling
An infospace is a living artefact that requires ongoing maintenance. The
tooling must support every phase of the lifecycle:
### Creating an infospace
Declaring a topic, binding disciplines, defining schemas and competency
questions, setting viability thresholds. This should be a single
configuration step, not a programming exercise.
### Populating an infospace
Processing source material through the extract-map pipeline, one unit at
a time. Progress is tracked. Each addition is committed to version
history.
### Evaluating an infospace
Running per-entity and collection-level checks. Producing structured,
machine-readable scores. Comparing against viability thresholds.
Identifying specific issues (this entity is redundant, this domain gap
needs filling, these definitions contradict).
### Refining an infospace
Acting on evaluation results: archiving redundant entities, re-extracting
with improved guidelines, updating schemas, re-evaluating. Every change
is traceable.
### Composing infospaces
Binding one infospace as a discipline for another. Checking that the
discipline is viable. Propagating changes when the discipline's concepts
are updated.
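Change propagation could start from something as simple as the sketch below: given the discipline concepts that changed, list the entities whose mappings point at them and therefore need re-evaluation. The flat entity-to-concept dictionary and the function name are assumptions, and the sample mappings are illustrative.

```python
def stale_mappings(mappings: dict[str, str], changed_concepts: list[str]) -> list[str]:
    """Return entity ids mapped to a discipline concept that has changed.

    mappings maps an entity id to the discipline concept it is mapped to.
    """
    changed = set(changed_concepts)
    return sorted(entity for entity, concept in mappings.items() if concept in changed)


# Illustrative mappings only.
current_mappings = {
    "division-of-labour": "S1",
    "market-price": "S2",
    "natural-price": "S1",
}
print(stale_mappings(current_mappings, ["S1"]))
```

Only the affected entities are queued, so an edit to one discipline concept does not force a full re-evaluation of the dependent infospace.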
### Monitoring an infospace
Tracking metrics over time. Seeing how coverage, coherence, and
consistency evolve as content is added. Detecting regressions when a
re-extraction reduces quality.
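Regression detection can be sketched as a comparison of two metric snapshots. Which direction counts as "better" for each indicator is an assumption here (in particular, treating higher graph density as better), as are the indicator names and the `regressions` function.

```python
# True means a higher value is better; these directions are assumptions.
HIGHER_IS_BETTER = {
    "coverage_ratio": True,
    "graph_density": True,
    "redundancy_ratio": False,
    "cycle_count": False,
}


def regressions(previous: dict[str, float], current: dict[str, float],
                tolerance: float = 0.0) -> dict[str, float]:
    """Return indicators that got worse, with the size of the change."""
    found = {}
    for name, better_high in HIGHER_IS_BETTER.items():
        delta = current[name] - previous[name]
        worsened = delta < -tolerance if better_high else delta > tolerance
        if worsened:
            found[name] = delta
    return found


before = {"coverage_ratio": 0.80, "graph_density": 0.08,
          "redundancy_ratio": 0.10, "cycle_count": 0}
after = {"coverage_ratio": 0.75, "graph_density": 0.09,
         "redundancy_ratio": 0.10, "cycle_count": 1}
print(sorted(regressions(before, after)))
```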
The tooling should present these operations as simple, well-documented
commands — not as infrastructure details. The user thinks in terms of
"evaluate my infospace" and "check for redundancy", not in terms of
embedding vectors and graph algorithms.

---
## Where We Are
We have built the first example infospace: 85 economic entities from
Adam Smith's *The Wealth of Nations*, mapped to Beer's Viable System
Model, with schemas, prompt templates, and a chapter-by-chapter
pipeline.
This example has taught us what works (incremental extraction,
deduplication, flat canonical entity sets, transclusion views) and what's
missing (per-concept evaluation, collection-level checks, composition
model, clean tooling commands).
The work ahead is to generalise from this example: build the platform
capabilities needed, create the tooling layer that makes infospace
operations accessible, and then revisit the example as both a validation
and a tutorial.
The goal is that anyone with a body of source material and an analytical
framework can create a viable infospace — and that infospaces, once
built, become reusable intellectual tools for future work.