feat(infospace): add L2 entity classification with type × VSM matrix (S2.9)
Implements the L2 typed-entities layer — each entity is assigned an
Entity Type (Element, Process, Relation, Principle, Institution) and a
VSM System (S1–S5) by an LLM, with one-sentence rationales for each.
New modules:
- markitect/infospace/classification.py — EntityClassification dataclass
+ ENTITY_TYPES / VSM_SYSTEMS controlled vocabularies
- markitect/infospace/classification_io.py — write/read classification
files (YAML frontmatter + markdown body, mirrors evaluation_io)
- markitect/infospace/classifier.py — build_classification_prompt(),
parse_classification_response(), run_entity_classification(); batch
runner writes files incrementally (same resumable pattern as evaluate)
CLI: markitect infospace classify [--entity SLUG] [--provider P] [--model M]
- Incremental skip: checks output/classifications/ for existing files
- Defaults to openrouter provider; 2000 max_tokens (Gemini 2.5 Flash
uses ~787 thinking tokens, so 800 was too low)
CLI: markitect infospace classify-summary [--update-metrics]
- Entity type counts + VSM system counts with percentages
- 5 × 6 type × VSM matrix (spots structural blind spots at a glance)
- --update-metrics writes type_distribution, type_entropy,
vsm_type_matrix_cells to metrics.yaml
Config: InfospaceConfig gains classifications_dir (default output/classifications)
Schema: schemas/typed-entity-schema-v1.0.md — type/VSM vocabulary tables,
rationale format rules, validation rules, metrics enabled at L2
infospace.yaml: schemas.typed_entity references typed-entity-schema-v1.0.md
Seed classifications (3): division_of_labour (Process/S1),
natural_price_as_central_price (Principle/S2),
invisible_hand_mechanism (Principle/S4)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
@@ -18,6 +18,7 @@ schemas:
|
||||
mapping: schemas/vsm-mapping-schema-v1.0.md
|
||||
analysis: schemas/chapter-analysis-schema-v1.0.md
|
||||
relation: schemas/relation-schema-v1.0.md
|
||||
typed_entity: schemas/typed-entity-schema-v1.0.md
|
||||
|
||||
competency_questions: |
|
||||
1. How does Smith's division of labour map to VSM System 1 operations?
|
||||
|
||||
@@ -0,0 +1,30 @@
|
||||
---
|
||||
entity_slug: division_of_labour
|
||||
entity_type: Process
|
||||
vsm_system: S1
|
||||
type_rationale: The definition describes "the separation of a work process into distinct
|
||||
tasks performed by specialised workers," which is an activity or transformation
|
||||
in how work is conducted.
|
||||
vsm_rationale: Division of Labour directly concerns the organization and execution
|
||||
of "productive activities" by specialized workers to increase output, which is the
|
||||
core function of S1.
|
||||
classified_at: '2026-02-23T05:14:54.928218'
|
||||
---
|
||||
|
||||
# Classification: Division Of Labour
|
||||
|
||||
## Entity Type
|
||||
|
||||
Process
|
||||
|
||||
## VSM System
|
||||
|
||||
S1
|
||||
|
||||
## Type Rationale
|
||||
|
||||
The definition describes "the separation of a work process into distinct tasks performed by specialised workers," which is an activity or transformation in how work is conducted.
|
||||
|
||||
## VSM Rationale
|
||||
|
||||
Division of Labour directly concerns the organization and execution of "productive activities" by specialized workers to increase output, which is the core function of S1.
|
||||
@@ -0,0 +1,30 @@
|
||||
---
|
||||
entity_slug: invisible_hand_mechanism
|
||||
entity_type: Principle
|
||||
vsm_system: S4
|
||||
type_rationale: The Invisible Hand Mechanism is an abstract theoretical claim about
|
||||
how individual self-interest unintentionally leads to broader public welfare, functioning
|
||||
as a fundamental rule of market operation.
|
||||
vsm_rationale: The Invisible Hand Mechanism describes the system's inherent capacity
|
||||
for adaptation and self-organization, producing beneficial outcomes from individual
|
||||
actions without central direction, aligning with S4's function of intelligence.
|
||||
classified_at: '2026-02-23T05:15:10.936874'
|
||||
---
|
||||
|
||||
# Classification: Invisible Hand Mechanism
|
||||
|
||||
## Entity Type
|
||||
|
||||
Principle
|
||||
|
||||
## VSM System
|
||||
|
||||
S4
|
||||
|
||||
## Type Rationale
|
||||
|
||||
The Invisible Hand Mechanism is an abstract theoretical claim about how individual self-interest unintentionally leads to broader public welfare, functioning as a fundamental rule of market operation.
|
||||
|
||||
## VSM Rationale
|
||||
|
||||
The Invisible Hand Mechanism describes the system's inherent capacity for adaptation and self-organization, producing beneficial outcomes from individual actions without central direction, aligning with S4's function of intelligence.
|
||||
@@ -0,0 +1,30 @@
|
||||
---
|
||||
entity_slug: natural_price_as_central_price
|
||||
entity_type: Principle
|
||||
vsm_system: S2
|
||||
type_rationale: The natural price is an abstract concept describing an equilibrium
|
||||
point and a tendency for market prices to gravitate towards it, functioning as a
|
||||
fundamental economic law.
|
||||
vsm_rationale: The natural price acts as a central price signal that coordinates market
|
||||
activity by drawing fluctuating market prices towards an equilibrium, thereby performing
|
||||
an anti-oscillation function.
|
||||
classified_at: '2026-02-23T05:15:04.916853'
|
||||
---
|
||||
|
||||
# Classification: Natural Price As Central Price
|
||||
|
||||
## Entity Type
|
||||
|
||||
Principle
|
||||
|
||||
## VSM System
|
||||
|
||||
S2
|
||||
|
||||
## Type Rationale
|
||||
|
||||
The natural price is an abstract concept describing an equilibrium point and a tendency for market prices to gravitate towards it, functioning as a fundamental economic law.
|
||||
|
||||
## VSM Rationale
|
||||
|
||||
The natural price acts as a central price signal that coordinates market activity by drawing fluctuating market prices towards an equilibrium, thereby performing an anti-oscillation function.
|
||||
@@ -0,0 +1,126 @@
|
||||
# Typed Entity Schema v1.0
|
||||
|
||||
Extends the economic entity schema with two classification fields produced
|
||||
by the L2 `classify-entities` pipeline stage. An entity that has passed
|
||||
through L2 classification has been assigned an **Entity Type** and a
|
||||
**VSM System** by an LLM, each with a one-sentence rationale.
|
||||
|
||||
---
|
||||
|
||||
## Additional Sections
|
||||
|
||||
The following sections are added to the base entity file (or stored as
|
||||
separate classification files in `output/classifications/`):
|
||||
|
||||
### Entity Type
|
||||
|
||||
**Required.** One of the five controlled values below.
|
||||
|
||||
| Value | Definition |
|
||||
|---|---|
|
||||
| **Element** | A stock, agent, artifact, or institution that persists — a *noun*, something that exists independently (e.g. Capital Stock, Corn, Colony, Guild) |
|
||||
| **Process** | A flow, activity, or transformation with duration — something that *happens* rather than *exists* (e.g. Division of Labour, Credit Extension, Trade Route) |
|
||||
| **Relation** | A structural dependency or causal link between two elements — a *connector* or mechanism (e.g. Rent determined by Price; Wages bounded by Profit Margin) |
|
||||
| **Principle** | An abstract law or invariant that holds across contexts — a rule or theoretical claim (e.g. Comparative Advantage, Diminishing Returns, Opportunity Cost) |
|
||||
| **Institution** | A socially constructed rule system, norm, or governance structure (e.g. Banking System, Apprenticeship Law, Taille, Navigation Acts) |
|
||||
|
||||
**Note:** Types are not mutually exclusive at the margin — *Market Price*
|
||||
is both a Relation (between cost components and clearing condition) and an
|
||||
emergent property of an Element (the market). Assign the **primary** type:
|
||||
the one that best explains the entity's role in Smith's argument.
|
||||
|
||||
### VSM System
|
||||
|
||||
**Required.** One of the six controlled values below.
|
||||
|
||||
| Value | Beer's definition | WoN examples |
|
||||
|---|---|---|
|
||||
| **S1** | Primary operations — the productive activities of the system | Agricultural labour, manufacturing, carrying trade |
|
||||
| **S2** | Coordination — anti-oscillation, price signals between operations | Market Price, Natural Price, Wages of Labour |
|
||||
| **S3** | Management — resource allocation and operational control | Capital Allocation, Banking, Taxation |
|
||||
| **S3\*** | Audit — inspection, compliance, integrity checking | Customs Enforcement, Assay, Coinage |
|
||||
| **S4** | Intelligence — adaptation, environmental scanning | Invisible Hand, Comparative Advantage, Foreign Trade Intelligence |
|
||||
| **S5** | Policy — identity, ultimate authority, normative purpose | Mercantile System, System of Natural Liberty, Public Debt Policy |
|
||||
|
||||
### Type Rationale
|
||||
|
||||
**Required.** One sentence explaining why this Entity Type was assigned,
|
||||
grounded in the entity definition.
|
||||
|
||||
> *Example:* "Capital Stock is a persistent stock of accumulated resources
|
||||
> that enables productive operations, making it an Element rather than a
|
||||
> Process."
|
||||
|
||||
### VSM Rationale
|
||||
|
||||
**Required.** One sentence grounding the VSM assignment in Beer's
|
||||
definitions as applied to the WoN domain.
|
||||
|
||||
> *Example:* "Capital Stock is deployed at the operational level to
|
||||
> produce goods and services, placing it squarely within S1 (primary
|
||||
> operations)."
|
||||
|
||||
---
|
||||
|
||||
## Validation Rules
|
||||
|
||||
1. **Entity Type** MUST be one of: Element, Process, Relation, Principle,
|
||||
Institution. Any other value is a validation error.
|
||||
2. **VSM System** MUST be one of: S1, S2, S3, S3*, S4, S5.
|
||||
3. **Type Rationale** and **VSM Rationale** MUST be non-empty strings.
|
||||
4. A classification file for slug `X` MUST be stored at
|
||||
`output/classifications/X.md`.
|
||||
|
||||
---
|
||||
|
||||
## Metrics Enabled by L2
|
||||
|
||||
Once all entities are classified, the following collection-level metrics
|
||||
become available:
|
||||
|
||||
| Metric | Concern | Question |
|
||||
|---|---|---|
|
||||
| **type_distribution** | Granularity | Is the collection balanced? |
|
||||
| **vsm_type_matrix_cells** | Coverage | How many (type, VSM) coordinate pairs are occupied? |
|
||||
| **type_entropy** | Granularity | Is the type distribution diverse or dominated by one type? |
|
||||
| **orphan_relations** | Coherence | Are Relation-typed entities that name no elements they connect? |
|
||||
| **principle_grounding** | Consistency | Does each Principle have at least one Element or Process it constrains? |
|
||||
|
||||
---
|
||||
|
||||
## File Format
|
||||
|
||||
Classification files use YAML frontmatter + markdown body:
|
||||
|
||||
```markdown
|
||||
---
|
||||
entity_slug: capital_stock
|
||||
entity_type: Element
|
||||
vsm_system: S1
|
||||
type_rationale: Capital Stock is a persistent stock of accumulated resources
|
||||
that enables productive operations.
|
||||
vsm_rationale: It is the primary productive resource deployed at the
|
||||
operational level (S1).
|
||||
classified_by: openrouter/claude-sonnet-4
|
||||
classified_at: 2026-02-23T14:00:00Z
|
||||
---
|
||||
|
||||
# Classification: Capital Stock
|
||||
|
||||
## Entity Type
|
||||
|
||||
Element
|
||||
|
||||
## VSM System
|
||||
|
||||
S1
|
||||
|
||||
## Type Rationale
|
||||
|
||||
Capital Stock is a persistent stock of accumulated resources that enables
|
||||
productive operations.
|
||||
|
||||
## VSM Rationale
|
||||
|
||||
It is the primary productive resource deployed at the operational level (S1).
|
||||
```
|
||||
64
markitect/infospace/classification.py
Normal file
64
markitect/infospace/classification.py
Normal file
@@ -0,0 +1,64 @@
|
||||
"""
|
||||
Data models for entity classification (L2 typed entities).
|
||||
|
||||
Each entity is assigned an Entity Type (what kind of thing it is) and a
|
||||
VSM System (which control layer it inhabits). Both assignments come with
|
||||
a one-sentence rationale from the LLM, stored alongside the classification.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from dataclasses import dataclass
|
||||
from datetime import datetime
|
||||
from typing import Any, Dict, Optional
|
||||
|
||||
|
||||
#: Controlled vocabulary for entity types.
|
||||
ENTITY_TYPES = ["Element", "Process", "Relation", "Principle", "Institution"]
|
||||
|
||||
#: Controlled vocabulary for VSM system assignments.
|
||||
VSM_SYSTEMS = ["S1", "S2", "S3", "S3*", "S4", "S5"]
|
||||
|
||||
|
||||
@dataclass
|
||||
class EntityClassification:
|
||||
"""L2 classification for a single entity."""
|
||||
|
||||
entity_slug: str
|
||||
entity_type: str # one of ENTITY_TYPES
|
||||
vsm_system: str # one of VSM_SYSTEMS
|
||||
type_rationale: str = "" # one sentence
|
||||
vsm_rationale: str = "" # one sentence
|
||||
classified_by: str = "" # model name
|
||||
classified_at: Optional[datetime] = None
|
||||
|
||||
def to_dict(self) -> Dict[str, Any]:
|
||||
d: Dict[str, Any] = {
|
||||
"entity_slug": self.entity_slug,
|
||||
"entity_type": self.entity_type,
|
||||
"vsm_system": self.vsm_system,
|
||||
}
|
||||
if self.type_rationale:
|
||||
d["type_rationale"] = self.type_rationale
|
||||
if self.vsm_rationale:
|
||||
d["vsm_rationale"] = self.vsm_rationale
|
||||
if self.classified_by:
|
||||
d["classified_by"] = self.classified_by
|
||||
if self.classified_at is not None:
|
||||
d["classified_at"] = self.classified_at.isoformat()
|
||||
return d
|
||||
|
||||
@classmethod
|
||||
def from_dict(cls, data: Dict[str, Any]) -> "EntityClassification":
|
||||
classified_at: Optional[datetime] = None
|
||||
if "classified_at" in data:
|
||||
classified_at = datetime.fromisoformat(data["classified_at"])
|
||||
return cls(
|
||||
entity_slug=data["entity_slug"],
|
||||
entity_type=data["entity_type"],
|
||||
vsm_system=data["vsm_system"],
|
||||
type_rationale=data.get("type_rationale", ""),
|
||||
vsm_rationale=data.get("vsm_rationale", ""),
|
||||
classified_by=data.get("classified_by", ""),
|
||||
classified_at=classified_at,
|
||||
)
|
||||
80
markitect/infospace/classification_io.py
Normal file
80
markitect/infospace/classification_io.py
Normal file
@@ -0,0 +1,80 @@
|
||||
"""
|
||||
Read/write utilities for entity classification files (L2).
|
||||
|
||||
Classification files use YAML frontmatter (machine-readable) plus a
|
||||
markdown body (human-readable), matching the convention used by evaluation
|
||||
files.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from pathlib import Path
|
||||
from typing import List
|
||||
|
||||
import yaml
|
||||
|
||||
from .classification import EntityClassification
|
||||
|
||||
|
||||
_FRONTMATTER_SEP = "---"
|
||||
|
||||
|
||||
def write_entity_classification(c: EntityClassification, path: Path) -> None:
|
||||
"""Write a per-entity classification as YAML frontmatter + markdown body."""
|
||||
fm = c.to_dict()
|
||||
|
||||
lines: List[str] = []
|
||||
lines.append(_FRONTMATTER_SEP)
|
||||
lines.append(yaml.safe_dump(fm, default_flow_style=False, sort_keys=False).rstrip())
|
||||
lines.append(_FRONTMATTER_SEP)
|
||||
lines.append("")
|
||||
|
||||
title = c.entity_slug.replace("_", " ").replace("-", " ").title()
|
||||
lines.append(f"# Classification: {title}")
|
||||
lines.append("")
|
||||
|
||||
lines.append("## Entity Type")
|
||||
lines.append("")
|
||||
lines.append(c.entity_type)
|
||||
lines.append("")
|
||||
|
||||
lines.append("## VSM System")
|
||||
lines.append("")
|
||||
lines.append(c.vsm_system)
|
||||
lines.append("")
|
||||
|
||||
if c.type_rationale:
|
||||
lines.append("## Type Rationale")
|
||||
lines.append("")
|
||||
lines.append(c.type_rationale)
|
||||
lines.append("")
|
||||
|
||||
if c.vsm_rationale:
|
||||
lines.append("## VSM Rationale")
|
||||
lines.append("")
|
||||
lines.append(c.vsm_rationale)
|
||||
lines.append("")
|
||||
|
||||
path.parent.mkdir(parents=True, exist_ok=True)
|
||||
path.write_text("\n".join(lines), encoding="utf-8")
|
||||
|
||||
|
||||
def read_entity_classification(path: Path) -> EntityClassification:
|
||||
"""Read a classification file (YAML frontmatter + markdown body)."""
|
||||
text = path.read_text(encoding="utf-8")
|
||||
parts = text.split(f"{_FRONTMATTER_SEP}\n", maxsplit=2)
|
||||
if len(parts) < 3:
|
||||
raise ValueError(f"No YAML frontmatter found in {path}")
|
||||
fm = yaml.safe_load(parts[1])
|
||||
return EntityClassification.from_dict(fm)
|
||||
|
||||
|
||||
def read_classifications_directory(directory: Path) -> List[EntityClassification]:
|
||||
"""Read all classification files from a directory."""
|
||||
results: List[EntityClassification] = []
|
||||
for p in sorted(directory.glob("*.md")):
|
||||
try:
|
||||
results.append(read_entity_classification(p))
|
||||
except Exception:
|
||||
pass
|
||||
return results
|
||||
258
markitect/infospace/classifier.py
Normal file
258
markitect/infospace/classifier.py
Normal file
@@ -0,0 +1,258 @@
|
||||
"""
|
||||
Per-entity classification pipeline for L2 typed entities.
|
||||
|
||||
Builds a concise LLM prompt asking the model to assign an Entity Type and
|
||||
a VSM System to each entity, then parses the structured response. Batch
|
||||
execution mirrors the evaluate.py pattern: incremental file writing makes
|
||||
long runs safe to interrupt and resume.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from datetime import datetime
|
||||
from pathlib import Path
|
||||
from typing import Callable, List, Optional
|
||||
|
||||
from markitect.infospace.classification import (
|
||||
ENTITY_TYPES,
|
||||
VSM_SYSTEMS,
|
||||
EntityClassification,
|
||||
)
|
||||
from markitect.infospace.classification_io import write_entity_classification
|
||||
from markitect.infospace.config import InfospaceConfig
|
||||
from markitect.infospace.models import EntityMeta
|
||||
from markitect.prompts.execution.batch import BatchEvaluator, BatchItem, BatchSummary
|
||||
from markitect.prompts.execution.llm_adapter import LLMAdapter
|
||||
from markitect.prompts.execution.models import RunConfig
|
||||
|
||||
|
||||
# ── Type and VSM system descriptions ─────────────────────────────────────────
|
||||
|
||||
_TYPE_DEFS = {
|
||||
"Element": (
|
||||
"a stock, agent, artifact, or institution that persists — a noun, "
|
||||
"something that exists independently (e.g. Capital Stock, Corn, Colony)"
|
||||
),
|
||||
"Process": (
|
||||
"a flow, activity, or transformation with duration — something that "
|
||||
"happens rather than exists (e.g. Division of Labour, Credit Extension, Trade)"
|
||||
),
|
||||
"Relation": (
|
||||
"a structural dependency or causal link between two entities — a connector "
|
||||
"or mechanism (e.g. Rent determined by Price; Wages bounded by Profit)"
|
||||
),
|
||||
"Principle": (
|
||||
"an abstract law or invariant that holds across contexts — a rule or "
|
||||
"theoretical claim (e.g. Comparative Advantage, Diminishing Returns)"
|
||||
),
|
||||
"Institution": (
|
||||
"a socially constructed rule system, norm, or governance structure "
|
||||
"(e.g. Banking System, Apprenticeship Law, Taille)"
|
||||
),
|
||||
}
|
||||
|
||||
_VSM_DEFS = {
|
||||
"S1": "Primary operations — productive activities (agricultural labour, manufacturing, carrying trade)",
|
||||
"S2": "Coordination — anti-oscillation, price signals (market price, natural price, wages)",
|
||||
"S3": "Management — resource allocation, operational control (capital allocation, taxation, banking)",
|
||||
"S3*": "Audit — inspection, compliance, integrity (customs enforcement, assay, coinage)",
|
||||
"S4": "Intelligence — adaptation, environment scanning (invisible hand, foreign trade analysis)",
|
||||
"S5": "Policy — identity, ultimate authority, purpose (political economy systems, public debt policy)",
|
||||
}
|
||||
|
||||
_PROMPT_TEMPLATE = """\
|
||||
You are classifying an entity from an infospace about "{topic}".
|
||||
|
||||
Your task: assign exactly one **Entity Type** and one **VSM System** to the entity, \
|
||||
then give a one-sentence rationale for each choice.
|
||||
|
||||
## Entity: {title}
|
||||
|
||||
**Domain:** {domain}
|
||||
**Source chapter:** {source_chapter}
|
||||
|
||||
### Definition
|
||||
|
||||
{definition}
|
||||
|
||||
### Context
|
||||
|
||||
{context}
|
||||
|
||||
---
|
||||
|
||||
## Entity Types — choose exactly one
|
||||
|
||||
- **Element** — {type_Element}
|
||||
- **Process** — {type_Process}
|
||||
- **Relation** — {type_Relation}
|
||||
- **Principle** — {type_Principle}
|
||||
- **Institution** — {type_Institution}
|
||||
|
||||
## VSM Systems — choose exactly one
|
||||
|
||||
- **S1** — {vsm_S1}
|
||||
- **S2** — {vsm_S2}
|
||||
- **S3** — {vsm_S3}
|
||||
- **S3*** — {vsm_S3s}
|
||||
- **S4** — {vsm_S4}
|
||||
- **S5** — {vsm_S5}
|
||||
|
||||
---
|
||||
|
||||
## Instructions
|
||||
|
||||
1. Read the definition and context carefully.
|
||||
2. Choose the **most appropriate** Entity Type. When uncertain between two, \
|
||||
pick the type that best reflects the entity's primary role in the argument.
|
||||
3. Choose the **most appropriate** VSM System. An entity may relate to multiple \
|
||||
systems — assign the one where it does its primary work.
|
||||
4. Write one sentence of rationale for each, grounded in the definition above.
|
||||
5. Use **exactly** the output format below — no preamble, no extra lines.
|
||||
|
||||
## Output format
|
||||
|
||||
TYPE: <one of: Element, Process, Relation, Principle, Institution>
|
||||
VSM: <one of: S1, S2, S3, S3*, S4, S5>
|
||||
TYPE_RATIONALE: <one sentence explaining the type choice>
|
||||
VSM_RATIONALE: <one sentence grounding the VSM assignment in Beer's definitions>
|
||||
"""
|
||||
|
||||
|
||||
# ── Prompt builder ────────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
def build_classification_prompt(entity: EntityMeta, topic: str) -> str:
|
||||
"""Build a classification prompt for a single entity."""
|
||||
return _PROMPT_TEMPLATE.format(
|
||||
topic=topic,
|
||||
title=entity.title,
|
||||
domain=entity.domain or "(unspecified)",
|
||||
source_chapter=entity.source_chapter or "(unspecified)",
|
||||
definition=entity.definition or "(no definition provided)",
|
||||
context=entity.context or "(no context provided)",
|
||||
type_Element=_TYPE_DEFS["Element"],
|
||||
type_Process=_TYPE_DEFS["Process"],
|
||||
type_Relation=_TYPE_DEFS["Relation"],
|
||||
type_Principle=_TYPE_DEFS["Principle"],
|
||||
type_Institution=_TYPE_DEFS["Institution"],
|
||||
vsm_S1=_VSM_DEFS["S1"],
|
||||
vsm_S2=_VSM_DEFS["S2"],
|
||||
vsm_S3=_VSM_DEFS["S3"],
|
||||
vsm_S3s=_VSM_DEFS["S3*"],
|
||||
vsm_S4=_VSM_DEFS["S4"],
|
||||
vsm_S5=_VSM_DEFS["S5"],
|
||||
)
|
||||
|
||||
|
||||
# ── Response parser ───────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
def parse_classification_response(text: str) -> dict:
|
||||
"""Parse TYPE/VSM/TYPE_RATIONALE/VSM_RATIONALE from an LLM response.
|
||||
|
||||
Returns a dict with keys: entity_type, vsm_system, type_rationale,
|
||||
vsm_rationale. Values are None / empty string if not found.
|
||||
"""
|
||||
result: dict = {
|
||||
"entity_type": None,
|
||||
"vsm_system": None,
|
||||
"type_rationale": "",
|
||||
"vsm_rationale": "",
|
||||
}
|
||||
|
||||
for line in text.splitlines():
|
||||
stripped = line.strip()
|
||||
upper = stripped.upper()
|
||||
|
||||
if upper.startswith("TYPE_RATIONALE:"):
|
||||
result["type_rationale"] = stripped.split(":", 1)[1].strip()
|
||||
elif upper.startswith("VSM_RATIONALE:"):
|
||||
result["vsm_rationale"] = stripped.split(":", 1)[1].strip()
|
||||
elif upper.startswith("TYPE:"):
|
||||
raw = stripped.split(":", 1)[1].strip()
|
||||
# Case-insensitive match against controlled vocabulary
|
||||
for t in ENTITY_TYPES:
|
||||
if t.lower() == raw.lower():
|
||||
result["entity_type"] = t
|
||||
break
|
||||
else:
|
||||
result["entity_type"] = raw # keep raw if unrecognised
|
||||
elif upper.startswith("VSM:"):
|
||||
raw = stripped.split(":", 1)[1].strip()
|
||||
for v in VSM_SYSTEMS:
|
||||
if v.lower() == raw.lower():
|
||||
result["vsm_system"] = v
|
||||
break
|
||||
else:
|
||||
result["vsm_system"] = raw
|
||||
|
||||
return result
|
||||
|
||||
|
||||
# ── Batch runner ──────────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
def run_entity_classification(
|
||||
config: InfospaceConfig,
|
||||
entities: List[EntityMeta],
|
||||
adapter: LLMAdapter,
|
||||
run_config: Optional[RunConfig] = None,
|
||||
output_dir: Optional[Path] = None,
|
||||
progress_callback: Optional[Callable] = None,
|
||||
) -> BatchSummary:
|
||||
"""Run per-entity classification using the batch evaluator.
|
||||
|
||||
Classification files are written **incrementally** after each successful
|
||||
result, so a long run is resumable and safe to interrupt.
|
||||
|
||||
Args:
|
||||
config: The infospace configuration.
|
||||
entities: Entities to classify.
|
||||
adapter: LLM adapter.
|
||||
run_config: LLM execution configuration.
|
||||
output_dir: Where to write classification results. Defaults to
|
||||
``config.classifications_dir`` relative to CWD.
|
||||
progress_callback: Called after each item with (done, total, result).
|
||||
|
||||
Returns:
|
||||
A :class:`BatchSummary` with per-entity results.
|
||||
"""
|
||||
topic = config.topic.name
|
||||
cls_path = output_dir or Path(config.classifications_dir)
|
||||
classifier_name = (run_config.model_name if run_config else "unknown")
|
||||
|
||||
def _write_and_notify(done: int, total: int, result) -> None:
|
||||
if result.status == "success" and result.response is not None:
|
||||
parsed = parse_classification_response(result.response.content)
|
||||
entity_type = parsed["entity_type"] or "Unknown"
|
||||
vsm_system = parsed["vsm_system"] or "Unknown"
|
||||
classification = EntityClassification(
|
||||
entity_slug=result.key,
|
||||
entity_type=entity_type,
|
||||
vsm_system=vsm_system,
|
||||
type_rationale=parsed["type_rationale"],
|
||||
vsm_rationale=parsed["vsm_rationale"],
|
||||
classified_by=classifier_name,
|
||||
classified_at=datetime.utcnow(),
|
||||
)
|
||||
dest = cls_path / f"{result.key}.md"
|
||||
write_entity_classification(classification, dest)
|
||||
|
||||
if progress_callback is not None:
|
||||
progress_callback(done, total, result)
|
||||
|
||||
items = [
|
||||
BatchItem(
|
||||
key=entity.slug,
|
||||
prompt=build_classification_prompt(entity, topic),
|
||||
)
|
||||
for entity in entities
|
||||
]
|
||||
|
||||
evaluator = BatchEvaluator(
|
||||
adapter=adapter,
|
||||
config=run_config,
|
||||
progress_callback=_write_and_notify,
|
||||
)
|
||||
return evaluator.evaluate(items)
|
||||
@@ -419,6 +419,172 @@ def relations(config_path: Optional[str], entity_slug: Optional[str],
|
||||
click.echo(f"{subj:<35} {pred:<30} {obj:<35} {r.vsm_channel}")
|
||||
|
||||
|
||||
# ── classify ─────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
@infospace_commands.command()
|
||||
@click.option("--config", "config_path", default=None, help="Path to infospace.yaml.")
|
||||
@click.option("--entity", "entity_slug", default=None,
|
||||
help="Classify a single entity by slug.")
|
||||
@click.option("--provider", default="openrouter",
|
||||
help="LLM provider (openrouter, gemini, openai, …).")
|
||||
@click.option("--model", default=None, help="Model name override.")
|
||||
def classify(config_path: Optional[str], entity_slug: Optional[str],
|
||||
provider: str, model: Optional[str]):
|
||||
"""Classify entities with Entity Type and VSM System (L2)."""
|
||||
cfg, cfg_path = _load_config_or_exit(config_path)
|
||||
root = cfg_path.parent
|
||||
|
||||
from markitect.infospace.classifier import run_entity_classification
|
||||
from markitect.llm import create_adapter
|
||||
from markitect.prompts.execution.models import RunConfig
|
||||
|
||||
entity_list = parse_entity_directory(root / cfg.entities_dir)
|
||||
if not entity_list:
|
||||
click.echo("No entities found in " + str(root / cfg.entities_dir), err=True)
|
||||
return
|
||||
|
||||
output_dir = root / cfg.classifications_dir
|
||||
|
||||
if entity_slug:
|
||||
entity_list = [e for e in entity_list if e.slug == entity_slug]
|
||||
if not entity_list:
|
||||
click.echo(f"Entity '{entity_slug}' not found.", err=True)
|
||||
return
|
||||
else:
|
||||
# Incremental skip — entities already classified are omitted
|
||||
if output_dir.is_dir():
|
||||
done_slugs = {p.stem for p in output_dir.glob("*.md")}
|
||||
before = len(entity_list)
|
||||
entity_list = [e for e in entity_list if e.slug not in done_slugs]
|
||||
skipped = before - len(entity_list)
|
||||
if skipped:
|
||||
click.echo(f"Skipping {skipped} already-classified entities.")
|
||||
if not entity_list:
|
||||
click.echo("All entities already classified. Nothing to do.")
|
||||
return
|
||||
|
||||
click.echo(f"Classifying {len(entity_list)} entities …")
|
||||
output_dir.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
adapter = create_adapter(provider, model=model)
|
||||
run_config = RunConfig(model_name=model, temperature=0.1, max_tokens=2000)
|
||||
|
||||
def _progress(done: int, total: int, result) -> None:
|
||||
if result.status == "success":
|
||||
click.echo(f" [{done}/{total}] {result.key}")
|
||||
else:
|
||||
click.echo(f" [{done}/{total}] {result.key} — FAILED: {result.error}")
|
||||
|
||||
summary = run_entity_classification(
|
||||
config=cfg,
|
||||
entities=entity_list,
|
||||
adapter=adapter,
|
||||
run_config=run_config,
|
||||
output_dir=output_dir,
|
||||
progress_callback=_progress,
|
||||
)
|
||||
click.echo(f"\nDone: {summary.succeeded} classified, {summary.failed} failed.")
|
||||
|
||||
|
||||
# ── classify-summary ──────────────────────────────────────────────────
|
||||
|
||||
|
||||
@infospace_commands.command(name="classify-summary")
|
||||
@click.option("--config", "config_path", default=None, help="Path to infospace.yaml.")
|
||||
@click.option("--update-metrics", "update_metrics", is_flag=True, default=False,
|
||||
help="Write type_distribution metrics to metrics.yaml.")
|
||||
def classify_summary(config_path: Optional[str], update_metrics: bool):
|
||||
"""Show type × VSM distribution across all classified entities (L2)."""
|
||||
cfg, cfg_path = _load_config_or_exit(config_path)
|
||||
root = cfg_path.parent
|
||||
|
||||
from markitect.infospace.classification import ENTITY_TYPES, VSM_SYSTEMS
|
||||
from markitect.infospace.classification_io import read_classifications_directory
|
||||
|
||||
cls_dir = root / cfg.classifications_dir
|
||||
if not cls_dir.is_dir():
|
||||
click.echo("No classifications directory found. Run 'classify' first.")
|
||||
return
|
||||
|
||||
all_cls = read_classifications_directory(cls_dir)
|
||||
if not all_cls:
|
||||
click.echo("No classification files found.")
|
||||
return
|
||||
|
||||
n = len(all_cls)
|
||||
type_counts: dict = {}
|
||||
vsm_counts: dict = {}
|
||||
matrix: dict = {} # (entity_type, vsm_system) → count
|
||||
|
||||
for c in all_cls:
|
||||
type_counts[c.entity_type] = type_counts.get(c.entity_type, 0) + 1
|
||||
vsm_counts[c.vsm_system] = vsm_counts.get(c.vsm_system, 0) + 1
|
||||
key = (c.entity_type, c.vsm_system)
|
||||
matrix[key] = matrix.get(key, 0) + 1
|
||||
|
||||
click.echo(f"Classification summary — {n} entities\n")
|
||||
|
||||
click.echo("Entity types:")
|
||||
for t, count in sorted(type_counts.items(), key=lambda x: -x[1]):
|
||||
pct = 100 * count / n if n else 0.0
|
||||
click.echo(f" {t:<15} {count:>4} ({pct:.1f}%)")
|
||||
click.echo()
|
||||
|
||||
vsm_order = ["S1", "S2", "S3", "S3*", "S4", "S5"]
|
||||
click.echo("VSM systems:")
|
||||
for v in vsm_order:
|
||||
if v in vsm_counts:
|
||||
count = vsm_counts[v]
|
||||
pct = 100 * count / n if n else 0.0
|
||||
click.echo(f" {v:<6} {count:>4} ({pct:.1f}%)")
|
||||
click.echo()
|
||||
|
||||
# Type × VSM matrix
|
||||
header = f"{'':15}" + "".join(f"{v:>7}" for v in vsm_order)
|
||||
sep = "-" * (15 + 7 * len(vsm_order))
|
||||
click.echo(header)
|
||||
click.echo(sep)
|
||||
for t in ENTITY_TYPES:
|
||||
row = f"{t:<15}"
|
||||
for v in vsm_order:
|
||||
c = matrix.get((t, v), 0)
|
||||
row += f"{c if c else '.':>7}"
|
||||
click.echo(row)
|
||||
click.echo()
|
||||
|
||||
filled_cells = len(matrix)
|
||||
total_cells = len(ENTITY_TYPES) * len(vsm_order)
|
||||
click.echo(f"Matrix fill: {filled_cells}/{total_cells} cells occupied")
|
||||
click.echo()
|
||||
|
||||
if update_metrics:
|
||||
import math
|
||||
from markitect.infospace.history import read_metrics_file, write_metrics_file
|
||||
metrics_dir = root / cfg.metrics_dir
|
||||
metrics_dir.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
# Type entropy
|
||||
type_entropy = 0.0
|
||||
for count in type_counts.values():
|
||||
p = count / n
|
||||
if p > 0:
|
||||
type_entropy -= p * math.log2(p)
|
||||
|
||||
existing = read_metrics_file(metrics_dir / "metrics.yaml")
|
||||
new_metrics = {
|
||||
"type_distribution": type_counts,
|
||||
"vsm_type_matrix_cells": filled_cells,
|
||||
"type_entropy": round(type_entropy, 4),
|
||||
}
|
||||
merged = {**existing, **new_metrics}
|
||||
write_metrics_file(merged, metrics_dir / "metrics.yaml")
|
||||
click.echo(
|
||||
f"Updated metrics.yaml: type_entropy={type_entropy:.4f}, "
|
||||
f"vsm_type_matrix_cells={filled_cells}"
|
||||
)
|
||||
|
||||
|
||||
# ── viability ────────────────────────────────────────────────────────
|
||||
|
||||
|
||||
|
||||
@@ -253,6 +253,7 @@ class InfospaceConfig:
|
||||
pipeline: Optional[PipelineConfig] = None
|
||||
entities_dir: str = "output/entities"
|
||||
evaluations_dir: str = "output/evaluations"
|
||||
classifications_dir: str = "output/classifications"
|
||||
metrics_dir: str = "output/metrics"
|
||||
relations_dir: str = "output/relations"
|
||||
|
||||
@@ -275,6 +276,8 @@ class InfospaceConfig:
|
||||
d["entities_dir"] = self.entities_dir
|
||||
if self.evaluations_dir != "output/evaluations":
|
||||
d["evaluations_dir"] = self.evaluations_dir
|
||||
if self.classifications_dir != "output/classifications":
|
||||
d["classifications_dir"] = self.classifications_dir
|
||||
if self.metrics_dir != "output/metrics":
|
||||
d["metrics_dir"] = self.metrics_dir
|
||||
if self.relations_dir != "output/relations":
|
||||
@@ -303,6 +306,7 @@ class InfospaceConfig:
|
||||
pipeline=pipeline,
|
||||
entities_dir=data.get("entities_dir", "output/entities"),
|
||||
evaluations_dir=data.get("evaluations_dir", "output/evaluations"),
|
||||
classifications_dir=data.get("classifications_dir", "output/classifications"),
|
||||
metrics_dir=data.get("metrics_dir", "output/metrics"),
|
||||
relations_dir=data.get("relations_dir", "output/relations"),
|
||||
)
|
||||
|
||||
Reference in New Issue
Block a user