markitect-main/examples/infospace-with-history/artifacts/guidelines/extraction-rules.md
tegwick 8095a1da4c fix(example): standardise domain enum and source chapter format in schema/rules
Two root causes of metric fragmentation observed in collection checks:

1. Schema's Economic Domain used free-form examples ("labour economics,
   trade theory") which overrode the enum in extraction-rules.md, causing
   the LLM to produce multi-domain strings and non-canonical values.
   Fix: schema now specifies the exact 7-value enum with descriptions.

2. Source Chapter had no format constraint, producing 9 different formats
   for 7 chapters (full titles, mixed Roman/Arabic numerals, asterisks).
   Fix: extraction-rules now mandate "Book [Roman], Chapter [n]" exactly.

These fixes are prerequisites for clean reprocessing (S3.2 continuation).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-19 13:01:09 +01:00


---
id: extraction-rules
name: extraction_rules
artifact_type: content
description: Guidelines for extracting economic entities from source text
version: 1.0.0
---
# Entity Extraction Rules
## What Constitutes an Entity
An economic entity is a distinct concept, actor, mechanism, or institution
that plays a functional role in Adam Smith's economic analysis. Extract
entities at the level of specificity where they carry independent meaning.
## Extraction Criteria
1. **Concepts**: Abstract economic ideas (e.g., "division of labour",
"effectual demand", "natural price"). Extract when Smith defines,
explains, or argues about the concept.
2. **Actors**: Economic agents with defined roles (e.g., "the labourer",
"the merchant", "the sovereign"). Extract when the actor performs
a distinct economic function.
3. **Mechanisms**: Processes or dynamics that produce economic effects
(e.g., "accumulation of stock", "market price adjustment",
"foreign trade"). Extract when the mechanism is described as
producing specific outcomes.
4. **Institutions**: Organised structures that shape economic behaviour
(e.g., "the corporation", "the guild", "the joint-stock company").
Extract when the institution's economic function is described.
## Granularity Rules
- Extract at the level of a single coherent concept.
- Do NOT extract synonyms as separate entities. Choose the primary term
Smith uses and record the variant terms under that entity.
- DO extract distinct aspects of a broad concept as separate entities when
Smith treats them independently (e.g., "wages of labour" and "profits
of stock" are separate from "price of commodities" even though they
compose it).
- If an entity appears across multiple chapters, extract it on first
significant appearance and note cross-references in later chapters.
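
The cross-chapter rule can be illustrated with a minimal sketch. It assumes the mentions arrive in reading order and have already been judged significant. The function name `merge_mentions` and the field names `first_appearance` and `cross_references` are hypothetical and are not defined anywhere in these guidelines.

```python
def merge_mentions(mentions):
    """Collapse repeated mentions of an entity into a single record.

    `mentions` is an iterable of (entity_name, source_chapter) pairs in
    reading order. The first chapter seen for a name becomes its first
    appearance; later chapters are recorded as cross-references.
    """
    entities = {}
    for name, chapter in mentions:
        record = entities.setdefault(
            name, {"first_appearance": chapter, "cross_references": []}
        )
        is_new_chapter = (
            chapter != record["first_appearance"]
            and chapter not in record["cross_references"]
        )
        if is_new_chapter:
            record["cross_references"].append(chapter)
    return entities


mentions = [
    ("division of labour", "Book I, Chapter 1"),
    ("division of labour", "Book I, Chapter 3"),
    ("natural price", "Book I, Chapter 7"),
    ("division of labour", "Book I, Chapter 3"),
]
merged = merge_mentions(mentions)
```

Here `merged` holds one record per entity, so a repeated mention never creates a second entity. Deciding which appearance counts as significant is left to the extractor and is not modelled.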
## Naming Conventions
- Use Smith's own terminology where possible.
- Normalise to lowercase except for proper nouns.
- Use the most common form Smith uses (e.g., "division of labour" not
"divided labour").
## Quality Checks
- Each entity must have a definition that would be comprehensible without
reading the source chapter.
- Each entity must cite the specific book and chapter of its first
significant appearance.
- **Economic Domain** must be EXACTLY ONE of: Production, Distribution,
Exchange, Consumption, Accumulation, Regulation, or General Theory.
Do not combine multiple domains. Do not use any other value.
- **Source Chapter format**: Use `Book [Roman numeral], Chapter [number]`,
for example `Book I, Chapter 3`. Write the book number in Roman numerals
(I, II, III, IV, V) and the chapter number in Arabic numerals. Do not
include the chapter title, quotation marks, markdown formatting, or
asterisks.
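
The last two checks are mechanical enough to verify after extraction. The following is a minimal sketch of such a check, not part of any existing tooling. The names `ALLOWED_DOMAINS`, `SOURCE_CHAPTER_RE`, and `check_entity` are hypothetical, and the pattern assumes the chapter number is written in Arabic numerals, as in the example `Book I, Chapter 3`.

```python
import re

# The seven canonical Economic Domain values listed above.
ALLOWED_DOMAINS = frozenset({
    "Production", "Distribution", "Exchange", "Consumption",
    "Accumulation", "Regulation", "General Theory",
})

# "Book <I-V>, Chapter <number>" with nothing before or after it.
# Assumption: the chapter number is a positive Arabic integer.
SOURCE_CHAPTER_RE = re.compile(r"Book (I|II|III|IV|V), Chapter [1-9][0-9]*")


def check_entity(domain: str, source_chapter: str) -> list[str]:
    """Return a list of problems; an empty list means both fields pass."""
    problems = []
    if domain not in ALLOWED_DOMAINS:
        problems.append(f"invalid Economic Domain: {domain!r}")
    if SOURCE_CHAPTER_RE.fullmatch(source_chapter) is None:
        problems.append(f"invalid Source Chapter: {source_chapter!r}")
    return problems
```

A combined value such as `"Production, Exchange"` fails the domain check because it is not an exact member of the set, and a value such as `"Book 1, Chapter III"` fails the chapter check because `fullmatch` requires the whole string to fit the pattern.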