Two root causes of metric fragmentation observed in collection checks:
1. Schema's Economic Domain used free-form examples ("labour economics,
trade theory") which overrode the enum in extraction-rules.md, causing
the LLM to produce multi-domain strings and non-canonical values.
Fix: schema now specifies the exact 7-value enum with descriptions.
2. Source Chapter had no format constraint, producing 9 different formats
for 7 chapters (full titles, mixed Roman/Arabic numerals, asterisks).
Fix: extraction-rules now mandate "Book [Roman], Chapter [n]" exactly.
These fixes are prerequisites for clean reprocessing (S3.2 continuation).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
| id | name | artifact_type | description | version |
|---|---|---|---|---|
| extraction-rules | extraction_rules | content | Guidelines for extracting economic entities from source text | 1.0.0 |
Entity Extraction Rules
What Constitutes an Entity
An economic entity is a distinct concept, actor, mechanism, or institution that plays a functional role in Adam Smith's economic analysis. Extract entities at the level of specificity where they carry independent meaning.
Extraction Criteria
- Concepts: Abstract economic ideas (e.g., "division of labour", "effectual demand", "natural price"). Extract when Smith defines, explains, or argues about the concept.
- Actors: Economic agents with defined roles (e.g., "the labourer", "the merchant", "the sovereign"). Extract when the actor performs a distinct economic function.
- Mechanisms: Processes or dynamics that produce economic effects (e.g., "accumulation of stock", "market price adjustment", "foreign trade"). Extract when the mechanism is described as producing specific outcomes.
- Institutions: Organised structures that shape economic behaviour (e.g., "the corporation", "the guild", "the joint-stock company"). Extract when the institution's economic function is described.
Granularity Rules
- Extract at the level of a single coherent concept.
- Do NOT extract synonyms as separate entities — choose the primary term Smith uses and note variations.
- DO extract distinct aspects of a broad concept as separate entities when Smith treats them independently (e.g., "wages of labour" and "profits of stock" are separate from "price of commodities" even though they compose it).
- If an entity appears across multiple chapters, extract it on first significant appearance and note cross-references in later chapters.
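
A minimal sketch of how the synonym and cross-reference rules could be tracked in code. The `EntityRecord` type, its field names, and the `record_mention` helper are hypothetical and are not part of the extraction pipeline. It assumes mentions arrive in reading order and that the caller has already mapped each surface form to Smith's primary term.

```python
from dataclasses import dataclass, field


@dataclass
class EntityRecord:
    """One entity, keyed by the primary term (hypothetical structure)."""
    name: str
    first_appearance: str
    variations: set[str] = field(default_factory=set)
    cross_references: list[str] = field(default_factory=list)


def record_mention(registry, primary_term, chapter, surface_form=None):
    """Register a mention without creating a second entity for a synonym.

    Assumes mentions are processed in reading order, so the first call for
    a term fixes its chapter of first appearance.
    """
    record = registry.get(primary_term)
    if record is None:
        # First significant appearance: create the entity here.
        record = EntityRecord(name=primary_term, first_appearance=chapter)
        registry[primary_term] = record
    elif chapter != record.first_appearance and chapter not in record.cross_references:
        # Later chapters are noted as cross-references, not new entities.
        record.cross_references.append(chapter)
    if surface_form and surface_form != primary_term:
        # Variant wording is noted on the existing entity.
        record.variations.add(surface_form)
    return record


registry = {}
record_mention(registry, "division of labour", "Book I, Chapter 1")
record_mention(registry, "division of labour", "Book I, Chapter 3",
               surface_form="divided labour")
```

After the two calls, `registry` holds a single record whose later chapter is stored as a cross-reference and whose variant wording is stored under `variations`.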
Naming Conventions
- Use Smith's own terminology where possible.
- Normalise to lowercase except for proper nouns.
- Use the most common form Smith uses (e.g., "division of labour" not "divided labour").
Quality Checks
- Each entity must have a definition that would be comprehensible without reading the source chapter.
- Each entity must cite the specific book and chapter of first appearance.
- Economic Domain must be EXACTLY ONE of: Production, Distribution, Exchange, Consumption, Accumulation, Regulation, or General Theory. Do not combine multiple domains. Do not use any other value.
- Source Chapter format: Use `Book [Roman numeral], Chapter [number]`, for example `Book I, Chapter 3`. Do not include the chapter title, quotation marks, markdown formatting, or asterisks. Use Roman numerals for the book (I, II, III, IV, V).
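
The last two checks are mechanical enough to verify after extraction. The following is a sketch only: the dictionary keys `economic_domain` and `source_chapter` and the function names are assumptions for illustration, not the schema's actual field names.

```python
import re

# The seven permitted Economic Domain values, as listed in the check above.
ECONOMIC_DOMAINS = frozenset({
    "Production", "Distribution", "Exchange", "Consumption",
    "Accumulation", "Regulation", "General Theory",
})

# "Book [Roman numeral], Chapter [number]", restricted to Books I to V.
SOURCE_CHAPTER_RE = re.compile(r"Book (I|II|III|IV|V), Chapter [1-9][0-9]*")


def is_valid_domain(value):
    """True only for exactly one canonical domain, with no combinations."""
    return isinstance(value, str) and value in ECONOMIC_DOMAINS


def is_valid_source_chapter(value):
    """True only when the whole string matches the mandated format."""
    return isinstance(value, str) and SOURCE_CHAPTER_RE.fullmatch(value) is not None


def check_entity(entity):
    """Return a list of problems for one extracted entity (hypothetical keys)."""
    problems = []
    domain = entity.get("economic_domain")
    chapter = entity.get("source_chapter")
    if not is_valid_domain(domain):
        problems.append(f"non-canonical Economic Domain: {domain!r}")
    if not is_valid_source_chapter(chapter):
        problems.append(f"malformed Source Chapter: {chapter!r}")
    return problems
```

Using `fullmatch` rather than `search` means a value with a trailing chapter title or surrounding asterisks is rejected instead of partially accepted.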