Two root causes of metric fragmentation observed in collection checks:

1. Schema's Economic Domain used free-form examples ("labour economics,
   trade theory") which overrode the enum in extraction-rules.md, causing
   the LLM to produce multi-domain strings and non-canonical values.
   Fix: schema now specifies the exact 7-value enum with descriptions.

2. Source Chapter had no format constraint, producing 9 different formats
   for 7 chapters (full titles, mixed Roman/Arabic numerals, asterisks).
   Fix: extraction-rules now mandate "Book [Roman], Chapter [n]" exactly.

These fixes are prerequisites for clean reprocessing (S3.2 continuation).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
67 lines
2.7 KiB
Markdown
---
id: extraction-rules
name: extraction_rules
artifact_type: content
description: Guidelines for extracting economic entities from source text
version: 1.0.0
---

# Entity Extraction Rules

## What Constitutes an Entity

An economic entity is a distinct concept, actor, mechanism, or institution
that plays a functional role in Adam Smith's economic analysis. Extract
entities at the level of specificity where they carry independent meaning.

## Extraction Criteria

1. **Concepts**: Abstract economic ideas (e.g., "division of labour",
   "effectual demand", "natural price"). Extract when Smith defines,
   explains, or argues about the concept.

2. **Actors**: Economic agents with defined roles (e.g., "the labourer",
   "the merchant", "the sovereign"). Extract when the actor performs
   a distinct economic function.

3. **Mechanisms**: Processes or dynamics that produce economic effects
   (e.g., "accumulation of stock", "market price adjustment",
   "foreign trade"). Extract when the mechanism is described as
   producing specific outcomes.

4. **Institutions**: Organised structures that shape economic behaviour
   (e.g., "the corporation", "the guild", "the joint-stock company").
   Extract when the institution's economic function is described.

## Granularity Rules

- Extract at the level of a single coherent concept.
- Do NOT extract synonyms as separate entities; choose the primary term
  Smith uses and note variations.
- DO extract distinct aspects of a broad concept as separate entities when
  Smith treats them independently (e.g., "wages of labour" and "profits
  of stock" are separate from "price of commodities" even though they
  compose it).
- If an entity appears across multiple chapters, extract it on first
  significant appearance and note cross-references in later chapters.

## Naming Conventions

- Use Smith's own terminology where possible.
- Normalise to lowercase except for proper nouns.
- Use the most common form Smith uses (e.g., "division of labour" not
  "divided labour").

## Quality Checks

- Each entity must have a definition that would be comprehensible without
  reading the source chapter.
- Each entity must cite the specific book and chapter of first appearance.
- **Economic Domain** must be EXACTLY ONE of: Production, Distribution,
  Exchange, Consumption, Accumulation, Regulation, or General Theory.
  Do not combine multiple domains. Do not use any other value.
- **Source Chapter format**: Use `Book [Roman numeral], Chapter [number]`,
  for example `Book I, Chapter 3`. Do not include the chapter title,
  quotation marks, markdown formatting, or asterisks. Use Roman numerals
  for the book (I, II, III, IV, V).
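
The Economic Domain and Source Chapter rules above are strict enough to be checked mechanically after extraction. The following is a minimal sketch of such a check, not part of the pipeline itself: the function name `check_entity` and the field keys `economic_domain` and `source_chapter` are hypothetical, and it assumes each extracted entity arrives as a Python dict (Python 3.9+).

```python
import re

# The seven canonical Economic Domain values from the Quality Checks.
ECONOMIC_DOMAINS = frozenset({
    "Production", "Distribution", "Exchange", "Consumption",
    "Accumulation", "Regulation", "General Theory",
})

# "Book [Roman numeral], Chapter [number]": books I to V, Arabic chapter
# number, and nothing before or after (no title, quotes, or asterisks).
SOURCE_CHAPTER_RE = re.compile(r"Book (I|II|III|IV|V), Chapter ([1-9][0-9]*)")


def check_entity(entity: dict) -> list[str]:
    """Return the problems found in one extracted entity (empty if clean).

    The keys "economic_domain" and "source_chapter" are assumed names,
    chosen for illustration only.
    """
    problems = []

    domain = entity.get("economic_domain", "")
    if not isinstance(domain, str) or domain not in ECONOMIC_DOMAINS:
        problems.append(f"economic_domain not canonical: {domain!r}")

    chapter = entity.get("source_chapter", "")
    # fullmatch rejects any extra text around the mandated pattern.
    if not isinstance(chapter, str) or SOURCE_CHAPTER_RE.fullmatch(chapter) is None:
        problems.append(f"source_chapter badly formatted: {chapter!r}")

    return problems


if __name__ == "__main__":
    good = {"economic_domain": "Production", "source_chapter": "Book I, Chapter 3"}
    bad = {"economic_domain": "Production, Exchange", "source_chapter": "**Book 1, Chapter III**"}
    print(check_entity(good))
    print(check_entity(bad))
```

Matching the whole string rather than searching within it is deliberate: a value such as `Book IV, Chapter 7: Of Colonies` contains a valid prefix but still violates the rule against including the chapter title.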