markitect-main/examples/infospace-with-history/artifacts/guidelines/extraction-rules.md
tegwick 8095a1da4c fix(example): standardise domain enum and source chapter format in schema/rules
Two root causes of metric fragmentation observed in collection checks:

1. Schema's Economic Domain used free-form examples ("labour economics,
   trade theory") which overrode the enum in extraction-rules.md, causing
   the LLM to produce multi-domain strings and non-canonical values.
   Fix: schema now specifies the exact 7-value enum with descriptions.

2. Source Chapter had no format constraint, producing 9 different formats
   for 7 chapters (full titles, mixed Roman/Arabic numerals, asterisks).
   Fix: extraction-rules now mandate "Book [Roman], Chapter [n]" exactly.

These fixes are prerequisites for clean reprocessing (S3.2 continuation).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-19 13:01:09 +01:00

2.7 KiB

id: extraction-rules
name: extraction_rules
artifact_type: content
description: Guidelines for extracting economic entities from source text
version: 1.0.0

Entity Extraction Rules

What Constitutes an Entity

An economic entity is a distinct concept, actor, mechanism, or institution that plays a functional role in Adam Smith's economic analysis. Extract entities at the level of specificity where they carry independent meaning.

Extraction Criteria

  1. Concepts: Abstract economic ideas (e.g., "division of labour", "effectual demand", "natural price"). Extract when Smith defines, explains, or argues about the concept.

  2. Actors: Economic agents with defined roles (e.g., "the labourer", "the merchant", "the sovereign"). Extract when the actor performs a distinct economic function.

  3. Mechanisms: Processes or dynamics that produce economic effects (e.g., "accumulation of stock", "market price adjustment", "foreign trade"). Extract when the mechanism is described as producing specific outcomes.

  4. Institutions: Organised structures that shape economic behaviour (e.g., "the corporation", "the guild", "the joint-stock company"). Extract when the institution's economic function is described.

Granularity Rules

  • Extract at the level of a single coherent concept.
  • Do NOT extract synonyms as separate entities — choose the primary term Smith uses and note variations.
  • DO extract distinct aspects of a broad concept as separate entities when Smith treats them independently (e.g., "wages of labour" and "profits of stock" are separate from "price of commodities" even though they compose it).
  • If an entity appears across multiple chapters, extract it on first significant appearance and note cross-references in later chapters.
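The synonym and cross-reference rules above can be sketched as a small record-keeping routine. This is only an illustration: the `Entity` fields, the `record_mention` helper, and the synonym map are hypothetical names, and the sketch assumes mentions arrive in reading order and that the caller has already judged each one significant.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entity:
    name: str                     # primary term Smith uses
    first_appearance: str         # e.g. "Book I, Chapter 1"
    variations: List[str] = field(default_factory=list)
    cross_references: List[str] = field(default_factory=list)


def record_mention(entities: Dict[str, Entity], synonyms: Dict[str, str],
                   term: str, chapter: str) -> Entity:
    """Record one mention, folding synonyms into the primary entity."""
    # Map a synonym to its primary term; unknown terms are their own primary.
    primary = synonyms.get(term, term)
    entity = entities.get(primary)
    if entity is None:
        # First appearance: create the entity here.
        entity = Entity(name=primary, first_appearance=chapter)
        entities[primary] = entity
    elif chapter != entity.first_appearance and chapter not in entity.cross_references:
        # Later chapter: note a cross-reference instead of a new entity.
        entity.cross_references.append(chapter)
    if term != primary and term not in entity.variations:
        entity.variations.append(term)
    return entity


entities: Dict[str, Entity] = {}
synonyms = {"divided labour": "division of labour"}  # hypothetical mapping
record_mention(entities, synonyms, "division of labour", "Book I, Chapter 1")
record_mention(entities, synonyms, "divided labour", "Book I, Chapter 2")
```

After these two calls there is still a single entity, with "divided labour" recorded as a variation and Book I, Chapter 2 as a cross-reference.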

Naming Conventions

  • Use Smith's own terminology where possible.
  • Normalise to lowercase except for proper nouns.
  • Use the most common form Smith uses (e.g., "division of labour" not "divided labour").
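A minimal sketch of the lowercase rule follows. It assumes proper nouns are supplied as an explicit list of single words; the `PROPER_NOUNS` set and the `normalise_name` helper are hypothetical, and multi-word proper nouns would need extra handling.

```python
# Hypothetical list of proper nouns whose capitalisation should be kept.
PROPER_NOUNS = {"England", "Scotland", "Europe", "China"}


def normalise_name(name: str, proper_nouns=PROPER_NOUNS) -> str:
    """Lowercase every word except those listed as proper nouns."""
    # Case-insensitive lookup back to the canonical capitalised form.
    lookup = {p.lower(): p for p in proper_nouns}
    return " ".join(lookup.get(w.lower(), w.lower()) for w in name.split())


print(normalise_name("Division of Labour"))
print(normalise_name("wool trade of england"))
```

Choosing the most common form of a term is a judgment about the source text and is not covered by this sketch.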

Quality Checks

  • Each entity must have a definition that would be comprehensible without reading the source chapter.
  • Each entity must cite the specific book and chapter of first appearance.
  • Economic Domain must be EXACTLY ONE of: Production, Distribution, Exchange, Consumption, Accumulation, Regulation, or General Theory. Do not combine multiple domains. Do not use any other value.
  • Source Chapter format: use exactly Book [Roman numeral], Chapter [Arabic number], for example Book I, Chapter 3. Use Roman numerals for the book (I, II, III, IV, V). Do not include the chapter title, quotation marks, markdown formatting, or asterisks.
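The last two checks are mechanical enough to verify automatically. The sketch below is one possible validator, not part of the project: the `check_entity` function and its argument names are hypothetical, and the pattern assumes the chapter number is written in Arabic digits, as in the example Book I, Chapter 3.

```python
import re
from typing import List

# The seven permitted Economic Domain values listed in the rules.
ALLOWED_DOMAINS = {
    "Production", "Distribution", "Exchange", "Consumption",
    "Accumulation", "Regulation", "General Theory",
}

# Roman numeral I-V for the book; Arabic digits for the chapter (assumption).
SOURCE_CHAPTER_RE = re.compile(r"Book (I|II|III|IV|V), Chapter [1-9][0-9]*")


def check_entity(economic_domain: str, source_chapter: str) -> List[str]:
    """Return a list of problems; an empty list means both fields pass."""
    problems = []
    if economic_domain not in ALLOWED_DOMAINS:
        problems.append(f"invalid Economic Domain: {economic_domain!r}")
    # fullmatch rejects extra text such as titles, quotes, or asterisks.
    if SOURCE_CHAPTER_RE.fullmatch(source_chapter) is None:
        problems.append(f"invalid Source Chapter: {source_chapter!r}")
    return problems


print(check_entity("Production", "Book I, Chapter 3"))
print(check_entity("Production, Exchange", "**Book 1, Chapter III**"))
```

The first call returns an empty list; the second reports both fields, since a combined domain string and a decorated or mixed-numeral chapter reference each fail an exact match.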