fix(example): standardise domain enum and source chapter format in schema/rules

Two root causes of metric fragmentation observed in collection checks:

1. The schema's Economic Domain section gave free-form examples ("labour
   economics, trade theory") that overrode the enum in extraction-rules.md,
   causing the LLM to produce multi-domain strings and non-canonical values.
   Fix: the schema now specifies the exact 7-value enum with descriptions.

2. Source Chapter had no format constraint, producing 9 different formats
   for 7 chapters (full titles, mixed Roman/Arabic numerals, asterisks).
   Fix: extraction-rules now mandate "Book [Roman], Chapter [n]" exactly.

These fixes are prerequisites for clean reprocessing (S3.2 continuation).
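
A minimal sketch of how the Source Chapter format could be checked before reprocessing. It assumes that "[Roman]" means an uppercase Roman numeral and "[n]" means an Arabic number; the pattern and the function name `is_canonical_source_chapter` are hypothetical and not part of this commit.

```python
import re

# Assumed reading of the mandated format:
#   "Book <uppercase Roman numeral>, Chapter <Arabic number>"
# Both the pattern and the helper below are illustrative only.
SOURCE_CHAPTER_RE = re.compile(r"Book [IVXLCDM]+, Chapter [1-9][0-9]*")


def is_canonical_source_chapter(value: str) -> bool:
    """Return True only if the entire string matches the assumed format."""
    return SOURCE_CHAPTER_RE.fullmatch(value) is not None


if __name__ == "__main__":
    # Made-up inputs covering the kinds of variation named in the message:
    # Arabic book numbers, Roman chapter numbers, asterisks, full titles.
    for candidate in [
        "Book I, Chapter 1",
        "Book 1, Chapter 1",
        "Book I, Chapter IV",
        "**Book I, Chapter 1**",
        "Book I, Chapter 1: Some Title",
    ]:
        print(candidate, is_canonical_source_chapter(candidate))
```

Using `fullmatch` rather than `search` means trailing titles or surrounding asterisks are rejected instead of silently accepted.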

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-19 13:01:09 +01:00
parent 715ef19d1c
commit 77dd3fee6d
2 changed files with 19 additions and 4 deletions

@@ -16,8 +16,18 @@ The broader context in which this entity appears within the source text.
 Describe the argument or passage where the entity is discussed.
 
 ### Economic Domain
 
-The area of economics this entity belongs to (e.g., labour economics,
-trade theory, market theory, institutional economics).
+The area of economics this entity belongs to. Use **exactly one** value
+from this list:
+
+- **Production** — labour, manufacturing, technology, productivity
+- **Distribution** — wages, profit, rent, income shares
+- **Exchange** — markets, prices, trade, money, barter
+- **Consumption** — demand, utility, wants, expenditure
+- **Accumulation** — capital, savings, stock, investment
+- **Regulation** — policy, law, institutions, monopoly, government
+- **General Theory** — foundational principles spanning multiple domains
+
+Do not combine multiple values. Do not use any other domain name.
 
 ## Optional Sections
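
The "exactly one value" rule in the schema change above could be enforced on extracted output with a simple membership check. This is a sketch only: the seven names are taken from the schema text, while `ECONOMIC_DOMAINS` and `is_canonical_domain` are hypothetical names, and exact, case-sensitive matching is an assumption.

```python
# The seven canonical values listed in the schema's Economic Domain section.
ECONOMIC_DOMAINS = frozenset({
    "Production",
    "Distribution",
    "Exchange",
    "Consumption",
    "Accumulation",
    "Regulation",
    "General Theory",
})


def is_canonical_domain(value: str) -> bool:
    """Return True only for a single, exactly spelled canonical domain.

    Assumption: matching is case-sensitive and no whitespace is trimmed,
    so combined strings such as "Production, Exchange" are rejected.
    """
    return value in ECONOMIC_DOMAINS


if __name__ == "__main__":
    for candidate in ["Exchange", "Production, Exchange", "labour economics"]:
        print(candidate, is_canonical_domain(candidate))
```

A check like this would flag both failure modes described in the commit message: multi-domain strings and non-canonical values.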