feat(llm): add OpenAI adapter, entity archive policy, process chapters 5-7

Add OpenAIAdapter for the OpenAI chat completions API (apikey-chatgpt.txt
or OPENAI_API_KEY). Set default model to arcee-ai/trinity-large-preview:free
for the infospace pipeline and increase max_tokens from 4096 to 8192.

Reprocess chapter 05 with Trinity Large (was Gemini: 1 truncated entity,
now 19 complete entities). Process chapters 06 (Aurora Alpha, 10 entities)
and 07 (Trinity Large, 15 entities including regenerated violent-policy.md).
Canonical set now at 85 unique entities.

Add entity archive policy: entities are never silently deleted. Retired
entities move to output/entities/archive/ with a dated reason header.
New CLI option: --archive-entity <slug> --reason "...". The --list
output shows the archive count alongside the canonical set.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 23:39:44 +01:00
parent 880c1d1374
commit 41773f1320
68 changed files with 6500 additions and 136 deletions

@@ -94,6 +94,25 @@ automatically updates every chapter view that references it.
focuses on genuinely new entities. At the file level, slug collisions
are detected and skipped as a safety net.
**Entity lifecycle**: Once an entity enters the canonical set, it is
**never silently deleted**. Entities may only be retired when they have been
subsumed by another entity, found to partially map onto other entities, or
otherwise determined to be redundant. Retired entities are **archived**:
moved to `output/entities/archive/` with a dated header documenting the
reason. This preserves the intellectual history of the infospace: every
decision to drop an entity is a deliberate, documented learning.
```bash
# Archive an entity that has been subsumed by another
python process_chapters.py --archive-entity enlarged-monopoly \
--reason "Subsumed by monopoly-price — both describe the same market distortion"
# The archived file retains its full content with an explanatory header
cat output/entities/archive/enlarged-monopoly.md
```
The `--list` option shows the archive count alongside the active canonical set.
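As a rough illustration of the policy, the sketch below moves an entity file into `archive/` and prepends a dated reason. It is not the code in `process_chapters.py`: the function names `archive_entity` and `count_entities` and the HTML-comment header format are assumptions made for this example.

```python
from datetime import date
from pathlib import Path
from typing import Optional, Tuple


def archive_entity(entities_dir: Path, slug: str, reason: str,
                   today: Optional[date] = None) -> Path:
    """Move <slug>.md into entities_dir/archive/ with a dated reason header.

    Hypothetical helper: the header layout below is an assumption,
    not the format the real CLI writes.
    """
    source = entities_dir / f"{slug}.md"
    if not source.is_file():
        raise FileNotFoundError(f"no canonical entity named {slug!r}")

    archive_dir = entities_dir / "archive"
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / source.name
    if target.exists():
        # Refuse to overwrite: an earlier archived version is history too.
        raise FileExistsError(f"{target} is already archived")

    stamp = (today or date.today()).isoformat()
    header = f"<!-- ARCHIVED {stamp}: {reason} -->\n\n"
    # Write the archive copy first, then remove the canonical file,
    # so the content is never lost if the process stops in between.
    target.write_text(header + source.read_text(encoding="utf-8"),
                      encoding="utf-8")
    source.unlink()
    return target


def count_entities(entities_dir: Path) -> Tuple[int, int]:
    """Return (active, archived) counts; glob("*.md") is non-recursive."""
    active = sum(1 for _ in entities_dir.glob("*.md"))
    archive_dir = entities_dir / "archive"
    archived = sum(1 for _ in archive_dir.glob("*.md")) if archive_dir.is_dir() else 0
    return active, archived
```

Writing the archive copy before unlinking the original means an interrupted run can leave a duplicate but never a silent deletion.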
---
## 3. Designing Schemas
@@ -565,13 +584,24 @@ rm -f examples/infospace-with-history/output/analyses/book-1-chapter-03-analysis
python process_chapters.py --chapter book-1-chapter-03 --provider openrouter --no-commit
```
-To also re-extract specific entities, delete their canonical files first:
**Important**: never silently delete canonical entity files. If an entity
is no longer needed, **archive** it with a documented reason:
```bash
-rm -f examples/infospace-with-history/output/entities/extent-of-the-market.md
-# then re-process the chapter as above
# Entity found to be redundant — archive it
python process_chapters.py --archive-entity extent-of-the-market \
--reason "Subsumed by market-price and effectual-demand — the concept is fully covered by these two entities"
# Then re-process the chapter
python process_chapters.py --chapter book-1-chapter-03 --provider openrouter --no-commit
```
If you genuinely need to re-extract an entity with different content
(e.g., improving its definition), archive the old version first, then
delete the archive copy only after confirming the new version is better.
The archive in `output/entities/archive/` preserves the full intellectual
history of the infospace — every refinement decision is traceable.
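One way to confirm that a re-extracted entity improves on the archived one is to diff the two files before removing the archive copy. The sketch below uses the standard-library `difflib`. The helper name `diff_against_archive` is hypothetical, and the directory layout is assumed to match the paths above.

```python
import difflib
from pathlib import Path


def diff_against_archive(entities_dir: Path, slug: str) -> str:
    """Return a unified diff from the archived version to the current one.

    Hypothetical helper for manual review; returns an empty string
    when the two files are identical.
    """
    archived = entities_dir / "archive" / f"{slug}.md"
    current = entities_dir / f"{slug}.md"
    old_lines = archived.read_text(encoding="utf-8").splitlines(keepends=True)
    new_lines = current.read_text(encoding="utf-8").splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile=str(archived),
            tofile=str(current),
        )
    )
```

If the archived file begins with a reason header, that header shows up in the diff as removed lines. This is expected and can be ignored during review.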
---
## 12. Infrastructure Issues Found and Fixed