feat(llm): add OpenAI adapter, entity archive policy, process chapters 5-7

Add OpenAIAdapter for the OpenAI chat completions API (apikey-chatgpt.txt or OPENAI_API_KEY). Set default model to arcee-ai/trinity-large-preview:free for the infospace pipeline and increase max_tokens from 4096 to 8192.

Reprocess chapter 05 with Trinity Large (was Gemini: 1 truncated entity, now 19 complete entities). Process chapters 06 (Aurora Alpha, 10 entities) and 07 (Trinity Large, 15 entities including regenerated violent-policy.md). Canonical set now at 85 unique entities.

Add entity archive policy: entities are never silently deleted. Retired entities move to output/entities/archive/ with a dated reason header. New CLI option: --archive-entity <slug> --reason "...". The --list output shows the archive count alongside the canonical set.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11 23:39:44 +01:00
parent 880c1d1374
commit 41773f1320
68 changed files with 6500 additions and 136 deletions
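
The archive policy described in the commit message could be sketched roughly as follows. This is a minimal illustration, not the code in `process_chapters.py`: the function names `archive_entity` and `count_entities`, the blockquote header layout, and the refusal to overwrite an existing archive copy are all assumptions. The commit itself only states that retired entities move to `output/entities/archive/` with a dated reason header.

```python
from datetime import date
from pathlib import Path


def archive_entity(entities_dir, slug, reason, today=None):
    """Move <entities_dir>/<slug>.md into <entities_dir>/archive/ with a dated header.

    Hypothetical helper: name, signature, and header format are illustrative only.
    """
    entities_dir = Path(entities_dir)
    source = entities_dir / f"{slug}.md"
    if not source.is_file():
        raise FileNotFoundError(f"no canonical entity named {slug!r}")
    if not reason.strip():
        raise ValueError("a reason is required; entities are never silently retired")

    archive_dir = entities_dir / "archive"
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / source.name
    if target.exists():
        raise FileExistsError(f"{target} already exists; refusing to overwrite")

    stamp = (today or date.today()).isoformat()
    # Assumed header layout: one blockquote line with the date and the reason.
    header = f"> **Archived {stamp}**: {reason.strip()}\n\n"
    target.write_text(header + source.read_text(encoding="utf-8"), encoding="utf-8")
    source.unlink()  # drop from the canonical set only after the archive copy exists
    return target


def count_entities(entities_dir):
    """Return (active, archived) counts of .md files, as a --list style summary might."""
    entities_dir = Path(entities_dir)
    archive_dir = entities_dir / "archive"
    active = len(list(entities_dir.glob("*.md")))
    archived = len(list(archive_dir.glob("*.md"))) if archive_dir.is_dir() else 0
    return active, archived
```

Writing the archive copy before unlinking the original means an interrupted run leaves a duplicate rather than a lost entity, which matches the "never silently deleted" intent.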
--- a/examples/infospace-with-history/TUTORIAL.md
+++ b/examples/infospace-with-history/TUTORIAL.md
@@ -94,6 +94,25 @@ automatically updates every chapter view that references it.
 focuses on genuinely new entities. At the file level, slug collisions
 are detected and skipped as a safety net.

+**Entity lifecycle**: Once an entity enters the canonical set, it is
+**never silently deleted**. An entity may be retired only when it has been
+subsumed by another entity, found to partially overlap with other
+entities, or otherwise determined to be redundant. Retired entities are
+**archived**: moved to `output/entities/archive/` with a dated header
+documenting the reason. This preserves the intellectual history of the
+infospace: every decision to drop an entity is deliberate and documented.
+
+```bash
+# Archive an entity that has been subsumed by another
+python process_chapters.py --archive-entity enlarged-monopoly \
+  --reason "Subsumed by monopoly-price — both describe the same market distortion"
+
+# The archived file retains its full content with an explanatory header
+cat output/entities/archive/enlarged-monopoly.md
+```
+
+The `--list` option shows both the active canonical set and the archive count.
+
 ---

 ## 3. Designing Schemas
@@ -565,13 +584,24 @@ rm -f examples/infospace-with-history/output/analyses/book-1-chapter-03-analysis
 python process_chapters.py --chapter book-1-chapter-03 --provider openrouter --no-commit
 ```

-To also re-extract specific entities, delete their canonical files first:
+**Important**: never silently delete canonical entity files. If an entity
+is no longer needed, **archive** it with a documented reason:

 ```bash
-rm -f examples/infospace-with-history/output/entities/extent-of-the-market.md
-# then re-process the chapter as above
+# Entity found to be redundant — archive it
+python process_chapters.py --archive-entity extent-of-the-market \
+  --reason "Subsumed by market-price and effectual-demand — the concept is fully covered by these two entities"
+
+# Then re-process the chapter
+python process_chapters.py --chapter book-1-chapter-03 --provider openrouter --no-commit
 ```

+If you genuinely need to re-extract an entity with different content
+(e.g., improving its definition), archive the old version first, then
+delete the archive copy only after confirming the new version is better.
+The archive in `output/entities/archive/` preserves the full intellectual
+history of the infospace — every refinement decision is traceable.
+
 ---

 ## 12. Infrastructure Issues Found and Fixed