feat(llm): add OpenAI adapter, entity archive policy, process chapters 5-7
Add OpenAIAdapter for the OpenAI chat completions API (apikey-chatgpt.txt
or OPENAI_API_KEY). Set default model to arcee-ai/trinity-large-preview:free
for the infospace pipeline and increase max_tokens from 4096 to 8192.

Reprocess chapter 05 with Trinity Large (was Gemini: 1 truncated entity,
now 19 complete entities). Process chapters 06 (Aurora Alpha, 10 entities)
and 07 (Trinity Large, 15 entities including regenerated violent-policy.md).
Canonical set now at 85 unique entities.

Add entity archive policy: entities are never silently deleted. Retired
entities move to output/entities/archive/ with a dated reason header.
New CLI option: --archive-entity <slug> --reason "...". The --list output
shows the archive count alongside the canonical set.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
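The archive policy in this commit message could be sketched roughly as follows. This is only an illustration, not the code in process_chapters.py: the helper names `archive_entity` and `count_entities`, the HTML-comment header layout, and the refusal to overwrite an existing archive entry are all assumptions. Only the `archive/` subdirectory and the dated reason header come from the message itself.

```python
from datetime import date
from pathlib import Path


def archive_entity(entities_dir: Path, slug: str, reason: str) -> Path:
    """Move <slug>.md into entities_dir/archive/ under a dated reason header.

    Hypothetical helper; the real --archive-entity option may work differently.
    """
    source = entities_dir / f"{slug}.md"
    if not source.is_file():
        raise FileNotFoundError(f"No canonical entity file for slug: {slug}")

    archive_dir = entities_dir / "archive"
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / source.name
    if target.exists():
        # Assumed behaviour: never overwrite an earlier archive entry.
        raise FileExistsError(f"Archive already contains: {target.name}")

    # Assumed header layout: an HTML comment carrying the date and the reason.
    header = f"<!-- ARCHIVED {date.today().isoformat()}: {reason} -->\n\n"
    body = source.read_text(encoding="utf-8")
    target.write_text(header + body, encoding="utf-8")

    # Remove the canonical file only after the archive copy has been written.
    source.unlink()
    return target


def count_entities(entities_dir: Path) -> tuple[int, int]:
    """Return (active, archived) counts, as a --list style summary might."""
    archive_dir = entities_dir / "archive"
    # glob("*.md") is not recursive, so archived files are not counted as active.
    active = len(list(entities_dir.glob("*.md")))
    archived = len(list(archive_dir.glob("*.md"))) if archive_dir.is_dir() else 0
    return active, archived
```

Under these assumptions, `archive_entity(Path("output/entities"), "enlarged-monopoly", "Subsumed by monopoly-price")` would leave the entity's full text in `output/entities/archive/enlarged-monopoly.md` beneath a one-line header, and `count_entities` would then report one fewer active entity and one more archived.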
@@ -94,6 +94,25 @@ automatically updates every chapter view that references it.
 focuses on genuinely new entities. At the file level, slug collisions
 are detected and skipped as a safety net.

+**Entity lifecycle**: Once an entity enters the canonical set, it is
+**never silently deleted**. Entities may only be retired when they have been
+subsumed by another entity, found to partially map onto other entities, or
+otherwise determined to be redundant. Retired entities are **archived** —
+moved to `output/entities/archive/` with a dated header documenting the
+reason. This preserves the intellectual history of the infospace: every
+decision to drop an entity is a deliberate, documented learning.
+
+```bash
+# Archive an entity that has been subsumed by another
+python process_chapters.py --archive-entity enlarged-monopoly \
+    --reason "Subsumed by monopoly-price — both describe the same market distortion"
+
+# The archived file retains its full content with an explanatory header
+cat output/entities/archive/enlarged-monopoly.md
+```
+
+The `--list` command shows both the active canonical set and the archive count.
+
 ---

 ## 3. Designing Schemas
@@ -565,13 +584,24 @@ rm -f examples/infospace-with-history/output/analyses/book-1-chapter-03-analysis
 python process_chapters.py --chapter book-1-chapter-03 --provider openrouter --no-commit
 ```

-To also re-extract specific entities, delete their canonical files first:
+**Important**: never silently delete canonical entity files. If an entity
+is no longer needed, **archive** it with a documented reason:

 ```bash
-rm -f examples/infospace-with-history/output/entities/extent-of-the-market.md
-# then re-process the chapter as above
+# Entity found to be redundant — archive it
+python process_chapters.py --archive-entity extent-of-the-market \
+    --reason "Subsumed by market-price and effectual-demand — the concept is fully covered by these two entities"
+
+# Then re-process the chapter
+python process_chapters.py --chapter book-1-chapter-03 --provider openrouter --no-commit
 ```
+
+If you genuinely need to re-extract an entity with different content
+(e.g., improving its definition), archive the old version first, then
+delete the archive copy only after confirming the new version is better.
+The archive in `output/entities/archive/` preserves the full intellectual
+history of the infospace — every refinement decision is traceable.

 ---

 ## 12. Infrastructure Issues Found and Fixed