docs(infospace): document infospace.db and add to .gitignore

The SQLite artifact database is a derived cache that can be regenerated from committed files; no LLM calls are needed. Added a tutorial section explaining why it is excluded and how to rebuild it after a fresh clone.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 22:27:08 +01:00
parent 60f33443ae
commit 2f0989f9bf
2 changed files with 55 additions and 7 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -78,6 +78,7 @@ Thumbs.db

 # MarkiTect database files (local development)
 markitect.db
+**/infospace.db
 assets/assets.db
 **/assets.db
 .markitect/
--- a/examples/infospace-with-history/TUTORIAL.md
+++ b/examples/infospace-with-history/TUTORIAL.md
@@ -43,6 +43,7 @@ examples/infospace-with-history/
 ├── TUTORIAL.md                 # This file
 ├── INFRA-TASKS.md              # Infrastructure issues found during the experiment
 ├── process_chapters.py         # Pipeline script
+├── infospace.db                # SQLite artifact database (generated, not in git)
 │
 ├── schemas/                    # Output structure definitions
 │   ├── economic-entity-schema-v1.0.md
@@ -369,7 +370,53 @@ python process_chapters.py --stats

 ---

-## 7. How the LLM Integration Works
+## 7. The Artifact Database (`infospace.db`)
+
+The pipeline stores all artifacts (source text, templates, guidelines, generated
+outputs) and their dependency edges in a local SQLite database —
+`infospace.db`. This file is **not checked into git** because it is a derived
+cache that can be regenerated deterministically from the files already in the
+repository.
+
+### Why it is excluded
+
+- **Binary format** — SQLite databases don't produce meaningful diffs and
+  would bloat the git history with every pipeline run.
+- **Fully derived** — every piece of data in the database originates from
+  markdown files that *are* tracked in git (sources, templates, schemas,
+  guidelines, and generated output).
+- **Reproducible** — re-running the pipeline rebuilds the database from
+  scratch without any LLM calls, because each stage checks for existing
+  output files on disk before invoking the LLM.
+
+### How to regenerate it
+
+If `infospace.db` is missing (e.g. after a fresh clone), rebuild it by
+re-running the pipeline over the chapters that already have output on disk:
+
+```bash
+# Regenerate the database from existing output files (no LLM calls needed):
+python process_chapters.py --all --no-commit
+```
+
+This will:
+
+1. Create a fresh `infospace.db`
+2. Load all static artifacts (templates, guidelines, VSM reference)
+3. For each chapter whose output files already exist, import them into the
+   database and record dependency edges
+4. Skip LLM calls entirely — existing files are detected and reused
+
+After regeneration, `--list` and `--stats` work as normal:
+
+```bash
+python process_chapters.py --list
+python process_chapters.py --stats
+```
+
+---
+
+## 8. How the LLM Integration Works

 The pipeline uses MarkiTect's `markitect.llm` module, which provides three
 adapter backends that implement the `LLMAdapter` interface:
@@ -423,7 +470,7 @@ supports `gemini-2.5-flash` with generous rate limits.

 ---

-## 8. Tracking History with Git
+## 9. Tracking History with Git

 Every processed chapter produces a git commit containing:

@@ -459,7 +506,7 @@ git commit -m "infospace: process book-1-chapter-05"

 ---

-## 9. Cost and Performance
+## 10. Cost and Performance

 From our measurements processing chapters 3-5:

@@ -486,7 +533,7 @@ To reduce costs further, use a cheaper model:

 ---

-## 10. Completing the Remaining Chapters
+## 11. Completing the Remaining Chapters

 As of now, 5 of 35 chapters are processed (Book I, Chapters 1-5). Here is
 how to complete the rest.
@@ -555,7 +602,7 @@ fill the remaining gaps in S3*, S5, and regulatory concepts.

 ---

-## 11. Quality Improvement Loop
+## 12. Quality Improvement Loop

 The infospace is designed to be **iteratively refined**:

@@ -604,7 +651,7 @@ history of the infospace — every refinement decision is traceable.

 ---

-## 12. Infrastructure Issues Found and Fixed
+## 13. Infrastructure Issues Found and Fixed

 During development we documented three issues with the MarkiTect
 infrastructure in `INFRA-TASKS.md`:
@@ -624,7 +671,7 @@ See `INFRA-TASKS.md` for details on each fix.

 ---

-## 13. Adapting This Pattern to Your Own Project
+## 14. Adapting This Pattern to Your Own Project

 To build your own infospace using this pattern: