docs(infospace): document infospace.db and add to .gitignore

The SQLite artifact database is a derived cache regenerable from
committed files — no LLM calls needed. Added tutorial section
explaining why it is excluded and how to rebuild it after a fresh clone.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 22:27:08 +01:00
parent 60f33443ae
commit 2f0989f9bf
2 changed files with 55 additions and 7 deletions

.gitignore

@@ -78,6 +78,7 @@ Thumbs.db
# MarkiTect database files (local development)
markitect.db
**/infospace.db
assets/assets.db
**/assets.db
.markitect/


@@ -43,6 +43,7 @@ examples/infospace-with-history/
├── TUTORIAL.md # This file
├── INFRA-TASKS.md # Infrastructure issues found during the experiment
├── process_chapters.py # Pipeline script
├── infospace.db # SQLite artifact database (generated, not in git)
├── schemas/ # Output structure definitions
│ ├── economic-entity-schema-v1.0.md
@@ -369,7 +370,53 @@ python process_chapters.py --stats
---
-## 7. How the LLM Integration Works
+## 7. The Artifact Database (`infospace.db`)
The pipeline stores all artifacts (source text, templates, guidelines, generated
outputs) and their dependency edges in a local SQLite database —
`infospace.db`. This file is **not checked into git** because it is a derived
cache that can be regenerated deterministically from the files already in the
repository.
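To make the idea of "artifacts plus dependency edges" concrete, here is a minimal sketch of what such a database could look like. The table names, column names, and helper functions below are assumptions for illustration only; the actual layout of `infospace.db` may differ.

```python
import sqlite3

# Hypothetical schema: the real infospace.db layout may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS artifacts (
    id      TEXT PRIMARY KEY,   -- e.g. a relative file path
    kind    TEXT NOT NULL,      -- 'source', 'template', 'guideline', 'output'
    content TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS edges (
    output_id TEXT NOT NULL REFERENCES artifacts(id),
    input_id  TEXT NOT NULL REFERENCES artifacts(id),
    PRIMARY KEY (output_id, input_id)
);
"""


def open_db(path=":memory:"):
    """Open (or create) a database and make sure both tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_artifact(conn, artifact_id, kind, content):
    """Insert an artifact, replacing any earlier row with the same id."""
    conn.execute(
        "INSERT OR REPLACE INTO artifacts VALUES (?, ?, ?)",
        (artifact_id, kind, content),
    )


def add_edge(conn, output_id, input_id):
    """Record that output_id was derived from input_id."""
    conn.execute(
        "INSERT OR IGNORE INTO edges VALUES (?, ?)", (output_id, input_id)
    )


def inputs_of(conn, output_id):
    """Return the ids of all artifacts a given output depends on, sorted."""
    rows = conn.execute(
        "SELECT input_id FROM edges WHERE output_id = ? ORDER BY input_id",
        (output_id,),
    )
    return [row[0] for row in rows]
```

With a layout like this, asking which sources and templates fed a given output becomes a single query on the `edges` table.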
### Why it is excluded
- **Binary format:** SQLite databases don't produce meaningful diffs and
would bloat the git history with every pipeline run.
- **Fully derived:** every piece of data in the database originates from
markdown files that *are* tracked in git (sources, templates, schemas,
guidelines, and generated output).
- **Reproducible:** re-running the pipeline rebuilds the database from
scratch. For chapters whose output files already exist on disk, this needs
no LLM calls, because each stage checks for existing output before
invoking the LLM.
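The "check for existing output before invoking the LLM" behaviour can be sketched as follows. This is not the code in `process_chapters.py`; the function name `get_or_generate` and the `generate` callback are hypothetical stand-ins for a pipeline stage and its LLM call.

```python
from pathlib import Path
from typing import Callable


def get_or_generate(
    output_path: Path, generate: Callable[[], str]
) -> tuple[str, bool]:
    """Return (text, llm_called) for one pipeline stage.

    If the output file already exists it is reused and `generate`
    (the hypothetical LLM call) is never invoked.
    """
    if output_path.exists():
        return output_path.read_text(encoding="utf-8"), False

    # No cached output on disk: call the generator and persist its result
    # so that the next run can skip this step.
    text = generate()
    output_path.parent.mkdir(parents=True, exist_ok=True)
    output_path.write_text(text, encoding="utf-8")
    return text, True
```

Because the committed markdown files act as the cache, a second run over the same chapter returns the stored text and reports that no generation took place.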
### How to regenerate it
If `infospace.db` is missing (e.g. after a fresh clone), rebuild it by
re-running the pipeline over the chapters that already have output on disk:
```bash
# Regenerate the database from existing output files (no LLM calls needed):
python process_chapters.py --all --no-commit
```
This will:
1. Create a fresh `infospace.db`
2. Load all static artifacts (templates, guidelines, VSM reference)
3. For each chapter whose output files already exist, import them into the
database and record dependency edges
4. Skip LLM calls entirely — existing files are detected and reused
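The steps above can be approximated with a short script. This is a simplified sketch under stated assumptions, not the actual implementation: `rebuild_db`, the two-table schema, and the rule that every output depends on every static artifact are all invented for illustration.

```python
import sqlite3
from pathlib import Path


def rebuild_db(db_path: Path, static_dir: Path, output_dir: Path) -> int:
    """Rebuild a cache database from markdown files already on disk.

    Returns the number of output files imported. No generation step is
    involved: only existing files are read.
    """
    # Step 1: start from a fresh database file.
    if db_path.exists():
        db_path.unlink()
    conn = sqlite3.connect(str(db_path))
    conn.executescript(
        """
        CREATE TABLE artifacts (id TEXT PRIMARY KEY, kind TEXT, content TEXT);
        CREATE TABLE edges (output_id TEXT, input_id TEXT);
        """
    )

    # Step 2: load static artifacts (templates, guidelines, ...).
    static_ids = []
    for f in sorted(static_dir.rglob("*.md")):
        conn.execute(
            "INSERT INTO artifacts VALUES (?, 'static', ?)",
            (f.as_posix(), f.read_text(encoding="utf-8")),
        )
        static_ids.append(f.as_posix())

    # Step 3: import existing outputs and record their dependency edges.
    # Simplifying assumption: each output depends on every static artifact.
    imported = 0
    for f in sorted(output_dir.rglob("*.md")):
        conn.execute(
            "INSERT INTO artifacts VALUES (?, 'output', ?)",
            (f.as_posix(), f.read_text(encoding="utf-8")),
        )
        conn.executemany(
            "INSERT INTO edges VALUES (?, ?)",
            [(f.as_posix(), sid) for sid in static_ids],
        )
        imported += 1

    conn.commit()
    conn.close()
    return imported
```

Running the function twice yields the same database contents, which is the property that makes it safe to leave the file out of version control.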
After regeneration, `--list` and `--stats` work as normal:
```bash
python process_chapters.py --list
python process_chapters.py --stats
```
---
## 8. How the LLM Integration Works
The pipeline uses MarkiTect's `markitect.llm` module, which provides three
adapter backends that implement the `LLMAdapter` interface:
@@ -423,7 +470,7 @@ supports `gemini-2.5-flash` with generous rate limits.
---
-## 8. Tracking History with Git
+## 9. Tracking History with Git
Every processed chapter produces a git commit containing:
@@ -459,7 +506,7 @@ git commit -m "infospace: process book-1-chapter-05"
---
-## 9. Cost and Performance
+## 10. Cost and Performance
From our measurements processing chapters 3-5:
@@ -486,7 +533,7 @@ To reduce costs further, use a cheaper model:
---
-## 10. Completing the Remaining Chapters
+## 11. Completing the Remaining Chapters
As of now, 5 of 35 chapters are processed (Book I, Chapters 1-5). Here is
how to complete the rest.
@@ -555,7 +602,7 @@ fill the remaining gaps in S3*, S5, and regulatory concepts.
---
-## 11. Quality Improvement Loop
+## 12. Quality Improvement Loop
The infospace is designed to be **iteratively refined**:
@@ -604,7 +651,7 @@ history of the infospace — every refinement decision is traceable.
---
-## 12. Infrastructure Issues Found and Fixed
+## 13. Infrastructure Issues Found and Fixed
During development we documented three issues with the MarkiTect
infrastructure in `INFRA-TASKS.md`:
@@ -624,7 +671,7 @@ See `INFRA-TASKS.md` for details on each fix.
---
-## 13. Adapting This Pattern to Your Own Project
+## 14. Adapting This Pattern to Your Own Project
To build your own infospace using this pattern: