docs(infospace): document infospace.db and add to .gitignore
The SQLite artifact database is a derived cache that can be regenerated from committed files without any LLM calls. Added a tutorial section explaining why it is excluded and how to rebuild it after a fresh clone.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
.gitignore
@@ -78,6 +78,7 @@ Thumbs.db

# MarkiTect database files (local development)
markitect.db
**/infospace.db
assets/assets.db
**/assets.db
.markitect/
@@ -43,6 +43,7 @@ examples/infospace-with-history/
├── TUTORIAL.md # This file
├── INFRA-TASKS.md # Infrastructure issues found during the experiment
├── process_chapters.py # Pipeline script
├── infospace.db # SQLite artifact database (generated, not in git)
│
├── schemas/ # Output structure definitions
│ ├── economic-entity-schema-v1.0.md
@@ -369,7 +370,53 @@ python process_chapters.py --stats

---

## 7. How the LLM Integration Works
## 7. The Artifact Database (`infospace.db`)

The pipeline stores all artifacts (source text, templates, guidelines, generated
outputs) and their dependency edges in a local SQLite database,
`infospace.db`. This file is **not checked into git** because it is a derived
cache that can be regenerated deterministically from the files already in the
repository.

### Why it is excluded

- **Binary format**: SQLite databases don't produce meaningful diffs and
  would bloat the git history with every pipeline run.
- **Fully derived**: every piece of data in the database originates from
  markdown files that *are* tracked in git (sources, templates, schemas,
  guidelines, and generated output).
- **Reproducible**: re-running the pipeline rebuilds the database from
  scratch without LLM calls for any chapter whose output files are already
  on disk, because each stage checks for existing output before invoking the LLM.
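The "Reproducible" point depends on a skip-if-exists check in each stage. Below is a minimal sketch of that pattern. The `run_stage` helper and the injected `call_llm` callable are hypothetical and do not come from `process_chapters.py`:

```python
from pathlib import Path
from typing import Callable


def run_stage(output_path: Path, prompt: str, call_llm: Callable[[str], str]) -> str:
    """Return the stage's output, calling the LLM only when no output file exists.

    Hypothetical helper illustrating the skip-if-exists check; not the real
    pipeline code.
    """
    if output_path.exists():
        # Output already on disk (e.g. committed to git): reuse it, no LLM call.
        return output_path.read_text(encoding="utf-8")
    # No output yet: this is the only branch that spends tokens.
    text = call_llm(prompt)
    output_path.parent.mkdir(parents=True, exist_ok=True)
    output_path.write_text(text, encoding="utf-8")
    return text
```

In this sketch, a missing database costs nothing to rebuild. Only a missing output file triggers a new LLM call, and only one call for that stage.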

### How to regenerate it

If `infospace.db` is missing (e.g., after a fresh clone), rebuild it by
re-running the pipeline over the chapters that already have output on disk:

```bash
# Regenerate the database from existing output files (no LLM calls needed):
python process_chapters.py --all --no-commit
```

This will:

1. Create a fresh `infospace.db`
2. Load all static artifacts (templates, guidelines, VSM reference)
3. For each chapter whose output files already exist, import them into the
   database and record dependency edges
4. Skip LLM calls entirely: existing files are detected and reused
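The four regeneration steps can be sketched with Python's standard `sqlite3` module. Everything in this sketch is a hypothetical illustration, not MarkiTect's actual schema or edge logic. That includes the `rebuild` function, the two-table layout, the `*.md` globs, and the one-directory-per-chapter output layout. It also includes the rule that links each output to every static artifact.

```python
import sqlite3
from pathlib import Path
from typing import Sequence

# Hypothetical schema: table names and columns are assumptions for illustration.
SCHEMA = """
CREATE TABLE artifacts (id TEXT PRIMARY KEY, kind TEXT NOT NULL, body TEXT NOT NULL);
CREATE TABLE edges (src TEXT NOT NULL, dst TEXT NOT NULL, PRIMARY KEY (src, dst));
"""


def rebuild(db_path: Path, static_dirs: Sequence[Path], output_root: Path) -> sqlite3.Connection:
    # Step 1: start from a fresh database file.
    db_path.unlink(missing_ok=True)
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)

    # Step 2: load static artifacts (templates, guidelines, reference docs).
    static_ids = []
    for directory in static_dirs:
        for f in sorted(directory.glob("*.md")):
            conn.execute(
                "INSERT INTO artifacts VALUES (?, 'static', ?)",
                (str(f), f.read_text(encoding="utf-8")),
            )
            static_ids.append(str(f))

    # Step 3: import outputs only for chapters that already have files on disk.
    for chapter_dir in sorted(p for p in output_root.iterdir() if p.is_dir()):
        for f in sorted(chapter_dir.glob("*.md")):
            conn.execute(
                "INSERT INTO artifacts VALUES (?, 'output', ?)",
                (str(f), f.read_text(encoding="utf-8")),
            )
            # Assumed edge rule: every output depends on every static artifact.
            conn.executemany(
                "INSERT INTO edges VALUES (?, ?)",
                [(str(f), s) for s in static_ids],
            )

    # Step 4: no LLM client appears anywhere above; existing files are only read.
    conn.commit()
    return conn
```

Deleting the file first keeps each rebuild idempotent. Running it twice over the same files yields the same rows, which is what makes the database safe to leave out of git.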

After regeneration, `--list` and `--stats` work as normal:

```bash
python process_chapters.py --list
python process_chapters.py --stats
```
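A regenerated file can also be spot-checked with only Python's standard library, without relying on the pipeline's flags. The `table_counts` helper below is hypothetical. It lists every table through SQLite's built-in `sqlite_master` catalog and counts the rows in each, so it assumes nothing about the schema:

```python
import sqlite3
from typing import Dict


def table_counts(db_path: str) -> Dict[str, int]:
    """Return {table_name: row_count} for every user table in an SQLite file."""
    # Open read-only via a URI so a mistyped path raises an error instead of
    # silently creating a new, empty database.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
            )
        ]
        counts = {}
        for name in names:
            # Quote the identifier, doubling any embedded double quotes.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        conn.close()
```

For example, you could run `table_counts("infospace.db")` from the example directory after the regeneration command. You would expect it to report non-empty tables for the chapters whose output is on disk.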

---

## 8. How the LLM Integration Works

The pipeline uses MarkiTect's `markitect.llm` module, which provides three
adapter backends that implement the `LLMAdapter` interface:
@@ -423,7 +470,7 @@ supports `gemini-2.5-flash` with generous rate limits.

---

## 8. Tracking History with Git
## 9. Tracking History with Git

Every processed chapter produces a git commit containing:

@@ -459,7 +506,7 @@ git commit -m "infospace: process book-1-chapter-05"

---

## 9. Cost and Performance
## 10. Cost and Performance

From our measurements processing chapters 3-5:

@@ -486,7 +533,7 @@ To reduce costs further, use a cheaper model:

---

## 10. Completing the Remaining Chapters
## 11. Completing the Remaining Chapters

As of now, 5 of 35 chapters are processed (Book I, Chapters 1-5). Here is
how to complete the rest.
@@ -555,7 +602,7 @@ fill the remaining gaps in S3*, S5, and regulatory concepts.

---

## 11. Quality Improvement Loop
## 12. Quality Improvement Loop

The infospace is designed to be **iteratively refined**:

@@ -604,7 +651,7 @@ history of the infospace — every refinement decision is traceable.

---

## 12. Infrastructure Issues Found and Fixed
## 13. Infrastructure Issues Found and Fixed

During development we documented three issues with the MarkiTect
infrastructure in `INFRA-TASKS.md`:
@@ -624,7 +671,7 @@ See `INFRA-TASKS.md` for details on each fix.

---

## 13. Adapting This Pattern to Your Own Project
## 14. Adapting This Pattern to Your Own Project

To build your own infospace using this pattern: