docs(infospace): document infospace.db and add to .gitignore

The SQLite artifact database is a derived cache regenerable from committed files — no LLM calls needed. Added tutorial section explaining why it is excluded and how to rebuild it after a fresh clone.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
.gitignore
@@ -78,6 +78,7 @@ Thumbs.db
 
 # MarkiTect database files (local development)
 markitect.db
+**/infospace.db
 assets/assets.db
 **/assets.db
 .markitect/
@@ -43,6 +43,7 @@ examples/infospace-with-history/
 ├── TUTORIAL.md # This file
 ├── INFRA-TASKS.md # Infrastructure issues found during the experiment
 ├── process_chapters.py # Pipeline script
+├── infospace.db # SQLite artifact database (generated, not in git)
 │
 ├── schemas/ # Output structure definitions
 │ ├── economic-entity-schema-v1.0.md
@@ -369,7 +370,53 @@ python process_chapters.py --stats
 
 ---
 
-## 7. How the LLM Integration Works
+## 7. The Artifact Database (`infospace.db`)
 
+The pipeline stores all artifacts (source text, templates, guidelines, generated
+outputs) and their dependency edges in a local SQLite database —
+`infospace.db`. This file is **not checked into git** because it is a derived
+cache that can be regenerated deterministically from the files already in the
+repository.
+
+### Why it is excluded
+
+- **Binary format** — SQLite databases don't produce meaningful diffs and
+would bloat the git history with every pipeline run.
+- **Fully derived** — every piece of data in the database originates from
+markdown files that *are* tracked in git (sources, templates, schemas,
+guidelines, and generated output).
+- **Reproducible** — re-running the pipeline rebuilds the database from
+scratch without any LLM calls, because each stage checks for existing
+output files on disk before invoking the LLM.
+
+### How to regenerate it
+
+If `infospace.db` is missing (e.g. after a fresh clone), rebuild it by
+re-running the pipeline over the chapters that already have output on disk:
+
+```bash
+# Regenerate the database from existing output files (no LLM calls needed):
+python process_chapters.py --all --no-commit
+```
+
+This will:
+
+1. Create a fresh `infospace.db`
+2. Load all static artifacts (templates, guidelines, VSM reference)
+3. For each chapter whose output files already exist, import them into the
+database and record dependency edges
+4. Skip LLM calls entirely — existing files are detected and reused
+
+After regeneration, `--list` and `--stats` work as normal:
+
+```bash
+python process_chapters.py --list
+python process_chapters.py --stats
+```
+
+---
+
+## 8. How the LLM Integration Works
+
 The pipeline uses MarkiTect's `markitect.llm` module, which provides three
 adapter backends that implement the `LLMAdapter` interface:
@@ -423,7 +470,7 @@ supports `gemini-2.5-flash` with generous rate limits.
 
 ---
 
-## 8. Tracking History with Git
+## 9. Tracking History with Git
 
 Every processed chapter produces a git commit containing:
 
@@ -459,7 +506,7 @@ git commit -m "infospace: process book-1-chapter-05"
 
 ---
 
-## 9. Cost and Performance
+## 10. Cost and Performance
 
 From our measurements processing chapters 3-5:
 
@@ -486,7 +533,7 @@ To reduce costs further, use a cheaper model:
 
 ---
 
-## 10. Completing the Remaining Chapters
+## 11. Completing the Remaining Chapters
 
 As of now, 5 of 35 chapters are processed (Book I, Chapters 1-5). Here is
 how to complete the rest.
@@ -555,7 +602,7 @@ fill the remaining gaps in S3*, S5, and regulatory concepts.
 
 ---
 
-## 11. Quality Improvement Loop
+## 12. Quality Improvement Loop
 
 The infospace is designed to be **iteratively refined**:
 
@@ -604,7 +651,7 @@ history of the infospace — every refinement decision is traceable.
 
 ---
 
-## 12. Infrastructure Issues Found and Fixed
+## 13. Infrastructure Issues Found and Fixed
 
 During development we documented three issues with the MarkiTect
 infrastructure in `INFRA-TASKS.md`:
@@ -624,7 +671,7 @@ See `INFRA-TASKS.md` for details on each fix.
 
 ---
 
-## 13. Adapting This Pattern to Your Own Project
+## 14. Adapting This Pattern to Your Own Project
 
 To build your own infospace using this pattern:
 
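
Note on the "Reproducible" claim in the new tutorial section: the idea that a stage checks for existing output files before invoking the LLM can be illustrated with a minimal sketch. This is not the code in `process_chapters.py`. The function name `rebuild_cache`, the `artifacts` and `edges` tables, the directory layout, and the `call_llm` callback are all hypothetical and chosen only to show the pattern.

```python
import hashlib
import sqlite3
from pathlib import Path
from typing import Callable


def rebuild_cache(
    source_dir: Path,
    output_dir: Path,
    db_path: Path,
    call_llm: Callable[[str], str],
) -> int:
    """Rebuild a derived SQLite cache; return the number of LLM calls made.

    Hypothetical layout: each source_dir/<name>.md has a generated
    counterpart output_dir/<name>.md. The two directories must differ.
    """
    db_path.unlink(missing_ok=True)  # always start from a fresh database
    output_dir.mkdir(parents=True, exist_ok=True)
    llm_calls = 0
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE artifacts "
            "(path TEXT PRIMARY KEY, sha256 TEXT, content TEXT)"
        )
        conn.execute("CREATE TABLE edges (src TEXT, dst TEXT)")
        for source in sorted(source_dir.glob("*.md")):
            output = output_dir / source.name
            if not output.exists():
                # Reached only when the generated file is missing on disk.
                generated = call_llm(source.read_text(encoding="utf-8"))
                output.write_text(generated, encoding="utf-8")
                llm_calls += 1
            for artifact in (source, output):
                text = artifact.read_text(encoding="utf-8")
                digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
                conn.execute(
                    "INSERT INTO artifacts VALUES (?, ?, ?)",
                    (str(artifact), digest, text),
                )
            # Dependency edge: the output was derived from the source.
            conn.execute(
                "INSERT INTO edges VALUES (?, ?)", (str(source), str(output))
            )
        conn.commit()
    finally:
        conn.close()
    return llm_calls
```

Under these assumptions, a run over a tree where every output file is already committed returns 0, which is the property that makes the database safe to leave out of git.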