structured logging around key workflows and docs for operational readiness
docs/operations.md
# Operational Readiness

This note captures the runtime knobs and baseline operating procedures for the
Repository Ability Registry service.

## Configuration

Configuration is read from environment variables with the `REPO_REGISTRY_`
prefix.

| Variable | Default | Purpose |
| --- | --- | --- |
| `REPO_REGISTRY_DATABASE_PATH` | `var/repo-registry.sqlite3` | SQLite database file used by the default store. |
| `REPO_REGISTRY_CHECKOUT_ROOT` | `var/checkouts` | Local checkout cache used during repository ingestion. |
| `REPO_REGISTRY_LLM_PROVIDER` | unset | Optional LLM provider name for candidate extraction. |
| `REPO_REGISTRY_LLM_MODEL` | unset | Optional model name passed to the configured LLM provider. |
| `REPO_REGISTRY_EMBEDDING_PROVIDER` | unset | Set to `hashing` to enable deterministic local hybrid search scoring. |
| `REPO_REGISTRY_LOG_LEVEL` | `INFO` | Log level for the `repo_registry.operations` structured event logger. |
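
As a rough illustration of how the table above could map onto code, the sketch below reads the six variables with their documented defaults. The `Settings` class and `load_settings` function are hypothetical names introduced here, not the service's actual API, and treating an empty string as unset is an assumption of this sketch.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

PREFIX = "REPO_REGISTRY_"


@dataclass(frozen=True)
class Settings:
    """Hypothetical container; defaults copied from the configuration table."""

    database_path: str = "var/repo-registry.sqlite3"
    checkout_root: str = "var/checkouts"
    llm_provider: Optional[str] = None
    llm_model: Optional[str] = None
    embedding_provider: Optional[str] = None
    log_level: str = "INFO"


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from REPO_REGISTRY_* variables, falling back to defaults."""
    source = os.environ if env is None else env
    defaults = Settings()

    def get(name: str, default: Optional[str]) -> Optional[str]:
        value = source.get(PREFIX + name)
        # Assumption: an empty string is treated the same as an unset variable.
        return value if value else default

    return Settings(
        database_path=get("DATABASE_PATH", defaults.database_path),
        checkout_root=get("CHECKOUT_ROOT", defaults.checkout_root),
        llm_provider=get("LLM_PROVIDER", defaults.llm_provider),
        llm_model=get("LLM_MODEL", defaults.llm_model),
        embedding_provider=get("EMBEDDING_PROVIDER", defaults.embedding_provider),
        log_level=get("LOG_LEVEL", defaults.log_level),
    )
```

Passing an explicit mapping, for example `load_settings({"REPO_REGISTRY_EMBEDDING_PROVIDER": "hashing"})`, keeps the sketch easy to exercise without touching the process environment.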

## Health Checks

`GET /health` returns service status plus the operational dependencies that can
be checked locally:

```json
{
  "status": "ok",
  "database": {
    "path": "var/repo-registry.sqlite3",
    "reachable": true,
    "error": null
  },
  "checkout_root": {
    "path": "var/checkouts",
    "exists": true
  }
}
```

`status` is `degraded` when the database cannot be initialized or queried. The
checkout root is reported as metadata because it may be created lazily by the
ingestion path.
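
The rule described above can be sketched as a small function that produces the same payload shape. `build_health` is a hypothetical name, and probing the database with `SELECT 1` is an assumption of this sketch; the real service may check reachability differently.

```python
import os
import sqlite3
from typing import Any, Dict


def build_health(database_path: str, checkout_root: str) -> Dict[str, Any]:
    """Return a health payload shaped like the JSON example above."""
    error = None
    try:
        conn = sqlite3.connect(database_path)
        try:
            # Assumption: a trivial query is enough to prove the database answers.
            conn.execute("SELECT 1").fetchone()
        finally:
            conn.close()
    except sqlite3.Error as exc:
        error = str(exc)

    return {
        # Only the database outcome drives the status value.
        "status": "ok" if error is None else "degraded",
        "database": {
            "path": database_path,
            "reachable": error is None,
            "error": error,
        },
        # Reported as metadata only: a missing checkout root does not degrade status.
        "checkout_root": {
            "path": checkout_root,
            "exists": os.path.isdir(checkout_root),
        },
    }
```

A monitoring probe would then alert on `status != "ok"` and treat `checkout_root.exists == false` as informational.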

## Structured Logs

Operational events are emitted through the `repo_registry.operations` logger as
single-line JSON messages. Current events include repository registration,
analysis start/completion/failure, LLM extraction usage/failure, and review
decisions.

Configure the Python or ASGI server logging stack to route this logger to the
same sink as application logs. `REPO_REGISTRY_LOG_LEVEL` controls the logger
level used by API-created service instances.
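
One possible way to wire this up with the standard `logging` module is sketched below. The helper names `configure_operations_logger` and `emit_event`, the `event` key, and the example field names are hypothetical; the actual event schema is defined by the service and is not shown in this note.

```python
import json
import logging
import os
import sys

LOGGER_NAME = "repo_registry.operations"


def configure_operations_logger(stream=None) -> logging.Logger:
    """Attach a plain-message handler so each record is one JSON line."""
    logger = logging.getLogger(LOGGER_NAME)

    level_name = os.environ.get("REPO_REGISTRY_LOG_LEVEL", "INFO").upper()
    level = logging.getLevelName(level_name)
    if not isinstance(level, int):
        # Unknown level names fall back to INFO (an assumption of this sketch).
        level = logging.INFO
    logger.setLevel(level)

    handler = logging.StreamHandler(stream if stream is not None else sys.stderr)
    # The message is already JSON, so the formatter emits it untouched.
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.handlers = [handler]
    logger.propagate = False
    return logger


def emit_event(logger: logging.Logger, event: str, **fields) -> None:
    """Serialize one event as single-line JSON (hypothetical schema)."""
    logger.info(json.dumps({"event": event, **fields}, sort_keys=True))
```

In a deployment that already configures logging centrally, replacing the `StreamHandler` with the application's existing handler keeps operational events in the same sink as other logs.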

## SQLite Backup And Restore

For single-node SQLite deployments, prefer the SQLite backup API so readers can
continue while the backup is created:

```bash
mkdir -p backups
sqlite3 var/repo-registry.sqlite3 ".backup 'backups/repo-registry-$(date +%F).sqlite3'"
```

For the most conservative backup window, stop writes first, run the backup, then
resume the service. Verify a backup with:

```bash
sqlite3 backups/repo-registry-YYYY-MM-DD.sqlite3 "PRAGMA integrity_check;"
```
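
Where the `sqlite3` command-line tool is not available, the same backup and verification steps can be approximated from Python, whose standard-library `sqlite3.Connection.backup` method uses SQLite's online backup API. The function name `backup_database` is hypothetical and this is a sketch, not the service's own tooling.

```python
import sqlite3
from pathlib import Path


def backup_database(source_path: str, backup_path: str) -> str:
    """Copy the live database to backup_path and return the integrity result."""
    Path(backup_path).parent.mkdir(parents=True, exist_ok=True)

    source = sqlite3.connect(source_path)
    target = sqlite3.connect(backup_path)
    try:
        # Online backup: copies pages without requiring the service to stop.
        source.backup(target)
        # Equivalent of the PRAGMA integrity_check command shown above.
        result = target.execute("PRAGMA integrity_check").fetchone()[0]
    finally:
        target.close()
        source.close()
    return result
```

A return value of `"ok"` corresponds to a clean `PRAGMA integrity_check`; any other string should be treated as a failed backup.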

To restore, stop the service, move the current database aside, copy the backup to
`REPO_REGISTRY_DATABASE_PATH`, start the service, and verify `GET /health`.
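
The file-level part of the restore procedure might look like the sketch below, which assumes the service has already been stopped. `restore_database` and the `.pre-restore` suffix are hypothetical choices; moving any `-wal` and `-shm` sidecar files aside is a precaution added by this sketch so that stale journal state is not applied to the restored file.

```python
import shutil
from pathlib import Path


def restore_database(backup_path: str, database_path: str) -> Path:
    """Move the current database aside and copy the backup into place.

    Assumes the service is stopped. Returns the path of the set-aside copy.
    """
    backup = Path(backup_path)
    target = Path(database_path)
    if not backup.is_file():
        raise FileNotFoundError(f"backup not found: {backup}")

    aside = target.with_name(target.name + ".pre-restore")
    # Move the main file plus any WAL/shared-memory sidecars out of the way.
    for suffix in ("", "-wal", "-shm"):
        current = target.with_name(target.name + suffix)
        if current.exists():
            shutil.move(str(current), str(aside.with_name(aside.name + suffix)))

    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(str(backup), str(target))
    return aside
```

After the copy, start the service and confirm that `GET /health` reports `"status": "ok"` before deleting the set-aside file.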

## PostgreSQL Migration Notes

The storage interface is intentionally kept behind `RegistryStore` so a
PostgreSQL-backed implementation can be introduced alongside SQLite before
cutover. A production migration should:

1. Add a PostgreSQL store that preserves the current repository, analysis,
   observed fact, content chunk, candidate, approved registry, and review
   decision contracts.
2. Manage schema changes with explicit migrations rather than implicit table
   creation.
3. Export from SQLite and import into PostgreSQL in a repeatable script, then
   compare repository counts, approved ability maps, search results, and recent
   review decisions.
4. Keep vector search optional. If pgvector is enabled, follow the plan in
   `docs/semantic-retrieval.md` and validate hybrid ranking before making it the
   default.
5. Take a final SQLite backup immediately before cutover and retain it until the
   PostgreSQL deployment has passed health and smoke tests.
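
The count comparison in step 3 could start from something like the sketch below, which collects per-table row counts on the SQLite side and diffs them against counts gathered from PostgreSQL. `table_counts` and `diff_counts` are hypothetical helpers, the table names in the usage note are placeholders, and the PostgreSQL counts are assumed to be collected separately with whichever driver the deployment uses.

```python
import sqlite3
from typing import Dict, Optional, Tuple


def table_counts(conn: sqlite3.Connection) -> Dict[str, int]:
    """Return row counts for every user table in a SQLite database."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        )
    ]
    # Table names come from the schema itself, so quoting them is sufficient here.
    return {
        name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        for name in names
    }


def diff_counts(
    source: Dict[str, int], target: Dict[str, int]
) -> Dict[str, Tuple[Optional[int], Optional[int]]]:
    """Map each mismatched table to (source_count, target_count)."""
    tables = sorted(set(source) | set(target))
    return {
        table: (source.get(table), target.get(table))
        for table in tables
        if source.get(table) != target.get(table)
    }
```

An empty result from `diff_counts(sqlite_counts, postgres_counts)` is only a first check; matching counts do not prove that approved ability maps, search results, or review decisions are identical, so those still need their own comparisons.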