feat(sbom): scan mode, domain grouping dashboard, SBOM convention doc

- ingest_sbom.py: add --scan flag (recursive lockfile discovery) +
  --lockfile repeatable for explicit multi-file ingestion; skip
  .venv/node_modules/.git/dist/etc; Makefile gains SCAN= and REPO_PATH= vars
- sbom.md: add /domains/ fetch; domain-level summary table; per-repo
  accordion with details/summary; domain filter on package table; dual-
  licence false-positive note; +1 KPI card (Domains Covered)
- canon/standards/sbom-convention_v0.1.md: authoritative lockfile table,
  ingest workflow (single/scan/explicit), snapshot semantics, direct-vs-
  transitive caveats, licence governance + copyleft escalation, update
  cadence, multi-repo domain pattern, planned enhancements

First ingest: the-custodian — 420 pkgs (88 python + 332 node), 13 licence
groups, 1 copyleft flag (jszip dual-licensed MIT OR GPL-3.0-or-later)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-01 16:15:40 +01:00
parent e471ed2cd5
commit 1c3c6ef27d
4 changed files with 450 additions and 29 deletions


@@ -0,0 +1,253 @@
---
id: SBOM-CONV-001
type: standard
title: "SBOM Convention v0.1 — Dependency Tracking & Licence Governance"
domain: custodian
status: active
version: "0.1"
created: "2026-03-01"
updated: "2026-03-01"
---
# SBOM Convention v0.1 — Dependency Tracking & Licence Governance
## Purpose
This convention defines how every Custodian-registered project captures,
stores, and reports its software supply-chain inventory to the State Hub SBOM
store. It establishes:
- Which lockfiles are authoritative per ecosystem
- How to run SBOM ingestion (single-ecosystem and multi-ecosystem repos)
- How to keep the data current
- Licence governance rules and escalation thresholds

The State Hub SBOM store aggregates across all registered repos. The
dashboard (`/sbom`) provides domain-level and repo-level drill-down.

---
## 1. Authoritative Lockfiles per Ecosystem
| Ecosystem | Authoritative file | Notes |
|-----------|-------------------|-------|
| Python | `uv.lock` | Preferred. `requirements.txt` accepted as fallback |
| Node / npm | `package-lock.json` | Preferred. `yarn.lock` accepted |
| Rust | `Cargo.lock` | Auto-detected |
| Go | `go.sum` | *Not yet parsed — planned* |
| Java / JVM | `gradle.lockfile` / `pom.xml` | *Not yet parsed — planned* |

**Principle:** commit lockfiles to the repo. Lockfiles are the SBOM source
of truth; do not generate them at ingest time.

---
## 2. Repo Registration Prerequisite
Before SBOM data can be reported, the repo must be registered in the State Hub:
```bash
cd ~/the-custodian/state-hub
make add-repo DOMAIN=<domain-slug> SLUG=<repo-slug> NAME="<Display Name>" PATH=/absolute/path/to/repo
```
Check registered repos:
```bash
make list-repos
# or
curl -s http://127.0.0.1:8000/repos/ | python3 -m json.tool
```
---
## 3. SBOM Ingestion
### 3.1 Standard ingest (single lockfile at repo root)
```bash
cd ~/the-custodian/state-hub
make ingest-sbom REPO=<slug> REPO_PATH=/path/to/repo
```
The script auto-detects the first recognised lockfile at `REPO_PATH`.
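
A minimal sketch of root-level auto-detection, assuming the script checks filenames in the preference order of the §1 table. `LOCKFILE_PRIORITY`, `pick_lockfile`, and `detect_lockfile` are hypothetical names, not the actual internals of `ingest_sbom.py`.

```python
from pathlib import Path
from typing import Optional

# Hypothetical preference order read off the §1 table: within each ecosystem the
# preferred lockfile precedes its fallback. The real script's order may differ.
LOCKFILE_PRIORITY = [
    ("uv.lock", "python"),
    ("requirements.txt", "python"),
    ("package-lock.json", "node"),
    ("yarn.lock", "node"),
    ("Cargo.lock", "rust"),
]


def pick_lockfile(filenames: set[str]) -> Optional[tuple[str, str]]:
    """Return (filename, ecosystem) for the highest-priority lockfile present, else None."""
    for filename, ecosystem in LOCKFILE_PRIORITY:
        if filename in filenames:
            return filename, ecosystem
    return None


def detect_lockfile(repo_path: str) -> Optional[tuple[Path, str]]:
    """Inspect only the repo root (no recursion) and return the full path of the pick."""
    root = Path(repo_path)
    present = {p.name for p in root.iterdir() if p.is_file()}
    hit = pick_lockfile(present)
    return (root / hit[0], hit[1]) if hit else None
```

Under this assumed order, a root holding both `uv.lock` and `package-lock.json` yields only `uv.lock`. Whatever the real order, single-file detection picks exactly one lockfile, which is why multi-ecosystem repos should use `SCAN=1`.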
### 3.2 Multi-ecosystem repos (recommended for complex repos)
Use `SCAN=1` to walk the repo tree and combine **all** lockfiles into a single
snapshot. Directories that hold installed, generated, or VCS files rather than
the project's own lockfiles (`.venv`, `node_modules`, `.git`, `dist`, etc.) are
skipped automatically.
```bash
make ingest-sbom REPO=the-custodian SCAN=1 REPO_PATH=/home/worsch/the-custodian
```
This is the correct approach for repos that contain both a backend and a
frontend (e.g., a Python API + Node/Observable dashboard).
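
The scan amounts to a directory walk that prunes skipped folders before descending into them. A sketch with `os.walk`, assuming the recognised names are those in §1 and the skip set is exactly the four directories listed (the script's "etc." may cover more); `scan_lockfiles` is a hypothetical name.

```python
import os
from pathlib import Path

# Lockfile names recognised during the scan (assumed to match the §1 table).
LOCKFILE_NAMES = {"uv.lock", "requirements.txt", "package-lock.json", "yarn.lock", "Cargo.lock"}

# Directories never descended into; the real skip list may be longer.
SKIP_DIRS = {".venv", "node_modules", ".git", "dist"}


def scan_lockfiles(repo_path: str) -> list[Path]:
    """Recursively collect lockfiles under repo_path, skipping SKIP_DIRS subtrees."""
    found: list[Path] = []
    for dirpath, dirnames, filenames in os.walk(repo_path):
        # Assigning to dirnames[:] prunes in place, so os.walk never enters these dirs.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        for name in sorted(filenames):
            if name in LOCKFILE_NAMES:
                found.append(Path(dirpath) / name)
    return found
```

Pruning `dirnames` in place matters: filtering the results afterwards would still traverse `node_modules`, which is often the largest tree in a repo.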
### 3.3 Explicit lockfile path
```bash
make ingest-sbom REPO=<slug> LOCKFILE=/path/to/specific/uv.lock
```
Multiple lockfiles can be passed by calling the script directly with repeated
`--lockfile` flags:
```bash
cd ~/the-custodian/state-hub
.venv/bin/python scripts/ingest_sbom.py \
  --repo <slug> \
  --lockfile /path/to/uv.lock \
  --lockfile /path/to/package-lock.json
```
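
Repeatable flags such as `--lockfile` map naturally onto `argparse`'s `action="append"`. A sketch of a parser for the flags documented in §3, assuming this wiring (a required `--repo`, the defaults shown); the real script may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI mirroring the flags used in this convention."""
    parser = argparse.ArgumentParser(prog="ingest_sbom.py")
    parser.add_argument("--repo", required=True, help="State Hub repo slug")
    # action="append" gathers every --lockfile occurrence into one list.
    parser.add_argument("--lockfile", action="append", default=[], help="lockfile path (repeatable)")
    parser.add_argument("--scan", action="store_true", help="walk --repo-path for lockfiles")
    parser.add_argument("--repo-path", help="repository root for --scan or auto-detection")
    parser.add_argument("--dry-run", action="store_true", help="parse lockfiles but do not submit")
    return parser
```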
### 3.4 Dry run (inspect without submitting)
Call the script directly and add `--dry-run`; the `make ingest-sbom`
invocations above submit the snapshot to the State Hub.
```bash
cd ~/the-custodian/state-hub
.venv/bin/python scripts/ingest_sbom.py --repo <slug> --scan --repo-path /path/to/repo --dry-run
```
---
## 4. Snapshot Semantics
Each `POST /sbom/ingest/` call **replaces** the entire previous snapshot for
that repo. This means:
- Each repo has at most one snapshot: the one from its most recent ingest
- Re-running ingest is idempotent: unchanged lockfiles reproduce the same
  snapshot, and an ingest after a dependency update simply overwrites the
  stale data
- Historical snapshots are **not** retained (v0.1 scope; versioned history is
  a planned extension)

The `last_sbom_at` timestamp on the `managed_repo` record indicates when the
last ingest ran.
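
Replace-on-ingest can be modelled as a dictionary keyed by repo slug. This in-memory `SbomStore` is a hypothetical stand-in, not the State Hub's storage layer; it only shows why repeated ingests never accumulate snapshots.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class SbomStore:
    """Hypothetical in-memory model of the v0.1 snapshot semantics."""
    snapshots: dict[str, list[dict]] = field(default_factory=dict)
    last_sbom_at: dict[str, datetime] = field(default_factory=dict)

    def ingest(self, repo: str, packages: list[dict]) -> None:
        # Replace, never merge: any previous snapshot for this repo is discarded.
        self.snapshots[repo] = list(packages)
        self.last_sbom_at[repo] = datetime.now(timezone.utc)
```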

---
## 5. Direct vs Transitive Dependencies
| Source | `is_direct` | Notes |
|--------|-------------|-------|
| `package-lock.json` | Accurate — npm `indirect` flag used | Dev packages also detected via `dev` flag |
| `yarn.lock` | `false` for all (yarn.lock doesn't distinguish) | Treat output as transitive |
| `uv.lock` | `false` for all (uv.lock doesn't distinguish direct from transitive) | |
| `requirements.txt` | `true` for all (every line is a direct dep) | |
| `Cargo.lock` | `false` for all (workspace member packages not yet distinguished) | |
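
One way to derive `is_direct`, `is_dev`, and the licence from a lockfileVersion 2/3 `package-lock.json` is sketched below: direct deps are read from the root entry (key `""`), and `dev` and `license` from each package entry. This illustrates the idea; it is not necessarily how `ingest_sbom.py` reads the file, and `classify_npm_packages` is a hypothetical name.

```python
def classify_npm_packages(lock: dict) -> list[dict]:
    """Turn the "packages" map of a parsed package-lock.json (v2/v3) into SBOM rows."""
    packages = lock.get("packages", {})
    root = packages.get("", {})
    # Assumption: "direct" means declared by the root project (prod or dev).
    direct = set(root.get("dependencies", {})) | set(root.get("devDependencies", {}))
    rows = []
    for key, meta in packages.items():
        if not key:
            continue  # the "" entry describes the project itself
        # Keys look like "node_modules/a" or "node_modules/a/node_modules/b".
        name = key.rsplit("node_modules/", 1)[-1]
        rows.append({
            "name": name,
            "version": meta.get("version"),
            # Only the top-level copy can satisfy the root's declaration.
            "is_direct": key == f"node_modules/{name}" and name in direct,
            "is_dev": bool(meta.get("dev", False)),
            "license_spdx": meta.get("license"),
        })
    return rows
```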
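
**Governance implication:** `is_direct=true` entries receive stricter licence
scrutiny. Copyleft risk is reported specifically for `is_direct=true AND is_dev=false`.
Because `uv.lock`, `yarn.lock`, and `Cargo.lock` report `is_direct=false` for every
entry, copyleft packages from those sources never raise this risk flag.

---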
## 6. Licence Governance
### 6.1 Copyleft detection
The following SPDX identifier substrings trigger a copyleft flag:
`GPL`, `AGPL`, `LGPL`, `EUPL`, `CDDL`, `MPL`

A copyleft flag on a **direct prod dependency** (`is_direct=true`, `is_dev=false`)
increments the `licence_risk_count` in the State Hub summary and triggers a
warning on the SBOM dashboard.
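
The detection and counting rules of §5 and §6.1 fit in a few lines. A minimal sketch, assuming package records carry the `is_direct`, `is_dev`, and `license_spdx` fields used in this document; `is_copyleft` and `licence_risk_count` are illustrative names.

```python
from typing import Optional

# Substrings from §6.1, matched case-sensitively as written there.
COPYLEFT_MARKERS = ("GPL", "AGPL", "LGPL", "EUPL", "CDDL", "MPL")


def is_copyleft(license_spdx: Optional[str]) -> bool:
    """Conservative: any marker anywhere in the expression flags it, OR clauses included."""
    if not license_spdx:
        return False  # unknown licences never raise a flag (§6.3)
    return any(marker in license_spdx for marker in COPYLEFT_MARKERS)


def licence_risk_count(packages: list[dict]) -> int:
    """Count copyleft-flagged packages that are direct production dependencies."""
    return sum(
        1
        for p in packages
        if p.get("is_direct") and not p.get("is_dev") and is_copyleft(p.get("license_spdx"))
    )
```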
### 6.2 Dual-licensed packages
Packages with SPDX expressions like `(MIT OR GPL-3.0-or-later)` are flagged
**conservatively** — the presence of a copyleft identifier in the SPDX string
is sufficient to trigger the flag, regardless of the OR clause.

**Action required:** review flagged packages. If the non-copyleft licence is
used in practice, document this decision in a `contrib/` BR or FR artifact and
note it in the repo's CLAUDE.md.
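
For the manual review, a small helper can list the top-level `OR` alternatives of an SPDX expression so a reviewer sees at a glance whether a non-copyleft option exists. This is hypothetical tooling with simplified tokenising; it does not change how the conservative flag is raised.

```python
# Same idea as §6.1; "GPL" already covers "AGPL" and "LGPL".
COPYLEFT_MARKERS = ("GPL", "EUPL", "CDDL", "MPL")


def _closing_index(tokens: list[str]) -> int:
    """Index of the ")" matching the "(" at tokens[0], or -1 if unbalanced."""
    depth = 0
    for i, tok in enumerate(tokens):
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
            if depth == 0:
                return i
    return -1


def or_alternatives(expr: str) -> list[str]:
    """Split an SPDX expression into its top-level OR alternatives."""
    tokens = expr.replace("(", " ( ").replace(")", " ) ").split()
    # Unwrap parentheses enclosing the whole expression, e.g. "(MIT OR GPL-3.0-or-later)".
    while tokens and tokens[0] == "(" and _closing_index(tokens) == len(tokens) - 1:
        tokens = tokens[1:-1]
    alternatives, current, depth = [], [], 0
    for tok in tokens:
        depth += (tok == "(") - (tok == ")")
        if tok.upper() == "OR" and depth == 0:
            alternatives.append(" ".join(current))
            current = []
        else:
            current.append(tok)
    alternatives.append(" ".join(current))
    return alternatives


def has_non_copyleft_option(expr: str) -> bool:
    """True if some alternative has no copyleft marker: a prompt for review, not a verdict."""
    return any(not any(m in alt for m in COPYLEFT_MARKERS) for alt in or_alternatives(expr))
```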
### 6.3 Unknown licences
Packages with `license_spdx = null` are those whose lockfile did not contain
licence metadata (`uv.lock`, `yarn.lock`, `Cargo.lock`, and `requirements.txt`
do not embed licence info). These are listed in the dashboard but do not trigger
risk flags.

To resolve unknowns, consult the package's registry page (PyPI, npm, crates.io)
and either accept the unknown status or enhance the ingest script.
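
If the ingest script is enhanced, the public registries expose per-version JSON that carries the declared licence. A sketch, assuming these three endpoints and field locations; `registry_url` and `fetch_licence` are hypothetical helpers, and the lookup needs network access.

```python
import json
import urllib.request
from typing import Optional

# Public per-version JSON endpoints of each registry.
REGISTRY_URLS = {
    "python": "https://pypi.org/pypi/{name}/{version}/json",
    "node": "https://registry.npmjs.org/{name}/{version}",
    "rust": "https://crates.io/api/v1/crates/{name}/{version}",
}


def registry_url(ecosystem: str, name: str, version: str) -> str:
    return REGISTRY_URLS[ecosystem].format(name=name, version=version)


def fetch_licence(ecosystem: str, name: str, version: str) -> Optional[str]:
    """Return the declared licence string for one package version, or None."""
    req = urllib.request.Request(
        registry_url(ecosystem, name, version),
        # crates.io rejects requests that lack a User-Agent header.
        headers={"User-Agent": "sbom-licence-lookup (example)"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        data = json.load(resp)
    if ecosystem == "python":
        return data["info"].get("license") or None  # free text, not always SPDX
    if ecosystem == "node":
        lic = data.get("license")
        return lic if isinstance(lic, str) else None  # old packages may use an object
    return data["version"].get("license")
```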
### 6.4 Escalation
Per the Custodian Constitution, a copyleft direct prod dep **must be reviewed**
before the next production deployment. Record the decision via:
```
register_contribution(type="br", title="Licence review: <package>", ...)
```
or directly in `contrib/bug-reports/` using the BR template.

---
## 7. Keeping Data Current
### 7.1 When to re-run ingest
Re-run `make ingest-sbom` after any of the following:
- `uv add` / `uv remove` (Python)
- `npm install` / `npm update` (Node)
- `cargo add` / `cargo update` (Rust)
- Any lockfile regeneration
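
One way to tell whether a re-run is due is to compare lockfile modification times with the repo's `last_sbom_at` (§4). A sketch, assuming `last_sbom_at` has already been parsed into a timezone-aware `datetime`; `lockfiles_changed_since` is a hypothetical helper. Checkouts and rebases also touch mtimes, so a hit means "possibly stale", not "definitely changed".

```python
from datetime import datetime, timezone
from pathlib import Path


def lockfiles_changed_since(lockfiles: list[Path], last_sbom_at: datetime) -> list[Path]:
    """Return the lockfiles whose mtime (as UTC) is later than the last ingest."""
    stale = []
    for path in lockfiles:
        mtime = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
        if mtime > last_sbom_at:
            stale.append(path)
    return stale
```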
### 7.2 Recommended workflow integration
Add to your repo's CLAUDE.md (or developer runbook):
> After updating dependencies, run:
> ```bash
> cd ~/the-custodian/state-hub
> make ingest-sbom REPO=<your-slug> SCAN=1 REPO_PATH=<your-repo-path>
> ```
### 7.3 Verification
After ingest:
```bash
curl -s http://127.0.0.1:8000/sbom/<your-slug>/ | python3 -m json.tool | head -30
curl -s http://127.0.0.1:8000/sbom/report/licences/ | python3 -m json.tool
```
Or visit the State Hub dashboard → SBOM → By Repo to see the updated snapshot.

---
## 8. Multi-Repo Domains
When a domain has multiple repos (e.g., `api` + `frontend` + `infra`), each
repo should be registered separately and ingested separately:
```bash
make ingest-sbom REPO=myapp-api SCAN=1 REPO_PATH=/home/worsch/myapp
make ingest-sbom REPO=myapp-frontend SCAN=1 REPO_PATH=/home/worsch/myapp-frontend
```
The SBOM dashboard aggregates across all repos within a domain in the
**By Domain** table.
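
The By Domain roll-up can be pictured as a group-by over per-repo summaries. A minimal sketch, assuming each summary carries hypothetical `domain`, `packages`, and `licence_risk_count` fields. It sums counts, so a package used by two repos in one domain is counted twice; a de-duplicating variant would union package names instead.

```python
from collections import defaultdict


def summarise_by_domain(repos: list[dict]) -> dict[str, dict]:
    """Roll per-repo SBOM summaries up to one row per domain."""
    out: dict[str, dict] = defaultdict(lambda: {"repos": 0, "packages": 0, "licence_risk_count": 0})
    for r in repos:
        row = out[r["domain"]]
        row["repos"] += 1
        row["packages"] += r["packages"]
        row["licence_risk_count"] += r["licence_risk_count"]
    return dict(out)
```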

---
## 9. Current Registered Repos & Status
| Repo | Domain | Ecosystems | Last Ingest |
|------|--------|------------|-------------|
| `the-custodian` | custodian | python, node | 2026-03-01 |

*(This table is informational. The live view is at the SBOM dashboard.)*

---
## 10. Planned Enhancements
- **Go / Java parsers** — add to `ingest_sbom.py`
- **Versioned snapshots** — retain history per repo for trend analysis
- **Licence override file** — allow repos to document known-acceptable
copyleft exceptions (`.sbom-overrides.yaml`)
- **CI integration** — GitHub Actions step to run ingest on lockfile change
- **Direct-dep detection for uv.lock** — parse `pyproject.toml` `[project.dependencies]`
to mark direct deps accurately