---
id: SBOM-CONV-001
type: standard
title: "SBOM Convention v0.1 — Dependency Tracking & Licence Governance"
domain: custodian
status: active
version: "0.1"
created: "2026-03-01"
updated: "2026-03-12"
---

# SBOM Convention v0.1 — Dependency Tracking & Licence Governance

## Purpose

This convention defines how every Custodian-registered project captures,
stores, and reports its software supply-chain inventory to the State Hub SBOM
store. It establishes:

- Which lockfiles and manifests are authoritative per ecosystem
- How to run SBOM ingestion (single-ecosystem and multi-ecosystem repos)
- How to keep the data current
- Licence governance rules and escalation thresholds

The State Hub SBOM store aggregates data across all registered repos. The
dashboard (`/sbom`) provides domain-level and repo-level drill-down.

---

## 1. Capture Mechanisms

When given `--repo-path`, `ingest_sbom.py` runs all four mechanisms in a single
scan. No extra flags are needed: comprehensive detection is the default.

| Mechanism | File(s) | Ecosystem | Detection scope |
|-----------|---------|-----------|-----------------|
| **Package manager lockfiles** | `uv.lock`, `requirements.txt`, `package-lock.json`, `yarn.lock`, `Cargo.lock` | `python`, `node`, `rust` | Anywhere in tree |
| **Terraform provider lock** | `.terraform.lock.hcl` | `terraform` | Anywhere in tree |
| **Ansible Galaxy manifest** | `ansible/requirements.yml` or `ansible/requirements.yaml` | `ansible` | Under directories named `ansible/` |
| **Tool manifest** | `sbom-tools.yaml` | Set per entry (`tool`, `ansible`, `terraform`, etc.) | Repo root only |

**Go / Java parsers** (`go.sum`, `pom.xml`, `gradle.lockfile`) are *not yet
implemented*; they are planned for a future workplan.

**Principle:** commit lockfiles and `sbom-tools.yaml` to the repo. These files
are the SBOM source of truth; do not generate them at ingest time.

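The detection-scope column can be read as a per-file rule. The sketch below is illustrative only: `classify`, the mechanism labels, and the choice to accept only a *direct* parent directory named `ansible` are assumptions, not the actual logic of `ingest_sbom.py`. The tool manifest returns no ecosystem because each of its entries declares its own.

```python
from pathlib import PurePosixPath
from typing import Optional, Tuple

# Lockfile name -> ecosystem (first row of the table above).
LOCKFILES = {
    "uv.lock": "python",
    "requirements.txt": "python",
    "package-lock.json": "node",
    "yarn.lock": "node",
    "Cargo.lock": "rust",
}

ANSIBLE_MANIFESTS = {"requirements.yml", "requirements.yaml"}


def classify(rel_path: str) -> Optional[Tuple[str, Optional[str]]]:
    """Map a repo-relative path to (mechanism, ecosystem), or None if ignored."""
    path = PurePosixPath(rel_path)
    if path.name in LOCKFILES:  # lockfiles: anywhere in tree
        return ("lockfile", LOCKFILES[path.name])
    if path.name == ".terraform.lock.hcl":  # provider lock: anywhere in tree
        return ("terraform-lock", "terraform")
    if path.name in ANSIBLE_MANIFESTS and path.parent.name == "ansible":
        return ("ansible-galaxy", "ansible")  # assumption: direct parent is ansible/
    if path.parts == ("sbom-tools.yaml",):  # tool manifest: repo root only
        return ("tool-manifest", None)  # ecosystem comes from each entry
    return None
```
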

---

## 2. Repo Registration Prerequisite

Before SBOM data can be reported, the repo must be registered in the State Hub:

```bash
cd ~/the-custodian/state-hub
make add-repo DOMAIN=<domain-slug> SLUG=<repo-slug> NAME="<Display Name>" PATH=/absolute/path/to/repo
```

Check registered repos:

```bash
make list-repos
# or
curl -s http://127.0.0.1:8000/repos/ | python3 -m json.tool
```

---

## 3. SBOM Ingestion

### 3.1 Standard ingest (all mechanisms, recommended)

```bash
cd ~/the-custodian/state-hub
make ingest-sbom REPO=<slug> REPO_PATH=/path/to/repo
```

`ingest_sbom.py` runs all four mechanisms in one scan: lockfiles, Terraform
provider locks, Ansible Galaxy manifests, and `sbom-tools.yaml`. All results
are merged into a single snapshot. Non-dependency directories (`.venv`,
`node_modules`, `.git`, `dist`, etc.) are skipped automatically.

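A directory walk that honours the skip list might look like the following sketch. It assumes Python's `os.walk` and uses only the four directories named above (the real list is longer). `find_candidates` is a hypothetical helper that only collects candidate file names; applying the per-mechanism scope rules from section 1 (for example, root-only for `sbom-tools.yaml`) is left to a later step.

```python
import os
from pathlib import Path
from typing import Iterator

# Only the directories named in the text; the real skip list is longer.
SKIP_DIRS = {".venv", "node_modules", ".git", "dist"}

# File names any of the four mechanisms might parse.
CANDIDATE_NAMES = {
    "uv.lock", "requirements.txt", "package-lock.json", "yarn.lock",
    "Cargo.lock", ".terraform.lock.hcl", "requirements.yml",
    "requirements.yaml", "sbom-tools.yaml",
}


def find_candidates(repo_path: str) -> Iterator[Path]:
    """Yield repo-relative paths of candidate files, skipping non-dep dirs."""
    for dirpath, dirnames, filenames in os.walk(repo_path):
        # Pruning dirnames in place stops os.walk from descending into them.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        for name in sorted(filenames):
            if name in CANDIDATE_NAMES:
                yield Path(dirpath, name).relative_to(repo_path)
```

Pruning `dirnames` in place is what keeps large trees such as `node_modules` cheap: `os.walk` never descends into them at all.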
### 3.2 Repos with system-level tools: capture first, then ingest

For repos that use system-level tools not tracked by any lockfile (the Terraform
binary, Helm, kubectl, k3s, goss, etc.), generate and commit a tool manifest
before ingesting:

```bash
# Step 1: generate sbom-tools.yaml via the capture agent
make capture-tools REPO=<slug> REPO_PATH=/path/to/repo

# Step 2: review sbom-tools.yaml and correct any "confidence: low" entries

# Step 3: commit sbom-tools.yaml
git -C /path/to/repo add sbom-tools.yaml && git -C /path/to/repo commit -m "chore(sbom): add tool manifest"

# Step 4: ingest everything
make ingest-sbom REPO=<slug> REPO_PATH=/path/to/repo
```

### 3.3 Explicit lockfile path

```bash
make ingest-sbom REPO=<slug> LOCKFILE=/path/to/specific/uv.lock
```

To pass multiple lockfiles, call the script directly with repeated
`--lockfile` flags:

```bash
uv run python scripts/ingest_sbom.py \
  --repo <slug> \
  --lockfile /path/to/uv.lock \
  --lockfile /path/to/package-lock.json
```

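Repeatable flags like this are commonly implemented with argparse's `action="append"`. The parser below is a hypothetical reconstruction: only the flag names `--repo`, `--repo-path`, and `--lockfile` come from this document, and the real script may parse its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags used in this document."""
    parser = argparse.ArgumentParser(prog="ingest_sbom.py")
    parser.add_argument("--repo", required=True, help="registered repo slug")
    parser.add_argument("--repo-path", help="scan this tree with all four mechanisms")
    # action="append" collects every --lockfile occurrence into one list.
    parser.add_argument("--lockfile", action="append",
                        help="explicit lockfile path (repeatable)")
    return parser
```

Parsing `--repo myapp --lockfile a/uv.lock --lockfile b/package-lock.json` yields `args.lockfile == ["a/uv.lock", "b/package-lock.json"]`; when no `--lockfile` is given, the attribute is `None`.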
### 3.4 Dry run (inspect without submitting)

```bash
make ingest-sbom REPO=<slug> REPO_PATH=/path/to/repo DRY_RUN=1
```

### 3.5 sbom-tools.yaml: the tool manifest

Create `sbom-tools.yaml` at the repo root for any system-level tools not
covered by lockfiles. Schema:

```yaml
# sbom-tools.yaml
tools:
  - name: terraform
    version: "1.9.5"   # confidence: medium
    ecosystem: terraform
    license_spdx: BUSL-1.1
    is_direct: true
    is_dev: false
  - name: helm
    version: null      # confidence: low (no version pin found)
    ecosystem: tool
    license_spdx: Apache-2.0
    is_direct: true
    is_dev: false
```

**Valid ecosystem values:** `python`, `node`, `rust`, `go`, `java`, `terraform`,
`ansible`, `tool`, `other`

Annotate each version with a `# confidence: high/medium/low` comment.
Entries marked `confidence: low` need human verification before committing.

`make capture-tools` generates this file automatically using the SBOM capture
agent prompt (`state-hub/prompts/sbom-capture-agent.md`).

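Because the confidence annotation is a YAML comment, a YAML loader discards it; a line-based scan keeps it. A minimal sketch for pre-commit review, assuming the comment sits on a line belonging to the entry it annotates (as in the schema above) and that every entry starts with `- name:`. `low_confidence_tools` is a hypothetical helper, not part of the State Hub tooling.

```python
import re
from typing import List, Optional

# A new entry starts at "- name: <tool>"; optional quotes around the name are dropped.
NAME_RE = re.compile(r"^\s*-\s*name:\s*['\"]?([^'\"\s#]+)")
# The confidence annotation, e.g. "# confidence: low".
CONF_RE = re.compile(r"#\s*confidence:\s*(high|medium|low)\b")


def low_confidence_tools(manifest_text: str) -> List[str]:
    """Return names of entries annotated '# confidence: low' (need human review)."""
    flagged: List[str] = []
    current: Optional[str] = None
    for line in manifest_text.splitlines():
        name_match = NAME_RE.match(line)
        if name_match:
            current = name_match.group(1)
        conf_match = CONF_RE.search(line)
        if conf_match and conf_match.group(1) == "low" and current is not None:
            if current not in flagged:
                flagged.append(current)
    return flagged
```
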

---

## 4. Snapshot Semantics

Each `POST /sbom/ingest/` call **replaces** the entire previous snapshot for
that repo. This means:

- Each repo has at most one snapshot: the result of its most recent ingest
- Re-running ingest is idempotent: an unchanged repo yields the same snapshot,
  and a run after a dependency update simply refreshes the data
- Historical snapshots are **not** retained (v0.1 scope; versioned history is
  a planned extension)

The `last_sbom_at` timestamp on the `managed_repo` record shows when the last
ingest ran.

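The replace-on-ingest rule can be modelled as a mapping from repo slug to component list. This toy in-memory `SbomStore` is hypothetical (the real store sits behind `POST /sbom/ingest/`); it only illustrates why re-ingesting unchanged input is idempotent and why earlier snapshots do not survive. Ingesting the same components twice, in any order, leaves an identical snapshot, while a later ingest with different components overwrites it entirely.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Tuple

Component = Tuple[str, str, str]  # (ecosystem, name, version)


@dataclass
class SbomStore:
    """Toy model: one snapshot per repo, replaced wholesale on every ingest."""
    snapshots: Dict[str, List[Component]] = field(default_factory=dict)
    last_sbom_at: Dict[str, datetime] = field(default_factory=dict)

    def ingest(self, repo: str, components: List[Component]) -> None:
        # Assignment, not merge: the previous snapshot is dropped entirely.
        # Sorting a de-duplicated set makes the result independent of input order.
        self.snapshots[repo] = sorted(set(components))
        self.last_sbom_at[repo] = datetime.now(timezone.utc)
```
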

---

## 5. Direct vs Transitive Dependencies

| Source | `is_direct` | Notes |
|--------|-------------|-------|
| `package-lock.json` | Accurate (npm `indirect` flag used) | Dev packages are also detected via the `dev` flag |
| `yarn.lock` | `false` for all (yarn.lock does not distinguish) | Treat output as transitive |
| `uv.lock` | `false` for all (uv.lock does not distinguish direct from transitive) | |
| `requirements.txt` | `true` for all (every line is a direct dep) | |
| `Cargo.lock` | `false` for all (workspace member packages not yet distinguished) | |
| `sbom-tools.yaml` | As declared in each entry's `is_direct` field | |

**Governance implication:** `is_direct=true` entries receive stricter licence
scrutiny. Copyleft risk is reported specifically for `is_direct=true AND is_dev=false`.

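The `requirements.txt` rule is the simplest to illustrate. A minimal sketch, assuming pinned or bare `name[extras]==version` lines; it skips comments, blank lines, pip options, and URL/VCS requirements, and it is not the actual `ingest_sbom.py` parser. `parse_requirements` is a hypothetical name.

```python
import re
from typing import List

# Distribution name, optional [extras], optional "==version" (a PEP 508 subset).
REQ_RE = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)(?:\[[^\]]*\])?\s*(?:==\s*([^\s;#]+))?")


def parse_requirements(text: str) -> List[dict]:
    """Turn requirements.txt lines into components, all marked is_direct=True."""
    deps: List[dict] = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        # Skip blanks, pip options (-r, -e, --hash, etc.) and URL/VCS requirements.
        if not line or line.startswith("-") or "://" in line:
            continue
        match = REQ_RE.match(line)
        if match:
            deps.append({
                "name": match.group(1),
                "version": match.group(2),  # None when not pinned with ==
                "ecosystem": "python",
                "is_direct": True,  # every listed requirement is a direct dep
            })
    return deps
```
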

---

## 6. Licence Governance

### 6.1 Copyleft detection

The following SPDX identifier substrings trigger a copyleft flag:
`GPL`, `AGPL`, `LGPL`, `EUPL`, `CDDL`, `MPL`

A copyleft flag on a **direct prod dependency** (`is_direct=true`, `is_dev=false`)
increments `licence_risk_count` in the State Hub summary and triggers a
warning on the SBOM dashboard.

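The substring rule and the direct-prod filter combine as follows. A sketch assuming components are dicts with the `license_spdx`, `is_direct`, and `is_dev` fields used in this document and a case-sensitive match; `is_copyleft` and `licence_risk_count` are hypothetical names. Note that `GPL` alone already matches `AGPL` and `LGPL` identifiers.

```python
from typing import Iterable, Mapping, Optional

# SPDX identifier substrings that trigger the copyleft flag.
COPYLEFT_MARKERS = ("GPL", "AGPL", "LGPL", "EUPL", "CDDL", "MPL")


def is_copyleft(license_spdx: Optional[str]) -> bool:
    """Case-sensitive substring match; a null (unknown) licence never flags."""
    if not license_spdx:
        return False
    return any(marker in license_spdx for marker in COPYLEFT_MARKERS)


def licence_risk_count(components: Iterable[Mapping]) -> int:
    """Count copyleft-flagged direct production dependencies."""
    return sum(
        1
        for c in components
        if c.get("is_direct") and not c.get("is_dev") and is_copyleft(c.get("license_spdx"))
    )
```

For example, `is_copyleft("(MIT OR GPL-3.0-or-later)")` returns `True`, which is the conservative dual-licence behaviour described in section 6.2, and `is_copyleft(None)` returns `False`, so unknown licences never raise the count.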
### 6.2 Dual-licensed packages

Packages with SPDX expressions such as `(MIT OR GPL-3.0-or-later)` are flagged
**conservatively**: the presence of a copyleft identifier anywhere in the SPDX
string is enough to trigger the flag, regardless of the `OR` clause.

**Action required:** review flagged packages. If the non-copyleft licence is
the one used in practice, document this decision in a `contrib/` BR or FR
artifact and note it in the repo's CLAUDE.md.

### 6.3 Unknown licences

Packages with `license_spdx = null` are those whose lockfile did not contain
licence metadata (`uv.lock`, `yarn.lock`, and `Cargo.lock` do not embed licence
info). They are listed in the dashboard but do not trigger risk flags.

To resolve unknowns, consult the package's registry page (PyPI, npm, crates.io)
and either accept the unknown status or enhance the ingest script.

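Listing the registry pages to check can be automated. A sketch assuming components carry `name`, `ecosystem`, and `license_spdx` fields; the URL patterns are the public PyPI, npm, and crates.io package-page formats, and `unknown_licence_links` is a hypothetical helper.

```python
from typing import Iterable, List, Mapping

# Public package-page URL patterns for ecosystems whose lockfiles lack licence data.
REGISTRY_URLS = {
    "python": "https://pypi.org/project/{name}/",
    "node": "https://www.npmjs.com/package/{name}",
    "rust": "https://crates.io/crates/{name}",
}


def unknown_licence_links(components: Iterable[Mapping]) -> List[str]:
    """Return registry pages to check for components whose license_spdx is null."""
    links: List[str] = []
    for c in components:
        pattern = REGISTRY_URLS.get(c.get("ecosystem", ""))
        if c.get("license_spdx") is None and pattern:
            links.append(pattern.format(name=c["name"]))
    return links
```

Ecosystems without a registry mapping here (for example `tool`) are skipped and still need manual review.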
### 6.4 Escalation

Per the Custodian Constitution, a copyleft direct prod dep **must be reviewed**
before the next production deployment. Record the decision via:

```
register_contribution(type="br", title="Licence review: <package>", ...)
```

or directly in `contrib/bug-reports/` using the BR template.

---

## 7. Keeping Data Current

### 7.1 When to re-run ingest

Re-run `make ingest-sbom` after any of the following:

- `uv add` / `uv remove` (Python)
- `npm install` / `npm update` (Node)
- `cargo add` / `cargo update` (Rust)
- Any lockfile regeneration
- Any edit to `sbom-tools.yaml` or an Ansible Galaxy `requirements.yml`

### 7.2 Recommended workflow integration

Add to your repo's CLAUDE.md (or developer runbook):

> After updating dependencies, run:
>
> ```bash
> cd ~/the-custodian/state-hub
> make ingest-sbom REPO=<your-slug> REPO_PATH=<your-repo-path>
> ```

### 7.3 Verification

After ingest:

```bash
curl -s http://127.0.0.1:8000/sbom/<your-slug>/ | python3 -m json.tool | head -30
curl -s http://127.0.0.1:8000/sbom/report/licences/ | python3 -m json.tool
```

Or open the State Hub dashboard → SBOM → By Repo to see the updated snapshot.

---

## 8. Multi-Repo Domains

When a domain has multiple repos (e.g., `api` + `frontend` + `infra`), register
and ingest each repo separately:

```bash
make ingest-sbom REPO=myapp-api REPO_PATH=/home/worsch/myapp
make ingest-sbom REPO=myapp-frontend REPO_PATH=/home/worsch/myapp-frontend
```

The SBOM dashboard aggregates across all repos within a domain in the
**By Domain** table.

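With several repos in a domain, a small loop avoids typing each command. A sketch assuming the State Hub checkout at `~/the-custodian/state-hub` and using only the `REPO`, `REPO_PATH`, and `DRY_RUN` variables documented above; `ingest_commands` and `ingest_all` are hypothetical helpers. Building the argument lists separately from running them makes it easy to print or inspect the plan first.

```python
import subprocess
from pathlib import Path
from typing import Dict, List

STATE_HUB = Path.home() / "the-custodian" / "state-hub"


def ingest_commands(repos: Dict[str, str], dry_run: bool = False) -> List[List[str]]:
    """Build one `make ingest-sbom` invocation per registered repo slug."""
    commands: List[List[str]] = []
    for slug, path in repos.items():
        cmd = ["make", "ingest-sbom", f"REPO={slug}", f"REPO_PATH={path}"]
        if dry_run:
            cmd.append("DRY_RUN=1")
        commands.append(cmd)
    return commands


def ingest_all(repos: Dict[str, str], dry_run: bool = False) -> None:
    """Ingest each repo separately; check=True stops at the first failure."""
    for cmd in ingest_commands(repos, dry_run):
        subprocess.run(cmd, cwd=STATE_HUB, check=True)
```
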

---

## 9. Current Registered Repos & Status

| Repo | Domain | Ecosystems | Last Ingest |
|------|--------|------------|-------------|
| `the-custodian` | custodian | python, node | 2026-03-01 |
| `railiance-bootstrap` | railiance | — (Ansible + shell, no lockfile) | — |
| `railiance-hosts` | railiance | terraform (2 providers) | 2026-03-01 |

*(This table is informational. The live view is on the SBOM dashboard.)*

---

## 10. Planned Enhancements

- **Go / Java parsers**: add `go.sum`, `pom.xml`, `gradle.lockfile` support to `ingest_sbom.py`
- **Versioned snapshots**: retain history per repo for trend analysis
- **Licence override file**: allow repos to document known-acceptable
  copyleft exceptions (`.sbom-overrides.yaml`)
- **CI integration**: a GitHub Actions step to run ingest on lockfile change
- **Direct-dep detection for uv.lock**: parse the `dependencies` array under
  `[project]` in `pyproject.toml` to mark direct deps accurately
- **Galaxy API licence lookup**: resolve `license_spdx` for Ansible collections
  via the Galaxy API at ingest time
- **Tool version pinning guidance**: tooling to detect `confidence: low` entries
  across all registered repos and flag them for resolution
