tegwick 1c94f5545c feat(sbom): CUST-WP-0013 — expand SBOM infra to terraform, ansible, and tool manifests
- Migration d6e7f8a9b0c1: add terraform, ansible, tool to Ecosystem enum
- ingest_sbom.py: new Ansible Galaxy requirements.yml parser (collections + roles)
- ingest_sbom.py: new sbom-tools.yaml manifest parser (agent-generated tool deps)
- ingest_sbom.py: promote .terraform.lock.hcl parser from ecosystem=other → terraform
- ingest_sbom.py: detect_all() runs all four parsers in one comprehensive scan
- capture_sbom_tools.py: agent-assisted tool manifest generator (claude -p)
- prompts/sbom-capture-agent.md: parameterised prompt for repo tool discovery
- Makefile: capture-tools target; ingest-sbom updated docs and DRY_RUN support
- 29 unit tests covering all new parsers and detect_all() behaviour
- canon/standards/sbom-convention_v0.1.md: updated with four-mechanism model and workflow

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 04:40:26 +01:00


id: SBOM-CONV-001
type: standard
title: SBOM Convention v0.1 — Dependency Tracking & Licence Governance
domain: custodian
status: active
version: 0.1
created: 2026-03-01
updated: 2026-03-12

SBOM Convention v0.1 — Dependency Tracking & Licence Governance

Purpose

This convention defines how every Custodian-registered project captures, stores, and reports its software supply-chain inventory to the State Hub SBOM store. It establishes:

  • Which lockfiles are authoritative per ecosystem
  • How to run SBOM ingestion (single-ecosystem and multi-ecosystem repos)
  • How to keep the data current
  • Licence governance rules and escalation thresholds

The State Hub SBOM store aggregates across all registered repos. The dashboard (/sbom) provides domain-level and repo-level drill-down.


1. Capture Mechanisms

ingest_sbom.py runs all four mechanisms in a single scan when given --repo-path. No flags needed — comprehensive detection is the default.

| Mechanism | File(s) | Ecosystem | Detection scope |
|---|---|---|---|
| Package manager lockfiles | uv.lock, requirements.txt, package-lock.json, yarn.lock, Cargo.lock | python, node, rust | Anywhere in tree |
| Terraform provider lock | .terraform.lock.hcl | terraform | Anywhere in tree |
| Ansible Galaxy manifest | ansible/requirements.yml or ansible/requirements.yaml | ansible | Under directories named ansible/ |
| Tool manifest | sbom-tools.yaml (repo root) | tool, ansible, terraform, etc. | Repo root only |

Go / Java parsers (go.sum, pom.xml, gradle.lockfile) are not yet implemented — planned for a future workplan.
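The detection-scope column in the table above can be illustrated with a small sketch. The hypothetical classify() helper maps a repo-relative path to a capture mechanism. Several details are assumptions for illustration and may not match the actual logic in ingest_sbom.py: the function name, its return labels, and the reading of the Ansible rule as "any ancestor directory named ansible".

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical mirror of the capture-mechanism table; ingest_sbom.py may differ.
PACKAGE_LOCKFILES = {
    "uv.lock": "python",
    "requirements.txt": "python",
    "package-lock.json": "node",
    "yarn.lock": "node",
    "Cargo.lock": "rust",
}


def classify(rel_path: str) -> Optional[str]:
    """Map a repo-relative POSIX path to its capture mechanism, or None."""
    p = PurePosixPath(rel_path)
    if p.name in PACKAGE_LOCKFILES:
        return "lockfile"  # detected anywhere in the tree
    if p.name == ".terraform.lock.hcl":
        return "terraform"  # detected anywhere in the tree
    if p.name in ("requirements.yml", "requirements.yaml") and "ansible" in p.parts[:-1]:
        return "ansible"  # only below a directory named ansible/
    if p.parts == ("sbom-tools.yaml",):
        return "tool-manifest"  # repo root only
    return None
```

Here are some example results:
- classify("ansible/requirements.yml") returns "ansible".
- A bare requirements.yml at the repo root returns None.
- An sbom-tools.yaml in a subdirectory also returns None.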

Principle: commit lockfiles and sbom-tools.yaml to the repo. These are the SBOM source of truth; do not generate them at ingest time.


2. Repo Registration Prerequisite

Before SBOM data can be reported, the repo must be registered in the State Hub:

cd ~/the-custodian/state-hub
make add-repo DOMAIN=<domain-slug> SLUG=<repo-slug> NAME="<Display Name>" PATH=/absolute/path/to/repo

Check registered repos:

make list-repos
# or
curl -s http://127.0.0.1:8000/repos/ | python3 -m json.tool

3. SBOM Ingestion

3.1 Default: single comprehensive scan

cd ~/the-custodian/state-hub
make ingest-sbom REPO=<slug> REPO_PATH=/path/to/repo

ingest_sbom.py automatically runs all four mechanisms in one scan: lockfiles, Terraform provider locks, Ansible Galaxy manifests, and sbom-tools.yaml. All results are merged into a single snapshot. Non-dependency directories (.venv, node_modules, .git, dist, etc.) are skipped automatically.
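The directory skipping described above can be sketched with os.walk. In a top-down walk, editing dirnames in place prunes the subdirectories it will visit. The SKIP_DIRS set and the iter_repo_files() name are assumptions. The real skip list in ingest_sbom.py is longer than the four examples given.

```python
import os
from pathlib import Path

# Assumed skip set, taken from the examples above; the real list is longer ("etc.").
SKIP_DIRS = {".venv", "node_modules", ".git", "dist"}


def iter_repo_files(repo_root):
    """Yield repo-relative POSIX paths, never descending into skipped directories."""
    root = Path(repo_root)
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place is how os.walk (top-down) is told not to recurse.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            yield (Path(dirpath) / name).relative_to(root).as_posix()
```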

3.2 Repos with system-level tools: capture first, then ingest

For repos that use system-level tools not tracked by any lockfile (Terraform binary, Helm, kubectl, k3s, goss, etc.):

# Step 1: generate sbom-tools.yaml via agent
make capture-tools REPO=<slug> REPO_PATH=/path/to/repo

# Step 2: review sbom-tools.yaml — correct any confidence: low entries

# Step 3: commit sbom-tools.yaml
git -C /path/to/repo add sbom-tools.yaml && git -C /path/to/repo commit -m "chore(sbom): add tool manifest"

# Step 4: ingest everything
make ingest-sbom REPO=<slug> REPO_PATH=/path/to/repo

3.3 Explicit lockfile path

make ingest-sbom REPO=<slug> LOCKFILE=/path/to/specific/uv.lock

Multiple lockfiles can be passed by calling the script directly with repeated --lockfile flags:

uv run python scripts/ingest_sbom.py \
  --repo <slug> \
  --lockfile /path/to/uv.lock \
  --lockfile /path/to/package-lock.json
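Repeated --lockfile flags follow the usual argparse pattern of action="append", which collects every occurrence into one list. The parser below is a hypothetical reconstruction of that part of the CLI. It covers only --repo, --repo-path and --lockfile, all of which are named in this document; the real script may define more options.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical subset of the ingest_sbom.py CLI; the real script may differ."""
    parser = argparse.ArgumentParser(prog="ingest_sbom.py")
    parser.add_argument("--repo", required=True, help="registered repo slug")
    parser.add_argument("--repo-path", help="scan this tree with all four mechanisms")
    # action="append" turns each repeated --lockfile into one more list element.
    parser.add_argument("--lockfile", action="append", help="explicit lockfile path (repeatable)")
    return parser
```

With no --lockfile given, args.lockfile is None rather than an empty list. A caller would typically fall back to scanning --repo-path in that case.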

3.4 Dry run (inspect without submitting)

make ingest-sbom REPO=<slug> REPO_PATH=/path/to/repo DRY_RUN=1

3.5 sbom-tools.yaml: the tool manifest

Create sbom-tools.yaml at the repo root for any system-level tools not covered by lockfiles. Schema:

# sbom-tools.yaml
tools:
  - name: terraform
    version: "1.9.5"         # confidence: medium
    ecosystem: terraform
    license_spdx: BSL-1.1
    is_direct: true
    is_dev: false
  - name: helm
    version: null             # confidence: low (no version pin found)
    ecosystem: tool
    license_spdx: Apache-2.0
    is_direct: true
    is_dev: false

Valid ecosystem values: python, node, rust, go, java, terraform, ansible, tool, other
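A small check against the schema can catch typos in ecosystem values before you commit a hand-written or agent-generated manifest. This sketch takes the already-parsed YAML mapping as input. validate_tools() is a hypothetical helper, and treating all six keys as required is an assumption drawn from the example above.

```python
VALID_ECOSYSTEMS = {"python", "node", "rust", "go", "java", "terraform", "ansible", "tool", "other"}
# Treating all six schema keys as required is an assumption based on the example above.
REQUIRED_KEYS = ("name", "version", "ecosystem", "license_spdx", "is_direct", "is_dev")


def validate_tools(manifest: dict) -> list:
    """Return human-readable problems found in a parsed sbom-tools.yaml mapping."""
    problems = []
    for i, entry in enumerate(manifest.get("tools") or []):
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            problems.append(f"tools[{i}]: missing {', '.join(missing)}")
        if entry.get("ecosystem") not in VALID_ECOSYSTEMS:
            problems.append(f"tools[{i}]: invalid ecosystem {entry.get('ecosystem')!r}")
        for flag in ("is_direct", "is_dev"):
            if flag in entry and not isinstance(entry[flag], bool):
                problems.append(f"tools[{i}]: {flag} must be true or false")
    return problems
```

With PyYAML installed, validate_tools(yaml.safe_load(open("sbom-tools.yaml"))) should return an empty list for the example manifest above.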

Annotate each version with a # confidence: high/medium/low comment. Entries with confidence: low need human verification before committing.
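Standard YAML loaders such as PyYAML's safe_load discard comments, so the confidence annotations have to be read from the raw text. Below is a minimal sketch for listing the entries that still need human verification. low_confidence_lines() is a hypothetical helper that assumes the "# confidence: <level>" comment format shown above.

```python
import re
from typing import List, Tuple

# Matches the "# confidence: high|medium|low" annotation; case-insensitive by assumption.
CONFIDENCE_RE = re.compile(r"#\s*confidence:\s*(high|medium|low)\b", re.IGNORECASE)


def low_confidence_lines(text: str) -> List[Tuple[int, str]]:
    """Return (line number, stripped line) for every line annotated confidence: low."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = CONFIDENCE_RE.search(line)
        if match and match.group(1).lower() == "low":
            hits.append((lineno, line.strip()))
    return hits
```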

The make capture-tools command generates this file automatically using the SBOM capture agent prompt (state-hub/prompts/sbom-capture-agent.md).


4. Snapshot Semantics

Each POST /sbom/ingest/ call replaces the entire previous snapshot for that repo. This means:

  • Each repo has at most one snapshot: the one from its most recent ingest
  • Re-running ingest is idempotent (ingesting unchanged inputs twice yields the same snapshot), so running it after a dependency update simply refreshes the data
  • Historical snapshots are not retained (v0.1 scope; versioned history is a planned extension)

The last_sbom_at timestamp on the managed_repo record indicates when the last ingest ran.
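The replace-on-ingest rule can be modelled in a few lines. This toy SnapshotStore is illustrative only and assumes nothing about the State Hub's actual storage. It shows two things:
- Repeating an ingest with unchanged inputs leaves the stored components unchanged.
- Any package missing from a new ingest drops out entirely.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class SnapshotStore:
    """Toy in-memory model of replace-on-ingest semantics; not the State Hub code."""

    snapshots: dict = field(default_factory=dict)     # repo slug -> list of components
    last_sbom_at: dict = field(default_factory=dict)  # repo slug -> datetime of last ingest

    def ingest(self, repo: str, components: list) -> None:
        # Replace, never merge: anything absent from this ingest disappears.
        self.snapshots[repo] = list(components)
        self.last_sbom_at[repo] = datetime.now(timezone.utc)
```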


5. Direct vs Transitive Dependencies

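| Source | is_direct | Notes |
|---|---|---|
| package-lock.json | Accurate (npm indirect flag used) | Dev packages also detected via the dev flag |
| yarn.lock | false for all | yarn.lock does not distinguish direct from transitive; treat output as transitive |
| uv.lock | false for all | Direct deps not yet distinguished (see Planned Enhancements) |
| requirements.txt | true for all | Every listed line is treated as a direct dep; a compiled file (e.g. from pip-compile) also pins transitive deps, which are then marked direct too |
| Cargo.lock | false for all | Workspace member packages not yet distinguished |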

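Governance implication: is_direct=true entries receive stricter licence scrutiny. Copyleft risk is reported only for entries with is_direct=true AND is_dev=false. Because every uv.lock, yarn.lock and Cargo.lock entry is recorded as is_direct=false, copyleft packages from those sources cannot raise a risk flag under this rule until direct-dependency detection improves.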


6. Licence Governance

6.1 Copyleft detection

The following SPDX identifier substrings trigger a copyleft flag: GPL, AGPL, LGPL, EUPL, CDDL, MPL

A copyleft flag on a direct prod dependency (is_direct=true, is_dev=false) increments the licence_risk_count in the State Hub summary and triggers a warning on the SBOM dashboard.
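The flagging rule in 6.1, combined with the direct-prod condition, can be sketched as below. The function names are hypothetical. Matching is a case-sensitive substring search on the SPDX string, as described. Note that "GPL" alone already matches AGPL and LGPL, and that the dual-licensed example from 6.2 is flagged.

```python
from typing import Iterable, Optional

# Substrings from section 6.1. "GPL" on its own already covers AGPL and LGPL.
COPYLEFT_MARKERS = ("GPL", "AGPL", "LGPL", "EUPL", "CDDL", "MPL")


def is_copyleft(license_spdx: Optional[str]) -> bool:
    if not license_spdx:
        return False  # unknown licences are listed but never flagged (see 6.3)
    return any(marker in license_spdx for marker in COPYLEFT_MARKERS)


def is_licence_risk(pkg: dict) -> bool:
    """Copyleft on a direct production dependency (is_direct=true, is_dev=false)."""
    return bool(pkg.get("is_direct")) and not pkg.get("is_dev") and is_copyleft(pkg.get("license_spdx"))


def licence_risk_count(packages: Iterable[dict]) -> int:
    return sum(1 for pkg in packages if is_licence_risk(pkg))
```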

6.2 Dual-licensed packages

Packages with SPDX expressions like (MIT OR GPL-3.0-or-later) are flagged conservatively — the presence of a copyleft identifier in the SPDX string is sufficient to trigger the flag, regardless of the OR clause.

Action required: review flagged packages. If the non-copyleft licence is used in practice, document this decision in a contrib/ BR or FR artifact and note it in the repo's CLAUDE.md.
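When reviewing a flagged dual-licensed package, it helps to list the alternatives in its top-level OR clause that carry no copyleft marker. Below is a minimal sketch, assuming canonical SPDX syntax with uppercase OR and balanced parentheses. The helper names are hypothetical, and this is not a full SPDX expression parser.

```python
from typing import List

# Same substring rule as section 6.1, repeated so the sketch is self-contained.
COPYLEFT_MARKERS = ("GPL", "AGPL", "LGPL", "EUPL", "CDDL", "MPL")


def _closing_index(expr: str) -> int:
    """Index of the parenthesis that closes the one at position 0, or -1."""
    depth = 0
    for i, ch in enumerate(expr):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return i
    return -1


def split_top_level_or(expr: str) -> List[str]:
    """Split an SPDX expression on OR operators that are not inside parentheses."""
    expr = expr.strip()
    # Drop parentheses that wrap the whole expression, e.g. "(MIT OR GPL-3.0-or-later)".
    while expr.startswith("(") and _closing_index(expr) == len(expr) - 1:
        expr = expr[1:-1].strip()
    parts, depth, start, i = [], 0, 0, 0
    while i < len(expr):
        if expr[i] == "(":
            depth += 1
        elif expr[i] == ")":
            depth -= 1
        elif depth == 0 and expr.startswith(" OR ", i):
            parts.append(expr[start:i].strip())
            i += len(" OR ")
            start = i
            continue
        i += 1
    parts.append(expr[start:].strip())
    return parts


def non_copyleft_alternatives(expr: str) -> List[str]:
    """Top-level OR alternatives that contain none of the copyleft substrings."""
    return [alt for alt in split_top_level_or(expr) if not any(m in alt for m in COPYLEFT_MARKERS)]
```

A non-empty result means a permissive option exists on paper. Whether it is the licence actually used still has to be recorded in the BR or FR artifact.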

6.3 Unknown licences

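Packages with license_spdx = null are those whose source file did not contain licence metadata. uv.lock, yarn.lock and Cargo.lock do not embed licence info, and neither do requirements.txt, .terraform.lock.hcl or Ansible requirements.yml. sbom-tools.yaml entries carry a licence only when license_spdx is filled in. These packages are listed in the dashboard but do not trigger risk flags.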

To resolve unknowns, consult the package's registry page (PyPI, npm, crates.io) and either accept the unknown status or enhance the ingest script.
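One way to enhance the ingest script for Python packages is to query PyPI's JSON API at https://pypi.org/pypi/<name>/<version>/json and read the info object. The sketch below uses hypothetical function names and checks three sources in order:
1. The license_expression field (newer metadata, absent for many releases).
2. A single-line license string.
3. The first Trove licence classifier.

Only license_expression is intended to be an SPDX expression. The other two still need review before being stored as license_spdx.

```python
import json
import urllib.request
from typing import Optional


def licence_from_pypi_info(info: dict) -> Optional[str]:
    """Pick the best licence hint from a PyPI JSON API "info" object."""
    expr = info.get("license_expression")
    if expr:
        return expr  # PEP 639 field: an SPDX expression when present
    text = (info.get("license") or "").strip()
    if text and "\n" not in text:
        return text  # short free-text value; multi-line usually means a pasted licence
    classifiers = [c for c in info.get("classifiers") or [] if c.startswith("License :: ")]
    return classifiers[0] if classifiers else None


def fetch_pypi_info(name: str, version: str) -> dict:
    """Fetch the "info" object for one release (network access required)."""
    url = f"https://pypi.org/pypi/{name}/{version}/json"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)["info"]
```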

6.4 Escalation

Per the Custodian Constitution, a copyleft direct prod dep must be reviewed before the next production deployment. Record the decision via:

register_contribution(type="br", title="Licence review: <package>", ...)

or directly in contrib/bug-reports/ using the BR template.


7. Keeping Data Current

7.1 When to re-run ingest

Re-run make ingest-sbom after any of the following:

  • uv add / uv remove (Python)
  • npm install / npm update (Node)
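  • cargo add / cargo update (Rust)
  • terraform init -upgrade (Terraform provider lock)
  • Edits to ansible/requirements.yml or sbom-tools.yaml
  • Any other lockfile regeneration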
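To notice that a re-ingest is due without relying on memory, one option is to fingerprint the SBOM input files. You then compare the result with the value recorded at the last ingest. The sketch below computes such a fingerprint. Where the previous value is stored is left open, and inputs_fingerprint() is a hypothetical helper.

```python
import hashlib
from pathlib import Path


def inputs_fingerprint(repo_root, rel_paths) -> str:
    """SHA-256 over each SBOM input's repo-relative path and contents, in sorted order."""
    digest = hashlib.sha256()
    for rel in sorted(rel_paths):
        digest.update(rel.encode("utf-8") + b"\0")
        digest.update((Path(repo_root) / rel).read_bytes() + b"\0")
    return digest.hexdigest()
```

Hashing repo-relative paths rather than absolute ones keeps the fingerprint stable when the same repo is checked out in a different location.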

7.2 Runbook reminder

Add to your repo's CLAUDE.md (or developer runbook):

After updating dependencies, run:

cd ~/the-custodian/state-hub
make ingest-sbom REPO=<your-slug> REPO_PATH=<your-repo-path>

7.3 Verification

After ingest:

curl -s http://127.0.0.1:8000/sbom/<your-slug>/ | python3 -m json.tool | head -30
curl -s http://127.0.0.1:8000/sbom/report/licences/ | python3 -m json.tool

Or visit the State Hub dashboard → SBOM → By Repo to see the updated snapshot.


8. Multi-Repo Domains

When a domain has multiple repos (e.g., api + frontend + infra), each repo should be registered separately and ingested separately:

make ingest-sbom REPO=myapp-api      REPO_PATH=/home/worsch/myapp
make ingest-sbom REPO=myapp-frontend REPO_PATH=/home/worsch/myapp-frontend

The SBOM dashboard aggregates across all repos within a domain in the By Domain table.


9. Current Registered Repos & Status

| Repo | Domain | Ecosystems | Last ingest |
|---|---|---|---|
| the-custodian | custodian | python, node | 2026-03-01 |
| railiance-bootstrap | railiance | none (Ansible + shell, no lockfile) | |
| railiance-hosts | railiance | terraform (2 providers) | 2026-03-01 |

(This table is informational. The live view is at the SBOM dashboard.)


10. Planned Enhancements

  • Go / Java parsers — add go.sum, pom.xml, gradle.lockfile support to ingest_sbom.py
  • Versioned snapshots — retain history per repo for trend analysis
  • Licence override file — allow repos to document known-acceptable copyleft exceptions (.sbom-overrides.yaml)
  • CI integration — GitHub Actions step to run ingest on lockfile change
  • Direct-dep detection for uv.lock — parse pyproject.toml [project.dependencies] to mark direct deps accurately
  • Galaxy API licence lookup — resolve license_spdx for Ansible collections via the Galaxy API at ingest time
  • Tool version pinning guidance — tooling to detect confidence: low entries across all registered repos and flag them for resolution