- Migration d6e7f8a9b0c1: add terraform, ansible, tool to Ecosystem enum
- ingest_sbom.py: new Ansible Galaxy requirements.yml parser (collections + roles)
- ingest_sbom.py: new sbom-tools.yaml manifest parser (agent-generated tool deps)
- ingest_sbom.py: promote .terraform.lock.hcl parser from ecosystem=other → terraform
- ingest_sbom.py: detect_all() runs all four parsers in one comprehensive scan
- capture_sbom_tools.py: agent-assisted tool manifest generator (claude -p)
- prompts/sbom-capture-agent.md: parameterised prompt for repo tool discovery
- Makefile: capture-tools target; ingest-sbom updated docs and DRY_RUN support
- 29 unit tests covering all new parsers and detect_all() behaviour
- canon/standards/sbom-convention_v0.1.md: updated with four-mechanism model and workflow

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---
id: CUST-WP-0013
type: workplan
title: "SBOM Infrastructure Expansion"
domain: custodian
repo: the-custodian
status: completed
owner: custodian
topic_slug: custodian
state_hub_workstream_id: f4ba84c8-4d47-492d-b65e-73b157271a2b
created: "2026-03-12"
updated: "2026-03-12"
---

# CUST-WP-0013 — SBOM Infrastructure Expansion

**Scope:** Extend SBOM capture beyond Python packages to cover Terraform providers,
Ansible Galaxy collections, and system-level tools (Ansible, Terraform, Helm, k3s,
cloud-init, etc.). The workplan introduces an agent-assisted tool manifest capture
workflow, new ecosystem enum values, and comprehensive auto-detection in
`ingest_sbom.py`, and it delivers full SBOM coverage for `railiance-infra` and
`railiance-cluster`.

**Drives:** Licence risk visibility across the full dependency graph, not just
language-level packages.

---

## Design Decisions

### Tool manifest: agent-generated, not hand-written

System tools (Ansible, Terraform, Helm, k3s, etc.) live outside any lockfile —
they are provisioned, not installed by a package manager. Rather than asking
operators to maintain a hand-written manifest, the SBOM capture agent inspects
the repo and generates/updates `sbom-tools.yaml` automatically.

The agent prompt (`state-hub/prompts/sbom-capture-agent.md`) is parameterised
per repo. It reads the repo's CLAUDE.md, Makefile, README, CI configs, version
pins, and provisioning files, then emits a structured `sbom-tools.yaml` with
tool name, version, ecosystem, SPDX licence, and directness flags.

A thin wrapper script (`state-hub/scripts/capture_sbom_tools.py`) invokes the
agent prompt via `claude -p` (or prints it for manual use) and writes the result
to `<repo-root>/sbom-tools.yaml`.

### Comprehensive ingest: all mechanisms per repo

`make ingest-sbom REPO=<slug>` must run all applicable parsers, not just
whichever lockfile happens to be auto-detected first. The updated auto-detection
in `ingest_sbom.py` scans:

1. Package manager lockfiles (`uv.lock`, `requirements.txt`, `package-lock.json`,
   `yarn.lock`, `Cargo.lock`, `go.sum`)
2. Terraform provider locks (`.terraform.lock.hcl`, anywhere in the tree)
3. Ansible Galaxy manifests (`requirements.yml` / `requirements.yaml`, anywhere
   in the tree under `ansible/`)
4. Agent-generated tool manifest (`sbom-tools.yaml` at repo root)

All parsers run and their results are merged into a single snapshot.

---

## Phase 1 — Schema: Ecosystem Enum Extension

**Acceptance:** `terraform`, `ansible`, and `tool` are valid ecosystem values;
existing `other` entries are unaffected; migration applies cleanly.

### T01 — Alembic migration: add terraform and ansible enum values

```task
id: CUST-WP-0013-T01
state_hub_task_id: c0b6edc4-86ab-4cee-88a8-6c66fb81adee
status: done
priority: high
```

Add `terraform` and `ansible` to the `Ecosystem` enum in the DB. Check whether
the column uses a native PostgreSQL ENUM type (requiring `ALTER TYPE`) or a
`String` column (requiring no migration). Write the migration accordingly.
Also add `tool` as a catch-all for tool-manifest entries that don't fit a
named ecosystem.
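
For the native-ENUM case, the statements the migration needs can be sketched as below. This is an illustration only: the type name `ecosystem` and the helper `enum_extension_sql` are assumptions, not names taken from the actual schema. Inside an Alembic migration each statement would typically be passed to `op.execute`.

```python
NEW_ECOSYSTEM_VALUES = ("terraform", "ansible", "tool")


def enum_extension_sql(type_name: str, values: tuple[str, ...]) -> list[str]:
    """Build one idempotent ALTER TYPE statement per new enum label."""
    statements = []
    for value in values:
        # Enum labels are SQL string literals, so double any single quotes.
        label = value.replace("'", "''")
        statements.append(
            f"ALTER TYPE {type_name} ADD VALUE IF NOT EXISTS '{label}'"
        )
    return statements


if __name__ == "__main__":
    # "ecosystem" is a hypothetical type name; check the real one in the DB.
    for statement in enum_extension_sql("ecosystem", NEW_ECOSYSTEM_VALUES):
        print(statement)
```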

---

## Phase 2 — Parser Improvements in ingest_sbom.py

**Acceptance:** `--dry-run` on railiance-infra shows terraform providers and
ansible collections correctly labelled; tool manifest entries appear with the
declared ecosystem.

### T02 — Promote Terraform parser: other → terraform ecosystem

```task
id: CUST-WP-0013-T02
state_hub_task_id: 7686bccd-022c-4e30-8081-c8487eb82253
status: done
priority: high
```

The `.terraform.lock.hcl` parser already exists in `ingest_sbom.py` but stores
entries as `ecosystem="other"`. Change to `ecosystem="terraform"` after T01
migration lands. Re-ingest any repos that previously ingested terraform entries
as `other` to correct the label.

### T03 — Implement Ansible Galaxy requirements.yml parser

```task
id: CUST-WP-0013-T03
state_hub_task_id: 48658bdd-4d16-4be0-a87e-45df4f4901b0
status: done
priority: high
```

Parse `requirements.yml` / `requirements.yaml` files found in `ansible/`
subdirectories. Standard format:

```yaml
collections:
  - name: community.general
    version: "9.5.0"
roles:
  - name: geerlingguy.docker
    version: "6.x"
```

Store as `ecosystem="ansible"`, `is_direct=True`. Licence left `null` (Galaxy
API lookup is deferred). Handle both `collections:` and `roles:` blocks.
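
A minimal sketch of the mapping described above, not the parser in `ingest_sbom.py`. It assumes the file has already been loaded into a dict (for example with a YAML loader). The function name and record keys are hypothetical, and representing an unpinned version as `None` is an assumption.

```python
from typing import Any, Optional


def parse_galaxy_requirements(doc: Optional[dict]) -> list[dict[str, Any]]:
    """Convert a loaded requirements.yml document into package records."""
    packages = []
    for block in ("collections", "roles"):
        for entry in (doc or {}).get(block) or []:
            # Galaxy also accepts a bare string such as "community.general".
            if isinstance(entry, str):
                entry = {"name": entry}
            name = entry.get("name")
            if not name:
                continue  # entries without a name are out of scope for this sketch
            version = entry.get("version")
            packages.append({
                "name": name,
                "version": str(version) if version is not None else None,
                "ecosystem": "ansible",
                "is_direct": True,
                "license_spdx": None,  # Galaxy API lookup is deferred
            })
    return packages


if __name__ == "__main__":
    sample = {
        "collections": [{"name": "community.general", "version": "9.5.0"}],
        "roles": [{"name": "geerlingguy.docker", "version": "6.x"}],
    }
    for package in parse_galaxy_requirements(sample):
        print(package["name"], package["version"])
```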

### T04 — Implement sbom-tools.yaml manifest parser

```task
id: CUST-WP-0013-T04
state_hub_task_id: 4522ea04-134b-40ee-a7a2-ea0e4c1c061d
status: done
priority: high
```

Parse `sbom-tools.yaml` at the repo root (written by the capture agent). Schema:

```yaml
# Generated by sbom-capture-agent — review before committing
tools:
  - name: ansible
    version: "12.3.0"
    ecosystem: ansible # or: terraform, other, python, etc.
    license_spdx: GPL-3.0-only
    is_direct: true
    is_dev: false
  - name: helm
    version: "3.17.x"
    ecosystem: other
    license_spdx: Apache-2.0
    is_direct: true
    is_dev: false
```

Supports all existing ecosystem values plus `tool`. Pass entries through the
same normalisation as lockfile entries. Skip entries with `version: unknown`
and log a warning (the agent could not determine the version).
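
The skip-and-warn rule can be sketched as follows, again working on an already-loaded document. `KNOWN_ECOSYSTEMS` is an illustrative subset, and two behaviours are assumptions not stated in the workplan: a missing version is treated like `unknown`, and an unrecognised ecosystem falls back to `tool`.

```python
import logging
from typing import Any, Optional

log = logging.getLogger("ingest_sbom")

# Assumed subset for illustration; the authoritative list is the Ecosystem enum.
KNOWN_ECOSYSTEMS = {"python", "terraform", "ansible", "tool", "other"}


def parse_tool_manifest(doc: Optional[dict]) -> list[dict[str, Any]]:
    """Turn a loaded sbom-tools.yaml document into package records."""
    packages = []
    for entry in (doc or {}).get("tools") or []:
        name = entry.get("name")
        if not name:
            log.warning("skipping manifest entry without a name")
            continue
        # Assumption: a missing version is handled like an explicit "unknown".
        version = str(entry.get("version") or "unknown")
        if version == "unknown":
            log.warning("skipping %s: agent could not determine version", name)
            continue
        ecosystem = entry.get("ecosystem") or "tool"
        if ecosystem not in KNOWN_ECOSYSTEMS:
            # Assumption: unrecognised ecosystems fall back to the catch-all.
            log.warning("%s: unknown ecosystem %r, using 'tool'", name, ecosystem)
            ecosystem = "tool"
        packages.append({
            "name": name,
            "version": version,
            "ecosystem": ecosystem,
            "license_spdx": entry.get("license_spdx"),
            "is_direct": bool(entry.get("is_direct", True)),
            "is_dev": bool(entry.get("is_dev", False)),
        })
    return packages
```

Keeping the fallback and skip rules in one place makes it easy to change them once the real normalisation path is known.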

### T05 — Comprehensive auto-detection: all formats in one scan

```task
id: CUST-WP-0013-T05
state_hub_task_id: cdda6bf2-2a44-4444-a04a-ac2fe2314923
status: done
priority: high
```

Refactor the `--repo-path` scan to discover and run all applicable parsers,
not just the first match. Scan order:

1. Walk tree for all `uv.lock`, `requirements.txt`, `package-lock.json`,
   `yarn.lock`, `Cargo.lock`
2. Walk tree for all `.terraform.lock.hcl`
3. Walk tree for `ansible/requirements.yml` and `ansible/requirements.yaml`
4. Check repo root for `sbom-tools.yaml`

Merge all results into a single batch for the snapshot ingest call. Log a
summary line per parser: ` <parser>: N packages from <path>`.
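
The scan order above could look roughly like this. The name `detect_all` appears in the change summary, but the signature, the parser labels, the `.git` exclusion, and the rule that a Galaxy file must sit directly inside a directory named `ansible` are all assumptions made for this sketch.

```python
import os
from pathlib import Path

LOCKFILES = {"uv.lock", "requirements.txt", "package-lock.json",
             "yarn.lock", "Cargo.lock"}
GALAXY_FILES = {"requirements.yml", "requirements.yaml"}


def detect_all(repo_root: Path) -> list[tuple[str, Path]]:
    """Return (parser_label, path) for every manifest found, in scan order."""
    lockfiles, terraform, ansible = [], [], []
    for dirpath, dirnames, filenames in os.walk(repo_root):
        # Prune .git and sort so the traversal is deterministic.
        dirnames[:] = sorted(d for d in dirnames if d != ".git")
        here = Path(dirpath)
        for filename in sorted(filenames):
            if filename in LOCKFILES:
                lockfiles.append(("lockfile", here / filename))
            elif filename == ".terraform.lock.hcl":
                terraform.append(("terraform", here / filename))
            elif filename in GALAXY_FILES and here.name == "ansible":
                ansible.append(("ansible", here / filename))
    found = lockfiles + terraform + ansible
    manifest = repo_root / "sbom-tools.yaml"
    if manifest.is_file():
        found.append(("tool-manifest", manifest))
    return found


if __name__ == "__main__":
    for label, path in detect_all(Path(".")):
        print(f"  {label}: {path}")
```

A single walk collects candidates for the first three mechanisms, and the results are concatenated afterwards so the reported order still matches the list.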

### T06 — Unit tests for new parsers

```task
id: CUST-WP-0013-T06
state_hub_task_id: fee37e66-8f41-4dba-995b-97fc66493caf
status: done
priority: medium
```

Add test fixtures and unit tests for:
- Ansible Galaxy requirements.yml (collections + roles, version pinned and
  unpinned)
- sbom-tools.yaml (valid, missing version, unknown ecosystem)
- Multi-parser scan: repo root with uv.lock + .terraform.lock.hcl +
  sbom-tools.yaml produces merged results

---

## Phase 3 — SBOM Capture Agent

**Acceptance:** `make capture-tools REPO=railiance-infra` produces a reviewed
`sbom-tools.yaml` that correctly identifies Ansible, Terraform, Helm, and other
declared tools with versions and SPDX licences.

### T07 — Write SBOM capture agent prompt

```task
id: CUST-WP-0013-T07
state_hub_task_id: a3b919b5-63b0-44f7-a048-ebfae603ef7b
status: done
priority: high
```

Write `state-hub/prompts/sbom-capture-agent.md` — a Claude agent prompt
parameterised with `{repo_slug}` and `{repo_path}`. The prompt instructs the
agent to:

1. Read `CLAUDE.md`, `Makefile`, `README.md`, `pyproject.toml`, `.tool-versions`,
   CI configs, Dockerfiles, and provisioning files in `{repo_path}`
2. Identify all system-level tools: name, version (from version pins, Makefile
   vars, or documented prerequisites), ecosystem, SPDX licence
3. Identify indirect/transitive tool deps (e.g. Ansible → Python; Terraform →
   provider plugins already captured by `.terraform.lock.hcl`)
4. Emit a well-formed `sbom-tools.yaml` with a comment header noting generation
   date and confidence level per entry (`# confidence: high/medium/low`)
5. Flag any tools where version could not be determined (`version: unknown`) for
   human review

The prompt must not hallucinate versions — it must derive them from evidence in
the repo or mark them unknown.
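
Filling in the two placeholders is simple, but one detail is worth showing. The sketch below uses `str.replace` rather than `str.format`, on the assumption that the prompt body may contain other literal braces (YAML or JSON examples) that `format` would reject. `render_prompt` is a hypothetical helper name.

```python
def render_prompt(template: str, repo_slug: str, repo_path: str) -> str:
    """Substitute only the two known placeholders, leaving other braces alone."""
    return (
        template
        .replace("{repo_slug}", repo_slug)
        .replace("{repo_path}", repo_path)
    )


if __name__ == "__main__":
    template = "Inspect {repo_slug} at {repo_path}. Emit YAML like {name: ..., version: ...}."
    print(render_prompt(template, "railiance-infra", "~/railiance-infra"))
```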

### T08 — Implement capture_sbom_tools.py

```task
id: CUST-WP-0013-T08
state_hub_task_id: 9593dca7-e713-4d7a-b4f2-c5333ae0b3d2
status: done
priority: high
```

Write `state-hub/scripts/capture_sbom_tools.py`:

- Accepts `--repo SLUG` and `--repo-path PATH`
- Resolves repo path from slug via the state-hub API if `--repo-path` is omitted
- Loads the agent prompt from `prompts/sbom-capture-agent.md`, substitutes
  `{repo_slug}` and `{repo_path}`
- Invokes `claude -p "<prompt>"` (non-interactive) and captures stdout
- Parses the YAML block from the response
- Writes or updates `<repo-path>/sbom-tools.yaml`
- Prints a diff of changes if the file already exists
- `--dry-run` flag: print the prompt and diff without writing
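
Three of the steps above (invoking the agent, pulling the YAML block out of the response, and producing the diff) can be sketched with the standard library alone. The helper names are hypothetical, and the sketch assumes the agent wraps its manifest in a fenced block labelled `yaml`.

```python
import difflib
import re
import subprocess
from typing import Optional

# Three backticks, built at runtime so this example can sit inside a fence.
FENCE = "`" * 3
YAML_BLOCK = re.compile(FENCE + r"(?:yaml|yml)[ \t]*\n(.*?)\n" + FENCE, re.DOTALL)


def run_agent(prompt: str) -> str:
    """Call `claude -p` non-interactively and return its stdout."""
    result = subprocess.run(
        ["claude", "-p", prompt], capture_output=True, text=True, check=True
    )
    return result.stdout


def extract_yaml_block(response: str) -> Optional[str]:
    """Return the first fenced YAML block in the response, or None."""
    match = YAML_BLOCK.search(response)
    return match.group(1) + "\n" if match else None


def manifest_diff(old: str, new: str, path: str = "sbom-tools.yaml") -> str:
    """Unified diff between the existing manifest and the generated one."""
    lines = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(lines)
```

Returning `None` when no block is found lets the caller stop before overwriting an existing manifest with an empty result.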

### T09 — Add make capture-tools target

```task
id: CUST-WP-0013-T09
state_hub_task_id: 6948e1d2-9c97-4709-bdb0-4b6ded700a22
status: done
priority: medium
```

Add to `state-hub/Makefile` (the recipe line must start with a tab):

```makefile
capture-tools: ## Run SBOM capture agent for a repo (REPO=slug, REPO_PATH=path)
	uv run python scripts/capture_sbom_tools.py --repo $(REPO) $(if $(REPO_PATH),--repo-path $(REPO_PATH),)
```

Also update `make ingest-sbom` to note that `capture-tools` should be run first
for repos that have system-level tool dependencies.

---

## Phase 4 — Ingest railiance-infra

**Acceptance:** `make ingest-sbom REPO=railiance-infra` shows terraform providers,
ansible collections, and tool manifest entries in one snapshot.

### T10 — Capture tools manifest for railiance-infra

```task
id: CUST-WP-0013-T10
state_hub_task_id: 99b23998-5129-4777-9d42-7bee5981cdbb
status: done
priority: medium
```

Run `make capture-tools REPO=railiance-infra`. Review the generated
`railiance-infra/sbom-tools.yaml` — verify Ansible, Terraform, cloud-init, goss,
and any other tools with their versions and licences. Correct any `unknown`
versions by consulting the repo. Commit the file.

### T11 — Ingest railiance-infra

```task
id: CUST-WP-0013-T11
state_hub_task_id: bb516909-f903-48ce-b60b-a24245e7382e
status: done
priority: medium
```

Run `make ingest-sbom REPO=railiance-infra REPO_PATH=~/railiance-infra`. Verify
the snapshot contains:
- Terraform providers (from `.terraform.lock.hcl`)
- Ansible Galaxy collections (from `ansible/requirements.yaml`)
- System tools (from `sbom-tools.yaml`)

Check the licence report for any copyleft or BSL flags.

---

## Phase 5 — Ingest railiance-cluster

**Acceptance:** railiance-cluster SBOM covers both Python packages (uv.lock) and
system tools in a single snapshot.

### T12 — Capture tools manifest for railiance-cluster

```task
id: CUST-WP-0013-T12
state_hub_task_id: 7a890f1a-da9f-4e6d-86a7-4fd1aefd5b3f
status: done
priority: medium
```

Run `make capture-tools REPO=railiance-cluster`. Review the generated
`railiance-cluster/sbom-tools.yaml` — verify Helm, kubectl, k3s, and any other
operational tools. Commit the file.

### T13 — Re-ingest railiance-cluster

```task
id: CUST-WP-0013-T13
state_hub_task_id: 789dbe93-011a-4470-9fec-ebf249cd7134
status: done
priority: medium
```

Run `make ingest-sbom REPO=railiance-cluster REPO_PATH=~/railiance-cluster`.
Verify the snapshot merges uv.lock (Python packages including ansible-core) and
sbom-tools.yaml entries into one coherent snapshot. Confirm ansible-core GPL-3.0
flag appears in the licence report.

---

## Phase 6 — Convention Documentation

**Acceptance:** A developer reading the SBOM convention doc knows exactly how to
add a new repo to SBOM coverage.

### T14 — Document SBOM capture convention in canon/standards

```task
id: CUST-WP-0013-T14
state_hub_task_id: dc3bb2a3-882e-4dd7-ab7c-8b1e88279a7d
status: done
priority: low
```

Write `canon/standards/sbom-convention_v0.1.md` documenting:
- The four capture mechanisms and when each applies
- The `sbom-tools.yaml` schema (with confidence annotation convention)
- The `make capture-tools` → review → commit → `make ingest-sbom` workflow
- Licence risk thresholds: copyleft = flag for review; BSL = flag for review;
  null licence = acceptable for infra tools if well-known open source

---

## Licence Risk Preview

Based on known tool licences, expect these flags once ingested:

| Tool / Package | Licence | Risk level |
|---|---|---|
| ansible-core | GPL-3.0-only | Copyleft — flag (ops toolchain, not shipped) |
| terraform ≥ 1.6.0 | BSL-1.1 | Non-OSI — flag for review |
| hashicorp providers | BSL-1.1 | Same |
| community.general | GPL-3.0 | Copyleft — flag (ops toolchain) |
| Helm | Apache-2.0 | Clean |
| k3s | Apache-2.0 | Clean |
| cloud-init | Apache-2.0 / GPL-3.0 | Mixed — check version |
| goss | Apache-2.0 | Clean |

All copyleft/BSL entries here are **operational toolchain** dependencies, not
shipped code — risk is low but worth tracking for compliance awareness.
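
The thresholds behind the table can be expressed as a small classifier. This is a sketch under assumptions, not the licence report's actual logic: the function name and the returned labels are hypothetical. It accepts both `BSL-1.1` as written in this document and `BUSL-1.1`, the SPDX identifier for the Business Source License.

```python
from typing import Optional

COPYLEFT_PREFIXES = ("GPL-", "AGPL-", "LGPL-")
BSL_IDS = {"BSL-1.1", "BUSL-1.1"}


def _classify_one(spdx_id: str) -> str:
    if spdx_id in BSL_IDS:
        return "bsl"
    if spdx_id.startswith(COPYLEFT_PREFIXES):
        return "copyleft"
    return "clean"


def licence_risk(license_spdx: Optional[str]) -> str:
    """Map a licence string to clean, copyleft, bsl, mixed, or unknown."""
    if not license_spdx:
        # Null licences are left for a human to judge (well-known infra tools).
        return "unknown"
    # Accept "A / B" as used in the table and the SPDX form "A OR B".
    parts = [p.strip() for p in license_spdx.replace(" OR ", "/").split("/")]
    levels = {_classify_one(p) for p in parts if p}
    if not levels:
        return "unknown"
    if len(levels) > 1:
        return "mixed"
    return levels.pop()


if __name__ == "__main__":
    for licence in ("GPL-3.0-only", "BSL-1.1", "Apache-2.0", "Apache-2.0 / GPL-3.0", None):
        print(licence, "->", licence_risk(licence))
```

Anything other than `clean` would be surfaced for review, which matches the "flag for review" wording in T14.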