the-custodian/workplans/CUST-WP-0013-sbom-infra-expansion.md
tegwick 1c94f5545c feat(sbom): CUST-WP-0013 — expand SBOM infra to terraform, ansible, and tool manifests
- Migration d6e7f8a9b0c1: add terraform, ansible, tool to Ecosystem enum
- ingest_sbom.py: new Ansible Galaxy requirements.yml parser (collections + roles)
- ingest_sbom.py: new sbom-tools.yaml manifest parser (agent-generated tool deps)
- ingest_sbom.py: promote .terraform.lock.hcl parser from ecosystem=other → terraform
- ingest_sbom.py: detect_all() runs all four parsers in one comprehensive scan
- capture_sbom_tools.py: agent-assisted tool manifest generator (claude -p)
- prompts/sbom-capture-agent.md: parameterised prompt for repo tool discovery
- Makefile: capture-tools target; ingest-sbom updated docs and DRY_RUN support
- 29 unit tests covering all new parsers and detect_all() behaviour
- canon/standards/sbom-convention_v0.1.md: updated with four-mechanism model and workflow

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 04:40:26 +01:00


---
id: CUST-WP-0013
type: workplan
title: "SBOM Infrastructure Expansion"
domain: custodian
repo: the-custodian
status: completed
owner: custodian
topic_slug: custodian
state_hub_workstream_id: f4ba84c8-4d47-492d-b65e-73b157271a2b
created: "2026-03-12"
updated: "2026-03-12"
---
# CUST-WP-0013 — SBOM Infrastructure Expansion
**Scope:** Extend SBOM capture beyond Python packages to cover Terraform providers,
Ansible Galaxy collections, and system-level tools (Ansible, Terraform, Helm, k3s,
cloud-init, etc.). Introduces an agent-assisted tool manifest capture workflow,
new ecosystem enum values, comprehensive auto-detection in `ingest_sbom.py`, and
delivers full SBOM coverage for `railiance-infra` and `railiance-cluster`.
**Drives:** Licence risk visibility across the full dependency graph, not just
language-level packages.
---
## Design Decisions
### Tool manifest: agent-generated, not hand-written
System tools (Ansible, Terraform, Helm, k3s, etc.) live outside any lockfile —
they are provisioned, not installed by a package manager. Rather than asking
operators to maintain a hand-written manifest, the SBOM capture agent inspects
the repo and generates/updates `sbom-tools.yaml` automatically.
The agent prompt (`state-hub/prompts/sbom-capture-agent.md`) is parameterised
per repo. It reads the repo's CLAUDE.md, Makefile, README, CI configs, version
pins, and provisioning files, then emits a structured `sbom-tools.yaml` with
tool name, version, ecosystem, SPDX licence, and directness flags.
A thin wrapper script (`state-hub/scripts/capture_sbom_tools.py`) invokes the
agent prompt via `claude -p` (or prints it for manual use) and writes the result
to `<repo-root>/sbom-tools.yaml`.
### Comprehensive ingest: all mechanisms per repo
`make ingest-sbom REPO=<slug>` must run all applicable parsers, not just
whichever lockfile happens to be auto-detected first. The updated auto-detection
in `ingest_sbom.py` scans:
1. Package manager lockfiles (`uv.lock`, `requirements.txt`, `package-lock.json`,
`yarn.lock`, `Cargo.lock`, `go.sum`)
2. Terraform provider locks (`.terraform.lock.hcl`, anywhere in the tree)
3. Ansible Galaxy manifests (`requirements.yml` / `requirements.yaml`, anywhere
in the tree under `ansible/`)
4. Agent-generated tool manifest (`sbom-tools.yaml` at repo root)
All parsers run and their results are merged into a single snapshot.
---
## Phase 1 — Schema: Ecosystem Enum Extension
**Acceptance:** `terraform`, `ansible`, and `tool` are valid ecosystem values;
existing `other` entries are unaffected; migration applies cleanly.
### T01 — Alembic migration: add terraform and ansible enum values
```task
id: CUST-WP-0013-T01
state_hub_task_id: c0b6edc4-86ab-4cee-88a8-6c66fb81adee
status: done
priority: high
```
Add `terraform` and `ansible` to the `Ecosystem` enum in the DB. Check whether
the column uses a native PostgreSQL ENUM type (requiring `ALTER TYPE`) or a
`String` column (requiring no migration). Write the migration accordingly.
Also add `tool` as a catch-all for tool-manifest entries that don't fit a
named ecosystem.
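If the column turns out to be a native PostgreSQL ENUM, the migration could look roughly like the sketch below. The revision id is the one quoted in the commit message; the type name `ecosystem`, the `down_revision` placeholder, and the helper `add_value_statements` are assumptions for illustration, not taken from the actual schema. Because PostgreSQL has no statement for removing an enum value, the downgrade is left as a no-op.

```python
"""Sketch: add terraform, ansible and tool to a native PostgreSQL ENUM."""

revision = "d6e7f8a9b0c1"  # revision id quoted in the commit message
down_revision = None       # placeholder: the real parent revision is not given

ENUM_TYPE = "ecosystem"    # assumed type name; check the actual schema
NEW_VALUES = ("terraform", "ansible", "tool")


def add_value_statements(type_name, values):
    """Build one idempotent ALTER TYPE statement per new enum value."""
    return [
        f"ALTER TYPE {type_name} ADD VALUE IF NOT EXISTS '{value}'"
        for value in values
    ]


def upgrade():
    # Imported here so the helper above can be exercised without Alembic.
    from alembic import op

    # ADD VALUE cannot run inside a transaction block on older PostgreSQL
    # versions, so the statements are issued in an autocommit block.
    with op.get_context().autocommit_block():
        for statement in add_value_statements(ENUM_TYPE, NEW_VALUES):
            op.execute(statement)


def downgrade():
    # PostgreSQL cannot drop a value from an ENUM type; nothing to undo.
    pass
```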
---
## Phase 2 — Parser Improvements in ingest_sbom.py
**Acceptance:** `--dry-run` on railiance-infra shows terraform providers and
ansible collections correctly labelled; tool manifest entries appear with the
declared ecosystem.
### T02 — Promote Terraform parser: other → terraform ecosystem
```task
id: CUST-WP-0013-T02
state_hub_task_id: 7686bccd-022c-4e30-8081-c8487eb82253
status: done
priority: high
```
The `.terraform.lock.hcl` parser already exists in `ingest_sbom.py` but stores
entries as `ecosystem="other"`. Change this to `ecosystem="terraform"` once the
T01 migration lands, then re-ingest any repos whose Terraform entries were
previously stored as `other` to correct the label.
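To picture the relabelled output, here is a minimal sketch of a lock-file reader. It is not the existing parser, which this workplan does not show; `parse_terraform_lock` and the returned dict keys are hypothetical. It assumes the usual lock-file layout of `provider "<source>" { version = "..." }` blocks.

```python
import re

# Matches: provider "registry.terraform.io/hashicorp/aws" { ... }
# Provider blocks hold no nested braces, so a lazy match up to a closing
# brace at the start of a line is enough for this sketch.
_PROVIDER_RE = re.compile(r'provider\s+"([^"]+)"\s*\{(.*?)^\}', re.DOTALL | re.MULTILINE)
_VERSION_RE = re.compile(r'^\s*version\s*=\s*"([^"]+)"', re.MULTILINE)


def parse_terraform_lock(text):
    """Return one entry per provider block, labelled ecosystem="terraform"."""
    packages = []
    for source, body in _PROVIDER_RE.findall(text):
        match = _VERSION_RE.search(body)
        if match is None:
            continue  # no version line in this block; skip it
        packages.append({
            "name": source,
            "version": match.group(1),
            "ecosystem": "terraform",  # previously stored as "other"
        })
    return packages
```

Only the `ecosystem` value changes relative to the behaviour described above; names and versions are read exactly as before.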
### T03 — Implement Ansible Galaxy requirements.yml parser
```task
id: CUST-WP-0013-T03
state_hub_task_id: 48658bdd-4d16-4be0-a87e-45df4f4901b0
status: done
priority: high
```
Parse `requirements.yml` / `requirements.yaml` files found in `ansible/`
subdirectories. Standard format:
```yaml
collections:
  - name: community.general
    version: "9.5.0"
roles:
  - name: geerlingguy.docker
    version: "6.x"
```
Store as `ecosystem="ansible"`, `is_direct=True`. Licence left `null` (Galaxy
API lookup is deferred). Handle both `collections:` and `roles:` blocks.
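A minimal sketch of the mapping described above. It takes the already-loaded document (for example the result of `yaml.safe_load`) so the logic stays independent of the YAML library. The function name and the output keys are hypothetical. Three behaviours are assumptions that go beyond the task text: unpinned entries get `version=None`, role entries may use `src` instead of `name`, and the older bare-list-of-roles layout is accepted.

```python
def parse_galaxy_requirements(doc):
    """Turn a loaded requirements.yml document into SBOM entries."""
    if isinstance(doc, list):
        # Assumption: legacy layout where the file is a bare list of roles.
        doc = {"roles": doc}
    if not isinstance(doc, dict):
        return []  # empty or unexpected file content

    entries = []
    for block in ("collections", "roles"):
        for item in doc.get(block) or []:
            if isinstance(item, str):
                # Shorthand entry such as "- community.general"
                item = {"name": item}
            name = item.get("name") or item.get("src")
            if not name:
                continue
            version = item.get("version")
            entries.append({
                "name": str(name),
                # Assumption: unpinned entries are stored with version None.
                "version": None if version is None else str(version),
                "ecosystem": "ansible",
                "is_direct": True,
                "license_spdx": None,  # Galaxy API lookup is deferred
            })
    return entries
```

Fed the example document above, this yields two entries, one for `community.general` and one for `geerlingguy.docker`, both with `ecosystem="ansible"`.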
### T04 — Implement sbom-tools.yaml manifest parser
```task
id: CUST-WP-0013-T04
state_hub_task_id: 4522ea04-134b-40ee-a7a2-ea0e4c1c061d
status: done
priority: high
```
Parse `sbom-tools.yaml` at the repo root (written by the capture agent). Schema:
```yaml
# Generated by sbom-capture-agent — review before committing
tools:
  - name: ansible
    version: "12.3.0"
    ecosystem: ansible # or: terraform, other, python, etc.
    license_spdx: GPL-3.0-only
    is_direct: true
    is_dev: false
  - name: helm
    version: "3.17.x"
    ecosystem: other
    license_spdx: Apache-2.0
    is_direct: true
    is_dev: false
```
Supports all existing ecosystem values plus `tool`. Pass entries through the
same normalisation as lockfile entries. Skip entries with `version: unknown`
with a warning (agent could not determine version).
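The skip-and-warn rule could be sketched as follows, again operating on the loaded document and not on the file. `parse_tools_manifest`, the logger name, and the `KNOWN_ECOSYSTEMS` set are hypothetical; the real enum may contain other values. Falling back to `tool` for an unrecognised ecosystem, and the defaults for `is_direct` and `is_dev`, are assumptions, since the task text does not define them.

```python
import logging

log = logging.getLogger("ingest_sbom")

# Assumed set of valid values; the real Ecosystem enum may differ.
KNOWN_ECOSYSTEMS = {"python", "npm", "cargo", "go", "terraform", "ansible", "tool", "other"}


def parse_tools_manifest(doc):
    """Turn a loaded sbom-tools.yaml document into SBOM entries."""
    entries = []
    for tool in (doc or {}).get("tools") or []:
        name = tool.get("name")
        if not name:
            log.warning("skipping manifest entry without a name")
            continue
        raw_version = tool.get("version")
        version = "unknown" if raw_version is None else str(raw_version)
        if version == "unknown":
            # The agent could not determine a version; leave it for human review.
            log.warning("skipping %s: version unknown", name)
            continue
        ecosystem = tool.get("ecosystem", "tool")
        if ecosystem not in KNOWN_ECOSYSTEMS:
            # Assumption: unrecognised ecosystems fall back to the catch-all.
            log.warning("unknown ecosystem %r for %s, storing as 'tool'", ecosystem, name)
            ecosystem = "tool"
        entries.append({
            "name": name,
            "version": version,
            "ecosystem": ecosystem,
            "license_spdx": tool.get("license_spdx"),
            "is_direct": bool(tool.get("is_direct", True)),
            "is_dev": bool(tool.get("is_dev", False)),
        })
    return entries
```

Entries returned here would still pass through the same normalisation as lockfile entries before ingest.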
### T05 — Comprehensive auto-detection: all formats in one scan
```task
id: CUST-WP-0013-T05
state_hub_task_id: cdda6bf2-2a44-4444-a04a-ac2fe2314923
status: done
priority: high
```
Refactor the `--repo-path` scan to discover and run all applicable parsers,
not just the first match. Scan order:
1. Walk tree for all `uv.lock`, `requirements.txt`, `package-lock.json`,
`yarn.lock`, `Cargo.lock`, `go.sum`
2. Walk tree for all `.terraform.lock.hcl`
3. Walk tree for `ansible/requirements.yml` and `ansible/requirements.yaml`
4. Check repo root for `sbom-tools.yaml`
Merge all results into a single batch for the snapshot ingest call. Log a
summary line per parser: ` <parser>: N packages from <path>`.
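The discovery half of this scan might look like the sketch below; it only locates files and leaves parsing and merging to the individual parsers. The function names, the parser labels, and the pruned directory set are hypothetical. The lockfile names follow the list under Design Decisions, and the two-space indent of the summary line is an assumption.

```python
import os
from pathlib import Path

LOCKFILES = {"uv.lock", "requirements.txt", "package-lock.json",
             "yarn.lock", "Cargo.lock", "go.sum"}
GALAXY_FILES = {"requirements.yml", "requirements.yaml"}
SKIP_DIRS = {".git", "node_modules", ".venv"}  # assumption: pruned for speed
SCAN_ORDER = {"lockfile": 0, "terraform": 1, "ansible": 2, "tools": 3}


def discover_sbom_sources(repo_root):
    """Return (parser, path) pairs for every SBOM source in the tree."""
    root = Path(repo_root)
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into skipped directories.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        here = Path(dirpath)
        under_ansible = "ansible" in here.relative_to(root).parts
        for filename in sorted(filenames):
            if filename in LOCKFILES:
                found.append(("lockfile", here / filename))
            elif filename == ".terraform.lock.hcl":
                found.append(("terraform", here / filename))
            elif filename in GALAXY_FILES and under_ansible:
                found.append(("ansible", here / filename))
    manifest = root / "sbom-tools.yaml"
    if manifest.is_file():
        found.append(("tools", manifest))
    # Stable sort: group by parser in the documented scan order.
    found.sort(key=lambda pair: SCAN_ORDER[pair[0]])
    return found


def summary_line(parser, count, path):
    """Format the per-parser log line described in the task."""
    return f"  {parser}: {count} packages from {path}"
```

Every match is kept, so a repo with a `uv.lock`, a `.terraform.lock.hcl`, and an `sbom-tools.yaml` yields three sources instead of stopping at the first.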
### T06 — Unit tests for new parsers
```task
id: CUST-WP-0013-T06
state_hub_task_id: fee37e66-8f41-4dba-995b-97fc66493caf
status: done
priority: medium
```
Add test fixtures and unit tests for:
- Ansible Galaxy requirements.yml (collections + roles, version pinned and
unpinned)
- sbom-tools.yaml (valid, missing version, unknown ecosystem)
- Multi-parser scan: repo root with uv.lock + .terraform.lock.hcl +
sbom-tools.yaml produces merged results
---
## Phase 3 — SBOM Capture Agent
**Acceptance:** `make capture-tools REPO=railiance-infra` produces an
`sbom-tools.yaml` that, once reviewed, correctly identifies Ansible, Terraform,
Helm, and other declared tools with versions and SPDX licences.
### T07 — Write SBOM capture agent prompt
```task
id: CUST-WP-0013-T07
state_hub_task_id: a3b919b5-63b0-44f7-a048-ebfae603ef7b
status: done
priority: high
```
Write `state-hub/prompts/sbom-capture-agent.md` — a Claude agent prompt
parameterised with `{repo_slug}` and `{repo_path}`. The prompt instructs the
agent to:
1. Read `CLAUDE.md`, `Makefile`, `README.md`, `pyproject.toml`, `.tool-versions`,
CI configs, Dockerfiles, and provisioning files in `{repo_path}`
2. Identify all system-level tools: name, version (from version pins, Makefile
vars, or documented prerequisites), ecosystem, SPDX licence
3. Identify indirect/transitive tool deps (e.g. Ansible → Python; Terraform →
provider plugins already captured by `.terraform.lock.hcl`)
4. Emit a well-formed `sbom-tools.yaml` with a comment header noting generation
date and confidence level per entry (`# confidence: high/medium/low`)
5. Flag any tools where version could not be determined (`version: unknown`) for
human review
The prompt must forbid invented versions: the agent derives each version from
evidence in the repo or marks it `unknown`.
### T08 — Implement capture_sbom_tools.py
```task
id: CUST-WP-0013-T08
state_hub_task_id: 9593dca7-e713-4d7a-b4f2-c5333ae0b3d2
status: done
priority: high
```
Write `state-hub/scripts/capture_sbom_tools.py`:
- Accepts `--repo SLUG` and `--repo-path PATH`
- Resolves repo path from slug via the state-hub API if `--repo-path` is omitted
- Loads the agent prompt from `prompts/sbom-capture-agent.md`, substitutes
`{repo_slug}` and `{repo_path}`
- Invokes `claude -p "<prompt>"` (non-interactive) and captures stdout
- Parses the YAML block from the response
- Writes or updates `<repo-path>/sbom-tools.yaml`
- Prints a diff of changes if the file already exists
- `--dry-run` flag: print the prompt and diff without writing
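The core steps of the wrapper can be sketched with the standard library alone. All function names here are hypothetical, and the sketch assumes the agent returns its manifest inside a fenced YAML block; argument parsing, the state-hub API lookup, and file writing are left out.

```python
import difflib
import re
import subprocess

FENCE = "`" * 3  # a Markdown code fence, built this way to keep the sketch readable
_YAML_BLOCK_RE = re.compile(FENCE + r"(?:yaml|yml)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def render_prompt(template, repo_slug, repo_path):
    # str.replace, not str.format, so any other braces in the prompt survive.
    return template.replace("{repo_slug}", repo_slug).replace("{repo_path}", repo_path)


def run_agent(prompt):
    # `claude -p` is the non-interactive invocation named in the task list.
    result = subprocess.run(["claude", "-p", prompt],
                            capture_output=True, text=True, check=True)
    return result.stdout


def extract_yaml_block(response):
    """Return the first fenced YAML block, or the whole response if none is fenced."""
    match = _YAML_BLOCK_RE.search(response)
    return (match.group(1) if match else response).strip() + "\n"


def manifest_diff(old, new):
    """Unified diff between the existing and the newly generated manifest."""
    return "".join(difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile="sbom-tools.yaml (current)",
        tofile="sbom-tools.yaml (generated)",
    ))
```

With `--dry-run`, the script would print the rendered prompt and the result of `manifest_diff` and stop before writing anything.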
### T09 — Add make capture-tools target
```task
id: CUST-WP-0013-T09
state_hub_task_id: 6948e1d2-9c97-4709-bdb0-4b6ded700a22
status: done
priority: medium
```
Add to `state-hub/Makefile`:
```makefile
capture-tools: ## Run SBOM capture agent for a repo (REPO=slug, REPO_PATH=path)
	uv run python scripts/capture_sbom_tools.py --repo $(REPO) $(if $(REPO_PATH),--repo-path $(REPO_PATH),)
```
Also update `make ingest-sbom` to note that `capture-tools` should be run first
for repos that have system-level tool dependencies.
---
## Phase 4 — Ingest railiance-infra
**Acceptance:** `make ingest-sbom REPO=railiance-infra` shows terraform providers,
ansible collections, and tool manifest entries in one snapshot.
### T10 — Capture tools manifest for railiance-infra
```task
id: CUST-WP-0013-T10
state_hub_task_id: 99b23998-5129-4777-9d42-7bee5981cdbb
status: done
priority: medium
```
Run `make capture-tools REPO=railiance-infra`. Review the generated
`railiance-infra/sbom-tools.yaml`: verify that Ansible, Terraform, cloud-init,
goss, and any other tools are listed with correct versions and licences. Correct
any `unknown` versions by consulting the repo. Commit the file.
### T11 — Ingest railiance-infra
```task
id: CUST-WP-0013-T11
state_hub_task_id: bb516909-f903-48ce-b60b-a24245e7382e
status: done
priority: medium
```
Run `make ingest-sbom REPO=railiance-infra REPO_PATH=~/railiance-infra`. Verify
the snapshot contains:
- Terraform providers (from `.terraform.lock.hcl`)
- Ansible Galaxy collections (from `ansible/requirements.yaml`)
- System tools (from `sbom-tools.yaml`)
Check the licence report for any copyleft or BSL flags.
---
## Phase 5 — Ingest railiance-cluster
**Acceptance:** railiance-cluster SBOM covers both Python packages (uv.lock) and
system tools in a single snapshot.
### T12 — Capture tools manifest for railiance-cluster
```task
id: CUST-WP-0013-T12
state_hub_task_id: 7a890f1a-da9f-4e6d-86a7-4fd1aefd5b3f
status: done
priority: medium
```
Run `make capture-tools REPO=railiance-cluster`. Review the generated
`railiance-cluster/sbom-tools.yaml`: verify that Helm, kubectl, k3s, and any
other operational tools are listed. Commit the file.
### T13 — Re-ingest railiance-cluster
```task
id: CUST-WP-0013-T13
state_hub_task_id: 789dbe93-011a-4470-9fec-ebf249cd7134
status: done
priority: medium
```
Run `make ingest-sbom REPO=railiance-cluster REPO_PATH=~/railiance-cluster`.
Verify the snapshot merges uv.lock (Python packages including ansible-core) and
sbom-tools.yaml entries into one coherent snapshot. Confirm ansible-core GPL-3.0
flag appears in the licence report.
---
## Phase 6 — Convention Documentation
**Acceptance:** A developer reading the SBOM convention doc knows exactly how to
add a new repo to SBOM coverage.
### T14 — Document SBOM capture convention in canon/standards
```task
id: CUST-WP-0013-T14
state_hub_task_id: dc3bb2a3-882e-4dd7-ab7c-8b1e88279a7d
status: done
priority: low
```
Write `canon/standards/sbom-convention_v0.1.md` documenting:
- The four capture mechanisms and when each applies
- The `sbom-tools.yaml` schema (with confidence annotation convention)
- The `make capture-tools` → review → commit → `make ingest-sbom` workflow
- Licence risk thresholds: copyleft = flag for review; BSL = flag for review;
null licence = acceptable for infra tools if well-known open source
---
## Licence Risk Preview
Based on known tool licences, expect these flags once ingested:
| Tool / Package | Licence | Risk level |
|---|---|---|
| ansible-core | GPL-3.0-only | Copyleft — flag (ops toolchain, not shipped) |
| terraform ≥ 1.6.0 | BSL-1.1 | Non-OSI, flag for review |
| hashicorp providers | BSL-1.1 | Same |
| community.general | GPL-3.0 | Copyleft — flag (ops toolchain) |
| Helm | Apache-2.0 | Clean |
| k3s | Apache-2.0 | Clean |
| cloud-init | Apache-2.0 / GPL-3.0 | Mixed — check version |
| goss | Apache-2.0 | Clean |
All copyleft/BSL entries here are **operational toolchain** dependencies, not
shipped code — risk is low but worth tracking for compliance awareness.
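The flags in the table can be reproduced with a small classifier like the sketch below. The function name and return labels are hypothetical, and treating LGPL as copyleft is an assumption. A null licence is reported as `unknown` because deciding whether a tool counts as well-known open source is left to the reviewer.

```python
COPYLEFT_PREFIXES = ("GPL-", "AGPL-", "LGPL-")  # assumption: LGPL is flagged too

# SPDX spells the Business Source License "BUSL-1.1"; "BSL-1.1" is the
# shorthand used in this workplan. "BSL-1.0" is the Boost licence and is
# deliberately not listed here.
BSL_IDS = {"BUSL-1.1", "BSL-1.1"}


def licence_risk(license_spdx):
    """Map a licence string to 'copyleft', 'bsl', 'clean' or 'unknown'."""
    if license_spdx is None:
        return "unknown"
    # Accept both "A OR B" and the "A / B" notation used in the table.
    parts = [p.strip() for p in license_spdx.replace(" OR ", "/").split("/")]
    if any(p in BSL_IDS for p in parts):
        return "bsl"
    if any(p.startswith(COPYLEFT_PREFIXES) for p in parts):
        return "copyleft"
    return "clean"
```

A mixed entry such as `Apache-2.0 / GPL-3.0` is classified as `copyleft`, so it is flagged for review instead of being resolved automatically.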