feat(sbom): CUST-WP-0013 — expand SBOM infra to terraform, ansible, and tool manifests

- Migration d6e7f8a9b0c1: add terraform, ansible, tool to Ecosystem enum
- ingest_sbom.py: new Ansible Galaxy requirements.yml parser (collections + roles)
- ingest_sbom.py: new sbom-tools.yaml manifest parser (agent-generated tool deps)
- ingest_sbom.py: promote .terraform.lock.hcl parser from ecosystem=other → terraform
- ingest_sbom.py: detect_all() runs all four parsers in one comprehensive scan
- capture_sbom_tools.py: agent-assisted tool manifest generator (claude -p)
- prompts/sbom-capture-agent.md: parameterised prompt for repo tool discovery
- Makefile: capture-tools target; ingest-sbom updated docs and DRY_RUN support
- 29 unit tests covering all new parsers and detect_all() behaviour
- canon/standards/sbom-convention_v0.1.md: updated with four-mechanism model and workflow

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 04:40:26 +01:00
parent 3c69ad2929
commit 1c94f5545c
9 changed files with 1378 additions and 81 deletions


@@ -6,7 +6,7 @@ domain: custodian
status: active
version: "0.1"
created: "2026-03-01"
updated: "2026-03-01"
updated: "2026-03-12"
---
# SBOM Convention v0.1 — Dependency Tracking & Licence Governance
@@ -27,20 +27,23 @@ dashboard (`/sbom`) provides domain-level and repo-level drill-down.
---
## 1. Authoritative Lockfiles per Ecosystem
## 1. Capture Mechanisms
| Ecosystem | Authoritative file | Notes |
|-----------|-------------------|-------|
| Python | `uv.lock` | Preferred. `requirements.txt` accepted as fallback |
| Node / npm | `package-lock.json` | Preferred. `yarn.lock` accepted |
| Rust | `Cargo.lock` | Auto-detected |
| Terraform | `.terraform.lock.hcl` | Provider pins; ecosystem stored as `other` until ENUM extended |
| Go | `go.sum` | *Not yet parsed — planned* |
| Java / JVM | `gradle.lockfile` / `pom.xml` | *Not yet parsed — planned* |
| Ansible | `requirements.yml` | *Not yet parsed — planned* |
`ingest_sbom.py` runs all four mechanisms in a single scan when given `--repo-path`.
No flags needed — comprehensive detection is the default.
**Principle:** commit lockfiles to the repo. Lockfiles are the SBOM source
of truth; do not generate them at ingest time.
| Mechanism | File(s) | Ecosystem | Detection scope |
|-----------|---------|-----------|-----------------|
| **Package manager lockfiles** | `uv.lock`, `requirements.txt`, `package-lock.json`, `yarn.lock`, `Cargo.lock` | `python`, `node`, `rust` | Anywhere in tree |
| **Terraform provider lock** | `.terraform.lock.hcl` | `terraform` | Anywhere in tree |
| **Ansible Galaxy manifest** | `ansible/requirements.yml` or `.yaml` | `ansible` | Under directories named `ansible/` |
| **Tool manifest** | `sbom-tools.yaml` (repo root) | `tool`, `ansible`, `terraform`, etc. | Repo root only |
**Go / Java parsers** (`go.sum`, `pom.xml`, `gradle.lockfile`) are *not yet
implemented* — planned for a future workplan.
**Principle:** commit lockfiles and `sbom-tools.yaml` to the repo. These are
the SBOM source of truth; do not generate them at ingest time.
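As a rough illustration of the detection scopes in the table above, the sketch below maps a repo-relative path to a mechanism label. The function name `classify_source` and the labels it returns are hypothetical and are not taken from `ingest_sbom.py`; only the file names and scope rules come from the table.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# File name -> ecosystem for sources that are detected anywhere in the tree.
LOCKFILES = {
    "uv.lock": "python",
    "requirements.txt": "python",
    "package-lock.json": "node",
    "yarn.lock": "node",
    "Cargo.lock": "rust",
    ".terraform.lock.hcl": "terraform",
}
ANSIBLE_FILES = {"requirements.yml", "requirements.yaml"}


def classify_source(rel_path: str) -> str | None:
    """Return a label for a repo-relative path, or None if it is out of scope."""
    p = PurePosixPath(rel_path)
    if p.name == "sbom-tools.yaml":
        # The tool manifest only counts at the repo root.
        return "tool-manifest" if len(p.parts) == 1 else None
    if p.name in ANSIBLE_FILES:
        # Galaxy manifests only count directly under a directory named "ansible".
        return "ansible" if p.parent.name == "ansible" else None
    return LOCKFILES.get(p.name)


for candidate in ("uv.lock", "infra/.terraform.lock.hcl",
                  "deploy/ansible/requirements.yml", "requirements.yml"):
    print(candidate, "->", classify_source(candidate))
```

A top-level `requirements.yml` is deliberately out of scope here, mirroring the "under directories named `ansible/`" rule.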
---
@@ -64,27 +67,35 @@ curl -s http://127.0.0.1:8000/repos/ | python3 -m json.tool
## 3. SBOM Ingestion
### 3.1 Standard ingest (single lockfile at repo root)
### 3.1 Standard ingest (all mechanisms, recommended)
```bash
cd ~/the-custodian/state-hub
make ingest-sbom REPO=<slug> REPO_PATH=/path/to/repo
```
The script auto-detects the first recognised lockfile at `REPO_PATH`.
`ingest_sbom.py` automatically runs all four mechanisms in one scan — lockfiles,
Terraform provider locks, Ansible Galaxy manifests, and `sbom-tools.yaml`. All
results are merged into a single snapshot. Non-dep directories (`.venv`,
`node_modules`, `.git`, `dist`, etc.) are automatically skipped.
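The directory skipping described above relies on pruning `os.walk` in place. The following is a minimal, self-contained sketch of that technique; `find_files` and the reduced `SKIP_DIRS` set are illustrative assumptions, not the script's actual names or full skip list.

```python
from __future__ import annotations

import os
from pathlib import Path

# Illustrative subset; the real skip list in ingest_sbom.py may be longer.
SKIP_DIRS = {".venv", "node_modules", ".git", "dist"}


def find_files(root: Path, wanted: set[str]) -> list[Path]:
    """Return every file under root whose name is in wanted, skipping SKIP_DIRS."""
    found: list[Path] = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Assigning to the slice mutates the list os.walk holds, so pruned
        # directories are never descended into.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        for name in sorted(filenames):
            if name in wanted:
                found.append(Path(dirpath) / name)
    return found
```

Rebinding `dirnames` to a new list instead of assigning to the slice would leave the walk unpruned, which is why the slice assignment matters.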
### 3.2 Multi-ecosystem repos (recommended for complex repos)
### 3.2 Repos with system-level tools: capture first, then ingest
Use `SCAN=1` to walk the repo tree and combine **all** lockfiles into a single
snapshot. Non-dep directories (`.venv`, `node_modules`, `.git`, `dist`, etc.)
are automatically skipped.
For repos that use system-level tools not tracked by any lockfile (Terraform
binary, Helm, kubectl, k3s, goss, etc.):
```bash
make ingest-sbom REPO=the-custodian SCAN=1 REPO_PATH=/home/worsch/the-custodian
```
# Step 1: generate sbom-tools.yaml via agent
make capture-tools REPO=<slug> REPO_PATH=/path/to/repo
This is the correct approach for repos that contain both a backend and a
frontend (e.g., a Python API + Node/Observable dashboard).
# Step 2: review sbom-tools.yaml — correct any confidence: low entries
# Step 3: commit sbom-tools.yaml
git -C /path/to/repo add sbom-tools.yaml && git -C /path/to/repo commit -m "chore(sbom): add tool manifest"
# Step 4: ingest everything
make ingest-sbom REPO=<slug> REPO_PATH=/path/to/repo
```
### 3.3 Explicit lockfile path
@@ -96,8 +107,7 @@ Multiple lockfiles can be passed by calling the script directly with repeated
`--lockfile` flags:
```bash
cd ~/the-custodian/state-hub
.venv/bin/python scripts/ingest_sbom.py \
uv run python scripts/ingest_sbom.py \
--repo <slug> \
--lockfile /path/to/uv.lock \
--lockfile /path/to/package-lock.json
@@ -106,11 +116,40 @@ cd ~/the-custodian/state-hub
### 3.4 Dry run (inspect without submitting)
```bash
make ingest-sbom REPO=<slug> SCAN=1 REPO_PATH=/path/to/repo
# append: add --dry-run to the command, or run the script directly:
.venv/bin/python scripts/ingest_sbom.py --repo <slug> --scan --repo-path /path/to/repo --dry-run
make ingest-sbom REPO=<slug> REPO_PATH=/path/to/repo DRY_RUN=1
```
### 3.5 sbom-tools.yaml: the tool manifest
Create `sbom-tools.yaml` at the repo root for any system-level tools not
covered by lockfiles. Schema:
```yaml
# sbom-tools.yaml
tools:
  - name: terraform
    version: "1.9.5"   # confidence: medium
    ecosystem: terraform
    license_spdx: BUSL-1.1
    is_direct: true
    is_dev: false
  - name: helm
    version: null      # confidence: low (no version pin found)
    ecosystem: tool
    license_spdx: Apache-2.0
    is_direct: true
    is_dev: false
```
**Valid ecosystem values:** `python`, `node`, `rust`, `go`, `java`, `terraform`,
`ansible`, `tool`, `other`
Annotate each version with a `# confidence: high/medium/low` comment.
Entries with `confidence: low` need human verification before committing.
The `make capture-tools` command generates this file automatically using the
SBOM capture agent prompt (`state-hub/prompts/sbom-capture-agent.md`).
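Because the confidence annotations live in YAML comments, a YAML parser discards them on load. A plain-text scan is one way to list the entries that still need review. The sketch below assumes the layout shown in the schema above (a `- name:` line followed by the entry's other keys); `low_confidence_tools` is a hypothetical helper, not part of the repository.

```python
from __future__ import annotations

import re

NAME_RE = re.compile(r"^\s*-\s*name:\s*(\S+)")
LOW_RE = re.compile(r"#\s*confidence:\s*low\b")


def low_confidence_tools(manifest_text: str) -> list[str]:
    """Return tool names whose entry carries a '# confidence: low' comment."""
    flagged: list[str] = []
    current: str | None = None
    for line in manifest_text.splitlines():
        m = NAME_RE.match(line)
        if m:
            # A new list item starts; remember its name (quotes stripped).
            current = m.group(1).strip("'\"")
        if current is not None and LOW_RE.search(line) and current not in flagged:
            flagged.append(current)
    return flagged


SAMPLE = """\
tools:
  - name: terraform
    version: "1.9.5"   # confidence: medium
  - name: helm
    version: null      # confidence: low (no version pin found)
"""
print(low_confidence_tools(SAMPLE))
```

Header comments that merely mention "confidence: low" mid-sentence are not matched, since the pattern requires the annotation to follow the `#` directly.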
---
## 4. Snapshot Semantics
@@ -248,10 +287,14 @@ The SBOM dashboard aggregates across all repos within a domain in the
## 10. Planned Enhancements
- **Go / Java parsers** — add to `ingest_sbom.py`
- **Go / Java parsers** — add `go.sum`, `pom.xml`, `gradle.lockfile` support to `ingest_sbom.py`
- **Versioned snapshots** — retain history per repo for trend analysis
- **Licence override file** — allow repos to document known-acceptable
copyleft exceptions (`.sbom-overrides.yaml`)
- **CI integration** — GitHub Actions step to run ingest on lockfile change
- **Direct-dep detection for uv.lock** — parse `pyproject.toml` `[project.dependencies]`
to mark direct deps accurately
- **Galaxy API licence lookup** — resolve `license_spdx` for Ansible collections
via the Galaxy API at ingest time
- **Tool version pinning guidance** — tooling to detect `confidence: low` entries
across all registered repos and flag them for resolution


@@ -133,16 +133,26 @@ list-repos:
@test -n "$(DOMAIN)" || (echo "ERROR: DOMAIN is required."; exit 1)
curl -sf "http://127.0.0.1:8000/repos/?domain=$(DOMAIN)" | python3 -m json.tool
## Ingest SBOM data for a repo.
## Ingest SBOM data for a repo (all mechanisms: lockfiles + ansible + sbom-tools.yaml).
## Auto-detect all sources: make ingest-sbom REPO=the-custodian REPO_PATH=/home/worsch/the-custodian
## Single lockfile (explicit): make ingest-sbom REPO=the-custodian LOCKFILE=/path/to/uv.lock
## Scan all lockfiles in tree: make ingest-sbom REPO=the-custodian SCAN=1 REPO_PATH=/home/worsch/the-custodian
## Auto-detect at repo root: make ingest-sbom REPO=the-custodian REPO_PATH=/home/worsch/the-custodian
## Dry-run (no submit): make ingest-sbom REPO=the-custodian REPO_PATH=... DRY_RUN=1
## Tip: run capture-tools first for repos with system-level tool dependencies.
ingest-sbom:
@test -n "$(REPO)" || (echo "ERROR: REPO is required."; exit 1)
uv run python scripts/ingest_sbom.py --repo "$(REPO)" \
$(if $(LOCKFILE),--lockfile "$(LOCKFILE)") \
$(if $(SCAN),--scan) \
$(if $(REPO_PATH),--repo-path "$(REPO_PATH)")
$(if $(REPO_PATH),--repo-path "$(REPO_PATH)") \
$(if $(DRY_RUN),--dry-run)
## Run SBOM capture agent for a repo — generates/updates sbom-tools.yaml.
## Usage: make capture-tools REPO=railiance-infra [REPO_PATH=/home/worsch/railiance-infra]
## Add DRY_RUN=1 to preview without writing.
capture-tools:
@test -n "$(REPO)" || (echo "ERROR: REPO is required."; exit 1)
uv run python scripts/capture_sbom_tools.py --repo "$(REPO)" \
$(if $(REPO_PATH),--repo-path "$(REPO_PATH)") \
$(if $(DRY_RUN),--dry-run)
## Check a repo for ADR-001 compliance: make validate-adr REPO=/path/to/repo [DOMAIN=custodian]
validate-adr:


@@ -15,6 +15,9 @@ class Ecosystem(str, enum.Enum):
rust = "rust"
go = "go"
java = "java"
terraform = "terraform"
ansible = "ansible"
tool = "tool"
other = "other"


@@ -0,0 +1,30 @@
"""SBOM ecosystem enum expansion: add terraform, ansible, tool
Revision ID: d6e7f8a9b0c1
Revises: c5d6e7f8a9b0
Create Date: 2026-03-12 00:00:00.000000
"""
from typing import Sequence, Union
from alembic import op
revision: str = "d6e7f8a9b0c1"
down_revision: Union[str, None] = "c5d6e7f8a9b0"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
    # Each ALTER TYPE ... ADD VALUE statement adds exactly one enum value, so
    # one statement is issued per value. Since PostgreSQL 12 the statement may
    # run inside a transaction block, but the new value cannot be used until
    # that transaction has committed; earlier versions reject it inside a
    # transaction block altogether.
    op.execute("ALTER TYPE ecosystem ADD VALUE IF NOT EXISTS 'terraform'")
    op.execute("ALTER TYPE ecosystem ADD VALUE IF NOT EXISTS 'ansible'")
    op.execute("ALTER TYPE ecosystem ADD VALUE IF NOT EXISTS 'tool'")


def downgrade() -> None:
    # PostgreSQL does not support removing enum values without recreating the
    # type, so this downgrade is intentionally a no-op. Reverting this
    # migration requires a full type recreation if needed.
    pass


@@ -0,0 +1,90 @@
# SBOM Capture Agent Prompt
**Task:** Generate or update `sbom-tools.yaml` for the repository at `{repo_path}` (slug: `{repo_slug}`).
This file captures system-level tool dependencies that are not tracked by any package manager lockfile — tools that are installed via provisioning, Homebrew, system packages, or assumed present in the environment.
---
## Instructions
1. **Read the following files** in `{repo_path}` (read each that exists; skip gracefully if absent):
- `CLAUDE.md` — look for stack declarations, tool prerequisites, dev commands
- `README.md` / `QUICKSTART.md` — prerequisites sections, tool version requirements
- `Makefile` — tool invocations, version variables (e.g. `ANSIBLE_VERSION := 12.3`)
- `pyproject.toml` — Python tool dependencies (already covered by uv.lock; note but don't duplicate)
- `.tool-versions` — asdf version pins
- `.terraform-version` — tfenv pin
- `.ansible-version` — if present
- `Dockerfile` / `docker-compose.yml` — base image versions, tool installs
- `.github/workflows/*.yml` / `.gitlab-ci.yml` — CI tool install steps, version pins
- `ansible/requirements.yml`: **already captured by lockfile parser; do NOT include Galaxy collections here**
- Any `scripts/setup*.sh`, `scripts/bootstrap*.sh`, or `tools/` directory
2. **Identify system-level tools only** — tools that:
- Are invoked as CLI commands (e.g. `ansible-playbook`, `terraform`, `helm`, `kubectl`, `k3s`, `goss`, `age`, `sops`)
- Are NOT installed via `uv`/`pip`/`npm`/`cargo` into a project virtualenv (those are in lockfiles)
- Note: `ansible` itself as a CLI tool is a system dep even if `ansible-core` appears in `uv.lock`
3. **For each tool, determine**:
- `name`: canonical tool name (e.g. `ansible`, `terraform`, `helm`, `kubectl`, `k3s`, `goss`, `age`, `sops`, `cloud-init`)
- `version`: the pinned or documented version. Use `unknown` only if no evidence found anywhere.
- `ecosystem`: one of `python`, `node`, `rust`, `go`, `java`, `terraform`, `ansible`, `tool`, `other`
- Use `ansible` for Ansible itself; `terraform` for Terraform itself; `tool` for generic CLI tools
- `license_spdx`: the SPDX identifier. Common known licences (use these exact strings):
- ansible / ansible-core: `GPL-3.0-only`
- terraform ≤ 1.5.7: `MPL-2.0`; terraform ≥ 1.6.0: `BUSL-1.1`
- helm: `Apache-2.0`
- kubectl: `Apache-2.0`
- k3s: `Apache-2.0`
- goss: `Apache-2.0`
- age: `BSD-3-Clause`
- sops: `MPL-2.0`
- cloud-init: `Apache-2.0` (or `GPL-3.0-only` for older versions — check)
- docker: `Apache-2.0`
- If unknown, use `null`
- `is_direct`: `true` if this repo directly declares/uses it; `false` if it's a transitive dependency of another tool
- `is_dev`: `true` only if the tool is only used for development/testing, not production operation
4. **Confidence annotation**: Add a `# confidence: high/medium/low` comment after each entry:
- `high`: version found explicitly pinned in a file
- `medium`: version inferred from context (e.g. "Ansible 12" in README)
- `low`: version not found; using `unknown` or a reasonable guess
5. **Do NOT include**:
- Python packages already covered by `uv.lock` or `requirements.txt`
- Ansible Galaxy collections (covered by `ansible/requirements.yml`)
- Terraform providers (covered by `.terraform.lock.hcl`)
- Node packages, Rust crates, etc. (covered by their lockfiles)
- Operating system packages unless the repo explicitly declares them
6. **Output format**: Emit ONLY the YAML block below — no prose, no markdown fences, no explanation. The output must be valid YAML that can be written directly to `sbom-tools.yaml`.
---
## Output format
```yaml
# sbom-tools.yaml — system-level tool dependencies for {repo_slug}
# Generated by sbom-capture-agent on {date}
# Review each entry before committing. Entries with confidence: low need human verification.
tools:
  - name: example-tool
    version: "1.2.3"   # confidence: high
    ecosystem: tool
    license_spdx: Apache-2.0
    is_direct: true
    is_dev: false
```
If no system-level tools are found, output:
```yaml
# sbom-tools.yaml — system-level tool dependencies for {repo_slug}
# Generated by sbom-capture-agent on {date}
# No system-level tools identified — all dependencies are covered by lockfiles.
tools: []
```
---
Now read `{repo_path}` and produce the `sbom-tools.yaml` content.


@@ -0,0 +1,187 @@
#!/usr/bin/env python3
"""Invoke the SBOM capture agent to generate/update sbom-tools.yaml for a repo.
Usage:
python capture_sbom_tools.py --repo <slug> [--repo-path <path>] [--dry-run]
The script:
1. Resolves repo path from the state-hub API (if --repo-path is not given)
2. Loads the agent prompt from prompts/sbom-capture-agent.md
3. Substitutes {repo_slug}, {repo_path}, {date} placeholders
4. Invokes `claude -p "<prompt>"` non-interactively
5. Extracts the YAML block from the response
6. Writes (or shows diff of) sbom-tools.yaml in the repo root
Requirements:
- `claude` CLI must be on PATH (Claude Code)
- PyYAML must be available in the active venv
"""
from __future__ import annotations
import argparse
import datetime
import difflib
import json
import os
import re
import subprocess
import sys
import urllib.error
import urllib.request
from pathlib import Path
API_BASE = os.environ.get("API_BASE", "http://127.0.0.1:8000").rstrip("/")
SCRIPT_DIR = Path(__file__).parent
PROMPT_FILE = SCRIPT_DIR.parent / "prompts" / "sbom-capture-agent.md"
def resolve_repo_path(repo_slug: str) -> Path | None:
    """Look up the registered path for a repo slug via the state-hub API."""
    url = f"{API_BASE}/repos/{repo_slug}/"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            data = json.loads(resp.read())
        path_str = data.get("local_path")
        if path_str:
            return Path(path_str)
    except (urllib.error.URLError, json.JSONDecodeError, KeyError):
        # Unreachable API, HTTP error, or a non-JSON body: fall through so the
        # caller can apply its own fallback.
        pass
    return None
def load_prompt(repo_slug: str, repo_path: Path) -> str:
if not PROMPT_FILE.exists():
print(f"Error: prompt file not found at {PROMPT_FILE}", file=sys.stderr)
sys.exit(1)
template = PROMPT_FILE.read_text()
today = datetime.date.today().isoformat()
return (
template
.replace("{repo_slug}", repo_slug)
.replace("{repo_path}", str(repo_path))
.replace("{date}", today)
)
def invoke_agent(prompt: str) -> str:
"""Run `claude -p <prompt>` and return stdout."""
try:
result = subprocess.run(
["claude", "-p", prompt],
capture_output=True,
text=True,
timeout=120,
)
except FileNotFoundError:
print("Error: `claude` CLI not found on PATH. Install Claude Code.", file=sys.stderr)
sys.exit(1)
except subprocess.TimeoutExpired:
print("Error: claude invocation timed out after 120s.", file=sys.stderr)
sys.exit(1)
if result.returncode != 0:
print(f"Error: claude exited with code {result.returncode}", file=sys.stderr)
if result.stderr:
print(result.stderr, file=sys.stderr)
sys.exit(1)
return result.stdout
def extract_yaml(response: str) -> str:
    """Extract YAML content from the agent response.

    Accepts:
    - Raw YAML (starts with # or 'tools:')
    - YAML wrapped in ```yaml ... ``` fences
    """
    # Try fenced block first
    m = re.search(r"```(?:yaml)?\s*\n(.*?)```", response, re.DOTALL)
    if m:
        return m.group(1).strip()
    # Otherwise treat entire response as YAML
    stripped = response.strip()
    if stripped.startswith("#") or stripped.startswith("tools:"):
        return stripped
    # Nothing usable was found: report the first 500 characters and abort.
    print("Error: could not extract YAML from agent response.", file=sys.stderr)
    print("Raw response:", file=sys.stderr)
    print(response[:500], file=sys.stderr)
    sys.exit(1)
def show_diff(old: str | None, new: str, target: Path) -> None:
if old is None:
print(f"[new file] {target}")
for line in new.splitlines():
print(f" + {line}")
else:
diff = list(difflib.unified_diff(
old.splitlines(keepends=True),
new.splitlines(keepends=True),
fromfile=f"a/{target.name}",
tofile=f"b/{target.name}",
))
if diff:
print("".join(diff))
else:
print(f"[no changes] {target}")
def main() -> None:
parser = argparse.ArgumentParser(
description="Generate/update sbom-tools.yaml for a repo using the SBOM capture agent."
)
parser.add_argument("--repo", required=True, help="Repo slug (e.g. 'railiance-infra')")
parser.add_argument("--repo-path", help="Path to repo root (auto-resolved from state-hub if omitted)")
parser.add_argument("--dry-run", action="store_true",
help="Show prompt and diff without writing sbom-tools.yaml")
parser.add_argument("--print-prompt", action="store_true",
help="Print the rendered prompt and exit (useful for inspection)")
args = parser.parse_args()
# Resolve repo path
if args.repo_path:
repo_path = Path(args.repo_path).resolve()
else:
repo_path = resolve_repo_path(args.repo)
if repo_path is None:
# Fall back to ~/repo_slug convention
repo_path = Path.home() / args.repo
print(f"Could not resolve path from API; trying {repo_path}", file=sys.stderr)
if not repo_path.exists():
print(f"Error: repo path does not exist: {repo_path}", file=sys.stderr)
sys.exit(1)
target = repo_path / "sbom-tools.yaml"
existing_content = target.read_text() if target.exists() else None
prompt = load_prompt(args.repo, repo_path)
if args.print_prompt:
print(prompt)
return
print(f"Running SBOM capture agent for {args.repo} ({repo_path})…")
response = invoke_agent(prompt)
yaml_content = extract_yaml(response)
# Ensure trailing newline
if not yaml_content.endswith("\n"):
yaml_content += "\n"
show_diff(existing_content, yaml_content, target)
if args.dry_run:
print("\n[dry-run] sbom-tools.yaml not written.")
return
target.write_text(yaml_content)
print(f"\nWritten: {target}")
print("Review the file, correct any 'confidence: low' entries, then commit.")
if __name__ == "__main__":
main()


@@ -1,15 +1,19 @@
#!/usr/bin/env python3
"""Ingest a repo's lockfile into the State Hub SBOM store.
"""Ingest a repo's lockfiles and tool manifests into the State Hub SBOM store.
Usage:
python ingest_sbom.py --repo <slug> [--lockfile <path>] [--api-base <url>]
python ingest_sbom.py --repo <slug> [--repo-path <path>] [--dry-run]
Auto-detects lockfile type:
uv.lock → Python ecosystem
requirements.txt → Python ecosystem (basic)
package-lock.json → Node ecosystem
yarn.lock → Node ecosystem
Cargo.lock → Rust ecosystem
Auto-detects all of the following in one scan:
uv.lock → python
requirements.txt → python
package-lock.json → node
yarn.lock → node
Cargo.lock → rust
.terraform.lock.hcl → terraform (anywhere in tree)
ansible/requirements.yml → ansible (anywhere under ansible/ dirs)
ansible/requirements.yaml → ansible
sbom-tools.yaml → tool (repo root; agent-generated)
"""
from __future__ import annotations
@@ -22,11 +26,17 @@ import urllib.error
import urllib.request
from pathlib import Path
try:
import yaml # optional; only needed for sbom-tools.yaml and ansible parsers
_YAML_AVAILABLE = True
except ImportError:
_YAML_AVAILABLE = False
API_BASE = os.environ.get("API_BASE", "http://127.0.0.1:8000").rstrip("/")
# ---------------------------------------------------------------------------
# Lockfile parsers
# Lockfile parsers — each returns list[dict]
# ---------------------------------------------------------------------------
def _parse_uv_lock(path: Path) -> list[dict]:
@@ -55,7 +65,7 @@ def _parse_uv_lock(path: Path) -> list[dict]:
"package_version": e.get("package_version"),
"ecosystem": "python",
"license_spdx": None,
"is_direct": False, # uv.lock doesn't distinguish; treat all as transitive
"is_direct": False,
"is_dev": False,
}
for e in entries
@@ -70,7 +80,6 @@ def _parse_requirements_txt(path: Path) -> list[dict]:
line = line.strip()
if not line or line.startswith("#") or line.startswith("-"):
continue
# Handle: pkg==1.2.3, pkg>=1.2, pkg
m = re.match(r"^([A-Za-z0-9_.\-]+)(?:[>=<!~^]+([^\s;]+))?", line)
if m:
entries.append({
@@ -95,7 +104,7 @@ def _parse_package_lock_json(path: Path) -> list[dict]:
packages = data.get("packages", {})
entries = []
for pkg_path, info in packages.items():
if not pkg_path: # root package
if not pkg_path:
continue
name = info.get("name") or pkg_path.split("node_modules/")[-1]
entries.append({
@@ -120,8 +129,6 @@ def _parse_yarn_lock(path: Path) -> list[dict]:
if not stripped or stripped.startswith("#"):
continue
if not line.startswith(" ") and stripped.endswith(":"):
# New package block header: "name@version::" or "\"name@version\":"
# May list multiple versions: "name@^1.0, name@~1.0:"
current_names = []
current_version = None
for part in stripped.rstrip(":").split(","):
@@ -188,12 +195,10 @@ def _parse_terraform_lock_hcl(path: Path) -> list[dict]:
for line in path.read_text().splitlines():
stripped = line.strip()
# e.g.: provider "registry.terraform.io/hetznercloud/hcloud" {
m = re.match(r'^provider\s+"([^"]+)"\s*\{', stripped)
if m:
# Use full provider address as package_name, short name as display
full = m.group(1)
current_name = full # e.g. "registry.terraform.io/hetznercloud/hcloud"
current_name = full
current_version = None
elif current_name is not None:
vm = re.match(r'version\s*=\s*"([^"]+)"', stripped)
@@ -203,7 +208,7 @@ def _parse_terraform_lock_hcl(path: Path) -> list[dict]:
entries.append({
"package_name": current_name,
"package_version": current_version,
"ecosystem": "other", # "terraform" not yet in ENUM; tracked as other
"ecosystem": "terraform",
"license_spdx": None,
"is_direct": True,
"is_dev": False,
@@ -214,7 +219,114 @@ def _parse_terraform_lock_hcl(path: Path) -> list[dict]:
return entries
_LOCKFILE_PARSERS = {
def _parse_ansible_requirements(path: Path) -> list[dict]:
"""Parse ansible/requirements.yml — collections and roles from Ansible Galaxy."""
if not _YAML_AVAILABLE:
print(f"Warning: PyYAML not available; skipping {path}", file=sys.stderr)
return []
try:
data = yaml.safe_load(path.read_text())
except yaml.YAMLError as e:
print(f"Warning: cannot parse {path}: {e}", file=sys.stderr)
return []
if not isinstance(data, dict):
return []
entries = []
for item in data.get("collections", []) or []:
if isinstance(item, str):
name, version = item, None
elif isinstance(item, dict):
name = item.get("name", "")
version = str(item.get("version", "")) if item.get("version") else None
else:
continue
if name:
entries.append({
"package_name": name,
"package_version": version,
"ecosystem": "ansible",
"license_spdx": None,
"is_direct": True,
"is_dev": False,
})
for item in data.get("roles", []) or []:
if isinstance(item, str):
name, version = item, None
elif isinstance(item, dict):
name = item.get("name", item.get("src", ""))
version = str(item.get("version", "")) if item.get("version") else None
else:
continue
if name:
entries.append({
"package_name": name,
"package_version": version,
"ecosystem": "ansible",
"license_spdx": None,
"is_direct": True,
"is_dev": False,
})
return entries
def _parse_sbom_tools_yaml(path: Path) -> list[dict]:
"""Parse sbom-tools.yaml — agent-generated tool manifest at repo root."""
if not _YAML_AVAILABLE:
print(f"Warning: PyYAML not available; skipping {path}", file=sys.stderr)
return []
try:
data = yaml.safe_load(path.read_text())
except yaml.YAMLError as e:
print(f"Warning: cannot parse {path}: {e}", file=sys.stderr)
return []
if not isinstance(data, dict):
return []
entries = []
valid_ecosystems = {
"python", "node", "rust", "go", "java",
"terraform", "ansible", "tool", "other",
}
for item in data.get("tools", []) or []:
if not isinstance(item, dict):
continue
name = item.get("name", "")
version = str(item.get("version", "")) if item.get("version") else None
if version == "unknown":
print(f" Warning: tool '{name}' has version=unknown — flagged for review", file=sys.stderr)
version = None
ecosystem = item.get("ecosystem", "tool")
if ecosystem not in valid_ecosystems:
print(f" Warning: unknown ecosystem '{ecosystem}' for '{name}'; using 'tool'", file=sys.stderr)
ecosystem = "tool"
license_spdx = item.get("license_spdx") or None
entries.append({
"package_name": name,
"package_version": version,
"ecosystem": ecosystem,
"license_spdx": license_spdx,
"is_direct": bool(item.get("is_direct", True)),
"is_dev": bool(item.get("is_dev", False)),
})
return entries
# ---------------------------------------------------------------------------
# Detection helpers
# ---------------------------------------------------------------------------
# Filename → parser for standard lockfiles (detected by filename anywhere in tree)
_LOCKFILE_PARSERS: dict[str, object] = {
"uv.lock": _parse_uv_lock,
"requirements.txt": _parse_requirements_txt,
"package-lock.json": _parse_package_lock_json,
@@ -234,6 +346,47 @@ _SKIP_DIRS = {
}
def detect_all(repo_path: Path) -> list[tuple[Path, str, object]]:
    """Scan repo_path and return all discovered dependency sources.

    Returns list of (path, label, parser_fn) tuples covering:
    - Standard lockfiles (anywhere in tree)
    - Ansible requirements files (in ansible/ subdirs)
    - sbom-tools.yaml at repo root
    """
    found: list[tuple[Path, str, object]] = []
    seen_paths: set[Path] = set()

    # Walk tree for all source types
    for dirpath, dirnames, filenames in os.walk(repo_path):
        dirnames[:] = sorted(d for d in dirnames if d not in _SKIP_DIRS)
        dirpath_p = Path(dirpath)

        # Standard lockfiles
        for fname, parser in _LOCKFILE_PARSERS.items():
            if fname in filenames:
                p = dirpath_p / fname
                if p not in seen_paths:
                    found.append((p, fname, parser))
                    seen_paths.add(p)

        # Ansible requirements files — only under directories named "ansible"
        if dirpath_p.name == "ansible":
            for fname in ("requirements.yml", "requirements.yaml"):
                if fname in filenames:
                    p = dirpath_p / fname
                    if p not in seen_paths:
                        found.append((p, f"ansible/{fname}", _parse_ansible_requirements))
                        seen_paths.add(p)

    # sbom-tools.yaml at repo root only
    tools_manifest = repo_path / "sbom-tools.yaml"
    if tools_manifest.exists() and tools_manifest not in seen_paths:
        found.append((tools_manifest, "sbom-tools.yaml", _parse_sbom_tools_yaml))

    return found
def detect_lockfile(repo_path: Path) -> tuple[Path, str] | None:
"""Return (lockfile_path, filename) for the first recognised lockfile at repo root."""
for name in _LOCKFILE_PARSERS:
@@ -244,7 +397,10 @@ def detect_lockfile(repo_path: Path) -> tuple[Path, str] | None:
def detect_lockfiles_recursive(repo_path: Path) -> list[Path]:
"""Walk repo_path and return all recognised lockfiles, skipping non-dep dirs."""
"""Walk repo_path and return all recognised lockfiles, skipping non-dep dirs.
Kept for backwards compatibility; prefer detect_all() for new code.
"""
found: list[Path] = []
for dirpath, dirnames, filenames in os.walk(repo_path):
dirnames[:] = sorted(d for d in dirnames if d not in _SKIP_DIRS)
@@ -292,52 +448,47 @@ def post_ingest(api_base: str, repo_slug: str, entries: list[dict]) -> dict:
# ---------------------------------------------------------------------------
def main() -> None:
parser = argparse.ArgumentParser(description="Ingest a repo's lockfiles into the State Hub SBOM store.")
parser = argparse.ArgumentParser(
description="Ingest a repo's lockfiles and tool manifests into the State Hub SBOM store."
)
parser.add_argument("--repo", required=True, help="Managed-repo slug (e.g. 'the-custodian')")
parser.add_argument("--lockfile", action="append", dest="lockfiles",
metavar="PATH", help="Path to a specific lockfile (repeatable)")
parser.add_argument("--repo-path", default=".", help="Repo root for auto-detection/scan (default: cwd)")
parser.add_argument("--scan", action="store_true",
help="Recursively find ALL lockfiles under --repo-path (handles multi-ecosystem repos)")
help="Recursively find ALL lockfiles under --repo-path (deprecated; now default behaviour)")
parser.add_argument("--api-base", default=API_BASE, help="State Hub API base URL")
parser.add_argument("--dry-run", action="store_true", help="Parse only — do not submit")
args = parser.parse_args()
repo_root = Path(args.repo_path).resolve()
lockfile_paths: list[Path] = []
all_entries: list[dict] = []
if args.lockfiles:
lockfile_paths = [Path(lf).resolve() for lf in args.lockfiles]
elif args.scan:
lockfile_paths = detect_lockfiles_recursive(repo_root)
if not lockfile_paths:
print(f"No lockfiles found under '{repo_root}'.", file=sys.stderr)
sys.exit(1)
print(f"Scan found {len(lockfile_paths)} lockfile(s):")
for lf in lockfile_paths:
print(f" {lf.relative_to(repo_root) if lf.is_relative_to(repo_root) else lf}")
# Explicit paths: parse each, detect parser by filename
for lf_str in args.lockfiles:
lf = Path(lf_str).resolve()
parsed = parse_lockfile(lf)
rel = lf.relative_to(repo_root) if lf.is_relative_to(repo_root) else lf
print(f" {rel}: {len(parsed)} packages")
all_entries.extend(parsed)
else:
found = detect_lockfile(repo_root)
if not found:
# Comprehensive auto-detection: all mechanisms in one scan
sources = detect_all(repo_root)
if not sources:
print(
f"No recognised lockfile found in '{repo_root}'. "
f"Supported: {', '.join(_LOCKFILE_PARSERS)}. "
"Use --scan to search subdirectories.",
f"No recognised dependency sources found in '{repo_root}'.",
file=sys.stderr,
)
sys.exit(1)
lockfile_path, _ = found
print(f"Auto-detected: {lockfile_path}")
lockfile_paths = [lockfile_path]
all_entries: list[dict] = []
for lf in lockfile_paths:
parsed = parse_lockfile(lf)
rel = lf.relative_to(repo_root) if lf.is_relative_to(repo_root) else lf
print(f" {rel}: {len(parsed)} packages")
all_entries.extend(parsed)
for src_path, label, parser_fn in sources:
parsed = parser_fn(src_path)
rel = src_path.relative_to(repo_root) if src_path.is_relative_to(repo_root) else src_path
print(f" {label} ({rel}): {len(parsed)} entries")
all_entries.extend(parsed)
print(f"Total: {len(all_entries)} packages across {len(lockfile_paths)} lockfile(s)")
print(f"Total: {len(all_entries)} entries")
if args.dry_run:
print(json.dumps(all_entries[:5], indent=2))


@@ -0,0 +1,397 @@
"""Unit tests for ingest_sbom.py parsers and auto-detection."""
from __future__ import annotations
import json
import sys
import textwrap
from pathlib import Path
import pytest
# Make scripts/ importable
sys.path.insert(0, str(Path(__file__).parent.parent / "scripts"))
import ingest_sbom as sb
# ---------------------------------------------------------------------------
# Terraform parser
# ---------------------------------------------------------------------------
TERRAFORM_LOCK = textwrap.dedent("""\
    provider "registry.terraform.io/hashicorp/template" {
      version = "2.2.0"
      constraints = ">= 2.0.0"
      hashes = [
        "h1:abc123",
      ]
    }

    provider "registry.terraform.io/hetznercloud/hcloud" {
      version = "1.52.0"
      constraints = ">= 1.40.0"
    }
""")


def test_terraform_parser_ecosystem(tmp_path):
    lock = tmp_path / ".terraform.lock.hcl"
    lock.write_text(TERRAFORM_LOCK)
    entries = sb._parse_terraform_lock_hcl(lock)
    assert len(entries) == 2
    for e in entries:
        assert e["ecosystem"] == "terraform", f"expected terraform, got {e['ecosystem']}"
    names = {e["package_name"] for e in entries}
    assert "registry.terraform.io/hashicorp/template" in names
    assert "registry.terraform.io/hetznercloud/hcloud" in names


def test_terraform_parser_versions(tmp_path):
    lock = tmp_path / ".terraform.lock.hcl"
    lock.write_text(TERRAFORM_LOCK)
    entries = sb._parse_terraform_lock_hcl(lock)
    by_name = {e["package_name"]: e for e in entries}
    assert by_name["registry.terraform.io/hashicorp/template"]["package_version"] == "2.2.0"
    assert by_name["registry.terraform.io/hetznercloud/hcloud"]["package_version"] == "1.52.0"


def test_terraform_parser_is_direct(tmp_path):
    lock = tmp_path / ".terraform.lock.hcl"
    lock.write_text(TERRAFORM_LOCK)
    entries = sb._parse_terraform_lock_hcl(lock)
    assert all(e["is_direct"] for e in entries)


def test_terraform_parser_empty(tmp_path):
    lock = tmp_path / ".terraform.lock.hcl"
    lock.write_text("# no providers\n")
    entries = sb._parse_terraform_lock_hcl(lock)
    assert entries == []
# ---------------------------------------------------------------------------
# Ansible Galaxy parser
# ---------------------------------------------------------------------------
ANSIBLE_REQUIREMENTS_FULL = textwrap.dedent("""\
    collections:
      - name: community.general
        version: "9.5.0"
      - name: ansible.posix
        version: "1.6.0"
      - community.crypto
    roles:
      - name: geerlingguy.docker
        version: "6.1.0"
      - geerlingguy.pip
""")

ANSIBLE_REQUIREMENTS_EMPTY = textwrap.dedent("""\
    collections: []
    roles: []
""")

ANSIBLE_REQUIREMENTS_COLLECTIONS_ONLY = textwrap.dedent("""\
    collections:
      - name: community.general
        version: "9.0.0"
""")


def test_ansible_parser_collections_and_roles(tmp_path):
    req = tmp_path / "requirements.yml"
    req.write_text(ANSIBLE_REQUIREMENTS_FULL)
    entries = sb._parse_ansible_requirements(req)
    assert len(entries) == 5
    names = {e["package_name"] for e in entries}
    assert "community.general" in names
    assert "ansible.posix" in names
    assert "community.crypto" in names
    assert "geerlingguy.docker" in names
    assert "geerlingguy.pip" in names


def test_ansible_parser_ecosystem(tmp_path):
    req = tmp_path / "requirements.yml"
    req.write_text(ANSIBLE_REQUIREMENTS_FULL)
    entries = sb._parse_ansible_requirements(req)
    for e in entries:
        assert e["ecosystem"] == "ansible"


def test_ansible_parser_versions(tmp_path):
    req = tmp_path / "requirements.yml"
    req.write_text(ANSIBLE_REQUIREMENTS_FULL)
    entries = sb._parse_ansible_requirements(req)
    by_name = {e["package_name"]: e for e in entries}
    assert by_name["community.general"]["package_version"] == "9.5.0"
    assert by_name["ansible.posix"]["package_version"] == "1.6.0"
    assert by_name["community.crypto"]["package_version"] is None  # no version specified
    assert by_name["geerlingguy.docker"]["package_version"] == "6.1.0"
    assert by_name["geerlingguy.pip"]["package_version"] is None


def test_ansible_parser_is_direct(tmp_path):
    req = tmp_path / "requirements.yml"
    req.write_text(ANSIBLE_REQUIREMENTS_FULL)
    entries = sb._parse_ansible_requirements(req)
    assert all(e["is_direct"] for e in entries)


def test_ansible_parser_empty(tmp_path):
    req = tmp_path / "requirements.yml"
    req.write_text(ANSIBLE_REQUIREMENTS_EMPTY)
    entries = sb._parse_ansible_requirements(req)
    assert entries == []


def test_ansible_parser_collections_only(tmp_path):
    req = tmp_path / "requirements.yml"
    req.write_text(ANSIBLE_REQUIREMENTS_COLLECTIONS_ONLY)
    entries = sb._parse_ansible_requirements(req)
    assert len(entries) == 1
    assert entries[0]["package_name"] == "community.general"


def test_ansible_parser_yaml_extension(tmp_path):
    """Both .yml and .yaml extensions must work."""
    req = tmp_path / "requirements.yaml"
    req.write_text(ANSIBLE_REQUIREMENTS_COLLECTIONS_ONLY)
    entries = sb._parse_ansible_requirements(req)
    assert len(entries) == 1


def test_ansible_parser_invalid_yaml(tmp_path, capsys):
    req = tmp_path / "requirements.yml"
    req.write_text("collections: [unclosed")
    entries = sb._parse_ansible_requirements(req)
    assert entries == []
    captured = capsys.readouterr()
    assert "Warning" in captured.err
# ---------------------------------------------------------------------------
# sbom-tools.yaml parser
# ---------------------------------------------------------------------------
SBOM_TOOLS_YAML = textwrap.dedent("""\
    tools:
      - name: ansible
        version: "12.3.0"
        ecosystem: ansible
        license_spdx: GPL-3.0-only
        is_direct: true
        is_dev: false
      - name: terraform
        version: "1.10.5"
        ecosystem: terraform
        license_spdx: BSL-1.1
        is_direct: true
        is_dev: false
      - name: helm
        version: "3.17.1"
        ecosystem: tool
        license_spdx: Apache-2.0
        is_direct: true
        is_dev: false
      - name: k3s
        version: unknown
        ecosystem: other
        license_spdx: Apache-2.0
        is_direct: true
        is_dev: false
""")

SBOM_TOOLS_YAML_MINIMAL = textwrap.dedent("""\
    tools:
      - name: kubectl
        ecosystem: tool
""")


def test_sbom_tools_parser_basic(tmp_path):
    manifest = tmp_path / "sbom-tools.yaml"
    manifest.write_text(SBOM_TOOLS_YAML)
    entries = sb._parse_sbom_tools_yaml(manifest)
    assert len(entries) == 4
    names = {e["package_name"] for e in entries}
    assert {"ansible", "terraform", "helm", "k3s"} == names


def test_sbom_tools_parser_ecosystems(tmp_path):
    manifest = tmp_path / "sbom-tools.yaml"
    manifest.write_text(SBOM_TOOLS_YAML)
    entries = sb._parse_sbom_tools_yaml(manifest)
    by_name = {e["package_name"]: e for e in entries}
    assert by_name["ansible"]["ecosystem"] == "ansible"
    assert by_name["terraform"]["ecosystem"] == "terraform"
    assert by_name["helm"]["ecosystem"] == "tool"
    assert by_name["k3s"]["ecosystem"] == "other"


def test_sbom_tools_parser_licenses(tmp_path):
    manifest = tmp_path / "sbom-tools.yaml"
    manifest.write_text(SBOM_TOOLS_YAML)
    entries = sb._parse_sbom_tools_yaml(manifest)
    by_name = {e["package_name"]: e for e in entries}
    assert by_name["ansible"]["license_spdx"] == "GPL-3.0-only"
    assert by_name["terraform"]["license_spdx"] == "BSL-1.1"
    assert by_name["helm"]["license_spdx"] == "Apache-2.0"


def test_sbom_tools_parser_unknown_version_becomes_none(tmp_path, capsys):
    """version: unknown must be converted to None and emit a warning."""
    manifest = tmp_path / "sbom-tools.yaml"
    manifest.write_text(SBOM_TOOLS_YAML)
    entries = sb._parse_sbom_tools_yaml(manifest)
    by_name = {e["package_name"]: e for e in entries}
    assert by_name["k3s"]["package_version"] is None
    captured = capsys.readouterr()
    assert "unknown" in captured.err


def test_sbom_tools_parser_minimal_entry(tmp_path):
    """Only 'name' and 'ecosystem' required; version and license default to None."""
    manifest = tmp_path / "sbom-tools.yaml"
    manifest.write_text(SBOM_TOOLS_YAML_MINIMAL)
    entries = sb._parse_sbom_tools_yaml(manifest)
    assert len(entries) == 1
    e = entries[0]
    assert e["package_name"] == "kubectl"
    assert e["ecosystem"] == "tool"
    assert e["package_version"] is None
    assert e["license_spdx"] is None
    assert e["is_direct"] is True
    assert e["is_dev"] is False


def test_sbom_tools_parser_invalid_ecosystem_falls_back(tmp_path, capsys):
    manifest = tmp_path / "sbom-tools.yaml"
    manifest.write_text("tools:\n  - name: foo\n    ecosystem: nonsense\n")
    entries = sb._parse_sbom_tools_yaml(manifest)
    assert entries[0]["ecosystem"] == "tool"
    captured = capsys.readouterr()
    assert "Warning" in captured.err


def test_sbom_tools_parser_empty_tools(tmp_path):
    manifest = tmp_path / "sbom-tools.yaml"
    manifest.write_text("tools: []\n")
    entries = sb._parse_sbom_tools_yaml(manifest)
    assert entries == []


def test_sbom_tools_parser_invalid_yaml(tmp_path, capsys):
    manifest = tmp_path / "sbom-tools.yaml"
    manifest.write_text("tools: {bad yaml: [unclosed")
    entries = sb._parse_sbom_tools_yaml(manifest)
    assert entries == []
    captured = capsys.readouterr()
    assert "Warning" in captured.err
# ---------------------------------------------------------------------------
# detect_all — comprehensive multi-parser scan
# ---------------------------------------------------------------------------
def test_detect_all_uv_lock(tmp_path):
    (tmp_path / "uv.lock").write_text("[[package]]\nname = \"typer\"\nversion = \"0.12.0\"\n")
    sources = sb.detect_all(tmp_path)
    labels = {label for _, label, _ in sources}
    assert "uv.lock" in labels


def test_detect_all_terraform_lock(tmp_path):
    tf_dir = tmp_path / "terraform" / "hetzner"
    tf_dir.mkdir(parents=True)
    (tf_dir / ".terraform.lock.hcl").write_text(
        'provider "registry.terraform.io/hetznercloud/hcloud" {\n  version = "1.52.0"\n}\n'
    )
    sources = sb.detect_all(tmp_path)
    labels = {label for _, label, _ in sources}
    assert ".terraform.lock.hcl" in labels


def test_detect_all_ansible_requirements(tmp_path):
    ansible_dir = tmp_path / "ansible"
    ansible_dir.mkdir()
    (ansible_dir / "requirements.yml").write_text("collections:\n  - name: community.general\n")
    sources = sb.detect_all(tmp_path)
    labels = {label for _, label, _ in sources}
    assert "ansible/requirements.yml" in labels


def test_detect_all_sbom_tools_yaml(tmp_path):
    (tmp_path / "sbom-tools.yaml").write_text("tools:\n  - name: helm\n    ecosystem: tool\n")
    sources = sb.detect_all(tmp_path)
    labels = {label for _, label, _ in sources}
    assert "sbom-tools.yaml" in labels


def test_detect_all_multi_ecosystem(tmp_path):
    """A repo with Python + Terraform + Ansible + tools manifest yields all four."""
    # Python
    (tmp_path / "uv.lock").write_text("[[package]]\nname = \"typer\"\nversion = \"0.12.0\"\n")
    # Terraform
    tf_dir = tmp_path / "terraform"
    tf_dir.mkdir()
    (tf_dir / ".terraform.lock.hcl").write_text(
        'provider "registry.terraform.io/hashicorp/null" {\n  version = "3.2.3"\n}\n'
    )
    # Ansible
    ansible_dir = tmp_path / "ansible"
    ansible_dir.mkdir()
    (ansible_dir / "requirements.yml").write_text(
        "collections:\n  - name: ansible.posix\n    version: \"1.6.0\"\n"
    )
    # Tool manifest
    (tmp_path / "sbom-tools.yaml").write_text(
        "tools:\n  - name: helm\n    ecosystem: tool\n    version: \"3.17.1\"\n"
    )
    sources = sb.detect_all(tmp_path)
    labels = {label for _, label, _ in sources}
    assert "uv.lock" in labels
    assert ".terraform.lock.hcl" in labels
    assert "ansible/requirements.yml" in labels
    assert "sbom-tools.yaml" in labels
    # Parse all and verify merged entries
    all_entries = []
    for path, label, parser_fn in sources:
        all_entries.extend(parser_fn(path))
    ecosystems = {e["ecosystem"] for e in all_entries}
    assert "python" in ecosystems
    assert "terraform" in ecosystems
    assert "ansible" in ecosystems
    assert "tool" in ecosystems


def test_detect_all_skips_venv(tmp_path):
    """Lockfiles inside .venv must be ignored."""
    venv_dir = tmp_path / ".venv" / "lib"
    venv_dir.mkdir(parents=True)
    (venv_dir / "requirements.txt").write_text("requests==2.31.0\n")
    sources = sb.detect_all(tmp_path)
    paths = {str(p) for p, _, _ in sources}
    assert not any(".venv" in p for p in paths)


def test_detect_all_ansible_req_only_in_ansible_dir(tmp_path):
    """requirements.yml at repo root (not in ansible/) should not be picked up as ansible."""
    (tmp_path / "requirements.yml").write_text("collections:\n  - name: community.general\n")
    sources = sb.detect_all(tmp_path)
    labels = {label for _, label, _ in sources}
    # Should NOT be detected since it's not under an 'ansible/' directory
    assert "ansible/requirements.yml" not in labels
    assert "ansible/requirements.yaml" not in labels


def test_detect_all_no_duplicates(tmp_path):
    """Same file should not appear twice."""
    (tmp_path / "uv.lock").write_text("[[package]]\nname = \"x\"\nversion = \"1.0\"\n")
    sources = sb.detect_all(tmp_path)
    paths = [p for p, _, _ in sources]
    assert len(paths) == len(set(paths))


def test_detect_all_empty_repo(tmp_path):
    sources = sb.detect_all(tmp_path)
    assert sources == []


@@ -0,0 +1,386 @@
---
id: CUST-WP-0013
type: workplan
title: "SBOM Infrastructure Expansion"
domain: custodian
repo: the-custodian
status: completed
owner: custodian
topic_slug: custodian
state_hub_workstream_id: f4ba84c8-4d47-492d-b65e-73b157271a2b
created: "2026-03-12"
updated: "2026-03-12"
---
# CUST-WP-0013 — SBOM Infrastructure Expansion
**Scope:** Extend SBOM capture beyond Python packages to cover Terraform providers,
Ansible Galaxy collections, and system-level tools (Ansible, Terraform, Helm, k3s,
cloud-init, etc.). Introduces an agent-assisted tool manifest capture workflow,
new ecosystem enum values, comprehensive auto-detection in `ingest_sbom.py`, and
delivers full SBOM coverage for `railiance-infra` and `railiance-cluster`.
**Drives:** Licence risk visibility across the full dependency graph, not just
language-level packages.
---
## Design Decisions
### Tool manifest: agent-generated, not hand-written
System tools (Ansible, Terraform, Helm, k3s, etc.) live outside any lockfile —
they are provisioned, not installed by a package manager. Rather than asking
operators to maintain a hand-written manifest, the SBOM capture agent inspects
the repo and generates/updates `sbom-tools.yaml` automatically.
The agent prompt (`state-hub/prompts/sbom-capture-agent.md`) is parameterised
per repo. It reads the repo's CLAUDE.md, Makefile, README, CI configs, version
pins, and provisioning files, then emits a structured `sbom-tools.yaml` with
tool name, version, ecosystem, SPDX licence, and directness flags.
A thin wrapper script (`state-hub/scripts/capture_sbom_tools.py`) invokes the
agent prompt via `claude -p` (or prints it for manual use) and writes the result
to `<repo-root>/sbom-tools.yaml`.
### Comprehensive ingest: all mechanisms per repo
`make ingest-sbom REPO=<slug>` must run all applicable parsers, not just
whichever lockfile happens to be auto-detected first. The updated auto-detection
in `ingest_sbom.py` scans:
1. Package manager lockfiles (`uv.lock`, `requirements.txt`, `package-lock.json`,
`yarn.lock`, `Cargo.lock`, `go.sum`)
2. Terraform provider locks (`.terraform.lock.hcl`, anywhere in the tree)
3. Ansible Galaxy manifests (`requirements.yml` / `requirements.yaml`, in any
`ansible/` directory in the tree)
4. Agent-generated tool manifest (`sbom-tools.yaml` at repo root)
All parsers run and their results are merged into a single snapshot.
---
## Phase 1 — Schema: Ecosystem Enum Extension
**Acceptance:** `terraform`, `ansible`, and `tool` are valid ecosystem values;
existing `other` entries are unaffected; migration applies cleanly.
### T01 — Alembic migration: add terraform and ansible enum values
```task
id: CUST-WP-0013-T01
state_hub_task_id: c0b6edc4-86ab-4cee-88a8-6c66fb81adee
status: done
priority: high
```
Add `terraform` and `ansible` to the `Ecosystem` enum in the DB. Check whether
the column uses a native PostgreSQL ENUM type (requiring `ALTER TYPE`) or a
`String` column (requiring no migration). Write the migration accordingly.
Also add `tool` as a catch-all for tool-manifest entries that don't fit a
named ecosystem.
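The native-ENUM case can be sketched as follows. This is a minimal illustration, not the migration itself: the type name `ecosystem` and the helper name are assumptions. The helper only builds the SQL; in an Alembic migration each statement would typically be passed to `op.execute`. PostgreSQL has no statement for dropping an enum label, so a downgrade for this kind of change is usually a no-op.

```python
# Sketch only: the PostgreSQL type name "ecosystem" is an assumption,
# it is not taken from the actual schema.
NEW_ECOSYSTEM_VALUES = ("terraform", "ansible", "tool")


def enum_add_value_statements(type_name, values):
    """Build one idempotent ALTER TYPE statement per new enum label."""
    statements = []
    for value in values:
        # Enum labels are SQL string literals; escape single quotes defensively.
        label = value.replace("'", "''")
        statements.append(
            f"ALTER TYPE {type_name} ADD VALUE IF NOT EXISTS '{label}'"
        )
    return statements


if __name__ == "__main__":
    for stmt in enum_add_value_statements("ecosystem", NEW_ECOSYSTEM_VALUES):
        print(stmt)
```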
---
## Phase 2 — Parser Improvements in ingest_sbom.py
**Acceptance:** `--dry-run` on railiance-infra shows terraform providers and
ansible collections correctly labelled; tool manifest entries appear with the
declared ecosystem.
### T02 — Promote Terraform parser: other → terraform ecosystem
```task
id: CUST-WP-0013-T02
state_hub_task_id: 7686bccd-022c-4e30-8081-c8487eb82253
status: done
priority: high
```
The `.terraform.lock.hcl` parser already exists in `ingest_sbom.py` but stores
entries as `ecosystem="other"`. Change to `ecosystem="terraform"` after T01
migration lands. Re-ingest any repos that previously ingested terraform entries
as `other` to correct the label.
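For orientation, a regex-based sketch of what such a parser might look like. The function name and patterns are hypothetical and the real `_parse_terraform_lock_hcl` may work differently; the entry keys mirror the ones asserted in the unit tests.

```python
import re

# Hypothetical patterns; they assume each provider block closes with a "}" at
# the start of a line, which is how `terraform init` writes the lock file.
_PROVIDER_BLOCK = re.compile(r'provider\s+"([^"]+)"\s*\{(.*?)\n\}', re.DOTALL)
_VERSION_LINE = re.compile(r'^\s*version\s*=\s*"([^"]+)"', re.MULTILINE)


def parse_terraform_lock_text(text):
    """Return one entry per provider block found in the lock file text."""
    entries = []
    for name, body in _PROVIDER_BLOCK.findall(text):
        match = _VERSION_LINE.search(body)
        entries.append({
            "package_name": name,
            "package_version": match.group(1) if match else None,
            "ecosystem": "terraform",  # previously stored as "other"
            "is_direct": True,
        })
    return entries
```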
### T03 — Implement Ansible Galaxy requirements.yml parser
```task
id: CUST-WP-0013-T03
state_hub_task_id: 48658bdd-4d16-4be0-a87e-45df4f4901b0
status: done
priority: high
```
Parse `requirements.yml` / `requirements.yaml` files found in `ansible/`
subdirectories. Standard format:
```yaml
collections:
- name: community.general
version: "9.5.0"
roles:
- name: geerlingguy.docker
version: "6.x"
```
Store as `ecosystem="ansible"`, `is_direct=True`. Licence left `null` (Galaxy
API lookup is deferred). Handle both `collections:` and `roles:` blocks.
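A minimal sketch of the flattening step, assuming the file has already been loaded into a dict (for example with a YAML loader). The function name is hypothetical; the output keys follow the ones used in the unit tests. It also accepts the bare-string form (`- community.crypto`), which carries no version pin.

```python
def entries_from_galaxy_requirements(doc):
    """Flatten the `collections:` and `roles:` blocks of a parsed requirements file."""
    entries = []
    for block in ("collections", "roles"):
        for item in (doc or {}).get(block) or []:
            if isinstance(item, str):
                # Bare string form: a name without a version pin.
                name, version = item, None
            elif isinstance(item, dict) and item.get("name"):
                name, version = item["name"], item.get("version")
            else:
                continue  # shapes this sketch does not understand are skipped
            entries.append({
                "package_name": name,
                "package_version": str(version) if version is not None else None,
                "ecosystem": "ansible",
                "license_spdx": None,  # Galaxy licence lookup is deferred
                "is_direct": True,
            })
    return entries
```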
### T04 — Implement sbom-tools.yaml manifest parser
```task
id: CUST-WP-0013-T04
state_hub_task_id: 4522ea04-134b-40ee-a7a2-ea0e4c1c061d
status: done
priority: high
```
Parse `sbom-tools.yaml` at the repo root (written by the capture agent). Schema:
```yaml
# Generated by sbom-capture-agent — review before committing
tools:
- name: ansible
version: "12.3.0"
ecosystem: ansible # or: terraform, other, python, etc.
license_spdx: GPL-3.0-only
is_direct: true
is_dev: false
- name: helm
version: "3.17.x"
ecosystem: other
license_spdx: Apache-2.0
is_direct: true
is_dev: false
```
Supports all existing ecosystem values plus `tool`. Pass entries through the
same normalisation as lockfile entries. Entries with `version: unknown` (the
agent could not determine a version) are kept with a null version, and a
warning is emitted.
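A sketch of per-entry normalisation, following the behaviour the unit tests in this change exercise: `version: unknown` becomes a null version with a warning, an unrecognised ecosystem falls back to `tool`, and `is_direct` / `is_dev` default to true / false. The function name and the `KNOWN_ECOSYSTEMS` set are assumptions; the set lists only values visible in this change set.

```python
import sys

# Assumption: the real enum has more members; these are the ones seen here.
KNOWN_ECOSYSTEMS = {"python", "terraform", "ansible", "tool", "other"}


def normalise_tool_entry(raw):
    """Turn one item of the `tools:` list into an SBOM entry dict."""
    name = raw["name"]
    version = raw.get("version")
    if version is not None and str(version).strip().lower() == "unknown":
        print(f"Warning: {name}: version is 'unknown'; storing null", file=sys.stderr)
        version = None
    ecosystem = raw.get("ecosystem", "tool")
    if ecosystem not in KNOWN_ECOSYSTEMS:
        print(
            f"Warning: {name}: unrecognised ecosystem {ecosystem!r}; using 'tool'",
            file=sys.stderr,
        )
        ecosystem = "tool"
    return {
        "package_name": name,
        "package_version": str(version) if version is not None else None,
        "ecosystem": ecosystem,
        "license_spdx": raw.get("license_spdx"),
        "is_direct": bool(raw.get("is_direct", True)),
        "is_dev": bool(raw.get("is_dev", False)),
    }
```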
### T05 — Comprehensive auto-detection: all formats in one scan
```task
id: CUST-WP-0013-T05
state_hub_task_id: cdda6bf2-2a44-4444-a04a-ac2fe2314923
status: done
priority: high
```
Refactor the `--repo-path` scan to discover and run all applicable parsers,
not just the first match. Scan order:
1. Walk tree for all `uv.lock`, `requirements.txt`, `package-lock.json`,
`yarn.lock`, `Cargo.lock`
2. Walk tree for all `.terraform.lock.hcl`
3. Walk tree for `ansible/requirements.yml` and `ansible/requirements.yaml`
4. Check repo root for `sbom-tools.yaml`
Merge all results into a single batch for the snapshot ingest call. Log a
summary line per parser: ` <parser>: N packages from <path>`.
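The scan order above could be sketched like this. It is an illustration under assumptions, not the actual `detect_all`: the function name, the skip list, and the `(path, label)` return shape are hypothetical (the tests show `detect_all` also returns a parser function per source).

```python
from pathlib import Path

# Assumptions for this sketch: names and skip list are illustrative only.
LOCKFILE_NAMES = ("uv.lock", "requirements.txt", "package-lock.json",
                  "yarn.lock", "Cargo.lock")
SKIP_DIRS = {".venv", "node_modules", ".git"}


def discover_sources(repo_root):
    """Return (path, label) pairs for every dependency source under repo_root."""
    found = []
    seen = set()

    def add(path, label):
        if path not in seen:  # never report the same file twice
            seen.add(path)
            found.append((path, label))

    files = sorted(
        p for p in repo_root.rglob("*")
        if p.is_file()
        and not SKIP_DIRS.intersection(p.relative_to(repo_root).parts)
    )
    for path in files:  # 1. package-manager lockfiles
        if path.name in LOCKFILE_NAMES:
            add(path, path.name)
    for path in files:  # 2. Terraform provider locks
        if path.name == ".terraform.lock.hcl":
            add(path, path.name)
    for path in files:  # 3. Ansible Galaxy manifests, only inside ansible/
        if path.parent.name == "ansible" and path.name in (
            "requirements.yml", "requirements.yaml"
        ):
            add(path, f"ansible/{path.name}")
    tools = repo_root / "sbom-tools.yaml"  # 4. agent-generated manifest at root
    if tools.is_file():
        add(tools, tools.name)
    return found
```

Collecting the file list once and filtering it per mechanism keeps the result in the documented scan order while walking the tree only a single time.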
### T06 — Unit tests for new parsers
```task
id: CUST-WP-0013-T06
state_hub_task_id: fee37e66-8f41-4dba-995b-97fc66493caf
status: done
priority: medium
```
Add test fixtures and unit tests for:
- Ansible Galaxy requirements.yml (collections + roles, version pinned and
unpinned)
- sbom-tools.yaml (valid, missing version, unknown ecosystem)
- Multi-parser scan: repo root with uv.lock + .terraform.lock.hcl +
sbom-tools.yaml produces merged results
---
## Phase 3 — SBOM Capture Agent
**Acceptance:** `make capture-tools REPO=railiance-infra` produces a reviewed
`sbom-tools.yaml` that correctly identifies Ansible, Terraform, Helm, and other
declared tools with versions and SPDX licences.
### T07 — Write SBOM capture agent prompt
```task
id: CUST-WP-0013-T07
state_hub_task_id: a3b919b5-63b0-44f7-a048-ebfae603ef7b
status: done
priority: high
```
Write `state-hub/prompts/sbom-capture-agent.md` — a Claude agent prompt
parameterised with `{repo_slug}` and `{repo_path}`. The prompt instructs the
agent to:
1. Read `CLAUDE.md`, `Makefile`, `README.md`, `pyproject.toml`, `.tool-versions`,
CI configs, Dockerfiles, and provisioning files in `{repo_path}`
2. Identify all system-level tools: name, version (from version pins, Makefile
vars, or documented prerequisites), ecosystem, SPDX licence
3. Identify indirect/transitive tool deps (e.g. Ansible → Python; Terraform →
provider plugins already captured by `.terraform.lock.hcl`)
4. Emit a well-formed `sbom-tools.yaml` with a comment header noting generation
date and confidence level per entry (`# confidence: high/medium/low`)
5. Flag any tools where version could not be determined (`version: unknown`) for
human review
The prompt must not hallucinate versions — it must derive them from evidence in
the repo or mark them unknown.
### T08 — Implement capture_sbom_tools.py
```task
id: CUST-WP-0013-T08
state_hub_task_id: 9593dca7-e713-4d7a-b4f2-c5333ae0b3d2
status: done
priority: high
```
Write `state-hub/scripts/capture_sbom_tools.py`:
- Accepts `--repo SLUG` and `--repo-path PATH`
- Resolves repo path from slug via the state-hub API if `--repo-path` is omitted
- Loads the agent prompt from `prompts/sbom-capture-agent.md`, substitutes
`{repo_slug}` and `{repo_path}`
- Invokes `claude -p "<prompt>"` (non-interactive) and captures stdout
- Parses the YAML block from the response
- Writes or updates `<repo-path>/sbom-tools.yaml`
- Prints a diff of changes if the file already exists
- `--dry-run` flag: print the prompt and diff without writing
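Two of the steps above, prompt substitution and pulling the YAML block out of the agent's reply, can be sketched with the standard library alone. The helper names are hypothetical, and the assumption that the reply wraps the manifest in a Markdown code fence is this sketch's, not something the workplan specifies.

```python
import re

# Built from parts so this example does not contain a literal code fence.
_TICKS = "`" * 3
_FENCED_BLOCK = re.compile(
    _TICKS + r"(?:yaml|yml)?[ \t]*\n(.*?)" + _TICKS, re.DOTALL
)


def render_prompt(template, repo_slug, repo_path):
    """Fill in the two placeholders of the agent prompt."""
    # str.replace instead of str.format, so other braces in the prompt survive.
    return template.replace("{repo_slug}", repo_slug).replace("{repo_path}", repo_path)


def extract_yaml_block(response):
    """Return the first fenced block containing a top-level `tools:` key, or None."""
    for body in _FENCED_BLOCK.findall(response):
        if re.search(r"^tools:", body, re.MULTILINE):
            return body.strip() + "\n"
    return None
```

Returning `None` when no manifest is found lets the caller refuse to overwrite an existing `sbom-tools.yaml` with an empty or malformed reply.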
### T09 — Add make capture-tools target
```task
id: CUST-WP-0013-T09
state_hub_task_id: 6948e1d2-9c97-4709-bdb0-4b6ded700a22
status: done
priority: medium
```
Add to `state-hub/Makefile`:
```makefile
capture-tools: ## Run SBOM capture agent for a repo (REPO=slug, REPO_PATH=path)
uv run python scripts/capture_sbom_tools.py --repo $(REPO) $(if $(REPO_PATH),--repo-path $(REPO_PATH),)
```
Also update `make ingest-sbom` to note that `capture-tools` should be run first
for repos that have system-level tool dependencies.
---
## Phase 4 — Ingest railiance-infra
**Acceptance:** `make ingest-sbom REPO=railiance-infra` shows terraform providers,
ansible collections, and tool manifest entries in one snapshot.
### T10 — Capture tools manifest for railiance-infra
```task
id: CUST-WP-0013-T10
state_hub_task_id: 99b23998-5129-4777-9d42-7bee5981cdbb
status: done
priority: medium
```
Run `make capture-tools REPO=railiance-infra`. Review the generated
`railiance-infra/sbom-tools.yaml` — verify Ansible, Terraform, cloud-init, goss,
and any other tools with their versions and licences. Correct any `unknown`
versions by consulting the repo. Commit the file.
### T11 — Ingest railiance-infra
```task
id: CUST-WP-0013-T11
state_hub_task_id: bb516909-f903-48ce-b60b-a24245e7382e
status: done
priority: medium
```
Run `make ingest-sbom REPO=railiance-infra REPO_PATH=~/railiance-infra`. Verify
the snapshot contains:
- Terraform providers (from `.terraform.lock.hcl`)
- Ansible Galaxy collections (from `ansible/requirements.yaml`)
- System tools (from `sbom-tools.yaml`)
Check the licence report for any copyleft or BSL flags.
---
## Phase 5 — Ingest railiance-cluster
**Acceptance:** railiance-cluster SBOM covers both Python packages (uv.lock) and
system tools in a single snapshot.
### T12 — Capture tools manifest for railiance-cluster
```task
id: CUST-WP-0013-T12
state_hub_task_id: 7a890f1a-da9f-4e6d-86a7-4fd1aefd5b3f
status: done
priority: medium
```
Run `make capture-tools REPO=railiance-cluster`. Review the generated
`railiance-cluster/sbom-tools.yaml` — verify Helm, kubectl, k3s, and any other
operational tools. Commit the file.
### T13 — Re-ingest railiance-cluster
```task
id: CUST-WP-0013-T13
state_hub_task_id: 789dbe93-011a-4470-9fec-ebf249cd7134
status: done
priority: medium
```
Run `make ingest-sbom REPO=railiance-cluster REPO_PATH=~/railiance-cluster`.
Verify the snapshot merges uv.lock (Python packages including ansible-core) and
sbom-tools.yaml entries into one coherent snapshot. Confirm ansible-core GPL-3.0
flag appears in the licence report.
---
## Phase 6 — Convention Documentation
**Acceptance:** A developer reading the SBOM convention doc knows exactly how to
add a new repo to SBOM coverage.
### T14 — Document SBOM capture convention in canon/standards
```task
id: CUST-WP-0013-T14
state_hub_task_id: dc3bb2a3-882e-4dd7-ab7c-8b1e88279a7d
status: done
priority: low
```
Write `canon/standards/sbom-convention_v0.1.md` documenting:
- The four capture mechanisms and when each applies
- The `sbom-tools.yaml` schema (with confidence annotation convention)
- The `make capture-tools` → review → commit → `make ingest-sbom` workflow
- Licence risk thresholds: copyleft = flag for review; BSL = flag for review;
null licence = acceptable for infra tools if well-known open source
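The thresholds in the last bullet can be sketched as a small classifier. The function name, the returned labels, and the prefix matching are assumptions made for illustration. Note that this document writes the Business Source License as `BSL-1.1`; the SPDX identifier for it is `BUSL-1.1`, while SPDX `BSL-1.0` is the Boost licence, so the sketch matches both spellings exactly rather than by prefix.

```python
# Assumption: prefix matching on SPDX identifiers is a simplification.
COPYLEFT_PREFIXES = ("GPL-", "AGPL-", "LGPL-")
# "BSL-1.1" is the spelling used in this document, "BUSL-1.1" the SPDX one.
BUSINESS_SOURCE_IDS = {"BSL-1.1", "BUSL-1.1"}


def licence_flag(license_spdx):
    """Map a licence identifier (or None) to a review label."""
    if license_spdx is None:
        # Acceptable for well-known open-source infra tools, per the convention.
        return "no-licence"
    if license_spdx in BUSINESS_SOURCE_IDS:
        return "review-non-osi"
    if license_spdx.startswith(COPYLEFT_PREFIXES):
        return "review-copyleft"
    return "clean"
```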
---
## Licence Risk Preview
Based on known tool licences, expect these flags once ingested:
| Tool / Package | Licence | Risk level |
|---|---|---|
| ansible-core | GPL-3.0-only | Copyleft — flag (ops toolchain, not shipped) |
| terraform ≥ 1.5.6 | BSL-1.1 | Non-OSI — flag for review |
| hashicorp providers | BSL-1.1 | Same |
| community.general | GPL-3.0 | Copyleft — flag (ops toolchain) |
| Helm | Apache-2.0 | Clean |
| k3s | Apache-2.0 | Clean |
| cloud-init | Apache-2.0 / GPL-3.0 | Mixed — check version |
| goss | Apache-2.0 | Clean |
All copyleft/BSL entries here are **operational toolchain** dependencies, not
shipped code — risk is low but worth tracking for compliance awareness.