feat(sbom): CUST-WP-0013 — expand SBOM infra to terraform, ansible, and tool manifests

- Migration d6e7f8a9b0c1: add terraform, ansible, tool to Ecosystem enum
- ingest_sbom.py: new Ansible Galaxy requirements.yml parser (collections + roles)
- ingest_sbom.py: new sbom-tools.yaml manifest parser (agent-generated tool deps)
- ingest_sbom.py: promote .terraform.lock.hcl parser from ecosystem=other → terraform
- ingest_sbom.py: detect_all() runs all four parsers in one comprehensive scan
- capture_sbom_tools.py: agent-assisted tool manifest generator (claude -p)
- prompts/sbom-capture-agent.md: parameterised prompt for repo tool discovery
- Makefile: capture-tools target; ingest-sbom updated docs and DRY_RUN support
- 29 unit tests covering all new parsers and detect_all() behaviour
- canon/standards/sbom-convention_v0.1.md: updated with four-mechanism model and workflow

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -6,7 +6,7 @@ domain: custodian
 status: active
 version: "0.1"
 created: "2026-03-01"
-updated: "2026-03-01"
+updated: "2026-03-12"
 ---
 
 # SBOM Convention v0.1 — Dependency Tracking & Licence Governance
@@ -27,20 +27,23 @@ dashboard (`/sbom`) provides domain-level and repo-level drill-down.
 
 ---
 
-## 1. Authoritative Lockfiles per Ecosystem
+## 1. Capture Mechanisms
 
-| Ecosystem | Authoritative file | Notes |
-|-----------|-------------------|-------|
-| Python | `uv.lock` | Preferred. `requirements.txt` accepted as fallback |
-| Node / npm | `package-lock.json` | Preferred. `yarn.lock` accepted |
-| Rust | `Cargo.lock` | Auto-detected |
-| Terraform | `.terraform.lock.hcl` | Provider pins; ecosystem stored as `other` until ENUM extended |
-| Go | `go.sum` | *Not yet parsed — planned* |
-| Java / JVM | `gradle.lockfile` / `pom.xml` | *Not yet parsed — planned* |
-| Ansible | `requirements.yml` | *Not yet parsed — planned* |
+`ingest_sbom.py` runs all four mechanisms in a single scan when given `--repo-path`.
+No flags needed — comprehensive detection is the default.
 
-**Principle:** commit lockfiles to the repo. Lockfiles are the SBOM source
-of truth; do not generate them at ingest time.
+| Mechanism | File(s) | Ecosystem | Detection scope |
+|-----------|---------|-----------|-----------------|
+| **Package manager lockfiles** | `uv.lock`, `requirements.txt`, `package-lock.json`, `yarn.lock`, `Cargo.lock` | `python`, `node`, `rust` | Anywhere in tree |
+| **Terraform provider lock** | `.terraform.lock.hcl` | `terraform` | Anywhere in tree |
+| **Ansible Galaxy manifest** | `ansible/requirements.yml` or `.yaml` | `ansible` | Under directories named `ansible/` |
+| **Tool manifest** | `sbom-tools.yaml` (repo root) | `tool`, `ansible`, `terraform`, etc. | Repo root only |
+
+**Go / Java parsers** (`go.sum`, `pom.xml`, `gradle.lockfile`) are *not yet
+implemented* — planned for a future workplan.
+
+**Principle:** commit lockfiles and `sbom-tools.yaml` to the repo. These are
+the SBOM source of truth; do not generate them at ingest time.
 
 ---
 
@@ -64,27 +67,35 @@ curl -s http://127.0.0.1:8000/repos/ | python3 -m json.tool
 
 ## 3. SBOM Ingestion
 
-### 3.1 Standard ingest (single lockfile at repo root)
+### 3.1 Standard ingest (all mechanisms, recommended)
 
 ```bash
 cd ~/the-custodian/state-hub
 make ingest-sbom REPO=<slug> REPO_PATH=/path/to/repo
 ```
 
-The script auto-detects the first recognised lockfile at `REPO_PATH`.
+`ingest_sbom.py` automatically runs all four mechanisms in one scan — lockfiles,
+Terraform provider locks, Ansible Galaxy manifests, and `sbom-tools.yaml`. All
+results are merged into a single snapshot. Non-dep directories (`.venv`,
+`node_modules`, `.git`, `dist`, etc.) are automatically skipped.
 
-### 3.2 Multi-ecosystem repos (recommended for complex repos)
+### 3.2 Repos with system-level tools: capture first, then ingest
 
-Use `SCAN=1` to walk the repo tree and combine **all** lockfiles into a single
-snapshot. Non-dep directories (`.venv`, `node_modules`, `.git`, `dist`, etc.)
-are automatically skipped.
+For repos that use system-level tools not tracked by any lockfile (Terraform
+binary, Helm, kubectl, k3s, goss, etc.):
 
 ```bash
-make ingest-sbom REPO=the-custodian SCAN=1 REPO_PATH=/home/worsch/the-custodian
-```
+# Step 1: generate sbom-tools.yaml via agent
+make capture-tools REPO=<slug> REPO_PATH=/path/to/repo
 
-This is the correct approach for repos that contain both a backend and a
-frontend (e.g., a Python API + Node/Observable dashboard).
+# Step 2: review sbom-tools.yaml — correct any confidence: low entries
+
+# Step 3: commit sbom-tools.yaml
+git -C /path/to/repo add sbom-tools.yaml && git -C /path/to/repo commit -m "chore(sbom): add tool manifest"
+
+# Step 4: ingest everything
+make ingest-sbom REPO=<slug> REPO_PATH=/path/to/repo
+```
 
 ### 3.3 Explicit lockfile path
 
@@ -96,8 +107,7 @@ Multiple lockfiles can be passed by calling the script directly with repeated
 `--lockfile` flags:
 
 ```bash
-cd ~/the-custodian/state-hub
-.venv/bin/python scripts/ingest_sbom.py \
+uv run python scripts/ingest_sbom.py \
   --repo <slug> \
   --lockfile /path/to/uv.lock \
   --lockfile /path/to/package-lock.json
@@ -106,11 +116,40 @@ cd ~/the-custodian/state-hub
 ### 3.4 Dry run (inspect without submitting)
 
 ```bash
-make ingest-sbom REPO=<slug> SCAN=1 REPO_PATH=/path/to/repo
-# append: add --dry-run to the command, or run the script directly:
-.venv/bin/python scripts/ingest_sbom.py --repo <slug> --scan --repo-path /path/to/repo --dry-run
+make ingest-sbom REPO=<slug> REPO_PATH=/path/to/repo DRY_RUN=1
 ```
 
+### 3.5 sbom-tools.yaml: the tool manifest
+
+Create `sbom-tools.yaml` at the repo root for any system-level tools not
+covered by lockfiles. Schema:
+
+```yaml
+# sbom-tools.yaml
+tools:
+  - name: terraform
+    version: "1.9.5" # confidence: medium
+    ecosystem: terraform
+    license_spdx: BSL-1.1
+    is_direct: true
+    is_dev: false
+  - name: helm
+    version: null # confidence: low (no version pin found)
+    ecosystem: tool
+    license_spdx: Apache-2.0
+    is_direct: true
+    is_dev: false
+```
+
+**Valid ecosystem values:** `python`, `node`, `rust`, `go`, `java`, `terraform`,
+`ansible`, `tool`, `other`
+
+Annotate each version with a `# confidence: high/medium/low` comment.
+Entries with `confidence: low` need human verification before committing.
+
+The `make capture-tools` command generates this file automatically using the
+SBOM capture agent prompt (`state-hub/prompts/sbom-capture-agent.md`).
+
 ---
 
 ## 4. Snapshot Semantics
@@ -248,10 +287,14 @@ The SBOM dashboard aggregates across all repos within a domain in the
 
 ## 10. Planned Enhancements
 
-- **Go / Java parsers** — add to `ingest_sbom.py`
+- **Go / Java parsers** — add `go.sum`, `pom.xml`, `gradle.lockfile` support to `ingest_sbom.py`
 - **Versioned snapshots** — retain history per repo for trend analysis
 - **Licence override file** — allow repos to document known-acceptable
   copyleft exceptions (`.sbom-overrides.yaml`)
 - **CI integration** — GitHub Actions step to run ingest on lockfile change
 - **Direct-dep detection for uv.lock** — parse `pyproject.toml` `[project.dependencies]`
   to mark direct deps accurately
+- **Galaxy API licence lookup** — resolve `license_spdx` for Ansible collections
+  via the Galaxy API at ingest time
+- **Tool version pinning guidance** — tooling to detect `confidence: low` entries
+  across all registered repos and flag them for resolution
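The **Tool version pinning guidance** item can start from the raw file text: the `# confidence:` markers are YAML comments, so `yaml.safe_load` discards them and `_parse_sbom_tools_yaml` never sees them. A minimal sketch, assuming the one-entry-per-`- name:` layout shown in section 3.5; `low_confidence_tools` is a hypothetical helper, not part of `ingest_sbom.py`:

```python
from __future__ import annotations

import re

# "# confidence: ..." markers are YAML comments, so scan the raw text rather
# than the output of yaml.safe_load(), which drops comments.
_NAME_RE = re.compile(r"^\s*-\s*name:\s*(\S+)")
_CONF_RE = re.compile(r"#\s*confidence:\s*(high|medium|low)\b")


def low_confidence_tools(manifest_text: str) -> list[str]:
    """Return names of tool entries annotated with '# confidence: low'."""
    flagged: list[str] = []
    current: str | None = None
    for line in manifest_text.splitlines():
        name_match = _NAME_RE.match(line)
        if name_match:
            current = name_match.group(1)  # a "- name:" line opens a new entry
        conf_match = _CONF_RE.search(line)
        if conf_match and conf_match.group(1) == "low" and current is not None:
            if current not in flagged:
                flagged.append(current)
    return flagged


example = """\
tools:
  - name: terraform
    version: "1.9.5" # confidence: medium
  - name: helm
    version: null # confidence: low (no version pin found)
"""
print(low_confidence_tools(example))  # ['helm']
```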
@@ -133,16 +133,26 @@ list-repos:
 	@test -n "$(DOMAIN)" || (echo "ERROR: DOMAIN is required."; exit 1)
 	curl -sf "http://127.0.0.1:8000/repos/?domain=$(DOMAIN)" | python3 -m json.tool
 
-## Ingest SBOM data for a repo.
+## Ingest SBOM data for a repo (all mechanisms: lockfiles + ansible + sbom-tools.yaml).
+## Auto-detect all sources: make ingest-sbom REPO=the-custodian REPO_PATH=/home/worsch/the-custodian
 ## Single lockfile (explicit): make ingest-sbom REPO=the-custodian LOCKFILE=/path/to/uv.lock
 ## Scan all lockfiles in tree: make ingest-sbom REPO=the-custodian SCAN=1 REPO_PATH=/home/worsch/the-custodian
-## Auto-detect at repo root: make ingest-sbom REPO=the-custodian REPO_PATH=/home/worsch/the-custodian
+## Dry-run (no submit): make ingest-sbom REPO=the-custodian REPO_PATH=... DRY_RUN=1
+## Tip: run capture-tools first for repos with system-level tool dependencies.
 ingest-sbom:
 	@test -n "$(REPO)" || (echo "ERROR: REPO is required."; exit 1)
 	uv run python scripts/ingest_sbom.py --repo "$(REPO)" \
 	  $(if $(LOCKFILE),--lockfile "$(LOCKFILE)") \
 	  $(if $(SCAN),--scan) \
-	  $(if $(REPO_PATH),--repo-path "$(REPO_PATH)")
+	  $(if $(REPO_PATH),--repo-path "$(REPO_PATH)") \
+	  $(if $(DRY_RUN),--dry-run)
 
+## Run SBOM capture agent for a repo — generates/updates sbom-tools.yaml.
+## Usage: make capture-tools REPO=railiance-infra [REPO_PATH=/home/worsch/railiance-infra]
+## Add DRY_RUN=1 to preview without writing.
+capture-tools:
+	@test -n "$(REPO)" || (echo "ERROR: REPO is required."; exit 1)
+	uv run python scripts/capture_sbom_tools.py --repo "$(REPO)" \
+	  $(if $(REPO_PATH),--repo-path "$(REPO_PATH)") \
+	  $(if $(DRY_RUN),--dry-run)
+
 ## Check a repo for ADR-001 compliance: make validate-adr REPO=/path/to/repo [DOMAIN=custodian]
 validate-adr:
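Every optional flag in the `ingest-sbom` and `capture-tools` recipes is guarded by `$(if $(VAR),...)`. GNU Make treats the condition as true whenever it expands to a non-empty string after stripping whitespace, so `DRY_RUN=0` still adds `--dry-run`. A small Python model of that expansion, for illustration only; `make_if` and `ingest_args` are hypothetical names that mirror the recipe above:

```python
from __future__ import annotations


def make_if(condition: str | None, then: str) -> str:
    """Model GNU Make's $(if condition,then): true when the stripped condition is non-empty."""
    return then if (condition or "").strip() else ""


def ingest_args(variables: dict[str, str]) -> list[str]:
    """Rebuild the arguments the ingest-sbom recipe passes to ingest_sbom.py."""
    lockfile = variables.get("LOCKFILE")
    repo_path = variables.get("REPO_PATH")
    parts = [
        f'--repo "{variables["REPO"]}"',
        make_if(lockfile, f'--lockfile "{lockfile}"'),
        make_if(variables.get("SCAN"), "--scan"),
        make_if(repo_path, f'--repo-path "{repo_path}"'),
        make_if(variables.get("DRY_RUN"), "--dry-run"),
    ]
    # Empty expansions simply disappear from the command line.
    return [part for part in parts if part]


print(ingest_args({"REPO": "the-custodian", "REPO_PATH": "/path/to/repo", "DRY_RUN": "0"}))
# ['--repo "the-custodian"', '--repo-path "/path/to/repo"', '--dry-run']
```

To switch dry-run off, omit `DRY_RUN` or pass it empty (`DRY_RUN=`); the same applies to `SCAN`.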
@@ -15,6 +15,9 @@ class Ecosystem(str, enum.Enum):
     rust = "rust"
     go = "go"
     java = "java"
+    terraform = "terraform"
+    ansible = "ansible"
+    tool = "tool"
     other = "other"
 
 
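`_parse_sbom_tools_yaml` in `ingest_sbom.py` keeps its own hard-coded `valid_ecosystems` set, which now has to track this enum by hand. A minimal sketch of a parity check a unit test could run; the enum is re-declared here so the block is self-contained, with `python` and `node` assumed from the convention doc's "Valid ecosystem values" list because they sit above this hunk:

```python
import enum


class Ecosystem(str, enum.Enum):
    # python/node assumed from the convention doc; the rest match this hunk.
    python = "python"
    node = "node"
    rust = "rust"
    go = "go"
    java = "java"
    terraform = "terraform"
    ansible = "ansible"
    tool = "tool"
    other = "other"


# Copy of the set hard-coded in _parse_sbom_tools_yaml().
VALID_ECOSYSTEMS = {
    "python", "node", "rust", "go", "java",
    "terraform", "ansible", "tool", "other",
}


def ecosystem_drift() -> tuple[set[str], set[str]]:
    """Return (values only in the enum, values only in the parser's set)."""
    enum_values = {member.value for member in Ecosystem}
    return enum_values - VALID_ECOSYSTEMS, VALID_ECOSYSTEMS - enum_values


print(ecosystem_drift())  # (set(), set())
```

Deriving the parser's set from the enum instead would remove the duplication, at the cost of importing the model module into the ingest script.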
@@ -0,0 +1,30 @@
"""SBOM ecosystem enum expansion: add terraform, ansible, tool

Revision ID: d6e7f8a9b0c1
Revises: c5d6e7f8a9b0
Create Date: 2026-03-12 00:00:00.000000
"""
from typing import Sequence, Union

from alembic import op

revision: str = "d6e7f8a9b0c1"
down_revision: Union[str, None] = "c5d6e7f8a9b0"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None


def upgrade() -> None:
    # PostgreSQL requires each ADD VALUE in its own statement and cannot be
    # run inside a transaction that also modifies data. ADD VALUE is
    # transactional in PG 12+ (no COMMIT needed).
    op.execute("ALTER TYPE ecosystem ADD VALUE IF NOT EXISTS 'terraform'")
    op.execute("ALTER TYPE ecosystem ADD VALUE IF NOT EXISTS 'ansible'")
    op.execute("ALTER TYPE ecosystem ADD VALUE IF NOT EXISTS 'tool'")


def downgrade() -> None:
    # PostgreSQL does not support removing enum values without recreating the
    # type. Document the limitation and do nothing — reverting this migration
    # requires a full type recreation if needed.
    pass
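PostgreSQL's `ALTER TYPE ... ADD VALUE` adds one label per statement, hence the three separate `op.execute` calls above. From PostgreSQL 12 onward the statement may run inside a transaction block, but the new label cannot be used until that transaction commits; earlier versions reject it inside a transaction block altogether. If a later step in the same transaction needs to write `terraform` rows, Alembic's `op.get_context().autocommit_block()` is one documented way to commit the enum change on its own. A minimal sketch that generates the per-label statements; `add_enum_value_statements` is a hypothetical helper:

```python
def add_enum_value_statements(type_name: str, labels: list[str]) -> list[str]:
    """Build one ALTER TYPE ... ADD VALUE statement per label (one label per statement)."""
    statements = []
    for label in labels:
        # SQL string literals escape a single quote by doubling it.
        literal = label.replace("'", "''")
        statements.append(f"ALTER TYPE {type_name} ADD VALUE IF NOT EXISTS '{literal}'")
    return statements


for stmt in add_enum_value_statements("ecosystem", ["terraform", "ansible", "tool"]):
    print(stmt)
```

For these three labels the output is exactly the three statements in `upgrade()`.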
90
state-hub/prompts/sbom-capture-agent.md
Normal file
@@ -0,0 +1,90 @@
# SBOM Capture Agent Prompt

**Task:** Generate or update `sbom-tools.yaml` for the repository at `{repo_path}` (slug: `{repo_slug}`).

This file captures system-level tool dependencies that are not tracked by any package manager lockfile — tools that are installed via provisioning, Homebrew, system packages, or assumed present in the environment.

---

## Instructions

1. **Read the following files** in `{repo_path}` (read each that exists; skip gracefully if absent):
   - `CLAUDE.md` — look for stack declarations, tool prerequisites, dev commands
   - `README.md` / `QUICKSTART.md` — prerequisites sections, tool version requirements
   - `Makefile` — tool invocations, version variables (e.g. `ANSIBLE_VERSION := 12.3`)
   - `pyproject.toml` — Python tool dependencies (already covered by uv.lock; note but don't duplicate)
   - `.tool-versions` — asdf version pins
   - `.terraform-version` — tfenv pin
   - `.ansible-version` — if present
   - `Dockerfile` / `docker-compose.yml` — base image versions, tool installs
   - `.github/workflows/*.yml` / `.gitlab-ci.yml` — CI tool install steps, version pins
   - `ansible/requirements.yml` — **already captured by lockfile parser; do NOT include Galaxy collections here**
   - Any `scripts/setup*.sh`, `scripts/bootstrap*.sh`, or `tools/` directory

2. **Identify system-level tools only** — tools that:
   - Are invoked as CLI commands (e.g. `ansible-playbook`, `terraform`, `helm`, `kubectl`, `k3s`, `goss`, `age`, `sops`)
   - Are NOT installed via `uv`/`pip`/`npm`/`cargo` into a project virtualenv (those are in lockfiles)
   - Note: `ansible` itself as a CLI tool is a system dep even if `ansible-core` appears in `uv.lock`

3. **For each tool, determine**:
   - `name`: canonical tool name (e.g. `ansible`, `terraform`, `helm`, `kubectl`, `k3s`, `goss`, `age`, `sops`, `cloud-init`)
   - `version`: the pinned or documented version. Use `unknown` only if no evidence found anywhere.
   - `ecosystem`: one of `python`, `node`, `rust`, `go`, `java`, `terraform`, `ansible`, `tool`, `other`
     - Use `ansible` for Ansible itself; `terraform` for Terraform itself; `tool` for generic CLI tools
   - `license_spdx`: the SPDX identifier. Common known licences (use these exact strings):
     - ansible / ansible-core: `GPL-3.0-only`
     - terraform ≤ 1.5.5: `MPL-2.0`; terraform ≥ 1.5.6: `BSL-1.1`
     - helm: `Apache-2.0`
     - kubectl: `Apache-2.0`
     - k3s: `Apache-2.0`
     - goss: `Apache-2.0`
     - age: `BSD-3-Clause`
     - sops: `MPL-2.0`
     - cloud-init: `Apache-2.0` (or `GPL-3.0-only` for older versions — check)
     - docker: `Apache-2.0`
     - If unknown, use `null`
   - `is_direct`: `true` if this repo directly declares/uses it; `false` if it's a transitive dependency of another tool
   - `is_dev`: `true` only if the tool is only used for development/testing, not production operation

4. **Confidence annotation**: Add a `# confidence: high/medium/low` comment after each entry:
   - `high`: version found explicitly pinned in a file
   - `medium`: version inferred from context (e.g. "Ansible 12" in README)
   - `low`: version not found; using `unknown` or a reasonable guess

5. **Do NOT include**:
   - Python packages already covered by `uv.lock` or `requirements.txt`
   - Ansible Galaxy collections (covered by `ansible/requirements.yml`)
   - Terraform providers (covered by `.terraform.lock.hcl`)
   - Node packages, Rust crates, etc. (covered by their lockfiles)
   - Operating system packages unless the repo explicitly declares them

6. **Output format**: Emit ONLY the YAML block below — no prose, no markdown fences, no explanation. The output must be valid YAML that can be written directly to `sbom-tools.yaml`.

---

## Output format

```yaml
# sbom-tools.yaml — system-level tool dependencies for {repo_slug}
# Generated by sbom-capture-agent on {date}
# Review each entry before committing. Entries with confidence: low need human verification.
tools:
  - name: example-tool
    version: "1.2.3" # confidence: high
    ecosystem: tool
    license_spdx: Apache-2.0
    is_direct: true
    is_dev: false
```

If no system-level tools are found, output:
```yaml
# sbom-tools.yaml — system-level tool dependencies for {repo_slug}
# Generated by sbom-capture-agent on {date}
# No system-level tools identified — all dependencies are covered by lockfiles.
tools: []
```

---

Now read `{repo_path}` and produce the `sbom-tools.yaml` content.
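`load_prompt()` in `capture_sbom_tools.py` fills `{repo_slug}`, `{repo_path}` and `{date}` with chained `str.replace` calls rather than `str.format`. A sketch of why that choice holds up if this prompt ever gains literal braces (for example a JSON snippet); the template text, slug and path below are invented for illustration:

```python
import datetime

TEMPLATE = (
    "Generate sbom-tools.yaml for {repo_path} (slug: {repo_slug}) on {date}.\n"
    'Example API payload: {"tools": []}\n'
)


def render(template: str, repo_slug: str, repo_path: str, today: datetime.date) -> str:
    # Same approach as load_prompt(): replace only the three known placeholders
    # and leave every other brace pair untouched.
    return (
        template
        .replace("{repo_slug}", repo_slug)
        .replace("{repo_path}", repo_path)
        .replace("{date}", today.isoformat())
    )


print(render(TEMPLATE, "railiance-infra", "/srv/railiance-infra", datetime.date(2026, 3, 12)))

try:
    TEMPLATE.format(repo_slug="x", repo_path="y", date="z")
except KeyError as exc:
    # str.format reads {"tools": []} as a replacement field named '"tools"'.
    print("str.format failed:", exc)
```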
187
state-hub/scripts/capture_sbom_tools.py
Normal file
@@ -0,0 +1,187 @@
#!/usr/bin/env python3
"""Invoke the SBOM capture agent to generate/update sbom-tools.yaml for a repo.

Usage:
    python capture_sbom_tools.py --repo <slug> [--repo-path <path>] [--dry-run]

The script:
  1. Resolves repo path from the state-hub API (if --repo-path is not given)
  2. Loads the agent prompt from prompts/sbom-capture-agent.md
  3. Substitutes {repo_slug}, {repo_path}, {date} placeholders
  4. Invokes `claude -p "<prompt>"` non-interactively
  5. Extracts the YAML block from the response
  6. Writes (or shows diff of) sbom-tools.yaml in the repo root

Requirements:
  - `claude` CLI must be on PATH (Claude Code)
  - PyYAML must be available in the active venv
"""
from __future__ import annotations

import argparse
import datetime
import difflib
import json
import os
import re
import subprocess
import sys
import urllib.error
import urllib.request
from pathlib import Path

API_BASE = os.environ.get("API_BASE", "http://127.0.0.1:8000").rstrip("/")
SCRIPT_DIR = Path(__file__).parent
PROMPT_FILE = SCRIPT_DIR.parent / "prompts" / "sbom-capture-agent.md"


def resolve_repo_path(repo_slug: str) -> Path | None:
    """Look up the registered path for a repo slug via the state-hub API."""
    url = f"{API_BASE}/repos/{repo_slug}/"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            data = json.loads(resp.read())
        path_str = data.get("local_path")
        if path_str:
            return Path(path_str)
    except (urllib.error.URLError, KeyError):
        pass
    return None


def load_prompt(repo_slug: str, repo_path: Path) -> str:
    if not PROMPT_FILE.exists():
        print(f"Error: prompt file not found at {PROMPT_FILE}", file=sys.stderr)
        sys.exit(1)
    template = PROMPT_FILE.read_text()
    today = datetime.date.today().isoformat()
    return (
        template
        .replace("{repo_slug}", repo_slug)
        .replace("{repo_path}", str(repo_path))
        .replace("{date}", today)
    )


def invoke_agent(prompt: str) -> str:
    """Run `claude -p <prompt>` and return stdout."""
    try:
        result = subprocess.run(
            ["claude", "-p", prompt],
            capture_output=True,
            text=True,
            timeout=120,
        )
    except FileNotFoundError:
        print("Error: `claude` CLI not found on PATH. Install Claude Code.", file=sys.stderr)
        sys.exit(1)
    except subprocess.TimeoutExpired:
        print("Error: claude invocation timed out after 120s.", file=sys.stderr)
        sys.exit(1)

    if result.returncode != 0:
        print(f"Error: claude exited with code {result.returncode}", file=sys.stderr)
        if result.stderr:
            print(result.stderr, file=sys.stderr)
        sys.exit(1)

    return result.stdout


def extract_yaml(response: str) -> str:
    """Extract YAML content from the agent response.

    Accepts:
      - Raw YAML (starts with # or 'tools:')
      - YAML wrapped in ```yaml ... ``` fences
    """
    # Try fenced block first
    m = re.search(r"```(?:yaml)?\s*\n(.*?)```", response, re.DOTALL)
    if m:
        return m.group(1).strip()

    # Otherwise treat entire response as YAML
    stripped = response.strip()
    if stripped.startswith("#") or stripped.startswith("tools:"):
        return stripped

    print("Warning: could not extract YAML from agent response.", file=sys.stderr)
    print("Raw response:", file=sys.stderr)
    print(response[:500], file=sys.stderr)
    sys.exit(1)


def show_diff(old: str | None, new: str, target: Path) -> None:
    if old is None:
        print(f"[new file] {target}")
        for line in new.splitlines():
            print(f" + {line}")
    else:
        diff = list(difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile=f"a/{target.name}",
            tofile=f"b/{target.name}",
        ))
        if diff:
            print("".join(diff))
        else:
            print(f"[no changes] {target}")


def main() -> None:
    parser = argparse.ArgumentParser(
        description="Generate/update sbom-tools.yaml for a repo using the SBOM capture agent."
    )
    parser.add_argument("--repo", required=True, help="Repo slug (e.g. 'railiance-infra')")
    parser.add_argument("--repo-path", help="Path to repo root (auto-resolved from state-hub if omitted)")
    parser.add_argument("--dry-run", action="store_true",
                        help="Show prompt and diff without writing sbom-tools.yaml")
    parser.add_argument("--print-prompt", action="store_true",
                        help="Print the rendered prompt and exit (useful for inspection)")
    args = parser.parse_args()

    # Resolve repo path
    if args.repo_path:
        repo_path = Path(args.repo_path).resolve()
    else:
        repo_path = resolve_repo_path(args.repo)
        if repo_path is None:
            # Fall back to ~/repo_slug convention
            repo_path = Path.home() / args.repo
            print(f"Could not resolve path from API; trying {repo_path}", file=sys.stderr)

    if not repo_path.exists():
        print(f"Error: repo path does not exist: {repo_path}", file=sys.stderr)
        sys.exit(1)

    target = repo_path / "sbom-tools.yaml"
    existing_content = target.read_text() if target.exists() else None

    prompt = load_prompt(args.repo, repo_path)

    if args.print_prompt:
        print(prompt)
        return

    print(f"Running SBOM capture agent for {args.repo} ({repo_path})…")
    response = invoke_agent(prompt)
    yaml_content = extract_yaml(response)

    # Ensure trailing newline
    if not yaml_content.endswith("\n"):
        yaml_content += "\n"

    show_diff(existing_content, yaml_content, target)

    if args.dry_run:
        print("\n[dry-run] sbom-tools.yaml not written.")
        return

    target.write_text(yaml_content)
    print(f"\nWritten: {target}")
    print("Review the file, correct any 'confidence: low' entries, then commit.")


if __name__ == "__main__":
    main()
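`extract_yaml()` only recognises a fence labelled exactly `yaml` (or an unlabelled one); a reply fenced with a `yml` label matches neither the fence pattern nor the raw-YAML check, and the script exits. A minimal sketch of a slightly more tolerant extractor, assuming any alphanumeric info string on the opening fence is acceptable; `extract_manifest` is hypothetical and returns `None` instead of calling `sys.exit`:

```python
from __future__ import annotations

import re

# Opening fence with an optional info string (yaml, yml, ...), then a lazy body.
_FENCE_RE = re.compile(r"```[A-Za-z0-9_-]*[ \t]*\n(.*?)```", re.DOTALL)


def extract_manifest(response: str) -> str | None:
    """Return the YAML manifest from an agent reply, or None if none is found."""
    m = _FENCE_RE.search(response)
    if m:
        return m.group(1).strip()
    stripped = response.strip()
    # Same raw-YAML heuristic as extract_yaml(): a comment header or "tools:".
    if stripped.startswith(("#", "tools:")):
        return stripped
    return None


print(extract_manifest("Here it is:\n```yml\ntools: []\n```\n"))  # tools: []
```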
@@ -1,15 +1,19 @@
 #!/usr/bin/env python3
-"""Ingest a repo's lockfile into the State Hub SBOM store.
+"""Ingest a repo's lockfiles and tool manifests into the State Hub SBOM store.
 
 Usage:
-    python ingest_sbom.py --repo <slug> [--lockfile <path>] [--api-base <url>]
+    python ingest_sbom.py --repo <slug> [--repo-path <path>] [--dry-run]
 
-Auto-detects lockfile type:
-    uv.lock           → Python ecosystem
-    requirements.txt  → Python ecosystem (basic)
-    package-lock.json → Node ecosystem
-    yarn.lock         → Node ecosystem
-    Cargo.lock        → Rust ecosystem
+Auto-detects all of the following in one scan:
+    uv.lock                    → python
+    requirements.txt           → python
+    package-lock.json          → node
+    yarn.lock                  → node
+    Cargo.lock                 → rust
+    .terraform.lock.hcl        → terraform (anywhere in tree)
+    ansible/requirements.yml   → ansible (anywhere under ansible/ dirs)
+    ansible/requirements.yaml  → ansible
+    sbom-tools.yaml            → tool (repo root; agent-generated)
 """
 from __future__ import annotations
 
@@ -22,11 +26,17 @@ import urllib.error
 import urllib.request
 from pathlib import Path
 
+try:
+    import yaml  # optional; only needed for sbom-tools.yaml and ansible parsers
+    _YAML_AVAILABLE = True
+except ImportError:
+    _YAML_AVAILABLE = False
+
 API_BASE = os.environ.get("API_BASE", "http://127.0.0.1:8000").rstrip("/")
 
 
 # ---------------------------------------------------------------------------
-# Lockfile parsers
+# Lockfile parsers — each returns list[dict]
 # ---------------------------------------------------------------------------
 
 def _parse_uv_lock(path: Path) -> list[dict]:
@@ -55,7 +65,7 @@ def _parse_uv_lock(path: Path) -> list[dict]:
             "package_version": e.get("package_version"),
             "ecosystem": "python",
             "license_spdx": None,
-            "is_direct": False, # uv.lock doesn't distinguish; treat all as transitive
+            "is_direct": False,
             "is_dev": False,
         }
         for e in entries
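This hunk drops the comment explaining that `uv.lock` does not distinguish direct from transitive packages, so every entry is still ingested with `is_direct: False`; the convention doc lists `pyproject.toml` `[project.dependencies]` as the planned fix. A minimal sketch of that idea, assuming PEP 508 requirement strings and PEP 503 name normalisation; `direct_names`, `load_project_dependencies` and `mark_direct` are hypothetical helpers, and `tomllib` needs Python 3.11+:

```python
from __future__ import annotations

import re

# A PEP 508 requirement starts with the project name; extras, version
# specifiers and environment markers follow it.
_NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalise(name: str) -> str:
    """PEP 503 normalisation: lowercase, runs of '-', '_' or '.' become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def direct_names(requirements: list[str]) -> set[str]:
    """Extract normalised project names from PEP 508 requirement strings."""
    names = set()
    for req in requirements:
        m = _NAME_RE.match(req)
        if m:
            names.add(normalise(m.group(1)))
    return names


def load_project_dependencies(pyproject_text: str) -> list[str]:
    """Read [project.dependencies] from pyproject.toml text."""
    import tomllib  # standard library from Python 3.11; imported lazily
    return tomllib.loads(pyproject_text).get("project", {}).get("dependencies", [])


def mark_direct(entries: list[dict], direct: set[str]) -> list[dict]:
    """Set is_direct on parsed uv.lock entries whose name is a declared dependency."""
    for entry in entries:
        entry["is_direct"] = normalise(entry["package_name"]) in direct
    return entries


deps = ["Requests[socks]>=2.31", "typing_extensions; python_version < '3.11'"]
print(sorted(direct_names(deps)))  # ['requests', 'typing-extensions']
```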
@@ -70,7 +80,6 @@ def _parse_requirements_txt(path: Path) -> list[dict]:
         line = line.strip()
         if not line or line.startswith("#") or line.startswith("-"):
             continue
-        # Handle: pkg==1.2.3, pkg>=1.2, pkg
         m = re.match(r"^([A-Za-z0-9_.\-]+)(?:[>=<!~^]+([^\s;]+))?", line)
         if m:
             entries.append({
@@ -95,7 +104,7 @@ def _parse_package_lock_json(path: Path) -> list[dict]:
     packages = data.get("packages", {})
     entries = []
     for pkg_path, info in packages.items():
-        if not pkg_path: # root package
+        if not pkg_path:
             continue
         name = info.get("name") or pkg_path.split("node_modules/")[-1]
         entries.append({
@@ -120,8 +129,6 @@ def _parse_yarn_lock(path: Path) -> list[dict]:
         if not stripped or stripped.startswith("#"):
             continue
         if not line.startswith(" ") and stripped.endswith(":"):
-            # New package block header: "name@version::" or "\"name@version\":"
-            # May list multiple versions: "name@^1.0, name@~1.0:"
             current_names = []
             current_version = None
             for part in stripped.rstrip(":").split(","):
@@ -188,12 +195,10 @@ def _parse_terraform_lock_hcl(path: Path) -> list[dict]:
 
     for line in path.read_text().splitlines():
         stripped = line.strip()
-        # e.g.: provider "registry.terraform.io/hetznercloud/hcloud" {
         m = re.match(r'^provider\s+"([^"]+)"\s*\{', stripped)
         if m:
-            # Use full provider address as package_name, short name as display
             full = m.group(1)
-            current_name = full # e.g. "registry.terraform.io/hetznercloud/hcloud"
+            current_name = full
             current_version = None
         elif current_name is not None:
             vm = re.match(r'version\s*=\s*"([^"]+)"', stripped)
@@ -203,7 +208,7 @@ def _parse_terraform_lock_hcl(path: Path) -> list[dict]:
                 entries.append({
                     "package_name": current_name,
                     "package_version": current_version,
-                    "ecosystem": "other", # "terraform" not yet in ENUM; tracked as other
+                    "ecosystem": "terraform",
                     "license_spdx": None,
                     "is_direct": True,
                     "is_dev": False,
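With this change Terraform providers land in the new `terraform` ecosystem under their full registry address. A standalone sketch of the same line-oriented matching, run against a hand-written `.terraform.lock.hcl` fragment; the provider version and hash are illustrative, and resetting the current provider once its version is found is an assumption, because the rest of the loop is not shown in the diff:

```python
from __future__ import annotations

import re

SAMPLE_LOCK = '''\
# This file is maintained automatically by "terraform init".
provider "registry.terraform.io/hetznercloud/hcloud" {
  version     = "1.45.0"
  constraints = "~> 1.45"
  hashes = [
    "h1:example",
  ]
}
'''


def parse_providers(text: str) -> list[dict]:
    entries: list[dict] = []
    current_name: str | None = None
    for line in text.splitlines():
        stripped = line.strip()
        # Same two patterns as _parse_terraform_lock_hcl above.
        m = re.match(r'^provider\s+"([^"]+)"\s*\{', stripped)
        if m:
            current_name = m.group(1)
        elif current_name is not None:
            vm = re.match(r'version\s*=\s*"([^"]+)"', stripped)
            if vm:
                entries.append({
                    "package_name": current_name,
                    "package_version": vm.group(1),
                    "ecosystem": "terraform",
                    "license_spdx": None,
                    "is_direct": True,
                    "is_dev": False,
                })
                current_name = None  # assumption: one version line per provider block
    return entries


print(parse_providers(SAMPLE_LOCK)[0]["package_version"])  # 1.45.0
```

Because `re.match` anchors at the start of the stripped line, the `constraints = ...` line is never mistaken for the pinned version.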
@@ -214,7 +219,114 @@ def _parse_terraform_lock_hcl(path: Path) -> list[dict]:
     return entries
 
 
-_LOCKFILE_PARSERS = {
+def _parse_ansible_requirements(path: Path) -> list[dict]:
+    """Parse ansible/requirements.yml — collections and roles from Ansible Galaxy."""
+    if not _YAML_AVAILABLE:
+        print(f"Warning: PyYAML not available; skipping {path}", file=sys.stderr)
+        return []
+
+    try:
+        data = yaml.safe_load(path.read_text())
+    except yaml.YAMLError as e:
+        print(f"Warning: cannot parse {path}: {e}", file=sys.stderr)
+        return []
+
+    if not isinstance(data, dict):
+        return []
+
+    entries = []
+
+    for item in data.get("collections", []) or []:
+        if isinstance(item, str):
+            name, version = item, None
+        elif isinstance(item, dict):
+            name = item.get("name", "")
+            version = str(item.get("version", "")) if item.get("version") else None
+        else:
+            continue
+        if name:
+            entries.append({
+                "package_name": name,
+                "package_version": version,
+                "ecosystem": "ansible",
+                "license_spdx": None,
+                "is_direct": True,
+                "is_dev": False,
+            })
+
+    for item in data.get("roles", []) or []:
+        if isinstance(item, str):
+            name, version = item, None
+        elif isinstance(item, dict):
+            name = item.get("name", item.get("src", ""))
+            version = str(item.get("version", "")) if item.get("version") else None
+        else:
+            continue
+        if name:
+            entries.append({
+                "package_name": name,
+                "package_version": version,
+                "ecosystem": "ansible",
+                "license_spdx": None,
+                "is_direct": True,
+                "is_dev": False,
+            })
+
+    return entries
+
+
+def _parse_sbom_tools_yaml(path: Path) -> list[dict]:
+    """Parse sbom-tools.yaml — agent-generated tool manifest at repo root."""
+    if not _YAML_AVAILABLE:
+        print(f"Warning: PyYAML not available; skipping {path}", file=sys.stderr)
+        return []
+
+    try:
+        data = yaml.safe_load(path.read_text())
+    except yaml.YAMLError as e:
+        print(f"Warning: cannot parse {path}: {e}", file=sys.stderr)
+        return []
+
+    if not isinstance(data, dict):
+        return []
+
+    entries = []
+    valid_ecosystems = {
+        "python", "node", "rust", "go", "java",
+        "terraform", "ansible", "tool", "other",
+    }
+
+    for item in data.get("tools", []) or []:
+        if not isinstance(item, dict):
+            continue
+        name = item.get("name", "")
+        version = str(item.get("version", "")) if item.get("version") else None
+        if version == "unknown":
+            print(f"  Warning: tool '{name}' has version=unknown — flagged for review", file=sys.stderr)
+            version = None
+        ecosystem = item.get("ecosystem", "tool")
+        if ecosystem not in valid_ecosystems:
+            print(f"  Warning: unknown ecosystem '{ecosystem}' for '{name}'; using 'tool'", file=sys.stderr)
+            ecosystem = "tool"
+        license_spdx = item.get("license_spdx") or None
+        entries.append({
+            "package_name": name,
+            "package_version": version,
+            "ecosystem": ecosystem,
+            "license_spdx": license_spdx,
+            "is_direct": bool(item.get("is_direct", True)),
+            "is_dev": bool(item.get("is_dev", False)),
+        })
+
+    return entries
+
+
+# ---------------------------------------------------------------------------
+# Detection helpers
+# ---------------------------------------------------------------------------
+
+# Filename → parser for standard lockfiles (detected by filename anywhere in tree)
+_LOCKFILE_PARSERS: dict[str, object] = {
     "uv.lock": _parse_uv_lock,
     "requirements.txt": _parse_requirements_txt,
     "package-lock.json": _parse_package_lock_json,
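`_parse_ansible_requirements` stringifies whatever `yaml.safe_load` returns for `version`, which makes unquoted two-part versions a trap: YAML reads `version: 1.10` as the float `1.1`, so the snapshot records `1.1`. A sketch of the collections branch applied to already-loaded data (the dict literal stands in for `yaml.safe_load` output, the collection names and versions are illustrative, and `galaxy_entries` is a hypothetical helper), showing why versions in `requirements.yml` are best quoted:

```python
from __future__ import annotations


def galaxy_entries(data: dict) -> list[dict]:
    """Mirror the 'collections' branch of _parse_ansible_requirements on loaded YAML data."""
    entries = []
    for item in data.get("collections", []) or []:
        if isinstance(item, str):
            name, version = item, None
        elif isinstance(item, dict):
            name = item.get("name", "")
            # Same conversion as the parser: str() of whatever YAML produced.
            version = str(item.get("version", "")) if item.get("version") else None
        else:
            continue
        if name:
            entries.append({"package_name": name, "package_version": version})
    return entries


# What yaml.safe_load yields for:
#   collections:
#     - name: community.general
#       version: 1.10        # unquoted: loaded as the float 1.1
#     - name: ansible.posix
#       version: "1.10"      # quoted: stays the string "1.10"
#     - kubernetes.core      # bare string, no version
loaded = {
    "collections": [
        {"name": "community.general", "version": 1.10},
        {"name": "ansible.posix", "version": "1.10"},
        "kubernetes.core",
    ]
}
for entry in galaxy_entries(loaded):
    print(entry)
```

Three-part versions such as `9.0.0` are not valid floats and load as strings, so the problem only bites two-part pins.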
@@ -234,6 +346,47 @@ _SKIP_DIRS = {
 }
 
 
+def detect_all(repo_path: Path) -> list[tuple[Path, str, object]]:
+    """Scan repo_path and return all discovered dependency sources.
+
+    Returns list of (path, label, parser_fn) tuples covering:
+      - Standard lockfiles (anywhere in tree)
+      - Ansible requirements files (in ansible/ subdirs)
+      - sbom-tools.yaml at repo root
+    """
+    found: list[tuple[Path, str, object]] = []
+    seen_paths: set[Path] = set()
+
+    # Walk tree for all source types
+    for dirpath, dirnames, filenames in os.walk(repo_path):
+        dirnames[:] = sorted(d for d in dirnames if d not in _SKIP_DIRS)
+        dirpath_p = Path(dirpath)
+
+        # Standard lockfiles
+        for fname, parser in _LOCKFILE_PARSERS.items():
+            if fname in filenames:
+                p = dirpath_p / fname
+                if p not in seen_paths:
+                    found.append((p, fname, parser))
+                    seen_paths.add(p)
+
+        # Ansible requirements files — only under directories named "ansible"
+        if dirpath_p.name == "ansible":
+            for fname in ("requirements.yml", "requirements.yaml"):
+                if fname in filenames:
+                    p = dirpath_p / fname
+                    if p not in seen_paths:
+                        found.append((p, f"ansible/{fname}", _parse_ansible_requirements))
+                        seen_paths.add(p)
+
+    # sbom-tools.yaml at repo root only
+    tools_manifest = repo_path / "sbom-tools.yaml"
+    if tools_manifest.exists() and tools_manifest not in seen_paths:
+        found.append((tools_manifest, "sbom-tools.yaml", _parse_sbom_tools_yaml))
+
+    return found
+
+
 def detect_lockfile(repo_path: Path) -> tuple[Path, str] | None:
     """Return (lockfile_path, filename) for the first recognised lockfile at repo root."""
     for name in _LOCKFILE_PARSERS:
@@ -244,7 +397,10 @@ def detect_lockfile(repo_path: Path) -> tuple[Path, str] | None:


def detect_lockfiles_recursive(repo_path: Path) -> list[Path]:
    """Walk repo_path and return all recognised lockfiles, skipping non-dep dirs."""
    """Walk repo_path and return all recognised lockfiles, skipping non-dep dirs.

    Kept for backwards compatibility; prefer detect_all() for new code.
    """
    found: list[Path] = []
    for dirpath, dirnames, filenames in os.walk(repo_path):
        dirnames[:] = sorted(d for d in dirnames if d not in _SKIP_DIRS)
@@ -292,52 +448,47 @@ def post_ingest(api_base: str, repo_slug: str, entries: list[dict]) -> dict:
# ---------------------------------------------------------------------------

def main() -> None:
    parser = argparse.ArgumentParser(description="Ingest a repo's lockfiles into the State Hub SBOM store.")
    parser = argparse.ArgumentParser(
        description="Ingest a repo's lockfiles and tool manifests into the State Hub SBOM store."
    )
    parser.add_argument("--repo", required=True, help="Managed-repo slug (e.g. 'the-custodian')")
    parser.add_argument("--lockfile", action="append", dest="lockfiles",
                        metavar="PATH", help="Path to a specific lockfile (repeatable)")
    parser.add_argument("--repo-path", default=".", help="Repo root for auto-detection/scan (default: cwd)")
    parser.add_argument("--scan", action="store_true",
                        help="Recursively find ALL lockfiles under --repo-path (handles multi-ecosystem repos)")
                        help="Recursively find ALL lockfiles under --repo-path (deprecated; now default behaviour)")
    parser.add_argument("--api-base", default=API_BASE, help="State Hub API base URL")
    parser.add_argument("--dry-run", action="store_true", help="Parse only — do not submit")
    args = parser.parse_args()

    repo_root = Path(args.repo_path).resolve()
    lockfile_paths: list[Path] = []
    all_entries: list[dict] = []

    if args.lockfiles:
        lockfile_paths = [Path(lf).resolve() for lf in args.lockfiles]
    elif args.scan:
        lockfile_paths = detect_lockfiles_recursive(repo_root)
        if not lockfile_paths:
            print(f"No lockfiles found under '{repo_root}'.", file=sys.stderr)
            sys.exit(1)
        print(f"Scan found {len(lockfile_paths)} lockfile(s):")
        for lf in lockfile_paths:
            print(f" {lf.relative_to(repo_root) if lf.is_relative_to(repo_root) else lf}")
        # Explicit paths: parse each, detect parser by filename
        for lf_str in args.lockfiles:
            lf = Path(lf_str).resolve()
            parsed = parse_lockfile(lf)
            rel = lf.relative_to(repo_root) if lf.is_relative_to(repo_root) else lf
            print(f" {rel}: {len(parsed)} packages")
            all_entries.extend(parsed)
    else:
        found = detect_lockfile(repo_root)
        if not found:
        # Comprehensive auto-detection: all mechanisms in one scan
        sources = detect_all(repo_root)
        if not sources:
            print(
                f"No recognised lockfile found in '{repo_root}'. "
                f"Supported: {', '.join(_LOCKFILE_PARSERS)}. "
                "Use --scan to search subdirectories.",
                f"No recognised dependency sources found in '{repo_root}'.",
                file=sys.stderr,
            )
            sys.exit(1)
        lockfile_path, _ = found
        print(f"Auto-detected: {lockfile_path}")
        lockfile_paths = [lockfile_path]

    all_entries: list[dict] = []
    for lf in lockfile_paths:
        parsed = parse_lockfile(lf)
        rel = lf.relative_to(repo_root) if lf.is_relative_to(repo_root) else lf
        print(f" {rel}: {len(parsed)} packages")
        all_entries.extend(parsed)
        for src_path, label, parser_fn in sources:
            parsed = parser_fn(src_path)
            rel = src_path.relative_to(repo_root) if src_path.is_relative_to(repo_root) else src_path
            print(f" {label} ({rel}): {len(parsed)} entries")
            all_entries.extend(parsed)

    print(f"Total: {len(all_entries)} packages across {len(lockfile_paths)} lockfile(s)")
    print(f"Total: {len(all_entries)} entries")

    if args.dry_run:
        print(json.dumps(all_entries[:5], indent=2))

397
state-hub/tests/test_ingest_sbom.py
Normal file
@@ -0,0 +1,397 @@
"""Unit tests for ingest_sbom.py parsers and auto-detection."""
from __future__ import annotations

import json
import sys
import textwrap
from pathlib import Path

import pytest

# Make scripts/ importable
sys.path.insert(0, str(Path(__file__).parent.parent / "scripts"))
import ingest_sbom as sb


# ---------------------------------------------------------------------------
# Terraform parser
# ---------------------------------------------------------------------------

TERRAFORM_LOCK = textwrap.dedent("""\
    provider "registry.terraform.io/hashicorp/template" {
      version = "2.2.0"
      constraints = ">= 2.0.0"
      hashes = [
        "h1:abc123",
      ]
    }

    provider "registry.terraform.io/hetznercloud/hcloud" {
      version = "1.52.0"
      constraints = ">= 1.40.0"
    }
""")


def test_terraform_parser_ecosystem(tmp_path):
    lock = tmp_path / ".terraform.lock.hcl"
    lock.write_text(TERRAFORM_LOCK)
    entries = sb._parse_terraform_lock_hcl(lock)
    assert len(entries) == 2
    for e in entries:
        assert e["ecosystem"] == "terraform", f"expected terraform, got {e['ecosystem']}"
    names = {e["package_name"] for e in entries}
    assert "registry.terraform.io/hashicorp/template" in names
    assert "registry.terraform.io/hetznercloud/hcloud" in names


def test_terraform_parser_versions(tmp_path):
    lock = tmp_path / ".terraform.lock.hcl"
    lock.write_text(TERRAFORM_LOCK)
    entries = sb._parse_terraform_lock_hcl(lock)
    by_name = {e["package_name"]: e for e in entries}
    assert by_name["registry.terraform.io/hashicorp/template"]["package_version"] == "2.2.0"
    assert by_name["registry.terraform.io/hetznercloud/hcloud"]["package_version"] == "1.52.0"


def test_terraform_parser_is_direct(tmp_path):
    lock = tmp_path / ".terraform.lock.hcl"
    lock.write_text(TERRAFORM_LOCK)
    entries = sb._parse_terraform_lock_hcl(lock)
    assert all(e["is_direct"] for e in entries)


def test_terraform_parser_empty(tmp_path):
    lock = tmp_path / ".terraform.lock.hcl"
    lock.write_text("# no providers\n")
    entries = sb._parse_terraform_lock_hcl(lock)
    assert entries == []


# ---------------------------------------------------------------------------
# Ansible Galaxy parser
# ---------------------------------------------------------------------------

ANSIBLE_REQUIREMENTS_FULL = textwrap.dedent("""\
    collections:
      - name: community.general
        version: "9.5.0"
      - name: ansible.posix
        version: "1.6.0"
      - community.crypto

    roles:
      - name: geerlingguy.docker
        version: "6.1.0"
      - geerlingguy.pip
""")

ANSIBLE_REQUIREMENTS_EMPTY = textwrap.dedent("""\
    collections: []
    roles: []
""")

ANSIBLE_REQUIREMENTS_COLLECTIONS_ONLY = textwrap.dedent("""\
    collections:
      - name: community.general
        version: "9.0.0"
""")


def test_ansible_parser_collections_and_roles(tmp_path):
    req = tmp_path / "requirements.yml"
    req.write_text(ANSIBLE_REQUIREMENTS_FULL)
    entries = sb._parse_ansible_requirements(req)
    assert len(entries) == 5
    names = {e["package_name"] for e in entries}
    assert "community.general" in names
    assert "ansible.posix" in names
    assert "community.crypto" in names
    assert "geerlingguy.docker" in names
    assert "geerlingguy.pip" in names


def test_ansible_parser_ecosystem(tmp_path):
    req = tmp_path / "requirements.yml"
    req.write_text(ANSIBLE_REQUIREMENTS_FULL)
    entries = sb._parse_ansible_requirements(req)
    for e in entries:
        assert e["ecosystem"] == "ansible"


def test_ansible_parser_versions(tmp_path):
    req = tmp_path / "requirements.yml"
    req.write_text(ANSIBLE_REQUIREMENTS_FULL)
    entries = sb._parse_ansible_requirements(req)
    by_name = {e["package_name"]: e for e in entries}
    assert by_name["community.general"]["package_version"] == "9.5.0"
    assert by_name["ansible.posix"]["package_version"] == "1.6.0"
    assert by_name["community.crypto"]["package_version"] is None  # no version specified
    assert by_name["geerlingguy.docker"]["package_version"] == "6.1.0"
    assert by_name["geerlingguy.pip"]["package_version"] is None


def test_ansible_parser_is_direct(tmp_path):
    req = tmp_path / "requirements.yml"
    req.write_text(ANSIBLE_REQUIREMENTS_FULL)
    entries = sb._parse_ansible_requirements(req)
    assert all(e["is_direct"] for e in entries)


def test_ansible_parser_empty(tmp_path):
    req = tmp_path / "requirements.yml"
    req.write_text(ANSIBLE_REQUIREMENTS_EMPTY)
    entries = sb._parse_ansible_requirements(req)
    assert entries == []


def test_ansible_parser_collections_only(tmp_path):
    req = tmp_path / "requirements.yml"
    req.write_text(ANSIBLE_REQUIREMENTS_COLLECTIONS_ONLY)
    entries = sb._parse_ansible_requirements(req)
    assert len(entries) == 1
    assert entries[0]["package_name"] == "community.general"


def test_ansible_parser_yaml_extension(tmp_path):
    """Both .yml and .yaml extensions must work."""
    req = tmp_path / "requirements.yaml"
    req.write_text(ANSIBLE_REQUIREMENTS_COLLECTIONS_ONLY)
    entries = sb._parse_ansible_requirements(req)
    assert len(entries) == 1


def test_ansible_parser_invalid_yaml(tmp_path, capsys):
    req = tmp_path / "requirements.yml"
    req.write_text("collections: [unclosed")
    entries = sb._parse_ansible_requirements(req)
    assert entries == []
    captured = capsys.readouterr()
    assert "Warning" in captured.err


# ---------------------------------------------------------------------------
# sbom-tools.yaml parser
# ---------------------------------------------------------------------------

SBOM_TOOLS_YAML = textwrap.dedent("""\
    tools:
      - name: ansible
        version: "12.3.0"
        ecosystem: ansible
        license_spdx: GPL-3.0-only
        is_direct: true
        is_dev: false
      - name: terraform
        version: "1.10.5"
        ecosystem: terraform
        license_spdx: BSL-1.1
        is_direct: true
        is_dev: false
      - name: helm
        version: "3.17.1"
        ecosystem: tool
        license_spdx: Apache-2.0
        is_direct: true
        is_dev: false
      - name: k3s
        version: unknown
        ecosystem: other
        license_spdx: Apache-2.0
        is_direct: true
        is_dev: false
""")

SBOM_TOOLS_YAML_MINIMAL = textwrap.dedent("""\
    tools:
      - name: kubectl
        ecosystem: tool
""")


def test_sbom_tools_parser_basic(tmp_path):
    manifest = tmp_path / "sbom-tools.yaml"
    manifest.write_text(SBOM_TOOLS_YAML)
    entries = sb._parse_sbom_tools_yaml(manifest)
    assert len(entries) == 4
    names = {e["package_name"] for e in entries}
    assert {"ansible", "terraform", "helm", "k3s"} == names


def test_sbom_tools_parser_ecosystems(tmp_path):
    manifest = tmp_path / "sbom-tools.yaml"
    manifest.write_text(SBOM_TOOLS_YAML)
    entries = sb._parse_sbom_tools_yaml(manifest)
    by_name = {e["package_name"]: e for e in entries}
    assert by_name["ansible"]["ecosystem"] == "ansible"
    assert by_name["terraform"]["ecosystem"] == "terraform"
    assert by_name["helm"]["ecosystem"] == "tool"
    assert by_name["k3s"]["ecosystem"] == "other"


def test_sbom_tools_parser_licenses(tmp_path):
    manifest = tmp_path / "sbom-tools.yaml"
    manifest.write_text(SBOM_TOOLS_YAML)
    entries = sb._parse_sbom_tools_yaml(manifest)
    by_name = {e["package_name"]: e for e in entries}
    assert by_name["ansible"]["license_spdx"] == "GPL-3.0-only"
    assert by_name["terraform"]["license_spdx"] == "BSL-1.1"
    assert by_name["helm"]["license_spdx"] == "Apache-2.0"


def test_sbom_tools_parser_unknown_version_becomes_none(tmp_path, capsys):
    """version: unknown must be converted to None and emit a warning."""
    manifest = tmp_path / "sbom-tools.yaml"
    manifest.write_text(SBOM_TOOLS_YAML)
    entries = sb._parse_sbom_tools_yaml(manifest)
    by_name = {e["package_name"]: e for e in entries}
    assert by_name["k3s"]["package_version"] is None
    captured = capsys.readouterr()
    assert "unknown" in captured.err


def test_sbom_tools_parser_minimal_entry(tmp_path):
    """Only 'name' and 'ecosystem' required; version and license default to None."""
    manifest = tmp_path / "sbom-tools.yaml"
    manifest.write_text(SBOM_TOOLS_YAML_MINIMAL)
    entries = sb._parse_sbom_tools_yaml(manifest)
    assert len(entries) == 1
    e = entries[0]
    assert e["package_name"] == "kubectl"
    assert e["ecosystem"] == "tool"
    assert e["package_version"] is None
    assert e["license_spdx"] is None
    assert e["is_direct"] is True
    assert e["is_dev"] is False


def test_sbom_tools_parser_invalid_ecosystem_falls_back(tmp_path, capsys):
    manifest = tmp_path / "sbom-tools.yaml"
    manifest.write_text("tools:\n  - name: foo\n    ecosystem: nonsense\n")
    entries = sb._parse_sbom_tools_yaml(manifest)
    assert entries[0]["ecosystem"] == "tool"
    captured = capsys.readouterr()
    assert "Warning" in captured.err


def test_sbom_tools_parser_empty_tools(tmp_path):
    manifest = tmp_path / "sbom-tools.yaml"
    manifest.write_text("tools: []\n")
    entries = sb._parse_sbom_tools_yaml(manifest)
    assert entries == []


def test_sbom_tools_parser_invalid_yaml(tmp_path, capsys):
    manifest = tmp_path / "sbom-tools.yaml"
    manifest.write_text("tools: {bad yaml: [unclosed")
    entries = sb._parse_sbom_tools_yaml(manifest)
    assert entries == []
    captured = capsys.readouterr()
    assert "Warning" in captured.err


# ---------------------------------------------------------------------------
# detect_all — comprehensive multi-parser scan
# ---------------------------------------------------------------------------

def test_detect_all_uv_lock(tmp_path):
    (tmp_path / "uv.lock").write_text("[[package]]\nname = \"typer\"\nversion = \"0.12.0\"\n")
    sources = sb.detect_all(tmp_path)
    labels = {label for _, label, _ in sources}
    assert "uv.lock" in labels


def test_detect_all_terraform_lock(tmp_path):
    tf_dir = tmp_path / "terraform" / "hetzner"
    tf_dir.mkdir(parents=True)
    (tf_dir / ".terraform.lock.hcl").write_text(
        'provider "registry.terraform.io/hetznercloud/hcloud" {\n  version = "1.52.0"\n}\n'
    )
    sources = sb.detect_all(tmp_path)
    labels = {label for _, label, _ in sources}
    assert ".terraform.lock.hcl" in labels


def test_detect_all_ansible_requirements(tmp_path):
    ansible_dir = tmp_path / "ansible"
    ansible_dir.mkdir()
    (ansible_dir / "requirements.yml").write_text("collections:\n  - name: community.general\n")
    sources = sb.detect_all(tmp_path)
    labels = {label for _, label, _ in sources}
    assert "ansible/requirements.yml" in labels


def test_detect_all_sbom_tools_yaml(tmp_path):
    (tmp_path / "sbom-tools.yaml").write_text("tools:\n  - name: helm\n    ecosystem: tool\n")
    sources = sb.detect_all(tmp_path)
    labels = {label for _, label, _ in sources}
    assert "sbom-tools.yaml" in labels


def test_detect_all_multi_ecosystem(tmp_path):
    """A repo with Python + Terraform + Ansible + tools manifest yields all four."""
    # Python
    (tmp_path / "uv.lock").write_text("[[package]]\nname = \"typer\"\nversion = \"0.12.0\"\n")
    # Terraform
    tf_dir = tmp_path / "terraform"
    tf_dir.mkdir()
    (tf_dir / ".terraform.lock.hcl").write_text(
        'provider "registry.terraform.io/hashicorp/null" {\n  version = "3.2.3"\n}\n'
    )
    # Ansible
    ansible_dir = tmp_path / "ansible"
    ansible_dir.mkdir()
    (ansible_dir / "requirements.yml").write_text("collections:\n  - name: ansible.posix\n    version: \"1.6.0\"\n")
    # Tool manifest
    (tmp_path / "sbom-tools.yaml").write_text("tools:\n  - name: helm\n    ecosystem: tool\n    version: \"3.17.1\"\n")

    sources = sb.detect_all(tmp_path)
    labels = {label for _, label, _ in sources}
    assert "uv.lock" in labels
    assert ".terraform.lock.hcl" in labels
    assert "ansible/requirements.yml" in labels
    assert "sbom-tools.yaml" in labels

    # Parse all and verify merged entries
    all_entries = []
    for path, label, parser_fn in sources:
        all_entries.extend(parser_fn(path))

    ecosystems = {e["ecosystem"] for e in all_entries}
    assert "python" in ecosystems
    assert "terraform" in ecosystems
    assert "ansible" in ecosystems
    assert "tool" in ecosystems


def test_detect_all_skips_venv(tmp_path):
    """Lockfiles inside .venv must be ignored."""
    venv_dir = tmp_path / ".venv" / "lib"
    venv_dir.mkdir(parents=True)
    (venv_dir / "requirements.txt").write_text("requests==2.31.0\n")
    sources = sb.detect_all(tmp_path)
    paths = {str(p) for p, _, _ in sources}
    assert not any(".venv" in p for p in paths)


def test_detect_all_ansible_req_only_in_ansible_dir(tmp_path):
    """requirements.yml at repo root (not in ansible/) should not be picked up as ansible."""
    (tmp_path / "requirements.yml").write_text("collections:\n  - name: community.general\n")
    sources = sb.detect_all(tmp_path)
    labels = {label for _, label, _ in sources}
    # Should NOT be detected since it's not under an 'ansible/' directory
    assert "ansible/requirements.yml" not in labels
    assert "ansible/requirements.yaml" not in labels


def test_detect_all_no_duplicates(tmp_path):
    """Same file should not appear twice."""
    (tmp_path / "uv.lock").write_text("[[package]]\nname = \"x\"\nversion = \"1.0\"\n")
    sources = sb.detect_all(tmp_path)
    paths = [p for p, _, _ in sources]
    assert len(paths) == len(set(paths))


def test_detect_all_empty_repo(tmp_path):
    sources = sb.detect_all(tmp_path)
    assert sources == []
386
workplans/CUST-WP-0013-sbom-infra-expansion.md
Normal file
@@ -0,0 +1,386 @@
---
id: CUST-WP-0013
type: workplan
title: "SBOM Infrastructure Expansion"
domain: custodian
repo: the-custodian
status: completed
owner: custodian
topic_slug: custodian
state_hub_workstream_id: f4ba84c8-4d47-492d-b65e-73b157271a2b
created: "2026-03-12"
updated: "2026-03-12"
---

# CUST-WP-0013 — SBOM Infrastructure Expansion

**Scope:** Extend SBOM capture beyond Python packages to cover Terraform providers,
Ansible Galaxy collections, and system-level tools (Ansible, Terraform, Helm, k3s,
cloud-init, etc.). Introduces an agent-assisted tool manifest capture workflow,
new ecosystem enum values, comprehensive auto-detection in `ingest_sbom.py`, and
delivers full SBOM coverage for `railiance-infra` and `railiance-cluster`.

**Drives:** Licence risk visibility across the full dependency graph, not just
language-level packages.

---

## Design Decisions

### Tool manifest: agent-generated, not hand-written

System tools (Ansible, Terraform, Helm, k3s, etc.) live outside any lockfile —
they are provisioned, not installed by a package manager. Rather than asking
operators to maintain a hand-written manifest, the SBOM capture agent inspects
the repo and generates/updates `sbom-tools.yaml` automatically.

The agent prompt (`state-hub/prompts/sbom-capture-agent.md`) is parameterised
per repo. It reads the repo's CLAUDE.md, Makefile, README, CI configs, version
pins, and provisioning files, then emits a structured `sbom-tools.yaml` with
tool name, version, ecosystem, SPDX licence, and directness flags.

A thin wrapper script (`state-hub/scripts/capture_sbom_tools.py`) invokes the
agent prompt via `claude -p` (or prints it for manual use) and writes the result
to `<repo-root>/sbom-tools.yaml`.

### Comprehensive ingest: all mechanisms per repo

`make ingest-sbom REPO=<slug>` must run all applicable parsers, not just
whichever lockfile happens to be auto-detected first. The updated auto-detection
in `ingest_sbom.py` scans:

1. Package manager lockfiles (`uv.lock`, `requirements.txt`, `package-lock.json`,
   `yarn.lock`, `Cargo.lock`, `go.sum`)
2. Terraform provider locks (`.terraform.lock.hcl`, anywhere in the tree)
3. Ansible Galaxy manifests (`requirements.yml` / `requirements.yaml` placed
   directly in any directory named `ansible/`)
4. Agent-generated tool manifest (`sbom-tools.yaml` at repo root)

All parsers run and their results are merged into a single snapshot.

---

## Phase 1 — Schema: Ecosystem Enum Extension

**Acceptance:** `terraform` and `ansible` are valid ecosystem values; existing
`other` entries are unaffected; migration applies cleanly.

### T01 — Alembic migration: add terraform and ansible enum values

```task
id: CUST-WP-0013-T01
state_hub_task_id: c0b6edc4-86ab-4cee-88a8-6c66fb81adee
status: done
priority: high
```

Add `terraform` and `ansible` to the `Ecosystem` enum in the DB. Check whether
the column uses a native PostgreSQL ENUM type (requiring `ALTER TYPE`) or a
`String` column (requiring no migration). Write the migration accordingly.
Also add `tool` as a catch-all for tool-manifest entries that don't fit a
named ecosystem.
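
If the column turns out to be a native PostgreSQL ENUM, the migration reduces to one `ALTER TYPE ... ADD VALUE` per new label. The sketch below only builds the SQL strings, so it runs without a database; the type name `ecosystem` and the helper `add_enum_value_statements` are assumptions, not taken from the repo. The commented `upgrade()` shows roughly where Alembic's `op.execute` and `autocommit_block()` would come in.

```python
"""Sketch: SQL an Alembic upgrade() could issue for T01.

Assumptions (hypothetical): the native enum type is named "ecosystem" and the
new labels are the lowercase strings used by ingest_sbom.py.
"""

NEW_ECOSYSTEM_VALUES = ("terraform", "ansible", "tool")


def add_enum_value_statements(type_name: str, values: tuple[str, ...]) -> list[str]:
    """Build one ALTER TYPE ... ADD VALUE statement per new label.

    IF NOT EXISTS (PostgreSQL 9.3+) makes the upgrade safe to re-run.
    """
    statements = []
    for value in values:
        # Labels are fixed literals here; double any single quote defensively.
        literal = value.replace("'", "''")
        statements.append(f"ALTER TYPE {type_name} ADD VALUE IF NOT EXISTS '{literal}'")
    return statements


# Inside a real migration this would look roughly like:
#
#   def upgrade() -> None:
#       # PostgreSQL < 12 rejects ADD VALUE inside a transaction block,
#       # so run the statements outside the migration transaction.
#       with op.get_context().autocommit_block():
#           for stmt in add_enum_value_statements("ecosystem", NEW_ECOSYSTEM_VALUES):
#               op.execute(stmt)
#
# PostgreSQL has no ALTER TYPE ... DROP VALUE, so downgrade() cannot simply
# remove the labels again.

if __name__ == "__main__":
    for stmt in add_enum_value_statements("ecosystem", NEW_ECOSYSTEM_VALUES):
        print(stmt)
```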

---

## Phase 2 — Parser Improvements in ingest_sbom.py

**Acceptance:** `--dry-run` on railiance-infra shows terraform providers and
ansible collections correctly labelled; tool manifest entries appear with the
declared ecosystem.

### T02 — Promote Terraform parser: other → terraform ecosystem

```task
id: CUST-WP-0013-T02
state_hub_task_id: 7686bccd-022c-4e30-8081-c8487eb82253
status: done
priority: high
```

The `.terraform.lock.hcl` parser already exists in `ingest_sbom.py` but stores
entries as `ecosystem="other"`. Change to `ecosystem="terraform"` after T01
migration lands. Re-ingest any repos that previously ingested terraform entries
as `other` to correct the label.

### T03 — Implement Ansible Galaxy requirements.yml parser

```task
id: CUST-WP-0013-T03
state_hub_task_id: 48658bdd-4d16-4be0-a87e-45df4f4901b0
status: done
priority: high
```

Parse `requirements.yml` / `requirements.yaml` files found in directories named
`ansible`. Standard format:

```yaml
collections:
  - name: community.general
    version: "9.5.0"
roles:
  - name: geerlingguy.docker
    version: "6.x"
```

Store as `ecosystem="ansible"`, `is_direct=True`. Licence left `null` (Galaxy
API lookup is deferred). Handle both `collections:` and `roles:` blocks, and both
mapping entries (`name` plus optional `version`) and bare-string entries, which
carry no version.
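
A minimal sketch of the normalisation step, assuming the file has already been loaded with `yaml.safe_load()` so the function only shapes the resulting dict. `normalise_galaxy_requirements` is a hypothetical name; it mirrors the behaviour described in T03 and exercised by the unit tests, but it is not the `_parse_ansible_requirements` shipped in `ingest_sbom.py`.

```python
"""Sketch: turn parsed Galaxy requirements into SBOM entry dicts."""
from typing import Any


def normalise_galaxy_requirements(data: Any) -> list[dict]:
    if not isinstance(data, dict):
        return []
    entries: list[dict] = []
    for block in ("collections", "roles"):
        # A missing key or an explicit `[]` both yield no entries.
        for item in data.get(block) or []:
            if isinstance(item, str):
                # Bare form, e.g. "- community.crypto": no version pin.
                name, version = item, None
            elif isinstance(item, dict) and item.get("name"):
                name = item["name"]
                raw = item.get("version")
                version = str(raw) if raw is not None else None
            else:
                continue  # e.g. roles given only as `src:`; skipped in this sketch
            entries.append({
                "package_name": name,
                "package_version": version,
                "ecosystem": "ansible",
                "license_spdx": None,  # Galaxy API lookup is deferred
                "is_direct": True,
                "is_dev": False,
            })
    return entries
```

Called as `normalise_galaxy_requirements(yaml.safe_load(path.read_text()))`, this keeps the YAML error handling (the warning-and-return-empty behaviour the tests check) in the caller.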

### T04 — Implement sbom-tools.yaml manifest parser

```task
id: CUST-WP-0013-T04
state_hub_task_id: 4522ea04-134b-40ee-a7a2-ea0e4c1c061d
status: done
priority: high
```

Parse `sbom-tools.yaml` at the repo root (written by the capture agent). Schema:

```yaml
# Generated by sbom-capture-agent — review before committing
tools:
  - name: ansible
    version: "12.3.0"
    ecosystem: ansible  # or: terraform, other, python, etc.
    license_spdx: GPL-3.0-only
    is_direct: true
    is_dev: false
  - name: helm
    version: "3.17.x"
    ecosystem: other
    license_spdx: Apache-2.0
    is_direct: true
    is_dev: false
```

Supports all existing ecosystem values plus `tool`. Pass entries through the
same normalisation as lockfile entries. Entries with `version: unknown` (the agent
could not determine the version) are kept with a null version and a warning
flagging them for review.

### T05 — Comprehensive auto-detection: all formats in one scan

```task
id: CUST-WP-0013-T05
state_hub_task_id: cdda6bf2-2a44-4444-a04a-ac2fe2314923
status: done
priority: high
```

Refactor the `--repo-path` scan to discover and run all applicable parsers,
not just the first match. Scan order:

1. Walk tree for all `uv.lock`, `requirements.txt`, `package-lock.json`,
   `yarn.lock`, `Cargo.lock`
2. Walk tree for all `.terraform.lock.hcl`
3. Walk tree for `ansible/requirements.yml` and `ansible/requirements.yaml`
4. Check repo root for `sbom-tools.yaml`

Merge all results into a single batch for the snapshot ingest call. Log a
summary line per parser: ` <parser>: N packages from <path>`.
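
The four steps do not need four walks. A minimal sketch, assuming hypothetical stand-ins (`SKIP_DIRS`, `classify_sources`, `merge`, and a `parse` callback in place of the real `_parse_*` functions), buckets every file during one `os.walk` and concatenates the buckets in the order listed above, then merges the parsed entries with one summary line per source.

```python
"""Sketch of the T05 flow: one walk, ordered buckets, one merged batch."""
import os
from pathlib import Path
from typing import Callable

LOCKFILES = ("uv.lock", "requirements.txt", "package-lock.json", "yarn.lock", "Cargo.lock")
SKIP_DIRS = {".git", ".venv", "node_modules"}  # hypothetical subset of _SKIP_DIRS


def classify_sources(repo: Path) -> list[tuple[str, Path]]:
    """Return (kind, path) pairs in the T05 scan order."""
    lockfiles, tf_locks, galaxy = [], [], []
    for dirpath, dirnames, filenames in os.walk(repo):
        # Prune in place so os.walk never descends into skipped directories.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        here = Path(dirpath)
        for name in sorted(filenames):
            if name in LOCKFILES:
                lockfiles.append(("lockfile", here / name))
            elif name == ".terraform.lock.hcl":
                tf_locks.append(("terraform", here / name))
            elif here.name == "ansible" and name in ("requirements.yml", "requirements.yaml"):
                galaxy.append(("ansible", here / name))
    manifest = repo / "sbom-tools.yaml"
    tools = [("tools", manifest)] if manifest.is_file() else []
    return lockfiles + tf_locks + galaxy + tools


def merge(
    sources: list[tuple[str, Path]],
    parse: Callable[[str, Path], list[dict]],
    repo: Path,
) -> list[dict]:
    """Parse each source, print one summary line, and return the merged batch."""
    batch: list[dict] = []
    for kind, path in sources:
        entries = parse(kind, path)
        print(f"  {kind}: {len(entries)} packages from {path.relative_to(repo)}")
        batch.extend(entries)
    return batch
```

For comparison, the `main()` in this commit logs ` <label> (<path>): N entries` per source rather than the format given in T05.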

### T06 — Unit tests for new parsers

```task
id: CUST-WP-0013-T06
state_hub_task_id: fee37e66-8f41-4dba-995b-97fc66493caf
status: done
priority: medium
```

Add test fixtures and unit tests for:
- Ansible Galaxy requirements.yml (collections + roles, version pinned and
  unpinned)
- sbom-tools.yaml (valid, missing version, unknown ecosystem)
- Multi-parser scan: repo root with uv.lock + .terraform.lock.hcl +
  sbom-tools.yaml produces merged results

---

## Phase 3 — SBOM Capture Agent

**Acceptance:** `make capture-tools REPO=railiance-infra` produces a reviewed
`sbom-tools.yaml` that correctly identifies Ansible, Terraform, Helm, and other
declared tools with versions and SPDX licences.

### T07 — Write SBOM capture agent prompt

```task
id: CUST-WP-0013-T07
state_hub_task_id: a3b919b5-63b0-44f7-a048-ebfae603ef7b
status: done
priority: high
```

Write `state-hub/prompts/sbom-capture-agent.md` — a Claude agent prompt
parameterised with `{repo_slug}` and `{repo_path}`. The prompt instructs the
agent to:

1. Read `CLAUDE.md`, `Makefile`, `README.md`, `pyproject.toml`, `.tool-versions`,
   CI configs, Dockerfiles, and provisioning files in `{repo_path}`
2. Identify all system-level tools: name, version (from version pins, Makefile
   vars, or documented prerequisites), ecosystem, SPDX licence
3. Identify indirect/transitive tool deps (e.g. Ansible → Python; Terraform →
   provider plugins already captured by `.terraform.lock.hcl`)
4. Emit a well-formed `sbom-tools.yaml` with a comment header noting generation
   date and confidence level per entry (`# confidence: high/medium/low`)
5. Flag any tools where version could not be determined (`version: unknown`) for
   human review

The prompt must not hallucinate versions — it must derive them from evidence in
the repo or mark them unknown.

### T08 — Implement capture_sbom_tools.py

```task
id: CUST-WP-0013-T08
state_hub_task_id: 9593dca7-e713-4d7a-b4f2-c5333ae0b3d2
status: done
priority: high
```

Write `state-hub/scripts/capture_sbom_tools.py`:

- Accepts `--repo SLUG` and `--repo-path PATH`
- Resolves repo path from slug via the state-hub API if `--repo-path` is omitted
- Loads the agent prompt from `prompts/sbom-capture-agent.md`, substitutes
  `{repo_slug}` and `{repo_path}`
- Invokes `claude -p "<prompt>"` (non-interactive) and captures stdout
- Parses the YAML block from the response
- Writes or updates `<repo-path>/sbom-tools.yaml`
- Prints a diff of changes if the file already exists
- `--dry-run` flag: print the prompt and diff without writing
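
The prompt, agent and diff steps above can be sketched as four small helpers. All function names are hypothetical; the only external command assumed is the non-interactive `claude -p <prompt>` invocation named in T08, and the sketch does not cover the state-hub API lookup or the file write.

```python
"""Sketch of helpers a T08-style capture_sbom_tools.py might use."""
import difflib
import re
import subprocess

FENCE = "`" * 3  # built indirectly so the snippet can sit inside Markdown
YAML_BLOCK = re.compile(re.escape(FENCE) + r"ya?ml[ \t]*\n(.*?)" + re.escape(FENCE), re.DOTALL)


def render_prompt(template: str, repo_slug: str, repo_path: str) -> str:
    # str.replace rather than str.format: the prompt is Markdown and may contain
    # other literal braces (YAML/JSON examples) that format() would choke on.
    return template.replace("{repo_slug}", repo_slug).replace("{repo_path}", repo_path)


def run_agent(prompt: str) -> str:
    # Non-interactive run; raises CalledProcessError if the CLI exits non-zero.
    result = subprocess.run(["claude", "-p", prompt], capture_output=True, text=True, check=True)
    return result.stdout


def extract_yaml(response: str) -> str:
    """Return the first fenced yaml/yml block, or the whole response if none is found."""
    match = YAML_BLOCK.search(response)
    body = match.group(1) if match else response
    return body.strip() + "\n"


def diff_manifest(old: str, new: str, path: str = "sbom-tools.yaml") -> str:
    """Unified diff of the existing manifest against the freshly generated one."""
    lines = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(lines)
```

With `--dry-run`, the script would print `render_prompt(...)` and `diff_manifest(...)` and stop before writing; an empty diff means the manifest is unchanged.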
|
||||
|
||||
### T09 — Add make capture-tools target

```task
id: CUST-WP-0013-T09
state_hub_task_id: 6948e1d2-9c97-4709-bdb0-4b6ded700a22
status: done
priority: medium
```

Add to `state-hub/Makefile`:

```makefile
.PHONY: capture-tools
capture-tools: ## Run SBOM capture agent for a repo (REPO=slug, REPO_PATH=path)
	uv run python scripts/capture_sbom_tools.py --repo $(REPO) $(if $(REPO_PATH),--repo-path $(REPO_PATH),)
```

Also update `make ingest-sbom` to note that `capture-tools` should be run first
for repos that have system-level tool dependencies.

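The target passes `--repo-path` only when `REPO_PATH` is set, so the script must treat that option as optional and fall back to the slug lookup. Below is a minimal `argparse` sketch of that command-line surface, using the flag names listed in T08. `resolve_repo_path` and its `lookup` argument are hypothetical stand-ins for the state-hub API call, whose endpoint is not specified here.

```python
import argparse
from typing import Callable


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Generate sbom-tools.yaml for a repo via the SBOM capture agent."
    )
    parser.add_argument("--repo", required=True, metavar="SLUG", help="state-hub repo slug")
    parser.add_argument(
        "--repo-path",
        metavar="PATH",
        help="local checkout; resolved from the slug when omitted",
    )
    parser.add_argument(
        "--dry-run", action="store_true", help="print the prompt and diff without writing"
    )
    return parser


def resolve_repo_path(args: argparse.Namespace, lookup: Callable[[str], str]) -> str:
    # `lookup` is a hypothetical slug -> path resolver standing in for the state-hub API.
    return args.repo_path or lookup(args.repo)


# What `make capture-tools REPO=railiance-infra` (no REPO_PATH) passes through:
args = build_parser().parse_args(["--repo", "railiance-infra"])
print(args.repo_path, args.dry_run)  # None False
```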
---

## Phase 4 — Ingest railiance-infra

**Acceptance:** `make ingest-sbom REPO=railiance-infra` shows terraform providers,
ansible collections, and tool manifest entries in one snapshot.

### T10 — Capture tools manifest for railiance-infra

```task
id: CUST-WP-0013-T10
state_hub_task_id: 99b23998-5129-4777-9d42-7bee5981cdbb
status: done
priority: medium
```

Run `make capture-tools REPO=railiance-infra`. Review the generated
`railiance-infra/sbom-tools.yaml` and verify that Ansible, Terraform, cloud-init,
goss, and any other tools are listed with their versions and licences. Correct
any `unknown` versions by consulting the repo, then commit the file.

### T11 — Ingest railiance-infra

```task
id: CUST-WP-0013-T11
state_hub_task_id: bb516909-f903-48ce-b60b-a24245e7382e
status: done
priority: medium
```

Run `make ingest-sbom REPO=railiance-infra REPO_PATH=~/railiance-infra`. Verify
the snapshot contains:
- Terraform providers (from `.terraform.lock.hcl`)
- Ansible Galaxy collections (from `ansible/requirements.yaml`)
- System tools (from `sbom-tools.yaml`)

Check the licence report for any copyleft or BSL flags.

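`ingest_sbom.py` already parses `.terraform.lock.hcl`. When checking this snapshot by hand, it helps to know exactly what the lock file pins. Below is a hedged, standalone sketch, not the ingest script's parser, that extracts provider addresses and selected versions with regular expressions. It relies on the layout `terraform init` writes: each `provider "<address>" {` block opens at column 0 and closes with a `}` alone at the start of a line. It is not a general HCL parser.

```python
import re

# Each pinned provider is a top-level `provider "<address>" { ... }` block whose
# closing brace sits alone at the start of a line.
PROVIDER_RE = re.compile(
    r'^provider\s+"(?P<address>[^"]+)"\s*\{(?P<body>.*?)^\}', re.MULTILINE | re.DOTALL
)
VERSION_RE = re.compile(r'^\s*version\s*=\s*"(?P<version>[^"]+)"', re.MULTILINE)


def parse_terraform_lock(text: str) -> list[tuple[str, str]]:
    """Return (provider address, selected version) pairs in file order."""
    providers = []
    for block in PROVIDER_RE.finditer(text):
        version = VERSION_RE.search(block["body"])
        providers.append((block["address"], version["version"] if version else "unknown"))
    return providers


# Illustrative lock file; versions and hash are placeholders.
LOCK_SAMPLE = '''
provider "registry.terraform.io/hashicorp/aws" {
  version     = "5.31.0"
  constraints = "~> 5.0"
  hashes = [
    "h1:PLACEHOLDER=",
  ]
}

provider "registry.terraform.io/hashicorp/random" {
  version = "3.6.0"
}
'''

for address, version in parse_terraform_lock(LOCK_SAMPLE):
    print(address, version)
```

Only `version` is read, because it records the exact selected release. `constraints` is the range requested in configuration, so it is not what the snapshot should pin.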
---

## Phase 5 — Ingest railiance-cluster

**Acceptance:** railiance-cluster SBOM covers both Python packages (`uv.lock`) and
system tools in a single snapshot.

### T12 — Capture tools manifest for railiance-cluster

```task
id: CUST-WP-0013-T12
state_hub_task_id: 7a890f1a-da9f-4e6d-86a7-4fd1aefd5b3f
status: done
priority: medium
```

Run `make capture-tools REPO=railiance-cluster`. Review the generated
`railiance-cluster/sbom-tools.yaml` and verify that Helm, kubectl, k3s, and any
other operational tools are listed. Commit the file.

### T13 — Re-ingest railiance-cluster

```task
id: CUST-WP-0013-T13
state_hub_task_id: 789dbe93-011a-4470-9fec-ebf249cd7134
status: done
priority: medium
```

Run `make ingest-sbom REPO=railiance-cluster REPO_PATH=~/railiance-cluster`.
Verify that `uv.lock` (Python packages, including ansible-core) and
`sbom-tools.yaml` entries are merged into one coherent snapshot. Confirm that the
ansible-core GPL-3.0 flag appears in the licence report.

---

## Phase 6 — Convention Documentation

**Acceptance:** A developer reading the SBOM convention doc knows exactly how to
add a new repo to SBOM coverage.

### T14 — Document SBOM capture convention in canon/standards

```task
id: CUST-WP-0013-T14
state_hub_task_id: dc3bb2a3-882e-4dd7-ab7c-8b1e88279a7d
status: done
priority: low
```

Write `canon/standards/sbom-convention_v0.1.md` documenting:
- The four capture mechanisms and when each applies
- The `sbom-tools.yaml` schema (with confidence annotation convention)
- The `make capture-tools` → review → commit → `make ingest-sbom` workflow
- Licence risk thresholds: copyleft = flag for review; BSL = flag for review;
  null licence = acceptable for infra tools if well-known open source

---

## Licence Risk Preview

Based on known tool licences, expect these flags once ingested:

| Tool / Package | Licence | Risk level |
|---|---|---|
| ansible-core | GPL-3.0-or-later | Copyleft — flag (ops toolchain, not shipped) |
| terraform ≥ 1.5.6 | BUSL-1.1 | Non-OSI — flag for review |
| hashicorp providers | MPL-2.0 | Weak copyleft (file-level), not relicensed to BSL; flag for review |
| community.general | GPL-3.0-or-later | Copyleft — flag (ops toolchain) |
| Helm | Apache-2.0 | Clean |
| k3s | Apache-2.0 | Clean |
| cloud-init | Apache-2.0 / GPL-3.0 | Mixed — check version |
| goss | Apache-2.0 | Clean |

All copyleft/BSL entries here are **operational toolchain** dependencies, not
shipped code — risk is low but worth tracking for compliance awareness.