feat(sbom): CUST-WP-0013 — expand SBOM infra to terraform, ansible, and tool manifests
- Migration d6e7f8a9b0c1: add terraform, ansible, tool to Ecosystem enum
- ingest_sbom.py: new Ansible Galaxy requirements.yml parser (collections + roles)
- ingest_sbom.py: new sbom-tools.yaml manifest parser (agent-generated tool deps)
- ingest_sbom.py: promote .terraform.lock.hcl parser from ecosystem=other → terraform
- ingest_sbom.py: detect_all() runs all four parsers in one comprehensive scan
- capture_sbom_tools.py: agent-assisted tool manifest generator (claude -p)
- prompts/sbom-capture-agent.md: parameterised prompt for repo tool discovery
- Makefile: capture-tools target; ingest-sbom updated docs and DRY_RUN support
- 29 unit tests covering all new parsers and detect_all() behaviour
- canon/standards/sbom-convention_v0.1.md: updated with four-mechanism model and workflow

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -6,7 +6,7 @@ domain: custodian
|
|||||||
status: active
|
status: active
|
||||||
version: "0.1"
|
version: "0.1"
|
||||||
created: "2026-03-01"
|
created: "2026-03-01"
|
||||||
updated: "2026-03-01"
|
updated: "2026-03-12"
|
||||||
---
|
---
|
||||||
|
|
||||||
# SBOM Convention v0.1 — Dependency Tracking & Licence Governance
|
# SBOM Convention v0.1 — Dependency Tracking & Licence Governance
|
||||||
@@ -27,20 +27,23 @@ dashboard (`/sbom`) provides domain-level and repo-level drill-down.
|
|||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## 1. Authoritative Lockfiles per Ecosystem
|
## 1. Capture Mechanisms
|
||||||
|
|
||||||
| Ecosystem | Authoritative file | Notes |
|
`ingest_sbom.py` runs all four mechanisms in a single scan when given `--repo-path`.
|
||||||
|-----------|-------------------|-------|
|
No flags needed — comprehensive detection is the default.
|
||||||
| Python | `uv.lock` | Preferred. `requirements.txt` accepted as fallback |
|
|
||||||
| Node / npm | `package-lock.json` | Preferred. `yarn.lock` accepted |
|
|
||||||
| Rust | `Cargo.lock` | Auto-detected |
|
|
||||||
| Terraform | `.terraform.lock.hcl` | Provider pins; ecosystem stored as `other` until ENUM extended |
|
|
||||||
| Go | `go.sum` | *Not yet parsed — planned* |
|
|
||||||
| Java / JVM | `gradle.lockfile` / `pom.xml` | *Not yet parsed — planned* |
|
|
||||||
| Ansible | `requirements.yml` | *Not yet parsed — planned* |
|
|
||||||
|
|
||||||
**Principle:** commit lockfiles to the repo. Lockfiles are the SBOM source
|
| Mechanism | File(s) | Ecosystem | Detection scope |
|
||||||
of truth; do not generate them at ingest time.
|
|-----------|---------|-----------|-----------------|
|
||||||
|
| **Package manager lockfiles** | `uv.lock`, `requirements.txt`, `package-lock.json`, `yarn.lock`, `Cargo.lock` | `python`, `node`, `rust` | Anywhere in tree |
|
||||||
|
| **Terraform provider lock** | `.terraform.lock.hcl` | `terraform` | Anywhere in tree |
|
||||||
|
| **Ansible Galaxy manifest** | `ansible/requirements.yml` or `.yaml` | `ansible` | Under directories named `ansible/` |
|
||||||
|
| **Tool manifest** | `sbom-tools.yaml` (repo root) | `tool`, `ansible`, `terraform`, etc. | Repo root only |
|
||||||
|
|
||||||
|
**Go / Java parsers** (`go.sum`, `pom.xml`, `gradle.lockfile`) are *not yet
|
||||||
|
implemented* — planned for a future workplan.
|
||||||
|
|
||||||
|
**Principle:** commit lockfiles and `sbom-tools.yaml` to the repo. These are
|
||||||
|
the SBOM source of truth; do not generate them at ingest time.
|
||||||
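To illustrate how one of these mechanisms turns a committed file into SBOM entries, here is a minimal sketch of a Terraform provider-lock parser. It is not the code in `ingest_sbom.py`: the function name `parse_terraform_lock` and the sample provider version in the usage note are assumptions for illustration. The entry fields follow the ones this convention uses for tool and dependency records.

```python
import re

# Matches a block header such as: provider "registry.terraform.io/hetznercloud/hcloud" {
PROVIDER_RE = re.compile(r'^provider\s+"([^"]+)"\s*\{')
# Matches the pinned version line inside a provider block.
VERSION_RE = re.compile(r'^version\s*=\s*"([^"]+)"')


def parse_terraform_lock(text: str) -> list[dict]:
    """Return one entry per provider pinned in .terraform.lock.hcl content.

    Hypothetical helper: a line-based sketch, not a full HCL parser.
    """
    entries = []
    current = None  # provider address of the block being read, if any
    for line in text.splitlines():
        stripped = line.strip()
        header = PROVIDER_RE.match(stripped)
        if header:
            current = header.group(1)
            continue
        if current is not None:
            version = VERSION_RE.match(stripped)
            if version:
                entries.append({
                    "package_name": current,
                    "package_version": version.group(1),
                    "ecosystem": "terraform",
                    "license_spdx": None,  # provider licences are not in the lock file
                    "is_direct": True,
                    "is_dev": False,
                })
                current = None  # ignore the rest of this block
    return entries
```

Feeding it a lock file containing a single `provider "registry.terraform.io/hetznercloud/hcloud"` block with `version = "1.45.0"` (a made-up pin) would yield one entry with `ecosystem` set to `terraform`.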
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@@ -64,27 +67,35 @@ curl -s http://127.0.0.1:8000/repos/ | python3 -m json.tool
|
|||||||
|
|
||||||
## 3. SBOM Ingestion
|
## 3. SBOM Ingestion
|
||||||
|
|
||||||
### 3.1 Standard ingest (single lockfile at repo root)
|
### 3.1 Standard ingest (all mechanisms, recommended)
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
cd ~/the-custodian/state-hub
|
cd ~/the-custodian/state-hub
|
||||||
make ingest-sbom REPO=<slug> REPO_PATH=/path/to/repo
|
make ingest-sbom REPO=<slug> REPO_PATH=/path/to/repo
|
||||||
```
|
```
|
||||||
|
|
||||||
The script auto-detects the first recognised lockfile at `REPO_PATH`.
|
`ingest_sbom.py` automatically runs all four mechanisms in one scan — lockfiles,
|
||||||
|
Terraform provider locks, Ansible Galaxy manifests, and `sbom-tools.yaml`. All
|
||||||
|
results are merged into a single snapshot. Non-dep directories (`.venv`,
|
||||||
|
`node_modules`, `.git`, `dist`, etc.) are automatically skipped.
|
||||||
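The single-scan behaviour described above can be sketched as a pruned tree walk. This is an illustrative sketch, not the actual `detect_all()` implementation: the function name `find_sbom_sources`, the exact skip list, and the reading of "under directories named `ansible/`" as "directly inside a directory named `ansible`" are all assumptions.

```python
import os
from pathlib import Path

# Assumed skip list; the convention names these four and then says "etc."
SKIP_DIRS = {".venv", "node_modules", ".git", "dist"}
LOCKFILE_NAMES = {
    "uv.lock", "requirements.txt", "package-lock.json", "yarn.lock", "Cargo.lock",
}
ANSIBLE_NAMES = {"requirements.yml", "requirements.yaml"}


def find_sbom_sources(repo_root: Path) -> dict[str, list[Path]]:
    """Group candidate SBOM source files by capture mechanism (hypothetical helper)."""
    found = {"lockfile": [], "terraform": [], "ansible": [], "tool": []}
    for dirpath, dirnames, filenames in os.walk(repo_root):
        # Prune in place so os.walk never descends into skipped directories.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        here = Path(dirpath)
        for name in sorted(filenames):
            if name in LOCKFILE_NAMES:
                found["lockfile"].append(here / name)
            elif name == ".terraform.lock.hcl":
                found["terraform"].append(here / name)
            elif name in ANSIBLE_NAMES and here.name == "ansible":
                found["ansible"].append(here / name)
    # The tool manifest is only honoured at the repo root.
    manifest = repo_root / "sbom-tools.yaml"
    if manifest.is_file():
        found["tool"].append(manifest)
    return found
```

Pruning `dirnames` in place is what keeps vendored trees such as `node_modules` from contributing lockfiles to the snapshot.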
|
|
||||||
### 3.2 Multi-ecosystem repos (recommended for complex repos)
|
### 3.2 Repos with system-level tools: capture first, then ingest
|
||||||
|
|
||||||
Use `SCAN=1` to walk the repo tree and combine **all** lockfiles into a single
|
For repos that use system-level tools not tracked by any lockfile (Terraform
|
||||||
snapshot. Non-dep directories (`.venv`, `node_modules`, `.git`, `dist`, etc.)
|
binary, Helm, kubectl, k3s, goss, etc.):
|
||||||
are automatically skipped.
|
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
make ingest-sbom REPO=the-custodian SCAN=1 REPO_PATH=/home/worsch/the-custodian
|
# Step 1: generate sbom-tools.yaml via agent
|
||||||
```
|
make capture-tools REPO=<slug> REPO_PATH=/path/to/repo
|
||||||
|
|
||||||
This is the correct approach for repos that contain both a backend and a
|
# Step 2: review sbom-tools.yaml — correct any confidence: low entries
|
||||||
frontend (e.g., a Python API + Node/Observable dashboard).
|
|
||||||
|
# Step 3: commit sbom-tools.yaml
|
||||||
|
git -C /path/to/repo add sbom-tools.yaml && git -C /path/to/repo commit -m "chore(sbom): add tool manifest"
|
||||||
|
|
||||||
|
# Step 4: ingest everything
|
||||||
|
make ingest-sbom REPO=<slug> REPO_PATH=/path/to/repo
|
||||||
|
```
|
||||||
|
|
||||||
### 3.3 Explicit lockfile path
|
### 3.3 Explicit lockfile path
|
||||||
|
|
||||||
@@ -96,8 +107,7 @@ Multiple lockfiles can be passed by calling the script directly with repeated
|
|||||||
`--lockfile` flags:
|
`--lockfile` flags:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
cd ~/the-custodian/state-hub
|
uv run python scripts/ingest_sbom.py \
|
||||||
.venv/bin/python scripts/ingest_sbom.py \
|
|
||||||
--repo <slug> \
|
--repo <slug> \
|
||||||
--lockfile /path/to/uv.lock \
|
--lockfile /path/to/uv.lock \
|
||||||
--lockfile /path/to/package-lock.json
|
--lockfile /path/to/package-lock.json
|
||||||
@@ -106,11 +116,40 @@ cd ~/the-custodian/state-hub
|
|||||||
### 3.4 Dry run (inspect without submitting)
|
### 3.4 Dry run (inspect without submitting)
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
make ingest-sbom REPO=<slug> SCAN=1 REPO_PATH=/path/to/repo
|
make ingest-sbom REPO=<slug> REPO_PATH=/path/to/repo DRY_RUN=1
|
||||||
# append: add --dry-run to the command, or run the script directly:
|
|
||||||
.venv/bin/python scripts/ingest_sbom.py --repo <slug> --scan --repo-path /path/to/repo --dry-run
|
|
||||||
```
|
```
|
||||||
|
|
||||||
|
### 3.5 sbom-tools.yaml: the tool manifest
|
||||||
|
|
||||||
|
Create `sbom-tools.yaml` at the repo root for any system-level tools not
|
||||||
|
covered by lockfiles. Schema:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
# sbom-tools.yaml
|
||||||
|
tools:
|
||||||
|
- name: terraform
|
||||||
|
version: "1.9.5" # confidence: medium
|
||||||
|
ecosystem: terraform
|
||||||
|
license_spdx: BUSL-1.1
|
||||||
|
is_direct: true
|
||||||
|
is_dev: false
|
||||||
|
- name: helm
|
||||||
|
version: null # confidence: low (no version pin found)
|
||||||
|
ecosystem: tool
|
||||||
|
license_spdx: Apache-2.0
|
||||||
|
is_direct: true
|
||||||
|
is_dev: false
|
||||||
|
```
|
||||||
|
|
||||||
|
**Valid ecosystem values:** `python`, `node`, `rust`, `go`, `java`, `terraform`,
|
||||||
|
`ansible`, `tool`, `other`
|
||||||
|
|
||||||
|
Annotate each version with a `# confidence: high/medium/low` comment.
|
||||||
|
Entries with `confidence: low` need human verification before committing.
|
||||||
|
|
||||||
|
The `make capture-tools` command generates this file automatically using the
|
||||||
|
SBOM capture agent prompt (`state-hub/prompts/sbom-capture-agent.md`).
|
||||||
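Because the confidence annotations live in YAML comments, a YAML loader discards them, so the review step needs a text-level check. The sketch below lists tools whose entry carries a `# confidence: low` comment. It assumes the layout shown in the schema above (a `- name:` line opening each entry); `low_confidence_tools` is a hypothetical helper, not part of the shipped scripts.

```python
import re

# Opens a tool entry, e.g. "  - name: helm"
NAME_RE = re.compile(r"^\s*-\s*name:\s*(\S+)")
# A confidence annotation written as a trailing or standalone comment.
CONF_RE = re.compile(r"#\s*confidence:\s*(high|medium|low)\b")


def low_confidence_tools(manifest_text: str) -> list[str]:
    """Return names of tools annotated with '# confidence: low', in file order."""
    low = []
    current = None  # name of the entry currently being read
    for line in manifest_text.splitlines():
        name = NAME_RE.match(line)
        if name:
            current = name.group(1).strip("\"'")
        conf = CONF_RE.search(line)
        if conf and conf.group(1) == "low" and current is not None:
            if current not in low:
                low.append(current)
    return low
```

Run against the example manifest above, this would report `helm` and leave `terraform` (annotated `medium`) alone, which gives a quick gate before committing the file.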
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## 4. Snapshot Semantics
|
## 4. Snapshot Semantics
|
||||||
@@ -248,10 +287,14 @@ The SBOM dashboard aggregates across all repos within a domain in the
|
|||||||
|
|
||||||
## 10. Planned Enhancements
|
## 10. Planned Enhancements
|
||||||
|
|
||||||
- **Go / Java parsers** — add to `ingest_sbom.py`
|
- **Go / Java parsers** — add `go.sum`, `pom.xml`, `gradle.lockfile` support to `ingest_sbom.py`
|
||||||
- **Versioned snapshots** — retain history per repo for trend analysis
|
- **Versioned snapshots** — retain history per repo for trend analysis
|
||||||
- **Licence override file** — allow repos to document known-acceptable
|
- **Licence override file** — allow repos to document known-acceptable
|
||||||
copyleft exceptions (`.sbom-overrides.yaml`)
|
copyleft exceptions (`.sbom-overrides.yaml`)
|
||||||
- **CI integration** — GitHub Actions step to run ingest on lockfile change
|
- **CI integration** — GitHub Actions step to run ingest on lockfile change
|
||||||
- **Direct-dep detection for uv.lock** — parse `pyproject.toml` `[project.dependencies]`
|
- **Direct-dep detection for uv.lock** — parse `pyproject.toml` `[project.dependencies]`
|
||||||
to mark direct deps accurately
|
to mark direct deps accurately
|
||||||
|
- **Galaxy API licence lookup** — resolve `license_spdx` for Ansible collections
|
||||||
|
via the Galaxy API at ingest time
|
||||||
|
- **Tool version pinning guidance** — tooling to detect `confidence: low` entries
|
||||||
|
across all registered repos and flag them for resolution
|
||||||
|
|||||||
@@ -133,16 +133,26 @@ list-repos:
|
|||||||
@test -n "$(DOMAIN)" || (echo "ERROR: DOMAIN is required."; exit 1)
|
@test -n "$(DOMAIN)" || (echo "ERROR: DOMAIN is required."; exit 1)
|
||||||
curl -sf "http://127.0.0.1:8000/repos/?domain=$(DOMAIN)" | python3 -m json.tool
|
curl -sf "http://127.0.0.1:8000/repos/?domain=$(DOMAIN)" | python3 -m json.tool
|
||||||
|
|
||||||
## Ingest SBOM data for a repo.
|
## Ingest SBOM data for a repo (all mechanisms: lockfiles + ansible + sbom-tools.yaml).
|
||||||
|
## Auto-detect all sources: make ingest-sbom REPO=the-custodian REPO_PATH=/home/worsch/the-custodian
|
||||||
## Single lockfile (explicit): make ingest-sbom REPO=the-custodian LOCKFILE=/path/to/uv.lock
|
## Single lockfile (explicit): make ingest-sbom REPO=the-custodian LOCKFILE=/path/to/uv.lock
|
||||||
## Scan all lockfiles in tree: make ingest-sbom REPO=the-custodian SCAN=1 REPO_PATH=/home/worsch/the-custodian
|
## Dry-run (no submit): make ingest-sbom REPO=the-custodian REPO_PATH=... DRY_RUN=1
|
||||||
## Auto-detect at repo root: make ingest-sbom REPO=the-custodian REPO_PATH=/home/worsch/the-custodian
|
## Tip: run capture-tools first for repos with system-level tool dependencies.
|
||||||
ingest-sbom:
|
ingest-sbom:
|
||||||
@test -n "$(REPO)" || (echo "ERROR: REPO is required."; exit 1)
|
@test -n "$(REPO)" || (echo "ERROR: REPO is required."; exit 1)
|
||||||
uv run python scripts/ingest_sbom.py --repo "$(REPO)" \
|
uv run python scripts/ingest_sbom.py --repo "$(REPO)" \
|
||||||
$(if $(LOCKFILE),--lockfile "$(LOCKFILE)") \
|
$(if $(LOCKFILE),--lockfile "$(LOCKFILE)") \
|
||||||
$(if $(SCAN),--scan) \
|
$(if $(REPO_PATH),--repo-path "$(REPO_PATH)") \
|
||||||
$(if $(REPO_PATH),--repo-path "$(REPO_PATH)")
|
$(if $(DRY_RUN),--dry-run)
|
||||||
|
|
||||||
|
## Run SBOM capture agent for a repo — generates/updates sbom-tools.yaml.
|
||||||
|
## Usage: make capture-tools REPO=railiance-infra [REPO_PATH=/home/worsch/railiance-infra]
|
||||||
|
## Add DRY_RUN=1 to preview without writing.
|
||||||
|
capture-tools:
|
||||||
|
@test -n "$(REPO)" || (echo "ERROR: REPO is required."; exit 1)
|
||||||
|
uv run python scripts/capture_sbom_tools.py --repo "$(REPO)" \
|
||||||
|
$(if $(REPO_PATH),--repo-path "$(REPO_PATH)") \
|
||||||
|
$(if $(DRY_RUN),--dry-run)
|
||||||
|
|
||||||
## Check a repo for ADR-001 compliance: make validate-adr REPO=/path/to/repo [DOMAIN=custodian]
|
## Check a repo for ADR-001 compliance: make validate-adr REPO=/path/to/repo [DOMAIN=custodian]
|
||||||
validate-adr:
|
validate-adr:
|
||||||
|
|||||||
@@ -15,6 +15,9 @@ class Ecosystem(str, enum.Enum):
|
|||||||
rust = "rust"
|
rust = "rust"
|
||||||
go = "go"
|
go = "go"
|
||||||
java = "java"
|
java = "java"
|
||||||
|
terraform = "terraform"
|
||||||
|
ansible = "ansible"
|
||||||
|
tool = "tool"
|
||||||
other = "other"
|
other = "other"
|
||||||
|
|
||||||
|
|
||||||
|
|||||||
@@ -0,0 +1,30 @@
|
|||||||
|
"""SBOM ecosystem enum expansion: add terraform, ansible, tool
|
||||||
|
|
||||||
|
Revision ID: d6e7f8a9b0c1
|
||||||
|
Revises: c5d6e7f8a9b0
|
||||||
|
Create Date: 2026-03-12 00:00:00.000000
|
||||||
|
"""
|
||||||
|
from typing import Sequence, Union
|
||||||
|
|
||||||
|
from alembic import op
|
||||||
|
|
||||||
|
revision: str = "d6e7f8a9b0c1"
|
||||||
|
down_revision: Union[str, None] = "c5d6e7f8a9b0"
|
||||||
|
branch_labels: Union[str, Sequence[str], None] = None
|
||||||
|
depends_on: Union[str, Sequence[str], None] = None
|
||||||
|
|
||||||
|
|
||||||
|
def upgrade() -> None:
|
||||||
|
# PostgreSQL adds one enum value per ALTER TYPE ... ADD VALUE statement.
|
||||||
|
# Since PG 12 the statement may run inside a transaction block, but the
|
||||||
|
# new values cannot be used until that transaction has committed.
|
||||||
|
op.execute("ALTER TYPE ecosystem ADD VALUE IF NOT EXISTS 'terraform'")
|
||||||
|
op.execute("ALTER TYPE ecosystem ADD VALUE IF NOT EXISTS 'ansible'")
|
||||||
|
op.execute("ALTER TYPE ecosystem ADD VALUE IF NOT EXISTS 'tool'")
|
||||||
|
|
||||||
|
|
||||||
|
def downgrade() -> None:
|
||||||
|
# PostgreSQL does not support removing enum values without recreating the
|
||||||
|
# type. Document the limitation and do nothing — reverting this migration
|
||||||
|
# requires a full type recreation if needed.
|
||||||
|
pass
|
||||||
90
state-hub/prompts/sbom-capture-agent.md
Normal file
@@ -0,0 +1,90 @@
|
|||||||
|
# SBOM Capture Agent Prompt
|
||||||
|
|
||||||
|
**Task:** Generate or update `sbom-tools.yaml` for the repository at `{repo_path}` (slug: `{repo_slug}`).
|
||||||
|
|
||||||
|
This file captures system-level tool dependencies that are not tracked by any package manager lockfile — tools that are installed via provisioning, Homebrew, system packages, or assumed present in the environment.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Instructions
|
||||||
|
|
||||||
|
1. **Read the following files** in `{repo_path}` (read each that exists; skip gracefully if absent):
|
||||||
|
- `CLAUDE.md` — look for stack declarations, tool prerequisites, dev commands
|
||||||
|
- `README.md` / `QUICKSTART.md` — prerequisites sections, tool version requirements
|
||||||
|
- `Makefile` — tool invocations, version variables (e.g. `ANSIBLE_VERSION := 12.3`)
|
||||||
|
- `pyproject.toml` — Python tool dependencies (already covered by uv.lock; note but don't duplicate)
|
||||||
|
- `.tool-versions` — asdf version pins
|
||||||
|
- `.terraform-version` — tfenv pin
|
||||||
|
- `.ansible-version` — if present
|
||||||
|
- `Dockerfile` / `docker-compose.yml` — base image versions, tool installs
|
||||||
|
- `.github/workflows/*.yml` / `.gitlab-ci.yml` — CI tool install steps, version pins
|
||||||
|
- `ansible/requirements.yml` — **already captured by lockfile parser; do NOT include Galaxy collections here**
|
||||||
|
- Any `scripts/setup*.sh`, `scripts/bootstrap*.sh`, or `tools/` directory
|
||||||
|
|
||||||
|
2. **Identify system-level tools only** — tools that:
|
||||||
|
- Are invoked as CLI commands (e.g. `ansible-playbook`, `terraform`, `helm`, `kubectl`, `k3s`, `goss`, `age`, `sops`)
|
||||||
|
- Are NOT installed via `uv`/`pip`/`npm`/`cargo` into a project virtualenv (those are in lockfiles)
|
||||||
|
- Note: `ansible` itself as a CLI tool is a system dep even if `ansible-core` appears in `uv.lock`
|
||||||
|
|
||||||
|
3. **For each tool, determine**:
|
||||||
|
- `name`: canonical tool name (e.g. `ansible`, `terraform`, `helm`, `kubectl`, `k3s`, `goss`, `age`, `sops`, `cloud-init`)
|
||||||
|
- `version`: the pinned or documented version. Use `unknown` only if no evidence found anywhere.
|
||||||
|
- `ecosystem`: one of `python`, `node`, `rust`, `go`, `java`, `terraform`, `ansible`, `tool`, `other`
|
||||||
|
- Use `ansible` for Ansible itself; `terraform` for Terraform itself; `tool` for generic CLI tools
|
||||||
|
- `license_spdx`: the SPDX identifier. Common known licences (use these exact strings):
|
||||||
|
- ansible / ansible-core: `GPL-3.0-only`
|
||||||
|
- terraform ≤ 1.5.7: `MPL-2.0`; terraform ≥ 1.6.0: `BUSL-1.1`
|
||||||
|
- helm: `Apache-2.0`
|
||||||
|
- kubectl: `Apache-2.0`
|
||||||
|
- k3s: `Apache-2.0`
|
||||||
|
- goss: `Apache-2.0`
|
||||||
|
- age: `BSD-3-Clause`
|
||||||
|
- sops: `MPL-2.0`
|
||||||
|
- cloud-init: `Apache-2.0` (or `GPL-3.0-only` for older versions — check)
|
||||||
|
- docker: `Apache-2.0`
|
||||||
|
- If unknown, use `null`
|
||||||
|
- `is_direct`: `true` if this repo directly declares/uses it; `false` if it's a transitive dependency of another tool
|
||||||
|
- `is_dev`: `true` only if the tool is only used for development/testing, not production operation
|
||||||
|
|
||||||
|
4. **Confidence annotation**: Add a `# confidence: high/medium/low` comment after each entry:
|
||||||
|
- `high`: version found explicitly pinned in a file
|
||||||
|
- `medium`: version inferred from context (e.g. "Ansible 12" in README)
|
||||||
|
- `low`: version not found; using `unknown` or a reasonable guess
|
||||||
|
|
||||||
|
5. **Do NOT include**:
|
||||||
|
- Python packages already covered by `uv.lock` or `requirements.txt`
|
||||||
|
- Ansible Galaxy collections (covered by `ansible/requirements.yml`)
|
||||||
|
- Terraform providers (covered by `.terraform.lock.hcl`)
|
||||||
|
- Node packages, Rust crates, etc. (covered by their lockfiles)
|
||||||
|
- Operating system packages unless the repo explicitly declares them
|
||||||
|
|
||||||
|
6. **Output format**: Emit ONLY the YAML block below — no prose, no markdown fences, no explanation. The output must be valid YAML that can be written directly to `sbom-tools.yaml`.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Output format
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
# sbom-tools.yaml — system-level tool dependencies for {repo_slug}
|
||||||
|
# Generated by sbom-capture-agent on {date}
|
||||||
|
# Review each entry before committing. Entries with confidence: low need human verification.
|
||||||
|
tools:
|
||||||
|
- name: example-tool
|
||||||
|
version: "1.2.3" # confidence: high
|
||||||
|
ecosystem: tool
|
||||||
|
license_spdx: Apache-2.0
|
||||||
|
is_direct: true
|
||||||
|
is_dev: false
|
||||||
|
```
|
||||||
|
|
||||||
|
If no system-level tools are found, output:
|
||||||
|
```yaml
|
||||||
|
# sbom-tools.yaml — system-level tool dependencies for {repo_slug}
|
||||||
|
# Generated by sbom-capture-agent on {date}
|
||||||
|
# No system-level tools identified — all dependencies are covered by lockfiles.
|
||||||
|
tools: []
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
Now read `{repo_path}` and produce the `sbom-tools.yaml` content.
|
||||||
187
state-hub/scripts/capture_sbom_tools.py
Normal file
@@ -0,0 +1,187 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""Invoke the SBOM capture agent to generate/update sbom-tools.yaml for a repo.
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
python capture_sbom_tools.py --repo <slug> [--repo-path <path>] [--dry-run]
|
||||||
|
|
||||||
|
The script:
|
||||||
|
1. Resolves repo path from the state-hub API (if --repo-path is not given)
|
||||||
|
2. Loads the agent prompt from prompts/sbom-capture-agent.md
|
||||||
|
3. Substitutes {repo_slug}, {repo_path}, {date} placeholders
|
||||||
|
4. Invokes `claude -p "<prompt>"` non-interactively
|
||||||
|
5. Extracts the YAML block from the response
|
||||||
|
6. Writes (or shows diff of) sbom-tools.yaml in the repo root
|
||||||
|
|
||||||
|
Requirements:
|
||||||
|
- `claude` CLI must be on PATH (Claude Code)
|
||||||
|
- PyYAML must be available in the active venv
|
||||||
|
"""
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import datetime
|
||||||
|
import difflib
|
||||||
|
import json
|
||||||
|
import os
|
||||||
|
import re
|
||||||
|
import subprocess
|
||||||
|
import sys
|
||||||
|
import urllib.error
|
||||||
|
import urllib.request
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
API_BASE = os.environ.get("API_BASE", "http://127.0.0.1:8000").rstrip("/")
|
||||||
|
SCRIPT_DIR = Path(__file__).parent
|
||||||
|
PROMPT_FILE = SCRIPT_DIR.parent / "prompts" / "sbom-capture-agent.md"
|
||||||
|
|
||||||
|
|
||||||
|
def resolve_repo_path(repo_slug: str) -> Path | None:
|
||||||
|
"""Look up the registered path for a repo slug via the state-hub API."""
|
||||||
|
url = f"{API_BASE}/repos/{repo_slug}/"
|
||||||
|
try:
|
||||||
|
with urllib.request.urlopen(url, timeout=10) as resp:
|
||||||
|
data = json.loads(resp.read())
|
||||||
|
path_str = data.get("local_path")
|
||||||
|
if path_str:
|
||||||
|
return Path(path_str)
|
||||||
|
except (urllib.error.URLError, KeyError, ValueError):  # ValueError covers malformed JSON
|
||||||
|
pass
|
||||||
|
return None
|
||||||
|
|
||||||
|
|
||||||
|
def load_prompt(repo_slug: str, repo_path: Path) -> str:
|
||||||
|
if not PROMPT_FILE.exists():
|
||||||
|
print(f"Error: prompt file not found at {PROMPT_FILE}", file=sys.stderr)
|
||||||
|
sys.exit(1)
|
||||||
|
template = PROMPT_FILE.read_text()
|
||||||
|
today = datetime.date.today().isoformat()
|
||||||
|
return (
|
||||||
|
template
|
||||||
|
.replace("{repo_slug}", repo_slug)
|
||||||
|
.replace("{repo_path}", str(repo_path))
|
||||||
|
.replace("{date}", today)
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def invoke_agent(prompt: str) -> str:
|
||||||
|
"""Run `claude -p <prompt>` and return stdout."""
|
||||||
|
try:
|
||||||
|
result = subprocess.run(
|
||||||
|
["claude", "-p", prompt],
|
||||||
|
capture_output=True,
|
||||||
|
text=True,
|
||||||
|
timeout=120,
|
||||||
|
)
|
||||||
|
except FileNotFoundError:
|
||||||
|
print("Error: `claude` CLI not found on PATH. Install Claude Code.", file=sys.stderr)
|
||||||
|
sys.exit(1)
|
||||||
|
except subprocess.TimeoutExpired:
|
||||||
|
print("Error: claude invocation timed out after 120s.", file=sys.stderr)
|
||||||
|
sys.exit(1)
|
||||||
|
|
||||||
|
if result.returncode != 0:
|
||||||
|
print(f"Error: claude exited with code {result.returncode}", file=sys.stderr)
|
||||||
|
if result.stderr:
|
||||||
|
print(result.stderr, file=sys.stderr)
|
||||||
|
sys.exit(1)
|
||||||
|
|
||||||
|
return result.stdout
|
||||||
|
|
||||||
|
|
||||||
|
def extract_yaml(response: str) -> str:
|
||||||
|
"""Extract YAML content from the agent response.
|
||||||
|
|
||||||
|
Accepts:
|
||||||
|
- Raw YAML (starts with # or 'tools:')
|
||||||
|
- YAML wrapped in ```yaml ... ``` fences
|
||||||
|
"""
|
||||||
|
# Try fenced block first
|
||||||
|
m = re.search(r"```(?:yaml)?\s*\n(.*?)```", response, re.DOTALL)
|
||||||
|
if m:
|
||||||
|
return m.group(1).strip()
|
||||||
|
|
||||||
|
# Otherwise treat entire response as YAML
|
||||||
|
stripped = response.strip()
|
||||||
|
if stripped.startswith("#") or stripped.startswith("tools:"):
|
||||||
|
return stripped
|
||||||
|
|
||||||
|
print("Warning: could not extract YAML from agent response.", file=sys.stderr)
|
||||||
|
print("Raw response:", file=sys.stderr)
|
||||||
|
print(response[:500], file=sys.stderr)
|
||||||
|
sys.exit(1)
|
||||||
|
|
||||||
|
|
||||||
|
def show_diff(old: str | None, new: str, target: Path) -> None:
|
||||||
|
if old is None:
|
||||||
|
print(f"[new file] {target}")
|
||||||
|
for line in new.splitlines():
|
||||||
|
print(f" + {line}")
|
||||||
|
else:
|
||||||
|
diff = list(difflib.unified_diff(
|
||||||
|
old.splitlines(keepends=True),
|
||||||
|
new.splitlines(keepends=True),
|
||||||
|
fromfile=f"a/{target.name}",
|
||||||
|
tofile=f"b/{target.name}",
|
||||||
|
))
|
||||||
|
if diff:
|
||||||
|
print("".join(diff))
|
||||||
|
else:
|
||||||
|
print(f"[no changes] {target}")
|
||||||
|
|
||||||
|
|
||||||
|
def main() -> None:
|
||||||
|
parser = argparse.ArgumentParser(
|
||||||
|
description="Generate/update sbom-tools.yaml for a repo using the SBOM capture agent."
|
||||||
|
)
|
||||||
|
parser.add_argument("--repo", required=True, help="Repo slug (e.g. 'railiance-infra')")
|
||||||
|
parser.add_argument("--repo-path", help="Path to repo root (auto-resolved from state-hub if omitted)")
|
||||||
|
parser.add_argument("--dry-run", action="store_true",
|
||||||
|
help="Run the agent and show the diff without writing sbom-tools.yaml")
|
||||||
|
parser.add_argument("--print-prompt", action="store_true",
|
||||||
|
help="Print the rendered prompt and exit (useful for inspection)")
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
# Resolve repo path
|
||||||
|
if args.repo_path:
|
||||||
|
repo_path = Path(args.repo_path).resolve()
|
||||||
|
else:
|
||||||
|
repo_path = resolve_repo_path(args.repo)
|
||||||
|
if repo_path is None:
|
||||||
|
# Fall back to ~/repo_slug convention
|
||||||
|
repo_path = Path.home() / args.repo
|
||||||
|
print(f"Could not resolve path from API; trying {repo_path}", file=sys.stderr)
|
||||||
|
|
||||||
|
if not repo_path.exists():
|
||||||
|
print(f"Error: repo path does not exist: {repo_path}", file=sys.stderr)
|
||||||
|
sys.exit(1)
|
||||||
|
|
||||||
|
target = repo_path / "sbom-tools.yaml"
|
||||||
|
existing_content = target.read_text() if target.exists() else None
|
||||||
|
|
||||||
|
prompt = load_prompt(args.repo, repo_path)
|
||||||
|
|
||||||
|
if args.print_prompt:
|
||||||
|
print(prompt)
|
||||||
|
return
|
||||||
|
|
||||||
|
print(f"Running SBOM capture agent for {args.repo} ({repo_path})…")
|
||||||
|
response = invoke_agent(prompt)
|
||||||
|
yaml_content = extract_yaml(response)
|
||||||
|
|
||||||
|
# Ensure trailing newline
|
||||||
|
if not yaml_content.endswith("\n"):
|
||||||
|
yaml_content += "\n"
|
||||||
|
|
||||||
|
show_diff(existing_content, yaml_content, target)
|
||||||
|
|
||||||
|
if args.dry_run:
|
||||||
|
print("\n[dry-run] sbom-tools.yaml not written.")
|
||||||
|
return
|
||||||
|
|
||||||
|
target.write_text(yaml_content)
|
||||||
|
print(f"\nWritten: {target}")
|
||||||
|
print("Review the file, correct any 'confidence: low' entries, then commit.")
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
||||||
@@ -1,15 +1,19 @@
|
|||||||
#!/usr/bin/env python3
|
#!/usr/bin/env python3
|
||||||
"""Ingest a repo's lockfile into the State Hub SBOM store.
|
"""Ingest a repo's lockfiles and tool manifests into the State Hub SBOM store.
|
||||||
|
|
||||||
Usage:
|
Usage:
|
||||||
python ingest_sbom.py --repo <slug> [--lockfile <path>] [--api-base <url>]
|
python ingest_sbom.py --repo <slug> [--repo-path <path>] [--dry-run]
|
||||||
|
|
||||||
Auto-detects lockfile type:
|
Auto-detects all of the following in one scan:
|
||||||
uv.lock → Python ecosystem
|
uv.lock → python
|
||||||
requirements.txt → Python ecosystem (basic)
|
requirements.txt → python
|
||||||
package-lock.json → Node ecosystem
|
package-lock.json → node
|
||||||
yarn.lock → Node ecosystem
|
yarn.lock → node
|
||||||
Cargo.lock → Rust ecosystem
|
Cargo.lock → rust
|
||||||
|
.terraform.lock.hcl → terraform (anywhere in tree)
|
||||||
|
ansible/requirements.yml → ansible (anywhere under ansible/ dirs)
|
||||||
|
ansible/requirements.yaml → ansible
|
||||||
|
sbom-tools.yaml → tool (repo root; agent-generated)
|
||||||
"""
|
"""
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
|
||||||
@@ -22,11 +26,17 @@ import urllib.error
|
|||||||
import urllib.request
|
import urllib.request
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
|
|
||||||
|
try:
|
||||||
|
import yaml # optional; only needed for sbom-tools.yaml and ansible parsers
|
||||||
|
_YAML_AVAILABLE = True
|
||||||
|
except ImportError:
|
||||||
|
_YAML_AVAILABLE = False
|
||||||
|
|
||||||
API_BASE = os.environ.get("API_BASE", "http://127.0.0.1:8000").rstrip("/")
|
API_BASE = os.environ.get("API_BASE", "http://127.0.0.1:8000").rstrip("/")
|
||||||
|
|
||||||
|
|
||||||
# ---------------------------------------------------------------------------
|
# ---------------------------------------------------------------------------
|
||||||
# Lockfile parsers
|
# Lockfile parsers — each returns list[dict]
|
||||||
# ---------------------------------------------------------------------------
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
def _parse_uv_lock(path: Path) -> list[dict]:
|
def _parse_uv_lock(path: Path) -> list[dict]:
|
||||||
@@ -55,7 +65,7 @@ def _parse_uv_lock(path: Path) -> list[dict]:
|
|||||||
"package_version": e.get("package_version"),
|
"package_version": e.get("package_version"),
|
||||||
"ecosystem": "python",
|
"ecosystem": "python",
|
||||||
"license_spdx": None,
|
"license_spdx": None,
|
||||||
"is_direct": False, # uv.lock doesn't distinguish; treat all as transitive
|
"is_direct": False,
|
||||||
"is_dev": False,
|
"is_dev": False,
|
||||||
}
|
}
|
||||||
for e in entries
|
for e in entries
|
||||||
@@ -70,7 +80,6 @@ def _parse_requirements_txt(path: Path) -> list[dict]:
|
|||||||
line = line.strip()
|
line = line.strip()
|
||||||
if not line or line.startswith("#") or line.startswith("-"):
|
if not line or line.startswith("#") or line.startswith("-"):
|
||||||
continue
|
continue
|
||||||
# Handle: pkg==1.2.3, pkg>=1.2, pkg
|
|
||||||
m = re.match(r"^([A-Za-z0-9_.\-]+)(?:[>=<!~^]+([^\s;]+))?", line)
|
m = re.match(r"^([A-Za-z0-9_.\-]+)(?:[>=<!~^]+([^\s;]+))?", line)
|
||||||
if m:
|
if m:
|
||||||
entries.append({
|
entries.append({
|
||||||
@@ -95,7 +104,7 @@ def _parse_package_lock_json(path: Path) -> list[dict]:
|
|||||||
packages = data.get("packages", {})
|
packages = data.get("packages", {})
|
||||||
entries = []
|
entries = []
|
||||||
for pkg_path, info in packages.items():
|
for pkg_path, info in packages.items():
|
||||||
if not pkg_path: # root package
|
if not pkg_path:
|
||||||
continue
|
continue
|
||||||
name = info.get("name") or pkg_path.split("node_modules/")[-1]
|
name = info.get("name") or pkg_path.split("node_modules/")[-1]
|
||||||
entries.append({
|
entries.append({
|
||||||
@@ -120,8 +129,6 @@ def _parse_yarn_lock(path: Path) -> list[dict]:
|
|||||||
if not stripped or stripped.startswith("#"):
|
if not stripped or stripped.startswith("#"):
|
||||||
continue
|
continue
|
||||||
if not line.startswith(" ") and stripped.endswith(":"):
|
if not line.startswith(" ") and stripped.endswith(":"):
|
||||||
# New package block header: "name@version::" or "\"name@version\":"
|
|
||||||
# May list multiple versions: "name@^1.0, name@~1.0:"
|
|
||||||
current_names = []
|
current_names = []
|
||||||
current_version = None
|
current_version = None
|
||||||
for part in stripped.rstrip(":").split(","):
|
for part in stripped.rstrip(":").split(","):
|
||||||
@@ -188,12 +195,10 @@ def _parse_terraform_lock_hcl(path: Path) -> list[dict]:
|
|||||||
|
|
||||||
for line in path.read_text().splitlines():
|
for line in path.read_text().splitlines():
|
||||||
stripped = line.strip()
|
stripped = line.strip()
|
||||||
# e.g.: provider "registry.terraform.io/hetznercloud/hcloud" {
|
|
||||||
m = re.match(r'^provider\s+"([^"]+)"\s*\{', stripped)
|
m = re.match(r'^provider\s+"([^"]+)"\s*\{', stripped)
|
||||||
if m:
|
if m:
|
||||||
# Use full provider address as package_name, short name as display
|
|
||||||
full = m.group(1)
|
full = m.group(1)
|
||||||
current_name = full # e.g. "registry.terraform.io/hetznercloud/hcloud"
|
current_name = full
|
||||||
current_version = None
|
current_version = None
|
||||||
elif current_name is not None:
|
elif current_name is not None:
|
||||||
vm = re.match(r'version\s*=\s*"([^"]+)"', stripped)
|
vm = re.match(r'version\s*=\s*"([^"]+)"', stripped)
|
||||||
@@ -203,7 +208,7 @@ def _parse_terraform_lock_hcl(path: Path) -> list[dict]:
|
|||||||
entries.append({
|
entries.append({
|
||||||
"package_name": current_name,
|
"package_name": current_name,
|
||||||
"package_version": current_version,
|
"package_version": current_version,
|
||||||
"ecosystem": "other", # "terraform" not yet in ENUM; tracked as other
|
"ecosystem": "terraform",
|
||||||
"license_spdx": None,
|
"license_spdx": None,
|
||||||
"is_direct": True,
|
"is_direct": True,
|
||||||
"is_dev": False,
|
"is_dev": False,
|
||||||
@@ -214,7 +219,114 @@ def _parse_terraform_lock_hcl(path: Path) -> list[dict]:
|
|||||||
return entries
|
return entries
|
||||||
|
|
||||||
|
|
||||||
_LOCKFILE_PARSERS = {
|
def _parse_ansible_requirements(path: Path) -> list[dict]:
    """Parse ansible/requirements.yml — collections and roles from Ansible Galaxy."""
    if not _YAML_AVAILABLE:
        print(f"Warning: PyYAML not available; skipping {path}", file=sys.stderr)
        return []

    try:
        data = yaml.safe_load(path.read_text())
    except yaml.YAMLError as e:
        print(f"Warning: cannot parse {path}: {e}", file=sys.stderr)
        return []

    if not isinstance(data, dict):
        return []

    entries = []

    for item in data.get("collections", []) or []:
        if isinstance(item, str):
            name, version = item, None
        elif isinstance(item, dict):
            name = item.get("name", "")
            version = str(item.get("version", "")) if item.get("version") else None
        else:
            continue
        if name:
            entries.append({
                "package_name": name,
                "package_version": version,
                "ecosystem": "ansible",
                "license_spdx": None,
                "is_direct": True,
                "is_dev": False,
            })

    for item in data.get("roles", []) or []:
        if isinstance(item, str):
            name, version = item, None
        elif isinstance(item, dict):
            name = item.get("name", item.get("src", ""))
            version = str(item.get("version", "")) if item.get("version") else None
        else:
            continue
        if name:
            entries.append({
                "package_name": name,
                "package_version": version,
                "ecosystem": "ansible",
                "license_spdx": None,
                "is_direct": True,
                "is_dev": False,
            })

    return entries


def _parse_sbom_tools_yaml(path: Path) -> list[dict]:
    """Parse sbom-tools.yaml — agent-generated tool manifest at repo root."""
    if not _YAML_AVAILABLE:
        print(f"Warning: PyYAML not available; skipping {path}", file=sys.stderr)
        return []

    try:
        data = yaml.safe_load(path.read_text())
    except yaml.YAMLError as e:
        print(f"Warning: cannot parse {path}: {e}", file=sys.stderr)
        return []

    if not isinstance(data, dict):
        return []

    entries = []
    valid_ecosystems = {
        "python", "node", "rust", "go", "java",
        "terraform", "ansible", "tool", "other",
    }

    for item in data.get("tools", []) or []:
        if not isinstance(item, dict):
            continue
        name = item.get("name", "")
        version = str(item.get("version", "")) if item.get("version") else None
        if version == "unknown":
            print(f" Warning: tool '{name}' has version=unknown — flagged for review", file=sys.stderr)
            version = None
        ecosystem = item.get("ecosystem", "tool")
        if ecosystem not in valid_ecosystems:
            print(f" Warning: unknown ecosystem '{ecosystem}' for '{name}'; using 'tool'", file=sys.stderr)
            ecosystem = "tool"
        license_spdx = item.get("license_spdx") or None
        entries.append({
            "package_name": name,
            "package_version": version,
            "ecosystem": ecosystem,
            "license_spdx": license_spdx,
            "is_direct": bool(item.get("is_direct", True)),
            "is_dev": bool(item.get("is_dev", False)),
        })

    return entries


# ---------------------------------------------------------------------------
# Detection helpers
# ---------------------------------------------------------------------------

# Filename → parser for standard lockfiles (detected by filename anywhere in tree)
_LOCKFILE_PARSERS: dict[str, object] = {
||||||
"uv.lock": _parse_uv_lock,
|
"uv.lock": _parse_uv_lock,
|
||||||
"requirements.txt": _parse_requirements_txt,
|
"requirements.txt": _parse_requirements_txt,
|
||||||
"package-lock.json": _parse_package_lock_json,
|
"package-lock.json": _parse_package_lock_json,
|
||||||
@@ -234,6 +346,47 @@ _SKIP_DIRS = {
|
|||||||
}
|
}
|
||||||
|
|
||||||
|
|
def detect_all(repo_path: Path) -> list[tuple[Path, str, object]]:
    """Scan repo_path and return all discovered dependency sources.

    Returns list of (path, label, parser_fn) tuples covering:
    - Standard lockfiles (anywhere in tree)
    - Ansible requirements files (in ansible/ subdirs)
    - sbom-tools.yaml at repo root
    """
    found: list[tuple[Path, str, object]] = []
    seen_paths: set[Path] = set()

    # Walk tree for all source types
    for dirpath, dirnames, filenames in os.walk(repo_path):
        dirnames[:] = sorted(d for d in dirnames if d not in _SKIP_DIRS)
        dirpath_p = Path(dirpath)

        # Standard lockfiles
        for fname, parser in _LOCKFILE_PARSERS.items():
            if fname in filenames:
                p = dirpath_p / fname
                if p not in seen_paths:
                    found.append((p, fname, parser))
                    seen_paths.add(p)

        # Ansible requirements files — only under directories named "ansible"
        if dirpath_p.name == "ansible":
            for fname in ("requirements.yml", "requirements.yaml"):
                if fname in filenames:
                    p = dirpath_p / fname
                    if p not in seen_paths:
                        found.append((p, f"ansible/{fname}", _parse_ansible_requirements))
                        seen_paths.add(p)

    # sbom-tools.yaml at repo root only
    tools_manifest = repo_path / "sbom-tools.yaml"
    if tools_manifest.exists() and tools_manifest not in seen_paths:
        found.append((tools_manifest, "sbom-tools.yaml", _parse_sbom_tools_yaml))

    return found
||||||
def detect_lockfile(repo_path: Path) -> tuple[Path, str] | None:
|
def detect_lockfile(repo_path: Path) -> tuple[Path, str] | None:
|
||||||
"""Return (lockfile_path, filename) for the first recognised lockfile at repo root."""
|
"""Return (lockfile_path, filename) for the first recognised lockfile at repo root."""
|
||||||
for name in _LOCKFILE_PARSERS:
|
for name in _LOCKFILE_PARSERS:
|
||||||
@@ -244,7 +397,10 @@ def detect_lockfile(repo_path: Path) -> tuple[Path, str] | None:
|
|||||||
|
|
||||||
|
|
||||||
def detect_lockfiles_recursive(repo_path: Path) -> list[Path]:
|
def detect_lockfiles_recursive(repo_path: Path) -> list[Path]:
|
||||||
"""Walk repo_path and return all recognised lockfiles, skipping non-dep dirs."""
|
"""Walk repo_path and return all recognised lockfiles, skipping non-dep dirs.
|
||||||
|
|
||||||
|
Kept for backwards compatibility; prefer detect_all() for new code.
|
||||||
|
"""
|
||||||
found: list[Path] = []
|
found: list[Path] = []
|
||||||
for dirpath, dirnames, filenames in os.walk(repo_path):
|
for dirpath, dirnames, filenames in os.walk(repo_path):
|
||||||
dirnames[:] = sorted(d for d in dirnames if d not in _SKIP_DIRS)
|
dirnames[:] = sorted(d for d in dirnames if d not in _SKIP_DIRS)
|
||||||
@@ -292,52 +448,47 @@ def post_ingest(api_base: str, repo_slug: str, entries: list[dict]) -> dict:
|
|||||||
# ---------------------------------------------------------------------------
|
# ---------------------------------------------------------------------------
|
||||||
|
|
||||||
def main() -> None:
|
def main() -> None:
|
||||||
parser = argparse.ArgumentParser(description="Ingest a repo's lockfiles into the State Hub SBOM store.")
|
parser = argparse.ArgumentParser(
|
||||||
|
description="Ingest a repo's lockfiles and tool manifests into the State Hub SBOM store."
|
||||||
|
)
|
||||||
parser.add_argument("--repo", required=True, help="Managed-repo slug (e.g. 'the-custodian')")
|
parser.add_argument("--repo", required=True, help="Managed-repo slug (e.g. 'the-custodian')")
|
||||||
parser.add_argument("--lockfile", action="append", dest="lockfiles",
|
parser.add_argument("--lockfile", action="append", dest="lockfiles",
|
||||||
metavar="PATH", help="Path to a specific lockfile (repeatable)")
|
metavar="PATH", help="Path to a specific lockfile (repeatable)")
|
||||||
parser.add_argument("--repo-path", default=".", help="Repo root for auto-detection/scan (default: cwd)")
|
parser.add_argument("--repo-path", default=".", help="Repo root for auto-detection/scan (default: cwd)")
|
||||||
parser.add_argument("--scan", action="store_true",
|
parser.add_argument("--scan", action="store_true",
|
||||||
help="Recursively find ALL lockfiles under --repo-path (handles multi-ecosystem repos)")
|
help="Recursively find ALL lockfiles under --repo-path (deprecated; now default behaviour)")
|
||||||
parser.add_argument("--api-base", default=API_BASE, help="State Hub API base URL")
|
parser.add_argument("--api-base", default=API_BASE, help="State Hub API base URL")
|
||||||
parser.add_argument("--dry-run", action="store_true", help="Parse only — do not submit")
|
parser.add_argument("--dry-run", action="store_true", help="Parse only — do not submit")
|
||||||
args = parser.parse_args()
|
args = parser.parse_args()
|
||||||
|
|
||||||
repo_root = Path(args.repo_path).resolve()
|
repo_root = Path(args.repo_path).resolve()
|
||||||
lockfile_paths: list[Path] = []
|
all_entries: list[dict] = []
|
||||||
|
|
||||||
if args.lockfiles:
|
if args.lockfiles:
|
||||||
lockfile_paths = [Path(lf).resolve() for lf in args.lockfiles]
|
# Explicit paths: parse each, detect parser by filename
|
||||||
elif args.scan:
|
for lf_str in args.lockfiles:
|
||||||
lockfile_paths = detect_lockfiles_recursive(repo_root)
|
lf = Path(lf_str).resolve()
|
||||||
if not lockfile_paths:
|
parsed = parse_lockfile(lf)
|
||||||
print(f"No lockfiles found under '{repo_root}'.", file=sys.stderr)
|
rel = lf.relative_to(repo_root) if lf.is_relative_to(repo_root) else lf
|
||||||
sys.exit(1)
|
print(f" {rel}: {len(parsed)} packages")
|
||||||
print(f"Scan found {len(lockfile_paths)} lockfile(s):")
|
all_entries.extend(parsed)
|
||||||
for lf in lockfile_paths:
|
|
||||||
print(f" {lf.relative_to(repo_root) if lf.is_relative_to(repo_root) else lf}")
|
|
||||||
else:
|
else:
|
||||||
found = detect_lockfile(repo_root)
|
# Comprehensive auto-detection: all mechanisms in one scan
|
||||||
if not found:
|
sources = detect_all(repo_root)
|
||||||
|
if not sources:
|
||||||
print(
|
print(
|
||||||
f"No recognised lockfile found in '{repo_root}'. "
|
f"No recognised dependency sources found in '{repo_root}'.",
|
||||||
f"Supported: {', '.join(_LOCKFILE_PARSERS)}. "
|
|
||||||
"Use --scan to search subdirectories.",
|
|
||||||
file=sys.stderr,
|
file=sys.stderr,
|
||||||
)
|
)
|
||||||
sys.exit(1)
|
sys.exit(1)
|
||||||
lockfile_path, _ = found
|
|
||||||
print(f"Auto-detected: {lockfile_path}")
|
|
||||||
lockfile_paths = [lockfile_path]
|
|
||||||
|
|
||||||
all_entries: list[dict] = []
|
for src_path, label, parser_fn in sources:
|
||||||
for lf in lockfile_paths:
|
parsed = parser_fn(src_path)
|
||||||
parsed = parse_lockfile(lf)
|
rel = src_path.relative_to(repo_root) if src_path.is_relative_to(repo_root) else src_path
|
||||||
rel = lf.relative_to(repo_root) if lf.is_relative_to(repo_root) else lf
|
print(f" {label} ({rel}): {len(parsed)} entries")
|
||||||
print(f" {rel}: {len(parsed)} packages")
|
all_entries.extend(parsed)
|
||||||
all_entries.extend(parsed)
|
|
||||||
|
|
||||||
print(f"Total: {len(all_entries)} packages across {len(lockfile_paths)} lockfile(s)")
|
print(f"Total: {len(all_entries)} entries")
|
||||||
|
|
||||||
if args.dry_run:
|
if args.dry_run:
|
||||||
print(json.dumps(all_entries[:5], indent=2))
|
print(json.dumps(all_entries[:5], indent=2))
|
||||||
|
|||||||
397
state-hub/tests/test_ingest_sbom.py
Normal file
@@ -0,0 +1,397 @@
"""Unit tests for ingest_sbom.py parsers and auto-detection."""
from __future__ import annotations

import json
import sys
import textwrap
from pathlib import Path

import pytest

# Make scripts/ importable
sys.path.insert(0, str(Path(__file__).parent.parent / "scripts"))
import ingest_sbom as sb


# ---------------------------------------------------------------------------
# Terraform parser
# ---------------------------------------------------------------------------

TERRAFORM_LOCK = textwrap.dedent("""\
    provider "registry.terraform.io/hashicorp/template" {
      version = "2.2.0"
      constraints = ">= 2.0.0"
      hashes = [
        "h1:abc123",
      ]
    }

    provider "registry.terraform.io/hetznercloud/hcloud" {
      version = "1.52.0"
      constraints = ">= 1.40.0"
    }
""")


def test_terraform_parser_ecosystem(tmp_path):
    lock = tmp_path / ".terraform.lock.hcl"
    lock.write_text(TERRAFORM_LOCK)
    entries = sb._parse_terraform_lock_hcl(lock)
    assert len(entries) == 2
    for e in entries:
        assert e["ecosystem"] == "terraform", f"expected terraform, got {e['ecosystem']}"
    names = {e["package_name"] for e in entries}
    assert "registry.terraform.io/hashicorp/template" in names
    assert "registry.terraform.io/hetznercloud/hcloud" in names


def test_terraform_parser_versions(tmp_path):
    lock = tmp_path / ".terraform.lock.hcl"
    lock.write_text(TERRAFORM_LOCK)
    entries = sb._parse_terraform_lock_hcl(lock)
    by_name = {e["package_name"]: e for e in entries}
    assert by_name["registry.terraform.io/hashicorp/template"]["package_version"] == "2.2.0"
    assert by_name["registry.terraform.io/hetznercloud/hcloud"]["package_version"] == "1.52.0"


def test_terraform_parser_is_direct(tmp_path):
    lock = tmp_path / ".terraform.lock.hcl"
    lock.write_text(TERRAFORM_LOCK)
    entries = sb._parse_terraform_lock_hcl(lock)
    assert all(e["is_direct"] for e in entries)


def test_terraform_parser_empty(tmp_path):
    lock = tmp_path / ".terraform.lock.hcl"
    lock.write_text("# no providers\n")
    entries = sb._parse_terraform_lock_hcl(lock)
    assert entries == []
# ---------------------------------------------------------------------------
# Ansible Galaxy parser
# ---------------------------------------------------------------------------

ANSIBLE_REQUIREMENTS_FULL = textwrap.dedent("""\
    collections:
      - name: community.general
        version: "9.5.0"
      - name: ansible.posix
        version: "1.6.0"
      - community.crypto

    roles:
      - name: geerlingguy.docker
        version: "6.1.0"
      - geerlingguy.pip
""")

ANSIBLE_REQUIREMENTS_EMPTY = textwrap.dedent("""\
    collections: []
    roles: []
""")

ANSIBLE_REQUIREMENTS_COLLECTIONS_ONLY = textwrap.dedent("""\
    collections:
      - name: community.general
        version: "9.0.0"
""")


def test_ansible_parser_collections_and_roles(tmp_path):
    req = tmp_path / "requirements.yml"
    req.write_text(ANSIBLE_REQUIREMENTS_FULL)
    entries = sb._parse_ansible_requirements(req)
    assert len(entries) == 5
    names = {e["package_name"] for e in entries}
    assert "community.general" in names
    assert "ansible.posix" in names
    assert "community.crypto" in names
    assert "geerlingguy.docker" in names
    assert "geerlingguy.pip" in names


def test_ansible_parser_ecosystem(tmp_path):
    req = tmp_path / "requirements.yml"
    req.write_text(ANSIBLE_REQUIREMENTS_FULL)
    entries = sb._parse_ansible_requirements(req)
    for e in entries:
        assert e["ecosystem"] == "ansible"


def test_ansible_parser_versions(tmp_path):
    req = tmp_path / "requirements.yml"
    req.write_text(ANSIBLE_REQUIREMENTS_FULL)
    entries = sb._parse_ansible_requirements(req)
    by_name = {e["package_name"]: e for e in entries}
    assert by_name["community.general"]["package_version"] == "9.5.0"
    assert by_name["ansible.posix"]["package_version"] == "1.6.0"
    assert by_name["community.crypto"]["package_version"] is None  # no version specified
    assert by_name["geerlingguy.docker"]["package_version"] == "6.1.0"
    assert by_name["geerlingguy.pip"]["package_version"] is None


def test_ansible_parser_is_direct(tmp_path):
    req = tmp_path / "requirements.yml"
    req.write_text(ANSIBLE_REQUIREMENTS_FULL)
    entries = sb._parse_ansible_requirements(req)
    assert all(e["is_direct"] for e in entries)


def test_ansible_parser_empty(tmp_path):
    req = tmp_path / "requirements.yml"
    req.write_text(ANSIBLE_REQUIREMENTS_EMPTY)
    entries = sb._parse_ansible_requirements(req)
    assert entries == []


def test_ansible_parser_collections_only(tmp_path):
    req = tmp_path / "requirements.yml"
    req.write_text(ANSIBLE_REQUIREMENTS_COLLECTIONS_ONLY)
    entries = sb._parse_ansible_requirements(req)
    assert len(entries) == 1
    assert entries[0]["package_name"] == "community.general"


def test_ansible_parser_yaml_extension(tmp_path):
    """Both .yml and .yaml extensions must work."""
    req = tmp_path / "requirements.yaml"
    req.write_text(ANSIBLE_REQUIREMENTS_COLLECTIONS_ONLY)
    entries = sb._parse_ansible_requirements(req)
    assert len(entries) == 1


def test_ansible_parser_invalid_yaml(tmp_path, capsys):
    req = tmp_path / "requirements.yml"
    req.write_text("collections: [unclosed")
    entries = sb._parse_ansible_requirements(req)
    assert entries == []
    captured = capsys.readouterr()
    assert "Warning" in captured.err
# ---------------------------------------------------------------------------
# sbom-tools.yaml parser
# ---------------------------------------------------------------------------

SBOM_TOOLS_YAML = textwrap.dedent("""\
    tools:
      - name: ansible
        version: "12.3.0"
        ecosystem: ansible
        license_spdx: GPL-3.0-only
        is_direct: true
        is_dev: false
      - name: terraform
        version: "1.10.5"
        ecosystem: terraform
        license_spdx: BSL-1.1
        is_direct: true
        is_dev: false
      - name: helm
        version: "3.17.1"
        ecosystem: tool
        license_spdx: Apache-2.0
        is_direct: true
        is_dev: false
      - name: k3s
        version: unknown
        ecosystem: other
        license_spdx: Apache-2.0
        is_direct: true
        is_dev: false
""")

SBOM_TOOLS_YAML_MINIMAL = textwrap.dedent("""\
    tools:
      - name: kubectl
        ecosystem: tool
""")


def test_sbom_tools_parser_basic(tmp_path):
    manifest = tmp_path / "sbom-tools.yaml"
    manifest.write_text(SBOM_TOOLS_YAML)
    entries = sb._parse_sbom_tools_yaml(manifest)
    assert len(entries) == 4
    names = {e["package_name"] for e in entries}
    assert {"ansible", "terraform", "helm", "k3s"} == names


def test_sbom_tools_parser_ecosystems(tmp_path):
    manifest = tmp_path / "sbom-tools.yaml"
    manifest.write_text(SBOM_TOOLS_YAML)
    entries = sb._parse_sbom_tools_yaml(manifest)
    by_name = {e["package_name"]: e for e in entries}
    assert by_name["ansible"]["ecosystem"] == "ansible"
    assert by_name["terraform"]["ecosystem"] == "terraform"
    assert by_name["helm"]["ecosystem"] == "tool"
    assert by_name["k3s"]["ecosystem"] == "other"


def test_sbom_tools_parser_licenses(tmp_path):
    manifest = tmp_path / "sbom-tools.yaml"
    manifest.write_text(SBOM_TOOLS_YAML)
    entries = sb._parse_sbom_tools_yaml(manifest)
    by_name = {e["package_name"]: e for e in entries}
    assert by_name["ansible"]["license_spdx"] == "GPL-3.0-only"
    assert by_name["terraform"]["license_spdx"] == "BSL-1.1"
    assert by_name["helm"]["license_spdx"] == "Apache-2.0"


def test_sbom_tools_parser_unknown_version_becomes_none(tmp_path, capsys):
    """version: unknown must be converted to None and emit a warning."""
    manifest = tmp_path / "sbom-tools.yaml"
    manifest.write_text(SBOM_TOOLS_YAML)
    entries = sb._parse_sbom_tools_yaml(manifest)
    by_name = {e["package_name"]: e for e in entries}
    assert by_name["k3s"]["package_version"] is None
    captured = capsys.readouterr()
    assert "unknown" in captured.err


def test_sbom_tools_parser_minimal_entry(tmp_path):
    """Only 'name' and 'ecosystem' required; version and license default to None."""
    manifest = tmp_path / "sbom-tools.yaml"
    manifest.write_text(SBOM_TOOLS_YAML_MINIMAL)
    entries = sb._parse_sbom_tools_yaml(manifest)
    assert len(entries) == 1
    e = entries[0]
    assert e["package_name"] == "kubectl"
    assert e["ecosystem"] == "tool"
    assert e["package_version"] is None
    assert e["license_spdx"] is None
    assert e["is_direct"] is True
    assert e["is_dev"] is False


def test_sbom_tools_parser_invalid_ecosystem_falls_back(tmp_path, capsys):
    manifest = tmp_path / "sbom-tools.yaml"
    manifest.write_text("tools:\n  - name: foo\n    ecosystem: nonsense\n")
    entries = sb._parse_sbom_tools_yaml(manifest)
    assert entries[0]["ecosystem"] == "tool"
    captured = capsys.readouterr()
    assert "Warning" in captured.err


def test_sbom_tools_parser_empty_tools(tmp_path):
    manifest = tmp_path / "sbom-tools.yaml"
    manifest.write_text("tools: []\n")
    entries = sb._parse_sbom_tools_yaml(manifest)
    assert entries == []


def test_sbom_tools_parser_invalid_yaml(tmp_path, capsys):
    manifest = tmp_path / "sbom-tools.yaml"
    manifest.write_text("tools: {bad yaml: [unclosed")
    entries = sb._parse_sbom_tools_yaml(manifest)
    assert entries == []
    captured = capsys.readouterr()
    assert "Warning" in captured.err
# ---------------------------------------------------------------------------
# detect_all — comprehensive multi-parser scan
# ---------------------------------------------------------------------------

def test_detect_all_uv_lock(tmp_path):
    (tmp_path / "uv.lock").write_text("[[package]]\nname = \"typer\"\nversion = \"0.12.0\"\n")
    sources = sb.detect_all(tmp_path)
    labels = {label for _, label, _ in sources}
    assert "uv.lock" in labels


def test_detect_all_terraform_lock(tmp_path):
    tf_dir = tmp_path / "terraform" / "hetzner"
    tf_dir.mkdir(parents=True)
    (tf_dir / ".terraform.lock.hcl").write_text(
        'provider "registry.terraform.io/hetznercloud/hcloud" {\n  version = "1.52.0"\n}\n'
    )
    sources = sb.detect_all(tmp_path)
    labels = {label for _, label, _ in sources}
    assert ".terraform.lock.hcl" in labels


def test_detect_all_ansible_requirements(tmp_path):
    ansible_dir = tmp_path / "ansible"
    ansible_dir.mkdir()
    (ansible_dir / "requirements.yml").write_text("collections:\n  - name: community.general\n")
    sources = sb.detect_all(tmp_path)
    labels = {label for _, label, _ in sources}
    assert "ansible/requirements.yml" in labels


def test_detect_all_sbom_tools_yaml(tmp_path):
    (tmp_path / "sbom-tools.yaml").write_text("tools:\n  - name: helm\n    ecosystem: tool\n")
    sources = sb.detect_all(tmp_path)
    labels = {label for _, label, _ in sources}
    assert "sbom-tools.yaml" in labels


def test_detect_all_multi_ecosystem(tmp_path):
    """A repo with Python + Terraform + Ansible + tools manifest yields all four."""
    # Python
    (tmp_path / "uv.lock").write_text("[[package]]\nname = \"typer\"\nversion = \"0.12.0\"\n")
    # Terraform
    tf_dir = tmp_path / "terraform"
    tf_dir.mkdir()
    (tf_dir / ".terraform.lock.hcl").write_text(
        'provider "registry.terraform.io/hashicorp/null" {\n  version = "3.2.3"\n}\n'
    )
    # Ansible
    ansible_dir = tmp_path / "ansible"
    ansible_dir.mkdir()
    (ansible_dir / "requirements.yml").write_text("collections:\n  - name: ansible.posix\n    version: \"1.6.0\"\n")
    # Tool manifest
    (tmp_path / "sbom-tools.yaml").write_text("tools:\n  - name: helm\n    ecosystem: tool\n    version: \"3.17.1\"\n")

    sources = sb.detect_all(tmp_path)
    labels = {label for _, label, _ in sources}
    assert "uv.lock" in labels
    assert ".terraform.lock.hcl" in labels
    assert "ansible/requirements.yml" in labels
    assert "sbom-tools.yaml" in labels

    # Parse all and verify merged entries
    all_entries = []
    for path, label, parser_fn in sources:
        all_entries.extend(parser_fn(path))

    ecosystems = {e["ecosystem"] for e in all_entries}
    assert "python" in ecosystems
    assert "terraform" in ecosystems
    assert "ansible" in ecosystems
    assert "tool" in ecosystems


def test_detect_all_skips_venv(tmp_path):
    """Lockfiles inside .venv must be ignored."""
    venv_dir = tmp_path / ".venv" / "lib"
    venv_dir.mkdir(parents=True)
    (venv_dir / "requirements.txt").write_text("requests==2.31.0\n")
    sources = sb.detect_all(tmp_path)
    paths = {str(p) for p, _, _ in sources}
    assert not any(".venv" in p for p in paths)


def test_detect_all_ansible_req_only_in_ansible_dir(tmp_path):
    """requirements.yml at repo root (not in ansible/) should not be picked up as ansible."""
    (tmp_path / "requirements.yml").write_text("collections:\n  - name: community.general\n")
    sources = sb.detect_all(tmp_path)
    labels = {label for _, label, _ in sources}
    # Should NOT be detected since it's not under an 'ansible/' directory
    assert "ansible/requirements.yml" not in labels
    assert "ansible/requirements.yaml" not in labels


def test_detect_all_no_duplicates(tmp_path):
    """Same file should not appear twice."""
    (tmp_path / "uv.lock").write_text("[[package]]\nname = \"x\"\nversion = \"1.0\"\n")
    sources = sb.detect_all(tmp_path)
    paths = [p for p, _, _ in sources]
    assert len(paths) == len(set(paths))


def test_detect_all_empty_repo(tmp_path):
    sources = sb.detect_all(tmp_path)
    assert sources == []
386
workplans/CUST-WP-0013-sbom-infra-expansion.md
Normal file
@@ -0,0 +1,386 @@
---
id: CUST-WP-0013
type: workplan
title: "SBOM Infrastructure Expansion"
domain: custodian
repo: the-custodian
status: completed
owner: custodian
topic_slug: custodian
state_hub_workstream_id: f4ba84c8-4d47-492d-b65e-73b157271a2b
created: "2026-03-12"
updated: "2026-03-12"
---

# CUST-WP-0013 — SBOM Infrastructure Expansion

**Scope:** Extend SBOM capture beyond Python packages to cover Terraform providers,
Ansible Galaxy collections, and system-level tools (Ansible, Terraform, Helm, k3s,
cloud-init, etc.). Introduces an agent-assisted tool manifest capture workflow,
new ecosystem enum values, comprehensive auto-detection in `ingest_sbom.py`, and
delivers full SBOM coverage for `railiance-infra` and `railiance-cluster`.

**Drives:** Licence risk visibility across the full dependency graph, not just
language-level packages.

---

## Design Decisions

### Tool manifest: agent-generated, not hand-written

System tools (Ansible, Terraform, Helm, k3s, etc.) live outside any lockfile —
they are provisioned, not installed by a package manager. Rather than asking
operators to maintain a hand-written manifest, the SBOM capture agent inspects
the repo and generates/updates `sbom-tools.yaml` automatically.

The agent prompt (`state-hub/prompts/sbom-capture-agent.md`) is parameterised
per repo. It reads the repo's CLAUDE.md, Makefile, README, CI configs, version
pins, and provisioning files, then emits a structured `sbom-tools.yaml` with
tool name, version, ecosystem, SPDX licence, and directness flags.
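To make the manifest fields concrete, here is a minimal sketch of how one tool entry might be normalised once the YAML has been loaded into a dict. The field names and fallback rules follow `_parse_sbom_tools_yaml` in this commit; the helper name `normalise_tool_entry` and the sample values are illustrative assumptions, not code from the repository.

```python
from __future__ import annotations

# Ecosystem values accepted by the parser in this commit.
VALID_ECOSYSTEMS = {
    "python", "node", "rust", "go", "java",
    "terraform", "ansible", "tool", "other",
}


def normalise_tool_entry(item: dict) -> dict:
    """Map one already-loaded manifest entry to an SBOM row (hypothetical helper)."""
    raw_version = item.get("version")
    version = str(raw_version) if raw_version else None
    if version == "unknown":
        # The agent could not pin a version; store None so it can be reviewed.
        version = None

    ecosystem = item.get("ecosystem", "tool")
    if ecosystem not in VALID_ECOSYSTEMS:
        ecosystem = "tool"  # unrecognised values fall back to the catch-all

    return {
        "package_name": item.get("name", ""),
        "package_version": version,
        "ecosystem": ecosystem,
        "license_spdx": item.get("license_spdx") or None,
        "is_direct": bool(item.get("is_direct", True)),
        "is_dev": bool(item.get("is_dev", False)),
    }


# Sample entry as it would look after the YAML has been parsed (made-up values).
helm = {"name": "helm", "version": "3.17.1", "ecosystem": "tool", "license_spdx": "Apache-2.0"}
row = normalise_tool_entry(helm)
```

Only `name` and `ecosystem` need to be present in practice; every other field has a default, which keeps agent output with gaps ingestible while still flagging `version: unknown` for review.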
|
||||||
|
|
||||||
|
A thin wrapper script (`state-hub/scripts/capture_sbom_tools.py`) invokes the
|
||||||
|
agent prompt via `claude -p` (or prints it for manual use) and writes the result
|
||||||
|
to `<repo-root>/sbom-tools.yaml`.
|
||||||
|
|
||||||
|
### Comprehensive ingest: all mechanisms per repo
|
||||||
|
|
||||||
|
`make ingest-sbom REPO=<slug>` must run all applicable parsers, not just
|
||||||
|
whichever lockfile happens to be auto-detected first. The updated auto-detection
|
||||||
|
in `ingest_sbom.py` scans:
|
||||||
|
|
||||||
|
1. Package manager lockfiles (`uv.lock`, `requirements.txt`, `package-lock.json`,
|
||||||
|
`yarn.lock`, `Cargo.lock`, `go.sum`)
|
||||||
|
2. Terraform provider locks (`.terraform.lock.hcl`, anywhere in the tree)
|
||||||
|
3. Ansible Galaxy manifests (`requirements.yml` / `requirements.yaml` inside any
   `ansible/` directory in the tree)
|
||||||
|
4. Agent-generated tool manifest (`sbom-tools.yaml` at repo root)
|
||||||
|
|
||||||
|
All parsers run and their results are merged into a single snapshot.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 1 — Schema: Ecosystem Enum Extension
|
||||||
|
|
||||||
|
**Acceptance:** `terraform`, `ansible`, and `tool` are valid ecosystem values; existing
|
||||||
|
`other` entries are unaffected; migration applies cleanly.
|
||||||
|
|
||||||
|
### T01 — Alembic migration: add terraform and ansible enum values
|
||||||
|
|
||||||
|
```task
|
||||||
|
id: CUST-WP-0013-T01
|
||||||
|
state_hub_task_id: c0b6edc4-86ab-4cee-88a8-6c66fb81adee
|
||||||
|
status: done
|
||||||
|
priority: high
|
||||||
|
```
|
||||||
|
|
||||||
|
Add `terraform` and `ansible` to the `Ecosystem` enum in the DB. Check whether
|
||||||
|
the column uses a native PostgreSQL ENUM type (requiring `ALTER TYPE`) or a
|
||||||
|
`String` column (requiring no migration). Write the migration accordingly.
|
||||||
|
Also add `tool` as a catch-all for tool-manifest entries that don't fit a
|
||||||
|
named ecosystem.
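
For the native-ENUM case, the migration boils down to one `ALTER TYPE` statement per new value. The sketch below only builds those statements; the type name `ecosystem` is an assumption (the real name depends on how the column was declared), and the Alembic wiring is shown as a comment rather than executed.

```python
# Hypothetical PostgreSQL type name; confirm the real one before using this.
ENUM_TYPE = "ecosystem"
NEW_VALUES = ("terraform", "ansible", "tool")


def upgrade_statements(type_name: str = ENUM_TYPE) -> list[str]:
    """Build one idempotent ALTER TYPE statement per new enum value."""
    return [
        f"ALTER TYPE {type_name} ADD VALUE IF NOT EXISTS '{value}'"
        for value in NEW_VALUES
    ]


# Inside an Alembic migration, upgrade() could run each statement like this:
#
#     with op.get_context().autocommit_block():
#         for stmt in upgrade_statements():
#             op.execute(stmt)
#
# The autocommit block matters on PostgreSQL versions that refuse
# ALTER TYPE ... ADD VALUE inside a transaction block.

if __name__ == "__main__":
    print("\n".join(upgrade_statements()))
```

PostgreSQL has no direct way to drop a single enum value, so a downgrade for this kind of migration is usually left as a no-op.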
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 2 — Parser Improvements in ingest_sbom.py
|
||||||
|
|
||||||
|
**Acceptance:** `--dry-run` on railiance-infra shows terraform providers and
|
||||||
|
ansible collections correctly labelled; tool manifest entries appear with the
|
||||||
|
declared ecosystem.
|
||||||
|
|
||||||
|
### T02 — Promote Terraform parser: other → terraform ecosystem
|
||||||
|
|
||||||
|
```task
|
||||||
|
id: CUST-WP-0013-T02
|
||||||
|
state_hub_task_id: 7686bccd-022c-4e30-8081-c8487eb82253
|
||||||
|
status: done
|
||||||
|
priority: high
|
||||||
|
```
|
||||||
|
|
||||||
|
The `.terraform.lock.hcl` parser already exists in `ingest_sbom.py` but stores
|
||||||
|
entries as `ecosystem="other"`. Change to `ecosystem="terraform"` after T01
|
||||||
|
migration lands. Re-ingest any repos that previously ingested terraform entries
|
||||||
|
as `other` to correct the label.
|
||||||
|
|
||||||
|
### T03 — Implement Ansible Galaxy requirements.yml parser
|
||||||
|
|
||||||
|
```task
|
||||||
|
id: CUST-WP-0013-T03
|
||||||
|
state_hub_task_id: 48658bdd-4d16-4be0-a87e-45df4f4901b0
|
||||||
|
status: done
|
||||||
|
priority: high
|
||||||
|
```
|
||||||
|
|
||||||
|
Parse `requirements.yml` / `requirements.yaml` files found in `ansible/`
|
||||||
|
subdirectories. Standard format:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
collections:
|
||||||
|
- name: community.general
|
||||||
|
version: "9.5.0"
|
||||||
|
roles:
|
||||||
|
- name: geerlingguy.docker
|
||||||
|
version: "6.x"
|
||||||
|
```
|
||||||
|
|
||||||
|
Store as `ecosystem="ansible"`, `is_direct=True`. Licence left `null` (Galaxy
|
||||||
|
API lookup is deferred). Handle both `collections:` and `roles:` blocks.
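
A minimal sketch of the flattening step, assuming the file has already been loaded into Python objects (for example with `yaml.safe_load`). The function name and the output field names other than `ecosystem` and `is_direct` are illustrative, not taken from `ingest_sbom.py`.

```python
from typing import Any


def parse_galaxy_requirements(doc: Any) -> list[dict]:
    """Flatten the `collections:` and `roles:` blocks of a parsed requirements file."""
    if not isinstance(doc, dict):
        return []
    entries: list[dict] = []
    for kind in ("collections", "roles"):
        for item in doc.get(kind) or []:
            if isinstance(item, str):
                # Bare-string entry: a name with no version pin.
                name, version = item, None
            elif isinstance(item, dict):
                name, version = item.get("name"), item.get("version")
            else:
                continue
            if not name:
                continue
            entries.append({
                "name": name,
                "version": str(version) if version is not None else None,
                "ecosystem": "ansible",
                "is_direct": True,
                "license_spdx": None,  # Galaxy API lookup is deferred
            })
    return entries


if __name__ == "__main__":
    sample = {
        "collections": [{"name": "community.general", "version": "9.5.0"}],
        "roles": [{"name": "geerlingguy.docker", "version": "6.x"}],
    }
    for entry in parse_galaxy_requirements(sample):
        print(entry["name"], entry["version"])
```

Unpinned entries keep `version` as `None` here; whether the real parser stores them that way or under a placeholder is not stated in this workplan.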
|
||||||
|
|
||||||
|
### T04 — Implement sbom-tools.yaml manifest parser
|
||||||
|
|
||||||
|
```task
|
||||||
|
id: CUST-WP-0013-T04
|
||||||
|
state_hub_task_id: 4522ea04-134b-40ee-a7a2-ea0e4c1c061d
|
||||||
|
status: done
|
||||||
|
priority: high
|
||||||
|
```
|
||||||
|
|
||||||
|
Parse `sbom-tools.yaml` at the repo root (written by the capture agent). Schema:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
# Generated by sbom-capture-agent — review before committing
|
||||||
|
tools:
|
||||||
|
- name: ansible
|
||||||
|
version: "12.3.0"
|
||||||
|
ecosystem: ansible # or: terraform, other, python, etc.
|
||||||
|
license_spdx: GPL-3.0-only
|
||||||
|
is_direct: true
|
||||||
|
is_dev: false
|
||||||
|
- name: helm
|
||||||
|
version: "3.17.x"
|
||||||
|
ecosystem: other
|
||||||
|
license_spdx: Apache-2.0
|
||||||
|
is_direct: true
|
||||||
|
is_dev: false
|
||||||
|
```
|
||||||
|
|
||||||
|
Supports all existing ecosystem values plus `tool`. Pass entries through the
|
||||||
|
same normalisation as lockfile entries. Skip entries whose version is `unknown`
and log a warning, since that value means the agent could not determine the version.
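
The skip-and-warn rule could look roughly like the following, again assuming the YAML is already loaded into a dict. The logger name, the function name, and the `tool` default for a missing `ecosystem` key are assumptions made for the example.

```python
import logging
from typing import Any

logger = logging.getLogger("sbom.tools")  # hypothetical logger name


def parse_tools_manifest(doc: Any) -> list[dict]:
    """Turn the `tools:` list of a parsed sbom-tools.yaml into package entries."""
    entries: list[dict] = []
    for tool in (doc or {}).get("tools") or []:
        if not isinstance(tool, dict) or not tool.get("name"):
            continue
        version = str(tool.get("version") or "unknown")
        if version == "unknown":
            # The agent could not determine a version; leave it for human review.
            logger.warning("skipping %s: version unknown", tool["name"])
            continue
        entries.append({
            "name": tool["name"],
            "version": version,
            "ecosystem": tool.get("ecosystem", "tool"),  # assumed default
            "license_spdx": tool.get("license_spdx"),
            "is_direct": bool(tool.get("is_direct", True)),
            "is_dev": bool(tool.get("is_dev", False)),
        })
    return entries
```

Converting `version` with `str()` guards against unquoted values that YAML would load as numbers.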
|
||||||
|
|
||||||
|
### T05 — Comprehensive auto-detection: all formats in one scan
|
||||||
|
|
||||||
|
```task
|
||||||
|
id: CUST-WP-0013-T05
|
||||||
|
state_hub_task_id: cdda6bf2-2a44-4444-a04a-ac2fe2314923
|
||||||
|
status: done
|
||||||
|
priority: high
|
||||||
|
```
|
||||||
|
|
||||||
|
Refactor the `--repo-path` scan to discover and run all applicable parsers,
|
||||||
|
not just the first match. Scan order:
|
||||||
|
|
||||||
|
1. Walk tree for all `uv.lock`, `requirements.txt`, `package-lock.json`,
|
||||||
|
`yarn.lock`, `Cargo.lock`
|
||||||
|
2. Walk tree for all `.terraform.lock.hcl`
|
||||||
|
3. Walk tree for `ansible/requirements.yml` and `ansible/requirements.yaml`
|
||||||
|
4. Check repo root for `sbom-tools.yaml`
|
||||||
|
|
||||||
|
Merge all results into a single batch for the snapshot ingest call. Log a
|
||||||
|
summary line per parser: ` <parser>: N packages from <path>`.
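
The scan order above can be sketched with `pathlib`. This is an illustration of the discovery and merge steps only: `discover_manifests`, `ingest_all`, and the `parsers` mapping are hypothetical names, and the real `detect_all()` may differ.

```python
from pathlib import Path
from typing import Callable

LOCKFILES = ("uv.lock", "requirements.txt", "package-lock.json",
             "yarn.lock", "Cargo.lock")


def discover_manifests(repo_root: Path) -> list[tuple[str, Path]]:
    """Return (parser kind, path) pairs in the documented scan order."""
    found: list[tuple[str, Path]] = []
    for name in LOCKFILES:
        found += [("lockfile", p) for p in sorted(repo_root.rglob(name))]
    found += [("terraform", p)
              for p in sorted(repo_root.rglob(".terraform.lock.hcl"))]
    for name in ("requirements.yml", "requirements.yaml"):
        found += [("ansible", p)
                  for p in sorted(repo_root.rglob(f"ansible/{name}"))]
    tools = repo_root / "sbom-tools.yaml"
    if tools.is_file():
        found.append(("tools", tools))
    return found


def ingest_all(repo_root: Path,
               parsers: dict[str, Callable[[Path], list[dict]]]) -> list[dict]:
    """Run every applicable parser and merge the results into one batch."""
    batch: list[dict] = []
    for kind, path in discover_manifests(repo_root):
        packages = parsers[kind](path)
        print(f"  {kind}: {len(packages)} packages from {path.relative_to(repo_root)}")
        batch.extend(packages)
    return batch
```

A plain `rglob` also descends into vendored directories such as `node_modules/`, so a production scan would probably want an exclusion list.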
|
||||||
|
|
||||||
|
### T06 — Unit tests for new parsers
|
||||||
|
|
||||||
|
```task
|
||||||
|
id: CUST-WP-0013-T06
|
||||||
|
state_hub_task_id: fee37e66-8f41-4dba-995b-97fc66493caf
|
||||||
|
status: done
|
||||||
|
priority: medium
|
||||||
|
```
|
||||||
|
|
||||||
|
Add test fixtures and unit tests for:
|
||||||
|
- Ansible Galaxy requirements.yml (collections + roles, version pinned and
|
||||||
|
unpinned)
|
||||||
|
- sbom-tools.yaml (valid, missing version, unknown ecosystem)
|
||||||
|
- Multi-parser scan: repo root with uv.lock + .terraform.lock.hcl +
|
||||||
|
sbom-tools.yaml produces merged results
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 3 — SBOM Capture Agent
|
||||||
|
|
||||||
|
**Acceptance:** `make capture-tools REPO=railiance-infra` produces a reviewed
|
||||||
|
`sbom-tools.yaml` that correctly identifies Ansible, Terraform, Helm, and other
|
||||||
|
declared tools with versions and SPDX licences.
|
||||||
|
|
||||||
|
### T07 — Write SBOM capture agent prompt
|
||||||
|
|
||||||
|
```task
|
||||||
|
id: CUST-WP-0013-T07
|
||||||
|
state_hub_task_id: a3b919b5-63b0-44f7-a048-ebfae603ef7b
|
||||||
|
status: done
|
||||||
|
priority: high
|
||||||
|
```
|
||||||
|
|
||||||
|
Write `state-hub/prompts/sbom-capture-agent.md` — a Claude agent prompt
|
||||||
|
parameterised with `{repo_slug}` and `{repo_path}`. The prompt instructs the
|
||||||
|
agent to:
|
||||||
|
|
||||||
|
1. Read `CLAUDE.md`, `Makefile`, `README.md`, `pyproject.toml`, `.tool-versions`,
|
||||||
|
CI configs, Dockerfiles, and provisioning files in `{repo_path}`
|
||||||
|
2. Identify all system-level tools: name, version (from version pins, Makefile
|
||||||
|
vars, or documented prerequisites), ecosystem, SPDX licence
|
||||||
|
3. Identify indirect/transitive tool deps (e.g. Ansible → Python; Terraform →
|
||||||
|
provider plugins already captured by `.terraform.lock.hcl`)
|
||||||
|
4. Emit a well-formed `sbom-tools.yaml` with a comment header noting generation
|
||||||
|
date and confidence level per entry (`# confidence: high/medium/low`)
|
||||||
|
5. Flag any tools where version could not be determined (`version: unknown`) for
|
||||||
|
human review
|
||||||
|
|
||||||
|
The prompt must forbid invented versions: the agent derives each version from
evidence in the repo or marks it `unknown`.
|
||||||
|
|
||||||
|
### T08 — Implement capture_sbom_tools.py
|
||||||
|
|
||||||
|
```task
|
||||||
|
id: CUST-WP-0013-T08
|
||||||
|
state_hub_task_id: 9593dca7-e713-4d7a-b4f2-c5333ae0b3d2
|
||||||
|
status: done
|
||||||
|
priority: high
|
||||||
|
```
|
||||||
|
|
||||||
|
Write `state-hub/scripts/capture_sbom_tools.py`:
|
||||||
|
|
||||||
|
- Accepts `--repo SLUG` and `--repo-path PATH`
|
||||||
|
- Resolves repo path from slug via the state-hub API if `--repo-path` is omitted
|
||||||
|
- Loads the agent prompt from `prompts/sbom-capture-agent.md`, substitutes
|
||||||
|
`{repo_slug}` and `{repo_path}`
|
||||||
|
- Invokes `claude -p "<prompt>"` (non-interactive) and captures stdout
|
||||||
|
- Parses the YAML block from the response
|
||||||
|
- Writes or updates `<repo-path>/sbom-tools.yaml`
|
||||||
|
- Prints a diff of changes if the file already exists
|
||||||
|
- `--dry-run` flag: print the prompt and diff without writing
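
The core of the wrapper (prompt substitution, agent call, YAML extraction, diff) might look like the sketch below. Function names are illustrative; the only external call is `claude -p`, as named above, and it is wrapped in a function that the example does not invoke.

```python
import difflib
import re
import subprocess

FENCE = "`" * 3  # built at runtime so this example can sit inside a fenced block
_YAML_BLOCK = re.compile(FENCE + r"(?:yaml|yml)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def render_prompt(template: str, repo_slug: str, repo_path: str) -> str:
    """Substitute only the two known placeholders, leaving other braces alone."""
    return (template.replace("{repo_slug}", repo_slug)
                    .replace("{repo_path}", repo_path))


def run_agent(prompt: str) -> str:
    """Invoke the agent non-interactively and return its stdout."""
    result = subprocess.run(["claude", "-p", prompt],
                            capture_output=True, text=True, check=True)
    return result.stdout


def extract_yaml_block(response: str) -> str:
    """Return the first fenced block, or the whole response if none is fenced."""
    match = _YAML_BLOCK.search(response)
    return match.group(1) if match else response


def manifest_diff(old: str, new: str, path: str = "sbom-tools.yaml") -> str:
    """Unified diff between the existing manifest and the new one."""
    return "".join(difflib.unified_diff(
        old.splitlines(keepends=True), new.splitlines(keepends=True),
        fromfile=f"a/{path}", tofile=f"b/{path}"))
```

Using `str.replace` rather than `str.format` avoids errors when the prompt itself contains other curly braces, which is likely if it embeds YAML or JSON examples.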
|
||||||
|
|
||||||
|
### T09 — Add make capture-tools target
|
||||||
|
|
||||||
|
```task
|
||||||
|
id: CUST-WP-0013-T09
|
||||||
|
state_hub_task_id: 6948e1d2-9c97-4709-bdb0-4b6ded700a22
|
||||||
|
status: done
|
||||||
|
priority: medium
|
||||||
|
```
|
||||||
|
|
||||||
|
Add to `state-hub/Makefile`:
|
||||||
|
|
||||||
|
```makefile
|
||||||
|
capture-tools: ## Run SBOM capture agent for a repo (REPO=slug, REPO_PATH=path)
|
||||||
|
uv run python scripts/capture_sbom_tools.py --repo $(REPO) $(if $(REPO_PATH),--repo-path $(REPO_PATH),)
|
||||||
|
```
|
||||||
|
|
||||||
|
Also update `make ingest-sbom` to note that `capture-tools` should be run first
|
||||||
|
for repos that have system-level tool dependencies.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 4 — Ingest railiance-infra
|
||||||
|
|
||||||
|
**Acceptance:** `make ingest-sbom REPO=railiance-infra` shows terraform providers,
|
||||||
|
ansible collections, and tool manifest entries in one snapshot.
|
||||||
|
|
||||||
|
### T10 — Capture tools manifest for railiance-infra
|
||||||
|
|
||||||
|
```task
|
||||||
|
id: CUST-WP-0013-T10
|
||||||
|
state_hub_task_id: 99b23998-5129-4777-9d42-7bee5981cdbb
|
||||||
|
status: done
|
||||||
|
priority: medium
|
||||||
|
```
|
||||||
|
|
||||||
|
Run `make capture-tools REPO=railiance-infra`. Review the generated
|
||||||
|
`railiance-infra/sbom-tools.yaml`: verify that Ansible, Terraform, cloud-init, goss,
and any other tools are listed with correct versions and licences. Resolve any
`unknown` versions by consulting the repo. Commit the file.
|
||||||
|
|
||||||
|
### T11 — Ingest railiance-infra
|
||||||
|
|
||||||
|
```task
|
||||||
|
id: CUST-WP-0013-T11
|
||||||
|
state_hub_task_id: bb516909-f903-48ce-b60b-a24245e7382e
|
||||||
|
status: done
|
||||||
|
priority: medium
|
||||||
|
```
|
||||||
|
|
||||||
|
Run `make ingest-sbom REPO=railiance-infra REPO_PATH=~/railiance-infra`. Verify
|
||||||
|
the snapshot contains:
|
||||||
|
- Terraform providers (from `.terraform.lock.hcl`)
|
||||||
|
- Ansible Galaxy collections (from `ansible/requirements.yaml`)
|
||||||
|
- System tools (from `sbom-tools.yaml`)
|
||||||
|
|
||||||
|
Check the licence report for any copyleft or BSL flags.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 5 — Ingest railiance-cluster
|
||||||
|
|
||||||
|
**Acceptance:** railiance-cluster SBOM covers both Python packages (uv.lock) and
|
||||||
|
system tools in a single snapshot.
|
||||||
|
|
||||||
|
### T12 — Capture tools manifest for railiance-cluster
|
||||||
|
|
||||||
|
```task
|
||||||
|
id: CUST-WP-0013-T12
|
||||||
|
state_hub_task_id: 7a890f1a-da9f-4e6d-86a7-4fd1aefd5b3f
|
||||||
|
status: done
|
||||||
|
priority: medium
|
||||||
|
```
|
||||||
|
|
||||||
|
Run `make capture-tools REPO=railiance-cluster`. Review the generated
|
||||||
|
`railiance-cluster/sbom-tools.yaml` — verify Helm, kubectl, k3s, and any other
|
||||||
|
operational tools. Commit the file.
|
||||||
|
|
||||||
|
### T13 — Re-ingest railiance-cluster
|
||||||
|
|
||||||
|
```task
|
||||||
|
id: CUST-WP-0013-T13
|
||||||
|
state_hub_task_id: 789dbe93-011a-4470-9fec-ebf249cd7134
|
||||||
|
status: done
|
||||||
|
priority: medium
|
||||||
|
```
|
||||||
|
|
||||||
|
Run `make ingest-sbom REPO=railiance-cluster REPO_PATH=~/railiance-cluster`.
|
||||||
|
Verify the snapshot merges uv.lock (Python packages including ansible-core) and
|
||||||
|
sbom-tools.yaml entries into one coherent snapshot. Confirm that the ansible-core
GPL-3.0 flag appears in the licence report.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 6 — Convention Documentation
|
||||||
|
|
||||||
|
**Acceptance:** A developer reading the SBOM convention doc knows exactly how to
|
||||||
|
add a new repo to SBOM coverage.
|
||||||
|
|
||||||
|
### T14 — Document SBOM capture convention in canon/standards
|
||||||
|
|
||||||
|
```task
|
||||||
|
id: CUST-WP-0013-T14
|
||||||
|
state_hub_task_id: dc3bb2a3-882e-4dd7-ab7c-8b1e88279a7d
|
||||||
|
status: done
|
||||||
|
priority: low
|
||||||
|
```
|
||||||
|
|
||||||
|
Write `canon/standards/sbom-convention_v0.1.md` documenting:
|
||||||
|
- The four capture mechanisms and when each applies
|
||||||
|
- The `sbom-tools.yaml` schema (with confidence annotation convention)
|
||||||
|
- The `make capture-tools` → review → commit → `make ingest-sbom` workflow
|
||||||
|
- Licence risk thresholds: copyleft = flag for review; BSL = flag for review;
|
||||||
|
null licence = acceptable for infra tools if well-known open source
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Licence Risk Preview
|
||||||
|
|
||||||
|
Based on known tool licences, expect these flags once ingested:
|
||||||
|
|
||||||
|
| Tool / Package | Licence | Risk level |
|
||||||
|
|---|---|---|
|
||||||
|
| ansible-core | GPL-3.0-only | Copyleft — flag (ops toolchain, not shipped) |
|
||||||
|
| terraform ≥ 1.5.6 | BSL-1.1 | Non-OSI — flag for review |
|
||||||
|
| hashicorp providers | BSL-1.1 | Same |
|
||||||
|
| community.general | GPL-3.0 | Copyleft — flag (ops toolchain) |
|
||||||
|
| Helm | Apache-2.0 | Clean |
|
||||||
|
| k3s | Apache-2.0 | Clean |
|
||||||
|
| cloud-init | Apache-2.0 / GPL-3.0 | Mixed — check version |
|
||||||
|
| goss | Apache-2.0 | Clean |
|
||||||
|
|
||||||
|
All copyleft/BSL entries here are **operational toolchain** dependencies, not
|
||||||
|
shipped code — risk is low but worth tracking for compliance awareness.
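
As a rough illustration of how the flags in the table could be derived from a licence string, here is a small classifier. The return labels, the prefix list, and the function name are assumptions for the example; the actual licence report logic is not described in this workplan.

```python
import re
from typing import Optional

# Assumed copyleft families for flagging; adjust to the project's own policy.
COPYLEFT_PREFIXES = ("GPL-", "AGPL-", "LGPL-")
# The table writes "BSL-1.1"; the SPDX identifier for the Business Source
# License is "BUSL-1.1" ("BSL-1.0" in SPDX is the permissive Boost licence).
SOURCE_AVAILABLE = {"BSL-1.1", "BUSL-1.1"}


def licence_flag(license_spdx: Optional[str]) -> str:
    """Classify a licence string as copyleft, bsl, clean, or unknown."""
    if not license_spdx:
        return "unknown"
    # Split mixed declarations such as "Apache-2.0 / GPL-3.0" into tokens.
    tokens = [t.upper() for t in re.split(r"[\s/()]+", license_spdx) if t]
    if any(t.startswith(COPYLEFT_PREFIXES) for t in tokens):
        return "copyleft"
    if any(t in SOURCE_AVAILABLE for t in tokens):
        return "bsl"
    return "clean"


if __name__ == "__main__":
    for name, lic in [("ansible-core", "GPL-3.0-only"), ("helm", "Apache-2.0"),
                      ("cloud-init", "Apache-2.0 / GPL-3.0")]:
        print(name, licence_flag(lic))
```

A mixed declaration is flagged as copyleft as soon as any component is copyleft, which matches the "check version" caution for cloud-init.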
|
||||||