feat(sbom): CUST-WP-0013 — expand SBOM infra to terraform, ansible, and tool manifests

- Migration d6e7f8a9b0c1: add terraform, ansible, tool to Ecosystem enum - ingest_sbom.py: new Ansible Galaxy requirements.yml parser (collections + roles) - ingest_sbom.py: new sbom-tools.yaml manifest parser (agent-generated tool deps) - ingest_sbom.py: promote .terraform.lock.hcl parser from ecosystem=other → terraform - ingest_sbom.py: detect_all() runs all four parsers in one comprehensive scan - capture_sbom_tools.py: agent-assisted tool manifest generator (claude -p) - prompts/sbom-capture-agent.md: parameterised prompt for repo tool discovery - Makefile: capture-tools target; ingest-sbom updated docs and DRY_RUN support - 29 unit tests covering all new parsers and detect_all() behaviour - canon/standards/sbom-convention_v0.1.md: updated with four-mechanism model and workflow Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 04:40:26 +01:00
parent 3c69ad2929
commit 1c94f5545c
9 changed files with 1378 additions and 81 deletions
--- a/canon/standards/sbom-convention_v0.1.md
+++ b/canon/standards/sbom-convention_v0.1.md
@@ -6,7 +6,7 @@ domain: custodian
 status: active
 version: "0.1"
 created: "2026-03-01"
-updated: "2026-03-01"
+updated: "2026-03-12"
 ---

 # SBOM Convention v0.1 — Dependency Tracking & Licence Governance
@@ -27,20 +27,23 @@ dashboard (`/sbom`) provides domain-level and repo-level drill-down.

 ---

-## 1. Authoritative Lockfiles per Ecosystem
+## 1. Capture Mechanisms

-| Ecosystem | Authoritative file | Notes |
-|-----------|-------------------|-------|
-| Python | `uv.lock` | Preferred. `requirements.txt` accepted as fallback |
-| Node / npm | `package-lock.json` | Preferred. `yarn.lock` accepted |
-| Rust | `Cargo.lock` | Auto-detected |
-| Terraform | `.terraform.lock.hcl` | Provider pins; ecosystem stored as `other` until ENUM extended |
-| Go | `go.sum` | *Not yet parsed — planned* |
-| Java / JVM | `gradle.lockfile` / `pom.xml` | *Not yet parsed — planned* |
-| Ansible | `requirements.yml` | *Not yet parsed — planned* |
+`ingest_sbom.py` runs all four mechanisms in a single scan when given `--repo-path`.
+No flags needed — comprehensive detection is the default.

-**Principle:** commit lockfiles to the repo. Lockfiles are the SBOM source
-of truth; do not generate them at ingest time.
+| Mechanism | File(s) | Ecosystem | Detection scope |
+|-----------|---------|-----------|-----------------|
+| **Package manager lockfiles** | `uv.lock`, `requirements.txt`, `package-lock.json`, `yarn.lock`, `Cargo.lock` | `python`, `node`, `rust` | Anywhere in tree |
+| **Terraform provider lock** | `.terraform.lock.hcl` | `terraform` | Anywhere in tree |
+| **Ansible Galaxy manifest** | `ansible/requirements.yml` or `.yaml` | `ansible` | Under directories named `ansible/` |
+| **Tool manifest** | `sbom-tools.yaml` (repo root) | `tool`, `ansible`, `terraform`, etc. | Repo root only |
+
+**Go / Java parsers** (`go.sum`, `pom.xml`, `gradle.lockfile`) are *not yet
+implemented* — planned for a future workplan.
+
+**Principle:** commit lockfiles and `sbom-tools.yaml` to the repo. These are
+the SBOM source of truth; do not generate them at ingest time.

 ---

@@ -64,27 +67,35 @@ curl -s http://127.0.0.1:8000/repos/ | python3 -m json.tool

 ## 3. SBOM Ingestion

-### 3.1 Standard ingest (single lockfile at repo root)
+### 3.1 Standard ingest (all mechanisms, recommended)

 ```bash
 cd ~/the-custodian/state-hub
 make ingest-sbom REPO=<slug> REPO_PATH=/path/to/repo
 ```

-The script auto-detects the first recognised lockfile at `REPO_PATH`.
+`ingest_sbom.py` automatically runs all four mechanisms in one scan — lockfiles,
+Terraform provider locks, Ansible Galaxy manifests, and `sbom-tools.yaml`. All
+results are merged into a single snapshot. Non-dep directories (`.venv`,
+`node_modules`, `.git`, `dist`, etc.) are automatically skipped.

-### 3.2 Multi-ecosystem repos (recommended for complex repos)
+### 3.2 Repos with system-level tools: capture first, then ingest

-Use `SCAN=1` to walk the repo tree and combine **all** lockfiles into a single
-snapshot. Non-dep directories (`.venv`, `node_modules`, `.git`, `dist`, etc.)
-are automatically skipped.
+For repos that use system-level tools not tracked by any lockfile (Terraform
+binary, Helm, kubectl, k3s, goss, etc.):

 ```bash
-make ingest-sbom REPO=the-custodian SCAN=1 REPO_PATH=/home/worsch/the-custodian
-```
+# Step 1: generate sbom-tools.yaml via agent
+make capture-tools REPO=<slug> REPO_PATH=/path/to/repo

-This is the correct approach for repos that contain both a backend and a
-frontend (e.g., a Python API + Node/Observable dashboard).
+# Step 2: review sbom-tools.yaml — correct any confidence: low entries
+
+# Step 3: commit sbom-tools.yaml
+git -C /path/to/repo add sbom-tools.yaml && git -C /path/to/repo commit -m "chore(sbom): add tool manifest"
+
+# Step 4: ingest everything
+make ingest-sbom REPO=<slug> REPO_PATH=/path/to/repo
+```

 ### 3.3 Explicit lockfile path

@@ -96,8 +107,7 @@ Multiple lockfiles can be passed by calling the script directly with repeated
 `--lockfile` flags:

 ```bash
-cd ~/the-custodian/state-hub
-.venv/bin/python scripts/ingest_sbom.py \
+uv run python scripts/ingest_sbom.py \
  --repo <slug> \
  --lockfile /path/to/uv.lock \
  --lockfile /path/to/package-lock.json
@@ -106,11 +116,40 @@ cd ~/the-custodian/state-hub
 ### 3.4 Dry run (inspect without submitting)

 ```bash
-make ingest-sbom REPO=<slug> SCAN=1 REPO_PATH=/path/to/repo
-# append: add --dry-run to the command, or run the script directly:
-.venv/bin/python scripts/ingest_sbom.py --repo <slug> --scan --repo-path /path/to/repo --dry-run
+make ingest-sbom REPO=<slug> REPO_PATH=/path/to/repo DRY_RUN=1
 ```

+### 3.5 sbom-tools.yaml: the tool manifest
+
+Create `sbom-tools.yaml` at the repo root for any system-level tools not
+covered by lockfiles. Schema:
+
+```yaml
+# sbom-tools.yaml
+tools:
+  - name: terraform
+    version: "1.9.5"         # confidence: medium
+    ecosystem: terraform
+    license_spdx: BSL-1.1
+    is_direct: true
+    is_dev: false
+  - name: helm
+    version: null             # confidence: low (no version pin found)
+    ecosystem: tool
+    license_spdx: Apache-2.0
+    is_direct: true
+    is_dev: false
+```
+
+**Valid ecosystem values:** `python`, `node`, `rust`, `go`, `java`, `terraform`,
+`ansible`, `tool`, `other`
+
+Annotate each version with a `# confidence: high/medium/low` comment.
+Entries with `confidence: low` need human verification before committing.
+
+The `make capture-tools` command generates this file automatically using the
+SBOM capture agent prompt (`state-hub/prompts/sbom-capture-agent.md`).
+
 ---

 ## 4. Snapshot Semantics
@@ -248,10 +287,14 @@ The SBOM dashboard aggregates across all repos within a domain in the

 ## 10. Planned Enhancements

- **Go / Java parsers** — add to `ingest_sbom.py`
+- **Go / Java parsers** — add `go.sum`, `pom.xml`, `gradle.lockfile` support to `ingest_sbom.py`
 - **Versioned snapshots** — retain history per repo for trend analysis
 - **Licence override file** — allow repos to document known-acceptable
  copyleft exceptions (`.sbom-overrides.yaml`)
 - **CI integration** — GitHub Actions step to run ingest on lockfile change
 - **Direct-dep detection for uv.lock** — parse `pyproject.toml` `[project.dependencies]`
  to mark direct deps accurately
+- **Galaxy API licence lookup** — resolve `license_spdx` for Ansible collections
+  via the Galaxy API at ingest time
+- **Tool version pinning guidance** — tooling to detect `confidence: low` entries
+  across all registered repos and flag them for resolution
--- a/state-hub/Makefile
+++ b/state-hub/Makefile
@@ -133,16 +133,26 @@ list-repos:
 	@test -n "$(DOMAIN)" || (echo "ERROR: DOMAIN is required."; exit 1)
 	curl -sf "http://127.0.0.1:8000/repos/?domain=$(DOMAIN)" | python3 -m json.tool

-## Ingest SBOM data for a repo.
+## Ingest SBOM data for a repo (all mechanisms: lockfiles + ansible + sbom-tools.yaml).
+## Auto-detect all sources:     make ingest-sbom REPO=the-custodian REPO_PATH=/home/worsch/the-custodian
 ## Single lockfile (explicit):  make ingest-sbom REPO=the-custodian LOCKFILE=/path/to/uv.lock
-## Scan all lockfiles in tree:  make ingest-sbom REPO=the-custodian SCAN=1 REPO_PATH=/home/worsch/the-custodian
-## Auto-detect at repo root:    make ingest-sbom REPO=the-custodian REPO_PATH=/home/worsch/the-custodian
+## Dry-run (no submit):         make ingest-sbom REPO=the-custodian REPO_PATH=... DRY_RUN=1
+## Tip: run capture-tools first for repos with system-level tool dependencies.
 ingest-sbom:
 	@test -n "$(REPO)" || (echo "ERROR: REPO is required."; exit 1)
 	uv run python scripts/ingest_sbom.py --repo "$(REPO)" \
 		$(if $(LOCKFILE),--lockfile "$(LOCKFILE)") \
-		$(if $(SCAN),--scan) \
-		$(if $(REPO_PATH),--repo-path "$(REPO_PATH)")
+		$(if $(REPO_PATH),--repo-path "$(REPO_PATH)") \
+		$(if $(DRY_RUN),--dry-run)
+
+## Run SBOM capture agent for a repo — generates/updates sbom-tools.yaml.
+## Usage: make capture-tools REPO=railiance-infra [REPO_PATH=/home/worsch/railiance-infra]
+## Add DRY_RUN=1 to preview without writing.
+capture-tools:
+	@test -n "$(REPO)" || (echo "ERROR: REPO is required."; exit 1)
+	uv run python scripts/capture_sbom_tools.py --repo "$(REPO)" \
+		$(if $(REPO_PATH),--repo-path "$(REPO_PATH)") \
+		$(if $(DRY_RUN),--dry-run)

 ## Check a repo for ADR-001 compliance: make validate-adr REPO=/path/to/repo [DOMAIN=custodian]
 validate-adr:
--- a/state-hub/api/models/sbom_entry.py
+++ b/state-hub/api/models/sbom_entry.py
@@ -15,6 +15,9 @@ class Ecosystem(str, enum.Enum):
    rust = "rust"
    go = "go"
    java = "java"
+    terraform = "terraform"
+    ansible = "ansible"
+    tool = "tool"
    other = "other"


--- a/state-hub/migrations/versions/d6e7f8a9b0c1_sbom_ecosystem_expand.py
+++ b/state-hub/migrations/versions/d6e7f8a9b0c1_sbom_ecosystem_expand.py
@@ -0,0 +1,30 @@
+"""SBOM ecosystem enum expansion: add terraform, ansible, tool
+
+Revision ID: d6e7f8a9b0c1
+Revises: c5d6e7f8a9b0
+Create Date: 2026-03-12 00:00:00.000000
+"""
+from typing import Sequence, Union
+
+from alembic import op
+
+revision: str = "d6e7f8a9b0c1"
+down_revision: Union[str, None] = "c5d6e7f8a9b0"
+branch_labels: Union[str, Sequence[str], None] = None
+depends_on: Union[str, Sequence[str], None] = None
+
+
+def upgrade() -> None:
+    # PostgreSQL requires each ADD VALUE in its own statement and cannot be
+    # run inside a transaction that also modifies data. ADD VALUE is
+    # transactional in PG 12+ (no COMMIT needed).
+    op.execute("ALTER TYPE ecosystem ADD VALUE IF NOT EXISTS 'terraform'")
+    op.execute("ALTER TYPE ecosystem ADD VALUE IF NOT EXISTS 'ansible'")
+    op.execute("ALTER TYPE ecosystem ADD VALUE IF NOT EXISTS 'tool'")
+
+
+def downgrade() -> None:
+    # PostgreSQL does not support removing enum values without recreating the
+    # type. Document the limitation and do nothing — reverting this migration
+    # requires a full type recreation if needed.
+    pass
--- a/state-hub/prompts/sbom-capture-agent.md
+++ b/state-hub/prompts/sbom-capture-agent.md
@@ -0,0 +1,90 @@
+# SBOM Capture Agent Prompt
+
+**Task:** Generate or update `sbom-tools.yaml` for the repository at `{repo_path}` (slug: `{repo_slug}`).
+
+This file captures system-level tool dependencies that are not tracked by any package manager lockfile — tools that are installed via provisioning, Homebrew, system packages, or assumed present in the environment.
+
+---
+
+## Instructions
+
+1. **Read the following files** in `{repo_path}` (read each that exists; skip gracefully if absent):
+   - `CLAUDE.md` — look for stack declarations, tool prerequisites, dev commands
+   - `README.md` / `QUICKSTART.md` — prerequisites sections, tool version requirements
+   - `Makefile` — tool invocations, version variables (e.g. `ANSIBLE_VERSION := 12.3`)
+   - `pyproject.toml` — Python tool dependencies (already covered by uv.lock; note but don't duplicate)
+   - `.tool-versions` — asdf version pins
+   - `.terraform-version` — tfenv pin
+   - `.ansible-version` — if present
+   - `Dockerfile` / `docker-compose.yml` — base image versions, tool installs
+   - `.github/workflows/*.yml` / `.gitlab-ci.yml` — CI tool install steps, version pins
+   - `ansible/requirements.yml` — **already captured by lockfile parser; do NOT include Galaxy collections here**
+   - Any `scripts/setup*.sh`, `scripts/bootstrap*.sh`, or `tools/` directory
+
+2. **Identify system-level tools only** — tools that:
+   - Are invoked as CLI commands (e.g. `ansible-playbook`, `terraform`, `helm`, `kubectl`, `k3s`, `goss`, `age`, `sops`)
+   - Are NOT installed via `uv`/`pip`/`npm`/`cargo` into a project virtualenv (those are in lockfiles)
+   - Note: `ansible` itself as a CLI tool is a system dep even if `ansible-core` appears in `uv.lock`
+
+3. **For each tool, determine**:
+   - `name`: canonical tool name (e.g. `ansible`, `terraform`, `helm`, `kubectl`, `k3s`, `goss`, `age`, `sops`, `cloud-init`)
+   - `version`: the pinned or documented version. Use `unknown` only if no evidence found anywhere.
+   - `ecosystem`: one of `python`, `node`, `rust`, `go`, `java`, `terraform`, `ansible`, `tool`, `other`
+     - Use `ansible` for Ansible itself; `terraform` for Terraform itself; `tool` for generic CLI tools
+   - `license_spdx`: the SPDX identifier. Common known licences (use these exact strings):
+     - ansible / ansible-core: `GPL-3.0-only`
+     - terraform ≤ 1.5.5: `MPL-2.0`; terraform ≥ 1.5.6: `BSL-1.1`
+     - helm: `Apache-2.0`
+     - kubectl: `Apache-2.0`
+     - k3s: `Apache-2.0`
+     - goss: `Apache-2.0`
+     - age: `BSD-3-Clause`
+     - sops: `MPL-2.0`
+     - cloud-init: `Apache-2.0` (or `GPL-3.0-only` for older versions — check)
+     - docker: `Apache-2.0`
+     - If unknown, use `null`
+   - `is_direct`: `true` if this repo directly declares/uses it; `false` if it's a transitive dependency of another tool
+   - `is_dev`: `true` only if the tool is only used for development/testing, not production operation
+
+4. **Confidence annotation**: Add a `# confidence: high/medium/low` comment after each entry:
+   - `high`: version found explicitly pinned in a file
+   - `medium`: version inferred from context (e.g. "Ansible 12" in README)
+   - `low`: version not found; using `unknown` or a reasonable guess
+
+5. **Do NOT include**:
+   - Python packages already covered by `uv.lock` or `requirements.txt`
+   - Ansible Galaxy collections (covered by `ansible/requirements.yml`)
+   - Terraform providers (covered by `.terraform.lock.hcl`)
+   - Node packages, Rust crates, etc. (covered by their lockfiles)
+   - Operating system packages unless the repo explicitly declares them
+
+6. **Output format**: Emit ONLY the YAML block below — no prose, no markdown fences, no explanation. The output must be valid YAML that can be written directly to `sbom-tools.yaml`.
+
+---
+
+## Output format
+
+```yaml
+# sbom-tools.yaml — system-level tool dependencies for {repo_slug}
+# Generated by sbom-capture-agent on {date}
+# Review each entry before committing. Entries with confidence: low need human verification.
+tools:
+  - name: example-tool
+    version: "1.2.3"        # confidence: high
+    ecosystem: tool
+    license_spdx: Apache-2.0
+    is_direct: true
+    is_dev: false
+```
+
+If no system-level tools are found, output:
+```yaml
+# sbom-tools.yaml — system-level tool dependencies for {repo_slug}
+# Generated by sbom-capture-agent on {date}
+# No system-level tools identified — all dependencies are covered by lockfiles.
+tools: []
+```
+
+---
+
+Now read `{repo_path}` and produce the `sbom-tools.yaml` content.
--- a/state-hub/scripts/capture_sbom_tools.py
+++ b/state-hub/scripts/capture_sbom_tools.py
@@ -0,0 +1,187 @@
+#!/usr/bin/env python3
+"""Invoke the SBOM capture agent to generate/update sbom-tools.yaml for a repo.
+
+Usage:
+    python capture_sbom_tools.py --repo <slug> [--repo-path <path>] [--dry-run]
+
+The script:
+1. Resolves repo path from the state-hub API (if --repo-path is not given)
+2. Loads the agent prompt from prompts/sbom-capture-agent.md
+3. Substitutes {repo_slug}, {repo_path}, {date} placeholders
+4. Invokes `claude -p "<prompt>"` non-interactively
+5. Extracts the YAML block from the response
+6. Writes (or shows diff of) sbom-tools.yaml in the repo root
+
+Requirements:
+  - `claude` CLI must be on PATH (Claude Code)
+  - PyYAML must be available in the active venv
+"""
+from __future__ import annotations
+
+import argparse
+import datetime
+import difflib
+import json
+import os
+import re
+import subprocess
+import sys
+import urllib.error
+import urllib.request
+from pathlib import Path
+
+API_BASE = os.environ.get("API_BASE", "http://127.0.0.1:8000").rstrip("/")
+SCRIPT_DIR = Path(__file__).parent
+PROMPT_FILE = SCRIPT_DIR.parent / "prompts" / "sbom-capture-agent.md"
+
+
+def resolve_repo_path(repo_slug: str) -> Path | None:
+    """Look up the registered path for a repo slug via the state-hub API."""
+    url = f"{API_BASE}/repos/{repo_slug}/"
+    try:
+        with urllib.request.urlopen(url, timeout=10) as resp:
+            data = json.loads(resp.read())
+            path_str = data.get("local_path")
+            if path_str:
+                return Path(path_str)
+    except (urllib.error.URLError, KeyError):
+        pass
+    return None
+
+
+def load_prompt(repo_slug: str, repo_path: Path) -> str:
+    if not PROMPT_FILE.exists():
+        print(f"Error: prompt file not found at {PROMPT_FILE}", file=sys.stderr)
+        sys.exit(1)
+    template = PROMPT_FILE.read_text()
+    today = datetime.date.today().isoformat()
+    return (
+        template
+        .replace("{repo_slug}", repo_slug)
+        .replace("{repo_path}", str(repo_path))
+        .replace("{date}", today)
+    )
+
+
+def invoke_agent(prompt: str) -> str:
+    """Run `claude -p <prompt>` and return stdout."""
+    try:
+        result = subprocess.run(
+            ["claude", "-p", prompt],
+            capture_output=True,
+            text=True,
+            timeout=120,
+        )
+    except FileNotFoundError:
+        print("Error: `claude` CLI not found on PATH. Install Claude Code.", file=sys.stderr)
+        sys.exit(1)
+    except subprocess.TimeoutExpired:
+        print("Error: claude invocation timed out after 120s.", file=sys.stderr)
+        sys.exit(1)
+
+    if result.returncode != 0:
+        print(f"Error: claude exited with code {result.returncode}", file=sys.stderr)
+        if result.stderr:
+            print(result.stderr, file=sys.stderr)
+        sys.exit(1)
+
+    return result.stdout
+
+
+def extract_yaml(response: str) -> str:
+    """Extract YAML content from the agent response.
+
+    Accepts:
+    - Raw YAML (starts with # or 'tools:')
+    - YAML wrapped in ```yaml ... ``` fences
+    """
+    # Try fenced block first
+    m = re.search(r"```(?:yaml)?\s*\n(.*?)```", response, re.DOTALL)
+    if m:
+        return m.group(1).strip()
+
+    # Otherwise treat entire response as YAML
+    stripped = response.strip()
+    if stripped.startswith("#") or stripped.startswith("tools:"):
+        return stripped
+
+    print("Warning: could not extract YAML from agent response.", file=sys.stderr)
+    print("Raw response:", file=sys.stderr)
+    print(response[:500], file=sys.stderr)
+    sys.exit(1)
+
+
+def show_diff(old: str | None, new: str, target: Path) -> None:
+    if old is None:
+        print(f"[new file] {target}")
+        for line in new.splitlines():
+            print(f"  + {line}")
+    else:
+        diff = list(difflib.unified_diff(
+            old.splitlines(keepends=True),
+            new.splitlines(keepends=True),
+            fromfile=f"a/{target.name}",
+            tofile=f"b/{target.name}",
+        ))
+        if diff:
+            print("".join(diff))
+        else:
+            print(f"[no changes] {target}")
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser(
+        description="Generate/update sbom-tools.yaml for a repo using the SBOM capture agent."
+    )
+    parser.add_argument("--repo", required=True, help="Repo slug (e.g. 'railiance-infra')")
+    parser.add_argument("--repo-path", help="Path to repo root (auto-resolved from state-hub if omitted)")
+    parser.add_argument("--dry-run", action="store_true",
+                        help="Show prompt and diff without writing sbom-tools.yaml")
+    parser.add_argument("--print-prompt", action="store_true",
+                        help="Print the rendered prompt and exit (useful for inspection)")
+    args = parser.parse_args()
+
+    # Resolve repo path
+    if args.repo_path:
+        repo_path = Path(args.repo_path).resolve()
+    else:
+        repo_path = resolve_repo_path(args.repo)
+        if repo_path is None:
+            # Fall back to ~/repo_slug convention
+            repo_path = Path.home() / args.repo
+            print(f"Could not resolve path from API; trying {repo_path}", file=sys.stderr)
+
+    if not repo_path.exists():
+        print(f"Error: repo path does not exist: {repo_path}", file=sys.stderr)
+        sys.exit(1)
+
+    target = repo_path / "sbom-tools.yaml"
+    existing_content = target.read_text() if target.exists() else None
+
+    prompt = load_prompt(args.repo, repo_path)
+
+    if args.print_prompt:
+        print(prompt)
+        return
+
+    print(f"Running SBOM capture agent for {args.repo} ({repo_path})…")
+    response = invoke_agent(prompt)
+    yaml_content = extract_yaml(response)
+
+    # Ensure trailing newline
+    if not yaml_content.endswith("\n"):
+        yaml_content += "\n"
+
+    show_diff(existing_content, yaml_content, target)
+
+    if args.dry_run:
+        print("\n[dry-run] sbom-tools.yaml not written.")
+        return
+
+    target.write_text(yaml_content)
+    print(f"\nWritten: {target}")
+    print("Review the file, correct any 'confidence: low' entries, then commit.")
+
+
+if __name__ == "__main__":
+    main()
--- a/state-hub/scripts/ingest_sbom.py
+++ b/state-hub/scripts/ingest_sbom.py
@@ -1,15 +1,19 @@
 #!/usr/bin/env python3
-"""Ingest a repo's lockfile into the State Hub SBOM store.
+"""Ingest a repo's lockfiles and tool manifests into the State Hub SBOM store.

 Usage:
-    python ingest_sbom.py --repo <slug> [--lockfile <path>] [--api-base <url>]
+    python ingest_sbom.py --repo <slug> [--repo-path <path>] [--dry-run]

-Auto-detects lockfile type:
-  uv.lock            → Python ecosystem
-  requirements.txt   → Python ecosystem (basic)
-  package-lock.json  → Node ecosystem
-  yarn.lock          → Node ecosystem
-  Cargo.lock         → Rust ecosystem
+Auto-detects all of the following in one scan:
+  uv.lock                    → python
+  requirements.txt           → python
+  package-lock.json          → node
+  yarn.lock                  → node
+  Cargo.lock                 → rust
+  .terraform.lock.hcl        → terraform  (anywhere in tree)
+  ansible/requirements.yml   → ansible    (anywhere under ansible/ dirs)
+  ansible/requirements.yaml  → ansible
+  sbom-tools.yaml            → tool       (repo root; agent-generated)
 """
 from __future__ import annotations

@@ -22,11 +26,17 @@ import urllib.error
 import urllib.request
 from pathlib import Path

+try:
+    import yaml  # optional; only needed for sbom-tools.yaml and ansible parsers
+    _YAML_AVAILABLE = True
+except ImportError:
+    _YAML_AVAILABLE = False
+
 API_BASE = os.environ.get("API_BASE", "http://127.0.0.1:8000").rstrip("/")


 # ---------------------------------------------------------------------------
-# Lockfile parsers
+# Lockfile parsers — each returns list[dict]
 # ---------------------------------------------------------------------------

 def _parse_uv_lock(path: Path) -> list[dict]:
@@ -55,7 +65,7 @@ def _parse_uv_lock(path: Path) -> list[dict]:
            "package_version": e.get("package_version"),
            "ecosystem": "python",
            "license_spdx": None,
-            "is_direct": False,  # uv.lock doesn't distinguish; treat all as transitive
+            "is_direct": False,
            "is_dev": False,
        }
        for e in entries
@@ -70,7 +80,6 @@ def _parse_requirements_txt(path: Path) -> list[dict]:
        line = line.strip()
        if not line or line.startswith("#") or line.startswith("-"):
            continue
-        # Handle: pkg==1.2.3, pkg>=1.2, pkg
        m = re.match(r"^([A-Za-z0-9_.\-]+)(?:[>=<!~^]+([^\s;]+))?", line)
        if m:
            entries.append({
@@ -95,7 +104,7 @@ def _parse_package_lock_json(path: Path) -> list[dict]:
    packages = data.get("packages", {})
    entries = []
    for pkg_path, info in packages.items():
-        if not pkg_path:  # root package
+        if not pkg_path:
            continue
        name = info.get("name") or pkg_path.split("node_modules/")[-1]
        entries.append({
@@ -120,8 +129,6 @@ def _parse_yarn_lock(path: Path) -> list[dict]:
        if not stripped or stripped.startswith("#"):
            continue
        if not line.startswith(" ") and stripped.endswith(":"):
-            # New package block header: "name@version::" or "\"name@version\":"
-            # May list multiple versions: "name@^1.0, name@~1.0:"
            current_names = []
            current_version = None
            for part in stripped.rstrip(":").split(","):
@@ -188,12 +195,10 @@ def _parse_terraform_lock_hcl(path: Path) -> list[dict]:

    for line in path.read_text().splitlines():
        stripped = line.strip()
-        # e.g.: provider "registry.terraform.io/hetznercloud/hcloud" {
        m = re.match(r'^provider\s+"([^"]+)"\s*\{', stripped)
        if m:
-            # Use full provider address as package_name, short name as display
            full = m.group(1)
-            current_name = full  # e.g. "registry.terraform.io/hetznercloud/hcloud"
+            current_name = full
            current_version = None
        elif current_name is not None:
            vm = re.match(r'version\s*=\s*"([^"]+)"', stripped)
@@ -203,7 +208,7 @@ def _parse_terraform_lock_hcl(path: Path) -> list[dict]:
                entries.append({
                    "package_name": current_name,
                    "package_version": current_version,
-                    "ecosystem": "other",   # "terraform" not yet in ENUM; tracked as other
+                    "ecosystem": "terraform",
                    "license_spdx": None,
                    "is_direct": True,
                    "is_dev": False,
@@ -214,7 +219,114 @@ def _parse_terraform_lock_hcl(path: Path) -> list[dict]:
    return entries


-_LOCKFILE_PARSERS = {
+def _parse_ansible_requirements(path: Path) -> list[dict]:
+    """Parse ansible/requirements.yml — collections and roles from Ansible Galaxy."""
+    if not _YAML_AVAILABLE:
+        print(f"Warning: PyYAML not available; skipping {path}", file=sys.stderr)
+        return []
+
+    try:
+        data = yaml.safe_load(path.read_text())
+    except yaml.YAMLError as e:
+        print(f"Warning: cannot parse {path}: {e}", file=sys.stderr)
+        return []
+
+    if not isinstance(data, dict):
+        return []
+
+    entries = []
+
+    for item in data.get("collections", []) or []:
+        if isinstance(item, str):
+            name, version = item, None
+        elif isinstance(item, dict):
+            name = item.get("name", "")
+            version = str(item.get("version", "")) if item.get("version") else None
+        else:
+            continue
+        if name:
+            entries.append({
+                "package_name": name,
+                "package_version": version,
+                "ecosystem": "ansible",
+                "license_spdx": None,
+                "is_direct": True,
+                "is_dev": False,
+            })
+
+    for item in data.get("roles", []) or []:
+        if isinstance(item, str):
+            name, version = item, None
+        elif isinstance(item, dict):
+            name = item.get("name", item.get("src", ""))
+            version = str(item.get("version", "")) if item.get("version") else None
+        else:
+            continue
+        if name:
+            entries.append({
+                "package_name": name,
+                "package_version": version,
+                "ecosystem": "ansible",
+                "license_spdx": None,
+                "is_direct": True,
+                "is_dev": False,
+            })
+
+    return entries
+
+
+def _parse_sbom_tools_yaml(path: Path) -> list[dict]:
+    """Parse sbom-tools.yaml — agent-generated tool manifest at repo root."""
+    if not _YAML_AVAILABLE:
+        print(f"Warning: PyYAML not available; skipping {path}", file=sys.stderr)
+        return []
+
+    try:
+        data = yaml.safe_load(path.read_text())
+    except yaml.YAMLError as e:
+        print(f"Warning: cannot parse {path}: {e}", file=sys.stderr)
+        return []
+
+    if not isinstance(data, dict):
+        return []
+
+    entries = []
+    valid_ecosystems = {
+        "python", "node", "rust", "go", "java",
+        "terraform", "ansible", "tool", "other",
+    }
+
+    for item in data.get("tools", []) or []:
+        if not isinstance(item, dict):
+            continue
+        name = item.get("name", "")
+        version = str(item.get("version", "")) if item.get("version") else None
+        if version == "unknown":
+            print(f"  Warning: tool '{name}' has version=unknown — flagged for review", file=sys.stderr)
+            version = None
+        ecosystem = item.get("ecosystem", "tool")
+        if ecosystem not in valid_ecosystems:
+            print(f"  Warning: unknown ecosystem '{ecosystem}' for '{name}'; using 'tool'", file=sys.stderr)
+            ecosystem = "tool"
+        license_spdx = item.get("license_spdx") or None
+        entries.append({
+            "package_name": name,
+            "package_version": version,
+            "ecosystem": ecosystem,
+            "license_spdx": license_spdx,
+            "is_direct": bool(item.get("is_direct", True)),
+            "is_dev": bool(item.get("is_dev", False)),
+        })
+
+    return entries
+
+
+# ---------------------------------------------------------------------------
+# Detection helpers
+# ---------------------------------------------------------------------------
+
+# Filename → parser for standard lockfiles (detected by filename anywhere in tree)
+_LOCKFILE_PARSERS: dict[str, object] = {
    "uv.lock": _parse_uv_lock,
    "requirements.txt": _parse_requirements_txt,
    "package-lock.json": _parse_package_lock_json,
@@ -234,6 +346,47 @@ _SKIP_DIRS = {
 }


+def detect_all(repo_path: Path) -> list[tuple[Path, str, object]]:
+    """Scan repo_path and return all discovered dependency sources.
+
+    Returns list of (path, label, parser_fn) tuples covering:
+    - Standard lockfiles (anywhere in tree)
+    - Ansible requirements files (in ansible/ subdirs)
+    - sbom-tools.yaml at repo root
+    """
+    found: list[tuple[Path, str, object]] = []
+    seen_paths: set[Path] = set()
+
+    # Walk tree for all source types
+    for dirpath, dirnames, filenames in os.walk(repo_path):
+        dirnames[:] = sorted(d for d in dirnames if d not in _SKIP_DIRS)
+        dirpath_p = Path(dirpath)
+
+        # Standard lockfiles
+        for fname, parser in _LOCKFILE_PARSERS.items():
+            if fname in filenames:
+                p = dirpath_p / fname
+                if p not in seen_paths:
+                    found.append((p, fname, parser))
+                    seen_paths.add(p)
+
+        # Ansible requirements files — only under directories named "ansible"
+        if dirpath_p.name == "ansible":
+            for fname in ("requirements.yml", "requirements.yaml"):
+                if fname in filenames:
+                    p = dirpath_p / fname
+                    if p not in seen_paths:
+                        found.append((p, f"ansible/{fname}", _parse_ansible_requirements))
+                        seen_paths.add(p)
+
+    # sbom-tools.yaml at repo root only
+    tools_manifest = repo_path / "sbom-tools.yaml"
+    if tools_manifest.exists() and tools_manifest not in seen_paths:
+        found.append((tools_manifest, "sbom-tools.yaml", _parse_sbom_tools_yaml))
+
+    return found
+
+
 def detect_lockfile(repo_path: Path) -> tuple[Path, str] | None:
    """Return (lockfile_path, filename) for the first recognised lockfile at repo root."""
    for name in _LOCKFILE_PARSERS:
@@ -244,7 +397,10 @@ def detect_lockfile(repo_path: Path) -> tuple[Path, str] | None:


 def detect_lockfiles_recursive(repo_path: Path) -> list[Path]:
-    """Walk repo_path and return all recognised lockfiles, skipping non-dep dirs."""
+    """Walk repo_path and return all recognised lockfiles, skipping non-dep dirs.
+
+    Kept for backwards compatibility; prefer detect_all() for new code.
+    """
    found: list[Path] = []
    for dirpath, dirnames, filenames in os.walk(repo_path):
        dirnames[:] = sorted(d for d in dirnames if d not in _SKIP_DIRS)
@@ -292,52 +448,47 @@ def post_ingest(api_base: str, repo_slug: str, entries: list[dict]) -> dict:
 # ---------------------------------------------------------------------------

 def main() -> None:
-    parser = argparse.ArgumentParser(description="Ingest a repo's lockfiles into the State Hub SBOM store.")
+    parser = argparse.ArgumentParser(
+        description="Ingest a repo's lockfiles and tool manifests into the State Hub SBOM store."
+    )
    parser.add_argument("--repo", required=True, help="Managed-repo slug (e.g. 'the-custodian')")
    parser.add_argument("--lockfile", action="append", dest="lockfiles",
                        metavar="PATH", help="Path to a specific lockfile (repeatable)")
    parser.add_argument("--repo-path", default=".", help="Repo root for auto-detection/scan (default: cwd)")
    parser.add_argument("--scan", action="store_true",
-                        help="Recursively find ALL lockfiles under --repo-path (handles multi-ecosystem repos)")
+                        help="Recursively find ALL lockfiles under --repo-path (deprecated; now default behaviour)")
    parser.add_argument("--api-base", default=API_BASE, help="State Hub API base URL")
    parser.add_argument("--dry-run", action="store_true", help="Parse only — do not submit")
    args = parser.parse_args()

    repo_root = Path(args.repo_path).resolve()
-    lockfile_paths: list[Path] = []
+    all_entries: list[dict] = []

    if args.lockfiles:
-        lockfile_paths = [Path(lf).resolve() for lf in args.lockfiles]
-    elif args.scan:
-        lockfile_paths = detect_lockfiles_recursive(repo_root)
-        if not lockfile_paths:
-            print(f"No lockfiles found under '{repo_root}'.", file=sys.stderr)
-            sys.exit(1)
-        print(f"Scan found {len(lockfile_paths)} lockfile(s):")
-        for lf in lockfile_paths:
-            print(f"  {lf.relative_to(repo_root) if lf.is_relative_to(repo_root) else lf}")
+        # Explicit paths: parse each, detect parser by filename
+        for lf_str in args.lockfiles:
+            lf = Path(lf_str).resolve()
+            parsed = parse_lockfile(lf)
+            rel = lf.relative_to(repo_root) if lf.is_relative_to(repo_root) else lf
+            print(f"  {rel}: {len(parsed)} packages")
+            all_entries.extend(parsed)
    else:
-        found = detect_lockfile(repo_root)
-        if not found:
+        # Comprehensive auto-detection: all mechanisms in one scan
+        sources = detect_all(repo_root)
+        if not sources:
            print(
-                f"No recognised lockfile found in '{repo_root}'. "
-                f"Supported: {', '.join(_LOCKFILE_PARSERS)}. "
-                "Use --scan to search subdirectories.",
+                f"No recognised dependency sources found in '{repo_root}'.",
                file=sys.stderr,
            )
            sys.exit(1)
-        lockfile_path, _ = found
-        print(f"Auto-detected: {lockfile_path}")
-        lockfile_paths = [lockfile_path]

-    all_entries: list[dict] = []
-    for lf in lockfile_paths:
-        parsed = parse_lockfile(lf)
-        rel = lf.relative_to(repo_root) if lf.is_relative_to(repo_root) else lf
-        print(f"  {rel}: {len(parsed)} packages")
-        all_entries.extend(parsed)
+        for src_path, label, parser_fn in sources:
+            parsed = parser_fn(src_path)
+            rel = src_path.relative_to(repo_root) if src_path.is_relative_to(repo_root) else src_path
+            print(f"  {label} ({rel}): {len(parsed)} entries")
+            all_entries.extend(parsed)

-    print(f"Total: {len(all_entries)} packages across {len(lockfile_paths)} lockfile(s)")
+    print(f"Total: {len(all_entries)} entries")

    if args.dry_run:
        print(json.dumps(all_entries[:5], indent=2))
--- a/state-hub/tests/test_ingest_sbom.py
+++ b/state-hub/tests/test_ingest_sbom.py
@@ -0,0 +1,397 @@
+"""Unit tests for ingest_sbom.py parsers and auto-detection."""
+from __future__ import annotations
+
+import json
+import sys
+import textwrap
+from pathlib import Path
+
+import pytest
+
+# Make scripts/ importable
+sys.path.insert(0, str(Path(__file__).parent.parent / "scripts"))
+import ingest_sbom as sb
+
+
+# ---------------------------------------------------------------------------
+# Terraform parser
+# ---------------------------------------------------------------------------
+
+TERRAFORM_LOCK = textwrap.dedent("""\
+    provider "registry.terraform.io/hashicorp/template" {
+      version     = "2.2.0"
+      constraints = ">= 2.0.0"
+      hashes = [
+        "h1:abc123",
+      ]
+    }
+
+    provider "registry.terraform.io/hetznercloud/hcloud" {
+      version     = "1.52.0"
+      constraints = ">= 1.40.0"
+    }
+""")
+
+
+def test_terraform_parser_ecosystem(tmp_path):
+    lock = tmp_path / ".terraform.lock.hcl"
+    lock.write_text(TERRAFORM_LOCK)
+    entries = sb._parse_terraform_lock_hcl(lock)
+    assert len(entries) == 2
+    for e in entries:
+        assert e["ecosystem"] == "terraform", f"expected terraform, got {e['ecosystem']}"
+    names = {e["package_name"] for e in entries}
+    assert "registry.terraform.io/hashicorp/template" in names
+    assert "registry.terraform.io/hetznercloud/hcloud" in names
+
+
+def test_terraform_parser_versions(tmp_path):
+    lock = tmp_path / ".terraform.lock.hcl"
+    lock.write_text(TERRAFORM_LOCK)
+    entries = sb._parse_terraform_lock_hcl(lock)
+    by_name = {e["package_name"]: e for e in entries}
+    assert by_name["registry.terraform.io/hashicorp/template"]["package_version"] == "2.2.0"
+    assert by_name["registry.terraform.io/hetznercloud/hcloud"]["package_version"] == "1.52.0"
+
+
+def test_terraform_parser_is_direct(tmp_path):
+    lock = tmp_path / ".terraform.lock.hcl"
+    lock.write_text(TERRAFORM_LOCK)
+    entries = sb._parse_terraform_lock_hcl(lock)
+    assert all(e["is_direct"] for e in entries)
+
+
+def test_terraform_parser_empty(tmp_path):
+    lock = tmp_path / ".terraform.lock.hcl"
+    lock.write_text("# no providers\n")
+    entries = sb._parse_terraform_lock_hcl(lock)
+    assert entries == []
+
+
+# ---------------------------------------------------------------------------
+# Ansible Galaxy parser
+# ---------------------------------------------------------------------------
+
+ANSIBLE_REQUIREMENTS_FULL = textwrap.dedent("""\
+    collections:
+      - name: community.general
+        version: "9.5.0"
+      - name: ansible.posix
+        version: "1.6.0"
+      - community.crypto
+
+    roles:
+      - name: geerlingguy.docker
+        version: "6.1.0"
+      - geerlingguy.pip
+""")
+
+ANSIBLE_REQUIREMENTS_EMPTY = textwrap.dedent("""\
+    collections: []
+    roles: []
+""")
+
+ANSIBLE_REQUIREMENTS_COLLECTIONS_ONLY = textwrap.dedent("""\
+    collections:
+      - name: community.general
+        version: "9.0.0"
+""")
+
+
+def test_ansible_parser_collections_and_roles(tmp_path):
+    req = tmp_path / "requirements.yml"
+    req.write_text(ANSIBLE_REQUIREMENTS_FULL)
+    entries = sb._parse_ansible_requirements(req)
+    assert len(entries) == 5
+    names = {e["package_name"] for e in entries}
+    assert "community.general" in names
+    assert "ansible.posix" in names
+    assert "community.crypto" in names
+    assert "geerlingguy.docker" in names
+    assert "geerlingguy.pip" in names
+
+
+def test_ansible_parser_ecosystem(tmp_path):
+    req = tmp_path / "requirements.yml"
+    req.write_text(ANSIBLE_REQUIREMENTS_FULL)
+    entries = sb._parse_ansible_requirements(req)
+    for e in entries:
+        assert e["ecosystem"] == "ansible"
+
+
+def test_ansible_parser_versions(tmp_path):
+    req = tmp_path / "requirements.yml"
+    req.write_text(ANSIBLE_REQUIREMENTS_FULL)
+    entries = sb._parse_ansible_requirements(req)
+    by_name = {e["package_name"]: e for e in entries}
+    assert by_name["community.general"]["package_version"] == "9.5.0"
+    assert by_name["ansible.posix"]["package_version"] == "1.6.0"
+    assert by_name["community.crypto"]["package_version"] is None  # no version specified
+    assert by_name["geerlingguy.docker"]["package_version"] == "6.1.0"
+    assert by_name["geerlingguy.pip"]["package_version"] is None
+
+
+def test_ansible_parser_is_direct(tmp_path):
+    req = tmp_path / "requirements.yml"
+    req.write_text(ANSIBLE_REQUIREMENTS_FULL)
+    entries = sb._parse_ansible_requirements(req)
+    assert all(e["is_direct"] for e in entries)
+
+
+def test_ansible_parser_empty(tmp_path):
+    req = tmp_path / "requirements.yml"
+    req.write_text(ANSIBLE_REQUIREMENTS_EMPTY)
+    entries = sb._parse_ansible_requirements(req)
+    assert entries == []
+
+
+def test_ansible_parser_collections_only(tmp_path):
+    req = tmp_path / "requirements.yml"
+    req.write_text(ANSIBLE_REQUIREMENTS_COLLECTIONS_ONLY)
+    entries = sb._parse_ansible_requirements(req)
+    assert len(entries) == 1
+    assert entries[0]["package_name"] == "community.general"
+
+
+def test_ansible_parser_yaml_extension(tmp_path):
+    """Both .yml and .yaml extensions must work."""
+    req = tmp_path / "requirements.yaml"
+    req.write_text(ANSIBLE_REQUIREMENTS_COLLECTIONS_ONLY)
+    entries = sb._parse_ansible_requirements(req)
+    assert len(entries) == 1
+
+
+def test_ansible_parser_invalid_yaml(tmp_path, capsys):
+    req = tmp_path / "requirements.yml"
+    req.write_text("collections: [unclosed")
+    entries = sb._parse_ansible_requirements(req)
+    assert entries == []
+    captured = capsys.readouterr()
+    assert "Warning" in captured.err
+
+
+# ---------------------------------------------------------------------------
+# sbom-tools.yaml parser
+# ---------------------------------------------------------------------------
+
+SBOM_TOOLS_YAML = textwrap.dedent("""\
+    tools:
+      - name: ansible
+        version: "12.3.0"
+        ecosystem: ansible
+        license_spdx: GPL-3.0-only
+        is_direct: true
+        is_dev: false
+      - name: terraform
+        version: "1.10.5"
+        ecosystem: terraform
+        license_spdx: BSL-1.1
+        is_direct: true
+        is_dev: false
+      - name: helm
+        version: "3.17.1"
+        ecosystem: tool
+        license_spdx: Apache-2.0
+        is_direct: true
+        is_dev: false
+      - name: k3s
+        version: unknown
+        ecosystem: other
+        license_spdx: Apache-2.0
+        is_direct: true
+        is_dev: false
+""")
+
+SBOM_TOOLS_YAML_MINIMAL = textwrap.dedent("""\
+    tools:
+      - name: kubectl
+        ecosystem: tool
+""")
+
+
+def test_sbom_tools_parser_basic(tmp_path):
+    manifest = tmp_path / "sbom-tools.yaml"
+    manifest.write_text(SBOM_TOOLS_YAML)
+    entries = sb._parse_sbom_tools_yaml(manifest)
+    assert len(entries) == 4
+    names = {e["package_name"] for e in entries}
+    assert {"ansible", "terraform", "helm", "k3s"} == names
+
+
+def test_sbom_tools_parser_ecosystems(tmp_path):
+    manifest = tmp_path / "sbom-tools.yaml"
+    manifest.write_text(SBOM_TOOLS_YAML)
+    entries = sb._parse_sbom_tools_yaml(manifest)
+    by_name = {e["package_name"]: e for e in entries}
+    assert by_name["ansible"]["ecosystem"] == "ansible"
+    assert by_name["terraform"]["ecosystem"] == "terraform"
+    assert by_name["helm"]["ecosystem"] == "tool"
+    assert by_name["k3s"]["ecosystem"] == "other"
+
+
+def test_sbom_tools_parser_licenses(tmp_path):
+    manifest = tmp_path / "sbom-tools.yaml"
+    manifest.write_text(SBOM_TOOLS_YAML)
+    entries = sb._parse_sbom_tools_yaml(manifest)
+    by_name = {e["package_name"]: e for e in entries}
+    assert by_name["ansible"]["license_spdx"] == "GPL-3.0-only"
+    assert by_name["terraform"]["license_spdx"] == "BSL-1.1"
+    assert by_name["helm"]["license_spdx"] == "Apache-2.0"
+
+
+def test_sbom_tools_parser_unknown_version_becomes_none(tmp_path, capsys):
+    """version: unknown must be converted to None and emit a warning."""
+    manifest = tmp_path / "sbom-tools.yaml"
+    manifest.write_text(SBOM_TOOLS_YAML)
+    entries = sb._parse_sbom_tools_yaml(manifest)
+    by_name = {e["package_name"]: e for e in entries}
+    assert by_name["k3s"]["package_version"] is None
+    captured = capsys.readouterr()
+    assert "unknown" in captured.err
+
+
+def test_sbom_tools_parser_minimal_entry(tmp_path):
+    """Only 'name' and 'ecosystem' required; version and license default to None."""
+    manifest = tmp_path / "sbom-tools.yaml"
+    manifest.write_text(SBOM_TOOLS_YAML_MINIMAL)
+    entries = sb._parse_sbom_tools_yaml(manifest)
+    assert len(entries) == 1
+    e = entries[0]
+    assert e["package_name"] == "kubectl"
+    assert e["ecosystem"] == "tool"
+    assert e["package_version"] is None
+    assert e["license_spdx"] is None
+    assert e["is_direct"] is True
+    assert e["is_dev"] is False
+
+
+def test_sbom_tools_parser_invalid_ecosystem_falls_back(tmp_path, capsys):
+    manifest = tmp_path / "sbom-tools.yaml"
+    manifest.write_text("tools:\n  - name: foo\n    ecosystem: nonsense\n")
+    entries = sb._parse_sbom_tools_yaml(manifest)
+    assert entries[0]["ecosystem"] == "tool"
+    captured = capsys.readouterr()
+    assert "Warning" in captured.err
+
+
+def test_sbom_tools_parser_empty_tools(tmp_path):
+    manifest = tmp_path / "sbom-tools.yaml"
+    manifest.write_text("tools: []\n")
+    entries = sb._parse_sbom_tools_yaml(manifest)
+    assert entries == []
+
+
+def test_sbom_tools_parser_invalid_yaml(tmp_path, capsys):
+    manifest = tmp_path / "sbom-tools.yaml"
+    manifest.write_text("tools: {bad yaml: [unclosed")
+    entries = sb._parse_sbom_tools_yaml(manifest)
+    assert entries == []
+    captured = capsys.readouterr()
+    assert "Warning" in captured.err
+
+
+# ---------------------------------------------------------------------------
+# detect_all — comprehensive multi-parser scan
+# ---------------------------------------------------------------------------
+
+def test_detect_all_uv_lock(tmp_path):
+    (tmp_path / "uv.lock").write_text("[[package]]\nname = \"typer\"\nversion = \"0.12.0\"\n")
+    sources = sb.detect_all(tmp_path)
+    labels = {label for _, label, _ in sources}
+    assert "uv.lock" in labels
+
+
+def test_detect_all_terraform_lock(tmp_path):
+    tf_dir = tmp_path / "terraform" / "hetzner"
+    tf_dir.mkdir(parents=True)
+    (tf_dir / ".terraform.lock.hcl").write_text(
+        'provider "registry.terraform.io/hetznercloud/hcloud" {\n  version = "1.52.0"\n}\n'
+    )
+    sources = sb.detect_all(tmp_path)
+    labels = {label for _, label, _ in sources}
+    assert ".terraform.lock.hcl" in labels
+
+
+def test_detect_all_ansible_requirements(tmp_path):
+    ansible_dir = tmp_path / "ansible"
+    ansible_dir.mkdir()
+    (ansible_dir / "requirements.yml").write_text("collections:\n  - name: community.general\n")
+    sources = sb.detect_all(tmp_path)
+    labels = {label for _, label, _ in sources}
+    assert "ansible/requirements.yml" in labels
+
+
+def test_detect_all_sbom_tools_yaml(tmp_path):
+    (tmp_path / "sbom-tools.yaml").write_text("tools:\n  - name: helm\n    ecosystem: tool\n")
+    sources = sb.detect_all(tmp_path)
+    labels = {label for _, label, _ in sources}
+    assert "sbom-tools.yaml" in labels
+
+
+def test_detect_all_multi_ecosystem(tmp_path):
+    """A repo with Python + Terraform + Ansible + tools manifest yields all four."""
+    # Python
+    (tmp_path / "uv.lock").write_text("[[package]]\nname = \"typer\"\nversion = \"0.12.0\"\n")
+    # Terraform
+    tf_dir = tmp_path / "terraform"
+    tf_dir.mkdir()
+    (tf_dir / ".terraform.lock.hcl").write_text(
+        'provider "registry.terraform.io/hashicorp/null" {\n  version = "3.2.3"\n}\n'
+    )
+    # Ansible
+    ansible_dir = tmp_path / "ansible"
+    ansible_dir.mkdir()
+    (ansible_dir / "requirements.yml").write_text("collections:\n  - name: ansible.posix\n    version: \"1.6.0\"\n")
+    # Tool manifest
+    (tmp_path / "sbom-tools.yaml").write_text("tools:\n  - name: helm\n    ecosystem: tool\n    version: \"3.17.1\"\n")
+
+    sources = sb.detect_all(tmp_path)
+    labels = {label for _, label, _ in sources}
+    assert "uv.lock" in labels
+    assert ".terraform.lock.hcl" in labels
+    assert "ansible/requirements.yml" in labels
+    assert "sbom-tools.yaml" in labels
+
+    # Parse all and verify merged entries
+    all_entries = []
+    for path, label, parser_fn in sources:
+        all_entries.extend(parser_fn(path))
+
+    ecosystems = {e["ecosystem"] for e in all_entries}
+    assert "python" in ecosystems
+    assert "terraform" in ecosystems
+    assert "ansible" in ecosystems
+    assert "tool" in ecosystems
+
+
+def test_detect_all_skips_venv(tmp_path):
+    """Lockfiles inside .venv must be ignored."""
+    venv_dir = tmp_path / ".venv" / "lib"
+    venv_dir.mkdir(parents=True)
+    (venv_dir / "requirements.txt").write_text("requests==2.31.0\n")
+    sources = sb.detect_all(tmp_path)
+    paths = {str(p) for p, _, _ in sources}
+    assert not any(".venv" in p for p in paths)
+
+
+def test_detect_all_ansible_req_only_in_ansible_dir(tmp_path):
+    """requirements.yml at repo root (not in ansible/) should not be picked up as ansible."""
+    (tmp_path / "requirements.yml").write_text("collections:\n  - name: community.general\n")
+    sources = sb.detect_all(tmp_path)
+    labels = {label for _, label, _ in sources}
+    # Should NOT be detected since it's not under an 'ansible/' directory
+    assert "ansible/requirements.yml" not in labels
+    assert "ansible/requirements.yaml" not in labels
+
+
+def test_detect_all_no_duplicates(tmp_path):
+    """Same file should not appear twice."""
+    (tmp_path / "uv.lock").write_text("[[package]]\nname = \"x\"\nversion = \"1.0\"\n")
+    sources = sb.detect_all(tmp_path)
+    paths = [p for p, _, _ in sources]
+    assert len(paths) == len(set(paths))
+
+
+def test_detect_all_empty_repo(tmp_path):
+    sources = sb.detect_all(tmp_path)
+    assert sources == []
--- a/workplans/CUST-WP-0013-sbom-infra-expansion.md
+++ b/workplans/CUST-WP-0013-sbom-infra-expansion.md
@@ -0,0 +1,386 @@
+---
+id: CUST-WP-0013
+type: workplan
+title: "SBOM Infrastructure Expansion"
+domain: custodian
+repo: the-custodian
+status: completed
+owner: custodian
+topic_slug: custodian
+state_hub_workstream_id: f4ba84c8-4d47-492d-b65e-73b157271a2b
+created: "2026-03-12"
+updated: "2026-03-12"
+---
+
+# CUST-WP-0013 — SBOM Infrastructure Expansion
+
+**Scope:** Extend SBOM capture beyond Python packages to cover Terraform providers,
+Ansible Galaxy collections, and system-level tools (Ansible, Terraform, Helm, k3s,
+cloud-init, etc.). Introduces an agent-assisted tool manifest capture workflow,
+new ecosystem enum values, comprehensive auto-detection in `ingest_sbom.py`, and
+delivers full SBOM coverage for `railiance-infra` and `railiance-cluster`.
+
+**Drives:** Licence risk visibility across the full dependency graph, not just
+language-level packages.
+
+---
+
+## Design Decisions
+
+### Tool manifest: agent-generated, not hand-written
+
+System tools (Ansible, Terraform, Helm, k3s, etc.) live outside any lockfile —
+they are provisioned, not installed by a package manager. Rather than asking
+operators to maintain a hand-written manifest, the SBOM capture agent inspects
+the repo and generates/updates `sbom-tools.yaml` automatically.
+
+The agent prompt (`state-hub/prompts/sbom-capture-agent.md`) is parameterised
+per repo. It reads the repo's CLAUDE.md, Makefile, README, CI configs, version
+pins, and provisioning files, then emits a structured `sbom-tools.yaml` with
+tool name, version, ecosystem, SPDX licence, and directness flags.
+
+A thin wrapper script (`state-hub/scripts/capture_sbom_tools.py`) invokes the
+agent prompt via `claude -p` (or prints it for manual use) and writes the result
+to `<repo-root>/sbom-tools.yaml`.
+
+### Comprehensive ingest: all mechanisms per repo
+
+`make ingest-sbom REPO=<slug>` must run all applicable parsers, not just
+whichever lockfile happens to be auto-detected first. The updated auto-detection
+in `ingest_sbom.py` scans:
+
+1. Package manager lockfiles (`uv.lock`, `requirements.txt`, `package-lock.json`,
+   `yarn.lock`, `Cargo.lock`, `go.sum`)
+2. Terraform provider locks (`.terraform.lock.hcl`, anywhere in the tree)
+3. Ansible Galaxy manifests (`requirements.yml` / `requirements.yaml`, anywhere
+   in the tree under `ansible/`)
+4. Agent-generated tool manifest (`sbom-tools.yaml` at repo root)
+
+All parsers run and their results are merged into a single snapshot.
+
+---
+
+## Phase 1 — Schema: Ecosystem Enum Extension
+
+**Acceptance:** `terraform` and `ansible` are valid ecosystem values; existing
+`other` entries are unaffected; migration applies cleanly.
+
+### T01 — Alembic migration: add terraform and ansible enum values
+
+```task
+id: CUST-WP-0013-T01
+state_hub_task_id: c0b6edc4-86ab-4cee-88a8-6c66fb81adee
+status: done
+priority: high
+```
+
+Add `terraform` and `ansible` to the `Ecosystem` enum in the DB. Check whether
+the column uses a native PostgreSQL ENUM type (requiring `ALTER TYPE`) or a
+`String` column (requiring no migration). Write the migration accordingly.
+Also add `tool` as a catch-all for tool-manifest entries that don't fit a
+named ecosystem.
+
+---
+
+## Phase 2 — Parser Improvements in ingest_sbom.py
+
+**Acceptance:** `--dry-run` on railiance-infra shows terraform providers and
+ansible collections correctly labelled; tool manifest entries appear with the
+declared ecosystem.
+
+### T02 — Promote Terraform parser: other → terraform ecosystem
+
+```task
+id: CUST-WP-0013-T02
+state_hub_task_id: 7686bccd-022c-4e30-8081-c8487eb82253
+status: done
+priority: high
+```
+
+The `.terraform.lock.hcl` parser already exists in `ingest_sbom.py` but stores
+entries as `ecosystem="other"`. Change to `ecosystem="terraform"` after T01
+migration lands. Re-ingest any repos that previously ingested terraform entries
+as `other` to correct the label.
+
+### T03 — Implement Ansible Galaxy requirements.yml parser
+
+```task
+id: CUST-WP-0013-T03
+state_hub_task_id: 48658bdd-4d16-4be0-a87e-45df4f4901b0
+status: done
+priority: high
+```
+
+Parse `requirements.yml` / `requirements.yaml` files found in `ansible/`
+subdirectories. Standard format:
+
+```yaml
+collections:
+  - name: community.general
+    version: "9.5.0"
+roles:
+  - name: geerlingguy.docker
+    version: "6.x"
+```
+
+Store as `ecosystem="ansible"`, `is_direct=True`. Licence left `null` (Galaxy
+API lookup is deferred). Handle both `collections:` and `roles:` blocks.
+
+### T04 — Implement sbom-tools.yaml manifest parser
+
+```task
+id: CUST-WP-0013-T04
+state_hub_task_id: 4522ea04-134b-40ee-a7a2-ea0e4c1c061d
+status: done
+priority: high
+```
+
+Parse `sbom-tools.yaml` at the repo root (written by the capture agent). Schema:
+
+```yaml
+# Generated by sbom-capture-agent — review before committing
+tools:
+  - name: ansible
+    version: "12.3.0"
+    ecosystem: ansible        # or: terraform, other, python, etc.
+    license_spdx: GPL-3.0-only
+    is_direct: true
+    is_dev: false
+  - name: helm
+    version: "3.17.x"
+    ecosystem: other
+    license_spdx: Apache-2.0
+    is_direct: true
+    is_dev: false
+```
+
+Supports all existing ecosystem values plus `tool`. Pass entries through the
+same normalisation as lockfile entries. Skip entries with `version: unknown`
+with a warning (agent could not determine version).
+
+### T05 — Comprehensive auto-detection: all formats in one scan
+
+```task
+id: CUST-WP-0013-T05
+state_hub_task_id: cdda6bf2-2a44-4444-a04a-ac2fe2314923
+status: done
+priority: high
+```
+
+Refactor the `--repo-path` scan to discover and run all applicable parsers,
+not just the first match. Scan order:
+
+1. Walk tree for all `uv.lock`, `requirements.txt`, `package-lock.json`,
+   `yarn.lock`, `Cargo.lock`
+2. Walk tree for all `.terraform.lock.hcl`
+3. Walk tree for `ansible/requirements.yml` and `ansible/requirements.yaml`
+4. Check repo root for `sbom-tools.yaml`
+
+Merge all results into a single batch for the snapshot ingest call. Log a
+summary line per parser: `  <parser>: N packages from <path>`.
+
+### T06 — Unit tests for new parsers
+
+```task
+id: CUST-WP-0013-T06
+state_hub_task_id: fee37e66-8f41-4dba-995b-97fc66493caf
+status: done
+priority: medium
+```
+
+Add test fixtures and unit tests for:
+- Ansible Galaxy requirements.yml (collections + roles, version pinned and
+  unpinned)
+- sbom-tools.yaml (valid, missing version, unknown ecosystem)
+- Multi-parser scan: repo root with uv.lock + .terraform.lock.hcl +
+  sbom-tools.yaml produces merged results
+
+---
+
+## Phase 3 — SBOM Capture Agent
+
+**Acceptance:** `make capture-tools REPO=railiance-infra` produces a reviewed
+`sbom-tools.yaml` that correctly identifies Ansible, Terraform, Helm, and other
+declared tools with versions and SPDX licences.
+
+### T07 — Write SBOM capture agent prompt
+
+```task
+id: CUST-WP-0013-T07
+state_hub_task_id: a3b919b5-63b0-44f7-a048-ebfae603ef7b
+status: done
+priority: high
+```
+
+Write `state-hub/prompts/sbom-capture-agent.md` — a Claude agent prompt
+parameterised with `{repo_slug}` and `{repo_path}`. The prompt instructs the
+agent to:
+
+1. Read `CLAUDE.md`, `Makefile`, `README.md`, `pyproject.toml`, `.tool-versions`,
+   CI configs, Dockerfiles, and provisioning files in `{repo_path}`
+2. Identify all system-level tools: name, version (from version pins, Makefile
+   vars, or documented prerequisites), ecosystem, SPDX licence
+3. Identify indirect/transitive tool deps (e.g. Ansible → Python; Terraform →
+   provider plugins already captured by `.terraform.lock.hcl`)
+4. Emit a well-formed `sbom-tools.yaml` with a comment header noting generation
+   date and confidence level per entry (`# confidence: high/medium/low`)
+5. Flag any tools where version could not be determined (`version: unknown`) for
+   human review
+
+The prompt must not hallucinate versions — it must derive them from evidence in
+the repo or mark them unknown.
+
+### T08 — Implement capture_sbom_tools.py
+
+```task
+id: CUST-WP-0013-T08
+state_hub_task_id: 9593dca7-e713-4d7a-b4f2-c5333ae0b3d2
+status: done
+priority: high
+```
+
+Write `state-hub/scripts/capture_sbom_tools.py`:
+
+- Accepts `--repo SLUG` and `--repo-path PATH`
+- Resolves repo path from slug via the state-hub API if `--repo-path` is omitted
+- Loads the agent prompt from `prompts/sbom-capture-agent.md`, substitutes
+  `{repo_slug}` and `{repo_path}`
+- Invokes `claude -p "<prompt>"` (non-interactive) and captures stdout
+- Parses the YAML block from the response
+- Writes or updates `<repo-path>/sbom-tools.yaml`
+- Prints a diff of changes if the file already exists
+- `--dry-run` flag: print the prompt and diff without writing
+
+### T09 — Add make capture-tools target
+
+```task
+id: CUST-WP-0013-T09
+state_hub_task_id: 6948e1d2-9c97-4709-bdb0-4b6ded700a22
+status: done
+priority: medium
+```
+
+Add to `state-hub/Makefile`:
+
+```makefile
+capture-tools:  ## Run SBOM capture agent for a repo (REPO=slug, REPO_PATH=path)
+	uv run python scripts/capture_sbom_tools.py --repo $(REPO) $(if $(REPO_PATH),--repo-path $(REPO_PATH),)
+```
+
+Also update `make ingest-sbom` to note that `capture-tools` should be run first
+for repos that have system-level tool dependencies.
+
+---
+
+## Phase 4 — Ingest railiance-infra
+
+**Acceptance:** `make ingest-sbom REPO=railiance-infra` shows terraform providers,
+ansible collections, and tool manifest entries in one snapshot.
+
+### T10 — Capture tools manifest for railiance-infra
+
+```task
+id: CUST-WP-0013-T10
+state_hub_task_id: 99b23998-5129-4777-9d42-7bee5981cdbb
+status: done
+priority: medium
+```
+
+Run `make capture-tools REPO=railiance-infra`. Review the generated
+`railiance-infra/sbom-tools.yaml` — verify Ansible, Terraform, cloud-init, goss,
+and any other tools with their versions and licences. Correct any `unknown`
+versions by consulting the repo. Commit the file.
+
+### T11 — Ingest railiance-infra
+
+```task
+id: CUST-WP-0013-T11
+state_hub_task_id: bb516909-f903-48ce-b60b-a24245e7382e
+status: done
+priority: medium
+```
+
+Run `make ingest-sbom REPO=railiance-infra REPO_PATH=~/railiance-infra`. Verify
+the snapshot contains:
+- Terraform providers (from `.terraform.lock.hcl`)
+- Ansible Galaxy collections (from `ansible/requirements.yaml`)
+- System tools (from `sbom-tools.yaml`)
+
+Check the licence report for any copyleft or BSL flags.
+
+---
+
+## Phase 5 — Ingest railiance-cluster
+
+**Acceptance:** railiance-cluster SBOM covers both Python packages (uv.lock) and
+system tools in a single snapshot.
+
+### T12 — Capture tools manifest for railiance-cluster
+
+```task
+id: CUST-WP-0013-T12
+state_hub_task_id: 7a890f1a-da9f-4e6d-86a7-4fd1aefd5b3f
+status: done
+priority: medium
+```
+
+Run `make capture-tools REPO=railiance-cluster`. Review the generated
+`railiance-cluster/sbom-tools.yaml` — verify Helm, kubectl, k3s, and any other
+operational tools. Commit the file.
+
+### T13 — Re-ingest railiance-cluster
+
+```task
+id: CUST-WP-0013-T13
+state_hub_task_id: 789dbe93-011a-4470-9fec-ebf249cd7134
+status: done
+priority: medium
+```
+
+Run `make ingest-sbom REPO=railiance-cluster REPO_PATH=~/railiance-cluster`.
+Verify the snapshot merges uv.lock (Python packages including ansible-core) and
+sbom-tools.yaml entries into one coherent snapshot. Confirm ansible-core GPL-3.0
+flag appears in the licence report.
+
+---
+
+## Phase 6 — Convention Documentation
+
+**Acceptance:** A developer reading the SBOM convention doc knows exactly how to
+add a new repo to SBOM coverage.
+
+### T14 — Document SBOM capture convention in canon/standards
+
+```task
+id: CUST-WP-0013-T14
+state_hub_task_id: dc3bb2a3-882e-4dd7-ab7c-8b1e88279a7d
+status: done
+priority: low
+```
+
+Write `canon/standards/sbom-convention_v0.1.md` documenting:
+- The four capture mechanisms and when each applies
+- The `sbom-tools.yaml` schema (with confidence annotation convention)
+- The `make capture-tools` → review → commit → `make ingest-sbom` workflow
+- Licence risk thresholds: copyleft = flag for review; BSL = flag for review;
+  null licence = acceptable for infra tools if well-known open source
+
+---
+
+## Licence Risk Preview
+
+Based on known tool licences, expect these flags once ingested:
+
+| Tool / Package | Licence | Risk level |
+|---|---|---|
+| ansible-core | GPL-3.0-only | Copyleft — flag (ops toolchain, not shipped) |
+| terraform ≥ 1.5.6 | BSL-1.1 | Non-OSI — flag for review |
+| hashicorp providers | BSL-1.1 | Same |
+| community.general | GPL-3.0 | Copyleft — flag (ops toolchain) |
+| Helm | Apache-2.0 | Clean |
+| k3s | Apache-2.0 | Clean |
+| cloud-init | Apache-2.0 / GPL-3.0 | Mixed — check version |
+| goss | Apache-2.0 | Clean |
+
+All copyleft/BSL entries here are **operational toolchain** dependencies, not
+shipped code — risk is low but worth tracking for compliance awareness.