feat(sbom): scan mode, domain grouping dashboard, SBOM convention doc

- ingest_sbom.py: add --scan flag (recursive lockfile discovery) +
  --lockfile repeatable for explicit multi-file ingestion; skip
  .venv/node_modules/.git/dist/etc; Makefile gains SCAN= and REPO_PATH= vars
- sbom.md: add /domains/ fetch; domain-level summary table; per-repo
  accordion with details/summary; domain filter on package table; dual-
  licence false-positive note; +1 KPI card (Domains Covered)
- canon/standards/sbom-convention_v0.1.md: authoritative lockfile table,
  ingest workflow (single/scan/explicit), snapshot semantics, direct-vs-
  transitive caveats, licence governance + copyleft escalation, update
  cadence, multi-repo domain pattern, planned enhancements

First ingest: the-custodian — 420 pkgs (88 python + 332 node), 13 licence
groups, 1 copyleft flag (jszip dual-licensed MIT OR GPL-3.0-or-later)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
2026-03-01 16:15:40 +01:00
parent e471ed2cd5
commit 1c3c6ef27d
4 changed files with 450 additions and 29 deletions

View File

@@ -0,0 +1,253 @@
---
id: SBOM-CONV-001
type: standard
title: "SBOM Convention v0.1 — Dependency Tracking & Licence Governance"
domain: custodian
status: active
version: "0.1"
created: "2026-03-01"
updated: "2026-03-01"
---
# SBOM Convention v0.1 — Dependency Tracking & Licence Governance
## Purpose
This convention defines how every Custodian-registered project captures,
stores, and reports its software supply-chain inventory to the State Hub SBOM
store. It establishes:
- Which lockfiles are authoritative per ecosystem
- How to run SBOM ingestion (single-ecosystem and multi-ecosystem repos)
- How to keep the data current
- Licence governance rules and escalation thresholds
The State Hub SBOM store aggregates across all registered repos. The
dashboard (`/sbom`) provides domain-level and repo-level drill-down.
---
## 1. Authoritative Lockfiles per Ecosystem
| Ecosystem | Authoritative file | Notes |
|-----------|-------------------|-------|
| Python | `uv.lock` | Preferred. `requirements.txt` accepted as fallback |
| Node / npm | `package-lock.json` | Preferred. `yarn.lock` accepted |
| Rust | `Cargo.lock` | Auto-detected |
| Go | `go.sum` | *Not yet parsed — planned* |
| Java / JVM | `gradle.lockfile` / `pom.xml` | *Not yet parsed — planned* |
**Principle:** commit lockfiles to the repo. Lockfiles are the SBOM source
of truth; do not generate them at ingest time.
---
## 2. Repo Registration Prerequisite
Before SBOM data can be reported, the repo must be registered in the State Hub:
```bash
cd ~/the-custodian/state-hub
make add-repo DOMAIN=<domain-slug> SLUG=<repo-slug> NAME="<Display Name>" PATH=/absolute/path/to/repo
```
Check registered repos:
```bash
make list-repos
# or
curl -s http://127.0.0.1:8000/repos/ | python3 -m json.tool
```
---
## 3. SBOM Ingestion
### 3.1 Standard ingest (single lockfile at repo root)
```bash
cd ~/the-custodian/state-hub
make ingest-sbom REPO=<slug> REPO_PATH=/path/to/repo
```
The script auto-detects the first recognised lockfile at `REPO_PATH`.
### 3.2 Multi-ecosystem repos (recommended for complex repos)
Use `SCAN=1` to walk the repo tree and combine **all** lockfiles into a single
snapshot. Non-dep directories (`.venv`, `node_modules`, `.git`, `dist`, etc.)
are automatically skipped.
```bash
make ingest-sbom REPO=the-custodian SCAN=1 REPO_PATH=/home/worsch/the-custodian
```
This is the correct approach for repos that contain both a backend and a
frontend (e.g., a Python API + Node/Observable dashboard).
### 3.3 Explicit lockfile path
```bash
make ingest-sbom REPO=<slug> LOCKFILE=/path/to/specific/uv.lock
```
Multiple lockfiles can be passed by calling the script directly with repeated
`--lockfile` flags:
```bash
cd ~/the-custodian/state-hub
.venv/bin/python scripts/ingest_sbom.py \
--repo <slug> \
--lockfile /path/to/uv.lock \
--lockfile /path/to/package-lock.json
```
### 3.4 Dry run (inspect without submitting)
```bash
make ingest-sbom REPO=<slug> SCAN=1 REPO_PATH=/path/to/repo
# append: add --dry-run to the command, or run the script directly:
.venv/bin/python scripts/ingest_sbom.py --repo <slug> --scan --repo-path /path/to/repo --dry-run
```
---
## 4. Snapshot Semantics
Each `POST /sbom/ingest/` call **replaces** the entire previous snapshot for
that repo. This means:
- There is always exactly one snapshot per repo (the most recent ingest)
- Re-running ingest after a dependency update is idempotent — it simply
refreshes the data
- Historical snapshots are **not** retained (v0.1 scope; versioned history is
a planned extension)
The `last_sbom_at` timestamp on the managed_repo record indicates when the
last ingest ran.
---
## 5. Direct vs Transitive Dependencies
| Source | `is_direct` | Notes |
|--------|-------------|-------|
| `package-lock.json` | Accurate — npm `indirect` flag used | Dev packages also detected via `dev` flag |
| `yarn.lock` | `false` for all (yarn.lock doesn't distinguish) | Treat output as transitive |
| `uv.lock` | `false` for all (uv.lock doesn't distinguish direct from transitive) | |
| `requirements.txt` | `true` for all (every line is a direct dep) | |
| `Cargo.lock` | `false` for all (workspace member packages not yet distinguished) | |
**Governance implication:** `is_direct=true` entries receive stricter licence
scrutiny. Copyleft risk is reported specifically for `is_direct=true AND is_dev=false`.
---
## 6. Licence Governance
### 6.1 Copyleft detection
The following SPDX identifier substrings trigger a copyleft flag:
`GPL`, `AGPL`, `LGPL`, `EUPL`, `CDDL`, `MPL`
A copyleft flag on a **direct prod dependency** (`is_direct=true`, `is_dev=false`)
increments the `licence_risk_count` in the State Hub summary and triggers a
warning on the SBOM dashboard.
### 6.2 Dual-licensed packages
Packages with SPDX expressions like `(MIT OR GPL-3.0-or-later)` are flagged
**conservatively** — the presence of a copyleft identifier in the SPDX string
is sufficient to trigger the flag, regardless of the OR clause.
**Action required:** review flagged packages. If the non-copyleft licence is
used in practice, document this decision in a `contrib/` BR or FR artifact and
note it in the repo's CLAUDE.md.
### 6.3 Unknown licences
Packages with `license_spdx = null` are those whose lockfile did not contain
licence metadata (`uv.lock`, `yarn.lock`, `Cargo.lock` do not embed licence
info). These are listed in the dashboard but do not trigger risk flags.
To resolve unknowns, consult the package's registry page (PyPI, npm, crates.io)
and either accept the unknown status or enhance the ingest script.
### 6.4 Escalation
Per the Custodian Constitution, a copyleft direct prod dep **must be reviewed**
before the next production deployment. Record the decision via:
```
register_contribution(type="br", title="Licence review: <package>", ...)
```
or directly in `contrib/bug-reports/` using the BR template.
---
## 7. Keeping Data Current
### 7.1 When to re-run ingest
Re-run `make ingest-sbom` after any of the following:
- `uv add` / `uv remove` (Python)
- `npm install` / `npm update` (Node)
- `cargo add` / `cargo update` (Rust)
- Any lockfile regeneration
### 7.2 Recommended workflow integration
Add to your repo's CLAUDE.md (or developer runbook):
> After updating dependencies, run:
> ```bash
> cd ~/the-custodian/state-hub
> make ingest-sbom REPO=<your-slug> SCAN=1 REPO_PATH=<your-repo-path>
> ```
### 7.3 Verification
After ingest:
```bash
curl -s http://127.0.0.1:8000/sbom/<your-slug>/ | python3 -m json.tool | head -30
curl -s http://127.0.0.1:8000/sbom/report/licences/ | python3 -m json.tool
```
Or visit the State Hub dashboard → SBOM → By Repo to see the updated snapshot.
---
## 8. Multi-Repo Domains
When a domain has multiple repos (e.g., `api` + `frontend` + `infra`), each
repo should be registered separately and ingested separately:
```bash
make ingest-sbom REPO=myapp-api SCAN=1 REPO_PATH=/home/worsch/myapp
make ingest-sbom REPO=myapp-frontend SCAN=1 REPO_PATH=/home/worsch/myapp-frontend
```
The SBOM dashboard aggregates across all repos within a domain in the
**By Domain** table.
---
## 9. Current Registered Repos & Status
| Repo | Domain | Ecosystems | Last Ingest |
|------|--------|------------|-------------|
| `the-custodian` | custodian | python, node | 2026-03-01 |
*(This table is informational. The live view is at the SBOM dashboard.)*
---
## 10. Planned Enhancements
- **Go / Java parsers** — add to `ingest_sbom.py`
- **Versioned snapshots** — retain history per repo for trend analysis
- **Licence override file** — allow repos to document known-acceptable
copyleft exceptions (`.sbom-overrides.yaml`)
- **CI integration** — GitHub Actions step to run ingest on lockfile change
- **Direct-dep detection for uv.lock** — parse `pyproject.toml` `[project.dependencies]`
to mark direct deps accurately

View File

@@ -73,10 +73,16 @@ list-repos:
@test -n "$(DOMAIN)" || (echo "ERROR: DOMAIN is required."; exit 1)
curl -sf "http://127.0.0.1:8000/repos/?domain=$(DOMAIN)" | python3 -m json.tool
## Ingest a repo's lockfile into the SBOM store: make ingest-sbom REPO=the-custodian [LOCKFILE=uv.lock]
## Ingest SBOM data for a repo.
## Single lockfile (explicit): make ingest-sbom REPO=the-custodian LOCKFILE=/path/to/uv.lock
## Scan all lockfiles in tree: make ingest-sbom REPO=the-custodian SCAN=1 REPO_PATH=/home/worsch/the-custodian
## Auto-detect at repo root: make ingest-sbom REPO=the-custodian REPO_PATH=/home/worsch/the-custodian
ingest-sbom:
@test -n "$(REPO)" || (echo "ERROR: REPO is required. Usage: make ingest-sbom REPO=<slug> [LOCKFILE=<path>]"; exit 1)
uv run python scripts/ingest_sbom.py --repo "$(REPO)" $(if $(LOCKFILE),--lockfile "$(LOCKFILE)",)
@test -n "$(REPO)" || (echo "ERROR: REPO is required."; exit 1)
uv run python scripts/ingest_sbom.py --repo "$(REPO)" \
$(if $(LOCKFILE),--lockfile "$(LOCKFILE)") \
$(if $(SCAN),--scan) \
$(if $(REPO_PATH),--repo-path "$(REPO_PATH)")
## Check a repo for ADR-001 compliance: make validate-adr REPO=/path/to/repo [DOMAIN=custodian]
validate-adr:

View File

@@ -8,12 +8,13 @@ const API = "http://127.0.0.1:8000";
```js
// Fetch SBOM data on load
let _entries = [], _report = {groups: [], copyleft_direct_count: 0}, _repos = [];
let _entries = [], _report = {groups: [], copyleft_direct_count: 0}, _repos = [], _domains = [];
try {
[_entries, _report, _repos] = await Promise.all([
[_entries, _report, _repos, _domains] = await Promise.all([
fetch(`${API}/sbom/`).then(r => r.ok ? r.json() : []),
fetch(`${API}/sbom/report/licences/`).then(r => r.ok ? r.json() : {groups:[], copyleft_direct_count: 0}),
fetch(`${API}/repos/`).then(r => r.ok ? r.json() : []),
fetch(`${API}/domains/`).then(r => r.ok ? r.json() : []),
]);
} catch {}
```
@@ -22,13 +23,24 @@ try {
const entries = _entries ?? [];
const report = _report ?? {groups: [], copyleft_direct_count: 0};
const repos = _repos ?? [];
const domains = _domains ?? [];
const groups = report.groups ?? [];
const riskCount = report.copyleft_direct_count ?? 0;
// Domain + repo lookups
const domainById = Object.fromEntries(domains.map(d => [d.id, d]));
const repoById = Object.fromEntries(repos.map(r => [r.id, r]));
const repoDomain = Object.fromEntries(repos.map(r => [r.id, domainById[r.domain_id]?.slug ?? "—"]));
const domainSlugs = [...new Set(repos.map(r => repoDomain[r.id]).filter(s => s !== "—"))].sort();
// Copyleft detector (mirrors server-side logic)
const COPYLEFT_KW = ["GPL", "AGPL", "LGPL", "EUPL", "CDDL", "MPL"];
const isCopyleft = spdx => spdx && COPYLEFT_KW.some(k => spdx.toUpperCase().includes(k));
```
# SBOM
## Licence Risk
## Overview
```js
const riskBadge = riskCount === 0
@@ -43,6 +55,10 @@ display(html`<div class="kpi-row">
<h3>Repos Scanned</h3>
<p class="big-num">${new Set(entries.map(e => e.repo_id)).size}</p>
</div>
<div class="card">
<h3>Domains Covered</h3>
<p class="big-num">${domainSlugs.length || new Set(Object.values(repoDomain).filter(s => s !== "—")).size}</p>
</div>
<div class="card ${riskCount > 0 ? 'card-warn' : ''}">
<h3>Licence Risk</h3>
<p class="big-num">${riskCount}</p>
@@ -55,13 +71,50 @@ display(html`<div class="kpi-row">
</div>`);
```
## By Domain
```js
if (entries.length === 0) {
display(html`<p style="color:gray">No SBOM data ingested yet. Run <code>make ingest-sbom REPO=&lt;slug&gt; SCAN=1 REPO_PATH=&lt;path&gt;</code>.</p>`);
} else {
// Group entries by domain
const byDomain = {};
for (const e of entries) {
const slug = repoDomain[e.repo_id] ?? "—";
(byDomain[slug] = byDomain[slug] ?? []).push(e);
}
const domainTableRows = Object.entries(byDomain).map(([slug, es]) => {
const dom = domains.find(d => d.slug === slug);
const repoCount = new Set(es.map(e => e.repo_id)).size;
const directProd = es.filter(e => e.is_direct && !e.is_dev);
const copyleftRisk = directProd.filter(e => isCopyleft(e.license_spdx)).length;
const ecosystems = [...new Set(es.map(e => e.ecosystem))].sort().join(", ");
return {
domain: dom?.name ?? slug,
repos: repoCount,
packages: es.length,
direct: directProd.length,
copyleft: copyleftRisk,
ecosystems,
};
}).sort((a, b) => a.domain.localeCompare(b.domain));
display(Inputs.table(domainTableRows, {
columns: ["domain", "repos", "packages", "direct", "copyleft", "ecosystems"],
header: {domain: "Domain", repos: "Repos", packages: "All Pkgs", direct: "Direct Prod", copyleft: "Copyleft ⚠", ecosystems: "Ecosystems"},
maxWidth: 900,
}));
}
```
## Licence Distribution
```js
import * as Plot from "npm:@observablehq/plot";
if (groups.length === 0) {
display(html`<p style="color:gray">No SBOM data ingested yet. Run <code>make ingest-sbom REPO=&lt;slug&gt;</code>.</p>`);
display(html`<p style="color:gray">No SBOM data ingested yet.</p>`);
} else {
const plotData = groups.slice(0, 15).map(g => ({
licence: g.license_spdx ?? "(unknown)",
@@ -98,6 +151,57 @@ if (copyleftGroups.length === 0) {
<span class="copyleft-repos">${g.repos.join(", ")}</span>
</div>
`)}
</div>
<p style="font-size:0.8rem;color:gray">Note: dual-licensed packages (e.g. "MIT OR GPL-3.0") are flagged conservatively. Review if the non-copyleft variant is used.</p>`);
}
```
## By Repo
```js
// Group entries by repo, sorted by domain then repo name
const byRepo = {};
for (const e of entries) {
(byRepo[e.repo_id] = byRepo[e.repo_id] ?? []).push(e);
}
const repoSections = Object.entries(byRepo)
.map(([repoId, es]) => {
const repo = repoById[repoId];
const domSlug = repoDomain[repoId] ?? "—";
const dom = domains.find(d => d.slug === domSlug);
const directProd = es.filter(e => e.is_direct && !e.is_dev);
const copyleftRisk = directProd.filter(e => isCopyleft(e.license_spdx)).length;
const ecosystems = [...new Set(es.map(e => e.ecosystem))].sort();
return { repoId, repo, dom, domSlug, es, directProd, copyleftRisk, ecosystems };
})
.sort((a, b) => (a.domSlug + a.repo?.slug).localeCompare(b.domSlug + b.repo?.slug));
if (repoSections.length === 0) {
display(html`<p style="color:gray">No repo data.</p>`);
} else {
display(html`<div class="repo-list">
${repoSections.map(({repoId, repo, dom, domSlug, es, directProd, copyleftRisk, ecosystems}) => html`
<details class="repo-details">
<summary class="repo-summary">
<span class="repo-domain-tag">${dom?.name ?? domSlug}</span>
<span class="repo-name">${repo?.slug ?? repoId.slice(0,8)}</span>
<span class="repo-meta">${es.length} pkgs · ${ecosystems.join(" + ")} · ${directProd.length} direct</span>
${copyleftRisk > 0 ? html`<span class="repo-risk-badge">⚠ ${copyleftRisk} copyleft</span>` : ""}
</summary>
<div class="repo-pkg-table">
${Inputs.table(es.slice(0, 200).map(e => ({
Package: e.package_name,
Version: e.package_version ?? "—",
Ecosystem: e.ecosystem,
Licence: e.license_spdx ?? "—",
Direct: e.is_direct ? "✓" : "",
Dev: e.is_dev ? "✓" : "",
})), {maxWidth: 860})}
${es.length > 200 ? html`<p style="font-size:0.8rem;color:gray">Showing first 200 of ${es.length}</p>` : ""}
</div>
</details>
`)}
</div>`);
}
```
@@ -106,19 +210,19 @@ if (copyleftGroups.length === 0) {
```js
// Filters
const domainOpts = ["all", ...domainSlugs];
const domainFilter = Inputs.select(domainOpts, {label: "Domain", value: "all"});
const ecoFilter = Inputs.select(["all", "python", "node", "rust", "go", "java", "other"], {label: "Ecosystem", value: "all"});
const directOnly = Inputs.toggle({label: "Direct deps only", value: false});
const prodOnly = Inputs.toggle({label: "Prod deps only (no dev)", value: false});
display(html`<div style="display:flex;gap:1rem;flex-wrap:wrap;margin-bottom:1rem">
${ecoFilter}${directOnly}${prodOnly}
${domainFilter}${ecoFilter}${directOnly}${prodOnly}
</div>`);
```
```js
// Build repo_id → slug lookup
const repoById = Object.fromEntries(_repos.map(r => [r.id, r.slug]));
const filteredEntries = entries.filter(e =>
(domainFilter.value === "all" || repoDomain[e.repo_id] === domainFilter.value) &&
(ecoFilter.value === "all" || e.ecosystem === ecoFilter.value) &&
(!directOnly.value || e.is_direct) &&
(!prodOnly.value || !e.is_dev)
@@ -129,22 +233,37 @@ display(Inputs.table(filteredEntries.map(e => ({
Version: e.package_version ?? "—",
Ecosystem: e.ecosystem,
Licence: e.license_spdx ?? "—",
Repo: repoById[e.repo_id] ?? e.repo_id?.slice(0, 8) ?? "—",
Domain: repoDomain[e.repo_id] ?? "—",
Repo: repoById[e.repo_id]?.slug ?? e.repo_id?.slice(0, 8) ?? "—",
Direct: e.is_direct ? "✓" : "",
Dev: e.is_dev ? "✓" : "",
})), {maxWidth: 900}));
})), {maxWidth: 960}));
```
<style>
.card { background: var(--theme-background-alt); border-radius: 8px; padding: 1rem; }
.card-warn { border: 2px solid #e53935; }
.kpi-row { display: grid; grid-template-columns: repeat(auto-fill, minmax(180px, 1fr)); gap: 1rem; margin-bottom: 1.5rem; }
.kpi-row { display: grid; grid-template-columns: repeat(auto-fill, minmax(160px, 1fr)); gap: 1rem; margin-bottom: 1.5rem; }
.big-num { font-size: 2.2rem; font-weight: bold; margin: 0.25rem 0; }
.risk-ok { color: #2e7d32; font-weight: 600; }
.risk-warn { color: #e53935; font-weight: 600; }
.copyleft-section { display: flex; flex-direction: column; gap: 0.5rem; margin-bottom: 1rem; }
.copyleft-card { background: #fde8e8; border-left: 4px solid #e53935; border-radius: 6px; padding: 0.5rem 0.9rem; display: flex; gap: 1rem; align-items: center; flex-wrap: wrap; }
.copyleft-badge { font-weight: 700; font-size: 0.85rem; color: #c62828; }
.copyleft-count { font-size: 0.82rem; color: #555; }
.copyleft-repos { font-size: 0.8rem; color: gray; font-family: monospace; }
.repo-list { display: flex; flex-direction: column; gap: 0.5rem; margin-bottom: 1.5rem; }
.repo-details { background: var(--theme-background-alt); border-radius: 8px; }
.repo-details[open] { border: 1px solid var(--theme-foreground-faint); }
.repo-summary { cursor: pointer; padding: 0.65rem 0.9rem; display: flex; gap: 0.6rem; align-items: center; flex-wrap: wrap; list-style: none; }
.repo-summary::-webkit-details-marker { display: none; }
.repo-summary::before { content: "▶"; font-size: 0.7rem; color: gray; flex-shrink: 0; }
details[open] > .repo-summary::before { content: "▼"; }
.repo-domain-tag { font-size: 0.7rem; font-weight: 600; background: var(--theme-background); border: 1px solid var(--theme-foreground-faint); border-radius: 10px; padding: 0.1rem 0.45rem; color: steelblue; }
.repo-name { font-weight: 600; font-size: 0.9rem; font-family: monospace; }
.repo-meta { font-size: 0.78rem; color: gray; }
.repo-risk-badge { font-size: 0.75rem; font-weight: 600; color: #c62828; background: #fde8e8; border-radius: 4px; padding: 0.1rem 0.4rem; }
.repo-pkg-table { padding: 0.5rem 0.75rem 0.75rem; }
</style>

View File

@@ -188,9 +188,19 @@ _LOCKFILE_PARSERS = {
"Cargo.lock": _parse_cargo_lock,
}
# Directories that never contain project-level lockfiles
_SKIP_DIRS = {
".git", ".hg", ".svn",
".venv", "venv", ".env",
"node_modules",
"__pycache__", ".mypy_cache", ".pytest_cache", ".ruff_cache",
"dist", "build", ".build", "target",
".tox", ".nox",
}
def detect_lockfile(repo_path: Path) -> tuple[Path, str] | None:
"""Return (lockfile_path, ecosystem) for the first recognised lockfile found."""
"""Return (lockfile_path, filename) for the first recognised lockfile at repo root."""
for name in _LOCKFILE_PARSERS:
candidate = repo_path / name
if candidate.exists():
@@ -198,6 +208,17 @@ def detect_lockfile(repo_path: Path) -> tuple[Path, str] | None:
return None
def detect_lockfiles_recursive(repo_path: Path) -> list[Path]:
"""Walk repo_path and return all recognised lockfiles, skipping non-dep dirs."""
found: list[Path] = []
for dirpath, dirnames, filenames in os.walk(repo_path):
dirnames[:] = sorted(d for d in dirnames if d not in _SKIP_DIRS)
for name in _LOCKFILE_PARSERS:
if name in filenames:
found.append(Path(dirpath) / name)
return found
def parse_lockfile(lockfile_path: Path) -> list[dict]:
filename = lockfile_path.name
parser = _LOCKFILE_PARSERS.get(filename)
@@ -236,38 +257,60 @@ def post_ingest(api_base: str, repo_slug: str, entries: list[dict]) -> dict:
# ---------------------------------------------------------------------------
def main() -> None:
parser = argparse.ArgumentParser(description="Ingest a lockfile into the State Hub SBOM store.")
parser = argparse.ArgumentParser(description="Ingest a repo's lockfiles into the State Hub SBOM store.")
parser.add_argument("--repo", required=True, help="Managed-repo slug (e.g. 'the-custodian')")
parser.add_argument("--lockfile", help="Path to lockfile (auto-detected if omitted)")
parser.add_argument("--repo-path", default=".", help="Repo root for auto-detection (default: cwd)")
parser.add_argument("--lockfile", action="append", dest="lockfiles",
metavar="PATH", help="Path to a specific lockfile (repeatable)")
parser.add_argument("--repo-path", default=".", help="Repo root for auto-detection/scan (default: cwd)")
parser.add_argument("--scan", action="store_true",
help="Recursively find ALL lockfiles under --repo-path (handles multi-ecosystem repos)")
parser.add_argument("--api-base", default=API_BASE, help="State Hub API base URL")
parser.add_argument("--dry-run", action="store_true", help="Parse only — do not submit")
args = parser.parse_args()
if args.lockfile:
lockfile_path = Path(args.lockfile).resolve()
repo_root = Path(args.repo_path).resolve()
lockfile_paths: list[Path] = []
if args.lockfiles:
lockfile_paths = [Path(lf).resolve() for lf in args.lockfiles]
elif args.scan:
lockfile_paths = detect_lockfiles_recursive(repo_root)
if not lockfile_paths:
print(f"No lockfiles found under '{repo_root}'.", file=sys.stderr)
sys.exit(1)
print(f"Scan found {len(lockfile_paths)} lockfile(s):")
for lf in lockfile_paths:
print(f" {lf.relative_to(repo_root) if lf.is_relative_to(repo_root) else lf}")
else:
found = detect_lockfile(Path(args.repo_path).resolve())
found = detect_lockfile(repo_root)
if not found:
print(
f"No recognised lockfile found in '{args.repo_path}'. "
"Supported: " + ", ".join(_LOCKFILE_PARSERS),
f"No recognised lockfile found in '{repo_root}'. "
f"Supported: {', '.join(_LOCKFILE_PARSERS)}. "
"Use --scan to search subdirectories.",
file=sys.stderr,
)
sys.exit(1)
lockfile_path, _ = found
print(f"Auto-detected: {lockfile_path}")
lockfile_paths = [lockfile_path]
entries = parse_lockfile(lockfile_path)
print(f"Parsed {len(entries)} packages from {lockfile_path.name}")
all_entries: list[dict] = []
for lf in lockfile_paths:
parsed = parse_lockfile(lf)
rel = lf.relative_to(repo_root) if lf.is_relative_to(repo_root) else lf
print(f" {rel}: {len(parsed)} packages")
all_entries.extend(parsed)
print(f"Total: {len(all_entries)} packages across {len(lockfile_paths)} lockfile(s)")
if args.dry_run:
print(json.dumps(entries[:5], indent=2))
if len(entries) > 5:
print(f" … and {len(entries) - 5} more")
print(json.dumps(all_entries[:5], indent=2))
if len(all_entries) > 5:
print(f" … and {len(all_entries) - 5} more")
return
result = post_ingest(args.api_base, args.repo, entries)
result = post_ingest(args.api_base, args.repo, all_entries)
print(f"Ingested {result.get('ingested', '?')} entries for repo '{args.repo}'")
print(f"Snapshot at: {result.get('snapshot_at', '?')}")