# Registry Federation
**Repository:** `reuse-surface`

**Audience:** Architects and agents composing multi-repo capability indexes

---
## Purpose
helix_forge capabilities may be registered in multiple repositories. Federation
composes capability indexes from configured sources into a single discovery
surface without silently merging duplicate IDs.

Sources may be **local filesystem paths** or **remote HTTP(S) URLs** (for example,
Git raw-file endpoints or published index artifacts). Remote indexes are cached
under `registry/federation/cache/` so compose can reuse them offline and run faster.
## Manifest
`registry/federation/sources.yaml` lists index sources:
```yaml
version: 1
domain: helix_forge
collision_policy: warn
sources:
  - repo: reuse-surface
    index: registry/indexes/capabilities.yaml
    enabled: true
    required: true
  - repo: sibling-repo
    url: https://git.example.com/org/sibling-repo/raw/main/registry/indexes/capabilities.yaml
    enabled: false
    required: false
    cache_ttl_seconds: 86400
    auth_env: FEDERATION_TOKEN
    auth_header: Authorization
```
Schema: `schemas/federation.schema.yaml`
### Source fields
| Field | Meaning |
|---|---|
| `repo` | Source repository slug |
| `index` | Local path to `capabilities.yaml` (repo-relative or `~/...`) |
| `url` | Remote HTTP(S) URL to a `capabilities.yaml` index |
| `enabled` | Include this source in compose |
| `required` | Fail compose if the local index is missing, or if a remote fetch fails and no cached copy exists |
| `domain` | Optional domain label |
| `cache_ttl_seconds` | Reuse the cached remote index for this many seconds (`0` = always refetch) |
| `auth_env` | Name of the environment variable holding a token or full header value for `url` sources |
| `auth_header` | HTTP header that carries the `auth_env` value (default `Authorization`) |

Each source must specify **either** `index` **or** `url`, not both.
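
The per-source rules can be checked mechanically once `sources.yaml` is parsed. A minimal sketch, assuming each entry is already a Python dict; `check_source` and its messages are hypothetical, and `schemas/federation.schema.yaml` remains the authoritative definition.

```python
def check_source(source: dict) -> list[str]:
    """Return problems found in one parsed `sources:` entry (empty list if valid).

    Hypothetical helper for illustration; schemas/federation.schema.yaml is authoritative.
    """
    problems = []
    # Exactly one of `index` (local path) or `url` (remote index) must be set.
    if ("index" in source) == ("url" in source):
        problems.append("specify exactly one of 'index' or 'url'")
    url = source.get("url")
    if url is not None and not str(url).startswith(("http://", "https://")):
        problems.append("'url' must be an HTTP(S) URL")
    # Assumption: the TTL is a non-negative integer; 0 means always refetch.
    ttl = source.get("cache_ttl_seconds", 0)
    if isinstance(ttl, bool) or not isinstance(ttl, int) or ttl < 0:
        problems.append("'cache_ttl_seconds' must be a non-negative integer")
    return problems
```

A source that sets both `index` and `url`, or neither, yields the one-of error.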
### Local workstation roster (WP-0014)
`registry/federation/local-repo-roster.yaml` tracks every git repo at
`~/<slug>/` on the custodian workstation:
| Field | Meaning |
|---|---|
| `status` | `established` or `pending` |
| `batch` | Rollout batch (`B01` through `B06`), or `null` for pre-rollout repos |
| `publish_check` | `pass`, `fail`, or `pending` (result of the Gitea raw URL probe) |
| `hub_registered` | Whether the repo is registered on `https://reuse.coulomb.social` |
| `seed_capability_ids` | IDs of entries copied from reuse-surface, selected by their `owner` field |

**Scope:** every directory exactly one level under `$HOME` that contains a `.git`
directory; dot-directories and non-git folders are excluded. Rollout milestone:
`history/2026-06-16-local-repo-registry-rollout-complete.md`.
```bash
reuse-surface stats --roster registry/federation/local-repo-roster.yaml --federation-ready
```
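
The roster scope rule amounts to a one-level directory scan. This sketch mirrors the stated scope only (one level under `$HOME`, must contain a `.git` directory, dot-directories skipped); `roster_candidates` is a hypothetical name, not the code behind `reuse-surface stats`.

```python
from pathlib import Path


def roster_candidates(home: Path) -> list[str]:
    """Slugs of git repos exactly one level under `home` (hypothetical helper).

    Mirrors the documented scope only; it is not the reuse-surface implementation.
    """
    slugs = []
    for child in sorted(home.iterdir()):
        # Skip dot-directories and plain files.
        if child.name.startswith(".") or not child.is_dir():
            continue
        # Only folders containing a `.git` directory count as repositories.
        if (child / ".git").is_dir():
            slugs.append(child.name)
    return slugs
```

If the scope is applied as described, `roster_candidates(Path.home())` on the custodian workstation should list the same slugs as the roster.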
## Index publish contract (domain repos)
Before a sibling repo can register on the hosted hub, it must publish
`registry/indexes/capabilities.yaml` at a **stable raw HTTP(S) URL** that
returns **200** with valid YAML (not a redirect to a login page, and not an HTML page).
### Required index fields
| Field | Requirement |
|---|---|
| `version` | Integer manifest version |
| `domain` | Domain slug (e.g. `helix_forge`) |
| `capabilities[]` | Must be present as a list; an explicitly empty list (`[]`) is allowed |
| Per row: `id`, `name`, `summary`, `vector`, `path` | Must match the entry front matter |

Entry bodies remain in the source repo; the index is the federation surface.
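
A sketch of the required-field check, assuming the index has already been loaded into a dict (for example with a YAML parser). `index_problems` is hypothetical and does not compare rows with entry front matter, which would require reading the entry files.

```python
REQUIRED_ROW_FIELDS = ("id", "name", "summary", "vector", "path")


def index_problems(index: dict) -> list[str]:
    """Check the publish contract's required fields on a parsed capabilities.yaml.

    Hypothetical helper; it does not compare rows against entry front matter.
    """
    problems = []
    version = index.get("version")
    if isinstance(version, bool) or not isinstance(version, int):
        problems.append("'version' must be an integer")
    if not index.get("domain"):
        problems.append("'domain' is required")
    rows = index.get("capabilities")
    # The list must be present, but an explicitly empty list is allowed.
    if not isinstance(rows, list):
        problems.append("'capabilities' must be a list")
        return problems
    for position, row in enumerate(rows):
        if not isinstance(row, dict):
            problems.append(f"row {position}: not a mapping")
            continue
        missing = [field for field in REQUIRED_ROW_FIELDS if not row.get(field)]
        if missing:
            problems.append(f"row {position}: missing {', '.join(missing)}")
    return problems
```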
### Gitea raw URL shape
```text
https://gitea.coulomb.social/coulomb/<repo>/raw/<branch>/registry/indexes/capabilities.yaml
```
Use `main` (or the repo's default branch). Verify before registration:
```bash
curl -fsSI "<raw-url>" | head -n1 # expect HTTP/2 200 or HTTP/1.1 200
curl -fsS "<raw-url>" | head
```
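
The curl probe shows the status line, but the contract also rules out redirects and HTML pages. A hypothetical classifier mapping a probe result to the roster's `pass`/`fail` values could look like this. It assumes the response was fetched without following redirects; the HTML sniffing is a heuristic, not the documented logic of `reuse-surface establish --publish-check`, and it does not parse the YAML body.

```python
def classify_probe(status: int, content_type: str, body: str) -> str:
    """Map one raw-URL probe (fetched without following redirects) to pass/fail.

    Hypothetical helper; the HTML check below is an assumed heuristic.
    """
    # Redirects (e.g. 303 to a login page) and errors mean the index is not published.
    if status != 200:
        return "fail"
    # A 200 response that serves an HTML page is not a usable index.
    head = body.lstrip()[:200].lower()
    if "html" in content_type.lower() or head.startswith(("<!doctype", "<html")):
        return "fail"
    return "pass"
```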
### Auth expectations
- **Public indexes:** no auth; hub fetches without credentials.
- **Private indexes:** set `auth_env` on the hub registration (or on a local `url`
  source) to the name of an environment variable holding a Bearer token or full
  header value.

The hub stores only the `auth_env` and `auth_header` names, never secret values.
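
This page does not say how a bare token is told apart from a full header value. The sketch below assumes a value containing a space (such as `Bearer abc`) is used verbatim and a bare token gets a `Bearer ` prefix; `auth_headers` is a hypothetical helper. Only the variable name travels in the source definition; the secret is read from the environment at request time.

```python
import os


def auth_headers(source: dict, environ=os.environ) -> dict[str, str]:
    """Build request headers for a `url` source from its auth_env/auth_header names.

    Hypothetical helper; the bare-token vs. full-value rule is an assumption.
    """
    env_name = source.get("auth_env")
    if not env_name:
        return {}  # Public index: fetch without credentials.
    value = environ.get(env_name)
    if not value:
        raise RuntimeError(f"environment variable {env_name} is not set")
    header = source.get("auth_header", "Authorization")
    # Assumption: a value containing a space is already a full header value.
    if " " not in value:
        value = f"Bearer {value}"
    return {header: value}
```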
### Local repo rollout tracking
Workstation-wide registry establishment is tracked in
`registry/federation/local-repo-roster.yaml` (workplan **REUSE-WP-0014**).
**Scope:** every git repository at `~/<slug>/` (one level under `$HOME`).
Update roster `status`, `hub_registered`, and `publish_check` after each repo
completes the establishment checklist below.
## Sibling onboarding (CLI)
```bash
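cd ../state-hub
# Scaffold the registry files for the helix_forge domain
reuse-surface establish --scaffold --domain helix_forge
# optional: LLM_CONNECT_URL=... reuse-surface establish --discover --dry-run
# Validate the repository's registry before publishing
reuse-surface validate --root .
# Push the index to the default branch
git push origin main
# Confirm the raw index URL is published
reuse-surface establish --publish-check \
  --raw-url https://gitea.coulomb.social/coulomb/state-hub/raw/main/registry/indexes/capabilities.yaml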
```
### Registration checklist
1. Merge capability index to the default branch.
2. Confirm raw URL returns 200 YAML.
3. `reuse-surface hub register --repo <slug> --url <raw-url> --domain helix_forge`
4. `curl -fsS "$REUSE_SURFACE_URL/v1/federated" | jq '.capabilities | length'`
5. Optionally run `reuse-surface hub sync --merge` to refresh local `sources.yaml`.

**Current blocks (2026-06-16):** `state-hub`, `feature-control`,
`identity-canon`, and `shard-wiki` raw URLs return **303** redirects instead of
YAML (index not published). See `history/2026-06-16-hub-registration-blocks.md`
for probe evidence and owner follow-ups.
## Compose workflow
```bash
reuse-surface federation compose
reuse-surface federation compose --refresh # bypass remote cache
```
Writes `registry/indexes/federated.yaml` with:
- Merged `capabilities` from all enabled sources
- `source_repo` and `source_index` on every row
- `source_url` when the row came from a remote source
- `collision_policy` and per-source counts
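
A simplified model of the compose step, assuming each source's index has already been loaded (from a local path or the remote cache) into a dict. The function is hypothetical, and the output keys `source_counts` and `warnings`, plus the cache path used as `source_index` for remote rows, are assumptions; the real `federated.yaml` layout may differ.

```python
def compose(sources: list[dict], loaded: dict[str, dict], collision_policy: str = "warn") -> dict:
    """Model of federation compose: merge rows, add provenance, report duplicate IDs.

    Hypothetical sketch. `loaded` maps a repo slug to its already-parsed index;
    the `source_counts` and `warnings` key names are illustrative only.
    """
    if collision_policy != "warn":
        raise ValueError("only the documented 'warn' policy is modeled here")
    capabilities: list[dict] = []
    counts: dict[str, int] = {}
    warnings: list[str] = []
    first_seen: dict[str, str] = {}
    for source in sources:
        if not source.get("enabled", True):
            continue
        repo = source["repo"]
        index = loaded.get(repo)
        if index is None:
            if source.get("required"):
                raise RuntimeError(f"required source {repo} has no index")
            warnings.append(f"optional source {repo} skipped: index missing")
            continue
        rows = index.get("capabilities", [])
        for row in rows:
            # Assumption: remote rows point `source_index` at their cache file.
            tagged = dict(row, source_repo=repo,
                          source_index=source.get("index", f"registry/federation/cache/{repo}.yaml"))
            if "url" in source:
                tagged["source_url"] = source["url"]
            if row["id"] in first_seen:
                # `warn` keeps both rows and only reports the duplicate ID.
                warnings.append(f"duplicate id {row['id']}: {first_seen[row['id']]} and {repo}")
            else:
                first_seen[row["id"]] = repo
            capabilities.append(tagged)
        counts[repo] = len(rows)
    return {"collision_policy": collision_policy, "source_counts": counts,
            "capabilities": capabilities, "warnings": warnings}
```

Nothing is merged or dropped: a duplicate ID yields two rows and one warning, leaving the choice to the consumer.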
### Remote cache
Fetched URL indexes are stored at `registry/federation/cache/<repo>.yaml` with
metadata in `<repo>.meta.yaml`. The cache directory is gitignored; only
`.gitkeep` is tracked.

When a refetch fails, compose reuses the stale cache and emits a warning. Required
remote sources without cache fail compose with a clear error.
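
The cache rules reduce to a small decision: reuse a fresh cache, otherwise refetch, and fall back to a stale cache with a warning if the refetch fails. In this sketch the `fetch` callable, the `fetched_at` metadata key, and passing the cache in as arguments are assumptions standing in for the HTTP client and `<repo>.meta.yaml`; `load_remote` is hypothetical and does not write the cache.

```python
import time
from typing import Callable, Optional


def load_remote(repo: str, ttl: int, required: bool, fetch: Callable[[], str],
                cached_text: Optional[str], cached_meta: Optional[dict],
                refresh: bool = False, now: Optional[float] = None) -> tuple[Optional[str], list[str]]:
    """Return (index_text, warnings) for one `url` source (hypothetical helper)."""
    now = time.time() if now is None else now
    warnings: list[str] = []
    have_cache = cached_text is not None and cached_meta is not None
    # A fresh cache is reused unless the TTL is 0 or a refresh was requested.
    if have_cache and not refresh and ttl > 0 and now - cached_meta["fetched_at"] < ttl:
        return cached_text, warnings
    try:
        return fetch(), warnings
    except OSError as exc:  # urllib's URLError is a subclass of OSError
        if have_cache:
            warnings.append(f"{repo}: refetch failed ({exc}); using stale cache")
            return cached_text, warnings
        if required:
            raise RuntimeError(f"{repo}: required remote source unavailable and no cache") from exc
        warnings.append(f"{repo}: optional remote source unavailable; skipped")
        return None, warnings
```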
### Collision policy
`warn` (default): duplicate IDs across sources are kept but reported as
warnings. Consumers must inspect `source_repo` before choosing an entry.
### Owner migration and deduplication
After REUSE-WP-0014, many capabilities remain in both `reuse-surface` and their
`owner` repo index (seeded from reuse-surface during establishment). Federation
compose warns on these duplicates; it does **not** merge or prefer one source.
| Rule | Behavior |
|---|---|
| Canonical owner | The repo named in the entry `owner` field |
| Federation winner | Consumers pick the row where `source_repo` matches `owner` |
| reuse-surface copies | Treated as planning stubs until the owner index is published and fetchable |
| Removal | One commit per owner in `reuse-surface`; see `history/2026-06-16-federation-deduplication-plan.md` |
| Blocked owners | Keep the reuse-surface row while the owner's `publish_check` fails (Gitea 404) |

Workstation rollout status: `registry/federation/local-repo-roster.yaml` (60 local
repos, publish pass/fail per slug).

Post-rollout compose (2026-06-16): **60** hub-synced URL sources, **37** federated
capability rows, **16** duplicate-ID warnings (mostly owner-migrated entries still
listed in reuse-surface).
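
Consumers applying the canonical-owner rule can pick one row per ID as sketched below. It assumes each federated row carries the entry's `owner` field alongside `source_repo`. When no row comes from the owner (for example, a blocked owner), it keeps the reuse-surface copy, as the "Blocked owners" rule describes. `pick_canonical` is hypothetical; compose itself does not apply this choice.

```python
from collections import defaultdict


def pick_canonical(rows: list[dict], fallback_repo: str = "reuse-surface") -> dict[str, dict]:
    """Choose one federated row per capability ID using the `owner` rule (hypothetical)."""
    by_id: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        by_id[row["id"]].append(row)
    chosen = {}
    for cap_id, candidates in by_id.items():
        # Prefer the row published by the repo named in the entry's `owner` field.
        owned = [r for r in candidates if r["source_repo"] == r.get("owner")]
        # Otherwise keep the reuse-surface planning stub, then any remaining row.
        stub = [r for r in candidates if r["source_repo"] == fallback_repo]
        chosen[cap_id] = (owned or stub or candidates)[0]
    return chosen
```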
## Hosted federation hub
Production hub: **`https://reuse.coulomb.social`** (Railiance `railiance01`,
companion deploy **RAILIANCE-WP-0007**).
The hub stores **repo registrations** (index URLs and metadata) and serves a
composed federated index at `GET /v1/federated`. It does not host capability
entry bodies; it only coordinates which published indexes participate.
### Register and discover via CLI
```bash
export REUSE_SURFACE_URL=https://reuse.coulomb.social
export REUSE_SURFACE_TOKEN=<write-token>   # cluster secret reuse-surface-env
# Inspect the hub and its registrations
reuse-surface hub status
reuse-surface hub list
# Register a published raw index URL (a write; requires the token)
reuse-surface hub register --repo reuse-surface \
  --url https://gitea.coulomb.social/coulomb/reuse-surface/raw/main/registry/indexes/capabilities.yaml \
  --domain helix_forge
# Count capabilities in the composed federated index
curl -fsS "$REUSE_SURFACE_URL/v1/federated" | jq '.capabilities | length'
```
Read endpoints are public; writes require `REUSE_SURFACE_TOKEN` (Bearer). API
spec: `specs/FederationHubAPI.md`.
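
Agents calling the hub directly, rather than through the CLI, can make the same read with the standard library. Only `GET /v1/federated`, `GET /v1/repos`, and the `capabilities` key come from this page; the `Accept` header, the JSON response format, and the production default for `REUSE_SURFACE_URL` are assumptions, so check `specs/FederationHubAPI.md` for the actual contract. `hub_request` and `federated_count` are hypothetical helpers.

```python
import json
import os
import urllib.request
from typing import Optional


def hub_request(path: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request against the hub named by REUSE_SURFACE_URL (hypothetical)."""
    base = os.environ.get("REUSE_SURFACE_URL", "https://reuse.coulomb.social").rstrip("/")
    headers = {"Accept": "application/json"}  # assumed response format
    if token:
        # Writes require a Bearer token; public reads can omit it.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(base + path, headers=headers)


def federated_count() -> int:
    """Equivalent of: curl -fsS "$REUSE_SURFACE_URL/v1/federated" | jq '.capabilities | length'"""
    with urllib.request.urlopen(hub_request("/v1/federated"), timeout=30) as response:
        return len(json.load(response)["capabilities"])
```

`urlopen` raises `HTTPError` on 4xx/5xx responses, which plays the role of curl's `-f` flag.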
### Hub vs local `sources.yaml`
| Workflow | When to use |
|---|---|
| **Hub** | Shared membership across agents and repos; no per-machine `sources.yaml` edits |
| **Local compose** | Offline development, CI with checked-in sources, or hub unavailable |

Local `registry/federation/sources.yaml` remains valid for `reuse-surface
federation compose`. Use `reuse-surface hub sync` to materialize `sources.yaml`
from hub `GET /v1/repos` state.
### hub sync
```bash
export REUSE_SURFACE_URL=https://reuse.coulomb.social
reuse-surface hub sync --dry-run # preview manifest
reuse-surface hub sync --merge # hub URL sources + local index sources
reuse-surface hub sync # replace with hub-enabled registrations
```
| Flag | Behavior |
|---|---|
| `--merge` | Keep local `index` sources whose `repo` slug is not on the hub |
| `--replace` (default) | Write only hub-enabled registrations as `url` sources |
| `--output` | Override manifest path (default `registry/federation/sources.yaml`) |
| `--dry-run` | Print YAML without writing |

After sync, run `reuse-surface federation compose` to verify offline compose.
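
The difference between `--merge` and the default `--replace` can be modeled as a pure function over the two source lists. The sketch assumes hub registrations arrive as dicts with `repo`, `url`, `enabled`, and optional `domain`, `auth_env`, and `auth_header` (the actual `GET /v1/repos` shape is defined in `specs/FederationHubAPI.md`). `sync_sources` and the `required: False` default for hub sources are hypothetical.

```python
def sync_sources(hub_repos: list[dict], local_sources: list[dict], merge: bool = False) -> list[dict]:
    """Return the `sources:` list a hub sync would write (a model, not the real CLI)."""
    synced = []
    for reg in hub_repos:
        if not reg.get("enabled", True):
            continue  # Only hub-enabled registrations become `url` sources.
        # Assumption: hub-synced sources are optional for compose.
        source = {"repo": reg["repo"], "url": reg["url"], "enabled": True, "required": False}
        for key in ("domain", "auth_env", "auth_header"):
            if reg.get(key):
                source[key] = reg[key]  # names only; secrets never come from the hub
        synced.append(source)
    if merge:
        on_hub = {reg["repo"] for reg in hub_repos}
        # --merge keeps local `index` sources whose slug is not registered on the hub.
        synced += [s for s in local_sources if "index" in s and s["repo"] not in on_hub]
    return synced
```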
## Agent query pattern
1. **Hub path:** `GET /v1/federated` or `reuse-surface hub list` for registered
repos; fetch composed capabilities from the hub.
2. **Local path:** Run `reuse-surface federation compose` after manifest or
sibling index changes; read `registry/indexes/federated.yaml`.
3. Open `path` in the source repo for full entry detail when local; follow
`source_url` / `source_index` when remote.
4. Run `reuse-surface graph --check` before relying on relation navigation.
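
Step 3 can be sketched as path arithmetic, assuming `path` is relative to the source repo root and the index lives at the standard `registry/indexes/capabilities.yaml` location (as in the Gitea raw URL shape). `entry_location` is hypothetical; rows from non-standard index locations still need manual resolution.

```python
import os
from typing import Optional

INDEX_SUFFIX = "registry/indexes/capabilities.yaml"


def entry_location(row: dict) -> Optional[str]:
    """Locate a federated row's full entry (hypothetical; assumes repo-relative `path`)."""
    relative = row["path"].lstrip("/")
    source_url = row.get("source_url")
    if source_url and source_url.endswith(INDEX_SUFFIX):
        # Remote: swap the index suffix of the raw URL for the entry path.
        return source_url[: -len(INDEX_SUFFIX)] + relative
    source_index = row.get("source_index") or ""
    if source_index.endswith(INDEX_SUFFIX):
        # Local: the repo root is the index path minus the standard suffix.
        return os.path.expanduser(source_index[: -len(INDEX_SUFFIX)] + relative)
    return None  # Non-standard index location: resolve manually.
```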
### Cross-repo discovery without local checkout
Register published raw index URLs on the hub, or enable a `url` source in
`sources.yaml` pointing at Gitea/GitHub/static hosts. Set `auth_env` when the
endpoint requires a token. Agents without sibling repo clones can discover
capabilities from the hub or from HTTP sources plus the local index.
## Relation graphs
```bash
reuse-surface graph
reuse-surface graph --check
reuse-surface graph --stdout
```
Generates `docs/graph/capability-graph.mmd` from local entry `relations`.
`--check` reports `depends_on` cycles and broken relation targets against the
federated ID set.
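
The two checks behind `--check` can be sketched with a membership test for targets and a three-color depth-first search for `depends_on` cycles. The sketch assumes each local entry's `relations` is a mapping from relation type to a list of target IDs; the real front-matter shape may differ, and `check_relations` is hypothetical. Targets outside the local entries (remote IDs) are checked for existence but not traversed.

```python
def check_relations(entries: dict[str, dict], federated_ids: set[str]) -> list[str]:
    """Report broken relation targets and `depends_on` cycles (hypothetical helper).

    `entries` maps a local entry ID to its assumed `relations` mapping.
    """
    problems = []
    for entry_id, relations in entries.items():
        for kind, targets in relations.items():
            for target in targets:
                if target not in federated_ids:
                    problems.append(f"{entry_id}: {kind} -> {target} (unknown id)")

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {entry_id: WHITE for entry_id in entries}

    def visit(node: str, stack: list[str]) -> None:
        color[node] = GREY
        stack.append(node)
        for target in entries[node].get("depends_on", []):
            if color.get(target) == GREY:
                # Back edge: the cycle runs from `target` along the current path.
                cycle = stack[stack.index(target):] + [target]
                problems.append("cycle: " + " -> ".join(cycle))
            elif color.get(target) == WHITE:
                visit(target, stack)
        stack.pop()
        color[node] = BLACK

    for entry_id in entries:
        if color[entry_id] == WHITE:
            visit(entry_id, [])
    return problems
```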
## CI integration
Gitea CI runs:
```bash
reuse-surface validate --relations --fail-on-warnings
reuse-surface federation compose
reuse-surface catalog
reuse-surface graph --check --fail-on-warnings
pytest -q
```
CI uses local sources only (remote examples are disabled). Warnings on missing
optional sibling indexes do not fail CI; schema validation errors do.