From 2f3334960905ccec18a52f2ff80ceaca74c41fb9 Mon Sep 17 00:00:00 2001
From: tegwick
Date: Fri, 26 Jun 2026 19:54:35 +0200
Subject: [PATCH] docs: add ArchitectureBlueprint for establishing config-atlas

Practical, efficiency-first blueprint: bound config-atlas to the map and
evidence layers, define the configuration-surface entry schema (modeled on
the existing capability-entry format), a read-only connector discovery
pipeline, CI validation, State Hub reuse for the config graph, and a phased
roadmap (Canon -> seed -> connectors -> explain) with a build-vs-reuse table.

Co-Authored-By: Claude Opus 4.8
---
 ArchitectureBlueprint.md | 225 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 225 insertions(+)
 create mode 100644 ArchitectureBlueprint.md

diff --git a/ArchitectureBlueprint.md b/ArchitectureBlueprint.md
new file mode 100644
index 0000000..3d6840b
--- /dev/null
+++ b/ArchitectureBlueprint.md
+# Architecture Blueprint — config-atlas
+
+> How to establish config-atlas as an efficient, practical system.
+> Companion to [`INTENT.md`](INTENT.md) (purpose/boundary),
+> [`SCOPE.md`](SCOPE.md), and [`research/configuration-control-plane.md`](research/configuration-control-plane.md)
+> (the thesis). Drafted 2026-06-26.
+
+---
+
+## 1. Design constraints (what "efficient and practical" means here)
+
+These constraints come straight from `INTENT.md` and `SCOPE.md` and bound every
+decision below:
+
+1. **Map and evidence layers only.** config-atlas owns *Registry* and *Evidence*.
+   It does **not** build a runtime *Resolver*, *Delivery*, or *Control* engine
+   (research §5). Anything that resolves or pushes live values is out of scope.
+2. **Source-linked, never a second source of truth.** Entries point at canonical
+   files/APIs; the atlas stores *metadata and references*, not live values, and
+   **never** secret values.
+3. **Read-first before write-first.** Discover, classify, attribute ownership, and
+   explain — before any controlled-change ambition (research §6 wedge).
+4. **Markdown + YAML, agent-legible.** No application runtime. Entries must be
+   diffable, reviewable in a PR, and parseable without bespoke tooling.
+5. **Reuse the ecosystem.** Lean on `reuse-surface` (federation/validation), the
+   State Hub (workplans, relationships, evidence events), and existing engines
+   (CUE/JSON Schema, OPA) rather than reimplementing them.
+
+The cheapest path to a *practical* system is therefore: **a well-specified entry
+schema + validation in CI + a thin discovery pipeline + source links** — not a
+service. Efficiency comes from buying, not building, every layer that already
+exists elsewhere.
+
+---
+
+## 2. System overview
+
+config-atlas realizes the left half of the control-plane pipeline. The right half
+(resolve → deliver → control) is explicitly delegated to downstream systems.
+
+```
+  OWNED BY config-atlas                    DELEGATED / EXTERNAL
+  ┌─────────────────────────────────┐      ┌──────────────────────────────┐
+  │ Canon      vocabulary + schema  │      │ Resolver effective value     │
+  │ Registry   surface entries      │ ---> │ Policy   OPA / Kyverno / CUE │
+  │ Evidence   relationships, audit │      │ Delivery env / ConfigMap /…  │
+  └─────────────────────────────────┘      │ Control  feature-flag plane  │
+             ▲                             └──────────────────────────────┘
+             │ ingest (read-only connectors)
+  ┌───────────────────────────────────────────────────────────────────────────┐
+  │ Sources: repos · K8s/Helm · Terraform state · feature-flag platforms ·    │
+  │          secret-manager refs · cloud param stores · SaaS tenant settings  │
+  └───────────────────────────────────────────────────────────────────────────┘
+```
+
+Five internal components, smallest-to-build first:
+
+| Component | What it is | Build cost | Status |
+|-----------|-----------|-----------|--------|
+| **Canon** | Schema + vocabulary for a config-surface entry; scope/precedence/merge model | low | partial (research has the model; needs a JSON Schema) |
+| **Registry** | The corpus of surface entries + indexes | low | scaffold exists (`registry/`) |
+| **Connectors** | Read-only ingest scripts that emit candidate entries | medium | not started |
+| **Evidence graph** | Relationships (consumes/overrides/depends-on/secret-ref) + change audit | medium | partly free via State Hub |
+| **Explain view** | Render an effective-config *path* from layered source links | medium-high | future |
+
+---
+
+## 3. The core artifact: the configuration-surface entry
+
+Everything else exists to produce and validate these. Model it on the existing
+capability-entry schema (`registry/capabilities/…`) so `reuse-surface` validation
+and State Hub federation work unchanged. Proposed shape:
+
+```yaml
+---
+id: surface...                          # stable, unique
+name: Mail delivery batch sizing
+kind: app-config | deploy-config | secret-ref | feature-flag |
+      policy | tenant-config | infra-state | runtime-override
+summary: Controls max batch size for outbound mail delivery.
+owner: platform-delivery                # team/agent, not a person
+status: draft | active | deprecated
+
+scope:                                  # which layers may set this (research §3.1)
+  allowed_layers: [company, environment, installation, tenant]
+  default_layer: company
+mutability: hot-reloadable              # build|deploy|startup|hot|per-request|emergency
+security_class: operational             # operational | sensitive | secret-ref | policy
+
+schema:                                 # the contract, not the value
+  type: integer
+  default: 500
+  minimum: 1
+  maximum: 5000
+  validator: schemas/mail-delivery.schema.json   # JSON Schema or CUE ref
+
+sources:                                # source-linked, never inlined values
+  - repo: railiance-platform
+    path: config/mail/delivery.yaml
+    role: company-baseline
+  - repo: railiance-platform
+    path: environments/prod.yaml
+    role: environment-overlay
+
+relations:
+  consumed_by: [service.mail-gateway]
+  overrides: []
+  depends_on_secret: []                 # references only, never the secret
+  related_to: [surface.platform.mail.rate-limit]
+
+evidence:
+  last_seen: '2026-06-26'               # from connector run
+  discovery_method: connector:git-grep | manual
+  change_log_ref:
+---
+
+# Mail delivery batch sizing
+
+Prose: what it means, why it exists, precedence notes, known gotchas.
+```
+
+Key efficiency choices:
+- **`kind` is the primary classifier** — it drives the research §3 kind-separation
+  (secrets vs flags vs infra-state are never treated alike).
+- **`scope.allowed_layers` encodes the layering contract** per key — this is the
+  durable value even before a resolver exists.
+- **`sources[].role`** carries the layer each source contributes; this is what a
+  future Explain view consumes to render `config explain `.
+- **No value fields.** The atlas records *where* and *which layer wins by rule*,
+  never the live value.
+
+---
+
+## 4. Discovery (read-first connectors)
+
+Connectors are **stateless, read-only scripts** that scan a source and emit
+*candidate* entries (YAML) for human/agent review via PR. They never write live
+systems and never auto-merge.
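
As a rough illustration of the connector idea above, the following sketch scans in-memory file contents for watched keys and renders a candidate entry as YAML front matter. It is a minimal sketch under stated assumptions, not the blueprint's connector: the names `find_candidates`, `render_candidate`, `Candidate`, and `KEY_PATTERN`, and the `surface.candidate.<key>` id scheme, are all hypothetical. Only keys and source locations are captured, never values.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern: a top-level or nested "key:" at the start of a line.
# Only the key name is captured; the value is deliberately ignored.
KEY_PATTERN = re.compile(r"^\s*(?P<key>[A-Za-z_][\w.-]*)\s*:", re.MULTILINE)


@dataclass(frozen=True)
class Candidate:
    surface_id: str  # hypothetical id scheme, not the blueprint's real format
    kind: str
    repo: str
    path: str
    role: str


def find_candidates(repo, files, watched_keys,
                    kind="app-config", role="company-baseline"):
    """Scan file contents (read-only) and return candidate entries.

    `files` maps a repo-relative path to its text. Nothing is written back,
    and the function keeps no state between calls.
    """
    found = []
    for path in sorted(files):
        keys = {m.group("key") for m in KEY_PATTERN.finditer(files[path])}
        for key in sorted(keys & set(watched_keys)):
            found.append(
                Candidate(f"surface.candidate.{key}", kind, repo, path, role)
            )
    return found


def render_candidate(c, last_seen):
    """Render one candidate as YAML front matter for review in a PR."""
    return "\n".join([
        "---",
        f"id: {c.surface_id}",
        f"kind: {c.kind}",
        "status: draft",
        "sources:",
        f"  - repo: {c.repo}",
        f"    path: {c.path}",
        f"    role: {c.role}",
        "evidence:",
        f"  last_seen: '{last_seen}'",
        "  discovery_method: connector:git-grep",
        "---",
    ])


# Example input: one file with two keys, of which only one is watched.
files = {"config/mail/delivery.yaml": "batch_size: 500\nretries: 3\n"}
candidates = find_candidates("railiance-platform", files, {"batch_size"})
print(render_candidate(candidates[0], "2026-06-26"))
```

Taking file contents as a mapping keeps the sketch free of filesystem and network access; a real connector would presumably wrap it with whatever read-only source access applies.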
+
+Minimum viable connector set — pick the 3–4 that prove cross-tool resolution:
+
+| Connector | Source | Emits |
+|-----------|--------|-------|
+| `git-config` | repo grep for known config files/keys | `app-config`, `deploy-config` candidates |
+| `helm-values` | Helm `values*.yaml` + overlays | `deploy-config` with layer roles |
+| `terraform-vars` | TF/OpenTofu variables + tfvars | `infra-state` candidates |
+| `flag-platform` | feature-flag API inventory | `feature-flag` candidates + stale-flag signal |
+| `secret-ref` | grep for vault/OpenBao/SOPS refs | `secret-ref` (reference only) |
+
+Pipeline: `connector → candidate YAML → PR → reuse-surface validate (CI) → merge`.
+The human/agent in the loop is the practical substitute for a resolution engine in
+the early phases — and is *cheaper and safer* than one.
+
+---
+
+## 5. Validation, evidence, and the graph
+
+**Validation (CI, day one).** Every PR runs:
+- `reuse-surface validate --root .` (entry well-formedness, index sync)
+- `git diff --check`
+- JSON Schema / CUE check of each entry's `schema` block against the Canon schema.
+
+This is the single highest-leverage, lowest-cost piece: it makes the registry
+*trustworthy* without any service.
+
+**Evidence & graph — reuse the State Hub.** Do not build a graph database. The
+relationships in §3 (`consumed_by`, `overrides`, `depends_on_secret`) plus State
+Hub `progress`/`decision`/relationship records already give a config knowledge
+graph (research §5) for free. config-atlas contributes the *config-typed edges*;
+the hub stores and queries them.
+
+**Explain view (later).** Once entries carry `sources[].role`, a small renderer can
+produce the `config explain` output from the research primer — statically, from
+source links, without reading live values. This is the first capability that feels
+like a "control plane" and should be the headline of Phase 3.
+
+---
+
+## 6. Phased roadmap (efficient path to practical)
+
+Each phase ships something usable and maps to an `ATLAS-WP-` workplan.
+
+**Phase 0 — Canon (now, days).**
+Write the surface-entry JSON Schema + the scope/precedence/merge model as a
+machine-checkable doc. Replace the inherited `repo-template` capability artifact
+(ATLAS-WP-0002). *Exit:* one real surface entry validates in CI.
+
+**Phase 1 — Seed registry by hand (1–2 weeks).**
+Hand-author 10–20 entries for the highest-value Coulomb surfaces (start with
+railiance-platform mail/rate-limit, secret-refs, key feature flags). Stand up CI
+validation. *Exit:* a reviewer can answer "what configures X, who owns it, where"
+from the repo alone.
+
+**Phase 2 — First connectors (2–4 weeks).**
+Build `git-config` + one of `helm-values`/`flag-platform`. Candidate-PR workflow.
+*Exit:* registry grows from automated discovery, not just hand authoring; stale/
+unowned surfaces are surfaced.
+
+**Phase 3 — Explain & graph (4+ weeks).**
+Render `config explain` from `sources[].role`; push config-typed edges to the State
+Hub. *Exit:* given a key, show its layer path, what overrides what, owner, and
+consumers — the read-first control-plane MVP.
+
+**Deferred (out of current scope).** Live resolution, controlled change, approval
+workflows, rollout/rollback orchestration — these belong to downstream systems
+(`feature-control`, GitOps, the platform), not this repo.
+
+---
+
+## 7. Build-vs-reuse summary
+
+| Need | Decision | Why |
+|------|----------|-----|
+| Entry validation / federation | **reuse** reuse-surface | already the federation contract |
+| Workplans, relationships, audit | **reuse** State Hub | edges + evidence for free |
+| Schema/merge validation | **reuse** JSON Schema, evaluate CUE | CUE's order-independent merge fits effective-config (research §3.3) |
+| Policy checks | **reuse** OPA/Kyverno as backends | config-atlas is the context layer, not the engine |
+| Secret storage | **never** — reference only | OpenBao owns values |
+| Discovery connectors | **build** (thin, read-only) | the genuinely novel, repo-specific piece |
+| Effective-config resolver / delivery | **don't build** | out of scope; delegated downstream |
+
+The whole design optimizes for one thing: **the smallest amount of original
+software that turns scattered configuration into a discoverable, explainable,
+source-linked map** — and borrows everything else.
+
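
To make the Explain view from §5 more concrete, here is a minimal sketch of a static renderer that orders an entry's `sources` by layer and prints the layer path without touching any live value. Several things are assumptions rather than part of the blueprint: the function names `layer_of` and `explain`, the entry id, the precedence order in `LAYER_ORDER` (taken from the `allowed_layers` example, lowest first), and the convention that the layer name is the prefix of `role`.

```python
# Assumed precedence, lowest to highest. The real order belongs to the
# Canon scope/precedence model, which this sketch does not define.
LAYER_ORDER = ["company", "environment", "installation", "tenant"]


def layer_of(role):
    """Map a source role such as 'environment-overlay' to its layer name."""
    return role.split("-", 1)[0]


def explain(entry):
    """Return text lines describing one entry's layer path, lowest layer first.

    Works purely from metadata (`sources`, `owner`, `relations`); no file is
    opened and no live value is read.
    """
    sources = sorted(
        entry["sources"],
        key=lambda s: LAYER_ORDER.index(layer_of(s["role"])),
    )
    lines = [f"{entry['id']} (owner: {entry['owner']})"]
    for i, src in enumerate(sources):
        is_top = i == len(sources) - 1
        marker = "highest layer present" if is_top else "lower layer"
        lines.append(
            f"  {layer_of(src['role'])}: {src['repo']}/{src['path']} [{marker}]"
        )
    lines.append("  consumed by: " + ", ".join(entry["relations"]["consumed_by"]))
    return lines


# Hypothetical entry, with sources deliberately listed out of order.
entry = {
    "id": "surface.example.mail-batch-size",
    "owner": "platform-delivery",
    "sources": [
        {"repo": "railiance-platform", "path": "environments/prod.yaml",
         "role": "environment-overlay"},
        {"repo": "railiance-platform", "path": "config/mail/delivery.yaml",
         "role": "company-baseline"},
    ],
    "relations": {"consumed_by": ["service.mail-gateway"]},
}

for line in explain(entry):
    print(line)
```

The sketch only reports which layers contribute and in what order; deciding the effective value stays with the downstream resolver, in line with the scope boundary in §1.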