docs: add ArchitectureBlueprint for establishing config-atlas

Practical, efficiency-first blueprint: bound config-atlas to the map and evidence layers, define the configuration-surface entry schema (modeled on the existing capability-entry format), a read-only connector discovery pipeline, CI validation, State Hub reuse for the config graph, and a phased roadmap (Canon -> seed -> connectors -> explain) with a build-vs-reuse table.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-26 19:54:35 +02:00
parent 9b89fc4026
commit 2f33349609
1 changed files with 225 additions and 0 deletions
--- a/ArchitectureBlueprint.md
+++ b/ArchitectureBlueprint.md
@@ -0,0 +1,225 @@
+# Architecture Blueprint — config-atlas
+
+> How to establish config-atlas as an efficient, practical system.
+> Companion to [`INTENT.md`](INTENT.md) (purpose/boundary),
+> [`SCOPE.md`](SCOPE.md), and [`research/configuration-control-plane.md`](research/configuration-control-plane.md)
+> (the thesis). Drafted 2026-06-26.
+
+---
+
+## 1. Design constraints (what "efficient and practical" means here)
+
+These constraints come straight from `INTENT.md` and `SCOPE.md` and bound every
+decision below:
+
+1. **Map and evidence layers only.** config-atlas owns *Registry* and *Evidence*.
+   It does **not** build a runtime *Resolver*, *Delivery*, or *Control* engine
+   (research §5). Anything that resolves or pushes live values is out of scope.
+2. **Source-linked, never a second source of truth.** Entries point at canonical
+   files/APIs; the atlas stores *metadata and references*, not live values, and
+   **never** secret values.
+3. **Read-first before write-first.** Discover, classify, attribute ownership, and
+   explain — before any controlled-change ambition (research §6 wedge).
+4. **Markdown + YAML, agent-legible.** No application runtime. Entries must be
+   diffable, reviewable in a PR, and parseable without bespoke tooling.
+5. **Reuse the ecosystem.** Lean on `reuse-surface` (federation/validation), the
+   State Hub (workplans, relationships, evidence events), and existing engines
+   (CUE/JSON Schema, OPA) rather than reimplementing them.
+
+The cheapest path to a *practical* system is therefore **a well-specified entry
+schema + validation in CI + a thin discovery pipeline + source links**, not a
+service. Efficiency comes from reusing, rather than rebuilding, every layer that
+already exists elsewhere.
+
+---
+
+## 2. System overview
+
+config-atlas realizes the left half of the control-plane pipeline. The right half
+(resolve → deliver → control) is explicitly delegated to downstream systems.
+
+```
+        OWNED BY config-atlas                        DELEGATED / EXTERNAL
+   ┌─────────────────────────────────┐        ┌──────────────────────────────┐
+   │  Canon      vocabulary + schema  │        │  Resolver  effective value    │
+   │  Registry   surface entries      │  --->  │  Policy    OPA / Kyverno / CUE │
+   │  Evidence   relationships, audit │        │  Delivery  env / ConfigMap /…  │
+   └─────────────────────────────────┘        │  Control   feature-flag plane  │
+              ▲                                 └──────────────────────────────┘
+              │ ingest (read-only connectors)
+   ┌─────────────────────────────────────────────────────────────────────────┐
+   │  Sources: repos · K8s/Helm · Terraform state · feature-flag platforms ·   │
+   │           secret-manager refs · cloud param stores · SaaS tenant settings │
+   └─────────────────────────────────────────────────────────────────────────┘
+```
+
+Five internal components, smallest-to-build first:
+
+| Component | What it is | Build cost | Status |
+|-----------|-----------|-----------|--------|
+| **Canon** | Schema + vocabulary for a config-surface entry; scope/precedence/merge model | low | partial (research has the model; needs a JSON Schema) |
+| **Registry** | The corpus of surface entries + indexes | low | scaffold exists (`registry/`) |
+| **Connectors** | Read-only ingest scripts that emit candidate entries | medium | not started |
+| **Evidence graph** | Relationships (`consumed_by` / `overrides` / `depends_on_secret` / `related_to`) + change audit | medium | partly free via State Hub |
+| **Explain view** | Render an effective-config *path* from layered source links | medium-high | future |
+
+---
+
+## 3. The core artifact: the configuration-surface entry
+
+Everything else exists to produce and validate these. Model it on the existing
+capability-entry schema (`registry/capabilities/…`) so `reuse-surface` validation
+and State Hub federation work unchanged. Proposed shape:
+
+```yaml
+---
+id: surface.<domain>.<system>.<name>        # stable, unique
+name: Mail delivery batch sizing
+kind: app-config | deploy-config | secret-ref | feature-flag |
+      policy | tenant-config | infra-state | runtime-override
+summary: Controls max batch size for outbound mail delivery.
+owner: platform-delivery                     # team/agent, not a person
+status: draft | active | deprecated
+
+scope:                                        # which layers may set this (research §3.1)
+  allowed_layers: [company, environment, installation, tenant]
+  default_layer: company
+mutability: hot                               # build|deploy|startup|hot|per-request|emergency
+security_class: operational                   # operational | sensitive | secret-ref | policy
+
+schema:                                       # the contract, not the value
+  type: integer
+  default: 500
+  minimum: 1
+  maximum: 5000
+  validator: schemas/mail-delivery.schema.json   # JSON Schema or CUE ref
+
+sources:                                       # source-linked, never inlined values
+  - repo: railiance-platform
+    path: config/mail/delivery.yaml
+    role: company-baseline
+  - repo: railiance-platform
+    path: environments/prod.yaml
+    role: environment-overlay
+
+relations:
+  consumed_by: [service.mail-gateway]
+  overrides: []
+  depends_on_secret: []                        # references only, never the secret
+  related_to: [surface.platform.mail.rate-limit]
+
+evidence:
+  last_seen: '2026-06-26'                      # from connector run
+  discovery_method: connector:git-config | manual
+  change_log_ref: <state-hub progress event / PR url>
+---
+
+# Mail delivery batch sizing
+
+Prose: what it means, why it exists, precedence notes, known gotchas.
+```
+
+Key efficiency choices:
+- **`kind` is the primary classifier** — it drives the research §3 kind-separation
+  (secrets vs flags vs infra-state are never treated alike).
+- **`scope.allowed_layers` encodes the layering contract** per key — this is the
+  durable value even before a resolver exists.
+- **`sources[].role`** carries the layer each source contributes; this is what a
+  future Explain view consumes to render `config explain <key>`.
+- **No value fields.** The atlas records *where* and *which layer wins by rule*,
+  never the live value.
+
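The choices above can be checked mechanically. The following is a minimal sketch, assuming an entry's front matter has already been parsed into a plain dict; `check_entry` and the `FORBIDDEN_KEYS` deny-list are hypothetical names for illustration, not part of `reuse-surface` or the Canon schema.

```python
from typing import Any

# Enumerations copied from the proposed entry shape above.
KINDS = {
    "app-config", "deploy-config", "secret-ref", "feature-flag",
    "policy", "tenant-config", "infra-state", "runtime-override",
}
STATUSES = {"draft", "active", "deprecated"}
# Hypothetical deny-list: keys that would smuggle a live value into an entry.
FORBIDDEN_KEYS = {"value", "current_value", "secret", "secret_value"}


def check_entry(entry: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the entry passes."""
    problems: list[str] = []
    if entry.get("kind") not in KINDS:
        problems.append(f"unknown kind: {entry.get('kind')!r}")
    if entry.get("status") not in STATUSES:
        problems.append(f"unknown status: {entry.get('status')!r}")

    scope = entry.get("scope", {})
    if scope.get("default_layer") not in scope.get("allowed_layers", []):
        problems.append("scope.default_layer is not in scope.allowed_layers")

    if not entry.get("sources"):
        problems.append("entry has no sources; it must be source-linked")
    for i, source in enumerate(entry.get("sources", [])):
        for field in ("repo", "path", "role"):
            if field not in source:
                problems.append(f"sources[{i}] is missing {field!r}")

    # "No value fields": reject live-value keys at any nesting depth.
    def walk(node: Any, trail: str) -> None:
        if isinstance(node, dict):
            for key, child in node.items():
                if key in FORBIDDEN_KEYS:
                    problems.append(f"forbidden value field: {trail}{key}")
                walk(child, f"{trail}{key}.")
        elif isinstance(node, list):
            for i, child in enumerate(node):
                walk(child, f"{trail}{i}.")

    walk(entry, "")
    return problems
```

Returning a list of problems rather than raising on the first one lets a reviewer see every issue with a candidate entry in a single CI run.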
+---
+
+## 4. Discovery (read-first connectors)
+
+Connectors are **stateless, read-only scripts** that scan a source and emit
+*candidate* entries (YAML) for human/agent review via PR. They never write live
+systems and never auto-merge.
+
+Candidate connectors are listed below; the minimum viable set is the 3–4 that together prove cross-tool resolution:
+
+| Connector | Source | Emits |
+|-----------|--------|-------|
+| `git-config` | repo grep for known config files/keys | `app-config`, `deploy-config` candidates |
+| `helm-values` | Helm `values*.yaml` + overlays | `deploy-config` with layer roles |
+| `terraform-vars` | TF/OpenTofu variables + tfvars | `infra-state` candidates |
+| `flag-platform` | feature-flag API inventory | `feature-flag` candidates + stale-flag signal |
+| `secret-ref` | grep for vault/OpenBao/SOPS refs | `secret-ref` (reference only) |
+
+Pipeline: `connector → candidate YAML → PR → reuse-surface validate (CI) → merge`.
+The human/agent in the loop is the practical substitute for a resolution engine in
+the early phases — and is *cheaper and safer* than one.
+
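As an illustration of the "stateless, read-only" shape, here is a rough sketch of what a `helm-values` connector might look like. The `surface.deploy.<chart>.values` id scheme, the filename-to-role convention, and both function names are assumptions made for this example. Candidates are serialized as JSON, which YAML 1.2 parsers also accept, to avoid a third-party YAML dependency.

```python
import json
from pathlib import Path


def helm_values_candidates(root: Path, repo: str, today: str) -> list[dict]:
    """Emit one candidate entry per directory that holds Helm values files.

    Read-only: files are located by name and never opened or modified.
    """
    by_chart: dict[Path, list[Path]] = {}
    for path in sorted(root.rglob("values*.yaml")):
        by_chart.setdefault(path.parent, []).append(path)

    candidates = []
    for chart_dir, files in by_chart.items():
        sources = [
            {
                "repo": repo,
                "path": path.relative_to(root).as_posix(),
                # Assumed convention: values.yaml is the baseline, any other
                # values*.yaml file is an environment overlay.
                "role": "company-baseline" if path.name == "values.yaml"
                        else "environment-overlay",
            }
            for path in files
        ]
        candidates.append({
            "id": f"surface.deploy.{chart_dir.name}.values",  # hypothetical id scheme
            "kind": "deploy-config",
            "status": "draft",  # a candidate stays a draft until a reviewer merges it
            "sources": sources,
            "evidence": {"last_seen": today, "discovery_method": "connector:helm-values"},
        })
    return candidates


def render_front_matter(candidate: dict) -> str:
    """Serialize a candidate as front matter for a review PR."""
    return "---\n" + json.dumps(candidate, indent=2) + "\n---\n"
```

Passing `today` in as an argument keeps the function free of hidden state, so two runs over the same checkout produce identical candidates.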
+---
+
+## 5. Validation, evidence, and the graph
+
+**Validation (CI, day one).** Every PR runs:
+- `reuse-surface validate --root .` (entry well-formedness, index sync)
+- `git diff --check`
+- JSON Schema / CUE check of each entry's `schema` block against the Canon schema.
+
+This is the single highest-leverage, lowest-cost piece: it makes the registry
+*trustworthy* without any service.
+
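Until the Canon schema exists, a CI job could at least confirm that each `schema` block is internally consistent. The sketch below is one hypothetical way to do that with the standard library only; `check_schema_block` is an illustrative name, and a real setup would more likely hand this to a JSON Schema or CUE validator.

```python
from typing import Any

# Maps the `type` keyword to the Python types accepted for `default`.
JSON_TYPES: dict[str, tuple[type, ...]] = {
    "integer": (int,),
    "number": (int, float),
    "string": (str,),
    "boolean": (bool,),
}


def check_schema_block(schema: dict[str, Any]) -> list[str]:
    """Check that a `schema` block does not contradict itself."""
    expected = JSON_TYPES.get(schema.get("type"))
    if expected is None:
        return [f"unsupported type: {schema.get('type')!r}"]

    problems: list[str] = []
    low, high = schema.get("minimum"), schema.get("maximum")
    if low is not None and high is not None and low > high:
        problems.append(f"minimum {low} exceeds maximum {high}")

    if "default" in schema:
        default = schema["default"]
        # bool is a subclass of int in Python, so reject it for numeric types.
        wrong_bool = isinstance(default, bool) and bool not in expected
        if wrong_bool or not isinstance(default, expected):
            problems.append(f"default {default!r} is not of type {schema['type']}")
        elif schema["type"] in ("integer", "number"):
            if low is not None and default < low:
                problems.append(f"default {default} is below minimum {low}")
            if high is not None and default > high:
                problems.append(f"default {default} is above maximum {high}")
    return problems
```

For the example entry in section 3 (`type: integer`, `default: 500`, bounds 1 to 5000) this returns an empty list.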
+**Evidence & graph — reuse the State Hub.** Do not build a graph database. The
+relationships in §3 (`consumed_by`, `overrides`, `depends_on_secret`) plus State
+Hub `progress`/`decision`/relationship records already give a config knowledge
+graph (research §5) for free. config-atlas contributes the *config-typed edges*;
+the hub stores and queries them.
+
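To make "config-typed edges" concrete, the sketch below flattens one entry's `relations` block into (source, relation, target) triples. How those triples reach the State Hub is not specified here, so the example stops at producing them; `Edge` and `entry_edges` are hypothetical names, and the entry id in the usage note is made up for illustration.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    source: str    # id of the entry that declares the relation
    relation: str  # one of RELATION_KEYS
    target: str    # id of the related surface, service, or secret reference


# Relation keys taken from the entry shape in section 3.
RELATION_KEYS = ("consumed_by", "overrides", "depends_on_secret", "related_to")


def entry_edges(entry: dict) -> list[Edge]:
    """Flatten one entry's relations into typed edges, in RELATION_KEYS order."""
    relations = entry.get("relations", {})
    return [
        Edge(entry["id"], key, target)
        for key in RELATION_KEYS
        for target in relations.get(key, [])
    ]
```

An entry with id `surface.platform.mail.batch-size` and the `relations` block from section 3 would yield two edges: one `consumed_by` edge to `service.mail-gateway` and one `related_to` edge to `surface.platform.mail.rate-limit`.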
+**Explain view (later).** Once entries carry `sources[].role`, a small renderer can
+produce the `config explain` output from the research primer — statically, from
+source links, without reading live values. This is the first capability that feels
+like a "control plane" and should be the headline of Phase 3.
+
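A static renderer along these lines could be very small. The sketch below rests on two assumptions that the blueprint does not state: `scope.allowed_layers` is ordered from lowest to highest precedence, and each `sources[].role` starts with its layer name (for example `company-baseline`). The function name and output format are illustrative only.

```python
def explain(entry: dict) -> str:
    """Render the static layer path for one entry, lowest precedence first."""
    layers = entry["scope"]["allowed_layers"]

    def layer_of(source: dict) -> str:
        # Assumed convention: the role is "<layer>-<suffix>".
        return source["role"].split("-", 1)[0]

    # Sources whose layer is not allowed are skipped here; a real check
    # would flag them instead of dropping them silently.
    known = [s for s in entry["sources"] if layer_of(s) in layers]
    ordered = sorted(known, key=lambda s: layers.index(layer_of(s)))

    lines = [f"{entry['id']} (owner: {entry['owner']})"]
    for position, source in enumerate(ordered):
        # The highest-precedence layer with a source wins by rule.
        marker = "wins" if position == len(ordered) - 1 else "overridden"
        lines.append(
            f"  {layer_of(source):<12} {source['repo']}:{source['path']}  [{marker}]"
        )
    return "\n".join(lines)
```

Because it reads only `scope` and `sources`, the renderer never touches a live system: it reports which layer wins by rule, not what the winning value is.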
+---
+
+## 6. Phased roadmap (efficient path to practical)
+
+Each phase ships something usable and maps to an `ATLAS-WP-` workplan.
+
+**Phase 0 — Canon (now, days).**
+Write the surface-entry JSON Schema + the scope/precedence/merge model as a
+machine-checkable doc. Replace the inherited `repo-template` capability artifact
+(ATLAS-WP-0002). *Exit:* one real surface entry validates in CI.
+
+**Phase 1 — Seed registry by hand (1–2 weeks).**
+Hand-author 10–20 entries for the highest-value Coulomb surfaces (start with
+railiance-platform mail/rate-limit, secret-refs, key feature flags). Extend the
+Phase 0 CI validation to cover every entry. *Exit:* a reviewer can answer "what
+configures X, who owns it, where" from the repo alone.
+
+**Phase 2 — First connectors (2–4 weeks).**
+Build `git-config` + one of `helm-values`/`flag-platform`. Candidate-PR workflow.
+*Exit:* registry grows from automated discovery, not just hand authoring; stale
+or unowned surfaces are flagged.
+
+**Phase 3 — Explain & graph (4+ weeks).**
+Render `config explain` from `sources[].role`; push config-typed edges to the State
+Hub. *Exit:* given a key, show its layer path, what overrides what, owner, and
+consumers — the read-first control-plane MVP.
+
+**Deferred (out of current scope).** Live resolution, controlled change, approval
+workflows, rollout/rollback orchestration — these belong to downstream systems
+(`feature-control`, GitOps, the platform), not this repo.
+
+---
+
+## 7. Build-vs-reuse summary
+
+| Need | Decision | Why |
+|------|----------|-----|
+| Entry validation / federation | **reuse** reuse-surface | already the federation contract |
+| Workplans, relationships, audit | **reuse** State Hub | edges + evidence for free |
+| Schema/merge validation | **reuse** JSON Schema, evaluate CUE | CUE's order-independent merge fits effective-config (research §3.3) |
+| Policy checks | **reuse** OPA/Kyverno as backends | config-atlas is the context layer, not the engine |
+| Secret storage | **never** — reference only | OpenBao owns values |
+| Discovery connectors | **build** (thin, read-only) | the genuinely novel, repo-specific piece |
+| Effective-config resolver / delivery | **don't build** | out of scope; delegated downstream |
+
+The whole design optimizes for one thing: **the smallest amount of original
+software that turns scattered configuration into a discoverable, explainable,
+source-linked map** — and borrows everything else.