config-atlas/ArchitectureBlueprint.md
tegwick 2f33349609 docs: add ArchitectureBlueprint for establishing config-atlas
Practical, efficiency-first blueprint: bound config-atlas to the map and
evidence layers, define the configuration-surface entry schema (modeled
on the existing capability-entry format), a read-only connector
discovery pipeline, CI validation, State Hub reuse for the config graph,
and a phased roadmap (Canon -> seed -> connectors -> explain) with a
build-vs-reuse table.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-26 19:54:35 +02:00


Architecture Blueprint — config-atlas

How to establish config-atlas as an efficient, practical system. Companion to INTENT.md (purpose/boundary), SCOPE.md, and research/configuration-control-plane.md (the thesis). Drafted 2026-06-26.


1. Design constraints (what "efficient and practical" means here)

These constraints come straight from INTENT.md and SCOPE.md and bound every decision below:

  1. Map and evidence layers only. config-atlas owns Registry and Evidence. It does not build a runtime Resolver, Delivery, or Control engine (research §5). Anything that resolves or pushes live values is out of scope.
  2. Source-linked, never a second source of truth. Entries point at canonical files/APIs; the atlas stores metadata and references, not live values, and never secret values.
  3. Read-first before write-first. Discover, classify, attribute ownership, and explain — before any controlled-change ambition (research §6 wedge).
  4. Markdown + YAML, agent-legible. No application runtime. Entries must be diffable, reviewable in a PR, and parseable without bespoke tooling.
  5. Reuse the ecosystem. Lean on reuse-surface (federation/validation), the State Hub (workplans, relationships, evidence events), and existing engines (CUE/JSON Schema, OPA) rather than reimplementing them.

The cheapest path to a practical system is therefore a well-specified entry schema, validation in CI, a thin discovery pipeline, and source links, not a service. Efficiency comes from reusing, not rebuilding, every layer that already exists elsewhere.


2. System overview

config-atlas realizes the left half of the control-plane pipeline. The right half (resolve → deliver → control) is explicitly delegated to downstream systems.

        OWNED BY config-atlas                        DELEGATED / EXTERNAL
   ┌─────────────────────────────────┐        ┌──────────────────────────────┐
   │  Canon      vocabulary + schema  │        │  Resolver  effective value    │
   │  Registry   surface entries      │  --->  │  Policy    OPA / Kyverno / CUE │
   │  Evidence   relationships, audit │        │  Delivery  env / ConfigMap /…  │
   └─────────────────────────────────┘        │  Control   feature-flag plane  │
              ▲                                 └──────────────────────────────┘
              │ ingest (read-only connectors)
   ┌─────────────────────────────────────────────────────────────────────────┐
   │  Sources: repos · K8s/Helm · Terraform state · feature-flag platforms ·   │
   │           secret-manager refs · cloud param stores · SaaS tenant settings │
   └─────────────────────────────────────────────────────────────────────────┘

Five internal components, smallest-to-build first:

| Component | What it is | Build cost | Status |
|---|---|---|---|
| Canon | Schema + vocabulary for a config-surface entry; scope/precedence/merge model | low | partial (research has the model; needs a JSON Schema) |
| Registry | The corpus of surface entries + indexes | low | scaffold exists (registry/) |
| Connectors | Read-only ingest scripts that emit candidate entries | medium | not started |
| Evidence graph | Relationships (consumes/overrides/depends-on/secret-ref) + change audit | medium | partly free via State Hub |
| Explain view | Render an effective-config path from layered source links | medium-high | future |

3. The core artifact: the configuration-surface entry

Everything else exists to produce and validate these. Model it on the existing capability-entry schema (registry/capabilities/…) so reuse-surface validation and State Hub federation work unchanged. Proposed shape:

---
id: surface.<domain>.<system>.<name>        # stable, unique
name: Mail delivery batch sizing
kind: app-config | deploy-config | secret-ref | feature-flag |
      policy | tenant-config | infra-state | runtime-override
summary: Controls max batch size for outbound mail delivery.
owner: platform-delivery                     # team/agent, not a person
status: draft | active | deprecated

scope:                                        # which layers may set this (research §3.1)
  allowed_layers: [company, environment, installation, tenant]
  default_layer: company
mutability: hot                               # build|deploy|startup|hot|per-request|emergency
security_class: operational                   # operational | sensitive | secret-ref | policy

schema:                                       # the contract, not the value
  type: integer
  default: 500
  minimum: 1
  maximum: 5000
  validator: schemas/mail-delivery.schema.json   # JSON Schema or CUE ref

sources:                                       # source-linked, never inlined values
  - repo: railiance-platform
    path: config/mail/delivery.yaml
    role: company-baseline
  - repo: railiance-platform
    path: environments/prod.yaml
    role: environment-overlay

relations:
  consumed_by: [service.mail-gateway]
  overrides: []
  depends_on_secret: []                        # references only, never the secret
  related_to: [surface.platform.mail.rate-limit]

evidence:
  last_seen: '2026-06-26'                      # from connector run
  discovery_method: connector:git-grep | manual
  change_log_ref: <state-hub progress event / PR url>
---

# Mail delivery batch sizing

Prose: what it means, why it exists, precedence notes, known gotchas.

Key efficiency choices:

  • kind is the primary classifier — it drives the research §3 kind-separation (secrets vs flags vs infra-state are never treated alike).
  • scope.allowed_layers encodes the layering contract per key — this is the durable value even before a resolver exists.
  • sources[].role carries the layer each source contributes; this is what a future Explain view consumes to render config explain <key>.
  • No value fields. The atlas records where and which layer wins by rule, never the live value.
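The choices above can be checked mechanically. The following is a minimal sketch, assuming an entry's front matter has already been parsed into a Python dict. The `kind` vocabulary is copied from the proposed shape, but `FORBIDDEN_KEYS`, `find_forbidden_keys`, and `lint_entry` are hypothetical names for illustration, not part of reuse-surface or any existing tool.

```python
from typing import Any

# Allowed kinds, copied from the proposed entry shape.
KINDS = {
    "app-config", "deploy-config", "secret-ref", "feature-flag",
    "policy", "tenant-config", "infra-state", "runtime-override",
}
# Assumption: key names that would smuggle a live value into an entry.
# Note that schema.default is part of the contract and is deliberately allowed.
FORBIDDEN_KEYS = {"value", "values", "current_value"}


def find_forbidden_keys(node: Any, path: str = "") -> list[str]:
    """Return the dotted path of every forbidden key found anywhere in the entry."""
    hits: list[str] = []
    if isinstance(node, dict):
        for key, child in node.items():
            here = f"{path}.{key}" if path else str(key)
            if key in FORBIDDEN_KEYS:
                hits.append(here)
            hits.extend(find_forbidden_keys(child, here))
    elif isinstance(node, list):
        for i, child in enumerate(node):
            hits.extend(find_forbidden_keys(child, f"{path}[{i}]"))
    return hits


def lint_entry(entry: dict[str, Any]) -> list[str]:
    """Check the kind classifier, the layering contract, source roles, and value-freeness."""
    errors: list[str] = []
    if entry.get("kind") not in KINDS:
        errors.append(f"unknown kind: {entry.get('kind')!r}")
    scope = entry.get("scope", {})
    if scope.get("default_layer") not in scope.get("allowed_layers", []):
        errors.append("scope.default_layer is not in scope.allowed_layers")
    for i, src in enumerate(entry.get("sources", [])):
        if not src.get("role"):
            errors.append(f"sources[{i}] has no role")
    errors.extend(f"forbidden value field: {p}" for p in find_forbidden_keys(entry))
    return errors
```

An empty list means the entry respects all four choices; anything else is a message a CI job could print against the offending PR.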

4. Discovery (read-first connectors)

Connectors are stateless, read-only scripts that scan a source and emit candidate entries (YAML) for human/agent review via PR. They never write live systems and never auto-merge.

Minimum viable connector set (pick the 3-4 that prove cross-tool resolution):

| Connector | Source | Emits |
|---|---|---|
| git-config | repo grep for known config files/keys | app-config, deploy-config candidates |
| helm-values | Helm values*.yaml + overlays | deploy-config with layer roles |
| terraform-vars | TF/OpenTofu variables + tfvars | infra-state candidates |
| flag-platform | feature-flag API inventory | feature-flag candidates + stale-flag signal |
| secret-ref | grep for vault/OpenBao/SOPS refs | secret-ref (reference only) |

Pipeline: connector → candidate YAML → PR → reuse-surface validate (CI) → merge. The human/agent in the loop is the practical substitute for a resolution engine in the early phases — and is cheaper and safer than one.
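To make the connector contract concrete, here is a rough sketch of what a helm-values connector could look like. It is stateless and read-only: it lists file paths and never opens the files, so no live values can leak into a candidate. The filename-to-role rule in `role_for`, the `connector:helm-values` label, and the function names are assumptions for illustration; the real naming conventions would come from Canon.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def role_for(values_file: Path) -> str:
    """Assumed rule: values.yaml is the baseline, values-<env>.yaml is an overlay."""
    return "company-baseline" if values_file.stem == "values" else "environment-overlay"


def discover_helm_values(
    repo_root: Path, repo_name: str, seen_on: Optional[date] = None
) -> list[dict]:
    """Emit one draft deploy-config candidate per directory holding values*.yaml files.

    Only paths are recorded; file contents are never read.
    """
    seen = (seen_on or date.today()).isoformat()

    # Group values files by the directory (chart) that contains them.
    by_chart: dict[Path, list[Path]] = {}
    for f in sorted(repo_root.rglob("values*.yaml")):
        by_chart.setdefault(f.parent, []).append(f)

    candidates = []
    for files in by_chart.values():
        candidates.append({
            "kind": "deploy-config",
            "status": "draft",  # id, name, and owner are left for the reviewer to fill in
            "sources": [
                {
                    "repo": repo_name,
                    "path": f.relative_to(repo_root).as_posix(),
                    "role": role_for(f),
                }
                for f in files
            ],
            "evidence": {"last_seen": seen, "discovery_method": "connector:helm-values"},
        })
    return candidates
```

Serializing each candidate to YAML and opening the PR are left out on purpose; those steps depend on tooling this document does not specify.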


5. Validation, evidence, and the graph

Validation (CI, day one). Every PR runs:

  • reuse-surface validate --root . (entry well-formedness, index sync)
  • git diff --check
  • JSON Schema / CUE check of each entry's schema block against the Canon schema.

This is the single highest-leverage, lowest-cost piece: it makes the registry trustworthy without any service.
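As an illustration of how cheap such a check can be, the sketch below verifies that an entry's `schema` block is internally consistent (the default sits inside the declared bounds). It uses only the standard library and handles only `type: integer`; a real pipeline would more likely hand this to a JSON Schema validator or CUE. `check_schema_block` is a hypothetical helper, not an existing command.

```python
def check_schema_block(schema: dict) -> list[str]:
    """Return consistency errors for an integer schema block (empty list means OK)."""
    errors: list[str] = []
    if schema.get("type") != "integer":
        return errors  # other types are out of scope for this sketch

    lo, hi = schema.get("minimum"), schema.get("maximum")
    if lo is not None and hi is not None and lo > hi:
        errors.append(f"minimum {lo} exceeds maximum {hi}")

    default = schema.get("default")
    if default is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(default, bool) or not isinstance(default, int):
            errors.append(f"default {default!r} is not an integer")
        else:
            if lo is not None and default < lo:
                errors.append(f"default {default} is below minimum {lo}")
            if hi is not None and default > hi:
                errors.append(f"default {default} is above maximum {hi}")
    return errors
```

Run against the mail batch-size example from §3 (default 500, bounds 1 to 5000), this returns no errors; raising the default past 5000 would fail the PR.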

Evidence & graph — reuse the State Hub. Do not build a graph database. The relationships in §3 (consumed_by, overrides, depends_on_secret) plus State Hub progress/decision/relationship records already give a config knowledge graph (research §5) for free. config-atlas contributes the config-typed edges; the hub stores and queries them.
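A sketch of the contribution side, assuming parsed entries are available as dicts: each `relations` block flattens into (source, relation, target) triples. The edge labels follow the component table in §2; how the State Hub actually ingests relationship records is not specified here, so `edges_from_entry` stops at producing plain tuples and makes no hub API calls.

```python
def edges_from_entry(entry: dict) -> list[tuple[str, str, str]]:
    """Flatten an entry's relations block into (source, relation, target) triples."""
    surface = entry["id"]
    rel = entry.get("relations", {})
    edges: list[tuple[str, str, str]] = []

    # consumed_by is stored on the surface, but the edge points from the consumer.
    for consumer in rel.get("consumed_by", []):
        edges.append((consumer, "consumes", surface))
    for target in rel.get("overrides", []):
        edges.append((surface, "overrides", target))
    # Secret edges carry the reference only, never the secret itself.
    for secret in rel.get("depends_on_secret", []):
        edges.append((surface, "secret-ref", secret))
    for other in rel.get("related_to", []):
        edges.append((surface, "related-to", other))
    return edges
```

Because the triples are derived purely from merged entries, they can be regenerated on every run and never become a second source of truth.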

Explain view (later). Once entries carry sources[].role, a small renderer can produce the config explain output from the research primer — statically, from source links, without reading live values. This is the first capability that feels like a "control plane" and should be the headline of Phase 3.
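A static renderer of this kind could be very small. The sketch below rests on two assumptions that Canon would need to confirm: layers apply in the order company, environment, installation, tenant with the last one winning, and a source's role starts with its layer name (as in `environment-overlay`). `LAYER_ORDER`, `layer_of`, and `explain` are hypothetical names; the output format is illustrative, not the one from the research primer.

```python
# Assumed precedence, lowest to highest; taken from the allowed_layers example.
LAYER_ORDER = ["company", "environment", "installation", "tenant"]


def layer_of(role: str) -> str:
    """Assumed convention: a role is '<layer>-<suffix>', e.g. 'environment-overlay'."""
    return role.split("-", 1)[0]


def explain(entry: dict) -> str:
    """Render the layer path for one entry from its source links alone.

    No source file is opened; the result says which layer wins by rule,
    not what the effective value is. Raises ValueError on an unknown layer.
    """
    sources = sorted(
        entry["sources"], key=lambda s: LAYER_ORDER.index(layer_of(s["role"]))
    )
    lines = [f"{entry['id']} (owner: {entry['owner']})"]
    for i, src in enumerate(sources):
        marker = "wins" if i == len(sources) - 1 else "overridden"
        lines.append(
            f"  {layer_of(src['role']):<12} {src['repo']}:{src['path']}  [{marker}]"
        )
    return "\n".join(lines)
```

For the mail batch-size example, this would list `config/mail/delivery.yaml` as the overridden company layer and `environments/prod.yaml` as the winning environment layer.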


6. Phased roadmap (efficient path to practical)

Each phase ships something usable and maps to an ATLAS-WP- workplan.

Phase 0 — Canon (now, days). Write the surface-entry JSON Schema + the scope/precedence/merge model as a machine-checkable doc. Replace the inherited repo-template capability artifact (ATLAS-WP-0002). Exit: one real surface entry validates in CI.

Phase 1 — Seed registry by hand (1-2 weeks). Hand-author 10-20 entries for the highest-value Coulomb surfaces (start with railiance-platform mail/rate-limit, secret-refs, key feature flags). Stand up CI validation. Exit: a reviewer can answer "what configures X, who owns it, where" from the repo alone.

Phase 2 — First connectors (2-4 weeks). Build git-config + one of helm-values/flag-platform. Candidate-PR workflow. Exit: registry grows from automated discovery, not just hand authoring; stale/unowned surfaces are surfaced.

Phase 3 — Explain & graph (4+ weeks). Render config explain from sources[].role; push config-typed edges to the State Hub. Exit: given a key, show its layer path, what overrides what, owner, and consumers — the read-first control-plane MVP.

Deferred (out of current scope). Live resolution, controlled change, approval workflows, rollout/rollback orchestration — these belong to downstream systems (feature-control, GitOps, the platform), not this repo.


7. Build-vs-reuse summary

| Need | Decision | Why |
|---|---|---|
| Entry validation / federation | reuse reuse-surface | already the federation contract |
| Workplans, relationships, audit | reuse State Hub | edges + evidence for free |
| Schema/merge validation | reuse JSON Schema, evaluate CUE | CUE's order-independent merge fits effective-config (research §3.3) |
| Policy checks | reuse OPA/Kyverno as backends | config-atlas is the context layer, not the engine |
| Secret storage | never; reference only | OpenBao owns values |
| Discovery connectors | build (thin, read-only) | the genuinely novel, repo-specific piece |
| Effective-config resolver / delivery | don't build | out of scope; delegated downstream |

The whole design optimizes for one thing: the smallest amount of original software that turns scattered configuration into a discoverable, explainable, source-linked map — and borrows everything else.