Architecture Blueprint — config-atlas
How to establish config-atlas as an efficient, practical system. Companion to
`INTENT.md` (purpose/boundary), `SCOPE.md`, and `research/configuration-control-plane.md` (the thesis). Drafted 2026-06-26.
1. Design constraints (what "efficient and practical" means here)
These constraints come straight from INTENT.md and SCOPE.md and bound every
decision below:
- Map and evidence layers only. config-atlas owns Registry and Evidence. It does not build a runtime Resolver, Delivery, or Control engine (research §5). Anything that resolves or pushes live values is out of scope.
- Source-linked, never a second source of truth. Entries point at canonical files/APIs; the atlas stores metadata and references, not live values, and never secret values.
- Read-first before write-first. Discover, classify, attribute ownership, and explain — before any controlled-change ambition (research §6 wedge).
- Markdown + YAML, agent-legible. No application runtime. Entries must be diffable, reviewable in a PR, and parseable without bespoke tooling.
- Reuse the ecosystem. Lean on `reuse-surface` (federation/validation), the State Hub (workplans, relationships, evidence events), and existing engines (CUE/JSON Schema, OPA) rather than reimplementing them.
The cheapest path to a practical system is therefore: a well-specified entry schema + validation in CI + a thin discovery pipeline + source links — not a service. Efficiency comes from buying, not building, every layer that already exists elsewhere.
2. System overview
config-atlas realizes the left half of the control-plane pipeline. The right half (resolve → deliver → control) is explicitly delegated to downstream systems.
```
       OWNED BY config-atlas                    DELEGATED / EXTERNAL
┌─────────────────────────────────┐      ┌────────────────────────────────┐
│ Canon      vocabulary + schema  │      │ Resolver  effective value      │
│ Registry   surface entries      │ ---> │ Policy    OPA / Kyverno / CUE  │
│ Evidence   relationships, audit │      │ Delivery  env / ConfigMap /…   │
└─────────────────────────────────┘      │ Control   feature-flag plane   │
                 ▲                       └────────────────────────────────┘
                 │ ingest (read-only connectors)
┌─────────────────────────────────────────────────────────────────────────┐
│ Sources: repos · K8s/Helm · Terraform state · feature-flag platforms ·  │
│ secret-manager refs · cloud param stores · SaaS tenant settings         │
└─────────────────────────────────────────────────────────────────────────┘
```
Five internal components, smallest-to-build first:
| Component | What it is | Build cost | Status |
|---|---|---|---|
| Canon | Schema + vocabulary for a config-surface entry; scope/precedence/merge model | low | partial (research has the model; needs a JSON Schema) |
| Registry | The corpus of surface entries + indexes | low | scaffold exists (registry/) |
| Connectors | Read-only ingest scripts that emit candidate entries | medium | not started |
| Evidence graph | Relationships (consumes/overrides/depends-on/secret-ref) + change audit | medium | partly free via State Hub |
| Explain view | Render an effective-config path from layered source links | medium-high | future |
3. The core artifact: the configuration-surface entry
Everything else exists to produce and validate these. Model it on the existing
capability-entry schema (registry/capabilities/…) so reuse-surface validation
and State Hub federation work unchanged. Proposed shape:
```markdown
---
id: surface.<domain>.<system>.<name>   # stable, unique
name: Mail delivery batch sizing
kind: app-config            # app-config | deploy-config | secret-ref | feature-flag |
                            # policy | tenant-config | infra-state | runtime-override
summary: Controls max batch size for outbound mail delivery.
owner: platform-delivery    # team/agent, not a person
status: draft               # draft | active | deprecated
scope:                      # which layers may set this (research §3.1)
  allowed_layers: [company, environment, installation, tenant]
  default_layer: company
mutability: hot-reloadable  # build | deploy | startup | hot-reloadable | per-request | emergency
security_class: operational # operational | sensitive | secret-ref | policy
schema:                     # the contract, not the value
  type: integer
  default: 500
  minimum: 1
  maximum: 5000
  validator: schemas/mail-delivery.schema.json   # JSON Schema or CUE ref
sources:                    # source-linked, never inlined values
  - repo: railiance-platform
    path: config/mail/delivery.yaml
    role: company-baseline
  - repo: railiance-platform
    path: environments/prod.yaml
    role: environment-overlay
relations:
  consumed_by: [service.mail-gateway]
  overrides: []
  depends_on_secret: []     # references only, never the secret
  related_to: [surface.platform.mail.rate-limit]
evidence:
  last_seen: '2026-06-26'   # from connector run
  discovery_method: connector:git-grep   # connector:<name> | manual
  change_log_ref: <state-hub progress event / PR url>
---
# Mail delivery batch sizing

Prose: what it means, why it exists, precedence notes, known gotchas.
```
Key efficiency choices:
- `kind` is the primary classifier. It drives the research §3 kind separation: secrets, flags, and infra-state are never treated alike.
- `scope.allowed_layers` encodes the layering contract per key. This is the durable value even before a resolver exists.
- `sources[].role` carries the layer each source contributes. This is what a future Explain view consumes to render `config explain <key>`.
- No value fields. The atlas records where a key is set and which layer wins by rule, never the live value.
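These choices are mechanical enough to lint before a full Canon JSON Schema exists. The following is a minimal Python sketch. It assumes entries have already been parsed from their frontmatter into plain dicts. `lint_entry`, the forbidden-key set, and the error messages are hypothetical and not part of `reuse-surface`.

```python
from typing import Any

# Hypothetical Canon vocabulary, copied from the proposed entry shape.
KINDS = {
    "app-config", "deploy-config", "secret-ref", "feature-flag",
    "policy", "tenant-config", "infra-state", "runtime-override",
}
# Hypothetical keys whose presence would mean a live value leaked into the atlas.
FORBIDDEN_KEYS = {"value", "current_value", "secret_value"}


def _find_forbidden(node: Any, path: str = "") -> list[str]:
    """Walk nested dicts/lists and report the path of every forbidden key."""
    hits: list[str] = []
    if isinstance(node, dict):
        for key, child in node.items():
            here = f"{path}.{key}" if path else str(key)
            if key in FORBIDDEN_KEYS:
                hits.append(here)
            hits.extend(_find_forbidden(child, here))
    elif isinstance(node, list):
        for i, child in enumerate(node):
            hits.extend(_find_forbidden(child, f"{path}[{i}]"))
    return hits


def lint_entry(entry: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the entry passes."""
    errors: list[str] = []
    if entry.get("kind") not in KINDS:
        errors.append(f"unknown kind: {entry.get('kind')!r}")
    scope = entry.get("scope", {})
    if scope.get("default_layer") not in scope.get("allowed_layers", []):
        errors.append("scope.default_layer is not in scope.allowed_layers")
    for i, src in enumerate(entry.get("sources", [])):
        if not src.get("role"):
            errors.append(f"sources[{i}] has no role")
    for hit in _find_forbidden(entry):
        errors.append(f"value field not allowed: {hit}")
    return errors
```

A `schema.default` is still allowed, because it is part of the contract rather than a live value.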
4. Discovery (read-first connectors)
Connectors are stateless, read-only scripts that scan a source and emit candidate entries (YAML) for human/agent review via PR. They never write live systems and never auto-merge.
Minimum viable connector set — pick the 3–4 that prove cross-tool resolution:
| Connector | Source | Emits |
|---|---|---|
| `git-config` | repo grep for known config files/keys | app-config, deploy-config candidates |
| `helm-values` | Helm `values*.yaml` + overlays | deploy-config with layer roles |
| `terraform-vars` | TF/OpenTofu variables + tfvars | infra-state candidates |
| `flag-platform` | feature-flag API inventory | feature-flag candidates + stale-flag signal |
| `secret-ref` | grep for Vault/OpenBao/SOPS refs | secret-ref (reference only) |
Pipeline: connector → candidate YAML → PR → reuse-surface validate (CI) → merge.
The human/agent in the loop is the practical substitute for a resolution engine in
the early phases — and is cheaper and safer than one.
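The connector contract can be sketched in Python for the `helm-values` case: read-only, stateless, and emitting candidates for review. The sketch assumes the values files have already been read and parsed into dicts, ordered baseline first. The function names, the candidate `id` scheme, and the role strings passed in are hypothetical. Leaf values are read only to discover keys and are never copied into a candidate.

```python
from typing import Any


def flatten(values: dict[str, Any], prefix: str = "") -> list[str]:
    """Turn nested Helm-style values into dotted key paths (leaf keys only)."""
    keys: list[str] = []
    for name, child in values.items():
        path = f"{prefix}.{name}" if prefix else str(name)
        if isinstance(child, dict) and child:
            keys.extend(flatten(child, path))
        else:
            keys.append(path)  # the leaf value itself is never recorded
    return keys


def candidates(
    repo: str,
    system: str,
    layers: list[tuple[str, str, dict[str, Any]]],
) -> list[dict[str, Any]]:
    """Emit one draft candidate per key, listing every file and role that sets it.

    `layers` holds (path, role, parsed_values) tuples, baseline first.
    Nothing is written anywhere; the caller turns the result into a PR.
    """
    by_key: dict[str, list[dict[str, str]]] = {}
    for path, role, values in layers:
        for key in flatten(values):
            by_key.setdefault(key, []).append({"repo": repo, "path": path, "role": role})
    return [
        {
            "id": f"surface.deploy.{system}.{key}",  # hypothetical id scheme
            "kind": "deploy-config",
            "status": "draft",  # candidates always start as drafts for review
            "sources": sources,
            "evidence": {"discovery_method": "connector:helm-values"},
        }
        for key, sources in sorted(by_key.items())
    ]
```

A real connector needs only one side effect: serializing the returned list to YAML and opening a PR. Merging stays with the reviewer.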
5. Validation, evidence, and the graph
Validation (CI, day one). Every PR runs:
- `reuse-surface validate --root .` (entry well-formedness, index sync)
- `git diff --check`
- a JSON Schema / CUE check of each entry's `schema` block against the Canon schema.
This is the single highest-leverage, lowest-cost piece: it makes the registry trustworthy without any service.
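Until the Canon schema lands, the third check can start as a plain consistency test on each `schema` block. The following is a minimal, dependency-free Python sketch. It assumes only the `type`/`default`/`minimum`/`maximum` keys from the proposed entry shape. `check_schema_block` and the type table are hypothetical. In the target design, JSON Schema or CUE would carry these rules instead.

```python
from typing import Any

# Hypothetical mapping from Canon type names to Python types.
TYPES: dict[str, tuple[type, ...]] = {
    "integer": (int,),
    "number": (int, float),
    "string": (str,),
    "boolean": (bool,),
}


def check_schema_block(schema: dict[str, Any]) -> list[str]:
    """Check that a schema block's default agrees with its own declared contract."""
    expected = TYPES.get(schema.get("type"))
    if expected is None:
        return [f"unsupported type: {schema.get('type')!r}"]
    errors: list[str] = []
    lo, hi = schema.get("minimum"), schema.get("maximum")
    if lo is not None and hi is not None and lo > hi:
        errors.append(f"minimum {lo} exceeds maximum {hi}")
    if "default" in schema:
        default = schema["default"]
        # bool is a subclass of int in Python, so reject it unless the type is boolean.
        wrong_bool = isinstance(default, bool) and bool not in expected
        if wrong_bool or not isinstance(default, expected):
            errors.append(f"default {default!r} is not of type {schema['type']}")
        elif schema["type"] in ("integer", "number"):
            # Range checks only make sense for numeric types.
            if lo is not None and default < lo:
                errors.append(f"default {default} is below minimum {lo}")
            if hi is not None and default > hi:
                errors.append(f"default {default} is above maximum {hi}")
    return errors
```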
Evidence & graph — reuse the State Hub. Do not build a graph database. The
relationships in §3 (consumed_by, overrides, depends_on_secret) plus State
Hub progress/decision/relationship records already give a config knowledge
graph (research §5) for free. config-atlas contributes the config-typed edges;
the hub stores and queries them.
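The following is a minimal Python sketch of the config-typed edges config-atlas would contribute, derived only from each entry's `relations` block. The triple format, the edge labels, and the `consumers_of` query are hypothetical illustrations, not the State Hub's actual record format or API.

```python
from collections.abc import Iterable
from typing import Any

# Relation fields from the proposed entry shape, mapped to hypothetical edge labels.
RELATION_LABELS = {
    "consumed_by": "consumed-by",
    "overrides": "overrides",
    "depends_on_secret": "depends-on-secret",
    "related_to": "related-to",
}


def edges(entries: Iterable[dict[str, Any]]) -> list[tuple[str, str, str]]:
    """Flatten every entry's relations into (source_id, label, target_id) triples."""
    out: list[tuple[str, str, str]] = []
    for entry in entries:
        relations = entry.get("relations", {})
        for field, label in RELATION_LABELS.items():
            for target in relations.get(field, []):
                out.append((entry["id"], label, target))
    return out


def consumers_of(triples: list[tuple[str, str, str]], surface_id: str) -> list[str]:
    """Answer 'who consumes this surface?' from the edge list alone."""
    return sorted(dst for src, label, dst in triples if src == surface_id and label == "consumed-by")
```

In the intended split, only `edges` lives in config-atlas. Storing and querying the triples is the hub's job, so `consumers_of` stands in for a hub query.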
Explain view (later). Once entries carry sources[].role, a small renderer can
produce the config explain output from the research primer — statically, from
source links, without reading live values. This is the first capability that feels
like a "control plane" and should be the headline of Phase 3.
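The following is a minimal Python sketch of the static explain step. It rests on two assumptions:

- The Canon defines a precedence order over layers, lowest to highest.
- Each `sources[].role` names its layer as `<layer>-<suffix>`, as in `company-baseline` and `environment-overlay`.

Both the order and the role-parsing convention are hypothetical until the Canon's precedence model is written. The output says which source wins by rule, never what the value is.

```python
from typing import Any

# Hypothetical precedence, lowest to highest; the real order belongs in the Canon.
PRECEDENCE: tuple[str, ...] = ("company", "environment", "installation", "tenant")


def layer_of(role: str) -> str:
    """Assume roles are written '<layer>-<suffix>', e.g. 'environment-overlay'."""
    return role.split("-", 1)[0]


def explain(entry: dict[str, Any], precedence: tuple[str, ...] = PRECEDENCE) -> str:
    """Render one key's layer path from source links alone; no live values are read."""
    # Sort sources from lowest to highest layer; an unknown layer raises ValueError.
    ranked = sorted(
        entry.get("sources", []),
        key=lambda src: precedence.index(layer_of(src["role"])),
    )
    lines = [f"{entry['id']}  (owner: {entry.get('owner', 'unowned')})"]
    for rank, src in enumerate(ranked, start=1):
        # Under a simple override rule, the highest-precedence source wins.
        marker = "  <- wins by rule" if rank == len(ranked) else ""
        lines.append(f"  {rank}. {layer_of(src['role']):<12} {src['repo']}:{src['path']}{marker}")
    consumers = entry.get("relations", {}).get("consumed_by", [])
    lines.append(f"  consumed by: {', '.join(consumers) or 'none recorded'}")
    return "\n".join(lines)
```

Last-source-wins is the simplest rule to show. If the Canon adopts CUE-style order-independent merging, the renderer would report a merge rather than a single winner.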
6. Phased roadmap (efficient path to practical)
Each phase ships something usable and maps to an ATLAS-WP- workplan.
Phase 0 — Canon (now, days).
Write the surface-entry JSON Schema + the scope/precedence/merge model as a
machine-checkable doc. Replace the inherited repo-template capability artifact
(ATLAS-WP-0002). Exit: one real surface entry validates in CI.
Phase 1 — Seed registry by hand (1–2 weeks). Hand-author 10–20 entries for the highest-value Coulomb surfaces (start with railiance-platform mail/rate-limit, secret-refs, key feature flags). Stand up CI validation. Exit: a reviewer can answer "what configures X, who owns it, where" from the repo alone.
Phase 2 — First connectors (2–4 weeks).
Build git-config + one of helm-values/flag-platform. Candidate-PR workflow.
Exit: the registry grows from automated discovery, not just hand authoring, and stale or
unowned surfaces are flagged for review.
Phase 3 — Explain & graph (4+ weeks).
Render config explain from sources[].role; push config-typed edges to the State
Hub. Exit: given a key, show its layer path, what overrides what, owner, and
consumers — the read-first control-plane MVP.
Deferred (out of current scope). Live resolution, controlled change, approval
workflows, rollout/rollback orchestration — these belong to downstream systems
(feature-control, GitOps, the platform), not this repo.
7. Build-vs-reuse summary
| Need | Decision | Why |
|---|---|---|
| Entry validation / federation | reuse reuse-surface | already the federation contract |
| Workplans, relationships, audit | reuse State Hub | edges + evidence for free |
| Schema/merge validation | reuse JSON Schema, evaluate CUE | CUE's order-independent merge fits effective-config (research §3.3) |
| Policy checks | reuse OPA/Kyverno as backends | config-atlas is the context layer, not the engine |
| Secret storage | never — reference only | OpenBao owns values |
| Discovery connectors | build (thin, read-only) | the genuinely novel, repo-specific piece |
| Effective-config resolver / delivery | don't build | out of scope; delegated downstream |
The whole design optimizes for one thing: the smallest amount of original software that turns scattered configuration into a discoverable, explainable, source-linked map — and borrows everything else.