# Architecture Blueprint — config-atlas
> How to establish config-atlas as an efficient, practical system.
> Companion to [`INTENT.md`](../INTENT.md) (purpose/boundary),
> [`SCOPE.md`](../SCOPE.md), and [`research/configuration-control-plane.md`](../research/configuration-control-plane.md)
> (the thesis). Drafted 2026-06-26.
---
## 1. Design constraints (what "efficient and practical" means here)
These constraints come straight from `INTENT.md` and `SCOPE.md` and bound every
decision below:
1. **Map and evidence layers only.** config-atlas owns *Registry* and *Evidence*.
It does **not** build a runtime *Resolver*, *Delivery*, or *Control* engine
(research §5). Anything that resolves or pushes live values is out of scope.
2. **Source-linked, never a second source of truth.** Entries point at canonical
files/APIs; the atlas stores *metadata and references*, not live values, and
**never** secret values.
3. **Read-first before write-first.** Discover, classify, attribute ownership, and
explain — before any controlled-change ambition (research §6 wedge).
4. **Markdown + YAML, agent-legible.** No application runtime. Entries must be
diffable, reviewable in a PR, and parseable without bespoke tooling.
5. **Reuse the ecosystem.** Lean on `reuse-surface` (federation/validation), the
State Hub (workplans, relationships, evidence events), and existing engines
(CUE/JSON Schema, OPA) rather than reimplementing them.

The cheapest path to a *practical* system is therefore: **a well-specified entry
schema + validation in CI + a thin discovery pipeline + source links** — not a
service. Efficiency comes from buying, not building, every layer that already
exists elsewhere.

---
## 2. System overview
config-atlas realizes the left half of the control-plane pipeline. The right half
(resolve → deliver → control) is explicitly delegated to downstream systems.
```
       OWNED BY config-atlas                     DELEGATED / EXTERNAL
┌─────────────────────────────────┐        ┌──────────────────────────────┐
│ Canon     vocabulary + schema   │        │ Resolver effective value     │
│ Registry  surface entries       │  --->  │ Policy   OPA / Kyverno / CUE │
│ Evidence  relationships, audit  │        │ Delivery env / ConfigMap /…  │
└─────────────────────────────────┘        │ Control  feature-flag plane  │
                 ▲                         └──────────────────────────────┘
                 │ ingest (read-only connectors)
┌─────────────────────────────────────────────────────────────────────────┐
│ Sources: repos · K8s/Helm · Terraform state · feature-flag platforms ·  │
│         secret-manager refs · cloud param stores · SaaS tenant settings │
└─────────────────────────────────────────────────────────────────────────┘
```
Five internal components, smallest-to-build first:
| Component | What it is | Build cost | Status |
|-----------|-----------|-----------|--------|
| **Canon** | Schema + vocabulary for a config-surface entry; scope/precedence/merge model | low | partial (research has the model; needs a JSON Schema) |
| **Registry** | The corpus of surface entries + indexes | low | scaffold exists (`registry/`) |
| **Connectors** | Read-only ingest scripts that emit candidate entries | medium | not started |
| **Evidence graph** | Relationships (consumes/overrides/depends-on/secret-ref) + change audit | medium | partly free via State Hub |
| **Explain view** | Render an effective-config *path* from layered source links | medium-high | future |
---
## 3. The core artifact: the configuration-surface entry
Everything else exists to produce and validate these. Model it on the existing
capability-entry schema (`registry/capabilities/…`) so `reuse-surface` validation
and State Hub federation work unchanged. Proposed shape:
```yaml
---
id: surface.<domain>.<system>.<name>  # stable, unique
name: Mail delivery batch sizing
kind: app-config | deploy-config | secret-ref | feature-flag |
      policy | tenant-config | infra-state | runtime-override
summary: Controls max batch size for outbound mail delivery.
owner: platform-delivery  # team/agent, not a person
status: draft | active | deprecated
scope:  # which layers may set this (research §3.1)
  allowed_layers: [company, environment, installation, tenant]
  default_layer: company
mutability: hot-reloadable  # build|deploy|startup|hot|per-request|emergency
security_class: operational  # operational | sensitive | secret-ref | policy
schema:  # the contract, not the value
  type: integer
  default: 500
  minimum: 1
  maximum: 5000
  validator: schemas/mail-delivery.schema.json  # JSON Schema or CUE ref
sources:  # source-linked, never inlined values
  - repo: railiance-platform
    path: config/mail/delivery.yaml
    role: company-baseline
  - repo: railiance-platform
    path: environments/prod.yaml
    role: environment-overlay
relations:
  consumed_by: [service.mail-gateway]
  overrides: []
  depends_on_secret: []  # references only, never the secret
  related_to: [surface.platform.mail.rate-limit]
evidence:
  last_seen: '2026-06-26'  # from connector run
  discovery_method: connector:git-grep | manual
  change_log_ref: <state-hub progress event / PR url>
---
# Mail delivery batch sizing
Prose: what it means, why it exists, precedence notes, known gotchas.
```
Key efficiency choices:
- **`kind` is the primary classifier** — it drives the research §3 kind-separation
(secrets vs flags vs infra-state are never treated alike).
- **`scope.allowed_layers` encodes the layering contract** per key — this is the
durable value even before a resolver exists.
- **`sources[].role`** carries the layer each source contributes; this is what a
future Explain view consumes to render `config explain <key>`.
- **No value fields.** The atlas records *where* and *which layer wins by rule*,
never the live value.
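
The choices above lend themselves to a mechanical check. The following is a minimal sketch, not the `reuse-surface` validator: it assumes the front matter has already been parsed into a plain dict. The function name `check_entry`, the `FORBIDDEN_VALUE_KEYS` list, and the ID pattern are hypothetical, and `LAYERS` only mirrors the layers shown in the example entry (research §3.1 may define more).

```python
import re

# Vocabulary copied from the proposed entry shape in this section.
KINDS = {
    "app-config", "deploy-config", "secret-ref", "feature-flag",
    "policy", "tenant-config", "infra-state", "runtime-override",
}
# Assumption: only the layers that appear in the example entry.
LAYERS = {"company", "environment", "installation", "tenant"}

# Assumption: key names that would indicate an inlined live value.
FORBIDDEN_VALUE_KEYS = {"value", "current_value", "secret", "secret_value"}

# Assumption: IDs look like surface.<domain>.<system>.<name>.
ID_PATTERN = re.compile(r"^surface(\.[a-z0-9][a-z0-9-]*){3}$")


def check_entry(entry: dict) -> list[str]:
    """Return a list of human-readable problems; empty means the entry passes."""
    problems = []
    if not ID_PATTERN.match(str(entry.get("id", ""))):
        problems.append("id must look like surface.<domain>.<system>.<name>")
    if entry.get("kind") not in KINDS:
        problems.append(f"unknown kind: {entry.get('kind')!r}")
    scope = entry.get("scope", {})
    allowed = scope.get("allowed_layers", [])
    unknown = [layer for layer in allowed if layer not in LAYERS]
    if unknown:
        problems.append(f"unknown layers: {unknown}")
    if scope.get("default_layer") not in allowed:
        problems.append("default_layer must be one of allowed_layers")
    for source in entry.get("sources", []):
        if not source.get("role"):
            problems.append(f"source {source.get('path')!r} has no role")
        leaked = FORBIDDEN_VALUE_KEYS & source.keys()
        if leaked:
            problems.append(f"source carries value fields: {sorted(leaked)}")
    leaked = FORBIDDEN_VALUE_KEYS & entry.keys()
    if leaked:
        problems.append(f"entry carries value fields: {sorted(leaked)}")
    return problems
```

Returning a list of problems rather than raising on the first one suits PR review, where a reviewer wants every issue reported in one CI run.
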
---
## 4. Discovery (read-first connectors)
Connectors are **stateless, read-only scripts** that scan a source and emit
*candidate* entries (YAML) for human/agent review via PR. They never write live
systems and never auto-merge.
Minimum viable connector set — pick the 3–4 that prove cross-tool resolution:
| Connector | Source | Emits |
|-----------|--------|-------|
| `git-config` | repo grep for known config files/keys | `app-config`, `deploy-config` candidates |
| `helm-values` | Helm `values*.yaml` + overlays | `deploy-config` with layer roles |
| `terraform-vars` | TF/OpenTofu variables + tfvars | `infra-state` candidates |
| `flag-platform` | feature-flag API inventory | `feature-flag` candidates + stale-flag signal |
| `secret-ref` | grep for vault/OpenBao/SOPS refs | `secret-ref` (reference only) |
Pipeline: `connector → candidate YAML → PR → reuse-surface validate (CI) → merge`.
The human/agent in the loop is the practical substitute for a resolution engine in
the early phases — and is *cheaper and safer* than one.
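
As an illustration of the first pipeline stage, here is a rough sketch of what a `helm-values` style connector could look like. It is an assumption-heavy example, not the planned implementation: the function name `scan_helm_values`, the rule that `values.yaml` is the baseline while any other `values*.yaml` is an environment overlay, and the candidate dict layout are all hypothetical. It only reads the file tree and returns data; turning candidates into a PR is left to the surrounding workflow.

```python
from datetime import date
from pathlib import Path


def scan_helm_values(repo_root: Path, repo_name: str) -> list[dict]:
    """Read-only scan: emit one candidate entry per directory holding values files."""
    by_chart: dict[Path, list[dict]] = {}
    for path in sorted(repo_root.rglob("values*.yaml")):
        # Assumption: values.yaml is the baseline, anything else is an overlay.
        role = "company-baseline" if path.name == "values.yaml" else "environment-overlay"
        by_chart.setdefault(path.parent, []).append({
            "repo": repo_name,
            "path": path.relative_to(repo_root).as_posix(),
            "role": role,
        })

    candidates = []
    for chart_dir, sources in by_chart.items():
        candidates.append({
            "name": chart_dir.name,
            "kind": "deploy-config",
            "status": "draft",   # candidates always start as drafts
            "owner": None,       # left for the reviewer to attribute
            "sources": sources,  # links only, file contents are never copied
            "evidence": {
                "last_seen": date.today().isoformat(),
                "discovery_method": "connector:helm-values",
            },
        })
    return candidates
```

The connector keeps no state between runs and never opens the files it finds, which keeps it within the "metadata and references, not live values" constraint.
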
---
## 5. Validation, evidence, and the graph
**Validation (CI, day one).** Every PR runs:
- `reuse-surface validate --root .` (entry well-formedness, index sync)
- `git diff --check`
- JSON Schema / CUE check of each entry's `schema` block against the Canon schema.

This is the single highest-leverage, lowest-cost piece: it makes the registry
*trustworthy* without any service.
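
To make the third check concrete, the sketch below tests a `schema` block for internal consistency using only the Python standard library. It is a simplified stand-in under stated assumptions: the function name `check_schema_block` and the small type table are hypothetical, and a real pipeline would more plausibly delegate to a JSON Schema or CUE engine, in line with the build-vs-reuse stance of this document.

```python
# Assumption: only these primitive types are illustrated.
PYTHON_TYPES = {"integer": int, "number": (int, float), "string": str, "boolean": bool}


def check_schema_block(schema: dict) -> list[str]:
    """Check that an entry's `schema` block is internally consistent."""
    declared = schema.get("type")
    expected = PYTHON_TYPES.get(declared)
    if expected is None:
        return [f"unsupported type: {declared!r}"]

    problems = []
    minimum, maximum = schema.get("minimum"), schema.get("maximum")
    if minimum is not None and maximum is not None and minimum > maximum:
        problems.append("minimum exceeds maximum")
    if "default" not in schema:
        return problems

    default = schema["default"]
    # bool is a subclass of int in Python, so it must be rejected for numeric types.
    wrong_bool = isinstance(default, bool) and declared != "boolean"
    if wrong_bool or not isinstance(default, expected):
        problems.append("default does not match declared type")
    elif declared in ("integer", "number"):
        if minimum is not None and default < minimum:
            problems.append("default is below minimum")
        if maximum is not None and default > maximum:
            problems.append("default is above maximum")
    return problems
```

Note that the `default` here is part of the contract (the documented fallback), not a live value read from any environment.
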
**Evidence & graph — reuse the State Hub.** Do not build a graph database. The
relationships in §3 (`consumed_by`, `overrides`, `depends_on_secret`) plus State
Hub `progress`/`decision`/relationship records already give a config knowledge
graph (research §5) for free. config-atlas contributes the *config-typed edges*;
the hub stores and queries them.
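
A sketch of what "contributing config-typed edges" could mean in practice: flattening each entry's `relations` block into triples. The function name `entry_edges` and the triple layout are hypothetical; how the State Hub actually ingests relationships is not specified here, so the example stops at producing plain data.

```python
# Relation keys taken from the entry shape in §3.
EDGE_KEYS = ("consumed_by", "overrides", "depends_on_secret", "related_to")


def entry_edges(entry: dict) -> list[tuple[str, str, str]]:
    """Flatten an entry's `relations` block into (source, relation, target) triples."""
    relations = entry.get("relations", {})
    return [
        (entry["id"], key, target)
        for key in EDGE_KEYS
        for target in relations.get(key, [])
    ]


# Hypothetical entry ID; the relation targets mirror the example entry in §3.
example = {
    "id": "surface.platform.mail.batch-size",
    "relations": {
        "consumed_by": ["service.mail-gateway"],
        "overrides": [],
        "depends_on_secret": [],
        "related_to": ["surface.platform.mail.rate-limit"],
    },
}
edges = entry_edges(example)
```

Empty relation lists simply produce no edges, so entries without known consumers remain valid.
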
**Explain view (later).** Once entries carry `sources[].role`, a small renderer can
produce the `config explain` output from the research primer — statically, from
source links, without reading live values. This is the first capability that feels
like a "control plane" and should be the headline of Phase 3.
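
A static renderer of this kind could be quite small. The sketch below is illustrative only: the output format is invented for this example rather than taken from the research primer, the function name `explain` is hypothetical, and `ROLE_PRECEDENCE` is an assumed ordering in which only the first two role names come from the entry example in §3.

```python
# Assumption: precedence from lowest to highest. Only the first two role names
# appear in the entry example; the last two are hypothetical placeholders.
ROLE_PRECEDENCE = [
    "company-baseline",
    "environment-overlay",
    "installation-overlay",
    "tenant-overlay",
]


def explain(entry: dict) -> str:
    """Render an entry's layer path from its source links, never from live values."""
    # An unknown role raises ValueError, so a misspelled role fails loudly.
    ranked = sorted(entry["sources"], key=lambda s: ROLE_PRECEDENCE.index(s["role"]))
    lines = [f"{entry['id']} (owner: {entry['owner']})"]
    for position, source in enumerate(ranked, start=1):
        note = "highest layer, wins by rule" if position == len(ranked) else "may be overridden"
        lines.append(
            f"  {position}. {source['role']}: {source['repo']}/{source['path']} [{note}]"
        )
    return "\n".join(lines)
```

Because the renderer works purely from `sources[].role`, it can run in CI or in a docs build without credentials for any live system.
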
---
## 6. Phased roadmap (efficient path to practical)
Each phase ships something usable and maps to an `ATLAS-WP-` workplan.

**Phase 0 — Canon (now, days).**
Write the surface-entry JSON Schema + the scope/precedence/merge model as a
machine-checkable doc. Replace the inherited `repo-template` capability artifact
(ATLAS-WP-0002). *Exit:* one real surface entry validates in CI.

**Phase 1 — Seed registry by hand (1–2 weeks).**
Hand-author 10–20 entries for the highest-value Coulomb surfaces (start with
railiance-platform mail/rate-limit, secret-refs, key feature flags). Stand up CI
validation. *Exit:* a reviewer can answer "what configures X, who owns it, where"
from the repo alone.

**Phase 2 — First connectors (2–4 weeks).**
Build `git-config` + one of `helm-values`/`flag-platform`. Candidate-PR workflow.
*Exit:* registry grows from automated discovery, not just hand authoring;
stale/unowned surfaces are surfaced.

**Phase 3 — Explain & graph (4+ weeks).**
Render `config explain` from `sources[].role`; push config-typed edges to the State
Hub. *Exit:* given a key, show its layer path, what overrides what, owner, and
consumers — the read-first control-plane MVP.

**Deferred (out of current scope).** Live resolution, controlled change, approval
workflows, rollout/rollback orchestration — these belong to downstream systems
(`feature-control`, GitOps, the platform), not this repo.

---
## 7. Build-vs-reuse summary
| Need | Decision | Why |
|------|----------|-----|
| Entry validation / federation | **reuse** reuse-surface | already the federation contract |
| Workplans, relationships, audit | **reuse** State Hub | edges + evidence for free |
| Schema/merge validation | **reuse** JSON Schema, evaluate CUE | CUE's order-independent merge fits effective-config (research §3.3) |
| Policy checks | **reuse** OPA/Kyverno as backends | config-atlas is the context layer, not the engine |
| Secret storage | **never** — reference only | OpenBao owns values |
| Discovery connectors | **build** (thin, read-only) | the genuinely novel, repo-specific piece |
| Effective-config resolver / delivery | **don't build** | out of scope; delegated downstream |
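
The "order-independent merge" property cited for CUE in the table can be illustrated without CUE itself. The sketch below is plain Python, not CUE semantics in full: it assumes constraints are simple integer ranges, and the names `unify_bounds` and `overlay` are hypothetical. Intersecting ranges gives the same result whichever layer is applied first, whereas a last-writer-wins overlay does not.

```python
def unify_bounds(a: dict, b: dict) -> dict:
    """Combine two integer-range constraints by intersection (order-independent)."""
    lows = [c["minimum"] for c in (a, b) if "minimum" in c]
    highs = [c["maximum"] for c in (a, b) if "maximum" in c]
    merged = {}
    if lows:
        merged["minimum"] = max(lows)   # the tighter lower bound survives
    if highs:
        merged["maximum"] = min(highs)  # the tighter upper bound survives
    if lows and highs and merged["minimum"] > merged["maximum"]:
        raise ValueError("constraints conflict: empty range")
    return merged


def overlay(a: dict, b: dict) -> dict:
    """Last-writer-wins merge, shown for contrast (order-dependent)."""
    return {**a, **b}


company = {"minimum": 1, "maximum": 5000}
prod = {"minimum": 10, "maximum": 1000}

same_either_way = unify_bounds(company, prod) == unify_bounds(prod, company)
differs_by_order = overlay(company, prod) != overlay(prod, company)
```

A conflict surfaces as an explicit error instead of one layer silently winning, which is the behavior that makes such a merge attractive for effective-config reasoning.
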
The whole design optimizes for one thing: **the smallest amount of original
software that turns scattered configuration into a discoverable, explainable,
source-linked map** — and borrows everything else.