# Architecture Blueprint — config-atlas

> How to establish config-atlas as an efficient, practical system.
> Companion to [`INTENT.md`](../INTENT.md) (purpose/boundary),
> [`SCOPE.md`](../SCOPE.md), and [`research/configuration-control-plane.md`](../research/configuration-control-plane.md)
> (the thesis). Drafted 2026-06-26.

---

## 1. Design constraints (what "efficient and practical" means here)

These constraints come straight from `INTENT.md` and `SCOPE.md` and bound every
decision below:

1. **Map and evidence layers only.** config-atlas owns *Registry* and *Evidence*.
   It does **not** build a runtime *Resolver*, *Delivery*, or *Control* engine
   (research §5). Anything that resolves or pushes live values is out of scope.
2. **Source-linked, never a second source of truth.** Entries point at canonical
   files/APIs; the atlas stores *metadata and references*, not live values, and
   **never** secret values.
3. **Read-first before write-first.** Discover, classify, attribute ownership, and
   explain — before any controlled-change ambition (research §6 wedge).
4. **Markdown + YAML, agent-legible.** No application runtime. Entries must be
   diffable, reviewable in a PR, and parseable without bespoke tooling.
5. **Reuse the ecosystem.** Lean on `reuse-surface` (federation/validation), the
   State Hub (workplans, relationships, evidence events), and existing engines
   (CUE/JSON Schema, OPA) rather than reimplementing them.

The cheapest path to a *practical* system is therefore: **a well-specified entry
schema + validation in CI + a thin discovery pipeline + source links** — not a
service. Efficiency comes from buying, not building, every layer that already
exists elsewhere.

---

## 2. System overview

config-atlas realizes the left half of the control-plane pipeline. The right half
(resolve → deliver → control) is explicitly delegated to downstream systems.

```
       OWNED BY config-atlas                     DELEGATED / EXTERNAL
┌─────────────────────────────────┐        ┌──────────────────────────────┐
│ Canon    vocabulary + schema    │        │ Resolver effective value     │
│ Registry surface entries        │  --->  │ Policy   OPA / Kyverno / CUE │
│ Evidence relationships, audit   │        │ Delivery env / ConfigMap /…  │
└─────────────────────────────────┘        │ Control  feature-flag plane  │
                 ▲                         └──────────────────────────────┘
                 │  ingest (read-only connectors)
┌─────────────────────────────────────────────────────────────────────────┐
│ Sources: repos · K8s/Helm · Terraform state · feature-flag platforms ·  │
│          secret-manager refs · cloud param stores · SaaS tenant settings│
└─────────────────────────────────────────────────────────────────────────┘
```

Five internal components, smallest-to-build first:

| Component | What it is | Build cost | Status |
|-----------|-----------|-----------|--------|
| **Canon** | Schema + vocabulary for a config-surface entry; scope/precedence/merge model | low | partial (research has the model; needs a JSON Schema) |
| **Registry** | The corpus of surface entries + indexes | low | scaffold exists (`registry/`) |
| **Connectors** | Read-only ingest scripts that emit candidate entries | medium | not started |
| **Evidence graph** | Relationships (consumes/overrides/depends-on/secret-ref) + change audit | medium | partly free via State Hub |
| **Explain view** | Render an effective-config *path* from layered source links | medium-high | future |

---

## 3. The core artifact: the configuration-surface entry

Everything else exists to produce and validate these. Model it on the existing
capability-entry schema (`registry/capabilities/…`) so `reuse-surface` validation
and State Hub federation work unchanged. Proposed shape:

```yaml
---
id: surface.<domain>.<system>.<name>  # stable, unique
name: Mail delivery batch sizing
kind: app-config | deploy-config | secret-ref | feature-flag |
      policy | tenant-config | infra-state | runtime-override
summary: Controls max batch size for outbound mail delivery.
owner: platform-delivery  # team/agent, not a person
status: draft | active | deprecated

scope:  # which layers may set this (research §3.1)
  allowed_layers: [company, environment, installation, tenant]
  default_layer: company
mutability: hot-reloadable  # build|deploy|startup|hot|per-request|emergency
security_class: operational  # operational | sensitive | secret-ref | policy

schema:  # the contract, not the value
  type: integer
  default: 500
  minimum: 1
  maximum: 5000
  validator: schemas/mail-delivery.schema.json  # JSON Schema or CUE ref

sources:  # source-linked, never inlined values
  - repo: railiance-platform
    path: config/mail/delivery.yaml
    role: company-baseline
  - repo: railiance-platform
    path: environments/prod.yaml
    role: environment-overlay

relations:
  consumed_by: [service.mail-gateway]
  overrides: []
  depends_on_secret: []  # references only, never the secret
  related_to: [surface.platform.mail.rate-limit]

evidence:
  last_seen: '2026-06-26'  # from connector run
  discovery_method: connector:git-grep | manual
  change_log_ref: <state-hub progress event / PR url>
---

# Mail delivery batch sizing

Prose: what it means, why it exists, precedence notes, known gotchas.
```

Key efficiency choices:
- **`kind` is the primary classifier** — it drives the research §3 kind-separation
  (secrets vs flags vs infra-state are never treated alike).
- **`scope.allowed_layers` encodes the layering contract** per key — this is the
  durable value even before a resolver exists.
- **`sources[].role`** carries the layer each source contributes; this is what a
  future Explain view consumes to render `config explain <key>`.
- **No value fields.** The atlas records *where* and *which layer wins by rule*,
  never the live value.
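
The choices above lend themselves to a mechanical check. The following is a minimal sketch, assuming the entry's front matter has already been parsed into a Python dict. The function names, the forbidden key names, and the convention that a role such as `environment-overlay` names its layer before the final hyphen are illustrative assumptions, not part of the Canon schema.

```python
from __future__ import annotations

# Kinds listed in the proposed entry shape.
KINDS = {
    "app-config", "deploy-config", "secret-ref", "feature-flag",
    "policy", "tenant-config", "infra-state", "runtime-override",
}
# Assumed key names that would smuggle a live value into a source link.
FORBIDDEN_KEYS = {"value", "current_value", "secret"}


def role_layer(role: str) -> str:
    """Assumed convention: 'environment-overlay' names the 'environment' layer."""
    return role.rsplit("-", 1)[0]


def lint_entry(entry: dict) -> list[str]:
    """Return human-readable problems; an empty list means the entry passes."""
    problems = []
    if entry.get("kind") not in KINDS:
        problems.append(f"unknown kind: {entry.get('kind')!r}")

    scope = entry.get("scope", {})
    allowed = scope.get("allowed_layers", [])
    if scope.get("default_layer") not in allowed:
        problems.append("default_layer is not in allowed_layers")

    for source in entry.get("sources", []):
        role = source.get("role", "")
        if role_layer(role) not in allowed:
            problems.append(f"source role {role!r} is outside allowed_layers")
        leaked = FORBIDDEN_KEYS & source.keys()
        if leaked:
            problems.append(f"source carries value-like keys: {sorted(leaked)}")
    return problems
```

An entry that passes returns an empty list, so the same function could back a CI step or a reviewer's local check.
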

---

## 4. Discovery (read-first connectors)

Connectors are **stateless, read-only scripts** that scan a source and emit
*candidate* entries (YAML) for human/agent review via PR. They never write live
systems and never auto-merge.

Minimum viable connector set — pick the 3–4 that prove cross-tool resolution:

| Connector | Source | Emits |
|-----------|--------|-------|
| `git-config` | repo grep for known config files/keys | `app-config`, `deploy-config` candidates |
| `helm-values` | Helm `values*.yaml` + overlays | `deploy-config` with layer roles |
| `terraform-vars` | TF/OpenTofu variables + tfvars | `infra-state` candidates |
| `flag-platform` | feature-flag API inventory | `feature-flag` candidates + stale-flag signal |
| `secret-ref` | grep for vault/OpenBao/SOPS refs | `secret-ref` (reference only) |

Pipeline: `connector → candidate YAML → PR → reuse-surface validate (CI) → merge`.
The human/agent in the loop is the practical substitute for a resolution engine in
the early phases — and is *cheaper and safer* than one.
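
To make the connector idea concrete, here is a rough sketch of what a `helm-values` connector might look like. It only lists files and never opens them or writes anywhere. The `deploy` domain in the id, the grouping of files by chart directory, and the rule that `values.yaml` is the baseline and every other `values*.yaml` an environment overlay are assumptions made for illustration. Candidates are rendered as JSON, which YAML 1.2 parsers also accept, so the sketch needs only the standard library.

```python
from __future__ import annotations

import json
from datetime import date
from pathlib import Path


def role_for(path: Path) -> str:
    """Assumed convention: values.yaml is the baseline, anything else an overlay."""
    return "company-baseline" if path.stem == "values" else "environment-overlay"


def discover(root: Path, repo: str) -> list[dict]:
    """List Helm values files under root and return one candidate per chart dir."""
    charts: dict[Path, list[Path]] = {}
    for path in sorted(root.rglob("values*.yaml")):
        charts.setdefault(path.parent, []).append(path)

    candidates = []
    for chart_dir, files in charts.items():
        files.sort(key=lambda p: (p.stem != "values", p.name))  # baseline first
        candidates.append({
            "id": f"surface.deploy.{chart_dir.resolve().name}.values",
            "kind": "deploy-config",
            "status": "draft",   # candidates start as drafts
            "owner": None,       # ownership is attributed during PR review
            "sources": [
                {"repo": repo,
                 "path": f.relative_to(root).as_posix(),
                 "role": role_for(f)}
                for f in files
            ],
            "evidence": {
                "last_seen": date.today().isoformat(),
                "discovery_method": "connector:helm-values",
            },
        })
    return candidates


def render(candidates: list[dict]) -> str:
    """Serialize candidates for a review PR (JSON is valid YAML 1.2)."""
    return json.dumps(candidates, indent=2)
```

A run such as `render(discover(Path("."), repo="railiance-platform"))` would produce the text a candidate PR carries; review and merge stay with the human or agent in the loop.
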

---

## 5. Validation, evidence, and the graph

**Validation (CI, day one).** Every PR runs:
- `reuse-surface validate --root .` (entry well-formedness, index sync)
- `git diff --check`
- JSON Schema / CUE check of each entry's `schema` block against the Canon schema.

This is the single highest-leverage, lowest-cost piece: it makes the registry
*trustworthy* without any service.
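
In the design this check is delegated to a JSON Schema or CUE engine. The standard-library sketch below only illustrates the kind of rule such a check would enforce on an integer `schema` block; the function name and the messages are hypothetical.

```python
from __future__ import annotations


def check_schema_block(schema: dict) -> list[str]:
    """Check an integer `schema` block for internal consistency."""
    problems: list[str] = []
    if schema.get("type") != "integer":
        return problems  # this sketch covers only the integer case

    bounds = {k: schema[k] for k in ("minimum", "default", "maximum") if k in schema}
    for key, value in bounds.items():
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, int):
            problems.append(f"{key} must be an integer, got {value!r}")
    if problems:
        return problems

    lo, hi, default = bounds.get("minimum"), bounds.get("maximum"), bounds.get("default")
    if lo is not None and hi is not None and lo > hi:
        problems.append("minimum exceeds maximum")
    if default is not None:
        if lo is not None and default < lo:
            problems.append("default is below minimum")
        if hi is not None and default > hi:
            problems.append("default is above maximum")
    return problems
```
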

**Evidence & graph — reuse the State Hub.** Do not build a graph database. The
relationships in §3 (`consumed_by`, `overrides`, `depends_on_secret`) plus State
Hub `progress`/`decision`/relationship records already give a config knowledge
graph (research §5) for free. config-atlas contributes the *config-typed edges*;
the hub stores and queries them.
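
As an illustration of what "config-typed edges" could mean, the sketch below flattens an entry's `relations` block into directed edges. The `Edge` type, the edge labels, and the choice to flip `consumed_by` so the consumer is the edge source are assumptions. How edges are submitted to the State Hub is left out, since its API is not described here.

```python
from __future__ import annotations

from typing import NamedTuple


class Edge(NamedTuple):
    source: str
    relation: str
    target: str


# relations key -> (assumed edge label, whether the listed item is the edge source)
EDGE_TYPES = {
    "consumed_by": ("consumes", True),
    "overrides": ("overrides", False),
    "depends_on_secret": ("secret-ref", False),
    "related_to": ("related-to", False),
}


def edges_for(entry: dict) -> list[Edge]:
    """Turn one entry's relations into a flat list of typed edges."""
    relations = entry.get("relations", {})
    edges = []
    for key, (label, listed_is_source) in EDGE_TYPES.items():
        for other in relations.get(key, []):
            if listed_is_source:
                edges.append(Edge(other, label, entry["id"]))
            else:
                edges.append(Edge(entry["id"], label, other))
    return edges
```
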

**Explain view (later).** Once entries carry `sources[].role`, a small renderer can
produce the `config explain` output from the research primer — statically, from
source links, without reading live values. This is the first capability that feels
like a "control plane" and should be the headline of Phase 3.
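
A static renderer of this kind could be quite small. The sketch below is one possible shape, not the primer's actual output format, which is not reproduced here. It assumes `scope.allowed_layers` is listed from lowest to highest precedence and that a role names its layer before the final hyphen; both are illustrative assumptions.

```python
from __future__ import annotations


def explain(entry: dict) -> str:
    """Render the layer path of one entry from its source links only."""
    layers = entry["scope"]["allowed_layers"]  # assumed lowest-to-highest precedence

    def layer_of(source: dict) -> str:
        return source["role"].rsplit("-", 1)[0]

    # Raises ValueError if a source's layer is not in allowed_layers.
    ordered = sorted(entry["sources"], key=lambda s: layers.index(layer_of(s)))

    lines = [f"{entry['id']} (owner: {entry['owner']})"]
    for position, source in enumerate(ordered, start=1):
        marker = "  <- highest precedence" if position == len(ordered) else ""
        lines.append(
            f"  {position}. {layer_of(source):<12} "
            f"{source['repo']}:{source['path']}{marker}"
        )
    consumers = entry.get("relations", {}).get("consumed_by", [])
    lines.append(f"  consumed by: {', '.join(consumers) or 'none recorded'}")
    return "\n".join(lines)
```

Because the function reads only metadata already in the entry, it stays within the read-first boundary: no live value is fetched or shown.
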

---

## 6. Phased roadmap (efficient path to practical)

Each phase ships something usable and maps to an `ATLAS-WP-` workplan.

**Phase 0 — Canon (now, days).**
Write the surface-entry JSON Schema + the scope/precedence/merge model as a
machine-checkable doc. Replace the inherited `repo-template` capability artifact
(ATLAS-WP-0002). *Exit:* one real surface entry validates in CI.

**Phase 1 — Seed registry by hand (1–2 weeks).**
Hand-author 10–20 entries for the highest-value Coulomb surfaces (start with
railiance-platform mail/rate-limit, secret-refs, key feature flags). Stand up CI
validation. *Exit:* a reviewer can answer "what configures X, who owns it, where"
from the repo alone.

**Phase 2 — First connectors (2–4 weeks).**
Build `git-config` + one of `helm-values`/`flag-platform`. Candidate-PR workflow.
*Exit:* registry grows from automated discovery, not just hand authoring;
stale/unowned surfaces are surfaced.

**Phase 3 — Explain & graph (4+ weeks).**
Render `config explain` from `sources[].role`; push config-typed edges to the State
Hub. *Exit:* given a key, show its layer path, what overrides what, owner, and
consumers — the read-first control-plane MVP.

**Deferred (out of current scope).** Live resolution, controlled change, approval
workflows, rollout/rollback orchestration — these belong to downstream systems
(`feature-control`, GitOps, the platform), not this repo.

---

## 7. Build-vs-reuse summary

| Need | Decision | Why |
|------|----------|-----|
| Entry validation / federation | **reuse** reuse-surface | already the federation contract |
| Workplans, relationships, audit | **reuse** State Hub | edges + evidence for free |
| Schema/merge validation | **reuse** JSON Schema, evaluate CUE | CUE's order-independent merge fits effective-config (research §3.3) |
| Policy checks | **reuse** OPA/Kyverno as backends | config-atlas is the context layer, not the engine |
| Secret storage | **never** — reference only | OpenBao owns values |
| Discovery connectors | **build** (thin, read-only) | the genuinely novel, repo-specific piece |
| Effective-config resolver / delivery | **don't build** | out of scope; delegated downstream |

The whole design optimizes for one thing: **the smallest amount of original
software that turns scattered configuration into a discoverable, explainable,
source-linked map** — and borrows everything else.