config-atlas/specs/ProductRequirementsDocument.md
tegwick 05fa73e20f docs: add config-atlas Product Requirements Document
Add specs/ProductRequirementsDocument.md: hybrid product PRD (sister-repo
skeleton plus the template's Formal Standards / Related Concepts /
Appendix sections), heavy FR/NFR with Requirement/Details/Acceptance
triplets, Canon Alignment, 12 functional + 8 non-functional requirements,
conceptual model, MVP, roadmap, risks, and orientation-map appendix.
Substance traces to INTENT, ArchitectureBlueprint, ecosystem-boundaries,
and the research digest; no scope invented beyond repo-boundary.

Fix relative links broken by the ArchitectureBlueprint.md move into
specs/ (its own INTENT/SCOPE/research links and the ecosystem-boundaries
back-reference).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-26 22:37:06 +02:00

# config-atlas — Product Requirements Document
Status: Draft v0.1
Date: 2026-06-26
Owner: config-atlas initiative
Primary integration standard: reuse-surface federation (capability registry model)
Terminology alignment: InfoTechCanon-compatible; extends InfoTechCanon only where a
configuration surface requires precision ITC does not yet provide. See
`../docs/canon-mapping.md` (planned) and `../docs/ecosystem-boundaries.md`.
Companion artifacts: `ArchitectureBlueprint.md`, `../INTENT.md`,
`../research/configuration-control-plane.md`. Relevant workplan: `ATLAS-WP-0002`.
> This PRD defines *what* config-atlas must achieve and under which constraints. It
> is implementation-independent; the *how* lives in `ArchitectureBlueprint.md`.
---
## 1. Product summary
config-atlas is the **read-first, cross-kind configuration map and evidence layer**
for fast-moving, multi-repo, multi-tenant software landscapes. It treats each
**configuration surface** — a bounded, named place where configuration is defined,
read, or overridden — as a first-class, registry-backed entry with ownership,
scope, validation hooks, and source links.
The product answers four questions an operator or agent cannot answer today without
tribal knowledge:
1. What configuration exists for a repo, capability, or deployment context?
2. Who owns it and where is the source of truth?
3. What are the safe defaults and precedence rules?
4. Which other surfaces depend on, override, or are affected by it?
config-atlas is **not** where configuration lives and **not** a runtime engine. It
is where the distributed configuration surface becomes visible, explainable,
governable, and safe to change. (`../INTENT.md`, `../wiki/ProductVision.md`)
---
## 2. Problem statement
Configuration is **distributed control information**: the live mechanism that
changes how systems behave, often faster and with less ceremony than a code deploy.
As cloud-native scale grew, configuration became the dominant operational failure
mode — a disproportionate share of large 2024–2026 incidents trace to a
configuration change, not a code defect (CrowdStrike, AT&T, Cloudflare, Azure;
`../research/configuration-control-plane.md` §2).
Yet configuration knowledge is scattered across repos, manifests, environment
variables, feature-flag platforms, policy files, secret managers, and operator
runbooks. Teams and agents rediscover the same surfaces repeatedly and cannot
reason confidently about defaults, precedence, or ownership. Existing tools manage
the configuration *they own*; few **discover** configuration across tools, and
fewer can resolve and explain the **effective** value that actually applies.
The product thesis: **map the territory before governing it.** A company must first
*see* its configuration surface — discover it, classify it by kind and scope,
attribute ownership, and attach evidence — before any safe-change ambition is
credible.
---
## 3. Goals
### G1 — Discoverable configuration surface
The product shall make every configuration surface that matters to reuse or
operations discoverable from a single, source-linked registry.
### G2 — Effective-configuration explainability
The product shall make it possible to explain, for a key, which layer won, what it
overrode, which validating schema applied, and who owns it — without reading live
values.
### G3 — Ownership and scope clarity
The product shall attribute every surface to an owner and a scope, resolving
ownership against `domain-tree` rather than inventing a private org model.
### G4 — Map before control
The product shall deliver read-first configuration intelligence (discover,
classify, attribute, explain) and shall not require write access to any production
configuration system.
### G5 — Ecosystem reuse over reinvention
The product shall reuse sister-repo capabilities — `reuse-surface` (schema,
validation, federation), `repo-scoping` (scanning/candidate/approval),
`info-tech-canon` (vocabulary), the State Hub (graph/evidence) — rather than
duplicating them (`../docs/ecosystem-boundaries.md`).
### G6 — No secret exposure
The product shall never store secret values; secrets appear only as references.
### G7 — Deterministic, explainable merge semantics
The product shall represent layer precedence and merge rules explicitly so that a
winning value is always attributable to a declared rule, never to hidden
last-writer-wins behavior.
### G8 — Federation compatibility
The product shall participate in `reuse-surface` federation as a typed registry
peer, with a reserved id namespace, so config surfaces interoperate with the
broader capability surface without colliding.
---
## 4. Non-goals
The product shall **not**:
1. Build a runtime configuration **resolver, delivery engine, or control plane**;
resolution, delivery, and control are delegated downstream (`ArchitectureBlueprint.md`
§1; `../research/configuration-control-plane.md` §5).
2. Own the **runtime resolution or control of feature availability**, including
feature resolvers and kill switches — that is `feature-control`'s plane; config
surfaces of kind `feature-flag` link to it and never re-derive it.
3. Store **secret values** or live, environment-specific configuration values
(OpenBao / railiance-platform own values).
4. Become a **second source of truth** for configuration values; entries point at
canonical sources.
5. **Replace** sister repos: `info-tech-canon` (vocabulary), `repo-scoping`
(scanning), `domain-tree` (placement/ownership identity), `reuse-surface`
(registry/federation), `state-hub` (graph/identity store), `repo-seed`
(template).
6. Define the configuration **vocabulary** itself — it maps to InfoTechCanon.
---
## 5. Canon Alignment & Terminology
config-atlas conforms to InfoTechCanon (ITC) where possible and consumes, rather
than redefines, established concepts. The authoritative mapping is `../docs/canon-mapping.md`
(planned, mirroring `feature-control`'s pattern). Summary of ownership boundaries:
| Concept area | Source | config-atlas relationship |
|---|---|---|
| Policy, decision, evidence, control | ITC-GOV | **Consume** — reuse governance vocabulary for evidence/audit |
| Schema, data contract, classification | ITC-DATA | **Consume** — surface `schema`/`security_class` reference these |
| Delivery flow, mutability, environments | ITC-DEVSECOPS | **Consume** — `mutability` class derives from delivery stages |
| Environment, deployment, service, repository | ITC-LAND | **Consume** — scope/source identifiers |
| Actor / agent / ownership identity | ITC-ORG via `domain-tree` | **Reference** — `owner` resolves to domain-tree bindings |
| Feature availability, evaluation scope | `feature-control` (`EvaluationScope`) | **Align** — share one scope vocabulary; link, do not re-derive |
| **Configuration surface** entry | config-atlas | **Own** |
| **Layering order** (L0–L9) over the shared scope vocabulary | config-atlas | **Own** (an ordering, not new scope names) |
| The **cross-kind effective-config map** | config-atlas | **Own** |
Terminology rule (per ITC "import concepts instead of redefining them"): the L0–L9
layer model is expressed as an **ordering over** the shared ITC/feature-control
scope vocabulary, not a competing set of scope names. New terms genuinely original
to config-atlas — *configuration surface*, *effective-config path* — are proposed
to ITC as extensions via the canon mapping.
---
## 6. Users & stakeholders
Agents are first-class consumers, not an afterthought.
| Stakeholder | Needs |
|---|---|
| Platform engineer | Find what configures a system, its defaults, precedence, and owner without reading every repo |
| SRE / incident commander | During an incident, see which surfaces affect a service and what recently changed |
| Security / compliance owner | Audit configuration ownership, secret references, and change evidence across the company |
| Tenant / installation admin | Understand which settings are tenant-overridable vs non-overridable guardrails |
| Product owner | See entitlements and feature surfaces as part of one configuration picture |
| Coding agent | Orient on a repo's configuration surface from markdown/YAML without bespoke tooling |
| Architect | Reason about cross-repo configuration relationships, drift, and blast radius |
| Configuration-surface owner | Declare, document, and maintain the surfaces they are accountable for |
---
## 7. Conceptual model
### 7.1 Entities
| Entity | Meaning | Canon mapping |
|---|---|---|
| Configuration surface | A bounded, named place where config is defined/read/overridden | **Owned** (proposed ITC extension) |
| Kind | Class of surface: app-config, deploy-config, secret-ref, feature-flag, policy, tenant-config, infra-state, runtime-override | ITC-DATA / ITC-GOV / ITC-LAND |
| Scope / layer | Dimension where a value may be set (company, environment, tenant, …) | ITC-LAND + feature-control `EvaluationScope` |
| Effective configuration | The resolved value that actually applies for a context | **Owned** (resolution delegated; *path* owned) |
| Source | A canonical file/API contributing a value at a given layer role | ITC-LAND / ITC-DEVSECOPS |
| Merge semantics | Declared rule for combining layer contributions | **Owned** |
| Mutability class | build / deploy / startup / hot / per-request / emergency | ITC-DEVSECOPS |
| Evidence | last-seen, change log, drift, who/what/why/when | ITC-GOV.Evidence |
| Relationship / edge | consumed_by, overrides, depends_on_secret, related_to | ITC-GOV + State Hub graph |
### 7.2 Layering order and merge rules
The effective configuration is composed from ordered scopes (from
`../wiki/ConfigLayering.md` and `ArchitectureBlueprint.md` §3):
```text
L0 vendor/product defaults      L5 installation/deployment overlay
L1 company baseline             L6 tenant/customer/community overlay
L2 platform/domain baseline     L7 group/role overlay
L3 environment overlay          L8 user/agent/workload overlay
L4 region/zone/cluster          L9 emergency/runtime override
```
"More specific wins" by default; higher layers may declare **non-overridable
guardrails**. Merge rules are explicit, never implicit:
```text
scalar       more specific layer replaces earlier value
object/map   deep merge by key
array/list   replace by default; keyed merge only if declared
null         not deletion unless tombstone semantics are defined
secret       never merged into normal config
policy       restrictive rule wins unless explicitly delegated
```
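
The merge rules above can be sketched in a few lines of Python. This is an illustration only, not the resolver (which this PRD delegates downstream). The names `merge` and `TOMBSTONE` are hypothetical, and treating a null overlay as "no contribution" is one assumed reading of the null rule. Secret and policy handling are deliberately left out.

```python
from typing import Any

# Hypothetical sentinel: deletion must be requested explicitly, because the
# rules say null is not deletion unless tombstone semantics are defined.
TOMBSTONE = object()


def merge(base: Any, overlay: Any) -> Any:
    """Combine an earlier-layer value with a more specific layer's value."""
    if overlay is None:
        # Assumed reading: null contributes nothing, the earlier value stays.
        return base
    if isinstance(base, dict) and isinstance(overlay, dict):
        merged = dict(base)  # copy so the earlier layer is not mutated
        for key, value in overlay.items():
            if value is TOMBSTONE:
                merged.pop(key, None)
            elif key in merged:
                merged[key] = merge(merged[key], value)  # deep merge by key
            elif value is not None:
                merged[key] = value
        return merged
    # Scalars and lists: the more specific layer replaces the earlier value.
    return overlay
```

A keyed list merge would need a declared key per surface, so the sketch keeps the default of replacing lists wholesale.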
---
## 8. Functional requirements
### FR-1 — Configuration surface registry
**Requirement:** The product shall provide a markdown/YAML registry of
configuration-surface entries, each with a stable id.
**Details:**
- Entry id namespace `surface.<domain>.<system>.<name>`.
- Modeled as a typed sibling of the `reuse-surface` capability entry.
- One file per surface plus a YAML index, mirroring `registry/`.
**Acceptance criteria:**
- A new surface can be added as a single reviewable file + index row.
- Each entry has a unique, stable id validated in CI.
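
A minimal sketch of the id check a CI step might run. The segment grammar (lowercase letters, digits, hyphens) is an assumption inferred from ids such as `surface.platform.mail.rate-limit`; the PRD does not define it, and `check_ids` is a hypothetical helper.

```python
import re
from collections import Counter

# Assumed grammar: "surface" followed by exactly three dot-separated segments,
# each made of lowercase alphanumerics with optional internal hyphens.
SURFACE_ID = re.compile(r"surface(\.[a-z0-9]+(-[a-z0-9]+)*){3}")


def check_ids(ids: list[str]) -> list[str]:
    """Return one error per malformed or duplicated surface id."""
    errors = []
    for surface_id, count in Counter(ids).items():
        if not SURFACE_ID.fullmatch(surface_id):
            errors.append(
                f"{surface_id}: does not match surface.<domain>.<system>.<name>"
            )
        if count > 1:
            errors.append(f"{surface_id}: declared {count} times")
    return errors
```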
### FR-2 — Kind taxonomy
**Requirement:** Every surface shall declare a `kind` from a closed taxonomy.
**Details:**
- Kinds: `app-config`, `deploy-config`, `secret-ref`, `feature-flag`, `policy`,
`tenant-config`, `infra-state`, `runtime-override`.
- `kind` drives kind-separation: secrets, flags, and infra-state are never treated
as ordinary config.
**Acceptance criteria:**
- An entry with an unknown `kind` fails validation.
- Reports can filter and group surfaces by `kind`.
### FR-3 — Scope / layer model
**Requirement:** Each surface shall declare which layers may set it and a default
layer, using the shared scope vocabulary.
**Details:**
- `scope.allowed_layers` is a subset of the L0–L9 ordering.
- Layer names align with ITC / feature-control `EvaluationScope`; no new scope
names are introduced.
**Acceptance criteria:**
- A surface can declare, e.g., `allowed_layers: [company, environment, tenant]`.
- An override proposed at a disallowed layer is flagged.
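
The two acceptance criteria could be checked roughly as follows. The short layer names in `LAYER_ORDER` are hypothetical: only `company`, `environment`, `installation`, and `tenant` appear verbatim in this PRD, and the authoritative names are meant to come from the shared scope vocabulary.

```python
# Hypothetical short names for L0..L9, in precedence order.
LAYER_ORDER = [
    "vendor", "company", "platform", "environment", "region",
    "installation", "tenant", "group", "user", "emergency",
]


def check_scope(scope: dict) -> list[str]:
    """Validate that a scope block only uses known layers."""
    allowed = scope.get("allowed_layers", [])
    errors = [f"unknown layer: {name}" for name in allowed if name not in LAYER_ORDER]
    default = scope.get("default_layer")
    if default not in allowed:
        errors.append(f"default_layer {default!r} is not in allowed_layers")
    return errors


def override_is_flagged(scope: dict, proposed_layer: str) -> bool:
    """True when an override is proposed at a layer the surface does not allow."""
    return proposed_layer not in scope.get("allowed_layers", [])
```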
### FR-4 — Source linking without values
**Requirement:** Each surface shall reference its canonical sources by location and
layer role, and shall not inline live or secret values.
**Details:**
- `sources[]` carries `repo`, `path`/endpoint, and `role` (the contributed layer).
- No value fields exist in the schema.
**Acceptance criteria:**
- An entry records two or more sources with distinct layer roles.
- CI rejects any entry that embeds a literal configuration value or secret.
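
One way to enforce "no value fields" is an allowlist on each `sources[]` item, sketched below. The key name `endpoint` is an assumption (the PRD only says "`path`/endpoint"), and `check_sources` is a hypothetical helper. An allowlist cannot detect a secret pasted into a `path` string, so it would complement rather than replace secret scanning.

```python
# Assumed field names for a source reference; anything else is rejected so
# that a literal value cannot be smuggled in under a new key.
ALLOWED_SOURCE_KEYS = {"repo", "path", "endpoint", "role"}


def check_sources(sources: list[dict]) -> list[str]:
    """Return errors for source items that carry unexpected or missing fields."""
    errors = []
    for index, source in enumerate(sources):
        extra = sorted(set(source) - ALLOWED_SOURCE_KEYS)
        if extra:
            errors.append(f"sources[{index}]: unexpected field(s) {extra}")
        if "repo" not in source or "role" not in source:
            errors.append(f"sources[{index}]: repo and role are required")
        if "path" not in source and "endpoint" not in source:
            errors.append(f"sources[{index}]: path or endpoint is required")
    return errors
```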
### FR-5 — Effective-config explain rendering
**Requirement:** The product shall render an effective-config *path* for a key from
its layered source links, statically, without reading live values.
**Details:**
- Output names the winning source layer, what it overrode, the validating schema,
and the owner (the `config explain` shape in `../wiki/ConfigLayering.md`).
- Resolution of *actual values* is out of scope; only the path is owned.
**Acceptance criteria:**
- Given a surface with ordered sources, the product emits an ordered override path
with owner and validator references.
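
A static explain path can be derived from source roles alone, as in this sketch. The `ROLE_RANK` names beyond `company-baseline` and `environment-overlay` are hypothetical, as are the output keys and the assumption that the validator sits under `schema`. The sketch cannot know whether a given source actually sets the key, and it ignores non-overridable guardrails.

```python
# Hypothetical role names mapped to their L0..L9 position.
ROLE_RANK = {
    "vendor-default": 0, "company-baseline": 1, "platform-baseline": 2,
    "environment-overlay": 3, "region-overlay": 4, "installation-overlay": 5,
    "tenant-overlay": 6, "group-overlay": 7, "user-overlay": 8,
    "runtime-override": 9,
}


def explain(entry: dict) -> dict:
    """Render the override path for a surface without reading any values."""
    if not entry["sources"]:
        raise ValueError(f"{entry['id']}: no sources to explain")
    ordered = sorted(entry["sources"], key=lambda s: ROLE_RANK[s["role"]])
    *overridden, winner = ordered  # the most specific layer wins by default

    def locate(source: dict) -> str:
        return f"{source['repo']}:{source['path']}"

    return {
        "key": entry["id"],
        "winner": {"role": winner["role"], "source": locate(winner)},
        # Listed from nearest to farthest below the winner.
        "overrode": [
            {"role": s["role"], "source": locate(s)} for s in reversed(overridden)
        ],
        "validator": entry["schema"]["validator"],
        "owner": entry["owner"],
    }
```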
### FR-6 — Ownership resolution
**Requirement:** Every surface shall have an `owner`, resolved against `domain-tree`
bindings rather than a private ownership model.
**Details:**
- `owner` references a team/agent identity, not a person.
- Placement/relevance defers to domain-tree primary/secondary bindings.
**Acceptance criteria:**
- An entry without an owner fails validation.
- Owner references resolve to known domain-tree identities (or are flagged unknown).
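
The split between "fails validation" and "flagged unknown" might look like the sketch below. How domain-tree exposes its identities is not specified here, so `known_identities` is a hypothetical, already-loaded set of identity names.

```python
def check_owner(entry: dict, known_identities: set[str]) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for the owner field of one entry."""
    errors: list[str] = []
    warnings: list[str] = []
    surface_id = entry.get("id", "<no id>")
    owner = entry.get("owner")
    if not owner:
        # A missing owner blocks the entry.
        errors.append(f"{surface_id}: owner is required")
    elif owner not in known_identities:
        # An unresolved owner is flagged but does not block.
        warnings.append(f"{surface_id}: owner {owner!r} is unknown to domain-tree")
    return errors, warnings
```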
### FR-7 — Relationship / edge model
**Requirement:** The product shall record cross-surface relationships and contribute
them as config-typed edges to the State Hub graph.
**Details:**
- Relations: `consumed_by`, `overrides`, `depends_on_secret` (reference only),
`related_to`.
- config-atlas owns the config semantics of each edge; the State Hub stores topology.
**Acceptance criteria:**
- A surface can declare consumers and secret dependencies by reference.
- Declared edges are expressible to the State Hub without duplicating its store.
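
A surface's `relations` block flattens naturally into typed edges. The tuple shape `(source, relation, target)` below is an assumption for illustration; the State Hub's actual ingestion format is not described in this PRD, and `to_edges` is a hypothetical helper.

```python
RELATIONS = ("consumed_by", "overrides", "depends_on_secret", "related_to")


def to_edges(entry: dict) -> list[tuple[str, str, str]]:
    """Flatten a surface's relations into (source, relation, target) edges."""
    relations = entry.get("relations", {})
    unknown = sorted(set(relations) - set(RELATIONS))
    if unknown:
        raise ValueError(f"{entry['id']}: unknown relation(s) {unknown}")
    edges = []
    for relation in RELATIONS:
        for target in relations.get(relation, []):
            # Targets are ids only; secrets stay references, never values.
            edges.append((entry["id"], relation, target))
    return edges
```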
### FR-8 — Read-only discovery connectors
**Requirement:** The product shall support read-only connectors that emit *candidate*
surface entries for human/agent review, reusing `repo-scoping`'s
scanner→candidate→approval workflow.
**Details:**
- Connectors are stateless and never write live systems or auto-merge.
- Candidate source is `repo-scoping` observed facts where available, with config-kind
classification added on top.
- Pipeline: `connector → candidate YAML → PR → validate → merge`.
**Acceptance criteria:**
- A connector run produces candidate entries that enter via PR review.
- No connector mutates any source system.
### FR-9 — Validation
**Requirement:** Every entry shall be schema-validated in CI via `reuse-surface
validate` plus a surface-entry schema (JSON Schema or CUE).
**Details:**
- Validation covers id uniqueness, kind, scope, owner presence, and absence of
values/secrets.
- `git diff --check` runs on every change.
**Acceptance criteria:**
- A malformed entry blocks merge.
- CI passes on a well-formed seed entry.
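
To show how "a malformed entry blocks merge" could work mechanically, here is a sketch of a checker that returns a non-zero exit code on any error. It does not invoke `reuse-surface validate`, whose interface is not described here. The `REQUIRED` field list is an assumption, and entries are taken as already-parsed dictionaries.

```python
import sys

KINDS = {
    "app-config", "deploy-config", "secret-ref", "feature-flag",
    "policy", "tenant-config", "infra-state", "runtime-override",
}
# Assumed minimum set of fields; the surface-entry schema is the authority.
REQUIRED = ("id", "name", "kind", "owner", "scope", "sources")


def validate_entry(entry: dict) -> list[str]:
    """Check required fields and the closed kind taxonomy."""
    errors = [f"missing field: {field}" for field in REQUIRED if not entry.get(field)]
    kind = entry.get("kind")
    if kind is not None and kind not in KINDS:
        errors.append(f"unknown kind: {kind}")
    return errors


def main(entries: list[dict]) -> int:
    """Print every error and return the process exit code for CI."""
    failed = 0
    for entry in entries:
        for error in validate_entry(entry):
            print(f"{entry.get('id', '<no id>')}: {error}", file=sys.stderr)
            failed += 1
    return 1 if failed else 0
```

A CI job would typically end with `sys.exit(main(entries))` so that any error fails the pipeline.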
### FR-10 — Federation as a typed sibling
**Requirement:** The product shall federate under `reuse-surface` as a registry peer
with a reserved `surface.*` id namespace.
**Details:**
- The configuration-surface entry is a typed sibling of the capability entry, not a
new federation mechanism.
- The `surface.*` namespace is reserved in the reuse-surface federation roster.
**Acceptance criteria:**
- config-atlas entries are discoverable through reuse-surface federation.
- No id collision occurs between capability and surface registries.
### FR-11 — Evidence and audit
**Requirement:** Each surface shall carry discovery and change evidence.
**Details:**
- `evidence.last_seen`, `discovery_method`, and a change-log reference (PR or State
Hub progress event).
- Supports answering who/what/why/when and "is this still used?".
**Acceptance criteria:**
- An entry records when it was last observed and by which method.
- A change to an entry is traceable to a PR or progress event.
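
The "is this still used?" question can be approximated from `evidence.last_seen`, as sketched below. The 90-day threshold and the helper name `stale_surfaces` are hypothetical; the sketch assumes `last_seen` is an ISO date string, as in the quoted example value in the data model.

```python
from datetime import date, timedelta


def stale_surfaces(entries: list[dict], today: date, max_age_days: int = 90) -> list[str]:
    """Return ids of surfaces not observed within the allowed age window."""
    cutoff = today - timedelta(days=max_age_days)
    stale = []
    for entry in entries:
        last_seen = entry.get("evidence", {}).get("last_seen")
        # No evidence at all is treated as stale rather than silently passing.
        if last_seen is None or date.fromisoformat(last_seen) < cutoff:
            stale.append(entry["id"])
    return stale
```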
### FR-12 — Feature-flag delegation
**Requirement:** Surfaces of kind `feature-flag` shall link to the authoritative
`feature-control` key and shall not duplicate its rules, resolver, or kill switches.
**Details:**
- `sources[]` points at the feature-control key; config-atlas records classification,
ownership, and relationships only.
**Acceptance criteria:**
- A `feature-flag` surface references a feature-control key.
- config-atlas contains no runtime flag-evaluation logic.
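
A delegation check for `feature-flag` surfaces might look like this sketch. It assumes the feature-control key is referenced through a `sources[]` item whose `repo` is `feature-control`, and the `FORBIDDEN_FIELDS` names are hypothetical stand-ins for duplicated rule or kill-switch data.

```python
# Hypothetical field names that would indicate duplicated flag logic.
FORBIDDEN_FIELDS = {"rules", "resolver", "kill_switch"}


def check_feature_flag(entry: dict, flag_repo: str = "feature-control") -> list[str]:
    """Ensure a feature-flag surface links out and carries no flag logic."""
    if entry.get("kind") != "feature-flag":
        return []
    surface_id = entry.get("id", "<no id>")
    errors = []
    if not any(s.get("repo") == flag_repo for s in entry.get("sources", [])):
        errors.append(f"{surface_id}: must reference a {flag_repo} key")
    duplicated = sorted(FORBIDDEN_FIELDS & set(entry))
    if duplicated:
        errors.append(f"{surface_id}: duplicates flag logic in {duplicated}")
    return errors
```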
---
## 9. Non-functional requirements
### NFR-1 — Markdown- and agent-legible
Entries shall be markdown/YAML, diffable, and parseable by agents without bespoke
tooling.
### NFR-2 — Source-linked, never authoritative
The registry shall reference canonical sources and never become a second source of
truth for configuration values.
### NFR-3 — Read-first, no live values
The product shall function without read/write access to live values; it stores
metadata and references only.
### NFR-4 — Never stores secrets
No secret value shall ever be stored; secrets appear only as references.
### NFR-5 — Deterministic and explainable
Every winning value shall be attributable to a declared precedence/merge rule;
hidden last-writer-wins behavior is prohibited.
### NFR-6 — Low-friction contribution
Adding or updating a surface shall require only a single reviewable PR validated in
CI.
### NFR-7 — Federation compatible
Entry schema and ids shall remain compatible with `reuse-surface` federation and
validation.
### NFR-8 — Boundary-respecting
The product shall not implement capabilities owned by sister repos
(`../docs/ecosystem-boundaries.md`); overlaps are resolved by reference, not
reimplementation.
---
## 10. Data model / repository structure
Surface-entry shape (from `ArchitectureBlueprint.md` §3 — values intentionally
absent):
```yaml
id: surface.<domain>.<system>.<name>
name: Mail delivery batch sizing
kind: app-config | deploy-config | secret-ref | feature-flag |
      policy | tenant-config | infra-state | runtime-override
summary: Controls max batch size for outbound mail delivery.
owner: platform-delivery # resolves to domain-tree identity
status: draft | active | deprecated
scope:
  allowed_layers: [company, environment, installation, tenant]
  default_layer: company
mutability: hot-reloadable
security_class: operational # operational | sensitive | secret-ref | policy
schema:
  type: integer
  default: 500
  minimum: 1
  maximum: 5000
  validator: schemas/mail-delivery.schema.json
sources:
  - { repo: railiance-platform, path: config/mail/delivery.yaml, role: company-baseline }
  - { repo: railiance-platform, path: environments/prod.yaml, role: environment-overlay }
relations:
  consumed_by: [service.mail-gateway]
  overrides: []
  depends_on_secret: [] # references only
  related_to: [surface.platform.mail.rate-limit]
evidence:
  last_seen: '2026-06-26'
  discovery_method: connector:repo-scoping | manual
  change_log_ref: <PR or State Hub progress event>
```
Repository layout extends the existing `registry/`:
```text
registry/
surfaces/ # per-surface markdown+yaml entries (surface.*)
indexes/ # surfaces.yaml index (+ existing capabilities.yaml)
schemas/ # surface-entry JSON Schema / CUE (Phase 0)
```
---
## 11. MVP proposal
### 11.1 MVP scope
- Surface-entry schema (the Canon) and the L0–L9 + merge-rule model as a
machine-checkable doc.
- 10–20 hand-authored entries for the highest-value Coulomb surfaces.
- CI validation (`reuse-surface validate` + schema + `git diff --check`).
- Replacement of the inherited `repo-template` registry artifact (`ATLAS-WP-0002`).
### 11.2 MVP non-scope
- Connectors, effective-config rendering, graph push, federation rollout.
- Any runtime resolution, delivery, or control.
### 11.3 MVP success criteria
A human or agent can, from the repo alone:
- find what configures a given high-value system, its owner, and where it lives;
- see which layers may set a key and which sources contribute;
- trust that entries are schema-valid and contain no values or secrets.
---
## 12. Roadmap
Maps to `ArchitectureBlueprint.md` §6:
- **Phase 0 — Canon (days):** surface-entry schema + scope/precedence/merge model;
replace inherited template artifact. *Exit:* one real entry validates in CI.
- **Phase 1 — Seed by hand (1–2 weeks):** 10–20 entries; CI validation live.
- **Phase 2 — First connectors (2–4 weeks):** reuse `repo-scoping` facts; candidate-PR
workflow; surface stale/unowned config.
- **Phase 3 — Explain & graph (4+ weeks):** render `config explain`; push config-typed
edges to the State Hub.
- **Deferred (out of scope):** live resolution, controlled change, approval
workflows, rollout/rollback — owned by downstream systems.
---
## 13. Risks & mitigations
| Risk | Impact | Mitigation |
|---|---|---|
| Scope creep into a runtime resolver / kill switches | Collision with `feature-control`; boundary erosion | Hard non-goal (FR-12, §4.2); `feature-flag` links out, no eval logic |
| Becoming a second source of truth for values | Drift, stale data, trust loss | No value fields (FR-4, NFR-2); source-linked only |
| Rebuilding discovery instead of reusing repo-scoping | Duplicated, divergent scanners | Connectors consume repo-scoping facts (FR-8; ecosystem-boundaries §2.4) |
| Id-namespace collision with reuse-surface | Federation conflicts | Reserve `surface.*` namespace (FR-10) |
| Inventing a third scope taxonomy | The "integration by interpretation" that ITC exists to prevent | Express L0–L9 as an ordering over shared vocab (§5) |
| Canon drift from InfoTechCanon | Terms diverge from the ecosystem | `../docs/canon-mapping.md`; consume-don't-redefine (§5) |
---
## 14. Open questions
From `ArchitectureBlueprint.md` §7:
1. What is the minimum viable connector set to prove cross-tool effective-config
resolution end to end?
2. Can the entry schema carry enough provenance to render a full `config explain`
without becoming a value source of truth?
3. What is the canonical edge set for the configuration knowledge graph, and does it
reuse the State Hub's relationship model?
4. CUE vs JSON Schema for entry validation — does order-independent merge justify the
toolchain cost? (`../research/configuration-control-plane.md` §3.3)
5. Should "agent/model configuration" be a named scope class now, given the
LaunchDarkly AI-config trajectory?
---
## 15. Formal standards & authoritative sources
config-atlas has no single governing standard; it derives legitimacy from adjacent
standards and the ecosystem canon. Full citations in
`../research/sources.md`.
- **InfoTechCanon** — internal semantic canon; the vocabulary config-atlas maps to.
- **OpenFeature** — vendor-neutral feature-flag standard; the integration boundary
with `feature-control`.
- **JSON Schema** — declarative structure/constraint validation for entry schemas.
- **CUE** — order-independent unification for deterministic, explainable merge
(`../research/configuration-control-plane.md` §3.3).
- **The Twelve-Factor App (Config)** — separate config from code; kind separation.
- **Kustomize / Helm / NixOS** — the base+overlay layering pattern config-atlas maps.
- **InfoQ — Configuration as a Control Plane** — the category framing (problem,
blast-radius/rollback safety patterns).
---
## 16. Related concepts
Condensed from `../wiki/CompetitiveLandscape.md`:
- **Configuration-as-data** (ConfigHub / KRM) — config as authoritative graph-shaped
data; config-atlas is discovery-first and cross-tool.
- **GitOps desired vs effective state** (Argo CD / Flux) — GitOps owns "desired
state"; config-atlas adds the "effective state" narrative.
- **Feature management** (LaunchDarkly / Unleash / OpenFeature) — one config *kind*;
delegated to `feature-control`.
- **Policy-as-code** (OPA / Kyverno / Checkov) — validation backends; config-atlas is
the context/evidence layer around them.
- **CMDB / SSPM** (ServiceNow / CoreView / AppOmni) — assets and SaaS posture;
config-atlas models layered behavioral config and integrates rather than replaces.
---
## Appendix: orientation map (descriptive, not prescriptive)
How config-atlas relates to adjacent product categories. Entry points for deeper
exploration, not competing definitions.
| Category | Core question it answers | config-atlas stance |
|---|---|---|
| Feature management | Can we change behavior safely at runtime? | Map flags as one kind; **integrate** (`feature-control`) |
| GitOps / IaC | Is desired state declared and reconciled? | Add effective-state map; **complement** |
| Secrets management | Are sensitive values protected? | Reference dependencies; **never store values** |
| Policy-as-code | Is this change allowed? | Provide context/evidence; **integrate as backend** |
| CMDB / developer portal | What assets/services exist and who owns them? | Enrich with config scope/ownership; **integrate** |
| SSPM | Is SaaS config secure? | Treat SaaS config as part of the surface; **integrate** |
| Config-as-data store | Where should config live authoritatively? | **Not** a store; the map/evidence layer over stores |
---
## Closing — guiding principle
> config-atlas is not where all configuration must live. It is where configuration
> becomes visible, explainable, governable, and safe to change.
</content>