
tegwick 05fa73e20f docs: add config-atlas Product Requirements Document

Add specs/ProductRequirementsDocument.md: hybrid product PRD (sister-repo
skeleton plus the template's Formal Standards / Related Concepts /
Appendix sections), heavy FR/NFR with Requirement/Details/Acceptance
triplets, Canon Alignment, 12 functional + 8 non-functional requirements,
conceptual model, MVP, roadmap, risks, and orientation-map appendix.
Substance traces to INTENT, ArchitectureBlueprint, ecosystem-boundaries,
and the research digest; no scope invented beyond repo-boundary.

Fix relative links broken by the ArchitectureBlueprint.md move into
specs/ (its own INTENT/SCOPE/research links and the ecosystem-boundaries
back-reference).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-26 22:37:06 +02:00


config-atlas — Product Requirements Document

Status: Draft v0.1
Date: 2026-06-26
Owner: config-atlas initiative
Primary integration standard: reuse-surface federation (capability registry model)
Terminology alignment: InfoTechCanon-compatible; extends InfoTechCanon only where a configuration surface requires precision ITC does not yet provide. See ../docs/canon-mapping.md (planned) and ../docs/ecosystem-boundaries.md.
Companion artifacts: ArchitectureBlueprint.md, ../INTENT.md, ../research/configuration-control-plane.md.
Relevant workplan: ATLAS-WP-0002.

This PRD defines what config-atlas must achieve and under which constraints. It is implementation-independent; the how lives in ArchitectureBlueprint.md.

1. Product summary

config-atlas is the read-first, cross-kind configuration map and evidence layer for fast-moving, multi-repo, multi-tenant software landscapes. It treats each configuration surface — a bounded, named place where configuration is defined, read, or overridden — as a first-class, registry-backed entry with ownership, scope, validation hooks, and source links.

The product answers four questions an operator or agent cannot answer today without tribal knowledge:

What configuration exists for a repo, capability, or deployment context?
Who owns it and where is the source of truth?
What are the safe defaults and precedence rules?
Which other surfaces depend on, override, or are affected by it?

config-atlas is not where configuration lives and not a runtime engine. It is where the distributed configuration surface becomes visible, explainable, governable, and safe to change. (../INTENT.md, ../wiki/ProductVision.md)

2. Problem statement

Configuration is distributed control information: the live mechanism that changes how systems behave, often faster and with less ceremony than a code deploy. As cloud-native scale grew, configuration became the dominant operational failure mode — a disproportionate share of large 2024–2026 incidents trace to a configuration change, not a code defect (CrowdStrike, AT&T, Cloudflare, Azure; ../research/configuration-control-plane.md §2).

Yet configuration knowledge is scattered across repos, manifests, environment variables, feature-flag platforms, policy files, secret managers, and operator runbooks. Teams and agents rediscover the same surfaces repeatedly and cannot reason confidently about defaults, precedence, or ownership. Existing tools manage the configuration they own; few discover configuration across tools, and fewer can resolve and explain the effective value that actually applies.

The product thesis: map the territory before governing it. A company must first see its configuration surface — discover it, classify it by kind and scope, attribute ownership, and attach evidence — before any safe-change ambition is credible.

3. Goals

G1 — Discoverable configuration surface

The product shall make every configuration surface that matters to reuse or operations discoverable from a single, source-linked registry.

G2 — Effective-configuration explainability

The product shall make it possible to explain, for a key, which layer won, what it overrode, which validating schema applied, and who owns it — without reading live values.

G3 — Ownership and scope clarity

The product shall attribute every surface to an owner and a scope, resolving ownership against domain-tree rather than inventing a private org model.

G4 — Map before control

The product shall deliver read-first configuration intelligence (discover, classify, attribute, explain) and shall not require write access to any production configuration system.

G5 — Ecosystem reuse over reinvention

The product shall reuse sister-repo capabilities — reuse-surface (schema, validation, federation), repo-scoping (scanning/candidate/approval), info-tech-canon (vocabulary), the State Hub (graph/evidence) — rather than duplicating them (../docs/ecosystem-boundaries.md).

G6 — No secret exposure

The product shall never store secret values; secrets appear only as references.

G7 — Deterministic, explainable merge semantics

The product shall represent layer precedence and merge rules explicitly so that a winning value is always attributable to a declared rule, never to hidden last-writer-wins behavior.

G8 — Federation compatibility

The product shall participate in reuse-surface federation as a typed registry peer, with a reserved id namespace, so config surfaces interoperate with the broader capability surface without colliding.

4. Non-goals

The product shall not:

Build a runtime configuration resolver, delivery engine, or control plane — resolution/delivery/control are delegated downstream (ArchitectureBlueprint.md §1; ../research/configuration-control-plane.md §5).
Own the runtime resolution or control of feature availability, including feature resolvers and kill switches — that is feature-control's plane; config surfaces of kind feature-flag link to it and never re-derive it.
Store secret values or live, environment-specific configuration values (OpenBao / railiance-platform own values).
Become a second source of truth for configuration values; entries point at canonical sources.
Replace sister repos: info-tech-canon (vocabulary), repo-scoping (scanning), domain-tree (placement/ownership identity), reuse-surface (registry/federation), state-hub (graph/identity store), repo-seed (template).
Define the configuration vocabulary itself — it maps to InfoTechCanon.

5. Canon Alignment & Terminology

config-atlas conforms to InfoTechCanon (ITC) where possible and consumes, rather than redefines, established concepts. The authoritative mapping is ../docs/canon-mapping.md (planned, mirroring feature-control's pattern). Summary of ownership boundaries:

| Concept area | Source | config-atlas relationship |
| --- | --- | --- |
| Policy, decision, evidence, control | ITC-GOV | Consume: reuse governance vocabulary for evidence/audit |
| Schema, data contract, classification | ITC-DATA | Consume: surface `schema`/`security_class` reference these |
| Delivery flow, mutability, environments | ITC-DEVSECOPS | Consume: `mutability` class derives from delivery stages |
| Environment, deployment, service, repository | ITC-LAND | Consume: scope/source identifiers |
| Actor / agent / ownership identity | ITC-ORG via `domain-tree` | Reference: `owner` resolves to domain-tree bindings |
| Feature availability, evaluation scope | `feature-control` (`EvaluationScope`) | Align: share one scope vocabulary; link, do not re-derive |
| Configuration surface entry | config-atlas | Own |
| Layering order (L0–L9) over the shared scope vocabulary | config-atlas | Own (an ordering, not new scope names) |
| The cross-kind effective-config map | config-atlas | Own |

Terminology rule (per ITC "import concepts instead of redefining them"): the L0–L9 layer model is expressed as an ordering over the shared ITC/feature-control scope vocabulary, not a competing set of scope names. New terms genuinely original to config-atlas — configuration surface, effective-config path — are proposed to ITC as extensions via the canon mapping.

6. Users & stakeholders

Agents are first-class consumers, not an afterthought.

| Stakeholder | Needs |
| --- | --- |
| Platform engineer | Find what configures a system, its defaults, precedence, and owner without reading every repo |
| SRE / incident commander | During an incident, see which surfaces affect a service and what recently changed |
| Security / compliance owner | Audit configuration ownership, secret references, and change evidence across the company |
| Tenant / installation admin | Understand which settings are tenant-overridable vs non-overridable guardrails |
| Product owner | See entitlements and feature surfaces as part of one configuration picture |
| Coding agent | Orient on a repo's configuration surface from markdown/YAML without bespoke tooling |
| Architect | Reason about cross-repo configuration relationships, drift, and blast radius |
| Configuration-surface owner | Declare, document, and maintain the surfaces they are accountable for |

7. Conceptual model

7.1 Entities

| Entity | Meaning | Canon mapping |
| --- | --- | --- |
| Configuration surface | A bounded, named place where config is defined/read/overridden | Owned (proposed ITC extension) |
| Kind | Class of surface: app-config, deploy-config, secret-ref, feature-flag, policy, tenant-config, infra-state, runtime-override | ITC-DATA / ITC-GOV / ITC-LAND |
| Scope / layer | Dimension where a value may be set (company, environment, tenant, …) | ITC-LAND + feature-control `EvaluationScope` |
| Effective configuration | The resolved value that actually applies for a context | Owned (resolution delegated; path owned) |
| Source | A canonical file/API contributing a value at a given layer role | ITC-LAND / ITC-DEVSECOPS |
| Merge semantics | Declared rule for combining layer contributions | Owned |
| Mutability class | build / deploy / startup / hot / per-request / emergency | ITC-DEVSECOPS |
| Evidence | last-seen, change log, drift, who/what/why/when | ITC-GOV.Evidence |
| Relationship / edge | consumed_by, overrides, depends_on_secret, related_to | ITC-GOV + State Hub graph |

7.2 Layering order and merge rules

The effective configuration is composed from ordered scopes (from ../wiki/ConfigLayering.md and ArchitectureBlueprint.md §3):

L0 vendor/product defaults   L5 installation/deployment overlay
L1 company baseline          L6 tenant/customer/community overlay
L2 platform/domain baseline  L7 group/role overlay
L3 environment overlay       L8 user/agent/workload overlay
L4 region/zone/cluster       L9 emergency/runtime override

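"More specific wins" by default: a higher-numbered layer overrides a lower-numbered one. Broader layers may still declare non-overridable guardrails that more specific layers cannot change. Merge rules are explicit, never implicit: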

scalar     more specific layer replaces earlier value
object/map deep merge by key
array/list replace by default; keyed merge only if declared
null       not deletion unless tombstone semantics are defined
secret     never merged into normal config
policy     restrictive rule wins unless explicitly delegated
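
The merge rules above can be illustrated with a small Python sketch. It assumes each layer's contribution is available as a plain dict, ordered from least to most specific. `merge_value` and `merge_layers` are hypothetical names, and the secret and policy rules are left out because they need kind-aware handling. Since config-atlas delegates actual resolution downstream, treat this as a reading aid for the declared semantics, not as part of the product.

```python
from typing import Any


def merge_value(base: Any, overlay: Any) -> Any:
    """Merge one more-specific contribution onto the value accumulated so far."""
    if overlay is None:
        # null: not a deletion, because no tombstone semantics are defined here.
        return base
    if isinstance(base, dict) and isinstance(overlay, dict):
        # object/map: deep merge by key, without mutating either input.
        merged = dict(base)
        for key, value in overlay.items():
            merged[key] = merge_value(merged.get(key), value)
        return merged
    # scalar and array/list: the more specific layer replaces the earlier value.
    return overlay


def merge_layers(contributions: list[dict]) -> dict:
    """Fold contributions ordered from least specific (L0) to most specific (L9)."""
    result: dict = {}
    for contribution in contributions:
        result = merge_value(result, contribution)
    return result
```

Under these assumptions, merging `{"batch": {"size": 500, "retries": 3}}` with a more specific `{"batch": {"size": 1000}}` keeps `retries` and replaces `size`, while a list in the more specific layer replaces the earlier list outright.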

8. Functional requirements

FR-1 — Configuration surface registry

Requirement: The product shall provide a markdown/YAML registry of configuration-surface entries, each with a stable id.

Details:

Entry id namespace surface.<domain>.<system>.<name>.
Modeled as a typed sibling of the reuse-surface capability entry.
One file per surface plus a YAML index, mirroring registry/.

Acceptance criteria:

A new surface can be added as a single reviewable file + index row.
Each entry has a unique, stable id validated in CI.
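
As an illustration of the id checks FR-1 asks CI to perform, the following Python sketch validates the `surface.<domain>.<system>.<name>` shape and reports duplicates. The regular expression (lowercase letters, digits, and hyphens per segment) and the function name `check_ids` are assumptions; the PRD does not fix the allowed character set.

```python
import re

# Assumed pattern for surface.<domain>.<system>.<name>; the allowed character
# set per segment is an illustration, not something the PRD specifies.
SURFACE_ID = re.compile(r"^surface\.[a-z0-9-]+\.[a-z0-9-]+\.[a-z0-9-]+$")


def check_ids(ids: list[str]) -> list[str]:
    """Return human-readable problems: malformed ids and duplicates."""
    problems = []
    seen = set()
    for entry_id in ids:
        if not SURFACE_ID.match(entry_id):
            problems.append(f"malformed id: {entry_id}")
        if entry_id in seen:
            problems.append(f"duplicate id: {entry_id}")
        seen.add(entry_id)
    return problems
```

An empty result means every id is well formed and unique; a CI step could fail the build on any non-empty result.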

FR-2 — Kind taxonomy

Requirement: Every surface shall declare a kind from a closed taxonomy.

Details:

Kinds: app-config, deploy-config, secret-ref, feature-flag, policy, tenant-config, infra-state, runtime-override.
kind drives kind-separation: secrets, flags, and infra-state are never treated as ordinary config.

Acceptance criteria:

An entry with an unknown kind fails validation.
Reports can filter and group surfaces by kind.
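
A minimal Python sketch of both acceptance criteria, assuming entries have already been parsed into dicts with `id` and `kind` keys. The function name `group_by_kind` is hypothetical; the kind list is the closed taxonomy stated above.

```python
from collections import defaultdict

KINDS = frozenset({
    "app-config", "deploy-config", "secret-ref", "feature-flag",
    "policy", "tenant-config", "infra-state", "runtime-override",
})


def group_by_kind(entries: list[dict]) -> dict[str, list[str]]:
    """Group entry ids by kind, rejecting any kind outside the taxonomy."""
    groups: dict[str, list[str]] = defaultdict(list)
    for entry in entries:
        kind = entry.get("kind")
        if kind not in KINDS:
            raise ValueError(f"{entry.get('id')}: unknown kind {kind!r}")
        groups[kind].append(entry["id"])
    return dict(groups)
```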

FR-3 — Scope / layer model

Requirement: Each surface shall declare which layers may set it and a default layer, using the shared scope vocabulary.

Details:

scope.allowed_layers is a subset of the L0–L9 ordering.
Layer names align with ITC / feature-control EvaluationScope; no new scope names are introduced.

Acceptance criteria:

A surface can declare, e.g., allowed_layers: [company, environment, tenant].
An override proposed at a disallowed layer is flagged.
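
The scope checks could look roughly like the Python sketch below. The short layer names are an assumption: only company, environment, installation, and tenant appear in this PRD, and the real names come from the shared ITC / feature-control vocabulary. Requiring `default_layer` to be one of `allowed_layers` is likewise an assumed rule, and both function names are hypothetical.

```python
# Short layer names in L0..L9 order. Only company, environment, installation,
# and tenant appear in the PRD's examples; the other names are assumptions.
LAYERS = ["vendor", "company", "platform", "environment", "region",
          "installation", "tenant", "group", "user", "emergency"]


def check_scope(scope: dict) -> list[str]:
    """Return problems found in a surface's scope declaration."""
    problems = []
    allowed = scope.get("allowed_layers", [])
    unknown = [layer for layer in allowed if layer not in LAYERS]
    if unknown:
        problems.append(f"unknown layers: {unknown}")
    # Assumed rule: the default layer must itself be an allowed layer.
    if scope.get("default_layer") not in allowed:
        problems.append("default_layer is not in allowed_layers")
    return problems


def override_allowed(scope: dict, layer: str) -> bool:
    """True when an override at the given layer is permitted for this surface."""
    return layer in scope.get("allowed_layers", [])
```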

FR-4 — Source linking without values

Requirement: Each surface shall reference its canonical sources by location and layer role, and shall not inline live or secret values.

Details:

sources[] carries repo, path/endpoint, and role (the contributed layer).
No value fields exist in the schema.

Acceptance criteria:

An entry records two or more sources with distinct layer roles.
CI rejects any entry that embeds a literal configuration value or secret.
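
One way a CI check might approximate the "no values" rule is a key-name deny-list, sketched below in Python. The deny-list and the function name `find_value_fields` are hypothetical, and key names alone cannot catch every embedded value, so a real check would likely pair this with the entry schema (which simply has no value fields).

```python
# Hypothetical deny-list of key names that would carry live or secret values.
FORBIDDEN_KEYS = {"value", "values", "secret", "password", "token"}


def find_value_fields(node, path: str = "") -> list[str]:
    """Return dotted paths of forbidden keys found anywhere in an entry."""
    hits = []
    if isinstance(node, dict):
        for key, child in node.items():
            child_path = f"{path}.{key}" if path else str(key)
            if str(key).lower() in FORBIDDEN_KEYS:
                hits.append(child_path)
            hits.extend(find_value_fields(child, child_path))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            hits.extend(find_value_fields(child, f"{path}[{index}]"))
    return hits
```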

FR-5 — Effective-config explain rendering

Requirement: The product shall render an effective-config path for a key from its layered source links, statically, without reading live values.

Details:

Output names the winning source layer, what it overrode, the validating schema, and the owner (the config explain shape in ../wiki/ConfigLayering.md).
Resolution of actual values is out of scope; only the path is owned.

Acceptance criteria:

Given a surface with ordered sources, the product emits an ordered override path with owner and validator references.
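
A rough Python sketch of static explain rendering, assuming an entry shaped like the surface-entry example in §10 with at least one source. The role names beyond company-baseline and environment-overlay, the output keys, and the function name `explain` are all assumptions; the actual config explain shape is defined in ../wiki/ConfigLayering.md. Because no values are read, the sketch can only name the most specific declared source, not confirm that it sets the key at runtime.

```python
# Assumed role names in L0..L9 order; only company-baseline and
# environment-overlay appear in the PRD's example entry.
ROLE_ORDER = ["vendor-defaults", "company-baseline", "platform-baseline",
              "environment-overlay", "region-overlay", "installation-overlay",
              "tenant-overlay", "group-overlay", "user-overlay",
              "emergency-override"]


def explain(entry: dict) -> dict:
    """Build a static override path from declared sources; no values are read."""
    ordered = sorted(entry["sources"], key=lambda s: ROLE_ORDER.index(s["role"]))
    path = [f"{s['role']} ({s['repo']}:{s['path']})" for s in ordered]
    return {
        "key": entry["id"],
        "most_specific_source": path[-1],
        "overrides": path[:-1],          # least to most specific
        "validator": entry["schema"]["validator"],
        "owner": entry["owner"],
    }
```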

FR-6 — Ownership resolution

Requirement: Every surface shall have an owner, resolved against domain-tree bindings rather than a private ownership model.

Details:

owner references a team/agent identity, not a person.
Placement/relevance defers to domain-tree primary/secondary bindings.

Acceptance criteria:

An entry without an owner fails validation.
Owner references resolve to known domain-tree identities (or are flagged unknown).
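
The two acceptance criteria distinguish a missing owner (a validation failure) from an unresolved one (a flag). A small Python sketch of that distinction follows, assuming the set of known domain-tree identities has been loaded elsewhere; how domain-tree exposes its bindings is not specified here, and `check_owner` is a hypothetical name.

```python
def check_owner(entry: dict, known_identities: set[str]) -> str:
    """Classify an entry's owner as 'missing', 'unknown', or 'ok'."""
    owner = entry.get("owner")
    if not owner:
        return "missing"   # fails validation
    if owner not in known_identities:
        return "unknown"   # flagged for review rather than rejected
    return "ok"
```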

FR-7 — Relationship / edge model

Requirement: The product shall record cross-surface relationships and contribute them as config-typed edges to the State Hub graph.

Details:

Relations: consumed_by, overrides, depends_on_secret (reference only), related_to.
config-atlas owns the config semantics of each edge; the State Hub stores topology.

Acceptance criteria:

A surface can declare consumers and secret dependencies by reference.
Declared edges are expressible to the State Hub without duplicating its store.
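
To show how declared relations might be expressed as edges, the Python sketch below flattens an entry's `relations` block into (source id, relation, target id) triples. The triple shape and the name `to_edges` are assumptions; the State Hub's actual edge format is not described in this PRD.

```python
RELATIONS = ("consumed_by", "overrides", "depends_on_secret", "related_to")


def to_edges(entry: dict) -> list[tuple[str, str, str]]:
    """Flatten declared relations into (source id, relation, target id) triples."""
    relations = entry.get("relations", {})
    return [
        (entry["id"], relation, target)
        for relation in RELATIONS
        for target in relations.get(relation, [])
    ]
```

Targets stay as ids throughout, which matches the reference-only rule for secret dependencies.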

FR-8 — Read-only discovery connectors

Requirement: The product shall support read-only connectors that emit candidate surface entries for human/agent review, reusing repo-scoping's scanner→candidate→approval workflow.

Details:

Connectors are stateless and never write live systems or auto-merge.
Candidate source is repo-scoping observed facts where available, with config-kind classification added on top.
Pipeline: connector → candidate YAML → PR → validate → merge.

Acceptance criteria:

A connector run produces candidate entries that enter via PR review.
No connector mutates any source system.

FR-9 — Validation

Requirement: Every entry shall be schema-validated in CI via reuse-surface validate plus a surface-entry schema (JSON Schema or CUE).

Details:

Validation covers id uniqueness, kind, scope, owner presence, and absence of values/secrets.
git diff --check runs on every change.

Acceptance criteria:

A malformed entry blocks merge.
CI passes on a well-formed seed entry.

FR-10 — Federation as a typed sibling

Requirement: The product shall federate under reuse-surface as a registry peer with a reserved surface.* id namespace.

Details:

The configuration-surface entry is a typed sibling of the capability entry, not a new federation mechanism.
The surface.* namespace is reserved in the reuse-surface federation roster.

Acceptance criteria:

config-atlas entries are discoverable through reuse-surface federation.
No id collision occurs between capability and surface registries.

FR-11 — Evidence and audit

Requirement: Each surface shall carry discovery and change evidence.

Details:

evidence.last_seen, discovery_method, and a change-log reference (PR or State Hub progress event).
Supports answering who/what/why/when and "is this still used?".

Acceptance criteria:

An entry records when it was last observed and by which method.
A change to an entry is traceable to a PR or progress event.
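
The "is this still used?" question can be approximated from `evidence.last_seen` alone. The Python sketch below assumes `last_seen` is an ISO date string, as in the surface-entry example; the 90-day threshold and the name `is_stale` are arbitrary illustrations, not requirements from this PRD.

```python
from datetime import date


def is_stale(last_seen: str, today: date, max_age_days: int = 90) -> bool:
    """True when the surface was last observed more than max_age_days ago."""
    return (today - date.fromisoformat(last_seen)).days > max_age_days
```

Passing `today` explicitly keeps the check deterministic, which makes it easier to run in CI and to test.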

FR-12 — Feature-flag delegation

Requirement: Surfaces of kind feature-flag shall link to the authoritative feature-control key and shall not duplicate its rules, resolver, or kill switches.

Details:

sources[] points at the feature-control key; config-atlas records classification, ownership, and relationships only.

Acceptance criteria:

A feature-flag surface references a feature-control key.
config-atlas contains no runtime flag-evaluation logic.

9. Non-functional requirements

NFR-1 — Markdown- and agent-legible

Entries shall be markdown/YAML, diffable, and parseable by agents without bespoke tooling.

NFR-2 — Source-linked, never authoritative

The registry shall reference canonical sources and never become a second source of truth for configuration values.

NFR-3 — Read-first, no live values

The product shall function without read/write access to live values; it stores metadata and references only.

NFR-4 — Never stores secrets

No secret value shall ever be stored; secrets appear only as references.

NFR-5 — Deterministic and explainable

Every winning value shall be attributable to a declared precedence/merge rule; hidden last-writer-wins behavior is prohibited.

NFR-6 — Low-friction contribution

Adding or updating a surface shall require only a single reviewable PR validated in CI.

NFR-7 — Federation compatible

Entry schema and ids shall remain compatible with reuse-surface federation and validation.

NFR-8 — Boundary-respecting

The product shall not implement capabilities owned by sister repos (../docs/ecosystem-boundaries.md); overlaps are resolved by reference, not reimplementation.

10. Data model / repository structure

Surface-entry shape (from ArchitectureBlueprint.md §3 — values intentionally absent):

id: surface.<domain>.<system>.<name>
name: Mail delivery batch sizing
kind: app-config | deploy-config | secret-ref | feature-flag |
      policy | tenant-config | infra-state | runtime-override
summary: Controls max batch size for outbound mail delivery.
owner: platform-delivery                 # resolves to domain-tree identity
status: draft | active | deprecated
scope:
  allowed_layers: [company, environment, installation, tenant]
  default_layer: company
mutability: hot-reloadable
security_class: operational              # operational | sensitive | secret-ref | policy
schema:
  type: integer
  default: 500
  minimum: 1
  maximum: 5000
  validator: schemas/mail-delivery.schema.json
sources:
  - { repo: railiance-platform, path: config/mail/delivery.yaml, role: company-baseline }
  - { repo: railiance-platform, path: environments/prod.yaml,    role: environment-overlay }
relations:
  consumed_by: [service.mail-gateway]
  overrides: []
  depends_on_secret: []                  # references only
  related_to: [surface.platform.mail.rate-limit]
evidence:
  last_seen: '2026-06-26'
  discovery_method: connector:repo-scoping | manual
  change_log_ref: <PR or State Hub progress event>

Repository layout extends the existing registry/:

registry/
  surfaces/        # per-surface markdown+yaml entries (surface.*)
  indexes/         # surfaces.yaml index (+ existing capabilities.yaml)
schemas/           # surface-entry JSON Schema / CUE (Phase 0)

11. MVP proposal

11.1 MVP scope

Surface-entry schema (the Canon) and the L0–L9 + merge-rule model as a machine-checkable doc.
10–20 hand-authored entries for the highest-value Coulomb surfaces.
CI validation (reuse-surface validate + schema + git diff --check).
Replacement of the inherited repo-template registry artifact (ATLAS-WP-0002).

11.2 MVP non-scope

Connectors, effective-config rendering, graph push, federation rollout.
Any runtime resolution, delivery, or control.

11.3 MVP success criteria

A human or agent can, from the repo alone:

find what configures a given high-value system, its owner, and where it lives;
see which layers may set a key and which sources contribute;
trust that entries are schema-valid and contain no values or secrets.

12. Roadmap

Maps to ArchitectureBlueprint.md §6:

Phase 0 — Canon (days): surface-entry schema + scope/precedence/merge model; replace inherited template artifact. Exit: one real entry validates in CI.
Phase 1 — Seed by hand (1–2 weeks): 10–20 entries; CI validation live.
Phase 2 — First connectors (2–4 weeks): reuse repo-scoping facts; candidate-PR workflow; surface stale/unowned config.
Phase 3 — Explain & graph (4+ weeks): render config explain; push config-typed edges to the State Hub.
Deferred (out of scope): live resolution, controlled change, approval workflows, rollout/rollback — owned by downstream systems.

13. Risks & mitigations

| Risk | Impact | Mitigation |
| --- | --- | --- |
| Scope creep into a runtime resolver / kill switches | Collision with `feature-control`; boundary erosion | Hard non-goal (FR-12, §4.2); `feature-flag` links out, no eval logic |
| Becoming a second source of truth for values | Drift, stale data, trust loss | No value fields (FR-4, NFR-2); source-linked only |
| Rebuilding discovery instead of reusing repo-scoping | Duplicated, divergent scanners | Connectors consume repo-scoping facts (FR-8; ecosystem-boundaries §2.4) |
| Id-namespace collision with reuse-surface | Federation conflicts | Reserve `surface.*` namespace (FR-10) |
| Inventing a third scope taxonomy | "Integration by interpretation" ITC exists to prevent | Express L0–L9 as an ordering over shared vocab (§5) |
| Canon drift from InfoTechCanon | Terms diverge from the ecosystem | `docs/canon-mapping.md`; consume-don't-redefine (§5) |

14. Open questions

From ArchitectureBlueprint.md §7:

What is the minimum viable connector set to prove cross-tool effective-config resolution end to end?
Can the entry schema carry enough provenance to render a full config explain without becoming a value source of truth?
What is the canonical edge set for the configuration knowledge graph, and does it reuse the State Hub's relationship model?
CUE vs JSON Schema for entry validation — does order-independent merge justify the toolchain cost? (../research/configuration-control-plane.md §3.3)
Should "agent/model configuration" be a named scope class now, given the LaunchDarkly AI-config trajectory?

15. Formal standards & authoritative sources

config-atlas has no single governing standard; it derives legitimacy from adjacent standards and the ecosystem canon. Full citations in ../research/sources.md.

InfoTechCanon — internal semantic canon; the vocabulary config-atlas maps to.
OpenFeature — vendor-neutral feature-flag standard; the integration boundary with feature-control.
JSON Schema — declarative structure/constraint validation for entry schemas.
CUE — order-independent unification for deterministic, explainable merge (../research/configuration-control-plane.md §3.3).
The Twelve-Factor App (Config) — separate config from code; kind separation.
Kustomize / Helm / NixOS — the base+overlay layering pattern config-atlas maps.
InfoQ — Configuration as a Control Plane — the category framing (problem, blast-radius/rollback safety patterns).

Condensed from ../wiki/CompetitiveLandscape.md:

Configuration-as-data (ConfigHub / KRM) — config as authoritative graph-shaped data; config-atlas is discovery-first and cross-tool.
GitOps desired vs effective state (Argo CD / Flux) — GitOps owns "desired state"; config-atlas adds the "effective state" narrative.
Feature management (LaunchDarkly / Unleash / OpenFeature) — one config kind; delegated to feature-control.
Policy-as-code (OPA / Kyverno / Checkov) — validation backends; config-atlas is the context/evidence layer around them.
CMDB / SSPM (ServiceNow / CoreView / AppOmni) — assets and SaaS posture; config-atlas models layered behavioral config and integrates rather than replaces.

Appendix: orientation map (descriptive, not prescriptive)

How config-atlas relates to adjacent product categories. Entry points for deeper exploration, not competing definitions.

| Category | Core question it answers | config-atlas stance |
| --- | --- | --- |
| Feature management | Can we change behavior safely at runtime? | Map flags as one kind; integrate (`feature-control`) |
| GitOps / IaC | Is desired state declared and reconciled? | Add effective-state map; complement |
| Secrets management | Are sensitive values protected? | Reference dependencies; never store values |
| Policy-as-code | Is this change allowed? | Provide context/evidence; integrate as backend |
| CMDB / developer portal | What assets/services exist and who owns them? | Enrich with config scope/ownership; integrate |
| SSPM | Is SaaS config secure? | Treat SaaS config as part of the surface; integrate |
| Config-as-data store | Where should config live authoritatively? | Not a store; the map/evidence layer over stores |

Closing — guiding principle

config-atlas is not where all configuration must live. It is where configuration becomes visible, explainable, governable, and safe to change.
