generated from coulomb/repo-seed
Added specification files
wiki/OpsCatalogSpecification.md

OpsCatalogSpecification

*IT Operations Knowledge Repository*

Below is a **structured OpsCatalog specification** designed as an **extension to OpsBridge**.

It includes:

1. **Why / How / What introduction**
2. **PRD for OpsCatalog**
3. **FRS for OpsCatalog**
4. **Schemas**
5. **Repository structure**
6. **Appendices with operational notes**

---

# OpsCatalog Specification

*Operations Knowledge Repository for Infrastructure Operations*

Version: **0.1**
Status: Draft
Date: 2026-03-11

---

# Introduction

## Why

Modern infrastructure teams operate with two complementary models of reality.

**DevOps Model: Declared Infrastructure**

Infrastructure-as-code systems describe the desired state of systems:

* Terraform
* Kubernetes manifests
* Helm charts
* GitOps pipelines

These systems encode **how infrastructure should behave**.

However, real systems rarely match the declared state perfectly.

Operations teams must deal with:

* incidents
* degraded services
* bottlenecks
* debugging environments
* manual recovery actions
* temporary workarounds
* unexpected interactions

This produces a second model.

**Operations Model: Experienced Infrastructure**

This model captures:

* how operators actually access systems
* which debugging paths exist
* where bottlenecks occur
* which entry points are used for remediation
* which bridges exist between infrastructure components

Most organizations lack a formal system for capturing this operational knowledge.

OpsCatalog exists to address this gap.

---

## How

OpsCatalog introduces a **structured repository for operations infrastructure knowledge**.

The repository is typically maintained in **Git** and contains structured definitions of:

* operations domains
* infrastructure targets
* operations access bridges
* actor classes
* operations annotations

OpsBridge consumes this catalog to:

* resolve bridges
* orient operators
* guide automation agents
* provide operations context

Git provides several properties that make it suitable for this purpose:

* version history
* collaborative editing
* review workflows
* diffability for humans and agents
* narrative context through commit messages

OpsCatalog stores **experienced operations knowledge**, not runtime state.

---

## What

OpsCatalog defines a **shared operations map of infrastructure**.

It captures:

*Operations Domains*

Logical spaces representing operations infrastructure areas.

Examples:

* production clusters
* staging environments
* development infrastructure
* incident analysis sandboxes

*Targets*

Infrastructure components relevant to operations.

Examples:

* hosts
* services
* containers
* Kubernetes resources
* debugging entry points

*Bridges*

Operations access paths between systems.

Examples:

* SSH reverse bridges
* debugging entry tunnels
* maintenance access paths

*Operations Notes*

Structured annotations describing:

* debugging procedures
* common incidents
* bottlenecks
* known workarounds
* operations entry points

Together these elements provide a **living operations topology**.

---

# Part 1: Product Requirements Document (PRD)

## 1. Definition

OpsCatalog is a structured repository that defines **operations knowledge about infrastructure environments**, including domains, targets, bridges, and operations annotations.

It provides a shared operations map used by human operators and automation agents to understand how infrastructure is accessed and maintained in practice.

OpsCatalog complements infrastructure-as-code systems by capturing the **experienced operations topology** rather than the declared infrastructure state.

---

## 2. Context

OpsCatalog operates within environments that already use:

* infrastructure-as-code tools
* automated deployment systems
* identity management systems
* operations monitoring platforms

These systems define and monitor infrastructure but often fail to capture how operators interact with systems during incidents or maintenance.

OpsCatalog fills this gap by providing a **structured operations cognition layer**.

OpsBridge integrates with OpsCatalog to translate catalog definitions into actionable access bridges.

---

## 3. Core Concepts

### Operations Domain

A logical operational boundary representing a group of related infrastructure systems.

Domains help operators navigate complex environments.

---

### Target

An operationally relevant infrastructure component that may be inspected or accessed.

Targets represent entry points for diagnostics and maintenance.

---

### Bridge

A defined operations access path enabling connectivity between infrastructure contexts.

Bridges describe **how targets are accessed**.

---

### Actor Class

A category of operators or automation systems that may interact with infrastructure.

Examples:

* human operators
* remediation agents
* incident responders

---

### Operations Annotation

Structured knowledge describing operations behaviors, known issues, or debugging strategies.

---

## 4. Scope and Non-Scope

### In Scope

OpsCatalog defines:

* operations domains
* infrastructure targets
* operations bridges
* actor classifications
* operations annotations
* repository structure for catalog storage

---

### Out of Scope

OpsCatalog does not:

* manage infrastructure resources
* maintain runtime infrastructure state
* replace monitoring systems
* replace configuration management systems
* enforce security policies
* store credentials or secrets

These responsibilities remain with external systems.

---

## 5. Practical Implications

OpsCatalog provides several operations advantages.

### Shared operations knowledge

Teams maintain a common understanding of infrastructure access paths.

### Improved incident response

Operators can quickly locate operations entry points.

### Automation enablement

AI agents and automation systems gain structured knowledge about infrastructure navigation.

### Organizational resilience

Operations knowledge becomes versioned and reviewable rather than implicit.

However, maintaining the catalog requires:

* operations discipline
* periodic review
* integration with infrastructure evolution

---

## 6. External Dependencies

OpsCatalog assumes integration with several external systems.

Examples include:

* infrastructure-as-code platforms
* operations access tools such as OpsBridge
* identity systems such as privacyIDEA
* version control systems such as Git

---

## 7. Success Criteria

OpsCatalog is successful if it enables operators and automation agents to:

* locate relevant infrastructure targets quickly
* identify operations access paths
* understand operations context during incidents
* maintain shared operations knowledge across teams

---

# Part 2: Functional Requirements Specification (FRS)

## 1. Domain Management

### FR-1 Domain Definition

The system shall allow definition of operations domains.

### FR-2 Domain Listing

The system shall allow retrieval of all defined domains.

### FR-3 Domain Inspection

The system shall allow inspection of a specific domain and its associated elements.
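
One possible reading of FR-3, sketched under assumptions: catalog entries are already loaded as plain dictionaries that mirror the example schemas in the Schemas section, and "associated elements" means the targets and bridges whose `domain` field matches. The function name `inspect_domain` and the shape of its return value are hypothetical, not part of this specification.

```python
from typing import Any

Entry = dict[str, Any]


def inspect_domain(entries: list[Entry], domain_id: str) -> dict[str, Any]:
    """Return a domain entry together with the targets and bridges that reference it."""
    domain = next(
        (e for e in entries if e.get("type") == "domain" and e.get("id") == domain_id),
        None,
    )
    if domain is None:
        raise KeyError(f"unknown domain: {domain_id}")
    # Targets and bridges point at their domain through the `domain` field.
    members = [e for e in entries if e.get("domain") == domain_id]
    return {
        "domain": domain,
        "targets": [e for e in members if e.get("type") == "target"],
        "bridges": [e for e in members if e.get("type") == "bridge"],
    }


entries = [
    {"type": "domain", "id": "coulombcore", "environment": "production"},
    {"type": "target", "id": "state-hub", "domain": "coulombcore", "kind": "service"},
    {"type": "bridge", "id": "state-hub-coulombcore", "domain": "coulombcore", "target": "state-hub"},
]
view = inspect_domain(entries, "coulombcore")
print([t["id"] for t in view["targets"]])
print([b["id"] for b in view["bridges"]])
```

Raising `KeyError` for an unknown domain is a design choice of this sketch; an implementation could equally return an empty view.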

---

## 2. Target Management

### FR-4 Target Definition

The system shall allow definition of infrastructure targets within domains.

### FR-5 Target Query

The system shall allow retrieval of targets belonging to a domain.

### FR-6 Target Inspection

The system shall allow inspection of metadata associated with a target.

---

## 3. Bridge Definition

### FR-7 Bridge Definition

The system shall allow definition of operations bridges connecting infrastructure contexts.

### FR-8 Bridge Query

The system shall allow retrieval of bridges associated with a target or domain.

### FR-9 Bridge Inspection

The system shall allow inspection of bridge metadata.

---

## 4. Actor Classification

### FR-10 Actor Class Definition

The system shall allow definition of actor classes.

### FR-11 Actor Attribution

The system shall allow bridges to reference actor classes.
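
FR-11 does not say how a bridge references actor classes, and the example bridge schema carries no such field. The sketch below assumes a hypothetical `actor_classes` list on the bridge entry and shows how unresolved references could be detected. The field name, the class names, and the helper are illustrative only.

```python
def unknown_actor_classes(bridge: dict, known_classes: set[str]) -> list[str]:
    """Return the actor classes a bridge references that the catalog does not define."""
    # `actor_classes` is a hypothetical field; the specification does not name one.
    referenced = bridge.get("actor_classes", [])
    return sorted(set(referenced) - known_classes)


known = {"human", "automation"}
bridge = {
    "type": "bridge",
    "id": "state-hub-coulombcore",
    "actor_classes": ["automation", "incident-responder"],
}
print(unknown_actor_classes(bridge, known))
```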

---

## 5. Operational Annotations

### FR-12 Operational Notes

The system shall allow structured annotations associated with domains, targets, and bridges.

### FR-13 Annotation Retrieval

The system shall allow retrieval of annotations associated with infrastructure elements.

---

## 6. Repository Interaction

### FR-14 Catalog Retrieval

The system shall load catalog data from a repository structure.

### FR-15 Catalog Validation

The system shall validate the structure of catalog definitions.
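
A minimal sketch of what FR-15 could look like, assuming every field that appears in the example schemas of the Schemas section is required (the specification does not say which fields are mandatory). `REQUIRED_FIELDS` and `validate_entry` are hypothetical names.

```python
# Required fields per entry type, inferred from the example schemas (an assumption).
REQUIRED_FIELDS = {
    "domain": {"id", "name", "description", "environment"},
    "target": {"id", "domain", "kind", "description", "reachable_via"},
    "bridge": {"id", "domain", "target", "description", "access_method"},
    "actor": {"id", "class", "description"},
}


def validate_entry(entry: dict) -> list[str]:
    """Return human-readable problems found in one catalog entry (empty if none)."""
    entry_type = entry.get("type")
    if entry_type not in REQUIRED_FIELDS:
        return [f"unknown or missing type: {entry_type!r}"]
    missing = sorted(REQUIRED_FIELDS[entry_type] - set(entry))
    label = entry.get("id", "<no id>")
    return [f"{entry_type} {label}: missing field '{name}'" for name in missing]


incomplete_bridge = {
    "type": "bridge",
    "id": "state-hub-coulombcore",
    "domain": "coulombcore",
    "target": "state-hub",
}
for problem in validate_entry(incomplete_bridge):
    print(problem)
```

Returning a list of problems instead of raising on the first one lets a review tool report everything wrong with a pull request in a single pass.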

---

# Schemas

Example schemas are expressed in YAML.

---

## Domain Schema

```yaml
type: domain
id: coulombcore
name: CoulombCore Infrastructure
description: Core infrastructure domain for operational services
environment: production
```

---

## Target Schema

```yaml
type: target
id: state-hub
domain: coulombcore
kind: service
description: Infrastructure state coordination service
reachable_via:
  - state-hub-coulombcore
```

---

## Bridge Schema

```yaml
type: bridge
id: state-hub-coulombcore
domain: coulombcore
target: state-hub
description: Operations bridge for state hub diagnostics
access_method: ssh-reverse
```
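
The target and bridge examples reference each other: the target lists the bridge under `reachable_via`, and the bridge names the target in `target`. A validator might check that these references resolve. The sketch below is one way to do that, with entries modelled as dictionaries. It reports `reachable_via` entries that name a missing bridge or a bridge pointing at a different target, and bridges whose `target` does not exist. `check_cross_references` is a hypothetical helper, and treating these cases as errors is an assumption rather than a stated rule.

```python
def check_cross_references(targets: list[dict], bridges: list[dict]) -> list[str]:
    """Report target/bridge references that do not resolve or that disagree."""
    target_ids = {t["id"] for t in targets}
    bridges_by_id = {b["id"]: b for b in bridges}
    problems = []
    for t in targets:
        for bridge_id in t.get("reachable_via", []):
            bridge = bridges_by_id.get(bridge_id)
            if bridge is None:
                problems.append(f"target {t['id']}: unknown bridge {bridge_id}")
            elif bridge.get("target") != t["id"]:
                problems.append(
                    f"target {t['id']}: bridge {bridge_id} points at {bridge.get('target')}"
                )
    for b in bridges:
        if b.get("target") not in target_ids:
            problems.append(f"bridge {b['id']}: unknown target {b.get('target')}")
    return problems


targets = [{"type": "target", "id": "state-hub", "reachable_via": ["state-hub-coulombcore"]}]
bridges = [{"type": "bridge", "id": "state-hub-coulombcore", "target": "state-hub"}]
print(check_cross_references(targets, bridges))
print(check_cross_references(targets, []))
```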

---

## Actor Schema

```yaml
type: actor
id: agent.claude-remediator
class: automation
description: Automated remediation agent
```

---

# Repository Structure

Recommended repository layout:

```
opscatalog/
  domains/
    coulombcore/
      domain.yaml

      targets/
        state-hub.yaml
        api-server.yaml

      bridges/
        state-hub-coulombcore.yaml

      docs/
        overview.md
        operations.md

  actors/
    human-operators.yaml
    automation-agents.yaml

  schemas/
    domain.schema.yaml
    target.schema.yaml
    bridge.schema.yaml
```

This layout supports both human readability and machine parsing.
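
To illustrate FR-14 against this layout, the sketch below walks the tree with `pathlib` and loads one entry per `.yaml` file, skipping `schemas/`. Several things here are assumptions: that each file holds exactly one entry, that the demo target file sits under its domain directory, and that a hand-written parser for flat `key: value` files with simple lists is enough. A real loader would more likely use a full YAML library. `parse_flat_yaml` and `load_catalog` are hypothetical names.

```python
import tempfile
from pathlib import Path


def parse_flat_yaml(text: str) -> dict:
    """Parse the small YAML subset used in this document's examples.

    Handles `key: value` pairs and `key:` followed by `- item` lines only.
    """
    entry: dict = {}
    current_list = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("- "):
            if current_list is None:
                raise ValueError(f"list item without a key: {raw!r}")
            current_list.append(line[2:].strip())
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'key: value', got {raw!r}")
        value = value.strip()
        if value:
            entry[key.strip()] = value
            current_list = None
        else:
            # A bare `key:` opens a list that the following `- item` lines fill.
            current_list = entry.setdefault(key.strip(), [])
    return entry


def load_catalog(root: Path) -> list[dict]:
    """Load every catalog entry under root, skipping the schemas/ directory."""
    entries = []
    for path in sorted(root.rglob("*.yaml")):
        relative = path.relative_to(root)
        if "schemas" in relative.parts:
            continue
        entry = parse_flat_yaml(path.read_text(encoding="utf-8"))
        entry["source"] = relative.as_posix()  # remember where the entry came from
        entries.append(entry)
    return entries


# Demo: build a one-file catalog in a temporary directory and load it.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "opscatalog"
    target_file = root / "domains" / "coulombcore" / "targets" / "state-hub.yaml"
    target_file.parent.mkdir(parents=True)
    target_file.write_text(
        "type: target\nid: state-hub\ndomain: coulombcore\n"
        "reachable_via:\n  - state-hub-coulombcore\n",
        encoding="utf-8",
    )
    catalog = load_catalog(root)

print(catalog[0]["id"], catalog[0]["reachable_via"], catalog[0]["source"])
```

Recording the `source` path on each entry is a convenience of this sketch: it lets validation messages point back at the file a reviewer needs to fix.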

---

# Appendices

## Appendix A: Operations Notes

Operations notes provide context about real-world infrastructure behavior.

Examples include:

* known debugging entry points
* typical failure modes
* operational shortcuts
* historical incidents
* recommended inspection procedures

Operations notes may be written in structured markdown files stored alongside catalog entries.

---

## Appendix B: Catalog Maintenance Guidelines

Maintaining an effective OpsCatalog requires operational discipline.

Recommended practices include:

* review changes through pull requests
* annotate bridges with operational purpose
* update catalog entries after major infrastructure changes
* document common debugging procedures
* avoid storing secrets in catalog files
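
The last guideline lends itself to an automated check during review. As a rough sketch, a linter could flag keys whose names suggest credentials. The deny-list pattern, the `auth` block in the sample entry, and the function name are all hypothetical, and matching on key names alone will miss secrets stored under innocuous names.

```python
import re

# Hypothetical deny-list; tune it to the naming conventions of the catalog in question.
SECRET_KEY_PATTERN = re.compile(
    r"(password|passwd|secret|token|private_key|api_key)", re.IGNORECASE
)


def find_secret_like_keys(entry: dict, path: str = "") -> list[str]:
    """Return dotted paths of keys whose names suggest they hold credentials."""
    hits = []
    for key, value in entry.items():
        dotted = f"{path}.{key}" if path else str(key)
        if SECRET_KEY_PATTERN.search(str(key)):
            hits.append(dotted)
        if isinstance(value, dict):
            # Recurse into nested mappings so deeply nested keys are also checked.
            hits.extend(find_secret_like_keys(value, dotted))
    return hits


suspicious_bridge = {
    "type": "bridge",
    "id": "state-hub-coulombcore",
    "access_method": "ssh-reverse",
    "auth": {"user": "ops", "ssh_password": "example-only"},
}
print(find_secret_like_keys(suspicious_bridge))
```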

---

## Appendix C: Relationship to OpsBridge

OpsCatalog serves as a **knowledge source for OpsBridge**.

OpsBridge may consume catalog data to:

* resolve bridge identifiers
* display infrastructure orientation
* assist operators in establishing bridges
* provide contextual operational information

The catalog does not control runtime behavior but provides **structured operations intent**.