ops-bridge/wiki/OpsCatalogSpecification.md
2026-03-11 21:29:59 +01:00


OpsCatalogSpecification
*IT Operations Knowledge Repository*
This document is a **structured OpsCatalog specification**, designed as an **extension to OpsBridge**.
It includes:
1. **Why / How / What introduction**
2. **PRD for OpsCatalog**
3. **FRS for OpsCatalog**
4. **Schemas**
5. **Repository structure**
6. **Appendices with operational notes**
---
# OpsCatalog Specification
*Operations Knowledge Repository for Infrastructure Operations*
Version: **0.1**
Status: Draft
Date: 2026-03-11
---
# Introduction
## Why
Modern infrastructure teams operate with two complementary models of reality.
**DevOps Model — Declared Infrastructure**
Infrastructure-as-code systems describe the desired state of systems:
* Terraform
* Kubernetes manifests
* Helm charts
* GitOps pipelines
These systems encode **how infrastructure should behave**.
However, real systems rarely match the declared state perfectly.
Operations teams must deal with:
* incidents
* degraded services
* bottlenecks
* debugging environments
* manual recovery actions
* temporary workarounds
* unexpected interactions
This produces a second model.
**Operations Model — Experienced Infrastructure**
This model captures:
* how operators actually access systems
* which debugging paths exist
* where bottlenecks occur
* which entry points are used for remediation
* which bridges exist between infrastructure components
Most organizations lack a formal system for capturing this operational knowledge.
OpsCatalog exists to address this gap.
---
## How
OpsCatalog introduces a **structured repository for operational knowledge about infrastructure**.
The repository is typically maintained in **Git** and contains structured definitions of:
* operations domains
* infrastructure targets
* operations access bridges
* actor classes
* operations annotations
OpsBridge consumes this catalog to:
* resolve bridges
* orient operators
* guide automation agents
* provide operations context
Git provides several properties that make it suitable for this purpose:
* version history
* collaborative editing
* review workflows
* diffability for humans and agents
* narrative context through commit messages
OpsCatalog stores **experienced operations knowledge**, not runtime state.
---
## What
OpsCatalog defines a **shared operations map of infrastructure**.
It captures:
*Operations Domains*
Logical groupings of related infrastructure, as seen from an operations perspective.
Examples:
* production clusters
* staging environments
* development infrastructure
* incident analysis sandboxes
*Targets*
Infrastructure components relevant to operations.
Examples:
* hosts
* services
* containers
* Kubernetes resources
* debugging entry points
*Bridges*
Operations access paths between systems.
Examples:
* SSH reverse bridges
* debugging entry tunnels
* maintenance access paths
*Operations Notes*
Structured annotations describing:
* debugging procedures
* common incidents
* bottlenecks
* known workarounds
* operations entry points
Together these elements provide a **living operations topology**.
---
# Part 1 — Product Requirements Document (PRD)
## 1. Definition
OpsCatalog is a structured repository that defines **operations knowledge about infrastructure environments**, including domains, targets, bridges, and operations annotations.
It provides a shared operations map used by human operators and automation agents to understand how infrastructure is accessed and maintained in practice.
OpsCatalog complements infrastructure-as-code systems by capturing the **experienced operations topology** rather than the declared infrastructure state.
---
## 2. Context
OpsCatalog operates within environments that already use:
* infrastructure-as-code tools
* automated deployment systems
* identity management systems
* operations monitoring platforms
These systems define and monitor infrastructure but often fail to capture how operators interact with systems during incidents or maintenance.
OpsCatalog fills this gap by providing a **structured operations cognition layer**.
OpsBridge integrates with OpsCatalog to translate catalog definitions into actionable access bridges.
---
## 3. Core Concepts
### Operations Domain
A logical operational boundary representing a group of related infrastructure systems.
Domains help operators navigate complex environments.
---
### Target
An operationally relevant infrastructure component that may be inspected or accessed.
Targets represent entry points for diagnostics and maintenance.
---
### Bridge
A defined operations access path enabling connectivity between infrastructure contexts.
Bridges describe **how targets are accessed**.
---
### Actor Class
A category of operators or automation systems that may interact with infrastructure.
Examples:
* human operators
* remediation agents
* incident responders
---
### Operations Annotation
Structured knowledge describing operations behaviors, known issues, or debugging strategies.
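
The five concepts above can be modelled as plain records. The following Python sketch is one possible in-memory representation, not a prescribed one. Field names follow the example YAML entries in the Schemas section where those exist; the `Annotation` record and its `subject`, `topic`, and `body` fields are assumptions introduced purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    id: str
    name: str
    environment: str = ""
    description: str = ""


@dataclass(frozen=True)
class Target:
    id: str
    domain: str                          # id of the owning Domain
    kind: str                            # e.g. "service" or "host"
    reachable_via: tuple[str, ...] = ()  # ids of Bridges leading to this target
    description: str = ""


@dataclass(frozen=True)
class Bridge:
    id: str
    domain: str
    target: str          # id of the Target this bridge reaches
    access_method: str   # e.g. "ssh-reverse"
    description: str = ""


@dataclass(frozen=True)
class Actor:
    id: str
    actor_class: str     # "class" is a reserved word in Python
    description: str = ""


@dataclass(frozen=True)
class Annotation:
    subject: str         # hypothetical: id of the annotated domain, target, or bridge
    topic: str           # hypothetical: e.g. "bottleneck" or "workaround"
    body: str            # free-text note


# The target and bridge from the Schemas section, expressed with these records.
state_hub = Target(
    id="state-hub",
    domain="coulombcore",
    kind="service",
    reachable_via=("state-hub-coulombcore",),
)
state_hub_bridge = Bridge(
    id="state-hub-coulombcore",
    domain="coulombcore",
    target="state-hub",
    access_method="ssh-reverse",
)
```

Frozen dataclasses are used here because catalog entries describe reviewed, versioned knowledge; changes would normally arrive through a new commit rather than in-place mutation.
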
---
## 4. Scope and Non-Scope
### In Scope
OpsCatalog defines:
* operations domains
* infrastructure targets
* operations bridges
* actor classifications
* operations annotations
* repository structure for catalog storage
---
### Out of Scope
OpsCatalog does not:
* manage infrastructure resources
* maintain runtime infrastructure state
* replace monitoring systems
* replace configuration management systems
* enforce security policies
* store credentials or secrets
These responsibilities remain with external systems.
---
## 5. Practical Implications
OpsCatalog provides several operational advantages.
### Shared operations knowledge
Teams maintain a common understanding of infrastructure access paths.
### Improved incident response
Operators can quickly locate operations entry points.
### Automation enablement
AI agents and automation systems gain structured knowledge about infrastructure navigation.
### Organizational resilience
Operations knowledge becomes versioned and reviewable rather than implicit.
These benefits come at a cost. Keeping the catalog useful requires:
* operations discipline
* periodic review
* updates that track changes to the infrastructure
---
## 6. External Dependencies
OpsCatalog assumes integration with several external systems.
Examples include:
* infrastructure-as-code platforms
* operations access tools such as OpsBridge
* identity systems such as privacyIDEA
* version control systems such as Git
---
## 7. Success Criteria
OpsCatalog is successful if it enables operators and automation agents to:
* locate relevant infrastructure targets quickly
* identify operations access paths
* understand operations context during incidents
* maintain shared operations knowledge across teams
---
# Part 2 — Functional Requirements Specification (FRS)
## 1. Domain Management
### FR-1 Domain Definition
The system shall allow definition of operations domains.
### FR-2 Domain Listing
The system shall allow retrieval of all defined domains.
### FR-3 Domain Inspection
The system shall allow inspection of a specific domain and its associated elements.
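
As a rough illustration of FR-2 and FR-3, the sketch below treats the catalog as a list of dictionaries that have already been parsed from YAML. The function names `list_domains` and `inspect_domain` and the shape of the returned summary are assumptions, not part of the specification.

```python
# Catalog entries as they might look after parsing the example YAML files.
CATALOG = [
    {"type": "domain", "id": "coulombcore",
     "name": "CoulombCore Infrastructure", "environment": "production"},
    {"type": "target", "id": "state-hub", "domain": "coulombcore", "kind": "service"},
    {"type": "bridge", "id": "state-hub-coulombcore", "domain": "coulombcore",
     "target": "state-hub", "access_method": "ssh-reverse"},
]


def list_domains(catalog):
    """FR-2: return the ids of all defined domains, sorted for stable output."""
    return sorted(e["id"] for e in catalog if e["type"] == "domain")


def inspect_domain(catalog, domain_id):
    """FR-3: return the domain entry plus the ids of its targets and bridges."""
    domain = next(
        (e for e in catalog if e["type"] == "domain" and e["id"] == domain_id),
        None,
    )
    if domain is None:
        raise KeyError(f"unknown domain: {domain_id}")
    members = [e for e in catalog if e.get("domain") == domain_id]
    return {
        "domain": domain,
        "targets": [e["id"] for e in members if e["type"] == "target"],
        "bridges": [e["id"] for e in members if e["type"] == "bridge"],
    }
```
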
---
## 2. Target Management
### FR-4 Target Definition
The system shall allow definition of infrastructure targets within domains.
### FR-5 Target Query
The system shall allow retrieval of targets belonging to a domain.
### FR-6 Target Inspection
The system shall allow inspection of metadata associated with a target.
---
## 3. Bridge Definition
### FR-7 Bridge Definition
The system shall allow definition of operations bridges connecting infrastructure contexts.
### FR-8 Bridge Query
The system shall allow retrieval of bridges associated with a target or domain.
### FR-9 Bridge Inspection
The system shall allow inspection of bridge metadata.
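
FR-8 allows bridges to be looked up by target or by domain. A minimal sketch of such a query follows; the `find_bridges` name and its keyword arguments are assumptions, and the second bridge in the sample data is hypothetical (the repository layout lists an `api-server` target but no bridge for it).

```python
BRIDGES = [
    {"type": "bridge", "id": "state-hub-coulombcore", "domain": "coulombcore",
     "target": "state-hub", "access_method": "ssh-reverse"},
    # Hypothetical second bridge, added only to make the filtering visible.
    {"type": "bridge", "id": "api-server-coulombcore", "domain": "coulombcore",
     "target": "api-server", "access_method": "ssh-reverse"},
]


def find_bridges(bridges, *, target=None, domain=None):
    """Return the bridges that match the given target and/or domain."""
    if target is None and domain is None:
        raise ValueError("specify a target, a domain, or both")
    return [
        b for b in bridges
        if (target is None or b["target"] == target)
        and (domain is None or b["domain"] == domain)
    ]
```

Calling `find_bridges(BRIDGES, target="state-hub")` narrows the result to a single bridge, while `find_bridges(BRIDGES, domain="coulombcore")` returns both.
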
---
## 4. Actor Classification
### FR-10 Actor Class Definition
The system shall allow definition of actor classes.
### FR-11 Actor Attribution
The system shall allow bridges to reference actor classes.
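
The specification does not say how a bridge references actor classes. One possible shape, shown below as an assumption, is an `actor_classes` list on the bridge entry; that field name, the `operator.example` actor, and the `intended_for` helper are all hypothetical. Because OpsCatalog does not enforce security policies, the check is descriptive only: it reports who a bridge is documented for, not who is permitted to use it.

```python
ACTORS = {
    "agent.claude-remediator": {"class": "automation"},
    "operator.example": {"class": "human"},  # hypothetical actor
}

# Bridge entry extended with a hypothetical "actor_classes" field.
BRIDGE = {
    "id": "state-hub-coulombcore",
    "target": "state-hub",
    "actor_classes": ["human"],
}


def intended_for(bridge, actor_id, actors):
    """Return True if the actor's class is one the bridge entry references."""
    actor = actors.get(actor_id)
    if actor is None:
        return False
    return actor["class"] in bridge.get("actor_classes", [])
```
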
---
## 5. Operational Annotations
### FR-12 Operational Notes
The system shall allow structured annotations to be attached to domains, targets, and bridges.
### FR-13 Annotation Retrieval
The system shall allow retrieval of annotations associated with infrastructure elements.
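
Annotation retrieval (FR-13) can be sketched as an index keyed by the id of the annotated element. The annotation fields `subject`, `topic`, and `note` are assumptions, since the specification defines no annotation schema, and the note texts are placeholders rather than real operational knowledge.

```python
from collections import defaultdict

ANNOTATIONS = [
    {"subject": "state-hub", "topic": "debugging",
     "note": "placeholder: where to start inspecting"},
    {"subject": "state-hub", "topic": "bottleneck",
     "note": "placeholder: known slow path"},
    {"subject": "state-hub-coulombcore", "topic": "purpose",
     "note": "placeholder: why this bridge exists"},
]


def index_annotations(annotations):
    """Group annotations by the id of the element they describe."""
    index = defaultdict(list)
    for annotation in annotations:
        index[annotation["subject"]].append(annotation)
    return dict(index)


def annotations_for(index, element_id, topic=None):
    """Return annotations for one element, optionally filtered by topic."""
    found = index.get(element_id, [])
    if topic is not None:
        found = [a for a in found if a["topic"] == topic]
    return found
```
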
---
## 6. Repository Interaction
### FR-14 Catalog Retrieval
The system shall load catalog data from a repository structure.
### FR-15 Catalog Validation
The system shall validate the structure of catalog definitions.
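
A structural check in the spirit of FR-15 might look like the sketch below. It assumes that each entry has already been parsed into a dictionary and that the fields listed in `REQUIRED_FIELDS` are mandatory; the specification does not state which fields are required, so that table is a guess based on the example entries.

```python
# Assumed required fields per entry type, inferred from the example entries.
REQUIRED_FIELDS = {
    "domain": {"id", "name", "environment"},
    "target": {"id", "domain", "kind"},
    "bridge": {"id", "domain", "target", "access_method"},
    "actor": {"id", "class"},
}


def validate_entry(entry):
    """Return a list of problems found in one entry; an empty list means valid."""
    if not isinstance(entry, dict):
        return ["entry is not a mapping"]
    entry_type = entry.get("type")
    if entry_type not in REQUIRED_FIELDS:
        return [f"unknown or missing type: {entry_type!r}"]
    required = REQUIRED_FIELDS[entry_type]
    problems = [f"missing field: {name}" for name in sorted(required - set(entry))]
    for name in sorted(required & set(entry)):
        value = entry[name]
        if not isinstance(value, str) or not value:
            problems.append(f"field {name} must be a non-empty string")
    return problems
```

Returning a list of problems, rather than raising on the first one, lets a review tool report every issue in a pull request at once.
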
---
# Schemas
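The examples below are catalog entries written in YAML. They illustrate the fields each entry type carries and are not formal schema definitions.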
---
## Domain Schema
```yaml
type: domain
id: coulombcore
name: CoulombCore Infrastructure
description: Core infrastructure domain for operational services
environment: production
```
---
## Target Schema
```yaml
type: target
id: state-hub
domain: coulombcore
kind: service
description: Infrastructure state coordination service
reachable_via:
  - state-hub-coulombcore
```
---
## Bridge Schema
```yaml
type: bridge
id: state-hub-coulombcore
domain: coulombcore
target: state-hub
description: Operations bridge for state hub diagnostics
access_method: ssh-reverse
```
---
## Actor Schema
```yaml
type: actor
id: agent.claude-remediator
class: automation
description: Automated remediation agent
```
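
The example entries refer to each other by id: the target names its domain and its bridges, and the bridge names its domain and its target. A cross-reference check over parsed entries could be sketched as follows; the `check_references` helper is hypothetical, and the entries are assumed to be dictionaries with the fields shown above.

```python
# The domain, target, and bridge examples above, as parsed dictionaries.
EXAMPLES = [
    {"type": "domain", "id": "coulombcore"},
    {"type": "target", "id": "state-hub", "domain": "coulombcore",
     "reachable_via": ["state-hub-coulombcore"]},
    {"type": "bridge", "id": "state-hub-coulombcore", "domain": "coulombcore",
     "target": "state-hub"},
]


def check_references(entries):
    """Return a list of dangling references between catalog entries."""
    ids = {
        kind: {e["id"] for e in entries if e["type"] == kind}
        for kind in ("domain", "target", "bridge")
    }
    problems = []
    for e in entries:
        if e["type"] in ("target", "bridge") and e["domain"] not in ids["domain"]:
            problems.append(f"{e['id']}: unknown domain {e['domain']}")
        if e["type"] == "target":
            for bridge_id in e.get("reachable_via", []):
                if bridge_id not in ids["bridge"]:
                    problems.append(f"{e['id']}: unknown bridge {bridge_id}")
        if e["type"] == "bridge" and e["target"] not in ids["target"]:
            problems.append(f"{e['id']}: unknown target {e['target']}")
    return problems
```
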
---
# Repository Structure
Recommended repository layout:
```
opscatalog/
  domains/
    coulombcore/
      domain.yaml
      targets/
        state-hub.yaml
        api-server.yaml
      bridges/
        state-hub-coulombcore.yaml
      docs/
        overview.md
        operations.md
  actors/
    human-operators.yaml
    automation-agents.yaml
  schemas/
    domain.schema.yaml
    target.schema.yaml
    bridge.schema.yaml
```
This layout supports both human readability and machine parsing.
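
For machine parsing, a loader can infer the entry type from a file's position in the tree. The sketch below assumes that `targets/` and `bridges/` are nested inside each domain directory and that every file under `actors/` holds actor definitions; both are readings of the layout, not stated rules. The `classify` and `discover` names are hypothetical.

```python
from pathlib import Path

# Directory name inside a domain -> entry type (assumed mapping).
KIND_BY_DIR = {"targets": "target", "bridges": "bridge"}


def classify(path, root):
    """Map a catalog file to (entry_type, domain_id, entry_id), or None."""
    parts = path.relative_to(root).parts
    if not parts or path.suffix != ".yaml":
        return None
    if parts[0] == "domains":
        if len(parts) == 3 and parts[2] == "domain.yaml":
            return ("domain", parts[1], parts[1])
        if len(parts) == 4 and parts[2] in KIND_BY_DIR:
            return (KIND_BY_DIR[parts[2]], parts[1], path.stem)
    if parts[0] == "actors" and len(parts) == 2:
        return ("actor", None, path.stem)
    return None  # schemas, docs, and anything unrecognised


def discover(root):
    """Walk the repository and return every recognised catalog file."""
    root = Path(root)
    found = []
    for path in sorted(root.rglob("*.yaml")):
        info = classify(path, root)
        if info is not None:
            found.append(info)
    return found
```
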
---
# Appendices
## Appendix A — Operations Notes
Operations notes provide context about real-world infrastructure behavior.
Examples include:
* known debugging entry points
* typical failure modes
* operational shortcuts
* historical incidents
* recommended inspection procedures
Operations notes may be written in structured markdown files stored alongside catalog entries.
---
## Appendix B — Catalog Maintenance Guidelines
Maintaining an effective OpsCatalog requires operational discipline.
Recommended practices include:
* review changes through pull requests
* annotate bridges with operational purpose
* update catalog entries after major infrastructure changes
* document common debugging procedures
* avoid storing secrets in catalog files
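
The last guideline can be partly backed by an automated check during review. The following sketch flags keys whose names look like credentials in an already-parsed entry. The pattern list and the `find_secret_like_keys` helper are assumptions; a name-based heuristic like this will miss secrets stored under innocuous keys, so it supplements review rather than replacing it.

```python
import re

# Key names that suggest a credential (heuristic, not exhaustive).
SUSPICIOUS_KEY = re.compile(
    r"(password|passwd|secret|token|api[_-]?key|private[_-]?key)",
    re.IGNORECASE,
)


def find_secret_like_keys(entry, path=""):
    """Return dotted paths of keys whose names look like credentials."""
    hits = []
    if isinstance(entry, dict):
        for key, value in entry.items():
            here = f"{path}.{key}" if path else str(key)
            if SUSPICIOUS_KEY.search(str(key)):
                hits.append(here)
            hits.extend(find_secret_like_keys(value, here))
    elif isinstance(entry, list):
        for i, item in enumerate(entry):
            hits.extend(find_secret_like_keys(item, f"{path}[{i}]"))
    return hits
```
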
---
## Appendix C — Relationship to OpsBridge
OpsCatalog serves as a **knowledge source for OpsBridge**.
OpsBridge may consume catalog data to:
* resolve bridge identifiers
* display infrastructure orientation
* assist operators in establishing bridges
* provide contextual operational information
The catalog does not control runtime behavior but provides **structured operations intent**.
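
How a consumer such as OpsBridge might resolve a bridge identifier into orientation context is sketched below. This is not OpsBridge's actual interface: `resolve_bridge` and `orientation` are hypothetical helpers operating on entries that have already been parsed into dictionaries.

```python
CATALOG = [
    {"type": "domain", "id": "coulombcore",
     "name": "CoulombCore Infrastructure", "environment": "production"},
    {"type": "target", "id": "state-hub", "domain": "coulombcore", "kind": "service"},
    {"type": "bridge", "id": "state-hub-coulombcore", "domain": "coulombcore",
     "target": "state-hub", "access_method": "ssh-reverse"},
]


def resolve_bridge(catalog, bridge_id):
    """Look up a bridge and the target and domain entries it refers to."""
    index = {(e["type"], e["id"]): e for e in catalog}
    bridge = index.get(("bridge", bridge_id))
    if bridge is None:
        raise KeyError(f"unknown bridge: {bridge_id}")
    target = index.get(("target", bridge["target"]))
    domain = index.get(("domain", bridge["domain"]))
    if target is None or domain is None:
        raise LookupError(f"bridge {bridge_id} has a dangling reference")
    return {"bridge": bridge, "target": target, "domain": domain}


def orientation(context):
    """Render a one-line summary an operator could read before connecting."""
    bridge, target, domain = context["bridge"], context["target"], context["domain"]
    return (
        f"{bridge['id']}: {bridge['access_method']} to {target['kind']} "
        f"'{target['id']}' in {domain['name']} ({domain['environment']})"
    )
```

The functions only read catalog data and produce a description; establishing the connection itself stays outside the catalog.
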