OpsCatalogSpecification

*IT Operations Knowledge Repository*

Below is a **structured OpsCatalog specification** designed as an **extension to OpsBridge**.

It includes:

1. **Why / How / What introduction**
2. **PRD for OpsCatalog**
3. **FRS for OpsCatalog**
4. **Schemas**
5. **Repository structure**
6. **Appendices with operational notes**

---

# OpsCatalog Specification

*Operations Knowledge Repository for Infrastructure Operations*

Version: **0.1**
Status: Draft
Date: 2026-03-11

---

# Introduction

## Why

Modern infrastructure teams operate with two complementary models of reality.

**DevOps Model — Declared Infrastructure**

Infrastructure-as-code systems describe the desired state of systems:

* Terraform
* Kubernetes manifests
* Helm charts
* GitOps pipelines

These systems encode **how infrastructure should behave**.

However, real systems rarely match the declared state perfectly.

Operations teams must deal with:

* incidents
* degraded services
* bottlenecks
* debugging environments
* manual recovery actions
* temporary workarounds
* unexpected interactions

This produces a second model.

**Operations Model — Experienced Infrastructure**

This model captures:

* how operators actually access systems
* which debugging paths exist
* where bottlenecks occur
* which entry points are used for remediation
* which bridges exist between infrastructure components

Most organizations lack a formal system for capturing this operational knowledge.

OpsCatalog exists to address this gap.

---

## How

OpsCatalog introduces a **structured repository for operations infrastructure knowledge**.

The repository is typically maintained in **Git** and contains structured definitions of:

* operations domains
* infrastructure targets
* operations access bridges
* actor classes
* operations annotations

OpsBridge consumes this catalog to:

* resolve bridges
* orient operators
* guide automation agents
* provide operations context

Git provides several properties that make it suitable for this purpose:

* version history
* collaborative editing
* review workflows
* diffability for humans and agents
* narrative context through commit messages

OpsCatalog stores **experienced operations knowledge**, not runtime state.

---

## What

OpsCatalog defines a **shared operations map of infrastructure**.

It captures:

*Operations Domains*

Logical spaces representing operations infrastructure areas.

Examples:

* production clusters
* staging environments
* development infrastructure
* incident analysis sandboxes

*Targets*

Infrastructure components relevant to operations.

Examples:

* hosts
* services
* containers
* Kubernetes resources
* debugging entry points

*Bridges*

Operations access paths between systems.

Examples:

* SSH reverse bridges
* debugging entry tunnels
* maintenance access paths

*Operations Notes*

Structured annotations describing:

* debugging procedures
* common incidents
* bottlenecks
* known workarounds
* operations entry points

Together these elements provide a **living operations topology**.

---

# Part 1 — Product Requirements Document (PRD)

## 1. Definition

OpsCatalog is a structured repository that defines **operations knowledge about infrastructure environments**, including domains, targets, bridges, and operations annotations.

It provides a shared operations map used by human operators and automation agents to understand how infrastructure is accessed and maintained in practice.

OpsCatalog complements infrastructure-as-code systems by capturing the **experienced operations topology** rather than the declared infrastructure state.

---

## 2. Context

OpsCatalog operates within environments that already use:

* infrastructure-as-code tools
* automated deployment systems
* identity management systems
* operations monitoring platforms

These systems define and monitor infrastructure but often fail to capture how operators interact with systems during incidents or maintenance.

OpsCatalog fills this gap by providing a **structured operations cognition layer**.

OpsBridge integrates with OpsCatalog to translate catalog definitions into actionable access bridges.

---

## 3. Core Concepts

### Operations Domain

A logical operational boundary representing a group of related infrastructure systems.

Domains help operators navigate complex environments.

---

### Target

An operationally relevant infrastructure component that may be inspected or accessed.

Targets represent entry points for diagnostics and maintenance.

---

### Bridge

A defined operations access path enabling connectivity between infrastructure contexts.

Bridges describe **how targets are accessed**.

---

### Actor Class

A category of operators or automation systems that may interact with infrastructure.

Examples:

* human operators
* remediation agents
* incident responders

---

### Operations Annotation

Structured knowledge describing operations behaviors, known issues, or debugging strategies.

---

## 4. Scope and Non-Scope

### In Scope

OpsCatalog defines:

* operations domains
* infrastructure targets
* operations bridges
* actor classifications
* operations annotations
* repository structure for catalog storage

---

### Out of Scope

OpsCatalog does not:

* manage infrastructure resources
* maintain runtime infrastructure state
* replace monitoring systems
* replace configuration management systems
* enforce security policies
* store credentials or secrets

These responsibilities remain with external systems.

---

## 5. Practical Implications

OpsCatalog provides several operations advantages.

### Shared operations knowledge

Teams maintain a common understanding of infrastructure access paths.

### Improved incident response

Operators can quickly locate operations entry points.

### Automation enablement

AI agents and automation systems gain structured knowledge about infrastructure navigation.

### Organizational resilience

Operations knowledge becomes versioned and reviewable rather than implicit.

However, maintaining the catalog requires:

* operations discipline
* periodic review
* integration with infrastructure evolution

---

## 6. External Dependencies

OpsCatalog assumes integration with several external systems.

Examples include:

* infrastructure-as-code platforms
* operations access tools such as OpsBridge
* identity systems such as privacyIDEA
* version control systems such as Git

---

## 7. Success Criteria

OpsCatalog is successful if it enables operators and automation agents to:

* locate relevant infrastructure targets quickly
* identify operations access paths
* understand operations context during incidents
* maintain shared operations knowledge across teams

---

# Part 2 — Functional Requirements Specification (FRS)

## 1. Domain Management

### FR-1 Domain Definition

The system shall allow definition of operations domains.

### FR-2 Domain Listing

The system shall allow retrieval of all defined domains.

### FR-3 Domain Inspection

The system shall allow inspection of a specific domain and its associated elements.

---

## 2. Target Management

### FR-4 Target Definition

The system shall allow definition of infrastructure targets within domains.

### FR-5 Target Query

The system shall allow retrieval of targets belonging to a domain.

### FR-6 Target Inspection

The system shall allow inspection of metadata associated with a target.

---

## 3. Bridge Definition

### FR-7 Bridge Definition

The system shall allow definition of operations bridges connecting infrastructure contexts.

### FR-8 Bridge Query

The system shall allow retrieval of bridges associated with a target or domain.

### FR-9 Bridge Inspection

The system shall allow inspection of bridge metadata.

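
The query requirements FR-5 and FR-8 can be read as simple filters over parsed catalog entries. The following is a minimal sketch, assuming the entries have already been loaded into dictionaries shaped like the examples in the Schemas section. The function names `targets_in_domain` and `bridges_for` and the fields of the `api-server` entry are illustrative assumptions, not part of the specification.

```python
# Entries as they might look after parsing the catalog's YAML files.
CATALOG = [
    {"type": "domain", "id": "coulombcore", "environment": "production"},
    {"type": "target", "id": "state-hub", "domain": "coulombcore",
     "kind": "service", "reachable_via": ["state-hub-coulombcore"]},
    # Assumed entry: only the file name api-server.yaml appears in the spec.
    {"type": "target", "id": "api-server", "domain": "coulombcore",
     "kind": "service", "reachable_via": []},
    {"type": "bridge", "id": "state-hub-coulombcore", "domain": "coulombcore",
     "target": "state-hub", "access_method": "ssh-reverse"},
]


def targets_in_domain(entries, domain_id):
    """FR-5: return the targets that belong to the given domain."""
    return [e for e in entries
            if e["type"] == "target" and e["domain"] == domain_id]


def bridges_for(entries, *, target_id=None, domain_id=None):
    """FR-8: return bridges matching a target and/or a domain."""
    result = []
    for e in entries:
        if e["type"] != "bridge":
            continue
        if target_id is not None and e["target"] != target_id:
            continue
        if domain_id is not None and e["domain"] != domain_id:
            continue
        result.append(e)
    return result
```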

---

## 4. Actor Classification

### FR-10 Actor Class Definition

The system shall allow definition of actor classes.

### FR-11 Actor Attribution

The system shall allow bridges to reference actor classes.

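
One way FR-11 could look in practice is a list of actor class identifiers on each bridge entry. The sketch below assumes a hypothetical `actor_classes` field, which the Bridge Schema in this document does not define. The class id `automation` comes from the Actor Schema example, while `human` and the second bridge are assumptions made for illustration.

```python
# "automation" appears in the Actor Schema example; "human" is assumed.
KNOWN_ACTOR_CLASSES = {"human", "automation"}

BRIDGES = [
    # `actor_classes` is a hypothetical field, not defined by the spec.
    {"id": "state-hub-coulombcore", "target": "state-hub",
     "actor_classes": ["human", "automation"]},
    # Hypothetical second bridge, included only to show filtering.
    {"id": "api-server-maintenance", "target": "api-server",
     "actor_classes": ["human"]},
]


def bridges_for_actor_class(bridges, actor_class):
    """Return ids of bridges that reference the given actor class."""
    return [b["id"] for b in bridges
            if actor_class in b.get("actor_classes", [])]


def unknown_actor_classes(bridge, known=KNOWN_ACTOR_CLASSES):
    """Return referenced actor classes that are not defined in the catalog."""
    return sorted(set(bridge.get("actor_classes", [])) - set(known))
```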

---

## 5. Operational Annotations

### FR-12 Operational Notes

The system shall allow structured annotations associated with domains, targets, and bridges.

### FR-13 Annotation Retrieval

The system shall allow retrieval of annotations associated with infrastructure elements.

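
This document gives no schema for annotations, so the following is only a sketch of how FR-12 and FR-13 might fit together. The `Annotation` fields (`element_type`, `element_id`, `topic`, `text`) and the note contents are hypothetical. Each annotation is attached to exactly one catalog element, identified by its type and id.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    element_type: str  # "domain", "target", or "bridge"
    element_id: str    # id of the catalog entry the note is attached to
    topic: str         # e.g. "debugging", "workaround" (assumed vocabulary)
    text: str


# Illustrative notes only; the wording is invented for this example.
NOTES = [
    Annotation("target", "state-hub", "debugging",
               "Inspect service logs before restarting."),
    Annotation("bridge", "state-hub-coulombcore", "workaround",
               "Re-establish the bridge if the session goes idle."),
]


def annotations_for(annotations, element_type, element_id):
    """FR-13: return the annotations attached to one catalog element."""
    return [a for a in annotations
            if a.element_type == element_type and a.element_id == element_id]
```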

---

## 6. Repository Interaction

### FR-14 Catalog Retrieval

The system shall load catalog data from a repository structure.

### FR-15 Catalog Validation

The system shall validate the structure of catalog definitions.

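
A structural check in the spirit of FR-15 could start with required fields per entry type. The sketch below assumes that every field shown in the example schemas is mandatory, which this specification does not actually state. `REQUIRED_FIELDS` and `validate_entry` are hypothetical names, and entries are assumed to be already parsed into dictionaries.

```python
# Assumption: every field used in the example schemas is required.
REQUIRED_FIELDS = {
    "domain": {"id", "name", "description", "environment"},
    "target": {"id", "domain", "kind", "description", "reachable_via"},
    "bridge": {"id", "domain", "target", "description", "access_method"},
    "actor": {"id", "class", "description"},
}


def validate_entry(entry):
    """Return a list of problems; an empty list means the entry passed."""
    entry_type = entry.get("type")
    if entry_type not in REQUIRED_FIELDS:
        return [f"unknown or missing type: {entry_type!r}"]
    label = f"{entry_type} {entry.get('id', '?')!r}"
    missing = sorted(REQUIRED_FIELDS[entry_type] - set(entry))
    return [f"{label}: missing field {name!r}" for name in missing]
```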

---

# Schemas

Example schemas are expressed in YAML.

---

## Domain Schema

```yaml
type: domain
id: coulombcore
name: CoulombCore Infrastructure
description: Core infrastructure domain for operational services
environment: production
```

---

## Target Schema

```yaml
type: target
id: state-hub
domain: coulombcore
kind: service
description: Infrastructure state coordination service
reachable_via:
  - state-hub-coulombcore
```

---

## Bridge Schema

```yaml
type: bridge
id: state-hub-coulombcore
domain: coulombcore
target: state-hub
description: Operations bridge for state hub diagnostics
access_method: ssh-reverse
```
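
The target and bridge examples reference each other: the target lists the bridge under `reachable_via`, and the bridge names the target in `target`. A catalog tool could check that such references resolve. The following is a sketch under the assumption that entries are already parsed into dictionaries. `dangling_references` is a hypothetical helper, not something this specification defines.

```python
def dangling_references(targets, bridges):
    """Return (entry_id, field, missing_id) for references that do not resolve."""
    target_ids = {t["id"] for t in targets}
    bridge_ids = {b["id"] for b in bridges}
    problems = []
    # Every bridge listed under a target's reachable_via should exist.
    for t in targets:
        for bridge_id in t.get("reachable_via", []):
            if bridge_id not in bridge_ids:
                problems.append((t["id"], "reachable_via", bridge_id))
    # Every bridge should point at an existing target.
    for b in bridges:
        if b["target"] not in target_ids:
            problems.append((b["id"], "target", b["target"]))
    return problems


# The two example entries from this section, reduced to the relevant fields.
TARGETS = [{"id": "state-hub", "reachable_via": ["state-hub-coulombcore"]}]
BRIDGES = [{"id": "state-hub-coulombcore", "target": "state-hub"}]
```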

---

## Actor Schema

```yaml
type: actor
id: agent.claude-remediator
class: automation
description: Automated remediation agent
```

---

# Repository Structure

Recommended repository layout:

```
opscatalog/
  domains/
    coulombcore/
      domain.yaml

      targets/
        state-hub.yaml
        api-server.yaml

      bridges/
        state-hub-coulombcore.yaml

      docs/
        overview.md
        operations.md

  actors/
    human-operators.yaml
    automation-agents.yaml

  schemas/
    domain.schema.yaml
    target.schema.yaml
    bridge.schema.yaml
```

This layout supports both human readability and machine parsing.

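
As a sketch of the machine-parsing side (FR-14), a loader could first discover entry files by walking the repository. The example assumes that entries are `*.yaml` files anywhere under the catalog root and that `schemas/` holds schema definitions instead of entries. `discover_entries` is a hypothetical helper. Parsing the files is left to a YAML library and is not shown. The demo at the end builds a small throwaway tree to exercise the function.

```python
import tempfile
from pathlib import Path


def discover_entries(root):
    """Return catalog entry files under root as sorted relative POSIX paths."""
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*.yaml")
        # Assumption: schemas/ contains schema definitions, not entries.
        if "schemas" not in p.relative_to(root).parts
    )


# Demo: build a small throwaway tree and list its entry files.
with tempfile.TemporaryDirectory() as tmp:
    catalog_root = Path(tmp) / "opscatalog"
    for rel in ("domains/coulombcore/domain.yaml",
                "actors/human-operators.yaml",
                "schemas/domain.schema.yaml"):
        path = catalog_root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("")
    found = discover_entries(catalog_root)
```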

---

# Appendices

## Appendix A — Operations Notes

Operations notes provide context about real-world infrastructure behavior.

Examples include:

* known debugging entry points
* typical failure modes
* operational shortcuts
* historical incidents
* recommended inspection procedures

Operations notes may be written in structured markdown files stored alongside catalog entries.

---

## Appendix B — Catalog Maintenance Guidelines

Maintaining an effective OpsCatalog requires operational discipline.

Recommended practices include:

* review changes through pull requests
* annotate bridges with operational purpose
* update catalog entries after major infrastructure changes
* document common debugging procedures
* avoid storing secrets in catalog files

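
The last guideline lends itself to an automated check during pull request review. The sketch below is a heuristic only: it flags lines whose key name suggests a credential or that contain a PEM header. The key-name list and the function name `find_possible_secrets` are assumptions. A check like this can miss secrets and can raise false positives, so it supplements human review and does not replace it.

```python
import re

# Key names that hint at credentials (assumed list, extend as needed).
SUSPICIOUS_KEY = re.compile(
    r"^\s*[\w.-]*(password|passwd|secret|token|private_key|api_key)[\w.-]*\s*:",
    re.IGNORECASE,
)
PEM_HEADER = "-----BEGIN"


def find_possible_secrets(text):
    """Return 1-based line numbers that look like they hold a secret."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if SUSPICIOUS_KEY.search(line) or PEM_HEADER in line:
            hits.append(lineno)
    return hits
```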

---

## Appendix C — Relationship to OpsBridge

OpsCatalog serves as a **knowledge source for OpsBridge**.

OpsBridge may consume catalog data to:

* resolve bridge identifiers
* display infrastructure orientation
* assist operators in establishing bridges
* provide contextual operational information

The catalog does not control runtime behavior but provides **structured operations intent**.
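
To illustrate the first two uses, here is a sketch of how a consumer might resolve a bridge identifier into orientation data. It does not describe OpsBridge's actual interface: `resolve_bridge` and the in-memory catalog shape are assumptions, and the lookup only reads catalog data without establishing any connection.

```python
# Catalog entries indexed by id, using the example entries from the Schemas section.
CATALOG = {
    "targets": {
        "state-hub": {"domain": "coulombcore", "kind": "service"},
    },
    "bridges": {
        "state-hub-coulombcore": {
            "domain": "coulombcore",
            "target": "state-hub",
            "access_method": "ssh-reverse",
        },
    },
}


def resolve_bridge(catalog, bridge_id):
    """Return orientation data for a bridge id, or None if it is unknown."""
    bridge = catalog["bridges"].get(bridge_id)
    if bridge is None:
        return None
    target = catalog["targets"].get(bridge["target"], {})
    return {
        "bridge": bridge_id,
        "domain": bridge["domain"],
        "target": bridge["target"],
        "target_kind": target.get("kind"),
        "access_method": bridge["access_method"],
    }
```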