IT Operations Knowledge Repository
Below is a structured OpsCatalog specification designed as an extension to OpsBridge.
It includes:
- Why / How / What introduction
- PRD for OpsCatalog
- FRS for OpsCatalog
- Schemas
- Repository structure
- Appendices with operational notes
OpsCatalog Specification
Operations Knowledge Repository for Infrastructure Operations
Version: 0.1
Status: Draft
Date: 2026-03-11
Introduction
Why
Modern infrastructure teams operate with two complementary models of reality.
DevOps Model — Declared Infrastructure
Infrastructure-as-code systems describe the desired state of systems:
- Terraform
- Kubernetes manifests
- Helm charts
- GitOps pipelines
These systems encode how infrastructure should behave.
However, real systems rarely match the declared state perfectly.
Operations teams must deal with:
- incidents
- degraded services
- bottlenecks
- debugging environments
- manual recovery actions
- temporary workarounds
- unexpected interactions
This produces a second model.
Operations Model — Experienced Infrastructure
This model captures:
- how operators actually access systems
- which debugging paths exist
- where bottlenecks occur
- which entry points are used for remediation
- which bridges exist between infrastructure components
Most organizations lack a formal system for capturing this operational knowledge.
OpsCatalog exists to address this gap.
How
OpsCatalog introduces a structured repository for operational knowledge about infrastructure.
The repository is typically maintained in Git and contains structured definitions of:
- operations domains
- infrastructure targets
- operations access bridges
- actor classes
- operations annotations
OpsBridge consumes this catalog to:
- resolve bridges
- orient operators
- guide automation agents
- provide operations context
Git provides several properties that make it suitable for this purpose:
- version history
- collaborative editing
- review workflows
- diffability for humans and agents
- narrative context through commit messages
OpsCatalog stores experienced operations knowledge, not runtime state.
What
OpsCatalog defines a shared operations map of infrastructure.
It captures:
Operations Domains
Logical spaces that group related infrastructure for operations purposes.
Examples:
- production clusters
- staging environments
- development infrastructure
- incident analysis sandboxes
Targets
Infrastructure components relevant to operations.
Examples:
- hosts
- services
- containers
- Kubernetes resources
- debugging entry points
Bridges
Operations access paths between systems.
Examples:
- SSH reverse bridges
- debugging entry tunnels
- maintenance access paths
Operations Notes
Structured annotations describing:
- debugging procedures
- common incidents
- bottlenecks
- known workarounds
- operations entry points
Together these elements provide a living operations topology.
Part 1 — Product Requirements Document (PRD)
1. Definition
OpsCatalog is a structured repository that defines operations knowledge about infrastructure environments, including domains, targets, bridges, and operations annotations.
It provides a shared operations map used by human operators and automation agents to understand how infrastructure is accessed and maintained in practice.
OpsCatalog complements infrastructure-as-code systems by capturing the experienced operations topology rather than the declared infrastructure state.
2. Context
OpsCatalog operates within environments that already use:
- infrastructure-as-code tools
- automated deployment systems
- identity management systems
- operations monitoring platforms
These systems define and monitor infrastructure but often fail to capture how operators interact with systems during incidents or maintenance.
OpsCatalog fills this gap by providing a structured operations cognition layer.
OpsBridge integrates with OpsCatalog to translate catalog definitions into actionable access bridges.
3. Core Concepts
Operations Domain
A logical operational boundary representing a group of related infrastructure systems.
Domains help operators navigate complex environments.
Target
An operationally relevant infrastructure component that may be inspected or accessed.
Targets represent entry points for diagnostics and maintenance.
Bridge
A defined operations access path enabling connectivity between infrastructure contexts.
Bridges describe how targets are accessed.
Actor Class
A category of operators or automation systems that may interact with infrastructure.
Examples:
- human operators
- remediation agents
- incident responders
Operations Annotation
Structured knowledge describing operations behaviors, known issues, or debugging strategies.
4. Scope and Non-Scope
In Scope
OpsCatalog defines:
- operations domains
- infrastructure targets
- operations bridges
- actor classifications
- operations annotations
- repository structure for catalog storage
Out of Scope
OpsCatalog does not:
- manage infrastructure resources
- maintain runtime infrastructure state
- replace monitoring systems
- replace configuration management systems
- enforce security policies
- store credentials or secrets
These responsibilities remain with external systems.
5. Practical Implications
OpsCatalog provides several operational advantages.
Shared operations knowledge
Teams maintain a common understanding of infrastructure access paths.
Improved incident response
Operators can quickly locate operations entry points.
Automation enablement
AI agents and automation systems gain structured knowledge about infrastructure navigation.
Organizational resilience
Operations knowledge becomes versioned and reviewable rather than implicit.
However, these advantages hold only while the catalog is maintained, which requires:
- operations discipline
- periodic review
- updating entries as the infrastructure evolves
6. External Dependencies
OpsCatalog assumes integration with several external systems.
Examples include:
- infrastructure-as-code platforms
- operations access tools such as OpsBridge
- identity systems such as privacyIDEA
- version control systems such as Git
7. Success Criteria
OpsCatalog is successful if it enables operators and automation agents to:
- locate relevant infrastructure targets quickly
- identify operations access paths
- understand operations context during incidents
- maintain shared operations knowledge across teams
Part 2 — Functional Requirements Specification (FRS)
1. Domain Management
FR-1 Domain Definition
The system shall allow definition of operations domains.
FR-2 Domain Listing
The system shall allow retrieval of all defined domains.
FR-3 Domain Inspection
The system shall allow inspection of a specific domain and its associated elements.
2. Target Management
FR-4 Target Definition
The system shall allow definition of infrastructure targets within domains.
FR-5 Target Query
The system shall allow retrieval of targets belonging to a domain.
FR-6 Target Inspection
The system shall allow inspection of metadata associated with a target.
3. Bridge Management
FR-7 Bridge Definition
The system shall allow definition of operations bridges connecting infrastructure contexts.
FR-8 Bridge Query
The system shall allow retrieval of bridges associated with a target or domain.
FR-9 Bridge Inspection
The system shall allow inspection of bridge metadata.
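As an illustration of FR-8, the following is a minimal sketch of a bridge query over an in-memory list, not a definitive implementation. The `Bridge` class and the `bridges_for` function are hypothetical names; the fields mirror the bridge schema example in this specification, and the second catalog entry is invented only to make the filter visible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bridge:
    """Mirrors the fields of the bridge schema example (assumed field set)."""
    id: str
    domain: str
    target: str
    access_method: str
    description: str = ""


def bridges_for(bridges, *, target=None, domain=None):
    """Return the bridges that match the given target and/or domain."""
    if target is None and domain is None:
        raise ValueError("provide a target, a domain, or both")
    return [
        b for b in bridges
        if (target is None or b.target == target)
        and (domain is None or b.domain == domain)
    ]


CATALOG = [
    Bridge("state-hub-coulombcore", "coulombcore", "state-hub", "ssh-reverse"),
    # Hypothetical entry, not part of the specification's examples.
    Bridge("api-server-coulombcore", "coulombcore", "api-server", "ssh-reverse"),
]

if __name__ == "__main__":
    for bridge in bridges_for(CATALOG, target="state-hub"):
        print(bridge.id, bridge.access_method)
```

Filtering by keyword arguments keeps one entry point for both query forms named in FR-8; a real catalog might index bridges by target instead of scanning a list.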
4. Actor Classification
FR-10 Actor Class Definition
The system shall allow definition of actor classes.
FR-11 Actor Attribution
The system shall allow bridges to reference actor classes.
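FR-11 could be checked along the following lines. This is a sketch under stated assumptions: the `actor_classes` field on a bridge is hypothetical (the bridge schema example does not define how actor classes are referenced), and the class name `human` is invented for the demonstration.

```python
ACTORS = [
    {
        "type": "actor",
        "id": "agent.claude-remediator",
        "class": "automation",
        "description": "Automated remediation agent",
    },
]

BRIDGE = {
    "type": "bridge",
    "id": "state-hub-coulombcore",
    "domain": "coulombcore",
    "target": "state-hub",
    "access_method": "ssh-reverse",
    # Hypothetical field: not present in the bridge schema example.
    "actor_classes": ["automation", "human"],
}


def declared_actor_classes(actors):
    """Collect the classes declared by the actor entries."""
    return {actor["class"] for actor in actors}


def unresolved_actor_classes(bridge, actors):
    """Return actor classes a bridge references that no actor entry declares."""
    referenced = set(bridge.get("actor_classes", []))
    return sorted(referenced - declared_actor_classes(actors))


if __name__ == "__main__":
    print(unresolved_actor_classes(BRIDGE, ACTORS))
```

A bridge without the field yields an empty result, so entries written before actor attribution was introduced would still pass this check.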
5. Operations Annotations
FR-12 Annotation Definition
The system shall allow definition of structured annotations attached to domains, targets, and bridges.
FR-13 Annotation Retrieval
The system shall allow retrieval of annotations associated with infrastructure elements.
6. Repository Interaction
FR-14 Catalog Retrieval
The system shall load catalog data from a repository structure.
FR-15 Catalog Validation
The system shall validate the structure of catalog definitions.
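One possible reading of FR-15 is a per-type check for required fields. The sketch below assumes entries have already been parsed into Python dictionaries; `REQUIRED_FIELDS` and `validate_entry` are hypothetical names, and the choice of which fields are mandatory is an assumption drawn from the YAML examples in the Schemas section, not a rule stated by this specification.

```python
# Assumed required fields per entry type, inferred from the example entries.
REQUIRED_FIELDS = {
    "domain": {"id", "name", "environment"},
    "target": {"id", "domain", "kind"},
    "bridge": {"id", "domain", "target", "access_method"},
    "actor": {"id", "class"},
}


def validate_entry(entry):
    """Return a list of problems; an empty list means structurally valid."""
    if not isinstance(entry, dict):
        return ["entry must be a mapping"]
    kind = entry.get("type")
    if not isinstance(kind, str) or kind not in REQUIRED_FIELDS:
        return [f"unknown or missing type: {kind!r}"]
    missing = sorted(REQUIRED_FIELDS[kind] - set(entry))
    return [f"{kind} is missing field: {name}" for name in missing]


if __name__ == "__main__":
    incomplete_bridge = {"type": "bridge", "id": "state-hub-coulombcore"}
    for problem in validate_entry(incomplete_bridge):
        print(problem)
```

Returning a list of problems rather than raising on the first one lets a review workflow report every defect in a pull request at once.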
Schemas
The schemas are illustrated by example entries expressed in YAML.
Domain Schema
type: domain
id: coulombcore
name: CoulombCore Infrastructure
description: Core infrastructure domain for operational services
environment: production
Target Schema
type: target
id: state-hub
domain: coulombcore
kind: service
description: Infrastructure state coordination service
reachable_via:
- state-hub-coulombcore
Bridge Schema
type: bridge
id: state-hub-coulombcore
domain: coulombcore
target: state-hub
description: Operations bridge for state hub diagnostics
access_method: ssh-reverse
Actor Schema
type: actor
id: agent.claude-remediator
class: automation
description: Automated remediation agent
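The example entries above refer to each other by id: the target names its domain and lists bridges under reachable_via, and the bridge names its domain and target. A catalog consumer might verify that these references resolve. The following sketch assumes the entries are already parsed into dictionaries; `check_references` is a hypothetical helper, not part of OpsCatalog or OpsBridge.

```python
ENTRIES = [
    {"type": "domain", "id": "coulombcore", "environment": "production"},
    {"type": "target", "id": "state-hub", "domain": "coulombcore",
     "kind": "service", "reachable_via": ["state-hub-coulombcore"]},
    {"type": "bridge", "id": "state-hub-coulombcore", "domain": "coulombcore",
     "target": "state-hub", "access_method": "ssh-reverse"},
]


def check_references(entries):
    """Return a list of dangling references found across catalog entries."""
    ids = {"domain": set(), "target": set(), "bridge": set()}
    for entry in entries:
        if entry.get("type") in ids:
            ids[entry["type"]].add(entry["id"])

    problems = []
    for entry in entries:
        kind, entry_id = entry.get("type"), entry.get("id")
        domain = entry.get("domain")
        if kind in ("target", "bridge") and domain not in ids["domain"]:
            problems.append(f"{kind} {entry_id}: unknown domain {domain}")
        if kind == "target":
            for bridge_id in entry.get("reachable_via", []):
                if bridge_id not in ids["bridge"]:
                    problems.append(f"target {entry_id}: unknown bridge {bridge_id}")
        if kind == "bridge" and entry.get("target") not in ids["target"]:
            problems.append(f"bridge {entry_id}: unknown target {entry.get('target')}")
    return problems


if __name__ == "__main__":
    without_bridge = [e for e in ENTRIES if e["type"] != "bridge"]
    print(check_references(ENTRIES))
    print(check_references(without_bridge))
```

Collecting all ids in a first pass means the check does not depend on the order in which files are loaded.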
Repository Structure
Recommended repository layout:
opscatalog/
  domains/
    coulombcore/
      domain.yaml
      targets/
        state-hub.yaml
        api-server.yaml
      bridges/
        state-hub-coulombcore.yaml
      docs/
        overview.md
        operations.md
  actors/
    human-operators.yaml
    automation-agents.yaml
  schemas/
    domain.schema.yaml
    target.schema.yaml
    bridge.schema.yaml
This layout supports both human readability and machine parsing.
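To show how the layout lends itself to machine parsing, here is a sketch that indexes a checkout by path alone. It assumes that targets/ and bridges/ sit inside each domain directory next to domain.yaml and that file names match entry ids; `index_catalog` is a hypothetical helper, and parsing the YAML content is left to whatever YAML library the consumer uses.

```python
from pathlib import Path


def index_catalog(root):
    """Map each domain directory to the target and bridge ids found under it."""
    root = Path(root)
    index = {}
    # A directory counts as a domain only if it contains a domain.yaml file.
    for domain_file in sorted(root.glob("domains/*/domain.yaml")):
        domain_dir = domain_file.parent
        index[domain_dir.name] = {
            "targets": sorted(p.stem for p in domain_dir.glob("targets/*.yaml")),
            "bridges": sorted(p.stem for p in domain_dir.glob("bridges/*.yaml")),
        }
    return index


if __name__ == "__main__":
    # Assumes the current directory contains an opscatalog/ checkout.
    for domain, content in index_catalog("opscatalog").items():
        print(domain, content["targets"], content["bridges"])
```

Because the index is derived from paths, it can be built without reading any file, which keeps orientation queries cheap even for large catalogs.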
Appendices
Appendix A — Operations Notes
Operations notes provide context about real-world infrastructure behavior.
Examples include:
- known debugging entry points
- typical failure modes
- operational shortcuts
- historical incidents
- recommended inspection procedures
Operations notes may be written as structured Markdown files stored alongside catalog entries.
Appendix B — Catalog Maintenance Guidelines
Maintaining an effective OpsCatalog requires operational discipline.
Recommended practices include:
- review changes through pull requests
- annotate bridges with operational purpose
- update catalog entries after major infrastructure changes
- document common debugging procedures
- avoid storing secrets in catalog files
Appendix C — Relationship to OpsBridge
OpsCatalog serves as a knowledge source for OpsBridge.
OpsBridge may consume catalog data to:
- resolve bridge identifiers
- display infrastructure orientation
- assist operators in establishing bridges
- provide contextual operational information
The catalog does not control runtime behavior; it provides structured operations knowledge that OpsBridge can act on.