InfoTechCanon Observability Model
Short Name: ITC-OBS
Document Status: Seed Standard Release Candidate 1
Version: RC1-seed
Date: 2026-05-23
Repository Context: info-tech-canon
Document Type: InfoTechCanon Domain Standard
Intended Audience: SREs, platform engineers, DevSecOps teams, service owners, observability engineers, incident responders, network operators, security analysts, product owners, governance designers, knowledge-system builders, and agentic tooling.
1. Purpose
The InfoTechCanon Observability Model defines a canonical seed model for representing telemetry, signals, events, logs, metrics, traces, profiles, health, service levels, alerts, incidents as observed phenomena, dashboards, runbooks, investigations, and operational evidence.
It exists to make runtime understanding interoperable across systems, services, platforms, networks, security, delivery pipelines, data products, and agentic operations.
This standard provides a canonical vocabulary for:
- telemetry sources,
- resources,
- signals,
- metrics,
- logs,
- events,
- traces,
- spans,
- profiles,
- exemplars,
- attributes,
- dimensions,
- correlation context,
- service level indicators,
- service level objectives,
- error budgets,
- health states,
- alerts,
- notifications,
- incidents,
- investigations,
- dashboards,
- runbooks,
- observability evidence,
- and feedback loops.
2. Position in InfoTechCanon
The Observability Model is a domain standard within InfoTechCanon.
It depends on the existing seed standards as follows:
Landscape = services, runtime resources, environments, endpoints, workloads.
Organization = owners, on-call actors, responders, teams, accountable roles.
Governance = policies, controls, evidence, reviews, assurance, obligations.
Task = incident work, remediation work, investigation, follow-up tasks.
Tagging = lightweight classification of signals, alerts, incidents, dashboards.
Access Control = access to telemetry, dashboards, logs, admin actions, incident tools.
Security = security signals, detections, alerts, incidents, forensic evidence.
Data = telemetry as data, retention, classification, quality, lineage.
DevSecOps = deployment events, delivery metrics, verification, change failures.
Network = flow logs, reachability tests, network metrics, DNS logs, latency.
Observability = signals, telemetry, correlation, health, SLOs, alerts, operational evidence.
InfoTechCanon
├── InfoTechCanonCore
├── InfoTechCanonLandscapeModel
├── InfoTechCanonOrganizationModel
├── InfoTechCanonGovernanceModel
├── InfoTechCanonTaskModel
├── InfoTechCanonTaggingStandard
├── InfoTechCanonAccessControlModel
├── InfoTechCanonSecurityModel
├── InfoTechCanonDataModel
├── InfoTechCanonDevSecOpsModel
├── InfoTechCanonNetworkModel
├── InfoTechCanonObservabilityModel <-- this standard
├── InfoTechCanonPatternLanguage
└── Application Profiles
3. Boundary with Adjacent Standards
3.1 Boundary with Landscape
The Landscape Model owns the entities being observed:
ApplicationService
TechnicalService
RuntimeWorkload
Environment
Endpoint
NetworkEntity
DataStore
DeploymentRecord
The Observability Model owns telemetry and signals about those entities:
Metric
LogRecord
Trace
Span
Event
Profile
Alert
HealthState
SLI
SLO
Dashboard
IncidentSignal
Boundary rule:
Landscape owns what exists.
Observability owns what is observed, measured, correlated, alerted, and evidenced.
3.2 Boundary with Security
The Security Model owns security interpretation:
SecurityFinding
Detection
SecurityIncident
Threat
AttackPath
SecurityEvidence
Observability owns the telemetry substrate and the operational signals derived from it.
Example:
LogRecord may be evidence for SecurityFinding.
SecurityDetection may be derived from ObservabilitySignal.
SecurityIncident may reference Alert, Trace, LogRecord, or Event.
3.3 Boundary with Governance
Governance owns policies, controls, evidence, reviews, assurance, and compliance claims.
Observability provides evidence and indicators.
Example:
SLOEvidence supports ServiceReview.
Metric supports ControlResult.
AlertPolicy implements Governance Policy.
3.4 Boundary with Task
Task owns work semantics.
Observability creates or references tasks:
Alert creates IncidentTask
Incident creates RemediationTask
Investigation creates FollowUpTask
SLOBurn creates ReliabilityTask
3.5 Boundary with DevSecOps
DevSecOps owns delivery events and deployment records.
Observability owns runtime signals used to verify deployments and measure change impact.
Example:
DeploymentRecord produces DeploymentEvent
DeploymentHealthSignal verifies DeploymentRecord
ChangeFailure detected_by ObservabilitySignal
3.6 Boundary with Data
Data owns dataset, classification, lineage, quality, and retention semantics.
Observability telemetry may itself be data, but Observability owns telemetry-specific semantics.
Example:
LogDataset classified_as Restricted
MetricStream has_retention RetentionRuleReference
TraceSample derived_from RuntimeWorkload
4. Research Basis and External Alignment
This seed standard draws on several mature observability and operations bodies of knowledge.
4.1 OpenTelemetry
OpenTelemetry provides a broad observability framework covering traces, metrics, logs, baggage, resources, semantic conventions, instrumentation, collection, and export. Its semantic conventions define common attributes that give meaning to telemetry across systems.
4.2 SRE and Service Level Objectives
SRE practice distinguishes Service Level Indicators, Service Level Objectives, Service Level Agreements, and error budgets. It emphasizes that SLOs should measure user-relevant reliability and guide operational decision-making.
4.3 Prometheus and OpenMetrics
Prometheus and OpenMetrics influence metric naming, metric exposition, labels, time series, counters, gauges, histograms, summaries, and scraping/pull-based metric collection.
4.4 CloudEvents
CloudEvents standardizes common event metadata for interoperability across services, platforms, and systems. It is a strong mapping target for event structure and routing metadata.
4.5 IT Operations and Incident Management
IT operations practice distinguishes alerts, incidents, problems, changes, runbooks, on-call, escalation, resolution, and post-incident review. The Observability Model provides signal semantics while Task and Governance own work and decision semantics.
4.6 AIOps and Event Correlation
AIOps practice emphasizes correlation, anomaly detection, event deduplication, root-cause analysis, topology-aware alerting, and automated remediation. These are advanced profiles rather than mandatory core concepts.
5. Seed Standard Design Stance
This standard is a seed standard, not a vendor-specific observability schema.
It shall:
- define canonical observability semantics,
- distinguish telemetry, signal, event, log, metric, trace, span, profile, alert, and incident,
- support OpenTelemetry alignment without being limited to it,
- support SLOs, SLIs, and error budgets,
- support correlation across services, runtime, network, security, data, and delivery,
- support operational evidence and feedback loops,
- support human and agentic operations,
- map to external standards and tools without becoming subordinate to them,
- remain markdown-first and agent-retrievable,
- and support future assimilation of observability tools, standards, and practices.
6. Scope
6.1 In Scope
This standard covers canonical representation of:
- telemetry,
- telemetry sources,
- observed resources,
- observability signals,
- metrics,
- time series,
- metric points,
- metric instruments,
- logs,
- log records,
- events,
- event envelopes,
- traces,
- spans,
- span links,
- trace context,
- profiles,
- exemplars,
- attributes,
- dimensions,
- labels,
- correlation context,
- service-level indicators,
- service-level objectives,
- service-level agreements as references,
- error budgets,
- burn rates,
- health states,
- alert rules,
- alerts,
- notifications,
- alert routes,
- incidents as observed operational objects,
- investigations,
- dashboards,
- runbooks,
- telemetry pipelines,
- collectors,
- exporters,
- sampling,
- retention,
- and observability evidence.
6.2 Out of Scope
This standard does not fully define:
- all monitoring tool schemas,
- all incident-management process details,
- all SRE organizational practice,
- complete AIOps algorithms,
- all logging formats,
- all SIEM detection content,
- full OpenTelemetry SDK implementation,
- all Prometheus query semantics,
- complete data-retention law,
- complete security incident-response methodology,
- or every vendor-specific telemetry backend.
Those may be mapped, assimilated, profiled, or handled by adjacent standards.
7. Normative Language
The following terms are used normatively:
- SHALL indicates a mandatory rule for conformance.
- SHOULD indicates a recommended practice.
- MAY indicates an optional capability.
- MUST NOT indicates a prohibited practice.
- SEED marks a concept defined provisionally here but open to later refinement.
- EXTRACT marks a concept that may later move to a more specialized standard.
8. Core Principles
8.1 Observability Is More Than Monitoring
Monitoring checks known conditions.
Observability supports understanding system behavior, including unknown or emergent failure modes, through signals and correlation.
8.2 Telemetry Is Not Insight
Raw telemetry becomes useful through context, correlation, aggregation, interpretation, and action.
8.3 Signal Is Not Incident
A signal, alert, or event may indicate a possible problem.
An incident is an operationally relevant situation requiring response.
8.4 Alert Is Not Evidence by Itself
An alert indicates that a rule fired or condition was detected.
Evidence SHOULD include the underlying signals, query, thresholds, state, and context.
8.5 Metrics, Logs, Traces, Events, and Profiles Are Distinct
Each signal type has different strengths and should not be collapsed into one generic “event” concept.
8.6 Service Levels Must Be Explicit
SLIs, SLOs, and error budgets SHOULD be modeled explicitly when reliability is important.
8.7 Correlation Requires Identity
Telemetry SHOULD be linked to canonical landscape entities, deployment records, network endpoints, data resources, or security entities where possible.
8.8 Observability Must Support Feedback
Observability SHOULD feed tasks, incidents, governance reviews, deployment verification, security detection, reliability improvement, and standard evolution.
8.9 External Standards Are Mapped, Not Obeyed
The Observability Model MAY map to OpenTelemetry, Prometheus, OpenMetrics, CloudEvents, SRE SLO concepts, ITIL incident practices, and vendor schemas.
It MUST NOT subordinate its internal semantics to any single external model.
9. Canonical Seed Metadata
Every observability artifact SHOULD support structured metadata.
Recommended front matter:
---
id: itc-obs:Metric
type: concept
standard: InfoTechCanonObservabilityModel
standard_version: RC1-seed
status: candidate
canonical_owner: InfoTechCanonObservabilityModel
preferred_label: Metric
related:
- itc-obs:TimeSeries
- itc-obs:SLI
- itc-obs:AlertRule
mappings:
- itc-map:metric-to-opentelemetry-metric
---
Recommended artifact statuses:
idea
draft
candidate
release-candidate
adopted
stable
deprecated
retired
Recommended concept statuses:
proposed
experimental
candidate
canonical
deprecated
retired
10. Root Observability Taxonomy
ObservabilityEntity
├── TelemetryEntity
│ ├── Telemetry
│ ├── TelemetrySource
│ ├── ObservedResource
│ ├── ResourceAttribute
│ ├── Signal
│ ├── SignalSource
│ └── TelemetryPipeline
├── MetricEntity
│ ├── Metric
│ ├── MetricInstrument
│ ├── TimeSeries
│ ├── MetricPoint
│ ├── Counter
│ ├── Gauge
│ ├── Histogram
│ ├── Summary
│ └── Exemplar
├── LogEntity
│ ├── Log
│ ├── LogRecord
│ ├── LogStream
│ ├── LogLevel
│ ├── LogContext
│ └── StructuredLogField
├── TraceEntity
│ ├── Trace
│ ├── Span
│ ├── SpanEvent
│ ├── SpanLink
│ ├── TraceContext
│ ├── Baggage
│ └── TraceSample
├── EventEntity
│ ├── Event
│ ├── EventEnvelope
│ ├── EventSource
│ ├── EventType
│ ├── EventConsumer
│ └── EventCorrelationKey
├── ProfileEntity
│ ├── Profile
│ ├── ProfilingSample
│ ├── ResourceProfile
│ └── PerformanceProfile
├── ReliabilityEntity
│ ├── SLI
│ ├── SLO
│ ├── SLAReference
│ ├── ErrorBudget
│ ├── BurnRate
│ ├── HealthState
│ └── AvailabilityWindow
├── AlertingEntity
│ ├── AlertRule
│ ├── Alert
│ ├── Notification
│ ├── AlertRoute
│ ├── AlertSuppression
│ ├── AlertCorrelation
│ └── EscalationReference
├── OperationsEntity
│ ├── ObservedIncident
│ ├── Investigation
│ ├── Timeline
│ ├── Runbook
│ ├── Dashboard
│ ├── OperationalView
│ └── PostIncidentObservation
└── EvidenceEntity
├── ObservabilityEvidence
├── Query
├── QueryResult
├── Snapshot
├── Annotation
├── Correlation
└── RootCauseHypothesis
11. Core Concepts
11.1 ObservabilityEntity
An ObservabilityEntity is any identifiable concept used to represent telemetry, signals, correlation, health, service levels, alerts, incidents as observed phenomena, dashboards, runbooks, or operational evidence.
Recommended attributes:
id:
entity_type:
canonical_name:
display_name:
lifecycle_state:
source_system:
created_at:
updated_at:
Optional attributes:
owner:
steward:
observed_resource:
service:
environment:
source_confidence:
valid_from:
valid_to:
tags:
external_references:
11.2 Telemetry
Telemetry is machine-generated or manually recorded operational data about system behavior, state, performance, events, or activity.
Examples:
metric sample
log record
trace span
event
profile sample
flow record
health check result
11.3 TelemetrySource
A TelemetrySource is a system, component, agent, collector, service, device, pipeline, or actor that emits or provides telemetry.
11.4 ObservedResource
An ObservedResource is the entity about which telemetry is emitted or collected.
Observed resources SHOULD map to Landscape, Network, Data, Security, or DevSecOps entities where possible.
11.5 ResourceAttribute
A ResourceAttribute is an attribute describing an observed resource.
Examples:
service.name
service.version
deployment.environment
host.name
cloud.region
k8s.cluster.name
container.image.name
11.6 Signal
A Signal is an interpretable unit or stream of observability data.
Signal types include:
metric
log
trace
event
profile
alert
health check
synthetic result
11.7 SignalSource
A SignalSource is the origin of a signal.
11.8 TelemetryPipeline
A TelemetryPipeline is a flow that collects, processes, transforms, samples, enriches, routes, stores, or exports telemetry.
11.9 Metric
A Metric is a measurement of a system, service, resource, or process over time.
Metrics may be used for alerting, dashboards, SLOs, capacity planning, anomaly detection, and evidence.
11.10 MetricInstrument
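A MetricInstrument defines how a measurement is recorded and aggregated, for example whether values only accumulate, can rise and fall, or are grouped into a distribution.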
Seed instrument types:
counter
gauge
histogram
summary
up_down_counter
observable_gauge
11.11 TimeSeries
A TimeSeries is a sequence of metric points over time for a metric and a set of dimensions or labels.
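The identity rule in this definition (one metric plus one label set) is what lets a backend group raw points into series. A minimal Python sketch, assuming points arrive as simple records; `MetricPoint`, `series_key`, and `group_into_series` are hypothetical names, not part of any library:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class MetricPoint:
    metric: str       # metric name, e.g. "http_requests_total"
    labels: dict      # dimensions / labels identifying the series
    timestamp: float  # seconds since the epoch
    value: float


def series_key(point: MetricPoint) -> tuple:
    """TimeSeries identity: metric name plus the label set (order-insensitive)."""
    return (point.metric, tuple(sorted(point.labels.items())))


def group_into_series(points):
    """Group MetricPoints into TimeSeries, each sorted by timestamp."""
    series = defaultdict(list)
    for p in points:
        series[series_key(p)].append(p)
    for pts in series.values():
        pts.sort(key=lambda p: p.timestamp)
    return dict(series)


points = [
    MetricPoint("http_requests_total", {"method": "GET", "code": "200"}, 2.0, 12),
    MetricPoint("http_requests_total", {"code": "200", "method": "GET"}, 1.0, 10),
    MetricPoint("http_requests_total", {"method": "GET", "code": "500"}, 1.0, 1),
]
series = group_into_series(points)
```

Because the label set is sorted before it is used as a key, `{code, method}` and `{method, code}` land in the same series. The same rule explains cardinality growth: every new label value creates a new TimeSeries.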
11.12 MetricPoint
A MetricPoint is a single measurement value at a time.
11.13 Counter
A Counter is a cumulative measurement of occurrences or accumulated quantity that only increases, except when it resets to zero (for example, after a process restart).
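Counters are usually consumed as increases or rates rather than raw values. The Python sketch below shows one way to compute both from ordered `(timestamp, value)` samples while treating any drop in value as a reset. It is similar in spirit to Prometheus `increase()` and `rate()` but deliberately skips their extrapolation to window boundaries; the function names are hypothetical.

```python
def counter_increase(samples):
    """Total increase of a Counter over ordered (timestamp, value) samples.

    A drop in value is treated as a reset (e.g. process restart): the
    post-reset value is counted as fresh increase from zero.
    """
    total = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        if curr >= prev:
            total += curr - prev
        else:
            total += curr  # reset: counter restarted from zero
    return total


def counter_rate(samples):
    """Average per-second rate across the sampled interval."""
    if len(samples) < 2:
        return 0.0
    elapsed = samples[-1][0] - samples[0][0]
    return counter_increase(samples) / elapsed if elapsed > 0 else 0.0


# Reset between t=10 and t=20: 100 -> 130 (+30), reset to 5 (+5), 5 -> 25 (+20).
samples = [(0, 100), (10, 130), (20, 5), (30, 25)]
increase = counter_increase(samples)
rate = counter_rate(samples)
```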
11.14 Gauge
A Gauge is a measurement that can go up or down.
11.15 Histogram
A Histogram is a distribution of measurements across buckets or ranges.
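Histograms are typically queried for percentiles by interpolating within cumulative buckets. This Python sketch follows the linear-interpolation approach used by Prometheus `histogram_quantile()`, assuming buckets are given as sorted `(upper_bound, cumulative_count)` pairs ending with `+Inf`; it does not handle negative bucket bounds, and the bucket values are illustrative.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q (0..1) from cumulative histogram buckets.

    Assumes observations are spread uniformly inside each bucket.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be in [0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Cannot interpolate into +Inf: return the last finite bound.
                return lower_bound
            if count == lower_count:
                return upper_bound
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Request latency buckets in seconds: (le, cumulative count).
buckets = [(0.1, 50), (0.25, 80), (0.5, 95), (1.0, 99), (math.inf, 100)]
p90 = histogram_quantile(0.9, buckets)
```

The estimate is only as precise as the bucket layout: the 90th percentile here falls between 0.25 s and 0.5 s, and the exact figure inside that range is an interpolation, not an observation.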
11.16 Summary
A Summary is a metric representation of observations including quantiles or summary statistics.
11.17 Exemplar
An Exemplar is a representative sample connecting an aggregate metric point to a trace, log, or other detailed signal.
11.18 Log
A Log is a stream or collection of timestamped records describing events, state, actions, or messages.
11.19 LogRecord
A LogRecord is a single log entry.
Recommended attributes:
timestamp:
severity:
message:
body:
resource:
trace_id:
span_id:
attributes:
source:
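The recommended attributes translate directly into a structured record. The Python sketch below assumes JSON-lines output and snake_case field names matching the list above; the `trace_id` and `span_id` fields are what allow a log line to be joined to a Trace. The class and the example values are illustrative, not a schema defined by this standard.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class LogRecord:
    timestamp: str                  # RFC 3339 / ISO 8601, UTC
    severity: str                   # a LogLevel, e.g. "error"
    message: str                    # human-readable summary
    resource: dict                  # ObservedResource attributes
    body: Optional[dict] = None     # structured payload, if any
    trace_id: Optional[str] = None  # correlation to a Trace
    span_id: Optional[str] = None   # correlation to a Span
    attributes: dict = field(default_factory=dict)
    source: Optional[str] = None    # emitting TelemetrySource

    def to_json(self) -> str:
        """Serialize as one JSON line, omitting unset optional fields."""
        fields = {k: v for k, v in asdict(self).items() if v is not None}
        return json.dumps(fields, sort_keys=True)


record = LogRecord(
    timestamp="2026-05-23T10:15:00Z",
    severity="error",
    message="payment authorization failed",
    resource={"service.name": "checkout", "deployment.environment": "prod"},
    trace_id="4bf92f3577b34da6a3ce929d0e0e4736",
    span_id="00f067aa0ba902b7",
    attributes={"payment.provider": "example"},  # hypothetical attribute
)
line = record.to_json()
```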
11.20 LogStream
A LogStream is a sequence of log records from a source or resource.
11.21 LogLevel
A LogLevel is a severity or importance category for log records.
Examples:
trace
debug
info
warn
error
fatal
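When mapping these levels to OpenTelemetry, each level corresponds to a range of `SeverityNumber` values (TRACE 1-4, DEBUG 5-8, INFO 9-12, WARN 13-16, ERROR 17-20, FATAL 21-24). This Python sketch maps each seed level to the lowest number of its range and uses the numeric order for threshold filtering; the helper names are hypothetical.

```python
# Lowest OpenTelemetry SeverityNumber in each level's range.
SEVERITY_NUMBER = {
    "trace": 1,
    "debug": 5,
    "info": 9,
    "warn": 13,
    "error": 17,
    "fatal": 21,
}


def severity_number(level: str) -> int:
    """Map a seed LogLevel name (case-insensitive) to a SeverityNumber."""
    try:
        return SEVERITY_NUMBER[level.lower()]
    except KeyError:
        raise ValueError(f"unknown log level: {level!r}") from None


def at_least(records, minimum: str):
    """Keep records whose severity is at or above the given level."""
    threshold = severity_number(minimum)
    return [r for r in records if severity_number(r["severity"]) >= threshold]


records = [{"severity": "info"}, {"severity": "error"}, {"severity": "FATAL"}]
important = at_least(records, "warn")
```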
11.22 LogContext
LogContext is contextual metadata attached to log records.
Examples:
request id
trace id
user id reference
tenant id
deployment version
environment
component
11.23 Trace
A Trace is a representation of a request, transaction, workflow, or operation as it moves through a distributed system.
11.24 Span
A Span is a single timed operation within a trace.
Recommended attributes:
trace_id:
span_id:
parent_span_id:
name:
kind:
start_time:
end_time:
status:
attributes:
events:
links:
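The `parent_span_id` attribute is what turns a flat list of spans into the tree that trace viewers display. A minimal Python sketch, assuming all spans of one trace have already been collected; spans whose parent is absent (for example, sampled out) are treated as roots. The class and function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_span_id: Optional[str]
    name: str
    start_time: float  # seconds
    end_time: float    # seconds

    @property
    def duration(self) -> float:
        return self.end_time - self.start_time


def build_tree(spans):
    """Split one trace's spans into roots and a parent -> children index."""
    ids = {s.span_id for s in spans}
    children = defaultdict(list)
    roots = []
    for s in spans:
        if s.parent_span_id in ids:
            children[s.parent_span_id].append(s)
        else:
            roots.append(s)  # true root, or parent not collected
    return roots, children


def render(span, children, depth=0):
    """Indented text view of a span and its descendants, by start time."""
    lines = [f"{'  ' * depth}{span.name} {span.duration * 1000:.0f} ms"]
    for child in sorted(children[span.span_id], key=lambda c: c.start_time):
        lines.extend(render(child, children, depth + 1))
    return lines


spans = [
    Span("t1", "b", "a", "SELECT orders", 0.010, 0.060),
    Span("t1", "a", None, "GET /checkout", 0.000, 0.250),
    Span("t1", "c", "a", "POST /payments", 0.070, 0.240),
]
roots, children = build_tree(spans)
view = render(roots[0], children)
```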
11.25 SpanEvent
A SpanEvent is a timestamped event attached to a span.
11.26 SpanLink
A SpanLink connects a span to another span or trace context.
11.27 TraceContext
TraceContext is propagation metadata that links operations across process, service, or network boundaries.
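TraceContext is most commonly carried in the W3C Trace Context `traceparent` header, formatted as `version-trace_id-parent_id-trace_flags`. The following Python sketch parses the version `00` layout only and rejects the invalid values the specification calls out (version `ff`, all-zero IDs). It is stricter than a full implementation, which is expected to treat future versions more leniently; the names are hypothetical.

```python
import re
from typing import NamedTuple

_TRACEPARENT = re.compile(
    r"(?P<version>[0-9a-f]{2})-(?P<trace_id>[0-9a-f]{32})-"
    r"(?P<parent_id>[0-9a-f]{16})-(?P<flags>[0-9a-f]{2})"
)


class TraceContext(NamedTuple):
    trace_id: str
    parent_span_id: str
    sampled: bool


def parse_traceparent(header: str) -> TraceContext:
    """Parse a `traceparent` header value into a TraceContext."""
    m = _TRACEPARENT.fullmatch(header.strip())
    if m is None:
        raise ValueError("malformed traceparent")
    if m["version"] == "ff":
        raise ValueError("version ff is invalid")
    if m["trace_id"] == "0" * 32 or m["parent_id"] == "0" * 16:
        raise ValueError("all-zero trace-id or parent-id is invalid")
    sampled = bool(int(m["flags"], 16) & 0x01)  # lowest bit = sampled flag
    return TraceContext(m["trace_id"], m["parent_id"], sampled)


ctx = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
```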
11.28 Baggage
Baggage is contextual metadata propagated across process boundaries.
Baggage SHOULD be governed carefully when it may contain sensitive data.
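One way to govern Baggage is to strip members whose keys are not on an approved list before propagating them across a trust boundary. The Python sketch below assumes the W3C Baggage header layout (`key=value` members separated by commas, with optional `;property` suffixes). The allowlist contents are hypothetical, and values are passed through without percent-decoding.

```python
# Hypothetical policy: only these baggage keys may leave the service boundary.
ALLOWED_BAGGAGE_KEYS = frozenset({"tenant.tier", "request.priority"})


def filter_baggage(header: str, allowed=ALLOWED_BAGGAGE_KEYS) -> str:
    """Drop Baggage list-members whose key is not allowlisted.

    Allowed members are kept verbatim, including any ;property suffixes.
    """
    kept = []
    for member in header.split(","):
        member = member.strip()
        if not member:
            continue
        key = member.split("=", 1)[0].strip()
        if key in allowed:
            kept.append(member)
    return ",".join(kept)


incoming = "tenant.tier=gold, user.email=alice%40example.com, request.priority=high;ttl=30"
outgoing = filter_baggage(incoming)
```

An allowlist fails closed: a newly introduced key is dropped until someone decides it is safe to propagate, which suits data that may be sensitive.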
11.29 TraceSample
A TraceSample is a selected trace or subset of trace data retained or analyzed.
11.30 Event
An Event is a record of an occurrence and its context.
Events may be operational, domain, security, deployment, infrastructure, or business events.
11.31 EventEnvelope
An EventEnvelope is structured metadata around event data.
CloudEvents is a primary mapping target.
11.32 EventSource
An EventSource is the producer or origin of an event.
11.33 EventType
An EventType classifies the kind of occurrence represented by an event.
11.34 EventConsumer
An EventConsumer is an actor, system, service, or pipeline that consumes events.
11.35 EventCorrelationKey
An EventCorrelationKey links events to related traces, logs, requests, incidents, deployments, or resources.
11.36 Profile
A Profile is sampled performance or resource-use data.
This concept is observability-specific and distinct from both InfoTechCanon Application Profiles and the Observability Profiles defined in section 15.
11.37 ProfilingSample
A ProfilingSample is one sample of profiling data.
11.38 ResourceProfile
A ResourceProfile describes resource use over time or sampled execution.
Examples:
CPU profile
memory profile
allocation profile
lock profile
I/O profile
11.39 PerformanceProfile
A PerformanceProfile describes performance characteristics of a system, component, or operation.
11.40 SLI
A Service Level Indicator is a quantitative measure of a service level.
Examples:
availability
latency
error rate
throughput
correctness
freshness
durability
11.41 SLO
A Service Level Objective is a target value or range for an SLI over a defined measurement window.
Recommended attributes:
service:
sli:
target:
window:
scope:
owner:
evidence_source:
11.42 SLAReference
An SLAReference points to a contractual or formal service-level agreement.
Governance owns contractual obligation semantics. Observability owns measured service-level signals.
11.43 ErrorBudget
An ErrorBudget is the allowed amount of unreliability implied by an SLO over a measurement window.
11.44 BurnRate
BurnRate is the rate at which an error budget is being consumed.
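The relationship between SLO target, error budget, and burn rate is simple arithmetic: the budget is `1 - target`, and the burn rate is the observed error ratio divided by that budget. The Python sketch below works through it; the function names are hypothetical, and the 99.9% / 30-day figures are only an example.

```python
from datetime import timedelta


def error_budget_fraction(slo_target: float) -> float:
    """Allowed fraction of bad events or time, e.g. 0.999 -> 0.001."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("SLO target must be strictly between 0 and 1")
    return 1.0 - slo_target


def error_budget_time(slo_target: float, window: timedelta) -> timedelta:
    """Budget for a time-based SLO, expressed as allowed 'bad' time."""
    return window * error_budget_fraction(slo_target)


def remaining_budget(slo_target: float, good: int, total: int) -> float:
    """Fraction of the window's budget not yet spent (negative if exceeded)."""
    if total == 0:
        return 1.0  # nothing measured, nothing spent
    allowed_bad = error_budget_fraction(slo_target) * total
    return 1.0 - (total - good) / allowed_bad


def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Consumption speed relative to spending the budget evenly over the window."""
    return error_ratio / error_budget_fraction(slo_target)


budget = error_budget_time(0.999, timedelta(days=30))
```

With a 99.9% target over 30 days the budget is 43.2 minutes. A burn rate of 1 spends it exactly by the end of the window; a burn rate of 10 exhausts it in about 3 days.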
11.45 HealthState
HealthState is an assessed operational state of a resource, service, dependency, or system.
Seed health states:
unknown
healthy
degraded
unhealthy
down
recovering
maintenance
11.46 AvailabilityWindow
An AvailabilityWindow is the time period over which availability or service level is measured.
11.47 AlertRule
An AlertRule defines conditions under which an alert is created.
Recommended attributes:
query:
condition:
threshold:
window:
severity:
for_duration:
labels:
annotations:
owner:
runbook:
11.48 Alert
An Alert is an instance of an alert rule firing or resolving.
Seed alert states:
pending
firing
acknowledged
suppressed
resolved
expired
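The interaction between `for_duration` and the `pending` and `firing` states can be expressed as a small per-evaluation state transition. This Python sketch assumes a rule is evaluated periodically and that acknowledgement, suppression, and expiry are handled by routing and notification components, so it covers only `pending`, `firing`, and `resolved`; the extra `inactive` value is an assumption meaning no alert instance exists. The pending-before-firing behavior mirrors how Prometheus treats an alerting rule's `for` clause.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertState:
    state: str = "inactive"               # "inactive" = no alert instance exists
    active_since: Optional[float] = None  # when the condition first became true


def evaluate(alert: AlertState, condition_true: bool, now: float,
             for_duration: float) -> AlertState:
    """One evaluation cycle of an AlertRule that has a for_duration (seconds)."""
    if condition_true:
        since = alert.active_since if alert.active_since is not None else now
        state = "firing" if now - since >= for_duration else "pending"
        return AlertState(state, since)
    if alert.state == "firing":
        return AlertState("resolved")
    # A pending alert whose condition clears never fires; a resolved one goes idle.
    return AlertState()
```

Usage: with `for_duration=120`, a condition that is true at t=0 and t=60 stays `pending`, becomes `firing` at t=120, and moves to `resolved` at the first evaluation where it is false.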
11.49 Notification
A Notification is a message sent to humans, agents, or systems about an alert, incident, or operational state.
11.50 AlertRoute
An AlertRoute defines how alerts are routed to responders, teams, tools, or escalation paths.
11.51 AlertSuppression
AlertSuppression is a rule or state that suppresses notifications for known, duplicate, maintenance, or intentionally ignored alert conditions.
11.52 AlertCorrelation
AlertCorrelation groups related alerts or signals.
11.53 EscalationReference
An EscalationReference points to Organization, Task, or Governance concepts defining who should respond and how escalation works.
11.54 ObservedIncident
An ObservedIncident is an operationally significant situation inferred or declared from observability signals.
Task and ITSM systems may own incident work records. Observability owns the signal-derived incident view.
11.55 Investigation
An Investigation is analysis of signals, alerts, telemetry, incidents, or hypotheses to understand cause, scope, impact, and remediation.
11.56 Timeline
A Timeline is an ordered sequence of events, signals, decisions, actions, and observations.
11.57 Runbook
A Runbook is an operational procedure used to investigate, respond, recover, or verify a condition.
11.58 Dashboard
A Dashboard is a visual or structured view of observability data.
11.59 OperationalView
An OperationalView is a purpose-specific view of system state, health, risk, or performance.
11.60 PostIncidentObservation
A PostIncidentObservation is a signal, fact, lesson, or finding captured after an incident.
11.61 ObservabilityEvidence
ObservabilityEvidence is telemetry, query output, screenshot, dashboard state, trace, log, metric, or event used to support a claim.
11.62 Query
A Query is an expression used to retrieve or calculate observability data.
Examples:
PromQL query
LogQL query
SQL query
trace search
SIEM query
dashboard panel query
11.63 QueryResult
A QueryResult is the result of executing a query.
11.64 Snapshot
A Snapshot is a captured state of telemetry, dashboard, trace, log, metric, or query result at a time.
11.65 Annotation
An Annotation is a human, agent, or system-added note attached to telemetry, dashboard, timeline, incident, deployment, or event.
11.66 Correlation
A Correlation is a relationship linking signals, resources, events, deployments, incidents, or hypotheses.
11.67 RootCauseHypothesis
A RootCauseHypothesis is a candidate explanation for an observed issue.
Canonical rule:
RootCauseHypothesis SHOULD remain distinguishable from verified cause.
12. Core Relationship Vocabulary
Recommended root relationship types:
emitted_by
observes
measures
describes
correlates_with
derived_from
generated_by
triggered_by
alerts_on
routes_to
acknowledged_by
suppressed_by
resolves
affects
indicates
supports
evidences
verifies
invalidates
samples
aggregates
annotates
links_to
maps_to
Relationship records SHOULD support:
id:
relationship_type:
source_entity:
target_entity:
scope:
time_window:
state_context:
valid_from:
valid_to:
source_system:
confidence:
evidence:
rationale:
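A relationship record can validate itself against the root vocabulary on creation. The Python sketch below is one possible encoding of the fields above. The `low`/`medium`/`high` confidence scale is an assumption borrowed from the mapping record example in section 16.2, and the entity identifiers (including the `itc-land:` prefix) are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

RELATIONSHIP_TYPES = frozenset({
    "emitted_by", "observes", "measures", "describes", "correlates_with",
    "derived_from", "generated_by", "triggered_by", "alerts_on", "routes_to",
    "acknowledged_by", "suppressed_by", "resolves", "affects", "indicates",
    "supports", "evidences", "verifies", "invalidates", "samples",
    "aggregates", "annotates", "links_to", "maps_to",
})
CONFIDENCE_LEVELS = frozenset({"low", "medium", "high"})  # assumed scale


@dataclass
class RelationshipRecord:
    id: str
    relationship_type: str
    source_entity: str
    target_entity: str
    scope: Optional[str] = None
    time_window: Optional[str] = None
    state_context: Optional[str] = None
    valid_from: Optional[datetime] = None
    valid_to: Optional[datetime] = None
    source_system: Optional[str] = None
    confidence: Optional[str] = None
    evidence: list = field(default_factory=list)
    rationale: Optional[str] = None

    def __post_init__(self):
        if self.relationship_type not in RELATIONSHIP_TYPES:
            raise ValueError(f"unknown relationship_type: {self.relationship_type!r}")
        if self.confidence is not None and self.confidence not in CONFIDENCE_LEVELS:
            raise ValueError(f"unknown confidence: {self.confidence!r}")
        if self.valid_from and self.valid_to and self.valid_to < self.valid_from:
            raise ValueError("valid_to precedes valid_from")


rel = RelationshipRecord(
    id="rel-0001",
    relationship_type="emitted_by",
    source_entity="itc-obs:Metric/http_requests_total",
    target_entity="itc-land:RuntimeWorkload/checkout-api",
    confidence="high",
)
```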
13. Observability State Models
13.1 Signal States
unknown
emitting
missing
delayed
partial
degraded
invalid
stale
13.2 Alert States
pending
firing
acknowledged
suppressed
resolved
expired
13.3 Incident Observation States
suspected
confirmed
investigating
mitigating
recovering
resolved
post_review
closed
13.4 Health States
unknown
healthy
degraded
unhealthy
down
recovering
maintenance
13.5 SLO States
not_measured
within_budget
burning_fast
at_risk
violated
paused
retired
13.6 Telemetry Pipeline States
configured
active
degraded
dropping_data
stalled
misconfigured
retired
14. Observability Patterns
14.1 Pattern: Resource-Linked Telemetry
Context: Telemetry is collected from many systems.
Problem: Signals are hard to interpret if they cannot be linked to canonical resources.
Solution: Attach telemetry to ObservedResource references mapped to Landscape, Network, DevSecOps, Security, or Data entities.
14.2 Pattern: Signal-to-Alert-to-Task
Context: A condition needs human or agent response.
Problem: Alerts fire but do not become accountable work.
Solution:
Signal
-> AlertRule
-> Alert
-> ObservedIncident or Task
-> Investigation
-> RemediationTask
-> VerificationEvidence
14.3 Pattern: SLO as Reliability Contract
Context: Service reliability must be operationally meaningful.
Problem: Teams alert on low-level metrics that do not represent user experience.
Solution: Define SLIs and SLOs for user-meaningful service behavior and use error budgets to guide action.
14.4 Pattern: Deployment Health Verification
Context: A deployment has completed.
Problem: Successful deployment command does not prove healthy service behavior.
Solution: Link DeploymentRecord to DeploymentHealthSignal, SLO state, traces, logs, metrics, and verification evidence.
14.5 Pattern: Correlated Timeline
Context: Incidents require understanding what happened.
Problem: Logs, alerts, deployments, changes, and network events are scattered.
Solution: Build Timeline from correlated events, alerts, traces, deployment records, annotations, and task actions.
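If each source (deployments, alerts, annotations, task actions) already yields entries in time order, building the Timeline is a k-way merge, and correlation can start as a simple time window around an alert. A minimal Python sketch; the entry fields, names, and example data are hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class TimelineEntry:
    timestamp: float  # seconds since the epoch; compared first when ordering
    kind: str         # e.g. "deployment", "alert", "annotation", "task"
    summary: str


def build_timeline(*sources):
    """Merge per-source lists (each already time-ordered) into one Timeline."""
    return list(heapq.merge(*sources))


def around(timeline, t, before, after):
    """Entries within [t - before, t + after], e.g. around an alert firing."""
    return [e for e in timeline if t - before <= e.timestamp <= t + after]


deployments = [TimelineEntry(100, "deployment", "checkout v42 rolled out")]
alerts = [TimelineEntry(160, "alert", "HighErrorRate firing")]
annotations = [TimelineEntry(400, "annotation", "rollback started")]
timeline = build_timeline(deployments, alerts, annotations)
window = around(timeline, 160, before=300, after=120)
```

A time window only establishes proximity; whether the deployment actually caused the alert remains a RootCauseHypothesis until verified.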
14.6 Pattern: Alert with Runbook
Context: An alert requires response.
Problem: Responders waste time discovering what the alert means.
Solution: AlertRule SHOULD reference owner, runbook, dashboard, likely causes, and escalation path.
14.7 Pattern: Metric with Exemplar
Context: Aggregate metrics show a problem.
Problem: Aggregates hide individual requests or traces.
Solution: Link MetricPoint or histogram bucket to trace/log exemplar.
14.8 Pattern: Observability as Governance Evidence
Context: Governance requires proof that controls or SLOs are operating.
Problem: Compliance claims rely on manual screenshots or weak assertions.
Solution: Use query results, snapshots, dashboards, and telemetry evidence as structured ObservabilityEvidence.
14.9 Pattern: Missing Signal as Signal
Context: A telemetry source goes silent.
Problem: Systems only alert on bad values, not missing data.
Solution: Model missing, stale, or delayed telemetry as signal states and potential alerts.
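Missing-data detection amounts to comparing the age of the last data point with the source's expected reporting interval. This Python sketch maps that age onto the seed signal states `emitting`, `delayed`, `stale`, `missing`, and `unknown` from section 13.1. The multipliers (3x and 10x the interval) are hypothetical defaults to be tuned per source. In Prometheus, the `absent()` function serves a related purpose at query time.

```python
from typing import Optional


def signal_state(last_seen: Optional[float], now: float, expected_interval: float,
                 stale_after: float = 3, missing_after: float = 10) -> str:
    """Classify a telemetry source by the age of its most recent data point.

    Thresholds are multiples of the expected reporting interval.
    """
    if last_seen is None:
        return "unknown"  # never observed: cannot tell silent from absent
    age = now - last_seen
    if age <= expected_interval:
        return "emitting"
    if age <= stale_after * expected_interval:
        return "delayed"
    if age <= missing_after * expected_interval:
        return "stale"
    return "missing"
```

Usage: for a source expected every 15 s and last seen at t=100, the state is `emitting` at t=110, `delayed` at t=140, `stale` at t=200, and `missing` at t=400. A `missing` state can then drive an AlertRule like any other signal.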
15. Observability Profiles
15.1 Profile Format
An Observability Profile SHALL declare:
id:
profile_name:
status:
implements:
- InfoTechCanonObservabilityModel
target_context:
included_concepts:
required_relationships:
required_metadata:
state_model:
source_of_truth_rules:
mapping_files:
validation_rules:
examples:
known_deviations:
15.2 Seed Profile: Small SaaS Observability Profile
Purpose:
Provide a minimal observability model for a small SaaS platform moving toward production readiness.
Included concepts:
ObservedResource
Metric
LogRecord
Trace
Span
Event
AlertRule
Alert
Dashboard
Runbook
SLI
SLO
HealthState
ObservedIncident
ObservabilityEvidence
Required relationships:
Metric emitted_by ObservedResource
LogRecord emitted_by ObservedResource
Trace observes Service
Alert triggered_by AlertRule
Alert affects Service
SLO measures Service
Dashboard displays Metric
Runbook supports Alert
ObservabilityEvidence supports Investigation
15.3 Seed Profile: OpenTelemetry Profile
Purpose:
Map OpenTelemetry resources, traces, metrics, logs, attributes, baggage, and semantic conventions into InfoTechCanon.
Example mappings:
Resource -> ObservedResource
Resource attributes -> ResourceAttribute
Metric -> Metric
LogRecord -> LogRecord
Trace -> Trace
Span -> Span
Span event -> SpanEvent
Span link -> SpanLink
Baggage -> Baggage
Semantic conventions -> Mapping / Attribute vocabulary
Collector -> TelemetryPipeline component
Exporter -> TelemetryPipeline component
15.4 Seed Profile: Prometheus / OpenMetrics Profile
Purpose:
Represent metrics, labels, time series, scrape targets, alert rules, and query results.
Example mappings:
metric name -> Metric
labels -> dimensions / attributes
sample -> MetricPoint
time series -> TimeSeries
PromQL -> Query
recording rule -> DerivedMetric / Query
alerting rule -> AlertRule
target -> TelemetrySource / ObservedResource
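In the text exposition format, each sample line carries exactly the pieces this mapping names: a metric name, a label set, a value, and an optional millisecond timestamp. The Python sketch below parses a single line into those parts. It is a simplified illustration (for example, it does not allow `}` inside label values and ignores OpenMetrics-only syntax such as exemplars), not a replacement for an official client or parser library; the function names are hypothetical.

```python
import re

# One sample line: name{labels} value [timestamp_ms]
_SAMPLE = re.compile(
    r"(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"
    r"(?:\{(?P<labels>[^}]*)\})?"
    r"\s+(?P<value>\S+)"
    r"(?:\s+(?P<timestamp>-?\d+))?"
)
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def _unescape(value: str) -> str:
    # Label values escape backslash, double quote, and newline with a backslash.
    return re.sub(r"\\(.)", lambda m: "\n" if m.group(1) == "n" else m.group(1), value)


def parse_sample(line: str):
    """Return (metric_name, labels, value, timestamp_ms), or None for non-samples."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # blank line, or # HELP / # TYPE / comment
    m = _SAMPLE.fullmatch(line)
    if m is None:
        raise ValueError(f"unparseable sample line: {line!r}")
    labels = {k: _unescape(v) for k, v in _LABEL.findall(m["labels"] or "")}
    timestamp = int(m["timestamp"]) if m["timestamp"] else None
    return m["name"], labels, float(m["value"]), timestamp
```

In canonical terms, the name identifies the Metric, the labels become dimensions, and the value with its timestamp is one MetricPoint of the TimeSeries identified by name plus labels.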
15.5 Seed Profile: CloudEvents Profile
Purpose:
Represent event metadata and event envelopes.
Example mappings:
id -> Event id
source -> EventSource
type -> EventType
specversion -> EventEnvelope version
subject -> Event subject
time -> Event timestamp
datacontenttype -> Event data content type
data -> Event data
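The mapping above can be applied mechanically once the required CloudEvents attributes (`id`, `source`, `specversion`, `type`) are checked. This Python sketch assumes events arrive in the JSON format already parsed into a dict; the snake_case target field names and the example event are illustrative, not normative. It also derives a deduplication key from `source` plus `id`, which CloudEvents requires producers to keep unique per distinct event.

```python
REQUIRED_ATTRIBUTES = ("id", "source", "specversion", "type")


def to_canon_event(ce: dict) -> dict:
    """Map a CloudEvents event (parsed JSON) onto seed Event fields."""
    missing = [a for a in REQUIRED_ATTRIBUTES if not ce.get(a)]
    if missing:
        raise ValueError(f"missing required CloudEvents attributes: {missing}")
    return {
        "event_id": ce["id"],
        "event_source": ce["source"],           # EventSource
        "event_type": ce["type"],               # EventType
        "envelope_version": ce["specversion"],  # EventEnvelope version
        "subject": ce.get("subject"),
        "timestamp": ce.get("time"),
        "data_content_type": ce.get("datacontenttype"),
        "data": ce.get("data"),
        # source + id uniquely identify an event, so they serve as a dedup key.
        "dedup_key": (ce["source"], ce["id"]),
    }


example_event = {
    "specversion": "1.0",
    "type": "com.example.deployment.completed",
    "source": "/ci/pipelines/checkout",
    "id": "A234-1234-1234",
    "time": "2026-05-23T10:00:00Z",
    "datacontenttype": "application/json",
    "data": {"version": "v42"},
}
event = to_canon_event(example_event)
```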
15.6 Seed Profile: SRE Reliability Profile
Purpose:
Represent SLIs, SLOs, error budgets, burn rates, and reliability decisions.
Included concepts:
SLI
SLO
ErrorBudget
BurnRate
AvailabilityWindow
AlertRule
ReliabilityReview
ServiceHealthState
ErrorBudgetPolicyReference
Required relationships:
SLO applies_to Service
SLI measures Service
ErrorBudget derived_from SLO
BurnRate measures ErrorBudgetConsumption
AlertRule alerts_on BurnRate
ReliabilityReview reviews SLOState
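The `AlertRule alerts_on BurnRate` relationship is often implemented as multiwindow, multi-burn-rate alerting. The Python sketch below uses window and threshold pairs in the style of those published in the Google SRE Workbook for a 30-day SLO (for example, a 14.4x burn over 1 hour consumes 2% of the budget). Treat the exact numbers as a starting point rather than a requirement of this standard; `error_ratio` is a hypothetical hook onto a metrics backend.

```python
from dataclasses import dataclass

SLO_PERIOD_H = 30 * 24  # 30-day SLO window, in hours


@dataclass(frozen=True)
class BurnRateWindow:
    long_window_h: float
    short_window_h: float
    threshold: float  # burn-rate multiple that triggers the alert
    severity: str


POLICY = (
    BurnRateWindow(1, 5 / 60, 14.4, "page"),  # 2% of budget within 1 hour
    BurnRateWindow(6, 0.5, 6.0, "page"),      # 5% of budget within 6 hours
    BurnRateWindow(72, 6, 1.0, "ticket"),     # 10% of budget within 3 days
)


def budget_spent(rate: float, window_h: float) -> float:
    """Fraction of the full-period budget consumed in window_h at this burn rate."""
    return rate * window_h / SLO_PERIOD_H


def fired_severities(error_ratio, slo_target: float):
    """Severities whose long AND short windows both exceed the burn threshold.

    error_ratio(hours) returns the observed bad-event ratio over that trailing window.
    """
    budget = 1.0 - slo_target
    fired = []
    for w in POLICY:
        long_rate = error_ratio(w.long_window_h) / budget
        short_rate = error_ratio(w.short_window_h) / budget
        if long_rate > w.threshold and short_rate > w.threshold:
            fired.append(w.severity)
    return fired
```

Requiring both windows to exceed the threshold keeps the long window's resistance to noise while letting the alert clear soon after the problem stops, because the short window recovers first.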
15.7 Seed Profile: Incident Observability Profile
Purpose:
Represent telemetry, alerts, timelines, dashboards, and evidence for incident response.
Included concepts:
Alert
ObservedIncident
Timeline
Investigation
Dashboard
Runbook
Annotation
RootCauseHypothesis
ObservabilityEvidence
PostIncidentObservation
15.8 Seed Profile: Network Observability Profile
Purpose:
Represent network metrics, flow logs, reachability tests, DNS logs, and latency signals.
Included concepts:
NetworkMetric
ObservedFlowSignal
DNSLogRecord
ReachabilityTestResult
LatencyMetric
PacketLossMetric
EndpointHealthSignal
Mapping targets:
NetFlow/IPFIX
VPC Flow Logs
Kubernetes CNI telemetry
service mesh telemetry
DNS logs
synthetic probes
15.9 Seed Profile: Security Observability Profile
Purpose:
Represent observability signals used for security detection, investigation, and evidence.
Included concepts:
SecuritySignal
SecurityLogRecord
DetectionEvent
Alert
TraceEvidence
AccessSessionLog
AuditLogReference
SecurityEvidence
Security interpretation remains owned by the Security Model.
16. Mapping Model for the Observability Standard
Mappings relate InfoTechCanon observability concepts to external standards, tools, and products.
16.1 Mapping Types
Recommended mapping types:
exactMatch
closeMatch
broadMatch
narrowMatch
relatedMatch
conflictMatch
gapMatch
derivedFrom
regulatoryReference
toolEquivalent
16.2 Mapping Record
Example:
id: itc-map:span-to-opentelemetry-span
source_concept: itc-obs:Span
target_body: OpenTelemetry
target_version: "current"
target_concept: Span
mapping_type: closeMatch
scope:
- distributed tracing
not_valid_for:
- all event or log semantics
rationale: >
OpenTelemetry Span is the primary mapping target for timed operations in traces.
InfoTechCanon keeps Span as a canonical concept to allow mappings to other tracing systems.
confidence: high
status: candidate
owner: InfoTechCanonObservabilityModel
16.3 Seed Mapping Targets
The Observability Model SHOULD maintain mappings to:
OpenTelemetry
OpenTelemetry Semantic Conventions
Prometheus
OpenMetrics / Prometheus exposition format
CloudEvents
W3C Trace Context
Google SRE SLI/SLO/Error Budget concepts
Grafana dashboards and alerting
Prometheus Alertmanager
Loki / LogQL
Jaeger
Tempo
Elastic Observability
Datadog
New Relic
Splunk
OpenSearch
ITIL incident concepts
NetFlow / IPFIX
VPC Flow Logs
Kubernetes events and metrics
service mesh telemetry
17. Assimilation Hooks
The Observability Model SHALL be able to receive new observability standards, tool models, telemetry schemas, incident practices, and operational patterns through the InfoTechCanon assimilation process.
17.1 Assimilation Triggers
Assimilation may be triggered by:
new telemetry standard
new observability backend
new incident-management tool
new SLO practice
new dashboard model
new alerting model
new tracing model
new logging schema
new AIOps product
new runtime verification practice
new recurring signal classification conflict
17.2 Observability Assimilation Output
An observability assimilation SHOULD produce:
source summary
extracted observability concepts
concept comparison matrix
gap list
conflict list
mapping file
candidate new concepts
candidate relationship changes
candidate pattern changes
candidate profile changes
open questions
17.3 Recommended First Assimilation Candidates
OpenTelemetry specification and semantic conventions
Prometheus / OpenMetrics
CloudEvents
W3C Trace Context
Google SRE SLO chapters
Grafana dashboard and alerting models
Prometheus Alertmanager
Kubernetes events and metrics
VPC Flow Logs / NetFlow / IPFIX
ITIL incident management concepts
18. Integration with Other InfoTechCanon Standards
18.1 Landscape Model
Observability links signals to:
ApplicationService
TechnicalService
RuntimeWorkload
Environment
Endpoint
DataStore
DeploymentRecord
NetworkEntity
18.2 Organization Model
Observability imports organization concepts for:
service owner
on-call responder
team
escalation target
runbook owner
incident commander
18.3 Governance Model
Observability imports governance concepts for:
evidence
control result
review
assurance
policy
SLA obligation
audit evidence
18.4 Task Model
Observability creates or references:
incident task
investigation task
remediation task
follow-up task
reliability improvement task
18.5 Tagging Standard
Observability uses tags for:
service
environment
severity
signal type
dashboard category
incident category
team
Tags MUST NOT replace ObservedResource, AlertRule, SLO, or Evidence records.
18.6 Access Control Model
Observability imports access concepts for:
dashboard access
log access
trace access
incident tool access
telemetry pipeline access
sensitive telemetry access
18.7 Security Model
Security imports observability concepts for:
security signal
detection evidence
security alert
audit log
trace evidence
incident timeline
18.8 Data Model
Data imports observability concepts when telemetry is treated as a dataset and for data freshness, quality, and lineage signals.
18.9 DevSecOps Model
DevSecOps imports observability concepts for:
deployment verification
change failure detection
delivery metric
runtime feedback
SLO impact
18.10 Network Model
Network imports observability concepts for:
flow logs
reachability test results
latency
packet loss
DNS logs
endpoint health
19. Canon Interface Card Usage
Subsystems that implement or produce observability knowledge SHOULD publish a Canon Interface Card.
Example:
subsystem: prometheus-importer
implements:
  - InfoTechCanonObservabilityModel
  - PrometheusOpenMetricsProfile
produces:
  - Metric
  - TimeSeries
  - MetricPoint
  - AlertRule
  - Alert
  - QueryResult
consumes:
  - ObservedResource
  - Service
  - Environment
relations:
  - Metric emitted_by ObservedResource
  - Alert triggered_by AlertRule
  - Alert affects Service
source_of_truth:
  metric_samples: Prometheus
  alert_rule_state: Prometheus
known_deviations:
  - resource identity depends on labels
  - long-term retention may be external
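Because a Canon Interface Card lists both the concepts a subsystem exchanges and the relations it asserts, a simple consistency check is possible: every relation endpoint should appear under `produces` or `consumes`. The sketch below encodes the example card as a Python dictionary (YAML parsing is omitted to stay within the standard library) and assumes relations are written as whitespace-separated `Subject predicate Object` triples; the function `undeclared_relation_endpoints` is hypothetical.

```python
CARD = {
    "subsystem": "prometheus-importer",
    "implements": ["InfoTechCanonObservabilityModel", "PrometheusOpenMetricsProfile"],
    "produces": ["Metric", "TimeSeries", "MetricPoint", "AlertRule", "Alert", "QueryResult"],
    "consumes": ["ObservedResource", "Service", "Environment"],
    "relations": [
        "Metric emitted_by ObservedResource",
        "Alert triggered_by AlertRule",
        "Alert affects Service",
    ],
}


def undeclared_relation_endpoints(card: dict) -> list[str]:
    """Return relation endpoints that the card neither produces nor consumes."""
    declared = set(card.get("produces", [])) | set(card.get("consumes", []))
    missing = []
    for relation in card.get("relations", []):
        parts = relation.split()
        if len(parts) != 3:
            raise ValueError(f"expected 'Subject predicate Object', got {relation!r}")
        subject, _predicate, obj = parts
        for concept in (subject, obj):
            if concept not in declared and concept not in missing:
                missing.append(concept)
    return missing
```

For the example card the check returns an empty list; adding a relation such as `Alert notifies Team` without declaring `Team` would surface `Team` as undeclared.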
20. Retrieval Requirements
The Observability Model is designed for markdown-based infospaces.
20.1 Required Retrieval Properties
Every major concept SHOULD provide:
- stable heading,
- stable identifier,
- short definition,
- longer explanation,
- examples,
- distinction notes,
- relationship examples,
- mapping hooks,
- profile references,
- and common mistakes.
20.2 Agent Brief
A mature Observability Model SHOULD include an agent-brief.md file with:
purpose
scope
owned concepts
imported concepts
core distinctions
do / do not rules
relationship patterns
minimal examples
common mistakes
profile list
mapping list
20.3 Indexes
The observability information space SHOULD provide indexes by:
concept
relationship
signal type
metric
log
trace
event
resource
service
alert
SLO
dashboard
incident
profile
pattern
mapping target
status
source system
21. Conformance Levels
21.1 Reference-Conformant
A document or system is reference-conformant if it uses Observability Model terminology consistently, even if it does not implement structured metadata or validation rules.
21.2 Metadata-Conformant
A system is metadata-conformant if it uses stable identifiers, concept names, lifecycle states, source metadata, and relationship types.
21.3 Signal-Conformant
A system is signal-conformant if it distinguishes metrics, logs, traces, events, profiles, alerts, and health signals.
21.4 Resource-Correlated
A system is resource-correlated if observability signals can be linked to observed resources and canonical landscape entities.
21.5 SLO-Conformant
A system is SLO-conformant if it represents SLIs, SLOs, error budgets, burn rates, and measurement windows.
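The arithmetic behind SLO conformance is small enough to show directly. In the common SRE formulation, the error budget is the allowed failure fraction 1 - target, and the burn rate is the observed error ratio divided by that budget. The Python sketch below assumes a request-based SLI (good events over total events) counted over the full measurement window; the function names are illustrative, not part of this standard.

```python
def error_budget(target: float) -> float:
    """Allowed failure fraction for an SLO target such as 0.999."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    return 1.0 - target


def burn_rate(good_events: int, total_events: int, target: float) -> float:
    """Observed error ratio divided by the error budget; 1.0 means exactly on budget."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    error_ratio = 1.0 - good_events / total_events
    return error_ratio / error_budget(target)


def budget_remaining(good_events: int, total_events: int, target: float) -> float:
    """Fraction of the window's error budget still unspent (negative once exceeded)."""
    allowed_failures = error_budget(target) * total_events
    actual_failures = total_events - good_events
    return 1.0 - actual_failures / allowed_failures
```

For a 99.9% target over 10,000 requests, 20 failures give a burn rate of about 2, meaning the budget is being consumed twice as fast as the window allows.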
21.6 Evidence-Conformant
A system is evidence-conformant if observability claims, incidents, alerts, and service-level states can be linked to evidence.
21.7 Profile-Conformant
A system is profile-conformant if it implements a declared Observability Profile and passes its validation rules.
21.8 Assimilation-Conformant
A system or repository is assimilation-conformant if it can accept external observability concepts through the InfoTechCanon assimilation workflow and produce mappings, gaps, conflicts, and proposed changes.
22. Validation Rules
Initial validation rules:
VAL-OBS-001: Metric, LogRecord, Trace, Span, Event, Profile, Alert, and Incident SHOULD be modeled as distinct concepts.
VAL-OBS-002: Telemetry SHOULD reference an ObservedResource where possible.
VAL-OBS-003: ObservedResource SHOULD map to a Landscape, Network, Data, Security, or DevSecOps entity where possible.
VAL-OBS-004: Metric SHOULD declare unit, instrument type, source, and dimensions where available.
VAL-OBS-005: TimeSeries SHOULD distinguish metric identity from labels/dimensions.
VAL-OBS-006: LogRecord SHOULD include timestamp, severity, source, and body where available.
VAL-OBS-007: Span SHOULD include trace id, span id, timing, name, status, and parent/link references where available.
VAL-OBS-008: Event SHOULD distinguish event data from event context metadata.
VAL-OBS-009: Alert SHOULD reference AlertRule or source condition where available.
VAL-OBS-010: AlertRule SHOULD reference query or condition, threshold, time window, owner, and runbook where applicable.
VAL-OBS-011: SLO SHOULD reference SLI, target, measurement window, service, and evidence source.
VAL-OBS-012: ErrorBudget SHOULD derive from an SLO.
VAL-OBS-013: Dashboard SHOULD NOT be treated as evidence unless a Snapshot or QueryResult is captured.
VAL-OBS-014: Incident SHOULD NOT be inferred solely from one alert unless a profile permits it.
VAL-OBS-015: RootCauseHypothesis SHOULD remain distinguishable from verified cause.
VAL-OBS-016: Missing, stale, or delayed telemetry SHOULD be representable as signal state.
VAL-OBS-017: Tags MUST NOT replace resource identity, SLO definitions, alert rules, or evidence.
VAL-OBS-018: Imported external observability concepts SHOULD be represented through mapping records rather than silently reused.
VAL-OBS-019: Profiles MUST NOT redefine canonical concepts; they MAY constrain them.
VAL-OBS-020: Telemetry containing sensitive data SHOULD reference Data, Security, Access Control, or Governance constraints where relevant.
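Several of these rules are mechanical enough to check against structured records. The Python sketch below implements VAL-OBS-004, VAL-OBS-011, and VAL-OBS-013 over plain dictionaries. The record shape (a `kind` key and field names such as `instrument_type` or `measurement_window`) is an assumption, since this standard does not yet fix a serialization. Because these rules are SHOULD-level, findings are returned as warnings rather than failures.

```python
# Hypothetical record shape: a dict with a "kind" key plus illustrative field names.
FIELD_RULES = {
    "Metric": ("VAL-OBS-004", ("unit", "instrument_type", "source", "dimensions")),
    "SLO": ("VAL-OBS-011", ("sli", "target", "measurement_window", "service", "evidence_source")),
}


def validate(record: dict) -> list[str]:
    """Return warnings for one record; an empty list means no findings."""
    kind = record.get("kind")
    warnings = []
    if kind in FIELD_RULES:
        rule_id, fields = FIELD_RULES[kind]
        for name in fields:
            if record.get(name) is None:
                warnings.append(f"{rule_id}: {kind} is missing '{name}'")
    # VAL-OBS-013: a Dashboard counts as evidence only with a captured Snapshot or QueryResult.
    if kind == "Dashboard" and record.get("used_as_evidence"):
        if record.get("snapshot") is None and record.get("query_result") is None:
            warnings.append("VAL-OBS-013: Dashboard used as evidence without Snapshot or QueryResult")
    return warnings
```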
23. Anti-Patterns
23.1 Dashboard as Truth
Treating a dashboard view as evidence without preserving query, time window, data source, or snapshot.
23.2 Alert Equals Incident
Treating every alert as an incident.
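One way to avoid this anti-pattern, consistent with VAL-OBS-014, is to treat correlated alerts as incident candidates rather than incidents. The Python sketch below groups alerts per service into bursts whose consecutive alerts fire within a time window, and keeps only bursts with at least `min_alerts` alerts. The `Alert` fields, the per-service grouping key, and the window heuristic are illustrative assumptions; setting `min_alerts=1` corresponds to a profile that permits single-alert incidents.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Alert:
    rule: str
    service: str
    fired_at: datetime


def incident_candidates(alerts: list[Alert], window: timedelta,
                        min_alerts: int = 2) -> list[list[Alert]]:
    """Group alerts per service into time-bounded bursts; keep bursts of at least min_alerts."""
    by_service: dict[str, list[Alert]] = {}
    for alert in sorted(alerts, key=lambda a: (a.service, a.fired_at)):
        by_service.setdefault(alert.service, []).append(alert)
    candidates = []
    for items in by_service.values():
        burst = [items[0]]
        for alert in items[1:]:
            if alert.fired_at - burst[-1].fired_at <= window:
                burst.append(alert)  # close enough to the previous alert: same burst
            else:
                if len(burst) >= min_alerts:
                    candidates.append(burst)
                burst = [alert]
        if len(burst) >= min_alerts:
            candidates.append(burst)
    return candidates
```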
23.3 Metric Soup
Collecting many metrics without ownership, resource identity, interpretation, or action path.
23.4 Logs Without Context
Logging messages that cannot be correlated to service, request, trace, tenant, deployment, or resource.
23.5 Traces Without Boundaries
Tracing calls without linking them to service ownership, deployment version, or runtime resource.
23.6 SLO Theater
Creating SLOs that do not reflect user experience or guide operational decisions.
23.7 Alert Without Runbook
Creating alerts without ownership, runbook, dashboard, or response expectation.
23.8 Missing Signal Blindness
Failing to alert when telemetry stops arriving.
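Missing Signal Blindness is avoided by treating silence as a state in its own right, as VAL-OBS-016 requires. The Python sketch below assumes each telemetry stream records the timestamp of its last received data point and has an expected reporting interval. The state names and the default tolerance of three intervals are illustrative choices, not values defined by this standard, and the sketch covers only missing and stale streams; detecting delayed telemetry would additionally need ingestion timestamps.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class SignalState(Enum):
    FRESH = "fresh"
    STALE = "stale"
    MISSING = "missing"


def signal_state(last_seen: Optional[datetime], expected_interval: timedelta,
                 now: datetime, missing_after: int = 3) -> SignalState:
    """Classify a telemetry stream by how long it has been silent."""
    if last_seen is None:
        return SignalState.MISSING  # the stream has never reported
    silence = now - last_seen
    if silence <= expected_interval:
        return SignalState.FRESH
    if silence <= expected_interval * missing_after:
        return SignalState.STALE
    return SignalState.MISSING
```

An alert rule keyed on the MISSING state turns silence into a signal instead of an absence of alerts.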
23.9 Tool-Native Capture
Letting one observability backend define the internal observability model.
23.10 Telemetry Without Governance
Collecting sensitive logs, traces, or profiles without classification, retention, access control, or privacy consideration.
24. Initial Repository Placement
Recommended repository layout:
info-tech-canon/
  standards/
    observability/
      InfoTechCanonObservabilityModel.md
      agent-brief.md
      concepts/
      relationships/
      patterns/
      profiles/
      mappings/
      assimilation/
      examples/
      validation/
Seed files:
standards/observability/InfoTechCanonObservabilityModel.md
standards/observability/agent-brief.md
standards/observability/concepts/telemetry.md
standards/observability/concepts/metric.md
standards/observability/concepts/log-record.md
standards/observability/concepts/trace.md
standards/observability/concepts/span.md
standards/observability/concepts/event.md
standards/observability/concepts/sli.md
standards/observability/concepts/slo.md
standards/observability/concepts/alert.md
standards/observability/concepts/observability-evidence.md
standards/observability/patterns/resource-linked-telemetry.md
standards/observability/patterns/signal-to-alert-to-task.md
standards/observability/patterns/slo-as-reliability-contract.md
standards/observability/patterns/deployment-health-verification.md
standards/observability/profiles/small-saas-observability-profile.md
standards/observability/profiles/opentelemetry-profile.md
standards/observability/profiles/prometheus-openmetrics-profile.md
standards/observability/profiles/sre-reliability-profile.md
standards/observability/mappings/opentelemetry.yaml
standards/observability/mappings/prometheus-openmetrics.yaml
standards/observability/mappings/cloudevents.yaml
standards/observability/mappings/sre-slo.yaml
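The layout and seed files above can be created with a short script. This Python sketch uses only `pathlib`; it creates the directories and empty placeholder files, writes no content, and leaves existing files untouched. The `scaffold` function and its `repo_root` argument (the `info-tech-canon/` directory) are hypothetical.

```python
from pathlib import Path

BASE = Path("standards/observability")
SUBDIRS = ["concepts", "relationships", "patterns", "profiles", "mappings",
           "assimilation", "examples", "validation"]
SEED_FILES = {
    ".": ["InfoTechCanonObservabilityModel.md", "agent-brief.md"],
    "concepts": ["telemetry.md", "metric.md", "log-record.md", "trace.md", "span.md",
                 "event.md", "sli.md", "slo.md", "alert.md", "observability-evidence.md"],
    "patterns": ["resource-linked-telemetry.md", "signal-to-alert-to-task.md",
                 "slo-as-reliability-contract.md", "deployment-health-verification.md"],
    "profiles": ["small-saas-observability-profile.md", "opentelemetry-profile.md",
                 "prometheus-openmetrics-profile.md", "sre-reliability-profile.md"],
    "mappings": ["opentelemetry.yaml", "prometheus-openmetrics.yaml",
                 "cloudevents.yaml", "sre-slo.yaml"],
}


def scaffold(repo_root: Path) -> list[Path]:
    """Create the observability layout under repo_root and return newly created files."""
    base = repo_root / BASE
    for sub in SUBDIRS:
        (base / sub).mkdir(parents=True, exist_ok=True)
    created = []
    for sub, names in SEED_FILES.items():
        for name in names:
            path = base / sub / name  # pathlib collapses the "." component
            if not path.exists():
                path.touch()
                created.append(path)
    return created
```

Running it twice is safe: the second run creates nothing and returns an empty list.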
25. Roadmap
Phase 1: Seed Stabilization
- Establish this standard as InfoTechCanonObservabilityModel.
- Add seed concepts, relationship vocabulary, patterns, and profiles.
- Define validation rules.
- Align with Landscape, Network, DevSecOps, Security, Data, Governance, Task, Access Control, and Tagging.
Phase 2: First Assimilations
Recommended first assimilations:
OpenTelemetry specification and semantic conventions
Prometheus / OpenMetrics
CloudEvents
W3C Trace Context
Google SRE SLO chapters
Grafana dashboard and alerting model
Prometheus Alertmanager
Kubernetes events and metrics
VPC Flow Logs / NetFlow / IPFIX
ITIL incident management concepts
Phase 3: Profile Maturation
- Mature Small SaaS Observability Profile.
- Mature OpenTelemetry Profile.
- Mature Prometheus / OpenMetrics Profile.
- Mature CloudEvents Profile.
- Mature SRE Reliability Profile.
- Mature Incident Observability Profile.
- Mature Network Observability Profile.
- Mature Security Observability Profile.
Phase 4: Tooling Integration
- Generate concept indexes.
- Generate agent brief.
- Create machine-readable YAML/JSON exports.
- Add validation scripts.
- Integrate telemetry pipelines, metrics, logs, traces, dashboards, alerts, incident tools, and service catalogs.
Phase 5: Operational Intelligence Loop
- Connect telemetry to canonical resources.
- Connect alerts to tasks and incidents.
- Connect SLOs to governance and service ownership.
- Connect deployment records to runtime health signals.
- Connect security detections to security incidents.
- Connect network flows to reachability and exposure.
- Connect post-incident observations to improvements and standard evolution.
26. Summary
The InfoTechCanon Observability Model is the seed standard for representing telemetry, signals, metrics, logs, traces, events, profiles, alerts, SLOs, health, incidents as observed phenomena, and operational evidence.
Its most important commitments are:
Separate telemetry, signal, metric, log, trace, span, event, profile, alert, and incident.
Link signals to canonical resources and landscape entities.
Treat SLOs, SLIs, error budgets, burn rates, and health states as first-class reliability concepts.
Use observability evidence to support governance, security, delivery, incident response, and operational review.
Map to OpenTelemetry, Prometheus/OpenMetrics, CloudEvents, SRE practices, and observability tools without surrendering internal semantic autonomy.
Use profiles to make the model practical for SaaS systems, OpenTelemetry, Prometheus, SRE reliability, incident response, network observability, and security observability.
This makes the Observability Model a core seed for runtime intelligence, production readiness, SRE practice, incident response, deployment verification, security detection, and agent-supported operations.