# InfoTechCanon Data Model

**Short Name:** `ITC-DATA`
**Document Status:** Seed Standard Release Candidate 1
**Version:** RC1-seed
**Date:** 2026-05-22
**Repository Context:** `info-tech-canon`
**Document Type:** InfoTechCanon Domain Standard
**Intended Audience:** Data architects, data engineers, data stewards, platform engineers, governance designers, security architects, application architects, product owners, knowledge-system builders, compliance reviewers, AI/analytics teams, and agentic tooling.

---

# 1. Purpose

The **InfoTechCanon Data Model** defines a canonical seed model for representing data as a managed, governed, discoverable, classifiable, lineage-bearing, quality-assessable, and reusable information asset.

It exists to give data its own canonical domain instead of leaving data semantics scattered across landscape, security, governance, DevSecOps, observability, and application models.

This standard provides a canonical vocabulary for:

- data domains,
- datasets,
- data products,
- data objects,
- records,
- fields,
- schemas,
- data elements,
- code lists,
- data stores as references,
- data flows,
- data lineage,
- data quality,
- metadata,
- catalogs,
- distributions,
- data services,
- data classification,
- sensitivity,
- residency,
- retention,
- processing purpose,
- data ownership and stewardship references,
- data contracts,
- and data evidence.

---

# 2. Position in InfoTechCanon

The Data Model is a **domain standard** within InfoTechCanon.

It depends on the existing seed standards as follows:

```text
Landscape = where data is stored, processed, moved, and exposed.
Organization = data owners, stewards, custodians, producers, consumers.
Governance = data policies, obligations, controls, evidence, exceptions.
Security = data exposure, data-security findings, data attack paths.
Access Control = permissions and grants to data resources.
Task = data-quality work, migration work, remediation, reviews.
Tagging = lightweight classification and retrieval.
Data = datasets, schemas, metadata, lineage, quality, classification, retention.
```

```text
InfoTechCanon
├── InfoTechCanonCore
├── InfoTechCanonLandscapeModel
├── InfoTechCanonOrganizationModel
├── InfoTechCanonGovernanceModel
├── InfoTechCanonTaskModel
├── InfoTechCanonTaggingStandard
├── InfoTechCanonAccessControlModel
├── InfoTechCanonSecurityModel
├── InfoTechCanonDataModel <-- this standard
├── InfoTechCanonDevSecOpsModel
├── InfoTechCanonNetworkModel
├── InfoTechCanonObservabilityModel
├── InfoTechCanonPatternLanguage
└── Application Profiles
```

---

# 3. Boundary with Adjacent Standards

## 3.1 Boundary with Landscape

The Landscape Model owns:

```text
DataStore
DatabaseInstance
ObjectBucket
FileShare
Queue
Cache
RuntimeResource
ApplicationService
IntegrationFlow
Endpoint
```

The Data Model owns:

```text
Dataset
DataProduct
DataObject
Schema
Field
DataElement
DataFlow
DataLineage
DataClassification
DataQualityRule
DataContract
DataDistribution
```

Boundary rule:

```text
Landscape owns the technical and runtime places where data lives or moves.
Data owns the semantic, structural, quality, classification, and lineage meaning of data.
```

## 3.2 Boundary with Governance

The Governance Model owns:

```text
Policy
Requirement
Obligation
Control
Risk
Exception
Evidence
Review
Approval
ComplianceRequirement
```

The Data Model owns data-specific structures that are governed:

```text
RetentionRuleReference
ProcessingPurpose
DataClassification
DataQualityRule
DataContract
DataLineage
```

Boundary rule:

```text
Governance defines why data must be governed.
Data defines what data is and how it is described, classified, measured, and traced.
```

## 3.3 Boundary with Security

The Security Model owns:

```text
DataSecurityFinding
ExposureFinding
CredentialExposure
SecurityIncident
AttackPath
Mitigation
```

The Data Model owns:

```text
Sensitivity
DataClassification
DataResidency
DataSubjectCategory
DataCategory
DataLineage
```

Security may use these concepts as inputs to posture analysis.

## 3.4 Boundary with Access Control

Access Control owns permissions, grants, authorization decisions, and enforcement.

Data owns data resources and classifications that access policies may use.

Example:

```text
Dataset classified_as Confidential
AccessPolicy permits Role to read Dataset
AuthorizationDecision permits read on Dataset
```

## 3.5 Boundary with Organization

Organization owns actors and responsibilities.

Data references Organization concepts for:

```text
DataOwner
DataSteward
DataCustodian
DataProducer
DataConsumer
DataTrustee
```

## 3.6 Boundary with DevSecOps

DevSecOps owns source, build, artifact, pipeline, release, deployment, SBOM, and attestation semantics.

Data owns data contracts, schema evolution, migration data, test data, synthetic data, lineage, and data-quality semantics.

---

# 4. Research Basis and External Alignment

This seed standard draws on multiple data-management bodies of knowledge.

## 4.1 DAMA-DMBOK

DAMA-DMBOK is a broad reference for data management disciplines including data governance, architecture, modeling, storage, security, integration, documents/content, reference/master data, warehousing/BI, metadata, and data quality. InfoTechCanon uses it as a broad mapping and assimilation target, not as a direct controlling model.

## 4.2 DCAT

W3C DCAT defines a vocabulary for data catalogs. DCAT Version 3 organizes catalog access around datasets, distributions, data services, and dataset series. This is highly relevant for InfoTechCanon catalog, dataset, distribution, and data-service concepts.

## 4.3 PROV-O

W3C PROV-O models provenance using entities, activities, and agents. This is highly relevant for data lineage, derivation, generation, transformation, and responsibility.

## 4.4 ISO/IEC 11179

ISO/IEC 11179 provides a metadata registry framework for data elements, naming, identification, definitions, classification, and registration. It is an important mapping target for data element, representation, data definition, code list, and metadata registry concepts.

## 4.5 Data Mesh and Data Products

Data product thinking emphasizes ownership, discoverability, quality, fitness for use, service-like interfaces, and domain responsibility. InfoTechCanon should support data products without requiring a specific data-mesh organizational model.

## 4.6 Data Contracts

Data contracts define expectations between producers and consumers around schema, semantics, quality, delivery, compatibility, ownership, and change management. They are critical for reliable information-processing systems.

## 4.7 Privacy and Data Protection Practice

Privacy and data-protection practice contributes concepts such as personal data, sensitive data, data subject, processing purpose, lawful basis, retention, residency, and minimization. The Data Model provides data semantics, while Governance owns legal obligations and Security owns data exposure and incident semantics.

---

# 5. Seed Standard Design Stance

This standard is a **seed standard**, not a full data-governance or database-design manual.

It shall:

1. define canonical data semantics,
2. distinguish data from storage infrastructure,
3. distinguish dataset, data product, data object, schema, field, and data element,
4. support data classification, lineage, quality, retention, residency, and processing purpose,
5. support catalog and discovery concepts,
6. support data contracts and schema evolution,
7. support operational, analytical, reference, master, event, and document data,
8. support mappings to external standards without becoming subordinate to them,
9. remain markdown-first and agent-retrievable,
10. and support future assimilation of data standards, platforms, regulations, and product schemas.

---

# 6. Scope

## 6.1 In Scope

This standard covers canonical representation of:

- data domains,
- data products,
- datasets,
- dataset series,
- data distributions,
- data services,
- data objects,
- entities,
- records,
- fields,
- attributes,
- data elements,
- schemas,
- schema versions,
- code lists,
- reference data,
- master data references,
- metadata,
- catalogs,
- data lineage,
- data flows,
- data transformations,
- data quality rules,
- data quality results,
- data contracts,
- data classification,
- sensitivity,
- confidentiality level,
- integrity expectation,
- availability expectation,
- retention rules as data semantics,
- data residency,
- data minimization,
- processing purpose,
- data subject categories,
- data provenance,
- data ownership and stewardship references,
- and data lifecycle states.

## 6.2 Out of Scope

This standard does not fully define:

- database engine internals,
- storage infrastructure,
- full data warehouse architecture,
- full analytics modeling,
- full privacy-law interpretation,
- full data-governance process,
- full security incident handling,
- all ontology modeling,
- all semantic-web representation,
- complete ETL/ELT implementation,
- or every vendor-specific data catalog schema.

Those may be mapped, assimilated, profiled, or handled by adjacent standards.

---

# 7. Normative Language

The following terms are used normatively:

- **SHALL** indicates a mandatory rule for conformance.
- **SHOULD** indicates a recommended practice.
- **MAY** indicates an optional capability.
- **MUST NOT** indicates a prohibited practice.
- **SEED** marks a concept defined provisionally here but open to later refinement.
- **EXTRACT** marks a concept that may later move to a more specialized standard.

---

# 8. Core Principles

## 8.1 Data Is Not Its Store

A dataset is not the same thing as a database, bucket, table, file, topic, or API.

Storage and runtime locations are Landscape concepts. Data semantics belong here.

## 8.2 Dataset Is Not Schema

A dataset may have one or more schemas, distributions, versions, contracts, lineage records, and quality expectations.

## 8.3 Schema Is Not Meaning

A schema describes structure. It does not fully define business meaning, ownership, usage constraints, quality, or purpose.

## 8.4 Classification Is First-Class

Data classification and sensitivity SHOULD be explicit where data has security, privacy, compliance, operational, or business significance.

## 8.5 Lineage Is Evidence-Carrying

Lineage SHOULD identify source data, transformations, activities, agents, and derived outputs with confidence and evidence where possible.

## 8.6 Data Quality Is Contextual

Data quality depends on intended use, domain meaning, contract expectations, and consumer needs.

## 8.7 Data Contracts Make Data Reliable

Producer-consumer expectations SHOULD be explicit when data is reused across system boundaries.

## 8.8 External Standards Are Mapped, Not Obeyed

The Data Model MAY map to DAMA-DMBOK, DCAT, PROV-O, ISO/IEC 11179, schema.org, OpenLineage, DataHub, OpenMetadata, dbt, Great Expectations, or similar standards and tools.

It MUST NOT subordinate its internal semantics to any single external model.

---

# 9. Canonical Seed Metadata

Every data artifact SHOULD support structured metadata.

Recommended front matter:

```yaml
---
id: itc-data:Dataset
type: concept
standard: InfoTechCanonDataModel
standard_version: RC1-seed
status: candidate
canonical_owner: InfoTechCanonDataModel
preferred_label: Dataset
related:
  - itc-data:DataProduct
  - itc-data:Schema
  - itc-data:DataDistribution
  - itc-data:DataLineage
mappings:
  - itc-map:dataset-to-dcat-dataset
---
```
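
As an illustration only, the recommended front matter could be checked mechanically once it has been parsed into a mapping. The sketch below is not part of the standard: the function name `validate_front_matter`, the choice to treat the scalar keys as required, and the namespace-prefix check on `id` are assumptions made for the example. The status values are the concept statuses recommended in this section.

```python
# Hypothetical helper: checks an already-parsed front-matter mapping.
# Treating these keys as required is an assumption of this sketch;
# the standard only recommends them.
REQUIRED_KEYS = {
    "id", "type", "standard", "standard_version",
    "status", "canonical_owner", "preferred_label",
}
CONCEPT_STATUSES = {
    "proposed", "experimental", "candidate",
    "canonical", "deprecated", "retired",
}


def validate_front_matter(meta: dict) -> list[str]:
    """Return a list of problems; an empty list means nothing was flagged."""
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(meta))]
    status = meta.get("status")
    if status is not None and status not in CONCEPT_STATUSES:
        problems.append(f"unknown status: {status}")
    identifier = meta.get("id", "")
    if identifier and ":" not in identifier:
        # Assumed convention: ids carry a prefix such as "itc-data:".
        problems.append(f"id lacks a namespace prefix: {identifier}")
    return problems


dataset_meta = {
    "id": "itc-data:Dataset",
    "type": "concept",
    "standard": "InfoTechCanonDataModel",
    "standard_version": "RC1-seed",
    "status": "candidate",
    "canonical_owner": "InfoTechCanonDataModel",
    "preferred_label": "Dataset",
}
problems = validate_front_matter(dataset_meta)
```

A check of this kind could run in a repository lint step so that malformed concept files are caught before review.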

Recommended artifact statuses:

```text
idea
draft
candidate
release-candidate
adopted
stable
deprecated
retired
```

Recommended concept statuses:

```text
proposed
experimental
candidate
canonical
deprecated
retired
```

---

# 10. Root Data Taxonomy

```text
DataEntity
├── DataAssetEntity
│   ├── DataDomain
│   ├── DataProduct
│   ├── Dataset
│   ├── DatasetSeries
│   ├── DataDistribution
│   ├── DataService
│   ├── DataObject
│   ├── Record
│   └── DocumentData
├── StructureEntity
│   ├── Schema
│   ├── SchemaVersion
│   ├── Field
│   ├── Attribute
│   ├── DataElement
│   ├── DataElementConcept
│   ├── Representation
│   ├── DataType
│   ├── Constraint
│   └── CodeList
├── SemanticEntity
│   ├── BusinessTerm
│   ├── GlossaryTerm
│   ├── ConceptualEntity
│   ├── DataDefinition
│   ├── ReferenceData
│   ├── MasterDataReference
│   └── CanonicalValue
├── GovernanceReferenceEntity
│   ├── DataClassification
│   ├── Sensitivity
│   ├── DataCategory
│   ├── DataSubjectCategory
│   ├── ProcessingPurpose
│   ├── RetentionRuleReference
│   ├── DataResidency
│   └── DataUsageConstraint
├── QualityEntity
│   ├── DataQualityDimension
│   ├── DataQualityRule
│   ├── DataQualityCheck
│   ├── DataQualityResult
│   ├── DataQualityIssue
│   └── FitnessForUse
├── LineageEntity
│   ├── DataFlow
│   ├── DataLineage
│   ├── Transformation
│   ├── Derivation
│   ├── SourceDataset
│   ├── TargetDataset
│   └── ProvenanceRecord
├── ContractEntity
│   ├── DataContract
│   ├── ProducerExpectation
│   ├── ConsumerExpectation
│   ├── CompatibilityRule
│   ├── BreakingChange
│   └── SchemaEvolutionPolicy
└── OperationalDataEntity
    ├── DataPipelineReference
    ├── DataStoreReference
    ├── QueryReference
    ├── DataAccessPattern
    ├── DataFreshness
    └── DataAvailability
```

---

# 11. Core Concepts

## 11.1 DataEntity

A **DataEntity** is any identifiable concept used to represent data, metadata, structure, classification, quality, lineage, contract, or data lifecycle.

Recommended attributes:

```yaml
id:
entity_type:
canonical_name:
display_name:
lifecycle_state:
source_system:
created_at:
updated_at:
```

Optional attributes:

```yaml
owner:
steward:
data_domain:
classification:
source_confidence:
valid_from:
valid_to:
tags:
external_references:
```

---

## 11.2 DataDomain

A **DataDomain** is a bounded area of data meaning, ownership, stewardship, or subject matter.

Examples:

```text
customer
billing
product
identity
orders
support
security
operations
finance
```

---

## 11.3 DataProduct

A **DataProduct** is a managed data asset or set of data assets offered for use by consumers with explicit ownership, quality expectations, documentation, interfaces, and lifecycle.

Recommended attributes:

```yaml
owner:
steward:
producer:
consumers:
service_level_expectations:
quality_expectations:
contract:
distribution_methods:
```

---

## 11.4 Dataset

A **Dataset** is a coherent collection of data published, managed, processed, analyzed, or consumed as a unit.

A dataset may have:

```text
schema
distribution
catalog entry
classification
lineage
quality rules
owner
steward
contract
retention expectation
```

Canonical rule:

```text
Dataset MUST NOT be treated as identical to its storage location.
```
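
One way to picture the canonical rule is to keep the dataset and its storage pointers as separate objects. The following is a minimal sketch, not a normative schema: the attribute names on `Dataset`, the `landscape_id` field, and all example identifiers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataStoreReference:
    """Pointer to a Landscape-owned store; the store itself is not modelled here."""
    landscape_id: str  # hypothetical identifier of a Landscape resource
    kind: str          # e.g. "table", "bucket", "topic"


@dataclass
class Dataset:
    """Semantic data asset; storage is referenced, never embedded."""
    id: str
    canonical_name: str
    classification: str
    owner: str
    stored_in: list[DataStoreReference] = field(default_factory=list)


orders = Dataset(
    id="itc-example:orders",
    canonical_name="orders",
    classification="confidential",
    owner="team-sales",
)
# The same dataset can live in more than one place without changing identity.
orders.stored_in.append(DataStoreReference("landscape:pg-main/orders", "table"))
orders.stored_in.append(DataStoreReference("landscape:lake/orders/", "bucket"))
```

Because classification and ownership sit on the dataset rather than on a store, moving or replicating the data leaves its meaning unchanged.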

---

## 11.5 DatasetSeries

A **DatasetSeries** is a sequence or family of related datasets organized over time, version, geography, domain, or release.

---

## 11.6 DataDistribution

A **DataDistribution** is an accessible representation of a dataset.

Examples:

```text
CSV file
Parquet file
API response
database table export
event stream
report download
object storage path
```

---

## 11.7 DataService

A **DataService** is a service that provides access to data or operations over data.

Examples:

```text
query API
data product API
metadata API
streaming endpoint
analytics service
```

---

## 11.8 DataObject

A **DataObject** is a meaningful object or structure represented in data.

Examples:

```text
Customer
Invoice
Order
Payment
Product
Device
UserProfile
AccessGrant
SecurityFinding
```

---

## 11.9 Record

A **Record** is an instance-level representation of data about an entity, event, relationship, or observation.

---

## 11.10 Field

A **Field** is a named component of a schema, record, message, or table.

---

## 11.11 Attribute

An **Attribute** is a property of a data object or conceptual entity.

A field may represent an attribute, but field is structural while attribute is semantic.

---

## 11.12 DataElement

A **DataElement** is a defined unit of data with meaning, representation, and expected usage.

It may map to ISO/IEC 11179 data element concepts.

Recommended attributes:

```yaml
object_class:
property:
representation:
data_type:
definition:
permitted_values:
```

---

## 11.13 DataElementConcept

A **DataElementConcept** is the semantic idea of a data element independent of representation.

Example:

```text
Customer birth date
Invoice total amount
Repository default branch name
```

---

## 11.14 Representation

A **Representation** describes how a data element is represented.

Examples:

```text
string
integer
decimal
boolean
date
timestamp
code
identifier
URI
```

---

## 11.15 DataType

A **DataType** specifies the technical or logical type of a field or data element.

---

## 11.16 Constraint

A **Constraint** is a rule limiting valid data.

Examples:

```text
required
unique
minimum
maximum
regex
foreign key
enum
format
cardinality
```

---

## 11.17 CodeList

A **CodeList** is a controlled set of allowed values with definitions.

Examples:

```text
country codes
currency codes
status codes
classification labels
risk levels
```

---

## 11.18 BusinessTerm

A **BusinessTerm** is a term used by domain actors to describe data meaning.

---

## 11.19 GlossaryTerm

A **GlossaryTerm** is a documented term in a glossary with definition, synonyms, ownership, and mappings.

---

## 11.20 DataDefinition

A **DataDefinition** is a textual or structured definition explaining the meaning, scope, and intended use of a data concept.

---

## 11.21 ReferenceData

**ReferenceData** is data used to classify, categorize, or constrain other data.

Examples:

```text
country list
currency list
product category list
status code list
business unit list
```

---

## 11.22 MasterDataReference

A **MasterDataReference** points to a controlled source of core business entities.

Examples:

```text
customer master
product master
supplier master
employee master
```

The Data Model references master-data semantics but does not require a specific MDM architecture.

---

## 11.23 DataClassification

A **DataClassification** is a classification assigned to data based on sensitivity, confidentiality, regulatory concern, operational criticality, or business significance.

Examples:

```text
public
internal
confidential
restricted
regulated
personal
sensitive personal
secret
```

---

## 11.24 Sensitivity

**Sensitivity** indicates potential harm, obligation, or restriction associated with data disclosure, modification, loss, misuse, or processing.

---

## 11.25 DataCategory

A **DataCategory** groups data by semantic, legal, operational, or analytical type.

Examples:

```text
personal data
financial data
health data
authentication data
transaction data
telemetry data
metadata
content data
```

---

## 11.26 DataSubjectCategory

A **DataSubjectCategory** identifies the kind of person or entity data is about.

Examples:

```text
customer
employee
applicant
supplier contact
child
patient
user
administrator
```

---

## 11.27 ProcessingPurpose

A **ProcessingPurpose** describes why data is collected, stored, transformed, shared, or used.

Examples:

```text
billing
support
security monitoring
analytics
product improvement
legal compliance
identity verification
```

---

## 11.28 RetentionRuleReference

A **RetentionRuleReference** links data to governance-defined retention obligations, policies, or rules.

The Data Model may model retention expectation, but Governance owns the policy and obligation.

---

## 11.29 DataResidency

**DataResidency** describes where data is stored, processed, transferred, or legally required to remain.

Examples:

```text
EU
Germany
customer region
cloud region
on-premises only
```

---

## 11.30 DataUsageConstraint

A **DataUsageConstraint** describes a restriction on how data may be used.

Examples:

```text
not for training
not for export
internal analytics only
production use prohibited
no cross-border transfer
only aggregated use
```

---

## 11.31 DataQualityDimension

A **DataQualityDimension** is an aspect of data quality.

Common dimensions:

```text
accuracy
completeness
consistency
timeliness
validity
uniqueness
freshness
integrity
fitness_for_use
```

---

## 11.32 DataQualityRule

A **DataQualityRule** is a testable expectation about data quality.

Examples:

```text
customer_id must not be null
invoice_total must be >= 0
country_code must be in ISO country code list
event_timestamp must be within expected delay window
```
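
To show what "testable" can mean in practice, the four example rules above can be written as simple predicates over a record. This is a sketch under stated assumptions: the record is a plain dictionary, `ISO_COUNTRY_CODES` is a three-entry illustrative subset rather than the real code list, and the one-hour `MAX_EVENT_DELAY` is an invented delay window.

```python
from datetime import datetime, timedelta, timezone

ISO_COUNTRY_CODES = {"DE", "FR", "US"}  # illustrative subset, not the full list
MAX_EVENT_DELAY = timedelta(hours=1)     # assumed expected delay window


def check_record(record: dict, now: datetime) -> dict[str, bool]:
    """Evaluate the four example rules against one record."""
    total = record.get("invoice_total")
    event_time = record.get("event_timestamp")
    return {
        "customer_id_not_null": record.get("customer_id") is not None,
        "invoice_total_non_negative": total is not None and total >= 0,
        "country_code_in_code_list": record.get("country_code") in ISO_COUNTRY_CODES,
        "event_timestamp_within_window": (
            event_time is not None
            and timedelta(0) <= now - event_time <= MAX_EVENT_DELAY
        ),
    }


now = datetime(2026, 5, 22, 12, 0, tzinfo=timezone.utc)
good = {
    "customer_id": "c-1",
    "invoice_total": 10.5,
    "country_code": "DE",
    "event_timestamp": now - timedelta(minutes=5),
}
bad = {
    "customer_id": None,
    "invoice_total": -3,
    "country_code": "XX",
    "event_timestamp": now - timedelta(hours=2),
}
good_results = check_record(good, now)
bad_results = check_record(bad, now)
```

Each key in the returned mapping corresponds to one rule, so a failing rule can be reported by name.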

---

## 11.33 DataQualityCheck

A **DataQualityCheck** is an execution of one or more data quality rules.

---

## 11.34 DataQualityResult

A **DataQualityResult** is the outcome of a data quality check.

---

## 11.35 DataQualityIssue

A **DataQualityIssue** is a finding indicating data does not meet a quality rule or fitness expectation.

It may create Task Model remediation work.

---

## 11.36 FitnessForUse

**FitnessForUse** is the degree to which data is suitable for a specific purpose or consumer context.

---

## 11.37 DataFlow

A **DataFlow** is movement or transfer of data between sources, systems, stores, services, actors, or processes.

---

## 11.38 DataLineage

**DataLineage** describes the origin, movement, transformation, derivation, and usage path of data.

Lineage may include:

```text
source dataset
transformation
activity
agent
target dataset
time
evidence
confidence
```
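
A lineage record with these parts can be treated as an edge in a graph, which makes questions such as "what does this dataset ultimately depend on?" answerable by traversal. The sketch below is illustrative only: `LineageEdge`, `upstream`, and the dataset names are hypothetical, and the `confidence` values are taken from the lineage confidence states in section 13.5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageEdge:
    source_dataset: str
    target_dataset: str
    transformation: str
    agent: str
    confidence: str  # e.g. "declared", "observed", "verified"
    evidence: str = ""


def upstream(dataset: str, edges: list[LineageEdge]) -> set[str]:
    """Return every dataset the given one is directly or transitively derived from."""
    found: set[str] = set()
    frontier = [dataset]
    while frontier:
        current = frontier.pop()
        for edge in edges:
            if edge.target_dataset == current and edge.source_dataset not in found:
                found.add(edge.source_dataset)
                frontier.append(edge.source_dataset)
    return found


edges = [
    LineageEdge("raw_orders", "clean_orders", "deduplicate", "etl-job", "observed"),
    LineageEdge("clean_orders", "revenue_report", "aggregate", "bi-job", "declared"),
    LineageEdge("customers", "revenue_report", "join", "bi-job", "declared"),
]
sources = upstream("revenue_report", edges)
```

The `found` set also keeps the traversal from looping if the recorded lineage happens to contain a cycle.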

---

## 11.39 Transformation

A **Transformation** is an activity that changes data structure, content, format, aggregation, classification, or meaning.

---

## 11.40 Derivation

A **Derivation** is a relationship where one data entity is derived from another.

---

## 11.41 ProvenanceRecord

A **ProvenanceRecord** records information about how data came to exist, who or what generated it, what activity produced it, and what source influenced it.

---

## 11.42 DataContract

A **DataContract** is an explicit agreement between data producers and consumers about data structure, semantics, quality, delivery, compatibility, ownership, and change expectations.

---

## 11.43 ProducerExpectation

A **ProducerExpectation** describes what a data producer commits to provide.

Examples:

```text
schema stability
freshness
completeness
availability
documentation
change notice
```

---

## 11.44 ConsumerExpectation

A **ConsumerExpectation** describes what a data consumer expects or is allowed to assume.

---

## 11.45 CompatibilityRule

A **CompatibilityRule** describes what changes are considered compatible or breaking.

---

## 11.46 BreakingChange

A **BreakingChange** is a data, schema, semantic, quality, or delivery change that violates consumer expectations or compatibility rules.
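
As a hedged illustration of how a CompatibilityRule might be evaluated, the sketch below compares two schema versions expressed as `{field_name: data_type}` mappings. The rule it encodes (removing a field or changing its type is breaking, adding a field is compatible) is an assumption chosen for the example; the standard does not prescribe it, and `find_breaking_changes` is a hypothetical name.

```python
def find_breaking_changes(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Compare two schema versions given as {field_name: data_type}.

    Assumed rule for this sketch: removed fields and changed types are
    breaking; newly added fields are treated as compatible.
    """
    breaking = []
    for name, old_type in old.items():
        if name not in new:
            breaking.append(f"field removed: {name}")
        elif new[name] != old_type:
            breaking.append(f"type changed: {name} ({old_type} -> {new[name]})")
    return breaking


schema_v1 = {"order_id": "string", "total": "decimal", "note": "string"}
schema_v2 = {"order_id": "string", "total": "integer", "created_at": "timestamp"}
changes = find_breaking_changes(schema_v1, schema_v2)
```

A real contract would usually also cover semantic, quality, and delivery changes, which a purely structural comparison like this cannot detect.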

---

## 11.47 SchemaEvolutionPolicy

A **SchemaEvolutionPolicy** defines rules for how schemas may change over time.

---

## 11.48 DataStoreReference

A **DataStoreReference** points to a Landscape data store or storage resource.

Examples:

```text
database
table
bucket
file share
topic
queue
index
warehouse
lakehouse table
```

---

## 11.49 DataAccessPattern

A **DataAccessPattern** describes how data is accessed.

Examples:

```text
batch export
API query
event stream
direct database query
file download
replication
analytics dashboard
```

---

## 11.50 DataFreshness

**DataFreshness** describes how current data is relative to a defined expectation.

---

## 11.51 DataAvailability

**DataAvailability** describes whether data is accessible according to expectations.

---

# 12. Core Relationship Vocabulary

Recommended root relationship types:

```text
contains
part_of
describes
classified_as
has_schema
has_field
has_distribution
provided_by
consumed_by
stored_in
accessed_via
flows_to
derived_from
generated_by
transformed_by
governed_by
constrained_by
subject_to
owned_by
stewarded_by
produced_by
validated_by
violates
satisfies
maps_to
```

Relationship records SHOULD support:

```yaml
id:
relationship_type:
source_entity:
target_entity:
scope:
valid_from:
valid_to:
source_system:
confidence:
evidence:
rationale:
```
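
The relationship record can be sketched as a small typed structure that rejects relationship types outside the root vocabulary. This is an illustration, not a reference implementation: the class name `Relationship`, the decision to make everything beyond the first four fields optional, and the example identifiers are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Root relationship types as listed in this section.
RELATIONSHIP_TYPES = {
    "contains", "part_of", "describes", "classified_as", "has_schema",
    "has_field", "has_distribution", "provided_by", "consumed_by",
    "stored_in", "accessed_via", "flows_to", "derived_from", "generated_by",
    "transformed_by", "governed_by", "constrained_by", "subject_to",
    "owned_by", "stewarded_by", "produced_by", "validated_by", "violates",
    "satisfies", "maps_to",
}


@dataclass
class Relationship:
    id: str
    relationship_type: str
    source_entity: str
    target_entity: str
    scope: Optional[str] = None
    valid_from: Optional[str] = None
    valid_to: Optional[str] = None
    source_system: Optional[str] = None
    confidence: Optional[str] = None
    evidence: Optional[str] = None
    rationale: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject types that are not part of the root vocabulary.
        if self.relationship_type not in RELATIONSHIP_TYPES:
            raise ValueError(f"unknown relationship type: {self.relationship_type}")


rel = Relationship(
    id="rel-1",
    relationship_type="classified_as",
    source_entity="itc-example:orders",
    target_entity="itc-example:confidential",
    confidence="declared",
)
```

A profile that needs additional relationship types would have to extend the allowed set explicitly, which keeps such extensions visible.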

---

# 13. Data State Models

## 13.1 Dataset Lifecycle States

```text
proposed
designed
active
deprecated
retired
archived
deleted
```

## 13.2 Schema States

```text
draft
candidate
active
deprecated
superseded
retired
```

## 13.3 Data Quality States

```text
unknown
unchecked
passing
warning
failing
waived
remediating
verified
```

## 13.4 Data Contract States

```text
draft
under_review
active
violated
deprecated
superseded
retired
```

## 13.5 Lineage Confidence States

```text
unknown
declared
inferred
observed
verified
conflicting
```

---

# 14. Data Patterns

## 14.1 Pattern: Data Is Not Its Store

**Context:** Teams model data by pointing at tables, buckets, or files.

**Problem:** Storage location does not explain semantic meaning, ownership, classification, quality, or lineage.

**Solution:** Model Dataset, Schema, DataDistribution, DataStoreReference, and DataLineage separately.

---

## 14.2 Pattern: Dataset Catalog Entry

**Context:** Data consumers need to discover and understand data.

**Problem:** Data assets remain invisible or are known only through tribal knowledge.

**Solution:** Provide a catalog entry with:

```text
dataset name
description
owner
steward
classification
schema
distribution
quality expectations
lineage
access method
usage constraints
```

---

## 14.3 Pattern: Data Contract at Boundary

**Context:** Data crosses a team, service, product, or system boundary.

**Problem:** Consumers break when producers change data unexpectedly.

**Solution:** Define a DataContract with schema, semantic expectations, quality rules, compatibility rules, and change process.

---

## 14.4 Pattern: Classification Drives Controls

**Context:** Data has different sensitivity and obligations.

**Problem:** Systems apply uniform controls or rely on ad hoc judgment.

**Solution:** Classify data and map classifications to governance controls, access policies, security measures, and retention expectations.

---

## 14.5 Pattern: Lineage as Evidence

**Context:** A derived dataset is used for decisions or compliance.

**Problem:** Consumers cannot determine origin, transformations, or trustworthiness.

**Solution:** Model lineage with source datasets, transformations, activities, agents, target datasets, and evidence.

---

## 14.6 Pattern: Quality Rule to Remediation

**Context:** Data quality checks fail.

**Problem:** Failures stay on dashboards instead of triggering corrective action.

**Solution:**

```text
DataQualityRule
-> DataQualityCheck
-> DataQualityResult
-> DataQualityIssue
-> RemediationTask
-> VerificationEvidence
```
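
The chain above can be sketched end to end in a few lines. Everything here is a simplified assumption: the class shapes, the `run_check` and `to_task` helpers, and the example rule id are hypothetical, the DataQualityIssue is folded into the task description for brevity, and VerificationEvidence is left out.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DataQualityRule:
    id: str
    description: str
    predicate: Callable[[dict], bool]  # returns True when a record passes


@dataclass
class DataQualityResult:
    rule_id: str
    checked: int
    failed: int


@dataclass
class RemediationTask:
    issue: str
    status: str = "open"


def run_check(rule: DataQualityRule, records: list[dict]) -> DataQualityResult:
    """Execute one rule over a batch of records (the DataQualityCheck step)."""
    failed = sum(1 for record in records if not rule.predicate(record))
    return DataQualityResult(rule.id, len(records), failed)


def to_task(result: DataQualityResult) -> Optional[RemediationTask]:
    """Turn a failing result into remediation work; passing results need none."""
    if result.failed == 0:
        return None
    return RemediationTask(
        issue=f"{result.rule_id}: {result.failed} of {result.checked} records failed"
    )


rule = DataQualityRule(
    id="customer-id-not-null",
    description="customer_id must not be null",
    predicate=lambda record: record.get("customer_id") is not None,
)
records = [{"customer_id": "c-1"}, {"customer_id": None}, {}]
result = run_check(rule, records)
task = to_task(result)
```

In a fuller model the task would be handed to the Task Model, and closing it would attach the verification evidence.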
|
|
|
|
---
|
|
|
|
## 14.7 Pattern: Semantic Term and Field Split
|
|
|
|
**Context:** Database columns are treated as business terms.
|
|
|
|
**Problem:** Field names do not fully encode business meaning.
|
|
|
|
**Solution:** Link Field to DataElement, BusinessTerm, and DataDefinition.
|
|
|
|
---

## 14.8 Pattern: Retention with Governance Reference

**Context:** Data must be kept or deleted according to obligations.

**Problem:** Retention is encoded as undocumented operational behavior.

**Solution:** Link Dataset or DataObject to RetentionRuleReference and keep the governing obligation in Governance.

---

# 15. Data Profiles

## 15.1 Profile Format

A Data Profile SHALL declare:

```yaml
id:
profile_name:
status:
implements:
  - InfoTechCanonDataModel
target_context:
included_concepts:
required_relationships:
required_metadata:
state_model:
source_of_truth_rules:
mapping_files:
validation_rules:
examples:
known_deviations:
```

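
A profile declaration can be checked for completeness once it has been loaded into a dictionary. The sketch below takes the key names from the format above; the helper `missing_profile_keys` and the draft profile values are hypothetical.

```python
REQUIRED_PROFILE_KEYS = (
    "id", "profile_name", "status", "implements", "target_context",
    "included_concepts", "required_relationships", "required_metadata",
    "state_model", "source_of_truth_rules", "mapping_files",
    "validation_rules", "examples", "known_deviations",
)


def missing_profile_keys(profile: dict) -> list[str]:
    """Return the declared-format keys that the profile does not contain."""
    return [key for key in REQUIRED_PROFILE_KEYS if key not in profile]


# Hypothetical, deliberately incomplete draft profile.
draft = {
    "id": "profile:small-saas",
    "profile_name": "Small SaaS Data Profile",
    "status": "draft",
    "implements": ["InfoTechCanonDataModel"],
}
print(missing_profile_keys(draft))
```
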
---

## 15.2 Seed Profile: Small SaaS Data Profile

Purpose:

```text
Provide a minimal data model for a small SaaS platform moving toward production readiness.
```

Included concepts:

```text
DataDomain
Dataset
DataObject
Schema
Field
DataClassification
DataStoreReference
DataFlow
DataQualityRule
RetentionRuleReference
DataOwnerReference
DataStewardReference
```

Required relationships:

```text
Dataset has_schema Schema
Schema has_field Field
Dataset classified_as DataClassification
Dataset stored_in DataStoreReference
Dataset owned_by DataOwnerReference
Dataset stewarded_by DataStewardReference
DataFlow moves Dataset
RetentionRuleReference applies_to Dataset
```

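
The required relationships lend themselves to a simple completeness check over subject-predicate-object triples. This sketch covers only the relationships in which Dataset is the subject; the triple representation, identifiers, and the helper name are assumptions.

```python
# Relationships from the profile in which Dataset is the subject.
REQUIRED_DATASET_RELATIONSHIPS = {
    "has_schema", "classified_as", "stored_in", "owned_by", "stewarded_by",
}


def missing_relationships(dataset_id: str, triples: list[tuple[str, str, str]]) -> list[str]:
    """Return required relationship types that the dataset does not yet have."""
    present = {predicate for subject, predicate, _obj in triples if subject == dataset_id}
    return sorted(REQUIRED_DATASET_RELATIONSHIPS - present)


# Hypothetical identifiers for illustration.
triples = [
    ("ds:orders", "has_schema", "schema:orders-v1"),
    ("ds:orders", "stored_in", "store:orders-db"),
    ("ds:orders", "owned_by", "team:commerce"),
]
print(missing_relationships("ds:orders", triples))
```
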
---

## 15.3 Seed Profile: Data Catalog Profile

Purpose:

```text
Represent data catalog entries for discoverability and reuse.
```

Included concepts:

```text
Catalog
Dataset
DatasetSeries
DataDistribution
DataService
DataOwnerReference
DataStewardReference
DataClassification
DataQualitySummary
DataLineageSummary
```

Mapping targets:

```text
DCAT
DCAT-AP
DataHub
OpenMetadata
Amundsen
Collibra / catalog tools
```

---

## 15.4 Seed Profile: Data Contract Profile

Purpose:

```text
Represent data producer-consumer agreements.
```

Included concepts:

```text
DataContract
ProducerExpectation
ConsumerExpectation
Schema
SchemaVersion
DataQualityRule
CompatibilityRule
BreakingChange
ChangeNotice
DataContractViolation
```

Required relationships:

```text
DataContract applies_to Dataset
ProducerExpectation constrains Producer
ConsumerExpectation informs Consumer
CompatibilityRule governs SchemaEvolution
BreakingChange violates DataContract
```

---

## 15.5 Seed Profile: Data Lineage Profile

Purpose:

```text
Represent lineage across datasets, transformations, pipelines, and systems.
```

Included concepts:

```text
Dataset
SourceDataset
TargetDataset
Transformation
DataFlow
DataLineage
ProvenanceRecord
DataPipelineReference
ActivityReference
AgentReference
```

Mapping targets:

```text
PROV-O
OpenLineage
Marquez
dbt exposures/models/sources
DataHub lineage
```

---

## 15.6 Seed Profile: Privacy-Relevant Data Profile

Purpose:

```text
Represent data concepts relevant to privacy, data protection, retention, and processing.
```

Included concepts:

```text
PersonalDataCategory
SensitiveDataCategory
DataSubjectCategory
ProcessingPurpose
DataResidency
RetentionRuleReference
DataUsageConstraint
DataMinimizationExpectation
```

Governance owns legal obligations and lawful-basis interpretation.

---

## 15.7 Seed Profile: Analytics Dataset Profile

Purpose:

```text
Represent analytical datasets, metrics, dimensions, facts, models, and reports.
```

Included concepts:

```text
Dataset
Metric
Dimension
Fact
Measure
AggregationRule
ReportReference
DashboardReference
DataQualityRule
FreshnessExpectation
```

---

# 16. Mapping Model for the Data Standard

Mappings relate InfoTechCanon data concepts to external standards, frameworks, products, and regulations.

## 16.1 Mapping Types

Recommended mapping types:

```text
exactMatch
closeMatch
broadMatch
narrowMatch
relatedMatch
conflictMatch
gapMatch
derivedFrom
regulatoryReference
toolEquivalent
```

## 16.2 Mapping Record

Example:

```yaml
id: itc-map:dataset-to-dcat-dataset
source_concept: itc-data:Dataset
target_body: W3C DCAT
target_version: "3"
target_concept: dcat:Dataset
mapping_type: closeMatch
scope:
  - data catalog interoperability
not_valid_for:
  - all internal schema semantics
  - all data product lifecycle semantics
rationale: >
  DCAT Dataset is a strong catalog-oriented match for InfoTechCanon Dataset,
  but InfoTechCanon includes additional governance, quality, contract,
  and lineage expectations that may not be required by DCAT.
confidence: high
status: candidate
owner: InfoTechCanonDataModel
```

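
Mapping records can be linted before they are merged. The sketch below checks a record, already parsed into a dictionary, against the recommended mapping types. Which keys are mandatory is an assumption made here for illustration, since the standard does not state it, and `mapping_record_errors` is a hypothetical helper.

```python
MAPPING_TYPES = {
    "exactMatch", "closeMatch", "broadMatch", "narrowMatch", "relatedMatch",
    "conflictMatch", "gapMatch", "derivedFrom", "regulatoryReference", "toolEquivalent",
}
# Assumption: these keys are treated as mandatory for the purpose of this sketch.
REQUIRED_KEYS = ("id", "source_concept", "target_body", "target_concept", "mapping_type")


def mapping_record_errors(record: dict) -> list[str]:
    """Return a list of problems found in a parsed mapping record."""
    errors = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in record]
    mapping_type = record.get("mapping_type")
    if mapping_type is not None and mapping_type not in MAPPING_TYPES:
        errors.append(f"unknown mapping_type: {mapping_type}")
    return errors


record = {
    "id": "itc-map:dataset-to-dcat-dataset",
    "source_concept": "itc-data:Dataset",
    "target_body": "W3C DCAT",
    "target_concept": "dcat:Dataset",
    "mapping_type": "closeMatch",
}
print(mapping_record_errors(record))
```
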
## 16.3 Seed Mapping Targets

The Data Model SHOULD maintain mappings to:

```text
DAMA-DMBOK
W3C DCAT 3
DCAT-AP
W3C PROV-O
ISO/IEC 11179
schema.org Dataset
OpenLineage
DataHub metadata model
OpenMetadata
dbt sources/models/exposures
Great Expectations
Apache Atlas
Collibra / data catalog concepts
GDPR / privacy-regulation references
Dublin Core metadata
SPDX / CycloneDX data references where relevant
```

---

# 17. Assimilation Hooks

The Data Model SHALL be able to receive new data standards, platforms, regulations, product schemas, and practices through the InfoTechCanon assimilation process.

## 17.1 Assimilation Triggers

Assimilation may be triggered by:

```text
new data catalog model
new data lineage standard
new metadata registry standard
new privacy regulation
new data-quality tool
new data-contract practice
new data-product pattern
new analytics modeling method
new data platform integration
new recurring data classification conflict
```

## 17.2 Data Assimilation Output

A data assimilation SHOULD produce:

```text
source summary
extracted data concepts
concept comparison matrix
gap list
conflict list
mapping file
candidate new concepts
candidate relationship changes
candidate pattern changes
candidate profile changes
open questions
```

## 17.3 Recommended First Assimilation Candidates

```text
W3C DCAT 3
PROV-O
ISO/IEC 11179
DAMA-DMBOK
OpenLineage
DataHub
OpenMetadata
Great Expectations
dbt semantic layer / metadata
GDPR data categories and processing concepts
```

---

# 18. Integration with Other InfoTechCanon Standards

## 18.1 Landscape Model

Data references Landscape concepts for:

```text
data store
database
bucket
queue
topic
pipeline
runtime service
application service
endpoint
environment
```

## 18.2 Organization Model

Data imports organization concepts for:

```text
data owner
data steward
data custodian
data producer
data consumer
data trustee
responsible team
```

## 18.3 Governance Model

Data imports governance concepts for:

```text
policy
retention requirement
processing obligation
control
exception
evidence
review
compliance requirement
```

## 18.4 Security Model

Security imports data concepts for:

```text
classification
sensitivity
data category
data subject category
data exposure
residency
data security finding
```

## 18.5 Access Control Model

Access Control imports data concepts for:

```text
dataset
data object
data classification
data usage constraint
data access pattern
```

## 18.6 Task Model

Data creates or references tasks such as:

```text
data-quality remediation
schema migration
contract review
lineage clarification
classification review
retention cleanup
data incident investigation
```

## 18.7 Tagging Standard

Tagging supports data discovery and classification but must not replace data classification, schema, lineage, quality, or governance records.

---

# 19. Canon Interface Card Usage

Subsystems that implement or produce data knowledge SHOULD publish a Canon Interface Card.

Example:

```yaml
subsystem: data-catalog-importer
implements:
  - InfoTechCanonDataModel
  - DataCatalogProfile
produces:
  - Dataset
  - Schema
  - Field
  - DataDistribution
  - DataOwnerReference
consumes:
  - Team
  - DataStoreReference
  - Policy
relations:
  - Dataset has_schema Schema
  - Schema has_field Field
  - Dataset stored_in DataStoreReference
  - Dataset owned_by Team
source_of_truth:
  dataset_catalog_entries: data-catalog
known_deviations:
  - lineage is summary-only
  - data quality checks are imported from separate system
```

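
A card can be checked for internal consistency: every concept named in `relations` should appear under `produces` or `consumes`. That rule is an assumption made for this sketch rather than a stated requirement, and `undeclared_concepts` is a hypothetical helper operating on the card after it has been parsed into a dictionary.

```python
def undeclared_concepts(card: dict) -> list[str]:
    """Return concepts used in relations that the card neither produces nor consumes."""
    declared = set(card["produces"]) | set(card["consumes"])
    missing: set[str] = set()
    for relation in card["relations"]:
        # Each relation is written as "<Subject> <predicate> <Object>".
        subject, _predicate, obj = relation.split()
        missing.update({subject, obj} - declared)
    return sorted(missing)


card = {
    "subsystem": "data-catalog-importer",
    "produces": ["Dataset", "Schema", "Field", "DataDistribution", "DataOwnerReference"],
    "consumes": ["Team", "DataStoreReference", "Policy"],
    "relations": [
        "Dataset has_schema Schema",
        "Schema has_field Field",
        "Dataset stored_in DataStoreReference",
        "Dataset owned_by Team",
    ],
}
print(undeclared_concepts(card))
```
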
---

# 20. Retrieval Requirements

The Data Model is designed for markdown-based infospaces.

## 20.1 Required Retrieval Properties

Every major concept SHOULD provide:

- stable heading,
- stable identifier,
- short definition,
- longer explanation,
- examples,
- distinction notes,
- relationship examples,
- mapping hooks,
- profile references,
- and common mistakes.

## 20.2 Agent Brief

A mature Data Model SHOULD include an `agent-brief.md` file with:

```text
purpose
scope
owned concepts
imported concepts
core distinctions
do / do not rules
relationship patterns
minimal examples
common mistakes
profile list
mapping list
```

## 20.3 Indexes

The data information space SHOULD provide indexes by:

```text
concept
relationship
data domain
dataset
schema
field
classification
quality rule
lineage
contract
profile
pattern
mapping target
status
source system
```

---

# 21. Conformance Levels

## 21.1 Reference-Conformant

A document or system is reference-conformant if it uses Data Model terminology consistently but does not implement structured metadata or validation rules.

## 21.2 Metadata-Conformant

A system is metadata-conformant if it uses stable identifiers, concept names, lifecycle states, source metadata, and relationship types.

## 21.3 Catalog-Conformant

A system is catalog-conformant if datasets, distributions, data services, owners, stewards, descriptions, and classifications are represented.

## 21.4 Lineage-Conformant

A system is lineage-conformant if it represents data sources, transformations, targets, provenance, and confidence.

## 21.5 Quality-Conformant

A system is quality-conformant if it represents data quality rules, checks, results, and issues.

## 21.6 Contract-Conformant

A system is contract-conformant if producer and consumer expectations are represented as DataContracts.

## 21.7 Profile-Conformant

A system is profile-conformant if it implements a declared Data Profile and passes its validation rules.

## 21.8 Assimilation-Conformant

A system or repository is assimilation-conformant if it can accept external data concepts through the InfoTechCanon assimilation workflow and produce mappings, gaps, conflicts, and proposed changes.

---

# 22. Validation Rules

Initial validation rules:

```text
VAL-DATA-001: Dataset SHOULD NOT be modeled as identical to DataStoreReference.

VAL-DATA-002: Dataset SHOULD have owner or steward reference when used for operational or governed purposes.

VAL-DATA-003: Dataset SHOULD have classification when it may contain sensitive, regulated, operationally critical, or business-critical data.

VAL-DATA-004: Schema SHOULD have version when used across system boundaries.

VAL-DATA-005: Field SHOULD be distinguishable from DataElement where semantic precision matters.

VAL-DATA-006: DataQualityRule SHOULD declare the dataset, field, or data object it applies to.

VAL-DATA-007: DataQualityResult SHOULD reference the executed rule and check.

VAL-DATA-008: DataLineage SHOULD distinguish declared, inferred, observed, and verified lineage.

VAL-DATA-009: DataContract SHOULD declare producer, consumer, dataset, schema or semantic expectations, quality expectations, and compatibility rules where applicable.

VAL-DATA-010: BreakingChange SHOULD reference the DataContract or CompatibilityRule it violates.

VAL-DATA-011: RetentionRuleReference SHOULD point to Governance concepts rather than embedding legal interpretation in Data.

VAL-DATA-012: DataResidency SHOULD reference region, jurisdiction, environment, or storage/processing scope where available.

VAL-DATA-013: Tags MUST NOT replace DataClassification, Schema, Lineage, Quality, or Contract records.

VAL-DATA-014: External data concepts SHOULD be represented through mapping records rather than silently reused.

VAL-DATA-015: Profiles MUST NOT redefine canonical concepts. They may constrain them.

VAL-DATA-016: Data used for AI training, analytics, or automation SHOULD declare usage constraints and provenance where relevant.
```

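
Several of these rules can be turned into automated checks. The following sketch covers VAL-DATA-002, VAL-DATA-003, and VAL-DATA-004 only. The dataset dictionary shape and the flag names (`governed`, `may_contain_sensitive`, `crosses_boundary`) are assumptions made for illustration. Because the rules are SHOULD-level, the findings are best read as warnings.

```python
def validate_dataset(dataset: dict) -> list[str]:
    """Return the identifiers of the validation rules the dataset record does not meet."""
    findings = []
    # VAL-DATA-002: governed or operational datasets should name an owner or steward.
    if dataset.get("governed") and not (dataset.get("owner") or dataset.get("steward")):
        findings.append("VAL-DATA-002")
    # VAL-DATA-003: potentially sensitive datasets should carry a classification.
    if dataset.get("may_contain_sensitive") and not dataset.get("classification"):
        findings.append("VAL-DATA-003")
    # VAL-DATA-004: schemas used across system boundaries should be versioned.
    schema = dataset.get("schema") or {}
    if dataset.get("crosses_boundary") and not schema.get("version"):
        findings.append("VAL-DATA-004")
    return findings


# Hypothetical dataset record.
orders = {
    "id": "ds:orders",
    "governed": True,
    "steward": "team:commerce",
    "may_contain_sensitive": True,
    "crosses_boundary": True,
    "schema": {"name": "orders"},
}
print(validate_dataset(orders))
```
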
---

# 23. Anti-Patterns

## 23.1 Table Equals Dataset

Treating every table as a complete dataset and every dataset as a table.

## 23.2 Schema Equals Meaning

Assuming column names and types fully define business meaning.

## 23.3 Classification by Tag Only

Using tags such as `confidential` without a governed DataClassification record.

## 23.4 Lineage by Diagram Only

Drawing flows without source, transformation, target, evidence, or confidence.

## 23.5 Quality Dashboard Graveyard

Tracking quality failures without owners, tasks, remediation, or fitness-for-use decisions.

## 23.6 Contract-Free Integration

Letting consumers depend on producer data without explicit compatibility expectations.

## 23.7 Hidden Retention Logic

Deleting or keeping data based on undocumented scripts or tribal knowledge.

## 23.8 Catalog Without Trust

Cataloging datasets without owner, freshness, classification, quality, or lineage.

## 23.9 Privacy in Free Text

Recording processing purpose, data subject category, residency, or sensitivity as unstructured notes only.

## 23.10 Vendor Model Capture

Letting one data catalog, warehouse, or governance product define the internal data model.

---

# 24. Initial Repository Placement

Recommended repository layout:

```text
info-tech-canon/
  standards/
    data/
      InfoTechCanonDataModel.md
      agent-brief.md
      concepts/
      relationships/
      patterns/
      profiles/
      mappings/
      assimilation/
      examples/
      validation/
```

Seed files:

```text
standards/data/InfoTechCanonDataModel.md
standards/data/agent-brief.md
standards/data/concepts/dataset.md
standards/data/concepts/data-product.md
standards/data/concepts/schema.md
standards/data/concepts/data-element.md
standards/data/concepts/data-classification.md
standards/data/concepts/data-lineage.md
standards/data/concepts/data-quality-rule.md
standards/data/concepts/data-contract.md
standards/data/patterns/data-is-not-its-store.md
standards/data/patterns/dataset-catalog-entry.md
standards/data/patterns/data-contract-at-boundary.md
standards/data/patterns/lineage-as-evidence.md
standards/data/profiles/small-saas-data-profile.md
standards/data/profiles/data-catalog-profile.md
standards/data/profiles/data-contract-profile.md
standards/data/profiles/data-lineage-profile.md
standards/data/mappings/dcat.yaml
standards/data/mappings/prov-o.yaml
standards/data/mappings/iso-11179.yaml
standards/data/mappings/dama-dmbok.yaml
```

---

# 25. Roadmap

## Phase 1: Seed Stabilization

- Establish this standard as `InfoTechCanonDataModel`.
- Add seed concepts, relationship vocabulary, patterns, and profiles.
- Define validation rules.
- Align with Landscape, Governance, Security, Access Control, Task, and Tagging.

## Phase 2: First Assimilations

Recommended first assimilations:

```text
W3C DCAT 3
PROV-O
ISO/IEC 11179
DAMA-DMBOK
OpenLineage
DataHub
OpenMetadata
Great Expectations
dbt metadata
GDPR data category concepts
```

## Phase 3: Profile Maturation

- Mature Small SaaS Data Profile.
- Mature Data Catalog Profile.
- Mature Data Contract Profile.
- Mature Data Lineage Profile.
- Mature Privacy-Relevant Data Profile.
- Mature Analytics Dataset Profile.

## Phase 4: Tooling Integration

- Generate concept indexes.
- Generate agent brief.
- Create machine-readable YAML/JSON exports.
- Add validation scripts.
- Integrate data catalog, lineage, data-quality, schema registry, and contract tooling.

## Phase 5: Data Intelligence Loop

- Connect datasets to services and repositories.
- Connect classification to access control and security.
- Connect quality issues to tasks.
- Connect lineage to provenance and assurance.
- Connect data contracts to DevSecOps and release workflows.
- Connect privacy and retention to governance obligations.

---

# 26. Summary

The InfoTechCanon Data Model is the seed standard for representing data as a managed, governed, discoverable, reusable, classifiable, lineage-bearing, and quality-assessable asset.

Its most important commitments are:

```text
Separate data from storage.

Separate dataset, schema, field, data element, data object, and data product.

Treat classification, lineage, quality, retention, residency, and processing purpose as first-class concerns.

Use data contracts at producer-consumer boundaries.

Import governance, access-control, security, task, tagging, organization, and landscape concepts
instead of redefining them.

Map to DCAT, PROV-O, ISO/IEC 11179, DAMA-DMBOK, OpenLineage, and catalog tools
without surrendering internal semantic autonomy.

Use profiles to make the model practical for SaaS systems, catalogs, contracts,
lineage, privacy-relevant data, analytics, and AI/agentic workflows.
```

This makes the Data Model a core seed for information architecture, data governance, security posture, AI readiness, analytics reliability, and interoperable information-processing systems.