tegwick 9883a99f78 Implement infospace scaffold and service baseline

2026-05-23 03:12:02 +02:00

InfoTechCanon Data Model

Short Name: ITC-DATA
Document Status: Seed Standard Release Candidate 1
Version: RC1-seed
Date: 2026-05-22
Repository Context: info-tech-canon
Document Type: InfoTechCanon Domain Standard
Intended Audience: Data architects, data engineers, data stewards, platform engineers, governance designers, security architects, application architects, product owners, knowledge-system builders, compliance reviewers, AI/analytics teams, and agentic tooling.

1. Purpose

The InfoTechCanon Data Model defines a canonical seed model for representing data as a managed, governed, discoverable, classifiable, lineage-bearing, quality-assessable, and reusable information asset.

It exists to give data its own canonical domain instead of leaving data semantics scattered across landscape, security, governance, DevSecOps, observability, and application models.

This standard provides a canonical vocabulary for:

data domains,
datasets,
data products,
data objects,
records,
fields,
schemas,
data elements,
code lists,
data stores as references,
data flows,
data lineage,
data quality,
metadata,
catalogs,
distributions,
data services,
data classification,
sensitivity,
residency,
retention,
processing purpose,
data ownership and stewardship references,
data contracts,
and data evidence.

2. Position in InfoTechCanon

The Data Model is a domain standard within InfoTechCanon.

It depends on the existing seed standards as follows:

Landscape      = where data is stored, processed, moved, and exposed.
Organization   = data owners, stewards, custodians, producers, consumers.
Governance     = data policies, obligations, controls, evidence, exceptions.
Security       = data exposure, data-security findings, data attack paths.
Access Control = permissions and grants to data resources.
Task           = data-quality work, migration work, remediation, reviews.
Tagging        = lightweight classification and retrieval.
Data           = datasets, schemas, metadata, lineage, quality, classification, retention.

InfoTechCanon
├── InfoTechCanonCore
├── InfoTechCanonLandscapeModel
├── InfoTechCanonOrganizationModel
├── InfoTechCanonGovernanceModel
├── InfoTechCanonTaskModel
├── InfoTechCanonTaggingStandard
├── InfoTechCanonAccessControlModel
├── InfoTechCanonSecurityModel
├── InfoTechCanonDataModel              <-- this standard
├── InfoTechCanonDevSecOpsModel
├── InfoTechCanonNetworkModel
├── InfoTechCanonObservabilityModel
├── InfoTechCanonPatternLanguage
└── Application Profiles

3. Boundary with Adjacent Standards

3.1 Boundary with Landscape

The Landscape Model owns:

DataStore
DatabaseInstance
ObjectBucket
FileShare
Queue
Cache
RuntimeResource
ApplicationService
IntegrationFlow
Endpoint

The Data Model owns:

Dataset
DataProduct
DataObject
Schema
Field
DataElement
DataFlow
DataLineage
DataClassification
DataQualityRule
DataContract
DataDistribution

Boundary rule:

Landscape owns the technical and runtime places where data lives or moves.
Data owns the semantic, structural, quality, classification, and lineage meaning of data.

3.2 Boundary with Governance

The Governance Model owns:

Policy
Requirement
Obligation
Control
Risk
Exception
Evidence
Review
Approval
ComplianceRequirement

The Data Model owns data-specific structures that are governed:

RetentionRuleReference
ProcessingPurpose
DataClassification
DataQualityRule
DataContract
DataLineage

Boundary rule:

Governance defines why data must be governed.
Data defines what data is and how it is described, classified, measured, and traced.

3.3 Boundary with Security

The Security Model owns:

DataSecurityFinding
ExposureFinding
CredentialExposure
SecurityIncident
AttackPath
Mitigation

The Data Model owns:

Sensitivity
DataClassification
DataResidency
DataSubjectCategory
DataCategory
DataLineage

Security may use these for posture analysis.

3.4 Boundary with Access Control

Access Control owns permissions, grants, authorization decisions, and enforcement.

Data owns data resources and classifications that access policies may use.

Example:

Dataset classified_as Confidential
AccessPolicy permits Role to read Dataset
AuthorizationDecision permits read on Dataset

3.5 Boundary with Organization

Organization owns actors and responsibilities.

Data references Organization concepts for:

DataOwner
DataSteward
DataCustodian
DataProducer
DataConsumer
DataTrustee

3.6 Boundary with DevSecOps

DevSecOps owns source, build, artifact, pipeline, release, deployment, SBOM, and attestation semantics.

Data owns data contracts, schema evolution, migration data, test data, synthetic data, lineage, and data-quality semantics.

4. Research Basis and External Alignment

This seed standard draws on multiple data-management bodies of knowledge.

4.1 DAMA-DMBOK

DAMA-DMBOK is a broad reference for data management disciplines including data governance, architecture, modeling, storage, security, integration, documents/content, reference/master data, warehousing/BI, metadata, and data quality. InfoTechCanon uses it as a broad mapping and assimilation target, not as a direct controlling model.

4.2 DCAT

W3C DCAT defines a vocabulary for data catalogs. DCAT Version 3 organizes catalog access around datasets, distributions, data services, and dataset series. This is highly relevant for InfoTechCanon catalog, dataset, distribution, and data-service concepts.

4.3 PROV-O

W3C PROV-O models provenance using entities, activities, and agents. This is highly relevant for data lineage, derivation, generation, transformation, and responsibility.

4.4 ISO/IEC 11179

ISO/IEC 11179 provides a metadata registry framework for data elements, naming, identification, definitions, classification, and registration. It is an important mapping target for data element, representation, data definition, code list, and metadata registry concepts.

4.5 Data Mesh and Data Products

Data product thinking emphasizes ownership, discoverability, quality, fitness for use, service-like interfaces, and domain responsibility. InfoTechCanon should support data products without requiring a specific data-mesh organizational model.

4.6 Data Contracts

Data contracts define expectations between producers and consumers around schema, semantics, quality, delivery, compatibility, ownership, and change management. They are critical for reliable information-processing systems.

4.7 Privacy and Data Protection Practice

Privacy and data-protection practice contributes concepts such as personal data, sensitive data, data subject, processing purpose, lawful basis, retention, residency, and minimization. The Data Model provides data semantics, while Governance owns legal obligations and Security owns data exposure and incident semantics.

5. Seed Standard Design Stance

This standard is a seed standard, not a full data-governance or database-design manual.

It shall:

define canonical data semantics,
distinguish data from storage infrastructure,
distinguish dataset, data product, data object, schema, field, and data element,
support data classification, lineage, quality, retention, residency, and processing purpose,
support catalog and discovery concepts,
support data contracts and schema evolution,
support operational, analytical, reference, master, event, and document data,
support mappings to external standards without becoming subordinate to them,
remain markdown-first and agent-retrievable,
and support future assimilation of data standards, platforms, regulations, and product schemas.

6. Scope

6.1 In Scope

This standard covers canonical representation of:

data domains,
data products,
datasets,
dataset series,
data distributions,
data services,
data objects,
entities,
records,
fields,
attributes,
data elements,
schemas,
schema versions,
code lists,
reference data,
master data references,
metadata,
catalogs,
data lineage,
data flows,
data transformations,
data quality rules,
data quality results,
data contracts,
data classification,
sensitivity,
confidentiality level,
integrity expectation,
availability expectation,
retention rules as data semantics,
data residency,
data minimization,
processing purpose,
data subject categories,
data provenance,
data ownership and stewardship references,
and data lifecycle states.

6.2 Out of Scope

This standard does not fully define:

database engine internals,
storage infrastructure,
full data warehouse architecture,
full analytics modeling,
full privacy-law interpretation,
full data-governance process,
full security incident handling,
all ontology modeling,
all semantic-web representation,
complete ETL/ELT implementation,
or every vendor-specific data catalog schema.

Those may be mapped, assimilated, profiled, or handled by adjacent standards.

7. Normative Language

The following terms are used normatively:

SHALL indicates a mandatory rule for conformance.
SHOULD indicates a recommended practice.
MAY indicates an optional capability.
MUST NOT indicates a prohibited practice.
SEED marks a concept defined provisionally here but open to later refinement.
EXTRACT marks a concept that may later move to a more specialized standard.

8. Core Principles

8.1 Data Is Not Its Store

A dataset is not the same thing as a database, bucket, table, file, topic, or API.

Storage and runtime locations are Landscape concepts. Data semantics belong here.

8.2 Dataset Is Not Schema

A dataset may have one or more schemas, distributions, versions, contracts, lineage records, and quality expectations.

8.3 Schema Is Not Meaning

A schema describes structure. It does not fully define business meaning, ownership, usage constraints, quality, or purpose.

8.4 Classification Is First-Class

Data classification and sensitivity SHOULD be explicit where data has security, privacy, compliance, operational, or business significance.

8.5 Lineage Is Evidence-Carrying

Lineage SHOULD identify source data, transformations, activities, agents, and derived outputs with confidence and evidence where possible.

8.6 Data Quality Is Contextual

Data quality depends on intended use, domain meaning, contract expectations, and consumer needs.

8.7 Data Contracts Make Data Reliable

Producer-consumer expectations SHOULD be explicit when data is reused across system boundaries.

8.8 External Standards Are Mapped, Not Obeyed

The Data Model MAY map to DAMA-DMBOK, DCAT, PROV-O, ISO/IEC 11179, schema.org, OpenLineage, DataHub, OpenMetadata, dbt, Great Expectations, or similar standards and tools.

It MUST NOT subordinate its internal semantics to any single external model.

9. Canonical Seed Metadata

Every data artifact SHOULD support structured metadata.

Recommended front matter:

---
id: itc-data:Dataset
type: concept
standard: InfoTechCanonDataModel
standard_version: RC1-seed
status: candidate
canonical_owner: InfoTechCanonDataModel
preferred_label: Dataset
related:
  - itc-data:DataProduct
  - itc-data:Schema
  - itc-data:DataDistribution
  - itc-data:DataLineage
mappings:
  - itc-map:dataset-to-dcat-dataset
---
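
The front matter above can be checked mechanically once it has been parsed. The following is a minimal sketch, assuming the front matter is already available as a Python dict; the name `missing_front_matter_keys` is hypothetical, and treating every key from the example as required is an assumption, since the standard only recommends this front matter.

```python
# Keys copied from the recommended front matter example. Treating every one
# of them as required is an assumption of this sketch, not a rule of the standard.
RECOMMENDED_KEYS = (
    "id", "type", "standard", "standard_version", "status",
    "canonical_owner", "preferred_label", "related", "mappings",
)


def missing_front_matter_keys(front_matter: dict) -> list:
    """Return the recommended keys that are absent or empty, in declared order."""
    return [key for key in RECOMMENDED_KEYS if not front_matter.get(key)]


# The Dataset concept from the example, already parsed into a dict.
dataset_concept = {
    "id": "itc-data:Dataset",
    "type": "concept",
    "standard": "InfoTechCanonDataModel",
    "standard_version": "RC1-seed",
    "status": "candidate",
    "canonical_owner": "InfoTechCanonDataModel",
    "preferred_label": "Dataset",
    "related": [
        "itc-data:DataProduct",
        "itc-data:Schema",
        "itc-data:DataDistribution",
        "itc-data:DataLineage",
    ],
    "mappings": ["itc-map:dataset-to-dcat-dataset"],
}

print(missing_front_matter_keys(dataset_concept))
print(missing_front_matter_keys({"id": "itc-data:Schema", "type": "concept"}))
```

A check like this could run in a pre-commit hook or a retrieval indexer so that incomplete concept files are flagged before they are published.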

Recommended artifact statuses:

idea
draft
candidate
release-candidate
adopted
stable
deprecated
retired

Recommended concept statuses:

proposed
experimental
candidate
canonical
deprecated
retired

10. Root Data Taxonomy

DataEntity
├── DataAssetEntity
│   ├── DataDomain
│   ├── DataProduct
│   ├── Dataset
│   ├── DatasetSeries
│   ├── DataDistribution
│   ├── DataService
│   ├── DataObject
│   ├── Record
│   └── DocumentData
├── StructureEntity
│   ├── Schema
│   ├── SchemaVersion
│   ├── Field
│   ├── Attribute
│   ├── DataElement
│   ├── DataElementConcept
│   ├── Representation
│   ├── DataType
│   ├── Constraint
│   └── CodeList
├── SemanticEntity
│   ├── BusinessTerm
│   ├── GlossaryTerm
│   ├── ConceptualEntity
│   ├── DataDefinition
│   ├── ReferenceData
│   ├── MasterDataReference
│   └── CanonicalValue
├── GovernanceReferenceEntity
│   ├── DataClassification
│   ├── Sensitivity
│   ├── DataCategory
│   ├── DataSubjectCategory
│   ├── ProcessingPurpose
│   ├── RetentionRuleReference
│   ├── DataResidency
│   └── DataUsageConstraint
├── QualityEntity
│   ├── DataQualityDimension
│   ├── DataQualityRule
│   ├── DataQualityCheck
│   ├── DataQualityResult
│   ├── DataQualityIssue
│   └── FitnessForUse
├── LineageEntity
│   ├── DataFlow
│   ├── DataLineage
│   ├── Transformation
│   ├── Derivation
│   ├── SourceDataset
│   ├── TargetDataset
│   └── ProvenanceRecord
├── ContractEntity
│   ├── DataContract
│   ├── ProducerExpectation
│   ├── ConsumerExpectation
│   ├── CompatibilityRule
│   ├── BreakingChange
│   └── SchemaEvolutionPolicy
└── OperationalDataEntity
    ├── DataPipelineReference
    ├── DataStoreReference
    ├── QueryReference
    ├── DataAccessPattern
    ├── DataFreshness
    └── DataAvailability

11. Core Concepts

11.1 DataEntity

A DataEntity is any identifiable concept used to represent data, metadata, structure, classification, quality, lineage, contract, or data lifecycle.

Recommended attributes:

id:
entity_type:
canonical_name:
display_name:
lifecycle_state:
source_system:
created_at:
updated_at:

Optional attributes:

owner:
steward:
data_domain:
classification:
source_confidence:
valid_from:
valid_to:
tags:
external_references:

11.2 DataDomain

A DataDomain is a bounded area of data meaning, ownership, stewardship, or subject matter.

Examples:

customer
billing
product
identity
orders
support
security
operations
finance

11.3 DataProduct

A DataProduct is a managed data asset or set of data assets offered for use by consumers with explicit ownership, quality expectations, documentation, interfaces, and lifecycle.

Recommended attributes:

owner:
steward:
producer:
consumers:
service_level_expectations:
quality_expectations:
contract:
distribution_methods:

11.4 Dataset

A Dataset is a coherent collection of data published, managed, processed, analyzed, or consumed as a unit.

A dataset may have:

schema
distribution
catalog entry
classification
lineage
quality rules
owner
steward
contract
retention expectation

Canonical rule:

Dataset MUST NOT be treated as identical to its storage location.
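
One way to honor this rule in code is to give the dataset its own identity and attach storage only as references. The sketch below is illustrative: the class layout, the identifier formats, and the example store names are assumptions, not part of the standard.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataStoreReference:
    """Pointer to a Landscape-owned store; it carries no data semantics."""
    store_id: str  # hypothetical identifier format


@dataclass
class Dataset:
    """Semantic unit of data; its identity does not depend on where it is stored."""
    id: str
    canonical_name: str
    classification: str
    stored_in: list = field(default_factory=list)


orders = Dataset(
    id="itc-example:orders",
    canonical_name="orders",
    classification="confidential",
)

# The same dataset can live in several stores without changing its identity.
orders.stored_in.append(DataStoreReference("landscape:postgres-main/orders"))
orders.stored_in.append(DataStoreReference("landscape:warehouse/orders_export"))

print(orders.id, [ref.store_id for ref in orders.stored_in])
```

Because classification and ownership hang off `Dataset` rather than off a store, moving or replicating the data does not silently change what the data means.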

11.5 DatasetSeries

A DatasetSeries is a sequence or family of related datasets organized over time, version, geography, domain, or release.

11.6 DataDistribution

A DataDistribution is an accessible representation of a dataset.

Examples:

CSV file
Parquet file
API response
database table export
event stream
report download
object storage path

11.7 DataService

A DataService is a service that provides access to data or operations over data.

Examples:

query API
data product API
metadata API
streaming endpoint
analytics service

11.8 DataObject

A DataObject is a meaningful object or structure represented in data.

Examples:

Customer
Invoice
Order
Payment
Product
Device
UserProfile
AccessGrant
SecurityFinding

11.9 Record

A Record is an instance-level representation of data about an entity, event, relationship, or observation.

11.10 Field

A Field is a named component of a schema, record, message, or table.

11.11 Attribute

An Attribute is a property of a data object or conceptual entity.

A field may represent an attribute, but a field is structural while an attribute is semantic.

11.12 DataElement

A DataElement is a defined unit of data with meaning, representation, and expected usage.

It may map to ISO/IEC 11179 data element concepts.

Recommended attributes:

object_class:
property:
representation:
data_type:
definition:
permitted_values:

11.13 DataElementConcept

A DataElementConcept is the semantic idea of a data element independent of representation.

Example:

Customer birth date
Invoice total amount
Repository default branch name

11.14 Representation

A Representation describes how a data element is represented.

Examples:

string
integer
decimal
boolean
date
timestamp
code
identifier
URI

11.15 DataType

A DataType specifies the technical or logical type of a field or data element.

11.16 Constraint

A Constraint is a rule limiting valid data.

Examples:

required
unique
minimum
maximum
regex
foreign key
enum
format
cardinality

11.17 CodeList

A CodeList is a controlled set of allowed values with definitions.

Examples:

country codes
currency codes
status codes
classification labels
risk levels

11.18 BusinessTerm

A BusinessTerm is a term used by domain actors to describe data meaning.

11.19 GlossaryTerm

A GlossaryTerm is a documented term in a glossary with definition, synonyms, ownership, and mappings.

11.20 DataDefinition

A DataDefinition is a textual or structured definition explaining the meaning, scope, and intended use of a data concept.

11.21 ReferenceData

ReferenceData is data used to classify, categorize, or constrain other data.

Examples:

country list
currency list
product category list
status code list
business unit list

11.22 MasterDataReference

A MasterDataReference points to a controlled source of core business entities.

Examples:

customer master
product master
supplier master
employee master

The Data Model references master-data semantics but does not require a specific MDM architecture.

11.23 DataClassification

A DataClassification is a classification assigned to data based on sensitivity, confidentiality, regulatory concern, operational criticality, or business significance.

Examples:

public
internal
confidential
restricted
regulated
personal
sensitive personal
secret

11.24 Sensitivity

Sensitivity indicates potential harm, obligation, or restriction associated with data disclosure, modification, loss, misuse, or processing.

11.25 DataCategory

A DataCategory groups data by semantic, legal, operational, or analytical type.

Examples:

personal data
financial data
health data
authentication data
transaction data
telemetry data
metadata
content data

11.26 DataSubjectCategory

A DataSubjectCategory identifies the kind of person or entity data is about.

Examples:

customer
employee
applicant
supplier contact
child
patient
user
administrator

11.27 ProcessingPurpose

A ProcessingPurpose describes why data is collected, stored, transformed, shared, or used.

Examples:

billing
support
security monitoring
analytics
product improvement
legal compliance
identity verification

11.28 RetentionRuleReference

A RetentionRuleReference links data to governance-defined retention obligations, policies, or rules.

The Data Model may model retention expectation, but Governance owns the policy and obligation.

11.29 DataResidency

DataResidency describes where data is stored, processed, transferred, or legally required to remain.

Examples:

EU
Germany
customer region
cloud region
on-premises only

11.30 DataUsageConstraint

A DataUsageConstraint describes a restriction on how data may be used.

Examples:

not for training
not for export
internal analytics only
production use prohibited
no cross-border transfer
only aggregated use

11.31 DataQualityDimension

A DataQualityDimension is an aspect of data quality.

Common dimensions:

accuracy
completeness
consistency
timeliness
validity
uniqueness
freshness
integrity
fitness_for_use

11.32 DataQualityRule

A DataQualityRule is a testable expectation about data quality.

Examples:

customer_id must not be null
invoice_total must be >= 0
country_code must be in ISO country code list
event_timestamp must be within expected delay window
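
The four example rules can be expressed as executable checks over a single record. This is a sketch under stated assumptions: the country code set is a truncated illustrative subset, the one-hour delay window is invented for the example, and the function name `failed_rules` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Illustrative subset only; a real check would load the full ISO country code list.
COUNTRY_CODES = {"DE", "FR", "US"}
# The "expected delay window" is not specified in the text; one hour is assumed.
MAX_EVENT_DELAY = timedelta(hours=1)


def failed_rules(record: dict, now: datetime) -> list:
    """Return the text of every example rule that the record violates."""
    failures = []
    if record.get("customer_id") is None:
        failures.append("customer_id must not be null")
    total = record.get("invoice_total")
    if total is None or total < 0:
        failures.append("invoice_total must be >= 0")
    if record.get("country_code") not in COUNTRY_CODES:
        failures.append("country_code must be in ISO country code list")
    timestamp = record.get("event_timestamp")
    if timestamp is None or not (timedelta(0) <= now - timestamp <= MAX_EVENT_DELAY):
        failures.append("event_timestamp must be within expected delay window")
    return failures


now = datetime(2026, 5, 22, 12, 0, tzinfo=timezone.utc)
good = {
    "customer_id": "c-1",
    "invoice_total": 19.99,
    "country_code": "DE",
    "event_timestamp": now - timedelta(minutes=5),
}
bad = {
    "customer_id": None,
    "invoice_total": -1,
    "country_code": "XX",
    "event_timestamp": now - timedelta(hours=3),
}

print(failed_rules(good, now))
print(failed_rules(bad, now))
```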

11.33 DataQualityCheck

A DataQualityCheck is an execution of one or more data quality rules.

11.34 DataQualityResult

A DataQualityResult is the outcome of a data quality check.

11.35 DataQualityIssue

A DataQualityIssue is a finding indicating data does not meet a quality rule or fitness expectation.

It may create Task Model remediation work.

11.36 FitnessForUse

FitnessForUse is the degree to which data is suitable for a specific purpose or consumer context.

11.37 DataFlow

A DataFlow is movement or transfer of data between sources, systems, stores, services, actors, or processes.

11.38 DataLineage

DataLineage describes the origin, movement, transformation, derivation, and usage path of data.

Lineage may include:

source dataset
transformation
activity
agent
target dataset
time
evidence
confidence

11.39 Transformation

A Transformation is an activity that changes data structure, content, format, aggregation, classification, or meaning.

11.40 Derivation

A Derivation is a relationship where one data entity is derived from another.

11.41 ProvenanceRecord

A ProvenanceRecord records information about how data came to exist, who or what generated it, what activity produced it, and what source influenced it.

11.42 DataContract

A DataContract is an explicit agreement between data producers and consumers about data structure, semantics, quality, delivery, compatibility, ownership, and change expectations.

11.43 ProducerExpectation

A ProducerExpectation describes what a data producer commits to provide.

Examples:

schema stability
freshness
completeness
availability
documentation
change notice

11.44 ConsumerExpectation

A ConsumerExpectation describes what a data consumer expects or is allowed to assume.

11.45 CompatibilityRule

A CompatibilityRule describes what changes are considered compatible or breaking.

11.46 BreakingChange

A BreakingChange is a data, schema, semantic, quality, or delivery change that violates consumer expectations or compatibility rules.

11.47 SchemaEvolutionPolicy

A SchemaEvolutionPolicy defines rules for how schemas may change over time.
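
A CompatibilityRule becomes testable when two schema versions are compared field by field. The sketch below assumes three breaking conditions (a removed field, a changed type, a newly added required field); these conditions, the schema representation, and the name `breaking_changes` are illustrative assumptions, since the standard leaves the actual policy to each contract.

```python
def breaking_changes(old: dict, new: dict) -> list:
    """Compare two schema versions given as {field_name: (data_type, required)}.

    The rule set is an assumption of this sketch: removing a field, changing
    its type, or adding a required field counts as breaking.
    """
    changes = []
    for name, (data_type, _required) in old.items():
        if name not in new:
            changes.append(f"removed field: {name}")
        elif new[name][0] != data_type:
            changes.append(f"type changed: {name} {data_type} -> {new[name][0]}")
    for name, (_data_type, required) in new.items():
        if name not in old and required:
            changes.append(f"added required field: {name}")
    return changes


v1 = {
    "order_id": ("string", True),
    "total": ("decimal", True),
    "note": ("string", False),
}
v2 = {
    "order_id": ("string", True),
    "total": ("string", True),
    "currency": ("string", True),
}

for change in breaking_changes(v1, v2):
    print(change)
```

Under these assumed rules, adding an optional field is treated as compatible, so `breaking_changes(v1, {**v1, "tag": ("string", False)})` returns an empty list.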

11.48 DataStoreReference

A DataStoreReference points to a Landscape data store or storage resource.

Examples:

database
table
bucket
file share
topic
queue
index
warehouse
lakehouse table

11.49 DataAccessPattern

A DataAccessPattern describes how data is accessed.

Examples:

batch export
API query
event stream
direct database query
file download
replication
analytics dashboard

11.50 DataFreshness

DataFreshness describes how current data is relative to a defined expectation.

11.51 DataAvailability

DataAvailability describes whether data is accessible according to expectations.

12. Core Relationship Vocabulary

Recommended root relationship types:

contains
part_of
describes
classified_as
has_schema
has_field
has_distribution
provided_by
consumed_by
stored_in
accessed_via
flows_to
derived_from
generated_by
transformed_by
governed_by
constrained_by
subject_to
owned_by
stewarded_by
produced_by
validated_by
violates
satisfies
maps_to

Relationship records SHOULD support:

id:
relationship_type:
source_entity:
target_entity:
scope:
valid_from:
valid_to:
source_system:
confidence:
evidence:
rationale:
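
A relationship record can reject unknown relationship types at construction time. This is a minimal sketch: the class shape and the choice to raise `ValueError` are assumptions, and only a subset of the recommended record fields is modeled.

```python
from dataclasses import dataclass
from typing import Optional

# The recommended root relationship types listed in this section.
RELATIONSHIP_TYPES = frozenset({
    "contains", "part_of", "describes", "classified_as", "has_schema",
    "has_field", "has_distribution", "provided_by", "consumed_by",
    "stored_in", "accessed_via", "flows_to", "derived_from", "generated_by",
    "transformed_by", "governed_by", "constrained_by", "subject_to",
    "owned_by", "stewarded_by", "produced_by", "validated_by", "violates",
    "satisfies", "maps_to",
})


@dataclass(frozen=True)
class Relationship:
    id: str
    relationship_type: str
    source_entity: str
    target_entity: str
    confidence: Optional[str] = None
    evidence: Optional[str] = None

    def __post_init__(self):
        # Strict validation is a design choice of this sketch; a profile could
        # instead allow additional, profile-specific relationship types.
        if self.relationship_type not in RELATIONSHIP_TYPES:
            raise ValueError(f"unknown relationship type: {self.relationship_type}")


link = Relationship(
    id="rel-1",
    relationship_type="has_schema",
    source_entity="itc-example:orders",
    target_entity="itc-example:orders-schema-v1",
    confidence="declared",
)
print(link)
```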

13. Data State Models

13.1 Dataset Lifecycle States

proposed
designed
active
deprecated
retired
archived
deleted

13.2 Schema States

draft
candidate
active
deprecated
superseded
retired

13.3 Data Quality States

unknown
unchecked
passing
warning
failing
waived
remediating
verified

13.4 Data Contract States

draft
under_review
active
violated
deprecated
superseded
retired

13.5 Lineage Confidence States

unknown
declared
inferred
observed
verified
conflicting

14. Data Patterns

14.1 Pattern: Data Is Not Its Store

Context: Teams model data by pointing at tables, buckets, or files.

Problem: Storage location does not explain semantic meaning, ownership, classification, quality, or lineage.

Solution: Model Dataset, Schema, DataDistribution, DataStoreReference, and DataLineage separately.

14.2 Pattern: Dataset Catalog Entry

Context: Data consumers need to discover and understand data.

Problem: Data assets remain invisible or are known only through tribal knowledge.

Solution: Provide a catalog entry with:

dataset name
description
owner
steward
classification
schema
distribution
quality expectations
lineage
access method
usage constraints

14.3 Pattern: Data Contract at Boundary

Context: Data crosses a team, service, product, or system boundary.

Problem: Consumers break when producers change data unexpectedly.

Solution: Define a DataContract with schema, semantic expectations, quality rules, compatibility rules, and change process.

14.4 Pattern: Classification Drives Controls

Context: Data has different sensitivity and obligations.

Problem: Systems apply uniform controls or rely on ad hoc judgment.

Solution: Classify data and map classifications to governance controls, access policies, security measures, and retention expectations.

14.5 Pattern: Lineage as Evidence

Context: A derived dataset is used for decisions or compliance.

Problem: Consumers cannot determine origin, transformations, or trustworthiness.

Solution: Model lineage with source datasets, transformations, activities, agents, target datasets, and evidence.

14.6 Pattern: Quality Rule to Remediation

Context: Data quality checks fail.

Problem: Failures stay on dashboards instead of triggering corrective action.

Solution:

DataQualityRule
  -> DataQualityCheck
  -> DataQualityResult
  -> DataQualityIssue
  -> RemediationTask
  -> VerificationEvidence
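
The chain above can be sketched as a small pipeline in which each step produces the input for the next. All class shapes, function names, and the task title format are illustrative assumptions; in particular, RemediationTask would normally be owned by the Task Model rather than defined here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class DataQualityRule:
    id: str
    expectation: str
    predicate: Callable[[dict], bool]  # True when a row satisfies the rule


@dataclass(frozen=True)
class DataQualityResult:
    rule_id: str
    rows_checked: int
    rows_failed: int


@dataclass(frozen=True)
class DataQualityIssue:
    rule_id: str
    summary: str


@dataclass(frozen=True)
class RemediationTask:
    issue: DataQualityIssue
    title: str


def run_check(rule: DataQualityRule, rows: list) -> DataQualityResult:
    """DataQualityCheck: execute one rule over a batch of rows."""
    failed = sum(1 for row in rows if not rule.predicate(row))
    return DataQualityResult(rule.id, len(rows), failed)


def raise_issue(rule: DataQualityRule, result: DataQualityResult) -> Optional[DataQualityIssue]:
    """Turn a failing result into an issue; passing results produce nothing."""
    if result.rows_failed == 0:
        return None
    summary = f"{result.rows_failed} of {result.rows_checked} rows violate: {rule.expectation}"
    return DataQualityIssue(rule.id, summary)


def open_task(issue: DataQualityIssue) -> RemediationTask:
    return RemediationTask(issue, f"Remediate {issue.rule_id}")


rule = DataQualityRule(
    "dq-001",
    "customer_id must not be null",
    lambda row: row.get("customer_id") is not None,
)
result = run_check(rule, [{"customer_id": "c-1"}, {"customer_id": None}])
issue = raise_issue(rule, result)
task = open_task(issue) if issue else None

# VerificationEvidence: re-running the same check on corrected rows.
verification = run_check(rule, [{"customer_id": "c-1"}, {"customer_id": "c-2"}])
print(task.title, verification.rows_failed)
```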

14.7 Pattern: Semantic Term and Field Split

Context: Database columns are treated as business terms.

Problem: Field names do not fully encode business meaning.

Solution: Link Field to DataElement, BusinessTerm, and DataDefinition.

14.8 Pattern: Retention with Governance Reference

Context: Data must be kept or deleted according to obligations.

Problem: Retention is encoded as undocumented operational behavior.

Solution: Link Dataset or DataObject to RetentionRuleReference and keep the governing obligation in Governance.

15. Data Profiles

15.1 Profile Format

A Data Profile SHALL declare:

id:
profile_name:
status:
implements:
  - InfoTechCanonDataModel
target_context:
included_concepts:
required_relationships:
required_metadata:
state_model:
source_of_truth_rules:
mapping_files:
validation_rules:
examples:
known_deviations:

15.2 Seed Profile: Small SaaS Data Profile

Purpose:

Provide a minimal data model for a small SaaS platform moving toward production readiness.

Included concepts:

DataDomain
Dataset
DataObject
Schema
Field
DataClassification
DataStoreReference
DataFlow
DataQualityRule
RetentionRuleReference
DataOwnerReference
DataStewardReference

Required relationships:

Dataset has_schema Schema
Schema has_field Field
Dataset classified_as DataClassification
Dataset stored_in DataStoreReference
Dataset owned_by DataOwnerReference
Dataset stewarded_by DataStewardReference
DataFlow moves Dataset
RetentionRuleReference applies_to Dataset
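
A profile's required relationships lend themselves to a simple completeness check. The sketch below covers only the relationships that have Dataset as their subject and represents relationships as (subject, type, target) tuples; the function name, the tuple shape, and the identifiers are assumptions.

```python
# Relationship types the profile requires with Dataset as the subject.
# The other required relationships (Schema has_field Field, DataFlow moves
# Dataset, RetentionRuleReference applies_to Dataset) are left out of this sketch.
REQUIRED_FOR_DATASET = (
    "has_schema", "classified_as", "stored_in", "owned_by", "stewarded_by",
)


def missing_relationships(dataset_id: str, triples: list) -> list:
    """Return required relationship types the dataset does not yet have."""
    present = {
        relationship_type
        for subject, relationship_type, _target in triples
        if subject == dataset_id
    }
    return [rel for rel in REQUIRED_FOR_DATASET if rel not in present]


triples = [
    ("ds:orders", "has_schema", "schema:orders-v1"),
    ("ds:orders", "classified_as", "confidential"),
    ("ds:orders", "stored_in", "store:postgres-main"),
    ("ds:orders", "owned_by", "team:commerce"),
    ("ds:invoices", "stewarded_by", "person:data-steward"),
]

print(missing_relationships("ds:orders", triples))
```

Here the steward relationship belongs to a different dataset, so `ds:orders` is still reported as lacking `stewarded_by`.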

15.3 Seed Profile: Data Catalog Profile

Purpose:

Represent data catalog entries for discoverability and reuse.

Included concepts:

Catalog
Dataset
DatasetSeries
DataDistribution
DataService
DataOwnerReference
DataStewardReference
DataClassification
DataQualitySummary
DataLineageSummary

Mapping targets:

DCAT
DCAT-AP
DataHub
OpenMetadata
Amundsen
Collibra / catalog tools

15.4 Seed Profile: Data Contract Profile

Purpose:

Represent data producer-consumer agreements.

Included concepts:

DataContract
ProducerExpectation
ConsumerExpectation
Schema
SchemaVersion
DataQualityRule
CompatibilityRule
BreakingChange
ChangeNotice
DataContractViolation

Required relationships:

DataContract applies_to Dataset
ProducerExpectation constrains Producer
ConsumerExpectation informs Consumer
CompatibilityRule governs SchemaEvolution
BreakingChange violates DataContract

15.5 Seed Profile: Data Lineage Profile

Purpose:

Represent lineage across datasets, transformations, pipelines, and systems.

Included concepts:

Dataset
SourceDataset
TargetDataset
Transformation
DataFlow
DataLineage
ProvenanceRecord
DataPipelineReference
ActivityReference
AgentReference

Mapping targets:

PROV-O
OpenLineage
Marquez
dbt exposures/models/sources
DataHub lineage

15.6 Seed Profile: Privacy-Relevant Data Profile

Purpose:

Represent data concepts relevant to privacy, data protection, retention, and processing.

Included concepts:

PersonalDataCategory
SensitiveDataCategory
DataSubjectCategory
ProcessingPurpose
DataResidency
RetentionRuleReference
DataUsageConstraint
DataMinimizationExpectation

Governance owns legal obligations and lawful-basis interpretation.

15.7 Seed Profile: Analytics Dataset Profile

Purpose:

Represent analytical datasets, metrics, dimensions, facts, models, and reports.

Included concepts:

Dataset
Metric
Dimension
Fact
Measure
AggregationRule
ReportReference
DashboardReference
DataQualityRule
FreshnessExpectation

16. Mapping Model for the Data Standard

Mappings relate InfoTechCanon data concepts to external standards, frameworks, products, and regulations.

16.1 Mapping Types

Recommended mapping types:

exactMatch
closeMatch
broadMatch
narrowMatch
relatedMatch
conflictMatch
gapMatch
derivedFrom
regulatoryReference
toolEquivalent

16.2 Mapping Record

Example:

id: itc-map:dataset-to-dcat-dataset
source_concept: itc-data:Dataset
target_body: W3C DCAT
target_version: "3"
target_concept: dcat:Dataset
mapping_type: closeMatch
scope:
  - data catalog interoperability
not_valid_for:
  - all internal schema semantics
  - all data product lifecycle semantics
rationale: >
  DCAT Dataset is a strong catalog-oriented match for InfoTechCanon Dataset,
  but InfoTechCanon includes additional governance, quality, contract,
  and lineage expectations that may not be required by DCAT.
confidence: high
status: candidate
owner: InfoTechCanonDataModel
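
A mapping record like the one above can be linted before it is accepted. This sketch checks the mapping type against the recommended list and, as an additional assumption not stated in the standard, flags entries that appear in both `scope` and `not_valid_for`; the function name `mapping_problems` is hypothetical.

```python
# Recommended mapping types from section 16.1.
MAPPING_TYPES = frozenset({
    "exactMatch", "closeMatch", "broadMatch", "narrowMatch", "relatedMatch",
    "conflictMatch", "gapMatch", "derivedFrom", "regulatoryReference",
    "toolEquivalent",
})


def mapping_problems(record: dict) -> list:
    """Return human-readable problems found in a mapping record."""
    problems = []
    mapping_type = record.get("mapping_type")
    if mapping_type not in MAPPING_TYPES:
        problems.append(f"unknown mapping_type: {mapping_type}")
    # Assumed consistency rule: a use case cannot be both in and out of scope.
    overlap = set(record.get("scope", [])) & set(record.get("not_valid_for", []))
    for entry in sorted(overlap):
        problems.append(f"listed in both scope and not_valid_for: {entry}")
    return problems


dcat_mapping = {
    "id": "itc-map:dataset-to-dcat-dataset",
    "source_concept": "itc-data:Dataset",
    "target_body": "W3C DCAT",
    "target_version": "3",
    "target_concept": "dcat:Dataset",
    "mapping_type": "closeMatch",
    "scope": ["data catalog interoperability"],
    "not_valid_for": [
        "all internal schema semantics",
        "all data product lifecycle semantics",
    ],
}

print(mapping_problems(dcat_mapping))
```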

16.3 Seed Mapping Targets

The Data Model SHOULD maintain mappings to:

DAMA-DMBOK
W3C DCAT 3
DCAT-AP
W3C PROV-O
ISO/IEC 11179
schema.org Dataset
OpenLineage
DataHub metadata model
OpenMetadata
dbt sources/models/exposures
Great Expectations
Apache Atlas
Collibra / data catalog concepts
GDPR / privacy-regulation references
Dublin Core metadata
SPDX / CycloneDX data references where relevant

17. Assimilation Hooks

The Data Model SHALL be able to receive new data standards, platforms, regulations, product schemas, and practices through the InfoTechCanon assimilation process.

17.1 Assimilation Triggers

Assimilation may be triggered by:

new data catalog model
new data lineage standard
new metadata registry standard
new privacy regulation
new data-quality tool
new data-contract practice
new data-product pattern
new analytics modeling method
new data platform integration
new recurring data classification conflict

17.2 Data Assimilation Output

A data assimilation SHOULD produce:

source summary
extracted data concepts
concept comparison matrix
gap list
conflict list
mapping file
candidate new concepts
candidate relationship changes
candidate pattern changes
candidate profile changes
open questions

17.3 Recommended First Assimilation Candidates

W3C DCAT 3
PROV-O
ISO/IEC 11179
DAMA-DMBOK
OpenLineage
DataHub
OpenMetadata
Great Expectations
dbt semantic layer / metadata
GDPR data categories and processing concepts

18. Integration with Other InfoTechCanon Standards

18.1 Landscape Model

Data references Landscape concepts for:

data store
database
bucket
queue
topic
pipeline
runtime service
application service
endpoint
environment

18.2 Organization Model

Data imports organization concepts for:

data owner
data steward
data custodian
data producer
data consumer
data trustee
responsible team

18.3 Governance Model

Data imports governance concepts for:

policy
retention requirement
processing obligation
control
exception
evidence
review
compliance requirement

18.4 Security Model

Security imports data concepts for:

classification
sensitivity
data category
data subject category
data exposure
residency
data security finding

18.5 Access Control Model

Access Control imports data concepts for:

dataset
data object
data classification
data usage constraint
data access pattern

18.6 Task Model

Data creates or references tasks such as:

data-quality remediation
schema migration
contract review
lineage clarification
classification review
retention cleanup
data incident investigation

18.7 Tagging Standard

Tagging supports data discovery and lightweight classification but MUST NOT replace data classification, schema, lineage, quality, or governance records.

19. Canon Interface Card Usage

Subsystems that implement or produce data knowledge SHOULD publish a Canon Interface Card.

Example:

subsystem: data-catalog-importer
implements:
  - InfoTechCanonDataModel
  - DataCatalogProfile
produces:
  - Dataset
  - Schema
  - Field
  - DataDistribution
  - DataOwnerReference
consumes:
  - Team
  - DataStoreReference
  - Policy
relations:
  - Dataset has_schema Schema
  - Schema has_field Field
  - Dataset stored_in DataStoreReference
  - Dataset owned_by Team
source_of_truth:
  dataset_catalog_entries: data-catalog
known_deviations:
  - lineage is summary-only
  - data quality checks are imported from separate system
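
A minimal sketch of how a card like the one above might be sanity-checked once loaded into a plain dictionary. The rule that every concept named in `relations` must also appear under `produces` or `consumes` is an assumption of this sketch, not a stated canon requirement, and the helper names are hypothetical.

```python
# The example card above, written as a dict (only the fields this check needs).
card = {
    "produces": ["Dataset", "Schema", "Field", "DataDistribution", "DataOwnerReference"],
    "consumes": ["Team", "DataStoreReference", "Policy"],
    "relations": [
        "Dataset has_schema Schema",
        "Schema has_field Field",
        "Dataset stored_in DataStoreReference",
        "Dataset owned_by Team",
    ],
}


def parse_relation(text):
    """Split 'Subject predicate Object' into a 3-tuple."""
    parts = text.split()
    if len(parts) != 3:
        raise ValueError(f"expected 'Subject predicate Object', got: {text!r}")
    return tuple(parts)


def undeclared_concepts(card):
    """Return concepts used in relations but not listed in produces or consumes."""
    declared = set(card["produces"]) | set(card["consumes"])
    used = set()
    for relation in card["relations"]:
        subject, _predicate, obj = parse_relation(relation)
        used.update((subject, obj))
    return sorted(used - declared)
```

For the example card, `undeclared_concepts(card)` returns an empty list, since every subject and object in the relations is declared.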

20. Retrieval Requirements

The Data Model is designed for markdown-based infospaces.

20.1 Required Retrieval Properties

Every major concept SHOULD provide:

stable heading,
stable identifier,
short definition,
longer explanation,
examples,
distinction notes,
relationship examples,
mapping hooks,
profile references,
and common mistakes.

20.2 Agent Brief

A mature Data Model SHOULD include an agent-brief.md file with:

purpose
scope
owned concepts
imported concepts
core distinctions
do / do not rules
relationship patterns
minimal examples
common mistakes
profile list
mapping list

20.3 Indexes

The data information space SHOULD provide indexes by:

concept
relationship
data domain
dataset
schema
field
classification
quality rule
lineage
contract
profile
pattern
mapping target
status
source system
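
One way to generate such indexes is to group entry identifiers by a chosen attribute. The sketch below assumes each entry is a dictionary with an `id` key plus optional attributes such as `classification` or `status`; those field names are illustrative, not canonical.

```python
from collections import defaultdict


def build_index(entries, key):
    """Group entry ids by the value of `key`, skipping entries that lack it."""
    index = defaultdict(list)
    for entry in entries:
        value = entry.get(key)
        if value is not None:
            index[value].append(entry["id"])
    # Sort keys and ids so the generated index is stable across runs.
    return {value: sorted(ids) for value, ids in sorted(index.items())}


# Hypothetical catalog entries used only to demonstrate the grouping.
entries = [
    {"id": "orders", "classification": "internal", "status": "active"},
    {"id": "customers", "classification": "confidential", "status": "active"},
    {"id": "legacy_export", "status": "deprecated"},
]
by_classification = build_index(entries, "classification")
by_status = build_index(entries, "status")
```

Sorting both keys and ids keeps regenerated index files diff-friendly in a markdown repository.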

21. Conformance Levels

21.1 Reference-Conformant

A document or system is reference-conformant if it uses Data Model terminology consistently but does not implement structured metadata or validation rules.

21.2 Metadata-Conformant

A system is metadata-conformant if it uses stable identifiers, concept names, lifecycle states, source metadata, and relationship types.

21.3 Catalog-Conformant

A system is catalog-conformant if datasets, distributions, data services, owners, stewards, descriptions, and classifications are represented.

21.4 Lineage-Conformant

A system is lineage-conformant if it represents data sources, transformations, targets, provenance, and confidence.
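
A minimal sketch of a lineage record that carries the elements named above, together with the declared / inferred / observed / verified distinction from VAL-DATA-008. The class and field names, and the 0.0 to 1.0 confidence scale, are assumptions of this sketch rather than canonical definitions.

```python
from dataclasses import dataclass
from enum import Enum


class LineageKind(Enum):
    DECLARED = "declared"
    INFERRED = "inferred"
    OBSERVED = "observed"
    VERIFIED = "verified"


@dataclass(frozen=True)
class LineageEdge:
    source: str
    transformation: str
    target: str
    provenance: str    # hypothetical: reference to where the lineage claim came from
    confidence: float  # hypothetical scale from 0.0 to 1.0
    kind: LineageKind

    def __post_init__(self):
        # Reject records that omit any of the elements a lineage claim needs.
        for name in ("source", "transformation", "target", "provenance"):
            if not getattr(self, name):
                raise ValueError(f"{name} must not be empty")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")


edge = LineageEdge(
    source="raw_orders",
    transformation="dedupe_orders_job",
    target="orders_clean",
    provenance="pipeline-run-log",
    confidence=0.9,
    kind=LineageKind.OBSERVED,
)
```

Rejecting incomplete records at construction time keeps diagram-only lineage, with no evidence or confidence attached, out of the store.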

21.5 Quality-Conformant

A system is quality-conformant if it represents data quality rules, checks, results, and issues.

21.6 Contract-Conformant

A system is contract-conformant if producer and consumer expectations are represented as DataContracts.

21.7 Profile-Conformant

A system is profile-conformant if it implements a declared Data Profile and passes its validation rules.

21.8 Assimilation-Conformant

A system or repository is assimilation-conformant if it can accept external data concepts through the InfoTechCanon assimilation workflow and produce mappings, gaps, conflicts, and proposed changes.

22. Validation Rules

Initial validation rules:

VAL-DATA-001: Dataset SHOULD NOT be modeled as identical to DataStoreReference.

VAL-DATA-002: Dataset SHOULD have owner or steward reference when used for operational or governed purposes.

VAL-DATA-003: Dataset SHOULD have classification when it may contain sensitive, regulated, operationally critical, or business-critical data.

VAL-DATA-004: Schema SHOULD have version when used across system boundaries.

VAL-DATA-005: Field SHOULD be distinguishable from DataElement where semantic precision matters.

VAL-DATA-006: DataQualityRule SHOULD declare the dataset, field, or data object it applies to.

VAL-DATA-007: DataQualityResult SHOULD reference the executed rule and check.

VAL-DATA-008: DataLineage SHOULD distinguish declared, inferred, observed, and verified lineage.

VAL-DATA-009: DataContract SHOULD declare producer, consumer, dataset, schema or semantic expectations, quality expectations, and compatibility rules where applicable.

VAL-DATA-010: BreakingChange SHOULD reference the DataContract or CompatibilityRule it violates.

VAL-DATA-011: RetentionRuleReference SHOULD point to Governance concepts rather than embedding legal interpretation in Data.

VAL-DATA-012: DataResidency SHOULD reference region, jurisdiction, environment, or storage/processing scope where available.

VAL-DATA-013: Tags MUST NOT replace DataClassification, Schema, Lineage, Quality, or Contract records.

VAL-DATA-014: External data concepts SHOULD be represented through mapping records rather than silently reused.

VAL-DATA-015: Profiles MUST NOT redefine canonical concepts. They MAY constrain them.

VAL-DATA-016: Data used for AI training, analytics, or automation SHOULD declare usage constraints and provenance where relevant.
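
A few of these rules can be sketched as a simple checker over dataset records. The record fields (`governed`, `may_contain_sensitive`, `owner`, `steward`, `classification`, `tags`) and the set of classification-like tags are assumptions made for illustration; the standard does not define a record layout.

```python
# Hypothetical tags that look like classifications; used to illustrate VAL-DATA-013.
CLASSIFICATION_LIKE_TAGS = {"public", "internal", "confidential", "restricted"}


def check_dataset(ds):
    """Return (rule_id, message) findings for one dataset record given as a dict."""
    findings = []
    # VAL-DATA-002: governed or operational datasets should name an owner or steward.
    if ds.get("governed") and not (ds.get("owner") or ds.get("steward")):
        findings.append(("VAL-DATA-002", "no owner or steward reference"))
    if not ds.get("classification"):
        # VAL-DATA-003: possibly sensitive datasets should carry a classification.
        if ds.get("may_contain_sensitive"):
            findings.append(("VAL-DATA-003", "no classification"))
        # VAL-DATA-013: a tag must not stand in for a DataClassification record.
        if CLASSIFICATION_LIKE_TAGS & set(ds.get("tags", [])):
            findings.append(("VAL-DATA-013", "classification expressed by tag only"))
    return findings
```

Returning findings as data, instead of raising, lets SHOULD-level rules surface as warnings while MUST NOT rules such as VAL-DATA-013 can be escalated by the caller.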

23. Anti-Patterns

23.1 Table Equals Dataset

Treating every table as a complete dataset and every dataset as a table.

23.2 Schema Equals Meaning

Assuming column names and types fully define business meaning.

23.3 Classification by Tag Only

Using tags such as confidential without a governed DataClassification record.

23.4 Lineage by Diagram Only

Drawing flows without source, transformation, target, evidence, or confidence.

23.5 Quality Dashboard Graveyard

Tracking quality failures without owners, tasks, remediation, or fitness-for-use decisions.

23.6 Contract-Free Integration

Letting consumers depend on producer data without explicit compatibility expectations.
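
To make "explicit compatibility expectations" concrete, the sketch below compares two schema versions under one assumed backward-compatibility rule: removing a field or changing its type is breaking, while adding a field is not. The rule and the `{field_name: type_name}` representation are illustrative assumptions; a real DataContract would declare its own CompatibilityRule.

```python
def breaking_changes(old_fields, new_fields):
    """Compare {field_name: type_name} dicts and list changes that break consumers."""
    changes = []
    for name, old_type in old_fields.items():
        if name not in new_fields:
            changes.append(f"removed field: {name}")
        elif new_fields[name] != old_type:
            changes.append(f"type changed: {name} {old_type} -> {new_fields[name]}")
    # Fields that exist only in new_fields are additions and are not reported.
    return changes


# Hypothetical schema versions for an orders dataset.
v1 = {"order_id": "string", "amount": "decimal"}
v2 = {"order_id": "string", "amount": "float", "currency": "string"}
```

Here `breaking_changes(v1, v2)` reports the `amount` type change and ignores the added `currency` field.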

23.7 Hidden Retention Logic

Deleting or keeping data based on undocumented scripts or tribal knowledge.

23.8 Catalog Without Trust

Cataloging datasets without owner, freshness, classification, quality, or lineage.

23.9 Privacy in Free Text

Recording processing purpose, data subject category, residency, or sensitivity as unstructured notes only.

23.10 Vendor Model Capture

Letting one data catalog, warehouse, or governance product define the internal data model.

24. Initial Repository Placement

Recommended repository layout:

info-tech-canon/
  standards/
    data/
      InfoTechCanonDataModel.md
      agent-brief.md
      concepts/
      relationships/
      patterns/
      profiles/
      mappings/
      assimilation/
      examples/
      validation/
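
The layout above could be created with a small script such as the following sketch. The `scaffold` function is hypothetical; it takes the repository root as its argument and creates only the directories and the two top-level files shown.

```python
from pathlib import Path

SUBDIRS = [
    "concepts", "relationships", "patterns", "profiles",
    "mappings", "assimilation", "examples", "validation",
]
TOP_FILES = ["InfoTechCanonDataModel.md", "agent-brief.md"]


def scaffold(repo_root):
    """Create standards/data/ with its subdirectories and empty top-level files."""
    base = Path(repo_root) / "standards" / "data"
    for sub in SUBDIRS:
        (base / sub).mkdir(parents=True, exist_ok=True)
    for name in TOP_FILES:
        # touch() leaves existing file content untouched.
        (base / name).touch(exist_ok=True)
    return base
```

The function is idempotent, so it can be rerun on an existing checkout without overwriting content.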

Seed files:

standards/data/InfoTechCanonDataModel.md
standards/data/agent-brief.md
standards/data/concepts/dataset.md
standards/data/concepts/data-product.md
standards/data/concepts/schema.md
standards/data/concepts/data-element.md
standards/data/concepts/data-classification.md
standards/data/concepts/data-lineage.md
standards/data/concepts/data-quality-rule.md
standards/data/concepts/data-contract.md
standards/data/patterns/data-is-not-its-store.md
standards/data/patterns/dataset-catalog-entry.md
standards/data/patterns/data-contract-at-boundary.md
standards/data/patterns/lineage-as-evidence.md
standards/data/profiles/small-saas-data-profile.md
standards/data/profiles/data-catalog-profile.md
standards/data/profiles/data-contract-profile.md
standards/data/profiles/data-lineage-profile.md
standards/data/mappings/dcat.yaml
standards/data/mappings/prov-o.yaml
standards/data/mappings/iso-11179.yaml
standards/data/mappings/dama-dmbok.yaml

25. Roadmap

Phase 1: Seed Stabilization

Establish this standard as InfoTechCanonDataModel.
Add seed concepts, relationship vocabulary, patterns, and profiles.
Define validation rules.
Align with Landscape, Organization, Governance, Security, Access Control, Task, and Tagging.

Phase 2: First Assimilations

Recommended first assimilations:

W3C DCAT 3
PROV-O
ISO/IEC 11179
DAMA-DMBOK
OpenLineage
DataHub
OpenMetadata
Great Expectations
dbt semantic layer / metadata
GDPR data categories and processing concepts

Phase 3: Profile Maturation

Mature Small SaaS Data Profile.
Mature Data Catalog Profile.
Mature Data Contract Profile.
Mature Data Lineage Profile.
Mature Privacy-Relevant Data Profile.
Mature Analytics Dataset Profile.

Phase 4: Tooling Integration

Generate concept indexes.
Generate agent brief.
Create machine-readable YAML/JSON exports.
Add validation scripts.
Integrate data catalog, lineage, data-quality, schema registry, and contract tooling.

Phase 5: Data Intelligence Loop

Connect datasets to services and repositories.
Connect classification to access control and security.
Connect quality issues to tasks.
Connect lineage to provenance and assurance.
Connect data contracts to DevSecOps and release workflows.
Connect privacy and retention to governance obligations.

26. Summary

The InfoTechCanon Data Model is the seed standard for representing data as a managed, governed, discoverable, reusable, classifiable, lineage-bearing, and quality-assessable asset.

Its most important commitments are:

Separate data from storage.

Separate dataset, schema, field, data element, data object, and data product.

Treat classification, lineage, quality, retention, residency, and processing purpose as first-class concerns.

Use data contracts at producer-consumer boundaries.

Import governance, access-control, security, task, tagging, organization, and landscape concepts instead of redefining them.

Map to DCAT, PROV-O, ISO/IEC 11179, DAMA-DMBOK, OpenLineage, and catalog tools without surrendering internal semantic autonomy.

Use profiles to make the model practical for SaaS systems, catalogs, contracts, lineage, privacy-relevant data, analytics, and AI/agentic workflows.

This makes the Data Model a core seed for information architecture, data governance, security posture, AI readiness, analytics reliability, and interoperable information-processing systems.
