kontextual-engine/workplans/KONT-WP-0013-blob-storage-content-streaming.md

id: KONT-WP-0013
type: workplan
title: Blob Storage Deduplication And Content Streaming
domain: markitect
repo: kontextual-engine
status: completed
owner: codex
topic_slug: markitect
planning_priority: high
planning_order: 13
created: 2026-05-07
updated: 2026-05-07
state_hub_workstream_id: 21355091-1ebe-4662-983c-4795deea2adc

KONT-WP-0013: Blob Storage Deduplication And Content Streaming

Purpose

Implement efficient blob handling and content streaming for kontextual-engine. The engine should store and expose representation bytes through governed, deduplicating interfaces while preserving the existing asset/representation/provenance model.

References

  • docs/blob-storage-content-streaming-workplan.md
  • docs/architecture-blueprint.md
  • docs/cmis-deployment-compatibility.md
  • src/kontextual_engine/core/assets.py
  • src/kontextual_engine/core/cmis.py

Boundary

This workplan adds content-addressed blob infrastructure and stream interfaces. It does not introduce AtomPub, SOAP/Web Services, chunk-level deduplication, or a general document-management storage model.

It includes an optional S3 backend as an infrastructure adapter behind the same blob storage port. S3 object keys are digest-derived, so object storage can be used without changing engine semantics or CMIS profile governance.

Architecture Constraint

Blob bytes are infrastructure state. Engine semantics remain attached to AssetRepresentation, AssetVersion, policy decisions, audit records, source references, and lineage. CMIS content streaming must delegate through engine-native content services instead of bypassing governance.

D13.1 - Define blob storage port and blob reference model

id: KONT-WP-0013-T001
status: done
priority: high
state_hub_task_id: "6bb5b49a-cf9f-47ce-86d3-24b47a20a2c6"

Acceptance:

  • A stable blob storage port is defined for put, read/open, stat, exists, and delete-unreferenced operations.
  • Blob references include digest, size, media type where appropriate, storage key, and adapter name.
  • Port contracts support deterministic tests without requiring filesystem or external object storage.
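
As a rough illustration of the acceptance criteria above, the port and reference model might look like the sketch below. The method names follow the operations listed in this workplan; the signatures, the `BlobRef` field names, the `sha256:` digest prefix, and the `InMemoryBlobStorage` class are assumptions made for this example, not the engine's actual definitions.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterator, Optional, Protocol


@dataclass(frozen=True)
class BlobRef:
    """Hypothetical blob reference; field names are illustrative."""
    digest: str                      # assumed form: "sha256:<hex>"
    size: int
    storage_key: str
    adapter: str
    media_type: Optional[str] = None


class BlobStorage(Protocol):
    """Assumed port shape; only the operation names come from the workplan."""
    def put_bytes(self, data: bytes, media_type: Optional[str] = None) -> BlobRef: ...
    def read_bytes(self, digest: str) -> bytes: ...
    def iter_bytes(self, digest: str, chunk_size: int = 65536) -> Iterator[bytes]: ...
    def stat(self, digest: str) -> BlobRef: ...
    def exists(self, digest: str) -> bool: ...
    def delete_unreferenced(self, referenced: set[str], dry_run: bool = True) -> list[BlobRef]: ...


class InMemoryBlobStorage:
    """Deterministic adapter for tests: no filesystem, no object storage."""
    adapter_name = "memory"

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put_bytes(self, data, media_type=None):
        digest = "sha256:" + hashlib.sha256(data).hexdigest()
        # setdefault keeps the first copy, so duplicate content is stored once.
        self._blobs.setdefault(digest, data)
        return BlobRef(digest, len(data), digest, self.adapter_name, media_type)

    def read_bytes(self, digest):
        return self._blobs[digest]

    def iter_bytes(self, digest, chunk_size=65536):
        data = self._blobs[digest]
        for start in range(0, len(data), chunk_size):
            yield data[start:start + chunk_size]

    def stat(self, digest):
        return BlobRef(digest, len(self._blobs[digest]), digest, self.adapter_name)

    def exists(self, digest):
        return digest in self._blobs

    def delete_unreferenced(self, referenced, dry_run=True):
        # Sorted iteration keeps the result deterministic across runs.
        orphans = [self.stat(d) for d in sorted(self._blobs) if d not in referenced]
        if not dry_run:
            for ref in orphans:
                del self._blobs[ref.digest]
        return orphans
```

Because the in-memory adapter satisfies the same protocol structurally, tests written against the port would not need to change when a filesystem or S3 adapter is substituted.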

D13.2 - Implement content-addressed local blob adapter

id: KONT-WP-0013-T002
status: done
priority: high
state_hub_task_id: "661386c7-8094-4f0f-928c-c17f5b3a9132"

Acceptance:

  • Local adapter stores each blob at a digest-derived path.
  • Writes are idempotent and verify digest/size.
  • Duplicate content does not duplicate stored bytes.
  • Atomic write behavior is covered by tests where practical.
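
One common way to meet these criteria is to write to a temporary file in the target directory and rename it into place. The sketch below assumes SHA-256 digests and a two-level fan-out directory layout; `blob_path` and `put_blob` are hypothetical helper names, and the real adapter may be organized differently.

```python
import hashlib
import os
import tempfile
from pathlib import Path
from typing import Optional


def blob_path(root: Path, hex_digest: str) -> Path:
    # Assumed layout: fan out on the first two byte pairs to keep directories small.
    return root / "sha256" / hex_digest[:2] / hex_digest[2:4] / hex_digest


def put_blob(root: Path, data: bytes, expected_hex: Optional[str] = None) -> Path:
    hex_digest = hashlib.sha256(data).hexdigest()
    if expected_hex is not None and expected_hex != hex_digest:
        raise ValueError("digest mismatch")
    target = blob_path(root, hex_digest)
    if target.exists():
        # Idempotent write: identical content is already stored, so only verify size.
        if target.stat().st_size != len(data):
            raise OSError("stored blob size does not match incoming content")
        return target
    target.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the same directory so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())
        # os.replace swaps the file in as a single step; readers never see partial content.
        os.replace(tmp_name, target)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return target
```

Writing the same bytes twice returns the same path and leaves a single file on disk, which is the deduplication property the acceptance list asks for.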

D13.3 - Implement representation content service

id: KONT-WP-0013-T003
status: done
priority: high
state_hub_task_id: "00bc34c5-0f79-47b6-b305-f47311edd3a7"

Acceptance:

  • Service creates AssetRepresentation records from bytes through blob storage.
  • Existing asset service mutation paths can delegate to the content service.
  • Content changes create versions and audit events.
  • Existing opaque storage_ref behavior remains compatible.
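
The interaction between content writes, versions, and audit events can be pictured with a small model. Everything here is a stand-in: `AssetState`, `set_content`, the audit event fields, and the choice to skip a new version when the digest is unchanged are assumptions for illustration, not the engine's AssetRepresentation or AssetVersion types.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AssetState:
    """Hypothetical stand-in for an asset's version and audit history."""
    asset_id: str
    version: int = 0
    current_digest: Optional[str] = None
    audit: list = field(default_factory=list)


def set_content(asset: AssetState, blobs: dict, data: bytes,
                media_type: str, actor: str) -> int:
    digest = "sha256:" + hashlib.sha256(data).hexdigest()
    # Store through the (here, dict-backed) blob store; duplicates are not re-stored.
    blobs.setdefault(digest, data)
    if digest == asset.current_digest:
        # Assumption: rewriting identical content is not a content change.
        return asset.version
    asset.version += 1
    asset.current_digest = digest
    asset.audit.append({
        "event": "content_changed",
        "asset_id": asset.asset_id,
        "version": asset.version,
        "digest": digest,
        "size": len(data),
        "media_type": media_type,
        "actor": actor,
    })
    return asset.version
```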

D13.4 - Add blob reference accounting and safe cleanup

id: KONT-WP-0013-T004
status: done
priority: medium
state_hub_task_id: "cc4445d9-f773-4337-afd4-aeccc743dc1e"

Acceptance:

  • Referenced blob discovery is deterministic from representation records.
  • Cleanup can identify unreferenced blobs without deleting active content.
  • Dry-run cleanup reports reclaimable bytes and references.
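
A dry-run cleanup can be reduced to a set difference between stored digests and digests referenced by representation records. The following sketch assumes the store can be summarized as a digest-to-size mapping; `CleanupReport` and `plan_cleanup` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class CleanupReport:
    referenced: tuple
    unreferenced: tuple
    reclaimable_bytes: int
    dry_run: bool


def plan_cleanup(stored_sizes: dict, representation_digests: Iterable[Optional[str]],
                 dry_run: bool = True) -> CleanupReport:
    # Records without a digest (for example, opaque storage refs) are ignored.
    referenced = {d for d in representation_digests if d}
    # Sorting makes the report deterministic regardless of record order.
    unreferenced = sorted(set(stored_sizes) - referenced)
    live = sorted(referenced & set(stored_sizes))
    reclaimable = sum(stored_sizes[d] for d in unreferenced)
    if not dry_run:
        for digest in unreferenced:
            del stored_sizes[digest]
    return CleanupReport(tuple(live), tuple(unreferenced), reclaimable, dry_run)
```

Running with `dry_run=True` first and reviewing `reclaimable_bytes` before a destructive pass keeps active content out of reach of the delete step.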

D13.5 - Expose engine-native content stream interfaces

id: KONT-WP-0013-T005
status: done
priority: high
state_hub_task_id: "db0e8a2d-50ce-439c-8393-d65e2fc4bc9e"

Acceptance:

  • Service/runtime methods can fetch representation bytes or stream handles.
  • Response metadata includes digest, size, media type, and representation ID.
  • Policy checks happen before bytes are exposed.
  • Tests cover source, normalized, and derived representation reads.
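
The ordering requirement (policy first, bytes second) is easy to show in miniature. In this sketch the representation is a plain dict, the policy is a callable, and `open_content`, `ContentStream`, and `PolicyDenied` are hypothetical names; the engine's real service and policy interfaces are not shown in this workplan.

```python
from dataclasses import dataclass
from typing import Callable, Iterator


class PolicyDenied(Exception):
    """Raised when the caller may not read the representation."""


@dataclass(frozen=True)
class ContentStream:
    representation_id: str
    digest: str
    size: int
    media_type: str
    chunks: Iterator[bytes]


def open_content(representation: dict, blobs: dict,
                 allow: Callable[[str, dict], bool],
                 principal: str, chunk_size: int = 65536) -> ContentStream:
    # The policy decision is made before the blob store is touched at all.
    if not allow(principal, representation):
        raise PolicyDenied(f"{principal} may not read {representation['id']}")
    data = blobs[representation["digest"]]

    def chunks() -> Iterator[bytes]:
        for start in range(0, len(data), chunk_size):
            yield data[start:start + chunk_size]

    return ContentStream(
        representation_id=representation["id"],
        digest=representation["digest"],
        size=len(data),
        media_type=representation["media_type"],
        chunks=chunks(),
    )
```

Because the check precedes the lookup, a denied caller gets the same error whether or not the blob exists, so the denial itself does not reveal whether content is stored.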

D13.6 - Integrate CMIS content stream byte semantics

id: KONT-WP-0013-T006
status: done
priority: high
state_hub_task_id: "2f1da1fb-9634-4ba6-931a-3e29394efd37"

Acceptance:

  • CMIS getContentStream can return actual bytes or a stream when content is locally available.
  • CMIS setContentStream writes through the deduplicating content service.
  • CMIS descriptors remain available for metadata-only clients.
  • Unsupported external/opaque storage refs produce structured diagnostics.
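
The split between locally available bytes and unsupported storage refs could be handled along these lines. The `blob:` storage_ref prefix, the result dictionary shape, and the diagnostic code are all invented for this sketch; the workplan does not specify the actual ref scheme or diagnostic format.

```python
def get_content_stream(representation: dict, blobs: dict) -> dict:
    """Return bytes when locally available, else a structured diagnostic."""
    ref = representation.get("storage_ref", "")
    prefix = "blob:"  # hypothetical scheme marking content-addressed refs
    # The descriptor stays available even when no bytes can be served.
    descriptor = {
        "representation_id": representation.get("id"),
        "mime_type": representation.get("media_type"),
    }
    digest = ref[len(prefix):] if ref.startswith(prefix) else None
    if digest is not None and digest in blobs:
        data = blobs[digest]
        return {"ok": True, "descriptor": descriptor,
                "length": len(data), "stream": data}
    reason = "blob not found locally" if digest else "external or opaque storage_ref"
    return {
        "ok": False,
        "descriptor": descriptor,
        "diagnostic": {
            "code": "content_stream_unavailable",  # illustrative code value
            "storage_ref": ref,
            "reason": reason,
        },
    }
```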

D13.7 - Document deployment, migration, and capacity posture

id: KONT-WP-0013-T007
status: done
priority: medium
state_hub_task_id: "987ad4f6-8658-4e93-82c2-b9fa0a3a2270"

Acceptance:

  • Blob root configuration and local adapter behavior are documented.
  • Existing representations with opaque storage_ref values have a migration posture.
  • Capacity tests demonstrate dedupe effectiveness and avoid excessive fixture size.
  • Operational cleanup guidance is documented.

Definition Of Done

  • Blob storage port and local content-addressed adapter exist.
  • Representation bytes can be stored, deduplicated, read, and governed.
  • CMIS content stream routes can expose real bytes when available.
  • Existing tests continue to pass.
  • Focused dedupe/content-stream tests cover duplicate content, readback, policy denial, cleanup dry-run, and CMIS integration.

Completion Notes

  • Implemented BlobStorage port with put_bytes, read_bytes, iter_bytes, stat, exists, and delete_unreferenced.
  • Added in-memory, local filesystem, and optional S3 content-addressed adapters.
  • Added governed representation content service for byte-backed representations, chunked streams, policy checks, audit events, and cleanup.
  • Wired CMIS setContentStream and byte stream routes through the content service; repeated content updates now expose the latest source representation.
  • Added tests for dedupe, local/S3 adapter behavior, content-kind reads, policy denial, cleanup dry-run, and CMIS stream integration.