| id | type | title | domain | repo | status | owner | topic_slug | planning_priority | planning_order | created | updated | state_hub_workstream_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| KONT-WP-0013 | workplan | Blob Storage Deduplication And Content Streaming | markitect | kontextual-engine | completed | codex | markitect | high | 13 | 2026-05-07 | 2026-05-07 | 21355091-1ebe-4662-983c-4795deea2adc |
KONT-WP-0013: Blob Storage Deduplication And Content Streaming
Purpose
Implement efficient blob handling and content streaming for
kontextual-engine. The engine should store and expose representation bytes
through governed, deduplicating interfaces while preserving the existing
asset/representation/provenance model.
References
- `docs/blob-storage-content-streaming-workplan.md`
- `docs/architecture-blueprint.md`
- `docs/cmis-deployment-compatibility.md`
- `src/kontextual_engine/core/assets.py`
- `src/kontextual_engine/core/cmis.py`
Boundary
This workplan adds content-addressed blob infrastructure and stream interfaces. It does not introduce AtomPub, SOAP/Web Services, chunk-level deduplication, or a general document-management storage model.
It includes an optional S3 backend as an infrastructure adapter behind the same blob storage port. S3 object keys are digest-derived, so object storage can be used without changing engine semantics or CMIS profile governance.
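Here is a minimal sketch of what "digest-derived" object keys can mean. The `blobs/sha256/<first two hex chars>/<full hex digest>` layout and the `s3_object_key` helper are hypothetical, not the adapter's actual scheme. Because the key depends only on the bytes, an S3 adapter can treat a write to an existing key as a no-op without consulting any engine state.

```python
import hashlib


def s3_object_key(data: bytes, prefix: str = "blobs") -> str:
    """Hypothetical helper: derive an S3 object key purely from content bytes."""
    hexdigest = hashlib.sha256(data).hexdigest()
    # A short fan-out prefix keeps listings manageable; the full digest makes the key unique.
    return f"{prefix}/sha256/{hexdigest[:2]}/{hexdigest}"
```

Identical content always maps to the same key, so deduplication in object storage does not depend on any bookkeeping inside the engine.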
Architecture Constraint
Blob bytes are infrastructure state. Engine semantics remain attached to
AssetRepresentation, AssetVersion, policy decisions, audit records,
source references, and lineage. CMIS content streaming must delegate through
engine-native content services instead of bypassing governance.
D13.1 - Define blob storage port and blob reference model
id: KONT-WP-0013-T001
status: done
priority: high
state_hub_task_id: "6bb5b49a-cf9f-47ce-86d3-24b47a20a2c6"
Acceptance:
- A stable blob storage port is defined for put, read/open, stat, exists, and delete-unreferenced operations.
- Blob references include digest, size, media type where appropriate, storage key, and adapter name.
- Port contracts support deterministic tests without requiring filesystem or external object storage.
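The sketch below shows one possible shape for the port and its reference model. The method names come from the completion notes. The `BlobRef` fields, the `sha256:` prefix, and every signature are assumptions made for illustration. The dict-backed adapter shows how the contract can be tested deterministically, with no filesystem or object storage involved.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterator, Optional, Protocol


@dataclass(frozen=True)
class BlobRef:
    """Hypothetical blob reference; field names are illustrative."""
    digest: str                       # e.g. "sha256:<hex>"
    size: int                         # byte length
    storage_key: str                  # adapter-specific key derived from the digest
    adapter: str                      # name of the adapter holding the bytes
    media_type: Optional[str] = None  # set where appropriate


class BlobStorage(Protocol):
    def put_bytes(self, data: bytes, media_type: Optional[str] = None) -> BlobRef: ...
    def read_bytes(self, digest: str) -> bytes: ...
    def iter_bytes(self, digest: str, chunk_size: int = 65536) -> Iterator[bytes]: ...
    def stat(self, digest: str) -> Optional[BlobRef]: ...
    def exists(self, digest: str) -> bool: ...
    def delete_unreferenced(self, referenced: set, dry_run: bool = True) -> list: ...


def sha256_digest(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()


class InMemoryBlobStorage:
    """Dict-backed adapter that structurally satisfies BlobStorage; deterministic for tests."""

    name = "memory"

    def __init__(self) -> None:
        self._blobs: dict = {}

    def put_bytes(self, data: bytes, media_type: Optional[str] = None) -> BlobRef:
        digest = sha256_digest(data)
        self._blobs.setdefault(digest, bytes(data))  # duplicate content is stored once
        return BlobRef(digest, len(data), digest, self.name, media_type)

    def read_bytes(self, digest: str) -> bytes:
        return self._blobs[digest]

    def iter_bytes(self, digest: str, chunk_size: int = 65536) -> Iterator[bytes]:
        data = self._blobs[digest]
        for start in range(0, len(data), chunk_size):
            yield data[start:start + chunk_size]

    def stat(self, digest: str) -> Optional[BlobRef]:
        data = self._blobs.get(digest)
        return None if data is None else BlobRef(digest, len(data), digest, self.name)

    def exists(self, digest: str) -> bool:
        return digest in self._blobs

    def delete_unreferenced(self, referenced: set, dry_run: bool = True) -> list:
        # Report (and optionally delete) blobs that no caller-supplied reference points at.
        victims = [self.stat(d) for d in sorted(self._blobs) if d not in referenced]
        if not dry_run:
            for ref in victims:
                del self._blobs[ref.digest]
        return victims
```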
D13.2 - Implement content-addressed local blob adapter
id: KONT-WP-0013-T002
status: done
priority: high
state_hub_task_id: "661386c7-8094-4f0f-928c-c17f5b3a9132"
Acceptance:
- Local adapter stores blobs by digest-derived path.
- Writes are idempotent and verify digest/size.
- Duplicate content does not duplicate stored bytes.
- Atomic write behavior is covered by tests where practical.
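Here is one way a local adapter might meet these criteria, offered as a sketch that assumes SHA-256 and a two-level hex fan-out directory layout (`LocalBlobStore` and `path_for` are hypothetical names). Each write goes to a temporary file in the target directory, is flushed with `fsync`, and is then renamed into place with `os.replace`. A reader therefore sees either no file or the complete blob.

```python
import hashlib
import os
import tempfile
from pathlib import Path
from typing import Optional


class LocalBlobStore:
    """Hypothetical content-addressed filesystem adapter."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)

    def path_for(self, hexdigest: str) -> Path:
        # Digest-derived path: <root>/sha256/ab/cd/abcd...
        return self.root / "sha256" / hexdigest[:2] / hexdigest[2:4] / hexdigest

    def put_bytes(self, data: bytes, expected_digest: Optional[str] = None) -> str:
        hexdigest = hashlib.sha256(data).hexdigest()
        if expected_digest is not None and expected_digest != hexdigest:
            raise ValueError(f"digest mismatch: expected {expected_digest}, got {hexdigest}")
        target = self.path_for(hexdigest)
        if target.exists() and target.stat().st_size == len(data):
            return hexdigest  # idempotent: identical content is already stored
        target.parent.mkdir(parents=True, exist_ok=True)
        # Temp file in the same directory so the rename stays on one filesystem.
        fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=".tmp-")
        try:
            with os.fdopen(fd, "wb") as fh:
                fh.write(data)
                fh.flush()
                os.fsync(fh.fileno())
            os.replace(tmp_name, target)  # atomic rename into the final location
        except BaseException:
            if os.path.exists(tmp_name):
                os.unlink(tmp_name)
            raise
        return hexdigest

    def read_bytes(self, hexdigest: str) -> bytes:
        data = self.path_for(hexdigest).read_bytes()
        # Re-verify on read so on-disk corruption surfaces as an error, not wrong content.
        if hashlib.sha256(data).hexdigest() != hexdigest:
            raise IOError(f"stored blob {hexdigest} failed digest verification")
        return data
```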
D13.3 - Implement representation content service
id: KONT-WP-0013-T003
status: done
priority: high
state_hub_task_id: "00bc34c5-0f79-47b6-b305-f47311edd3a7"
Acceptance:
- Service creates `AssetRepresentation` records from bytes through blob storage.
- Existing asset service mutation paths can delegate to the content service.
- Content changes create versions and audit events.
- Existing opaque `storage_ref` behavior remains compatible.
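The following sketches how a content service might tie byte storage to versioning and audit. `ContentService`, `Representation`, the `sha256:` ref prefix, and the audit event name are all hypothetical stand-ins for the engine's real `AssetRepresentation`/`AssetVersion` model. The key behavior: identical bytes are stored once, unchanged content creates no new version, and a content change appends both a version and an audit record. Refs without the `sha256:` prefix are treated as legacy opaque values.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Representation:
    asset_id: str
    version: int
    kind: str          # e.g. "source", "normalized", "derived"
    storage_ref: str   # "sha256:<hex>" for blob-backed content
    size: int
    media_type: str


def is_content_addressed(storage_ref: str) -> bool:
    # Legacy opaque refs still work; they are just not resolved through the blob store.
    return storage_ref.startswith("sha256:")


@dataclass
class ContentService:
    """Hypothetical sketch: bytes -> blob store -> versioned representation + audit event."""
    blobs: dict = field(default_factory=dict)
    representations: list = field(default_factory=list)
    audit_log: list = field(default_factory=list)

    def latest(self, asset_id: str, kind: str) -> Optional[Representation]:
        matches = [r for r in self.representations if r.asset_id == asset_id and r.kind == kind]
        return max(matches, key=lambda r: r.version, default=None)

    def _next_version(self, asset_id: str) -> int:
        return 1 + max((r.version for r in self.representations if r.asset_id == asset_id), default=0)

    def put_content(self, asset_id: str, data: bytes, media_type: str,
                    kind: str = "source", actor: str = "system") -> Representation:
        ref = "sha256:" + hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(ref, data)  # identical bytes are stored once
        current = self.latest(asset_id, kind)
        if current is not None and current.storage_ref == ref:
            return current  # unchanged content: no new version, no audit event
        rep = Representation(asset_id, self._next_version(asset_id), kind, ref, len(data), media_type)
        self.representations.append(rep)
        self.audit_log.append({"event": "representation.content_changed", "asset_id": asset_id,
                               "version": rep.version, "storage_ref": ref, "actor": actor})
        return rep
```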
D13.4 - Add blob reference accounting and safe cleanup
id: KONT-WP-0013-T004
status: done
priority: medium
state_hub_task_id: "cc4445d9-f773-4337-afd4-aeccc743dc1e"
Acceptance:
- Referenced blob discovery is deterministic from representation records.
- Cleanup can identify unreferenced blobs without deleting active content.
- Dry-run cleanup reports reclaimable bytes and references.
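Here is a sketch of deterministic reference accounting with a dry-run report. It assumes representation records expose a `storage_ref` and that blob sizes can be listed from the adapter. `cleanup`, `CleanupReport`, and `referenced_digests` are hypothetical names. Deletion happens only when `dry_run` is false and a delete callback is supplied, so the default call is purely a report.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class CleanupReport:
    dry_run: bool
    reclaimable: tuple        # digests no representation references (sorted)
    reclaimable_bytes: int
    referenced: tuple         # stored digests still in use (sorted)


def referenced_digests(representations) -> set:
    # Only content-addressed refs count; legacy opaque refs never map to local blobs.
    return {r["storage_ref"] for r in representations if r["storage_ref"].startswith("sha256:")}


def cleanup(blob_sizes: dict, representations, dry_run: bool = True,
            delete: Optional[Callable[[str], None]] = None) -> CleanupReport:
    live = referenced_digests(representations)
    victims = tuple(sorted(d for d in blob_sizes if d not in live))
    if not dry_run and delete is not None:
        for digest in victims:
            delete(digest)
    return CleanupReport(dry_run, victims, sum(blob_sizes[d] for d in victims),
                         tuple(sorted(live & set(blob_sizes))))
```

A production version would also need to guard against a representation being written between the scan and the delete, for example by re-checking references or holding a lock. This sketch does not attempt that.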
D13.5 - Expose engine-native content stream interfaces
id: KONT-WP-0013-T005
status: done
priority: high
state_hub_task_id: "db0e8a2d-50ce-439c-8393-d65e2fc4bc9e"
Acceptance:
- Service/runtime methods can fetch representation bytes or stream handles.
- Response metadata includes digest, size, media type, and representation ID.
- Policy checks happen before bytes are exposed.
- Tests cover source, normalized, and derived representation reads.
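This sketch illustrates the ordering the criteria require: the policy decision is evaluated before any blob lookup, and the returned handle carries digest, size, media type, and representation ID alongside a chunk iterator. The `is_allowed` callback, the `"content.read"` action string, and the dict-shaped representation are assumptions, not the engine's actual policy API.

```python
from dataclasses import dataclass
from typing import Callable, Iterator


class PolicyDenied(Exception):
    """Raised when the policy callback refuses a content read."""


@dataclass(frozen=True)
class ContentStream:
    representation_id: str
    digest: str
    size: int
    media_type: str
    chunks: Iterator[bytes]


def open_representation_stream(rep: dict, principal: str, blobs: dict,
                               is_allowed: Callable[[str, str, str], bool],
                               chunk_size: int = 65536) -> ContentStream:
    # Policy first: a denied read never touches the blob store.
    if not is_allowed(principal, "content.read", rep["id"]):
        raise PolicyDenied(f"{principal} may not read {rep['id']}")
    data = blobs[rep["digest"]]
    chunks = (data[i:i + chunk_size] for i in range(0, len(data), chunk_size))
    return ContentStream(rep["id"], rep["digest"], len(data), rep["media_type"], chunks)
```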
D13.6 - Integrate CMIS content stream byte semantics
id: KONT-WP-0013-T006
status: done
priority: high
state_hub_task_id: "2f1da1fb-9634-4ba6-931a-3e29394efd37"
Acceptance:
- CMIS `getContentStream` can return actual bytes/stream semantics when content is locally available.
- CMIS `setContentStream` writes through the deduplicating content service.
- CMIS descriptors remain available for metadata-only clients.
- Unsupported external/opaque storage refs produce structured diagnostics.
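Here is a minimal sketch of the `getContentStream` outcomes listed above. The `blobs` dict stands in for the governed content service (the policy step is assumed to have already run there), and `CmisContentResult`, `Diagnostic`, and the diagnostic codes are hypothetical. Bytes are returned when the blob is local. Otherwise the caller gets the metadata descriptor plus a structured diagnostic rather than an exception, so metadata-only clients keep working.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Diagnostic:
    code: str          # hypothetical codes: "unsupported_storage_ref", "blob_missing"
    message: str
    storage_ref: str


@dataclass(frozen=True)
class CmisContentResult:
    mime_type: str
    length: Optional[int]
    stream: Optional[bytes] = None           # bytes when content is locally available
    diagnostic: Optional[Diagnostic] = None  # structured reason when it is not


def get_content_stream(rep: dict, blobs: dict) -> CmisContentResult:
    ref = rep["storage_ref"]
    if ref.startswith("sha256:") and ref in blobs:
        data = blobs[ref]
        return CmisContentResult(rep["media_type"], len(data), stream=data)
    # Opaque/external refs and missing blobs still yield the metadata descriptor.
    code = "blob_missing" if ref.startswith("sha256:") else "unsupported_storage_ref"
    return CmisContentResult(rep["media_type"], rep.get("size"),
                             diagnostic=Diagnostic(code, f"content for {rep['id']} is not locally available", ref))
```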
D13.7 - Document deployment, migration, and capacity posture
id: KONT-WP-0013-T007
status: done
priority: medium
state_hub_task_id: "987ad4f6-8658-4e93-82c2-b9fa0a3a2270"
Acceptance:
- Blob root configuration and local adapter behavior are documented.
- Existing representations with opaque `storage_ref` values have a migration posture.
- Capacity tests demonstrate dedupe effectiveness and avoid excessive fixture size.
- Operational cleanup guidance is documented.
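One way to demonstrate dedupe effectiveness without committing large fixtures is to generate payloads during the test and compare logical bytes written with physical bytes kept. The sketch below assumes SHA-256 content addressing. `dedupe_stats` and `synthetic_payloads` are hypothetical helpers, not existing test utilities.

```python
import hashlib


def dedupe_stats(payloads) -> dict:
    """Compare logical bytes written with physical bytes a content-addressed store keeps."""
    stored = {}
    logical = 0
    for data in payloads:
        logical += len(data)
        stored.setdefault(hashlib.sha256(data).hexdigest(), len(data))
    physical = sum(stored.values())
    return {"logical_bytes": logical, "physical_bytes": physical,
            "unique_blobs": len(stored), "ratio": logical / physical if physical else 1.0}


def synthetic_payloads(unique: int, copies: int, size: int):
    # Build each distinct payload on the fly instead of checking fixtures into the repo.
    for i in range(unique):
        block = (f"doc-{i}:".encode() * size)[:size]
        for _ in range(copies):
            yield block
```

For example, 10 distinct 1 KiB payloads each written 5 times amount to 51,200 logical bytes but only 10,240 physical bytes, a ratio of 5.0.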
Definition Of Done
- Blob storage port and local content-addressed adapter exist.
- Representation bytes can be stored, deduplicated, read, and governed.
- CMIS content stream routes can expose real bytes when available.
- Existing tests continue to pass.
- Focused dedupe/content-stream tests cover duplicate content, readback, policy denial, cleanup dry-run, and CMIS integration.
Completion Notes
- Implemented `BlobStorage` port with `put_bytes`, `read_bytes`, `iter_bytes`, `stat`, `exists`, and `delete_unreferenced`.
- Added in-memory, local filesystem, and optional S3 content-addressed adapters.
- Added governed representation content service for byte-backed representations, chunked streams, policy checks, audit events, and cleanup.
- Wired CMIS `setContentStream` and byte stream routes through the content service; repeated content updates now expose the latest source representation.
- Added tests for dedupe, local/S3 adapter behavior, content-kind reads, policy denial, cleanup dry-run, and CMIS stream integration.