artifact-store/workplans/ARTIFACT-STORE-WP-0006-garbage-collection.md


id: ARTIFACT-STORE-WP-0006
type: workplan
title: Garbage Collection And Reference Counting
repo: artifact-store
domain: stack
status: done
owner: codex
topic_slug: stack
planning_priority: high
planning_order: 6
created: 2026-05-16
updated: 2026-05-16
state_hub_workstream_id: ccef72e9-a160-45c0-9952-c64be7c8cfa4

ARTIFACT-STORE-WP-0006: Garbage Collection And Reference Counting

Purpose

Turn WP-0003 deletion eligibility into actual byte reclamation while preserving auditability and global content-addressed deduplication. GC must never delete bytes still referenced by a non-deleted storage location.

Constraints

  • ADR-0001: content-addressed storage with global deduplication.
  • ADR-0002: event log is the source of truth; materialised views are replayable.
  • ADR-0004: byte deletion goes through the data plane, not through registry-specific backend code.
  • WP-0003 deletion eligibility and retention holds are the policy gate.

Prerequisites

  • WP-0001 through WP-0003 done.
  • WP-0004 backend SPI delete exists for all configured backends.

D6.1 - Reference-Counted GC Planner

id: ARTIFACT-STORE-WP-0006-T001
status: done
priority: high
state_hub_task_id: "438ed392-0f07-46cb-a6f5-88ce57b33fce"

Acceptance:

  • GC selects only packages whose retention_state.eligible_for_deletion is true and active_hold_id is null.
  • It computes references by (backend_id, content_address) across all non-deleted storage locations.
  • It releases an eligible package's storage locations without deleting bytes that are still referenced elsewhere.
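
The planner rules above can be sketched as a small pure function. This is an illustrative sketch under stated assumptions, not the repository's implementation: the class names, the location_id, package_id, and deleted fields, and the plan_gc function are hypothetical. Only eligible_for_deletion, active_hold_id, backend_id, and content_address come from the acceptance criteria.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StorageLocation:
    # Hypothetical record shape; only backend_id and content_address
    # are named in the workplan.
    location_id: str
    package_id: str
    backend_id: str
    content_address: str
    deleted: bool = False


@dataclass(frozen=True)
class RetentionState:
    package_id: str
    eligible_for_deletion: bool
    active_hold_id: Optional[str] = None


@dataclass(frozen=True)
class PlannedRelease:
    location: StorageLocation
    delete_bytes: bool  # True only when this release drops the last reference


def plan_gc(states, locations):
    """Return the releases for one GC pass, in input order."""
    # Policy gate: eligible for deletion and not under an active hold.
    eligible = {
        s.package_id
        for s in states
        if s.eligible_for_deletion and s.active_hold_id is None
    }
    live = [loc for loc in locations if not loc.deleted]
    # Reference counts per (backend_id, content_address) over all
    # non-deleted storage locations, eligible or not.
    remaining = Counter((loc.backend_id, loc.content_address) for loc in live)

    plan = []
    for loc in live:
        if loc.package_id not in eligible:
            continue
        key = (loc.backend_id, loc.content_address)
        remaining[key] -= 1
        # Bytes are reclaimable only once no live location points at them.
        plan.append(PlannedRelease(loc, delete_bytes=remaining[key] == 0))
    return plan
```

Because the count covers every non-deleted location, an object shared with a retained package never reaches zero, so its release is logical only. When several eligible packages share an object, this sketch marks the byte deletion on the last release in the pass.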

D6.2 - Byte Deletion And Audit Events

id: ARTIFACT-STORE-WP-0006-T002
status: done
priority: high
state_hub_task_id: "8f512753-c402-480a-8517-990fccf09295"

Acceptance:

  • When the eligible package set owns the final reference to a content address, GC calls DataPlane.delete_object.
  • GC emits replayable audit events for every released storage location, including whether the physical object was deleted or retained due to remaining references.
  • Replay marks released storage locations as deleted and packages as garbage_collected once every storage location for that package is deleted.
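
One way to picture the release and replay behaviour described above is the sketch below. It is an assumption-laden illustration: the workplan names DataPlane.delete_object but not its signature, so the (backend_id, content_address) arguments are assumed, and the LocationReleased event, the release and replay functions, and the "active" status are hypothetical names. Only garbage_collected comes from the source.

```python
from dataclasses import dataclass
from typing import Protocol


class DataPlane(Protocol):
    # Assumed signature; the workplan only names the method.
    def delete_object(self, backend_id: str, content_address: str) -> None: ...


@dataclass(frozen=True)
class LocationReleased:
    # Hypothetical audit event, one per released storage location.
    package_id: str
    location_id: str
    backend_id: str
    content_address: str
    physical_object_deleted: bool  # False means retained: other references remain


def release(data_plane, releases):
    """Apply releases and return the audit events to append to the log.

    Each item in releases is a tuple:
    (package_id, location_id, backend_id, content_address, is_final_reference)
    """
    events = []
    for package_id, location_id, backend_id, content_address, is_final in releases:
        if is_final:
            # Physical deletion goes through the data plane (ADR-0004).
            data_plane.delete_object(backend_id, content_address)
        events.append(
            LocationReleased(
                package_id, location_id, backend_id, content_address,
                physical_object_deleted=is_final,
            )
        )
    return events


def replay(package_locations, events):
    """Rebuild deleted locations and package statuses from events alone.

    package_locations maps package_id to the set of its location ids.
    """
    deleted = {event.location_id for event in events}
    statuses = {
        package_id: "garbage_collected" if locs and locs <= deleted else "active"
        for package_id, locs in package_locations.items()
    }
    return deleted, statuses
```

Since replay reads only the events, running it again over the same log yields the same deleted set and the same package statuses, which is the property the replay criterion asks for.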

D6.3 - Operator Command And Docs

id: ARTIFACT-STORE-WP-0006-T003
status: done
priority: medium
state_hub_task_id: "a36dce56-f87b-431a-b875-fc567593ddd3"

Acceptance:

  • artifactstore retention gc runs one GC pass and prints a JSON summary.
  • docs/OPERATOR.md documents the safe sequence: artifactstore retention sweep then artifactstore retention gc.
  • The command is idempotent: running it again after a clean pass does not delete or rewrite anything.
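
The idempotency requirement can be illustrated with a toy in-memory pass. This is a sketch only: gc_pass, the dictionary layout, and the "released" summary key are hypothetical and do not reflect the actual JSON summary printed by artifactstore retention gc.

```python
import json


def gc_pass(locations, eligible_packages):
    """Run one pass over an in-memory view and return a summary dict.

    locations maps location_id to {"package_id": str, "deleted": bool}
    and is updated in place. The summary key is invented for this sketch.
    """
    released = []
    for location_id, loc in sorted(locations.items()):
        # Already-deleted locations are skipped, so a rerun finds no work.
        if loc["package_id"] in eligible_packages and not loc["deleted"]:
            loc["deleted"] = True
            released.append(location_id)
    return {"released": released}


def summary_json(summary):
    # Serialise the summary the way a CLI might print it.
    return json.dumps(summary)
```

Calling gc_pass twice with the same inputs releases the eligible locations on the first call and returns an empty list on the second, because the first pass leaves no non-deleted location for an eligible package.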

D6.4 - Verification Tests

id: ARTIFACT-STORE-WP-0006-T004
status: done
priority: high
state_hub_task_id: "b2a2d94f-bc5a-47ca-b540-920d94bff06e"

Acceptance:

  • Tests cover unique-object deletion, shared-object reference retention, hold-protected packages, idempotent reruns, replay, and CLI output.
  • Full pytest, ruff, and mypy pass.

Verification

  • Focused tests: tests/integration/test_garbage_collection.py and tests/integration/test_cli_commands.py passed.
  • ruff check . passed.
  • mypy src tests passed.

Success criteria

  • Expired, unheld packages can be reclaimed without losing bytes still referenced by retained packages.
  • The event log explains every logical release and physical delete.
  • A replayed database reconstructs the same deleted storage-location state and garbage_collected package status.