---
id: ARTIFACT-STORE-WP-0006
type: workplan
title: "Garbage Collection And Reference Counting"
repo: artifact-store
domain: stack
status: done
owner: codex
topic_slug: stack
planning_priority: high
planning_order: 6
created: "2026-05-16"
updated: "2026-05-16"
state_hub_workstream_id: "ccef72e9-a160-45c0-9952-c64be7c8cfa4"
---

# ARTIFACT-STORE-WP-0006: Garbage Collection And Reference Counting

## Purpose

Turn WP-0003 deletion eligibility into actual byte reclamation while preserving auditability and global content-addressed deduplication. GC must never delete bytes still referenced by a non-deleted storage location.

## Constraints

- ADR-0001: content-addressed storage with global deduplication.
- ADR-0002: event log is the source of truth; materialised views are replayable.
- ADR-0004: byte deletion goes through the data plane, not through registry-specific backend code.
- WP-0003 deletion eligibility and retention holds are the policy gate.

## Prerequisites

- WP-0001 through WP-0003 done.
- WP-0004 backend SPI delete exists for all configured backends.

## D6.1 - Reference-Counted GC Planner

```task
id: ARTIFACT-STORE-WP-0006-T001
status: done
priority: high
state_hub_task_id: "438ed392-0f07-46cb-a6f5-88ce57b33fce"
```

Acceptance:

- GC selects only packages whose `retention_state.eligible_for_deletion` is true and `active_hold_id` is null.
- It computes references by `(backend_id, content_address)` across all non-deleted storage locations.
- It releases an eligible package's storage locations without deleting bytes that are still referenced elsewhere.

## D6.2 - Byte Deletion And Audit Events

```task
id: ARTIFACT-STORE-WP-0006-T002
status: done
priority: high
state_hub_task_id: "8f512753-c402-480a-8517-990fccf09295"
```

Acceptance:

- When the eligible package set owns the final reference to a content address, GC calls `DataPlane.delete_object`.
- GC emits replayable audit events for every released storage location, including whether the physical object was deleted or retained due to remaining references.
- Replay marks released storage locations as `deleted` and packages as `garbage_collected` once every storage location for that package is deleted.

## D6.3 - Operator Command And Docs

```task
id: ARTIFACT-STORE-WP-0006-T003
status: done
priority: medium
state_hub_task_id: "a36dce56-f87b-431a-b875-fc567593ddd3"
```

Acceptance:

- `artifactstore retention gc` runs one GC pass and prints a JSON summary.
- `docs/OPERATOR.md` documents the safe sequence: `artifactstore retention sweep`, then `artifactstore retention gc`.
- The command is idempotent: running it again after a clean pass does not delete or rewrite anything.

## D6.4 - Verification Tests

```task
id: ARTIFACT-STORE-WP-0006-T004
status: done
priority: high
state_hub_task_id: "b2a2d94f-bc5a-47ca-b540-920d94bff06e"
```

Acceptance:

- Tests cover unique-object deletion, shared-object reference retention, hold-protected packages, idempotent reruns, replay, and CLI output.
- Full `pytest`, `ruff`, and `mypy` pass.

## Verification

- Focused tests: `tests/integration/test_garbage_collection.py` and `tests/integration/test_cli_commands.py` passed.
- `ruff check .` passed.
- `mypy src tests` passed.

## Success criteria

- Expired, unheld packages can be reclaimed without losing bytes still referenced by retained packages.
- The event log explains every logical release and physical delete.
- A replayed database reconstructs the same `deleted` storage-location state and `garbage_collected` package status.
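The reference-counting rule in D6.1 and D6.2 can be illustrated with a small in-memory sketch. Everything in it is hypothetical: the `Package`, `StorageLocation`, and `GcEvent` types, the `run_gc_pass` and `replay` functions, and the `(backend_id, content_address)` signature assumed for `DataPlane.delete_object` are illustrations, not the artifact-store code. `retention_state` is flattened onto `Package` for brevity, and `FakeDataPlane` only records calls instead of deleting bytes.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Protocol, Tuple

# Hypothetical in-memory model; the real schema and event types may differ.


@dataclass
class Package:
    package_id: str
    eligible_for_deletion: bool  # flattened from retention_state (assumption)
    active_hold_id: Optional[str] = None
    status: str = "active"


@dataclass
class StorageLocation:
    location_id: str
    package_id: str
    backend_id: str
    content_address: str
    deleted: bool = False


@dataclass(frozen=True)
class GcEvent:
    """Audit event for one released storage location (hypothetical shape)."""
    location_id: str
    backend_id: str
    content_address: str
    physical_delete: bool  # False: bytes retained because references remain


class DataPlaneLike(Protocol):
    def delete_object(self, backend_id: str, content_address: str) -> None: ...


class FakeDataPlane:
    """Records delete_object calls instead of deleting bytes."""
    def __init__(self) -> None:
        self.deleted: List[Tuple[str, str]] = []

    def delete_object(self, backend_id: str, content_address: str) -> None:
        self.deleted.append((backend_id, content_address))


def run_gc_pass(packages: List[Package], locations: List[StorageLocation],
                data_plane: DataPlaneLike) -> List[GcEvent]:
    # Policy gate: eligible for deletion and no active hold.
    eligible = {p.package_id for p in packages
                if p.eligible_for_deletion and p.active_hold_id is None}
    live = [loc for loc in locations if not loc.deleted]
    # Reference counts per (backend_id, content_address) over non-deleted locations.
    refs = Counter((loc.backend_id, loc.content_address) for loc in live)
    events: List[GcEvent] = []
    for loc in live:
        if loc.package_id not in eligible:
            continue
        key = (loc.backend_id, loc.content_address)
        refs[key] -= 1  # release this location's reference
        physical = refs[key] == 0  # last live reference released: delete bytes
        if physical:
            data_plane.delete_object(loc.backend_id, loc.content_address)
        events.append(GcEvent(loc.location_id, loc.backend_id,
                              loc.content_address, physical))
    return events


def replay(events: List[GcEvent], packages: List[Package],
           locations: List[StorageLocation]) -> None:
    # Rebuild view state from events: released locations become deleted, and a
    # package becomes garbage_collected once all of its locations are deleted.
    by_id = {loc.location_id: loc for loc in locations}
    for ev in events:
        by_id[ev.location_id].deleted = True
    for pkg in packages:
        owned = [loc for loc in locations if loc.package_id == pkg.package_id]
        if owned and all(loc.deleted for loc in owned):
            pkg.status = "garbage_collected"


# Usage: p1 is eligible, p2 is retained, p3 is under a hold.
packages = [Package("p1", True), Package("p2", False),
            Package("p3", True, active_hold_id="hold-1")]
locations = [
    StorageLocation("l1", "p1", "s3", "sha256:aaa"),  # unique to p1
    StorageLocation("l2", "p1", "s3", "sha256:bbb"),  # shared with p2
    StorageLocation("l3", "p2", "s3", "sha256:bbb"),
    StorageLocation("l4", "p3", "s3", "sha256:ccc"),  # hold-protected
]
dp = FakeDataPlane()
events = run_gc_pass(packages, locations, dp)
replay(events, packages, locations)
print(dp.deleted)  # [('s3', 'sha256:aaa')]; sha256:bbb is retained for p2
print(run_gc_pass(packages, locations, dp))  # [] on an idempotent rerun
```

Decrementing one counter per `(backend_id, content_address)` key means a shared object is deleted exactly once, when the last non-deleted location referencing it is released, even if several eligible packages share it in the same pass. Because the sketch's `run_gc_pass` only reads non-deleted locations, a rerun after `replay` finds nothing to release, which mirrors the idempotency requirement in D6.3.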