---
id: ARTIFACT-STORE-WP-0006
type: workplan
title: "Garbage Collection And Reference Counting"
repo: artifact-store
domain: stack
status: done
owner: codex
topic_slug: stack
planning_priority: high
planning_order: 6
created: "2026-05-16"
updated: "2026-05-16"
state_hub_workstream_id: "ccef72e9-a160-45c0-9952-c64be7c8cfa4"
---
# ARTIFACT-STORE-WP-0006: Garbage Collection And Reference Counting
## Purpose
Turn WP-0003 deletion eligibility into actual byte reclamation while
preserving auditability and global content-addressed deduplication. GC
must never delete bytes still referenced by a non-deleted storage
location.
## Constraints
- ADR-0001: content-addressed storage with global deduplication.
- ADR-0002: event log is the source of truth; materialised views are
replayable.
- ADR-0004: byte deletion goes through the data plane, not through
registry-specific backend code.
- WP-0003 deletion eligibility and retention holds are the policy gate.
## Prerequisites
- WP-0001 through WP-0003 done.
- WP-0004 backend SPI delete exists for all configured backends.
## D6.1 - Reference-Counted GC Planner
```task
id: ARTIFACT-STORE-WP-0006-T001
status: done
priority: high
state_hub_task_id: "438ed392-0f07-46cb-a6f5-88ce57b33fce"
```
Acceptance:
- GC selects only packages whose `retention_state.eligible_for_deletion`
is true and `active_hold_id` is null.
- It computes references by `(backend_id, content_address)` across all
non-deleted storage locations.
- It releases an eligible package's storage locations without deleting
bytes that are still referenced elsewhere.
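The planner rules above can be sketched as follows. This is a minimal illustration, not the repository's implementation: the `StorageLocation`, `RetentionState`, and `PlannedRelease` classes and the `plan_gc` function are hypothetical names, and only `eligible_for_deletion`, `active_hold_id`, `backend_id`, and `content_address` come from this workplan.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StorageLocation:
    # Hypothetical shape; the real schema may differ.
    location_id: str
    package_id: str
    backend_id: str
    content_address: str
    deleted: bool = False


@dataclass(frozen=True)
class RetentionState:
    package_id: str
    eligible_for_deletion: bool
    active_hold_id: Optional[str] = None


@dataclass(frozen=True)
class PlannedRelease:
    location: StorageLocation
    delete_bytes: bool  # True only when this release drops the last reference


def plan_gc(states, locations):
    """Return the releases that one GC pass would perform, in input order."""
    # Policy gate: eligible for deletion and not under an active hold.
    eligible = {
        s.package_id
        for s in states
        if s.eligible_for_deletion and s.active_hold_id is None
    }
    live = [loc for loc in locations if not loc.deleted]
    # Reference counts keyed by (backend_id, content_address).
    remaining = Counter((loc.backend_id, loc.content_address) for loc in live)

    plan = []
    for loc in live:
        if loc.package_id not in eligible:
            continue
        key = (loc.backend_id, loc.content_address)
        remaining[key] -= 1
        # Bytes go only when no non-deleted location still points at them.
        plan.append(PlannedRelease(loc, delete_bytes=remaining[key] == 0))
    return plan
```

Decrementing a running count, rather than comparing totals up front, means that when several eligible packages share one object only the last planned release is flagged for physical deletion.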
## D6.2 - Byte Deletion And Audit Events
```task
id: ARTIFACT-STORE-WP-0006-T002
status: done
priority: high
state_hub_task_id: "8f512753-c402-480a-8517-990fccf09295"
```
Acceptance:
- When the eligible package set holds every remaining reference to a
`(backend_id, content_address)` pair, GC calls `DataPlane.delete_object`
for that object.
- GC emits replayable audit events for every released storage location,
including whether the physical object was deleted or retained due to
remaining references.
- Replay marks released storage locations as `deleted` and packages as
`garbage_collected` once every storage location for that package is
deleted.
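A rough sketch of how execution, audit events, and replay might fit together is shown below. Everything except `DataPlane.delete_object`, `deleted`, and `garbage_collected` is assumed for illustration: the `LocationReleased` event, its field names, the `(backend_id, content_address)` argument list for `delete_object`, and the `"active"` status used in the usage example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocationReleased:
    # Hypothetical audit event; field names are illustrative only.
    location_id: str
    package_id: str
    backend_id: str
    content_address: str
    physical_object_deleted: bool


class RecordingDataPlane:
    """Stand-in data plane; the real delete_object signature may differ."""

    def __init__(self):
        self.deleted = []

    def delete_object(self, backend_id, content_address):
        self.deleted.append((backend_id, content_address))


def execute_releases(releases, data_plane):
    """Apply planned releases and return one audit event per location.

    Each release is a tuple:
    (location_id, package_id, backend_id, content_address, is_final_reference).
    """
    events = []
    for location_id, package_id, backend_id, address, is_final in releases:
        if is_final:
            # Physical deletion goes through the data plane only.
            data_plane.delete_object(backend_id, address)
        events.append(
            LocationReleased(location_id, package_id, backend_id, address, is_final)
        )
    return events


def replay(location_status, package_locations, events):
    """Rebuild location and package state from audit events alone."""
    locations = dict(location_status)
    for event in events:
        locations[event.location_id] = "deleted"
    packages = {
        package_id: "garbage_collected"
        for package_id, location_ids in package_locations.items()
        if location_ids and all(locations[i] == "deleted" for i in location_ids)
    }
    return locations, packages
```

Note that `replay` never touches the data plane: it derives state from the events, which is what lets the materialised view be rebuilt without repeating any physical delete.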
## D6.3 - Operator Command And Docs
```task
id: ARTIFACT-STORE-WP-0006-T003
status: done
priority: medium
state_hub_task_id: "a36dce56-f87b-431a-b875-fc567593ddd3"
```
Acceptance:
- `artifactstore retention gc` runs one GC pass and prints a JSON
summary.
- `docs/OPERATOR.md` documents the safe sequence:
`artifactstore retention sweep` then `artifactstore retention gc`.
- The command is idempotent: running it again after a clean pass does
not delete or rewrite anything.
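The idempotency requirement can be illustrated with a small in-memory pass. This is a sketch under stated assumptions: the `gc_pass` function, the dictionary layout, the `"active"` status, and the summary keys are all hypothetical, and the workplan does not specify the shape of the real JSON summary.

```python
import json
from collections import Counter


def gc_pass(locations, eligible_packages):
    """Run one pass over in-memory location dicts and return a summary dict.

    Locations are mutated in place; the summary keys are illustrative only.
    """
    live = [loc for loc in locations if loc["status"] != "deleted"]
    remaining = Counter((loc["backend_id"], loc["content_address"]) for loc in live)
    summary = {"locations_released": 0, "objects_deleted": 0, "objects_retained": 0}
    for loc in live:
        if loc["package_id"] not in eligible_packages:
            continue
        key = (loc["backend_id"], loc["content_address"])
        remaining[key] -= 1
        loc["status"] = "deleted"
        summary["locations_released"] += 1
        if remaining[key] == 0:
            summary["objects_deleted"] += 1
        else:
            summary["objects_retained"] += 1
    return summary


def render_summary(summary):
    """Serialise the summary the way a CLI might print it."""
    return json.dumps(summary, sort_keys=True)
```

Because already-deleted locations are filtered out before counting, a second call with the same inputs finds nothing to release and reports zeros throughout.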
## D6.4 - Verification Tests
```task
id: ARTIFACT-STORE-WP-0006-T004
status: done
priority: high
state_hub_task_id: "b2a2d94f-bc5a-47ca-b540-920d94bff06e"
```
Acceptance:
- Tests cover unique-object deletion, shared-object reference retention,
hold-protected packages, idempotent reruns, replay, and CLI output.
- Full `pytest`, `ruff`, and `mypy` pass.
## Verification
- Focused tests: `tests/integration/test_garbage_collection.py` and
`tests/integration/test_cli_commands.py` passed.
- `ruff check .` passed.
- `mypy src tests` passed.
## Success criteria
- Expired, unheld packages can be reclaimed without losing bytes still
referenced by retained packages.
- The event log explains every logical release and physical delete.
- A replayed database reconstructs the same `deleted` storage-location
state and `garbage_collected` package status.