---
id: PMEM-WP-0006
type: workplan
title: "Retrieval, Activation Quality, And Evaluation"
domain: markitect
repo: phase-memory
status: finished
owner: phase-memory
topic_slug: activation-quality
planning_priority: P2
planning_order: 60
related_workplans:
- PMEM-WP-0002
- PMEM-WP-0005
created: "2026-05-18"
updated: "2026-05-18"
state_hub_workstream_id: "68417e46-d0a4-4168-bdd6-9ba6ac7847c1"
---
# PMEM-WP-0006: Retrieval, Activation Quality, And Evaluation
## Goal
Move activation planning beyond deterministic id ordering into explainable,
policy-aware memory retrieval and evaluation.

INTENT.md expects activation memory to compile graph neighborhoods, decision
paths, conversation episodes, and knowledge slices into bounded context
packages. The current activation planner selects nodes under item and token
budgets, but it does not yet model graph neighborhoods, event paths, freshness,
confidence, or retrieval quality.
## Current Evidence
Current activation behavior:
- prioritizes explicitly requested node ids
- sorts remaining nodes by id
- estimates tokens with word count
- includes events only when they reference packages or activations
- emits Markitect-compatible selection dictionaries
- explains budget omissions

This is enough for a first handoff, but not enough to satisfy the full
activation-memory intent.
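
The baseline behavior listed above can be sketched roughly as follows. This is a minimal illustration, not the planner's actual code: the function name `baseline_activation`, the `nodes` mapping of id to text, the reason strings, and the choice to skip an over-budget node and keep scanning are all assumptions made for the example.

```python
def baseline_activation(nodes, requested_ids, max_items, max_tokens):
    """Select node ids under item and token budgets (hypothetical sketch).

    nodes: dict mapping node id -> node text.
    Explicitly requested ids come first, the rest are sorted by id.
    """
    # Keep request order, drop duplicates and unknown ids.
    requested = [n for n in dict.fromkeys(requested_ids) if n in nodes]
    rest = sorted(n for n in nodes if n not in requested)

    selected, omitted, used = [], [], 0
    for node_id in requested + rest:
        cost = len(nodes[node_id].split())  # word-count token estimate
        if len(selected) >= max_items:
            omitted.append({"id": node_id, "reason": "item_budget"})
        elif used + cost > max_tokens:
            # Assumption: skip this node but keep trying smaller ones.
            omitted.append({"id": node_id, "reason": "token_budget"})
        else:
            selected.append(node_id)
            used += cost
    return selected, omitted
```

The ordering depends only on the request list and the ids, which is what makes the plan deterministic and also why it cannot reflect graph structure or freshness.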
## Non-Goals
- Do not require embeddings or vector stores in the default path.
- Do not use live LLM ranking in deterministic tests.
- Do not own benchmark dashboards that belong in `infospace-bench`.
- Do not optimize for a single application domain.
## Implementation Update - 2026-05-18
The retrieval, activation quality, and evaluation slice is complete.

Implemented outputs:
- `phase_memory.retrieval` adds deterministic graph-neighborhood retrieval,
candidate scoring, event-path selection, a pluggable token estimator protocol,
neighborhood activation planning, and activation quality reports.
- Retrieval supports max hops, edge-kind filters, direction filters, phase
filters, and memory kind filters.
- Event-path activation selects bounded event windows from structured
`MemoryPath` records and treats inactive paths as opt-in.
- Ranking signals include explicit priority, graph distance, phase, lifecycle
state, confidence, source-backed status, freshness, and policy allowance.
- `WordCountTokenEstimator` provides deterministic local budget accounting.
- `activation_quality_report` emits selected expected nodes, omitted required
nodes, policy-denied required nodes, token budget utilization, stale item
count, provenance coverage, source span coverage, and explanation coverage.
- `docs/activation-quality.md` documents retrieval, event paths, scoring,
estimator boundaries, and evaluation metrics.

Validation:
- `python3 -m pytest` -> 46 passed.
## T01 - Add deterministic graph-neighborhood retrieval
```task
id: PMEM-WP-0006-T01
status: done
priority: high
state_hub_task_id: "8ed0909f-9e8e-4d49-9312-dca267df29f5"
```
Add selection strategies that can expand from seed nodes across graph edges:
- one-hop neighbors
- bounded multi-hop neighbors
- edge-kind filters
- source and target direction filters
- phase filters
- memory kind filters

Output: retrieval planner and tests for stable graph-neighborhood selection.
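
A bounded neighborhood expansion of this kind is commonly written as a breadth-first search with a hop limit. The sketch below assumes a hypothetical `Edge` record and a `neighborhood` function; neither is taken from `phase_memory.retrieval`, whose real signatures are not shown in this workplan. Phase and memory kind filters are left out and could be applied as a predicate on the returned node ids.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """Hypothetical edge record used only for this sketch."""
    source: str
    target: str
    kind: str


def neighborhood(seeds, edges, max_hops=1, edge_kinds=None, direction="both"):
    """Return {node_id: hop distance} reachable from seeds within max_hops.

    direction: "out" follows source -> target, "in" follows target -> source,
    "both" follows either.
    """
    distances = {seed: 0 for seed in seeds}
    queue = deque(sorted(seeds))  # sorted for a stable visiting order
    while queue:
        node = queue.popleft()
        hop = distances[node]
        if hop >= max_hops:
            continue
        neighbors = set()
        for edge in edges:
            if edge_kinds is not None and edge.kind not in edge_kinds:
                continue
            if direction in ("out", "both") and edge.source == node:
                neighbors.add(edge.target)
            if direction in ("in", "both") and edge.target == node:
                neighbors.add(edge.source)
        for neighbor in sorted(neighbors):
            if neighbor not in distances:
                distances[neighbor] = hop + 1
                queue.append(neighbor)
    return distances
```

Sorting seeds and neighbors before queueing keeps the traversal order independent of input ordering, which is one way to get the stable selection the tests call for.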
## T02 - Add event-path activation
```task
id: PMEM-WP-0006-T02
status: done
priority: high
state_hub_task_id: "5d48ba91-fef0-4d4f-a560-836abed1c527"
```
Select conversational path episodes and event windows as first-class
activation inputs.

Output: event-path selection support, path budget behavior, and tests around
active, abandoned, and merged paths.
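
One way to picture a bounded event window with opt-in inactive paths is sketched below. `PathSketch` is a hypothetical stand-in for the structured path records, and the status strings, the `include_inactive` flag, and the "most recent events first" policy are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathSketch:
    """Hypothetical path record: events are ordered oldest first."""
    path_id: str
    status: str  # assumed values: "active", "abandoned", "merged"
    event_ids: tuple


def select_event_window(paths, max_events, include_inactive=False):
    """Return (path_id, event_id) pairs, at most max_events in total."""
    selected = []
    for path in sorted(paths, key=lambda p: p.path_id):
        if path.status != "active" and not include_inactive:
            continue  # abandoned and merged paths are opt-in
        remaining = max_events - len(selected)
        if remaining <= 0:
            break
        # Keep the newest events of this path that still fit the budget.
        window = path.event_ids[-remaining:]
        selected.extend((path.path_id, event_id) for event_id in window)
    return selected
```

With this shape, a caller that wants abandoned or merged history has to ask for it explicitly, and the event budget is shared across all paths.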
## T03 - Add ranking signals and explanations
```task
id: PMEM-WP-0006-T03
status: done
priority: high
state_hub_task_id: "0f6340ef-f7bd-408b-b98e-6d90188c5969"
```
Rank activation candidates using deterministic local signals:
- explicit priority
- graph distance from seed
- lifecycle state
- phase
- freshness
- confidence
- policy allowance
- source-backed status
- prior activation references

Output: scoring model, per-item selection reason, and omitted-item reason.
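
A deterministic scoring pass with per-item reasons might look like the sketch below. The `Candidate` fields cover only a subset of the signals listed above, and the weights, reason strings, and function names are invented for illustration; they do not describe the scoring model that was actually implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate carrying a few of the ranking signals."""
    node_id: str
    explicit: bool = False
    distance: int = 0
    confidence: float = 0.0
    source_backed: bool = False
    stale: bool = False
    policy_allowed: bool = True


def score(candidate):
    """Return (score, reasons) using made-up illustrative weights."""
    total, reasons = 0.0, []
    if candidate.explicit:
        total += 100.0
        reasons.append("explicit_priority")
    total -= 10.0 * candidate.distance
    reasons.append(f"graph_distance={candidate.distance}")
    total += 5.0 * candidate.confidence
    if candidate.source_backed:
        total += 3.0
        reasons.append("source_backed")
    if candidate.stale:
        total -= 4.0
        reasons.append("stale")
    return total, reasons


def plan(candidates, max_items):
    """Rank allowed candidates and record a reason for every item."""
    selected, omitted, allowed = [], [], []
    for c in candidates:
        if c.policy_allowed:
            allowed.append(c)
        else:
            omitted.append({"id": c.node_id, "reason": "policy_denied"})
    # Ties are broken by node id so the ranking stays deterministic.
    ranked = sorted(allowed, key=lambda c: (-score(c)[0], c.node_id))
    for index, c in enumerate(ranked):
        if index < max_items:
            selected.append({"id": c.node_id, "reasons": score(c)[1]})
        else:
            omitted.append({"id": c.node_id, "reason": "item_budget_exceeded"})
    return selected, omitted
```

Handling policy denial as a hard filter rather than a negative weight keeps a denied item from ever being selected, regardless of how the other signals are tuned.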
## T04 - Improve token and budget accounting
```task
id: PMEM-WP-0006-T04
status: done
priority: medium
state_hub_task_id: "12d83382-a767-45a8-b7cc-8c3f6f3f4c37"
```
Replace rough word-count behavior with a pluggable budget estimator that can
stay deterministic locally and later delegate to provider-specific tokenizers.

Output: estimator protocol, default estimator, and tests for node, event, and
package budget pressure.
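
A pluggable estimator can be expressed with `typing.Protocol`, so any object with a matching method can be swapped in. The names `TokenEstimator`, `WordCountEstimatorSketch`, `estimate`, and `fit_to_budget` below are assumptions for this sketch; the real `WordCountTokenEstimator` may expose a different interface.

```python
from typing import Protocol


class TokenEstimator(Protocol):
    """Hypothetical protocol: anything with estimate(text) -> int fits."""

    def estimate(self, text: str) -> int: ...


class WordCountEstimatorSketch:
    """Deterministic local estimator: one token per whitespace-split word."""

    def estimate(self, text: str) -> int:
        return len(text.split())


def fit_to_budget(items, estimator: TokenEstimator, max_tokens: int):
    """Greedy, order-preserving selection of (item_id, text) pairs.

    Returns (selected ids, omitted ids, tokens used).
    """
    selected, omitted, used = [], [], 0
    for item_id, text in items:
        cost = estimator.estimate(text)
        if used + cost <= max_tokens:
            selected.append(item_id)
            used += cost
        else:
            omitted.append(item_id)
    return selected, omitted, used
```

Because the planner only depends on the protocol, a provider-specific tokenizer could later be wrapped in a class with the same `estimate` method without touching the budget logic.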
## T05 - Add evaluation fixture scenarios
```task
id: PMEM-WP-0006-T05
status: done
priority: medium
state_hub_task_id: "509e9417-3aa7-4899-aed5-20749372fe00"
```
Add small evaluation fixtures inspired by adjacent `infospace-bench` pilots:
- restart package
- decision recall
- stale source refresh
- policy-denied activation
- compacted trace window
- conflicting preference update

Output: fixture set and expected activation plans.
## T06 - Add maturity metrics for activation quality
```task
id: PMEM-WP-0006-T06
status: done
priority: medium
state_hub_task_id: "477a896a-8013-42a5-b965-b1ccd2577fec"
```
Define local metrics that can be exported to `infospace-bench` later:
- selected expected nodes
- omitted required nodes
- policy-denied required nodes
- token budget utilization
- stale item activation count
- provenance coverage
- source span coverage
- explanation coverage

Output: metrics helper and JSON report fixture.
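
Most of these metrics reduce to set arithmetic over the selected, expected, and required ids. The helper below is a hedged sketch: its name, its input shape (selected items as dicts with `id`, `reason`, `stale`, and `provenance` keys), and the choice to count policy-denied nodes separately from omitted ones are assumptions, and source span coverage is left out. It is not the signature of `activation_quality_report`.

```python
import json


def quality_report_sketch(selected, expected, required, policy_denied,
                          tokens_used, token_budget):
    """Build a JSON-serializable report from a hypothetical plan shape."""
    selected_ids = {item["id"] for item in selected}
    denied = set(policy_denied)
    count = len(selected)

    def coverage(key):
        # Fraction of selected items with a truthy value for `key`.
        return sum(1 for item in selected if item.get(key)) / count if count else 0.0

    return {
        "selected_expected_nodes": sorted(selected_ids & set(expected)),
        "omitted_required_nodes": sorted(set(required) - selected_ids - denied),
        "policy_denied_required_nodes": sorted(set(required) & denied),
        "token_budget_utilization": tokens_used / token_budget if token_budget else 0.0,
        "stale_item_count": sum(1 for item in selected if item.get("stale")),
        "provenance_coverage": coverage("provenance"),
        "explanation_coverage": coverage("reason"),
    }


example_report = quality_report_sketch(
    selected=[
        {"id": "n1", "reason": "explicit_priority", "stale": False, "provenance": True},
        {"id": "n2", "reason": "", "stale": True, "provenance": False},
    ],
    expected=["n1", "n3"],
    required=["n1", "n3", "n4"],
    policy_denied=["n4"],
    tokens_used=30,
    token_budget=40,
)
# sort_keys keeps the serialized fixture byte-stable across runs.
print(json.dumps(example_report, indent=2, sort_keys=True))
```

Sorting every id list and serializing with `sort_keys=True` is one simple way to keep a JSON report fixture deterministic.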
## T07 - Document retrieval and evaluation behavior
```task
id: PMEM-WP-0006-T07
status: done
priority: medium
state_hub_task_id: "551432e4-2551-49fa-b17b-f762853a6a50"
```
Document activation strategy, scoring inputs, limitations, and evaluation
fixtures.

Output: activation planning guide and scorecard update.
## Acceptance Criteria
- `python3 -m pytest` passes.
- Activation can select graph neighborhoods and event paths under budget.
- Every selected and omitted item has a machine-readable reason.
- Evaluation fixtures produce deterministic activation quality reports.
- Optional semantic indexes remain behind the `SemanticIndex` port.
## Closure Review - 2026-05-18
**Outcome:** All tasks completed.
### Completed
- PMEM-WP-0006-T01 - Add deterministic graph-neighborhood retrieval
- PMEM-WP-0006-T02 - Add event-path activation
- PMEM-WP-0006-T03 - Add ranking signals and explanations
- PMEM-WP-0006-T04 - Improve token and budget accounting
- PMEM-WP-0006-T05 - Add evaluation fixture scenarios
- PMEM-WP-0006-T06 - Add maturity metrics for activation quality
- PMEM-WP-0006-T07 - Document retrieval and evaluation behavior
### Cancelled
None.
### Carried Forward
Service contracts, runtime configuration, health diagnostics, and external
adapter conformance remain in PMEM-WP-0007.