---
id: PMEM-WP-0006
type: workplan
title: "Retrieval, Activation Quality, And Evaluation"
domain: markitect
repo: phase-memory
status: finished
owner: phase-memory
topic_slug: activation-quality
planning_priority: P2
planning_order: 60
related_workplans:
  - PMEM-WP-0002
  - PMEM-WP-0005
created: "2026-05-18"
updated: "2026-05-18"
state_hub_workstream_id: "68417e46-d0a4-4168-bdd6-9ba6ac7847c1"
---

# PMEM-WP-0006: Retrieval, Activation Quality, And Evaluation

## Goal

Move activation planning beyond deterministic id ordering into explainable,
policy-aware memory retrieval and evaluation.

INTENT.md expects activation memory to compile graph neighborhoods, decision
paths, conversation episodes, and knowledge slices into bounded context
packages. The current activation planner selects nodes under item and token
budgets, but it does not yet model graph neighborhoods, event paths, freshness,
confidence, or retrieval quality.

## Current Evidence

Current activation behavior:

- prioritizes explicitly requested node ids
- sorts remaining nodes by id
- estimates tokens with word count
- includes events only when they reference packages or activations
- emits Markitect-compatible selection dictionaries
- explains budget omissions

This is enough for a first handoff, but not enough to satisfy the full
activation-memory intent.

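The baseline behavior listed above can be sketched in a few lines. This is a minimal illustration, not the planner in this repo: `Node`, `plan_baseline`, the reason strings, and the returned dictionary shape are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical memory node; the real model likely carries more fields."""
    id: str
    text: str


def plan_baseline(nodes, requested_ids, max_items, max_tokens):
    """Select nodes: requested ids first, then the remainder in id order."""
    by_id = {n.id: n for n in nodes}
    # dict.fromkeys drops duplicate requests while keeping their order.
    requested = [by_id[i] for i in dict.fromkeys(requested_ids) if i in by_id]
    requested_set = {n.id for n in requested}
    rest = sorted((n for n in nodes if n.id not in requested_set), key=lambda n: n.id)

    selected, omitted, used = [], {}, 0
    for node in requested + rest:
        cost = len(node.text.split())  # word-count token estimate
        if len(selected) >= max_items:
            omitted[node.id] = "item_budget_exceeded"
        elif used + cost > max_tokens:
            omitted[node.id] = "token_budget_exceeded"
        else:
            selected.append(node.id)
            used += cost
    return {"selected": selected, "omitted": omitted, "tokens_used": used}


nodes = [Node("n3", "gamma delta"), Node("n1", "alpha"), Node("n2", "beta beta beta")]
plan = plan_baseline(nodes, requested_ids=["n3"], max_items=2, max_tokens=3)
```

Here `n3` is taken first because it was requested, `n1` follows in id order, and `n2` is omitted with an item-budget reason. Nothing in this ordering looks at graph structure, freshness, or confidence, which is the gap this workplan addresses.
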
## Non-Goals

- Do not require embeddings or vector stores in the default path.
- Do not use live LLM ranking in deterministic tests.
- Do not own benchmark dashboards that belong in `infospace-bench`.
- Do not optimize for a single application domain.

## Implementation Update - 2026-05-18

The retrieval, activation quality, and evaluation slice is complete.

Implemented outputs:

- `phase_memory.retrieval` adds deterministic graph-neighborhood retrieval,
  candidate scoring, event-path selection, pluggable token estimator protocol,
  neighborhood activation planning, and activation quality reports.
- Retrieval supports max hops, edge-kind filters, direction filters, phase
  filters, and memory kind filters.
- Event-path activation selects bounded event windows from structured
  `MemoryPath` records and treats inactive paths as opt-in.
- Ranking signals include explicit priority, graph distance, phase, lifecycle
  state, confidence, source-backed status, freshness, and policy allowance.
- `WordCountTokenEstimator` provides deterministic local budget accounting.
- `activation_quality_report` emits selected expected nodes, omitted required
  nodes, policy-denied required nodes, token budget utilization, stale item
  count, provenance coverage, source span coverage, and explanation coverage.
- `docs/activation-quality.md` documents retrieval, event paths, scoring,
  estimator boundaries, and evaluation metrics.

Validation:

- `python3 -m pytest` -> 46 passed.

## T01 - Add deterministic graph-neighborhood retrieval

```task
id: PMEM-WP-0006-T01
status: done
priority: high
state_hub_task_id: "8ed0909f-9e8e-4d49-9312-dca267df29f5"
```

Add selection strategies that can expand from seed nodes across graph edges:

- one-hop neighbors
- bounded multi-hop neighbors
- edge-kind filters
- source and target direction filters
- phase filters
- memory kind filters

Output: retrieval planner and tests for stable graph-neighborhood selection.

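One way to get stable neighborhood selection is a breadth-first expansion that iterates neighbors in sorted order. The sketch below assumes a simple directed edge record; `Edge`, `neighborhood`, and the `"out"`/`"in"`/`"both"` direction values are hypothetical and are not taken from `phase_memory.retrieval`. Phase and memory kind filters are left out; they could be applied as node predicates in the same loop.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """Hypothetical directed edge between memory nodes."""
    source: str
    target: str
    kind: str


def neighborhood(edges, seeds, max_hops=1, edge_kinds=None, direction="both"):
    """Return {node_id: hop_distance} for nodes within max_hops of the seeds.

    direction: "out" follows source->target, "in" follows target->source,
    "both" follows either.
    """
    adjacency = {}
    for e in edges:
        if edge_kinds is not None and e.kind not in edge_kinds:
            continue
        if direction in ("out", "both"):
            adjacency.setdefault(e.source, set()).add(e.target)
        if direction in ("in", "both"):
            adjacency.setdefault(e.target, set()).add(e.source)

    distance = {seed: 0 for seed in sorted(seeds)}
    queue = deque(sorted(seeds))
    while queue:
        current = queue.popleft()
        if distance[current] >= max_hops:
            continue
        # Sorted iteration keeps the traversal order stable across runs.
        for neighbor in sorted(adjacency.get(current, ())):
            if neighbor not in distance:
                distance[neighbor] = distance[current] + 1
                queue.append(neighbor)
    return distance


edges = [
    Edge("a", "b", "supports"),
    Edge("b", "c", "supports"),
    Edge("d", "a", "contradicts"),
]
one_hop = neighborhood(edges, {"a"}, max_hops=1)
two_hop_out = neighborhood(
    edges, {"a"}, max_hops=2, edge_kinds={"supports"}, direction="out"
)
```

`one_hop` reaches `b` and `d` at distance 1, while `two_hop_out` follows only outgoing `supports` edges and reaches `c` at distance 2. The recorded hop distance is also a natural input to ranking.
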
## T02 - Add event-path activation

```task
id: PMEM-WP-0006-T02
status: done
priority: high
state_hub_task_id: "5d48ba91-fef0-4d4f-a560-836abed1c527"
```

Select conversational path episodes and event windows as first-class
activation inputs.

Output: event-path selection support, path budget behavior, and tests around
active, abandoned, and merged paths.

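A bounded event window with opt-in inactive paths might look like the following. The field layout of the real `MemoryPath` record is not described here, so the example uses a hypothetical `PathSketch` stand-in and assumes that any status other than `"active"` counts as inactive and that events are stored oldest first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathSketch:
    """Hypothetical stand-in for a structured path record."""
    id: str
    status: str  # assumed values: "active", "abandoned", "merged"
    event_ids: tuple


def select_event_window(paths, max_events, include_inactive=False):
    """Take the most recent events per path until max_events is used up."""
    selected, skipped = [], {}
    remaining = max_events
    for path in sorted(paths, key=lambda p: p.id):
        if path.status != "active" and not include_inactive:
            skipped[path.id] = "inactive_path_not_requested"
            continue
        if remaining <= 0:
            skipped[path.id] = "event_budget_exceeded"
            continue
        window = path.event_ids[-remaining:]  # newest events, oldest-first storage
        selected.extend(window)
        remaining -= len(window)
    return selected, skipped


paths = [
    PathSketch("p1", "active", ("e1", "e2", "e3")),
    PathSketch("p2", "abandoned", ("e4",)),
    PathSketch("p3", "active", ("e5", "e6")),
]
events, skipped = select_event_window(paths, max_events=4)
```

With a budget of four events, `p1` contributes all three of its events, `p3` contributes only its newest one, and the abandoned `p2` is skipped with a reason unless `include_inactive=True` is passed.
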
## T03 - Add ranking signals and explanations

```task
id: PMEM-WP-0006-T03
status: done
priority: high
state_hub_task_id: "0f6340ef-f7bd-408b-b98e-6d90188c5969"
```

Rank activation candidates using deterministic local signals:

- explicit priority
- graph distance from seed
- lifecycle state
- phase
- freshness
- confidence
- policy allowance
- source-backed status
- prior activation references

Output: scoring model, per-item selection reason, and omitted-item reason.

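As an illustration of deterministic ranking with reasons, the sketch below orders candidates by a lexicographic key over a subset of the signals above. It is an assumption-laden stand-in for the actual scoring model: `Candidate`, `rank`, the signal precedence, and the reason strings are hypothetical, and a weighted score would be an equally valid design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical activation candidate carrying a few of the listed signals."""
    id: str
    explicit: bool = False
    distance: int = 0
    confidence: float = 0.5
    stale: bool = False
    policy_allowed: bool = True


def rank(candidates, max_items):
    """Return (selected, omitted), each a list of (id, reason) pairs."""
    allowed = [c for c in candidates if c.policy_allowed]
    omitted = [(c.id, "policy_denied") for c in candidates if not c.policy_allowed]

    def key(c):
        # Smaller tuples sort first. The id is the last tie-breaker, so the
        # result never depends on the order of the input list.
        return (not c.explicit, c.distance, c.stale, -c.confidence, c.id)

    ordered = sorted(allowed, key=key)
    selected = []
    for c in ordered[:max_items]:
        reason = "explicit_request" if c.explicit else f"graph_distance_{c.distance}"
        selected.append((c.id, reason))
    omitted += [(c.id, "item_budget_exceeded") for c in ordered[max_items:]]
    return selected, sorted(omitted)


candidates = [
    Candidate("n4", distance=1, confidence=0.9),
    Candidate("n2", distance=1, confidence=0.4),
    Candidate("n1", explicit=True, distance=2),
    Candidate("n3", distance=1, policy_allowed=False),
    Candidate("n5", distance=1, confidence=0.9, stale=True),
]
selected, omitted = rank(candidates, max_items=2)
```

Every candidate ends up in exactly one of the two lists with a machine-readable reason, which is the property the acceptance criteria ask for.
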
## T04 - Improve token and budget accounting

```task
id: PMEM-WP-0006-T04
status: done
priority: medium
state_hub_task_id: "12d83382-a767-45a8-b7cc-8c3f6f3f4c37"
```

Replace rough word-count behavior with a pluggable budget estimator that can
stay deterministic locally and later delegate to provider-specific tokenizers.

Output: estimator protocol, default estimator, and tests for node, event, and
package budget pressure.

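A pluggable estimator can be expressed with `typing.Protocol`, so any object with a matching method can be swapped in. The names below (`TokenEstimator`, `estimate`, `WordCountSketch`, `CharRatioSketch`, `fits_budget`) are hypothetical and do not claim to match the signature of the repo's `WordCountTokenEstimator`; the four-characters-per-token ratio is only an illustrative heuristic.

```python
from typing import Protocol


class TokenEstimator(Protocol):
    """Hypothetical estimator port; the method name is an assumption."""

    def estimate(self, text: str) -> int: ...


class WordCountSketch:
    """Counts whitespace-separated words; deterministic and dependency-free."""

    def estimate(self, text: str) -> int:
        return len(text.split())


class CharRatioSketch:
    """Illustrative alternative: about four characters per token, rounded up."""

    def estimate(self, text: str) -> int:
        return -(-len(text) // 4)  # ceiling division without floats


def fits_budget(texts, budget: int, estimator: TokenEstimator) -> bool:
    """True when the summed estimate of all texts stays within the budget."""
    return sum(estimator.estimate(t) for t in texts) <= budget


texts = ["decision recorded on restart", "policy denied"]
by_words = fits_budget(texts, budget=6, estimator=WordCountSketch())
by_chars = fits_budget(texts, budget=6, estimator=CharRatioSketch())
```

The same texts fit a budget of six under the word-count estimator but not under the character-ratio one, which shows why budget tests should pin the estimator they run against.
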
## T05 - Add evaluation fixture scenarios

```task
id: PMEM-WP-0006-T05
status: done
priority: medium
state_hub_task_id: "509e9417-3aa7-4899-aed5-20749372fe00"
```

Add small evaluation fixtures inspired by adjacent `infospace-bench` pilots:

- restart package
- decision recall
- stale source refresh
- policy-denied activation
- compacted trace window
- conflicting preference update

Output: fixture set and expected activation plans.

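A fixture pairs scenario inputs with the plan they are expected to produce. The sketch below shows one possible shape for the policy-denied scenario together with a small comparison helper; the fixture keys, the reason string, and `check_fixture` are assumptions made for illustration, not the repo's fixture format.

```python
import json

# Hypothetical fixture layout: scenario name, seed ids, and the expected plan.
FIXTURE = {
    "name": "policy-denied-activation",
    "seeds": ["n1"],
    "expected": {"selected": ["n1"], "omitted": {"n2": "policy_denied"}},
}


def check_fixture(fixture, plan):
    """Return a list of mismatch descriptions; an empty list means a match."""
    problems = []
    for key, want in fixture["expected"].items():
        got = plan.get(key)
        if got != want:
            problems.append(
                f"{key}: expected {json.dumps(want, sort_keys=True)}, "
                f"got {json.dumps(got, sort_keys=True)}"
            )
    return problems


matching = check_fixture(
    FIXTURE, {"selected": ["n1"], "omitted": {"n2": "policy_denied"}}
)
mismatched = check_fixture(FIXTURE, {"selected": ["n1", "n2"], "omitted": {}})
```

Serializing both sides with `sort_keys=True` keeps failure messages stable between runs, which helps when the fixtures are compared in deterministic tests.
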
## T06 - Add maturity metrics for activation quality

```task
id: PMEM-WP-0006-T06
status: done
priority: medium
state_hub_task_id: "477a896a-8013-42a5-b965-b1ccd2577fec"
```

Define local metrics that can be exported to `infospace-bench` later:

- selected expected nodes
- omitted required nodes
- policy-denied required nodes
- token budget utilization
- stale item activation count
- provenance coverage
- source span coverage
- explanation coverage

Output: metrics helper and JSON report fixture.

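Most of these metrics reduce to set arithmetic and simple ratios over a plan. The helper below is a sketch under stated assumptions: `quality_report_sketch` and the plan dictionary shape are hypothetical, source span coverage is omitted, and it assumes a required node that was denied by policy is reported as policy-denied rather than as omitted. It is not the repo's `activation_quality_report`.

```python
import json


def quality_report_sketch(plan, expected, required, token_budget):
    """Compute several of the listed metrics from a plan dictionary.

    Assumed plan shape: {"selected": [item, ...], "denied": [id, ...]}, where
    each item has id, tokens, stale, source, and reason fields.
    """
    items = plan["selected"]
    selected_ids = {item["id"] for item in items}
    denied = set(plan["denied"])
    count = len(items)
    used = sum(item["tokens"] for item in items)

    def coverage(field):
        # Share of selected items with a non-empty value for the field.
        return sum(1 for item in items if item.get(field)) / count if count else 0.0

    return {
        "selected_expected_nodes": sorted(selected_ids & set(expected)),
        "omitted_required_nodes": sorted(set(required) - selected_ids - denied),
        "policy_denied_required_nodes": sorted(set(required) & denied),
        "token_budget_utilization": used / token_budget if token_budget else 0.0,
        "stale_item_count": sum(1 for item in items if item["stale"]),
        "provenance_coverage": coverage("source"),
        "explanation_coverage": coverage("reason"),
    }


plan = {
    "selected": [
        {"id": "n1", "tokens": 30, "stale": False, "source": "doc-a",
         "reason": "explicit_request"},
        {"id": "n2", "tokens": 20, "stale": True, "source": None,
         "reason": "graph_distance_1"},
    ],
    "denied": ["n3"],
}
report = quality_report_sketch(
    plan, expected=["n1", "n4"], required=["n1", "n3", "n5"], token_budget=100
)
report_json = json.dumps(report, indent=2, sort_keys=True)
```

Because the report is a plain dictionary of sorted lists and numbers, `report_json` is byte-for-byte reproducible and can be checked in as a fixture.
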
## T07 - Document retrieval and evaluation behavior

```task
id: PMEM-WP-0006-T07
status: done
priority: medium
state_hub_task_id: "551432e4-2551-49fa-b17b-f762853a6a50"
```

Document activation strategy, scoring inputs, limitations, and evaluation
fixtures.

Output: activation planning guide and scorecard update.

## Acceptance Criteria

- `python3 -m pytest` passes.
- Activation can select graph neighborhoods and event paths under budget.
- Every selected and omitted item has a machine-readable reason.
- Evaluation fixtures produce deterministic activation quality reports.
- Optional semantic indexes remain behind the `SemanticIndex` port.

## Closure Review - 2026-05-18

**Outcome:** All tasks completed.

### Completed

- PMEM-WP-0006-T01 - Add deterministic graph-neighborhood retrieval
- PMEM-WP-0006-T02 - Add event-path activation
- PMEM-WP-0006-T03 - Add ranking signals and explanations
- PMEM-WP-0006-T04 - Improve token and budget accounting
- PMEM-WP-0006-T05 - Add evaluation fixture scenarios
- PMEM-WP-0006-T06 - Add maturity metrics for activation quality
- PMEM-WP-0006-T07 - Document retrieval and evaluation behavior

### Cancelled

None.

### Carried Forward

Service contracts, runtime configuration, health diagnostics, and external
adapter conformance remain in PMEM-WP-0007.