Implement refinement hardening workplan

2026-05-18 23:56:41 +02:00
parent 836acf7e01
commit 0eea94d05e
17 changed files with 1164 additions and 68 deletions


@@ -26,74 +26,83 @@ to 5.
 ## Current Score
-Overall maturity: **3.8 / 5**
+Overall maturity: **4.0 / 5**
 Two sub-scores make the result easier to reason about:
-- Local integration maturity: **4.1 / 5**
-- Operational maturity: **3.2 / 5**
+- Local integration maturity: **4.3 / 5**
+- Operational maturity: **3.5 / 5**
 The repo is strong as a deterministic local library and service-boundary core.
 It is not yet production-operational because the external adapters are fakes,
-durability semantics are basic, service bindings are framework-neutral shapes
-rather than deployable endpoints, and evaluation coverage is still narrow.
+service bindings are framework-neutral shapes rather than deployable endpoints,
+and migration behavior is diagnostic rather than an operator-applied migration
+system.
 ## Dimension Scorecard
 | Dimension | Score | Target | Evidence | Needed Next |
 | --- | ---: | ---: | --- | --- |
 | Intent and boundaries | 4.4 | 5.0 | `INTENT.md`, `SCOPE.md`, `README.md`, architecture docs, adjacent-repo boundary docs | Keep docs current as live adapters and service bindings clarify real ownership. |
-| Package and API foundation | 4.2 | 4.5 | Python package, public exports, runtime facade, CLI, service config, dependency-light tests | Add API stability notes and compatibility checks for public exports. |
+| Package and API foundation | 4.3 | 4.5 | Python package, public exports, runtime facade, CLI, service runner export, service config, dependency-light tests | Add public export compatibility checks and release notes discipline. |
 | Markitect profile contract ingress | 3.7 | 4.5 | Profile loading, diagnostics, runtime envelopes, profile-derived config, local alias normalization | Add richer compatibility fixtures and schema drift diagnostics. |
-| Graph and event ingress | 3.7 | 4.5 | Graph loading, endpoint diagnostics, event model, JSONL log, export, repair checks, fake graph/event adapters | Add broader malformed/large graph fixtures and migration repair coverage. |
+| Graph and event ingress | 3.9 | 4.5 | Graph loading, endpoint diagnostics, event model, JSONL log, export, repair checks, corrupt-record diagnostics, fake graph/event adapters | Add broader malformed/large graph fixtures and operator repair utilities. |
 | Phase domain model | 3.5 | 4.5 | Phases, lifecycle states, actions, paths, retention rules, profile-derived transition rules | Add migration semantics for profile/rule changes over durable stores. |
-| Profile execution planning | 4.0 | 4.5 | Adapter plan, capabilities, policy gates, fallback behavior, config-driven local/external resolution | Add compatibility gates for live adapter packs. |
-| Lifecycle planning and apply | 3.6 | 4.5 | Dry-run lifecycle plans, profile rules, review-gated local apply | Add service `lifecycle.apply` handling, migration semantics, and better apply audit queries. |
-| Activation planning | 3.8 | 4.8 | Budgeted activation, selections, package request, graph neighborhoods, paths, ranking, metrics | Wire semantic-index-assisted retrieval and expand evaluation corpora. |
-| Local persistence | 3.2 | 4.5 | File-backed graph store, JSONL event log, audit sink, export, repair diagnostics | Add atomic writes, schema migration, compaction/retention utilities, and stronger corruption recovery. |
-| Policy, review, and audit | 3.5 | 5.0 | Operation points, review records, audit schema, denials, redaction, fake external policy/audit adapters | Add audit query service, retention policy behavior, and live policy adapter boundary. |
-| Observability and operations | 3.3 | 4.8 | Health report, config diagnostics, adapter status, fake telemetry audit sink | Add metrics/event export, retention diagnostics, and deployable health/readiness binding. |
+| Profile execution planning | 4.2 | 4.5 | Adapter plan, capabilities, policy gates, fallback behavior, config-driven local/external resolution, adapter pack manifests | Add compatibility gates for live adapter packs. |
+| Lifecycle planning and apply | 4.0 | 4.5 | Dry-run lifecycle plans, profile rules, review-gated local apply, service `lifecycle.apply`, apply audit queries | Add operator migration semantics and richer apply rollback/repair drills. |
+| Activation planning | 4.0 | 4.8 | Budgeted activation, selections, package request, graph neighborhoods, paths, ranking, metrics, multi-scenario evaluation fixtures | Wire semantic-index-assisted retrieval into runtime planning. |
+| Local persistence | 3.7 | 4.5 | File-backed graph store, JSONL event log, audit sink, atomic JSON writes, metadata migration diagnostics, export, repair diagnostics | Add executable migrations, compaction/retention utilities, and stronger corruption recovery. |
+| Policy, review, and audit | 3.9 | 5.0 | Operation points, review records, audit schema, queryable audit sinks, denials, redaction, fake external policy/audit adapters | Add live policy adapter boundary and enforceable audit retention policy. |
+| Observability and operations | 3.6 | 4.8 | Health report, config diagnostics, adapter status, fake telemetry audit sink, operational recipe | Add metrics/event export and deployable health/readiness binding. |
 | Markitect interop | 3.7 | 4.5 | Local validation, package request/response envelopes, fake compiler | Add optional live Markitect compiler adapter and contract compatibility suite. |
-| Kontextual/Infospace interop | 3.1 | 4.5 | Delegation envelope, fake runtime registry, activation quality report fixture | Add live/fake delegation scenarios and broader Infospace restart reports. |
-| Testing and evaluation | 3.8 | 4.5 | 60 deterministic tests over runtime, CLI, adapters, policy, activation, lifecycle, service, fakes | Add multi-profile/multi-graph evaluation corpus and regression thresholds. |
-| Service readiness | 3.9 | 4.8 | Service contracts, local runner, health, config, adapter conformance, fake pack | Implement missing service operations and optional framework binding. |
-| Developer experience | 3.8 | 4.5 | README, package map, CLI examples, persistence/policy/interop/service/lifecycle/fake-pack docs | Add troubleshooting, examples, and end-to-end recipes. |
+| Kontextual/Infospace interop | 3.3 | 4.5 | Delegation envelope, fake runtime registry, activation quality report fixture, adapter compatibility manifests | Add live/fake delegation scenarios and broader Infospace restart reports. |
+| Testing and evaluation | 4.1 | 4.5 | 70 deterministic tests over runtime, CLI, adapters, policy, activation, lifecycle, service, fakes, and evaluation scenarios | Add larger regression corpus and threshold trend reports. |
+| Service readiness | 4.2 | 4.8 | Service contracts, full local runner parity, health, config, adapter conformance, fake pack | Add optional framework binding and deployable readiness endpoints. |
+| Developer experience | 4.1 | 4.5 | README, package map, CLI examples, persistence/policy/interop/service/lifecycle/fake-pack docs, operational recipe | Add troubleshooting matrix and embedded-service examples. |
 ## Assessment
-The project has a credible core. The runtime envelopes, policy/review model,
-profile-derived configuration, lifecycle rules, local persistence, fake
-external pack, and conformance helpers form a solid integration boundary.
+The project has crossed the local integration-readiness threshold. The runtime
+envelopes, policy/review model, profile-derived configuration, lifecycle rules,
+local persistence diagnostics, queryable audit path, fake external pack
+manifests, and conformance helpers form a solid integration boundary.
-The biggest optimization opportunity is not another broad feature burst. It is
-closing the gap between declared contracts and runnable operational behavior:
-the service contract advertises operations that the local runner only partly
-handles, persistence needs migration/durability semantics, and evaluation needs
-more than one small fixture family.
+The biggest optimization opportunity is now the next operational layer:
+turning diagnostic-only durability into operator actions, adding optional
+deployable service bindings, and testing live or live-shaped adapters behind
+the same conformance suite as the fake pack.
-## Recommended Refinement Workplan
+## Completed Refinement Workplan
-Create and execute `PMEM-WP-0011`: refinement hardening and operational
-readiness.
+`PMEM-WP-0011` moved the score from 3.8 to 4.0 by adding:
+- full local service runner parity for `SERVICE_OPERATIONS`;
+- service-covered `package.compile`, `lifecycle.apply`, and `audit.query`;
+- queryable audit sinks with retention metadata;
+- local-store atomic JSON writes, migration diagnostics, and corrupt-record
+  repair diagnostics;
+- three evaluation scenario families covering policy denial, lifecycle rules,
+  event-path activation, semantic-index hints, and budget pressure;
+- adapter pack manifests and explicit missing-capability diagnostics;
+- an operational end-to-end recipe.
+## Recommended Next Refinement
+Create and execute `PMEM-WP-0012`: live-adapter and service-binding readiness.
 Highest-value tasks:
-- Bring service runner parity to the published operation catalog:
-  `package.compile`, `lifecycle.apply`, and `audit.query`.
-- Add local-store schema migration and repair hardening, including atomic write
-  behavior and migration diagnostics.
-- Expand evaluation fixtures across multiple profiles, graph shapes, policies,
-  lifecycle rules, and activation budgets.
-- Add live-adapter readiness manifests so fake and future live packs can be
-  tested by the same compatibility suite.
-- Add audit query and retention semantics that make policy/audit behavior
-  inspectable after runtime operations.
-- Improve DX with troubleshooting, end-to-end recipes, and API compatibility
-  notes.
+- Add an optional framework binding around `LocalServiceRunner` with health and
+  readiness endpoints.
+- Add executable local-store migrations, not only diagnostics.
+- Add live-shaped Markitect/Kontextual adapter fixtures behind the manifest and
+  conformance suite.
+- Add audit retention enforcement and telemetry export drills.
+- Grow the evaluation corpus into threshold reports that can catch regressions.
 ## Score Movement Gates
-Move overall score to **4.0** when:
+Achieved overall score **4.0** when:
 - Service runner handles every operation in `SERVICE_OPERATIONS`.
 - Audit query and lifecycle apply are covered through service contracts.


@@ -0,0 +1,136 @@
# Operational Readiness Recipe
Updated: 2026-05-18
This recipe exercises the local operational surface without requiring live
Markitect, Kontextual, or telemetry services. It is the expected smoke path for
embedding `phase-memory` in another local agent runtime.
## Local End-To-End Flow
```python
import json
from pathlib import Path

from phase_memory import LocalServiceRunner

# Fixture paths are relative to the repository root.
fixtures = Path("tests/fixtures")
profile = json.loads((fixtures / "memory-profile.json").read_text(encoding="utf-8"))
graph = json.loads((fixtures / "memory-graph.json").read_text(encoding="utf-8"))

runner = LocalServiceRunner()

# Plan the profile and import the graph.
profile_plan = runner.handle("profile.plan", {"profile": profile, "source_ref": "recipe:profile"})
graph_import = runner.handle("graph.import", {"graph": graph, "source_ref": "recipe:graph"})

# Dry-run lifecycle plan; the new digest for `event.restart` should yield a refresh action.
lifecycle = runner.handle(
    "graph.lifecycle.plan",
    {
        "profile": profile,
        "graph": graph,
        "parameters": {"refresh_digests": {"event.restart": "new-digest"}},
        "source_ref": "recipe:lifecycle",
    },
)

# Budgeted activation plan: at most 3 items and 60 tokens.
activation = runner.handle(
    "graph.activation.plan",
    {
        "graph": graph,
        "budget": {"max_items": 3, "max_tokens": 60},
        "profile_id": profile["id"],
        "source_ref": "recipe:activation",
    },
)

# Compile a package from the activation selection.
package = runner.handle(
    "package.compile",
    {
        "selection": activation["data"]["activation_plan"]["selection"],
        "source_ref": "recipe:package",
    },
)

# Inspect the audit trail for the compile step, then overall health.
audit = runner.handle("audit.query", {"filters": {"operation": "package.compile"}})
health = runner.handle("health.check", {})
```
Expected checks:
- `profile_plan["valid"]`, `graph_import["valid"]`, `activation["valid"]`, and
`package["valid"]` are true.
- `lifecycle["data"]["dry_run_actions"]` contains the planned refresh action.
- `audit["count"]` is at least 1 and `audit["retention"]` declares the active
audit sink retention mode.
- `health["ok"]` is true.
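The expected checks can be folded into one helper so a smoke run fails loudly instead of relying on manual inspection. A minimal sketch, assuming the envelope keys exactly as listed above (`valid`, `data`, `dry_run_actions`, `count`, `retention`, `ok`); `recipe_check_failures` is a hypothetical helper, not part of `phase_memory`:

```python
from typing import Any, Mapping


def recipe_check_failures(envelopes: Mapping[str, Mapping[str, Any]]) -> list[str]:
    """Return readable failures for the recipe's expected checks.

    Hypothetical helper: the envelope keys mirror the checks listed in the recipe.
    """
    failures: list[str] = []
    # Planning, import, and compile envelopes must report valid=True.
    for name in ("profile_plan", "graph_import", "activation", "package"):
        if envelopes[name].get("valid") is not True:
            failures.append(f"{name} is not valid")
    # The lifecycle dry run must contain at least one planned action.
    actions = envelopes["lifecycle"].get("data", {}).get("dry_run_actions", [])
    if not actions:
        failures.append("lifecycle has no dry_run_actions")
    # The audit query must find a record and declare its retention mode.
    audit = envelopes["audit"]
    if audit.get("count", 0) < 1:
        failures.append("audit query returned no records")
    if "retention" not in audit:
        failures.append("audit envelope has no retention declaration")
    # The health envelope must be ok.
    if envelopes["health"].get("ok") is not True:
        failures.append("health check is not ok")
    return failures
```

With the variables from the flow, pass `{"profile_plan": profile_plan, "graph_import": graph_import, ...}` and assert that the returned list is empty; the list form reports every failed check at once rather than stopping at the first.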
## Review-Gated Apply
Lifecycle actions that require review are denied until an approval marker or
matching review record is supplied:
```python
# Without approval, review-gated actions are denied.
denied = runner.handle("lifecycle.apply", {"actions": lifecycle["data"]["dry_run_actions"]})

# Supplying an approval marker lets the same actions proceed.
approved = runner.handle(
    "lifecycle.apply",
    {
        "actions": lifecycle["data"]["dry_run_actions"],
        "approval_marker": "review:operator-approved",
    },
)
```
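The gate can be reasoned about as a small per-action predicate. A minimal sketch, assuming each action carries an `id` and a `requires_review` flag, that an approval marker is any string with a `review:` prefix, and that review records are dicts with `action_id` and `status`; all of these field names, the prefix rule, and `review_gate` are hypothetical, not the library's actual logic:

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class GateDecision:
    action_id: str
    allowed: bool
    reason: str


def review_gate(
    actions: Iterable[dict],
    approval_marker: Optional[str] = None,
    review_records: Iterable[dict] = (),
) -> list[GateDecision]:
    """Allow or deny each action; review-gated actions need approval."""
    # Action ids covered by an explicit, approved review record.
    reviewed = {r["action_id"] for r in review_records if r.get("status") == "approved"}
    # Hypothetical rule: any marker with a "review:" prefix counts as approval.
    has_marker = bool(approval_marker) and approval_marker.startswith("review:")
    decisions = []
    for action in actions:
        action_id = action["id"]
        if not action.get("requires_review", False):
            decisions.append(GateDecision(action_id, True, "no_review_required"))
        elif has_marker:
            decisions.append(GateDecision(action_id, True, "approval_marker"))
        elif action_id in reviewed:
            decisions.append(GateDecision(action_id, True, "review_record"))
        else:
            decisions.append(GateDecision(action_id, False, "review_required"))
    return decisions
```

Returning one decision per action, rather than rejecting the whole batch, keeps each denial individually traceable in the audit trail.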
Use `audit.query` with the filters `{"operation": "lifecycle.apply", "dry_run": False}`
to trace both the denied and the approved apply attempts.
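Filtering an audit trail amounts to matching each record's fields against the supplied filters. A minimal in-memory sketch, assuming audit records are flat dicts and filters require exact equality; `InMemoryAuditSink`, its `retention_mode` field, and the response shape (`count`, `records`, `retention`) are assumptions modeled on the keys the recipe checks, not the library's implementation:

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class InMemoryAuditSink:
    """Hypothetical audit sink: append-only records, exact-match queries."""

    retention_mode: str = "unbounded"
    records: list[dict[str, Any]] = field(default_factory=list)

    def record(self, **fields: Any) -> None:
        self.records.append(dict(fields))

    def query(self, filters: dict[str, Any]) -> dict[str, Any]:
        # A record matches when every filter key is present with an equal value.
        matches = [
            r for r in self.records
            if all(k in r and r[k] == v for k, v in filters.items())
        ]
        return {
            "count": len(matches),
            "records": matches,
            "retention": {"mode": self.retention_mode},
        }
```

Because the filters are exact matches, `"dry_run": False` excludes dry-run planning records and leaves only the real apply attempts, denied and approved alike.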
## Persistence Repair Drill
File-backed operation is configured through a profile or an explicit
`RuntimeConfig`:
```python
from phase_memory import LocalServiceRunner, RuntimeConfig

# `profile` is the fixture profile loaded in the end-to-end flow.
config = RuntimeConfig.from_profile(profile, local_store_path=".phase-memory-local")
runner = LocalServiceRunner(config=config)
# Collect repair diagnostics for the file-backed store.
repair = runner.runtime.repair_diagnostics(source_ref=config.local_store_path)
```
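The atomic JSON writes behind the file-backed store are commonly built on the write-to-temp-then-rename pattern. A minimal standard-library sketch of that pattern; `atomic_write_json` is a hypothetical helper and not necessarily how `phase_memory` implements it:

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any


def atomic_write_json(path: Path, payload: Any) -> None:
    """Replace `path` with JSON so POSIX readers never observe a partial file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives next to the target so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=f".{path.name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle, indent=2, sort_keys=True)
            handle.flush()
            # Push the bytes to disk before the rename makes them visible.
            os.fsync(handle.fileno())
        # On POSIX, renaming within one filesystem is atomic.
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the orphaned temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

If serialization fails midway, the previous file is left untouched and only the temp file is discarded, which is the property that keeps a crash from producing a `corrupt_store_record`.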
Repair diagnostics distinguish:
- `store_migration_required` for old or missing local-store schema metadata.
- `planned_store_migrations` when metadata declares pending migrations.
- `corrupt_store_record` for unreadable node, edge, or path JSON.
- `missing_edge_source` / `missing_edge_target` for graph reference damage.
- `orphaned_path_event` when paths reference absent event-log records.
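These diagnostic codes can be illustrated with a small scanner over an in-memory store snapshot. A minimal sketch, assuming raw JSON text per node, edge, and path record, edges with `source`/`target` node ids, paths listing `event_ids`, and metadata carrying `schema_version` and `pending_migrations`; the record layout, `CURRENT_SCHEMA_VERSION`, and `scan_store` are all hypothetical:

```python
import json
from typing import Any, Optional

CURRENT_SCHEMA_VERSION = 2  # hypothetical current local-store schema version


def scan_store(
    metadata: Optional[dict[str, Any]],
    raw_records: dict[str, dict[str, str]],
    event_ids: set[str],
) -> list[tuple[str, str]]:
    """Return (code, subject) diagnostics for a local-store snapshot."""
    diagnostics: list[tuple[str, str]] = []
    # Missing or outdated metadata means the store needs a schema migration.
    if metadata is None or metadata.get("schema_version", 0) < CURRENT_SCHEMA_VERSION:
        diagnostics.append(("store_migration_required", "metadata"))
    for migration in (metadata or {}).get("pending_migrations", []):
        diagnostics.append(("planned_store_migrations", migration))
    # Parse every record; unreadable or non-object JSON is reported and skipped.
    parsed: dict[str, dict[str, Any]] = {"nodes": {}, "edges": {}, "paths": {}}
    for kind, records in raw_records.items():
        for record_id, text in records.items():
            try:
                value = json.loads(text)
            except json.JSONDecodeError:
                value = None
            if isinstance(value, dict):
                parsed.setdefault(kind, {})[record_id] = value
            else:
                diagnostics.append(("corrupt_store_record", f"{kind}/{record_id}"))
    # Edges must point at nodes that parsed successfully.
    node_ids = set(parsed["nodes"])
    for edge_id, edge in parsed["edges"].items():
        if edge.get("source") not in node_ids:
            diagnostics.append(("missing_edge_source", edge_id))
        if edge.get("target") not in node_ids:
            diagnostics.append(("missing_edge_target", edge_id))
    # Paths must only reference events present in the event log.
    for path_id, path in parsed["paths"].items():
        for event_id in path.get("event_ids", []):
            if event_id not in event_ids:
                diagnostics.append(("orphaned_path_event", f"{path_id}:{event_id}"))
    return diagnostics
```

A corrupt node also surfaces downstream as missing edge endpoints, so reading the diagnostics in order (migration, corruption, references) points at the root cause first.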
## Adapter Pack Compatibility
Fake and future live adapter packs should publish a manifest with:
- declared capabilities;
- ownership boundaries for every adapter;
- required conformance helpers.
Validate a pack before wiring it into the runtime:
```python
from phase_memory import fake_external_adapter_pack, validate_adapter_pack_manifest

# An empty tuple means no diagnostics were raised for the manifest.
diagnostics = validate_adapter_pack_manifest(fake_external_adapter_pack())
assert diagnostics == ()
```
Missing capabilities are reported as `missing_adapter_capability` diagnostics
with the adapter and capability names attached.
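The three manifest requirements translate into straightforward checks. A minimal sketch, assuming a manifest shaped as `{"adapters": {name: {"capabilities": [...], "owner": ...}}, "conformance_helpers": [...]}` plus a caller-supplied map of required capabilities per adapter; this layout, the `missing_adapter_owner` and `missing_conformance_helpers` codes, and `sketch_validate_manifest` are hypothetical and are not the real manifest schema or `validate_adapter_pack_manifest`:

```python
from typing import Any, Mapping


def sketch_validate_manifest(
    manifest: Mapping[str, Any],
    required: Mapping[str, set[str]],
) -> tuple[dict[str, str], ...]:
    """Return diagnostics; an empty tuple means the manifest passed."""
    diagnostics: list[dict[str, str]] = []
    adapters = manifest.get("adapters", {})
    for adapter_name, capabilities in required.items():
        declared = set(adapters.get(adapter_name, {}).get("capabilities", ()))
        # Report each required capability the adapter does not declare.
        for capability in sorted(capabilities - declared):
            diagnostics.append({
                "code": "missing_adapter_capability",
                "adapter": adapter_name,
                "capability": capability,
            })
    # Every adapter must name the boundary that owns it (hypothetical code).
    for adapter_name, spec in adapters.items():
        if not spec.get("owner"):
            diagnostics.append({"code": "missing_adapter_owner", "adapter": adapter_name})
    # The pack must list the conformance helpers it is tested with (hypothetical code).
    if not manifest.get("conformance_helpers"):
        diagnostics.append({"code": "missing_conformance_helpers"})
    return tuple(diagnostics)
```

Checking required capabilities against the declaration, rather than trusting the declaration alone, is what lets a fake pack and a future live pack be held to the same contract.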
## API Compatibility Expectations
The stable embedding surface is:
- `PhaseMemoryRuntime` methods and JSON-serializable envelopes.
- `LocalServiceRunner.handle(operation, payload)` for every operation in
`service_contracts()["operations"]`.
- `RuntimeConfig` and `resolve_runtime_adapters` for local/external adapter
resolution.
- Adapter conformance helpers in `phase_memory.service`.
- External adapter pack manifests and validation helpers.
New public operations should be added to the service contract first, then to
the local runner, runtime tests, and docs in the same change.
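The rule that every contract operation must also be handled by the local runner can be enforced with a set comparison in a test. A minimal sketch, assuming the published catalog is a list of operation names and the runner dispatches through a name-to-handler mapping; `ToyRunner`, `parity_gaps`, and the `unsupported_operation` diagnostic are hypothetical stand-ins for `service_contracts()["operations"]` and `LocalServiceRunner`:

```python
from typing import Any, Callable, Mapping

Handler = Callable[[Mapping[str, Any]], dict[str, Any]]


class ToyRunner:
    """Hypothetical dispatcher mirroring handle(operation, payload)."""

    def __init__(self, handlers: Mapping[str, Handler]) -> None:
        self.handlers = dict(handlers)

    def handle(self, operation: str, payload: Mapping[str, Any]) -> dict[str, Any]:
        handler = self.handlers.get(operation)
        if handler is None:
            # This sketch returns an invalid envelope instead of raising.
            return {"valid": False, "diagnostics": [f"unsupported_operation:{operation}"]}
        return handler(payload)


def parity_gaps(catalog: list[str], runner: ToyRunner) -> dict[str, list[str]]:
    """Compare the published operation catalog with the runner's handlers."""
    published, handled = set(catalog), set(runner.handlers)
    return {
        "unhandled": sorted(published - handled),
        "unpublished": sorted(handled - published),
    }
```

A CI test can assert that both lists are empty, which states the "service runner handles every operation in `SERVICE_OPERATIONS`" gate as code. Checking the reverse direction as well catches handlers that reach the runner before the contract, which the ordering rule above rules out.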