Implement refinement hardening workplan

2026-05-18 23:56:41 +02:00
parent 836acf7e01
commit 0eea94d05e
17 changed files with 1164 additions and 68 deletions
--- a/docs/maturity-scorecard.md
+++ b/docs/maturity-scorecard.md
@@ -26,74 +26,83 @@ to 5.

 ## Current Score

-Overall maturity: **3.8 / 5**
+Overall maturity: **4.0 / 5**

 Two sub-scores make the result easier to reason about:

- Local integration maturity: **4.1 / 5**
- Operational maturity: **3.2 / 5**
+- Local integration maturity: **4.3 / 5**
+- Operational maturity: **3.5 / 5**

 The repo is strong as a deterministic local library and service-boundary core.
 It is not yet production-operational because the external adapters are fakes,
-durability semantics are basic, service bindings are framework-neutral shapes
-rather than deployable endpoints, and evaluation coverage is still narrow.
+service bindings are framework-neutral shapes rather than deployable endpoints,
+and migration behavior is diagnostic rather than an operator-applied migration
+system.

 ## Dimension Scorecard

 | Dimension | Score | Target | Evidence | Needed Next |
 | --- | ---: | ---: | --- | --- |
 | Intent and boundaries | 4.4 | 5.0 | `INTENT.md`, `SCOPE.md`, `README.md`, architecture docs, adjacent-repo boundary docs | Keep docs current as live adapters and service bindings clarify real ownership. |
-| Package and API foundation | 4.2 | 4.5 | Python package, public exports, runtime facade, CLI, service config, dependency-light tests | Add API stability notes and compatibility checks for public exports. |
+| Package and API foundation | 4.3 | 4.5 | Python package, public exports, runtime facade, CLI, service runner export, service config, dependency-light tests | Add public export compatibility checks and release notes discipline. |
 | Markitect profile contract ingress | 3.7 | 4.5 | Profile loading, diagnostics, runtime envelopes, profile-derived config, local alias normalization | Add richer compatibility fixtures and schema drift diagnostics. |
-| Graph and event ingress | 3.7 | 4.5 | Graph loading, endpoint diagnostics, event model, JSONL log, export, repair checks, fake graph/event adapters | Add broader malformed/large graph fixtures and migration repair coverage. |
+| Graph and event ingress | 3.9 | 4.5 | Graph loading, endpoint diagnostics, event model, JSONL log, export, repair checks, corrupt-record diagnostics, fake graph/event adapters | Add broader malformed/large graph fixtures and operator repair utilities. |
 | Phase domain model | 3.5 | 4.5 | Phases, lifecycle states, actions, paths, retention rules, profile-derived transition rules | Add migration semantics for profile/rule changes over durable stores. |
-| Profile execution planning | 4.0 | 4.5 | Adapter plan, capabilities, policy gates, fallback behavior, config-driven local/external resolution | Add compatibility gates for live adapter packs. |
-| Lifecycle planning and apply | 3.6 | 4.5 | Dry-run lifecycle plans, profile rules, review-gated local apply | Add service `lifecycle.apply` handling, migration semantics, and better apply audit queries. |
-| Activation planning | 3.8 | 4.8 | Budgeted activation, selections, package request, graph neighborhoods, paths, ranking, metrics | Wire semantic-index-assisted retrieval and expand evaluation corpora. |
-| Local persistence | 3.2 | 4.5 | File-backed graph store, JSONL event log, audit sink, export, repair diagnostics | Add atomic writes, schema migration, compaction/retention utilities, and stronger corruption recovery. |
-| Policy, review, and audit | 3.5 | 5.0 | Operation points, review records, audit schema, denials, redaction, fake external policy/audit adapters | Add audit query service, retention policy behavior, and live policy adapter boundary. |
-| Observability and operations | 3.3 | 4.8 | Health report, config diagnostics, adapter status, fake telemetry audit sink | Add metrics/event export, retention diagnostics, and deployable health/readiness binding. |
+| Profile execution planning | 4.2 | 4.5 | Adapter plan, capabilities, policy gates, fallback behavior, config-driven local/external resolution, adapter pack manifests | Add compatibility gates for live adapter packs. |
+| Lifecycle planning and apply | 4.0 | 4.5 | Dry-run lifecycle plans, profile rules, review-gated local apply, service `lifecycle.apply`, apply audit queries | Add operator migration semantics and richer apply rollback/repair drills. |
+| Activation planning | 4.0 | 4.8 | Budgeted activation, selections, package request, graph neighborhoods, paths, ranking, metrics, multi-scenario evaluation fixtures | Wire semantic-index-assisted retrieval into runtime planning. |
+| Local persistence | 3.7 | 4.5 | File-backed graph store, JSONL event log, audit sink, atomic JSON writes, metadata migration diagnostics, export, repair diagnostics | Add executable migrations, compaction/retention utilities, and stronger corruption recovery. |
+| Policy, review, and audit | 3.9 | 5.0 | Operation points, review records, audit schema, queryable audit sinks, denials, redaction, fake external policy/audit adapters | Add live policy adapter boundary and enforceable audit retention policy. |
+| Observability and operations | 3.6 | 4.8 | Health report, config diagnostics, adapter status, fake telemetry audit sink, operational recipe | Add metrics/event export and deployable health/readiness binding. |
 | Markitect interop | 3.7 | 4.5 | Local validation, package request/response envelopes, fake compiler | Add optional live Markitect compiler adapter and contract compatibility suite. |
-| Kontextual/Infospace interop | 3.1 | 4.5 | Delegation envelope, fake runtime registry, activation quality report fixture | Add live/fake delegation scenarios and broader Infospace restart reports. |
-| Testing and evaluation | 3.8 | 4.5 | 60 deterministic tests over runtime, CLI, adapters, policy, activation, lifecycle, service, fakes | Add multi-profile/multi-graph evaluation corpus and regression thresholds. |
-| Service readiness | 3.9 | 4.8 | Service contracts, local runner, health, config, adapter conformance, fake pack | Implement missing service operations and optional framework binding. |
-| Developer experience | 3.8 | 4.5 | README, package map, CLI examples, persistence/policy/interop/service/lifecycle/fake-pack docs | Add troubleshooting, examples, and end-to-end recipes. |
+| Kontextual/Infospace interop | 3.3 | 4.5 | Delegation envelope, fake runtime registry, activation quality report fixture, adapter compatibility manifests | Add live/fake delegation scenarios and broader Infospace restart reports. |
+| Testing and evaluation | 4.1 | 4.5 | 70 deterministic tests over runtime, CLI, adapters, policy, activation, lifecycle, service, fakes, and evaluation scenarios | Add larger regression corpus and threshold trend reports. |
+| Service readiness | 4.2 | 4.8 | Service contracts, full local runner parity, health, config, adapter conformance, fake pack | Add optional framework binding and deployable readiness endpoints. |
+| Developer experience | 4.1 | 4.5 | README, package map, CLI examples, persistence/policy/interop/service/lifecycle/fake-pack docs, operational recipe | Add troubleshooting matrix and embedded-service examples. |

 ## Assessment

-The project has a credible core. The runtime envelopes, policy/review model,
-profile-derived configuration, lifecycle rules, local persistence, fake
-external pack, and conformance helpers form a solid integration boundary.
+The project has crossed the local integration-readiness threshold. The runtime
+envelopes, policy/review model, profile-derived configuration, lifecycle rules,
+local persistence diagnostics, queryable audit path, fake external pack
+manifests, and conformance helpers form a solid integration boundary.

-The biggest optimization opportunity is not another broad feature burst. It is
-closing the gap between declared contracts and runnable operational behavior:
-the service contract advertises operations that the local runner only partly
-handles, persistence needs migration/durability semantics, and evaluation needs
-more than one small fixture family.
+The biggest optimization opportunity is now the next operational layer:
+turning diagnostic-only durability into operator actions, adding optional
+deployable service bindings, and testing live or live-shaped adapters behind
+the same conformance suite as the fake pack.

-## Recommended Refinement Workplan
+## Completed Refinement Workplan

-Create and execute `PMEM-WP-0011`: refinement hardening and operational
-readiness.
+`PMEM-WP-0011` moved the score from 3.8 to 4.0 by adding:
+
+- full local service runner parity for `SERVICE_OPERATIONS`;
+- service-covered `package.compile`, `lifecycle.apply`, and `audit.query`;
+- queryable audit sinks with retention metadata;
+- local-store atomic JSON writes, migration diagnostics, and corrupt-record
+  repair diagnostics;
+- three evaluation scenario families covering policy denial, lifecycle rules,
+  event-path activation, semantic-index hints, and budget pressure;
+- adapter pack manifests and explicit missing-capability diagnostics;
+- an operational end-to-end recipe.
+
+## Recommended Next Refinement
+
+Create and execute `PMEM-WP-0012`: live-adapter and service-binding readiness.

 Highest-value tasks:

- Bring service runner parity to the published operation catalog:
-  `package.compile`, `lifecycle.apply`, and `audit.query`.
- Add local-store schema migration and repair hardening, including atomic write
-  behavior and migration diagnostics.
- Expand evaluation fixtures across multiple profiles, graph shapes, policies,
-  lifecycle rules, and activation budgets.
- Add live-adapter readiness manifests so fake and future live packs can be
-  tested by the same compatibility suite.
- Add audit query and retention semantics that make policy/audit behavior
-  inspectable after runtime operations.
- Improve DX with troubleshooting, end-to-end recipes, and API compatibility
-  notes.
+- Add an optional framework binding around `LocalServiceRunner` with health and
+  readiness endpoints.
+- Add executable local-store migrations, not only diagnostics.
+- Add live-shaped Markitect/Kontextual adapter fixtures behind the manifest and
+  conformance suite.
+- Add audit retention enforcement and telemetry export drills.
+- Grow the evaluation corpus into threshold reports that can catch regressions.

 ## Score Movement Gates

-Move overall score to **4.0** when:
+Overall score **4.0** was reached once:

 - Service runner handles every operation in `SERVICE_OPERATIONS`.
 - Audit query and lifecycle apply are covered through service contracts.
--- a/docs/operational-readiness.md
+++ b/docs/operational-readiness.md
@@ -0,0 +1,136 @@
+# Operational Readiness Recipe
+
+Updated: 2026-05-18
+
+This recipe exercises the local operational surface without requiring live
+Markitect, Kontextual, or telemetry services. It is the expected smoke path for
+embedding `phase-memory` in another local agent runtime.
+
+## Local End-To-End Flow
+
+```python
+import json
+from pathlib import Path
+
+from phase_memory import LocalServiceRunner
+
+fixtures = Path("tests/fixtures")
+profile = json.loads((fixtures / "memory-profile.json").read_text(encoding="utf-8"))
+graph = json.loads((fixtures / "memory-graph.json").read_text(encoding="utf-8"))
+
+runner = LocalServiceRunner()
+
+profile_plan = runner.handle("profile.plan", {"profile": profile, "source_ref": "recipe:profile"})
+graph_import = runner.handle("graph.import", {"graph": graph, "source_ref": "recipe:graph"})
+lifecycle = runner.handle(
+    "graph.lifecycle.plan",
+    {
+        "profile": profile,
+        "graph": graph,
+        "parameters": {"refresh_digests": {"event.restart": "new-digest"}},
+        "source_ref": "recipe:lifecycle",
+    },
+)
+activation = runner.handle(
+    "graph.activation.plan",
+    {
+        "graph": graph,
+        "budget": {"max_items": 3, "max_tokens": 60},
+        "profile_id": profile["id"],
+        "source_ref": "recipe:activation",
+    },
+)
+package = runner.handle(
+    "package.compile",
+    {
+        "selection": activation["data"]["activation_plan"]["selection"],
+        "source_ref": "recipe:package",
+    },
+)
+audit = runner.handle("audit.query", {"filters": {"operation": "package.compile"}})
+health = runner.handle("health.check")
+```
+
+Expected checks:
+
+- `profile_plan["valid"]`, `graph_import["valid"]`, `activation["valid"]`, and
+  `package["valid"]` are true.
+- `lifecycle["data"]["dry_run_actions"]` contains the planned refresh action.
+- `audit["count"]` is at least 1 and `audit["retention"]` declares the active
+  audit sink retention mode.
+- `health["ok"]` is true.
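
The expected checks above can be collected into one helper. This is a minimal sketch, assuming each envelope is a plain dictionary with the keys named in the list; `check_recipe_envelopes` is a hypothetical name and is not part of `phase_memory`.

```python
from typing import Any, Mapping


def check_recipe_envelopes(
    profile_plan: Mapping[str, Any],
    graph_import: Mapping[str, Any],
    lifecycle: Mapping[str, Any],
    activation: Mapping[str, Any],
    package: Mapping[str, Any],
    audit: Mapping[str, Any],
    health: Mapping[str, Any],
) -> list[str]:
    """Return the failed checks; an empty list means the smoke path passed."""
    failures: list[str] = []

    # Envelopes that the recipe expects to be valid.
    must_be_valid = {
        "profile_plan": profile_plan,
        "graph_import": graph_import,
        "activation": activation,
        "package": package,
    }
    for name, envelope in must_be_valid.items():
        if not envelope.get("valid"):
            failures.append(f"{name} is not valid")

    # The lifecycle plan should carry at least one dry-run action.
    if not lifecycle.get("data", {}).get("dry_run_actions"):
        failures.append("lifecycle has no dry_run_actions")

    # The audit query should return records and declare a retention mode.
    if audit.get("count", 0) < 1:
        failures.append("audit count is below 1")
    if "retention" not in audit:
        failures.append("audit retention is missing")

    if not health.get("ok"):
        failures.append("health is not ok")

    return failures
```

With the variables from the recipe, `check_recipe_envelopes(profile_plan, graph_import, lifecycle, activation, package, audit, health)` should return an empty list on a healthy local run.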
+
+## Review-Gated Apply
+
+Lifecycle actions that require review are denied until an approval marker or
+matching review record is supplied:
+
+```python
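+# Without an approval marker or review record, review-gated actions are denied.
+denied = runner.handle("lifecycle.apply", {"actions": lifecycle["data"]["dry_run_actions"]})
+# Supplying an approval marker allows the same actions to be applied.
+approved = runner.handle(
+    "lifecycle.apply",
+    {
+        "actions": lifecycle["data"]["dry_run_actions"],
+        "approval_marker": "review:operator-approved",
+    },
+)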
+```
+
+Use `audit.query` with the filters
+`{"operation": "lifecycle.apply", "dry_run": False}` to trace both denied and
+approved apply attempts.
+
+## Persistence Repair Drill
+
+File-backed operation is configured through a profile or explicit
+`RuntimeConfig`:
+
+```python
+from phase_memory import LocalServiceRunner, RuntimeConfig
+
+# `profile` is the dict loaded in the Local End-To-End Flow above.
+config = RuntimeConfig.from_profile(profile, local_store_path=".phase-memory-local")
+runner = LocalServiceRunner(config=config)
+# Collect repair diagnostics for the file-backed store.
+repair = runner.runtime.repair_diagnostics(source_ref=config.local_store_path)
+```
+
+Repair diagnostics distinguish:
+
+- `store_migration_required` for old or missing local-store schema metadata.
+- `planned_store_migrations` when metadata declares pending migrations.
+- `corrupt_store_record` for unreadable node, edge, or path JSON.
+- `missing_edge_source` / `missing_edge_target` for graph reference damage.
+- `orphaned_path_event` when paths reference absent event-log records.
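
The diagnostic codes above fall into a few kinds of follow-up work, which can be made explicit when triaging a repair report. This is a rough sketch, assuming each diagnostic is a mapping with a `"code"` key (the actual diagnostic shape is not shown in this recipe); `group_repair_diagnostics` and the group labels are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Any, Iterable, Mapping

# Codes come from the list above; the group labels are illustrative only.
DIAGNOSTIC_GROUPS = {
    "store_migration_required": "migration",
    "planned_store_migrations": "migration",
    "corrupt_store_record": "corruption",
    "missing_edge_source": "reference_damage",
    "missing_edge_target": "reference_damage",
    "orphaned_path_event": "reference_damage",
}


def group_repair_diagnostics(
    diagnostics: Iterable[Mapping[str, Any]],
) -> dict[str, list[Mapping[str, Any]]]:
    """Bucket diagnostics by follow-up kind; unknown codes land in "other"."""
    grouped: dict[str, list[Mapping[str, Any]]] = defaultdict(list)
    for diagnostic in diagnostics:
        # Assumption: the code is stored under the "code" key.
        group = DIAGNOSTIC_GROUPS.get(diagnostic.get("code"), "other")
        grouped[group].append(diagnostic)
    return dict(grouped)
```

Keeping an `"other"` bucket means newly introduced diagnostic codes stay visible instead of being silently dropped.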
+
+## Adapter Pack Compatibility
+
+Fake and future live adapter packs should publish a manifest with:
+
+- declared capabilities;
+- ownership boundaries for every adapter;
+- required conformance helpers.
+
+Validate a pack before wiring it into the runtime:
+
+```python
+from phase_memory import fake_external_adapter_pack, validate_adapter_pack_manifest
+
+diagnostics = validate_adapter_pack_manifest(fake_external_adapter_pack())
+assert diagnostics == ()
+```
+
+Missing capabilities are reported as `missing_adapter_capability` diagnostics
+with the adapter and capability names attached.
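
To illustrate the kind of check this implies, the sketch below compares declared capabilities against required ones. It is not the implementation of `validate_adapter_pack_manifest`: the function name, the input shape (adapter name mapped to capability names), and the `"adapter"` / `"capability"` keys are all assumptions made for the example.

```python
from typing import Iterable, Mapping


def missing_capability_diagnostics(
    declared: Mapping[str, Iterable[str]],
    required: Mapping[str, Iterable[str]],
) -> tuple[dict[str, str], ...]:
    """Return one diagnostic per required capability that an adapter lacks."""
    diagnostics: list[dict[str, str]] = []
    for adapter in sorted(required):
        # An adapter absent from the manifest declares no capabilities.
        declared_caps = set(declared.get(adapter, ()))
        for capability in sorted(set(required[adapter]) - declared_caps):
            diagnostics.append(
                {
                    "code": "missing_adapter_capability",
                    "adapter": adapter,  # hypothetical key name
                    "capability": capability,  # hypothetical key name
                }
            )
    # An empty tuple mirrors the `diagnostics == ()` check shown above.
    return tuple(diagnostics)
```

Sorting adapters and capabilities keeps the output deterministic, which makes the diagnostics easier to compare in tests.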
+
+## API Compatibility Expectations
+
+The stable embedding surface is:
+
+- `PhaseMemoryRuntime` methods and JSON-serializable envelopes.
+- `LocalServiceRunner.handle(operation, payload)` for every operation in
+  `service_contracts()["operations"]`.
+- `RuntimeConfig` and `resolve_runtime_adapters` for local/external adapter
+  resolution.
+- Adapter conformance helpers in `phase_memory.service`.
+- External adapter pack manifests and validation helpers.
+
+New public operations should be added to the service contract first, then to
+the local runner, runtime tests, and docs in the same change.