From 0eea94d05ec5dfc9b38eb17be031ae93b0dc1bad Mon Sep 17 00:00:00 2001 From: tegwick Date: Mon, 18 May 2026 23:56:41 +0200 Subject: [PATCH] Implement refinement hardening workplan --- README.md | 6 +- docs/maturity-scorecard.md | 91 +++++----- docs/operational-readiness.md | 136 ++++++++++++++ src/phase_memory/__init__.py | 9 +- src/phase_memory/adapters.py | 158 +++++++++++++++- src/phase_memory/external_adapters.py | 131 +++++++++++++- src/phase_memory/policy.py | 1 + src/phase_memory/ports.py | 1 + src/phase_memory/runtime.py | 86 +++++++++ src/phase_memory/service.py | 21 +++ tests/fixtures/evaluation-scenarios.json | 170 ++++++++++++++++++ tests/test_evaluation_scenarios.py | 101 +++++++++++ tests/test_external_adapter_packs.py | 38 +++- tests/test_file_backed_runtime.py | 38 ++++ tests/test_service_readiness.py | 55 +++++- ...ent-hardening-and-operational-readiness.md | 38 +++- ...e-adapter-and-service-binding-readiness.md | 152 ++++++++++++++++ 17 files changed, 1164 insertions(+), 68 deletions(-) create mode 100644 docs/operational-readiness.md create mode 100644 tests/fixtures/evaluation-scenarios.json create mode 100644 tests/test_evaluation_scenarios.py create mode 100644 workplans/PMEM-WP-0012-live-adapter-and-service-binding-readiness.md diff --git a/README.md b/README.md index fe260fd..0164a5a 100644 --- a/README.md +++ b/README.md @@ -95,6 +95,6 @@ for package bridge boundaries, [docs/activation-quality.md](docs/activation-qual for retrieval and evaluation behavior, [docs/service-readiness.md](docs/service-readiness.md) for service and adapter contracts, [docs/lifecycle-rules.md](docs/lifecycle-rules.md) for profile-driven lifecycle rules, [docs/external-adapter-packs.md](docs/external-adapter-packs.md) -for fake external integration packs, [docs/maturity-scorecard.md](docs/maturity-scorecard.md) -for the current maturity assessment, and [SCOPE.md](SCOPE.md) for repository -boundaries. 
+for fake external integration packs, [docs/operational-readiness.md](docs/operational-readiness.md) +for the local end-to-end operational recipe, [docs/maturity-scorecard.md](docs/maturity-scorecard.md) +for the current maturity assessment, and [SCOPE.md](SCOPE.md) for repository boundaries. diff --git a/docs/maturity-scorecard.md b/docs/maturity-scorecard.md index 223a5bf..19b9850 100644 --- a/docs/maturity-scorecard.md +++ b/docs/maturity-scorecard.md @@ -26,74 +26,83 @@ to 5. ## Current Score -Overall maturity: **3.8 / 5** +Overall maturity: **4.0 / 5** Two sub-scores make the result easier to reason about: -- Local integration maturity: **4.1 / 5** -- Operational maturity: **3.2 / 5** +- Local integration maturity: **4.3 / 5** +- Operational maturity: **3.5 / 5** The repo is strong as a deterministic local library and service-boundary core. It is not yet production-operational because the external adapters are fakes, -durability semantics are basic, service bindings are framework-neutral shapes -rather than deployable endpoints, and evaluation coverage is still narrow. +service bindings are framework-neutral shapes rather than deployable endpoints, +and migration behavior is diagnostic rather than an operator-applied migration +system. ## Dimension Scorecard | Dimension | Score | Target | Evidence | Needed Next | | --- | ---: | ---: | --- | --- | | Intent and boundaries | 4.4 | 5.0 | `INTENT.md`, `SCOPE.md`, `README.md`, architecture docs, adjacent-repo boundary docs | Keep docs current as live adapters and service bindings clarify real ownership. | -| Package and API foundation | 4.2 | 4.5 | Python package, public exports, runtime facade, CLI, service config, dependency-light tests | Add API stability notes and compatibility checks for public exports. 
| +| Package and API foundation | 4.3 | 4.5 | Python package, public exports, runtime facade, CLI, service runner export, service config, dependency-light tests | Add public export compatibility checks and release notes discipline. | | Markitect profile contract ingress | 3.7 | 4.5 | Profile loading, diagnostics, runtime envelopes, profile-derived config, local alias normalization | Add richer compatibility fixtures and schema drift diagnostics. | -| Graph and event ingress | 3.7 | 4.5 | Graph loading, endpoint diagnostics, event model, JSONL log, export, repair checks, fake graph/event adapters | Add broader malformed/large graph fixtures and migration repair coverage. | +| Graph and event ingress | 3.9 | 4.5 | Graph loading, endpoint diagnostics, event model, JSONL log, export, repair checks, corrupt-record diagnostics, fake graph/event adapters | Add broader malformed/large graph fixtures and operator repair utilities. | | Phase domain model | 3.5 | 4.5 | Phases, lifecycle states, actions, paths, retention rules, profile-derived transition rules | Add migration semantics for profile/rule changes over durable stores. | -| Profile execution planning | 4.0 | 4.5 | Adapter plan, capabilities, policy gates, fallback behavior, config-driven local/external resolution | Add compatibility gates for live adapter packs. | -| Lifecycle planning and apply | 3.6 | 4.5 | Dry-run lifecycle plans, profile rules, review-gated local apply | Add service `lifecycle.apply` handling, migration semantics, and better apply audit queries. | -| Activation planning | 3.8 | 4.8 | Budgeted activation, selections, package request, graph neighborhoods, paths, ranking, metrics | Wire semantic-index-assisted retrieval and expand evaluation corpora. | -| Local persistence | 3.2 | 4.5 | File-backed graph store, JSONL event log, audit sink, export, repair diagnostics | Add atomic writes, schema migration, compaction/retention utilities, and stronger corruption recovery. 
| -| Policy, review, and audit | 3.5 | 5.0 | Operation points, review records, audit schema, denials, redaction, fake external policy/audit adapters | Add audit query service, retention policy behavior, and live policy adapter boundary. | -| Observability and operations | 3.3 | 4.8 | Health report, config diagnostics, adapter status, fake telemetry audit sink | Add metrics/event export, retention diagnostics, and deployable health/readiness binding. | +| Profile execution planning | 4.2 | 4.5 | Adapter plan, capabilities, policy gates, fallback behavior, config-driven local/external resolution, adapter pack manifests | Add compatibility gates for live adapter packs. | +| Lifecycle planning and apply | 4.0 | 4.5 | Dry-run lifecycle plans, profile rules, review-gated local apply, service `lifecycle.apply`, apply audit queries | Add operator migration semantics and richer apply rollback/repair drills. | +| Activation planning | 4.0 | 4.8 | Budgeted activation, selections, package request, graph neighborhoods, paths, ranking, metrics, multi-scenario evaluation fixtures | Wire semantic-index-assisted retrieval into runtime planning. | +| Local persistence | 3.7 | 4.5 | File-backed graph store, JSONL event log, audit sink, atomic JSON writes, metadata migration diagnostics, export, repair diagnostics | Add executable migrations, compaction/retention utilities, and stronger corruption recovery. | +| Policy, review, and audit | 3.9 | 5.0 | Operation points, review records, audit schema, queryable audit sinks, denials, redaction, fake external policy/audit adapters | Add live policy adapter boundary and enforceable audit retention policy. | +| Observability and operations | 3.6 | 4.8 | Health report, config diagnostics, adapter status, fake telemetry audit sink, operational recipe | Add metrics/event export and deployable health/readiness binding. 
| | Markitect interop | 3.7 | 4.5 | Local validation, package request/response envelopes, fake compiler | Add optional live Markitect compiler adapter and contract compatibility suite. | -| Kontextual/Infospace interop | 3.1 | 4.5 | Delegation envelope, fake runtime registry, activation quality report fixture | Add live/fake delegation scenarios and broader Infospace restart reports. | -| Testing and evaluation | 3.8 | 4.5 | 60 deterministic tests over runtime, CLI, adapters, policy, activation, lifecycle, service, fakes | Add multi-profile/multi-graph evaluation corpus and regression thresholds. | -| Service readiness | 3.9 | 4.8 | Service contracts, local runner, health, config, adapter conformance, fake pack | Implement missing service operations and optional framework binding. | -| Developer experience | 3.8 | 4.5 | README, package map, CLI examples, persistence/policy/interop/service/lifecycle/fake-pack docs | Add troubleshooting, examples, and end-to-end recipes. | +| Kontextual/Infospace interop | 3.3 | 4.5 | Delegation envelope, fake runtime registry, activation quality report fixture, adapter compatibility manifests | Add live/fake delegation scenarios and broader Infospace restart reports. | +| Testing and evaluation | 4.1 | 4.5 | 70 deterministic tests over runtime, CLI, adapters, policy, activation, lifecycle, service, fakes, and evaluation scenarios | Add larger regression corpus and threshold trend reports. | +| Service readiness | 4.2 | 4.8 | Service contracts, full local runner parity, health, config, adapter conformance, fake pack | Add optional framework binding and deployable readiness endpoints. | +| Developer experience | 4.1 | 4.5 | README, package map, CLI examples, persistence/policy/interop/service/lifecycle/fake-pack docs, operational recipe | Add troubleshooting matrix and embedded-service examples. | ## Assessment -The project has a credible core. 
The runtime envelopes, policy/review model, -profile-derived configuration, lifecycle rules, local persistence, fake -external pack, and conformance helpers form a solid integration boundary. +The project has crossed the local integration-readiness threshold. The runtime +envelopes, policy/review model, profile-derived configuration, lifecycle rules, +local persistence diagnostics, queryable audit path, fake external pack +manifests, and conformance helpers form a solid integration boundary. -The biggest optimization opportunity is not another broad feature burst. It is -closing the gap between declared contracts and runnable operational behavior: -the service contract advertises operations that the local runner only partly -handles, persistence needs migration/durability semantics, and evaluation needs -more than one small fixture family. +The biggest optimization opportunity is now the next operational layer: +turning diagnostic-only durability into operator actions, adding optional +deployable service bindings, and testing live or live-shaped adapters behind +the same conformance suite as the fake pack. -## Recommended Refinement Workplan +## Completed Refinement Workplan -Create and execute `PMEM-WP-0011`: refinement hardening and operational -readiness. +`PMEM-WP-0011` moved the score from 3.8 to 4.0 by adding: + +- full local service runner parity for `SERVICE_OPERATIONS`; +- service-covered `package.compile`, `lifecycle.apply`, and `audit.query`; +- queryable audit sinks with retention metadata; +- local-store atomic JSON writes, migration diagnostics, and corrupt-record + repair diagnostics; +- three evaluation scenario families covering policy denial, lifecycle rules, + event-path activation, semantic-index hints, and budget pressure; +- adapter pack manifests and explicit missing-capability diagnostics; +- an operational end-to-end recipe. + +## Recommended Next Refinement + +Create and execute `PMEM-WP-0012`: live-adapter and service-binding readiness. 
Highest-value tasks: -- Bring service runner parity to the published operation catalog: - `package.compile`, `lifecycle.apply`, and `audit.query`. -- Add local-store schema migration and repair hardening, including atomic write - behavior and migration diagnostics. -- Expand evaluation fixtures across multiple profiles, graph shapes, policies, - lifecycle rules, and activation budgets. -- Add live-adapter readiness manifests so fake and future live packs can be - tested by the same compatibility suite. -- Add audit query and retention semantics that make policy/audit behavior - inspectable after runtime operations. -- Improve DX with troubleshooting, end-to-end recipes, and API compatibility - notes. +- Add an optional framework binding around `LocalServiceRunner` with health and + readiness endpoints. +- Add executable local-store migrations, not only diagnostics. +- Add live-shaped Markitect/Kontextual adapter fixtures behind the manifest and + conformance suite. +- Add audit retention enforcement and telemetry export drills. +- Grow the evaluation corpus into threshold reports that can catch regressions. ## Score Movement Gates -Move overall score to **4.0** when: +Achieved overall score **4.0** when: - Service runner handles every operation in `SERVICE_OPERATIONS`. - Audit query and lifecycle apply are covered through service contracts. diff --git a/docs/operational-readiness.md b/docs/operational-readiness.md new file mode 100644 index 0000000..0a72c67 --- /dev/null +++ b/docs/operational-readiness.md @@ -0,0 +1,136 @@ +# Operational Readiness Recipe + +Updated: 2026-05-18 + +This recipe exercises the local operational surface without requiring live +Markitect, Kontextual, or telemetry services. It is the expected smoke path for +embedding `phase-memory` in another local agent runtime. 
+ +## Local End-To-End Flow + +```python +import json +from pathlib import Path + +from phase_memory import LocalServiceRunner + +fixtures = Path("tests/fixtures") +profile = json.loads((fixtures / "memory-profile.json").read_text(encoding="utf-8")) +graph = json.loads((fixtures / "memory-graph.json").read_text(encoding="utf-8")) + +runner = LocalServiceRunner() + +profile_plan = runner.handle("profile.plan", {"profile": profile, "source_ref": "recipe:profile"}) +graph_import = runner.handle("graph.import", {"graph": graph, "source_ref": "recipe:graph"}) +lifecycle = runner.handle( + "graph.lifecycle.plan", + { + "profile": profile, + "graph": graph, + "parameters": {"refresh_digests": {"event.restart": "new-digest"}}, + "source_ref": "recipe:lifecycle", + }, +) +activation = runner.handle( + "graph.activation.plan", + { + "graph": graph, + "budget": {"max_items": 3, "max_tokens": 60}, + "profile_id": profile["id"], + "source_ref": "recipe:activation", + }, +) +package = runner.handle( + "package.compile", + { + "selection": activation["data"]["activation_plan"]["selection"], + "source_ref": "recipe:package", + }, +) +audit = runner.handle("audit.query", {"filters": {"operation": "package.compile"}}) +health = runner.handle("health.check") +``` + +Expected checks: + +- `profile_plan["valid"]`, `graph_import["valid"]`, `activation["valid"]`, and + `package["valid"]` are true. +- `lifecycle["data"]["dry_run_actions"]` contains the planned refresh action. +- `audit["count"]` is at least 1 and `audit["retention"]` declares the active + audit sink retention mode. +- `health["ok"]` is true. 
+ +## Review-Gated Apply + +Lifecycle actions that require review are denied until an approval marker or +matching review record is supplied: + +```python +denied = runner.handle("lifecycle.apply", {"actions": lifecycle["data"]["dry_run_actions"]}) +approved = runner.handle( + "lifecycle.apply", + { + "actions": lifecycle["data"]["dry_run_actions"], + "approval_marker": "review:operator-approved", + }, +) +``` + +Use `audit.query` with `{"operation": "lifecycle.apply", "dry_run": False}` to +trace denied and approved apply attempts. + +## Persistence Repair Drill + +File-backed operation is configured through a profile or explicit +`RuntimeConfig`: + +```python +from phase_memory import RuntimeConfig, LocalServiceRunner + +config = RuntimeConfig.from_profile(profile, local_store_path=".phase-memory-local") +runner = LocalServiceRunner(config=config) +repair = runner.runtime.repair_diagnostics(source_ref=config.local_store_path) +``` + +Repair diagnostics distinguish: + +- `store_migration_required` for old or missing local-store schema metadata. +- `planned_store_migrations` when metadata declares pending migrations. +- `corrupt_store_record` for unreadable node, edge, or path JSON. +- `missing_edge_source` / `missing_edge_target` for graph reference damage. +- `orphaned_path_event` when paths reference absent event-log records. + +## Adapter Pack Compatibility + +Fake and future live adapter packs should publish a manifest with: + +- declared capabilities; +- ownership boundaries for every adapter; +- required conformance helpers. + +Validate a pack before wiring it into the runtime: + +```python +from phase_memory import fake_external_adapter_pack, validate_adapter_pack_manifest + +diagnostics = validate_adapter_pack_manifest(fake_external_adapter_pack()) +assert diagnostics == () +``` + +Missing capabilities are reported as `missing_adapter_capability` diagnostics +with the adapter and capability names attached. 
+ +## API Compatibility Expectations + +The stable embedding surface is: + +- `PhaseMemoryRuntime` methods and JSON-serializable envelopes. +- `LocalServiceRunner.handle(operation, payload)` for every operation in + `service_contracts()["operations"]`. +- `RuntimeConfig` and `resolve_runtime_adapters` for local/external adapter + resolution. +- Adapter conformance helpers in `phase_memory.service`. +- External adapter pack manifests and validation helpers. + +New public operations should be added to the service contract first, then to +the local runner, runtime tests, and docs in the same change. diff --git a/src/phase_memory/__init__.py b/src/phase_memory/__init__.py index c0a410b..5653b0e 100644 --- a/src/phase_memory/__init__.py +++ b/src/phase_memory/__init__.py @@ -11,6 +11,7 @@ from .bridge import ( ) from .contracts import graph_from_markitect, profile_from_markitect from .external_adapters import ( + ADAPTER_PACK_MANIFEST_SCHEMA, ExternalAdapterPack, FakeExternalEventLog, FakeExternalGraphStore, @@ -19,8 +20,10 @@ from .external_adapters import ( FakeKontextualRuntimeRegistry, FakeMarkitectPackageCompiler, FakeTelemetryAuditSink, + adapter_pack_manifest, fake_external_adapter_pack, fake_external_runtime_config, + validate_adapter_pack_manifest, ) from .lifecycle import ( LifecycleRuleConfig, @@ -63,12 +66,13 @@ from .retrieval import ( retrieve_graph_neighborhood, select_event_path, ) -from .service import RuntimeAdapterBundle, RuntimeConfig, health_report, resolve_runtime_adapters, runtime_from_config, service_contracts +from .service import LocalServiceRunner, RuntimeAdapterBundle, RuntimeConfig, health_report, resolve_runtime_adapters, runtime_from_config, service_contracts from .planner import plan_profile_execution from .runtime import PhaseMemoryRuntime __all__ = [ "ActivationPlan", + "ADAPTER_PACK_MANIFEST_SCHEMA", "Diagnostic", "ExternalAdapterPack", "FakeExternalEventLog", @@ -123,6 +127,8 @@ __all__ = [ "profile_from_markitect", 
"fake_external_adapter_pack", "fake_external_runtime_config", + "adapter_pack_manifest", + "validate_adapter_pack_manifest", "path_event", "package_request_from_selection", "package_response_envelope", @@ -132,6 +138,7 @@ __all__ = [ "retrieve_graph_neighborhood", "select_event_path", "RuntimeConfig", + "LocalServiceRunner", "RuntimeAdapterBundle", "health_report", "resolve_runtime_adapters", diff --git a/src/phase_memory/adapters.py b/src/phase_memory/adapters.py index 5c22476..e74b198 100644 --- a/src/phase_memory/adapters.py +++ b/src/phase_memory/adapters.py @@ -9,6 +9,7 @@ from typing import Any from .models import Diagnostic, MemoryEdge, MemoryEvent, MemoryGraph, MemoryNode, MemoryPath, PolicyDecision, ProfileIntent LOCAL_STORE_SCHEMA = "phase_memory.local_store.v1" +LOCAL_STORE_METADATA_FILE = "phase-memory.json" class InMemoryMemoryGraphStore: @@ -141,27 +142,99 @@ class FileBackedMemoryGraphStore: metadata={"store_schema_version": LOCAL_STORE_SCHEMA, "store_path": str(self.root)}, ) + def metadata(self) -> dict[str, Any]: + return _read_json(self.root / LOCAL_STORE_METADATA_FILE) + def repair_diagnostics(self, *, events: list[MemoryEvent] | None = None) -> tuple[Diagnostic, ...]: diagnostics: list[Diagnostic] = [] - node_ids = {node.node_id for node in self.list_nodes()} + nodes, node_diagnostics = _read_records(self.nodes_dir, MemoryNode.from_mapping, record_type="node") + edges, edge_diagnostics = _read_records(self.edges_dir, MemoryEdge.from_mapping, record_type="edge") + paths, path_diagnostics = _read_records(self.paths_dir, MemoryPath.from_mapping, record_type="path") + diagnostics.extend(self.metadata_diagnostics()) + diagnostics.extend(node_diagnostics) + diagnostics.extend(edge_diagnostics) + diagnostics.extend(path_diagnostics) + + node_ids = {node.node_id for node in nodes} event_ids = {event.event_id for event in events or ()} - for edge in self.list_edges(): + for edge in edges: if edge.source not in node_ids: 
diagnostics.append(Diagnostic("error", "missing_edge_source", "Edge source does not reference a node.", edge.edge_id, {"source": edge.source})) if edge.target not in node_ids: diagnostics.append(Diagnostic("error", "missing_edge_target", "Edge target does not reference a node.", edge.edge_id, {"target": edge.target})) - for path in self.list_paths(): + for path in paths: for event_id in path.event_ids: if event_id not in event_ids: diagnostics.append(Diagnostic("warn", "orphaned_path_event", "Path references an event not present in the event log.", path.path_id, {"event_id": event_id})) return tuple(diagnostics) + def metadata_diagnostics(self) -> tuple[Diagnostic, ...]: + metadata_path = self.root / LOCAL_STORE_METADATA_FILE + if not metadata_path.exists(): + return ( + Diagnostic( + "error", + "missing_store_metadata", + "Local store metadata file is missing.", + str(metadata_path), + {"expected_schema_version": LOCAL_STORE_SCHEMA}, + ), + ) + try: + metadata = _read_json(metadata_path) + except json.JSONDecodeError as exc: + return ( + Diagnostic( + "error", + "corrupt_store_metadata", + "Local store metadata file is not valid JSON.", + str(metadata_path), + {"error": str(exc)}, + ), + ) + + diagnostics: list[Diagnostic] = [] + schema_version = str(metadata.get("schema_version") or "") + if not schema_version: + diagnostics.append( + Diagnostic( + "warn", + "store_migration_required", + "Local store metadata does not declare a schema version.", + str(metadata_path), + {"from_schema_version": "", "to_schema_version": LOCAL_STORE_SCHEMA}, + ) + ) + elif schema_version != LOCAL_STORE_SCHEMA: + diagnostics.append( + Diagnostic( + "warn", + "store_migration_required", + "Local store metadata declares a schema version that needs migration.", + str(metadata_path), + {"from_schema_version": schema_version, "to_schema_version": LOCAL_STORE_SCHEMA}, + ) + ) + + planned = metadata.get("planned_migrations") or metadata.get("migrations") or () + if planned: + 
diagnostics.append( + Diagnostic( + "warn", + "planned_store_migrations", + "Local store metadata declares planned migrations.", + str(metadata_path), + {"migrations": list(planned)}, + ) + ) + return tuple(diagnostics) + def _ensure_layout(self) -> None: for directory in (self.root, self.profiles_dir, self.nodes_dir, self.edges_dir, self.paths_dir, self.activations_dir): directory.mkdir(parents=True, exist_ok=True) - metadata_path = self.root / "phase-memory.json" + metadata_path = self.root / LOCAL_STORE_METADATA_FILE if not metadata_path.exists(): - _write_json(metadata_path, {"schema_version": LOCAL_STORE_SCHEMA}) + _write_json(metadata_path, {"schema_version": LOCAL_STORE_SCHEMA, "migrations": []}) class JsonlMemoryEventLog: @@ -244,6 +317,12 @@ class RecordingAuditSink: self.events.append(stored) return {"recorded": True, "index": len(self.events) - 1, "event": stored} + def query(self, **filters: Any) -> list[dict[str, Any]]: + return filter_audit_events(self.events, **filters) + + def retention_metadata(self) -> dict[str, Any]: + return {"mode": "in_memory", "retention_days": None} + class JsonlAuditSink: def __init__(self, path: str | Path) -> None: @@ -259,6 +338,20 @@ class JsonlAuditSink: index = max(sum(1 for _ in handle) - 1, 0) return {"recorded": True, "index": index, "event": stored} + def query(self, **filters: Any) -> list[dict[str, Any]]: + events: list[dict[str, Any]] = [] + for raw in self.path.read_text(encoding="utf-8").splitlines(): + if not raw.strip(): + continue + try: + events.append(json.loads(raw)) + except json.JSONDecodeError: + continue + return filter_audit_events(events, **filters) + + def retention_metadata(self) -> dict[str, Any]: + return {"mode": "jsonl", "path": str(self.path), "retention_days": None} + class InMemorySemanticIndex: def __init__(self) -> None: @@ -308,4 +401,57 @@ def _read_json(path: Path) -> dict[str, Any]: def _write_json(path: Path, data: dict[str, Any]) -> None: path.parent.mkdir(parents=True, 
exist_ok=True) - path.write_text(json.dumps(data, indent=2, sort_keys=True) + "\n", encoding="utf-8") + tmp_path = path.with_name(f".{path.name}.tmp") + tmp_path.write_text(json.dumps(data, indent=2, sort_keys=True) + "\n", encoding="utf-8") + tmp_path.replace(path) + + +def _read_records(directory: Path, factory, *, record_type: str) -> tuple[list[Any], list[Diagnostic]]: + records: list[Any] = [] + diagnostics: list[Diagnostic] = [] + for path in sorted(directory.glob("*.json")): + try: + records.append(factory(_read_json(path))) + except (json.JSONDecodeError, ValueError, TypeError, KeyError) as exc: + diagnostics.append( + Diagnostic( + "error", + "corrupt_store_record", + "Local store record could not be decoded.", + str(path), + {"record_type": record_type, "error": str(exc)}, + ) + ) + return records, diagnostics + + +def filter_audit_events(events: list[dict[str, Any]], **filters: Any) -> list[dict[str, Any]]: + return [dict(event) for event in events if _audit_event_matches(event, filters)] + + +def _audit_event_matches(event: dict[str, Any], filters: dict[str, Any]) -> bool: + operation = filters.get("operation") + if operation is not None and event.get("operation") != operation: + return False + operation_id = filters.get("operation_id") + if operation_id is not None and event.get("operation_id") != operation_id: + return False + subject_kind = filters.get("subject_kind") + if subject_kind is not None and dict(event.get("subject") or {}).get("kind") != subject_kind: + return False + subject_id = filters.get("subject_id") + if subject_id is not None and dict(event.get("subject") or {}).get("id") != subject_id: + return False + source_ref = filters.get("source_ref") + if source_ref is not None and dict(event.get("source") or {}).get("ref") != source_ref: + return False + actor = filters.get("actor") + if actor is not None and event.get("actor") != actor: + return False + dry_run = filters.get("dry_run") + if dry_run is not None and 
bool(event.get("dry_run")) is not bool(dry_run): + return False + allowed = filters.get("allowed") + if allowed is not None and bool(event.get("allowed")) is not bool(allowed): + return False + return True diff --git a/src/phase_memory/external_adapters.py b/src/phase_memory/external_adapters.py index eadbe35..f9a255c 100644 --- a/src/phase_memory/external_adapters.py +++ b/src/phase_memory/external_adapters.py @@ -5,17 +5,48 @@ from __future__ import annotations from dataclasses import dataclass, field from typing import Any -from .adapters import InMemoryMemoryEventLog, InMemoryMemoryGraphStore, InMemorySemanticIndex -from .models import PolicyDecision +from .adapters import InMemoryMemoryEventLog, InMemoryMemoryGraphStore, InMemorySemanticIndex, filter_audit_events +from .models import Diagnostic, PolicyDecision from .service import RuntimeConfig from .utils import stable_digest +ADAPTER_PACK_MANIFEST_SCHEMA = "phase_memory.adapter_pack.manifest.v1" +ADAPTER_PACK_REQUIRED_ADAPTERS = ( + "graph_store", + "event_log", + "policy_gateway", + "audit_sink", + "package_compiler", + "semantic_index", + "runtime_registry", +) +ADAPTER_CONFORMANCE_HELPERS = { + "graph_store": "assert_graph_store_conformance", + "event_log": "assert_event_log_conformance", + "policy_gateway": "assert_policy_gateway_conformance", + "audit_sink": "assert_audit_sink_conformance", + "package_compiler": "assert_context_compiler_conformance", + "semantic_index": "assert_semantic_index_conformance", + "runtime_registry": "assert_runtime_registry_conformance", +} +ADAPTER_REQUIRED_CAPABILITIES = { + "graph_store": ("kontextual.graph-store.fake",), + "event_log": ("kontextual.event-log.fake",), + "policy_gateway": ("policy.gateway.fake",), + "audit_sink": ("telemetry.audit.fake",), + "package_compiler": ("markitect.package.compile",), + "semantic_index": ("semantic-index.fake",), + "runtime_registry": ("kontextual.runtime.registry",), +} + @dataclass(frozen=True) class ExternalAdapterPack: name: 
str adapters: dict[str, Any] capabilities: tuple[str, ...] = () + ownership_boundaries: dict[str, str] = field(default_factory=dict) + required_conformance: dict[str, str] = field(default_factory=dict) metadata: dict[str, Any] = field(default_factory=dict) def to_dict(self) -> dict[str, Any]: @@ -23,9 +54,14 @@ class ExternalAdapterPack: "name": self.name, "adapters": {key: value.__class__.__name__ for key, value in sorted(self.adapters.items())}, "capabilities": list(self.capabilities), + "ownership_boundaries": dict(self.ownership_boundaries), + "required_conformance": dict(self.required_conformance), "metadata": dict(self.metadata), } + def manifest(self) -> dict[str, Any]: + return adapter_pack_manifest(self) + class FakeExternalGraphStore(InMemoryMemoryGraphStore): """Kontextual-shaped graph store fake backed by deterministic memory.""" @@ -92,10 +128,14 @@ class FakeTelemetryAuditSink: "event": stored, } - def query(self, *, operation: str | None = None) -> list[dict[str, Any]]: - if operation is None: - return list(self.events) - return [event for event in self.events if event.get("operation") == operation] + def query(self, **filters: Any) -> list[dict[str, Any]]: + return filter_audit_events(self.events, **filters) + + def retention_metadata(self) -> dict[str, Any]: + return { + "mode": "fake_telemetry", + "retention_days": self.retention_days, + } class FakeKontextualRuntimeRegistry: @@ -133,9 +173,21 @@ def fake_external_adapter_pack() -> ExternalAdapterPack: "markitect.package.compile", "kontextual.runtime.registry", "kontextual.graph-store.fake", + "kontextual.event-log.fake", + "policy.gateway.fake", "telemetry.audit.fake", "semantic-index.fake", ), + ownership_boundaries={ + "graph_store": "kontextual owns durable graph records; phase-memory owns graph semantics", + "event_log": "kontextual owns event durability; phase-memory owns event shape", + "policy_gateway": "phase-memory owns policy decision contract; external pack owns gateway 
implementation", + "audit_sink": "external telemetry owns retention; phase-memory owns audit event shape", + "package_compiler": "markitect owns package compilation; phase-memory owns selection planning", + "semantic_index": "external retrieval owns index mechanics; phase-memory owns activation policy", + "runtime_registry": "kontextual owns envelope registry; phase-memory owns envelope contract", + }, + required_conformance=dict(ADAPTER_CONFORMANCE_HELPERS), metadata={"intended_for": "local conformance and integration tests"}, ) @@ -158,3 +210,70 @@ def fake_external_runtime_config() -> RuntimeConfig: runtime_registry_mode="external", trust_zone_labels=("external",), ) + + +def adapter_pack_manifest(pack: ExternalAdapterPack) -> dict[str, Any]: + return { + "schema_version": ADAPTER_PACK_MANIFEST_SCHEMA, + "name": pack.name, + "capabilities": sorted(pack.capabilities), + "metadata": dict(pack.metadata), + "adapters": { + key: { + "class": pack.adapters[key].__class__.__name__, + "ownership": pack.ownership_boundaries.get(key, ""), + "required_capabilities": list(ADAPTER_REQUIRED_CAPABILITIES.get(key, ())), + "required_conformance": pack.required_conformance.get(key, ADAPTER_CONFORMANCE_HELPERS.get(key, "")), + } + for key in sorted(pack.adapters) + }, + } + + +def validate_adapter_pack_manifest(pack: ExternalAdapterPack) -> tuple[Diagnostic, ...]: + diagnostics: list[Diagnostic] = [] + capabilities = set(pack.capabilities) + for adapter in ADAPTER_PACK_REQUIRED_ADAPTERS: + if adapter not in pack.adapters: + diagnostics.append( + Diagnostic( + "error", + "missing_adapter", + "Adapter pack is missing a required adapter.", + adapter, + {"adapter": adapter}, + ) + ) + continue + if not pack.ownership_boundaries.get(adapter): + diagnostics.append( + Diagnostic( + "warn", + "missing_adapter_ownership", + "Adapter pack does not declare an ownership boundary for this adapter.", + adapter, + {"adapter": adapter}, + ) + ) + if not pack.required_conformance.get(adapter): + 
diagnostics.append( + Diagnostic( + "warn", + "missing_conformance_helper", + "Adapter pack does not declare the conformance helper required for this adapter.", + adapter, + {"adapter": adapter}, + ) + ) + for capability in ADAPTER_REQUIRED_CAPABILITIES.get(adapter, ()): + if capability not in capabilities: + diagnostics.append( + Diagnostic( + "error", + "missing_adapter_capability", + "Adapter pack does not declare a capability required by an adapter.", + adapter, + {"adapter": adapter, "capability": capability}, + ) + ) + return tuple(diagnostics) diff --git a/src/phase_memory/policy.py b/src/phase_memory/policy.py index 2883d94..796ab2b 100644 --- a/src/phase_memory/policy.py +++ b/src/phase_memory/policy.py @@ -21,6 +21,7 @@ class MemoryOperation(str, Enum): LIFECYCLE_PLAN = "graph.lifecycle.plan" ACTIVATION_PLAN = "graph.activation.plan" PACKAGE_COMPILE = "package.compile" + AUDIT_QUERY = "audit.query" LIFECYCLE_APPLY = "lifecycle.apply" STABILIZATION = "memory.stabilize" COMPACTION = "memory.compact" diff --git a/src/phase_memory/ports.py b/src/phase_memory/ports.py index d6b1a3d..b2b088f 100644 --- a/src/phase_memory/ports.py +++ b/src/phase_memory/ports.py @@ -39,6 +39,7 @@ class PolicyGateway(Protocol): class AuditSink(Protocol): def record(self, event: dict[str, Any]) -> dict[str, Any]: ... + def query(self, **filters: Any) -> list[dict[str, Any]]: ... 
class RuntimeRegistry(Protocol): diff --git a/src/phase_memory/runtime.py b/src/phase_memory/runtime.py index e2e1805..927e298 100644 --- a/src/phase_memory/runtime.py +++ b/src/phase_memory/runtime.py @@ -14,6 +14,7 @@ from .adapters import ( InMemoryMemoryGraphStore, NoopContextPackageCompiler, RecordingAuditSink, + filter_audit_events, ) from .bridge import MARKITECT_PACKAGE_REQUEST_SCHEMA, package_request_from_selection, package_response_envelope from .contracts import ContractIngressResult, graph_from_markitect, profile_from_markitect @@ -36,6 +37,7 @@ from .ports import AuditSink, ContextPackageCompiler, MemoryEventLog, MemoryGrap from .utils import compact_dict, stable_digest, to_plain RUNTIME_ENVELOPE_SCHEMA = "phase_memory.runtime.envelope.v1" +AUDIT_QUERY_SCHEMA = "phase_memory.audit.query.v1" PACKAGE_REQUEST_SCHEMA = MARKITECT_PACKAGE_REQUEST_SCHEMA @@ -263,6 +265,41 @@ class PhaseMemoryRuntime: data={"package_request": request, "package_response": package_response_envelope(response, request_id=request["id"])}, ) + def query_audit(self, filters: dict[str, Any] | None = None, *, source_ref: str = "audit") -> dict[str, Any]: + filters = _audit_filters(filters or {}) + policy = self.policy_gateway.authorize( + action="audit.query", + resource="audit:events", + context={"source_ref": source_ref, "dry_run": True, "filters": filters}, + ) + events, diagnostics = _query_audit_sink(self.audit_sink, filters) if policy.allowed else ([], ()) + operation_id = f"op:{stable_digest(['audit.query', source_ref, filters])}" + audit = self.audit_sink.record( + audit_event( + operation_id=operation_id, + operation="audit.query", + subject={"kind": "audit_events", "id": stable_digest(filters)}, + policy_decision=policy, + dry_run=True, + source_ref=source_ref, + ) + ) + return { + "schema_version": AUDIT_QUERY_SCHEMA, + "operation_id": operation_id, + "operation": "audit.query", + "dry_run": True, + "valid": policy.allowed and not any(diagnostic.severity == "error" for 
diagnostic in diagnostics), + "filters": filters, + "count": len(events), + "events": events, + "retention": _audit_retention_metadata(self.audit_sink), + "policy_decision": _policy_to_dict(policy), + "audit_receipt": audit, + "diagnostics": [diagnostic.to_dict() for diagnostic in diagnostics], + "source": {"ref": source_ref}, + } + def export_graph(self, *, graph_id: str = "local", source_ref: str = "local-store") -> dict[str, Any]: events = self.event_log.list_events() if hasattr(self.graph_store, "export_graph"): @@ -450,6 +487,55 @@ def _policy_to_dict(decision: PolicyDecision) -> dict[str, Any]: return decision.to_dict() if hasattr(decision, "to_dict") else to_plain(decision) +def _audit_filters(filters: dict[str, Any]) -> dict[str, Any]: + allowed_keys = { + "operation", + "operation_id", + "subject_kind", + "subject_id", + "source_ref", + "actor", + "dry_run", + "allowed", + } + return {key: filters[key] for key in sorted(allowed_keys) if key in filters and filters[key] is not None} + + +def _query_audit_sink(sink: AuditSink, filters: dict[str, Any]) -> tuple[list[dict[str, Any]], tuple[Diagnostic, ...]]: + if hasattr(sink, "query"): + try: + return list(sink.query(**filters)), () + except TypeError: + try: + events = sink.query(operation=filters.get("operation")) + except TypeError: + events = sink.query() + return filter_audit_events(list(events), **filters), () + if hasattr(sink, "events"): + return filter_audit_events(list(getattr(sink, "events")), **filters), () + return ( + [], + ( + Diagnostic( + "error", + "audit_query_unsupported", + "Audit sink does not expose queryable audit records.", + sink.__class__.__name__, + ), + ), + ) + + +def _audit_retention_metadata(sink: AuditSink) -> dict[str, Any]: + if hasattr(sink, "retention_metadata"): + return dict(sink.retention_metadata()) + retention_days = getattr(sink, "retention_days", None) + return { + "mode": sink.__class__.__name__, + "retention_days": retention_days, + } + + def _coerce_action(data: 
LifecycleAction | dict[str, Any]) -> LifecycleAction: if isinstance(data, LifecycleAction): return data diff --git a/src/phase_memory/service.py b/src/phase_memory/service.py index 76fe564..2ede028 100644 --- a/src/phase_memory/service.py +++ b/src/phase_memory/service.py @@ -379,6 +379,26 @@ class LocalServiceRunner: max_items=int(budget["max_items"]), max_tokens=int(budget["max_tokens"]), profile_id=payload.get("profile_id"), + priority_node_ids=tuple(payload.get("priority_node_ids") or ()), + include_events=bool(payload.get("include_events", True)), + policy_context=dict(payload.get("policy_context") or {}), + ) + if operation == "package.compile": + return self.runtime.compile_package( + payload["selection"], + source_ref=payload.get("source_ref", "service"), + ) + if operation == "lifecycle.apply": + return self.runtime.apply_lifecycle_actions( + payload["actions"], + approval_marker=str(payload.get("approval_marker") or ""), + review_record=payload.get("review_record"), + source_ref=payload.get("source_ref", "service"), + ) + if operation == "audit.query": + return self.runtime.query_audit( + dict(payload.get("filters") or {}), + source_ref=payload.get("source_ref", "service"), ) raise ValueError(f"Unsupported service operation: {operation}") @@ -433,6 +453,7 @@ def assert_policy_gateway_conformance(gateway) -> None: def assert_audit_sink_conformance(sink) -> None: receipt = sink.record({"operation": "conformance"}) assert receipt.get("recorded") is True + assert sink.query(operation="conformance")[0]["operation"] == "conformance" def assert_semantic_index_conformance(index) -> None: diff --git a/tests/fixtures/evaluation-scenarios.json b/tests/fixtures/evaluation-scenarios.json new file mode 100644 index 0000000..926297a --- /dev/null +++ b/tests/fixtures/evaluation-scenarios.json @@ -0,0 +1,170 @@ +{ + "schema_version": "phase_memory.evaluation_scenarios.v1", + "scenarios": [ + { + "id": "policy-denied-activation", + "profile": { + "schema_version": 
"markitect.memory.profile.v1", + "id": "eval-policy-profile", + "memory_kinds": ["knowledge", "decision"], + "activation": {"max_items": 4, "max_tokens": 60}, + "policy": {"mode": "allow-all", "trust_zone_labels": ["local"]}, + "observability": {"audit_sink": "recording"} + }, + "graph": { + "schema_version": "markitect.memory.graph.v1", + "id": "eval-policy-graph", + "nodes": [ + { + "id": "policy.public", + "kind": "knowledge", + "text": "Public operating constraint that can be activated for local planning.", + "phase": "stabilized", + "policy": {"labels": ["public"], "trust_zone": "local"}, + "source_spans": [{"path": "policy.md", "line_start": 1}], + "metadata": {"graph_id": "eval-policy-graph"} + }, + { + "id": "policy.secret", + "kind": "knowledge", + "text": "Sensitive credential note that must not enter restart context.", + "phase": "stabilized", + "policy": {"labels": ["restricted"], "trust_zone": "local", "secret": true}, + "metadata": {"graph_id": "eval-policy-graph"} + } + ], + "edges": [ + { + "id": "edge.policy", + "kind": "references", + "source": "policy.public", + "target": "policy.secret" + } + ], + "events": [] + }, + "expect": {"denied_node_ids": ["policy.secret"]} + }, + { + "id": "profile-lifecycle-rules", + "profile": { + "schema_version": "markitect.memory.profile.v1", + "id": "eval-lifecycle-profile", + "memory_kinds": ["episode", "decision"], + "retention": { + "episode": {"stale_after_days": 7}, + "decision": {"delete_after_days": 365} + }, + "refresh": {"mode": "enabled"}, + "compaction": {"node_ids": ["life.old-episode"]}, + "metadata": { + "phase_transitions": [ + { + "node_kind": "decision", + "from_phase": "fluid", + "to_phase": "stabilized", + "min_age_days": 2, + "reason": "decision has stabilized" + } + ] + } + }, + "graph": { + "schema_version": "markitect.memory.graph.v1", + "id": "eval-lifecycle-graph", + "nodes": [ + { + "id": "life.old-episode", + "kind": "episode", + "text": "An old episode ready to become stale and 
compacted.", + "phase": "fluid", + "freshness": {"updated_at": "2026-04-01T00:00:00+00:00", "source_digest": "old"}, + "metadata": {"graph_id": "eval-lifecycle-graph"} + }, + { + "id": "life.decision", + "kind": "decision", + "text": "A decision that should transition to stabilized after review.", + "phase": "fluid", + "freshness": {"updated_at": "2026-05-01T00:00:00+00:00", "source_digest": "decision-old"}, + "metadata": {"graph_id": "eval-lifecycle-graph"} + } + ], + "edges": [], + "events": [] + }, + "expect": { + "actions": [ + ["life.old-episode", "mark_stale"], + ["life.decision", "transition_phase"], + ["life.decision", "refresh"] + ], + "compact_source": "life.old-episode" + } + }, + { + "id": "budget-path-and-semantic-hints", + "profile": { + "schema_version": "markitect.memory.profile.v1", + "id": "eval-budget-profile", + "memory_kinds": ["decision", "knowledge", "episode"], + "activation": {"max_items": 2, "max_tokens": 16, "semantic_index": "memory"} + }, + "graph": { + "schema_version": "markitect.memory.graph.v1", + "id": "eval-budget-graph", + "nodes": [ + { + "id": "budget.anchor", + "kind": "decision", + "text": "Restart anchor with source.", + "phase": "stabilized", + "source_spans": [{"path": "restart.md", "line_start": 3}], + "metadata": {"graph_id": "eval-budget-graph"} + }, + { + "id": "budget.semantic", + "kind": "knowledge", + "text": "Semantic index hint for restart package selection.", + "phase": "stabilized", + "source_spans": [{"path": "retrieval.md", "line_start": 7}], + "metadata": {"graph_id": "eval-budget-graph"} + }, + { + "id": "budget.long", + "kind": "episode", + "text": "This verbose episode is intentionally long enough to lose against the strict activation token budget pressure.", + "phase": "fluid", + "metadata": {"graph_id": "eval-budget-graph"} + } + ], + "edges": [ + { + "id": "edge.budget", + "kind": "supports", + "source": "budget.anchor", + "target": "budget.semantic" + } + ], + "events": [ + { + "id": 
"budget.path-event", + "kind": "activated", + "timestamp": "2026-05-18T00:00:00+00:00", + "activation_refs": ["activation.budget"] + } + ] + }, + "path": { + "id": "path.budget", + "event_ids": ["budget.path-event"] + }, + "expect": { + "selected_node_ids": ["budget.anchor", "budget.semantic"], + "omitted_node_ids": ["budget.long"], + "semantic_top_id": "budget.semantic", + "event_ids": ["budget.path-event"] + } + } + ] +} diff --git a/tests/test_evaluation_scenarios.py b/tests/test_evaluation_scenarios.py new file mode 100644 index 0000000..13115f8 --- /dev/null +++ b/tests/test_evaluation_scenarios.py @@ -0,0 +1,101 @@ +import json +from datetime import datetime, timezone +from pathlib import Path + +from phase_memory.adapters import InMemorySemanticIndex +from phase_memory.contracts import graph_from_markitect +from phase_memory.models import ActivationPlan, MemoryPath +from phase_memory.retrieval import activation_quality_report, select_event_path +from phase_memory.runtime import PhaseMemoryRuntime + + +FIXTURES = Path(__file__).parent / "fixtures" + + +def _scenarios(): + data = json.loads((FIXTURES / "evaluation-scenarios.json").read_text(encoding="utf-8")) + return {scenario["id"]: scenario for scenario in data["scenarios"]} + + +def test_policy_denied_activation_scenario_is_redacted_and_audited() -> None: + scenario = _scenarios()["policy-denied-activation"] + runtime = PhaseMemoryRuntime() + + response = runtime.plan_activation( + scenario["graph"], + max_items=4, + max_tokens=60, + profile_id=scenario["profile"]["id"], + policy_context={"denied_labels": ["restricted"], "secrets_allowed": False, "trust_zone": "local"}, + ) + audit = runtime.query_audit({"operation": "graph.activation.plan"}) + + denied_ids = [item["id"] for item in response["data"]["policy_denials"]] + assert response["valid"] is True + assert denied_ids == scenario["expect"]["denied_node_ids"] + assert response["data"]["policy_denials"][0]["text"] == "[REDACTED]" + assert 
[diagnostic["code"] for diagnostic in response["diagnostics"]] == ["activation_policy_denied"] + assert audit["count"] == 1 + + +def test_profile_lifecycle_rules_scenario_emits_expected_actions() -> None: + scenario = _scenarios()["profile-lifecycle-rules"] + runtime = PhaseMemoryRuntime() + + response = runtime.plan_lifecycle_with_profile( + scenario["profile"], + scenario["graph"], + refresh_digests={"life.decision": "decision-new"}, + now=datetime(2026, 5, 18, tzinfo=timezone.utc), + ) + + actions = [(action["target_id"], action["action"]) for action in response["data"]["dry_run_actions"]] + compact_actions = [action for action in response["data"]["dry_run_actions"] if action["action"] == "compact"] + assert response["valid"] is True + for expected in scenario["expect"]["actions"]: + assert tuple(expected) in actions + assert compact_actions[0]["metadata"]["source_node_ids"] == [scenario["expect"]["compact_source"]] + + +def test_budget_path_and_semantic_hint_scenario_meets_quality_thresholds() -> None: + scenario = _scenarios()["budget-path-and-semantic-hints"] + graph = graph_from_markitect(scenario["graph"]).value + runtime = PhaseMemoryRuntime() + index = InMemorySemanticIndex() + + index.upsert_nodes(list(graph.nodes)) + response = runtime.plan_activation( + scenario["graph"], + max_items=scenario["profile"]["activation"]["max_items"], + max_tokens=scenario["profile"]["activation"]["max_tokens"], + profile_id=scenario["profile"]["id"], + priority_node_ids=tuple(scenario["expect"]["selected_node_ids"]), + ) + path = MemoryPath.from_mapping(scenario["path"]) + selected_path_events = select_event_path(graph.events, path, max_events=2) + semantic_results = index.query(graph_id=graph.graph_id, query="semantic restart", limit=2) + report = activation_quality_report(_activation_plan(response), expected_node_ids=tuple(scenario["expect"]["selected_node_ids"])) + + plan = response["data"]["activation_plan"] + assert plan["selected_node_ids"] == 
scenario["expect"]["selected_node_ids"] + assert [item["id"] for item in plan["omitted"]] == scenario["expect"]["omitted_node_ids"] + assert selected_path_events == tuple(scenario["expect"]["event_ids"]) + assert semantic_results[0]["id"] == scenario["expect"]["semantic_top_id"] + assert report["source_span_coverage"] == 1.0 + assert report["explanation_coverage"] == 1.0 + + +def _activation_plan(response): + data = response["data"]["activation_plan"] + return ActivationPlan( + plan_id=data["plan_id"], + graph_id=data["graph_id"], + selected_node_ids=tuple(data["selected_node_ids"]), + selected_event_ids=tuple(data["selected_event_ids"]), + omitted=tuple(data["omitted"]), + token_estimate=data["token_estimate"], + max_items=data["max_items"], + max_tokens=data["max_tokens"], + selection=response["data"]["package_request"]["selection"], + diagnostics=(), + ) diff --git a/tests/test_external_adapter_packs.py b/tests/test_external_adapter_packs.py index 10ef319..e22c752 100644 --- a/tests/test_external_adapter_packs.py +++ b/tests/test_external_adapter_packs.py @@ -1,7 +1,14 @@ import json from pathlib import Path -from phase_memory.external_adapters import fake_external_adapter_pack, fake_external_runtime_config +from phase_memory.external_adapters import ( + ADAPTER_PACK_MANIFEST_SCHEMA, + ExternalAdapterPack, + adapter_pack_manifest, + fake_external_adapter_pack, + fake_external_runtime_config, + validate_adapter_pack_manifest, +) from phase_memory.service import ( assert_audit_sink_conformance, assert_context_compiler_conformance, @@ -37,6 +44,35 @@ def test_fake_external_adapter_pack_satisfies_public_conformance_helpers() -> No assert pack.to_dict()["adapters"]["package_compiler"] == "FakeMarkitectPackageCompiler" +def test_fake_external_adapter_pack_manifest_declares_compatibility() -> None: + pack = fake_external_adapter_pack() + + manifest = adapter_pack_manifest(pack) + diagnostics = validate_adapter_pack_manifest(pack) + + assert manifest["schema_version"] 
== ADAPTER_PACK_MANIFEST_SCHEMA + assert manifest["adapters"]["package_compiler"]["required_conformance"] == "assert_context_compiler_conformance" + assert manifest["adapters"]["audit_sink"]["required_capabilities"] == ["telemetry.audit.fake"] + assert diagnostics == () + + +def test_adapter_pack_manifest_reports_missing_capabilities() -> None: + pack = fake_external_adapter_pack() + incomplete = ExternalAdapterPack( + name=pack.name, + adapters=pack.adapters, + capabilities=tuple(capability for capability in pack.capabilities if capability != "telemetry.audit.fake"), + ownership_boundaries=pack.ownership_boundaries, + required_conformance=pack.required_conformance, + metadata=pack.metadata, + ) + + diagnostics = validate_adapter_pack_manifest(incomplete) + + assert [diagnostic.code for diagnostic in diagnostics] == ["missing_adapter_capability"] + assert diagnostics[0].metadata["capability"] == "telemetry.audit.fake" + + def test_external_runtime_config_resolves_supplied_fake_pack() -> None: config = fake_external_runtime_config() pack = fake_external_adapter_pack() diff --git a/tests/test_file_backed_runtime.py b/tests/test_file_backed_runtime.py index 06f147d..9a65f0e 100644 --- a/tests/test_file_backed_runtime.py +++ b/tests/test_file_backed_runtime.py @@ -87,6 +87,44 @@ def test_repair_diagnostics_report_missing_edges_and_orphaned_path_events(tmp_pa assert [diagnostic["code"] for diagnostic in envelope["diagnostics"]] == ["missing_edge_target", "orphaned_path_event"] +def test_file_backed_store_reports_migration_needs_and_uses_atomic_json_writes(tmp_path) -> None: + store = FileBackedMemoryGraphStore(tmp_path) + metadata_path = tmp_path / "phase-memory.json" + metadata_path.write_text( + json.dumps( + { + "schema_version": "phase_memory.local_store.v0", + "planned_migrations": ["v0-to-v1"], + } + ), + encoding="utf-8", + ) + + store.save_node(MemoryNode("node.atomic", "decision", "Atomic write target")) + runtime = PhaseMemoryRuntime(graph_store=store, 
event_log=JsonlMemoryEventLog(tmp_path / "events.jsonl")) + + envelope = runtime.repair_diagnostics(source_ref=str(tmp_path)) + + codes = [diagnostic["code"] for diagnostic in envelope["diagnostics"]] + assert envelope["valid"] is True + assert "store_migration_required" in codes + assert "planned_store_migrations" in codes + assert not list(tmp_path.rglob("*.tmp")) + + +def test_repair_diagnostics_distinguish_corrupt_store_records(tmp_path) -> None: + store = FileBackedMemoryGraphStore(tmp_path) + runtime = PhaseMemoryRuntime(graph_store=store, event_log=JsonlMemoryEventLog(tmp_path / "events.jsonl")) + + (tmp_path / "nodes" / "broken.json").write_text("{not-json}\n", encoding="utf-8") + + envelope = runtime.repair_diagnostics(source_ref=str(tmp_path)) + + assert envelope["valid"] is False + assert envelope["diagnostics"][0]["code"] == "corrupt_store_record" + assert envelope["diagnostics"][0]["metadata"]["record_type"] == "node" + + def test_lifecycle_apply_requires_approval_for_reviewable_actions(tmp_path) -> None: store = FileBackedMemoryGraphStore(tmp_path) runtime = PhaseMemoryRuntime(graph_store=store, event_log=JsonlMemoryEventLog(tmp_path / "events.jsonl")) diff --git a/tests/test_service_readiness.py b/tests/test_service_readiness.py index 8072e16..fa45158 100644 --- a/tests/test_service_readiness.py +++ b/tests/test_service_readiness.py @@ -1,7 +1,8 @@ import json from pathlib import Path -from phase_memory.models import LifecycleState, MemoryNode +from phase_memory.lifecycle import plan_compaction +from phase_memory.models import LifecycleAction, LifecycleActionKind, LifecycleState, MemoryNode from phase_memory.service import ( HEALTH_REPORT_SCHEMA, KONTEXTUAL_DELEGATION_SCHEMA, @@ -76,6 +77,58 @@ def test_service_runner_handles_profile_driven_lifecycle_plan() -> None: assert ("event.restart", "refresh") in actions +def test_service_runner_handles_package_compile_and_audit_query() -> None: + runner = LocalServiceRunner() + selection = { + 
"schema_version": "markitect.memory.selection.v1", + "id": "selection.service", + "nodes": ["decision.boundary"], + "events": ["event.activation"], + } + + compiled = runner.handle("package.compile", {"selection": selection, "source_ref": "service-test"}) + audit = runner.handle("audit.query", {"filters": {"operation": "package.compile"}}) + + assert compiled["operation"] == "package.compile" + assert compiled["data"]["package_response"]["package_ref"] == "package:selection.service" + assert audit["operation"] == "audit.query" + assert audit["count"] == 1 + assert audit["events"][0]["source"]["ref"] == "service-test" + assert audit["retention"]["mode"] == "in_memory" + + +def test_service_runner_handles_review_gated_lifecycle_apply() -> None: + runner = LocalServiceRunner() + node = runner.runtime.graph_store.save_node(MemoryNode("node.review", "episode", "Review gated content")) + compact = plan_compaction([node]).to_dict() + + denied = runner.handle("lifecycle.apply", {"actions": [compact]}) + applied = runner.handle("lifecycle.apply", {"actions": [compact], "approval_marker": "review:service"}) + audit = runner.handle("audit.query", {"filters": {"operation": "lifecycle.apply", "dry_run": False}}) + + assert denied["valid"] is False + assert denied["data"]["denied"][0]["reason"] == "review_required" + assert applied["valid"] is True + assert runner.runtime.graph_store.get_node(applied["data"]["applied"][0]["target_id"]).kind == "summary" + assert audit["count"] == 2 + + +def test_service_runner_handles_non_review_lifecycle_apply() -> None: + runner = LocalServiceRunner() + runner.runtime.graph_store.save_node(MemoryNode("node.stale.service", "episode")) + action = LifecycleAction( + LifecycleActionKind.MARK_STALE, + "node.stale.service", + from_state=LifecycleState.ACTIVE, + to_state=LifecycleState.STALE, + ) + + applied = runner.handle("lifecycle.apply", {"actions": [action.to_dict()]}) + + assert applied["valid"] is True + assert 
runner.runtime.graph_store.get_node("node.stale.service").lifecycle == LifecycleState.STALE + + def test_profile_driven_runtime_config_resolves_file_backed_adapters(tmp_path) -> None: config = RuntimeConfig.from_profile( { diff --git a/workplans/PMEM-WP-0011-refinement-hardening-and-operational-readiness.md b/workplans/PMEM-WP-0011-refinement-hardening-and-operational-readiness.md index a20bdce..7e1bd34 100644 --- a/workplans/PMEM-WP-0011-refinement-hardening-and-operational-readiness.md +++ b/workplans/PMEM-WP-0011-refinement-hardening-and-operational-readiness.md @@ -4,7 +4,7 @@ type: workplan title: "Refinement Hardening And Operational Readiness" domain: markitect repo: phase-memory -status: ready +status: finished owner: codex topic_slug: phase-memory created: "2026-05-18" @@ -34,7 +34,7 @@ The repo now has: - fake external adapter packs. The refined scorecard in `docs/maturity-scorecard.md` scores the project at -**3.8 / 5** overall, with stronger local integration maturity than operational +**4.0 / 5** overall, with stronger local integration maturity than operational maturity. ## Non-Goals @@ -48,7 +48,7 @@ maturity. 
 ```task
 id: PMEM-WP-0011-T01
-status: todo
+status: done
 priority: high
 state_hub_task_id: "2b3c6eb4-8d3f-4c73-ab53-74e1bed8b93f"
 ```
@@ -68,7 +68,7 @@ Acceptance:
 
 ```task
 id: PMEM-WP-0011-T02
-status: todo
+status: done
 priority: high
 state_hub_task_id: "2c19cfb0-e147-40b8-b964-6c617bddb90e"
 ```
@@ -86,7 +86,7 @@ Acceptance:
 
 ```task
 id: PMEM-WP-0011-T03
-status: todo
+status: done
 priority: high
 state_hub_task_id: "cdce1c6a-4581-4184-87c6-f7bec6c3fcbd"
 ```
@@ -105,7 +105,7 @@ Acceptance:
 
 ```task
 id: PMEM-WP-0011-T04
-status: todo
+status: done
 priority: medium
 state_hub_task_id: "602c22bb-d440-4d38-a51f-bf6ed504fd1e"
 ```
@@ -123,7 +123,7 @@ Acceptance:
 
 ```task
 id: PMEM-WP-0011-T05
-status: todo
+status: done
 priority: medium
 state_hub_task_id: "c4fa6001-b20c-4ec1-b885-af9b80c832de"
 ```
@@ -140,7 +140,7 @@ Acceptance:
 
 ```task
 id: PMEM-WP-0011-T06
-status: todo
+status: done
 priority: medium
 state_hub_task_id: "f4674eaf-cbc1-4eac-b1d1-b07ae51289cf"
 ```
@@ -162,4 +162,24 @@ Acceptance:
 
 ## Closure Review
 
-Pending implementation.
+Completed on 2026-05-18.
+
+Implemented:
+
+- Full `LocalServiceRunner` handling for every `SERVICE_OPERATIONS` entry.
+- Runtime and service audit queries with queryable recording/JSONL/fake audit
+  sinks and retention metadata.
+- Review-gated `lifecycle.apply` and `package.compile` service coverage.
+- Atomic JSON writes for file-backed store records plus metadata migration,
+  planned-migration, corrupt-record, missing-reference, and orphaned-path
+  diagnostics.
+- Three evaluation scenario families covering policy-denied activation,
+  profile lifecycle rules, event-path activation, semantic-index hints, and
+  budget pressure.
+- Adapter pack compatibility manifests and explicit missing-capability
+  diagnostics.
+- Operational readiness docs and scorecard update from 3.8 to 4.0.
+
+Verification:
+
+- `python3 -m pytest` passes with 70 tests.
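Reviewer note on the "Atomic JSON writes" item in the closure review above: the store implementation is not part of the hunks shown here, so the following is only a minimal sketch of one common way to get that behavior, assuming a temp-file-then-rename approach. The helper name `write_json_atomic` and the `.tmp` suffix are illustrative assumptions, not names taken from this patch.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any


def write_json_atomic(path: Path, payload: dict[str, Any]) -> None:
    """Hypothetical helper: write JSON so readers never observe a partial file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file next to the target so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle, sort_keys=True)
            handle.flush()
            os.fsync(handle.fileno())
        # os.replace swaps the finished file into place in a single step.
        os.replace(tmp_name, path)
    except BaseException:
        # On failure, remove the temp file so no *.tmp debris is left behind.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

A write of this shape would be consistent with the `not list(tmp_path.rglob("*.tmp"))` assertion in `tests/test_file_backed_runtime.py`, though the actual store may achieve it differently.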
diff --git a/workplans/PMEM-WP-0012-live-adapter-and-service-binding-readiness.md b/workplans/PMEM-WP-0012-live-adapter-and-service-binding-readiness.md
new file mode 100644
index 0000000..1d12a8c
--- /dev/null
+++ b/workplans/PMEM-WP-0012-live-adapter-and-service-binding-readiness.md
@@ -0,0 +1,152 @@
+---
+id: PMEM-WP-0012
+type: workplan
+title: "Live Adapter And Service Binding Readiness"
+domain: markitect
+repo: phase-memory
+status: ready
+owner: codex
+topic_slug: phase-memory
+created: "2026-05-18"
+updated: "2026-05-18"
+state_hub_workstream_id: "427b91ad-9df1-4053-aeb0-54ee39b6bf62"
+---
+
+# PMEM-WP-0012: Live Adapter And Service Binding Readiness
+
+## Goal
+
+Move phase-memory from local integration readiness toward operational
+readiness by adding deployable service bindings, executable migration behavior,
+and live-shaped adapter compatibility gates while preserving the dependency-light
+local runtime.
+
+## Current Evidence
+
+`PMEM-WP-0011` brought the scorecard to **4.0 / 5** by closing local service
+runner parity, queryable audit behavior, persistence diagnostics, multi-scenario
+evaluation fixtures, adapter pack manifests, and operational recipes.
+
+## Non-Goals
+
+- Require live external credentials in default tests.
+- Make a specific web framework mandatory for library users.
+- Move Markitect or Kontextual ownership into this repo.
+- Replace deterministic fake adapters with network-dependent defaults.
+
+## T01 - Add optional service binding
+
+```task
+id: PMEM-WP-0012-T01
+status: todo
+priority: high
+state_hub_task_id: "1244aabb-b8a3-4053-8454-499e8772f5bf"
+```
+
+Add an optional framework binding or adapter shell around `LocalServiceRunner`
+for health, readiness, and operation dispatch.
+
+Acceptance:
+
+- Binding preserves the framework-neutral `LocalServiceRunner` API.
+- Health/readiness endpoints cover config diagnostics and adapter availability.
+- Tests run without starting a network listener by default.
+
+## T02 - Add executable local-store migrations
+
+```task
+id: PMEM-WP-0012-T02
+status: todo
+priority: high
+state_hub_task_id: "b8d3e7a0-c538-4d6c-b2f8-7c33b17c850a"
+```
+
+Turn migration diagnostics into explicit migration planning and apply behavior.
+
+Acceptance:
+
+- Store metadata can produce deterministic migration plans.
+- Migration apply updates metadata atomically and records an audit event.
+- Tests cover no-op, old-schema, planned-migration, and corrupt-metadata paths.
+
+## T03 - Add live-shaped adapter compatibility fixtures
+
+```task
+id: PMEM-WP-0012-T03
+status: todo
+priority: high
+state_hub_task_id: "e385af31-13f2-4be0-8fcf-89586e2d3954"
+```
+
+Add adapter fixtures that model Markitect and Kontextual live behavior behind
+the same manifest and conformance helpers used by fake packs.
+
+Acceptance:
+
+- Adapter manifest validation covers fake and live-shaped packs.
+- Capability and ownership failures remain explicit diagnostics.
+- The runtime can resolve live-shaped packs without changing local code paths.
+
+## T04 - Add audit retention and telemetry export drills
+
+```task
+id: PMEM-WP-0012-T04
+status: todo
+priority: medium
+state_hub_task_id: "d203294a-bf5a-43d0-a88c-086a3406940d"
+```
+
+Make audit retention policy and telemetry export inspectable beyond metadata.
+
+Acceptance:
+
+- Audit sinks expose retention eligibility or pruning plans.
+- Telemetry export emits deterministic local event batches.
+- Tests cover review-gated apply, policy denial, and package compile traces.
+
+## T05 - Grow evaluation threshold reporting
+
+```task
+id: PMEM-WP-0012-T05
+status: todo
+priority: medium
+state_hub_task_id: "305729e2-23ff-4043-9356-0df83f8e6d7b"
+```
+
+Promote the evaluation scenarios into a threshold report suitable for regression
+tracking.
+
+Acceptance:
+
+- Evaluation report includes policy, lifecycle, path, semantic, and budget
+  metrics.
+- Threshold assertions produce actionable diagnostics.
+- Fixture additions do not require live dependencies.
+
+## T06 - Add public API compatibility checks
+
+```task
+id: PMEM-WP-0012-T06
+status: todo
+priority: medium
+state_hub_task_id: "78f9d0d8-dc9d-4f43-a32d-92e17b3c5122"
+```
+
+Protect the embedding surface now documented as stable.
+
+Acceptance:
+
+- Public exports have a compatibility snapshot or explicit changelog gate.
+- Service operation catalog and local runner handlers stay in parity by test.
+- Docs identify how breaking changes should be handled.
+
+## Acceptance Criteria
+
+- Scorecard has concrete evidence toward the 4.3+ gate.
+- Optional operational surfaces stay optional and dependency-light by default.
+- Live-shaped adapters can be validated by the same compatibility contract as
+  fake packs.
+
+## Closure Review
+
+Pending implementation.
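Reviewer note on the T06 acceptance item "Service operation catalog and local runner handlers stay in parity by test": a parity check of that kind could be sketched roughly as below. This is an illustration under stated assumptions, not code from the repository. The function name `operation_parity` is hypothetical, and how `LocalServiceRunner` exposes the set of operations it handles is not shown in this patch, so both sides are passed in as plain iterables of operation names.

```python
from typing import Iterable


def operation_parity(catalog: Iterable[str], handled: Iterable[str]) -> dict[str, list[str]]:
    """Hypothetical check: compare declared operations with the ones a runner handles."""
    declared = set(catalog)
    implemented = set(handled)
    return {
        # Declared in the catalog but with no handler behind them.
        "missing_handlers": sorted(declared - implemented),
        # Handled by the runner but never declared in the catalog.
        "undeclared_handlers": sorted(implemented - declared),
    }


# Usage with operation names that appear in this patch.
report = operation_parity(
    ["package.compile", "lifecycle.apply", "audit.query"],
    ["package.compile", "lifecycle.apply"],
)
```

A test built on this shape would assert that both lists are empty, which gives a failure message naming the exact operation that drifted.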