diff --git a/README.md b/README.md
index f11590c..fe260fd 100644
--- a/README.md
+++ b/README.md
@@ -95,5 +95,6 @@ for package bridge boundaries, [docs/activation-quality.md](docs/activation-qual
 for retrieval and evaluation behavior, [docs/service-readiness.md](docs/service-readiness.md)
 for service and adapter contracts, [docs/lifecycle-rules.md](docs/lifecycle-rules.md)
 for profile-driven lifecycle rules, [docs/external-adapter-packs.md](docs/external-adapter-packs.md)
-for fake external integration packs, and [SCOPE.md](SCOPE.md) for repository
+for fake external integration packs, [docs/maturity-scorecard.md](docs/maturity-scorecard.md)
+for the current maturity assessment, and [SCOPE.md](SCOPE.md) for repository
 boundaries.
diff --git a/docs/maturity-scorecard.md b/docs/maturity-scorecard.md
new file mode 100644
index 0000000..223a5bf
--- /dev/null
+++ b/docs/maturity-scorecard.md
@@ -0,0 +1,113 @@
+# Phase Memory Maturity Scorecard
+
+Updated: 2026-05-18
+
+## Purpose
+
+This scorecard tracks progress toward `INTENT.md`: a profile-driven,
+phase-aware memory infrastructure layer for agentic systems.
+
+The original scorecard treated roadmap closure and fake external adapters as
+near-operational maturity. The refined scoring below is stricter: fake adapters
+prove wiring and contracts, but live durability, migration, telemetry, service
+bindings, and broader evaluation corpora are still needed before scoring close
+to 5.
+
+## Scoring Model
+
+| Score | Meaning |
+| --- | --- |
+| 0 | Not started. |
+| 1 | Intent or docs only. |
+| 2 | Deterministic local library behavior with tests. |
+| 3 | Usable runtime or CLI behavior with stable envelopes. |
+| 4 | Integration-ready local service boundary with policy, persistence, interop, and conformance coverage. |
+| 5 | Operationally mature with live adapter implementations, migrations, telemetry, retention, service bindings, and evaluation gates. |
+
+## Current Score
+
+Overall maturity: **3.8 / 5**
+
+Two sub-scores make the result easier to reason about:
+
+- Local integration maturity: **4.1 / 5**
+- Operational maturity: **3.2 / 5**
+
+The repo is strong as a deterministic local library and service-boundary core.
+It is not yet production-operational because the external adapters are fakes,
+durability semantics are basic, service bindings are framework-neutral shapes
+rather than deployable endpoints, and evaluation coverage is still narrow.
+
+## Dimension Scorecard
+
+| Dimension | Score | Target | Evidence | Needed Next |
+| --- | ---: | ---: | --- | --- |
+| Intent and boundaries | 4.4 | 5.0 | `INTENT.md`, `SCOPE.md`, `README.md`, architecture docs, adjacent-repo boundary docs | Keep docs current as live adapters and service bindings clarify real ownership. |
+| Package and API foundation | 4.2 | 4.5 | Python package, public exports, runtime facade, CLI, service config, dependency-light tests | Add API stability notes and compatibility checks for public exports. |
+| Markitect profile contract ingress | 3.7 | 4.5 | Profile loading, diagnostics, runtime envelopes, profile-derived config, local alias normalization | Add richer compatibility fixtures and schema drift diagnostics. |
+| Graph and event ingress | 3.7 | 4.5 | Graph loading, endpoint diagnostics, event model, JSONL log, export, repair checks, fake graph/event adapters | Add broader malformed/large graph fixtures and migration repair coverage. |
+| Phase domain model | 3.5 | 4.5 | Phases, lifecycle states, actions, paths, retention rules, profile-derived transition rules | Add migration semantics for profile/rule changes over durable stores. |
+| Profile execution planning | 4.0 | 4.5 | Adapter plan, capabilities, policy gates, fallback behavior, config-driven local/external resolution | Add compatibility gates for live adapter packs. |
+| Lifecycle planning and apply | 3.6 | 4.5 | Dry-run lifecycle plans, profile rules, review-gated local apply | Add service `lifecycle.apply` handling, migration semantics, and better apply audit queries. |
+| Activation planning | 3.8 | 4.8 | Budgeted activation, selections, package request, graph neighborhoods, paths, ranking, metrics | Wire semantic-index-assisted retrieval and expand evaluation corpora. |
+| Local persistence | 3.2 | 4.5 | File-backed graph store, JSONL event log, audit sink, export, repair diagnostics | Add atomic writes, schema migration, compaction/retention utilities, and stronger corruption recovery. |
+| Policy, review, and audit | 3.5 | 5.0 | Operation points, review records, audit schema, denials, redaction, fake external policy/audit adapters | Add audit query service, retention policy behavior, and live policy adapter boundary. |
+| Observability and operations | 3.3 | 4.8 | Health report, config diagnostics, adapter status, fake telemetry audit sink | Add metrics/event export, retention diagnostics, and deployable health/readiness binding. |
+| Markitect interop | 3.7 | 4.5 | Local validation, package request/response envelopes, fake compiler | Add optional live Markitect compiler adapter and contract compatibility suite. |
+| Kontextual/Infospace interop | 3.1 | 4.5 | Delegation envelope, fake runtime registry, activation quality report fixture | Add live/fake delegation scenarios and broader Infospace restart reports. |
+| Testing and evaluation | 3.8 | 4.5 | 60 deterministic tests over runtime, CLI, adapters, policy, activation, lifecycle, service, fakes | Add multi-profile/multi-graph evaluation corpus and regression thresholds. |
+| Service readiness | 3.9 | 4.8 | Service contracts, local runner, health, config, adapter conformance, fake pack | Implement missing service operations and optional framework binding. |
+| Developer experience | 3.8 | 4.5 | README, package map, CLI examples, persistence/policy/interop/service/lifecycle/fake-pack docs | Add troubleshooting, examples, and end-to-end recipes. |
+
+## Assessment
+
+The project has a credible core. The runtime envelopes, policy/review model,
+profile-derived configuration, lifecycle rules, local persistence, fake
+external pack, and conformance helpers form a solid integration boundary.
+
+The biggest optimization opportunity is not another broad feature burst. It is
+closing the gap between declared contracts and runnable operational behavior:
+the service contract advertises operations that the local runner only partly
+handles, persistence needs migration/durability semantics, and evaluation needs
+more than one small fixture family.
+
+## Recommended Refinement Workplan
+
+Create and execute `PMEM-WP-0011`: refinement hardening and operational
+readiness.
+
+Highest-value tasks:
+
+- Bring service runner parity to the published operation catalog:
+  `package.compile`, `lifecycle.apply`, and `audit.query`.
+- Add local-store schema migration and repair hardening, including atomic write
+  behavior and migration diagnostics.
+- Expand evaluation fixtures across multiple profiles, graph shapes, policies,
+  lifecycle rules, and activation budgets.
+- Add live-adapter readiness manifests so fake and future live packs can be
+  tested by the same compatibility suite.
+- Add audit query and retention semantics that make policy/audit behavior
+  inspectable after runtime operations.
+- Improve DX with troubleshooting, end-to-end recipes, and API compatibility
+  notes.
+
+## Score Movement Gates
+
+Move overall score to **4.0** when:
+
+- Service runner handles every operation in `SERVICE_OPERATIONS`.
+- Audit query and lifecycle apply are covered through service contracts.
+- Local persistence has migration diagnostics.
+- Evaluation fixtures cover at least three profile/graph families.
+
+Move overall score to **4.3+** when:
+
+- Live optional Markitect or Kontextual adapter can be used behind the same
+  conformance suite as the fake pack.
+- Operational docs include a deployable service binding or a clear embedding
+  recipe.
+
+Move overall score to **4.7+** only when:
+
+- Live adapter behavior, telemetry, audit retention, migration, and evaluation
+  gates are all exercised by repeatable tests or documented operator drills.
diff --git a/workplans/PMEM-WP-0011-refinement-hardening-and-operational-readiness.md b/workplans/PMEM-WP-0011-refinement-hardening-and-operational-readiness.md
new file mode 100644
index 0000000..a20bdce
--- /dev/null
+++ b/workplans/PMEM-WP-0011-refinement-hardening-and-operational-readiness.md
@@ -0,0 +1,165 @@
+---
+id: PMEM-WP-0011
+type: workplan
+title: "Refinement Hardening And Operational Readiness"
+domain: markitect
+repo: phase-memory
+status: ready
+owner: codex
+topic_slug: phase-memory
+created: "2026-05-18"
+updated: "2026-05-18"
+state_hub_workstream_id: "a427c05f-0ff5-49d1-b719-7dfd4f1f8571"
+---
+
+# PMEM-WP-0011: Refinement Hardening And Operational Readiness
+
+## Goal
+
+Close the gap exposed by the refined maturity scorecard: the local core is
+integration-ready, but the operational surface still needs service parity,
+durability/migration semantics, richer evaluation, and adapter compatibility
+gates.
+
+## Current Evidence
+
+The repo now has:
+
+- deterministic runtime envelopes and CLI behavior;
+- file-backed graph/event/audit adapters;
+- policy, review, and activation denials;
+- Markitect package bridge envelopes;
+- profile-derived runtime config and lifecycle rules;
+- service contracts, health checks, and adapter conformance helpers;
+- fake external adapter packs.
+
+The refined scorecard in `docs/maturity-scorecard.md` scores the project at
+**3.8 / 5** overall, with stronger local integration maturity than operational
+maturity.
+
+## Non-Goals
+
+- Build production-hosted services in this repo.
+- Add network credentials or live service dependencies to default tests.
+- Replace fake adapter packs with mandatory live adapters.
+- Expand beyond phase-memory's ownership boundary.
+
+## T01 - Bring service runner to contract parity
+
+```task
+id: PMEM-WP-0011-T01
+status: todo
+priority: high
+state_hub_task_id: "2b3c6eb4-8d3f-4c73-ab53-74e1bed8b93f"
+```
+
+Implement local service runner handling for every operation in
+`SERVICE_OPERATIONS`, especially `package.compile`, `lifecycle.apply`, and
+`audit.query`.
+
+Acceptance:
+
+- Every operation in `service_contracts()` has a local runner path or explicit
+  unsupported diagnostic.
+- Tests cover successful `package.compile`, review-gated `lifecycle.apply`,
+  and audit querying.
+
+## T02 - Harden local persistence migration and repair
+
+```task
+id: PMEM-WP-0011-T02
+status: todo
+priority: high
+state_hub_task_id: "2c19cfb0-e147-40b8-b964-6c617bddb90e"
+```
+
+Add migration and repair semantics for the local file-backed store.
+
+Acceptance:
+
+- Store metadata can declare schema versions and planned migrations.
+- Repair diagnostics distinguish corruption, missing references, and migration
+  needs.
+- Writes are made safer through atomic write behavior where practical.
+
+## T03 - Expand evaluation fixtures and gates
+
+```task
+id: PMEM-WP-0011-T03
+status: todo
+priority: high
+state_hub_task_id: "cdce1c6a-4581-4184-87c6-f7bec6c3fcbd"
+```
+
+Broaden activation/lifecycle evaluation beyond the single primary fixture
+family.
+
+Acceptance:
+
+- Add at least three profile/graph scenario families.
+- Cover policy-denied activation, profile lifecycle rules, event-path
+  activation, semantic-index hints, and budget pressure.
+- Add deterministic report fixtures or threshold assertions.
+
+## T04 - Add adapter pack compatibility manifests
+
+```task
+id: PMEM-WP-0011-T04
+status: todo
+priority: medium
+state_hub_task_id: "602c22bb-d440-4d38-a51f-bf6ed504fd1e"
+```
+
+Define manifest metadata for fake and future live adapter packs.
+
+Acceptance:
+
+- Adapter packs can declare capabilities, ownership boundaries, and required
+  conformance helpers.
+- Fake pack tests use the manifest rather than only direct class assertions.
+- Missing capability diagnostics are explicit.
+
+## T05 - Make policy/audit retention inspectable
+
+```task
+id: PMEM-WP-0011-T05
+status: todo
+priority: medium
+state_hub_task_id: "c4fa6001-b20c-4ec1-b885-af9b80c832de"
+```
+
+Add query and retention behavior around audit records.
+
+Acceptance:
+
+- Audit sink query behavior is exposed through runtime/service paths.
+- Retention metadata is visible and testable.
+- Review and denial audit records can be traced after operations.
+
+## T06 - Improve operational developer experience
+
+```task
+id: PMEM-WP-0011-T06
+status: todo
+priority: medium
+state_hub_task_id: "f4674eaf-cbc1-4eac-b1d1-b07ae51289cf"
+```
+
+Add end-to-end recipes, troubleshooting, and API compatibility notes.
+
+Acceptance:
+
+- README links to the refined maturity scorecard.
+- Docs include a local end-to-end recipe from profile/graph import to
+  lifecycle, activation, package compile, audit query, and health.
+- Public API compatibility expectations are documented.
+
+## Acceptance Criteria
+
+- Refined maturity blockers have concrete executable coverage.
+- The scorecard can move from 3.8 toward 4.0 based on behavior, not optimism.
+- StateHub has this refinement workplan as the next actionable PMEM workstream.
+
+## Closure Review
+
+Pending implementation.
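
Review note on T02: the patch asks for "atomic write behavior where practical" but does not say how the file-backed store should achieve it. One common approach, shown here only as a minimal sketch, is to write to a temporary file in the target directory and swap it into place with `os.replace`. The helper name `atomic_write_text` is hypothetical and does not come from this repository, and the sketch assumes the store writes plain text files on a local filesystem.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(path: Path, text: str) -> None:
    """Hypothetical helper: readers see the old file or the new one, never a partial write."""
    path = Path(path)
    # The temp file is created next to the target because os.replace is only
    # atomic when source and destination are on the same filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=path.parent, prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())  # flush file contents to disk before the swap
        os.replace(tmp_name, path)  # overwrite the target in a single step
    except BaseException:
        # On any failure the original file is left untouched; remove the temp file.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

This sketch does not fsync the containing directory, so it protects against torn writes rather than guaranteeing the rename survives a power loss. An append-only JSONL event log would likely need a different strategy, since replacing the whole file on every append is expensive.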