phase-memory/docs/maturity-scorecard.md

# Phase Memory Maturity Scorecard

Updated: 2026-05-19

## Purpose

This scorecard tracks progress toward `INTENT.md`: a profile-driven,
phase-aware memory infrastructure layer for agentic systems.

The original scorecard treated roadmap closure and fake external adapters as
near-operational maturity. The refined scoring below is stricter: fake adapters
prove wiring and contracts, but live durability, migration, telemetry, service
bindings, and broader evaluation corpora are still needed before scoring close
to 5.

## Scoring Model

| Score | Meaning |
| --- | --- |
| 0 | Not started. |
| 1 | Intent or docs only. |
| 2 | Deterministic local library behavior with tests. |
| 3 | Usable runtime or CLI behavior with stable envelopes. |
| 4 | Integration-ready local service boundary with policy, persistence, interop, and conformance coverage. |
| 5 | Operationally mature with live adapter implementations, migrations, telemetry, retention, service bindings, and evaluation gates. |

## Current Score

Overall maturity: **4.4 / 5**

Two sub-scores make the result easier to reason about:

- Local integration maturity: **4.7 / 5**
- Operational maturity: **4.2 / 5**

The repo is strong as a deterministic local library and service-boundary core.
It now has credential-safe operator artifacts, managed deployment manifest
validation, persisted evaluation trend histories, and a troubleshooting matrix.
It is not yet production-operational because real endpoint and managed platform
evidence still requires an approved operator environment.

## Dimension Scorecard

| Dimension | Score | Target | Evidence | Needed Next |
| --- | ---: | ---: | --- | --- |
| Intent and boundaries | 4.4 | 5.0 | `INTENT.md`, `SCOPE.md`, `README.md`, architecture docs, adjacent-repo boundary docs | Keep docs current as live adapters and service bindings clarify real ownership. |
| Package and API foundation | 4.7 | 4.8 | Python package, public exports, runtime facade, CLI, service runner export, service config, deployment/troubleshooting helpers, dependency-light tests, public API snapshot, release-note template | Add compatibility migration examples from a real release. |
| Markitect profile contract ingress | 3.7 | 4.5 | Profile loading, diagnostics, runtime envelopes, profile-derived config, local alias normalization | Add richer compatibility fixtures and schema drift diagnostics. |
| Graph and event ingress | 4.0 | 4.5 | Graph loading, endpoint diagnostics, event model, JSONL log, export, repair checks, corrupt-record diagnostics, fake and live-shaped graph/event adapters | Add broader malformed/large graph fixtures and operator repair utilities. |
| Phase domain model | 3.5 | 4.5 | Phases, lifecycle states, actions, paths, retention rules, profile-derived transition rules | Add migration semantics for profile/rule changes over durable stores. |
| Profile execution planning | 4.3 | 4.5 | Adapter plan, capabilities, policy gates, fallback behavior, config-driven local/external resolution, adapter pack manifests, live-shaped compatibility gates | Add compatibility gates for credentialed live adapter packs. |
| Lifecycle planning and apply | 4.1 | 4.5 | Dry-run lifecycle plans, profile rules, review-gated local apply, service `lifecycle.apply`, apply audit/export queries | Add richer apply rollback and repair drills. |
| Activation planning | 4.0 | 4.8 | Budgeted activation, selections, package request, graph neighborhoods, paths, ranking, metrics, multi-scenario evaluation fixtures | Wire semantic-index-assisted retrieval into runtime planning. |
| Local persistence | 4.0 | 4.5 | File-backed graph store, JSONL event log, audit sink, atomic JSON writes, executable metadata migrations, migration audit, export, repair diagnostics | Add compaction/retention utilities and stronger corruption recovery. |
| Policy, review, and audit | 4.5 | 5.0 | Operation points, review records, audit schema, queryable/exportable audit sinks, retention plans and apply, denials, redaction, fake/live-shaped policy/audit adapters, credential-safe telemetry retention drill | Add live policy adapter boundary and external telemetry pruning evidence. |
| Observability and operations | 4.5 | 4.8 | Health report, readiness report, config diagnostics, adapter status, service binding, stdlib service entrypoint, managed deployment manifest validation, operator runbook, fake/live-shaped telemetry audit sinks | Pilot the managed package in an operator deployment target. |
| Markitect interop | 4.2 | 4.5 | Local validation, package request/response envelopes, fake/live-shaped compiler fixtures, credential-gated drill contract, redacted operator reports | Add credentialed Markitect compiler execution and schema drift suite. |
| Kontextual/Infospace interop | 4.0 | 4.5 | Delegation envelope, fake/live-shaped runtime registry, credential-gated drill contract, redacted operator reports, activation quality report fixture, adapter compatibility manifests | Add credentialed Kontextual execution and broader Infospace restart reports. |
| Testing and evaluation | 4.6 | 4.7 | Deterministic tests over runtime, CLI, adapters, policy, activation, lifecycle, service, fakes, live-shaped packs, credential skip gates, API snapshots, evaluation threshold/trend reports, persisted trend history | Add larger regression corpus and make trend history a release gate. |
| Service readiness | 4.7 | 4.8 | Service contracts, full local runner parity, framework-neutral service binding, WSGI adapter, stdlib service entrypoint, health/readiness, config, adapter conformance, managed deployment manifest validation | Pilot managed deployment packaging on the target platform. |
| Developer experience | 4.6 | 4.7 | README, package map, CLI examples, persistence/policy/interop/service/lifecycle/fake-pack docs, operational recipe, operator runbook, API compatibility docs, release-note template, troubleshooting matrix | Refine troubleshooting from real operator feedback. |

## Assessment

The project has crossed the local integration-readiness threshold. The runtime
envelopes, policy/review model, profile-derived configuration, lifecycle rules,
local persistence migrations, queryable/exportable/prunable audit path, fake
and live-shaped external pack manifests, credential-gated drills, service
binding and stdlib entrypoint, API snapshots, release discipline, and
conformance helpers form a solid integration boundary.

The biggest optimization opportunity is now evidence, not scaffolding:
run the credentialed reports against real services, pilot the managed manifest
on a target platform, and make persisted trend history part of the operator
release gate.

## Completed Refinement Workplan

`PMEM-WP-0011` moved the score from 3.8 to 4.0 by adding:

- full local service runner parity for `SERVICE_OPERATIONS`;
- service-covered `package.compile`, `lifecycle.apply`, and `audit.query`;
- queryable audit sinks with retention metadata;
- local-store atomic JSON writes, migration diagnostics, and corrupt-record
  repair diagnostics;
- three evaluation scenario families covering policy denial, lifecycle rules,
  event-path activation, semantic-index hints, and budget pressure;
- adapter pack manifests and explicit missing-capability diagnostics;
- an operational end-to-end recipe.

`PMEM-WP-0012` moved the score from 4.0 to 4.2 by adding:

- framework-neutral `ServiceBinding` and WSGI adapter tests without starting a
  listener;
- executable local-store migration planning/apply behavior with audit traces;
- live-shaped Markitect/Kontextual/telemetry adapter fixtures behind the same
  manifest and conformance contract;
- audit retention plans and export batches;
- evaluation threshold reports over the scenario corpus;
- public API and service operation compatibility snapshots.

`PMEM-WP-0013` moved the score from 4.2 to 4.3 by adding:

- credential-gated adapter drill helpers and skipped smoke tests that list
  required environment variables;
- stdlib `phase-memory-service` packaging with check mode and WSGI dispatch;
- operator readiness runbook for service startup, migrations, audit retention,
  credentialed drills, and rollback;
- audit retention apply behavior with audit trace coverage;
- evaluation trend artifacts with threshold and regression deltas;
- release-note template gating for public API snapshot changes.

`PMEM-WP-0014` moved the score from 4.3 to 4.4 by adding:

- credential-safe operator reports with token and endpoint redaction;
- credentialed telemetry retention drill coverage through live-shaped or
  operator-approved fixture paths;
- managed deployment manifest generation and validation for service entrypoint,
  probes, rollback, replicas, and local-store mounts;
- deterministic persisted evaluation trend history;
- operator troubleshooting matrix coverage for credential, readiness,
  migration, audit retention, and adapter-manifest failures.

## Recommended Next Refinement

Create and execute `PMEM-WP-0015`: credentialed live pilot and deployment
evidence.

Highest-value tasks:

- Run the redacted credentialed report against real Markitect/Kontextual
  endpoints in an operator environment.
- Pilot the managed deployment manifest on the target platform.
- Capture external telemetry retention evidence.
- Promote trend history into a release/regression gate.
- Refine troubleshooting from actual operator feedback.

## Score Movement Gates

Achieved overall score **4.0** when:

- Service runner handles every operation in `SERVICE_OPERATIONS`.
- Audit query and lifecycle apply are covered through service contracts.
- Local persistence has migration diagnostics.
- Evaluation fixtures cover at least three profile/graph families.

Achieved overall score **4.3+** when:

- Credentialed optional Markitect or Kontextual adapter smoke drills are
  available behind the same conformance suite as the fake/live-shaped packs and
  skip cleanly without credentials.
- Operational docs include deployable service packaging and an operator
  readiness runbook.

Achieved overall score **4.4+** when:

- Credentialed operator report artifacts redact credential values and endpoint
  URLs.
- Managed deployment manifest validation covers service entrypoint, probes,
  rollback, replicas, and store mounts.
- Evaluation trend artifacts can be persisted into deterministic history.
- Troubleshooting docs map common operator diagnostics to actions.

Move overall score to **4.7+** only when:

- Live adapter behavior, telemetry, audit retention, migration, and evaluation
  gates are all exercised by repeatable tests or documented operator drills.