Files
phase-memory/docs/maturity-scorecard.md

10 KiB

Phase Memory Maturity Scorecard

Updated: 2026-05-19

Purpose

This scorecard tracks progress toward INTENT.md: a profile-driven, phase-aware memory infrastructure layer for agentic systems.

The original scorecard treated roadmap closure and fake external adapters as near-operational maturity. The refined scoring below is stricter: fake adapters prove wiring and contracts, but live durability, migration, telemetry, service bindings, and broader evaluation corpora are still needed before scoring close to 5.

Scoring Model

Score Meaning
0 Not started.
1 Intent or docs only.
2 Deterministic local library behavior with tests.
3 Usable runtime or CLI behavior with stable envelopes.
4 Integration-ready local service boundary with policy, persistence, interop, and conformance coverage.
5 Operationally mature with live adapter implementations, migrations, telemetry, retention, service bindings, and evaluation gates.

Current Score

Overall maturity: 4.4 / 5

Two sub-scores make the result easier to reason about:

  • Local integration maturity: 4.7 / 5
  • Operational maturity: 4.2 / 5

The repo is strong as a deterministic local library and service-boundary core. It now has credential-safe operator artifacts, managed deployment manifest validation, persisted evaluation trend histories, and a troubleshooting matrix. It is not yet production-operational because real endpoint and managed platform evidence still requires an approved operator environment.

Dimension Scorecard

Dimension Score Target Evidence Needed Next
Intent and boundaries 4.4 5.0 INTENT.md, SCOPE.md, README.md, architecture docs, adjacent-repo boundary docs Keep docs current as live adapters and service bindings clarify real ownership.
Package and API foundation 4.7 4.8 Python package, public exports, runtime facade, CLI, service runner export, service config, deployment/troubleshooting helpers, dependency-light tests, public API snapshot, release-note template Add compatibility migration examples from a real release.
Markitect profile contract ingress 3.7 4.5 Profile loading, diagnostics, runtime envelopes, profile-derived config, local alias normalization Add richer compatibility fixtures and schema drift diagnostics.
Graph and event ingress 4.0 4.5 Graph loading, endpoint diagnostics, event model, JSONL log, export, repair checks, corrupt-record diagnostics, fake and live-shaped graph/event adapters Add broader malformed/large graph fixtures and operator repair utilities.
Phase domain model 3.5 4.5 Phases, lifecycle states, actions, paths, retention rules, profile-derived transition rules Add migration semantics for profile/rule changes over durable stores.
Profile execution planning 4.3 4.5 Adapter plan, capabilities, policy gates, fallback behavior, config-driven local/external resolution, adapter pack manifests, live-shaped compatibility gates Add compatibility gates for credentialed live adapter packs.
Lifecycle planning and apply 4.1 4.5 Dry-run lifecycle plans, profile rules, review-gated local apply, service lifecycle.apply, apply audit/export queries Add richer apply rollback and repair drills.
Activation planning 4.0 4.8 Budgeted activation, selections, package request, graph neighborhoods, paths, ranking, metrics, multi-scenario evaluation fixtures Wire semantic-index-assisted retrieval into runtime planning.
Local persistence 4.0 4.5 File-backed graph store, JSONL event log, audit sink, atomic JSON writes, executable metadata migrations, migration audit, export, repair diagnostics Add compaction/retention utilities and stronger corruption recovery.
Policy, review, and audit 4.5 5.0 Operation points, review records, audit schema, queryable/exportable audit sinks, retention plans and apply, denials, redaction, fake/live-shaped policy/audit adapters, credential-safe telemetry retention drill Add live policy adapter boundary and external telemetry pruning evidence.
Observability and operations 4.5 4.8 Health report, readiness report, config diagnostics, adapter status, service binding, stdlib service entrypoint, managed deployment manifest validation, operator runbook, fake/live-shaped telemetry audit sinks Pilot the managed package in an operator deployment target.
Markitect interop 4.2 4.5 Local validation, package request/response envelopes, fake/live-shaped compiler fixtures, credential-gated drill contract, redacted operator reports Add credentialed Markitect compiler execution and schema drift suite.
Kontextual/Infospace interop 4.0 4.5 Delegation envelope, fake/live-shaped runtime registry, credential-gated drill contract, redacted operator reports, activation quality report fixture, adapter compatibility manifests Add credentialed Kontextual execution and broader Infospace restart reports.
Testing and evaluation 4.6 4.7 Deterministic tests over runtime, CLI, adapters, policy, activation, lifecycle, service, fakes, live-shaped packs, credential skip gates, API snapshots, evaluation threshold/trend reports, persisted trend history Add larger regression corpus and make trend history a release gate.
Service readiness 4.7 4.8 Service contracts, full local runner parity, framework-neutral service binding, WSGI adapter, stdlib service entrypoint, health/readiness, config, adapter conformance, managed deployment manifest validation Pilot managed deployment packaging on the target platform.
Developer experience 4.6 4.7 README, package map, CLI examples, persistence/policy/interop/service/lifecycle/fake-pack docs, operational recipe, operator runbook, API compatibility docs, release-note template, troubleshooting matrix Refine troubleshooting from real operator feedback.

Assessment

The project has crossed the local integration-readiness threshold. The runtime envelopes, policy/review model, profile-derived configuration, lifecycle rules, local persistence migrations, queryable/exportable/prunable audit path, fake and live-shaped external pack manifests, credential-gated drills, service binding and stdlib entrypoint, API snapshots, release discipline, and conformance helpers form a solid integration boundary.

The biggest optimization opportunity is now evidence, not scaffolding: run the credentialed reports against real services, pilot the managed manifest on a target platform, and make persisted trend history part of the operator release gate.

Completed Refinement Workplan

PMEM-WP-0011 moved the score from 3.8 to 4.0 by adding:

  • full local service runner parity for SERVICE_OPERATIONS;
  • service-covered package.compile, lifecycle.apply, and audit.query;
  • queryable audit sinks with retention metadata;
  • local-store atomic JSON writes, migration diagnostics, and corrupt-record repair diagnostics;
  • three evaluation scenario families covering policy denial, lifecycle rules, event-path activation, semantic-index hints, and budget pressure;
  • adapter pack manifests and explicit missing-capability diagnostics;
  • an operational end-to-end recipe.

PMEM-WP-0012 moved the score from 4.0 to 4.2 by adding:

  • framework-neutral ServiceBinding and WSGI adapter tests without starting a listener;
  • executable local-store migration planning/apply behavior with audit traces;
  • live-shaped Markitect/Kontextual/telemetry adapter fixtures behind the same manifest and conformance contract;
  • audit retention plans and export batches;
  • evaluation threshold reports over the scenario corpus;
  • public API and service operation compatibility snapshots.

PMEM-WP-0013 moved the score from 4.2 to 4.3 by adding:

  • credential-gated adapter drill helpers and skipped smoke tests that list required environment variables;
  • stdlib phase-memory-service packaging with check mode and WSGI dispatch;
  • operator readiness runbook for service startup, migrations, audit retention, credentialed drills, and rollback;
  • audit retention apply behavior with audit trace coverage;
  • evaluation trend artifacts with threshold and regression deltas;
  • release-note template gating for public API snapshot changes.

PMEM-WP-0014 moved the score from 4.3 to 4.4 by adding:

  • credential-safe operator reports with token and endpoint redaction;
  • credentialed telemetry retention drill coverage through live-shaped or operator-approved fixture paths;
  • managed deployment manifest generation and validation for service entrypoint, probes, rollback, replicas, and local-store mounts;
  • deterministic persisted evaluation trend history;
  • operator troubleshooting matrix coverage for credential, readiness, migration, audit retention, and adapter-manifest failures.

Create and execute PMEM-WP-0015: credentialed live pilot and deployment evidence.

Highest-value tasks:

  • Run the redacted credentialed report against real Markitect/Kontextual endpoints in an operator environment.
  • Pilot the managed deployment manifest on the target platform.
  • Capture external telemetry retention evidence.
  • Promote trend history into a release/regression gate.
  • Refine troubleshooting from actual operator feedback.

Score Movement Gates

Achieved overall score 4.0 when:

  • Service runner handles every operation in SERVICE_OPERATIONS.
  • Audit query and lifecycle apply are covered through service contracts.
  • Local persistence has migration diagnostics.
  • Evaluation fixtures cover at least three profile/graph families.

Achieved overall score 4.3+ when:

  • Credentialed optional Markitect or Kontextual adapter smoke drills are available behind the same conformance suite as the fake/live-shaped packs and skip cleanly without credentials.
  • Operational docs include deployable service packaging and an operator readiness runbook.

Achieved overall score 4.4+ when:

  • Credentialed operator report artifacts redact credential values and endpoint URLs.
  • Managed deployment manifest validation covers service entrypoint, probes, rollback, replicas, and store mounts.
  • Evaluation trend artifacts can be persisted into deterministic history.
  • Troubleshooting docs map common operator diagnostics to actions.

Move overall score to 4.7+ only when:

  • Live adapter behavior, telemetry, audit retention, migration, and evaluation gates are all exercised by repeatable tests or documented operator drills.