Phase Memory Maturity Scorecard

Updated: 2026-05-19

Purpose

This scorecard tracks progress toward the goal stated in INTENT.md: a profile-driven, phase-aware memory infrastructure layer for agentic systems.

The original scorecard treated roadmap closure and fake external adapters as near-operational maturity. The refined scoring below is stricter: fake adapters prove wiring and contracts, but live durability, migration, telemetry, service bindings, and broader evaluation corpora are still needed before scoring close to 5.

Scoring Model

| Score | Meaning |
| --- | --- |
| 0 | Not started. |
| 1 | Intent or docs only. |
| 2 | Deterministic local library behavior with tests. |
| 3 | Usable runtime or CLI behavior with stable envelopes. |
| 4 | Integration-ready local service boundary with policy, persistence, interop, and conformance coverage. |
| 5 | Operationally mature with live adapter implementations, migrations, telemetry, retention, service bindings, and evaluation gates. |

Current Score

Overall maturity: 4.2 / 5

Two sub-scores make the result easier to reason about:

  • Local integration maturity: 4.5 / 5
  • Operational maturity: 3.8 / 5

The repo is strong as a deterministic local library and service-boundary core. It is not yet production-operational for two reasons: adapter coverage still relies on live-shaped fixtures rather than credentialed live integrations, and the service bindings are framework-neutral embedding surfaces rather than a deployed service.

Dimension Scorecard

| Dimension | Score | Target | Evidence | Needed Next |
| --- | --- | --- | --- | --- |
| Intent and boundaries | 4.4 | 5.0 | INTENT.md, SCOPE.md, README.md, architecture docs, adjacent-repo boundary docs | Keep docs current as live adapters and service bindings clarify real ownership. |
| Package and API foundation | 4.5 | 4.8 | Python package, public exports, runtime facade, CLI, service runner export, service config, dependency-light tests, public API snapshot | Add release notes discipline and compatibility migration examples. |
| Markitect profile contract ingress | 3.7 | 4.5 | Profile loading, diagnostics, runtime envelopes, profile-derived config, local alias normalization | Add richer compatibility fixtures and schema drift diagnostics. |
| Graph and event ingress | 4.0 | 4.5 | Graph loading, endpoint diagnostics, event model, JSONL log, export, repair checks, corrupt-record diagnostics, fake and live-shaped graph/event adapters | Add broader malformed/large graph fixtures and operator repair utilities. |
| Phase domain model | 3.5 | 4.5 | Phases, lifecycle states, actions, paths, retention rules, profile-derived transition rules | Add migration semantics for profile/rule changes over durable stores. |
| Profile execution planning | 4.3 | 4.5 | Adapter plan, capabilities, policy gates, fallback behavior, config-driven local/external resolution, adapter pack manifests, live-shaped compatibility gates | Add compatibility gates for credentialed live adapter packs. |
| Lifecycle planning and apply | 4.1 | 4.5 | Dry-run lifecycle plans, profile rules, review-gated local apply, service lifecycle.apply, apply audit/export queries | Add richer apply rollback and repair drills. |
| Activation planning | 4.0 | 4.8 | Budgeted activation, selections, package request, graph neighborhoods, paths, ranking, metrics, multi-scenario evaluation fixtures | Wire semantic-index-assisted retrieval into runtime planning. |
| Local persistence | 4.0 | 4.5 | File-backed graph store, JSONL event log, audit sink, atomic JSON writes, executable metadata migrations, migration audit, export, repair diagnostics | Add compaction/retention utilities and stronger corruption recovery. |
| Policy, review, and audit | 4.2 | 5.0 | Operation points, review records, audit schema, queryable/exportable audit sinks, retention plans, denials, redaction, fake/live-shaped policy/audit adapters | Add live policy adapter boundary and enforceable audit retention pruning. |
| Observability and operations | 4.0 | 4.8 | Health report, readiness report, config diagnostics, adapter status, service binding, fake/live-shaped telemetry audit sinks, operational recipe | Add metrics/event export to external telemetry and deployable service packaging. |
| Markitect interop | 4.0 | 4.5 | Local validation, package request/response envelopes, fake and live-shaped compiler fixtures | Add optional credentialed Markitect compiler adapter and schema drift suite. |
| Kontextual/Infospace interop | 3.7 | 4.5 | Delegation envelope, fake and live-shaped runtime registry, activation quality report fixture, adapter compatibility manifests | Add credentialed Kontextual adapter drill and broader Infospace restart reports. |
| Testing and evaluation | 4.3 | 4.7 | Deterministic tests over runtime, CLI, adapters, policy, activation, lifecycle, service, fakes, live-shaped packs, API snapshots, and evaluation threshold reports | Add larger regression corpus and threshold trend reports. |
| Service readiness | 4.5 | 4.8 | Service contracts, full local runner parity, framework-neutral service binding, WSGI adapter, health/readiness, config, adapter conformance | Add deployable packaging and operator readiness runbooks. |
| Developer experience | 4.3 | 4.7 | README, package map, CLI examples, persistence/policy/interop/service/lifecycle/fake-pack docs, operational recipe, API compatibility docs | Add troubleshooting matrix and release note templates. |
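
One way to read the table is to rank dimensions by their remaining gap to target. The sketch below is illustrative only: the scores and targets are copied from the scorecard, while `SCORECARD` and `rank_gaps` are hypothetical names rather than part of the Phase Memory package, and the ranking is not how the overall score is derived.

```python
# Scores and targets copied from the Dimension Scorecard.
# The dict and helper names are illustrative assumptions.
SCORECARD = {
    "Intent and boundaries": (4.4, 5.0),
    "Package and API foundation": (4.5, 4.8),
    "Markitect profile contract ingress": (3.7, 4.5),
    "Graph and event ingress": (4.0, 4.5),
    "Phase domain model": (3.5, 4.5),
    "Profile execution planning": (4.3, 4.5),
    "Lifecycle planning and apply": (4.1, 4.5),
    "Activation planning": (4.0, 4.8),
    "Local persistence": (4.0, 4.5),
    "Policy, review, and audit": (4.2, 5.0),
    "Observability and operations": (4.0, 4.8),
    "Markitect interop": (4.0, 4.5),
    "Kontextual/Infospace interop": (3.7, 4.5),
    "Testing and evaluation": (4.3, 4.7),
    "Service readiness": (4.5, 4.8),
    "Developer experience": (4.3, 4.7),
}


def rank_gaps(scorecard):
    """Return (dimension, gap) pairs, largest gap to target first."""
    gaps = [
        (name, round(target - score, 1))
        for name, (score, target) in scorecard.items()
    ]
    return sorted(gaps, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, gap in rank_gaps(SCORECARD)[:5]:
        print(f"{gap:.1f}  {name}")
```

On the current numbers, the phase domain model has the largest gap (1.0), which matches its "Needed Next" entry on migration semantics over durable stores.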

Assessment

The project has crossed the local integration-readiness threshold. The runtime envelopes, policy/review model, profile-derived configuration, lifecycle rules, local persistence migrations, queryable/exportable audit path, fake and live-shaped external pack manifests, service binding, API snapshots, and conformance helpers form a solid integration boundary.

The largest remaining opportunity is the operational layer: moving from live-shaped local fixtures to credentialed live adapter drills, packaging the service binding for deployment, and growing evaluation thresholds into trend reports.

Completed Refinement Workplan

PMEM-WP-0011 moved the score from 3.8 to 4.0 by adding:

  • full local service runner parity for SERVICE_OPERATIONS;
  • service-covered package.compile, lifecycle.apply, and audit.query;
  • queryable audit sinks with retention metadata;
  • local-store atomic JSON writes, migration diagnostics, and corrupt-record repair diagnostics;
  • three evaluation scenario families covering policy denial, lifecycle rules, event-path activation, semantic-index hints, and budget pressure;
  • adapter pack manifests and explicit missing-capability diagnostics;
  • an operational end-to-end recipe.
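
The atomic JSON writes mentioned in the list above are commonly built on a write-to-temporary-file-then-rename pattern. The following is a minimal sketch of that general technique using only the standard library; it is not the repository's implementation, and `atomic_write_json` is a hypothetical name.

```python
import json
import os
import tempfile


def atomic_write_json(path, payload):
    """Write payload as JSON via a temp file plus rename (illustrative sketch)."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the target directory so the rename stays
    # on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle, sort_keys=True)
            handle.flush()
            os.fsync(handle.fileno())  # push bytes to disk before the swap
        os.replace(tmp_path, path)  # replaces the destination in one step
    except BaseException:
        # Clean up the temp file if anything failed before the swap.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

With this pattern, a reader sees either the previous file or the complete new one, which is the property the corrupt-record repair diagnostics would otherwise have to compensate for.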

PMEM-WP-0012 moved the score from 4.0 to 4.2 by adding:

  • framework-neutral ServiceBinding and WSGI adapter tests without starting a listener;
  • executable local-store migration planning/apply behavior with audit traces;
  • live-shaped Markitect/Kontextual/telemetry adapter fixtures behind the same manifest and conformance contract;
  • audit retention plans and export batches;
  • evaluation threshold reports over the scenario corpus;
  • public API and service operation compatibility snapshots.
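
Testing a WSGI adapter without starting a listener usually means calling the WSGI callable directly with a synthetic environ. The sketch below shows that technique with the standard library's `wsgiref.util.setup_testing_defaults`. `health_app` and `call_wsgi` are hypothetical stand-ins, not the project's ServiceBinding or WSGI adapter.

```python
import json
from wsgiref.util import setup_testing_defaults


def health_app(environ, start_response):
    """Hypothetical stand-in WSGI app; not Phase Memory's adapter."""
    body = json.dumps({"status": "ok", "path": environ["PATH_INFO"]}).encode("utf-8")
    headers = [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_wsgi(app, path, method="GET"):
    """Invoke a WSGI app in-process and return (status, headers, body)."""
    environ = {"SCRIPT_NAME": "", "PATH_INFO": path, "REQUEST_METHOD": method}
    setup_testing_defaults(environ)  # fills in the remaining required keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


if __name__ == "__main__":
    status, headers, body = call_wsgi(health_app, "/health")
    print(status, headers["Content-Type"], body.decode("utf-8"))
```

Because no socket is opened, tests of this shape stay deterministic and dependency-light.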

Next step: create and execute PMEM-WP-0013 (credentialed adapter drills and deployment packaging).

Highest-value tasks:

  • Add optional credentialed Markitect/Kontextual adapter smoke drills that are skipped unless credentials are present.
  • Package the service binding as a deployable local service with operator readiness checks.
  • Add audit retention pruning and telemetry export enforcement.
  • Grow evaluation reporting into historical threshold trends.
  • Add release note and migration-note templates for compatibility changes.
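
A smoke drill that is skipped unless credentials are present could be sketched with `unittest.skipUnless`. The environment variable name, the test class, and the placeholder test body below are all assumptions for illustration. A real drill would call the credentialed adapter and run it through the same conformance suite as the fake and live-shaped packs.

```python
import os
import unittest

# Hypothetical variable name; a real drill would define its own.
CREDENTIAL_ENV = "PMEM_MARKITECT_TOKEN"


def credentials_present(env=os.environ):
    """Return True when the (assumed) credential variable is set and non-empty."""
    return bool(env.get(CREDENTIAL_ENV))


class MarkitectSmokeDrill(unittest.TestCase):
    @unittest.skipUnless(credentials_present(), f"{CREDENTIAL_ENV} not set")
    def test_compile_round_trip(self):
        # Placeholder assertion: a real drill would exercise the
        # credentialed adapter here.
        self.assertTrue(credentials_present())


if __name__ == "__main__":
    unittest.main()
```

Without the variable set, the test is reported as skipped rather than failed, so the default test run stays deterministic.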

Score Movement Gates

Achieved overall score 4.0 when:

  • Service runner handles every operation in SERVICE_OPERATIONS.
  • Audit query and lifecycle apply are covered through service contracts.
  • Local persistence has migration diagnostics.
  • Evaluation fixtures cover at least three profile/graph families.

Move overall score to 4.3+ when:

  • Credentialed optional Markitect or Kontextual adapter smoke drills run behind the same conformance suite as the fake/live-shaped packs.
  • Operational docs include deployable service packaging and an operator readiness runbook.

Move overall score to 4.7+ only when:

  • Live adapter behavior, telemetry, audit retention, migration, and evaluation gates are all exercised by repeatable tests or documented operator drills.
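
The gates above can be read as an ordered checklist: a threshold counts only once its own conditions and those of every lower threshold are met. The sketch below encodes that reading. `GATES` and `highest_gate_passed` are hypothetical names, and the boolean values are an interpretation of this scorecard, not output from any project tool.

```python
# Gate conditions paraphrased from the lists above. True/False values are
# an interpretation of the scorecard's current state, not measured results.
GATES = {
    4.0: {
        "service runner handles every operation in SERVICE_OPERATIONS": True,
        "audit query and lifecycle apply covered through service contracts": True,
        "local persistence has migration diagnostics": True,
        "evaluation fixtures cover at least three profile/graph families": True,
    },
    4.3: {
        "credentialed adapter smoke drills behind the conformance suite": False,
        "deployable service packaging and operator readiness runbook": False,
    },
    4.7: {
        "live adapters, telemetry, retention, migration, and evaluation gates exercised": False,
    },
}


def highest_gate_passed(gates):
    """Return the highest threshold whose checks, and all lower ones, pass."""
    passed = None
    for threshold, checks in sorted(gates.items()):
        if not all(checks.values()):
            break
        passed = threshold
    return passed


if __name__ == "__main__":
    print(highest_gate_passed(GATES))
```

The function reports which gate has been cleared, not the score itself: the overall score can sit between gates, as the current 4.2 does.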