Phase Memory Maturity Scorecard

Updated: 2026-05-19

Purpose

This scorecard tracks progress toward INTENT.md: a profile-driven, phase-aware memory infrastructure layer for agentic systems.

The original scorecard treated roadmap closure and fake external adapters as near-operational maturity. The refined scoring below is stricter: fake adapters prove wiring and contracts, but live durability, migration, telemetry, service bindings, and broader evaluation corpora are still needed before scoring close to 5.

Scoring Model

| Score | Meaning |
| --- | --- |
| 0 | Not started. |
| 1 | Intent or docs only. |
| 2 | Deterministic local library behavior with tests. |
| 3 | Usable runtime or CLI behavior with stable envelopes. |
| 4 | Integration-ready local service boundary with policy, persistence, interop, and conformance coverage. |
| 5 | Operationally mature with live adapter implementations, migrations, telemetry, retention, service bindings, and evaluation gates. |
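
Fractional scores such as 4.3 fall between two levels of this model. The sketch below shows one possible reading, assuming a fractional score means "the lower level is fully reached, with partial progress toward the next". That interpretation, and the helper names `reached_level` and `describe`, are illustrative assumptions and not part of the phase-memory package.

```python
import math

# Level meanings copied from the scoring model above.
LEVELS = {
    0: "Not started.",
    1: "Intent or docs only.",
    2: "Deterministic local library behavior with tests.",
    3: "Usable runtime or CLI behavior with stable envelopes.",
    4: "Integration-ready local service boundary with policy, persistence, "
       "interop, and conformance coverage.",
    5: "Operationally mature with live adapter implementations, migrations, "
       "telemetry, retention, service bindings, and evaluation gates.",
}


def reached_level(score: float) -> int:
    """Return the highest whole level a fractional score has fully reached.

    Assumption: fractional scores are read by rounding down.
    """
    if not 0 <= score <= 5:
        raise ValueError(f"score out of range: {score}")
    return math.floor(score)


def describe(score: float) -> str:
    """Format a score together with the meaning of the level it has reached."""
    level = reached_level(score)
    return f"{score:.1f} -> level {level}: {LEVELS[level]}"


print(describe(4.3))
```

Under this reading, every score from 4.0 up to (but not including) 5.0 maps to level 4, which matches the scorecard's stance that live operational evidence is required before claiming level 5.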

Current Score

Overall maturity: 4.3 / 5

Two sub-scores make the result easier to reason about:

  • Local integration maturity: 4.6 / 5
  • Operational maturity: 4.0 / 5

The repo is strong as a deterministic local library and service-boundary core. It is not yet production-operational because adapter coverage is still credential-gated rather than continuously exercised against live services, and service packaging is stdlib/local rather than deployed to a managed environment.

Dimension Scorecard

| Dimension | Score | Target | Evidence | Needed Next |
| --- | --- | --- | --- | --- |
| Intent and boundaries | 4.4 | 5.0 | INTENT.md, SCOPE.md, README.md, architecture docs, adjacent-repo boundary docs | Keep docs current as live adapters and service bindings clarify real ownership. |
| Package and API foundation | 4.6 | 4.8 | Python package, public exports, runtime facade, CLI, service runner export, service config, dependency-light tests, public API snapshot, release-note template | Add compatibility migration examples from a real release. |
| Markitect profile contract ingress | 3.7 | 4.5 | Profile loading, diagnostics, runtime envelopes, profile-derived config, local alias normalization | Add richer compatibility fixtures and schema drift diagnostics. |
| Graph and event ingress | 4.0 | 4.5 | Graph loading, endpoint diagnostics, event model, JSONL log, export, repair checks, corrupt-record diagnostics, fake and live-shaped graph/event adapters | Add broader malformed/large graph fixtures and operator repair utilities. |
| Phase domain model | 3.5 | 4.5 | Phases, lifecycle states, actions, paths, retention rules, profile-derived transition rules | Add migration semantics for profile/rule changes over durable stores. |
| Profile execution planning | 4.3 | 4.5 | Adapter plan, capabilities, policy gates, fallback behavior, config-driven local/external resolution, adapter pack manifests, live-shaped compatibility gates | Add compatibility gates for credentialed live adapter packs. |
| Lifecycle planning and apply | 4.1 | 4.5 | Dry-run lifecycle plans, profile rules, review-gated local apply, service lifecycle.apply, apply audit/export queries | Add richer apply rollback and repair drills. |
| Activation planning | 4.0 | 4.8 | Budgeted activation, selections, package request, graph neighborhoods, paths, ranking, metrics, multi-scenario evaluation fixtures | Wire semantic-index-assisted retrieval into runtime planning. |
| Local persistence | 4.0 | 4.5 | File-backed graph store, JSONL event log, audit sink, atomic JSON writes, executable metadata migrations, migration audit, export, repair diagnostics | Add compaction/retention utilities and stronger corruption recovery. |
| Policy, review, and audit | 4.4 | 5.0 | Operation points, review records, audit schema, queryable/exportable audit sinks, retention plans and apply, denials, redaction, fake/live-shaped policy/audit adapters | Add live policy adapter boundary and credentialed telemetry pruning drill. |
| Observability and operations | 4.3 | 4.8 | Health report, readiness report, config diagnostics, adapter status, service binding, stdlib service entrypoint, operator runbook, fake/live-shaped telemetry audit sinks | Add metrics/event export to external telemetry and managed deployment packaging. |
| Markitect interop | 4.1 | 4.5 | Local validation, package request/response envelopes, fake/live-shaped compiler fixtures, credential-gated drill contract | Add credentialed Markitect compiler execution and schema drift suite. |
| Kontextual/Infospace interop | 3.9 | 4.5 | Delegation envelope, fake/live-shaped runtime registry, credential-gated drill contract, activation quality report fixture, adapter compatibility manifests | Add credentialed Kontextual execution and broader Infospace restart reports. |
| Testing and evaluation | 4.5 | 4.7 | Deterministic tests over runtime, CLI, adapters, policy, activation, lifecycle, service, fakes, live-shaped packs, credential skip gates, API snapshots, evaluation threshold and trend reports | Add larger regression corpus and persisted trend history. |
| Service readiness | 4.6 | 4.8 | Service contracts, full local runner parity, framework-neutral service binding, WSGI adapter, stdlib service entrypoint, health/readiness, config, adapter conformance | Add managed deployment packaging. |
| Developer experience | 4.5 | 4.7 | README, package map, CLI examples, persistence/policy/interop/service/lifecycle/fake-pack docs, operational recipe, operator runbook, API compatibility docs, release-note template | Add troubleshooting matrix from real operator feedback. |
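
To see which dimensions sit furthest from their targets, the score and target columns can be compared directly. The following is a minimal sketch of that arithmetic using the values from the table. The `gaps` helper is hypothetical, and ranking by gap is only one way to prioritize; it is not the method used to derive the overall score.

```python
# (dimension, score, target) copied from the scorecard table.
DIMENSIONS = [
    ("Intent and boundaries", 4.4, 5.0),
    ("Package and API foundation", 4.6, 4.8),
    ("Markitect profile contract ingress", 3.7, 4.5),
    ("Graph and event ingress", 4.0, 4.5),
    ("Phase domain model", 3.5, 4.5),
    ("Profile execution planning", 4.3, 4.5),
    ("Lifecycle planning and apply", 4.1, 4.5),
    ("Activation planning", 4.0, 4.8),
    ("Local persistence", 4.0, 4.5),
    ("Policy, review, and audit", 4.4, 5.0),
    ("Observability and operations", 4.3, 4.8),
    ("Markitect interop", 4.1, 4.5),
    ("Kontextual/Infospace interop", 3.9, 4.5),
    ("Testing and evaluation", 4.5, 4.7),
    ("Service readiness", 4.6, 4.8),
    ("Developer experience", 4.5, 4.7),
]


def gaps(rows):
    """Return (dimension, gap) pairs, largest gap to target first.

    Gaps are rounded to one decimal to hide binary floating-point noise.
    Ties keep their table order because Python's sort is stable.
    """
    ranked = [(name, round(target - score, 1)) for name, score, target in rows]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


for name, gap in gaps(DIMENSIONS)[:3]:
    print(f"{name}: {gap:.1f} below target")
# Phase domain model: 1.0 below target
# Markitect profile contract ingress: 0.8 below target
# Activation planning: 0.8 below target
```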

Assessment

The project has crossed the local integration-readiness threshold. The runtime envelopes, policy/review model, profile-derived configuration, lifecycle rules, local persistence migrations, queryable/exportable/prunable audit path, fake and live-shaped external pack manifests, credential-gated drills, service binding and stdlib entrypoint, API snapshots, release discipline, and conformance helpers form a solid integration boundary.

The largest remaining gap is the operational layer: running the credential-gated drills against real services, adding managed deployment packaging, and growing evaluation trends into a historical corpus.

Completed Refinement Workplan

PMEM-WP-0011 moved the score from 3.8 to 4.0 by adding:

  • full local service runner parity for SERVICE_OPERATIONS;
  • service-covered package.compile, lifecycle.apply, and audit.query;
  • queryable audit sinks with retention metadata;
  • local-store atomic JSON writes, migration diagnostics, and corrupt-record repair diagnostics;
  • three evaluation scenario families covering policy denial, lifecycle rules, event-path activation, semantic-index hints, and budget pressure;
  • adapter pack manifests and explicit missing-capability diagnostics;
  • an operational end-to-end recipe.
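
The atomic JSON writes mentioned above are commonly built on a write-to-temp-then-rename pattern. The sketch below illustrates that general technique with the standard library only. It is not the phase-memory implementation, and `write_json_atomic` is a hypothetical name.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomic(path, payload) -> None:
    """Write JSON so readers see either the old file or the complete new one."""
    path = Path(path)
    # Create the temp file next to the target so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=path.parent, prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle, sort_keys=True)
            handle.flush()
            os.fsync(handle.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_name, path)  # the rename is atomic on POSIX filesystems
    except BaseException:
        # Remove the partial temp file; the original target is untouched.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

If serialization fails partway through (for example, on a value that is not JSON-serializable), the previous file contents remain in place, which is the property that makes corrupt-record diagnostics the exception rather than the norm.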

PMEM-WP-0012 moved the score from 4.0 to 4.2 by adding:

  • framework-neutral ServiceBinding and WSGI adapter tests without starting a listener;
  • executable local-store migration planning/apply behavior with audit traces;
  • live-shaped Markitect/Kontextual/telemetry adapter fixtures behind the same manifest and conformance contract;
  • audit retention plans and export batches;
  • evaluation threshold reports over the scenario corpus;
  • public API and service operation compatibility snapshots.
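
Testing a WSGI adapter without starting a listener works because a WSGI application is just a callable that takes an environ dict and a `start_response` function. The sketch below shows the general approach using `wsgiref.util.setup_testing_defaults` from the standard library. `health_app` is a hypothetical stand-in, not the phase-memory `ServiceBinding` or its WSGI adapter.

```python
import io
import json
from wsgiref.util import setup_testing_defaults


def health_app(environ, start_response):
    """Hypothetical stand-in for a service WSGI adapter."""
    body = json.dumps({"path": environ["PATH_INFO"], "status": "ok"}).encode("utf-8")
    start_response(
        "200 OK",
        [("Content-Type", "application/json"), ("Content-Length", str(len(body)))],
    )
    return [body]


def call_wsgi(app, path, method="GET", body=b""):
    """Invoke a WSGI app in-process and return (status, headers, body bytes)."""
    environ = {
        "PATH_INFO": path,
        "REQUEST_METHOD": method,
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    }
    setup_testing_defaults(environ)  # fills in the remaining required WSGI keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    chunks = app(environ, start_response)
    try:
        payload = b"".join(chunks)
    finally:
        close = getattr(chunks, "close", None)
        if close is not None:
            close()  # WSGI requires calling close() on the iterable if present
    return captured["status"], captured["headers"], payload


status, headers, payload = call_wsgi(health_app, "/health")
print(status, json.loads(payload))
```

No socket is opened at any point, so tests of this shape stay deterministic and need no free port.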

PMEM-WP-0013 moved the score from 4.2 to 4.3 by adding:

  • credential-gated adapter drill helpers and skipped smoke tests that list required environment variables;
  • stdlib phase-memory-service packaging with check mode and WSGI dispatch;
  • operator readiness runbook for service startup, migrations, audit retention, credentialed drills, and rollback;
  • audit retention apply behavior with audit trace coverage;
  • evaluation trend artifacts with threshold and regression deltas;
  • release-note template gating for public API snapshot changes.
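
The three workplan entries above form a score history that can be checked for continuity. This small sketch, with the hypothetical helper `check_history`, verifies that each workplan starts where the previous one ended and reports the gain per workplan. The numbers are taken from the entries above.

```python
# (workplan, score before, score after) as reported in the entries above.
HISTORY = [
    ("PMEM-WP-0011", 3.8, 4.0),
    ("PMEM-WP-0012", 4.0, 4.2),
    ("PMEM-WP-0013", 4.2, 4.3),
]


def check_history(history):
    """Verify the chain of scores is continuous and return per-workplan gains."""
    gains = {}
    previous_after = None
    for name, before, after in history:
        if previous_after is not None and before != previous_after:
            raise ValueError(f"{name} starts at {before}, expected {previous_after}")
        # Round to one decimal to hide binary floating-point noise.
        gains[name] = round(after - before, 1)
        previous_after = after
    return gains


gains = check_history(HISTORY)
print(gains)
print("total gain:", round(sum(gains.values()), 1))
# {'PMEM-WP-0011': 0.2, 'PMEM-WP-0012': 0.2, 'PMEM-WP-0013': 0.1}
# total gain: 0.5
```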

Next step: create and execute PMEM-WP-0014, covering live credential execution and managed deployment hardening.

Highest-value tasks:

  • Run the credential-gated drills against real Markitect/Kontextual endpoints in an operator environment.
  • Add managed deployment packaging and readiness probes.
  • Persist evaluation trend reports across runs.
  • Add credentialed telemetry export and retention pruning drills.
  • Expand troubleshooting from actual operator feedback.

Score Movement Gates

Achieved overall score 4.0 when:

  • Service runner handles every operation in SERVICE_OPERATIONS.
  • Audit query and lifecycle apply are covered through service contracts.
  • Local persistence has migration diagnostics.
  • Evaluation fixtures cover at least three profile/graph families.

Achieved overall score 4.3+ when:

  • Credentialed optional Markitect or Kontextual adapter smoke drills are available behind the same conformance suite as the fake/live-shaped packs and skip cleanly without credentials.
  • Operational docs include deployable service packaging and an operator readiness runbook.
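
A drill that "skips cleanly without credentials" can be expressed with the standard `unittest` skip mechanism. The sketch below shows the general shape under stated assumptions: the environment variable names and the `LiveAdapterSmokeDrill` class are hypothetical, and the real drills list their own required variables.

```python
import os
import unittest

# Hypothetical variable names, used here for illustration only.
REQUIRED_ENV = ("EXAMPLE_MARKITECT_URL", "EXAMPLE_MARKITECT_TOKEN")


def missing_env(names, environ=None):
    """Return the required variable names that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


class LiveAdapterSmokeDrill(unittest.TestCase):
    def setUp(self):
        missing = missing_env(REQUIRED_ENV)
        if missing:
            # Skip rather than fail, and tell the operator exactly what to set.
            self.skipTest("credentials not configured; set: " + ", ".join(missing))

    def test_adapter_round_trip(self):
        # Placeholder for a real call routed through the shared conformance suite.
        self.assertTrue(all(os.environ[name] for name in REQUIRED_ENV))
```

Because the skip reason names the missing variables, a run without credentials still documents what an operator needs in order to execute the drill for real.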

Move overall score to 4.7+ only when:

  • Live adapter behavior, telemetry, audit retention, migration, and evaluation gates are all exercised by repeatable tests or documented operator drills.