Phase Memory Maturity Scorecard

Updated: 2026-05-18

Purpose

This scorecard tracks progress toward INTENT.md: a profile-driven, phase-aware memory infrastructure layer for agentic systems.

The original scorecard treated roadmap closure and fake external adapters as near-operational maturity. The refined scoring below is stricter: fake adapters prove wiring and contracts, but live durability, migration, telemetry, service bindings, and broader evaluation corpora are still needed before scoring close to 5.

Scoring Model

| Score | Meaning |
| --- | --- |
| 0 | Not started. |
| 1 | Intent or docs only. |
| 2 | Deterministic local library behavior with tests. |
| 3 | Usable runtime or CLI behavior with stable envelopes. |
| 4 | Integration-ready local service boundary with policy, persistence, interop, and conformance coverage. |
| 5 | Operationally mature with live adapter implementations, migrations, telemetry, retention, service bindings, and evaluation gates. |

Current Score

Overall maturity: 3.8 / 5

Two sub-scores make the result easier to reason about:

  • Local integration maturity: 4.1 / 5
  • Operational maturity: 3.2 / 5

The repo is strong as a deterministic local library and service-boundary core. It is not yet production-operational because the external adapters are fakes, durability semantics are basic, service bindings are framework-neutral shapes rather than deployable endpoints, and evaluation coverage is still narrow.

Dimension Scorecard

| Dimension | Score | Target | Evidence | Needed Next |
| --- | --- | --- | --- | --- |
| Intent and boundaries | 4.4 | 5.0 | INTENT.md, SCOPE.md, README.md, architecture docs, adjacent-repo boundary docs | Keep docs current as live adapters and service bindings clarify real ownership. |
| Package and API foundation | 4.2 | 4.5 | Python package, public exports, runtime facade, CLI, service config, dependency-light tests | Add API stability notes and compatibility checks for public exports. |
| Markitect profile contract ingress | 3.7 | 4.5 | Profile loading, diagnostics, runtime envelopes, profile-derived config, local alias normalization | Add richer compatibility fixtures and schema drift diagnostics. |
| Graph and event ingress | 3.7 | 4.5 | Graph loading, endpoint diagnostics, event model, JSONL log, export, repair checks, fake graph/event adapters | Add broader malformed/large graph fixtures and migration repair coverage. |
| Phase domain model | 3.5 | 4.5 | Phases, lifecycle states, actions, paths, retention rules, profile-derived transition rules | Add migration semantics for profile/rule changes over durable stores. |
| Profile execution planning | 4.0 | 4.5 | Adapter plan, capabilities, policy gates, fallback behavior, config-driven local/external resolution | Add compatibility gates for live adapter packs. |
| Lifecycle planning and apply | 3.6 | 4.5 | Dry-run lifecycle plans, profile rules, review-gated local apply | Add service lifecycle.apply handling, migration semantics, and better apply audit queries. |
| Activation planning | 3.8 | 4.8 | Budgeted activation, selections, package request, graph neighborhoods, paths, ranking, metrics | Wire semantic-index-assisted retrieval and expand evaluation corpora. |
| Local persistence | 3.2 | 4.5 | File-backed graph store, JSONL event log, audit sink, export, repair diagnostics | Add atomic writes, schema migration, compaction/retention utilities, and stronger corruption recovery. |
| Policy, review, and audit | 3.5 | 5.0 | Operation points, review records, audit schema, denials, redaction, fake external policy/audit adapters | Add audit query service, retention policy behavior, and live policy adapter boundary. |
| Observability and operations | 3.3 | 4.8 | Health report, config diagnostics, adapter status, fake telemetry audit sink | Add metrics/event export, retention diagnostics, and deployable health/readiness binding. |
| Markitect interop | 3.7 | 4.5 | Local validation, package request/response envelopes, fake compiler | Add optional live Markitect compiler adapter and contract compatibility suite. |
| Kontextual/Infospace interop | 3.1 | 4.5 | Delegation envelope, fake runtime registry, activation quality report fixture | Add live/fake delegation scenarios and broader Infospace restart reports. |
| Testing and evaluation | 3.8 | 4.5 | 60 deterministic tests over runtime, CLI, adapters, policy, activation, lifecycle, service, fakes | Add multi-profile/multi-graph evaluation corpus and regression thresholds. |
| Service readiness | 3.9 | 4.8 | Service contracts, local runner, health, config, adapter conformance, fake pack | Implement missing service operations and optional framework binding. |
| Developer experience | 3.8 | 4.5 | README, package map, CLI examples, persistence/policy/interop/service/lifecycle/fake-pack docs | Add troubleshooting, examples, and end-to-end recipes. |
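
The scorecard does not state how the overall 3.8 is derived from the sixteen dimension scores. As a sanity check, the sketch below transcribes the Score and Target columns and computes a plain unweighted mean plus the largest score-to-target gaps. The unweighted mean comes out at about 3.7, so the stated overall presumably reflects weighting or judgment that the table does not capture; the helper names here are illustrative and not part of the package.

```python
# Scores and targets transcribed from the Dimension Scorecard table.
DIMENSIONS = {
    "Intent and boundaries": (4.4, 5.0),
    "Package and API foundation": (4.2, 4.5),
    "Markitect profile contract ingress": (3.7, 4.5),
    "Graph and event ingress": (3.7, 4.5),
    "Phase domain model": (3.5, 4.5),
    "Profile execution planning": (4.0, 4.5),
    "Lifecycle planning and apply": (3.6, 4.5),
    "Activation planning": (3.8, 4.8),
    "Local persistence": (3.2, 4.5),
    "Policy, review, and audit": (3.5, 5.0),
    "Observability and operations": (3.3, 4.8),
    "Markitect interop": (3.7, 4.5),
    "Kontextual/Infospace interop": (3.1, 4.5),
    "Testing and evaluation": (3.8, 4.5),
    "Service readiness": (3.9, 4.8),
    "Developer experience": (3.8, 4.5),
}


def unweighted_mean(rows):
    """Average the current scores with equal weight per dimension."""
    return sum(score for score, _target in rows.values()) / len(rows)


def largest_gaps(rows, n=3):
    """Return the n dimensions furthest below their target, largest gap first."""
    gaps = {name: round(target - score, 1) for name, (score, target) in rows.items()}
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)[:n]


if __name__ == "__main__":
    print(f"unweighted mean: {unweighted_mean(DIMENSIONS):.2f}")
    for name, gap in largest_gaps(DIMENSIONS):
        print(f"{name}: {gap} below target")
```

Under equal weighting, the widest gaps are in policy/review/audit, observability, and Kontextual/Infospace interop, which is consistent with the operational sub-score trailing the local integration sub-score.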

Assessment

The project has a credible core. The runtime envelopes, policy/review model, profile-derived configuration, lifecycle rules, local persistence, fake external pack, and conformance helpers form a solid integration boundary.

The biggest optimization opportunity is not another broad feature burst. It is closing the gap between declared contracts and runnable operational behavior: the service contract advertises operations that the local runner only partly handles, persistence needs migration/durability semantics, and evaluation needs more than one small fixture family.

Recommended next step: create and execute PMEM-WP-0011, refinement hardening and operational readiness.

Highest-value tasks:

  • Bring service runner parity to the published operation catalog: package.compile, lifecycle.apply, and audit.query.
  • Add local-store schema migration and repair hardening, including atomic write behavior and migration diagnostics.
  • Expand evaluation fixtures across multiple profiles, graph shapes, policies, lifecycle rules, and activation budgets.
  • Add live-adapter readiness manifests so fake and future live packs can be tested by the same compatibility suite.
  • Add audit query and retention semantics that make policy/audit behavior inspectable after runtime operations.
  • Improve DX with troubleshooting, end-to-end recipes, and API compatibility notes.
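
The atomic write behavior named in the second task can be sketched with the usual write-to-temp-then-rename pattern. This is a minimal illustration under stated assumptions, not the project's persistence code: `atomic_write_json` is a hypothetical helper, and the store is assumed to be a single JSON file.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path, payload):
    """Write payload as JSON so readers see either the old file or the new one.

    Hypothetical helper: the temp file lives in the same directory as the
    target so that os.replace is a same-filesystem rename.
    """
    path = Path(path)
    fd, tmp_name = tempfile.mkstemp(
        dir=str(path.parent), prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle, sort_keys=True)
            handle.flush()
            os.fsync(handle.fileno())  # push file contents to disk before the rename
        os.replace(tmp_name, path)  # atomically swap the new file into place
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

A caller would use it as `atomic_write_json("graph.json", {"nodes": []})`. On POSIX systems, making the rename itself durable across power loss may additionally require an fsync on the containing directory, which this sketch leaves out.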

Score Movement Gates

Move overall score to 4.0 when:

  • Service runner handles every operation in SERVICE_OPERATIONS.
  • Audit query and lifecycle apply are covered through service contracts.
  • Local persistence has migration diagnostics.
  • Evaluation fixtures cover at least three profile/graph families.
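
The first gate lends itself to an automated parity check. The sketch below is an assumption-laden illustration: `SERVICE_OPERATIONS` is named in this scorecard, but its contents here are partly invented, and `LocalRunner` is a toy stand-in rather than the project's runner.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical catalog contents: package.compile, lifecycle.apply, and
# audit.query are named in this scorecard; "activation.plan" is a placeholder
# for whatever else the real catalog holds.
SERVICE_OPERATIONS = (
    "activation.plan",
    "package.compile",
    "lifecycle.apply",
    "audit.query",
)

Handler = Callable[[dict], dict]


class LocalRunner:
    """Toy stand-in for the local service runner (not the project's class)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, operation: str, handler: Handler) -> None:
        self._handlers[operation] = handler

    def supported_operations(self) -> frozenset:
        return frozenset(self._handlers)


def missing_operations(catalog: Iterable[str], runner: LocalRunner) -> List[str]:
    """List catalog operations that the runner has no handler for."""
    return sorted(set(catalog) - runner.supported_operations())


runner = LocalRunner()
runner.register("activation.plan", lambda request: {"ok": True})
print(missing_operations(SERVICE_OPERATIONS, runner))
# prints ['audit.query', 'lifecycle.apply', 'package.compile']
```

A test asserting that `missing_operations(...)` returns an empty list would make the gate fail loudly whenever the catalog grows ahead of the runner.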

Move overall score to 4.3+ when:

  • Live optional Markitect or Kontextual adapter can be used behind the same conformance suite as the fake pack.
  • Operational docs include a deployable service binding or a clear embedding recipe.
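
One way to put a live adapter "behind the same conformance suite as the fake pack" is to write the checks against a structural interface instead of a concrete class. The following is a sketch only: `CompilerAdapter`, `compile_package`, and the envelope keys are hypothetical names chosen for illustration, not the package's actual contract.

```python
from typing import List, Protocol


class CompilerAdapter(Protocol):
    """Hypothetical adapter interface; the real contract may differ."""

    name: str

    def compile_package(self, request: dict) -> dict: ...


class FakeCompiler:
    """Deterministic fake that echoes the request id in a fixed envelope."""

    name = "fake-compiler"

    def compile_package(self, request: dict) -> dict:
        return {"ok": True, "request_id": request["request_id"], "artifacts": []}


REQUIRED_KEYS = ("ok", "request_id", "artifacts")  # assumed envelope shape


def run_conformance(adapter: CompilerAdapter) -> List[str]:
    """Return a list of failure messages; an empty list means the adapter passes."""
    response = adapter.compile_package({"request_id": "r-1", "items": []})
    if not isinstance(response, dict):
        return [f"{adapter.name}: response is not a mapping"]
    failures = [
        f"{adapter.name}: missing key {key!r}"
        for key in REQUIRED_KEYS
        if key not in response
    ]
    if response.get("request_id") != "r-1":
        failures.append(f"{adapter.name}: request_id not echoed")
    return failures


print(run_conformance(FakeCompiler()))  # prints []
```

Because `run_conformance` only depends on the method shape, a future live adapter could be passed to the same function without changes to the checks.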

Move overall score to 4.7+ only when:

  • Live adapter behavior, telemetry, audit retention, migration, and evaluation gates are all exercised by repeatable tests or documented operator drills.