Implement PMEM-WP-0015 credentialed live pilot with ops-warden routing.

Add credential routing advisories via warden route/access, live pilot evidence
helpers, managed deployment pilot probes, evaluation trend regression gates,
and expanded troubleshooting. Update operator runbook and maturity scorecard.
2026-07-02 23:24:35 +02:00
parent bff90ec1ed
commit 29f893b905
15 changed files with 913 additions and 38 deletions

@@ -1,6 +1,6 @@
# Phase Memory Maturity Scorecard
Updated: 2026-05-19
Updated: 2026-07-02
## Purpose
@@ -26,18 +26,19 @@ to 5.
## Current Score
Overall maturity: **4.4 / 5**
Overall maturity: **4.5 / 5**
Two sub-scores make the result easier to reason about:
- Local integration maturity: **4.7 / 5**
- Operational maturity: **4.2 / 5**
- Operational maturity: **4.4 / 5** (tooling verified; live endpoint evidence optional)
The repo is strong as a deterministic local library and service-boundary core.
It now has credential-safe operator artifacts, managed deployment manifest
validation, persisted evaluation trend histories, and a troubleshooting matrix.
It is not yet production-operational because real endpoint and managed platform
evidence still requires an approved operator environment.
It now has ops-warden credential routing advisories, live pilot evidence
helpers, managed deployment pilot probes, evaluation trend regression gates,
and an expanded troubleshooting matrix. Verified live endpoint and managed
platform evidence still depends on an approved operator environment running
`write_live_pilot_evidence` with real credentials.
## Dimension Scorecard
@@ -53,12 +54,12 @@ evidence still requires an approved operator environment.
| Activation planning | 4.0 | 4.8 | Budgeted activation, selections, package request, graph neighborhoods, paths, ranking, metrics, multi-scenario evaluation fixtures | Wire semantic-index-assisted retrieval into runtime planning. |
| Local persistence | 4.0 | 4.5 | File-backed graph store, JSONL event log, audit sink, atomic JSON writes, executable metadata migrations, migration audit, export, repair diagnostics | Add compaction/retention utilities and stronger corruption recovery. |
| Policy, review, and audit | 4.5 | 5.0 | Operation points, review records, audit schema, queryable/exportable audit sinks, retention plans and apply, denials, redaction, fake/live-shaped policy/audit adapters, credential-safe telemetry retention drill | Add live policy adapter boundary and external telemetry pruning evidence. |
| Observability and operations | 4.5 | 4.8 | Health report, readiness report, config diagnostics, adapter status, service binding, stdlib service entrypoint, managed deployment manifest validation, operator runbook, fake/live-shaped telemetry audit sinks | Pilot the managed package in an operator deployment target. |
| Observability and operations | 4.6 | 4.8 | Health report, readiness report, config diagnostics, adapter status, service binding, stdlib service entrypoint, managed deployment manifest validation, managed deployment pilot probes, live pilot evidence helpers, ops-warden credential routing advisories, operator runbook, fake/live-shaped telemetry audit sinks | Collect verified live pilot evidence on the operator deployment target. |
| Markitect interop | 4.2 | 4.5 | Local validation, package request/response envelopes, fake/live-shaped compiler fixtures, credential-gated drill contract, redacted operator reports | Add credentialed Markitect compiler execution and schema drift suite. |
| Kontextual/Infospace interop | 4.0 | 4.5 | Delegation envelope, fake/live-shaped runtime registry, credential-gated drill contract, redacted operator reports, activation quality report fixture, adapter compatibility manifests | Add credentialed Kontextual execution and broader Infospace restart reports. |
| Testing and evaluation | 4.6 | 4.7 | Deterministic tests over runtime, CLI, adapters, policy, activation, lifecycle, service, fakes, live-shaped packs, credential skip gates, API snapshots, evaluation threshold/trend reports, persisted trend history | Add larger regression corpus and make trend history a release gate. |
| Testing and evaluation | 4.7 | 4.8 | Deterministic tests over runtime, CLI, adapters, policy, activation, lifecycle, service, fakes, live-shaped packs, credential skip gates, API snapshots, evaluation threshold/trend reports, persisted trend history, evaluation trend regression gate | Add larger regression corpus and verified live trend history from operator runs. |
| Service readiness | 4.7 | 4.8 | Service contracts, full local runner parity, framework-neutral service binding, WSGI adapter, stdlib service entrypoint, health/readiness, config, adapter conformance, managed deployment manifest validation | Pilot managed deployment packaging on the target platform. |
| Developer experience | 4.6 | 4.7 | README, package map, CLI examples, persistence/policy/interop/service/lifecycle/fake-pack docs, operational recipe, operator runbook, API compatibility docs, release-note template, troubleshooting matrix | Refine troubleshooting from real operator feedback. |
| Developer experience | 4.7 | 4.8 | README, package map, CLI examples, persistence/policy/interop/service/lifecycle/fake-pack docs, operational recipe, operator runbook with ops-warden routing and live pilot workflow, API compatibility docs, release-note template, expanded troubleshooting matrix | Refine troubleshooting from verified live operator feedback. |
## Assessment
@@ -69,10 +70,10 @@ and live-shaped external pack manifests, credential-gated drills, service
binding and stdlib entrypoint, API snapshots, release discipline, and
conformance helpers form a solid integration boundary.
The biggest optimization opportunity is now evidence, not scaffolding:
run the credentialed reports against real services, pilot the managed manifest
on a target platform, and make persisted trend history part of the operator
release gate.
The biggest optimization opportunity is now verified live evidence, not
scaffolding: run `write_live_pilot_evidence` with credentials obtained through
ops-warden routing on the target platform and promote only when the evaluation
regression gate passes.
## Completed Refinement Workplan
@@ -121,19 +122,23 @@ release gate.
- operator troubleshooting matrix coverage for credential, readiness,
migration, audit retention, and adapter-manifest failures.
`PMEM-WP-0015` moved the score from 4.4 to 4.5 by adding:
- ops-warden credential routing advisories that never persist secret values;
- `write_live_pilot_evidence` for credential-safe pilot artifact collection;
- managed deployment pilot probes for `/health` and `/ready` without a listener;
- evaluation trend regression gate helpers for operator release review;
- troubleshooting rows for credential routing, deployment, evaluation, and pilot
failure modes.
## Recommended Next Refinement
Create and execute `PMEM-WP-0015`: credentialed live pilot and deployment
evidence.
Collect verified live pilot evidence on the operator deployment target:
Highest-value tasks:
- Run the redacted credentialed report against real Markitect/Kontextual
endpoints in an operator environment.
- Pilot the managed deployment manifest on the target platform.
- Capture external telemetry retention evidence.
- Promote trend history into a release/regression gate.
- Refine troubleshooting from actual operator feedback.
- Run `write_live_pilot_evidence` with credentials obtained via
`warden access`.
- Confirm managed deployment probes on the target platform, not only locally.
- Archive redacted pilot artifacts through normal repo progress channels.
## Score Movement Gates
@@ -164,4 +169,5 @@ Achieved overall score **4.4+** when:
Move overall score to **4.7+** only when:
- Live adapter behavior, telemetry, audit retention, migration, and evaluation
gates are all exercised by repeatable tests or documented operator drills.
gates are all exercised by repeatable tests or documented operator drills
with verified live evidence, not only local pilot tooling.

@@ -1,6 +1,6 @@
# Operator Readiness Runbook
Updated: 2026-05-19
Updated: 2026-07-02
This runbook covers the operational path for `phase-memory` without requiring
credentials in the default test suite.
@@ -20,7 +20,16 @@ Credentialed drills require:
- `PHASE_MEMORY_KONTEXTUAL_URL`
- `PHASE_MEMORY_KONTEXTUAL_TOKEN`
Do not store those values in Git, workplans, progress logs, or release notes.
Obtain credentials through ops-warden routing — ops-warden does not vend
secret values:
```bash
warden route find "phase-memory markitect kontextual api token" --json
warden access "phase-memory markitect kontextual api token" --json
```
Export the returned values into the drill shell only. Do not store those values
in Git, workplans, progress logs, or release notes.
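Before starting a drill, it can help to confirm that the variables are present without echoing their values. The following is a minimal sketch, not part of `phase-memory`: the helper name `missing_credential_vars` is hypothetical, and the tuple lists only the two Kontextual variables visible above, so the remaining required names would need to be added.

```python
import os
from typing import Iterable, Mapping

# Assumption: only the names visible in this excerpt; extend with the full list.
REQUIRED_VARS = (
    "PHASE_MEMORY_KONTEXTUAL_URL",
    "PHASE_MEMORY_KONTEXTUAL_TOKEN",
)


def missing_credential_vars(
    environ: Mapping[str, str], required: Iterable[str] = REQUIRED_VARS
) -> list[str]:
    """Return the names of required variables that are unset or blank.

    Only names are returned, never values, so the result is safe to log.
    """
    return [name for name in required if not environ.get(name, "").strip()]


# Usage: inspect the current drill shell without printing any secret.
missing = missing_credential_vars(os.environ)
print("missing credential variables:", ", ".join(missing) or "none")
```

Reporting names only keeps the check consistent with the rule that credential values never reach logs or reports.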
## Service Startup
@@ -117,6 +126,15 @@ Use export batches for operator review, not as a credential or secret store.
## Credentialed Drill
Resolve credential routing before running live drills:
```python
from phase_memory import resolve_credentialed_environ, warden_credential_routing_advisory
advisory = warden_credential_routing_advisory()
status = resolve_credentialed_environ()
```
Run the credentialed smoke test only from an operator environment:
```bash
@@ -150,6 +168,26 @@ report = credentialed_telemetry_retention_drill(operator_approved_fixture=True)
The drill records old and new audit events, plans retention, applies pruning,
and reports retained/pruned operation ids without storing credential values.
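The plan step of such a drill can be pictured as a split of audit events around a cutoff. This is an illustrative sketch only, not the library's implementation: the function name `plan_retention` and the event fields `operation_id` and `recorded_at` are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Iterable, Mapping


def plan_retention(
    events: Iterable[Mapping[str, Any]], now: datetime, max_age: timedelta
) -> dict[str, list[str]]:
    """Split events into retained and pruned operation ids by age.

    Events recorded at or after the cutoff (now - max_age) are retained;
    older events are marked for pruning. Only ids are reported.
    """
    cutoff = now - max_age
    plan: dict[str, list[str]] = {"retained": [], "pruned": []}
    for event in events:
        bucket = "retained" if event["recorded_at"] >= cutoff else "pruned"
        plan[bucket].append(event["operation_id"])
    return plan


# Usage with one old and one new event, mirroring the drill description.
now = datetime(2026, 7, 2, tzinfo=timezone.utc)
events = [
    {"operation_id": "op-old", "recorded_at": now - timedelta(days=60)},
    {"operation_id": "op-new", "recorded_at": now - timedelta(days=1)},
]
plan = plan_retention(events, now, max_age=timedelta(days=30))
```

Because the plan carries operation ids only, it can be archived for review without exposing event payloads.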
## Live Pilot Evidence
Collect credential-safe pilot artifacts for operator review:
```python
import os
from phase_memory import write_live_pilot_evidence
write_live_pilot_evidence("reports/live-pilot", environ=os.environ)
```
Artifacts include:
- `live-pilot-report.json` — aggregate pilot status and live_evidence flags
- `credentialed-operator-report.json` — redacted smoke report
- `managed-deployment-pilot.json` — manifest validation and probe results
- `telemetry-retention-evidence.json` — retention apply audit trace
- `evaluation-trend-history.json` — persisted trend artifacts
- `evaluation-regression-gate.json` — operator regression gate
- `credential-routing-advisory.json` — ops-warden routing without secrets
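After a pilot run, an operator may want to confirm that every listed artifact was produced before archiving. A minimal sketch, assuming the files are written directly under the output directory; the helper name `missing_pilot_artifacts` is hypothetical and not part of `phase-memory`.

```python
from pathlib import Path

# File names as listed in this runbook.
EXPECTED_ARTIFACTS = (
    "live-pilot-report.json",
    "credentialed-operator-report.json",
    "managed-deployment-pilot.json",
    "telemetry-retention-evidence.json",
    "evaluation-trend-history.json",
    "evaluation-regression-gate.json",
    "credential-routing-advisory.json",
)


def missing_pilot_artifacts(output_dir: str) -> list[str]:
    """Return expected artifact names that are not regular files in output_dir."""
    root = Path(output_dir)
    return [name for name in EXPECTED_ARTIFACTS if not (root / name).is_file()]
```

Calling `missing_pilot_artifacts("reports/live-pilot")` returns an empty list when the set is complete under that assumed flat layout.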
## Managed Deployment Manifest
Build and validate a deployment manifest before handing it to platform-specific
@@ -186,6 +224,19 @@ history = write_evaluation_trend_history("reports/evaluation-trend-history.json"
Repeated writes of the same trend id do not duplicate the run.
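The idempotent behavior can be pictured as an append keyed on the trend id. This sketch is an illustration under assumed data shapes (a `runs` list whose entries carry a `trend_id` field) and a hypothetical helper name; it is not how `write_evaluation_trend_history` is actually implemented.

```python
from typing import Any


def record_trend_run(history: dict[str, Any], run: dict[str, Any]) -> bool:
    """Append run to history["runs"] unless its trend id is already present.

    Returns True when the run was appended and False when it was a repeat.
    """
    runs = history.setdefault("runs", [])
    if any(existing["trend_id"] == run["trend_id"] for existing in runs):
        return False
    runs.append(run)
    return True


# Usage: the second write with the same trend id leaves the history unchanged.
history: dict[str, Any] = {}
first = record_trend_run(history, {"trend_id": "trend-1", "metrics": {}})
repeat = record_trend_run(history, {"trend_id": "trend-1", "metrics": {}})
```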
Gate promotion on evaluation regressions:
```python
from phase_memory import evaluation_trend_regression_gate, load_evaluation_trend_history
history = load_evaluation_trend_history("reports/evaluation-trend-history.json")
gate = evaluation_trend_regression_gate(history)
```
Compare the latest artifact metrics in `evaluation-trend-history.json` against
those of the previous run. Block promotion when `metric_regressions` or
`threshold_failures` is non-empty.
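The promotion rule can be expressed as a small check. This is a sketch under a stated assumption: that the gate result can be read as a mapping with `metric_regressions` and `threshold_failures` entries. The actual return type of `evaluation_trend_regression_gate` is not shown here, and `promotion_blockers` is a hypothetical helper.

```python
from typing import Any, Mapping

BLOCKING_FIELDS = ("metric_regressions", "threshold_failures")


def promotion_blockers(gate: Mapping[str, Any]) -> dict[str, list[Any]]:
    """Return the non-empty blocking fields of a gate result.

    An empty dict means neither field reports a problem, so promotion
    is not blocked by this rule.
    """
    blockers: dict[str, list[Any]] = {}
    for field in BLOCKING_FIELDS:
        entries = list(gate.get(field) or [])
        if entries:
            blockers[field] = entries
    return blockers
```

An operator script could exit non-zero whenever `promotion_blockers(gate)` is non-empty, which keeps the release decision tied to the two fields named above.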
## Troubleshooting Matrix
| Category | Diagnostic | Operator action |
@@ -195,6 +246,10 @@ Repeated writes of the same trend id do not duplicate the run.
| Migrations | `store_migration_unsupported` | Use a file-backed local store or run repair diagnostics before accepting traffic. |
| Audit retention | `audit_retention_apply_unsupported` | Switch to a JSONL or telemetry audit sink with retention support, then rerun the retention drill. |
| Adapter manifest | `adapter_pack_manifest_invalid` | Regenerate and validate the adapter pack manifest before using the pack. |
| Credential routing | `warden_cli_unavailable` | Install `warden` from ops-warden, then run `warden route find` before exporting `PHASE_MEMORY_*` variables. |
| Deployment | `managed_deployment_probe_failed` | Run `phase-memory-service --check` and validate managed deployment manifest probes before promotion. |
| Evaluation | `evaluation_metric_regressed` | Compare latest and previous trend artifacts; inspect scenario diagnostics before release. |
| Pilot | `pilot_credentialed_env_missing` | Obtain credentials through ops-warden routing and rerun `write_live_pilot_evidence`. |
## Compatibility Release Discipline