Implement PMEM-WP-0015 credentialed live pilot with ops-warden routing.

Add credential routing advisories via warden route/access, live pilot evidence helpers, managed deployment pilot probes, evaluation trend regression gates, and expanded troubleshooting. Update operator runbook and maturity scorecard.
2026-07-02 23:24:35 +02:00
parent bff90ec1ed
commit 29f893b905
15 changed files with 913 additions and 38 deletions
--- a/docs/maturity-scorecard.md
+++ b/docs/maturity-scorecard.md
@@ -1,6 +1,6 @@
 # Phase Memory Maturity Scorecard

-Updated: 2026-05-19
+Updated: 2026-07-02

 ## Purpose

@@ -26,18 +26,19 @@ to 5.

 ## Current Score

-Overall maturity: **4.4 / 5**
+Overall maturity: **4.5 / 5**

 Two sub-scores make the result easier to reason about:

 - Local integration maturity: **4.7 / 5**
-- Operational maturity: **4.2 / 5**
+- Operational maturity: **4.4 / 5** (tooling verified; live endpoint evidence optional)

 The repo is strong as a deterministic local library and service-boundary core.
-It now has credential-safe operator artifacts, managed deployment manifest
-validation, persisted evaluation trend histories, and a troubleshooting matrix.
-It is not yet production-operational because real endpoint and managed platform
-evidence still requires an approved operator environment.
+It now has ops-warden credential routing advisories, live pilot evidence
+helpers, managed deployment pilot probes, evaluation trend regression gates,
+and an expanded troubleshooting matrix. Verified live endpoint and managed
+platform evidence still depends on an approved operator environment running
+`write_live_pilot_evidence` with real credentials.

 ## Dimension Scorecard

@@ -53,12 +54,12 @@ evidence still requires an approved operator environment.
 | Activation planning | 4.0 | 4.8 | Budgeted activation, selections, package request, graph neighborhoods, paths, ranking, metrics, multi-scenario evaluation fixtures | Wire semantic-index-assisted retrieval into runtime planning. |
 | Local persistence | 4.0 | 4.5 | File-backed graph store, JSONL event log, audit sink, atomic JSON writes, executable metadata migrations, migration audit, export, repair diagnostics | Add compaction/retention utilities and stronger corruption recovery. |
 | Policy, review, and audit | 4.5 | 5.0 | Operation points, review records, audit schema, queryable/exportable audit sinks, retention plans and apply, denials, redaction, fake/live-shaped policy/audit adapters, credential-safe telemetry retention drill | Add live policy adapter boundary and external telemetry pruning evidence. |
-| Observability and operations | 4.5 | 4.8 | Health report, readiness report, config diagnostics, adapter status, service binding, stdlib service entrypoint, managed deployment manifest validation, operator runbook, fake/live-shaped telemetry audit sinks | Pilot the managed package in an operator deployment target. |
+| Observability and operations | 4.6 | 4.8 | Health report, readiness report, config diagnostics, adapter status, service binding, stdlib service entrypoint, managed deployment manifest validation, managed deployment pilot probes, live pilot evidence helpers, ops-warden credential routing advisories, operator runbook, fake/live-shaped telemetry audit sinks | Collect verified live pilot evidence on the operator deployment target. |
 | Markitect interop | 4.2 | 4.5 | Local validation, package request/response envelopes, fake/live-shaped compiler fixtures, credential-gated drill contract, redacted operator reports | Add credentialed Markitect compiler execution and schema drift suite. |
 | Kontextual/Infospace interop | 4.0 | 4.5 | Delegation envelope, fake/live-shaped runtime registry, credential-gated drill contract, redacted operator reports, activation quality report fixture, adapter compatibility manifests | Add credentialed Kontextual execution and broader Infospace restart reports. |
-| Testing and evaluation | 4.6 | 4.7 | Deterministic tests over runtime, CLI, adapters, policy, activation, lifecycle, service, fakes, live-shaped packs, credential skip gates, API snapshots, evaluation threshold/trend reports, persisted trend history | Add larger regression corpus and make trend history a release gate. |
+| Testing and evaluation | 4.7 | 4.8 | Deterministic tests over runtime, CLI, adapters, policy, activation, lifecycle, service, fakes, live-shaped packs, credential skip gates, API snapshots, evaluation threshold/trend reports, persisted trend history, evaluation trend regression gate | Add larger regression corpus and verified live trend history from operator runs. |
 | Service readiness | 4.7 | 4.8 | Service contracts, full local runner parity, framework-neutral service binding, WSGI adapter, stdlib service entrypoint, health/readiness, config, adapter conformance, managed deployment manifest validation | Pilot managed deployment packaging on the target platform. |
-| Developer experience | 4.6 | 4.7 | README, package map, CLI examples, persistence/policy/interop/service/lifecycle/fake-pack docs, operational recipe, operator runbook, API compatibility docs, release-note template, troubleshooting matrix | Refine troubleshooting from real operator feedback. |
+| Developer experience | 4.7 | 4.8 | README, package map, CLI examples, persistence/policy/interop/service/lifecycle/fake-pack docs, operational recipe, operator runbook with ops-warden routing and live pilot workflow, API compatibility docs, release-note template, expanded troubleshooting matrix | Refine troubleshooting from verified live operator feedback. |

 ## Assessment

@@ -69,10 +70,10 @@ and live-shaped external pack manifests, credential-gated drills, service
 binding and stdlib entrypoint, API snapshots, release discipline, and
 conformance helpers form a solid integration boundary.

-The biggest optimization opportunity is now evidence, not scaffolding:
-run the credentialed reports against real services, pilot the managed manifest
-on a target platform, and make persisted trend history part of the operator
-release gate.
+The biggest optimization opportunity is now verified live evidence, not
+scaffolding: run `write_live_pilot_evidence` with credentials obtained through
+ops-warden routing on the target platform and promote only when the evaluation
+regression gate passes.

 ## Completed Refinement Workplan

@@ -121,19 +122,23 @@ release gate.
 - operator troubleshooting matrix coverage for credential, readiness,
  migration, audit retention, and adapter-manifest failures.

+`PMEM-WP-0015` moved the score from 4.4 to 4.5 by adding:
+
+- ops-warden credential routing advisories that never persist secret values;
+- `write_live_pilot_evidence` for credential-safe pilot artifact collection;
+- managed deployment pilot probes for `/health` and `/ready` without a listener;
+- evaluation trend regression gate helpers for operator release review;
+- troubleshooting rows for credential routing, deployment, evaluation, and pilot
+  failure modes.
+
 ## Recommended Next Refinement

-Create and execute `PMEM-WP-0015`: credentialed live pilot and deployment
-evidence.
+Collect verified live pilot evidence on the operator deployment target:

-Highest-value tasks:
-
-- Run the redacted credentialed report against real Markitect/Kontextual
-  endpoints in an operator environment.
-- Pilot the managed deployment manifest on the target platform.
-- Capture external telemetry retention evidence.
-- Promote trend history into a release/regression gate.
-- Refine troubleshooting from actual operator feedback.
+- Run `write_live_pilot_evidence` with credentials obtained via
+  `warden access`.
+- Confirm managed deployment probes on the target platform, not only locally.
+- Archive redacted pilot artifacts through normal repo progress channels.

 ## Score Movement Gates

@@ -164,4 +169,5 @@ Achieved overall score **4.4+** when:
 Move overall score to **4.7+** only when:

 - Live adapter behavior, telemetry, audit retention, migration, and evaluation
-  gates are all exercised by repeatable tests or documented operator drills.
+  gates are all exercised by repeatable tests or documented operator drills
+  with verified live evidence, not only local pilot tooling.
--- a/docs/operator-readiness-runbook.md
+++ b/docs/operator-readiness-runbook.md
@@ -1,6 +1,6 @@
 # Operator Readiness Runbook

-Updated: 2026-05-19
+Updated: 2026-07-02

 This runbook covers the operational path for `phase-memory` without requiring
 credentials in the default test suite.
@@ -20,7 +20,16 @@ Credentialed drills require:
 - `PHASE_MEMORY_KONTEXTUAL_URL`
 - `PHASE_MEMORY_KONTEXTUAL_TOKEN`

-Do not store those values in Git, workplans, progress logs, or release notes.
+Obtain credentials through ops-warden routing — ops-warden does not vend
+secret values:
+
+```bash
+warden route find "phase-memory markitect kontextual api token" --json
+warden access "phase-memory markitect kontextual api token" --json
+```
+
+Export the returned values into the drill shell only. Do not store those values
+in Git, workplans, progress logs, or release notes.

 ## Service Startup

@@ -117,6 +126,15 @@ Use export batches for operator review, not as a credential or secret store.

 ## Credentialed Drill

+Resolve credential routing before running live drills:
+
+```python
+from phase_memory import resolve_credentialed_environ, warden_credential_routing_advisory
+
+advisory = warden_credential_routing_advisory()
+status = resolve_credentialed_environ()
+```
+
 Run the credentialed smoke test only from an operator environment:

 ```bash
@@ -150,6 +168,26 @@ report = credentialed_telemetry_retention_drill(operator_approved_fixture=True)
 The drill records old and new audit events, plans retention, applies pruning,
 and reports retained/pruned operation ids without storing credential values.

+## Live Pilot Evidence
+
+Collect credential-safe pilot artifacts for operator review:
+
+```python
+from phase_memory import write_live_pilot_evidence
+
+write_live_pilot_evidence("reports/live-pilot", environ=os.environ)
+```
+
+Artifacts include:
+
+- `live-pilot-report.json` — aggregate pilot status and live_evidence flags
+- `credentialed-operator-report.json` — redacted smoke report
+- `managed-deployment-pilot.json` — manifest validation and probe results
+- `telemetry-retention-evidence.json` — retention apply audit trace
+- `evaluation-trend-history.json` — persisted trend artifacts
+- `evaluation-regression-gate.json` — operator regression gate
+- `credential-routing-advisory.json` — ops-warden routing without secrets
+
 ## Managed Deployment Manifest

 Build and validate a deployment manifest before handing it to platform-specific
@@ -186,6 +224,19 @@ history = write_evaluation_trend_history("reports/evaluation-trend-history.json"

 Repeated writes of the same trend id do not duplicate the run.

+Gate promotion on evaluation regressions:
+
+```python
+from phase_memory import evaluation_trend_regression_gate, load_evaluation_trend_history
+
+history = load_evaluation_trend_history("reports/evaluation-trend-history.json")
+gate = evaluation_trend_regression_gate(history)
+```
+
+Compare the latest artifact metrics in `evaluation-trend-history.json` against
+the previous run id. Block promotion when `metric_regressions` or
+`threshold_failures` are non-empty.
+
 ## Troubleshooting Matrix

 | Category | Diagnostic | Operator action |
@@ -195,6 +246,10 @@ Repeated writes of the same trend id do not duplicate the run.
 | Migrations | `store_migration_unsupported` | Use a file-backed local store or run repair diagnostics before accepting traffic. |
 | Audit retention | `audit_retention_apply_unsupported` | Switch to a JSONL or telemetry audit sink with retention support, then rerun the retention drill. |
 | Adapter manifest | `adapter_pack_manifest_invalid` | Regenerate and validate the adapter pack manifest before using the pack. |
+| Credential routing | `warden_cli_unavailable` | Install warden from ops-warden, then run `warden route find` before exporting PHASE_MEMORY_* variables. |
+| Deployment | `managed_deployment_probe_failed` | Run `phase-memory-service --check` and validate managed deployment manifest probes before promotion. |
+| Evaluation | `evaluation_metric_regressed` | Compare latest and previous trend artifacts; inspect scenario diagnostics before release. |
+| Pilot | `pilot_credentialed_env_missing` | Obtain credentials through ops-warden routing and rerun `write_live_pilot_evidence`. |

 ## Compatibility Release Discipline