Operator Readiness Runbook
Updated: 2026-07-02
This runbook covers the operational path for phase-memory without requiring
credentials in the default test suite.
Modes
| Mode | Purpose | Credentials | Network |
|---|---|---|---|
| Local fixture | Default deterministic runtime and tests. | No | No |
| Live-shaped | Adapter manifests and behavior that model live services locally. | No | No |
| Credentialed live drill | Operator-provided smoke drill for real endpoints. | Yes, via env only | Optional |
Credentialed drills require:
- PHASE_MEMORY_MARKITECT_URL
- PHASE_MEMORY_MARKITECT_TOKEN
- PHASE_MEMORY_KONTEXTUAL_URL
- PHASE_MEMORY_KONTEXTUAL_TOKEN
Obtain credentials through ops-warden routing — ops-warden does not vend secret values:
warden route find "phase-memory markitect kontextual api token" --json
warden access "phase-memory markitect kontextual api token" --json
Export the returned values into the drill shell only. Do not store those values in Git, workplans, progress logs, or release notes.
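Before starting a drill, it can help to confirm that the shell actually has all four variables set. The following is a minimal sketch, not part of phase-memory: the helper name `missing_credential_vars` is hypothetical, and it reports variable names only, never values.

```python
import os

# The four variables this runbook requires for credentialed drills.
REQUIRED_VARS = (
    "PHASE_MEMORY_MARKITECT_URL",
    "PHASE_MEMORY_MARKITECT_TOKEN",
    "PHASE_MEMORY_KONTEXTUAL_URL",
    "PHASE_MEMORY_KONTEXTUAL_TOKEN",
)


def missing_credential_vars(environ=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper: only names are returned, so the result is safe to log.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

An empty list means the drill shell is ready; any returned names can be logged safely because no values are read out.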
Service Startup
The deployable stdlib entrypoint is phase-memory-service.
Readiness check without listening:
phase-memory-service --check --store .phase-memory-local
Start the stdlib WSGI service:
phase-memory-service --host 127.0.0.1 --port 8080 --store .phase-memory-local
Routes:
- GET /health
- GET /ready
- GET /contracts
- POST /operations/{operation}
- POST /operations with {"operation": "...", "payload": {...}}
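As an illustration of the POST /operations envelope, here is a minimal stdlib client sketch. The helper names are hypothetical, and the Content-Type header and the JSON response body are assumptions; only the route and the envelope keys come from the list above.

```python
import json
import urllib.request


def build_operation_request(base_url, operation, payload):
    """Build a POST /operations request carrying the operation/payload envelope."""
    body = json.dumps({"operation": operation, "payload": payload}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/operations",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed, not confirmed
        method="POST",
    )


def send(request, timeout=5.0):
    """Send the request and decode the response, assuming it is JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Against the local service started above, a call might look like `send(build_operation_request("http://127.0.0.1:8080", "package.compile", {}))`; the empty payload is a placeholder, not a documented request shape.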
Readiness Checks
Before accepting traffic:
- Run phase-memory-service --check.
- Verify /ready reports ok: true.
- Verify unsupported_operations is empty.
- Verify adapter diagnostics have no error severity.
- Verify the public API snapshot test passes after any operation/export change.
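The first four checks can be automated once the /ready response has been decoded. The sketch below assumes a payload shape that this runbook does not confirm: an `ok` flag, an `unsupported_operations` list, and a `diagnostics` list whose entries carry a `severity` field. The function name is hypothetical.

```python
def readiness_problems(ready_payload):
    """Return reasons the service should not accept traffic yet.

    Assumed payload shape (not confirmed by the runbook):
    {"ok": bool, "unsupported_operations": [...], "diagnostics": [{"severity": ...}]}
    """
    problems = []
    if ready_payload.get("ok") is not True:
        problems.append("ready endpoint did not report ok: true")
    unsupported = ready_payload.get("unsupported_operations") or []
    if unsupported:
        problems.append("unsupported operations: " + ", ".join(sorted(unsupported)))
    errors = [
        diagnostic
        for diagnostic in ready_payload.get("diagnostics") or []
        if diagnostic.get("severity") == "error"
    ]
    if errors:
        problems.append(f"{len(errors)} adapter diagnostic(s) with error severity")
    return problems
```

An empty result means the automated checks passed; the public API snapshot test still has to be run separately.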
Migration Apply
Plan and apply local-store metadata migrations through the runtime:
from phase_memory import RuntimeConfig, runtime_from_config
config = RuntimeConfig(local_store_path=".phase-memory-local")
runtime = runtime_from_config(config)
plan = runtime.plan_store_migration(source_ref=config.local_store_path)
result = runtime.apply_store_migration(
    plan["data"]["migration_plan"],
    actor="operator",
    source_ref=config.local_store_path,
)
Expected:
- no error diagnostics in the plan;
- result["valid"] is True;
- metadata is updated atomically;
- audit.query can find the store.migration.apply event.
Rollback:
- stop the service;
- restore the previous local store directory from backup;
- rerun phase-memory-service --check;
- rerun runtime.repair_diagnostics().
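The rollback steps depend on a backup taken before the migration is applied. One possible way to snapshot and restore the store directory with the standard library is sketched below. Both helpers are hypothetical and are not provided by phase-memory; they assume the local store is a plain directory that is not being written to while it is copied.

```python
import shutil
import time
from pathlib import Path


def backup_store(store_path, backup_root):
    """Copy the local store directory to a timestamped backup and return its path."""
    source = Path(store_path)
    if not source.is_dir():
        raise FileNotFoundError(f"local store not found: {source}")
    stamp = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime())
    target = Path(backup_root) / f"{source.name}-{stamp}"
    shutil.copytree(source, target)  # raises if the target already exists
    return target


def restore_store(backup_path, store_path):
    """Replace the local store with a backup. Stop the service before calling."""
    destination = Path(store_path)
    if destination.exists():
        shutil.rmtree(destination)
    shutil.copytree(backup_path, destination)
```

After `restore_store`, continue with the remaining rollback steps: rerun the readiness check and the repair diagnostics.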
Audit Export And Retention
Plan retention:
plan = runtime.audit_retention_plan(retention_days=30)
Apply retention:
result = runtime.apply_audit_retention(plan["plan"])
Expected:
- eligible operation ids are pruned;
- audit.retention.apply is recorded after pruning;
- no retention apply happens when the sink reports unsupported behavior.
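To make "eligible" concrete, the sketch below shows one plausible reading of a retention window: an event is eligible once it is older than `retention_days`. The event shape and the function name are assumptions for illustration; the runtime's actual eligibility rules are not described in this runbook.

```python
from datetime import datetime, timedelta, timezone


def eligible_operation_ids(events, retention_days, now=None):
    """Return ids of events older than the retention window, oldest first.

    Assumed event shape: {"operation_id": str, "recorded_at": aware datetime}.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    expired = [event for event in events if event["recorded_at"] < cutoff]
    expired.sort(key=lambda event: event["recorded_at"])
    return [event["operation_id"] for event in expired]
```

Passing `now` explicitly keeps the calculation deterministic, which is useful when reviewing a plan before it is applied.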
Export a trace batch:
export = runtime.export_audit_events({"operation": "package.compile"})
Use export batches for operator review, not as a credential or secret store.
Credentialed Drill
Resolve credential routing before running live drills:
from phase_memory import resolve_credentialed_environ, warden_credential_routing_advisory
advisory = warden_credential_routing_advisory()
status = resolve_credentialed_environ()
Run the credentialed smoke test only from an operator environment:
PHASE_MEMORY_MARKITECT_URL=... \
PHASE_MEMORY_MARKITECT_TOKEN=... \
PHASE_MEMORY_KONTEXTUAL_URL=... \
PHASE_MEMORY_KONTEXTUAL_TOKEN=... \
python3 -m pytest tests/test_credentialed_drills.py
The report redacts tokens and uses a credential fingerprint rather than persisting secrets.
Persist a redacted operator report from the same environment:
from phase_memory import write_credentialed_operator_report
write_credentialed_operator_report("reports/credentialed-operator-report.json")
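For intuition about how a report can identify a credential without persisting it, here is an illustrative sketch using a truncated SHA-256 digest. This is not phase-memory's fingerprint scheme, which the runbook does not describe; both function names and the flat report shape are hypothetical.

```python
import hashlib


def credential_fingerprint(token, length=12):
    """Return a short, non-reversible identifier for a token.

    Illustrative only: a truncated SHA-256 hex digest with a scheme prefix.
    """
    digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
    return "sha256:" + digest[:length]


def redact(report, secret_keys=("token",)):
    """Return a copy of a flat report dict with secret values fingerprinted."""
    return {
        key: credential_fingerprint(value) if key in secret_keys else value
        for key, value in report.items()
    }
```

A plain hash like this is only reasonable for high-entropy tokens; short or guessable secrets would call for a keyed construction such as HMAC.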
Run the credentialed telemetry retention drill only when an operator has approved use of the local fixture path or when the required credentials are present:
from phase_memory import credentialed_telemetry_retention_drill
report = credentialed_telemetry_retention_drill(operator_approved_fixture=True)
The drill records old and new audit events, plans retention, applies pruning, and reports retained/pruned operation ids without storing credential values.
Live Pilot Evidence
Collect credential-safe pilot artifacts for operator review:
import os
from phase_memory import write_live_pilot_evidence
write_live_pilot_evidence("reports/live-pilot", environ=os.environ)
Artifacts include:
- live-pilot-report.json: aggregate pilot status and live_evidence flags
- credentialed-operator-report.json: redacted smoke report
- managed-deployment-pilot.json: manifest validation and probe results
- telemetry-retention-evidence.json: retention apply audit trace
- evaluation-trend-history.json: persisted trend artifacts
- evaluation-regression-gate.json: operator regression gate
- credential-routing-advisory.json: ops-warden routing without secrets
Managed Deployment Manifest
Build and validate a deployment manifest before handing it to platform-specific packaging:
from phase_memory import managed_deployment_manifest, validate_managed_deployment_manifest
from phase_memory import ServiceAppConfig
manifest = managed_deployment_manifest(
    ServiceAppConfig(host="0.0.0.0", port=8080, local_store_path="/var/lib/phase-memory")
)
validation = validate_managed_deployment_manifest(manifest)
Required manifest features:
- phase-memory-service command entrypoint;
- /health liveness probe;
- /ready readiness probe;
- writable local-store mount;
- rollback checks that include phase-memory-service --check and runtime.repair_diagnostics.
Evaluation Trend History
Persist trend artifacts into a history file after evaluation runs:
from phase_memory import write_evaluation_trend_history
history = write_evaluation_trend_history("reports/evaluation-trend-history.json", trend)
Repeated writes of the same trend id do not duplicate the run.
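The no-duplicate behavior can be pictured as an idempotent append keyed on the trend id. The sketch below is an assumption-laden illustration, not the library's implementation: the `{"runs": [...]}` layout, the `trend_id` key, and the function name are all hypothetical.

```python
import json
from pathlib import Path


def append_trend(history_path, trend):
    """Append a trend run to a JSON history file unless its id is already present.

    Assumed shapes: history is {"runs": [...]} and each trend has a "trend_id".
    """
    path = Path(history_path)
    history = json.loads(path.read_text()) if path.exists() else {"runs": []}
    known_ids = {run["trend_id"] for run in history["runs"]}
    if trend["trend_id"] not in known_ids:
        history["runs"].append(trend)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(history, indent=2, sort_keys=True))
    return history
```

Keying on the id rather than comparing full contents means a rerun of the same evaluation leaves the file untouched.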
Gate promotion on evaluation regressions:
from phase_memory import evaluation_trend_regression_gate, load_evaluation_trend_history
history = load_evaluation_trend_history("reports/evaluation-trend-history.json")
gate = evaluation_trend_regression_gate(history)
Compare the latest artifact metrics in evaluation-trend-history.json against
those of the previous run id. Block promotion when metric_regressions or
threshold_failures is non-empty.
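The blocking rule is simple enough to encode in a promotion script. The following is a minimal sketch that assumes the gate result is a dict exposing the two lists named above; the actual return shape of evaluation_trend_regression_gate is not shown in this runbook, and `promotion_allowed` is a hypothetical helper.

```python
def promotion_allowed(gate):
    """Return (allowed, reasons) for a regression gate result.

    Assumes the gate is a dict with "metric_regressions" and
    "threshold_failures" lists; missing keys are treated as empty.
    """
    reasons = []
    for field in ("metric_regressions", "threshold_failures"):
        entries = gate.get(field) or []
        if entries:
            reasons.append(f"{field}: {len(entries)}")
    return (not reasons, reasons)
```

Returning the reasons alongside the decision gives the operator something to record in the release notes when a promotion is blocked.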
Troubleshooting Matrix
| Category | Diagnostic | Operator action |
|---|---|---|
| Credentials | credential_env_missing | Set the four credential environment variables in the drill shell; do not write them to files. |
| Readiness | unsupported_operation | Run service contract and public API snapshot tests, then update dispatch or release notes. |
| Migrations | store_migration_unsupported | Use a file-backed local store or run repair diagnostics before accepting traffic. |
| Audit retention | audit_retention_apply_unsupported | Switch to a JSONL or telemetry audit sink with retention support, then rerun the retention drill. |
| Adapter manifest | adapter_pack_manifest_invalid | Regenerate and validate the adapter pack manifest before using the pack. |
| Credential routing | warden_cli_unavailable | Install warden from ops-warden, then run warden route find before exporting PHASE_MEMORY_* variables. |
| Deployment | managed_deployment_probe_failed | Run phase-memory-service --check and validate managed deployment manifest probes before promotion. |
| Evaluation | evaluation_metric_regressed | Compare latest and previous trend artifacts; inspect scenario diagnostics before release. |
| Pilot | pilot_credentialed_env_missing | Obtain credentials through ops-warden routing and rerun write_live_pilot_evidence. |
Compatibility Release Discipline
When public exports or service operations change:
- Update tests/fixtures/public-api-snapshot.json.
- Fill in docs/release-note-template.md.
- Call out changed exports, changed service operations, migration needs, and operator action.
- Link the workplan or decision that authorized the change.
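When filling in the release note, a diff of the recorded exports against the current ones gives the list of changes to call out. The sketch below assumes the snapshot JSON contains an `exports` list; the real layout of the fixture may differ, and `export_changes` is a hypothetical helper.

```python
import json


def export_changes(snapshot_text, current_exports):
    """Compare a snapshot's export list with the current export names.

    Assumes the snapshot JSON has an "exports" list (layout not confirmed).
    """
    recorded = set(json.loads(snapshot_text)["exports"])
    current = set(current_exports)
    return {
        "added": sorted(current - recorded),
        "removed": sorted(recorded - current),
    }
```

Removed names usually deserve the most attention in the release note, since they are the changes most likely to require operator action.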