# Operator Readiness Runbook

Updated: 2026-05-19

This runbook covers the operational path for `phase-memory` without requiring
credentials in the default test suite.
## Modes
| Mode | Purpose | Credentials | Network |
| --- | --- | --- | --- |
| Local fixture | Default deterministic runtime and tests. | No | No |
| Live-shaped | Adapter manifests and behavior that model live services locally. | No | No |
| Credentialed live drill | Operator-provided smoke drill for real endpoints. | Yes, via env only | Optional |

Credentialed drills require:

- `PHASE_MEMORY_MARKITECT_URL`
- `PHASE_MEMORY_MARKITECT_TOKEN`
- `PHASE_MEMORY_KONTEXTUAL_URL`
- `PHASE_MEMORY_KONTEXTUAL_TOKEN`

Do not store those values in Git, workplans, progress logs, or release notes.
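A preflight check can confirm the four variables are present before a drill starts. The following is a minimal stdlib sketch, not part of `phase-memory`; the helper name `missing_credential_env` is hypothetical, and it reports variable names only, never values.

```python
import os

# The four variable names required for credentialed drills.
REQUIRED_ENV_VARS = (
    "PHASE_MEMORY_MARKITECT_URL",
    "PHASE_MEMORY_MARKITECT_TOKEN",
    "PHASE_MEMORY_KONTEXTUAL_URL",
    "PHASE_MEMORY_KONTEXTUAL_TOKEN",
)


def missing_credential_env(environ=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for illustration; it never reads or prints values
    beyond checking that they are non-empty.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
example_env = {"PHASE_MEMORY_MARKITECT_URL": "https://example.invalid"}
print(missing_credential_env(example_env))
```

Calling `missing_credential_env()` with no argument inspects the current process environment, which suits a drill shell where the values were exported for that session only.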
## Service Startup
The deployable stdlib entrypoint is `phase-memory-service`.

Run a readiness check without starting the listener:
```bash
phase-memory-service --check --store .phase-memory-local
```
Start the stdlib WSGI service:
```bash
phase-memory-service --host 127.0.0.1 --port 8080 --store .phase-memory-local
```
Routes:
- `GET /health`
- `GET /ready`
- `GET /contracts`
- `POST /operations/{operation}`
- `POST /operations` with `{"operation": "...", "payload": {...}}`
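The generic `POST /operations` envelope can be built with the standard library alone. This is a sketch under assumptions: the base URL reuses the host and port from the startup example, `package.compile` is used as the operation name only because it appears elsewhere in this runbook, the empty payload is a placeholder, and the JSON `Content-Type` header is assumed rather than confirmed. `build_operation_request` is a hypothetical helper.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8080"  # host and port from the startup example


def build_operation_request(operation, payload, base_url=BASE_URL):
    """Build a POST /operations request carrying the generic envelope."""
    body = json.dumps({"operation": operation, "payload": payload}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/operations",
        data=body,
        # ASSUMPTION: the service expects a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Building the request does not touch the network.
request = build_operation_request("package.compile", {})

# To send it against a running service:
#     with urllib.request.urlopen(request, timeout=5) as response:
#         print(response.status, response.read().decode("utf-8"))
```

The path form `POST /operations/{operation}` would carry the operation name in the URL instead of the body.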
## Readiness Checks
Before accepting traffic:
1. Run `phase-memory-service --check`.
2. Verify `/ready` reports `ok: true`.
3. Verify `unsupported_operations` is empty.
4. Verify adapter diagnostics have no `error` severity.
5. Verify the public API snapshot test passes after any operation/export change.
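Steps 2 to 4 can be scripted against the parsed `/ready` response. The sketch below assumes a payload shape that this runbook does not spell out: `ok` and `unsupported_operations` are named above, while the `diagnostics` list and its `severity` key are assumptions. `readiness_problems` is a hypothetical helper.

```python
def readiness_problems(ready):
    """Return human-readable reasons why a /ready payload fails steps 2 to 4.

    An empty list means the payload passed all three checks.
    """
    problems = []
    if ready.get("ok") is not True:
        problems.append("ok is not true")

    unsupported = ready.get("unsupported_operations") or []
    if unsupported:
        problems.append("unsupported operations: " + ", ".join(unsupported))

    # ASSUMPTION: diagnostics is a list of dicts with a "severity" key.
    errors = [d for d in ready.get("diagnostics", []) if d.get("severity") == "error"]
    if errors:
        problems.append(f"{len(errors)} diagnostic(s) with error severity")
    return problems


# Example payload, invented for illustration.
sample = {"ok": True, "unsupported_operations": [], "diagnostics": []}
print(readiness_problems(sample))
```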
## Migration Apply
Plan and apply local-store metadata migrations through the runtime:
```python
from phase_memory import RuntimeConfig, runtime_from_config

config = RuntimeConfig(local_store_path=".phase-memory-local")
runtime = runtime_from_config(config)

# Plan first so the diagnostics can be reviewed before anything is applied.
plan = runtime.plan_store_migration(source_ref=config.local_store_path)

# Apply the planned migration as the operator.
result = runtime.apply_store_migration(
    plan["data"]["migration_plan"],
    actor="operator",
    source_ref=config.local_store_path,
)
```
Expected:

- no `error` diagnostics in the plan;
- `result["valid"] is True`;
- metadata is updated atomically;
- `audit.query` can find the `store.migration.apply` event.

Rollback:

- stop the service;
- restore the previous local store directory from backup;
- rerun `phase-memory-service --check`;
- rerun `runtime.repair_diagnostics()`.
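The rollback path depends on a backup taken before the migration. One way to take and restore such a backup is a plain directory copy, sketched below with the standard library. `backup_store` and `restore_store` are hypothetical helpers, not `phase-memory` APIs, and the sketch assumes the service is stopped while either one runs.

```python
import shutil
from pathlib import Path


def backup_store(store, backup):
    """Copy the store directory to a fresh backup location before migrating."""
    store, backup = Path(store), Path(backup)
    if backup.exists():
        # Refuse to overwrite an earlier backup.
        raise FileExistsError(f"backup already exists: {backup}")
    shutil.copytree(store, backup)
    return backup


def restore_store(store, backup):
    """Replace the store directory with the backup copy."""
    store, backup = Path(store), Path(backup)
    if not backup.is_dir():
        raise FileNotFoundError(f"no backup directory at {backup}")
    if store.exists():
        shutil.rmtree(store)
    shutil.copytree(backup, store)
    return store
```

After `restore_store(".phase-memory-local", backup_path)`, the remaining rollback steps (the `--check` run and repair diagnostics) still apply.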
## Audit Export And Retention
Plan retention:
```python
plan = runtime.audit_retention_plan(retention_days=30)
```
Apply retention:
```python
result = runtime.apply_audit_retention(plan["plan"])
```
Expected:

- eligible operation ids are pruned;
- `audit.retention.apply` is recorded after pruning;
- no retention apply happens when the sink reports unsupported behavior.

Export a trace batch:
```python
export = runtime.export_audit_events({"operation": "package.compile"})
```
Use export batches for operator review, not as a credential or secret store.
## Credentialed Drill
Run the credentialed smoke test only from an operator environment:
```bash
PHASE_MEMORY_MARKITECT_URL=... \
PHASE_MEMORY_MARKITECT_TOKEN=... \
PHASE_MEMORY_KONTEXTUAL_URL=... \
PHASE_MEMORY_KONTEXTUAL_TOKEN=... \
python3 -m pytest tests/test_credentialed_drills.py
```
The report redacts tokens and uses a credential fingerprint rather than
persisting secrets.
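To illustrate what fingerprinting instead of persisting can look like, here is a minimal sketch using a truncated SHA-256 digest. The algorithm, the 12-character length, and both helper names are assumptions for illustration; this runbook does not state how `phase-memory` derives its fingerprint.

```python
import hashlib


def credential_fingerprint(token, length=12):
    """Return a short, non-reversible identifier for a secret value.

    ASSUMPTION: truncated SHA-256; the real report may use another scheme.
    """
    digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
    return digest[:length]


def redact(report, secret_keys=("token",)):
    """Replace secret values with fingerprints so the report is safe to persist."""
    return {
        key: {"fingerprint": credential_fingerprint(value)} if key in secret_keys else value
        for key, value in report.items()
    }


# Example with made-up values.
print(redact({"url": "https://example.invalid", "token": "not-a-real-token"}))
```

A plain hash is only suitable for high-entropy tokens; for short or guessable secrets a keyed hash such as HMAC would be the safer choice.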
Persist a redacted operator report from the same environment:
```python
from phase_memory import write_credentialed_operator_report
write_credentialed_operator_report("reports/credentialed-operator-report.json")
```
Run the credentialed telemetry retention drill only when an operator has
approved use of the local fixture path or when the required credentials are
present:
```python
from phase_memory import credentialed_telemetry_retention_drill
report = credentialed_telemetry_retention_drill(operator_approved_fixture=True)
```
The drill records old and new audit events, plans retention, applies pruning,
and reports retained/pruned operation ids without storing credential values.
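The plan-then-apply flow the drill exercises can be pictured with a small in-memory model. This is a sketch of the general technique only: the event fields `operation_id` and `recorded_at`, the plan keys, and both function names are hypothetical and do not describe the library's internal format.

```python
from datetime import datetime, timedelta, timezone


def plan_retention(events, retention_days, now=None):
    """Split events into prunable and retained operation ids by age."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    prune = [e["operation_id"] for e in events if e["recorded_at"] < cutoff]
    retain = [e["operation_id"] for e in events if e["recorded_at"] >= cutoff]
    return {"cutoff": cutoff, "prune": prune, "retain": retain}


def apply_retention(events, plan):
    """Return only the events whose operation ids are not in the prune list."""
    doomed = set(plan["prune"])
    return [e for e in events if e["operation_id"] not in doomed]


# One old and one recent event, mirroring what the drill records.
reference = datetime(2026, 5, 19, tzinfo=timezone.utc)
sample_events = [
    {"operation_id": "old", "recorded_at": reference - timedelta(days=45)},
    {"operation_id": "new", "recorded_at": reference - timedelta(days=1)},
]
sample_plan = plan_retention(sample_events, retention_days=30, now=reference)
remaining = apply_retention(sample_events, sample_plan)
```

Separating the plan from the apply step lets an operator review which ids would be pruned before anything is deleted.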
## Managed Deployment Manifest
Build and validate a deployment manifest before handing it to platform-specific
packaging:
```python
from phase_memory import managed_deployment_manifest, validate_managed_deployment_manifest
from phase_memory import ServiceAppConfig

manifest = managed_deployment_manifest(
    ServiceAppConfig(host="0.0.0.0", port=8080, local_store_path="/var/lib/phase-memory")
)
validation = validate_managed_deployment_manifest(manifest)
```
Required manifest features:
- `phase-memory-service` command entrypoint;
- `/health` liveness probe;
- `/ready` readiness probe;
- writable local-store mount;
- rollback checks that include `phase-memory-service --check` and
  `runtime.repair_diagnostics`.
## Evaluation Trend History
Persist trend artifacts into a history file after evaluation runs:
```python
from phase_memory import write_evaluation_trend_history

# `trend` is the trend artifact produced by an evaluation run (not shown here).
history = write_evaluation_trend_history("reports/evaluation-trend-history.json", trend)
```
Repeated writes of the same trend id do not duplicate the run.
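The no-duplicate behavior is a form of idempotent append keyed on the trend id. The sketch below shows that technique with a plain JSON file; the `runs` list, the `trend_id` key, and the `append_trend` helper are assumptions for illustration and do not describe the file format that `write_evaluation_trend_history` produces.

```python
import json
from pathlib import Path


def append_trend(history_path, trend):
    """Append a trend run to a JSON history file unless its id is already present."""
    path = Path(history_path)
    history = json.loads(path.read_text()) if path.exists() else {"runs": []}

    known_ids = {run["trend_id"] for run in history["runs"]}
    if trend["trend_id"] not in known_ids:
        history["runs"].append(trend)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(history, indent=2))
    return history
```

Writing the same trend twice leaves the file unchanged after the first call, so evaluation jobs can be retried without inflating the history.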
## Troubleshooting Matrix
| Category | Diagnostic | Operator action |
| --- | --- | --- |
| Credentials | `credential_env_missing` | Set the four credential environment variables in the drill shell; do not write them to files. |
| Readiness | `unsupported_operation` | Run service contract and public API snapshot tests, then update dispatch or release notes. |
| Migrations | `store_migration_unsupported` | Use a file-backed local store or run repair diagnostics before accepting traffic. |
| Audit retention | `audit_retention_apply_unsupported` | Switch to a JSONL or telemetry audit sink with retention support, then rerun the retention drill. |
| Adapter manifest | `adapter_pack_manifest_invalid` | Regenerate and validate the adapter pack manifest before using the pack. |
## Compatibility Release Discipline
When public exports or service operations change:
1. Update `tests/fixtures/public-api-snapshot.json`.
2. Fill in `docs/release-note-template.md`.
3. Call out changed exports, changed service operations, migration needs, and
   operator action.
4. Link the workplan or decision that authorized the change.