# Operator Readiness Runbook

Updated: 2026-05-19

This runbook covers the operational path for `phase-memory` without requiring
credentials in the default test suite.

## Modes

| Mode | Purpose | Credentials | Network |
| --- | --- | --- | --- |
| Local fixture | Default deterministic runtime and tests. | No | No |
| Live-shaped | Adapter manifests and behavior that model live services locally. | No | No |
| Credentialed live drill | Operator-provided smoke drill for real endpoints. | Yes, via env only | Optional |

Credentialed drills require:

- `PHASE_MEMORY_MARKITECT_URL`
- `PHASE_MEMORY_MARKITECT_TOKEN`
- `PHASE_MEMORY_KONTEXTUAL_URL`
- `PHASE_MEMORY_KONTEXTUAL_TOKEN`

Do not store those values in Git, workplans, progress logs, or release notes.
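
Before a drill, the shell can be checked for the four variables without echoing their values. The following is a minimal sketch; `missing_credential_env` is a hypothetical helper for illustration and is not part of `phase-memory`.

```python
import os

# The four variable names come from the list above.
REQUIRED_ENV = (
    "PHASE_MEMORY_MARKITECT_URL",
    "PHASE_MEMORY_MARKITECT_TOKEN",
    "PHASE_MEMORY_KONTEXTUAL_URL",
    "PHASE_MEMORY_KONTEXTUAL_TOKEN",
)


def missing_credential_env(env=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper: only names are returned, never values, so the
    result is safe to print or log.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV if not env.get(name)]
```

Calling `missing_credential_env()` with no argument inspects the current process environment; passing a dict makes the check easy to exercise in tests.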

## Service Startup

The deployable stdlib entrypoint is `phase-memory-service`.

Readiness check without listening:

```bash
phase-memory-service --check --store .phase-memory-local
```

Start the stdlib WSGI service:

```bash
phase-memory-service --host 127.0.0.1 --port 8080 --store .phase-memory-local
```

Routes:

- `GET /health`
- `GET /ready`
- `GET /contracts`
- `POST /operations/{operation}`
- `POST /operations` with `{"operation": "...", "payload": {...}}`
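
The generic `POST /operations` route can be exercised with the standard library alone. This is a sketch under assumptions: the `Content-Type` header, the empty payload, and the use of `package.compile` as the operation name are illustrative choices, and `build_operation_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_operation_request(base_url, operation, payload):
    """Build (but do not send) a POST to the generic /operations route."""
    body = json.dumps({"operation": operation, "payload": payload}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/operations",
        data=body,
        # Assumption: the service expects a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Host and port match the startup command above; the payload is a placeholder.
request = build_operation_request("http://127.0.0.1:8080", "package.compile", {})
# Sending is left to the operator, for example:
#   with urllib.request.urlopen(request, timeout=5) as response:
#       print(response.status)
```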

## Readiness Checks

Before accepting traffic:

1. Run `phase-memory-service --check`.
2. Verify `/ready` reports `ok: true`.
3. Verify `unsupported_operations` is empty.
4. Verify adapter diagnostics have no `error` severity.
5. Verify the public API snapshot test passes after any operation/export change.
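
Steps 2 through 4 can be folded into one check over the decoded `/ready` body. The sketch below assumes a payload shape that this runbook does not specify: a top-level `ok` flag, an `unsupported_operations` list, and a `diagnostics` list whose entries carry a `severity` field. `readiness_problems` is a hypothetical helper.

```python
def readiness_problems(ready):
    """Return human-readable reasons a /ready payload should block traffic.

    Assumed shape (not confirmed by the service contract):
    {"ok": bool, "unsupported_operations": [...], "diagnostics": [{"severity": ...}]}
    """
    problems = []
    if ready.get("ok") is not True:
        problems.append("ok is not true")
    if ready.get("unsupported_operations"):
        problems.append("unsupported_operations is not empty")
    errors = [d for d in ready.get("diagnostics", []) if d.get("severity") == "error"]
    if errors:
        problems.append(f"{len(errors)} diagnostic(s) with error severity")
    return problems
```

An empty list means the three payload checks passed; steps 1 and 5 still have to be run separately.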

## Migration Apply

Plan and apply local-store metadata migrations through the runtime:

```python
from phase_memory import RuntimeConfig, runtime_from_config

config = RuntimeConfig(local_store_path=".phase-memory-local")
runtime = runtime_from_config(config)
plan = runtime.plan_store_migration(source_ref=config.local_store_path)
result = runtime.apply_store_migration(
    plan["data"]["migration_plan"],
    actor="operator",
    source_ref=config.local_store_path,
)
```

Expected:

- no `error` diagnostics in the plan;
- `result["valid"] is True`;
- metadata is updated atomically;
- `audit.query` can find the `store.migration.apply` event.

Rollback:

- stop the service;
- restore the previous local store directory from backup;
- rerun `phase-memory-service --check`;
- rerun `runtime.repair_diagnostics()`.
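
The rollback path depends on a backup taken before the migration is applied. One way to take and restore that backup with the standard library is sketched below; `backup_local_store` and `restore_local_store` are hypothetical helpers, and the timestamped directory naming is an assumption rather than a `phase-memory` convention.

```python
import shutil
import time
from pathlib import Path


def backup_local_store(store_path, backup_root):
    """Copy the store directory to a timestamped directory and return its path."""
    source = Path(store_path)
    target = Path(backup_root) / f"{source.name}-{time.strftime('%Y%m%dT%H%M%S')}"
    # copytree creates missing parent directories and fails if target exists.
    shutil.copytree(source, target)
    return target


def restore_local_store(backup_path, store_path):
    """Replace the store directory with the backup copy.

    Run this only after the service has been stopped.
    """
    store = Path(store_path)
    if store.exists():
        shutil.rmtree(store)
    shutil.copytree(backup_path, store)
```

After restoring, continue with the remaining rollback steps: rerun `phase-memory-service --check` and `runtime.repair_diagnostics()`.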

## Audit Export And Retention

Plan retention:

```python
plan = runtime.audit_retention_plan(retention_days=30)
```

Apply retention:

```python
result = runtime.apply_audit_retention(plan["plan"])
```

Expected:

- eligible operation ids are pruned;
- `audit.retention.apply` is recorded after pruning;
- no retention apply happens when the sink reports unsupported behavior.
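
To make "eligible" concrete, the sketch below shows one plausible reading of a retention window: an event is eligible once it is older than `retention_days`. The event fields `operation_id` and `recorded_at` and the helper `eligible_operation_ids` are assumptions for illustration; the runtime's actual eligibility rule may differ.

```python
from datetime import datetime, timedelta, timezone


def eligible_operation_ids(events, retention_days, now=None):
    """Return operation ids of events recorded before the retention cutoff.

    Assumed event shape: {"operation_id": str, "recorded_at": aware datetime}.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    return [event["operation_id"] for event in events if event["recorded_at"] < cutoff]
```

Comparing the ids in a retention plan against a calculation like this is a quick sanity check before applying the plan.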

Export a trace batch:

```python
export = runtime.export_audit_events({"operation": "package.compile"})
```

Use export batches for operator review, not as a credential or secret store.

## Credentialed Drill

Run the credentialed smoke test only from an operator environment:

```bash
PHASE_MEMORY_MARKITECT_URL=... \
PHASE_MEMORY_MARKITECT_TOKEN=... \
PHASE_MEMORY_KONTEXTUAL_URL=... \
PHASE_MEMORY_KONTEXTUAL_TOKEN=... \
python3 -m pytest tests/test_credentialed_drills.py
```

The report redacts tokens and uses a credential fingerprint rather than
persisting secrets.
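
A credential fingerprint lets two reports be compared for "same token or not" without either report containing the token. The sketch below illustrates the idea with a truncated SHA-256 digest; this runbook does not say how `phase-memory` derives its fingerprint, so the algorithm, the length, and both helper names are assumptions.

```python
import hashlib


def credential_fingerprint(secret, length=12):
    """Return a short hex prefix of the SHA-256 digest of a secret."""
    digest = hashlib.sha256(secret.encode("utf-8")).hexdigest()
    return digest[:length]


def redact(report, secret_keys):
    """Replace values under secret_keys with fingerprints; keep other keys as-is."""
    return {
        key: {"fingerprint": credential_fingerprint(value)} if key in secret_keys else value
        for key, value in report.items()
    }
```

A plain hash is only as strong as the secret's entropy, so a keyed digest such as `hmac` with an operator-held key would be a reasonable hardening if tokens are short or guessable.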

Persist a redacted operator report from the same environment:

```python
from phase_memory import write_credentialed_operator_report

write_credentialed_operator_report("reports/credentialed-operator-report.json")
```

Run the credentialed telemetry retention drill when an operator has approved
use of the local fixture path or when the required credentials are present:

```python
from phase_memory import credentialed_telemetry_retention_drill

report = credentialed_telemetry_retention_drill(operator_approved_fixture=True)
```

The drill records old and new audit events, plans retention, applies pruning,
and reports retained/pruned operation ids without storing credential values.

## Managed Deployment Manifest

Build and validate a deployment manifest before handing it to platform-specific
packaging:

```python
from phase_memory import (
    ServiceAppConfig,
    managed_deployment_manifest,
    validate_managed_deployment_manifest,
)

manifest = managed_deployment_manifest(
    ServiceAppConfig(host="0.0.0.0", port=8080, local_store_path="/var/lib/phase-memory")
)
validation = validate_managed_deployment_manifest(manifest)
```

Required manifest features:

- `phase-memory-service` command entrypoint;
- `/health` liveness probe;
- `/ready` readiness probe;
- writable local-store mount;
- rollback checks that include `phase-memory-service --check` and
  `runtime.repair_diagnostics`.

## Evaluation Trend History

Persist trend artifacts into a history file after evaluation runs:

```python
from phase_memory import write_evaluation_trend_history

# `trend` is the trend artifact produced by an earlier evaluation run.
history = write_evaluation_trend_history("reports/evaluation-trend-history.json", trend)
```

Repeated writes of the same trend id do not duplicate the run.
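
The deduplication guarantee can be pictured as an append that is keyed on the trend id. The following is an illustrative sketch only, not the implementation of `write_evaluation_trend_history`: the `runs` list, the `trend_id` field, and the `append_trend_once` helper are all assumed names.

```python
import json
from pathlib import Path


def append_trend_once(history_path, trend):
    """Append a trend to a JSON history file unless its id is already present."""
    path = Path(history_path)
    history = json.loads(path.read_text()) if path.exists() else {"runs": []}
    if all(run["trend_id"] != trend["trend_id"] for run in history["runs"]):
        history["runs"].append(trend)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(history, indent=2))
    return history
```

Because the file is rewritten only when a new id appears, rerunning the same evaluation step leaves the history unchanged.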

## Troubleshooting Matrix

| Category | Diagnostic | Operator action |
| --- | --- | --- |
| Credentials | `credential_env_missing` | Set the four credential environment variables in the drill shell; do not write them to files. |
| Readiness | `unsupported_operation` | Run service contract and public API snapshot tests, then update dispatch or release notes. |
| Migrations | `store_migration_unsupported` | Use a file-backed local store or run repair diagnostics before accepting traffic. |
| Audit retention | `audit_retention_apply_unsupported` | Switch to a JSONL or telemetry audit sink with retention support, then rerun the retention drill. |
| Adapter manifest | `adapter_pack_manifest_invalid` | Regenerate and validate the adapter pack manifest before using the pack. |

## Compatibility Release Discipline

When public exports or service operations change:

1. Update `tests/fixtures/public-api-snapshot.json`.
2. Fill in `docs/release-note-template.md`.
3. Call out changed exports, changed service operations, migration needs, and
   operator action.
4. Link the workplan or decision that authorized the change.