From a3129afa4209839ff3078ea61faed0ec80de7287 Mon Sep 17 00:00:00 2001
From: tegwick
Date: Mon, 1 Jun 2026 23:38:26 +0200
Subject: [PATCH] Added PRD for what we want to do

---
 spec/ProductRequirementsDefinition.md | 624 ++++++++++++++++++++++++++
 1 file changed, 624 insertions(+)
 create mode 100644 spec/ProductRequirementsDefinition.md

diff --git a/spec/ProductRequirementsDefinition.md b/spec/ProductRequirementsDefinition.md
new file mode 100644
index 0000000..6d25ef0
--- /dev/null
+++ b/spec/ProductRequirementsDefinition.md
@@ -0,0 +1,624 @@
+# Audit Core Product Requirements Definition
+
+## 1. Product Summary
+
+Audit Core is a standalone audit fabric for multi-tenant platforms. It provides
+an API-driven control plane and data plane for ingesting audit events,
+normalizing them into a common envelope, routing them to durable retention and
+search backends, and proving that retained audit batches have not been silently
+altered or omitted.
+
+Audit Core must integrate well with NetKingdom, Railiance, OpenBao, Kubernetes,
+identity providers, and application runtimes, but it must remain independently
+deployable and usable without NetKingdom.
+
+## 2. Goals
+
+- Provide a trustworthy audit substrate for platform, tenant, application, and
+  security-scope events.
+- Support enterprise and critical-infrastructure expectations for retention,
+  integrity, export, tenant isolation, and operator accountability.
+- Give upstream platforms an API layer for setup, validation, source
+  registration, policy assignment, and integration status.
+- Separate immutable archive from hot search so cost, retention, and
+  performance can be tuned independently.
+- Allow tenants to opt into audit capability tiers for tenant-owned scopes
+  while keeping platform-control-plane audit mandatory.
+- Start with a small inspectable implementation that can grow without hidden
+  coupling to the first deployment.
+
+## 3. Non-Goals
+
+- Audit Core is not a generic observability platform for all logs, metrics, and
+  traces.
+- Audit Core is not a SIEM in its first release, though it should support SIEM
+  export and later detection integrations.
+- Audit Core does not replace OpenBao audit devices, Kubernetes audit policy,
+  identity-provider logs, or application-local audit semantics.
+- Audit Core does not provision tenants or identities itself.
+- Audit Core does not make authorization decisions for upstream applications.
+- Audit Core must not become a dumping ground for plaintext secrets, root
+  tokens, unseal shares, OTP seeds, private keys, or passwords.
+
+## 4. Users And Actors
+
+### Platform Operator
+
+Runs Audit Core, configures sources, validates ingestion, manages retention and
+archive policy, and performs incident or compliance export.
+
+### Security Operator
+
+Investigates privileged operations, break-glass actions, suspicious access, and
+retention integrity. Needs cross-source correlation and tamper-evidence.
+
+### Tenant Admin
+
+Configures tenant-owned audit tiers, reviews tenant-scoped events, and exports
+records according to entitlement.
+
+### Platform Integrator
+
+Integrates Audit Core with NetKingdom, Railiance, OpenBao, Kubernetes, or an
+application. Needs stable APIs, setup contracts, conformance fixtures, and
+clear failure modes.
+
+### Auditor
+
+Reviews retained event batches, manifests, export packages, and integrity
+proofs. Needs non-secret evidence and reproducible verification procedures.
+
+## 5. Core Product Concepts
+
+### Tenant
+
+A customer, organization, environment, or administrative domain whose events
+can be isolated, retained, searched, exported, billed, and governed.
+
+### Scope
+
+A subdivision of audit ownership or policy. Examples:
+
+- platform-control-plane;
+- tenant workload;
+- application;
+- security operation;
+- source-specific stream.
+
+### Source
+
+A concrete audit emitter such as OpenBao, Kubernetes, KeyCape, privacyIDEA,
+LLDAP, Authelia, an application, a policy engine, or a backup/restore tool.
+
+### Stream
+
+A typed flow of events from a source under a retention and routing policy.
+
+### Event Envelope
+
+The normalized representation of one audit event inside Audit Core.
+
+### Batch
+
+A durable group of normalized events written to archive and assigned a manifest.
+
+### Manifest
+
+Metadata that describes a batch, including hashes, ordering, counts, source
+range, timestamps, schema version, and optional signature.
+
+### Hot Search
+
+Shorter-retention searchable storage such as Loki or OpenSearch.
+
+### Immutable Archive
+
+The long-retention evidence record, preferably object storage with object lock
+or another append-only/tamper-resistant backend.
+
+## 6. Capability Tiers
+
+Audit Core must support policy profiles that can implement these tiers:
+
+| Tier | Description | Intended Use |
+| --- | --- | --- |
+| Platform Mandatory | Always-on audit for shared platform control-plane events | OpenBao, IAM, Kubernetes admin, break-glass |
+| Archive Only | Durable retained batches and manifests, limited query | Low-cost compliance retention |
+| Search | Archive plus hot search backend | Operational investigation |
+| Security | Search plus alerting, detection, and SOC export | Enterprise security operations |
+| Dedicated | Stronger tenant isolation with dedicated buckets, indexes, keys, or runtimes | Regulated or high-isolation tenants |
+
+Tenant-owned scopes may opt into or out of optional tiers. Platform mandatory
+scopes cannot be disabled by tenant preference.
+
+## 7. Functional Requirements
+
+### 7.1 Source Registration
+
+- The system must provide an API to register audit sources.
+- A source must have a stable id, source type, owner, scope mapping, and
+  ingestion credential policy.
+- A source must declare the event schemas or adapters it emits.
+- A source must expose health/readiness state.
+- Source registration must be possible without NetKingdom.
+- NetKingdom-specific source metadata must be optional extension data.
+
+### 7.2 Tenant And Scope Registration
+
+- The system must provide APIs to create and update tenants and scopes.
+- Each scope must have an ownership class: platform, tenant, application,
+  security, or source.
+- A scope must bind to one or more retention and routing policies.
+- Tenant and scope ids must be stable and suitable for partitioning archive and
+  search data.
+- The model must permit static local tenants for standalone deployments.
+- The model must permit NetKingdom-managed tenants through integration APIs.
+
+### 7.3 Event Ingestion
+
+- The system must expose an HTTP ingestion API.
+- The system should later support collector-native inputs where useful, such as
+  file tailing, OTLP logs, or batch import.
+- Ingestion must accept normalized events.
+- Ingestion should support source-specific raw events through adapters.
+- Ingestion must reject events missing required source, tenant/scope, timestamp,
+  action, resource, or result metadata unless a configured adapter can supply
+  them.
+- Ingestion failures must be visible through metrics, health, and audit events.
+- Ingestion must not print or persist ingestion secrets in logs.
+
+### 7.4 Event Normalization
+
+- The system must normalize incoming events into the Audit Core event envelope.
+- The envelope must preserve source-specific extension fields.
+- The envelope must include enough metadata for cross-source correlation.
+- The normalizer must distinguish platform-scope events from tenant-scope
+  events.
+- The normalizer must classify sensitivity and retention class.
+- The normalizer must hash or reference original payloads where full payload
+  retention is not allowed.
+
+### 7.5 Routing
+
+- The system must route normalized events according to policy.
+- Routing must support archive-only mode.
+- Routing must support archive plus hot-search mode.
+- Routing should support later SIEM export.
+- Routing must support per-tenant and per-scope policy.
+- Routing failures must be visible and must not be silently ignored.
+- Policy must define whether ingestion should fail closed or fail open when a
+  sink is unavailable.
+
+### 7.6 Immutable Archive
+
+- The system must write audit batches to a durable archive.
+- The first archive implementation should support local filesystem for
+  development and S3-compatible object storage for production.
+- Archive paths must partition by tenant/scope/source/date or an equivalent
+  queryable structure.
+- Production archive must support encryption at rest.
+- Production archive should support WORM/Object Lock or an equivalent
+  immutability control.
+- Archive writes must produce batch manifests.
+- Archive retention must be controlled by explicit retention policy.
+
+### 7.7 Tamper Evidence
+
+- The system must generate a manifest for each archive batch.
+- A manifest must include:
+  - batch id;
+  - tenant id;
+  - scope id;
+  - source id;
+  - schema version;
+  - event count;
+  - first and last event timestamps;
+  - previous batch hash where applicable;
+  - content hash;
+  - original payload hash or source offsets where applicable;
+  - writer identity;
+  - creation timestamp.
+- The system should support signed manifests.
+- The system should support hash chains per stream.
+- The system should later support optional external transparency or ledger
+  backends such as immudb, Rekor, or another append-only service.
+
+### 7.8 Hot Search
+
+- The system must define a sink interface for hot-search backends.
+- The first hot-search backend should be selected explicitly during design:
+  Loki for cost-effective log search, or OpenSearch for richer query and
+  enterprise dashboards.
+- Hot search retention may be shorter than archive retention.
+- Hot search must retain tenant/scope/source labels or equivalent partitioning.
+- Hot search access must respect tenant and operator entitlements.
+- Hot search must not be treated as the immutable evidence record.
+
+### 7.9 Export
+
+- The system must provide controlled export jobs.
+- Exports must be scoped by tenant, scope, time range, source, and entitlement.
+- Export packages must include manifests and enough metadata to verify
+  integrity.
+- Export jobs must be audited.
+- Export access must support approval workflows later.
+
+### 7.10 Readiness And Validation
+
+- The system must provide a readiness API.
+- Readiness must report source registration, ingestion state, archive state,
+  hot-search state, manifest state, retention policy, and degraded conditions.
+- Readiness must support platform gates such as "is OpenBao audit durable
+  enough for production secret custody?"
+- Readiness output must be machine-readable for NetKingdom setup UI
+  integration.
+- Readiness must be useful without NetKingdom.
+
+### 7.11 Setup API
+
+- The system must expose a setup/capabilities API describing required steps.
+- The setup API should return:
+  - required environment;
+  - missing backends;
+  - configured sinks;
+  - source templates;
+  - policy templates;
+  - validation commands or endpoints;
+  - current integration state.
+- NetKingdom should be able to drive an operator setup flow from this API.
+
+### 7.12 OpenBao Integration
+
+- The system must support ingesting OpenBao audit logs.
+- The first OpenBao path may read from file audit output exported or tailed from
+  `/openbao/audit/openbao-audit.log`.
+- The integration must preserve OpenBao request and response correlation.
+- The integration must recognize that OpenBao HMACs sensitive strings in audit
+  logs and must not attempt to reverse or weaken that protection.
+- The integration must tag OpenBao events as platform-control-plane by default
+  unless policy maps a stream to a tenant scope.
+- The integration must provide a readiness check suitable for NetKingdom's
+  OpenBao production gate.
+
+### 7.13 Kubernetes Integration
+
+- The system should support Kubernetes audit or workload audit ingestion.
+- Kubernetes sources must distinguish cluster platform events from tenant
+  workload events.
+- The integration should support namespace and service-account mapping.
+- The integration should not require every Kubernetes cluster to run
+  NetKingdom.
+
+### 7.14 NetKingdom Integration
+
+- The system must provide APIs that let NetKingdom register tenants, scopes,
+  source mappings, and policy templates.
+- The system must accept NetKingdom IAM Profile claims when configured.
+- The system must not require NetKingdom-specific claims in standalone mode.
+- The system must expose status suitable for the NetKingdom control surface.
+- NetKingdom setup must not require Audit Core to import NetKingdom internal
+  modules.
+
+## 8. Non-Functional Requirements
+
+### 8.1 Security
+
+- All production APIs must require authentication.
+- Authorization must enforce tenant and scope boundaries.
+- Ingestion credentials must be scoped per source.
+- Operator privileges must be separable from tenant-admin privileges.
+- Secrets must not be printed in logs, manifests, exports, or error responses.
+- Event payloads must support sensitivity classification.
+
+### 8.2 Reliability
+
+- Ingestion should tolerate downstream hot-search outages when archive policy
+  allows.
+- Archive failure handling must be explicit and policy-driven.
+- Collectors or shippers should support persistent buffering.
+- The system must expose degraded states rather than silently dropping events.
+- Batch writes must be idempotent or safely retryable.
+
+### 8.3 Scalability
+
+- Tenant, scope, and source partitioning must be part of the data model from
+  the beginning.
+- The archive layout must scale to many tenants and streams.
+- The ingestion API should be horizontally scalable.
+- Hot-search backends must be replaceable as volume grows.
+
+### 8.4 Portability
+
+- The system must run locally for development.
+- The system should deploy cleanly to Kubernetes.
+- Storage backends must be pluggable.
+- NetKingdom integration must be optional.
+
+### 8.5 Operability
+
+- The system must expose health and readiness endpoints.
+- The system must expose metrics for ingestion, rejection, archive writes,
+  manifest generation, sink failures, retries, and lag.
+- The system must include operator runbooks for backup, restore, degraded
+  ingestion, export, and tenant onboarding.
+- The system must provide conformance tests for the API and event envelope.
+
+### 8.6 Compliance And Evidence
+
+- Retention policies must be explicit and reviewable.
+- Archive batches and manifests must be exportable.
+- Export jobs must be auditable.
+- Integrity verification must be repeatable by an auditor without privileged
+  access to live systems where possible.
+
+## 9. API Requirements
+
+The first API version should be `/v1`.
+
+Required endpoints:
+
+| Endpoint | Method | Purpose |
+| --- | --- | --- |
+| `/v1/capabilities` | GET | Describe enabled features, backends, and integration templates |
+| `/v1/readiness` | GET | Report setup and runtime readiness |
+| `/v1/tenants` | POST | Register a tenant |
+| `/v1/tenants/{tenant_id}` | GET | Read tenant metadata |
+| `/v1/scopes` | POST | Register a scope |
+| `/v1/sources` | POST | Register a source |
+| `/v1/sources/{source_id}/status` | GET | Source ingestion and routing status |
+| `/v1/events` | POST | Ingest one or more normalized events |
+| `/v1/policies/{scope_id}` | PUT | Set retention and routing policy |
+| `/v1/evidence/batches` | GET | List retained batches by filter |
+| `/v1/evidence/batches/{batch_id}/manifest` | GET | Retrieve a batch manifest |
+| `/v1/exports` | POST | Create a controlled export job |
+| `/v1/exports/{export_id}` | GET | Read export status and metadata |
+
+The API must be described with OpenAPI. Each endpoint must have at least one
+conformance test before it is treated as stable.
+
+## 10. Event Envelope Requirements
+
+The event envelope must include:
+
+```json
+{
+  "schema_version": "audit-core.event.v1",
+  "event_id": "uuid-or-source-stable-id",
+  "source": {
+    "id": "openbao-platform",
+    "type": "openbao",
+    "instance": "railiance01/openbao"
+  },
+  "tenant": {
+    "id": "platform",
+    "ownership": "platform"
+  },
+  "scope": {
+    "id": "platform-control-plane",
+    "type": "security"
+  },
+  "time": {
+    "emitted_at": "2026-06-01T20:00:00Z",
+    "observed_at": "2026-06-01T20:00:01Z"
+  },
+  "actor": {
+    "subject": "uid=platform-root,ou=people,dc=netkingdom,dc=local",
+    "issuer": "https://kc.coulomb.social",
+    "groups": ["net-kingdom-admins"],
+    "auth_context": ["mfa"]
+  },
+  "action": "openbao.audit.list",
+  "resource": {
+    "type": "openbao.sys.audit",
+    "id": "file/"
+  },
+  "result": {
+    "outcome": "success",
+    "reason": "authorized"
+  },
+  "correlation": {
+    "request_id": "source-request-id",
+    "trace_id": "optional-trace-id"
+  },
+  "classification": {
+    "sensitivity": "restricted",
+    "retention_class": "platform-mandatory"
+  },
+  "payload": {
+    "normalized": {},
+    "source_hash": "sha256:..."
+  }
+}
+```
+
+The envelope may evolve, but compatibility rules must be explicit.
+
+## 11. Storage Requirements
+
+### Development Storage
+
+- Local filesystem archive.
+- Local manifest files.
+- Optional in-memory or SQLite metadata store.
+- No external NetKingdom dependency.
+
+### Production Storage
+
+- S3-compatible archive backend.
+- Support for encryption at rest.
+- Support for object lock or equivalent immutability where available.
+- Pluggable metadata store.
+- Optional hot-search backend.
+- Optional external tamper-evidence backend.
+
+## 12. Deployment Requirements
+
+Minimum deployment profiles:
+
+- **local-dev**: single process, local archive, local auth bypass or static dev
+  token.
+- **single-cluster**: API plus archive writer, Kubernetes deployment, object
+  storage archive.
+- **search-enabled**: single-cluster plus Loki or OpenSearch sink.
+- **enterprise**: HA API, object-lock archive, hot search, signed manifests,
+  per-tenant policy, backup/restore runbooks.
+
+## 13. Integration Requirements
+
+### NetKingdom Setup
+
+Audit Core must provide enough API information for NetKingdom to:
+
+- discover whether Audit Core exists;
+- register or validate tenants/scopes;
+- configure source mappings;
+- show readiness in a setup UI;
+- display degraded states;
+- record non-secret evidence;
+- link to Audit Core export/evidence endpoints.
+
+### Railiance Platform
+
+Railiance may operate the runtime backends:
+
+- object storage;
+- hot-search backend;
+- Kubernetes deployment;
+- backup/restore substrate;
+- OpenBao audit source access.
+
+Audit Core must not assume those services are always owned by Railiance.
+
+### OpenBao
+
+OpenBao integration must start as a source adapter, not a hard dependency.
+Audit Core must work even when OpenBao is absent.
+
+## 14. Validation And Testing Requirements
+
+Required test categories:
+
+- event envelope schema validation;
+- API conformance;
+- source registration behavior;
+- tenant/scope policy behavior;
+- archive batch creation;
+- manifest hash verification;
+- OpenBao sample audit ingestion;
+- readiness responses for healthy and degraded configurations;
+- export package integrity;
+- authz boundary checks.
+
+Required fixtures:
+
+- sample OpenBao audit request/response lines;
+- sample Kubernetes audit event;
+- sample application audit event;
+- sample NetKingdom IAM Profile actor claim set;
+- malformed event cases;
+- tenant-boundary violation cases.
+
+## 15. Milestones
+
+### M0 - Repository Foundation
+
+- `INTENT.md`
+- Product requirements definition
+- Basic repo structure
+- Initial workplan
+- License and contribution notes
+
+### M1 - Contract Skeleton
+
+- OpenAPI draft
+- event envelope schema
+- source/tenant/scope/policy models
+- conformance fixture layout
+
+### M2 - Local Archive Prototype
+
+- HTTP ingestion
+- local archive writer
+- batch manifests
+- hash-chain per stream
+- readiness API
+
+### M3 - OpenBao Source Adapter
+
+- OpenBao audit file parser
+- source registration template
+- sample ingestion from fixture
+- readiness check for OpenBao audit durability
+
+### M4 - Object Storage Archive
+
+- S3-compatible archive writer
+- retention metadata
+- manifest export
+- object-lock/WORM deployment guidance
+
+### M5 - NetKingdom Integration
+
+- NetKingdom setup API flow
+- tenant/scope sync adapter
+- readiness display contract
+- control-surface handoff docs
+
+### M6 - Hot Search
+
+- selected hot-search backend adapter
+- tenant/scope labels or indexes
+- query/export integration
+
+### M7 - Enterprise Hardening
+
+- HA deployment
+- signed manifests
+- backup/restore runbooks
+- audit of Audit Core administrative actions
+- access review workflows
+
+## 16. Open Questions
+
+- Which hot-search backend should be the first supported production target:
+  Loki or OpenSearch?
+- Should immutable archive be the only mandatory production sink for v1?
+- Should tamper evidence start with signed hash-chain manifests or with an
+  embedded ledger such as immudb?
+- What is the minimum acceptable metadata store for production?
+- How should tenant billing/cost accounting be represented in policy?
+- Which claims from NetKingdom IAM Profile should become the default actor
+  mapping?
+- Which platform-control-plane audit streams are mandatory from day one?
+- Should Audit Core provide its own collector, or only integrate with Fluent
+  Bit, Vector, and OpenTelemetry Collector?
+
+## 17. Acceptance Criteria For First Useful Release
+
+The first useful release is complete when:
+
+- Audit Core runs locally without NetKingdom.
+- At least one tenant, scope, source, and policy can be registered by API.
+- Normalized events can be ingested by HTTP.
+- Events are written to archive batches.
+- Batches have verifiable manifests.
+- Readiness reports healthy and degraded states.
+- OpenBao audit fixture ingestion works.
+- NetKingdom can call a setup/readiness API without importing Audit Core
+  internals.
+- Documentation explains what is guaranteed and what is not.
+
+## 18. Terminology
+
+- **Audit custody**: the responsibility for retaining audit evidence with
+  enough durability and integrity to support later review.
+- **Hot search**: operational query storage optimized for investigation, not
+  necessarily immutable retention.
+- **Immutable archive**: long-term retained evidence protected from alteration
+  and deletion by policy and storage controls.
+- **Tamper evidence**: metadata that makes alteration, deletion, reordering, or
+  omission detectable.
+- **Platform mandatory audit**: audit records required to operate shared
+  infrastructure safely, regardless of tenant tier.
+- **Tenant optional audit**: tenant-owned audit records governed by selected
+  tier and cost policy.
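
Review sketch (not part of the patch): the tamper-evidence definition and the manifest fields listed in section 7.7 could be realized as a per-stream hash chain, roughly as below. This is a minimal illustration under stated assumptions, not the project's implementation. The function names (`content_hash`, `manifest_hash`, `build_manifest`, `verify_chain`), the canonical-JSON encoding, and the reduced manifest (only batch id, event count, content hash, and previous batch hash) are all hypothetical choices made for the example.

```python
import hashlib
import json


def content_hash(events):
    """Hash a batch's events as canonical JSON, one event per line (assumed encoding)."""
    digest = hashlib.sha256()
    for event in events:
        line = json.dumps(event, sort_keys=True, separators=(",", ":"))
        digest.update(line.encode("utf-8") + b"\n")
    return "sha256:" + digest.hexdigest()


def manifest_hash(manifest):
    """Hash a manifest so the next batch in the stream can link to it."""
    body = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(body.encode("utf-8")).hexdigest()


def build_manifest(batch_id, events, previous):
    """Build a reduced manifest linked to the previous manifest (None for the first batch)."""
    return {
        "batch_id": batch_id,
        "event_count": len(events),
        "content_hash": content_hash(events),
        "previous_batch_hash": manifest_hash(previous) if previous else None,
    }


def verify_chain(batches):
    """Check an ordered list of (manifest, events) pairs and return the problems found."""
    problems = []
    previous = None
    for manifest, events in batches:
        batch_id = manifest["batch_id"]
        if manifest["event_count"] != len(events):
            problems.append(f"{batch_id}: event count mismatch")
        if manifest["content_hash"] != content_hash(events):
            problems.append(f"{batch_id}: content hash mismatch")
        expected = manifest_hash(previous) if previous else None
        if manifest["previous_batch_hash"] != expected:
            problems.append(f"{batch_id}: broken link to previous batch")
        previous = manifest
    return problems


# Two batches from one hypothetical stream.
batch_a = [{"event_id": "e1", "action": "openbao.audit.list"}]
batch_b = [{"event_id": "e2", "action": "openbao.audit.enable"}]
m_a = build_manifest("batch-0001", batch_a, None)
m_b = build_manifest("batch-0002", batch_b, m_a)

print(verify_chain([(m_a, batch_a), (m_b, batch_b)]))  # intact chain
print(verify_chain([(m_b, batch_b)]))                  # first batch omitted
```

In this sketch, altering an event changes the content hash, and dropping or reordering batches breaks the link to the previous manifest. Removing the most recent batches is not detectable from the chain alone, which is one reason the PRD's signed manifests or an external ledger may still be needed.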