From a3129afa4209839ff3078ea61faed0ec80de7287 Mon Sep 17 00:00:00 2001
From: tegwick <bernd.worsch@gmail.com>
Date: Mon, 1 Jun 2026 23:38:26 +0200
Subject: [PATCH] Added PRD for what we want to do

---
 spec/ProductRequirementsDefinition.md | 624 ++++++++++++++++++++++++++
 1 file changed, 624 insertions(+)
 create mode 100644 spec/ProductRequirementsDefinition.md

diff --git a/spec/ProductRequirementsDefinition.md b/spec/ProductRequirementsDefinition.md
new file mode 100644
index 0000000..6d25ef0
--- /dev/null
+++ b/spec/ProductRequirementsDefinition.md
@@ -0,0 +1,624 @@
+# Audit Core Product Requirements Definition
+
+## 1. Product Summary
+
+Audit Core is a standalone audit fabric for multi-tenant platforms. It provides
+an API-driven control plane and data plane for ingesting audit events,
+normalizing them into a common envelope, routing them to durable retention and
+search backends, and proving that retained audit batches have not been silently
+altered or omitted.
+
+Audit Core must integrate well with NetKingdom, Railiance, OpenBao, Kubernetes,
+identity providers, and application runtimes, but it must remain independently
+deployable and usable without NetKingdom.
+
+## 2. Goals
+
+- Provide a trustworthy audit substrate for platform, tenant, application, and
+  security-scope events.
+- Support enterprise and critical-infrastructure expectations for retention,
+  integrity, export, tenant isolation, and operator accountability.
+- Give upstream platforms an API layer for setup, validation, source
+  registration, policy assignment, and integration status.
+- Separate immutable archive from hot search so cost, retention, and
+  performance can be tuned independently.
+- Allow tenants to opt into audit capability tiers for tenant-owned scopes
+  while keeping platform-control-plane audit mandatory.
+- Start with a small inspectable implementation that can grow without hidden
+  coupling to the first deployment.
+
+## 3. Non-Goals
+
+- Audit Core is not a generic observability platform for all logs, metrics, and
+  traces.
+- Audit Core is not a SIEM in its first release, though it should support SIEM
+  export and later detection integrations.
+- Audit Core does not replace OpenBao audit devices, Kubernetes audit policy,
+  identity-provider logs, or application-local audit semantics.
+- Audit Core does not provision tenants or identities itself.
+- Audit Core does not make authorization decisions for upstream applications.
+- Audit Core must not become a dumping ground for plaintext secrets, root
+  tokens, unseal shares, OTP seeds, private keys, or passwords.
+
+## 4. Users And Actors
+
+### Platform Operator
+
+Runs Audit Core, configures sources, validates ingestion, manages retention and
+archive policy, and performs incident or compliance export.
+
+### Security Operator
+
+Investigates privileged operations, break-glass actions, suspicious access, and
+retention integrity. Needs cross-source correlation and tamper-evidence.
+
+### Tenant Admin
+
+Configures tenant-owned audit tiers, reviews tenant-scoped events, and exports
+records according to entitlement.
+
+### Platform Integrator
+
+Integrates Audit Core with NetKingdom, Railiance, OpenBao, Kubernetes, or an
+application. Needs stable APIs, setup contracts, conformance fixtures, and
+clear failure modes.
+
+### Auditor
+
+Reviews retained event batches, manifests, export packages, and integrity
+proofs. Needs non-secret evidence and reproducible verification procedures.
+
+## 5. Core Product Concepts
+
+### Tenant
+
+A customer, organization, environment, or administrative domain whose events
+can be isolated, retained, searched, exported, billed, and governed.
+
+### Scope
+
+A subdivision of audit ownership or policy. Examples:
+
+- platform-control-plane;
+- tenant workload;
+- application;
+- security operation;
+- source-specific stream.
+
+### Source
+
+A concrete audit emitter such as OpenBao, Kubernetes, KeyCape, privacyIDEA,
+LLDAP, Authelia, an application, a policy engine, or a backup/restore tool.
+
+### Stream
+
+A typed flow of events from a source under a retention and routing policy.
+
+### Event Envelope
+
+The normalized representation of one audit event inside Audit Core.
+
+### Batch
+
+A durable group of normalized events written to archive and assigned a manifest.
+
+### Manifest
+
+Metadata that describes a batch, including hashes, ordering, counts, source
+range, timestamps, schema version, and optional signature.
+
+### Hot Search
+
+Shorter-retention searchable storage such as Loki or OpenSearch.
+
+### Immutable Archive
+
+The long-retention evidence record, preferably object storage with object lock
+or another append-only/tamper-resistant backend.
+
+## 6. Capability Tiers
+
+Audit Core must support policy profiles that can implement these tiers:
+
+| Tier | Description | Intended Use |
+| --- | --- | --- |
+| Platform Mandatory | Always-on audit for shared platform control-plane events | OpenBao, IAM, Kubernetes admin, break-glass |
+| Archive Only | Durable retained batches and manifests, limited query | Low-cost compliance retention |
+| Search | Archive plus hot search backend | Operational investigation |
+| Security | Search plus alerting, detection, and SOC export | Enterprise security operations |
+| Dedicated | Stronger tenant isolation with dedicated buckets, indexes, keys, or runtimes | Regulated or high-isolation tenants |
+
+Tenant-owned scopes may opt into or out of optional tiers. Platform mandatory
+scopes cannot be disabled by tenant preference.
+
+## 7. Functional Requirements
+
+### 7.1 Source Registration
+
+- The system must provide an API to register audit sources.
+- A source must have a stable id, source type, owner, scope mapping, and
+  ingestion credential policy.
+- A source must declare the event schemas or adapters it emits.
+- A source must expose health/readiness state.
+- Source registration must be possible without NetKingdom.
+- NetKingdom-specific source metadata must be optional extension data.
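+
+As an illustration only, a registration request to `POST /v1/sources` might
+carry a body like the following. The field names and values are assumptions
+made for this sketch, not a defined contract; the OpenAPI description remains
+authoritative.
+
+```json
+{
+  "_comment": "Hypothetical request body; field names are assumptions.",
+  "id": "openbao-platform",
+  "type": "openbao",
+  "owner": "platform",
+  "scope_id": "platform-control-plane",
+  "credential_policy": "per-source-token",
+  "schemas": ["audit-core.event.v1"],
+  "extensions": {}
+}
+```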
+
+### 7.2 Tenant And Scope Registration
+
+- The system must provide APIs to create and update tenants and scopes.
+- Each scope must have an ownership class: platform, tenant, application,
+  security, or source.
+- A scope must bind to one or more retention and routing policies.
+- Tenant and scope ids must be stable and suitable for partitioning archive and
+  search data.
+- The model must permit static local tenants for standalone deployments.
+- The model must permit NetKingdom-managed tenants through integration APIs.
+
+### 7.3 Event Ingestion
+
+- The system must expose an HTTP ingestion API.
+- The system should later support collector-native inputs where useful, such as
+  file tailing, OTLP logs, or batch import.
+- Ingestion must accept normalized events.
+- Ingestion should support source-specific raw events through adapters.
+- Ingestion must reject events missing required source, tenant/scope, timestamp,
+  action, resource, or result metadata unless a configured adapter can supply
+  them.
+- Ingestion failures must be visible through metrics, health, and audit events.
+- Ingestion must not print or persist ingestion secrets in logs.
+
+### 7.4 Event Normalization
+
+- The system must normalize incoming events into the Audit Core event envelope.
+- The envelope must preserve source-specific extension fields.
+- The envelope must include enough metadata for cross-source correlation.
+- The normalizer must distinguish platform-scope events from tenant-scope
+  events.
+- The normalizer must classify sensitivity and retention class.
+- The normalizer must hash or reference original payloads where full payload
+  retention is not allowed.
+
+### 7.5 Routing
+
+- The system must route normalized events according to policy.
+- Routing must support archive-only mode.
+- Routing must support archive plus hot-search mode.
+- Routing should support later SIEM export.
+- Routing must support per-tenant and per-scope policy.
+- Routing failures must be visible and must not be silently ignored.
+- Policy must define whether ingestion should fail closed or fail open when a
+  sink is unavailable.
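+
+A rough sketch of how such a policy could be expressed for
+`PUT /v1/policies/{scope_id}` is shown below. The `sinks` and `on_unavailable`
+names are hypothetical and only illustrate a per-sink fail-closed or fail-open
+choice.
+
+```json
+{
+  "_comment": "Hypothetical policy body; field names are assumptions.",
+  "retention_class": "platform-mandatory",
+  "sinks": [
+    { "name": "archive", "on_unavailable": "fail_closed" },
+    { "name": "hot-search", "on_unavailable": "fail_open" }
+  ]
+}
+```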
+
+### 7.6 Immutable Archive
+
+- The system must write audit batches to a durable archive.
+- The first archive implementation should support local filesystem for
+  development and S3-compatible object storage for production.
+- Archive paths must partition by tenant/scope/source/date or an equivalent
+  queryable structure.
+- Production archive must support encryption at rest.
+- Production archive should support WORM/Object Lock or an equivalent
+  immutability control.
+- Archive writes must produce batch manifests.
+- Archive retention must be controlled by explicit retention policy.
+
+### 7.7 Tamper Evidence
+
+- The system must generate a manifest for each archive batch.
+- A manifest must include:
+  - batch id;
+  - tenant id;
+  - scope id;
+  - source id;
+  - schema version;
+  - event count;
+  - first and last event timestamps;
+  - previous batch hash where applicable;
+  - content hash;
+  - original payload hash or source offsets where applicable;
+  - writer identity;
+  - creation timestamp.
+- The system should support signed manifests.
+- The system should support hash chains per stream.
+- The system should later support optional external transparency or ledger
+  backends such as immudb, Rekor, or another append-only service.
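+
+For orientation, a manifest carrying the fields listed above might look like
+the following. This is a sketch under assumptions: the key names, the
+`audit-core.manifest.v1` version string, and all values are illustrative and
+not part of this specification.
+
+```json
+{
+  "_comment": "Hypothetical manifest; field names are assumptions.",
+  "batch_id": "batch-000042",
+  "tenant_id": "platform",
+  "scope_id": "platform-control-plane",
+  "source_id": "openbao-platform",
+  "schema_version": "audit-core.manifest.v1",
+  "event_count": 128,
+  "first_event_at": "2026-06-01T20:00:00Z",
+  "last_event_at": "2026-06-01T20:04:59Z",
+  "previous_batch_hash": "sha256:...",
+  "content_hash": "sha256:...",
+  "source_offsets": { "first": 1024, "last": 98304 },
+  "writer": "archive-writer-0",
+  "created_at": "2026-06-01T20:05:03Z"
+}
+```
+
+Carrying `previous_batch_hash` in each manifest is what lets a verifier detect
+a removed or reordered batch within a stream: the chain breaks at the first
+manifest whose predecessor hash does not match.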
+
+### 7.8 Hot Search
+
+- The system must define a sink interface for hot-search backends.
+- The first hot-search backend should be selected explicitly during design:
+  Loki for cost-effective log search, or OpenSearch for richer query and
+  enterprise dashboards.
+- Hot search retention may be shorter than archive retention.
+- Hot search must retain tenant/scope/source labels or equivalent partitioning.
+- Hot search access must respect tenant and operator entitlements.
+- Hot search must not be treated as the immutable evidence record.
+
+### 7.9 Export
+
+- The system must provide controlled export jobs.
+- Exports must be scoped by tenant, scope, time range, source, and entitlement.
+- Export packages must include manifests and enough metadata to verify
+  integrity.
+- Export jobs must be audited.
+- Export access should later support approval workflows.
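+
+A request to `POST /v1/exports` could be scoped along these lines. The body
+shape is an assumption for illustration; only the scoping dimensions (tenant,
+scope, source, time range) come from the requirements above.
+
+```json
+{
+  "_comment": "Hypothetical export request; field names are assumptions.",
+  "tenant_id": "platform",
+  "scope_id": "platform-control-plane",
+  "source_id": "openbao-platform",
+  "from": "2026-06-01T00:00:00Z",
+  "to": "2026-06-02T00:00:00Z",
+  "include_manifests": true
+}
+```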
+
+### 7.10 Readiness And Validation
+
+- The system must provide a readiness API.
+- Readiness must report source registration, ingestion state, archive state,
+  hot-search state, manifest state, retention policy, and degraded conditions.
+- Readiness must support platform gates such as "is OpenBao audit durable
+  enough for production secret custody?"
+- Readiness output must be machine-readable for NetKingdom setup UI
+  integration.
+- Readiness must be useful without NetKingdom.
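+
+A machine-readable readiness response might resemble the sketch below. The
+structure, the status vocabulary, and the `openbao_audit_durable` gate name are
+assumptions chosen to mirror the items listed above.
+
+```json
+{
+  "_comment": "Hypothetical readiness response; field names are assumptions.",
+  "status": "degraded",
+  "checks": {
+    "sources": "ok",
+    "ingestion": "ok",
+    "archive": "ok",
+    "hot_search": "unavailable",
+    "manifests": "ok",
+    "retention_policy": "ok"
+  },
+  "gates": {
+    "openbao_audit_durable": true
+  },
+  "degraded": ["hot-search sink unreachable"]
+}
+```
+
+In this example the OpenBao gate can still pass while hot search is down,
+because the archive and manifest checks are healthy.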
+
+### 7.11 Setup API
+
+- The system must expose a setup/capabilities API describing required steps.
+- The setup API should return:
+  - required environment;
+  - missing backends;
+  - configured sinks;
+  - source templates;
+  - policy templates;
+  - validation commands or endpoints;
+  - current integration state.
+- NetKingdom should be able to drive an operator setup flow from this API.
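+
+One possible shape for a `GET /v1/capabilities` response covering the items
+above is sketched here. All key names and values are hypothetical.
+
+```json
+{
+  "_comment": "Hypothetical capabilities response; names are assumptions.",
+  "required_environment": ["archive backend", "metadata store"],
+  "missing_backends": ["hot-search"],
+  "configured_sinks": ["archive"],
+  "source_templates": ["openbao"],
+  "policy_templates": ["platform-mandatory", "archive-only"],
+  "validation_endpoints": ["/v1/readiness"],
+  "integration_state": "partial"
+}
+```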
+
+### 7.12 OpenBao Integration
+
+- The system must support ingesting OpenBao audit logs.
+- The first OpenBao path may read from file audit output exported or tailed from
+  `/openbao/audit/openbao-audit.log`.
+- The integration must preserve OpenBao request and response correlation.
+- The integration must recognize that OpenBao hashes sensitive strings with
+  HMAC in audit logs and must not attempt to reverse or weaken that protection.
+- The integration must tag OpenBao events as platform-control-plane by default
+  unless policy maps a stream to a tenant scope.
+- The integration must provide a readiness check suitable for NetKingdom's
+  OpenBao production gate.
+
+### 7.13 Kubernetes Integration
+
+- The system should support Kubernetes audit or workload audit ingestion.
+- Kubernetes sources must distinguish cluster platform events from tenant
+  workload events.
+- The integration should support namespace and service-account mapping.
+- The integration should not require every Kubernetes cluster to run
+  NetKingdom.
+
+### 7.14 NetKingdom Integration
+
+- The system must provide APIs that let NetKingdom register tenants, scopes,
+  source mappings, and policy templates.
+- The system must accept NetKingdom IAM Profile claims when configured.
+- The system must not require NetKingdom-specific claims in standalone mode.
+- The system must expose status suitable for the NetKingdom control surface.
+- NetKingdom setup must not require Audit Core to import NetKingdom internal
+  modules.
+
+## 8. Non-Functional Requirements
+
+### 8.1 Security
+
+- All production APIs must require authentication.
+- Authorization must enforce tenant and scope boundaries.
+- Ingestion credentials must be scoped per source.
+- Operator privileges must be separable from tenant-admin privileges.
+- Secrets must not be printed in logs, manifests, exports, or error responses.
+- Event payloads must support sensitivity classification.
+
+### 8.2 Reliability
+
+- Ingestion should tolerate downstream hot-search outages when archive policy
+  allows.
+- Archive failure handling must be explicit and policy-driven.
+- Collectors or shippers should support persistent buffering.
+- The system must expose degraded states rather than silently dropping events.
+- Batch writes must be idempotent or safely retryable.
+
+### 8.3 Scalability
+
+- Tenant, scope, and source partitioning must be part of the data model from
+  the beginning.
+- The archive layout must scale to many tenants and streams.
+- The ingestion API should be horizontally scalable.
+- Hot-search backends must be replaceable as volume grows.
+
+### 8.4 Portability
+
+- The system must run locally for development.
+- The system should deploy cleanly to Kubernetes.
+- Storage backends must be pluggable.
+- NetKingdom integration must be optional.
+
+### 8.5 Operability
+
+- The system must expose health and readiness endpoints.
+- The system must expose metrics for ingestion, rejection, archive writes,
+  manifest generation, sink failures, retries, and lag.
+- The system must include operator runbooks for backup, restore, degraded
+  ingestion, export, and tenant onboarding.
+- The system must provide conformance tests for the API and event envelope.
+
+### 8.6 Compliance And Evidence
+
+- Retention policies must be explicit and reviewable.
+- Archive batches and manifests must be exportable.
+- Export jobs must be auditable.
+- Integrity verification must be repeatable by an auditor without privileged
+  access to live systems where possible.
+
+## 9. API Requirements
+
+The first API version should be `/v1`.
+
+Required endpoints:
+
+| Endpoint | Method | Purpose |
+| --- | --- | --- |
+| `/v1/capabilities` | GET | Describe enabled features, backends, and integration templates |
+| `/v1/readiness` | GET | Report setup and runtime readiness |
+| `/v1/tenants` | POST | Register a tenant |
+| `/v1/tenants/{tenant_id}` | GET | Read tenant metadata |
+| `/v1/scopes` | POST | Register a scope |
+| `/v1/sources` | POST | Register a source |
+| `/v1/sources/{source_id}/status` | GET | Source ingestion and routing status |
+| `/v1/events` | POST | Ingest one or more normalized events |
+| `/v1/policies/{scope_id}` | PUT | Set retention and routing policy |
+| `/v1/evidence/batches` | GET | List retained batches by filter |
+| `/v1/evidence/batches/{batch_id}/manifest` | GET | Retrieve a batch manifest |
+| `/v1/exports` | POST | Create a controlled export job |
+| `/v1/exports/{export_id}` | GET | Read export status and metadata |
+
+The API must be described with OpenAPI. Each endpoint must have at least one
+conformance test before it is treated as stable.
+
+## 10. Event Envelope Requirements
+
+The event envelope must include:
+
+```json
+{
+  "schema_version": "audit-core.event.v1",
+  "event_id": "uuid-or-source-stable-id",
+  "source": {
+    "id": "openbao-platform",
+    "type": "openbao",
+    "instance": "railiance01/openbao"
+  },
+  "tenant": {
+    "id": "platform",
+    "ownership": "platform"
+  },
+  "scope": {
+    "id": "platform-control-plane",
+    "type": "security"
+  },
+  "time": {
+    "emitted_at": "2026-06-01T20:00:00Z",
+    "observed_at": "2026-06-01T20:00:01Z"
+  },
+  "actor": {
+    "subject": "uid=platform-root,ou=people,dc=netkingdom,dc=local",
+    "issuer": "https://kc.coulomb.social",
+    "groups": ["net-kingdom-admins"],
+    "auth_context": ["mfa"]
+  },
+  "action": "openbao.audit.list",
+  "resource": {
+    "type": "openbao.sys.audit",
+    "id": "file/"
+  },
+  "result": {
+    "outcome": "success",
+    "reason": "authorized"
+  },
+  "correlation": {
+    "request_id": "source-request-id",
+    "trace_id": "optional-trace-id"
+  },
+  "classification": {
+    "sensitivity": "restricted",
+    "retention_class": "platform-mandatory"
+  },
+  "payload": {
+    "normalized": {},
+    "source_hash": "sha256:..."
+  }
+}
+```
+
+The envelope may evolve, but compatibility rules must be explicit.
+
+## 11. Storage Requirements
+
+### Development Storage
+
+- Local filesystem archive.
+- Local manifest files.
+- Optional in-memory or SQLite metadata store.
+- No external NetKingdom dependency.
+
+### Production Storage
+
+- S3-compatible archive backend.
+- Support for encryption at rest.
+- Support for object lock or equivalent immutability where available.
+- Pluggable metadata store.
+- Optional hot-search backend.
+- Optional external tamper-evidence backend.
+
+## 12. Deployment Requirements
+
+Minimum deployment profiles:
+
+- **local-dev**: single process, local archive, local auth bypass or static dev
+  token.
+- **single-cluster**: API plus archive writer, Kubernetes deployment, object
+  storage archive.
+- **search-enabled**: single-cluster plus Loki or OpenSearch sink.
+- **enterprise**: HA API, object-lock archive, hot search, signed manifests,
+  per-tenant policy, backup/restore runbooks.
+
+## 13. Integration Requirements
+
+### NetKingdom Setup
+
+Audit Core must provide enough API information for NetKingdom to:
+
+- discover whether Audit Core exists;
+- register or validate tenants/scopes;
+- configure source mappings;
+- show readiness in a setup UI;
+- display degraded states;
+- record non-secret evidence;
+- link to Audit Core export/evidence endpoints.
+
+### Railiance Platform
+
+Railiance may operate the runtime backends:
+
+- object storage;
+- hot-search backend;
+- Kubernetes deployment;
+- backup/restore substrate;
+- OpenBao audit source access.
+
+Audit Core must not assume those services are always owned by Railiance.
+
+### OpenBao
+
+OpenBao integration must start as a source adapter, not a hard dependency.
+Audit Core must work even when OpenBao is absent.
+
+## 14. Validation And Testing Requirements
+
+Required test categories:
+
+- event envelope schema validation;
+- API conformance;
+- source registration behavior;
+- tenant/scope policy behavior;
+- archive batch creation;
+- manifest hash verification;
+- OpenBao sample audit ingestion;
+- readiness responses for healthy and degraded configurations;
+- export package integrity;
+- authz boundary checks.
+
+Required fixtures:
+
+- sample OpenBao audit request/response lines;
+- sample Kubernetes audit event;
+- sample application audit event;
+- sample NetKingdom IAM Profile actor claim set;
+- malformed event cases;
+- tenant-boundary violation cases.
+
+## 15. Milestones
+
+### M0 - Repository Foundation
+
+- `INTENT.md`
+- Product requirements definition
+- Basic repo structure
+- Initial workplan
+- License and contribution notes
+
+### M1 - Contract Skeleton
+
+- OpenAPI draft
+- event envelope schema
+- source/tenant/scope/policy models
+- conformance fixture layout
+
+### M2 - Local Archive Prototype
+
+- HTTP ingestion
+- local archive writer
+- batch manifests
+- hash-chain per stream
+- readiness API
+
+### M3 - OpenBao Source Adapter
+
+- OpenBao audit file parser
+- source registration template
+- sample ingestion from fixture
+- readiness check for OpenBao audit durability
+
+### M4 - Object Storage Archive
+
+- S3-compatible archive writer
+- retention metadata
+- manifest export
+- object-lock/WORM deployment guidance
+
+### M5 - NetKingdom Integration
+
+- NetKingdom setup API flow
+- tenant/scope sync adapter
+- readiness display contract
+- control-surface handoff docs
+
+### M6 - Hot Search
+
+- selected hot-search backend adapter
+- tenant/scope labels or indexes
+- query/export integration
+
+### M7 - Enterprise Hardening
+
+- HA deployment
+- signed manifests
+- backup/restore runbooks
+- audit of Audit Core administrative actions
+- access review workflows
+
+## 16. Open Questions
+
+- Which hot-search backend should be the first supported production target:
+  Loki or OpenSearch?
+- Should immutable archive be the only mandatory production sink for v1?
+- Should tamper evidence start with signed hash-chain manifests or with an
+  embedded ledger such as immudb?
+- What is the minimum acceptable metadata store for production?
+- How should tenant billing/cost accounting be represented in policy?
+- Which claims from NetKingdom IAM Profile should become the default actor
+  mapping?
+- Which platform-control-plane audit streams are mandatory from day one?
+- Should Audit Core provide its own collector, or only integrate with Fluent
+  Bit, Vector, and OpenTelemetry Collector?
+
+## 17. Acceptance Criteria For First Useful Release
+
+The first useful release is complete when:
+
+- Audit Core runs locally without NetKingdom.
+- At least one tenant, scope, source, and policy can be registered by API.
+- Normalized events can be ingested by HTTP.
+- Events are written to archive batches.
+- Batches have verifiable manifests.
+- Readiness reports healthy and degraded states.
+- OpenBao audit fixture ingestion works.
+- NetKingdom can call a setup/readiness API without importing Audit Core
+  internals.
+- Documentation explains what is guaranteed and what is not.
+
+## 18. Terminology
+
+- **Audit custody**: the responsibility for retaining audit evidence with
+  enough durability and integrity to support later review.
+- **Hot search**: operational query storage optimized for investigation, not
+  necessarily immutable retention.
+- **Immutable archive**: long-term retained evidence protected from alteration
+  and deletion by policy and storage controls.
+- **Tamper evidence**: metadata that makes alteration, deletion, reordering, or
+  omission detectable.
+- **Platform mandatory audit**: audit records required to operate shared
+  infrastructure safely, regardless of tenant tier.
+- **Tenant optional audit**: tenant-owned audit records governed by selected
+  tier and cost policy.