# Implementation Guide: DirektVermittlungDe

While the API Documentation explains *how to use* the API, this guide explains *how to build* it. It focuses on the backend engineering challenges defined in your architecture: encryption handling, database schema, and performance patterns.

---

**Status:** Draft v1.0

**Target Audience:** Backend Engineering Team

**Context:** Implements constraints from `DvdArchitektur.txt` [cite: 6]

---

## 1. Technology Stack & Setup

Based on the architectural constraints [cite: 45, 46, 47], the recommended reference stack is:

* **Service Layer:** Java (Spring Boot 3.x) or Go (Gin/Echo) for high-concurrency performance.
* **Primary Database:** PostgreSQL 15+ (Relational data for Routing/Threads).
* **Blob Storage:** S3-Compatible Storage (AWS S3 / MinIO) for encrypted PDF payloads.
* **Cache/PubSub:** Redis 7.x (Session store, Rate limiting, Async Job queues).

### Project Structure (Bounded Contexts)

Organize the codebase into modules matching the architecture [cite: 43]:

* `dvd-intake-service`: Handles `/documents` and Metadata extraction.
* `dvd-communication-service`: Handles `/threads` and `/messages`.
* `dvd-routing-engine`: The logic component for assigning units.
* `dvd-export-worker`: Async background workers for eAkte exports.

-----

## 2. Security Implementation Details

### 2.1 Handling "Blind" E2E Encryption [cite: 27]

The backend **must not** attempt to decrypt the `encryptedPayload`.

* **Ingest:** Receive the `encryptedPayload` (Base64 or binary multipart) and stream it directly to S3 Blob Storage. Do not load the full file into RAM; large files would otherwise risk out-of-memory (OOM) failures.
* **Metadata:** Persist only the `metadata` JSON object to PostgreSQL for routing logic.
* **Validation:** Verify that the `encryptedPayload` is a valid encrypted container (e.g., check PGP/AES headers), but treat the content as opaque.

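
The ingest rule above can be illustrated with a minimal Python sketch. It copies the payload chunk by chunk and hashes only the ciphertext, so the content is never decrypted or held in memory as a whole. The function name `stream_opaque_payload`, the chunk size, and the idea of storing a checksum next to the metadata are assumptions for illustration; the `BytesIO` objects stand in for the upload stream and the S3 client.

```python
import hashlib
import io
from typing import BinaryIO, Tuple

CHUNK_SIZE = 64 * 1024  # assumed chunk size; tune for the storage client in use


def stream_opaque_payload(source: BinaryIO, sink: BinaryIO,
                          chunk_size: int = CHUNK_SIZE) -> Tuple[int, str]:
    """Copy an encrypted payload in chunks without decrypting or fully buffering it.

    Returns (bytes_written, sha256_hex) of the ciphertext.
    """
    digest = hashlib.sha256()
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)  # hash of the ciphertext only; content stays opaque
        sink.write(chunk)
        total += len(chunk)
    return total, digest.hexdigest()


# Usage with in-memory stand-ins for the upload and the blob store.
upload = io.BytesIO(b"\x00" * 200_000)
blob_store = io.BytesIO()
size, checksum = stream_opaque_payload(upload, blob_store)
```

At most one chunk is resident at a time, which is the property the OOM warning above asks for.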
### 2.2 Stateless Authentication (OAuth2) [cite: 44]

* **Gateway Layer:** Implement a centralized API Gateway (e.g., Spring Cloud Gateway / Nginx) that validates JWT signatures (JWKS) from BundID (Citizens) and Authority-IDP (Officials).
* **Context Propagation:** Extract the `sub` (User ID) and `scope` from the JWT and pass them to downstream microservices via HTTP Headers (e.g., `X-User-Id`, `X-User-Role`).

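
The context propagation step could look roughly like the Python sketch below. It assumes the JWT signature has already been verified and only translates claims into headers. The scope names in `SCOPE_TO_ROLE` and the rule of taking the first matching role are hypothetical; the real values depend on BundID and the Authority-IDP. Downstream services can only trust these headers if the gateway also removes any client-supplied `X-User-*` headers before forwarding.

```python
from typing import Dict, Mapping

# Hypothetical scope names; the real ones depend on the identity providers.
SCOPE_TO_ROLE = {"dvd.citizen": "CITIZEN", "dvd.official": "OFFICIAL"}


def build_context_headers(claims: Mapping[str, object]) -> Dict[str, str]:
    """Translate already-verified JWT claims into downstream context headers."""
    subject = claims.get("sub")
    if not isinstance(subject, str) or not subject:
        raise ValueError("token has no usable 'sub' claim")

    # OAuth2 'scope' is a space-delimited string (RFC 6749, section 3.3).
    raw_scope = claims.get("scope", "")
    scopes = raw_scope.split() if isinstance(raw_scope, str) else []

    roles = [SCOPE_TO_ROLE[s] for s in scopes if s in SCOPE_TO_ROLE]
    if not roles:
        raise PermissionError("token carries no recognised role scope")

    return {"X-User-Id": subject, "X-User-Role": roles[0]}
```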

-----

## 3. Database Schema Recommendations (PostgreSQL)

Map the domain objects [cite: 64] to the following relational schema.

**Table: `documents`**

```sql
CREATE TABLE documents (
    id UUID PRIMARY KEY,
    reference_number VARCHAR(50) NOT NULL, -- "Aktenzeichen"
    authority_id VARCHAR(50) NOT NULL, -- Routing target
    status VARCHAR(20) DEFAULT 'RECEIVED', -- RECEIVED, ROUTED, ASSIGNED
    storage_path VARCHAR(255) NOT NULL, -- S3 Key for encrypted blob
    created_at TIMESTAMPTZ DEFAULT NOW(),
    retention_date TIMESTAMPTZ -- For GDPR auto-deletion [cite: 14]
);

CREATE INDEX idx_docs_authority ON documents(authority_id, status);
```

**Table: `threads`**

```sql
CREATE TABLE threads (
    id UUID PRIMARY KEY,
    document_id UUID REFERENCES documents(id),
    type VARCHAR(20) NOT NULL, -- CHAT, CALLBACK, APPOINTMENT
    assigned_official_id VARCHAR(100), -- Nullable until claimed
    last_activity_at TIMESTAMPTZ
);
```

**Table: `messages`**

```sql
CREATE TABLE messages (
    id UUID PRIMARY KEY,
    thread_id UUID REFERENCES threads(id),
    sender_role VARCHAR(20) NOT NULL,
    content_blob TEXT NOT NULL, -- Encrypted content
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Efficient Cursor Pagination: Index on (thread_id, created_at)
CREATE INDEX idx_msgs_thread_time ON messages(thread_id, created_at DESC);
```

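
To show what the cursor pagination index is meant to serve, here is a small Python sketch of keyset pagination over `messages`. It uses the standard library `sqlite3` module purely as a runnable stand-in for PostgreSQL, with a reduced table. The function names and the use of `id` as a tie-breaker for equal timestamps are assumptions, not part of the schema above. In PostgreSQL the same predicate could be written as a row comparison, `(created_at, id) < (?, ?)`.

```python
import sqlite3
from typing import List, Optional, Tuple

Row = Tuple[str, str]  # (id, created_at)


def fetch_page(conn: sqlite3.Connection, thread_id: str,
               cursor: Optional[Tuple[str, str]], limit: int) -> List[Row]:
    """Return up to `limit` messages older than `cursor`, newest first.

    `cursor` is the (created_at, id) pair of the last row on the previous page.
    """
    if cursor is None:
        sql = ("SELECT id, created_at FROM messages WHERE thread_id = ? "
               "ORDER BY created_at DESC, id DESC LIMIT ?")
        params: tuple = (thread_id, limit)
    else:
        sql = ("SELECT id, created_at FROM messages WHERE thread_id = ? "
               "AND (created_at < ? OR (created_at = ? AND id < ?)) "
               "ORDER BY created_at DESC, id DESC LIMIT ?")
        params = (thread_id, cursor[0], cursor[0], cursor[1], limit)
    return conn.execute(sql, params).fetchall()


def demo_connection() -> sqlite3.Connection:
    """Build an in-memory table holding five messages for one thread."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages (id TEXT PRIMARY KEY, thread_id TEXT, created_at TEXT)"
    )
    rows = [(f"m{i}", "t1", f"2024-01-01T00:00:0{i}Z") for i in range(1, 6)]
    conn.executemany("INSERT INTO messages VALUES (?, ?, ?)", rows)
    return conn
```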

-----

## 4. Performance & Scalability Patterns

### 4.1 Rate Limiting (Redis Token Bucket) [cite: 24]

To protect against DDoS and ensure fair usage (NFR-2), implement specific limits:

* **Citizens:** 10 requests/minute (prevent spamming threads).
* **Officials:** 1000 requests/minute (allow rapid batch processing).

*Implementation Tip:* Run the check-and-update as a single Redis Lua script so it stays atomic under concurrent requests. A "Sliding Window" script is a common alternative to the token bucket named in the heading; pick one algorithm and use it consistently. Key format: `rate_limit:{user_id}`.

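
The sliding-window idea can be sketched in plain Python before it is ported to a Lua script. The class below is an in-memory stand-in, not the Redis implementation: it keeps a log of request timestamps per key and rejects a request once the log for the last 60 seconds is full. The role names `CITIZEN` and `OFFICIAL` and the injectable `clock` parameter are assumptions made so the example is testable.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict

# Per-minute limits taken from the list above; role names are hypothetical.
LIMITS_PER_MINUTE = {"CITIZEN": 10, "OFFICIAL": 1000}


class SlidingWindowLimiter:
    """In-memory sliding-window log; a stand-in for the Redis Lua script."""

    def __init__(self, window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._window = window_seconds
        self._clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, user_id: str, role: str) -> bool:
        key = f"rate_limit:{user_id}"
        now = self._clock()
        hits = self._hits[key]
        # Drop timestamps that have left the window.
        while hits and hits[0] <= now - self._window:
            hits.popleft()
        if len(hits) >= LIMITS_PER_MINUTE[role]:
            return False
        hits.append(now)
        return True
```

In Redis, the same log is often kept in a sorted set per key, with the trim, count, and add steps performed inside one script.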
### 4.2 Caching Strategy [cite: 47]

* **Routing Rules:** Cache `RoutingRules` in Redis for 1 hour. Invalidate immediately on Admin updates.
* **ETags:** For `GET /documents/{id}`, generate an ETag based on the `updated_at` timestamp. If the client's `If-None-Match` header matches, return `304 Not Modified` to save bandwidth. The `documents` table in section 3 does not define `updated_at`, so add that column or derive the ETag from another version marker.

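
A minimal Python sketch of the ETag check follows. It assumes the ETag is a truncated SHA-256 over the document id and its `updated_at` value; the hashing scheme and both function names are illustrative choices, not something the architecture prescribes. The comparison ignores a `W/` prefix because `If-None-Match` uses weak comparison.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional


def make_etag(document_id: str, updated_at: datetime) -> str:
    """Derive a quoted ETag from the document id and its last-modified time."""
    raw = f"{document_id}:{updated_at.astimezone(timezone.utc).isoformat()}"
    return '"' + hashlib.sha256(raw.encode("utf-8")).hexdigest()[:32] + '"'


def is_not_modified(if_none_match: Optional[str], current_etag: str) -> bool:
    """Return True when the response can be a 304 Not Modified."""
    if not if_none_match:
        return False
    candidates = [tag.strip() for tag in if_none_match.split(",")]
    # If-None-Match uses weak comparison, so ignore a leading W/ prefix.
    normalised = [tag[2:] if tag.startswith("W/") else tag for tag in candidates]
    return "*" in normalised or current_etag in normalised
```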

-----

## 5. Async Export Workflow [cite: 16]

For the `POST /exports` endpoint:

1. **API Layer:** Validate request -> Publish event `ExportRequested` to RabbitMQ/Redis Stream -> Return `202 Accepted` + `jobId`.
2. **Worker:**
   * Consume `ExportRequested`.
   * Fetch `encryptedPayload` from S3.
   * Fetch Message History from Postgres.
   * *Note:* The Worker might need a special "Authority Key" to re-encrypt the package for the target eAkte system, depending on the specific crypto-concept.
   * Push result to the Authority's Ingest Interface.
   * Update Job Status to `COMPLETED`.

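
The two steps above can be sketched with an in-process `asyncio.Queue` standing in for RabbitMQ or a Redis Stream. This is only an illustration of the accept-then-process shape. The `PENDING` and `FAILED` statuses, the in-memory `JOB_STATUS` dictionary, and all function names are assumptions; the source only names the `COMPLETED` status. The S3, Postgres, and eAkte calls are reduced to a placeholder.

```python
import asyncio
import uuid
from typing import Dict, Tuple

JOB_STATUS: Dict[str, str] = {}  # stand-in for a job table


async def request_export(queue: asyncio.Queue, document_id: str) -> Tuple[int, dict]:
    """API-layer step: enqueue the event and answer immediately with 202."""
    job_id = str(uuid.uuid4())
    JOB_STATUS[job_id] = "PENDING"
    await queue.put(
        {"type": "ExportRequested", "jobId": job_id, "documentId": document_id}
    )
    return 202, {"jobId": job_id}


async def export_worker(queue: asyncio.Queue) -> None:
    """Worker step: consume events until the task is cancelled."""
    while True:
        event = await queue.get()
        try:
            # Placeholder for: fetch blob from S3, fetch messages, push to eAkte.
            await asyncio.sleep(0)
            JOB_STATUS[event["jobId"]] = "COMPLETED"
        except Exception:
            JOB_STATUS[event["jobId"]] = "FAILED"
        finally:
            queue.task_done()


async def demo() -> Tuple[int, str]:
    queue: asyncio.Queue = asyncio.Queue()
    worker = asyncio.create_task(export_worker(queue))
    status_code, body = await request_export(queue, "doc-123")
    await queue.join()  # wait until the worker has processed the event
    worker.cancel()
    return status_code, JOB_STATUS[body["jobId"]]
```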

-----

## 6. Definition of Done Checklist

Before deploying to the staging environment, ensure:

* [ ] **Load Test:** System handles 500 concurrent document uploads without error [cite: 24].
* [ ] **Security Audit:** Confirm no PII (Aktenzeichen) is logged in plaintext application logs.
* [ ] **Cleanup:** The "GDPR Reaper" job is active and deletes `documents` where `retention_date < NOW()` [cite: 14].

# Implementation Guide: DirektVermittlungDe

**Status:** Draft v1.1

**Stack:** Python / FastAPI

**Context:** Implements constraints from `DvdArchitektur.txt` and ADR-007.

## 1. Technology Stack

* **Language:** Python 3.11+
* **Web Framework:** FastAPI (with Uvicorn + Gunicorn)
* **Validation:** Pydantic V2 (Strict Mode)
* **Database ORM:** SQLAlchemy (Async) or Tortoise-ORM
* **Task Queue:** ARQ (Redis-based) or Celery
* **Primary DB:** PostgreSQL 15+
* **Blob Store:** MinIO / AWS S3

## 2. Project Structure & Patterns

Organize the monolithic repo or microservices using "Clean Architecture":

```text
/src
  /domain      # Pydantic models & Business Rules (Pure Python)
  /adapters    # DB, S3, External APIs
  /service     # Application Logic
  /api         # FastAPI Routes
  /workers     # Background Job Definitions
```

## 3. The "Hybrid Concurrency" Pattern (Critical)

To meet NFR-1 (<300ms) and NFR-2 (10k sessions), you must not block the Event Loop.

### 3.1 The Rule

- NEVER call `time.sleep`, `requests`, or heavy computation (e.g., `pypdf`, `cryptography`) directly inside an `async def`.
- ALWAYS use `await` for I/O.
- ALWAYS use `loop.run_in_executor` for CPU-bound tasks.

### 3.2 Implementation Snippet

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

from fastapi import APIRouter, UploadFile

import some_heavy_crypto_lib  # placeholder name for the real CPU-heavy library

router = APIRouter()

# Create a dedicated pool for CPU tasks
cpu_pool = ProcessPoolExecutor(max_workers=4)


def cpu_bound_decryption(payload: bytes) -> dict:
    # This runs in a separate process, bypassing the GIL
    return some_heavy_crypto_lib.decrypt_and_parse(payload)


@router.post("/documents")
async def upload_document(file: UploadFile):
    # Non-blocking read; note that this buffers the whole upload in memory
    content = await file.read()

    # Offload CPU work to the pool so the event loop stays responsive
    loop = asyncio.get_running_loop()
    metadata = await loop.run_in_executor(
        cpu_pool,
        cpu_bound_decryption,
        content,
    )

    return {"status": "processed", "meta": metadata}
```

## 4. Security Implementation

### 4.1 "Blind" Ingest

- Stream uploads directly to S3 using `aiobotocore` to avoid loading 50MB PDFs into RAM.
- Do not attempt to read the `encryptedPayload` in the main web service process.

### 4.2 Auth Middleware

Use `fastapi.security.OAuth2AuthorizationCodeBearer`. Implement a dependency that validates the JWT signature using a cached JWKS (JSON Web Key Set) to avoid a network call on every request.

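
The caching part of that dependency can be sketched independently of FastAPI and of any JWT library. The class below keeps the key set in memory and refetches it only when the cache is older than a TTL or when a token names an unknown `kid`. The `JwksCache` name, the one-hour TTL, and the injected `fetch` callable (which would download the JWKS from the identity provider) are all assumptions for illustration.

```python
import time
from typing import Callable, Dict, List, Optional

Jwk = Dict[str, str]


class JwksCache:
    """Hold the key set in memory; refetch when stale or when a kid is unknown."""

    def __init__(self, fetch: Callable[[], List[Jwk]], ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch      # hypothetical callable that downloads the JWKS
        self._ttl = ttl_seconds  # assumed refresh interval
        self._clock = clock
        self._keys: Dict[str, Jwk] = {}
        self._loaded_at: Optional[float] = None

    def _refresh(self) -> None:
        self._keys = {key["kid"]: key for key in self._fetch()}
        self._loaded_at = self._clock()

    def get_key(self, kid: str) -> Jwk:
        stale = self._loaded_at is None or self._clock() - self._loaded_at >= self._ttl
        if stale or kid not in self._keys:
            # An unknown kid may mean the provider rotated its keys.
            self._refresh()
        if kid not in self._keys:
            raise KeyError(f"no signing key with kid={kid!r}")
        return self._keys[kid]
```

Because an unknown `kid` triggers a refetch, a production version would likely throttle those refreshes so that forged tokens cannot force a network call per request.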
## 5. Database Schema (SQLAlchemy Async)

```python
import uuid
from datetime import datetime

from sqlalchemy import String, UUID, DateTime
from sqlalchemy.ext.asyncio import AsyncSession
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


class Base(DeclarativeBase):
    pass


class Document(Base):
    __tablename__ = "documents"

    id: Mapped[uuid.UUID] = mapped_column(primary_key=True, default=uuid.uuid4)
    reference_number: Mapped[str] = mapped_column(String(50), index=True)
    status: Mapped[str] = mapped_column(String(20), default="RECEIVED")
    # …
```

## 6. Testing Strategy (Agentic TDD)

- Framework: `pytest` + `pytest-asyncio`.
- Mocking: Use `respx` for mocking external HTTP calls (Authority Systems).
- Database: Use `testcontainers-python` to spin up a real Postgres for integration tests.
- Prompting the Agent: "Write an async pytest for POST /documents. Use ProcessPoolExecutor mock to verify CPU offloading."

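
As a rough idea of what the CPU-offloading check in the last bullet could verify, the sketch below uses only the standard library. Instead of mocking `ProcessPoolExecutor`, it passes a recording test double into a simplified handler and asserts that the CPU step went through `run_in_executor`. `RecordingExecutor`, `parse_payload`, and `handle_upload` are hypothetical names; the real test would target the FastAPI route and would normally run under `pytest-asyncio` rather than `asyncio.run`.

```python
import asyncio
from concurrent.futures import Executor, Future
from typing import Any, Callable, List


class RecordingExecutor(Executor):
    """Test double that runs work inline and records what was submitted."""

    def __init__(self) -> None:
        self.submitted: List[str] = []

    def submit(self, fn: Callable[..., Any], /, *args: Any, **kwargs: Any) -> Future:
        self.submitted.append(fn.__name__)
        future: Future = Future()
        try:
            future.set_result(fn(*args, **kwargs))
        except Exception as exc:
            future.set_exception(exc)
        return future


def parse_payload(payload: bytes) -> dict:
    """Hypothetical CPU-bound step standing in for the real parser."""
    return {"size": len(payload)}


async def handle_upload(payload: bytes, pool: Executor) -> dict:
    """Handler under test: it must offload the CPU step, not call it directly."""
    loop = asyncio.get_running_loop()
    meta = await loop.run_in_executor(pool, parse_payload, payload)
    return {"status": "processed", "meta": meta}


def test_upload_offloads_cpu_work() -> None:
    pool = RecordingExecutor()
    result = asyncio.run(handle_upload(b"x" * 10, pool))
    assert result == {"status": "processed", "meta": {"size": 10}}
    assert pool.submitted == ["parse_payload"]
```

Injecting the executor as a parameter, rather than using a module-level pool, is what makes this kind of check straightforward.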